Descriptive Statistics For pandas DataFrame¶

Author: Chris Albon, @ChrisAlbon
Date: -
Repo: Python 3 code snippets for data science

Import modules¶

In [40]:

import pandas as pd

Create dataframe¶

In [41]:

# Build the dataframe from a dict of columns; `columns` fixes the column order
data = {'name': ['Jason', 'Molly', 'Tina', 'Jake', 'Amy'],
        'age': [42, 52, 36, 24, 73],
        'preTestScore': [4, 24, 31, 2, 3],
        'postTestScore': [25, 94, 57, 62, 70]}
df = pd.DataFrame(data, columns=['name', 'age', 'preTestScore', 'postTestScore'])
df

Out[41]:

	name	age	preTestScore	postTestScore
0	Jason	42	4	25
1	Molly	52	24	94
2	Tina	36	31	57
3	Jake	24	2	62
4	Amy	73	3	70

5 rows × 4 columns

The sum of all the ages¶

In [42]:

df['age'].sum()

Out[42]:

227

Mean preTestScore¶

In [43]:

df['preTestScore'].mean()

Out[43]:

12.800000000000001

Cumulative sum of preTestScore, moving down the rows from the top¶

In [44]:

df['preTestScore'].cumsum()

Out[44]:

0     4
1    28
2    59
3    61
4    64
Name: preTestScore, dtype: int64

Summary statistics on preTestScore¶

In [45]:

df['preTestScore'].describe()

Out[45]:

count     5.000000
mean     12.800000
std      13.663821
min       2.000000
25%       3.000000
50%       4.000000
75%      24.000000
max      31.000000
Name: preTestScore, dtype: float64
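The 25%, 50% and 75% rows in the summary are quantiles, and they can be reproduced with quantile(). The sketch below rebuilds the preTestScore column as a standalone Series (the name `pre` is only for illustration) and assumes the default linear interpolation.

```python
import pandas as pd

# Same preTestScore values as in the dataframe above
pre = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

# describe() reports the 25th, 50th and 75th percentiles by default
summary = pre.describe()

# quantile() computes the same cut points directly
quartiles = pre.quantile([0.25, 0.5, 0.75])
print(quartiles)

# The percentiles argument asks describe() for other cut points,
# here the 10th and 90th percentiles
print(pre.describe(percentiles=[0.1, 0.9]))
```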

Count the number of non-NA values¶

In [46]:

df['preTestScore'].count()

Out[46]:

5
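In the dataframe above every preTestScore is present, so count() equals the number of rows. The difference only shows up when values are missing. A minimal sketch, using a hypothetical series with one NaN that is not part of the original data:

```python
import numpy as np
import pandas as pd

# Hypothetical scores with one missing value (illustration only)
scores = pd.Series([4, 24, np.nan, 2, 3])

n_rows = len(scores)             # every row, including the NaN
n_valid = scores.count()         # only the non-NA values
n_missing = scores.isna().sum()  # only the NA values

print(n_rows, n_valid, n_missing)  # prints: 5 4 1
```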

Minimum value of preTestScore¶

In [47]:

df['preTestScore'].min()

Out[47]:

2

Maximum value of preTestScore¶

In [48]:

df['preTestScore'].max()

Out[48]:

31

Median value of preTestScore¶

In [49]:

df['preTestScore'].median()

Out[49]:

4.0

Sample variance of preTestScore values¶

In [50]:

df['preTestScore'].var()

Out[50]:

186.69999999999999
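"Sample" here means the sum of squared deviations is divided by n - 1 rather than n, which is the pandas default (ddof=1). As a rough check, the sketch below recomputes both versions by hand; the variable names are illustrative only.

```python
import pandas as pd

# Same preTestScore values as in the dataframe above
pre = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

n = pre.count()
squared_dev = (pre - pre.mean()) ** 2

# Divisor n - 1: sample variance, what var() returns by default
sample_var = squared_dev.sum() / (n - 1)

# Divisor n: population variance, what var(ddof=0) returns
population_var = squared_dev.sum() / n

print(sample_var, pre.var())
print(population_var, pre.var(ddof=0))
```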

Sample standard deviation of preTestScore values¶

In [51]:

df['preTestScore'].std()

Out[51]:

13.663820841916802

Skewness of preTestScore values¶

In [52]:

df['preTestScore'].skew()

Out[52]:

0.74334524573267591

Kurtosis of preTestScore values¶

In [53]:

df['preTestScore'].kurt()

Out[53]:

-2.4673543738411525
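Both skew() and kurt() in pandas return bias-corrected estimates, and kurt() reports excess kurtosis, so a normal distribution scores 0. The sketch below recomputes the two values from the central moments using the standard adjusted formulas; it is a hand check under that assumption, not a copy of pandas' internal code.

```python
import numpy as np
import pandas as pd

# Same preTestScore values as in the dataframe above
pre = pd.Series([4, 24, 31, 2, 3], name='preTestScore')

n = len(pre)
dev = pre - pre.mean()

# Central moments with divisor n (biased)
m2 = (dev ** 2).mean()
m3 = (dev ** 3).mean()
m4 = (dev ** 4).mean()

# Adjusted Fisher-Pearson skewness
skew_manual = np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5

# Bias-corrected excess kurtosis
g2 = m4 / m2 ** 2 - 3
kurt_manual = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)

print(skew_manual, pre.skew())
print(kurt_manual, pre.kurt())
```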

Correlation Matrix Of Values¶

In [54]:

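# Keep only the numeric columns; pandas 2.0 and later raise an error on the string column 'name'
df.select_dtypes(include='number').corr()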

Out[54]:

	age	preTestScore	postTestScore
age	1.000000	-0.105651	0.328852
preTestScore	-0.105651	1.000000	0.378039
postTestScore	0.328852	0.378039	1.000000

3 rows × 3 columns

Covariance Matrix Of Values¶

In [55]:

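# Keep only the numeric columns; pandas 2.0 and later raise an error on the string column 'name'
df.select_dtypes(include='number').cov()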

Out[55]:

	age	preTestScore	postTestScore
age	340.80	-26.65	151.20
preTestScore	-26.65	186.70	128.65
postTestScore	151.20	128.65	620.30

3 rows × 3 columns
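The two matrices are closely related: each Pearson correlation is the covariance of the pair divided by the product of their standard deviations, and the standard deviations are the square roots of the covariance matrix's diagonal. A minimal sketch of that relationship, rebuilding only the numeric columns (the names `cov`, `std` and `corr_manual` are illustrative):

```python
import numpy as np
import pandas as pd

# Numeric columns from the dataframe above
df = pd.DataFrame({'age': [42, 52, 36, 24, 73],
                   'preTestScore': [4, 24, 31, 2, 3],
                   'postTestScore': [25, 94, 57, 62, 70]})

cov = df.cov()

# Standard deviations are the square roots of the variances on the diagonal
std = np.sqrt(np.diag(cov))

# Divide each covariance by the product of the two standard deviations
corr_manual = cov / np.outer(std, std)
print(corr_manual)
```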