The pandas website.
Wes McKinney: pandas in 10 minutes | Walkthrough
https://www.youtube.com/watch?v=_T8LGqJtuGc
Video by the creator of pandas.
Python for Data Analysis notebooks
https://github.com/wesm/pydata-book
Materials and IPython notebooks for "Python for Data Analysis" by Wes McKinney, published by O'Reilly Media.
10 Minutes to pandas
http://pandas.pydata.org/pandas-docs/stable/10min.html
Official pandas tutorial.
UC Irvine Machine Learning Repository: Iris Data Set
https://archive.ics.uci.edu/ml/datasets/iris
About the Iris data set from UC Irvine's machine learning repository.
# Import pandas.
import pandas as pd
# Load the iris data set from a URL.
df = pd.read_csv("https://raw.githubusercontent.com/uiuc-cse/data-fa14/gh-pages/data/iris.csv")
# Display the DataFrame (pandas truncates long output).
df
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
8 | 4.4 | 2.9 | 1.4 | 0.2 | setosa |
9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
10 | 5.4 | 3.7 | 1.5 | 0.2 | setosa |
11 | 4.8 | 3.4 | 1.6 | 0.2 | setosa |
12 | 4.8 | 3.0 | 1.4 | 0.1 | setosa |
13 | 4.3 | 3.0 | 1.1 | 0.1 | setosa |
14 | 5.8 | 4.0 | 1.2 | 0.2 | setosa |
15 | 5.7 | 4.4 | 1.5 | 0.4 | setosa |
16 | 5.4 | 3.9 | 1.3 | 0.4 | setosa |
17 | 5.1 | 3.5 | 1.4 | 0.3 | setosa |
18 | 5.7 | 3.8 | 1.7 | 0.3 | setosa |
19 | 5.1 | 3.8 | 1.5 | 0.3 | setosa |
20 | 5.4 | 3.4 | 1.7 | 0.2 | setosa |
21 | 5.1 | 3.7 | 1.5 | 0.4 | setosa |
22 | 4.6 | 3.6 | 1.0 | 0.2 | setosa |
23 | 5.1 | 3.3 | 1.7 | 0.5 | setosa |
24 | 4.8 | 3.4 | 1.9 | 0.2 | setosa |
25 | 5.0 | 3.0 | 1.6 | 0.2 | setosa |
26 | 5.0 | 3.4 | 1.6 | 0.4 | setosa |
27 | 5.2 | 3.5 | 1.5 | 0.2 | setosa |
28 | 5.2 | 3.4 | 1.4 | 0.2 | setosa |
29 | 4.7 | 3.2 | 1.6 | 0.2 | setosa |
... | ... | ... | ... | ... | ... |
120 | 6.9 | 3.2 | 5.7 | 2.3 | virginica |
121 | 5.6 | 2.8 | 4.9 | 2.0 | virginica |
122 | 7.7 | 2.8 | 6.7 | 2.0 | virginica |
123 | 6.3 | 2.7 | 4.9 | 1.8 | virginica |
124 | 6.7 | 3.3 | 5.7 | 2.1 | virginica |
125 | 7.2 | 3.2 | 6.0 | 1.8 | virginica |
126 | 6.2 | 2.8 | 4.8 | 1.8 | virginica |
127 | 6.1 | 3.0 | 4.9 | 1.8 | virginica |
128 | 6.4 | 2.8 | 5.6 | 2.1 | virginica |
129 | 7.2 | 3.0 | 5.8 | 1.6 | virginica |
130 | 7.4 | 2.8 | 6.1 | 1.9 | virginica |
131 | 7.9 | 3.8 | 6.4 | 2.0 | virginica |
132 | 6.4 | 2.8 | 5.6 | 2.2 | virginica |
133 | 6.3 | 2.8 | 5.1 | 1.5 | virginica |
134 | 6.1 | 2.6 | 5.6 | 1.4 | virginica |
135 | 7.7 | 3.0 | 6.1 | 2.3 | virginica |
136 | 6.3 | 3.4 | 5.6 | 2.4 | virginica |
137 | 6.4 | 3.1 | 5.5 | 1.8 | virginica |
138 | 6.0 | 3.0 | 4.8 | 1.8 | virginica |
139 | 6.9 | 3.1 | 5.4 | 2.1 | virginica |
140 | 6.7 | 3.1 | 5.6 | 2.4 | virginica |
141 | 6.9 | 3.1 | 5.1 | 2.3 | virginica |
142 | 5.8 | 2.7 | 5.1 | 1.9 | virginica |
143 | 6.8 | 3.2 | 5.9 | 2.3 | virginica |
144 | 6.7 | 3.3 | 5.7 | 2.5 | virginica |
145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
150 rows × 5 columns
# Select one column by name; the result is a Series.
df['species']
0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
5         setosa
6         setosa
7         setosa
8         setosa
9         setosa
10        setosa
11        setosa
12        setosa
13        setosa
14        setosa
15        setosa
16        setosa
17        setosa
18        setosa
19        setosa
20        setosa
21        setosa
22        setosa
23        setosa
24        setosa
25        setosa
26        setosa
27        setosa
28        setosa
29        setosa
         ...
120    virginica
121    virginica
122    virginica
123    virginica
124    virginica
125    virginica
126    virginica
127    virginica
128    virginica
129    virginica
130    virginica
131    virginica
132    virginica
133    virginica
134    virginica
135    virginica
136    virginica
137    virginica
138    virginica
139    virginica
140    virginica
141    virginica
142    virginica
143    virginica
144    virginica
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: species, Length: 150, dtype: object
# Select several columns with a list of names; the result is a DataFrame.
df[['petal_length', 'species']]
| | petal_length | species |
---|---|---|
0 | 1.4 | setosa |
1 | 1.4 | setosa |
2 | 1.3 | setosa |
3 | 1.5 | setosa |
4 | 1.4 | setosa |
5 | 1.7 | setosa |
6 | 1.4 | setosa |
7 | 1.5 | setosa |
8 | 1.4 | setosa |
9 | 1.5 | setosa |
10 | 1.5 | setosa |
11 | 1.6 | setosa |
12 | 1.4 | setosa |
13 | 1.1 | setosa |
14 | 1.2 | setosa |
15 | 1.5 | setosa |
16 | 1.3 | setosa |
17 | 1.4 | setosa |
18 | 1.7 | setosa |
19 | 1.5 | setosa |
20 | 1.7 | setosa |
21 | 1.5 | setosa |
22 | 1.0 | setosa |
23 | 1.7 | setosa |
24 | 1.9 | setosa |
25 | 1.6 | setosa |
26 | 1.6 | setosa |
27 | 1.5 | setosa |
28 | 1.4 | setosa |
29 | 1.6 | setosa |
... | ... | ... |
120 | 5.7 | virginica |
121 | 4.9 | virginica |
122 | 6.7 | virginica |
123 | 4.9 | virginica |
124 | 5.7 | virginica |
125 | 6.0 | virginica |
126 | 4.8 | virginica |
127 | 4.9 | virginica |
128 | 5.6 | virginica |
129 | 5.8 | virginica |
130 | 6.1 | virginica |
131 | 6.4 | virginica |
132 | 5.6 | virginica |
133 | 5.1 | virginica |
134 | 5.6 | virginica |
135 | 6.1 | virginica |
136 | 5.6 | virginica |
137 | 5.5 | virginica |
138 | 4.8 | virginica |
139 | 5.4 | virginica |
140 | 5.6 | virginica |
141 | 5.1 | virginica |
142 | 5.1 | virginica |
143 | 5.9 | virginica |
144 | 5.7 | virginica |
145 | 5.2 | virginica |
146 | 5.0 | virginica |
147 | 5.2 | virginica |
148 | 5.4 | virginica |
149 | 5.1 | virginica |
150 rows × 2 columns
# Slice rows by position; the stop position (6) is excluded.
df[2:6]
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
# Select the columns first, then slice the rows by position.
df[['petal_length', 'species']][2:6]
| | petal_length | species |
---|---|---|
2 | 1.3 | setosa |
3 | 1.5 | setosa |
4 | 1.4 | setosa |
5 | 1.7 | setosa |
# Slice rows by label with .loc; the stop label (6) is included.
df.loc[2:6]
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
6 | 4.6 | 3.4 | 1.4 | 0.3 | setosa |
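The two slices above differ only in how the stop value is treated. A minimal sketch of that difference, using a small hand-built frame (`small`, a hypothetical stand-in for the first eight iris rows) so that it runs without a download:

```python
import pandas as pd

# Hypothetical stand-in for the first eight iris rows (two columns only).
small = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0, 5.4, 4.6, 5.0],
    "species": ["setosa"] * 8,
})

# Plain [] slicing is positional and excludes the stop position.
by_position = small[2:6]

# .loc slicing is label-based and includes the stop label.
by_label = small.loc[2:6]

print(list(by_position.index))  # [2, 3, 4, 5]
print(list(by_label.index))     # [2, 3, 4, 5, 6]
```

With the default integer index, labels and positions coincide, which is why the two results look so similar here; they diverge as soon as the index is anything else.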
# .loc takes a row selector and a column selector; ':' keeps every row.
df.loc[:, 'species']
0         setosa
1         setosa
2         setosa
3         setosa
4         setosa
5         setosa
6         setosa
7         setosa
8         setosa
9         setosa
10        setosa
11        setosa
12        setosa
13        setosa
14        setosa
15        setosa
16        setosa
17        setosa
18        setosa
19        setosa
20        setosa
21        setosa
22        setosa
23        setosa
24        setosa
25        setosa
26        setosa
27        setosa
28        setosa
29        setosa
         ...
120    virginica
121    virginica
122    virginica
123    virginica
124    virginica
125    virginica
126    virginica
127    virginica
128    virginica
129    virginica
130    virginica
131    virginica
132    virginica
133    virginica
134    virginica
135    virginica
136    virginica
137    virginica
138    virginica
139    virginica
140    virginica
141    virginica
142    virginica
143    virginica
144    virginica
145    virginica
146    virginica
147    virginica
148    virginica
149    virginica
Name: species, Length: 150, dtype: object
# All rows, two named columns.
df.loc[:, ['sepal_length', 'species']]
| | sepal_length | species |
---|---|---|
0 | 5.1 | setosa |
1 | 4.9 | setosa |
2 | 4.7 | setosa |
3 | 4.6 | setosa |
4 | 5.0 | setosa |
5 | 5.4 | setosa |
6 | 4.6 | setosa |
7 | 5.0 | setosa |
8 | 4.4 | setosa |
9 | 4.9 | setosa |
10 | 5.4 | setosa |
11 | 4.8 | setosa |
12 | 4.8 | setosa |
13 | 4.3 | setosa |
14 | 5.8 | setosa |
15 | 5.7 | setosa |
16 | 5.4 | setosa |
17 | 5.1 | setosa |
18 | 5.7 | setosa |
19 | 5.1 | setosa |
20 | 5.4 | setosa |
21 | 5.1 | setosa |
22 | 4.6 | setosa |
23 | 5.1 | setosa |
24 | 4.8 | setosa |
25 | 5.0 | setosa |
26 | 5.0 | setosa |
27 | 5.2 | setosa |
28 | 5.2 | setosa |
29 | 4.7 | setosa |
... | ... | ... |
120 | 6.9 | virginica |
121 | 5.6 | virginica |
122 | 7.7 | virginica |
123 | 6.3 | virginica |
124 | 6.7 | virginica |
125 | 7.2 | virginica |
126 | 6.2 | virginica |
127 | 6.1 | virginica |
128 | 6.4 | virginica |
129 | 7.2 | virginica |
130 | 7.4 | virginica |
131 | 7.9 | virginica |
132 | 6.4 | virginica |
133 | 6.3 | virginica |
134 | 6.1 | virginica |
135 | 7.7 | virginica |
136 | 6.3 | virginica |
137 | 6.4 | virginica |
138 | 6.0 | virginica |
139 | 6.9 | virginica |
140 | 6.7 | virginica |
141 | 6.9 | virginica |
142 | 5.8 | virginica |
143 | 6.8 | virginica |
144 | 6.7 | virginica |
145 | 6.7 | virginica |
146 | 6.3 | virginica |
147 | 6.5 | virginica |
148 | 6.2 | virginica |
149 | 5.9 | virginica |
150 rows × 2 columns
# Rows labelled 2 through 6 (inclusive), two named columns.
df.loc[2:6, ['sepal_length', 'species']]
| | sepal_length | species |
---|---|---|
2 | 4.7 | setosa |
3 | 4.6 | setosa |
4 | 5.0 | setosa |
5 | 5.4 | setosa |
6 | 4.6 | setosa |
# .iloc selects by integer position; a single position returns that row as a Series.
df.iloc[2]
sepal_length       4.7
sepal_width        3.2
petal_length       1.3
petal_width        0.2
species         setosa
Name: 2, dtype: object
# Positions 2 and 3 (stop excluded) of the column at position 1, sepal_width.
df.iloc[2:4, 1]
2    3.2
3    3.1
Name: sepal_width, dtype: float64
# .at returns a single value by row label and column name.
df.at[3, 'species']
'setosa'
# Every second row from position 1 up to, but not including, position 10.
df.iloc[1:10:2]
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
5 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
7 | 5.0 | 3.4 | 1.5 | 0.2 | setosa |
9 | 4.9 | 3.1 | 1.5 | 0.1 | setosa |
# Comparing a column with a value gives a boolean Series, one entry per row.
df.loc[:, 'species'] == 'setosa'
0       True
1       True
2       True
3       True
4       True
5       True
6       True
7       True
8       True
9       True
10      True
11      True
12      True
13      True
14      True
15      True
16      True
17      True
18      True
19      True
20      True
21      True
22      True
23      True
24      True
25      True
26      True
27      True
28      True
29      True
       ...
120    False
121    False
122    False
123    False
124    False
125    False
126    False
127    False
128    False
129    False
130    False
131    False
132    False
133    False
134    False
135    False
136    False
137    False
138    False
139    False
140    False
141    False
142    False
143    False
144    False
145    False
146    False
147    False
148    False
149    False
Name: species, Length: 150, dtype: bool
# Pass a boolean Series to .loc to keep only the rows where it is True.
df.loc[df.loc[:, 'species'] == 'versicolor']
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
50 | 7.0 | 3.2 | 4.7 | 1.4 | versicolor |
51 | 6.4 | 3.2 | 4.5 | 1.5 | versicolor |
52 | 6.9 | 3.1 | 4.9 | 1.5 | versicolor |
53 | 5.5 | 2.3 | 4.0 | 1.3 | versicolor |
54 | 6.5 | 2.8 | 4.6 | 1.5 | versicolor |
55 | 5.7 | 2.8 | 4.5 | 1.3 | versicolor |
56 | 6.3 | 3.3 | 4.7 | 1.6 | versicolor |
57 | 4.9 | 2.4 | 3.3 | 1.0 | versicolor |
58 | 6.6 | 2.9 | 4.6 | 1.3 | versicolor |
59 | 5.2 | 2.7 | 3.9 | 1.4 | versicolor |
60 | 5.0 | 2.0 | 3.5 | 1.0 | versicolor |
61 | 5.9 | 3.0 | 4.2 | 1.5 | versicolor |
62 | 6.0 | 2.2 | 4.0 | 1.0 | versicolor |
63 | 6.1 | 2.9 | 4.7 | 1.4 | versicolor |
64 | 5.6 | 2.9 | 3.6 | 1.3 | versicolor |
65 | 6.7 | 3.1 | 4.4 | 1.4 | versicolor |
66 | 5.6 | 3.0 | 4.5 | 1.5 | versicolor |
67 | 5.8 | 2.7 | 4.1 | 1.0 | versicolor |
68 | 6.2 | 2.2 | 4.5 | 1.5 | versicolor |
69 | 5.6 | 2.5 | 3.9 | 1.1 | versicolor |
70 | 5.9 | 3.2 | 4.8 | 1.8 | versicolor |
71 | 6.1 | 2.8 | 4.0 | 1.3 | versicolor |
72 | 6.3 | 2.5 | 4.9 | 1.5 | versicolor |
73 | 6.1 | 2.8 | 4.7 | 1.2 | versicolor |
74 | 6.4 | 2.9 | 4.3 | 1.3 | versicolor |
75 | 6.6 | 3.0 | 4.4 | 1.4 | versicolor |
76 | 6.8 | 2.8 | 4.8 | 1.4 | versicolor |
77 | 6.7 | 3.0 | 5.0 | 1.7 | versicolor |
78 | 6.0 | 2.9 | 4.5 | 1.5 | versicolor |
79 | 5.7 | 2.6 | 3.5 | 1.0 | versicolor |
80 | 5.5 | 2.4 | 3.8 | 1.1 | versicolor |
81 | 5.5 | 2.4 | 3.7 | 1.0 | versicolor |
82 | 5.8 | 2.7 | 3.9 | 1.2 | versicolor |
83 | 6.0 | 2.7 | 5.1 | 1.6 | versicolor |
84 | 5.4 | 3.0 | 4.5 | 1.5 | versicolor |
85 | 6.0 | 3.4 | 4.5 | 1.6 | versicolor |
86 | 6.7 | 3.1 | 4.7 | 1.5 | versicolor |
87 | 6.3 | 2.3 | 4.4 | 1.3 | versicolor |
88 | 5.6 | 3.0 | 4.1 | 1.3 | versicolor |
89 | 5.5 | 2.5 | 4.0 | 1.3 | versicolor |
90 | 5.5 | 2.6 | 4.4 | 1.2 | versicolor |
91 | 6.1 | 3.0 | 4.6 | 1.4 | versicolor |
92 | 5.8 | 2.6 | 4.0 | 1.2 | versicolor |
93 | 5.0 | 2.3 | 3.3 | 1.0 | versicolor |
94 | 5.6 | 2.7 | 4.2 | 1.3 | versicolor |
95 | 5.7 | 3.0 | 4.2 | 1.2 | versicolor |
96 | 5.7 | 2.9 | 4.2 | 1.3 | versicolor |
97 | 6.2 | 2.9 | 4.3 | 1.3 | versicolor |
98 | 5.1 | 2.5 | 3.0 | 1.1 | versicolor |
99 | 5.7 | 2.8 | 4.1 | 1.3 | versicolor |
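The boolean Series used for filtering above can also be stored in a variable and combined with other conditions. A minimal sketch, assuming a hypothetical four-row sample (`sample`) in place of the full data set:

```python
import pandas as pd

# Hypothetical four-row sample standing in for the iris DataFrame.
sample = pd.DataFrame({
    "petal_length": [1.4, 4.7, 3.3, 5.7],
    "species": ["setosa", "versicolor", "versicolor", "virginica"],
})

# The mask is an ordinary boolean Series aligned with the rows.
is_versicolor = sample["species"] == "versicolor"
print(is_versicolor.tolist())  # [False, True, True, False]

# Masks combine with & (and), | (or) and ~ (not).
# Each comparison needs its own parentheses.
long_petals = sample.loc[is_versicolor & (sample["petal_length"] > 4.0)]
print(long_petals.index.tolist())  # [1]
```

Naming the mask tends to read better than nesting one `df.loc[...]` inside another, though both select the same rows.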
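# Keep the versicolor rows; they retain their original labels, 50 to 99.
x = df.loc[df.loc[:, 'species'] == 'versicolor']
# .loc looks the row up by its original label, 51.
x.loc[51]
sepal_length           6.4
sepal_width            3.2
petal_length           4.5
petal_width            1.5
species         versicolor
Name: 51, dtype: object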
# Position 1 of the filtered frame is the row labelled 51.
x.iloc[1]
sepal_length           6.4
sepal_width            3.2
petal_length           4.5
petal_width            1.5
species         versicolor
Name: 51, dtype: object
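Both lookups return the same row because filtering keeps the original index labels. A minimal sketch of this, assuming a hypothetical four-row sample (`sample`) whose labels 49 to 52 mimic the original row numbers:

```python
import pandas as pd

# Hypothetical four-row sample; the index mimics the original row labels.
sample = pd.DataFrame(
    {
        "petal_length": [1.4, 4.7, 4.5, 4.9],
        "species": ["setosa", "versicolor", "versicolor", "versicolor"],
    },
    index=[49, 50, 51, 52],
)

versicolor = sample.loc[sample["species"] == "versicolor"]

# Labels survive the filter, so label 51 now sits at position 1.
print(versicolor.index.tolist())             # [50, 51, 52]
print(versicolor.loc[51, "petal_length"])    # 4.5
print(versicolor.iloc[1]["petal_length"])    # 4.5

# reset_index(drop=True) renumbers from 0 when the old labels are not needed.
renumbered = versicolor.reset_index(drop=True)
print(renumbered.index.tolist())             # [0, 1, 2]
```

Whether to reset the index is a judgment call: keeping the labels makes it easy to trace a row back to the full data set.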
# First five rows.
df.head()
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
0 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
1 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
2 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
3 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
4 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
# Last five rows.
df.tail()
| | sepal_length | sepal_width | petal_length | petal_width | species |
---|---|---|---|---|---|
145 | 6.7 | 3.0 | 5.2 | 2.3 | virginica |
146 | 6.3 | 2.5 | 5.0 | 1.9 | virginica |
147 | 6.5 | 3.0 | 5.2 | 2.0 | virginica |
148 | 6.2 | 3.4 | 5.4 | 2.3 | virginica |
149 | 5.9 | 3.0 | 5.1 | 1.8 | virginica |
# Summary statistics for the numeric columns.
df.describe()
| | sepal_length | sepal_width | petal_length | petal_width |
---|---|---|---|---|
count | 150.000000 | 150.000000 | 150.000000 | 150.000000 |
mean | 5.843333 | 3.054000 | 3.758667 | 1.198667 |
std | 0.828066 | 0.433594 | 1.764420 | 0.763161 |
min | 4.300000 | 2.000000 | 1.000000 | 0.100000 |
25% | 5.100000 | 2.800000 | 1.600000 | 0.300000 |
50% | 5.800000 | 3.000000 | 4.350000 | 1.300000 |
75% | 6.400000 | 3.300000 | 5.100000 | 1.800000 |
max | 7.900000 | 4.400000 | 6.900000 | 2.500000 |
# Summary statistics for the versicolor rows only.
(df.loc[df.loc[:, 'species'] == 'versicolor']).describe()
| | sepal_length | sepal_width | petal_length | petal_width |
---|---|---|---|---|
count | 50.000000 | 50.000000 | 50.000000 | 50.000000 |
mean | 5.936000 | 2.770000 | 4.260000 | 1.326000 |
std | 0.516171 | 0.313798 | 0.469911 | 0.197753 |
min | 4.900000 | 2.000000 | 3.000000 | 1.000000 |
25% | 5.600000 | 2.525000 | 4.000000 | 1.200000 |
50% | 5.900000 | 2.800000 | 4.350000 | 1.300000 |
75% | 6.300000 | 3.000000 | 4.600000 | 1.500000 |
max | 7.000000 | 3.400000 | 5.100000 | 1.800000 |
# Summary statistics for the setosa rows only.
(df.loc[df.loc[:, 'species'] == 'setosa']).describe()
| | sepal_length | sepal_width | petal_length | petal_width |
---|---|---|---|---|
count | 50.00000 | 50.000000 | 50.000000 | 50.00000 |
mean | 5.00600 | 3.418000 | 1.464000 | 0.24400 |
std | 0.35249 | 0.381024 | 0.173511 | 0.10721 |
min | 4.30000 | 2.300000 | 1.000000 | 0.10000 |
25% | 4.80000 | 3.125000 | 1.400000 | 0.20000 |
50% | 5.00000 | 3.400000 | 1.500000 | 0.20000 |
75% | 5.20000 | 3.675000 | 1.575000 | 0.30000 |
max | 5.80000 | 4.400000 | 1.900000 | 0.60000 |
# Summary statistics for the virginica rows only.
(df.loc[df.loc[:, 'species'] == 'virginica']).describe()
| | sepal_length | sepal_width | petal_length | petal_width |
---|---|---|---|---|
count | 50.00000 | 50.000000 | 50.000000 | 50.00000 |
mean | 6.58800 | 2.974000 | 5.552000 | 2.02600 |
std | 0.63588 | 0.322497 | 0.551895 | 0.27465 |
min | 4.90000 | 2.200000 | 4.500000 | 1.40000 |
25% | 6.22500 | 2.800000 | 5.100000 | 1.80000 |
50% | 6.50000 | 3.000000 | 5.550000 | 2.00000 |
75% | 6.90000 | 3.175000 | 5.875000 | 2.30000 |
max | 7.90000 | 3.800000 | 6.900000 | 2.50000 |
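The three per-species summaries above can also be produced in one call with `groupby`. A minimal sketch, assuming a hypothetical six-row sample (`sample`, two rows per species) rather than the full 150 rows, so the numbers will not match the tables above:

```python
import pandas as pd

# Hypothetical six-row sample: two rows per species.
sample = pd.DataFrame({
    "petal_length": [1.4, 1.3, 4.7, 4.5, 5.7, 4.9],
    "species": ["setosa", "setosa", "versicolor",
                "versicolor", "virginica", "virginica"],
})

# Group the rows by species, then describe one column per group.
summary = sample.groupby("species")["petal_length"].describe()

# One row per species, one column per statistic.
print(summary[["count", "mean", "min", "max"]])
```

On the full data set, `df.groupby('species').describe()` would cover all four measurements at once, at the cost of a much wider table.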
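# Mean of each numeric column (pandas 2.0+ needs numeric_only=True to skip species).
df.mean(numeric_only=True)
sepal_length    5.843333
sepal_width     3.054000
petal_length    3.758667
petal_width     1.198667
dtype: float64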
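# Import seaborn for statistical plotting.
import seaborn as sns
# Plot every pair of numeric columns against each other, coloured by species.
sns.pairplot(df, hue='species')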
<seaborn.axisgrid.PairGrid at 0x1d5fb084f28>