Check out the Alcohol Consumption Exercises Video Tutorial to watch a data scientist work through the exercises.
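GroupBy can be summarized as Split-Apply-Combine: the data are split into groups by a key, a function is applied to each group independently, and the per-group results are combined into a single object.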
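To make the three steps concrete, here is a minimal sketch that performs them by hand on a small hypothetical DataFrame (the values are made up, not taken from drinks.csv) and then checks that a single `groupby` call gives the same result.

```python
import pandas as pd

# Hypothetical toy data, not taken from drinks.csv.
df = pd.DataFrame({
    "continent": ["EU", "AF", "EU", "AF", "AS"],
    "beer_servings": [89, 25, 245, 217, 0],
})

# Split: one sub-DataFrame per distinct continent label.
pieces = {key: df[df["continent"] == key] for key in df["continent"].unique()}

# Apply: compute the mean beer servings within each piece.
applied = {key: piece["beer_servings"].mean() for key, piece in pieces.items()}

# Combine: gather the per-group results into one Series, sorted like groupby output.
manual = pd.Series(applied, name="beer_servings").sort_index().rename_axis("continent")

# groupby performs all three steps in one call.
built_in = df.groupby("continent")["beer_servings"].mean()

print(manual)
print(built_in.equals(manual))
```

`groupby` sorts the group keys by default, which is why the hand-built Series is sorted before comparing.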
Special thanks to: https://github.com/justmarkham for sharing the dataset and materials.
```python
import pandas as pd

drinks = pd.read_csv('https://raw.githubusercontent.com/justmarkham/DAT8/master/data/drinks.csv')
drinks.head()
```
|   | country | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol | continent |
|---|---|---|---|---|---|---|
| 0 | Afghanistan | 0 | 0 | 0 | 0.0 | AS |
| 1 | Albania | 89 | 132 | 54 | 4.9 | EU |
| 2 | Algeria | 25 | 0 | 14 | 0.7 | AF |
| 3 | Andorra | 245 | 138 | 312 | 12.4 | EU |
| 4 | Angola | 217 | 57 | 45 | 5.9 | AF |
```python
drinks.groupby('continent').beer_servings.mean()
```

```
continent
AF     61.471698
AS     37.045455
EU    193.777778
OC     89.687500
SA    175.083333
Name: beer_servings, dtype: float64
```
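The result lists only five continents (AF, AS, EU, OC, SA). A likely explanation, assuming drinks.csv codes North America as `NA`: `read_csv` treats the string `NA` as a missing value by default, and `groupby` drops rows whose key is missing. The sketch below uses a small hypothetical CSV held in memory (not the real file, and with made-up numbers) to show the effect and the `keep_default_na=False` workaround.

```python
import io

import pandas as pd

# Hypothetical CSV in the same shape as drinks.csv; numbers are illustrative.
csv_text = """country,beer_servings,continent
Canada,100,NA
Mexico,200,NA
Albania,50,EU
Algeria,10,AF
"""

# Default parsing: "NA" is in pandas' default missing-value list,
# so those continents become NaN and groupby drops that group.
default = pd.read_csv(io.StringIO(csv_text))
print(default.groupby("continent").beer_servings.mean())

# keep_default_na=False keeps "NA" as an ordinary string label.
kept = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(kept.groupby("continent").beer_servings.mean())
```

Passing `keep_default_na=False` to the real `read_csv` call would change every grouped table in this section, so the outputs shown here correspond to the default parsing.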
```python
drinks.groupby('continent').wine_servings.describe()
```

| continent | count | mean | std | min | 25% | 50% | 75% | max |
|---|---|---|---|---|---|---|---|---|
| AF | 53.0 | 16.264151 | 38.846419 | 0.0 | 1.0 | 2.0 | 13.0 | 233.0 |
| AS | 44.0 | 9.068182 | 21.667034 | 0.0 | 0.0 | 1.0 | 8.0 | 123.0 |
| EU | 45.0 | 142.222222 | 97.421738 | 0.0 | 59.0 | 128.0 | 195.0 | 370.0 |
| OC | 16.0 | 35.625000 | 64.555790 | 0.0 | 1.0 | 8.5 | 23.25 | 212.0 |
| SA | 12.0 | 62.416667 | 88.620189 | 1.0 | 3.0 | 12.0 | 98.5 | 221.0 |
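`describe()` on a grouped column returns one row per group and one column per summary statistic, so individual statistics can be pulled out by column name. A minimal sketch on hypothetical toy data (not from drinks.csv) checks that the `50%` column is the per-group median.

```python
import pandas as pd

# Hypothetical toy data, not taken from drinks.csv.
df = pd.DataFrame({
    "continent": ["AF", "AF", "AF", "EU", "EU"],
    "wine_servings": [0, 2, 13, 59, 195],
})

summary = df.groupby("continent").wine_servings.describe()
print(summary)

# The "50%" column holds the same values as the per-group median.
print(summary["50%"].equals(df.groupby("continent").wine_servings.median()))
```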
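```python
# numeric_only=True skips the text column "country"; pandas 2.0 and later
# raise an error instead of silently dropping it.
drinks.groupby('continent').mean(numeric_only=True)
```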
| continent | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|
| AF | 61.471698 | 16.339623 | 16.264151 | 3.007547 |
| AS | 37.045455 | 60.840909 | 9.068182 | 2.170455 |
| EU | 193.777778 | 132.555556 | 142.222222 | 8.617778 |
| OC | 89.687500 | 58.437500 | 35.625000 | 3.381250 |
| SA | 175.083333 | 114.750000 | 62.416667 | 6.308333 |
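Each cell in the mean table is that continent's column sum divided by its number of non-missing values. The sketch below checks this on hypothetical toy data (not from drinks.csv); it also selects the numeric columns explicitly, which is another way to keep a text column such as `country` out of the average.

```python
import pandas as pd

# Hypothetical toy data, not taken from drinks.csv.
df = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "continent": ["AF", "AF", "EU", "EU"],
    "beer_servings": [10, 30, 100, 300],
    "wine_servings": [1, 3, 50, 150],
})

# Select only the numeric columns before aggregating.
numeric_cols = ["beer_servings", "wine_servings"]
grouped = df.groupby("continent")[numeric_cols]

means = grouped.mean()

# Same numbers from the definition: per-group sum / per-group non-missing count.
by_hand = grouped.sum() / grouped.count()

print(means)
print(means.equals(by_hand))
```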
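```python
# As with mean(), numeric_only=True excludes the text column "country".
drinks.groupby('continent').median(numeric_only=True)
```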
| continent | beer_servings | spirit_servings | wine_servings | total_litres_of_pure_alcohol |
|---|---|---|---|---|
| AF | 32.0 | 3.0 | 2.0 | 2.30 |
| AS | 17.5 | 16.0 | 1.0 | 1.20 |
| EU | 219.0 | 122.0 | 128.0 | 10.00 |
| OC | 52.5 | 37.0 | 8.5 | 1.75 |
| SA | 162.5 | 108.5 | 12.0 | 6.85 |
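Comparing the median and mean tables is informative: for African beer servings the mean (61.47) sits well above the median (32.0), which typically means a few high values pull the mean up. The toy example below, with hypothetical values not from drinks.csv, makes that effect visible with `agg(['mean', 'median'])`.

```python
import pandas as pd

# Hypothetical toy data: group "X" has one large value, group "Y" is symmetric.
df = pd.DataFrame({
    "group": ["X", "X", "X", "X", "Y", "Y", "Y"],
    "servings": [0, 1, 2, 97, 10, 20, 30],
})

# One outlier drags X's mean far above its median; Y's mean and median agree.
stats = df.groupby("group").servings.agg(["mean", "median"])
print(stats)
```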
```python
drinks.groupby('continent').spirit_servings.agg(['mean', 'min', 'max'])
```

| continent | mean | min | max |
|---|---|---|---|
| AF | 16.339623 | 0 | 152 |
| AS | 60.840909 | 0 | 326 |
| EU | 132.555556 | 0 | 373 |
| OC | 58.437500 | 0 | 254 |
| SA | 114.750000 | 25 | 302 |
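When different columns need different aggregations, `agg` also accepts named aggregation, where each keyword becomes an output column given as a `(source column, function)` pair. A short sketch on hypothetical toy data (not from drinks.csv):

```python
import pandas as pd

# Hypothetical toy data, not taken from drinks.csv.
df = pd.DataFrame({
    "continent": ["SA", "SA", "EU", "EU"],
    "spirit_servings": [20, 100, 0, 300],
    "wine_servings": [2, 8, 10, 30],
})

# Each keyword names an output column: (input column, aggregation function).
summary = df.groupby("continent").agg(
    spirit_mean=("spirit_servings", "mean"),
    spirit_max=("spirit_servings", "max"),
    wine_median=("wine_servings", "median"),
)
print(summary)
```

This avoids the MultiIndex columns that a dict-of-lists `agg` would produce and gives each result a readable name.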