Student life is messy and stressful, and eventually it might lead to depression. However, we all know that cracking open a cold one with your lads always helps. In this analysis, for no particular reason, I would like to invite you all to dive into alcohol consumption among students. Enjoy :)
This dataset is provided and maintained by UCI. The data were obtained from a survey of students taking math courses in secondary school.
# Data Manipulation
import pandas as pd
import numpy as np
import pandas_profiling
# Data Visualization
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
from pywaffle import Waffle
mpl.style.use(['ggplot']) # optional: for ggplot-like style
sns.set(rc={'figure.figsize':(11.7,8.27)})
# Hide Warning
import warnings
warnings.filterwarnings('ignore')
# Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
# Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
df = pd.read_csv('student-mat.csv')
df.head()
| | school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |
5 rows × 33 columns
# Expand to see the whole profile
pandas_profiling.ProfileReport(df)
Dataset info
Number of variables | 33 |
---|---|
Number of observations | 395 |
Total Missing (%) | 0.0% |
Total size in memory | 101.9 KiB |
Average record size in memory | 264.2 B |
Variables types
Numeric | 15 |
---|---|
Categorical | 17 |
Boolean | 0 |
Date | 0 |
Text (Unique) | 0 |
Rejected | 1 |
Unsupported | 0 |
Dalc
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 1.481 |
---|---|
Minimum | 1 |
Maximum | 5 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
Median | 1 |
Q3 | 2 |
95-th percentile | 3 |
Maximum | 5 |
Range | 4 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 0.89074 |
---|---|
Coef of variation | 0.60144 |
Kurtosis | 4.7595 |
Mean | 1.481 |
MAD | 0.6722 |
Skewness | 2.1908 |
Sum | 585 |
Variance | 0.79342 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
1 | 276 | 69.9% | |
2 | 75 | 19.0% | |
3 | 26 | 6.6% | |
5 | 9 | 2.3% | |
4 | 9 | 2.3% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 276 | 69.9% | |
2 | 75 | 19.0% | |
3 | 26 | 6.6% | |
4 | 9 | 2.3% | |
5 | 9 | 2.3% |
Fedu
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 2.5215 |
---|---|
Minimum | 0 |
Maximum | 4 |
Zeros (%) | 0.5% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 2 |
Median | 2 |
Q3 | 3 |
95-th percentile | 4 |
Maximum | 4 |
Range | 4 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 1.0882 |
---|---|
Coef of variation | 0.43157 |
Kurtosis | -1.1985 |
Mean | 2.5215 |
MAD | 0.96092 |
Skewness | -0.031672 |
Sum | 996 |
Variance | 1.1842 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
2 | 115 | 29.1% | |
3 | 100 | 25.3% | |
4 | 96 | 24.3% | |
1 | 82 | 20.8% | |
0 | 2 | 0.5% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 2 | 0.5% | |
1 | 82 | 20.8% | |
2 | 115 | 29.1% | |
3 | 100 | 25.3% | |
4 | 96 | 24.3% |
Fjob
Categorical
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
other | 217 | 54.9% | |
services | 111 | 28.1% | |
teacher | 29 | 7.3% | |
at_home | 20 | 5.1% | |
health | 18 | 4.6% |
G1
Numeric
Distinct count | 17 |
---|---|
Unique (%) | 4.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 10.909 |
---|---|
Minimum | 3 |
Maximum | 19 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 3 |
---|---|
5-th percentile | 6 |
Q1 | 8 |
Median | 11 |
Q3 | 13 |
95-th percentile | 16 |
Maximum | 19 |
Range | 16 |
Interquartile range | 5 |
Descriptive statistics
Standard deviation | 3.3192 |
---|---|
Coef of variation | 0.30427 |
Kurtosis | -0.69383 |
Mean | 10.909 |
MAD | 2.7514 |
Skewness | 0.24061 |
Sum | 4309 |
Variance | 11.017 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
10 | 51 | 12.9% | |
8 | 41 | 10.4% | |
11 | 39 | 9.9% | |
7 | 37 | 9.4% | |
12 | 35 | 8.9% | |
13 | 33 | 8.4% | |
9 | 31 | 7.8% | |
14 | 30 | 7.6% | |
15 | 24 | 6.1% | |
6 | 24 | 6.1% | |
Other values (7) | 50 | 12.7% |
Minimum 5 values
Value | Count | Frequency (%) | |
3 | 1 | 0.3% | |
4 | 1 | 0.3% | |
5 | 7 | 1.8% | |
6 | 24 | 6.1% | |
7 | 37 | 9.4% |
Maximum 5 values
Value | Count | Frequency (%) | |
15 | 24 | 6.1% | |
16 | 22 | 5.6% | |
17 | 8 | 2.0% | |
18 | 8 | 2.0% | |
19 | 3 | 0.8% |
G2
Numeric
Distinct count | 17 |
---|---|
Unique (%) | 4.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 10.714 |
---|---|
Minimum | 0 |
Maximum | 19 |
Zeros (%) | 3.3% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 5 |
Q1 | 9 |
Median | 11 |
Q3 | 13 |
95-th percentile | 16.3 |
Maximum | 19 |
Range | 19 |
Interquartile range | 4 |
Descriptive statistics
Standard deviation | 3.7615 |
---|---|
Coef of variation | 0.35109 |
Kurtosis | 0.62771 |
Mean | 10.714 |
MAD | 2.9421 |
Skewness | -0.43165 |
Sum | 4232 |
Variance | 14.149 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
9 | 50 | 12.7% | |
10 | 46 | 11.6% | |
12 | 41 | 10.4% | |
13 | 37 | 9.4% | |
11 | 35 | 8.9% | |
15 | 34 | 8.6% | |
8 | 32 | 8.1% | |
14 | 23 | 5.8% | |
7 | 21 | 5.3% | |
5 | 15 | 3.8% | |
Other values (7) | 61 | 15.4% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 13 | 3.3% | |
4 | 1 | 0.3% | |
5 | 15 | 3.8% | |
6 | 14 | 3.5% | |
7 | 21 | 5.3% |
Maximum 5 values
Value | Count | Frequency (%) | |
15 | 34 | 8.6% | |
16 | 13 | 3.3% | |
17 | 5 | 1.3% | |
18 | 12 | 3.0% | |
19 | 3 | 0.8% |
G3
Highly correlated
This variable is highly correlated with G2
and should be ignored for analysis
Correlation | 0.90487 |
---|
Medu
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 2.7494 |
---|---|
Minimum | 0 |
Maximum | 4 |
Zeros (%) | 0.8% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 1 |
Q1 | 2 |
Median | 3 |
Q3 | 4 |
95-th percentile | 4 |
Maximum | 4 |
Range | 4 |
Interquartile range | 2 |
Descriptive statistics
Standard deviation | 1.0947 |
---|---|
Coef of variation | 0.39818 |
Kurtosis | -1.09 |
Mean | 2.7494 |
MAD | 0.95517 |
Skewness | -0.31838 |
Sum | 1086 |
Variance | 1.1984 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
4 | 131 | 33.2% | |
2 | 103 | 26.1% | |
3 | 99 | 25.1% | |
1 | 59 | 14.9% | |
0 | 3 | 0.8% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 3 | 0.8% | |
1 | 59 | 14.9% | |
2 | 103 | 26.1% | |
3 | 99 | 25.1% | |
4 | 131 | 33.2% |
Mjob
Categorical
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
other | 141 | 35.7% | |
services | 103 | 26.1% | |
at_home | 59 | 14.9% | |
teacher | 58 | 14.7% | |
health | 34 | 8.6% |
Pstatus
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
T | 354 | 89.6% | |
A | 41 | 10.4% |
Walc
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 2.2911 |
---|---|
Minimum | 1 |
Maximum | 5 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
Median | 2 |
Q3 | 3 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range | 2 |
Descriptive statistics
Standard deviation | 1.2879 |
---|---|
Coef of variation | 0.56212 |
Kurtosis | -0.79085 |
Mean | 2.2911 |
MAD | 1.1124 |
Skewness | 0.61196 |
Sum | 905 |
Variance | 1.6587 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
1 | 151 | 38.2% | |
2 | 85 | 21.5% | |
3 | 80 | 20.3% | |
4 | 51 | 12.9% | |
5 | 28 | 7.1% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 151 | 38.2% | |
2 | 85 | 21.5% | |
3 | 80 | 20.3% | |
4 | 51 | 12.9% | |
5 | 28 | 7.1% |
absences
Numeric
Distinct count | 34 |
---|---|
Unique (%) | 8.6% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 5.7089 |
---|---|
Minimum | 0 |
Maximum | 75 |
Zeros (%) | 29.1% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
Median | 4 |
Q3 | 8 |
95-th percentile | 18.3 |
Maximum | 75 |
Range | 75 |
Interquartile range | 8 |
Descriptive statistics
Standard deviation | 8.0031 |
---|---|
Coef of variation | 1.4019 |
Kurtosis | 21.719 |
Mean | 5.7089 |
MAD | 5.2026 |
Skewness | 3.6716 |
Sum | 2255 |
Variance | 64.05 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
0 | 115 | 29.1% | |
2 | 65 | 16.5% | |
4 | 53 | 13.4% | |
6 | 31 | 7.8% | |
8 | 22 | 5.6% | |
10 | 17 | 4.3% | |
14 | 12 | 3.0% | |
12 | 12 | 3.0% | |
3 | 8 | 2.0% | |
7 | 7 | 1.8% | |
Other values (24) | 53 | 13.4% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 115 | 29.1% | |
1 | 3 | 0.8% | |
2 | 65 | 16.5% | |
3 | 8 | 2.0% | |
4 | 53 | 13.4% |
Maximum 5 values
Value | Count | Frequency (%) | |
38 | 1 | 0.3% | |
40 | 1 | 0.3% | |
54 | 1 | 0.3% | |
56 | 1 | 0.3% | |
75 | 1 | 0.3% |
activities
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
yes | 201 | 50.9% | |
no | 194 | 49.1% |
address
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
U | 307 | 77.7% | |
R | 88 | 22.3% |
age
Numeric
Distinct count | 8 |
---|---|
Unique (%) | 2.0% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 16.696 |
---|---|
Minimum | 15 |
Maximum | 22 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 15 |
---|---|
5-th percentile | 15 |
Q1 | 16 |
Median | 17 |
Q3 | 18 |
95-th percentile | 19 |
Maximum | 22 |
Range | 7 |
Interquartile range | 2 |
Descriptive statistics
Standard deviation | 1.276 |
---|---|
Coef of variation | 0.076427 |
Kurtosis | -0.0012218 |
Mean | 16.696 |
MAD | 1.0709 |
Skewness | 0.46627 |
Sum | 6595 |
Variance | 1.6283 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
16 | 104 | 26.3% | |
17 | 98 | 24.8% | |
18 | 82 | 20.8% | |
15 | 82 | 20.8% | |
19 | 24 | 6.1% | |
20 | 3 | 0.8% | |
22 | 1 | 0.3% | |
21 | 1 | 0.3% |
Minimum 5 values
Value | Count | Frequency (%) | |
15 | 82 | 20.8% | |
16 | 104 | 26.3% | |
17 | 98 | 24.8% | |
18 | 82 | 20.8% | |
19 | 24 | 6.1% |
Maximum 5 values
Value | Count | Frequency (%) | |
18 | 82 | 20.8% | |
19 | 24 | 6.1% | |
20 | 3 | 0.8% | |
21 | 1 | 0.3% | |
22 | 1 | 0.3% |
failures
Numeric
Distinct count | 4 |
---|---|
Unique (%) | 1.0% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 0.33418 |
---|---|
Minimum | 0 |
Maximum | 3 |
Zeros (%) | 79.0% |
Quantile statistics
Minimum | 0 |
---|---|
5-th percentile | 0 |
Q1 | 0 |
Median | 0 |
Q3 | 0 |
95-th percentile | 2 |
Maximum | 3 |
Range | 3 |
Interquartile range | 0 |
Descriptive statistics
Standard deviation | 0.74365 |
---|---|
Coef of variation | 2.2253 |
Kurtosis | 5.0047 |
Mean | 0.33418 |
MAD | 0.52792 |
Skewness | 2.387 |
Sum | 132 |
Variance | 0.55302 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
0 | 312 | 79.0% | |
1 | 50 | 12.7% | |
2 | 17 | 4.3% | |
3 | 16 | 4.1% |
Minimum 5 values
Value | Count | Frequency (%) | |
0 | 312 | 79.0% | |
1 | 50 | 12.7% | |
2 | 17 | 4.3% | |
3 | 16 | 4.1% |
famrel
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 3.9443 |
---|---|
Minimum | 1 |
Maximum | 5 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 2 |
Q1 | 4 |
Median | 4 |
Q3 | 5 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 0.89666 |
---|---|
Coef of variation | 0.22733 |
Kurtosis | 1.1398 |
Mean | 3.9443 |
MAD | 0.62159 |
Skewness | -0.95188 |
Sum | 1558 |
Variance | 0.804 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
4 | 195 | 49.4% | |
5 | 106 | 26.8% | |
3 | 68 | 17.2% | |
2 | 18 | 4.6% | |
1 | 8 | 2.0% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 8 | 2.0% | |
2 | 18 | 4.6% | |
3 | 68 | 17.2% | |
4 | 195 | 49.4% | |
5 | 106 | 26.8% |
famsize
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
GT3 | 281 | 71.1% | |
LE3 | 114 | 28.9% |
famsup
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
yes | 242 | 61.3% | |
no | 153 | 38.7% |
freetime
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 3.2354 |
---|---|
Minimum | 1 |
Maximum | 5 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 2 |
Q1 | 3 |
Median | 3 |
Q3 | 4 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 0.99886 |
---|---|
Coef of variation | 0.30872 |
Kurtosis | -0.30181 |
Mean | 3.2354 |
MAD | 0.80256 |
Skewness | -0.16335 |
Sum | 1278 |
Variance | 0.99773 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
3 | 157 | 39.7% | |
4 | 115 | 29.1% | |
2 | 64 | 16.2% | |
5 | 40 | 10.1% | |
1 | 19 | 4.8% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 19 | 4.8% | |
2 | 64 | 16.2% | |
3 | 157 | 39.7% | |
4 | 115 | 29.1% | |
5 | 40 | 10.1% |
goout
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 3.1089 |
---|---|
Minimum | 1 |
Maximum | 5 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 2 |
Median | 3 |
Q3 | 4 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range | 2 |
Descriptive statistics
Standard deviation | 1.1133 |
---|---|
Coef of variation | 0.3581 |
Kurtosis | -0.77025 |
Mean | 3.1089 |
MAD | 0.89554 |
Skewness | 0.1165 |
Sum | 1228 |
Variance | 1.2394 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
3 | 130 | 32.9% | |
2 | 103 | 26.1% | |
4 | 86 | 21.8% | |
5 | 53 | 13.4% | |
1 | 23 | 5.8% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 23 | 5.8% | |
2 | 103 | 26.1% | |
3 | 130 | 32.9% | |
4 | 86 | 21.8% | |
5 | 53 | 13.4% |
guardian
Categorical
Distinct count | 3 |
---|---|
Unique (%) | 0.8% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
mother | 273 | 69.1% | |
father | 90 | 22.8% | |
other | 32 | 8.1% |
health
Numeric
Distinct count | 5 |
---|---|
Unique (%) | 1.3% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 3.5544 |
---|---|
Minimum | 1 |
Maximum | 5 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 3 |
Median | 4 |
Q3 | 5 |
95-th percentile | 5 |
Maximum | 5 |
Range | 4 |
Interquartile range | 2 |
Descriptive statistics
Standard deviation | 1.3903 |
---|---|
Coef of variation | 0.39115 |
Kurtosis | -1.0141 |
Mean | 3.5544 |
MAD | 1.2175 |
Skewness | -0.4946 |
Sum | 1404 |
Variance | 1.9329 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
5 | 146 | 37.0% | |
3 | 91 | 23.0% | |
4 | 66 | 16.7% | |
1 | 47 | 11.9% | |
2 | 45 | 11.4% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 47 | 11.9% | |
2 | 45 | 11.4% | |
3 | 91 | 23.0% | |
4 | 66 | 16.7% | |
5 | 146 | 37.0% |
higher
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
yes | 375 | 94.9% | |
no | 20 | 5.1% |
internet
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
yes | 329 | 83.3% | |
no | 66 | 16.7% |
nursery
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
yes | 314 | 79.5% | |
no | 81 | 20.5% |
paid
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
no | 214 | 54.2% | |
yes | 181 | 45.8% |
reason
Categorical
Distinct count | 4 |
---|---|
Unique (%) | 1.0% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
course | 145 | 36.7% | |
home | 109 | 27.6% | |
reputation | 105 | 26.6% | |
other | 36 | 9.1% |
romantic
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
no | 263 | 66.6% | |
yes | 132 | 33.4% |
school
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
GP | 349 | 88.4% | |
MS | 46 | 11.6% |
schoolsup
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
no | 344 | 87.1% | |
yes | 51 | 12.9% |
sex
Categorical
Distinct count | 2 |
---|---|
Unique (%) | 0.5% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Value | Count | Frequency (%) | |
F | 208 | 52.7% | |
M | 187 | 47.3% |
studytime
Numeric
Distinct count | 4 |
---|---|
Unique (%) | 1.0% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 2.0354 |
---|---|
Minimum | 1 |
Maximum | 4 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
Median | 2 |
Q3 | 2 |
95-th percentile | 4 |
Maximum | 4 |
Range | 3 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 0.83924 |
---|---|
Coef of variation | 0.41231 |
Kurtosis | -0.014432 |
Mean | 2.0354 |
MAD | 0.58602 |
Skewness | 0.63214 |
Sum | 804 |
Variance | 0.70432 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
2 | 198 | 50.1% | |
1 | 105 | 26.6% | |
3 | 65 | 16.5% | |
4 | 27 | 6.8% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 105 | 26.6% | |
2 | 198 | 50.1% | |
3 | 65 | 16.5% | |
4 | 27 | 6.8% |
traveltime
Numeric
Distinct count | 4 |
---|---|
Unique (%) | 1.0% |
Missing (%) | 0.0% |
Missing (n) | 0 |
Infinite (%) | 0.0% |
Infinite (n) | 0 |
Mean | 1.4481 |
---|---|
Minimum | 1 |
Maximum | 4 |
Zeros (%) | 0.0% |
Quantile statistics
Minimum | 1 |
---|---|
5-th percentile | 1 |
Q1 | 1 |
Median | 1 |
Q3 | 2 |
95-th percentile | 3 |
Maximum | 4 |
Range | 3 |
Interquartile range | 1 |
Descriptive statistics
Standard deviation | 0.6975 |
---|---|
Coef of variation | 0.48167 |
Kurtosis | 2.3442 |
Mean | 1.4481 |
MAD | 0.5831 |
Skewness | 1.607 |
Sum | 572 |
Variance | 0.48651 |
Memory size | 3.2 KiB |
Value | Count | Frequency (%) | |
1 | 257 | 65.1% | |
2 | 107 | 27.1% | |
3 | 23 | 5.8% | |
4 | 8 | 2.0% |
Minimum 5 values
Value | Count | Frequency (%) | |
1 | 257 | 65.1% | |
2 | 107 | 27.1% | |
3 | 23 | 5.8% | |
4 | 8 | 2.0% |
# Checking normal distribution
df.hist()
Some of the variables are clearly not normally distributed, but this is just for fun, so why so serious? :)
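One way to put a number on "not normally distributed" is skewness, which is close to 0 for a symmetric, bell-shaped variable. The sketch below recomputes it for Dalc from the value counts listed in the profile report above, so it runs without the CSV. The helper name `skew_from_counts` is made up for this illustration, and the small-sample adjustment is assumed to be the same one pandas applies in `Series.skew()`.

```python
from math import sqrt

# Dalc value counts copied from the profile report above (value -> count).
dalc_counts = {1: 276, 2: 75, 3: 26, 4: 9, 5: 9}


def skew_from_counts(counts):
    """Adjusted sample skewness of a variable given as a frequency table."""
    n = sum(counts.values())
    mean = sum(v * c for v, c in counts.items()) / n
    # Second and third central moments (population versions).
    m2 = sum(c * (v - mean) ** 2 for v, c in counts.items()) / n
    m3 = sum(c * (v - mean) ** 3 for v, c in counts.items()) / n
    g1 = m3 / m2 ** 1.5
    # Small-sample adjustment (assumed to match pandas' Series.skew()).
    return g1 * sqrt(n * (n - 1)) / (n - 2)


dalc_skew = skew_from_counts(dalc_counts)
print(round(dalc_skew, 2))
```

The result should land close to the 2.1908 the report lists for Dalc, a strong right skew: most students sit at level 1 and only a few at levels 4 and 5.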
# Gender on Alcohol Consumption
ax = df.groupby('sex')[['Dalc', 'Walc']].mean().plot.bar()
ax.set_title('Gender on Alcohol Consumption')
ax.set_ylabel('Average of Alcohol Consumption')
ax.set_xlabel('Gender')
df.groupby('sex')[['Dalc', 'Walc']].mean().head()
sex | Dalc | Walc |
---|---|---|
F | 1.254808 | 1.956731 |
M | 1.732620 | 2.663102 |
As can be observed, male students reported higher average alcohol consumption than female students on both workdays (1.73 vs 1.25) and weekends (2.66 vs 1.96). The gap looks modest on a 1-5 scale, though no significance test was run here.
** Dalc - workday alcohol consumption (numeric: from 1 - very low to 5 - very high)
** Walc - weekend alcohol consumption (numeric: from 1 - very low to 5 - very high)
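Whether the gap between male and female students is "significant" could be checked with a two-sample test rather than by eye. Below is a minimal sketch of Welch's t statistic using only the standard library. The function name `welch_t` and the toy lists are made up for illustration and are not survey data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    # Standard error of the difference in means, using sample variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se


# Made-up toy scores, only to show the call; they are NOT taken from the survey.
toy_m = [2, 3, 4]
toy_f = [1, 2, 3]
t_stat = welch_t(toy_m, toy_f)
print(round(t_stat, 3))
```

With the notebook's data, the two inputs would be the Walc (or Dalc) values for each sex, for example `df.loc[df['sex'] == 'M', 'Walc']`. Since Dalc and Walc are ordinal 1-5 ratings, a rank-based test such as `scipy.stats.mannwhitneyu` may be a better fit than a t-test; `scipy.stats.ttest_ind(..., equal_var=False)` gives the Welch version together with a p-value.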
# Parent's cohabitation status on Alcohol Consumption
ax = df.groupby('Pstatus')[['Dalc', 'Walc']].mean().plot.bar()
ax.set_title('Parent\'s cohabitation status on Alcohol Consumption')
ax.set_ylabel('Average of Alcohol Consumption')
ax.set_xlabel('Parent\'s cohabitation status')
df.groupby('Pstatus')[['Dalc', 'Walc']].mean().head()
Pstatus | Dalc | Walc |
---|---|---|
A | 1.560976 | 2.268293 |
T | 1.471751 | 2.293785 |
Exploring parents' cohabitation status shows that students whose parents live apart (A) had a slightly higher workday consumption (1.56 vs 1.47) but a marginally lower weekend consumption (2.27 vs 2.29) than students whose parents live together (T). It should also be noted that (A) represents only about 10% of all students (41 of 395). Therefore, these averages do not support the conclusion that either group consumed significantly more alcohol than the other.
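To see how small the gap between the two groups is, the means printed in the table above can be subtracted directly. This is only a sketch on the rounded numbers from that output; the dictionary name `pstatus_means` is made up for the example.

```python
# Group means copied from the Pstatus table above (A = apart, T = together).
pstatus_means = {
    'A': {'Dalc': 1.560976, 'Walc': 2.268293},
    'T': {'Dalc': 1.471751, 'Walc': 2.293785},
}

# Difference A minus T for each column, rounded to three decimals.
gap = {
    col: round(pstatus_means['A'][col] - pstatus_means['T'][col], 3)
    for col in ('Dalc', 'Walc')
}
print(gap)
```

A positive value means group A drank more on average, a negative value means group T did. Both gaps are under a tenth of a point on a 1-5 scale, and the weekend one points the other way.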
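# TotalAlc - combined workday and weekend alcohol consumption (Dalc + Walc, from 2 to 10)
df['TotalAlc'] = df['Dalc'] + df['Walc']
temp_data = df.groupby('guardian')['TotalAlc'].count()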
df_gu_count = temp_data.to_dict()
ax = plt.figure(
FigureClass=Waffle,
rows=10,
values=df_gu_count,
legend={'loc': 'upper left', 'bbox_to_anchor': (1.1, 1)}
)
plt.title('Students\' Guardians')
print(df_gu_count)
{'father': 90, 'mother': 273, 'other': 32}
According to the data collected, most students had their mother as guardian (273), followed by father (90) and other (32).
df_gu = df.groupby('guardian').mean(numeric_only=True)  # average only the numeric columns
colors_list = ['gold', 'yellowgreen', 'lightcoral', 'lightskyblue', 'lightgreen', 'pink']
explode_list = [0.1, 0, 0] # offset for each wedge; only the first one (father) is pulled out
ax = df_gu['TotalAlc'].plot(kind='pie',
figsize=(15, 10),
autopct='%1.1f%%',
startangle=90,
shadow=True,
labels=None, # turn off labels on pie chart
pctdistance=1.12, # the ratio between the center of each pie slice and the start of the text generated by autopct
colors=colors_list, # add custom colors
explode=explode_list
)
ax.set_title('Alcohol Consumption based on Students\' Guardians')
ax.legend(df_gu.index.tolist())
df_gu.head()
guardian | age | Medu | Fedu | traveltime | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | TotalAlc |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
father | 16.433333 | 2.655556 | 2.744444 | 1.488889 | 2.044444 | 0.277778 | 3.911111 | 3.211111 | 2.944444 | 1.533333 | 2.344444 | 3.700000 | 3.977778 | 11.111111 | 11.155556 | 10.688889 | 3.877778 |
mother | 16.582418 | 2.831502 | 2.487179 | 1.421245 | 2.021978 | 0.267399 | 3.937729 | 3.216117 | 3.168498 | 1.450549 | 2.296703 | 3.531136 | 5.835165 | 10.882784 | 10.677656 | 10.483516 | 3.747253 |
other | 18.406250 | 2.312500 | 2.187500 | 1.562500 | 2.125000 | 1.062500 | 4.093750 | 3.468750 | 3.062500 | 1.593750 | 2.093750 | 3.343750 | 9.500000 | 10.562500 | 9.781250 | 9.062500 | 3.687500 |
As can be observed from the pie chart depicted above, the average alcohol consumption is almost identical across the three groups. However, students whose guardian is their father had a slightly higher average (3.88, or 34.3% of the sum of the three group means).
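The percentages on the pie chart are shares of the three group means, not shares of all the alcohol consumed, which is easy to misread. A short sketch that reproduces them from the TotalAlc column of the table above (the variable names are made up for the example):

```python
# Mean TotalAlc per guardian, copied from the table above.
total_alc_means = {'father': 3.877778, 'mother': 3.747253, 'other': 3.687500}

# Each wedge is one group mean divided by the sum of the three means.
total = sum(total_alc_means.values())
shares = {g: round(100 * m / total, 1) for g, m in total_alc_means.items()}
print(shares)  # {'father': 34.3, 'mother': 33.1, 'other': 32.6}
```

Because every wedge is close to one third, a bar chart of the three means would probably show the comparison more honestly than a pie.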
df_abs = df.groupby('TotalAlc')[['absences', 'failures', 'studytime', 'traveltime']].mean()
ax = df_abs.plot(kind='area',
alpha=0.3, # 0-1, default value a= 0.5
stacked=True,
figsize=(20, 10),
)
ax.set_title('Alcohol Consumption with Various Activities')
ax.set_ylabel('Level of Activities')
ax.set_xlabel('Alcohol Consumption Levels (2-10)')