If you haven't already, follow the setup instructions here to get all necessary software installed.
The first thing I do is import pandas.
import pandas as pd
That line simply imports the pandas
module and calls it pd
.
%pylab inline
This line allows you to view plots you graph inside of your Jupyter Notebook.
import pandas as pd
%pylab inline
Populating the interactive namespace from numpy and matplotlib
The below line of code will import the student-mat.csv
file as a pandas DataFrame object. Note that it specifies how the data is seperated (e.g., using a ';').
NOTE: You'll need to change the following line of code to be the full path to YOUR .csv file.
dataframe = pd.read_csv('/Users/cshan/Downloads/student/student-mat.csv', sep=';')
The .head()
command will show the first 5 lines of the DataFrame.
dataframe.head()
school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |
5 rows × 33 columns
The .columns
attribute returns a list of column names.
dataframe.columns
Index(['school', 'sex', 'age', 'address', 'famsize', 'Pstatus', 'Medu', 'Fedu', 'Mjob', 'Fjob', 'reason', 'guardian', 'traveltime', 'studytime', 'failures', 'schoolsup', 'famsup', 'paid', 'activities', 'nursery', 'higher', 'internet', 'romantic', 'famrel', 'freetime', 'goout', 'Dalc', 'Walc', 'health', 'absences', 'G1', 'G2', 'G3'], dtype='object')
The .describe()
method returns another DataFrame with descriptive statistics on all of the numerical columns, including the average, quartiles and min/max.
dataframe.describe()
age | Medu | Fedu | traveltime | studytime | failures | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
count | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 | 395.000000 |
mean | 16.696203 | 2.749367 | 2.521519 | 1.448101 | 2.035443 | 0.334177 | 3.944304 | 3.235443 | 3.108861 | 1.481013 | 2.291139 | 3.554430 | 5.708861 | 10.908861 | 10.713924 | 10.415190 |
std | 1.276043 | 1.094735 | 1.088201 | 0.697505 | 0.839240 | 0.743651 | 0.896659 | 0.998862 | 1.113278 | 0.890741 | 1.287897 | 1.390303 | 8.003096 | 3.319195 | 3.761505 | 4.581443 |
min | 15.000000 | 0.000000 | 0.000000 | 1.000000 | 1.000000 | 0.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 1.000000 | 0.000000 | 3.000000 | 0.000000 | 0.000000 |
25% | 16.000000 | 2.000000 | 2.000000 | 1.000000 | 1.000000 | 0.000000 | 4.000000 | 3.000000 | 2.000000 | 1.000000 | 1.000000 | 3.000000 | 0.000000 | 8.000000 | 9.000000 | 8.000000 |
50% | 17.000000 | 3.000000 | 2.000000 | 1.000000 | 2.000000 | 0.000000 | 4.000000 | 3.000000 | 3.000000 | 1.000000 | 2.000000 | 4.000000 | 4.000000 | 11.000000 | 11.000000 | 11.000000 |
75% | 18.000000 | 4.000000 | 3.000000 | 2.000000 | 2.000000 | 0.000000 | 5.000000 | 4.000000 | 4.000000 | 2.000000 | 3.000000 | 5.000000 | 8.000000 | 13.000000 | 13.000000 | 14.000000 |
max | 22.000000 | 4.000000 | 4.000000 | 4.000000 | 4.000000 | 3.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 5.000000 | 75.000000 | 19.000000 | 19.000000 | 20.000000 |
Here, I'm indexing into just the 'age' column of my DataFrame and calling the .hist()
method to plot a histogram.
dataframe['age'].hist()
<matplotlib.axes._subplots.AxesSubplot at 0x107b3d6a0>
Below I'm indexing into my DataFrame to only get the rows where the 'sex' of the student is 'F'.
dataframe[dataframe['sex'] == 'F']
school | sex | age | address | famsize | Pstatus | Medu | Fedu | Mjob | Fjob | ... | famrel | freetime | goout | Dalc | Walc | health | absences | G1 | G2 | G3 | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | GP | F | 18 | U | GT3 | A | 4 | 4 | at_home | teacher | ... | 4 | 3 | 4 | 1 | 1 | 3 | 6 | 5 | 6 | 6 |
1 | GP | F | 17 | U | GT3 | T | 1 | 1 | at_home | other | ... | 5 | 3 | 3 | 1 | 1 | 3 | 4 | 5 | 5 | 6 |
2 | GP | F | 15 | U | LE3 | T | 1 | 1 | at_home | other | ... | 4 | 3 | 2 | 2 | 3 | 3 | 10 | 7 | 8 | 10 |
3 | GP | F | 15 | U | GT3 | T | 4 | 2 | health | services | ... | 3 | 2 | 2 | 1 | 1 | 5 | 2 | 15 | 14 | 15 |
4 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 5 | 4 | 6 | 10 | 10 |
7 | GP | F | 17 | U | GT3 | A | 4 | 4 | other | teacher | ... | 4 | 1 | 4 | 1 | 1 | 1 | 6 | 6 | 5 | 6 |
10 | GP | F | 15 | U | GT3 | T | 4 | 4 | teacher | health | ... | 3 | 3 | 3 | 1 | 2 | 2 | 0 | 10 | 8 | 9 |
11 | GP | F | 15 | U | GT3 | T | 2 | 1 | services | other | ... | 5 | 2 | 2 | 1 | 1 | 4 | 4 | 10 | 12 | 12 |
15 | GP | F | 16 | U | GT3 | T | 4 | 4 | health | other | ... | 4 | 4 | 4 | 1 | 2 | 2 | 4 | 14 | 14 | 14 |
16 | GP | F | 16 | U | GT3 | T | 4 | 4 | services | services | ... | 3 | 2 | 3 | 1 | 2 | 2 | 6 | 13 | 14 | 14 |
17 | GP | F | 16 | U | GT3 | T | 3 | 3 | other | other | ... | 5 | 3 | 2 | 1 | 1 | 4 | 4 | 8 | 10 | 10 |
24 | GP | F | 15 | R | GT3 | T | 2 | 4 | services | health | ... | 4 | 3 | 2 | 1 | 1 | 5 | 2 | 10 | 9 | 8 |
25 | GP | F | 16 | U | GT3 | T | 2 | 2 | services | services | ... | 1 | 2 | 2 | 1 | 3 | 5 | 14 | 6 | 9 | 8 |
35 | GP | F | 15 | U | GT3 | T | 2 | 3 | other | other | ... | 3 | 5 | 1 | 1 | 1 | 5 | 0 | 8 | 7 | 6 |
38 | GP | F | 15 | R | GT3 | T | 3 | 4 | services | health | ... | 4 | 3 | 2 | 1 | 1 | 5 | 2 | 12 | 12 | 11 |
39 | GP | F | 15 | R | GT3 | T | 2 | 2 | at_home | other | ... | 4 | 3 | 1 | 1 | 1 | 2 | 8 | 14 | 13 | 13 |
40 | GP | F | 16 | U | LE3 | T | 2 | 2 | other | other | ... | 3 | 3 | 3 | 1 | 2 | 3 | 25 | 7 | 10 | 11 |
44 | GP | F | 16 | U | LE3 | T | 2 | 2 | other | at_home | ... | 4 | 3 | 3 | 2 | 2 | 5 | 14 | 10 | 10 | 9 |
45 | GP | F | 15 | U | LE3 | A | 4 | 3 | other | other | ... | 5 | 2 | 2 | 1 | 1 | 5 | 8 | 8 | 8 | 6 |
46 | GP | F | 16 | U | LE3 | A | 3 | 3 | other | services | ... | 2 | 3 | 5 | 1 | 4 | 3 | 12 | 11 | 12 | 11 |
49 | GP | F | 15 | U | GT3 | T | 4 | 4 | services | teacher | ... | 4 | 4 | 4 | 1 | 1 | 3 | 2 | 7 | 7 | 7 |
50 | GP | F | 16 | U | LE3 | T | 2 | 2 | services | services | ... | 4 | 3 | 3 | 2 | 3 | 4 | 2 | 12 | 13 | 13 |
51 | GP | F | 15 | U | LE3 | T | 4 | 2 | health | other | ... | 4 | 3 | 3 | 1 | 1 | 5 | 2 | 11 | 13 | 13 |
53 | GP | F | 15 | U | GT3 | T | 4 | 4 | services | services | ... | 3 | 3 | 4 | 2 | 3 | 5 | 0 | 8 | 10 | 11 |
54 | GP | F | 15 | U | LE3 | A | 3 | 3 | other | other | ... | 5 | 3 | 4 | 4 | 4 | 1 | 6 | 10 | 13 | 13 |
55 | GP | F | 16 | U | GT3 | A | 2 | 1 | other | other | ... | 5 | 3 | 4 | 1 | 1 | 2 | 8 | 8 | 9 | 10 |
56 | GP | F | 15 | U | GT3 | A | 4 | 3 | services | services | ... | 4 | 3 | 2 | 1 | 1 | 1 | 0 | 14 | 15 | 15 |
59 | GP | F | 16 | U | GT3 | T | 4 | 2 | services | other | ... | 4 | 2 | 3 | 1 | 1 | 5 | 2 | 15 | 16 | 16 |
60 | GP | F | 16 | R | GT3 | T | 4 | 4 | health | teacher | ... | 2 | 4 | 4 | 2 | 3 | 4 | 6 | 10 | 11 | 11 |
61 | GP | F | 16 | U | GT3 | T | 1 | 1 | services | services | ... | 5 | 5 | 5 | 5 | 5 | 5 | 6 | 10 | 8 | 11 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
340 | GP | F | 19 | U | GT3 | T | 2 | 1 | services | services | ... | 4 | 3 | 4 | 1 | 3 | 3 | 4 | 11 | 12 | 11 |
343 | GP | F | 17 | U | GT3 | A | 2 | 2 | at_home | at_home | ... | 3 | 3 | 1 | 1 | 2 | 4 | 0 | 9 | 8 | 0 |
344 | GP | F | 18 | U | GT3 | T | 2 | 3 | at_home | other | ... | 4 | 3 | 3 | 1 | 2 | 3 | 4 | 11 | 10 | 10 |
345 | GP | F | 18 | U | GT3 | T | 3 | 2 | other | services | ... | 5 | 4 | 3 | 2 | 3 | 1 | 7 | 13 | 13 | 14 |
348 | GP | F | 17 | U | GT3 | T | 4 | 3 | health | other | ... | 4 | 4 | 3 | 1 | 3 | 4 | 0 | 13 | 15 | 15 |
355 | MS | F | 18 | U | GT3 | T | 3 | 3 | services | services | ... | 5 | 3 | 4 | 1 | 1 | 5 | 0 | 10 | 9 | 9 |
356 | MS | F | 17 | R | GT3 | T | 4 | 4 | teacher | services | ... | 4 | 3 | 3 | 1 | 2 | 5 | 4 | 12 | 13 | 13 |
357 | MS | F | 17 | U | LE3 | A | 3 | 2 | services | other | ... | 1 | 2 | 3 | 1 | 2 | 5 | 2 | 12 | 12 | 11 |
359 | MS | F | 18 | U | LE3 | T | 1 | 1 | at_home | services | ... | 5 | 3 | 2 | 1 | 1 | 4 | 0 | 18 | 16 | 16 |
360 | MS | F | 18 | R | LE3 | A | 1 | 4 | at_home | other | ... | 4 | 3 | 4 | 1 | 4 | 5 | 0 | 13 | 13 | 13 |
362 | MS | F | 18 | U | GT3 | T | 3 | 3 | services | services | ... | 4 | 3 | 2 | 1 | 3 | 3 | 0 | 11 | 11 | 10 |
363 | MS | F | 17 | U | LE3 | T | 4 | 4 | at_home | at_home | ... | 2 | 3 | 4 | 1 | 1 | 1 | 0 | 16 | 15 | 15 |
364 | MS | F | 17 | R | GT3 | T | 1 | 2 | other | services | ... | 3 | 2 | 2 | 1 | 2 | 3 | 0 | 12 | 11 | 12 |
367 | MS | F | 17 | R | GT3 | T | 1 | 1 | other | services | ... | 5 | 2 | 1 | 1 | 2 | 1 | 0 | 7 | 6 | 0 |
368 | MS | F | 18 | U | GT3 | T | 2 | 3 | at_home | services | ... | 5 | 2 | 3 | 1 | 2 | 4 | 0 | 11 | 10 | 10 |
369 | MS | F | 18 | R | GT3 | T | 4 | 4 | other | teacher | ... | 3 | 2 | 2 | 4 | 2 | 5 | 10 | 14 | 12 | 11 |
370 | MS | F | 19 | U | LE3 | T | 3 | 2 | services | services | ... | 3 | 2 | 2 | 1 | 1 | 3 | 4 | 7 | 7 | 9 |
372 | MS | F | 17 | U | GT3 | T | 2 | 2 | other | at_home | ... | 3 | 4 | 3 | 1 | 1 | 3 | 8 | 13 | 11 | 11 |
373 | MS | F | 17 | R | GT3 | T | 1 | 2 | other | other | ... | 3 | 5 | 5 | 1 | 3 | 1 | 14 | 6 | 5 | 5 |
374 | MS | F | 18 | R | LE3 | T | 4 | 4 | other | other | ... | 5 | 4 | 4 | 1 | 1 | 1 | 0 | 19 | 18 | 19 |
375 | MS | F | 18 | R | GT3 | T | 1 | 1 | other | other | ... | 4 | 3 | 2 | 1 | 2 | 4 | 2 | 8 | 8 | 10 |
376 | MS | F | 20 | U | GT3 | T | 4 | 2 | health | other | ... | 5 | 4 | 3 | 1 | 1 | 3 | 4 | 15 | 14 | 15 |
377 | MS | F | 18 | R | LE3 | T | 4 | 4 | teacher | services | ... | 5 | 4 | 3 | 3 | 4 | 2 | 4 | 8 | 9 | 10 |
378 | MS | F | 18 | U | GT3 | T | 3 | 3 | other | other | ... | 4 | 1 | 3 | 1 | 2 | 1 | 0 | 15 | 15 | 15 |
379 | MS | F | 17 | R | GT3 | T | 3 | 1 | at_home | other | ... | 4 | 5 | 4 | 2 | 3 | 1 | 17 | 10 | 10 | 10 |
385 | MS | F | 18 | R | GT3 | T | 2 | 2 | at_home | other | ... | 5 | 3 | 3 | 1 | 3 | 4 | 2 | 10 | 9 | 10 |
386 | MS | F | 18 | R | GT3 | T | 4 | 4 | teacher | at_home | ... | 4 | 4 | 3 | 2 | 2 | 5 | 7 | 6 | 5 | 6 |
387 | MS | F | 19 | R | GT3 | T | 2 | 3 | services | other | ... | 5 | 4 | 2 | 1 | 2 | 5 | 0 | 7 | 5 | 0 |
388 | MS | F | 18 | U | LE3 | T | 3 | 1 | teacher | services | ... | 4 | 3 | 4 | 1 | 1 | 1 | 0 | 7 | 9 | 8 |
389 | MS | F | 18 | U | GT3 | T | 1 | 1 | other | other | ... | 1 | 1 | 1 | 1 | 1 | 5 | 0 | 6 | 5 | 0 |
208 rows × 33 columns
Here's how I get the average age of students who were in a relationship.
dataframe[dataframe['romantic'] == 'yes']['age'].mean()
16.992424242424242