This Jupyter notebook, for the second part of The Open University's Learn to code for Data Analysis course, has code examples and coding activities for you. Remember to run the code in this notebook before you start.
You'll come across steps in the course directing you to this notebook. Once you've done each exercise, go back to the corresponding step and mark it as complete.
import warnings
warnings.simplefilter('ignore', FutureWarning)
from pandas import *
To read a CSV file into a dataframe you need to call the pandas function called read_csv()
. The simplest usage of this function is with a single argument, a string that holds the name of the CSV file, for example.
df = read_csv('WHO POP TB all.csv')
A dataframe attribute is like a variable that can only be accessed in the context of a dataframe. One such attribute is columns
which holds a dataframe's column names.
df.columns
Index(['Country', 'Population (1000s)', 'TB deaths'], dtype='object')
A dataframe has a default integer index for its rows, which starts at zero. The iloc
attribute can be used to obtain the row at the given index.
df.iloc[0] # first row, index 0
Country Afghanistan Population (1000s) 30552 TB deaths 13000 Name: 0, dtype: object
df.iloc[2] # third row, index 2
Country Algeria Population (1000s) 39208 TB deaths 5100 Name: 2, dtype: object
head()
method¶The head()
method returns a dataframe with the first rows, as many as given in the argument. By default, if the argument is missing, it returns the first five rows.
df.head() # first five rows
Country | Population (1000s) | TB deaths | |
---|---|---|---|
0 | Afghanistan | 30552 | 13000.00 |
1 | Albania | 3173 | 20.00 |
2 | Algeria | 39208 | 5100.00 |
3 | Andorra | 79 | 0.26 |
4 | Angola | 21472 | 6900.00 |
df.head(7) # first seven rows
Country | Population (1000s) | TB deaths | |
---|---|---|---|
0 | Afghanistan | 30552 | 13000.00 |
1 | Albania | 3173 | 20.00 |
2 | Algeria | 39208 | 5100.00 |
3 | Andorra | 79 | 0.26 |
4 | Angola | 21472 | 6900.00 |
5 | Antigua and Barbuda | 90 | 1.20 |
6 | Argentina | 41446 | 570.00 |
tail()
method¶The tail()
method is similar to the head()
method. If no argument is used, the last five rows of the dataframe are returned, otherwise the number of rows returned is dependent on the argument.
df.tail() # last five rows
Country | Population (1000s) | TB deaths | |
---|---|---|---|
189 | Venezuela (Bolivarian Republic of) | 30405 | 480.0 |
190 | Viet Nam | 91680 | 17000.0 |
191 | Yemen | 24407 | 990.0 |
192 | Zambia | 14539 | 3600.0 |
193 | Zimbabwe | 14150 | 5700.0 |
You found in Week 1 that you can select and display the values for a single dataframe column by puting the name of the column (in quotes) within square brackets immediately after the dataframe's name. However you can also select and display the values for multiple columns too.
To get multiple columns you need to use a list. A list in Python is a number of items separated by commas within square brackets, for example ['Country', 'Population (1000s)']
. This list is then put within square brackets immediately after the dataframe's name. The resulting expression represents a new dataframe, just with those columns, and therefore any dataframe method can be applied to it, like for example head()
:
df[['Country', 'Population (1000s)']].head()
Country | Population (1000s) | |
---|---|---|
0 | Afghanistan | 30552 |
1 | Albania | 3173 |
2 | Algeria | 39208 |
3 | Andorra | 79 |
4 | Angola | 21472 |
The iloc
attribute and the head()
and tail()
methods discussed above can be used with single columns.
df['TB deaths'].iloc[2] # third value of deaths column
5100.0
df['Population (1000s)'].tail() # last five values of population column
189 30405 190 91680 191 24407 192 14539 193 14150 Name: Population (1000s), dtype: int64
In the code cell below, write the code to get and display the 55th row in the dataframe df
.
In the code cell below write the code to display the first 10 rows of the dataframe df
.
In the code cell below, select and display the first eight rows from the 'Country'
and 'TB deaths'
columns.
Now go back to the course.
Python has the following comparison operators:
== (equals)
!= (not equal)
< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
The following code will get and display all the rows in df
where it is True
that the value in the 'Population (1000s)'
column is greater than 80000
.
df[df['Population (1000s)'] > 80000]
Country | Population (1000s) | TB deaths | |
---|---|---|---|
13 | Bangladesh | 156595 | 80000.0 |
23 | Brazil | 200362 | 4400.0 |
36 | China | 1393337 | 41000.0 |
53 | Egypt | 82056 | 550.0 |
58 | Ethiopia | 94101 | 30000.0 |
65 | Germany | 82727 | 300.0 |
77 | India | 1252140 | 240000.0 |
78 | Indonesia | 249866 | 64000.0 |
85 | Japan | 127144 | 2100.0 |
109 | Mexico | 122332 | 2200.0 |
124 | Nigeria | 173615 | 160000.0 |
128 | Pakistan | 182143 | 49000.0 |
134 | Philippines | 98394 | 27000.0 |
141 | Russian Federation | 142834 | 17000.0 |
185 | United States of America | 320051 | 490.0 |
190 | Viet Nam | 91680 | 17000.0 |
In the code cell below write code to find all the rows in df
where TB deaths exceed 10000.
Now go back to the course.
Pandas has two operators to make more complicated queries. Use the operator &
(means 'and') to select rows where two conditions are both true. Use the operator |
(means 'or') to select rows where at least one condition is true. Don't forget to put parentheses around each comparison. For example, the following expression selects only countries with a population over 80 million inhabitants and with more that 10 thousand deaths.
df[(df['Population (1000s)'] > 80000) & (df['TB deaths'] > 10000)]
Country | Population (1000s) | TB deaths | |
---|---|---|---|
13 | Bangladesh | 156595 | 80000.0 |
36 | China | 1393337 | 41000.0 |
58 | Ethiopia | 94101 | 30000.0 |
77 | India | 1252140 | 240000.0 |
78 | Indonesia | 249866 | 64000.0 |
124 | Nigeria | 173615 | 160000.0 |
128 | Pakistan | 182143 | 49000.0 |
134 | Philippines | 98394 | 27000.0 |
141 | Russian Federation | 142834 | 17000.0 |
190 | Viet Nam | 91680 | 17000.0 |
If the same columns will be used repeatedly in the program, the code becomes more readable if written as follows:
population = df['Population (1000s)']
deaths = df['TB deaths']
df[(population > 80000) & (deaths > 10000)]
Country | Population (1000s) | TB deaths | |
---|---|---|---|
13 | Bangladesh | 156595 | 80000.0 |
36 | China | 1393337 | 41000.0 |
58 | Ethiopia | 94101 | 30000.0 |
77 | India | 1252140 | 240000.0 |
78 | Indonesia | 249866 | 64000.0 |
124 | Nigeria | 173615 | 160000.0 |
128 | Pakistan | 182143 | 49000.0 |
134 | Philippines | 98394 | 27000.0 |
141 | Russian Federation | 142834 | 17000.0 |
190 | Viet Nam | 91680 | 17000.0 |
In the code cell below find all the countries where the Population (1000s) is less than or equal to 50000 or TB deaths are greater than or equal to 20000.
Now go back to the course.
You have downloaded the file London_2014.csv from our website, it can now be read into a dataframe.
london = read_csv('London_2014.csv')
london.head()
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees<br /> | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2014-1-1 | 11 | 8 | 6 | 9 | 7 | 4 | 94 | 86 | 73 | ... | 31 | 11 | 2 | 40 | 26 | 66.0 | 9.91 | 4.0 | Rain | 186<br /> |
1 | 2014-1-2 | 11 | 9 | 6 | 9 | 6 | 4 | 94 | 81 | 60 | ... | 31 | 15 | 5 | 32 | 21 | 50.0 | 1.02 | 2.0 | Rain | 214<br /> |
2 | 2014-1-3 | 11 | 8 | 6 | 9 | 5 | 2 | 94 | 76 | 54 | ... | 31 | 12 | 0 | 50 | 29 | 69.0 | 7.11 | 2.0 | Rain-Thunderstorm | 219<br /> |
3 | 2014-1-4 | 9 | 6 | 3 | 8 | 6 | 2 | 93 | 85 | 65 | ... | 31 | 12 | 5 | 35 | 21 | 39.0 | 9.91 | 4.0 | Rain | 211<br /> |
4 | 2014-1-5 | 12 | 7 | 1 | 11 | 4 | -1 | 100 | 88 | 77 | ... | 31 | 14 | 5 | 35 | 16 | NaN | 0.25 | 5.0 | Rain | 199<br /> |
5 rows × 23 columns
There are too many columns for the dataframe to fit horizontally in this notebook, but they can be displayed separately.
london.columns
Index(['GMT', 'Max TemperatureC', 'Mean TemperatureC', 'Min TemperatureC', 'Dew PointC', 'MeanDew PointC', 'Min DewpointC', 'Max Humidity', ' Mean Humidity', ' Min Humidity', ' Max Sea Level PressurehPa', ' Mean Sea Level PressurehPa', ' Min Sea Level PressurehPa', ' Max VisibilityKm', ' Mean VisibilityKm', ' Min VisibilitykM', ' Max Wind SpeedKm/h', ' Mean Wind SpeedKm/h', ' Max Gust SpeedKm/h', 'Precipitationmm', ' CloudCover', ' Events', 'WindDirDegrees<br />'], dtype='object')
This shows that ' Max Wind SpeedKm/h'
is prefixed by a space, as are other columm names such as ' Mean Humidity'
and ' Max Sea Level PressurehPa'
.
The read_csv()
function has interpreted spaces after commas as being part of the next value. This can be rectified easily by adding another argument to the read_csv()
function to skip the initial spaces after a comma.
london = read_csv('London_2014.csv', skipinitialspace=True)
Another problem shown above is that the final column is called 'WindDirDegrees< br />'
.
When the dataset was exported from the Weather Underground web site, HTML line breaks were automatically added to each line in the file which read_csv()
has interpreted as part of the column name and its values. This can be seen more clearly by looking at more values in the final column:
london['WindDirDegrees<br />'].head()
0 186<br /> 1 214<br /> 2 219<br /> 3 211<br /> 4 199<br /> Name: WindDirDegrees<br />, dtype: object
'WindDirDegrees< br />'
can be changed to 'WindDirDegrees'
with the rename()
method as follows:
london = london.rename(columns={'WindDirDegrees<br />' : 'WindDirDegrees'})
To remove the '< br />'
html line breaks from the values in the 'WindDirDegrees'
column you need to use the string method rstrip()
which is used to remove characters from the end or 'rear' of a string:
london['WindDirDegrees'] = london['WindDirDegrees'].str.rstrip('<br />')
Display the first few rows of the 'WindDirDegrees'
to confirm the change:
london['WindDirDegrees'].head()
0 186 1 214 2 219 3 211 4 199 Name: WindDirDegrees, dtype: object
Missing (also called null or not available) values are marked as NaN (not a number) in dataframes.
london['Events'].tail()
360 Rain 361 NaN 362 NaN 363 NaN 364 Fog Name: Events, dtype: object
The isnull()
method returns True
for each row in a column that has a null value. The method can be used to select and display those rows. Scroll the table below to the right to check that the events column is only showing missing values.
london[london['Events'].isnull()]
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
32 | 2014-2-2 | 10 | 8 | 5 | 4 | 3 | 1 | 93 | 71 | 43 | ... | 31 | 16 | 10 | 37 | 24 | 47.0 | 0.00 | 2.0 | NaN | 216 |
33 | 2014-2-3 | 8 | 6 | 3 | 5 | 3 | 2 | 93 | 79 | 55 | ... | 31 | 15 | 10 | 32 | 18 | NaN | 0.00 | 4.0 | NaN | 133 |
43 | 2014-2-13 | 8 | 6 | 3 | 2 | 1 | -1 | 81 | 66 | 45 | ... | 31 | 15 | 10 | 42 | 27 | 48.0 | 0.00 | 2.0 | NaN | 236 |
52 | 2014-2-22 | 12 | 8 | 4 | 5 | 3 | 1 | 87 | 68 | 38 | ... | 31 | 16 | 10 | 29 | 21 | NaN | 0.00 | 2.0 | NaN | 228 |
63 | 2014-3-5 | 13 | 7 | 0 | 4 | 2 | -1 | 100 | 72 | 40 | ... | 31 | 15 | 8 | 21 | 6 | NaN | 0.25 | 1.0 | NaN | 248 |
64 | 2014-3-6 | 13 | 9 | 5 | 7 | 5 | 3 | 93 | 77 | 50 | ... | 31 | 15 | 7 | 32 | 14 | 42.0 | 0.00 | 2.0 | NaN | 206 |
66 | 2014-3-8 | 16 | 11 | 7 | 7 | 4 | 2 | 93 | 65 | 29 | ... | 31 | 11 | 6 | 27 | 18 | NaN | 0.00 | 6.0 | NaN | 156 |
67 | 2014-3-9 | 19 | 12 | 6 | 7 | 4 | 3 | 81 | 58 | 28 | ... | 26 | 9 | 5 | 24 | 10 | 34.0 | 0.00 | NaN | NaN | 170 |
68 | 2014-3-10 | 13 | 8 | 4 | 8 | 4 | 2 | 87 | 73 | 57 | ... | 31 | 12 | 6 | 23 | 8 | NaN | 0.00 | 6.0 | NaN | 27 |
70 | 2014-3-12 | 16 | 9 | 2 | 8 | 4 | 1 | 93 | 76 | 40 | ... | 19 | 9 | 3 | 11 | 8 | NaN | 0.00 | 4.0 | NaN | 31 |
73 | 2014-3-15 | 18 | 12 | 7 | 6 | 5 | 2 | 87 | 64 | 24 | ... | 31 | 18 | 5 | 27 | 16 | NaN | 0.00 | 1.0 | NaN | 277 |
74 | 2014-3-16 | 19 | 13 | 7 | 8 | 6 | 3 | 87 | 63 | 28 | ... | 31 | 19 | 7 | 24 | 16 | NaN | 0.00 | NaN | NaN | 266 |
75 | 2014-3-17 | 14 | 9 | 5 | 7 | 6 | 4 | 93 | 74 | 47 | ... | 31 | 13 | 5 | 21 | 14 | NaN | 0.00 | 3.0 | NaN | 263 |
77 | 2014-3-19 | 16 | 11 | 7 | 7 | 5 | 4 | 93 | 68 | 35 | ... | 31 | 18 | 8 | 26 | 18 | NaN | 0.00 | 2.0 | NaN | 241 |
87 | 2014-3-29 | 19 | 13 | 8 | 7 | 4 | -1 | 93 | 57 | 18 | ... | 31 | 9 | 3 | 34 | 16 | NaN | 0.00 | NaN | NaN | 99 |
92 | 2014-4-3 | 18 | 14 | 11 | 11 | 9 | 9 | 88 | 75 | 52 | ... | 21 | 7 | 3 | 26 | 11 | NaN | 0.00 | 5.0 | NaN | 96 |
93 | 2014-4-4 | 15 | 12 | 9 | 10 | 8 | 6 | 88 | 71 | 46 | ... | 31 | 11 | 2 | 24 | 14 | NaN | 0.00 | 5.0 | NaN | 253 |
98 | 2014-4-9 | 17 | 10 | 4 | 7 | 6 | 3 | 93 | 68 | 38 | ... | 31 | 14 | 6 | 23 | 13 | NaN | 0.00 | 2.0 | NaN | 261 |
99 | 2014-4-10 | 16 | 11 | 6 | 6 | 5 | 3 | 93 | 62 | 31 | ... | 31 | 14 | 9 | 19 | 10 | NaN | 0.00 | 3.0 | NaN | 278 |
100 | 2014-4-11 | 16 | 12 | 9 | 5 | 4 | 2 | 71 | 53 | 28 | ... | 31 | 18 | 10 | 13 | 8 | NaN | 0.00 | 1.0 | NaN | 351 |
101 | 2014-4-12 | 14 | 10 | 6 | 7 | 5 | 3 | 87 | 66 | 42 | ... | 31 | 16 | 10 | 29 | 11 | NaN | 0.00 | 1.0 | NaN | 262 |
102 | 2014-4-13 | 16 | 12 | 8 | 8 | 5 | 2 | 93 | 63 | 33 | ... | 31 | 14 | 9 | 26 | 14 | NaN | 0.00 | 2.0 | NaN | 283 |
103 | 2014-4-14 | 16 | 11 | 6 | 5 | 3 | 0 | 87 | 57 | 25 | ... | 31 | 17 | 10 | 21 | 13 | NaN | 0.00 | 1.0 | NaN | 330 |
104 | 2014-4-15 | 14 | 9 | 3 | 4 | 2 | 0 | 81 | 58 | 26 | ... | 31 | 20 | 10 | 21 | 10 | NaN | 0.00 | 1.0 | NaN | 64 |
105 | 2014-4-16 | 17 | 10 | 3 | 4 | 1 | -6 | 87 | 54 | 15 | ... | 31 | 20 | 10 | 16 | 11 | NaN | 0.00 | 1.0 | NaN | 139 |
106 | 2014-4-17 | 17 | 11 | 6 | 7 | 4 | 1 | 81 | 59 | 27 | ... | 31 | 17 | 10 | 21 | 11 | NaN | 0.00 | 2.0 | NaN | 279 |
107 | 2014-4-18 | 12 | 9 | 6 | 5 | 2 | -1 | 81 | 57 | 28 | ... | 31 | 15 | 10 | 19 | 13 | NaN | 0.00 | 3.0 | NaN | 21 |
108 | 2014-4-19 | 14 | 9 | 3 | 4 | 2 | 0 | 81 | 59 | 33 | ... | 31 | 14 | 10 | 23 | 11 | NaN | 0.00 | 4.0 | NaN | 56 |
122 | 2014-5-3 | 14 | 8 | 2 | 3 | 1 | -1 | 87 | 56 | 26 | ... | 31 | 16 | 10 | 14 | 8 | NaN | 0.00 | 1.0 | NaN | 39 |
123 | 2014-5-4 | 16 | 10 | 4 | 6 | 4 | 2 | 87 | 62 | 32 | ... | 31 | 17 | 10 | 19 | 10 | 40.0 | 0.00 | 1.0 | NaN | 196 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
252 | 2014-9-10 | 21 | 16 | 11 | 12 | 10 | 8 | 94 | 68 | 38 | ... | 23 | 13 | 9 | 21 | 8 | NaN | 0.00 | 2.0 | NaN | 92 |
253 | 2014-9-11 | 18 | 14 | 10 | 11 | 10 | 8 | 94 | 70 | 45 | ... | 13 | 8 | 5 | 11 | 6 | NaN | 0.00 | 6.0 | NaN | 42 |
254 | 2014-9-12 | 23 | 18 | 13 | 14 | 12 | 9 | 94 | 70 | 33 | ... | 31 | 11 | 5 | 19 | 11 | NaN | 0.00 | 3.0 | NaN | 76 |
255 | 2014-9-13 | 21 | 17 | 12 | 14 | 12 | 11 | 100 | 75 | 45 | ... | 31 | 12 | 3 | 19 | 11 | NaN | 0.00 | 4.0 | NaN | 56 |
257 | 2014-9-15 | 22 | 17 | 12 | 14 | 12 | 9 | 100 | 74 | 35 | ... | 26 | 12 | 5 | 14 | 6 | NaN | 0.00 | 3.0 | NaN | 71 |
258 | 2014-9-16 | 23 | 18 | 13 | 16 | 14 | 11 | 100 | 77 | 39 | ... | 19 | 6 | 2 | 19 | 8 | NaN | 0.00 | 5.0 | NaN | 70 |
259 | 2014-9-17 | 21 | 18 | 16 | 15 | 14 | 13 | 94 | 78 | 59 | ... | 11 | 6 | 3 | 19 | 13 | NaN | 0.00 | 5.0 | NaN | 58 |
260 | 2014-9-18 | 26 | 20 | 14 | 16 | 15 | 13 | 94 | 74 | 43 | ... | 19 | 6 | 2 | 19 | 11 | 35.0 | 0.00 | 5.0 | NaN | 71 |
264 | 2014-9-22 | 19 | 13 | 7 | 9 | 8 | 6 | 100 | 68 | 35 | ... | 19 | 10 | 7 | 11 | 6 | NaN | 0.25 | 1.0 | NaN | 309 |
267 | 2014-9-25 | 19 | 13 | 7 | 12 | 9 | 5 | 93 | 71 | 48 | ... | 31 | 17 | 10 | 26 | 13 | NaN | 0.00 | 5.0 | NaN | 249 |
269 | 2014-9-27 | 21 | 17 | 12 | 14 | 12 | 11 | 94 | 71 | 43 | ... | 31 | 15 | 10 | 13 | 5 | NaN | 0.00 | 6.0 | NaN | 146 |
270 | 2014-9-28 | 24 | 19 | 14 | 15 | 14 | 13 | 94 | 72 | 41 | ... | 27 | 13 | 6 | 16 | 6 | NaN | 0.00 | 4.0 | NaN | 116 |
272 | 2014-9-30 | 22 | 18 | 14 | 15 | 13 | 12 | 94 | 78 | 44 | ... | 27 | 10 | 4 | 19 | 13 | 32.0 | 0.00 | 4.0 | NaN | 222 |
274 | 2014-10-2 | 22 | 19 | 14 | 14 | 13 | 11 | 94 | 75 | 46 | ... | 31 | 15 | 6 | 21 | 8 | 35.0 | 0.00 | 3.0 | NaN | 203 |
292 | 2014-10-20 | 17 | 14 | 12 | 11 | 11 | 9 | 88 | 76 | 50 | ... | 31 | 15 | 10 | 27 | 19 | NaN | 0.25 | 3.0 | NaN | 238 |
295 | 2014-10-23 | 16 | 13 | 11 | 14 | 10 | 7 | 94 | 76 | 58 | ... | 31 | 14 | 9 | 23 | 16 | NaN | 0.00 | 5.0 | NaN | 214 |
297 | 2014-10-25 | 16 | 11 | 7 | 9 | 8 | 6 | 100 | 75 | 41 | ... | 31 | 13 | 10 | 26 | 13 | NaN | 0.00 | 4.0 | NaN | 237 |
298 | 2014-10-26 | 16 | 13 | 12 | 9 | 8 | 7 | 77 | 67 | 45 | ... | 31 | 12 | 10 | 27 | 18 | 40.0 | 0.00 | 5.0 | NaN | 217 |
299 | 2014-10-27 | 18 | 16 | 12 | 13 | 12 | 9 | 94 | 80 | 55 | ... | 31 | 14 | 8 | 29 | 16 | 40.0 | 0.00 | 3.0 | NaN | 192 |
302 | 2014-10-30 | 18 | 16 | 13 | 15 | 14 | 12 | 100 | 89 | 72 | ... | 13 | 6 | 1 | 16 | 10 | NaN | 0.00 | 5.0 | NaN | 170 |
303 | 2014-10-31 | 23 | 17 | 12 | 15 | 13 | 11 | 100 | 78 | 40 | ... | 31 | 13 | 3 | 23 | 8 | 48.0 | 0.00 | 1.0 | NaN | 170 |
322 | 2014-11-19 | 11 | 9 | 7 | 9 | 7 | 5 | 100 | 88 | 70 | ... | 10 | 5 | 3 | 14 | 10 | NaN | 0.00 | 2.0 | NaN | 93 |
323 | 2014-11-20 | 11 | 8 | 6 | 8 | 6 | 5 | 93 | 87 | 66 | ... | 13 | 6 | 3 | 16 | 8 | NaN | 0.00 | 3.0 | NaN | 69 |
339 | 2014-12-6 | 6 | 2 | -1 | 3 | 1 | -1 | 100 | 85 | 60 | ... | 31 | 15 | 10 | 14 | 8 | NaN | 0.25 | 4.0 | NaN | 265 |
346 | 2014-12-13 | 7 | 3 | 0 | 2 | 1 | -1 | 100 | 83 | 53 | ... | 31 | 19 | 9 | 13 | 8 | NaN | 0.00 | 2.0 | NaN | 278 |
351 | 2014-12-18 | 13 | 12 | 11 | 11 | 11 | 10 | 94 | 85 | 74 | ... | 31 | 12 | 10 | 37 | 29 | 60.0 | 0.00 | 6.0 | NaN | 245 |
358 | 2014-12-25 | 8 | 6 | 2 | 3 | 2 | 1 | 93 | 79 | 51 | ... | 31 | 19 | 10 | 24 | 16 | NaN | 0.00 | 1.0 | NaN | 275 |
361 | 2014-12-28 | 6 | 2 | -2 | 1 | -2 | -3 | 100 | 76 | 45 | ... | 31 | 17 | 8 | 16 | 11 | NaN | 0.00 | 1.0 | NaN | 5 |
362 | 2014-12-29 | 6 | 1 | -4 | 2 | -1 | -4 | 100 | 87 | 62 | ... | 24 | 10 | 5 | 13 | 8 | NaN | 0.25 | 1.0 | NaN | 289 |
363 | 2014-12-30 | 6 | 1 | -4 | 2 | -1 | -4 | 100 | 90 | 68 | ... | 27 | 9 | 3 | 14 | 6 | NaN | 0.25 | 3.0 | NaN | 258 |
114 rows × 23 columns
One way to deal with missing values is to replace them by some value. The column method fillna()
fills all not available value cells with the value given as argument. In the example below, each missing event is replaced by the empty string.
london['Events'] = london['Events'].fillna('')
london[london['Events'].isnull()]
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees |
---|
0 rows × 23 columns
The empty dataframe (no rows) confirms there are no more missing event values.
Another way to deal with missing values is to ignore rows with them. The dropna()
dataframe method returns a new dataframe where all rows with at least one non-available value have been removed.
london.dropna()
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 2014-1-1 | 11 | 8 | 6 | 9 | 7 | 4 | 94 | 86 | 73 | ... | 31 | 11 | 2 | 40 | 26 | 66.0 | 9.91 | 4.0 | Rain | 186 |
1 | 2014-1-2 | 11 | 9 | 6 | 9 | 6 | 4 | 94 | 81 | 60 | ... | 31 | 15 | 5 | 32 | 21 | 50.0 | 1.02 | 2.0 | Rain | 214 |
2 | 2014-1-3 | 11 | 8 | 6 | 9 | 5 | 2 | 94 | 76 | 54 | ... | 31 | 12 | 0 | 50 | 29 | 69.0 | 7.11 | 2.0 | Rain-Thunderstorm | 219 |
3 | 2014-1-4 | 9 | 6 | 3 | 8 | 6 | 2 | 93 | 85 | 65 | ... | 31 | 12 | 5 | 35 | 21 | 39.0 | 9.91 | 4.0 | Rain | 211 |
5 | 2014-1-6 | 12 | 10 | 8 | 11 | 8 | 6 | 94 | 78 | 60 | ... | 31 | 12 | 5 | 50 | 29 | 61.0 | 2.03 | 4.0 | Rain | 219 |
6 | 2014-1-7 | 12 | 10 | 8 | 8 | 7 | 6 | 87 | 78 | 63 | ... | 31 | 13 | 5 | 39 | 27 | 55.0 | 5.08 | 3.0 | Rain | 219 |
7 | 2014-1-8 | 12 | 9 | 7 | 11 | 7 | 6 | 94 | 85 | 69 | ... | 27 | 14 | 6 | 24 | 18 | 37.0 | 1.02 | 3.0 | Rain | 197 |
8 | 2014-1-9 | 11 | 8 | 5 | 11 | 5 | 2 | 94 | 76 | 53 | ... | 31 | 13 | 5 | 40 | 26 | 60.0 | 0.00 | 4.0 | Rain | 247 |
9 | 2014-1-10 | 11 | 7 | 2 | 7 | 4 | 2 | 93 | 85 | 66 | ... | 31 | 17 | 10 | 24 | 13 | 32.0 | 0.25 | 3.0 | Rain | 216 |
15 | 2014-1-16 | 11 | 8 | 6 | 9 | 6 | 4 | 94 | 85 | 67 | ... | 31 | 14 | 3 | 27 | 18 | 47.0 | 3.05 | 3.0 | Rain | 188 |
16 | 2014-1-17 | 9 | 7 | 5 | 7 | 6 | 4 | 93 | 86 | 68 | ... | 31 | 13 | 5 | 29 | 19 | 47.0 | 4.06 | 3.0 | Rain | 192 |
21 | 2014-1-22 | 11 | 8 | 6 | 7 | 6 | 4 | 100 | 87 | 62 | ... | 31 | 13 | 6 | 23 | 14 | 32.0 | 0.25 | 5.0 | Rain | 221 |
22 | 2014-1-23 | 8 | 6 | 2 | 6 | 3 | 0 | 100 | 81 | 50 | ... | 31 | 14 | 5 | 26 | 13 | 45.0 | 1.02 | 2.0 | Rain | 265 |
24 | 2014-1-25 | 12 | 9 | 5 | 8 | 6 | 0 | 100 | 83 | 48 | ... | 31 | 10 | 2 | 40 | 13 | 61.0 | 2.03 | 4.0 | Fog-Rain | 249 |
25 | 2014-1-26 | 9 | 6 | 2 | 8 | 3 | 0 | 93 | 80 | 62 | ... | 31 | 14 | 3 | 37 | 19 | 39.0 | 5.08 | 4.0 | Rain | 232 |
26 | 2014-1-27 | 8 | 5 | 2 | 4 | 2 | -1 | 87 | 80 | 61 | ... | 31 | 16 | 9 | 34 | 23 | 47.0 | 2.03 | 3.0 | Rain | 228 |
30 | 2014-1-31 | 9 | 5 | 1 | 8 | 4 | 1 | 100 | 90 | 72 | ... | 23 | 7 | 0 | 37 | 13 | 50.0 | 7.11 | 5.0 | Fog-Rain | 173 |
31 | 2014-2-1 | 9 | 7 | 5 | 9 | 3 | 0 | 100 | 73 | 46 | ... | 31 | 13 | 3 | 50 | 24 | 71.0 | 0.00 | 2.0 | Rain | 220 |
32 | 2014-2-2 | 10 | 8 | 5 | 4 | 3 | 1 | 93 | 71 | 43 | ... | 31 | 16 | 10 | 37 | 24 | 47.0 | 0.00 | 2.0 | 216 | |
34 | 2014-2-4 | 9 | 7 | 4 | 4 | 3 | 1 | 87 | 76 | 50 | ... | 31 | 14 | 6 | 39 | 21 | 53.0 | 0.51 | 4.0 | Rain | 176 |
35 | 2014-2-5 | 10 | 8 | 5 | 7 | 4 | 2 | 93 | 75 | 56 | ... | 26 | 13 | 6 | 45 | 34 | 66.0 | 2.03 | 4.0 | Rain | 192 |
36 | 2014-2-6 | 10 | 8 | 6 | 9 | 6 | 3 | 100 | 82 | 62 | ... | 29 | 12 | 2 | 35 | 26 | 40.0 | 7.11 | 4.0 | Rain | 190 |
37 | 2014-2-7 | 9 | 7 | 4 | 9 | 4 | 0 | 100 | 78 | 46 | ... | 31 | 12 | 3 | 39 | 19 | 58.0 | 0.51 | 4.0 | Rain | 230 |
38 | 2014-2-8 | 9 | 8 | 7 | 8 | 4 | 0 | 93 | 74 | 50 | ... | 31 | 14 | 3 | 45 | 32 | 69.0 | 2.03 | 3.0 | Rain | 208 |
39 | 2014-2-9 | 8 | 7 | 4 | 3 | 1 | -2 | 81 | 65 | 42 | ... | 31 | 14 | 10 | 47 | 34 | 66.0 | 0.00 | 3.0 | Rain | 228 |
41 | 2014-2-11 | 8 | 5 | 2 | 6 | 2 | -1 | 93 | 75 | 43 | ... | 31 | 14 | 6 | 40 | 19 | 58.0 | 2.03 | 3.0 | Rain | 222 |
42 | 2014-2-12 | 9 | 6 | 2 | 6 | 2 | -1 | 93 | 74 | 49 | ... | 31 | 14 | 5 | 52 | 26 | 76.0 | 7.87 | 3.0 | Rain-Thunderstorm | 211 |
43 | 2014-2-13 | 8 | 6 | 3 | 2 | 1 | -1 | 81 | 66 | 45 | ... | 31 | 15 | 10 | 42 | 27 | 48.0 | 0.00 | 2.0 | 236 | |
44 | 2014-2-14 | 12 | 8 | 3 | 10 | 4 | 0 | 94 | 81 | 60 | ... | 31 | 14 | 5 | 63 | 24 | 90.0 | 8.89 | 5.0 | Rain | 180 |
45 | 2014-2-15 | 11 | 8 | 5 | 5 | 3 | 2 | 87 | 67 | 51 | ... | 31 | 15 | 9 | 66 | 47 | 85.0 | 0.51 | 4.0 | Rain | 237 |
... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
281 | 2014-10-9 | 18 | 14 | 11 | 11 | 10 | 8 | 88 | 77 | 48 | ... | 31 | 15 | 3 | 39 | 21 | 53.0 | 2.03 | 2.0 | Rain-Thunderstorm | 209 |
282 | 2014-10-10 | 18 | 13 | 9 | 12 | 10 | 8 | 94 | 80 | 45 | ... | 31 | 17 | 10 | 26 | 14 | 42.0 | 3.05 | 2.0 | Rain | 207 |
284 | 2014-10-12 | 14 | 10 | 6 | 11 | 8 | 5 | 100 | 84 | 57 | ... | 31 | 14 | 6 | 23 | 6 | 37.0 | 2.03 | 2.0 | Fog-Rain | 53 |
288 | 2014-10-16 | 18 | 14 | 11 | 14 | 12 | 11 | 100 | 87 | 58 | ... | 31 | 12 | 6 | 21 | 10 | 34.0 | 0.25 | 2.0 | Rain | 214 |
289 | 2014-10-17 | 19 | 16 | 13 | 15 | 14 | 12 | 100 | 84 | 62 | ... | 29 | 14 | 8 | 24 | 13 | 42.0 | 0.51 | 2.0 | Rain | 188 |
291 | 2014-10-19 | 21 | 17 | 14 | 16 | 13 | 11 | 94 | 78 | 48 | ... | 31 | 14 | 5 | 34 | 21 | 47.0 | 0.00 | 3.0 | Rain | 225 |
293 | 2014-10-21 | 14 | 11 | 8 | 13 | 6 | 1 | 94 | 65 | 37 | ... | 31 | 14 | 3 | 47 | 27 | 69.0 | 3.05 | 3.0 | Rain | 268 |
298 | 2014-10-26 | 16 | 13 | 12 | 9 | 8 | 7 | 77 | 67 | 45 | ... | 31 | 12 | 10 | 27 | 18 | 40.0 | 0.00 | 5.0 | 217 | |
299 | 2014-10-27 | 18 | 16 | 12 | 13 | 12 | 9 | 94 | 80 | 55 | ... | 31 | 14 | 8 | 29 | 16 | 40.0 | 0.00 | 3.0 | 192 | |
300 | 2014-10-28 | 18 | 14 | 9 | 14 | 12 | 9 | 100 | 83 | 47 | ... | 31 | 13 | 0 | 24 | 11 | 37.0 | 0.00 | 3.0 | Fog-Rain | 181 |
303 | 2014-10-31 | 23 | 17 | 12 | 15 | 13 | 11 | 100 | 78 | 40 | ... | 31 | 13 | 3 | 23 | 8 | 48.0 | 0.00 | 1.0 | 170 | |
304 | 2014-11-1 | 18 | 16 | 13 | 15 | 11 | 6 | 94 | 74 | 39 | ... | 29 | 13 | 9 | 32 | 18 | 48.0 | 0.25 | 3.0 | Rain | 211 |
305 | 2014-11-2 | 16 | 13 | 10 | 13 | 11 | 8 | 94 | 82 | 64 | ... | 29 | 14 | 0 | 27 | 23 | 50.0 | 11.94 | 4.0 | Rain | 207 |
306 | 2014-11-3 | 10 | 8 | 6 | 9 | 7 | 4 | 94 | 84 | 60 | ... | 31 | 14 | 2 | 23 | 14 | 34.0 | 5.08 | 3.0 | Rain | 208 |
310 | 2014-11-7 | 13 | 10 | 8 | 12 | 8 | 5 | 95 | 80 | 52 | ... | 31 | 11 | 2 | 42 | 23 | 42.0 | 5.08 | 4.0 | Rain | 207 |
311 | 2014-11-8 | 14 | 11 | 7 | 11 | 8 | 5 | 94 | 84 | 64 | ... | 31 | 14 | 5 | 34 | 18 | 52.0 | 2.03 | 3.0 | Rain | 192 |
312 | 2014-11-9 | 13 | 10 | 6 | 9 | 7 | 4 | 100 | 83 | 47 | ... | 31 | 11 | 0 | 14 | 6 | 32.0 | 0.00 | 2.0 | Fog | 212 |
336 | 2014-12-3 | 8 | 6 | 4 | 5 | 3 | 1 | 93 | 78 | 56 | ... | 31 | 15 | 9 | 23 | 16 | 37.0 | 0.51 | 4.0 | Rain | 17 |
342 | 2014-12-9 | 12 | 6 | -1 | 11 | 3 | -1 | 100 | 86 | 64 | ... | 31 | 15 | 3 | 37 | 14 | 52.0 | 0.25 | 5.0 | Rain | 224 |
345 | 2014-12-12 | 9 | 6 | 2 | 9 | 4 | -1 | 94 | 79 | 59 | ... | 31 | 11 | 3 | 40 | 26 | 55.0 | 1.02 | 3.0 | Rain | 253 |
347 | 2014-12-14 | 10 | 4 | -1 | 9 | 3 | -1 | 100 | 84 | 62 | ... | 31 | 16 | 9 | 29 | 14 | 45.0 | 0.25 | 3.0 | Rain | 215 |
350 | 2014-12-17 | 13 | 9 | 4 | 11 | 10 | 6 | 100 | 90 | 76 | ... | 31 | 12 | 6 | 34 | 21 | 55.0 | 0.25 | 5.0 | Rain | 252 |
351 | 2014-12-18 | 13 | 12 | 11 | 11 | 11 | 10 | 94 | 85 | 74 | ... | 31 | 12 | 10 | 37 | 29 | 60.0 | 0.00 | 6.0 | 245 | |
353 | 2014-12-20 | 9 | 7 | 4 | 4 | 3 | 2 | 87 | 75 | 53 | ... | 31 | 15 | 10 | 32 | 19 | 45.0 | 0.00 | 3.0 | Rain | 251 |
354 | 2014-12-21 | 12 | 8 | 3 | 9 | 6 | 3 | 93 | 80 | 65 | ... | 31 | 12 | 10 | 35 | 23 | 47.0 | 0.00 | 6.0 | Rain | 237 |
355 | 2014-12-22 | 13 | 12 | 10 | 11 | 9 | 7 | 94 | 77 | 63 | ... | 31 | 14 | 8 | 40 | 32 | 55.0 | 0.00 | 5.0 | Rain | 240 |
356 | 2014-12-23 | 13 | 11 | 10 | 9 | 7 | 5 | 82 | 72 | 58 | ... | 31 | 12 | 10 | 42 | 32 | 61.0 | 0.00 | 4.0 | Rain | 238 |
357 | 2014-12-24 | 12 | 8 | 4 | 10 | 4 | -2 | 94 | 71 | 38 | ... | 31 | 13 | 10 | 34 | 26 | 45.0 | 0.25 | 5.0 | Rain | 262 |
360 | 2014-12-27 | 4 | 3 | 2 | 8 | 1 | -1 | 93 | 83 | 64 | ... | 31 | 13 | 9 | 42 | 26 | 61.0 | 0.00 | 4.0 | Rain | 330 |
364 | 2014-12-31 | 8 | 2 | -4 | 5 | 2 | -4 | 100 | 89 | 70 | ... | 19 | 9 | 0 | 23 | 8 | 37.0 | 0.25 | 4.0 | Fog | 201 |
128 rows × 23 columns
Note that the table above has fewer than 251 of the original 365 rows, so there must be further null values besides the 114 missing events.
The type of every column in a dataframe can be determined by looking at the dataframe's dtypes
attribute, like this:
london.dtypes
GMT object Max TemperatureC int64 Mean TemperatureC int64 Min TemperatureC int64 Dew PointC int64 MeanDew PointC int64 Min DewpointC int64 Max Humidity int64 Mean Humidity int64 Min Humidity int64 Max Sea Level PressurehPa int64 Mean Sea Level PressurehPa int64 Min Sea Level PressurehPa int64 Max VisibilityKm int64 Mean VisibilityKm int64 Min VisibilitykM int64 Max Wind SpeedKm/h int64 Mean Wind SpeedKm/h int64 Max Gust SpeedKm/h float64 Precipitationmm float64 CloudCover float64 Events object WindDirDegrees object dtype: object
The type of all the values in a column can be changed using the astype()
method. The following code will change the values in the 'WindDirDegrees'
column from strings (object
) to integers (int64
).
london['WindDirDegrees'] = london['WindDirDegrees'].astype('int64')
The function to_datetime()
is needed to change the values in the 'GMT'
column from strings (object
) to dates (datetime64
):
london['GMT'] = to_datetime(london['GMT'])
london.dtypes
GMT datetime64[ns] Max TemperatureC int64 Mean TemperatureC int64 Min TemperatureC int64 Dew PointC int64 MeanDew PointC int64 Min DewpointC int64 Max Humidity int64 Mean Humidity int64 Min Humidity int64 Max Sea Level PressurehPa int64 Mean Sea Level PressurehPa int64 Min Sea Level PressurehPa int64 Max VisibilityKm int64 Mean VisibilityKm int64 Min VisibilitykM int64 Max Wind SpeedKm/h int64 Mean Wind SpeedKm/h int64 Max Gust SpeedKm/h float64 Precipitationmm float64 CloudCover float64 Events object WindDirDegrees int64 dtype: object
Values of type datetime64
can be created using the datetime()
function, provided by the package of the same name. The first integer argument is the year, the second the month and the third the day.
The code below will get and display the row in the dataframe whose 'GMT'
value is 4th June 2014.
from datetime import datetime
london[london['GMT'] == datetime(2014, 6, 4)]
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
154 | 2014-06-04 | 14 | 11 | 8 | 11 | 9 | 3 | 94 | 78 | 47 | ... | 31 | 12 | 7 | 47 | 13 | 64.0 | 2.03 | 4.0 | Rain-Thunderstorm | 230 |
1 rows × 23 columns
Queries such as 'Return all the rows where the date is between 8 December and 12 December' can be made:
dates = london['GMT']
start = datetime(2014, 12, 8)
end = datetime(2014, 12, 12)
london[(dates >= start) & (dates <= end)]
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
341 | 2014-12-08 | 7 | 4 | 1 | 2 | 1 | -1 | 93 | 76 | 51 | ... | 31 | 17 | 10 | 23 | 13 | NaN | 0.00 | 2.0 | Rain | 270 |
342 | 2014-12-09 | 12 | 6 | -1 | 11 | 3 | -1 | 100 | 86 | 64 | ... | 31 | 15 | 3 | 37 | 14 | 52.0 | 0.25 | 5.0 | Rain | 224 |
343 | 2014-12-10 | 9 | 7 | 6 | 11 | 3 | 1 | 90 | 70 | 44 | ... | 31 | 18 | 3 | 40 | 27 | NaN | 0.00 | 2.0 | Rain | 246 |
344 | 2014-12-11 | 10 | 8 | 5 | 6 | 3 | 2 | 82 | 72 | 51 | ... | 31 | 15 | 10 | 40 | 31 | NaN | 0.25 | 2.0 | Rain | 246 |
345 | 2014-12-12 | 9 | 6 | 2 | 9 | 4 | -1 | 94 | 79 | 59 | ... | 31 | 11 | 3 | 40 | 26 | 55.0 | 1.02 | 3.0 | Rain | 253 |
5 rows × 23 columns
Now that the wind direction is given by a number, write code to select all days that had a northerly wind. Hint: select the rows where the direction is greater than or equal to 350 or smaller than or equal to 10, as the compass rose shows.
In the code cell below, write code to get and display all the rows in the dataframe that are beween 1 April 2014 and 11 April 2014.
In the cell below, write two lines of code to display the first five rows that have a missing value in the 'Max Gust SpeedKm/h'
column. Hint: first select the missing value rows and store them in a new dataframe, then display the first five rows of the new dataframe.
Now go back to the course.
The following line of code tells Jupyter to display inside this notebook any graph that is created.
%matplotlib inline
The plot()
method can make a graph of the values in a column. Gridlines are turned on by the grid
argument.
london['Max Wind SpeedKm/h'].plot(grid=True)
<matplotlib.axes._subplots.AxesSubplot at 0x109ea33c8>
The graph can be made bigger by giving the method a figsize=(x,y)
argument where x
and y
are integers that determine the length of the x-axis and y-axis.
london['Max Wind SpeedKm/h'].plot(grid=True, figsize=(10,5))
<matplotlib.axes._subplots.AxesSubplot at 0x10a366550>
Multiple lines can be plotted by selecting multiple columns.
london[['Max Wind SpeedKm/h', 'Mean Wind SpeedKm/h']].plot(grid=True, figsize=(10,5))
<matplotlib.axes._subplots.AxesSubplot at 0x112637160>
In the cell below, write code to plot the minimum, mean, and maximum temperature during 2014 in London.
Now go back to the course.
Changing the dataframe's index from the default to datetime64
values is done by assigning to the dataframe's index
attribute the contents of the 'GMT
' column, like this:
london.index = london['GMT']
london.head(2)
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GMT | |||||||||||||||||||||
2014-01-01 | 2014-01-01 | 11 | 8 | 6 | 9 | 7 | 4 | 94 | 86 | 73 | ... | 31 | 11 | 2 | 40 | 26 | 66.0 | 9.91 | 4.0 | Rain | 186 |
2014-01-02 | 2014-01-02 | 11 | 9 | 6 | 9 | 6 | 4 | 94 | 81 | 60 | ... | 31 | 15 | 5 | 32 | 21 | 50.0 | 1.02 | 2.0 | Rain | 214 |
2 rows × 23 columns
The iloc
attribute can still be used to get and display rows by number, but now you can now also use the datetime64
index to get a row by date, using the dataframe's loc
attribute, like this:
london.loc[datetime(2014, 1, 1)]
GMT 2014-01-01 00:00:00 Max TemperatureC 11 Mean TemperatureC 8 Min TemperatureC 6 Dew PointC 9 MeanDew PointC 7 Min DewpointC 4 Max Humidity 94 Mean Humidity 86 Min Humidity 73 Max Sea Level PressurehPa 1002 Mean Sea Level PressurehPa 993 Min Sea Level PressurehPa 984 Max VisibilityKm 31 Mean VisibilityKm 11 Min VisibilitykM 2 Max Wind SpeedKm/h 40 Mean Wind SpeedKm/h 26 Max Gust SpeedKm/h 66 Precipitationmm 9.91 CloudCover 4 Events Rain WindDirDegrees 186 Name: 2014-01-01 00:00:00, dtype: object
A query such as 'Return all the rows where the date is between December 8th and December 12th' can now be done succinctly like this:
london.loc[datetime(2014,12,8) : datetime(2014,12,12)]
#The meaning of the above code is get the rows beween and including
#the indices datetime(2014,12,8) and datetime(2014,12,12)
GMT | Max TemperatureC | Mean TemperatureC | Min TemperatureC | Dew PointC | MeanDew PointC | Min DewpointC | Max Humidity | Mean Humidity | Min Humidity | ... | Max VisibilityKm | Mean VisibilityKm | Min VisibilitykM | Max Wind SpeedKm/h | Mean Wind SpeedKm/h | Max Gust SpeedKm/h | Precipitationmm | CloudCover | Events | WindDirDegrees | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
GMT | |||||||||||||||||||||
2014-12-08 | 2014-12-08 | 7 | 4 | 1 | 2 | 1 | -1 | 93 | 76 | 51 | ... | 31 | 17 | 10 | 23 | 13 | NaN | 0.00 | 2.0 | Rain | 270 |
2014-12-09 | 2014-12-09 | 12 | 6 | -1 | 11 | 3 | -1 | 100 | 86 | 64 | ... | 31 | 15 | 3 | 37 | 14 | 52.0 | 0.25 | 5.0 | Rain | 224 |
2014-12-10 | 2014-12-10 | 9 | 7 | 6 | 11 | 3 | 1 | 90 | 70 | 44 | ... | 31 | 18 | 3 | 40 | 27 | NaN | 0.00 | 2.0 | Rain | 246 |
2014-12-11 | 2014-12-11 | 10 | 8 | 5 | 6 | 3 | 2 | 82 | 72 | 51 | ... | 31 | 15 | 10 | 40 | 31 | NaN | 0.25 | 2.0 | Rain | 246 |
2014-12-12 | 2014-12-12 | 9 | 6 | 2 | 9 | 4 | -1 | 94 | 79 | 59 | ... | 31 | 11 | 3 | 40 | 26 | 55.0 | 1.02 | 3.0 | Rain | 253 |
5 rows × 23 columns
Now we have a datetime64
index, let's plot 'Max Wind SpeedKm/h'
again:
london['Max Wind SpeedKm/h'].plot(grid=True, figsize=(10,5))
<matplotlib.axes._subplots.AxesSubplot at 0x10a4414e0>
Now it is much clearer that the worst winds were in mid February.
Use the code cell below to plot the values of 'Mean Humidity'
during spring (full months of March, April and May).
Now go back to the course.