This is the briefest of introductions to working with data using IPython notebooks in the browser.

To add a text cell like this, use the Add Text option from the menu. You can use Markdown to style the text in text cells.

When you've finished editing a text cell, press Shift-Return.

In [ ]:
#This is a code cell - you can execute all the code in the cell
#  by entering Shift-Return or by hitting the play button

#This line will import the pandas library
import pandas as pd
In [ ]:
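#pandas provides a way for us to query the World Bank API
#(in recent pandas releases this module has moved to the separate
# pandas-datareader package: from pandas_datareader import wb)
from pandas.io import wb

#This command lets me look for indicators that contain "fertility rate" in their name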
wb.search('.*fertility rate.*')[['id','name','sourceOrganization']]
Out[ ]:
id name sourceOrganization
6554 SP.ADO.TFRT Adolescent fertility rate (births per 1,000 wo... United Nations Population Division, World Popu...
6594 SP.DYN.TFRT.IN Fertility rate, total (births per woman) (1) United Nations Population Division. World ...
6595 SP.DYN.TFRT.Q1 Total fertility rate (TFR) (births per woman)... Household Surveys (DHS, MICS)
6596 SP.DYN.TFRT.Q2 Total fertility rate (TFR) (births per woman)... Household Surveys (DHS, MICS)
6597 SP.DYN.TFRT.Q3 Total fertility rate (TFR) (births per woman)... Household Surveys (DHS, MICS)
6598 SP.DYN.TFRT.Q4 Total fertility rate (TFR) (births per woman)... Household Surveys (DHS, MICS)
6599 SP.DYN.TFRT.Q5 Total fertility rate (TFR) (births per woman)... Household Surveys (DHS, MICS)
6602 SP.DYN.WFRT Wanted fertility rate (births per woman) Demographic and Health Surveys by ICF Internat...
6603 SP.DYN.WFRT.Q1 Total wanted fertility rate (births per woman)... Household Surveys (DHS, MICS)
6604 SP.DYN.WFRT.Q2 Total wanted fertility rate (births per woman)... Household Surveys (DHS, MICS)
6605 SP.DYN.WFRT.Q3 Total wanted fertility rate (births per woman)... Household Surveys (DHS, MICS)
6606 SP.DYN.WFRT.Q4 Total wanted fertility rate (births per woman)... Household Surveys (DHS, MICS)
6607 SP.DYN.WFRT.Q5 Total wanted fertility rate (births per woman)... Household Surveys (DHS, MICS)

13 rows × 3 columns
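
To see what a pattern search like this does without calling the API, here is a minimal offline sketch. It assumes the search behaves like a case-insensitive regular-expression match on the name column (the results above include "Fertility rate, total", which only matches if case is ignored). The small `indicators` table is hand-built from ids and names shown in this notebook, and `str.contains` is a stand-in for illustration, not necessarily what `wb.search` does internally.

```python
import pandas as pd

# Hand-built sample using indicator ids and names that appear in this notebook
indicators = pd.DataFrame({
    'id': ['SP.DYN.TFRT.IN', 'SP.DYN.WFRT', 'SP.DYN.LE00.IN'],
    'name': ['Fertility rate, total (births per woman)',
             'Wanted fertility rate (births per woman)',
             'Life expectancy at birth, total (years)'],
})

# Case-insensitive regular-expression match on the name column
mask = indicators['name'].str.contains('fertility rate', case=False, regex=True)
matches = indicators[mask]
print(matches)
```

With `str.contains` the pattern can match anywhere in the name, so the leading and trailing `.*` are not needed in this sketch.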

In [ ]:
#Now search for indicator names containing "life expectancy"
wb.search('.*life expectancy.*')[:5]
Out[ ]:
id name source sourceNote sourceOrganization topics
6589 SP.DYN.LE00.FE.IN Life expectancy at birth, female (years) World Development Indicators Life expectancy at birth indicates the number ... (1) United Nations Population Division. World ... Aid Effectiveness ; Health ; Social Developm...
6590 SP.DYN.LE00.IN Life expectancy at birth, total (years) World Development Indicators Life expectancy at birth indicates the number ... Derived from male and female life expectancy a... Health
6591 SP.DYN.LE00.MA.IN Life expectancy at birth, male (years) World Development Indicators Life expectancy at birth indicates the number ... (1) United Nations Population Division. World ... Aid Effectiveness ; Social Development ; Hea...
7632 UIS.SLE.0 School life expectancy (years). Pre-primary. ... Education Statistics School life expectancy (years). Pre-primary. T... UNESCO Institute for Statistics Education
7633 UIS.SLE.0.F School life expectancy (years). Pre-primary. ... Education Statistics School life expectancy (years). Pre-primary. F... UNESCO Institute for Statistics Education

5 rows × 6 columns

We can download data from the World Bank API by identifying:

  • the indicators we want;
  • the countries we want the data for;
  • the range of years we want the data for.

In this case, I'm going to get the total fertility rate for the UK, Bangladesh and China between 1970 and 2005.

In [ ]:
df = wb.download(indicator='SP.DYN.TFRT.IN', 
                 country=['CN', 'GB', 'BD'], 
                 start=1970, end=2005)
df
Out[ ]:
SP.DYN.TFRT.IN
country year
Bangladesh 2005 2.607
2004 2.702
2003 2.802
2002 2.905
2001 3.011
2000 3.120
1999 3.231
1998 3.346
1997 3.468
1996 3.596
1995 3.732
1994 3.878
1993 4.033
1992 4.197
1991 4.370
1990 4.550
1989 4.738
1988 4.932
1987 5.130
1986 5.327
1985 5.522
1984 5.710
1983 5.890
1982 6.060
1981 6.216
1980 6.356
1979 6.480
1978 6.587
1977 6.680
1976 6.758
1975 6.821
1974 6.869
1973 6.904
1972 6.928
1971 6.942
1970 6.947
China 2005 1.585
2004 1.566
2003 1.546
2002 1.527
2001 1.514
2000 1.510
1999 1.520
1998 1.546
1997 1.591
1996 1.656
1995 1.746
1994 1.865
1993 2.009
1992 2.171
1991 2.342
1990 2.506
1989 2.644
1988 2.745
1987 2.806
1986 2.826
1985 2.811
1984 2.769
1983 2.720
1982 2.682
...

108 rows × 1 columns
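
The result above is indexed by country and year. A minimal offline sketch of that structure, and of flattening it into ordinary columns for filtering and plotting, is given here. It assumes the years in the index are stored as strings rather than numbers; the four values are copied from the table above, and the names `sample`, `flat` and `bg_sample` are just for illustration.

```python
import pandas as pd

# Build a small frame shaped like the download result
# (assumption: the year level of the index holds strings)
index = pd.MultiIndex.from_tuples(
    [('Bangladesh', '2005'), ('Bangladesh', '2004'),
     ('China', '2005'), ('China', '2004')],
    names=['country', 'year'])
sample = pd.DataFrame({'SP.DYN.TFRT.IN': [2.607, 2.702, 1.585, 1.566]},
                      index=index)

# Move country and year out of the index into ordinary columns
flat = sample.reset_index()
# Convert the year strings to integers so they sort and plot as numbers
flat['year'] = flat['year'].astype(int)

# Keep only the Bangladesh rows, oldest year first
bg_sample = flat[flat['country'] == 'Bangladesh'].sort_values('year')
print(bg_sample)
```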

We can also plot the data...

In [ ]:
#We need to import some bits to help us with the charts...
import matplotlib.pyplot as plt
%matplotlib inline
In [ ]:
#I'm going to wrangle the data a little to make it easier to plot...
dff=df.reset_index()
#The following line makes sure we treat the years as numbers...
dff['year']=dff['year'].astype(int)

bg=dff[ dff['country']=='Bangladesh' ]
bg.plot(x='year')
Out[ ]:
<matplotlib.axes.AxesSubplot at 0x3c7d64d0>
In [ ]:
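#Reshape so that each country becomes a column, with one row per year
#(keyword arguments are used because newer pandas releases require them)
df_reshape=dff[['country','year','SP.DYN.TFRT.IN']].pivot(index='year',columns='country')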
df_reshape
Out[ ]:
SP.DYN.TFRT.IN
country Bangladesh China United Kingdom
year
1970 6.947 5.470 2.44
1971 6.942 5.200 2.41
1972 6.928 4.887 2.20
1973 6.904 4.542 2.04
1974 6.869 4.181 1.92
1975 6.821 3.826 1.81
1976 6.758 3.497 1.74
1977 6.680 3.212 1.69
1978 6.587 2.982 1.75
1979 6.480 2.813 1.86
1980 6.356 2.710 1.90
1981 6.216 2.673 1.82
1982 6.060 2.682 1.78
1983 5.890 2.720 1.77
1984 5.710 2.769 1.77
1985 5.522 2.811 1.79
1986 5.327 2.826 1.78
1987 5.130 2.806 1.81
1988 4.932 2.745 1.82
1989 4.738 2.644 1.79
1990 4.550 2.506 1.83
1991 4.370 2.342 1.82
1992 4.197 2.171 1.79
1993 4.033 2.009 1.76
1994 3.878 1.865 1.74
1995 3.732 1.746 1.71
1996 3.596 1.656 1.73
1997 3.468 1.591 1.72
1998 3.346 1.546 1.71
1999 3.231 1.520 1.68
2000 3.120 1.510 1.64
2001 3.011 1.514 1.63
2002 2.905 1.527 1.63
2003 2.802 1.546 1.70
2004 2.702 1.566 1.75
2005 2.607 1.585 1.76

36 rows × 3 columns
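
To see what the pivot does, here is a minimal sketch on a few rows taken from the table above. The frame `long_df` is hand-built for illustration; passing `values` explicitly is a variation that gives plain country column names rather than the two-level header shown above.

```python
import pandas as pd

# Long format: one row per (country, year) pair
long_df = pd.DataFrame({
    'country': ['Bangladesh', 'China', 'United Kingdom',
                'Bangladesh', 'China', 'United Kingdom'],
    'year': [1970, 1970, 1970, 2005, 2005, 2005],
    'SP.DYN.TFRT.IN': [6.947, 5.470, 2.44, 2.607, 1.585, 1.76],
})

# Wide format: one row per year, one column per country
wide_df = long_df.pivot(index='year', columns='country',
                        values='SP.DYN.TFRT.IN')
print(wide_df)
```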

In [ ]:
df_reshape.plot()
Out[ ]:
<matplotlib.axes.AxesSubplot at 0x2d247690>

We can also get data for multiple indicators in one request (and for multiple countries at the same time, if we wanted to):

In [ ]:
df = wb.download(indicator=['SP.DYN.TFRT.IN','SP.DYN.LE00.IN'], 
                 country='CN', 
                 start=1970, end=2005)
df.reset_index(inplace=True)
df['year']=df['year'].astype(int)
df.set_index(['year'],inplace=True)
df
Out[ ]:
country SP.DYN.TFRT.IN SP.DYN.LE00.IN
year
2005 China 1.585 74.053902
2004 China 1.566 73.793707
2003 China 1.546 73.465098
2002 China 1.527 73.069610
2001 China 1.514 72.619780
2000 China 1.510 72.140732
1999 China 1.520 71.668707
1998 China 1.546 71.234878
1997 China 1.591 70.863902
1996 China 1.656 70.562829
1995 China 1.746 70.329171
1994 China 1.865 70.147829
1993 China 2.009 69.990659
1992 China 2.171 69.832000
1991 China 2.342 69.661317
1990 China 2.506 69.471561
1989 China 2.644 69.262244
1988 China 2.745 69.042390
1987 China 2.806 68.815537
1986 China 2.826 68.582171
1985 China 2.811 68.341293
1984 China 2.769 68.092902
1983 China 2.720 67.836512
1982 China 2.682 67.572098
1981 China 2.673 67.297659
1980 China 2.710 67.023610
1979 China 2.813 66.762293
1978 China 2.982 66.513732
1977 China 3.212 66.264049
1976 China 3.497 65.994439
1975 China 3.826 65.699585
1974 China 4.181 65.379293
1973 China 4.542 65.003293
1972 China 4.887 64.521463
1971 China 5.200 63.872244
1970 China 5.470 62.906244

36 rows × 3 columns
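
The two indicators in this table are on very different scales (roughly 1.5 to 5.5 births per woman against 63 to 74 years), so on a shared axis the fertility line is squashed towards the bottom of the chart. One possible refinement, sketched here on three rows copied from the table above, is to put one series on a secondary y-axis using the `secondary_y` argument of the pandas `plot` method. The frame `china_sample` and the axis labels are illustrative choices, not part of the original notebook.

```python
import pandas as pd

# Three rows copied from the China table above, indexed by year
china_sample = pd.DataFrame(
    {'SP.DYN.TFRT.IN': [5.470, 2.710, 1.585],
     'SP.DYN.LE00.IN': [62.906244, 67.023610, 74.053902]},
    index=pd.Index([1970, 1980, 2005], name='year'))

# Plot fertility on the left axis and life expectancy on a right-hand axis
ax = china_sample.plot(secondary_y=['SP.DYN.LE00.IN'])
ax.set_ylabel('Fertility rate (births per woman)')
ax.right_ax.set_ylabel('Life expectancy at birth (years)')
```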

In [ ]:
df.plot()
Out[ ]:
<matplotlib.axes.AxesSubplot at 0x2d9dc5b0>