World Bank : Projects & Operations Data Analysis


Team members:

  1. Ignacio Perez
  2. Sydney Friedman
  3. Aisha Kigongo

Project Goals:

    • Our Primary Goal:

      Find interesting patterns and trends by analyzing and comparing World Bank loan commitments, the Human Development Index (HDI), the Index of Economic Freedom, and GDP.

    • Secondary goal:

      To analyze what insight open data can give us into how effective initiatives and funding actually are, as opposed to what they are meant to be.

    About the Datasets:

    1. World Bank :

      • Projects & Operations: lending projects from 1947 to the present
        • Dataset includes:
          • project title, task manager, country, project id, sector, themes,
          • commitment amount, product line, financing.
      • GDP per capita (2000 - 2011)
        • Dataset includes:
          • country name, country code, years (2000 - 2012*)
          • 185 countries
    2. Heritage Foundation :

      An economics and development think tank based in Washington, D.C. It tracks economic freedom around the world through its influential Index of Economic Freedom.

      • The Index covers 10 freedoms – from property rights to entrepreneurship
      • Dataset includes:
        • Rule of Law (property rights, freedom from corruption);
        • Limited Government (fiscal freedom, government spending);
        • Regulatory Efficiency (business freedom, labor freedom, monetary freedom);
        • Open Markets (trade freedom, investment freedom, financial freedom).
In [2]:
from pandas import DataFrame, Series
import pandas as pd
import os
import codecs
In [3]:
# Verify existence of & Read in the datasets - project and operations & Freedom Index

DATA_FILES={"projdict":"data/projects_operations_api.csv", "fredict":"data/FreedomIndex.csv"}
def file_path(key):
    return os.path.join(os.pardir, DATA_FILES[key])
for file_key in DATA_FILES.keys():
    abs_fname = file_path(file_key)
    print abs_fname, os.path.exists(abs_fname)
../data/projects_operations_api.csv True
../data/FreedomIndex.csv True
In [4]:
f = codecs.open(file_path("projdict"), encoding='iso-8859-1')
initial_proj_df = pd.read_csv(f)
In [12]:
initial_proj_df.columns
Out[12]:
Index([id, regionname, countryname, prodline, lendinginstr, lendinginstrtype, envassesmentcategorycode, supplementprojectflg, productlinetype, projectstatusdisplay, status, project_name, boardapprovaldate, board_approval_month, closingdate, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, url, projectdoc , majorsector_percent , sector1, sector2, sector3, sector4, sector5, sector, mjsector1, mjsector2, mjsector3, mjsector4, mjsector5, mjsector, theme1, theme2, theme3, theme4, theme5, theme , goal, financier, mjtheme1name, mjtheme2name, mjtheme3name, mjtheme4name, mjtheme5name, location], dtype=object)

In this section, we explore the World Bank's commitments to Africa by looking at the total amount loaned to African countries by year.

In [13]:
is_africa = initial_proj_df['regionname']=='AFRICA'
In [14]:
initial_proj_df[is_africa]['countryname'][:5]
Out[14]:
2         Republic of Mozambique;Republic of Mozambique
4         Republic of Mozambique;Republic of Mozambique
8     Federal Republic of Nigeria;Federal Republic o...
11    United Republic of Tanzania;United Republic of...
14                    Republic of Togo;Republic of Togo
Name: countryname
In [15]:
initial_proj_df[is_africa][['countryname', 'totalamt']][:5]
Out[15]:
countryname totalamt
2 Republic of Mozambique;Republic of Mozambique 37;000;000
4 Republic of Mozambique;Republic of Mozambique 50;000;000
8 Federal Republic of Nigeria;Federal Republic o... 0
11 United Republic of Tanzania;United Republic of... 100;000;000
14 Republic of Togo;Republic of Togo 0
In [16]:
# totalamt uses ';' as a thousands separator (e.g. 37;000;000); strip it so the value can be parsed as a number.

initial_proj_df['totalamt'] = initial_proj_df['totalamt'].str.replace(';','')
In [17]:
initial_proj_df[is_africa]['totalamt'][:5]
Out[17]:
2      37000000
4      50000000
8             0
11    100000000
14            0
Name: totalamt
In [18]:
initial_proj_df['totalamt'] = initial_proj_df['totalamt'].astype('float32')
In [19]:
sum(initial_proj_df[is_africa]['totalamt'][:5])
Out[19]:
1.87e+08

In the next steps, we clean up the data. For example, the amounts in the Projects & Operations dataset use semicolons as thousands separators, so these need to be stripped out and the values parsed as floats.
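
As a small illustration of this clean-up, the sketch below parses a single amount string outside of pandas. The helper name `parse_amount` is hypothetical, and treating empty or missing values as 0.0 is an assumption made for the example, not something the dataset documents.

```python
def parse_amount(raw):
    """Convert a string such as '37;000;000' to a float.

    Assumes ';' (and possibly ',') appears only as a thousands separator.
    Empty or missing values become 0.0, which is a choice made for this sketch.
    """
    if raw is None:
        return 0.0
    text = str(raw).strip().replace(";", "").replace(",", "")
    return float(text) if text else 0.0


# The five Africa amounts shown in the sample output above.
amounts = ["37;000;000", "50;000;000", "0", "100;000;000", "0"]
total = sum(parse_amount(a) for a in amounts)
print(total)  # 187000000.0
```

Python floats are 64-bit, whereas `float32` keeps only about seven significant digits, so large sums may be worth computing in a wider type.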

In [20]:
initial_proj_df[['regionname','countryname','projectstatusdisplay','totalamt']][:2]
Out[20]:
regionname countryname projectstatusdisplay totalamt
0 EUROPE AND CENTRAL ASIA Republic of Armenia;Republic of Armenia Active 0
1 EAST ASIA AND PACIFIC Socialist Republic of Vietnam;Socialist Republ... Active 156000000
In [21]:
# Data cleaning: remove the semicolons from the remaining money columns and convert them to floats.

for amt_col in ['lendprojectcost', 'ibrdcommamt', 'idacommamt', 'grantamt']:
    initial_proj_df[amt_col] = initial_proj_df[amt_col].str.replace(';','').astype('float32')
In [22]:
initial_proj_df[is_africa][['countryname','project_name','boardapprovaldate','status','lendprojectcost','grantamt']][:10]
Out[22]:
countryname project_name boardapprovaldate status lendprojectcost grantamt
2 Republic of Mozambique;Republic of Mozambique Mozambique Nutrition Additional Financing 2013-01-24T00:00:00Z Active 37000000 0
4 Republic of Mozambique;Republic of Mozambique Mozambique Climate Change Development Policy O... 2013-01-24T00:00:00Z Active 50000000 0
8 Federal Republic of Nigeria;Federal Republic o... Nigeria Post-Compliance I EITI 2013-01-18T00:00:00Z Active 900000 900000
11 United Republic of Tanzania;United Republic of... TANZANIA SECOND CENTRAL TRANSP CORRIDOR PROJEC... 2013-01-15T00:00:00Z Active 100000000 0
14 Republic of Togo;Republic of Togo Integrated Disaster and Land Management Project 2013-01-09T00:00:00Z Active 7290000 7290000
17 Africa;Africa Nile Cooperation for Results Project 2013-01-01T00:00:00Z Active 15300000 15300000
21 Republic of Senegal;Republic of Senegal SN- First Governance and Growth Support Project 2012-12-20T00:00:00Z Closed 55000000 0
22 Republic of Namibia;Republic of Namibia Namibian Coast Conservation Additional Finance 2012-12-20T00:00:00Z Active 7800000 1930000
23 Burkina Faso;Burkina Faso Third Phase Community Based Rural Development ... 2012-12-20T00:00:00Z Active 86000000 0
24 Burkina Faso;Burkina Faso Sustainable land and forestry management Project 2012-12-20T00:00:00Z Active 77410000 7410000
In [23]:
projcp_df = initial_proj_df.copy()
In [24]:
projcp_df = projcp_df.drop(['lendinginstrtype','envassesmentcategorycode','productlinetype','closingdate','url','sector2','sector3','sector4','sector5','sector','mjsector1','mjsector2','mjsector3','mjsector4','mjsector5','mjsector','theme1','theme2','theme3','theme4','theme5','financier','mjtheme2name','mjtheme3name','mjtheme4name','mjtheme5name'],axis=1)
In [25]:
del projcp_df['projectstatusdisplay']
In [26]:
projcp_df2 = projcp_df.drop(['prodline','supplementprojectflg','goal','mjtheme1name','location'], axis=1)
In [27]:
projcp_df2.columns
Out[27]:
Index([id, regionname, countryname, lendinginstr, status, project_name, boardapprovaldate, board_approval_month, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, projectdoc , majorsector_percent , sector1, theme ], dtype=object)
In [28]:
projcp_df2[is_africa][:5]
Out[28]:
id regionname countryname lendinginstr status project_name boardapprovaldate board_approval_month lendprojectcost ibrdcommamt idacommamt totalamt grantamt borrower impagency projectdoc majorsector_percent sector1 theme
2 P125477 AFRICA Republic of Mozambique;Republic of Mozambique Specific Investment Loan Active Mozambique Nutrition Additional Financing 2013-01-24T00:00:00Z January 37000000 0 37000000 37000000 0 GOVERNMENT OF MOZAMBIQUE MINISTRY OF HEALTH NaN NaN Health!$!100!$!JA NaN
4 P128434 AFRICA Republic of Mozambique;Republic of Mozambique Development Policy Lending Active Mozambique Climate Change Development Policy O... 2013-01-24T00:00:00Z January 50000000 0 50000000 50000000 0 GOVERNMENT OF MOZAMBIQUE MINISTRY OF PLANNING AND DEVELOPMENT NaN NaN General agriculture; fishing and forestry sect... NaN
8 P132807 AFRICA Federal Republic of Nigeria;Federal Republic o... Technical Assistance Loan Active Nigeria Post-Compliance I EITI 2013-01-18T00:00:00Z January 900000 0 0 0 900000 GOVERNMENT OF NIGERIA NEITI SECRETARIAT NaN NaN Other Mining and Extractive Industries!$!50!$!LS NaN
11 P124114 AFRICA United Republic of Tanzania;United Republic of... Specific Investment Loan Active TANZANIA SECOND CENTRAL TRANSP CORRIDOR PROJEC... 2013-01-15T00:00:00Z January 100000000 0 100000000 100000000 0 MIN. OF FINANCE & ECONOMIC AFFAIRS TANZANIA NATIONAL ROADS AGENCY NaN NaN Urban Transport!$!95!$!TC NaN
14 P123922 AFRICA Republic of Togo;Republic of Togo Specific Investment Loan Active Integrated Disaster and Land Management Project 2013-01-09T00:00:00Z January 7290000 0 0 0 7290000 MINISTRY OF ENVIRONMENT AND FORESTS THE NATIONAL PLATFORM FOR DISASTER RISK REDUCTION NaN NaN General water; sanitation and flood protection... NaN
In [29]:
grouped = projcp_df2.groupby('regionname')

In this section, we look further into the Projects & Operations dataset and try to find interesting or surprising facts to analyze further. We also compute some basic statistics on the data, such as sums, means, and standard deviations.
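
As a sketch of this kind of per-group summary, the example below uses a tiny made-up frame (the values are illustrative only) and the `groupby(...).agg(...)` call from recent pandas releases. The column names mirror the ones in this notebook; everything else is an assumption.

```python
import pandas as pd

# Toy data for illustration only; not the real project table.
toy = pd.DataFrame({
    "regionname": ["AFRICA", "AFRICA", "SOUTH ASIA", "SOUTH ASIA"],
    "totalamt": [2000000.0, 5000000.0, 10000000.0, 30000000.0],
})

# Sum, mean and sample standard deviation of totalamt per region.
summary = toy.groupby("regionname")["totalamt"].agg(["sum", "mean", "std"])
print(summary)
```

Passing a list of statistic names to `agg` yields one column per statistic, which avoids writing a separate helper function for each one.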

In [30]:
# Sum the total amount committed by the World Bank within each group (e.g. per region, or per region and year)
def func(x):
    totalamt = x['totalamt'].sum()
    return Series([totalamt] ,index=['totalamt'])
In [31]:
# result dataframe 
result = grouped.apply(func)
In [32]:
#create a new column in dataframe to hold the years from the board approval date
projcp_df2['year'] = projcp_df2['boardapprovaldate'].str[:4]
In [33]:
projcp_df2['year'][:2]
Out[33]:
0    2013
1    2013
Name: year
In [34]:
# group data by year and region name
grouped3 = projcp_df2.groupby(['regionname','year'])
In [35]:
# statistics on the banks lending commitments to different regions over time
grouped3['totalamt'].describe()
Out[35]:
regionname  year       
AFRICA      1950  count           2.000000
                  mean      3500000.000000
                  std       2121320.547267
                  min       2000000.000000
                  25%       2750000.000000
                  50%       3500000.000000
                  75%       4250000.000000
                  max       5000000.000000
            1951  count           4.000000
                  mean     22875000.000000
                  std      16423434.091628
                  min       1500000.000000
                  25%      15375000.000000
                  50%      25000000.000000
                  75%      32500000.000000
...
SOUTH ASIA  2011  mean     1.241677e+08
                  std      2.437052e+08
                  min      0.000000e+00
                  25%      0.000000e+00
                  50%      3.364500e+07
                  75%      1.287500e+08
                  max      1.200000e+09
            2012  count    6.100000e+01
                  mean     8.606065e+07
                  std      1.462807e+08
                  min      0.000000e+00
                  25%      0.000000e+00
                  50%      3.600000e+07
                  75%      1.060000e+08
                  max      8.400000e+08
Length: 3152
In [36]:
grouped4 = projcp_df2.groupby(['regionname','year','board_approval_month'])
In [37]:
result4 =grouped4.apply(func)
In [38]:
result4.unstack('regionname')[:5]
Out[38]:
totalamt
regionname AFRICA EAST ASIA AND PACIFIC EUROPE AND CENTRAL ASIA LATIN AMERICA AND CARIBBEAN MIDDLE EAST AND NORTH AFRICA OTHER SOUTH ASIA
year board_approval_month
1947 August NaN NaN 247000000 NaN NaN NaN NaN
May NaN NaN 250000000 NaN NaN NaN NaN
1948 July NaN NaN 8000000 NaN NaN NaN NaN
March NaN NaN NaN 16000000 NaN NaN NaN
1949 August NaN NaN NaN 5000000 NaN NaN 34000000
In [39]:
result5 = grouped3.apply(func).unstack('regionname').fillna(0)
In [40]:
result5[:5]
Out[40]:
totalamt
regionname AFRICA EAST ASIA AND PACIFIC EUROPE AND CENTRAL ASIA LATIN AMERICA AND CARIBBEAN MIDDLE EAST AND NORTH AFRICA OTHER SOUTH ASIA
year
1947 0 0 497000000 0 0 0 0
1948 0 0 8000000 16000000 0 0 0
1949 0 0 37500000 126600000 0 0 44000000
1950 7000000 125400000 25400000 90100000 12800000 0 18500000
1951 91500000 0 71500000 45400000 0 0 0
In [41]:
# python-us-cpi parses the latest US Consumer Price Index and provides an inflation calculator API.
# We use it to convert loan commitments from other years into today's dollars for better comparison.

from uscpi import UsCpi

cpi = UsCpi() # downloads the latest CPI data

    # $100 in 2012 is worth how much in 1980?
cpi.value_with_inflation(100, 2012, 1980)
Out[41]:
35.89
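
The conversion above is consistent with the usual CPI-ratio method: multiply the amount by the target year's index and divide by the source year's index. The sketch below shows that method on its own. It is not the library's code; the function name `adjust_for_inflation` is hypothetical, and the two CPI-U annual averages are approximate values included for illustration that should be checked against the official series.

```python
# Approximate annual-average US CPI-U values (1982-84 = 100).
# Only two years are listed, for illustration; verify them against the
# official BLS series before relying on them.
CPI_U = {1980: 82.4, 2012: 229.594}


def adjust_for_inflation(amount, from_year, to_year, cpi=CPI_U):
    """Express `amount` from `from_year` in `to_year` dollars via the CPI ratio."""
    return amount * cpi[to_year] / cpi[from_year]


# $100 in 2012 expressed in 1980 dollars.
print(round(adjust_for_inflation(100, 2012, 1980), 2))  # 35.89
```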
In [42]:
projcpi = projcp_df2[['regionname','countryname','project_name','totalamt','grantamt','sector1','year']].copy()
In [43]:
# Convert a row's totalamt to 2013 dollars using the CPI API (applied to years 1914 - 2013)

def fun2(y):
    totalamts = y['totalamt']
    year = int(y['year'])   
    regionname =y['regionname']
    countryname = y['countryname']
    project_name = y['project_name']
    grantamt = y['grantamt']
    sector1 = y['sector1']
    boolVal = 1914 <= year <= 2013
    if(boolVal):
        totalamts = cpi.value_with_inflation(totalamts,year,2013)
    return Series([regionname,countryname,project_name,totalamts,grantamt,sector1,year],index=['regionname','countryname','project_name','totalamt','grantamt','sector1','year'])
In [44]:
resultcpi = projcpi.fillna(0).apply(fun2, axis=1)
/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.10.1-py2.7-macosx-10.5-i386.egg/pandas/core/frame.py:3576: FutureWarning: rename with inplace=True  will return None from pandas 0.11 onward
  " from pandas 0.11 onward", FutureWarning)
In [45]:
# Data cleaning: Removing un-wanted data from the sector1 string
resultcpi['sectorMain'] = resultcpi['sector1'].str.split("!").str[0]
In [46]:
resultcpi['country'] = resultcpi['countryname'].str.split(";").str[0]
In [47]:
# validate that the data is cleaned. 
resultcpi[:4]
Out[47]:
regionname countryname project_name totalamt grantamt sector1 year sectorMain country
0 EUROPE AND CENTRAL ASIA Republic of Armenia;Republic of Armenia JSDF Strengthening the Livelihoods and Voice o... 0 2670000 Other social services!$!100!$!JB 2013 Other social services Republic of Armenia
1 EAST ASIA AND PACIFIC Socialist Republic of Vietnam;Socialist Republ... Mekong Delta Transport Infrastructure Developm... 156000000 0 Rural and Inter-Urban Roads and Highways!$!55!... 2013 Rural and Inter-Urban Roads and Highways Socialist Republic of Vietnam
2 AFRICA Republic of Mozambique;Republic of Mozambique Mozambique Nutrition Additional Financing 37000000 0 Health!$!100!$!JA 2013 Health Republic of Mozambique
3 EUROPE AND CENTRAL ASIA Republic of Moldova;Republic of Moldova Moldova Education Reform Project 40000000 0 General education sector!$!100!$!EZ 2013 General education sector Republic of Moldova
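
The `sector1` values shown above appear to pack three fields separated by `!$!`. The sketch below splits them apart instead of keeping only the name. The helper `parse_sector` is hypothetical, and reading the second field as a percentage and the third as a sector code is an assumption based only on the sample values in this notebook.

```python
def parse_sector(raw, sep="!$!"):
    """Split a packed sector string into (name, percent, code).

    The meaning of the second and third fields is an assumption based on
    the sample values shown in this notebook.
    """
    parts = str(raw).split(sep)
    name = parts[0]
    percent = float(parts[1]) if len(parts) > 1 and parts[1] else None
    code = parts[2] if len(parts) > 2 else None
    return name, percent, code


print(parse_sector("Health!$!100!$!JA"))  # ('Health', 100.0, 'JA')
```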
In [48]:
resultcpi['year'] = resultcpi['year'].astype(int)
In [49]:
# The dataset is large, which makes it difficult to analyze. In this step we build a boolean mask to keep only projects approved from 2000 - 2013
is_bv = (resultcpi['year'] >= 2000) & (resultcpi['year'] <= 2013)
In [50]:
resultcpi2 = resultcpi[is_bv]
In [51]:
# verify that data is formatted in the way we want to analyze it. 
resultcpi2[:4]
Out[51]:
regionname countryname project_name totalamt grantamt sector1 year sectorMain country
0 EUROPE AND CENTRAL ASIA Republic of Armenia;Republic of Armenia JSDF Strengthening the Livelihoods and Voice o... 0 2670000 Other social services!$!100!$!JB 2013 Other social services Republic of Armenia
1 EAST ASIA AND PACIFIC Socialist Republic of Vietnam;Socialist Republ... Mekong Delta Transport Infrastructure Developm... 156000000 0 Rural and Inter-Urban Roads and Highways!$!55!... 2013 Rural and Inter-Urban Roads and Highways Socialist Republic of Vietnam
2 AFRICA Republic of Mozambique;Republic of Mozambique Mozambique Nutrition Additional Financing 37000000 0 Health!$!100!$!JA 2013 Health Republic of Mozambique
3 EUROPE AND CENTRAL ASIA Republic of Moldova;Republic of Moldova Moldova Education Reform Project 40000000 0 General education sector!$!100!$!EZ 2013 General education sector Republic of Moldova
In [52]:
#In the line below we compute the total Bank commitments to Africa per year for 2000 - 2013.
# The amounts were already converted to 2013 dollars with the cpi function above
ggroup_africa = resultcpi2[resultcpi2['regionname']=='AFRICA'].groupby('year').apply(func)
In [846]:
ggroup_africa.plot(kind='bar', title='Bank lending commitments to Africa in year 2000 - 2013'); plt.tight_layout()
In [53]:
#In the line below we compute the total Bank commitments per region per year for 2000 - 2013.
# The amounts were already converted to 2013 dollars with the cpi function above
amtByRegion =resultcpi2.groupby(['regionname','year']).apply(func).unstack('regionname')
In [54]:
amtByRegion[:2]
Out[54]:
totalamt
regionname AFRICA EAST ASIA AND PACIFIC EUROPE AND CENTRAL ASIA LATIN AMERICA AND CARIBBEAN MIDDLE EAST AND NORTH AFRICA OTHER SOUTH ASIA
year
2000 4.695581e+09 3.100790e+09 4.469655e+09 5.950326e+09 1.367870e+09 0 3.328411e+09
2001 4.775420e+09 2.858354e+09 5.639439e+09 6.512251e+09 9.593054e+08 0 3.996901e+09
In [852]:
amtByRegion.plot(kind='bar',figsize=(16,8), title='Lending commitments by the Bank from 2000 - 2013'); plt.legend(loc='best')
Out[852]:
<matplotlib.legend.Legend at 0x216a1030>
In [55]:
# count the number of world bank projects from 2000 - 2013 per country
numOfproj_by_country = resultcpi2.groupby('country').size().order(na_last=True, ascending=False, kind='mergesort')
In [56]:
numOfproj_by_country[:5]
Out[56]:
country
Republic of India                211
People's Republic of China       205
Federative Republic of Brazil    187
Republic of Indonesia            177
Africa                           167

The top three countries by number of World Bank projects (India, China, and Brazil) are all part of the BRICS. We are interested in analyzing patterns of borrowing among the BRICS nations alongside their freedom index, Human Development Index, and GDP. The next steps analyze World Bank lending to these nations:

  • Brazil
  • China
  • India
  • Russia
  • South Africa

In [855]:
# From above, I observed that the three countries with the most projects are BRICS members, so the list below is used to filter for the BRICS for further observation and analysis
listBRICS = ['Federative Republic of Brazil','Russian Federation','Republic of India','People\'s Republic of China','Republic of South Africa']
In [856]:
brics_nations = resultcpi2[resultcpi2['country'].isin(listBRICS)].groupby(['country','year']).size()
In [857]:
# In the Graph, we look at the number of projects funded by the world bank per country per year since 2000 - 2013
brics_nations.unstack('country').fillna(0).plot(subplots=True, figsize=(8, 8),kind='bar'); plt.legend(loc='best');plt.tight_layout()
In [859]:
# Count the number of projects per BRICS country and sector
df_of_BRICS = resultcpi2[resultcpi2['country'].isin(listBRICS)].groupby(['country','sector3']).size().order(na_last=True, ascending=False, kind='mergesort')
In [860]:
df_of_BRICS.unstack('country').fillna(0)
Out[860]:
country Federative Republic of Brazil People's Republic of China Republic of India Republic of South Africa Russian Federation
sector3
Agricultural extension and research 1 2 7 0 0
Agro-industry 0 1 0 0 0
Agro-industry; marketing; and trade 2 0 6 0 0
Animal production 0 4 1 0 0
Banking 3 2 2 0 0
Capital markets 0 2 0 0 0
Central government administration 12 4 6 1 9
Compulsory health finance 0 1 0 0 0
Compulsory pension and unemployment insurance 3 2 0 0 0
Crops 0 0 1 0 0
Energy efficiency in Heat and Power 2 14 7 0 2
Flood protection 2 6 2 0 0
Forestry 10 9 4 1 2
General agriculture; fishing and forestry sector 33 7 16 4 0
General education sector 2 0 0 0 1
General energy sector 1 2 0 1 0
General finance sector 1 2 3 0 0
General industry and trade sector 2 1 0 0 1
General public administration sector 4 3 3 0 0
General transportation sector 4 2 2 0 0
General water; sanitation and flood protection sector 4 5 6 0 0
Health 8 5 16 0 2
Housing construction 1 2 3 0 0
Housing finance 1 0 0 0 0
Hydropower 0 0 2 0 0
Irrigation and drainage 0 8 14 0 0
Law and justice 0 0 1 0 1
Micro- and SME finance 3 0 1 0 0
Microfinance 0 0 1 0 0
Non-compulsory health finance 0 1 0 0 0
Oil and gas 1 0 0 0 1
Other Renewable Energy 0 3 1 2 0
Other industry 1 5 5 0 1
Other social services 11 2 5 0 2
Payments; settlements; and remittance systems 0 1 0 0 0
Petrochemicals and fertilizers 1 0 0 0 0
Ports; waterways and shipping 1 5 0 0 0
Power 2 5 6 1 0
Pre-primary education 1 0 0 0 0
Primary education 4 1 5 0 0
Public administration- Agriculture; fishing and forestry 0 1 1 0 0
Public administration- Education 1 0 0 0 0
Public administration- Energy and mining 1 2 0 0 0
Public administration- Financial Sector 0 0 0 0 1
Public administration- Health 1 0 1 0 0
Public administration- Transportation 2 0 2 0 0
Public administration- Water; sanitation and flood protection 3 2 1 0 0
Railways 2 8 3 0 0
Renewable energy 2 6 5 3 0
Roads and highways 4 11 9 0 0
Rural and Inter-Urban Roads and Highways 7 11 17 0 0
SME Finance 0 1 2 0 0
Sanitation 1 2 1 0 0
Secondary education 0 0 1 0 0
Sewerage 2 11 0 0 0
Solid waste management 4 5 0 0 0
Sub-national government administration 18 3 23 0 4
Telecommunications 0 0 1 0 0
Tertiary education 0 0 2 0 0
Thermal Power Generation 0 1 0 0 0
Transmission and Distribution of Electricity 0 0 4 1 0
Urban Transport 11 15 1 0 0
Vocational training 0 4 2 0 0
Wastewater Collection and Transportation 2 4 0 0 0
Wastewater Treatment and Disposal 1 2 0 0 0
Water supply 4 9 9 0 1
In [861]:
# We import the Freedom index csv for comparison analysis

f = codecs.open(file_path("fredict"), encoding='iso-8859-1')
free_df = pd.read_csv(f)
In [862]:
free_df[:2]
Out[862]:
name index year overall score property rights freedom from corruption fiscal freedom government spending business freedom labor freedom monetary freedom trade freedom investment freedom financial freedom
0 Afghanistan 2013 N/A N/A 15 N/A 83.2 59.7 75.8 69.5 N/A 65 N/A
1 Albania 2013 65.2 30 31 92.6 75.1 81 49 78.4 79.8 65 70
In [863]:
# similar to the projects and operations dataset, I restrict the analysis to data from 2000 onward
free_df2 = free_df[free_df['index year']>=2000].copy()
In [864]:
free_df2.columns
Out[864]:
Index([name, index year, overall score, property rights, freedom from corruption, fiscal freedom, government spending, business freedom, labor freedom, monetary freedom, trade freedom, investment freedom, financial freedom], dtype=object)
In [865]:
# I extract the BRICS to further observe them 
free_df2 = free_df2[free_df2['name'].isin(['China', 'India', 'Russia', 'Brazil', 'South Africa'])]
In [867]:
free_df2[:5]
Out[867]:
name index year overall score property rights freedom from corruption fiscal freedom government spending business freedom labor freedom monetary freedom trade freedom investment freedom financial freedom
20 Brazil 2013 57.7 50 38 70.3 54.8 53 57.2 74.4 69.7 50 60
32 China 2013 51.9 20 36 70.2 83.3 48 62.6 71.6 72 25 30
70 India 2013 55.2 50 31 78.3 77.9 37.3 73.6 65.3 63.6 35 40
131 Russia 2013 51.1 25 24 86.9 54.4 69.2 52.6 66.7 77.4 25 30
147 South Africa 2013 61.8 50 41 70.5 69.2 74.7 55.6 75.8 76.3 45 60
In [868]:
free_df3 = free_df2[['name','index year','overall score']].copy()
In [869]:
free_df3[:2]
Out[869]:
name index year overall score
20 Brazil 2013 57.7
32 China 2013 51.9
In [870]:
free_df3['overall score'] = free_df3['overall score'].astype(float)
In [874]:
free_df3.pivot_table(['overall score'], rows=['index year'], cols='name').plot(kind='line', title='freedom Index per BRICS country', figsize=(10,10))
Out[874]:
<matplotlib.axes.AxesSubplot at 0x24b8cbb0>
In [875]:
free_df3.pivot_table(['overall score'], rows=['index year'], cols='name').plot(subplots=True, figsize=(8, 8)); plt.legend(loc='best');plt.tight_layout();plt.ylabel('Freedom Index');
In [876]:
# In the Graph, we look at the number of projects funded by the world bank per country per year since 2000 - 2013
brics_nations.unstack('country').fillna(0).plot(subplots=True, figsize=(8, 8),kind='bar'); plt.legend(loc='best');plt.tight_layout()
Findings:

  1. In 2010, World Bank funding increased (compared to the previous year) in all the BRICS except Russia, and the GDP in these countries also dropped. Is there a relationship? The financial crisis?

Challenges:

  1. The World Bank data has many facets that can be useful for data analysis; however, not all variables are properly explained.
  2. During the analysis of the data, I noticed that after 1970 the World Bank changed the format of its reporting, which made munging the data really difficult.

Freedom versus funding

In [5]:
f = codecs.open(file_path("fredict"), encoding='iso-8859-1')
free_df = pd.read_csv(f)
In [6]:
#because I'm looking at the contribution of funds over a period of time, I want to look at the current
#Freedom Index for these countries to make an analysis of their current state
free_df2 = free_df[free_df['index year']==2013].copy()

#for simplicity, let's ignore those who have not been scored as well
free_df2 = free_df2[free_df2['overall score']!='N/A'].copy()
In [7]:
free_df2.columns
Out[7]:
Index([name, index year, overall score, property rights, freedom from corruption, fiscal freedom, government spending, business freedom, labor freedom, monetary freedom, trade freedom, investment freedom, financial freedom], dtype=object)

What are the least and most "free" countries according to the freedom index?

In [8]:
low_freedom = free_df2.sort(['overall score'], ascending=True)
low_freedom = low_freedom[:10]
low_freedom[['name', 'overall score']]
Out[8]:
name overall score
118 North Korea 1.5
38 Cuba 28.5
184 Zimbabwe 28.6
180 Venezuela 36.1
50 Eritrea 36.3
23 Burma 39.2
41 Democratic Republic of Congo 39.6
49 Equatorial Guinea 42.3
171 Turkmenistan 42.6
72 Iran 43.2
In [9]:
high_freedom = free_df2.sort(['overall score'], ascending=False)
high_freedom = high_freedom[:10]
high_freedom[['name', 'overall score']]
Out[9]:
name overall score
67 Hong Kong 89.3
142 Singapore 88
6 Australia 82.6
114 New Zealand 81.4
155 Switzerland 81
27 Canada 79.4
31 Chile 79
104 Mauritius 76.9
42 Denmark 76.1
176 United States 76
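
One caveat about the two rankings above: the literal 'N/A' entries suggest the score column was read as text, in which case the sort is alphabetical and only happens to give a sensible order here. The sketch below converts the scores to numbers before sorting. It uses `pd.to_numeric` and `sort_values` from recent pandas releases (not the version used in this notebook), and the small frame is a hand-copied sample of the values shown above.

```python
import pandas as pd

# Small sample copied from the outputs above, for illustration.
scores = pd.DataFrame({
    "name": ["Hong Kong", "Singapore", "North Korea", "Afghanistan"],
    "overall score": ["89.3", "88", "1.5", "N/A"],
})

# Turn unparseable entries such as 'N/A' into NaN, then drop them.
scores["overall score"] = pd.to_numeric(scores["overall score"], errors="coerce")
ranked = scores.dropna(subset=["overall score"]).sort_values(
    "overall score", ascending=False
)
print(ranked["name"].tolist())  # ['Hong Kong', 'Singapore', 'North Korea']
```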

Let's look at corruption as well.

In [10]:
#high corruption
high_corruption = free_df2.sort(['freedom from corruption'], ascending=True)
high_corruption = high_corruption[:10]
high_corruption[['name', 'freedom from corruption']]
Out[10]:
name freedom from corruption
14 Belize 0
118 North Korea 10
23 Burma 15
171 Turkmenistan 16
178 Uzbekistan 16
65 Haiti 18
24 Burundi 19
49 Equatorial Guinea 19
180 Venezuela 19
3 Angola 20
In [11]:
#low corruption
low_corruption = free_df2.sort(['freedom from corruption'], ascending=False)
low_corruption = low_corruption[:10]
low_corruption[['name', 'freedom from corruption']]
Out[11]:
name freedom from corruption
114 New Zealand 95
42 Denmark 94
54 Finland 94
154 Sweden 93
142 Singapore 92
119 Norway 90
163 The Netherlands 89
6 Australia 88
155 Switzerland 88
27 Canada 87

What are the most funded countries?

Hypothesis: we would expect some correlation between how much funding a country receives and its economic freedom. We might also expect that countries that receive the most funding but do not rank high on the freedom index have high levels of corruption (that is, a low freedom-from-corruption score).
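
One way to put a number on this hypothesis would be a Pearson correlation between project counts and freedom scores once the country names have been matched. The sketch below only shows the calculation; the helper `pearson` is hypothetical and the input lists are placeholders, not values from either dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Placeholder inputs only: real values would be per-country project counts
# and freedom scores after the names have been matched.
toy_projects = [1, 2, 3, 4]
toy_scores = [2, 4, 6, 8]
print(pearson(toy_projects, toy_scores))
```

A value near +1 or -1 would indicate a strong linear relationship, while a value near 0 would not support the hypothesis. The function fails with a division by zero if either input is constant.
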
In [57]:
numOfproj_by_country[:10]
Out[57]:
country
Republic of India                  211
People's Republic of China         205
Federative Republic of Brazil      187
Republic of Indonesia              177
Africa                             167
Socialist Republic of Vietnam      158
Islamic Republic of Pakistan       104
United Mexican States               96
Islamic State of Afghanistan        93
People's Republic of Bangladesh     92
In [58]:
#recall the resultcpi looking at the projects funded, converted to 2013 dollars

#let's sort by country
country_cpi = resultcpi.sort(column='country', ascending=True)
country_cpi[['country', 'totalamt', 'grantamt', 'year']][:2]
/Library/Frameworks/Python.framework/Versions/7.3/lib/python2.7/site-packages/pandas-0.10.1-py2.7-macosx-10.5-i386.egg/pandas/core/frame.py:3112: FutureWarning: column is deprecated, use columns
  warnings.warn("column is deprecated, use columns", FutureWarning)
Out[58]:
country totalamt grantamt year
14536 NaN 10.600000 0 0
14474 NaN 57.299999 0 0
In [59]:
#there are a few problems with this data-- First, there are continents included in the countryname:

country_cpi= country_cpi.dropna()

#country_cpi= country_cpi[((country_cpi.country !='Africa')
#                &(country_cpi.country !='Central America') 
#                &(country_cpi.country !='Latin America')
#                &(country_cpi.country !='Europe')
#                &(country_cpi.country !='East Asia and Pacific')
#                &(country_cpi.country !='Europe and Central Asia')
#                &(country_cpi.country !='World')
#                &(country_cpi.country !='Asia')
#                &(country_cpi.country !='Middle East and North Africa')
#                &(country_cpi.country !='Africa')
#                &(country_cpi.country !='South Eastern Europe and Balkans'))]
In [60]:
#Because the naming conventions of the Freedom Index and the World Bank differ, we have to manually list the countries whose freedom index we want to compare with their World Bank funding.
low_Freedom_list= ['Belize','Turkmenistan','Republic of Zimbabwe','Republic of Uzbekistan',
                    'Republic of Haiti', 'Republic of Burundi', 'Republic of Equatorial Guinea', 
                    'People\'s Republic of Angola', 'Republica Bolivariana de Venezuela']
high_Freedom_list= ['Kingdom of Norway', 'New Zealand', 'Kingdom of Denmark', 'Republic of Finland',
                    'Republic of Sweden', 'Kingdom of The Netherlands', 'Common of Australia']
In [61]:
#now we can see how many projects were approved for each country in the low_Freedom_list
low_Freedom_nations = country_cpi[country_cpi['country'].isin(low_Freedom_list)].groupby(['country']).size()
low_Freedom_nations
Out[61]:
country
Belize                                18
People's Republic of Angola           29
Republic of Burundi                   90
Republic of Equatorial Guinea         11
Republic of Haiti                     86
Republic of Uzbekistan                36
Republic of Zimbabwe                  54
Republica Bolivariana de Venezuela    53
Turkmenistan                           9
In [62]:
low_Freedom_nations.plot(kind='bar', title='Lending to Countries with low Freedom Index'); plt.tight_layout()
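
The manual name lists used above could be partly automated by stripping the formal prefix from the World Bank country names. The sketch below is one possible approach. The helper `short_name` and the prefix list are assumptions based on the names visible in this notebook, and names such as 'Republica Bolivariana de Venezuela' or 'Russian Federation' would still need a manual mapping.

```python
import re

# Prefixes seen in the country names in this notebook; the list is an
# assumption and would need extending for the full dataset.
_PREFIXES = [
    "Federal Republic of", "Federative Republic of", "People's Republic of",
    "Socialist Republic of", "United Republic of", "Islamic Republic of",
    "Republic of", "Kingdom of",
]
_PREFIX_RE = re.compile(r"^(?:%s)\s+" % "|".join(re.escape(p) for p in _PREFIXES))


def short_name(official_name):
    """Strip a leading 'Republic of'-style prefix from a country name."""
    return _PREFIX_RE.sub("", official_name.strip())


print(short_name("Republic of Haiti"))            # Haiti
print(short_name("People's Republic of Angola"))  # Angola
print(short_name("Turkmenistan"))                 # Turkmenistan
```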

From this point on, we are interested in expanding the previous analysis by retrieving information from Wikipedia and other sources.
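
For readers without the wiki libraries installed, the sketch below shows roughly what a request for a page's wikitext looks like when built by hand with the Python 3 standard library. It only constructs the URL and makes no network call. The helper name `wikitext_query_url` is hypothetical, and the parameter set is a minimal assumption that newer versions of the API may want extended.

```python
from urllib.parse import urlencode

# English Wikipedia API endpoint (https is assumed here).
API_URL = "https://en.wikipedia.org/w/api.php"


def wikitext_query_url(title):
    """Build a MediaWiki API URL asking for the current wikitext of a page."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content",
        "titles": title,
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


print(wikitext_query_url("Mozambique"))
```
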

In [1]:
import pandas as pd
import wikipydia as wk
import mwparserfromhell
from wikitools import wiki
from wikitools import api
from wikitools import category
from wikitools import page
import itertools
import re
wikisite = "http://en.wikipedia.org/w/api.php"
wikiObject = wiki.Wiki(wikisite)

projectsAPI = pd.read_csv('../data/projects_operations_api.csv')
wikipediadf = pd.read_csv('../data/matchcountries.csv')

# some cleaning on the datasets
wikipediadf.index =wikipediadf['countryname'] 
projectsAPI['countryname'] = [str(country).split(";")[0] for country in projectsAPI['countryname']] 
#print matchNames.columns
#print projectsAPI.columns
projects = pd.merge(projectsAPI,wikipediadf, on='countryname', how = 'left')
projects = projects[projects['countryname'].map(type) != type(0.0)]
projectsAPI = projectsAPI[projectsAPI['countryname'].map(type) != type(0.0)]

projects['totalamt'] = projects['totalamt'].str.replace(';','')
projects['totalamt'] = projects['totalamt'].astype('float32')
print projects.columns
projects['year'] = [str(x)[0:4] for x in projects['boardapprovaldate']]
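# fall back to the closing date when there is no board approval date (only the 'year' column is overwritten)
missing_year = projects.year == 'nan'
projects.loc[missing_year, 'year'] = [str(x)[0:4] for x in projects.loc[missing_year, 'closingdate']]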
Index([id, regionname, countryname, prodline, lendinginstr, lendinginstrtype, envassesmentcategorycode, supplementprojectflg, productlinetype, projectstatusdisplay, status, project_name, boardapprovaldate, board_approval_month, closingdate, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, url, projectdoc , majorsector_percent , sector1, sector2, sector3, sector4, sector5, sector, mjsector1, mjsector2, mjsector3, mjsector4, mjsector5, mjsector, theme1, theme2, theme3, theme4, theme5, theme , goal, financier, mjtheme1name, mjtheme2name, mjtheme3name, mjtheme4name, mjtheme5name, location, wikiname, type, checktype, mapname], dtype=object)
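
The year fallback used above (board approval year, else closing year) can be sketched on toy data as follows. The ISO-style date strings are an assumption about the dataset's format, and the rows are invented.

```python
import pandas as pd

# Toy rows: the second has no approval date, the third has no closing date.
toy = pd.DataFrame({
    'boardapprovaldate': ['2005-03-01T00:00:00Z', None, '1999-07-15T00:00:00Z'],
    'closingdate': ['2010-12-31T00:00:00Z', '2003-06-30T00:00:00Z', None],
})

approval_year = toy['boardapprovaldate'].str[:4]   # missing dates stay missing
closing_year = toy['closingdate'].str[:4]

# Use the approval year where present, otherwise the closing year.
toy['year'] = approval_year.fillna(closing_year)
print(toy['year'].tolist())
```

Working on the string accessor keeps missing dates as missing values, so there is no need to compare against the literal string `'nan'`.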
In [2]:
import matplotlib.pyplot as plt
import matplotlib.colors as col

def color_variant(hex_color, brightness_offset=1):  
    if len(hex_color) != 7:  
        raise Exception("Passed %s into color_variant(), needs to be in #87c95f format." % hex_color)  
    rgb_hex = [hex_color[x:x+2] for x in [1, 3, 5]]  
    new_rgb_int = [int(hex_value, 16) + brightness_offset for hex_value in rgb_hex]  
    new_rgb_int = [min([255, max([0, i])]) for i in new_rgb_int] # make sure new values are between 0 and 255  
    # hex() produces "0x88", we want just "88"  
    
    hexcolor = "#"
    for i in new_rgb_int:
        if(i<16):
            hexcolor+="0"+str(hex(i)[2:])
        else:
            hexcolor+=str(hex(i)[2:])
    return hexcolor

def drawBarCharReference(Color,targetlist, field, title, labels):
    fig = plt.figure(num=None, figsize=(24, 8), dpi=700, facecolor='w', edgecolor='k')
    
    ax = fig.add_subplot(111)
    
    ColorBase = Color
    changeQuantile = True
    changeRange = 0.10
    i = 0
    
    for x in targetlist.sort(columns=field,ascending=True).index:     
        if i/float(len(targetlist.index)) > changeRange:
            ColorBase = color_variant(ColorBase,20)
            changeRange = changeRange + 0.10
        targetlist['color'][x] =  ColorBase
        #print (type(targetlist[field][x]))
        # use the `col` alias imported above; the bare name `matplotlib` is never imported in this cell
        ax.bar(i,float(targetlist[field][x]),1,color=col.colorConverter.to_rgb(ColorBase)) 
        i+=1
    
    ax.set_xticklabels( ([x[1] for x in targetlist.sort(columns=field,ascending=True).index]) )
    #plt.subplots_adjust(bottom=1, left=.01, right=.99, top=.90, hspace=.35)
    plt.xticks(np.arange(0.5, i+1, 1))
    plt.setp(ax.get_xticklabels(), fontsize=9, rotation='vertical')
    plt.setp(ax.get_yticklabels(), fontsize=10)
    plt.title(title)
    plt.xlabel(labels[0],fontsize=18)
    plt.ylabel(labels[1],fontsize=18)
    plt.show()
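
As a point of comparison, the brightness shift done by `color_variant` can be written more compactly with `%02x` formatting. This is a minimal sketch under the same input assumptions (a 7-character `#rrggbb` string); the name `shift_brightness` is hypothetical.

```python
def shift_brightness(hex_color, offset=1):
    """Return hex_color with each RGB channel shifted by offset, clamped to 0..255."""
    if len(hex_color) != 7 or not hex_color.startswith('#'):
        raise ValueError("expected a color like #87c95f, got %r" % (hex_color,))
    # Parse the three two-digit hex channels.
    channels = [int(hex_color[i:i + 2], 16) for i in (1, 3, 5)]
    # Shift and clamp each channel.
    shifted = [min(255, max(0, c + offset)) for c in channels]
    # '%02x' pads single-digit hex values with a leading zero.
    return '#' + ''.join('%02x' % c for c in shifted)

print(shift_brightness('#C73F2A', 20))
```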

    
In [3]:
# http://www.geophysique.be/2013/02/12/matplotlib-basemap-tutorial-10-shapefiles-unleached-continued/

#
# BaseMap example by geophysique.be
# tutorial 10
 
import os
import inspect
import numpy as np
import matplotlib.pyplot as plt
from itertools import islice, izip
from mpl_toolkits.basemap import Basemap

def zip_filter_by_state(records, shapes, included_states=None):
    # with included_states=None, nothing is filtered out
    # otherwise included_states is a list of map names matched against record[1]
    for (record, state) in izip(records, shapes):
        if included_states is None or record[1] in included_states:
            yield (record, state) 


def draw_global_map(colors, indexlist, titles):
    ### PARAMETERS FOR MATPLOTLIB :
    import matplotlib as mpl
    mpl.rcParams['font.size'] = 14.
    mpl.rcParams['font.family'] = 'Serif'
    mpl.rcParams['axes.labelsize'] = 8.
    mpl.rcParams['xtick.labelsize'] = 40.
    mpl.rcParams['ytick.labelsize'] = 20.
     
    fig = plt.figure(figsize=(11.7,8.3))
    #Custom adjust of the subplots
    plt.subplots_adjust(left=0.05,right=0.95,top=0.90,bottom=0.05,wspace=0.15,hspace=0.05)
    ax = plt.subplot(111)
    #Let's create a basemap of the world (Mercator projection, 60S to 80N)
    
    x1 = -179.
    x2 = 179.
    y1 = -60.
    y2 = 80.
    
    
    i=0
    #colors = ['#8C040A','#9A040C','#A8050E','#C40813','#D20915','#DF0A17','#ED0C19','#FC0D1B']
    
        
    m = Basemap(resolution='i',projection='merc', llcrnrlat=y1,urcrnrlat=y2,llcrnrlon=x1,urcrnrlon=x2,lat_ts=(y1+y2)/2)
    m.drawcountries(linewidth=0.5)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(y1,y2,20.),labels=[1,0,0,0],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw parallels
    m.drawmeridians(np.arange(x1,x2,20.),labels=[0,0,0,1],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw meridians
     
        
        
    from matplotlib.collections import LineCollection
    from matplotlib import cm
    import shapefile 
    
    # world country boundaries; both branches of the original path check loaded this same file
    shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    
    shapes = shpf.shapes()
    records = shpf.records()
    
    #print cm.colors.ColorConverter.to_rgba('#eeefff') 
    #random_number = 38*145*155
    
    # first pass: draw every country with the base face and edge colors
    for record, shape in zip(records, shapes):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
     
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
     
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        lines.set_facecolors(colors[0])
        lines.set_edgecolors(colors[1])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    for record, shape in zip_filter_by_state(records, shapes, [x[1] for x in indexlist.index]):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
     
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
     
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        i=i+1
        x_color=None
        # look up the row in the indexlist argument rather than in the global heatmapfounding
        for w in indexlist.index:
            if record[1] in w[1]:
                x_color = w
                break
    
        lines.set_facecolors(indexlist['color'][x_color])
        lines.set_edgecolors(indexlist['color'][x_color])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    plt.title(titles[0])
    plt.savefig('tutorial10.png',dpi=300)
    plt.show()


#draw_global_map(['#3C989E','#424242'],heatmapfounding, ['Total World Bank Lending Commitments Accumulated 2001-2013'])
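
Both drawing loops above split a shape's point list into one segment per part. A minimal sketch of that step with NumPy, assuming (as the loops do) that `parts` holds the start index of each part; the helper name `split_parts` and the sample points are made up.

```python
import numpy as np

def split_parts(points, parts):
    """Split an (N, 2) point array into one array per shape part."""
    points = np.asarray(points)
    # np.split cuts before each given index, so the leading 0 is skipped.
    return np.split(points, list(parts)[1:])

# Toy shape: a 3-point part followed by a 4-point part.
points = [(0, 0), (1, 0), (1, 1), (5, 5), (6, 5), (6, 6), (5, 5)]
parts = [0, 3]
segs = split_parts(points, parts)
print([len(s) for s in segs])
```

A shape with a single part (`parts == [0]`) comes back as a one-element list, so no special case is needed.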
In [16]:
heatmapfounding = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','totalamt','year'])
heatmapfounding = pd.DataFrame(heatmapfounding[heatmapfounding.year>='2001'], columns=['wikiname','mapname','totalamt','year'])

heatmapfounding = heatmapfounding.groupby(['wikiname','mapname']).sum()
heatmapfounding['color'] = pd.Series(["hola" for x in heatmapfounding.index], index=heatmapfounding.index)

drawBarCharReference( '#C73F2A',heatmapfounding, "totalamt","Total World Bank Lending Commitments Accumulated 2001-2013",['Country','US$'])
draw_global_map(['#ffffff','#000000'],heatmapfounding, ['Total World Bank Lending Commitments Accumulated 2001-2013'])
In [15]:
def cleanFloatnumber(x):
    if type(x) is float:
        return float(x)
    elif type(x) is str:
        if len(x) ==0:
            return None
        x=re.sub('<!--.*?-->','',x)
        x=re.sub('<*?>.*?<*?>','',x)
        x=x.strip()
        delimiterRegex = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')
        Numbers = re.findall(delimiterRegex,x)
        if len(Numbers)>0:
            return float(Numbers[0])
        else:
            return None
    else:
        return None

def cleanIntNumber(x):
    if type(x) is float:
        return float(x)
    elif type(x) is str:
        if len(x) ==0:
            return None
        x=re.sub('<!--.*?-->','',x)
        x=re.sub('<*?>.*?<*?>','',x)
        x=re.sub(',','',x)
        x=x.strip()
        delimiterRegex = re.compile(r'[0-9]+')
        Numbers = re.findall(delimiterRegex,x)
        if len(Numbers)>0:
            return float(Numbers[0])
        else:
            return None
    else:
        return None
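
The cleaning idea above (strip markup, then take the first number) can be sketched with more explicit patterns. This is an illustrative variant, not the functions used in the notebook: the helper name `first_number` and the sample inputs are assumptions, and it only handles HTML comments and `<ref>` tags.

```python
import re

NUMBER_RE = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')

def first_number(raw):
    """Return the first number found in a wikitext value, or None."""
    text = re.sub(r'<!--.*?-->', '', raw, flags=re.DOTALL)             # HTML comments
    text = re.sub(r'<ref[^>]*/>', '', text)                            # self-closing refs
    text = re.sub(r'<ref[^>]*>.*?</ref>', '', text, flags=re.DOTALL)   # paired refs
    text = text.replace(',', '')                                       # thousands separators
    match = NUMBER_RE.search(text)
    return float(match.group(0)) if match else None

print(first_number('0.731<ref name="hdi">Report 2013</ref>'))
print(first_number('1,234,567 <!-- 2010 census -->'))
print(first_number('n/a'))
```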
In [6]:
def get_infobox_from_wikipedia(countryname):
    #print "Checking: "+str(countryname)+"__"
    
    country_found = False
    hdi = None
    gini = None
    GDP = None
    GDP_nominal_per_capita = None
    population = None
    if str(countryname).strip() == "" or countryname is None or str(countryname).strip()=='nan':
        return hdi,gini,GDP,GDP_nominal_per_capita, population
    try:
        wikipage = page.Page(wikiObject,title=countryname)
    except Exception as inst:
        print "No results from Wikipedia: "+str(countryname)
        return hdi,gini,GDP,GDP_nominal_per_capita, population
    wikiraw = wikipage.getWikiText()
    wikiraw = wikiraw.decode('UTF-8')
    parsedWikiText = mwparserfromhell.parse(wikiraw) 
    for x in parsedWikiText.nodes:
        if "template" in str(type(x)) and "Infobox country" in str(x.name):
            country_found = True
            if x.has_param('population_census'):
                population = cleanIntNumber(str(x.get('population_census').value))
            if population is None:
                if x.has_param('population_estimate'):
                    population = cleanIntNumber(str(x.get('population_estimate').value))
            if x.has_param('HDI'):
                hdi = cleanFloatnumber(str(x.get('HDI').value))
            if x.has_param('Gini'):
                gini = cleanFloatnumber(str(x.get('Gini').value))
            if x.has_param('GDP'):
                GDP = x.get('GDP').value
            if x.has_param('GDP_nominal_per_capita'):
                GDP_nominal_per_capita = str(x.get('GDP_nominal_per_capita').value)
            break
    if country_found == False:
        print "No Infobox: "+str(countryname)
    return hdi,gini,GDP,GDP_nominal_per_capita,population

wikipediadf["HDI"], wikipediadf["gini"],wikipediadf['GDP'],wikipediadf['GDP_nominal_per_capita'],wikipediadf['population'] = zip(*wikipediadf['wikiname'].map(get_infobox_from_wikipedia))

#pp = pd.DataFrame(zip(*wikipediadf[wikipediadf.wikiname == "Guinea"]['wikiname'].map(get_infobox_from_wikipedia)))

#print pp[:]
In [7]:
# Some values retrieved from Wikipedia could not be parsed into numbers, so the first
# offending country row for each column is dropped (Ignacio)

for i in wikipediadf[wikipediadf.type == 'Country'].index:
    typeFound = type(wikipediadf['population'][i])
    if typeFound is not float and typeFound is not None:
        print "deleted"
        wikipediadf=wikipediadf.drop([i])
        break
for i in wikipediadf[wikipediadf.type == 'Country'].index:
    typeFound = type(wikipediadf['GDP_nominal_per_capita'][i])
    if typeFound is not float and typeFound is not None:
        print "deleted"
        wikipediadf=wikipediadf.drop([i])
        break
deleted
deleted
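
An alternative to dropping rows one at a time is to coerce the column to numbers and drop whatever fails to parse. A sketch on toy data, assuming a pandas version that provides `pd.to_numeric` (newer than the one this notebook was written against); the sample values are invented.

```python
import pandas as pd

# Toy frame: the second population value is unparsed wikitext.
toy = pd.DataFrame({
    'wikiname': ['A', 'B', 'C'],
    'population': [1000.0, '{{formatnum:2000}}', 3000.0],
})

# Values that cannot be converted become NaN instead of raising.
toy['population'] = pd.to_numeric(toy['population'], errors='coerce')

# Keep only the rows whose population parsed correctly.
clean = toy.dropna(subset=['population'])
print(clean['wikiname'].tolist())
```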
In [8]:
projects = pd.merge(projectsAPI,wikipediadf, on='countryname', how = 'left')
projects = projects[projects['countryname'].map(type) != type(0.0)]
projectsAPI = projectsAPI[projectsAPI['countryname'].map(type) != type(0.0)]

projects['totalamt'] = projects['totalamt'].str.replace(';','')
projects['totalamt'] = projects['totalamt'].astype('float32')
print projects.columns
projects['year'] = [str(x)[0:4] for x in projects['boardapprovaldate']]
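# fall back to the closing date when there is no board approval date (only the 'year' column is overwritten)
missing_year = projects.year == 'nan'
projects.loc[missing_year, 'year'] = [str(x)[0:4] for x in projects.loc[missing_year, 'closingdate']]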
Index([id, regionname, countryname, prodline, lendinginstr, lendinginstrtype, envassesmentcategorycode, supplementprojectflg, productlinetype, projectstatusdisplay, status, project_name, boardapprovaldate, board_approval_month, closingdate, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, url, projectdoc , majorsector_percent , sector1, sector2, sector3, sector4, sector5, sector, mjsector1, mjsector2, mjsector3, mjsector4, mjsector5, mjsector, theme1, theme2, theme3, theme4, theme5, theme , goal, financier, mjtheme1name, mjtheme2name, mjtheme3name, mjtheme4name, mjtheme5name, location, wikiname, type, checktype, mapname, HDI, gini, GDP, GDP_nominal_per_capita, population], dtype=object)
In [9]:
def drawBarCharReference2(Color,targetlist, field, title,labels):
    fig = plt.figure(num=None, figsize=(24, 8), dpi=700, facecolor='w', edgecolor='k')
    
    ax = fig.add_subplot(111)
    
    ColorBase = Color
    changeQuantile = True
    changeRange = 0.10
    i = 0
    
    for x in targetlist.sort(columns=field,ascending=True).index:     
        if i/float(len(targetlist.index)) > changeRange:
            ColorBase = color_variant(ColorBase,20)
            changeRange = changeRange + 0.10
        targetlist['color'][x] =  ColorBase
        #print (type(targetlist[field][x]))
        # use the `col` alias imported earlier; the bare name `matplotlib` is never imported
        ax.bar(i,float(targetlist[field][x]),1,color=col.colorConverter.to_rgb(ColorBase)) 
        i+=1
    
    ax.set_xticklabels( ([targetlist['mapname'][x] for x in targetlist.sort(columns=field,ascending=True).index]) )
    #plt.subplots_adjust(bottom=1, left=.01, right=.99, top=.90, hspace=.35)
    plt.xticks(np.arange(0.5, i+1, 1))
    plt.setp(ax.get_xticklabels(), fontsize=9, rotation='vertical')
    plt.setp(ax.get_yticklabels(), fontsize=10)
    plt.title(title)
    plt.xlabel(labels[0], fontsize=18)
    plt.ylabel(labels[1], fontsize=18)
    plt.show()
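
The bar chart functions change shade roughly every tenth of the sorted rows. One way to express a similar bucketing without a manual counter is to derive a bucket from each value's rank. This is a sketch of an alternative, not the logic used above (bucket boundaries may differ slightly); `decile_bucket` is a hypothetical helper.

```python
import pandas as pd

def decile_bucket(values):
    """Return a 0..9 bucket for each value based on its ascending rank."""
    ranks = values.rank(method='first') - 1       # 0-based position in sorted order
    return (ranks * 10 // len(values)).astype(int)

amounts = pd.Series([5.0, 1.0, 3.0])
print(decile_bucket(amounts).tolist())
```

Each bucket number could then be mapped to a shade, for example by applying a brightness offset of `20 * bucket` to the base color.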
In [10]:
def zip_filter_by_state2(records, shapes, included_states=None):
    # with included_states=None, nothing is filtered out
    # otherwise included_states is a list of map names matched against record[1]
    for (record, state) in izip(records, shapes):
        if included_states is None or record[1] in included_states:
            yield (record, state) 

def draw_global_map2(colors, indexlist, titles):
    ### PARAMETERS FOR MATPLOTLIB :
    import matplotlib as mpl
    mpl.rcParams['font.size'] = 14.
    mpl.rcParams['font.family'] = 'Serif'
    mpl.rcParams['axes.labelsize'] = 8.
    mpl.rcParams['xtick.labelsize'] = 40.
    mpl.rcParams['ytick.labelsize'] = 20.
     
    fig = plt.figure(figsize=(11.7,8.3))
    #Custom adjust of the subplots
    plt.subplots_adjust(left=0.05,right=0.95,top=0.90,bottom=0.05,wspace=0.15,hspace=0.05)
    ax = plt.subplot(111)
    #Let's create a basemap of the world (Mercator projection, 60S to 80N)
    
    x1 = -179.
    x2 = 179.
    y1 = -60.
    y2 = 80.
    
    
    i=0
    #colors = ['#8C040A','#9A040C','#A8050E','#C40813','#D20915','#DF0A17','#ED0C19','#FC0D1B']
    
        
    m = Basemap(resolution='i',projection='merc', llcrnrlat=y1,urcrnrlat=y2,llcrnrlon=x1,urcrnrlon=x2,lat_ts=(y1+y2)/2)
    m.drawcountries(linewidth=0.5)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(y1,y2,20.),labels=[1,0,0,0],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw parallels
    m.drawmeridians(np.arange(x1,x2,20.),labels=[0,0,0,1],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw meridians
     
        
        
    from matplotlib.collections import LineCollection
    from matplotlib import cm
    import shapefile 
    
    # world country boundaries; both branches of the original path check loaded this same file
    shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    
    shapes = shpf.shapes()
    records = shpf.records()
    
    #print cm.colors.ColorConverter.to_rgba('#eeefff') 
    #random_number = 38*145*155
    
    # first pass: draw every country with the base face and edge colors
    for record, shape in zip(records, shapes):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
     
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
     
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        lines.set_facecolors(colors[0])
        lines.set_edgecolors(colors[1])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)

    for record, shape in zip_filter_by_state2(records, shapes, [indexlist['mapname'][x] for x in indexlist.index]):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
     
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
     
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        i=i+1
        x_color=None
        for (w,x) in [(indexlist['mapname'][x],x) for x in indexlist.index]:
            if type(w) is str and record[1] in w:
                x_color = x
                break
    
        lines.set_facecolors(indexlist['color'][x_color])
        lines.set_edgecolors(indexlist['color'][x_color])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    plt.title(titles[0])
    plt.savefig('tutorial10.png',dpi=300)
    plt.show()

The Human Development Index (HDI) is a composite statistic of life expectancy, education, and income indices, used to rank countries into four tiers of human development. It was created by economist Mahbub ul Haq, working with economist Amartya Sen, in 1990,[1] and is published by the United Nations Development Programme.[2]

Starting with the 2010 Human Development Report, published on 4 November 2010 (and updated on 10 June 2011), the HDI combines three dimensions:

  1. A long and healthy life: Life expectancy at birth
  2. Education index: Mean years of schooling and Expected years of schooling
  3. A decent standard of living: GNI per capita (PPP US$)
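
Since the 2010 report, the three dimension indices are combined with a geometric mean. The sketch below shows only that final step. It assumes each index has already been normalized to the range 0 to 1 (the normalization bounds vary by report year and are not reproduced here), and the function name and sample values are made up.

```python
def hdi_from_indices(health, education, income):
    """Geometric mean of three dimension indices, each expected in [0, 1]."""
    for value in (health, education, income):
        if not 0.0 <= value <= 1.0:
            raise ValueError("dimension indices must lie in [0, 1]")
    return (health * education * income) ** (1.0 / 3.0)

# Invented index values for illustration only.
print(round(hdi_from_indices(0.8, 0.7, 0.6), 3))
```

Because the mean is geometric, a very low score in one dimension pulls the overall value down more than it would under a simple average.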
In [11]:
#wikipediadf["HDI"], #
#wikipediadf["gini"],
#wikipediadf['GDP'],
#wikipediadf['GDP_nominal_per_capita'],
#wikipediadf['population']

heatmapHDI = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','HDI'])
#heatmapHDI=heatmapHDI.reindex(index=['wikiname','wikiname'])
#heatmapHDI = heatmapHDI.groupby(['wikiname','mapname'])
#heatmapHDI = pd.DataFrame(heatmapHDI)
heatmapHDI = heatmapHDI.fillna(0)
heatmapHDI = heatmapHDI.drop_duplicates()
heatmapHDI['color'] = pd.Series(["hola" for x in heatmapHDI.index], index=heatmapHDI.index)
drawBarCharReference2( '#425910',heatmapHDI, 'HDI',"Human Development Index (Wikipedia)", ['Country','HDI Index'])
draw_global_map2(['#ffffff','#000000'],heatmapHDI, ['Human Development Index Map (Wikipedia)'])