Find interesting patterns and trends by analyzing and comparing World Bank loan commitments, the Human Development Index (HDI), the Freedom Index and GDP.
Analyze what insight open data can give us into how effective initiatives and funding actually are, as opposed to what they are meant to be.
The Freedom Index data comes from an economics and development think tank based in Washington, D.C., which analyzes and tracks economic freedom around the world through its influential Index of Economic Freedom.
from pandas import DataFrame, Series
import pandas as pd
import os
import codecs
# Verify existence of & Read in the datasets - project and operations & Freedom Index
DATA_FILES={"projdict":"data/projects_operations_api.csv", "fredict":"data/FreedomIndex.csv"}
def file_path(key):
    return os.path.join(os.pardir, DATA_FILES[key])
for file_key in DATA_FILES.keys():
    abs_fname = file_path(file_key)
    print abs_fname, os.path.exists(abs_fname)
../data/projects_operations_api.csv True
../data/FreedomIndex.csv True
f = codecs.open(file_path("projdict"), encoding='iso-8859-1')
initial_proj_df = pd.read_csv(f)
initial_proj_df.columns
Index([id, regionname, countryname, prodline, lendinginstr, lendinginstrtype, envassesmentcategorycode, supplementprojectflg, productlinetype, projectstatusdisplay, status, project_name, boardapprovaldate, board_approval_month, closingdate, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, url, projectdoc , majorsector_percent , sector1, sector2, sector3, sector4, sector5, sector, mjsector1, mjsector2, mjsector3, mjsector4, mjsector5, mjsector, theme1, theme2, theme3, theme4, theme5, theme , goal, financier, mjtheme1name, mjtheme2name, mjtheme3name, mjtheme4name, mjtheme5name, location], dtype=object)
In this section, we explore the World Bank's commitments to Africa by looking at the total amount loaned to different African countries over the years.
is_africa = initial_proj_df['regionname']=='AFRICA'
initial_proj_df[is_africa]['countryname'][:5]
2     Republic of Mozambique;Republic of Mozambique
4     Republic of Mozambique;Republic of Mozambique
8     Federal Republic of Nigeria;Federal Republic o...
11    United Republic of Tanzania;United Republic of...
14    Republic of Togo;Republic of Togo
Name: countryname
initial_proj_df[is_africa][['countryname', 'totalamt']][:5]
| | countryname | totalamt |
---|---|---|
2 | Republic of Mozambique;Republic of Mozambique | 37;000;000 |
4 | Republic of Mozambique;Republic of Mozambique | 50;000;000 |
8 | Federal Republic of Nigeria;Federal Republic o... | 0 |
11 | United Republic of Tanzania;United Republic of... | 100;000;000 |
14 | Republic of Togo;Republic of Togo | 0 |
#The totalamt value is not properly formatted. This step cleans up the value by stripping out unnecessary characters.
initial_proj_df['totalamt'] = initial_proj_df['totalamt'].str.replace(';','')
initial_proj_df[is_africa]['totalamt'][:5]
2      37000000
4      50000000
8             0
11    100000000
14            0
Name: totalamt
initial_proj_df['totalamt'] = initial_proj_df['totalamt'].astype('float32')
sum(initial_proj_df[is_africa]['totalamt'][:5])
1.87e+08
In the next steps, we clean up the data. For example, the amounts in the projects and operations dataset use semicolons as thousands separators (for example 37;000;000), so the separators need to be stripped out and the values parsed as floats.
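The cleaning rule described above can also be written as a small plain-Python function, which is handy for checking single values before applying the rule to a whole column. This is only a sketch written for Python 3 with the standard library; the name `parse_amount` and the choice to return `None` for empty cells are our own assumptions, not part of the notebook.

```python
def parse_amount(raw):
    """Turn a string such as '37;000;000' into a float.

    Hypothetical helper: empty or missing cells are returned as None.
    """
    if raw is None:
        return None
    text = str(raw).strip()
    if not text:
        return None
    # The export uses ';' where a thousands separator would normally appear.
    return float(text.replace(";", ""))


print(parse_amount("37;000;000"))
print(parse_amount("0"))
```

The pandas calls used in the notebook (`.str.replace(';','')` followed by `.astype('float32')`) apply the same idea to every row of a column at once.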
initial_proj_df[['regionname','countryname','projectstatusdisplay','totalamt']][:2]
| | regionname | countryname | projectstatusdisplay | totalamt |
---|---|---|---|---|
0 | EUROPE AND CENTRAL ASIA | Republic of Armenia;Republic of Armenia | Active | 0 |
1 | EAST ASIA AND PACIFIC | Socialist Republic of Vietnam;Socialist Republ... | Active | 156000000 |
# Data cleaning: remove the semicolons from the remaining money columns and parse the values as floats.
for amount_col in ['lendprojectcost', 'ibrdcommamt', 'idacommamt', 'grantamt']:
    initial_proj_df[amount_col] = initial_proj_df[amount_col].str.replace(';','').astype('float32')
initial_proj_df[is_africa][['countryname','project_name','boardapprovaldate','status','lendprojectcost','grantamt']][:10]
| | countryname | project_name | boardapprovaldate | status | lendprojectcost | grantamt |
---|---|---|---|---|---|---|
2 | Republic of Mozambique;Republic of Mozambique | Mozambique Nutrition Additional Financing | 2013-01-24T00:00:00Z | Active | 37000000 | 0 |
4 | Republic of Mozambique;Republic of Mozambique | Mozambique Climate Change Development Policy O... | 2013-01-24T00:00:00Z | Active | 50000000 | 0 |
8 | Federal Republic of Nigeria;Federal Republic o... | Nigeria Post-Compliance I EITI | 2013-01-18T00:00:00Z | Active | 900000 | 900000 |
11 | United Republic of Tanzania;United Republic of... | TANZANIA SECOND CENTRAL TRANSP CORRIDOR PROJEC... | 2013-01-15T00:00:00Z | Active | 100000000 | 0 |
14 | Republic of Togo;Republic of Togo | Integrated Disaster and Land Management Project | 2013-01-09T00:00:00Z | Active | 7290000 | 7290000 |
17 | Africa;Africa | Nile Cooperation for Results Project | 2013-01-01T00:00:00Z | Active | 15300000 | 15300000 |
21 | Republic of Senegal;Republic of Senegal | SN- First Governance and Growth Support Project | 2012-12-20T00:00:00Z | Closed | 55000000 | 0 |
22 | Republic of Namibia;Republic of Namibia | Namibian Coast Conservation Additional Finance | 2012-12-20T00:00:00Z | Active | 7800000 | 1930000 |
23 | Burkina Faso;Burkina Faso | Third Phase Community Based Rural Development ... | 2012-12-20T00:00:00Z | Active | 86000000 | 0 |
24 | Burkina Faso;Burkina Faso | Sustainable land and forestry management Project | 2012-12-20T00:00:00Z | Active | 77410000 | 7410000 |
projcp_df = initial_proj_df.copy()
projcp_df = projcp_df.drop(['lendinginstrtype','envassesmentcategorycode','productlinetype','closingdate','url','sector2','sector3','sector4','sector5','sector','mjsector1','mjsector2','mjsector3','mjsector4','mjsector5','mjsector','theme1','theme2','theme3','theme4','theme5','financier','mjtheme2name','mjtheme3name','mjtheme4name','mjtheme5name'],axis=1)
del projcp_df['projectstatusdisplay']
projcp_df2 = projcp_df.drop(['prodline','supplementprojectflg','goal','mjtheme1name','location'], axis=1)
projcp_df2.columns
Index([id, regionname, countryname, lendinginstr, status, project_name, boardapprovaldate, board_approval_month, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, projectdoc , majorsector_percent , sector1, theme ], dtype=object)
projcp_df2[is_africa][:5]
| | id | regionname | countryname | lendinginstr | status | project_name | boardapprovaldate | board_approval_month | lendprojectcost | ibrdcommamt | idacommamt | totalamt | grantamt | borrower | impagency | projectdoc | majorsector_percent | sector1 | theme |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | P125477 | AFRICA | Republic of Mozambique;Republic of Mozambique | Specific Investment Loan | Active | Mozambique Nutrition Additional Financing | 2013-01-24T00:00:00Z | January | 37000000 | 0 | 37000000 | 37000000 | 0 | GOVERNMENT OF MOZAMBIQUE | MINISTRY OF HEALTH | NaN | NaN | Health!$!100!$!JA | NaN |
4 | P128434 | AFRICA | Republic of Mozambique;Republic of Mozambique | Development Policy Lending | Active | Mozambique Climate Change Development Policy O... | 2013-01-24T00:00:00Z | January | 50000000 | 0 | 50000000 | 50000000 | 0 | GOVERNMENT OF MOZAMBIQUE | MINISTRY OF PLANNING AND DEVELOPMENT | NaN | NaN | General agriculture; fishing and forestry sect... | NaN |
8 | P132807 | AFRICA | Federal Republic of Nigeria;Federal Republic o... | Technical Assistance Loan | Active | Nigeria Post-Compliance I EITI | 2013-01-18T00:00:00Z | January | 900000 | 0 | 0 | 0 | 900000 | GOVERNMENT OF NIGERIA | NEITI SECRETARIAT | NaN | NaN | Other Mining and Extractive Industries!$!50!$!LS | NaN |
11 | P124114 | AFRICA | United Republic of Tanzania;United Republic of... | Specific Investment Loan | Active | TANZANIA SECOND CENTRAL TRANSP CORRIDOR PROJEC... | 2013-01-15T00:00:00Z | January | 100000000 | 0 | 100000000 | 100000000 | 0 | MIN. OF FINANCE & ECONOMIC AFFAIRS | TANZANIA NATIONAL ROADS AGENCY | NaN | NaN | Urban Transport!$!95!$!TC | NaN |
14 | P123922 | AFRICA | Republic of Togo;Republic of Togo | Specific Investment Loan | Active | Integrated Disaster and Land Management Project | 2013-01-09T00:00:00Z | January | 7290000 | 0 | 0 | 0 | 7290000 | MINISTRY OF ENVIRONMENT AND FORESTS | THE NATIONAL PLATFORM FOR DISASTER RISK REDUCTION | NaN | NaN | General water; sanitation and flood protection... | NaN |
grouped = projcp_df2.groupby('regionname')
In this section we look further into the projects and operations dataset and try to find interesting or surprising facts to analyze further. We also compute some basic statistics on the data, such as sums, means and standard deviations.
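To make the statistics concrete, here is a minimal standard-library sketch (Python 3) of what a per-region summary computes. The function name `summarize_by_region` and the sample rows are illustrative assumptions; the notebook itself relies on pandas `groupby` for this.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarize_by_region(rows):
    """rows: iterable of (regionname, totalamt) pairs."""
    amounts = defaultdict(list)
    for region, amount in rows:
        amounts[region].append(amount)

    summary = {}
    for region, values in amounts.items():
        summary[region] = {
            "count": len(values),
            "sum": sum(values),
            "mean": mean(values),
            # The sample standard deviation needs at least two values.
            "std": stdev(values) if len(values) > 1 else None,
        }
    return summary


# Illustrative rows only, not the full dataset.
sample = [("AFRICA", 2000000.0), ("AFRICA", 5000000.0), ("SOUTH ASIA", 34000000.0)]
stats = summarize_by_region(sample)
print(stats["AFRICA"])
```

Grouping first and aggregating second is the same split-apply-combine pattern that `groupby(...).apply(...)` and `describe()` follow.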
# function to calculate the total amount awarded by the World Bank per country or regional operating body
def func(x):
    totalamt = x['totalamt'].sum()
    return Series([totalamt], index=['totalamt'])
# result dataframe
result = grouped.apply(func)
#create a new column in dataframe to hold the years from the board approval date
projcp_df2['year'] = projcp_df2['boardapprovaldate'].str[:4]
projcp_df2['year'][:2]
0    2013
1    2013
Name: year
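Slicing the first four characters works as long as every date has the same layout. As a hedged alternative, the sketch below (Python 3, standard library) parses the timestamp properly and returns `None` for missing or malformed values. The format string is inferred from the values shown in the tables above, such as `2013-01-24T00:00:00Z`, and the helper name `approval_year` is our own.

```python
from datetime import datetime


def approval_year(stamp):
    """Return the year of a 'YYYY-MM-DDTHH:MM:SSZ' timestamp, or None."""
    if not isinstance(stamp, str):
        # Missing values usually arrive as float NaN rather than text.
        return None
    try:
        return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ").year
    except ValueError:
        return None


print(approval_year("2013-01-24T00:00:00Z"))
print(approval_year(float("nan")))
```

Returning `None` keeps bad rows visible instead of silently producing a year string such as 'nan'.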
# group data by year and region name
grouped3 = projcp_df2.groupby(['regionname','year'])
# statistics on the banks lending commitments to different regions over time
grouped3['totalamt'].describe()
regionname  year
AFRICA      1950  count           2.000000
                  mean      3500000.000000
                  std       2121320.547267
                  min       2000000.000000
                  25%       2750000.000000
                  50%       3500000.000000
                  75%       4250000.000000
                  max       5000000.000000
            1951  count           4.000000
                  mean     22875000.000000
                  std      16423434.091628
                  min       1500000.000000
                  25%      15375000.000000
                  50%      25000000.000000
                  75%      32500000.000000
...
SOUTH ASIA  2011  mean     1.241677e+08
                  std      2.437052e+08
                  min      0.000000e+00
                  25%      0.000000e+00
                  50%      3.364500e+07
                  75%      1.287500e+08
                  max      1.200000e+09
            2012  count    6.100000e+01
                  mean     8.606065e+07
                  std      1.462807e+08
                  min      0.000000e+00
                  25%      0.000000e+00
                  50%      3.600000e+07
                  75%      1.060000e+08
                  max      8.400000e+08
Length: 3152
grouped4 = projcp_df2.groupby(['regionname','year','board_approval_month'])
result4 =grouped4.apply(func)
result4.unstack('regionname')[:5]
totalamt, by year and board_approval_month (rows) and regionname (columns):
year | board_approval_month | AFRICA | EAST ASIA AND PACIFIC | EUROPE AND CENTRAL ASIA | LATIN AMERICA AND CARIBBEAN | MIDDLE EAST AND NORTH AFRICA | OTHER | SOUTH ASIA |
---|---|---|---|---|---|---|---|---|
1947 | August | NaN | NaN | 247000000 | NaN | NaN | NaN | NaN |
1947 | May | NaN | NaN | 250000000 | NaN | NaN | NaN | NaN |
1948 | July | NaN | NaN | 8000000 | NaN | NaN | NaN | NaN |
1948 | March | NaN | NaN | NaN | 16000000 | NaN | NaN | NaN |
1949 | August | NaN | NaN | NaN | 5000000 | NaN | NaN | 34000000 |
result5 = grouped3.apply(func).unstack('regionname').fillna(0)
result5[:5]
totalamt, by year (rows) and regionname (columns):
year | AFRICA | EAST ASIA AND PACIFIC | EUROPE AND CENTRAL ASIA | LATIN AMERICA AND CARIBBEAN | MIDDLE EAST AND NORTH AFRICA | OTHER | SOUTH ASIA |
---|---|---|---|---|---|---|---|
1947 | 0 | 0 | 497000000 | 0 | 0 | 0 | 0 |
1948 | 0 | 0 | 8000000 | 16000000 | 0 | 0 | 0 |
1949 | 0 | 0 | 37500000 | 126600000 | 0 | 0 | 44000000 |
1950 | 7000000 | 125400000 | 25400000 | 90100000 | 12800000 | 0 | 18500000 |
1951 | 91500000 | 0 | 71500000 | 45400000 | 0 | 0 | 0 |
# python-us-cpi is a tool for parsing the latest US Consumer Price Index; it also provides an inflation calculator API.
# We'll use this API to convert loan commitments from other years into today's dollars for better comparison.
from uscpi import UsCpi
cpi = UsCpi() # downloads the latest CPI data
# $100 in 2012 is worth how much in 1980?
cpi.value_with_inflation(100, 2012, 1980)
35.89
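The internals of the `uscpi` package are not shown here, but inflation calculators of this kind usually scale an amount by the ratio of the price index in the target year to the index in the source year. The sketch below (Python 3, standard library) illustrates that ratio under this assumption. The function name `adjust_for_inflation` is hypothetical, and the index values are placeholders chosen for easy arithmetic, not real CPI figures.

```python
def adjust_for_inflation(amount, from_year, to_year, index_by_year):
    """Express `amount` from `from_year` in `to_year` dollars.

    Assumes the usual ratio rule: amount * index[to_year] / index[from_year].
    """
    try:
        source_index = index_by_year[from_year]
        target_index = index_by_year[to_year]
    except KeyError as missing:
        raise ValueError("no price index value for year %s" % missing)
    return round(amount * target_index / source_index, 2)


# Placeholder index values, NOT real CPI data.
placeholder_index = {1980: 50.0, 2012: 200.0}

print(adjust_for_inflation(100, 2012, 1980, placeholder_index))
print(adjust_for_inflation(100, 1980, 2012, placeholder_index))
```

With a real index table, the first call corresponds to the question asked above: what $100 in 2012 is worth in 1980 dollars.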
projcpi = projcp_df2[['regionname','countryname','project_name','totalamt','grantamt','sector1','year']].copy()
# Function used to convert monetary values from any year up to 2013 into 2013 dollars using the CPI API
def fun2(y):
    totalamts = y['totalamt']
    year = int(y['year'])
    regionname = y['regionname']
    countryname = y['countryname']
    project_name = y['project_name']
    grantamt = y['grantamt']
    sector1 = y['sector1']
    # only years in the range 1914 - 2013 are converted; other rows keep their original amount
    boolVal = 1914 <= year <= 2013
    if boolVal:
        totalamts = cpi.value_with_inflation(totalamts, year, 2013)
    return Series([regionname, countryname, project_name, totalamts, grantamt, sector1, year],
                  index=['regionname', 'countryname', 'project_name', 'totalamt', 'grantamt', 'sector1', 'year'])
resultcpi = projcpi.fillna(0).apply(fun2, axis=1)
# Data cleaning: Removing un-wanted data from the sector1 string
resultcpi['sectorMain'] = resultcpi['sector1'].str.split("!").str[0]
resultcpi['country'] = resultcpi['countryname'].str.split(";").str[0]
# validate that the data is cleaned.
resultcpi[:4]
| | regionname | countryname | project_name | totalamt | grantamt | sector1 | year | sectorMain | country |
---|---|---|---|---|---|---|---|---|---|
0 | EUROPE AND CENTRAL ASIA | Republic of Armenia;Republic of Armenia | JSDF Strengthening the Livelihoods and Voice o... | 0 | 2670000 | Other social services!$!100!$!JB | 2013 | Other social services | Republic of Armenia |
1 | EAST ASIA AND PACIFIC | Socialist Republic of Vietnam;Socialist Republ... | Mekong Delta Transport Infrastructure Developm... | 156000000 | 0 | Rural and Inter-Urban Roads and Highways!$!55!... | 2013 | Rural and Inter-Urban Roads and Highways | Socialist Republic of Vietnam |
2 | AFRICA | Republic of Mozambique;Republic of Mozambique | Mozambique Nutrition Additional Financing | 37000000 | 0 | Health!$!100!$!JA | 2013 | Health | Republic of Mozambique |
3 | EUROPE AND CENTRAL ASIA | Republic of Moldova;Republic of Moldova | Moldova Education Reform Project | 40000000 | 0 | General education sector!$!100!$!EZ | 2013 | General education sector | Republic of Moldova |
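The split on "!" above keeps only the sector name. Judging from values such as `Health!$!100!$!JA`, the field appears to pack three parts separated by `!$!`. The sketch below (Python 3, standard library) separates them. Reading the second part as a percentage share and the third as a sector code is our own assumption, and the helper names `parse_sector` and `first_country` are hypothetical.

```python
def parse_sector(raw):
    """Split 'Health!$!100!$!JA' into (name, percent, code).

    Assumption: the parts are a sector name, a percentage share and a code.
    Missing parts are returned as None.
    """
    parts = raw.split("!$!")
    name = parts[0]
    percent = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else None
    code = parts[2] if len(parts) > 2 else None
    return name, percent, code


def first_country(raw):
    """Keep the first name of 'Republic of Togo;Republic of Togo'."""
    return raw.split(";")[0]


print(parse_sector("Health!$!100!$!JA"))
print(first_country("Republic of Togo;Republic of Togo"))
```

Keeping the share and the code would make it possible to weight projects by sector later, instead of only counting them.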
resultcpi['year'] = resultcpi['year'].astype(int)
# The dataset is large, which makes it difficult to analyze. In this step we construct a boolean mask to keep only the projects approved from 2000 - 2013
is_bv = (resultcpi['year'] >= 2000) & (resultcpi['year'] <= 2013)
resultcpi2 = resultcpi[is_bv]
# verify that data is formatted in the way we want to analyze it.
resultcpi2[:4]
| | regionname | countryname | project_name | totalamt | grantamt | sector1 | year | sectorMain | country |
---|---|---|---|---|---|---|---|---|---|
0 | EUROPE AND CENTRAL ASIA | Republic of Armenia;Republic of Armenia | JSDF Strengthening the Livelihoods and Voice o... | 0 | 2670000 | Other social services!$!100!$!JB | 2013 | Other social services | Republic of Armenia |
1 | EAST ASIA AND PACIFIC | Socialist Republic of Vietnam;Socialist Republ... | Mekong Delta Transport Infrastructure Developm... | 156000000 | 0 | Rural and Inter-Urban Roads and Highways!$!55!... | 2013 | Rural and Inter-Urban Roads and Highways | Socialist Republic of Vietnam |
2 | AFRICA | Republic of Mozambique;Republic of Mozambique | Mozambique Nutrition Additional Financing | 37000000 | 0 | Health!$!100!$!JA | 2013 | Health | Republic of Mozambique |
3 | EUROPE AND CENTRAL ASIA | Republic of Moldova;Republic of Moldova | Moldova Education Reform Project | 40000000 | 0 | General education sector!$!100!$!EZ | 2013 | General education sector | Republic of Moldova |
# In the line below we find the total Bank commitments to Africa per year over the period 2000 - 2013.
# The amounts were already converted to 2013 dollars by the CPI function (fun2) above.
ggroup_africa = resultcpi2[resultcpi2['regionname']=='AFRICA'].groupby('year').apply(func)
ggroup_africa.plot(kind='bar', title='Bank lending commitments to Africa in year 2000 - 2013'); plt.tight_layout()
# In the line below we find the total Bank commitments per region per year over the period 2000 - 2013.
# The amounts were already converted to 2013 dollars by the CPI function (fun2) above.
amtByRegion =resultcpi2.groupby(['regionname','year']).apply(func).unstack('regionname')
amtByRegion[:2]
totalamt, by year (rows) and regionname (columns):
year | AFRICA | EAST ASIA AND PACIFIC | EUROPE AND CENTRAL ASIA | LATIN AMERICA AND CARIBBEAN | MIDDLE EAST AND NORTH AFRICA | OTHER | SOUTH ASIA |
---|---|---|---|---|---|---|---|
2000 | 4.695581e+09 | 3.100790e+09 | 4.469655e+09 | 5.950326e+09 | 1.367870e+09 | 0 | 3.328411e+09 |
2001 | 4.775420e+09 | 2.858354e+09 | 5.639439e+09 | 6.512251e+09 | 9.593054e+08 | 0 | 3.996901e+09 |
amtByRegion.plot(kind='bar',figsize=(16,8), title='Lending commitments by the Bank from 2000 - 2013'); plt.legend(loc='best')
<matplotlib.legend.Legend at 0x216a1030>
# count the number of world bank projects from 2000 - 2013 per country
numOfproj_by_country = resultcpi2.groupby('country').size().order(na_last=True, ascending=False, kind='mergesort')
numOfproj_by_country[:5]
country
Republic of India                211
People's Republic of China       205
Federative Republic of Brazil    187
Republic of Indonesia            177
Africa                           167
The three countries with the most World Bank projects (India, China and Brazil) are all part of the BRICS. We are interested in analyzing patterns between the BRICS nations' borrowing, their Freedom Index, Human Development Index and GDP. The next steps analyze World Bank lending to these nations.
# From the counts above, the countries with the most projects are BRICS members, so the list below is used to filter the BRICS for further observation and analysis
listBRICS = ['Federative Republic of Brazil','Russian Federation','Republic of India','People\'s Republic of China','Republic of South Africa']
brics_nations = resultcpi2[resultcpi2['country'].isin(listBRICS)].groupby(['country','year']).size()
# In the graph, we look at the number of projects funded by the World Bank per country per year from 2000 - 2013
brics_nations.unstack('country').fillna(0).plot(subplots=True, figsize=(8, 8),kind='bar'); plt.legend(loc='best');plt.tight_layout()
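For readers who want to see the counting step without pandas, the following sketch (Python 3, standard library) counts projects per country and year for a fixed set of countries. The function name `count_projects` and the sample records are illustrative assumptions; the country names are the ones used in `listBRICS`.

```python
from collections import Counter

BRICS = frozenset([
    "Federative Republic of Brazil",
    "Russian Federation",
    "Republic of India",
    "People's Republic of China",
    "Republic of South Africa",
])


def count_projects(records, countries=BRICS):
    """records: iterable of (country, year) pairs, one per project."""
    return Counter((country, year) for country, year in records if country in countries)


# Illustrative records only.
sample_records = [
    ("Republic of India", 2013),
    ("Republic of India", 2013),
    ("Russian Federation", 2012),
    ("Republic of Togo", 2013),
]
counts = count_projects(sample_records)
print(counts[("Republic of India", 2013)])
```

This mirrors `isin(listBRICS)` followed by `groupby(['country','year']).size()`: filter first, then count each (country, year) pair.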
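# NOTE: this cell groups on a 'sector3' column; resultcpi2 as built above only carries 'sector1' and 'sectorMain', so the frame must also include 'sector3' for this cell to run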
df_of_BRICS = resultcpi2[resultcpi2['country'].isin(listBRICS)].groupby(['country','sector3']).size().order(na_last=True, ascending=False, kind='mergesort')
df_of_BRICS.unstack('country').fillna(0)
sector3 | Federative Republic of Brazil | People's Republic of China | Republic of India | Republic of South Africa | Russian Federation |
---|---|---|---|---|---|
Agricultural extension and research | 1 | 2 | 7 | 0 | 0 |
Agro-industry | 0 | 1 | 0 | 0 | 0 |
Agro-industry; marketing; and trade | 2 | 0 | 6 | 0 | 0 |
Animal production | 0 | 4 | 1 | 0 | 0 |
Banking | 3 | 2 | 2 | 0 | 0 |
Capital markets | 0 | 2 | 0 | 0 | 0 |
Central government administration | 12 | 4 | 6 | 1 | 9 |
Compulsory health finance | 0 | 1 | 0 | 0 | 0 |
Compulsory pension and unemployment insurance | 3 | 2 | 0 | 0 | 0 |
Crops | 0 | 0 | 1 | 0 | 0 |
Energy efficiency in Heat and Power | 2 | 14 | 7 | 0 | 2 |
Flood protection | 2 | 6 | 2 | 0 | 0 |
Forestry | 10 | 9 | 4 | 1 | 2 |
General agriculture; fishing and forestry sector | 33 | 7 | 16 | 4 | 0 |
General education sector | 2 | 0 | 0 | 0 | 1 |
General energy sector | 1 | 2 | 0 | 1 | 0 |
General finance sector | 1 | 2 | 3 | 0 | 0 |
General industry and trade sector | 2 | 1 | 0 | 0 | 1 |
General public administration sector | 4 | 3 | 3 | 0 | 0 |
General transportation sector | 4 | 2 | 2 | 0 | 0 |
General water; sanitation and flood protection sector | 4 | 5 | 6 | 0 | 0 |
Health | 8 | 5 | 16 | 0 | 2 |
Housing construction | 1 | 2 | 3 | 0 | 0 |
Housing finance | 1 | 0 | 0 | 0 | 0 |
Hydropower | 0 | 0 | 2 | 0 | 0 |
Irrigation and drainage | 0 | 8 | 14 | 0 | 0 |
Law and justice | 0 | 0 | 1 | 0 | 1 |
Micro- and SME finance | 3 | 0 | 1 | 0 | 0 |
Microfinance | 0 | 0 | 1 | 0 | 0 |
Non-compulsory health finance | 0 | 1 | 0 | 0 | 0 |
Oil and gas | 1 | 0 | 0 | 0 | 1 |
Other Renewable Energy | 0 | 3 | 1 | 2 | 0 |
Other industry | 1 | 5 | 5 | 0 | 1 |
Other social services | 11 | 2 | 5 | 0 | 2 |
Payments; settlements; and remittance systems | 0 | 1 | 0 | 0 | 0 |
Petrochemicals and fertilizers | 1 | 0 | 0 | 0 | 0 |
Ports; waterways and shipping | 1 | 5 | 0 | 0 | 0 |
Power | 2 | 5 | 6 | 1 | 0 |
Pre-primary education | 1 | 0 | 0 | 0 | 0 |
Primary education | 4 | 1 | 5 | 0 | 0 |
Public administration- Agriculture; fishing and forestry | 0 | 1 | 1 | 0 | 0 |
Public administration- Education | 1 | 0 | 0 | 0 | 0 |
Public administration- Energy and mining | 1 | 2 | 0 | 0 | 0 |
Public administration- Financial Sector | 0 | 0 | 0 | 0 | 1 |
Public administration- Health | 1 | 0 | 1 | 0 | 0 |
Public administration- Transportation | 2 | 0 | 2 | 0 | 0 |
Public administration- Water; sanitation and flood protection | 3 | 2 | 1 | 0 | 0 |
Railways | 2 | 8 | 3 | 0 | 0 |
Renewable energy | 2 | 6 | 5 | 3 | 0 |
Roads and highways | 4 | 11 | 9 | 0 | 0 |
Rural and Inter-Urban Roads and Highways | 7 | 11 | 17 | 0 | 0 |
SME Finance | 0 | 1 | 2 | 0 | 0 |
Sanitation | 1 | 2 | 1 | 0 | 0 |
Secondary education | 0 | 0 | 1 | 0 | 0 |
Sewerage | 2 | 11 | 0 | 0 | 0 |
Solid waste management | 4 | 5 | 0 | 0 | 0 |
Sub-national government administration | 18 | 3 | 23 | 0 | 4 |
Telecommunications | 0 | 0 | 1 | 0 | 0 |
Tertiary education | 0 | 0 | 2 | 0 | 0 |
Thermal Power Generation | 0 | 1 | 0 | 0 | 0 |
Transmission and Distribution of Electricity | 0 | 0 | 4 | 1 | 0 |
Urban Transport | 11 | 15 | 1 | 0 | 0 |
Vocational training | 0 | 4 | 2 | 0 | 0 |
Wastewater Collection and Transportation | 2 | 4 | 0 | 0 | 0 |
Wastewater Treatment and Disposal | 1 | 2 | 0 | 0 | 0 |
Water supply | 4 | 9 | 9 | 0 | 1 |
# We import the Freedom index csv for comparison analysis
f = codecs.open(file_path("fredict"), encoding='iso-8859-1')
free_df = pd.read_csv(f)
free_df[:2]
| | name | index year | overall score | property rights | freedom from corruption | fiscal freedom | government spending | business freedom | labor freedom | monetary freedom | trade freedom | investment freedom | financial freedom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Afghanistan | 2013 | N/A | N/A | 15 | N/A | 83.2 | 59.7 | 75.8 | 69.5 | N/A | 65 | N/A |
1 | Albania | 2013 | 65.2 | 30 | 31 | 92.6 | 75.1 | 81 | 49 | 78.4 | 79.8 | 65 | 70 |
# as with the projects and operations dataset, I restrict the analysis to data from 2000 onward
free_df2 = free_df[free_df['index year']>=2000].copy()
free_df2.columns
Index([name, index year, overall score, property rights, freedom from corruption, fiscal freedom, government spending, business freedom, labor freedom, monetary freedom, trade freedom, investment freedom, financial freedom], dtype=object)
# I extract the BRICS to further observe them
free_df2 = free_df2[free_df2['name'].isin(['China', 'India', 'Russia', 'Brazil', 'South Africa'])]
free_df2[:5]
| | name | index year | overall score | property rights | freedom from corruption | fiscal freedom | government spending | business freedom | labor freedom | monetary freedom | trade freedom | investment freedom | financial freedom |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
20 | Brazil | 2013 | 57.7 | 50 | 38 | 70.3 | 54.8 | 53 | 57.2 | 74.4 | 69.7 | 50 | 60 |
32 | China | 2013 | 51.9 | 20 | 36 | 70.2 | 83.3 | 48 | 62.6 | 71.6 | 72 | 25 | 30 |
70 | India | 2013 | 55.2 | 50 | 31 | 78.3 | 77.9 | 37.3 | 73.6 | 65.3 | 63.6 | 35 | 40 |
131 | Russia | 2013 | 51.1 | 25 | 24 | 86.9 | 54.4 | 69.2 | 52.6 | 66.7 | 77.4 | 25 | 30 |
147 | South Africa | 2013 | 61.8 | 50 | 41 | 70.5 | 69.2 | 74.7 | 55.6 | 75.8 | 76.3 | 45 | 60 |
free_df3 = free_df2[['name','index year','overall score']].copy()
free_df3[:2]
| | name | index year | overall score |
---|---|---|---|
20 | Brazil | 2013 | 57.7 |
32 | China | 2013 | 51.9 |
free_df3['overall score'] = free_df3['overall score'].astype(float)
free_df3.pivot_table(['overall score'], rows=['index year'], cols='name').plot(kind='line', title='freedom Index per BRICS country', figsize=(10,10))
<matplotlib.axes.AxesSubplot at 0x24b8cbb0>
free_df3.pivot_table(['overall score'], rows=['index year'], cols='name').plot(subplots=True, figsize=(8, 8)); plt.legend(loc='best');plt.tight_layout();plt.ylabel('Freedom Index');
# In the graph, we look at the number of projects funded by the World Bank per country per year from 2000 - 2013
brics_nations.unstack('country').fillna(0).plot(subplots=True, figsize=(8, 8),kind='bar'); plt.legend(loc='best');plt.tight_layout()
Freedom versus funding
f = codecs.open(file_path("fredict"), encoding='iso-8859-1')
free_df = pd.read_csv(f)
# because I'm looking at the contribution of funds over a period of time, I want to look at the current
# Freedom Index for these countries to analyze their current state
free_df2 = free_df[free_df['index year']==2013].copy()
#for simplicity, let's ignore those who have not been scored as well
free_df2 = free_df2[free_df2['overall score']!='N/A'].copy()
free_df2.columns
Index([name, index year, overall score, property rights, freedom from corruption, fiscal freedom, government spending, business freedom, labor freedom, monetary freedom, trade freedom, investment freedom, financial freedom], dtype=object)
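One caveat before sorting: because the file marks unscored countries with the text 'N/A', the score columns may be read as strings, and strings sort character by character rather than by value. The sketch below (Python 3, standard library) converts scores to numbers first. The helper names `to_score` and `rank_by_score` are hypothetical, and the "9.5"/"10" pair is a made-up example used only to show the pitfall.

```python
def to_score(raw):
    """Return the score as a float, or None for 'N/A' or blank cells."""
    text = str(raw).strip()
    if text in ("", "N/A"):
        return None
    return float(text)


def rank_by_score(raw_scores):
    """raw_scores: dict of name -> raw text; returns scored names, lowest first."""
    scored = [(name, to_score(raw)) for name, raw in raw_scores.items()]
    scored = [(name, score) for name, score in scored if score is not None]
    return sorted(scored, key=lambda pair: pair[1])


# Made-up values showing why text must not be sorted directly.
print(sorted(["9.5", "10"]))             # text order puts "10" first
print(sorted(["9.5", "10"], key=float))  # numeric order puts "9.5" first

ranked = rank_by_score({"Cuba": "28.5", "Singapore": "88", "North Korea": "1.5", "Afghanistan": "N/A"})
print(ranked)
```

In pandas the equivalent step is to convert the column to a float dtype after dropping the 'N/A' rows and before sorting.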
low_freedom = free_df2.sort(['overall score'], ascending=True)
low_freedom = low_freedom[:10]
low_freedom[['name', 'overall score']]
| | name | overall score |
---|---|---|
118 | North Korea | 1.5 |
38 | Cuba | 28.5 |
184 | Zimbabwe | 28.6 |
180 | Venezuela | 36.1 |
50 | Eritrea | 36.3 |
23 | Burma | 39.2 |
41 | Democratic Republic of Congo | 39.6 |
49 | Equatorial Guinea | 42.3 |
171 | Turkmenistan | 42.6 |
72 | Iran | 43.2 |
high_freedom = free_df2.sort(['overall score'], ascending=False)
high_freedom = high_freedom[:10]
high_freedom[['name', 'overall score']]
| | name | overall score |
---|---|---|
67 | Hong Kong | 89.3 |
142 | Singapore | 88 |
6 | Australia | 82.6 |
114 | New Zealand | 81.4 |
155 | Switzerland | 81 |
27 | Canada | 79.4 |
31 | Chile | 79 |
104 | Mauritius | 76.9 |
42 | Denmark | 76.1 |
176 | United States | 76 |
#high corruption
high_corruption = free_df2.sort(['freedom from corruption'], ascending=True)
high_corruption = high_corruption[:10]
high_corruption[['name', 'freedom from corruption']]
| | name | freedom from corruption |
---|---|---|
14 | Belize | 0 |
118 | North Korea | 10 |
23 | Burma | 15 |
171 | Turkmenistan | 16 |
178 | Uzbekistan | 16 |
65 | Haiti | 18 |
24 | Burundi | 19 |
49 | Equatorial Guinea | 19 |
180 | Venezuela | 19 |
3 | Angola | 20 |
#low corruption
low_corruption = free_df2.sort(['freedom from corruption'], ascending=False)
low_corruption = low_corruption[:10]
low_corruption[['name', 'freedom from corruption']]
| | name | freedom from corruption |
---|---|---|
114 | New Zealand | 95 |
42 | Denmark | 94 |
54 | Finland | 94 |
154 | Sweden | 93 |
142 | Singapore | 92 |
119 | Norway | 90 |
163 | The Netherlands | 89 |
6 | Australia | 88 |
155 | Switzerland | 88 |
27 | Canada | 87 |
numOfproj_by_country[:10]
country
Republic of India                  211
People's Republic of China         205
Federative Republic of Brazil      187
Republic of Indonesia              177
Africa                             167
Socialist Republic of Vietnam      158
Islamic Republic of Pakistan       104
United Mexican States               96
Islamic State of Afghanistan        93
People's Republic of Bangladesh     92
#recall the resultcpi looking at the projects funded, converted to 2013 dollars
#let's sort by country
country_cpi = resultcpi.sort(columns='country', ascending=True)
country_cpi[['country', 'totalamt', 'grantamt', 'year']][:2]
| | country | totalamt | grantamt | year |
---|---|---|---|---|
14536 | NaN | 10.600000 | 0 | 0 |
14474 | NaN | 57.299999 | 0 | 0 |
# There are a few problems with this data. First, some rows have no country (dropped below), and continents or regions appear in countryname (the filter for those is left commented out):
country_cpi= country_cpi.dropna()
#country_cpi= country_cpi[((country_cpi.country !='Africa')
# &(country_cpi.country !='Central America')
# &(country_cpi.country !='Latin America')
# &(country_cpi.country !='Europe')
# &(country_cpi.country !='East Asia and Pacific')
# &(country_cpi.country !='Europe and Central Asia')
# &(country_cpi.country !='World')
# &(country_cpi.country !='Asia')
# &(country_cpi.country !='Middle East and North Africa')
# &(country_cpi.country !='Africa')
# &(country_cpi.country !='South Eastern Europe and Balkans'))]
# Because the naming conventions of the Freedom Index and the World Bank differ, we have to manually enter the names of the countries we want, in order to compare their Freedom Index with their World Bank funding.
low_Freedom_list= ['Belize','Turkmenistan','Republic of Zimbabwe','Republic of Uzbekistan',
'Republic of Haiti', 'Republic of Burundi', 'Republic of Equatorial Guinea',
'People\'s Republic of Angola', 'Republica Bolivariana de Venezuela']
high_Freedom_list= ['Kingdom of Norway', 'New Zealand', 'Kingdom of Denmark', 'Republic of Finland',
'Republic of Sweden', 'Kingdom of The Netherlands', 'Common of Australia']
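Typing the long World Bank names by hand is error-prone, so one option is to keep an explicit alias table and report any names that still fail to match. The sketch below (Python 3, standard library) shows the idea. The table `WB_TO_INDEX_NAME` and both helper functions are hypothetical, and the few aliases listed are taken from names that appear in this notebook.

```python
# Hypothetical alias table: World Bank long name -> Freedom Index short name.
WB_TO_INDEX_NAME = {
    "Republic of Zimbabwe": "Zimbabwe",
    "Republic of Haiti": "Haiti",
    "Republica Bolivariana de Venezuela": "Venezuela",
    "Kingdom of Denmark": "Denmark",
}


def to_index_name(wb_name, aliases=WB_TO_INDEX_NAME):
    """Translate a World Bank name; names without an alias pass through unchanged."""
    return aliases.get(wb_name, wb_name)


def unmatched(wb_names, index_names, aliases=WB_TO_INDEX_NAME):
    """Return the World Bank names that still have no Freedom Index counterpart."""
    return sorted(
        name for name in set(wb_names)
        if to_index_name(name, aliases) not in index_names
    )


print(to_index_name("Republic of Haiti"))
print(unmatched(["Republic of Haiti", "Republic of Burundi", "Belize"],
                {"Haiti", "Belize", "Burundi"}))
```

Listing the unmatched names makes gaps in the alias table visible, rather than letting those countries silently drop out of the comparison.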
# now we can see how many projects were committed to each country in low_Freedom_list
low_Freedom_nations = country_cpi[country_cpi['country'].isin(low_Freedom_list)].groupby(['country']).size()
low_Freedom_nations
country
Belize                                18
People's Republic of Angola           29
Republic of Burundi                   90
Republic of Equatorial Guinea         11
Republic of Haiti                     86
Republic of Uzbekistan                36
Republic of Zimbabwe                  54
Republica Bolivariana de Venezuela    53
Turkmenistan                           9
low_Freedom_nations.plot(kind='bar', title='Lending to Countries with low Freedom Index'); plt.tight_layout()
From this point, we are interested in expanding on the previous analysis by retrieving information from Wikipedia and other sources.
import pandas as pd
import wikipydia as wk
import mwparserfromhell
from wikitools import wiki
from wikitools import api
from wikitools import category
from wikitools import page
import itertools
import re
wikisite = "http://en.wikipedia.org/w/api.php"
wikiObject = wiki.Wiki(wikisite)
projectsAPI = pd.read_csv('../data/projects_operations_api.csv')
wikipediadf = pd.read_csv('../data/matchcountries.csv')
# some cleaning on the datasets
wikipediadf.index =wikipediadf['countryname']
projectsAPI['countryname'] = [str(country).split(";")[0] for country in projectsAPI['countryname']]
#print matchNames.columns
#print projectsAPI.columns
projects = pd.merge(projectsAPI,wikipediadf, on='countryname', how = 'left')
projects = projects[projects['countryname'].map(type) != type(0.0)]
projectsAPI = projectsAPI[projectsAPI['countryname'].map(type) != type(0.0)]
projects['totalamt'] = projects['totalamt'].str.replace(';','')
projects['totalamt'] = projects['totalamt'].astype('float32')
print projects.columns
projects['year'] = [str(x)[0:4] for x in projects['boardapprovaldate']]
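# For rows with no board approval date, fall back to the year of the closing date (only the 'year' column is updated)
missing_year = projects['year'] == 'nan'
projects['year'] = projects['year'].where(~missing_year, projects['closingdate'].map(str).str[:4])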
Index([id, regionname, countryname, prodline, lendinginstr, lendinginstrtype, envassesmentcategorycode, supplementprojectflg, productlinetype, projectstatusdisplay, status, project_name, boardapprovaldate, board_approval_month, closingdate, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, url, projectdoc , majorsector_percent , sector1, sector2, sector3, sector4, sector5, sector, mjsector1, mjsector2, mjsector3, mjsector4, mjsector5, mjsector, theme1, theme2, theme3, theme4, theme5, theme , goal, financier, mjtheme1name, mjtheme2name, mjtheme3name, mjtheme4name, mjtheme5name, location, wikiname, type, checktype, mapname], dtype=object)
import matplotlib.pyplot as plt
import matplotlib.colors as col
def color_variant(hex_color, brightness_offset=1):
    # shift every channel of a "#rrggbb" color by brightness_offset
    if len(hex_color) != 7:
        raise Exception("Passed %s into color_variant(), needs to be in #87c95f format." % hex_color)
    rgb_hex = [hex_color[x:x+2] for x in [1, 3, 5]]
    new_rgb_int = [int(hex_value, 16) + brightness_offset for hex_value in rgb_hex]
    new_rgb_int = [min([255, max([0, i])]) for i in new_rgb_int] # make sure new values are between 0 and 255
    # hex() produces "0x88", we want just "88"
    hexcolor = "#"
    for i in new_rgb_int:
        if(i<16):
            hexcolor+="0"+str(hex(i)[2:])
        else:
            hexcolor+=str(hex(i)[2:])
    return hexcolor
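To see what the color helper does, here is a compact, self-contained sketch of the same idea. `lighten` is a hypothetical stand-in written for this example (it is not the notebook's `color_variant`), but it follows the same rule: add an offset to each RGB channel and clamp the result to 0..255. Starting from one of the base colors used later in the notebook, it builds a short ramp of progressively lighter shades.

```python
def lighten(hex_color, offset=20):
    """Add offset to each RGB channel of '#rrggbb', clamped to 0..255."""
    channels = [int(hex_color[i:i + 2], 16) + offset for i in (1, 3, 5)]
    channels = [min(255, max(0, c)) for c in channels]
    return "#" + "".join("%02x" % c for c in channels)


shade = "#C73F2A"
ramp = []
for _ in range(3):
    ramp.append(shade)
    shade = lighten(shade)
print(ramp)
```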
def drawBarCharReference(Color,targetlist, field, title, labels):
    fig = plt.figure(num=None, figsize=(24, 8), dpi=700, facecolor='w', edgecolor='k')
    ax = fig.add_subplot(111)
    ColorBase = Color
    changeQuantile = True
    changeRange = 0.10
    i = 0
    # bars are drawn in ascending order of `field`; the shade changes every 10% of the bars
    for x in targetlist.sort(columns=field,ascending=True).index:
        if i/float(len(targetlist.index)) > changeRange:
            ColorBase = color_variant(ColorBase,20)
            changeRange = changeRange + 0.10
        # remember the shade so the map can reuse it
        targetlist['color'][x] = ColorBase
        #print (type(targetlist[field][x]))
        # `col` is matplotlib.colors, imported above
        ax.bar(i,float(targetlist[field][x]),1,color=col.colorConverter.to_rgb(ColorBase))
        i+=1
    ax.set_xticklabels( ([x[1] for x in targetlist.sort(columns=field,ascending=True).index]) )
    #plt.subplots_adjust(bottom=1, left=.01, right=.99, top=.90, hspace=.35)
    plt.xticks(np.arange(0.5, i+1, 1))
    plt.setp(ax.get_xticklabels(), fontsize=9, rotation='vertical')
    plt.setp(ax.get_yticklabels(), fontsize=10)
    plt.title(title)
    plt.xlabel(labels[0],fontsize=18)
    plt.ylabel(labels[1],fontsize=18)
    plt.show()
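The shading rule inside the bar-chart function is easy to miss, so here it is in isolation. This is a minimal sketch with a hypothetical helper name, `shade_steps`: for `n` bars sorted in ascending order, it returns how many times the base color has been lightened by the time each bar is drawn, using the same threshold logic as the loop above (a new shade each time the bar's relative position passes the next 10% mark).

```python
def shade_steps(n):
    """Number of lightening steps applied to each of n sorted bars."""
    steps = []
    step = 0
    change_range = 0.10
    for i in range(n):
        if i / float(n) > change_range:
            step += 1              # corresponds to one more color_variant(ColorBase, 20) call
            change_range += 0.10
        steps.append(step)
    return steps


print(shade_steps(10))
```

With many more bars than ten, several neighbouring bars share each shade, which is what produces the banded look of the charts.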
# http://www.geophysique.be/2013/02/12/matplotlib-basemap-tutorial-10-shapefiles-unleached-continued/
#
# BaseMap example by geophysique.be
# tutorial 10
import os
import inspect
import numpy as np
import matplotlib.pyplot as plt
from itertools import islice, izip
from mpl_toolkits.basemap import Basemap
def zip_filter_by_state(records, shapes, included_states=None):
    # by default (included_states is None), no filtering
    # included_states is a list of the values to keep, matched against record[1]
    for (record, state) in izip(records, shapes):
        if included_states is None or record[1] in included_states:
            yield (record, state)
def draw_global_map(colors, indexlist, titles):
    ### PARAMETERS FOR MATPLOTLIB :
    import matplotlib as mpl
    mpl.rcParams['font.size'] = 14.
    mpl.rcParams['font.family'] = 'Serif'
    mpl.rcParams['axes.labelsize'] = 8.
    mpl.rcParams['xtick.labelsize'] = 40.
    mpl.rcParams['ytick.labelsize'] = 20.
    fig = plt.figure(figsize=(11.7,8.3))
    #Custom adjust of the subplots
    plt.subplots_adjust(left=0.05,right=0.95,top=0.90,bottom=0.05,wspace=0.15,hspace=0.05)
    ax = plt.subplot(111)
    # Bounding box of the world map (longitudes x1..x2, latitudes y1..y2)
    x1 = -179.
    x2 = 179.
    y1 = -60.
    y2 = 80.
    i=0
    #colors = ['#8C040A','#9A040C','#A8050E','#C40813','#D20915','#DF0A17','#ED0C19','#FC0D1B']
    m = Basemap(resolution='i',projection='merc', llcrnrlat=y1,urcrnrlat=y2,llcrnrlon=x1,urcrnrlon=x2,lat_ts=(y1+y2)/2)
    m.drawcountries(linewidth=0.5)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(y1,y2,20.),labels=[1,0,0,0],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw parallels
    m.drawmeridians(np.arange(x1,x2,20.),labels=[0,0,0,1],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw meridians
    from matplotlib.collections import LineCollection
    from matplotlib import cm
    import shapefile
    basemap_data_dir = os.path.join(os.path.dirname(inspect.getfile(Basemap)), "data")
    # this is my git clone of https://github.com/matplotlib/basemap --> these files will be in the PiCloud basemap_data_dir
    if os.path.exists(os.path.join(basemap_data_dir,"UScounties.shp")):
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    else:
        # put in your path
        #shpf = shapefile.Reader("/Users/raymondyee/Dropbox/WwoD13/tl_2012_us_county")
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    shapes = shpf.shapes()
    records = shpf.records()
    #print cm.colors.ColorConverter.to_rgba('#eeefff')
    #random_number = 38*145*155
    # first pass: draw every country with the base colors
    for record, shape in zip(records, shapes):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            # shape.parts holds the start index of each polygon part
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        lines.set_facecolors(colors[0])
        lines.set_edgecolors(colors[1])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    # second pass: recolor the countries that appear in indexlist
    for record, shape in zip_filter_by_state(records, shapes, [x[1] for x in indexlist.index]):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        i=i+1
        x_color=None
        # look the country up in the frame passed in, not in the global heatmapfounding
        for w in indexlist.index:
            if record[1] in w[1]:
                x_color = w
                break
        lines.set_facecolors(indexlist['color'][x_color])
        lines.set_edgecolors(indexlist['color'][x_color])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    plt.title(titles[0])
    plt.savefig('tutorial10.png',dpi=300)
    plt.show()
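Both drawing loops in the map function split each shape into its polygon parts before building a `LineCollection`. The sketch below shows that step on plain lists, without Basemap or a shapefile. `split_parts` is a hypothetical helper and the sample coordinates are made up; the only assumption carried over from the code above is that `parts` lists the start index of each part within `points`.

```python
def split_parts(points, parts):
    """Split a flat list of points into one list per polygon part.

    parts holds the start index of each part within points.
    """
    bounds = list(parts) + [len(points)]
    return [points[start:end] for start, end in zip(bounds[:-1], bounds[1:])]


# Invented example: a shape made of two closed rings stored back to back.
points = [(0, 0), (1, 0), (1, 1), (0, 0), (5, 5), (6, 5), (5, 5)]
print(split_parts(points, [0, 4]))
```

Appending the total length as a final bound treats the single-part and multi-part cases the same way, so no special case for `len(parts) == 1` is needed.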
#draw_global_map(['#3C989E','#424242'],heatmapfounding, ['Total World Bank Lending Commitments Accumulated 2001-2013'])
heatmapfounding = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','totalamt','year'])
heatmapfounding = pd.DataFrame(heatmapfounding[heatmapfounding.year>='2001'], columns=['wikiname','mapname','totalamt','year'])
heatmapfounding = heatmapfounding.groupby(['wikiname','mapname']).sum()
heatmapfounding['color'] = pd.Series(["hola" for x in heatmapfounding.index], index=heatmapfounding.index)
drawBarCharReference( '#C73F2A',heatmapfounding, "totalamt","Total World Bank Lending Commitments Accumulated 2001-2013",['Country','US$'])
draw_global_map(['#ffffff','#000000'],heatmapfounding, ['Total World Bank Lending Commitments Accumulated 2001-2013'])
def cleanFloatnumber(x):
    # return the first number found in a wiki infobox value, or None
    if type(x) is float:
        return float(x)
    elif type(x) is str:
        if len(x) ==0:
            return None
        x=re.sub('<!--.*?-->','',x)
        x=re.sub('<*?>.*?<*?>','',x)
        x=x.strip()
        delimiterRegex = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')
        Numbers = re.findall(delimiterRegex,x)
        if len(Numbers)>0:
            return float(Numbers[0])
        else:
            return None
    else:
        return None
def cleanIntNumber(x):
    # same as cleanFloatnumber, but strips thousands separators and keeps only digits
    if type(x) is float:
        return float(x)
    elif type(x) is str:
        if len(x) ==0:
            return None
        x=re.sub('<!--.*?-->','',x)
        x=re.sub('<*?>.*?<*?>','',x)
        x=re.sub(',','',x)
        x=x.strip()
        delimiterRegex = re.compile(r'[0-9]+')
        Numbers = re.findall(delimiterRegex,x)
        if len(Numbers)>0:
            return float(Numbers[0])
        else:
            return None
    else:
        return None
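To make the intent of the two cleaning functions concrete, here is a small self-contained sketch of the same idea. It is not the notebook's implementation: `first_number` is a hypothetical name, the `<ref>` pattern is a simplification chosen for this example, and the sample strings are invented rather than taken from any Wikipedia page. It removes HTML comments and inline references, drops thousands separators, and returns the first number it finds.

```python
import re

# Same number pattern as in cleanFloatnumber above.
NUMBER = re.compile(r'[-+]?[0-9]*\.?[0-9]+(?:[eE][-+]?[0-9]+)?')


def first_number(text):
    """Return the first number in an infobox value, or None if there is none."""
    text = re.sub(r'<!--.*?-->', '', text)            # drop HTML comments
    text = re.sub(r'<ref[^>]*>.*?</ref>', '', text)   # drop inline references (simplified)
    text = text.replace(',', '')                      # "1,234" -> "1234"
    match = NUMBER.search(text)
    return float(match.group(0)) if match else None


# Invented sample values.
print(first_number("0.741<ref>some report</ref>"))
print(first_number("<!-- census -->1,234,567"))
print(first_number("n/a"))
```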
def get_infobox_from_wikipedia(countryname):
    # returns (hdi, gini, GDP, GDP_nominal_per_capita, population) read from the "Infobox country" template
    #print "Checking: "+str(countryname)+"__"
    country_found = False
    hdi = None
    gini = None
    GDP = None
    GDP_nominal_per_capita = None
    population = None
    if str(countryname).strip() == "" or countryname is None or str(countryname).strip()=='nan':
        return hdi,gini,GDP,GDP_nominal_per_capita, population
    try:
        wikipage = page.Page(wikiObject,title=countryname)
    except Exception as inst:
        print "No results from Wikipedia: "+str(countryname)
        return hdi,gini,GDP,GDP_nominal_per_capita, population
    wikiraw = wikipage.getWikiText()
    wikiraw = wikiraw.decode('UTF-8')
    parsedWikiText = mwparserfromhell.parse(wikiraw)
    for x in parsedWikiText.nodes:
        if "template" in str(type(x)) and "Infobox country" in str(x.name):
            country_found = True
            if x.has_param('population_census'):
                population = cleanIntNumber(str(x.get('population_census').value))
            # fall back to the estimate when there is no usable census figure
            if population is None:
                if x.has_param('population_estimate'):
                    population = cleanIntNumber(str(x.get('population_estimate').value))
            if x.has_param('HDI'):
                hdi = cleanFloatnumber(str(x.get('HDI').value))
            if x.has_param('Gini'):
                gini = cleanFloatnumber(str(x.get('Gini').value))
            if x.has_param('GDP'):
                GDP = x.get('GDP').value
            if x.has_param('GDP_nominal_per_capita'):
                GDP_nominal_per_capita = str(x.get('GDP_nominal_per_capita').value)
            break
    if country_found == False:
        print "No Infobox: "+str(countryname)
    return hdi,gini,GDP,GDP_nominal_per_capita,population
wikipediadf["HDI"], wikipediadf["gini"],wikipediadf['GDP'],wikipediadf['GDP_nominal_per_capita'],wikipediadf['population'] = zip(*wikipediadf['wikiname'].map(get_infobox_from_wikipedia))
#pp = pd.DataFrame(zip(*wikipediadf[wikipediadf.wikiname == "Guinea"]['wikiname'].map(get_infobox_from_wikipedia)))
#print pp[:]
# It was not possible to process this data from wikipedia, so I decided to filter it (Ignacio)
for i in wikipediadf[wikipediadf.type == 'Country'].index:
    typeFound = type(wikipediadf['population'][i])
    if typeFound is not float and typeFound is not None:
        print "deleted"
        wikipediadf=wikipediadf.drop([i])
        break
for i in wikipediadf[wikipediadf.type == 'Country'].index:
    typeFound = type(wikipediadf['GDP_nominal_per_capita'][i])
    if typeFound is not float and typeFound is not None:
        print "deleted"
        wikipediadf=wikipediadf.drop([i])
        break
deleted deleted
projects = pd.merge(projectsAPI,wikipediadf, on='countryname', how = 'left')
projects = projects[projects['countryname'].map(type) != type(0.0)]
projectsAPI = projectsAPI[projectsAPI['countryname'].map(type) != type(0.0)]
projects['totalamt'] = projects['totalamt'].str.replace(';','')
projects['totalamt'] = projects['totalamt'].astype('float32')
print projects.columns
projects['year'] = [str(x)[0:4] for x in projects['boardapprovaldate']]
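# fall back to the closing date when the approval date is missing (assign to the 'year' column only)
projects.loc[projects.year == 'nan', 'year'] = [str(x)[0:4] for x in projects[projects.year == 'nan']['closingdate']]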
Index([id, regionname, countryname, prodline, lendinginstr, lendinginstrtype, envassesmentcategorycode, supplementprojectflg, productlinetype, projectstatusdisplay, status, project_name, boardapprovaldate, board_approval_month, closingdate, lendprojectcost, ibrdcommamt, idacommamt, totalamt, grantamt, borrower, impagency, url, projectdoc , majorsector_percent , sector1, sector2, sector3, sector4, sector5, sector, mjsector1, mjsector2, mjsector3, mjsector4, mjsector5, mjsector, theme1, theme2, theme3, theme4, theme5, theme , goal, financier, mjtheme1name, mjtheme2name, mjtheme3name, mjtheme4name, mjtheme5name, location, wikiname, type, checktype, mapname, HDI, gini, GDP, GDP_nominal_per_capita, population], dtype=object)
def drawBarCharReference2(Color,targetlist, field, title,labels):
    # same as drawBarCharReference, but the tick labels come from the 'mapname' column
    fig = plt.figure(num=None, figsize=(24, 8), dpi=700, facecolor='w', edgecolor='k')
    ax = fig.add_subplot(111)
    ColorBase = Color
    changeQuantile = True
    changeRange = 0.10
    i = 0
    for x in targetlist.sort(columns=field,ascending=True).index:
        if i/float(len(targetlist.index)) > changeRange:
            ColorBase = color_variant(ColorBase,20)
            changeRange = changeRange + 0.10
        targetlist['color'][x] = ColorBase
        #print (type(targetlist[field][x]))
        # `col` is matplotlib.colors, imported above
        ax.bar(i,float(targetlist[field][x]),1,color=col.colorConverter.to_rgb(ColorBase))
        i+=1
    ax.set_xticklabels( ([targetlist['mapname'][x] for x in targetlist.sort(columns=field,ascending=True).index]) )
    #plt.subplots_adjust(bottom=1, left=.01, right=.99, top=.90, hspace=.35)
    plt.xticks(np.arange(0.5, i+1, 1))
    plt.setp(ax.get_xticklabels(), fontsize=9, rotation='vertical')
    plt.setp(ax.get_yticklabels(), fontsize=10)
    plt.title(title)
    plt.xlabel(labels[0], fontsize=18)
    plt.ylabel(labels[1], fontsize=18)
    plt.show()
def zip_filter_by_state2(records, shapes, included_states=None):
    # by default (included_states is None), no filtering
    # included_states is a list of the values to keep, matched against record[1]
    for (record, state) in izip(records, shapes):
        if included_states is None or record[1] in included_states:
            yield (record, state)
def draw_global_map2(colors, indexlist, titles):
    # same as draw_global_map, but countries are matched through the 'mapname' column
    ### PARAMETERS FOR MATPLOTLIB :
    import matplotlib as mpl
    mpl.rcParams['font.size'] = 14.
    mpl.rcParams['font.family'] = 'Serif'
    mpl.rcParams['axes.labelsize'] = 8.
    mpl.rcParams['xtick.labelsize'] = 40.
    mpl.rcParams['ytick.labelsize'] = 20.
    fig = plt.figure(figsize=(11.7,8.3))
    #Custom adjust of the subplots
    plt.subplots_adjust(left=0.05,right=0.95,top=0.90,bottom=0.05,wspace=0.15,hspace=0.05)
    ax = plt.subplot(111)
    # Bounding box of the world map (longitudes x1..x2, latitudes y1..y2)
    x1 = -179.
    x2 = 179.
    y1 = -60.
    y2 = 80.
    i=0
    #colors = ['#8C040A','#9A040C','#A8050E','#C40813','#D20915','#DF0A17','#ED0C19','#FC0D1B']
    m = Basemap(resolution='i',projection='merc', llcrnrlat=y1,urcrnrlat=y2,llcrnrlon=x1,urcrnrlon=x2,lat_ts=(y1+y2)/2)
    m.drawcountries(linewidth=0.5)
    m.drawcoastlines(linewidth=0.5)
    m.drawparallels(np.arange(y1,y2,20.),labels=[1,0,0,0],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw parallels
    m.drawmeridians(np.arange(x1,x2,20.),labels=[0,0,0,1],color='black',dashes=[1,0],labelstyle='+/-',linewidth=0.2) # draw meridians
    from matplotlib.collections import LineCollection
    from matplotlib import cm
    import shapefile
    basemap_data_dir = os.path.join(os.path.dirname(inspect.getfile(Basemap)), "data")
    # this is my git clone of https://github.com/matplotlib/basemap --> these files will be in the PiCloud basemap_data_dir
    if os.path.exists(os.path.join(basemap_data_dir,"UScounties.shp")):
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    else:
        # put in your path
        #shpf = shapefile.Reader("/Users/raymondyee/Dropbox/WwoD13/tl_2012_us_county")
        shpf = shapefile.Reader("../data/world_country_admin_boundary_shapefile_with_fips_codes.shp")
    shapes = shpf.shapes()
    records = shpf.records()
    #print cm.colors.ColorConverter.to_rgba('#eeefff')
    #random_number = 38*145*155
    # first pass: draw every country with the base colors
    for record, shape in zip(records, shapes):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            # shape.parts holds the start index of each polygon part
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        lines.set_facecolors(colors[0])
        lines.set_edgecolors(colors[1])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    # second pass: recolor the countries whose mapname appears in indexlist
    for record, shape in zip_filter_by_state2(records, shapes, [indexlist['mapname'][x] for x in indexlist.index]):
        lons,lats = zip(*shape.points)
        data = np.array(m(lons, lats)).T
        if len(shape.parts) == 1:
            segs = [data,]
        else:
            segs = []
            for i in range(1,len(shape.parts)):
                index = shape.parts[i-1]
                index2 = shape.parts[i]
                segs.append(data[index:index2])
            segs.append(data[index2:])
        lines = LineCollection(segs,antialiaseds=(1,))
        #cm.jet(random_number)
        i=i+1
        x_color=None
        for (w,x) in [(indexlist['mapname'][x],x) for x in indexlist.index]:
            if type(w) is str and record[1] in w:
                x_color = x
                break
        lines.set_facecolors(indexlist['color'][x_color])
        lines.set_edgecolors(indexlist['color'][x_color])
        lines.set_linewidth(0.1)
        ax.add_collection(lines)
    plt.title(titles[0])
    plt.savefig('tutorial10.png',dpi=300)
    plt.show()
The Human Development Index (HDI) is a composite statistic of life expectancy, education, and income indices, used to rank countries into four tiers of human development. It was created by economist Mahbub ul Haq, followed by economist Amartya Sen, in 1990,[1] and is published by the United Nations Development Programme.[2]
Starting with the 2010 Human Development Report (published on 4 November 2010 and updated on 10 June 2011), the HDI combines three dimensions: a long and healthy life (life expectancy at birth), education (mean and expected years of schooling), and a decent standard of living (GNI per capita, PPP US$).
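As a rough illustration of how the dimensions are combined: since the 2010 report, the HDI is the geometric mean of the three normalized dimension indices. The sketch below assumes each index has already been scaled to the 0..1 range; the function name `hdi` and the sample values are made up for this example and are not taken from any report.

```python
def hdi(health_index, education_index, income_index):
    """Geometric mean of three dimension indices, each already scaled to 0..1."""
    return (health_index * education_index * income_index) ** (1.0 / 3.0)


# Invented sample values, for illustration only.
print(round(hdi(0.8, 0.7, 0.6), 3))
```

Because it is a geometric rather than an arithmetic mean, a low score in one dimension is not fully compensated by high scores in the others.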
#wikipediadf["HDI"], #
#wikipediadf["gini"],
#wikipediadf['GDP'],
#wikipediadf['GDP_nominal_per_capita'],
#wikipediadf['population']
heatmapHDI = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','HDI'])
#heatmapHDI=heatmapHDI.reindex(index=['wikiname','wikiname'])
#heatmapHDI = heatmapHDI.groupby(['wikiname','mapname'])
#heatmapHDI = pd.DataFrame(heatmapHDI)
heatmapHDI = heatmapHDI.fillna(0)
heatmapHDI = heatmapHDI.drop_duplicates()
heatmapHDI['color'] = pd.Series(["hola" for x in heatmapHDI.index], index=heatmapHDI.index)
drawBarCharReference2( '#425910',heatmapHDI, 'HDI',"Human Development Index (Wikipedia)", ['Country','HDI Index'])
draw_global_map2(['#ffffff','#000000'],heatmapHDI, ['Human Development Index Map (Wikipedia)'])
heatmapGini = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','gini'])
#heatmapHDI=heatmapHDI.reindex(index=['wikiname','wikiname'])
#heatmapHDI = heatmapHDI.groupby(['wikiname','mapname'])
#heatmapHDI = pd.DataFrame(heatmapHDI)
heatmapGini = heatmapGini.fillna(0)
heatmapGini = heatmapGini.drop_duplicates()
heatmapGini['color'] = pd.Series(["hola" for x in heatmapGini.index], index=heatmapGini.index)
drawBarCharReference2( '#0E1B5A',heatmapGini, 'gini',"Inequality Index (Wikipedia)", ['Country','Gini Index'])
draw_global_map2(['#ffffff','#000000'],heatmapGini, ['Inequality Index (Wikipedia)'])
heatmapPopulation = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','population'])
#heatmapHDI=heatmapHDI.reindex(index=['wikiname','wikiname'])
#heatmapHDI = heatmapHDI.groupby(['wikiname','mapname'])
#heatmapHDI = pd.DataFrame(heatmapHDI)
heatmapPopulation = heatmapPopulation.fillna(0)
heatmapPopulation = heatmapPopulation.drop_duplicates()
heatmapPopulation['color'] = pd.Series(["hola" for x in heatmapPopulation.index], index=heatmapPopulation.index)
drawBarCharReference2( '#3D0A0E',heatmapPopulation, 'population',"Population per country (Wikipedia)",['Country','Population'])
draw_global_map2(['#ffffff','#000000'],heatmapPopulation, ['Population per country Map (Wikipedia)'])
heatmapfoundingPercapita = pd.DataFrame(projects[projects.type == 'Country'], columns=['wikiname','mapname','totalamt','year','population'])
heatmapfoundingPercapita = pd.DataFrame(heatmapfoundingPercapita[heatmapfoundingPercapita.year>='2001'], columns=['wikiname','mapname','totalamt','year','population'])
heatmapfoundingPercapita = heatmapfoundingPercapita.groupby(['wikiname','mapname']).sum()
# note: the groupby(...).sum() above also sums 'population' once per project row,
# so the divisor below is the population multiplied by the number of projects
for x in heatmapfoundingPercapita.sort(columns='totalamt',ascending=True).index:
    #print ""+str(heatmapfoundingPercapita['totalamt'][x]/heatmapfoundingPercapita['population'][x])
    heatmapfoundingPercapita['totalamt'][x]=heatmapfoundingPercapita['totalamt'][x]/heatmapfoundingPercapita['population'][x]
    ## cap outliers at 400 so they do not dominate the chart
    if heatmapfoundingPercapita['totalamt'][x]>=400:
        heatmapfoundingPercapita['totalamt'][x] =400
heatmapfoundingPercapita['color'] = pd.Series(["hola" for x in heatmapfoundingPercapita.index], index=heatmapfoundingPercapita.index)
drawBarCharReference( '#7A1138',heatmapfoundingPercapita, "totalamt","World Bank Lending per capita, commitments Accumulated 2001-2013",['Country','US$'])
draw_global_map(['#ffffff','#000000'],heatmapfoundingPercapita, ['World Bank Lending per capita, commitments Accumulated 2001-2013'])
Conclusion