#!/usr/bin/env python
# coding: utf-8

# # Nigerian Presidential Election Result Analysis - 2015
#
# Author: Umar Yusuf
# Blog post: http://umar-yusuf.blogspot.com.ng/2016/09/Analysis-of-Nigerian-Presidential-Election-Result-2015-using-Python-Programming-Language.html
#
# ### Data Source
# The data was gathered from the official website of the Independent National Electoral Commission (INEC).
#
# The data (in .csv format) used is available for download here. Download it and save it in the same location as this notebook.
#
# I am going to use the pandas package, with its embedded matplotlib plotting, to analyse the 2015 Presidential Election Result.
#
# # Introduction
# Nigeria has 36 states and 1 federal capital territory. The 2015 presidential election was held in all 37 territories of the country.
#
# Fourteen (14) political parties, each fielding one candidate, participated in the 2015 presidential election. The parties are as follows: AA, ACPN, AD, ADC, APA, APC, CPP, HOPE, KOWA, NCP, PDP, PPN, UDP and UPP. See the result table below:-
#
# Even though the battle was between the two biggest parties (APC and PDP), the dataset we will explore contains all the parties.
#
# The dataset contains the numeric values by state for:-
# 1~ Votes scored by each political party
# 2~ Number_of_Registered_Voters
# 3~ Number_of_Accredited_Voters
# 4~ Number_of_Valid_Votes
# 5~ Number_of_Rejected_Votes
# 6~ Total_Votes_Cast
# 7~ Population
# 8~ Population_Rank
# 9~ Number_of_LGA
#
# #### I will attempt to answer the following questions through this analysis:-
# a) What are the minimum and maximum votes for each party?
# b) Is winning in the top states by voter turnout, registered voters, total votes cast, and population related to winning the general election?
# c) Is there any odd case where the "Population" of a state is lower than its "Number_of_Registered_Voters" and "Number_of_Accredited_Voters"?
# d) Which state voted most for the lowest-ranked party?
# #### Import libraries and load in the dataset

# In[1]:


# Let's import the packages
import pandas as pd

# Let's enable our plots to display inline within the notebook
get_ipython().run_line_magic('matplotlib', 'inline')


# In[2]:


inec_table = pd.read_csv("INEC 2015 Presidential Election Results.csv")
inec_table.head()


# ### Statistical summary of all the columns
#
# This will show us the minimum and maximum votes for each party.

# In[3]:


inec_table.describe()


# ### Turnout of Voters for the election
# We can see the voter turnout ratio for the election by dividing "Total_Votes_Cast" by "Number_of_Reg_Voters" for each state

# In[4]:


inec_table["Voters Turnout"] = inec_table["Total_Votes_Cast"] / inec_table["Number_of_Reg_Voters"]
inec_table[["State", "Voters Turnout"]][:11]


# In[5]:


inec_table.plot(x="State", y='Voters Turnout', figsize=(20, 5), kind="line", grid=True)


# ### Five top states with the highest "Number_of_Reg_Voters"

# In[6]:


inec_table.sort_values("Number_of_Reg_Voters", ascending=False)[:5]


# #### Which party got the highest vote among the top states with the highest "Number_of_Reg_Voters"

# In[7]:


# The fourteen party columns, listed once and reused in the bar plots below
party_columns = ['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP']

win1 = inec_table.sort_values("Number_of_Reg_Voters", ascending=False)[:5]
win1.plot(x="State", y=party_columns, figsize=(20, 5), kind="bar", grid=True)


# ### Five top states with the highest number of "Total_Votes_Cast"

# In[8]:


inec_table.sort_values("Total_Votes_Cast", ascending=False)[:5]


# #### Which party got the highest vote among the top states with the highest "Total_Votes_Cast"

# In[9]:


win2 = inec_table.sort_values("Total_Votes_Cast", ascending=False)[:5]
win2.plot(x="State", y=party_columns, figsize=(20, 5), kind="bar", grid=True)


# ### Five top states with the highest "Population"

# In[10]:


inec_table.sort_values("Population", ascending=False)[:5]


# #### Which party got the highest vote among the top states with the highest "Population"

# In[11]:


win3 = inec_table.sort_values("Population", ascending=False)[:5]
win3.plot(x="State", y=party_columns, figsize=(20, 5), kind="bar", grid=True)


# ### Five top states with the highest "Number_of_LGA"

# In[12]:


inec_table.sort_values("Number_of_LGA", ascending=False)[:5]


# #### Which party got the highest vote among the top states with the highest "Number_of_LGA"

# In[13]:


win4 = inec_table.sort_values("Number_of_LGA", ascending=False)[:5]
win4.plot(x="State", y=party_columns, figsize=(20, 5), kind="bar", grid=True)


# ### Let's extract the following columns into a separate dataframe
# 1~ Number_of_Registered_Voters
# 2~ Number_of_Accredited_Voters
# 3~ Number_of_Valid_Votes
# 4~ Number_of_Rejected_Votes
# 5~ Total_Votes_Cast
# 6~ Population
# 7~ Population_Rank
# 8~ Number_of_LGA

# In[14]:


# 'Number_of_Rejected_Votes' is selected only once, so the new dataframe has no duplicate column
voters_table = inec_table[['State', 'Number_of_Reg_Voters', 'Number_of_Accr_Voters', 'Number_of_Valid_Votes',
                           'Number_of_Rejected_Votes', 'Total_Votes_Cast', 'Population', 'Population_Rank',
                           'Number_of_LGA']]
voters_table


# ### Summary statistics of voters_table

# In[15]:


voters_table.describe()


# ### Graph "Number_of_Registered_Voters" Vs "Number_of_Accredited_Voters" Vs "Population"
# Naturally, "Number_of_Registered_Voters" should be higher than "Number_of_Accredited_Voters". Likewise, "Population" should be higher than both "Number_of_Registered_Voters" and "Number_of_Accredited_Voters". Let's see if there is an odd case in any particular state.

# In[16]:


voters_table.plot(x='State', y=['Number_of_Reg_Voters', 'Number_of_Accr_Voters', 'Population'], kind='bar',
                  figsize=(20, 5), title='Bar Plot', grid=True)


# ### Let's extract the party columns into a separate dataframe

# In[17]:


parties_table = inec_table[['State', 'AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP',
                            'PPN', 'UDP', 'UPP']]
parties_table


# ### Summary statistics of parties_table

# In[18]:


parties_table.describe()


# ### Sum of votes gotten by each party

# In[19]:


vote_sum = parties_table[['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN',
                          'UDP', 'UPP']].sum()
vote_sum


# ### Visualize the total votes by party

# In[20]:


vote_sum.plot(kind='bar', figsize=(20, 5), grid=True)


# #### As you can see, the votes gotten by "APC" and "PDP" far outweigh those of the other parties. So let's focus on these two biggest parties...
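
# In[ ]:


# The "odd case" comparison plotted above (question c) can also be checked
# numerically instead of by eye. This is only a minimal sketch: the helper
# name `find_odd_states` and the toy numbers below are hypothetical and are
# not part of the original analysis or of the INEC data. The column names
# are the ones used in `voters_table`.
import pandas as pd


def find_odd_states(table):
    """Return the rows that break the expected ordering
    Population >= registered voters >= accredited voters."""
    more_registered_than_people = table["Number_of_Reg_Voters"] > table["Population"]
    more_accredited_than_registered = table["Number_of_Accr_Voters"] > table["Number_of_Reg_Voters"]
    return table[more_registered_than_people | more_accredited_than_registered]


# Made-up numbers for illustration only; these are not INEC figures.
toy_voters = pd.DataFrame({
    "State": ["State A", "State B", "State C"],
    "Number_of_Reg_Voters": [1000, 2500, 900],
    "Number_of_Accr_Voters": [600, 1200, 950],
    "Population": [3000, 2000, 4000],
})

odd_states = find_odd_states(toy_voters)
print(odd_states["State"].tolist())

# On the real data the same check would be `find_odd_states(voters_table)`;
# an empty result would mean that no state shows the odd case.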

# ### Visualize votes of "APC" and "PDP" by states

# In[21]:


parties_table.plot(x="State", y=["APC", "PDP"], figsize=(10, 25), kind="barh", grid=True)


# ### Parties with lowest votes
#
# Let's see what the bottom parties with the lowest total votes have to offer

# In[22]:


# vote_sum is indexed by party, so this gives the eleven parties with the fewest total votes
low_vote_parties = vote_sum.sort_values()[:11]
low_vote_parties


# In[23]:


low_vote_parties.plot(kind="bar", figsize=(15, 5), grid=True)


# ## HOPE Party
# Let's see the state that voted most for the lowest-ranked party - HOPE

# In[24]:


hope_party = parties_table[['State', 'HOPE']]
hope_party.plot(x='State', y='HOPE', kind='bar', figsize=(15, 5))


# As seen above, the states that voted most for the lowest-ranked party (HOPE) are Ebonyi, Oyo and Rivers.

# # What next?
# You can do more with this dataset, but for me that is it on analysing the Nigeria 2015 presidential election result with python.
#
# Next, I will do a spatial analysis on the same election result dataset with QGIS (http://qgis.org/) and Tableau (http://tableau.com/). Note that there are excellent python packages that support spatial analysis, namely: GeoPandas, PySAL, Pyshp, Shapely, ArcPy, PyQGIS, Fiona, Rasterio, GDAL/OGR etc
#
# So if you are interested in the spatial analysis, click on the links below:-
# ~1~ Spatial Analysis of Nigeria 2015 Presidential Election Result Using QGIS - Desktop Visualization
#
# ~2~ Spatial Analysis of Nigeria 2015 Presidential Election Result Using Tableau - Web-based Visualization
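
# In[ ]:


# The reading of the HOPE bar chart (question d) can be cross-checked by
# ranking the states directly rather than judging bar heights. This is a
# small sketch under assumptions: the helper name `top_states_for_party`
# and the toy numbers are hypothetical and are not taken from the INEC data.
import pandas as pd


def top_states_for_party(table, party, n=3):
    """Return the n states with the most votes for `party`, highest first."""
    return table.nlargest(n, party)[["State", party]]


# Made-up numbers for illustration only; these are not INEC figures.
toy_parties = pd.DataFrame({
    "State": ["State A", "State B", "State C", "State D"],
    "HOPE": [120, 45, 300, 210],
})

top_hope = top_states_for_party(toy_parties, "HOPE")
print(top_hope)

# On the real data the same call would be
# `top_states_for_party(parties_table, "HOPE")`.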