#!/usr/bin/env python
# coding: utf-8
# # Nigerian Presidential Election Result Analysis - 2015
#
# Author: Umar Yusuf
# Blog post: http://umar-yusuf.blogspot.com.ng/2016/09/Analysis-of-Nigerian-Presidential-Election-Result-2015-using-Python-Programming-Language.html
#
# ### Data Source
# The data was gathered from the Independent National Electoral Commission (INEC) official website.
#
# The data (in .csv format) used is available for download here. Download it and save it in the same location as this notebook.
#
# I am going to use the Python package pandas, with its embedded matplotlib plotting, to analyse the 2015 Presidential Election Result.
#
# # Introduction
# Nigeria has 36 states and 1 federal capital territory. The 2015 presidential election was held in the 37 territories within the country.
#
# Fourteen (14) political parties representing fourteen (14) candidates participated in the 2015 presidential elections. The parties are as follows: AA, ACPN, AD, ADC, APA, APC, CPP, HOPE, KOWA, NCP, PDP, PPN, UDP and UPP. See the result table below:-
#
#
#
# Even though the battle was between the two biggest parties (APC and PDP), the dataset we will explore contains all the parties.
#
# The dataset contains the numeric values by states for:-
# 1~ Vote scored by each political party
# 2~ Number_of_Reg_Voters (registered voters)
# 3~ Number_of_Accr_Voters (accredited voters)
# 4~ Number_of_Valid_Votes
# 5~ Number_of_Rejected_Votes
# 6~ Total_Votes_Cast
# 7~ Population
# 8~ Population_Rank
# 9~ Number_of_LGA
#
# #### I will attempt to answer the following questions through this analysis:-
# a) What are the minimum and maximum votes for each party?
# b) Is winning in top states with highest numbers of voters’ turnout, registered voters, total votes cast, and population related to winning the general election?
# c) Is there any odd case where "Population" of a state is lower than "Number_of_Registered_Voters" and "Number_of_Accredited_Voters"?
# d) Which state voted most for the lowest rank party?
# #### Import libraries and load in the dataset
# In[1]:
# Let's import the packages
import pandas as pd
# Let's enable our plots to display inline within the notebook
get_ipython().run_line_magic('matplotlib', 'inline')
# In[2]:
inec_table = pd.read_csv("INEC 2015 Presidential Election Results.csv")
inec_table.head()
# ### Statistical summary of all the columns
#
# This will show us the minimum and maximum votes for each party.
# In[3]:
inec_table.describe()
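# The summary above gives each party's minimum and maximum vote, but not the states where they occur.
# Below is a minimal sketch of one way to recover that, assuming the party columns hold numeric vote counts.
# The helper name `party_extremes` and the toy rows are illustrative assumptions; they are not INEC figures.
# In[ ]:
import pandas as pd

def party_extremes(table, parties, state_col="State"):
    """Return, per party, the min/max vote and the state where each occurs."""
    votes = table.set_index(state_col)[parties]
    return pd.DataFrame({
        "min_votes": votes.min(),
        "min_state": votes.idxmin(),
        "max_votes": votes.max(),
        "max_state": votes.idxmax(),
    })

# Made-up toy rows (not INEC figures), only to show the shape of the result
toy_votes = pd.DataFrame({"State": ["A", "B", "C"], "APC": [10, 30, 20], "PDP": [25, 5, 15]})
party_extremes(toy_votes, ["APC", "PDP"])
# With the real data, the same call would take inec_table and the list of party columns.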
# ### Turnout of Voters for the election
# We can see the ratio of voters turnout for the election by dividing "Total_Votes_Cast" by "Number_of_Reg_Voters" for each state
# In[4]:
inec_table["Voters Turnout"] = inec_table["Total_Votes_Cast"] / inec_table["Number_of_Reg_Voters"]
inec_table[["State", "Voters Turnout"]][:11]
# In[5]:
inec_table.plot(x="State", y='Voters Turnout', figsize=(20, 5), kind="line", grid=1)
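# The turnout ratio can also be wrapped in a small helper that leaves the original table untouched and adds a percentage column.
# This is only a sketch under the column names used above; `add_turnout` and the toy rows are hypothetical, not INEC figures.
# In[ ]:
import pandas as pd

def add_turnout(table, cast_col="Total_Votes_Cast", reg_col="Number_of_Reg_Voters"):
    """Return a copy of the table with turnout as a fraction and as a percentage."""
    out = table.copy()
    out["Voters Turnout"] = out[cast_col] / out[reg_col]
    out["Voters Turnout (%)"] = (out["Voters Turnout"] * 100).round(1)
    return out

# Made-up toy rows (not INEC figures)
toy_turnout = pd.DataFrame({
    "State": ["A", "B"],
    "Total_Votes_Cast": [50, 30],
    "Number_of_Reg_Voters": [100, 120],
})
# Sort so the states with the highest turnout come first
add_turnout(toy_turnout).sort_values("Voters Turnout", ascending=False)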
# ### Five top states with the highest "Number_of_Reg_Voters"
# In[6]:
inec_table.sort_values("Number_of_Reg_Voters", ascending=False)[:5]
# #### Which party got the highest vote among the top states with the highest "Number_of_Reg_Voters"
# In[7]:
win1 = inec_table.sort_values("Number_of_Reg_Voters", ascending=False)[:5]
win1.plot(x="State", y=['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP'],
          figsize=(20, 5), kind="bar", grid=1)
# ### Five top states with the highest number of "Total_Votes_Cast"
# In[8]:
inec_table.sort_values("Total_Votes_Cast", ascending=False)[:5]
# #### Which party got the highest vote among the top states with the highest "Total_Votes_Cast"
# In[9]:
win2 = inec_table.sort_values("Total_Votes_Cast", ascending=False)[:5]
win2.plot(x="State", y=['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP'],
          figsize=(20, 5), kind="bar", grid=1)
# ### Five top states with the highest "Population"
# In[10]:
inec_table.sort_values("Population", ascending=False)[:5]
# #### Which party got the highest vote among the top states with the highest "Population"
# In[11]:
win3 = inec_table.sort_values("Population", ascending=False)[:5]
win3.plot(x="State", y=['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP'],
          figsize=(20, 5), kind="bar", grid=1)
# ### Five top states with the highest "Number_of_LGA"
# In[12]:
inec_table.sort_values("Number_of_LGA", ascending=False)[:5]
# #### Which party got the highest vote among the top states with the highest "Number_of_LGA"
# In[13]:
win4 = inec_table.sort_values("Number_of_LGA", ascending=False)[:5]
win4.plot(x="State", y=['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP'],
          figsize=(20, 5), kind="bar", grid=1)
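# The four "top five" comparisons above follow the same pattern: sort by a column, keep the top states, see which party leads.
# A possible way to read the leading party off as a table rather than from the bar charts is sketched below.
# The helper `leading_party_in_top_states` and the toy rows are assumptions for illustration, not INEC figures.
# In[ ]:
import pandas as pd

def leading_party_in_top_states(table, rank_col, parties, n=5, state_col="State"):
    """For the n states ranked highest on rank_col, name the party with the most votes."""
    top = table.sort_values(rank_col, ascending=False).head(n)
    result = top[[state_col, rank_col]].copy()
    # idxmax(axis=1) returns, for each row, the column label holding the largest value
    result["Leading_Party"] = top[parties].idxmax(axis=1)
    return result.reset_index(drop=True)

# Made-up toy rows (not INEC figures)
toy_rank = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Population": [300, 100, 200],
    "APC": [10, 30, 20],
    "PDP": [25, 5, 15],
})
leading_party_in_top_states(toy_rank, "Population", ["APC", "PDP"], n=2)
# Calling .value_counts() on the "Leading_Party" column then counts how many of the top states each party led.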
# ### Let's extract the following columns out to form a separate dataframe from the dataset
# 1~ Number_of_Reg_Voters (registered voters)
# 2~ Number_of_Accr_Voters (accredited voters)
# 3~ Number_of_Valid_Votes
# 4~ Number_of_Rejected_Votes
# 5~ Total_Votes_Cast
# 6~ Population
# 7~ Population_Rank
# 8~ Number_of_LGA
# In[14]:
# 'Number_of_Rejected_Votes' is listed only once to avoid a duplicated column
voters_table = inec_table[['State', 'Number_of_Reg_Voters', 'Number_of_Accr_Voters', 'Number_of_Valid_Votes',
                           'Number_of_Rejected_Votes', 'Total_Votes_Cast', 'Population', 'Population_Rank', 'Number_of_LGA']]
voters_table
# ### Summary statistics of voters_table
# In[15]:
voters_table.describe()
# ### Graph "Number_of_Reg_Voters" Vs "Number_of_Accr_Voters" Vs "Population"
# Naturally, "Number_of_Reg_Voters" should be higher than "Number_of_Accr_Voters". Likewise, "Population" should be higher than both "Number_of_Reg_Voters" and "Number_of_Accr_Voters". Let's see if there is any odd case in any particular state.
# In[16]:
voters_table.plot(x='State', y=['Number_of_Reg_Voters', 'Number_of_Accr_Voters', 'Population'], kind='bar', figsize=(20, 5), title='Bar Plot', grid=1)
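# Odd cases can be hard to spot on a bar plot with 37 states, so a boolean filter may be a useful cross-check.
# The sketch below flags states where registered voters exceed the population or accredited voters exceed registered voters.
# It assumes the column names used above; `find_odd_states` and the toy rows are hypothetical, not INEC figures.
# In[ ]:
import pandas as pd

def find_odd_states(table, reg_col="Number_of_Reg_Voters",
                    accr_col="Number_of_Accr_Voters", pop_col="Population"):
    """Return the rows that break the expected order: population >= registered >= accredited."""
    odd = (table[reg_col] > table[pop_col]) | (table[accr_col] > table[reg_col])
    return table[odd]

# Made-up toy rows (not INEC figures): B has more registered voters than people,
# C has more accredited than registered voters
toy_voters = pd.DataFrame({
    "State": ["A", "B", "C"],
    "Number_of_Reg_Voters": [80, 120, 50],
    "Number_of_Accr_Voters": [60, 90, 70],
    "Population": [100, 100, 200],
})
find_odd_states(toy_voters)
# An empty result on the real table would mean no state breaks the expected order.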
# ### Let's extract the parties columns out to form a separate dataframe from the dataset
# In[17]:
parties_table = inec_table[['State', 'AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP']]
parties_table
# ### Summary statistics of parties_table
# In[18]:
parties_table.describe()
# ### Sum of Votes gotten by each party
# In[19]:
vote_sum = parties_table[['AA', 'ACPN', 'AD', 'ADC', 'APA', 'APC', 'CPP', 'HOPE', 'KOWA', 'NCP', 'PDP', 'PPN', 'UDP', 'UPP']].sum()
vote_sum
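# Raw totals are easier to compare as shares of all votes counted for the parties.
# A minimal sketch, assuming a Series of per-party totals like vote_sum; `vote_share` and the toy totals are hypothetical, not INEC figures.
# In[ ]:
import pandas as pd

def vote_share(vote_totals):
    """Return each party's percentage share of the summed votes, largest first."""
    share = vote_totals / vote_totals.sum() * 100
    return share.sort_values(ascending=False).round(2)

# Made-up toy totals (not INEC figures)
toy_totals = pd.Series({"KOWA": 10, "APC": 60, "PDP": 30})
vote_share(toy_totals)
# With the real data the call would be vote_share(vote_sum).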
# ### Visualize the total votes by party
# In[20]:
vote_sum.plot(kind='bar', figsize=(20, 5), grid=1)
# #### As you can see, votes gotten by "APC" and "PDP" far outweigh those of the other parties. So let's focus on these two biggest parties...
# ### Visualize votes of "APC" and "PDP" by states
# In[21]:
parties_table.plot(x="State", y=["APC", "PDP"], figsize=(10,25), kind="barh", grid=True)
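# Besides the chart, it may help to count how many states each of the two parties led in.
# This is a sketch only: `states_led` is a hypothetical helper, the toy rows are not INEC figures,
# and a tie in a state would be credited to the first party listed, which is an arbitrary choice.
# In[ ]:
import pandas as pd

def states_led(table, party_a="APC", party_b="PDP"):
    """Count the states in which each of the two parties has the higher vote."""
    leader = table[[party_a, party_b]].idxmax(axis=1)
    return leader.value_counts()

# Made-up toy rows (not INEC figures)
toy_duel = pd.DataFrame({"State": ["A", "B", "C"], "APC": [10, 30, 20], "PDP": [25, 5, 15]})
states_led(toy_duel)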
# ### Parties with lowest votes
#
# Let's see how the bottom parties, those with the lowest total number of votes, compare
# In[22]:
# vote_sum is indexed by party, so sorting it ranks parties (not states) from lowest to highest
low_vote_parties = vote_sum.sort_values()[:11]
low_vote_parties
# In[23]:
low_vote_parties.plot(kind="bar", figsize=(15, 5), grid=True)
# ## HOPE Party
# Let's see the state that voted most for the lowest-ranked party - HOPE
# In[24]:
hope_party = parties_table[['State', 'HOPE']]
hope_party.plot(x='State', y='HOPE', kind='bar', figsize=(15, 5))
# As seen above, the states that voted most for the lowest-ranked party (HOPE) are Ebonyi, Oyo and Rivers.
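# The same reading can be taken from a table instead of the chart by keeping the rows with the largest values for one party.
# A minimal sketch: `top_states_for_party` is a hypothetical helper and the toy rows are not INEC figures.
# In[ ]:
import pandas as pd

def top_states_for_party(table, party, n=3, state_col="State"):
    """Return the n states with the highest vote for the given party."""
    return table.nlargest(n, party)[[state_col, party]].reset_index(drop=True)

# Made-up toy rows (not INEC figures)
toy_hope = pd.DataFrame({"State": ["A", "B", "C", "D"], "HOPE": [5, 40, 12, 33]})
top_states_for_party(toy_hope, "HOPE", n=2)
# With the real data the call would be top_states_for_party(parties_table, "HOPE").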
# # What next?
# You can do more with this dataset, but for me that is it on analysing the Nigeria 2015 presidential election result with Python.
#
# Next, I will do a spatial analysis on the same election result dataset with QGIS (http://qgis.org/) and Tableau (http://tableau.com/). Note that there are excellent Python packages that support spatial analysis, namely: GeoPandas, PySAL, Pyshp, Shapely, ArcPy, PyQGIS, Fiona, Rasterio, GDAL/OGR etc.
#
# So if you are interested in the spatial analysis, click on the links below:-
# ~1~ Spatial Analysis of Nigeria 2015 Presidential Election Result Using QGIS - Desktop Visualization
#
# ~2~ Spatial Analysis of Nigeria 2015 Presidential Election Result Using Tableau - Web-based Visualization