! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from shapely.geometry import Point
import plotly.express as px
%matplotlib inline
# activate plot theme
import qeds
qeds.themes.mpl_style();
I start by uploading my thesis data. For my thesis, I am interested in understanding how increasing the intensity of the measure of fatalities affects support for Duterte's party, PDP Laban, in the most recent mayoral elections. Since Duterte was elected in 2016, the Philippines has become a hotbed for extra judicial killings of drug suspects. There's a lot of literature in economics about the effect of conflict on political and economic outcomes. One really important paper for my thesis is done by Luisa Blanco in 2012, who finds that increased perceptions of insecurity(resulting from exposure to violent crime) leads to distrust in the government and also increased palatability of authoritarianism(partially because of its perceived efficiency in dealing with crime). The Philippines offers an interesting context: On the one hand, the majority of the extra-judicial killings have been comitted or encouraged by the state towards mostly low income Filipinos. This could lead to Filipinos feeling less safe and distrusting the government. On the other hand, the Philippines has a long history of dealing with inefficient courts, and so the drug war could be welcomed as an efficient way of dealing with the drug trade. The purpose of my thesis is to see which effect dominates in the Philippine setting.
#read thesis data here
PH = pd.read_stata("Collapsed merged with controls and dist!(unmatched munis dropped) 2.dta")
#generating the coordinates
PH["Coordinates"] = list(zip(PH.longitude, PH.latitude))
PH.head()
PH["Coordinates"] = PH["Coordinates"].apply(Point)
PH.head()
PDP percent is my variable for the vote share of Duterte's party's candidates.
Next, I upload the stanford shape file that contains the polygon data at the municipality level.
PHshape2 = gpd.read_file("PHL_stanford.shp")
PHshape2.plot()
#get the world map:
world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world = world.set_index("iso_a3")
world.loc["PHL",'geometry']
PH.info()
I now convert my thesis data to a geopanda data frame so that I can merge using geometric locations.
PHvote = gpd.GeoDataFrame(PH, geometry="Coordinates")
PHvote
In the cells below, I merge my thesis data with the shape file I need to plot the municipalities and the cities. I tried using all 3 versions of "op" ; the other 2 are "contains" and "within". "Intersects" gives me the best merge.
##merging the spacial data with the thesis data
PHmerge = gpd.sjoin(PHvote, PHshape2, how = "right", op = 'intersects')
PHmerge
I clean the merged data set a little bit to only retain matched observations.
## retaining only the matched municipalities
PHmerge = PHmerge[PHmerge.classification == 'Community']
PHmerge
Before I can make the map, I need to convert my merged data set into a geopanda data frame.
PHmerge = gpd.GeoDataFrame(PHmerge, geometry="geometry")
PHmerge
Finally, I graph the vote share in the Philippines by municipality in the 2019 Mayoral elections.
##map for PH
fig, gax = plt.subplots(figsize = (10,8))
#plot the Philippines!
world.query("name=='Philippines'").plot(ax=gax, edgecolor='black',color='white')
#plot the muicipalities
PHmerge.plot(ax=gax, column="PDPpercent", legend=True,cmap='RdBu_r')
The above results show that most municipalities don't have majority support for Duterte's party's mayoral candidates. The important thing though is that there is lots of variation in the municipality's vote shares for Duterte, which is a good thing.
I also try to graph the fatalities for the Philippines. I should note that most municipalities that observed at least one fatality have a relatively small count of fatalities(the mean of the fatalities variable in my data set is around 3). To scale the fatalities better on the map, I decide to use the log of fatalities instead of the count. Since the range in the count of fatalities is quite large(it goes from 0 to 428), some municipalities look as though they had no fatalities because of the way cmap scales the colouring. In my thesis, I include specifications where I use the count of fataltiies, and others where I include the log of fatalities.
Note: I chose to use a heat map for this visualisation since this is the same type of colour scheme Blanco(2012) uses for their homecide rate visualisations in their appendix.
PHmerge["log_fatalities"]=np.log(PHmerge["fatalities"])
#cleaning the log fatalities variable
PHmerge["log_fatalities"] = np.nan_to_num(PHmerge["log_fatalities"], copy=True, nan=0.0, posinf=None, neginf=0)
PHmerge["log_fatalities"]
fig, gax = plt.subplots(figsize = (10,8))
#plot the Philippines!
world.query("name=='Philippines'").plot(ax=gax, edgecolor='black',color='white')
#plot the muicipalities
PHmerge.plot(ax=gax, column="log_fatalities", legend=True,cmap='hot')
This map is reassuring since there is a lot of geographic variation in the fatalities. If the fatalities were concerntrated in one specific region that would mean that I might have to limit my analysis to that region, which would severly limit my observations. Since the fatalities are relatively spread out throughout the country, I am able to make use of all the municipalities in my data set.
For reference, here is the map where I use the count of fatalities instead:
fig, gax = plt.subplots(figsize = (10,8))
#plot the Philippines!
world.query("name=='Philippines'").plot(ax=gax, edgecolor='black',color='white')
#plot the muicipalities
PHmerge.plot(ax=gax, column="fatalities", legend=True,cmap='hot')
This map is less assuring, but again, the mean of the fatalities variable is right around 3, which means most municipalities won't show up in this colour scaling. Only the outliers show up in this colour scaling, and even then the outliers are still spread throughout the country, which makes me confident that there isn't one regional fixed effect driving the fatalities.
Next, I make an interactive graph!
##interactivity:
from bokeh.io import output_notebook
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer
output_notebook()
import json
#Convert data to geojson for bokeh
PH_geojson=GeoJSONDataSource(geojson=PHmerge.to_json())
color_mapper = LinearColorMapper(palette = brewer['RdBu'][10], low = 0, high = 1)
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
border_line_color=None,location = (0,0), orientation = 'horizontal')
hover = HoverTool(tooltips = [ ('Municipality','@muni'),('PDP vote share', '@PDPpercent'),
('Fatalities','@fatalities')])
p = figure(title="Philippines vote share in the 2019 Mayoral elections", tools=[hover])
p.patches("xs","ys",source=PH_geojson,
fill_color = {'field' :'PDPpercent', 'transform' : color_mapper})
p.add_layout(color_bar, 'below')
show(p)