In [1]:
! pip install qeds fiona geopandas xgboost gensim folium pyLDAvis descartes
Requirement already satisfied: qeds in /opt/conda/lib/python3.7/site-packages (0.6.2)
Requirement already satisfied: fiona in /opt/conda/lib/python3.7/site-packages (1.8.9.post2)
Requirement already satisfied: geopandas in /opt/conda/lib/python3.7/site-packages (0.7.0)
Collecting xgboost
  Using cached xgboost-1.0.2-py3-none-manylinux1_x86_64.whl (109.7 MB)
Collecting gensim
  Using cached gensim-3.8.1-cp37-cp37m-manylinux1_x86_64.whl (24.2 MB)
Requirement already satisfied: folium in /opt/conda/lib/python3.7/site-packages (0.10.1)
Processing /home/jupyter/.cache/pip/wheels/3b/fb/41/e32e5312da9f440d34c4eff0d2207b46dc9332a7b931ef1e89/pyLDAvis-2.1.2-py2.py3-none-any.whl
Requirement already satisfied: descartes in /opt/conda/lib/python3.7/site-packages (1.1.0)
Requirement already satisfied: pandas in /opt/conda/lib/python3.7/site-packages (from qeds) (1.0.1)
Requirement already satisfied: openpyxl in /opt/conda/lib/python3.7/site-packages (from qeds) (3.0.3)
Requirement already satisfied: pandas-datareader in /opt/conda/lib/python3.7/site-packages (from qeds) (0.8.1)
Requirement already satisfied: pyarrow in /opt/conda/lib/python3.7/site-packages (from qeds) (0.16.0)
Requirement already satisfied: numpy in /opt/conda/lib/python3.7/site-packages (from qeds) (1.18.1)
Requirement already satisfied: plotly in /opt/conda/lib/python3.7/site-packages (from qeds) (4.5.3)
Requirement already satisfied: quandl in /opt/conda/lib/python3.7/site-packages (from qeds) (3.5.0)
Requirement already satisfied: statsmodels in /opt/conda/lib/python3.7/site-packages (from qeds) (0.11.0)
Requirement already satisfied: scipy in /opt/conda/lib/python3.7/site-packages (from qeds) (1.4.1)
Requirement already satisfied: seaborn in /opt/conda/lib/python3.7/site-packages (from qeds) (0.9.0)
Requirement already satisfied: requests in /opt/conda/lib/python3.7/site-packages (from qeds) (2.23.0)
Requirement already satisfied: scikit-learn in /opt/conda/lib/python3.7/site-packages (from qeds) (0.22.1)
Requirement already satisfied: matplotlib in /opt/conda/lib/python3.7/site-packages (from qeds) (3.1.3)
Requirement already satisfied: quantecon in /opt/conda/lib/python3.7/site-packages (from qeds) (0.4.6)
Requirement already satisfied: attrs>=17 in /opt/conda/lib/python3.7/site-packages (from fiona) (19.3.0)
Requirement already satisfied: click<8,>=4.0 in /opt/conda/lib/python3.7/site-packages (from fiona) (7.0)
Requirement already satisfied: cligj>=0.5 in /opt/conda/lib/python3.7/site-packages (from fiona) (0.5.0)
Requirement already satisfied: click-plugins>=1.0 in /opt/conda/lib/python3.7/site-packages (from fiona) (1.1.1)
Requirement already satisfied: six>=1.7 in /opt/conda/lib/python3.7/site-packages (from fiona) (1.14.0)
Requirement already satisfied: munch in /opt/conda/lib/python3.7/site-packages (from fiona) (2.5.0)
Requirement already satisfied: pyproj>=2.2.0 in /opt/conda/lib/python3.7/site-packages (from geopandas) (2.3.1)
Requirement already satisfied: shapely in /opt/conda/lib/python3.7/site-packages (from geopandas) (1.6.4.post2)
Processing /home/jupyter/.cache/pip/wheels/1f/e5/fc/7412935a7184efc8ad377e948c81b1cc99b6a02eb8dc7c918c/smart_open-1.10.0-py3-none-any.whl
Requirement already satisfied: branca>=0.3.0 in /opt/conda/lib/python3.7/site-packages (from folium) (0.4.0)
Requirement already satisfied: jinja2>=2.9 in /opt/conda/lib/python3.7/site-packages (from folium) (2.11.1)
Processing /home/jupyter/.cache/pip/wheels/56/b0/fe/4410d17b32f1f0c3cf54cdfb2bc04d7b4b8f4ae377e2229ba0/future-0.18.2-py3-none-any.whl
Requirement already satisfied: joblib>=0.8.4 in /opt/conda/lib/python3.7/site-packages (from pyLDAvis) (0.14.1)
Processing /home/jupyter/.cache/pip/wheels/3c/33/97/805b282e129f60bb4e87cea622338f30b65f21eaf65219971f/funcy-1.14-py2.py3-none-any.whl
Requirement already satisfied: pytest in /opt/conda/lib/python3.7/site-packages (from pyLDAvis) (5.3.5)
Requirement already satisfied: numexpr in /opt/conda/lib/python3.7/site-packages (from pyLDAvis) (2.7.1)
Requirement already satisfied: wheel>=0.23.0 in /opt/conda/lib/python3.7/site-packages (from pyLDAvis) (0.34.2)
Requirement already satisfied: python-dateutil>=2.6.1 in /opt/conda/lib/python3.7/site-packages (from pandas->qeds) (2.8.1)
Requirement already satisfied: pytz>=2017.2 in /opt/conda/lib/python3.7/site-packages (from pandas->qeds) (2019.3)
Requirement already satisfied: et-xmlfile in /opt/conda/lib/python3.7/site-packages (from openpyxl->qeds) (1.0.1)
Requirement already satisfied: jdcal in /opt/conda/lib/python3.7/site-packages (from openpyxl->qeds) (1.4.1)
Requirement already satisfied: lxml in /opt/conda/lib/python3.7/site-packages (from pandas-datareader->qeds) (4.5.0)
Requirement already satisfied: retrying>=1.3.3 in /opt/conda/lib/python3.7/site-packages (from plotly->qeds) (1.3.3)
Requirement already satisfied: inflection>=0.3.1 in /opt/conda/lib/python3.7/site-packages (from quandl->qeds) (0.3.1)
Requirement already satisfied: more-itertools in /opt/conda/lib/python3.7/site-packages (from quandl->qeds) (8.2.0)
Requirement already satisfied: patsy>=0.5 in /opt/conda/lib/python3.7/site-packages (from statsmodels->qeds) (0.5.1)
Requirement already satisfied: chardet<4,>=3.0.2 in /opt/conda/lib/python3.7/site-packages (from requests->qeds) (3.0.4)
Requirement already satisfied: urllib3!=1.25.0,!=1.25.1,<1.26,>=1.21.1 in /opt/conda/lib/python3.7/site-packages (from requests->qeds) (1.25.7)
Requirement already satisfied: idna<3,>=2.5 in /opt/conda/lib/python3.7/site-packages (from requests->qeds) (2.9)
Requirement already satisfied: certifi>=2017.4.17 in /opt/conda/lib/python3.7/site-packages (from requests->qeds) (2019.11.28)
Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->qeds) (2.4.6)
Requirement already satisfied: cycler>=0.10 in /opt/conda/lib/python3.7/site-packages (from matplotlib->qeds) (0.10.0)
Requirement already satisfied: kiwisolver>=1.0.1 in /opt/conda/lib/python3.7/site-packages (from matplotlib->qeds) (1.1.0)
Requirement already satisfied: sympy in /opt/conda/lib/python3.7/site-packages (from quantecon->qeds) (1.5.1)
Requirement already satisfied: numba>=0.38 in /opt/conda/lib/python3.7/site-packages (from quantecon->qeds) (0.48.0)
Collecting boto3
  Downloading boto3-1.12.29-py2.py3-none-any.whl (128 kB)
     |████████████████████████████████| 128 kB 6.4 MB/s eta 0:00:01
Collecting google-cloud-storage
  Using cached google_cloud_storage-1.26.0-py2.py3-none-any.whl (75 kB)
Requirement already satisfied: MarkupSafe>=0.23 in /opt/conda/lib/python3.7/site-packages (from jinja2>=2.9->folium) (1.1.1)
Requirement already satisfied: py>=1.5.0 in /opt/conda/lib/python3.7/site-packages (from pytest->pyLDAvis) (1.8.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from pytest->pyLDAvis) (20.1)
Requirement already satisfied: pluggy<1.0,>=0.12 in /opt/conda/lib/python3.7/site-packages (from pytest->pyLDAvis) (0.13.0)
Requirement already satisfied: wcwidth in /opt/conda/lib/python3.7/site-packages (from pytest->pyLDAvis) (0.1.8)
Requirement already satisfied: importlib-metadata>=0.12 in /opt/conda/lib/python3.7/site-packages (from pytest->pyLDAvis) (1.5.0)
Requirement already satisfied: setuptools in /opt/conda/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib->qeds) (45.2.0.post20200209)
Requirement already satisfied: mpmath>=0.19 in /opt/conda/lib/python3.7/site-packages (from sympy->quantecon->qeds) (1.1.0)
Requirement already satisfied: llvmlite<0.32.0,>=0.31.0dev0 in /opt/conda/lib/python3.7/site-packages (from numba>=0.38->quantecon->qeds) (0.31.0)
Collecting s3transfer<0.4.0,>=0.3.0
  Using cached s3transfer-0.3.3-py2.py3-none-any.whl (69 kB)
Collecting botocore<1.16.0,>=1.15.29
  Downloading botocore-1.15.29-py2.py3-none-any.whl (6.0 MB)
     |████████████████████████████████| 6.0 MB 9.4 MB/s eta 0:00:01
Collecting jmespath<1.0.0,>=0.7.1
  Using cached jmespath-0.9.5-py2.py3-none-any.whl (24 kB)
Collecting google-resumable-media<0.6dev,>=0.5.0
  Using cached google_resumable_media-0.5.0-py2.py3-none-any.whl (38 kB)
Collecting google-auth<2.0dev,>=1.11.0
  Using cached google_auth-1.11.3-py2.py3-none-any.whl (76 kB)
Collecting google-cloud-core<2.0dev,>=1.2.0
  Using cached google_cloud_core-1.3.0-py2.py3-none-any.whl (26 kB)
Requirement already satisfied: zipp>=0.5 in /opt/conda/lib/python3.7/site-packages (from importlib-metadata>=0.12->pytest->pyLDAvis) (2.2.1)
Collecting docutils<0.16,>=0.10
  Using cached docutils-0.15.2-py3-none-any.whl (547 kB)
Requirement already satisfied: cachetools<5.0,>=2.0.0 in /home/jupyter/.local/lib/python3.7/site-packages (from google-auth<2.0dev,>=1.11.0->google-cloud-storage->smart-open>=1.8.1->gensim) (4.0.0)
Requirement already satisfied: pyasn1-modules>=0.2.1 in /opt/conda/lib/python3.7/site-packages (from google-auth<2.0dev,>=1.11.0->google-cloud-storage->smart-open>=1.8.1->gensim) (0.2.7)
Collecting rsa<4.1,>=3.1.4
  Using cached rsa-4.0-py2.py3-none-any.whl (38 kB)
Collecting google-api-core<2.0.0dev,>=1.16.0
  Using cached google_api_core-1.16.0-py2.py3-none-any.whl (70 kB)
Requirement already satisfied: pyasn1<0.5.0,>=0.4.6 in /opt/conda/lib/python3.7/site-packages (from pyasn1-modules>=0.2.1->google-auth<2.0dev,>=1.11.0->google-cloud-storage->smart-open>=1.8.1->gensim) (0.4.8)
Processing /home/jupyter/.cache/pip/wheels/4c/a1/71/5e427276ceeff277fd76878d1b19fbf4587a2845015d86864b/googleapis_common_protos-1.51.0-py3-none-any.whl
Requirement already satisfied: protobuf>=3.4.0 in /opt/conda/lib/python3.7/site-packages (from google-api-core<2.0.0dev,>=1.16.0->google-cloud-core<2.0dev,>=1.2.0->google-cloud-storage->smart-open>=1.8.1->gensim) (3.11.4)
Installing collected packages: xgboost, jmespath, docutils, botocore, s3transfer, boto3, google-resumable-media, rsa, google-auth, googleapis-common-protos, google-api-core, google-cloud-core, google-cloud-storage, smart-open, gensim, future, funcy, pyLDAvis
Successfully installed boto3-1.12.29 botocore-1.15.29 docutils-0.15.2 funcy-1.14 future-0.18.2 gensim-3.8.1 google-api-core-1.16.0 google-auth-1.11.3 google-cloud-core-1.3.0 google-cloud-storage-1.26.0 google-resumable-media-0.5.0 googleapis-common-protos-1.51.0 jmespath-0.9.5 pyLDAvis-2.1.2 rsa-4.0 s3transfer-0.3.3 smart-open-1.10.0 xgboost-1.0.2
In [2]:
import geopandas as gpd
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from shapely.geometry import Point
import plotly.express as px

%matplotlib inline
# activate plot theme
import qeds
qeds.themes.mpl_style();

I start by uploading my thesis data. For my thesis, I am interested in understanding how increasing the intensity of the measure of fatalities affects support for Duterte's party, PDP Laban, in the most recent mayoral elections. Since Duterte was elected in 2016, the Philippines has become a hotbed for extra judicial killings of drug suspects. There's a lot of literature in economics about the effect of conflict on political and economic outcomes. One really important paper for my thesis is done by Luisa Blanco in 2012, who finds that increased perceptions of insecurity(resulting from exposure to violent crime) leads to distrust in the government and also increased palatability of authoritarianism(partially because of its perceived efficiency in dealing with crime). The Philippines offers an interesting context: On the one hand, the majority of the extra-judicial killings have been comitted or encouraged by the state towards mostly low income Filipinos. This could lead to Filipinos feeling less safe and distrusting the government. On the other hand, the Philippines has a long history of dealing with inefficient courts, and so the drug war could be welcomed as an efficient way of dealing with the drug trade. The purpose of my thesis is to see which effect dominates in the Philippine setting.

In [4]:
#read thesis data here

PH = pd.read_stata("Collapsed merged with controls and dist!(unmatched munis dropped) 2.dta")

#generating the coordinates
PH["Coordinates"] = list(zip(PH.longitude, PH.latitude))
PH.head()
PH["Coordinates"] = PH["Coordinates"].apply(Point)
PH.head()
Out[4]:
province muni PDPpercent region region_code province_code municipality_city_code longitude latitude location ... lon_a_5 lat_a_5 d_a_5 PDPran FPDP pi_g fatalities_pc fatalities_p_c pi_a Coordinates
0 ABRA BANGUED 0.101855 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140101000 120.600914 17.611204 POINT(120.600912102 17.6112035614) ... 120.991997 14.545 341.874815 1.0 6.0 -0.386866 0.000125 1.245770 0.126267 POINT (120.6009140014648 17.61120414733887)
1 ABRA BOLINEY 0.000000 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140102000 120.864738 17.378698 POINT(120.864738788 17.3786990212) ... 120.991997 14.545 313.868504 0.0 0.0 -0.627300 0.000000 0.000000 0.451233 POINT (120.8647384643555 17.37869834899902)
2 ABRA BUCAY 0.053107 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140103000 120.719849 17.523537 POINT(120.719846281 17.523537522) ... 120.991997 14.545 330.886511 1.0 1.0 -0.562465 0.000058 0.584283 0.298767 POINT (120.7198486328125 17.52353668212891)
3 ABRA BUCLOC 0.047902 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140104000 120.848061 17.436989 POINT(120.848063524 17.4369884989) ... 120.991997 14.545 320.394546 1.0 0.0 -0.727650 0.000000 0.000000 0.467700 POINT (120.8480606079102 17.43698883056641)
4 ABRA DAGUIOMAN 0.007278 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140105000 120.952278 17.449829 POINT(120.952276395 17.4498298787) ... 120.991997 14.545 321.473186 1.0 0.0 -0.056885 0.000000 0.000000 0.331700 POINT (120.952278137207 17.4498291015625)

5 rows × 63 columns

PDP percent is my variable for the vote share of Duterte's party's candidates.

Next, I upload the stanford shape file that contains the polygon data at the municipality level.

In [5]:
PHshape2 = gpd.read_file("PHL_stanford.shp")
In [6]:
PHshape2.plot()
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff67eca37d0>
In [7]:
#get the world map:

world = gpd.read_file(gpd.datasets.get_path("naturalearth_lowres"))
world = world.set_index("iso_a3")
In [8]:
world.loc["PHL",'geometry']
Out[8]:
In [9]:
PH.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 1615 entries, 0 to 1614
Data columns (total 63 columns):
 #   Column                  Non-Null Count  Dtype   
---  ------                  --------------  -----   
 0   province                1615 non-null   object  
 1   muni                    1615 non-null   object  
 2   PDPpercent              1615 non-null   float32 
 3   region                  1615 non-null   object  
 4   region_code             1615 non-null   object  
 5   province_code           1615 non-null   object  
 6   municipality_city_code  1615 non-null   object  
 7   longitude               1615 non-null   float32 
 8   latitude                1615 non-null   float32 
 9   location                1615 non-null   object  
 10  muni_code_numeric       1615 non-null   int32   
 11  p_2009                  1615 non-null   float32 
 12  p_2012                  1615 non-null   float32 
 13  p_2015                  1615 non-null   float32 
 14  var26                   1615 non-null   object  
 15  municipality_code       1615 non-null   object  
 16  Male                    1615 non-null   object  
 17  Female                  1615 non-null   object  
 18  population              1615 non-null   int32   
 19  portname                1615 non-null   object  
 20  prttype                 1615 non-null   object  
 21  lat_s                   1615 non-null   float32 
 22  lon_s                   1615 non-null   float32 
 23  iso3                    1615 non-null   object  
 24  iso3_op                 1615 non-null   object  
 25  remarks                 1615 non-null   object  
 26  geonameid               1615 non-null   int32   
 27  gdb_geomattr_data       0 non-null      float64 
 28  d_s                     1615 non-null   float64 
 29  muni1                   1615 non-null   object  
 30  muni2                   1615 non-null   object  
 31  muni11                  1615 non-null   object  
 32  muni12                  1615 non-null   object  
 33  fatalities              1615 non-null   float64 
 34  regioncode              1615 non-null   object  
 35  provincecode            1615 non-null   object  
 36  city_municipality       1615 non-null   object  
 37  city_muncode            1615 non-null   object  
 38  airport                 1615 non-null   object  
 39  classification          1615 non-null   object  
 40  description             1615 non-null   object  
 41  address                 1615 non-null   object  
 42  lon_a                   1615 non-null   float32 
 43  lat_a                   1615 non-null   float32 
 44  d_a                     1615 non-null   float64 
 45  a_typhoon               1615 non-null   float64 
 46  a_earthquake            1615 non-null   float64 
 47  _merge1                 1615 non-null   category
 48  votePDP                 1615 non-null   float32 
 49  fatality_b              1615 non-null   float32 
 50  lat_s_5                 1615 non-null   float32 
 51  lon_s_5                 1615 non-null   float32 
 52  d_s_5                   1615 non-null   float64 
 53  lon_a_5                 1615 non-null   float32 
 54  lat_a_5                 1615 non-null   float32 
 55  d_a_5                   1615 non-null   float64 
 56  PDPran                  1615 non-null   float32 
 57  FPDP                    1615 non-null   float32 
 58  pi_g                    1615 non-null   float32 
 59  fatalities_pc           1615 non-null   float32 
 60  fatalities_p_c          1615 non-null   float32 
 61  pi_a                    1615 non-null   float32 
 62  Coordinates             1615 non-null   object  
dtypes: category(1), float32(22), float64(8), int32(3), object(29)
memory usage: 638.8+ KB

I now convert my thesis data to a geopanda data frame so that I can merge using geometric locations.

In [10]:
PHvote = gpd.GeoDataFrame(PH, geometry="Coordinates")

PHvote
Out[10]:
province muni PDPpercent region region_code province_code municipality_city_code longitude latitude location ... lon_a_5 lat_a_5 d_a_5 PDPran FPDP pi_g fatalities_pc fatalities_p_c pi_a Coordinates
0 ABRA BANGUED 0.101855 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140101000 120.600914 17.611204 POINT(120.600912102 17.6112035614) ... 120.991997 14.545 341.874815 1.0 6.0 -0.386866 0.000125 1.245770 0.126267 POINT (120.60091 17.61120)
1 ABRA BOLINEY 0.000000 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140102000 120.864738 17.378698 POINT(120.864738788 17.3786990212) ... 120.991997 14.545 313.868504 0.0 0.0 -0.627300 0.000000 0.000000 0.451233 POINT (120.86474 17.37870)
2 ABRA BUCAY 0.053107 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140103000 120.719849 17.523537 POINT(120.719846281 17.523537522) ... 120.991997 14.545 330.886511 1.0 1.0 -0.562465 0.000058 0.584283 0.298767 POINT (120.71985 17.52354)
3 ABRA BUCLOC 0.047902 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140104000 120.848061 17.436989 POINT(120.848063524 17.4369884989) ... 120.991997 14.545 320.394546 1.0 0.0 -0.727650 0.000000 0.000000 0.467700 POINT (120.84806 17.43699)
4 ABRA DAGUIOMAN 0.007278 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140105000 120.952278 17.449829 POINT(120.952276395 17.4498298787) ... 120.991997 14.545 321.473186 1.0 0.0 -0.056885 0.000000 0.000000 0.331700 POINT (120.95228 17.44983)
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1610 ZAMBOANGA SIBUGAY ROSELLER LIM 0.197443 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098312000 122.430687 7.702022 POINT(122.430683899 7.7020220193) ... 124.442001 8.592 242.523021 1.0 0.0 -0.112825 0.000000 0.000000 0.489067 POINT (122.43069 7.70202)
1611 ZAMBOANGA SIBUGAY SIAY 0.260228 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098313000 122.876854 7.734716 POINT(122.876851077 7.73471573608) ... 124.442001 8.592 196.818101 1.0 0.0 -0.042893 0.000000 0.000000 0.440700 POINT (122.87685 7.73472)
1612 ZAMBOANGA SIBUGAY TALUSAN 0.178658 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098314000 122.884216 7.334711 POINT(122.884218322 7.3347104573) ... 124.442001 8.592 220.979871 1.0 0.0 -0.158900 0.000000 0.000000 0.606933 POINT (122.88422 7.33471)
1613 ZAMBOANGA SIBUGAY TITAY 0.000000 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098315000 122.517639 7.845210 POINT(122.517641 7.84521002867) ... 124.442001 8.592 227.549967 0.0 0.0 0.017960 0.000000 0.000000 0.414100 POINT (122.51764 7.84521)
1614 ZAMBOANGA SIBUGAY TUNGAWAN 0.000000 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098316000 122.370041 7.568430 POINT(122.370040541 7.56843016062) ... 124.442001 8.592 254.889034 0.0 0.0 -0.121783 0.000048 0.475851 0.559233 POINT (122.37004 7.56843)

1615 rows × 63 columns

In the cells below, I merge my thesis data with the shape file I need to plot the municipalities and the cities. I tried using all 3 versions of "op" ; the other 2 are "contains" and "within". "Intersects" gives me the best merge.

In [11]:
##merging the spacial data with the thesis data

PHmerge = gpd.sjoin(PHvote, PHshape2, how = "right", op = 'intersects')
PHmerge
Out[11]:
index_left province muni PDPpercent region region_code province_code municipality_city_code longitude latitude ... lon_a_5 lat_a_5 d_a_5 PDPran FPDP pi_g fatalities_pc fatalities_p_c pi_a geometry
0 0.0 ABRA BANGUED 0.101855 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140101000 120.600914 17.611204 ... 120.991997 14.545 341.874815 1.0 6.0 -0.386866 0.000125 1.245770 0.126267 POLYGON ((120.61553 17.62360, 120.61870 17.623...
1 1.0 ABRA BOLINEY 0.000000 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140102000 120.864738 17.378698 ... 120.991997 14.545 313.868504 0.0 0.0 -0.627300 0.000000 0.000000 0.451233 POLYGON ((120.90749 17.41933, 120.92464 17.413...
2 2.0 ABRA BUCAY 0.053107 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140103000 120.719849 17.523537 ... 120.991997 14.545 330.886511 1.0 1.0 -0.562465 0.000058 0.584283 0.298767 POLYGON ((120.74548 17.57671, 120.75101 17.575...
3 3.0 ABRA BUCLOC 0.047902 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140104000 120.848061 17.436989 ... 120.991997 14.545 320.394546 1.0 0.0 -0.727650 0.000000 0.000000 0.467700 POLYGON ((120.82477 17.41721, 120.78956 17.416...
4 4.0 ABRA DAGUIOMAN 0.007278 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140105000 120.952278 17.449829 ... 120.991997 14.545 321.473186 1.0 0.0 -0.056885 0.000000 0.000000 0.331700 POLYGON ((120.98141 17.50534, 120.98367 17.504...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1579 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN MULTIPOLYGON (((120.23167 14.85611, 120.23222 ...
1595 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN POLYGON ((123.39980 8.44444, 123.40269 8.44411...
1614 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN POLYGON ((123.54957 8.21537, 123.54903 8.20592...
1617 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN POLYGON ((123.18555 7.84652, 123.18510 7.84630...
1638 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN ... NaN NaN NaN NaN NaN NaN NaN NaN NaN POLYGON ((122.99306 7.82587, 123.00117 7.81298...

1701 rows × 64 columns

I clean the merged data set a little bit to only retain matched observations.

In [12]:
## retaining only the matched municipalities

PHmerge = PHmerge[PHmerge.classification == 'Community']
PHmerge
Out[12]:
index_left province muni PDPpercent region region_code province_code municipality_city_code longitude latitude ... lon_a_5 lat_a_5 d_a_5 PDPran FPDP pi_g fatalities_pc fatalities_p_c pi_a geometry
0 0.0 ABRA BANGUED 0.101855 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140101000 120.600914 17.611204 ... 120.991997 14.545 341.874815 1.0 6.0 -0.386866 0.000125 1.245770 0.126267 POLYGON ((120.61553 17.62360, 120.61870 17.623...
1 1.0 ABRA BOLINEY 0.000000 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140102000 120.864738 17.378698 ... 120.991997 14.545 313.868504 0.0 0.0 -0.627300 0.000000 0.000000 0.451233 POLYGON ((120.90749 17.41933, 120.92464 17.413...
2 2.0 ABRA BUCAY 0.053107 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140103000 120.719849 17.523537 ... 120.991997 14.545 330.886511 1.0 1.0 -0.562465 0.000058 0.584283 0.298767 POLYGON ((120.74548 17.57671, 120.75101 17.575...
3 3.0 ABRA BUCLOC 0.047902 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140104000 120.848061 17.436989 ... 120.991997 14.545 320.394546 1.0 0.0 -0.727650 0.000000 0.000000 0.467700 POLYGON ((120.82477 17.41721, 120.78956 17.416...
4 4.0 ABRA DAGUIOMAN 0.007278 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140105000 120.952278 17.449829 ... 120.991997 14.545 321.473186 1.0 0.0 -0.056885 0.000000 0.000000 0.331700 POLYGON ((120.98141 17.50534, 120.98367 17.504...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1647 1610.0 ZAMBOANGA SIBUGAY ROSELLER LIM 0.197443 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098312000 122.430687 7.702022 ... 124.442001 8.592 242.523021 1.0 0.0 -0.112825 0.000000 0.000000 0.489067 MULTIPOLYGON (((122.54519 7.68240, 122.54472 7...
1648 1611.0 ZAMBOANGA SIBUGAY SIAY 0.260228 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098313000 122.876854 7.734716 ... 124.442001 8.592 196.818101 1.0 0.0 -0.042893 0.000000 0.000000 0.440700 MULTIPOLYGON (((122.80055 7.73917, 122.80083 7...
1649 1612.0 ZAMBOANGA SIBUGAY TALUSAN 0.178658 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098314000 122.884216 7.334711 ... 124.442001 8.592 220.979871 1.0 0.0 -0.158900 0.000000 0.000000 0.606933 MULTIPOLYGON (((122.91917 7.34111, 122.91861 7...
1650 1613.0 ZAMBOANGA SIBUGAY TITAY 0.000000 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098315000 122.517639 7.845210 ... 124.442001 8.592 227.549967 0.0 0.0 0.017960 0.000000 0.000000 0.414100 POLYGON ((122.68086 7.88831, 122.68091 7.88682...
1651 1614.0 ZAMBOANGA SIBUGAY TUNGAWAN 0.000000 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098316000 122.370041 7.568430 ... 124.442001 8.592 254.889034 0.0 0.0 -0.121783 0.000048 0.475851 0.559233 MULTIPOLYGON (((122.41989 7.49679, 122.42042 7...

1613 rows × 64 columns

Before I can make the map, I need to convert my merged data set into a geopanda data frame.

In [13]:
PHmerge = gpd.GeoDataFrame(PHmerge, geometry="geometry")
PHmerge
Out[13]:
index_left province muni PDPpercent region region_code province_code municipality_city_code longitude latitude ... lon_a_5 lat_a_5 d_a_5 PDPran FPDP pi_g fatalities_pc fatalities_p_c pi_a geometry
0 0.0 ABRA BANGUED 0.101855 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140101000 120.600914 17.611204 ... 120.991997 14.545 341.874815 1.0 6.0 -0.386866 0.000125 1.245770 0.126267 POLYGON ((120.61553 17.62360, 120.61870 17.623...
1 1.0 ABRA BOLINEY 0.000000 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140102000 120.864738 17.378698 ... 120.991997 14.545 313.868504 0.0 0.0 -0.627300 0.000000 0.000000 0.451233 POLYGON ((120.90749 17.41933, 120.92464 17.413...
2 2.0 ABRA BUCAY 0.053107 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140103000 120.719849 17.523537 ... 120.991997 14.545 330.886511 1.0 1.0 -0.562465 0.000058 0.584283 0.298767 POLYGON ((120.74548 17.57671, 120.75101 17.575...
3 3.0 ABRA BUCLOC 0.047902 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140104000 120.848061 17.436989 ... 120.991997 14.545 320.394546 1.0 0.0 -0.727650 0.000000 0.000000 0.467700 POLYGON ((120.82477 17.41721, 120.78956 17.416...
4 4.0 ABRA DAGUIOMAN 0.007278 Cordillera Administrative Region (CAR) PH140000000 PH140100000 PH140105000 120.952278 17.449829 ... 120.991997 14.545 321.473186 1.0 0.0 -0.056885 0.000000 0.000000 0.331700 POLYGON ((120.98141 17.50534, 120.98367 17.504...
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1647 1610.0 ZAMBOANGA SIBUGAY ROSELLER LIM 0.197443 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098312000 122.430687 7.702022 ... 124.442001 8.592 242.523021 1.0 0.0 -0.112825 0.000000 0.000000 0.489067 MULTIPOLYGON (((122.54519 7.68240, 122.54472 7...
1648 1611.0 ZAMBOANGA SIBUGAY SIAY 0.260228 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098313000 122.876854 7.734716 ... 124.442001 8.592 196.818101 1.0 0.0 -0.042893 0.000000 0.000000 0.440700 MULTIPOLYGON (((122.80055 7.73917, 122.80083 7...
1649 1612.0 ZAMBOANGA SIBUGAY TALUSAN 0.178658 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098314000 122.884216 7.334711 ... 124.442001 8.592 220.979871 1.0 0.0 -0.158900 0.000000 0.000000 0.606933 MULTIPOLYGON (((122.91917 7.34111, 122.91861 7...
1650 1613.0 ZAMBOANGA SIBUGAY TITAY 0.000000 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098315000 122.517639 7.845210 ... 124.442001 8.592 227.549967 0.0 0.0 0.017960 0.000000 0.000000 0.414100 POLYGON ((122.68086 7.88831, 122.68091 7.88682...
1651 1614.0 ZAMBOANGA SIBUGAY TUNGAWAN 0.000000 Region IX (Zamboanga Peninsula) PH090000000 PH098300000 PH098316000 122.370041 7.568430 ... 124.442001 8.592 254.889034 0.0 0.0 -0.121783 0.000048 0.475851 0.559233 MULTIPOLYGON (((122.41989 7.49679, 122.42042 7...

1613 rows × 64 columns

Finally, I graph the vote share in the Philippines by municipality in the 2019 Mayoral elections.

In [14]:
##map for PH

fig, gax = plt.subplots(figsize = (10,8))
#plot the Philippines!
world.query("name=='Philippines'").plot(ax=gax, edgecolor='black',color='white')


#plot the muicipalities
PHmerge.plot(ax=gax, column="PDPpercent", legend=True,cmap='RdBu_r')
Out[14]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff67c586910>

The above results show that most municipalities don't have majority support for Duterte's party's mayoral candidates. The important thing though is that there is lots of variation in the municipality's vote shares for Duterte, which is a good thing.

I also try to graph the fatalities for the Philippines. I should note that most municipalities that observed at least one fatality have a relatively small count of fatalities(the mean of the fatalities variable in my data set is around 3). To scale the fatalities better on the map, I decide to use the log of fatalities instead of the count. Since the range in the count of fatalities is quite large(it goes from 0 to 428), some municipalities look as though they had no fatalities because of the way cmap scales the colouring. In my thesis, I include specifications where I use the count of fataltiies, and others where I include the log of fatalities.

Note: I chose to use a heat map for this visualisation since this is the same type of colour scheme Blanco(2012) uses for their homecide rate visualisations in their appendix.

In [60]:
PHmerge["log_fatalities"]=np.log(PHmerge["fatalities"]) 
/opt/conda/lib/python3.7/site-packages/pandas/core/series.py:679: RuntimeWarning:

divide by zero encountered in log

In [61]:
#cleaning the log fatalities variable
PHmerge["log_fatalities"] = np.nan_to_num(PHmerge["log_fatalities"], copy=True, nan=0.0, posinf=None, neginf=0)
PHmerge["log_fatalities"]
Out[61]:
0       1.791759
1       0.000000
2       0.000000
3       0.000000
4       0.000000
          ...   
1647    0.000000
1648    0.000000
1649    0.000000
1650    0.000000
1651    0.693147
Name: log_fatalities, Length: 1613, dtype: float64
In [62]:
fig, gax = plt.subplots(figsize = (10,8))
#plot the Philippines!
world.query("name=='Philippines'").plot(ax=gax, edgecolor='black',color='white')


#plot the muicipalities
PHmerge.plot(ax=gax, column="log_fatalities", legend=True,cmap='hot')
 
Out[62]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff608181550>

This map is reassuring since there is a lot of geographic variation in the fatalities. If the fatalities were concerntrated in one specific region that would mean that I might have to limit my analysis to that region, which would severly limit my observations. Since the fatalities are relatively spread out throughout the country, I am able to make use of all the municipalities in my data set.

For reference, here is the map where I use the count of fatalities instead:

In [63]:
fig, gax = plt.subplots(figsize = (10,8))
#plot the Philippines!
world.query("name=='Philippines'").plot(ax=gax, edgecolor='black',color='white')


#plot the muicipalities
PHmerge.plot(ax=gax, column="fatalities", legend=True,cmap='hot')
Out[63]:
<matplotlib.axes._subplots.AxesSubplot at 0x7ff6083d7490>

This map is less assuring, but again, the mean of the fatalities variable is right around 3, which means most municipalities won't show up in this colour scaling. Only the outliers show up in this colour scaling, and even then the outliers are still spread throughout the country, which makes me confident that there isn't one regional fixed effect driving the fatalities.

Next, I make an interactive graph!

In [19]:
##interactivity:
from bokeh.io import output_notebook
from bokeh.plotting import figure, ColumnDataSource
from bokeh.io import output_notebook, show, output_file
from bokeh.plotting import figure
from bokeh.models import GeoJSONDataSource, LinearColorMapper, ColorBar, HoverTool
from bokeh.palettes import brewer
output_notebook()
import json
#Convert data to geojson for bokeh
PH_geojson=GeoJSONDataSource(geojson=PHmerge.to_json())


color_mapper = LinearColorMapper(palette = brewer['RdBu'][10], low = 0, high = 1)
color_bar = ColorBar(color_mapper=color_mapper, label_standoff=8,width = 500, height = 20,
                     border_line_color=None,location = (0,0), orientation = 'horizontal')
hover = HoverTool(tooltips = [ ('Municipality','@muni'),('PDP vote share', '@PDPpercent'),
                               ('Fatalities','@fatalities')])
p = figure(title="Philippines vote share in the 2019 Mayoral elections", tools=[hover])
p.patches("xs","ys",source=PH_geojson,
          fill_color = {'field' :'PDPpercent', 'transform' : color_mapper})
p.add_layout(color_bar, 'below')
show(p)
Loading BokehJS ...