Voting Power, Turnout, and Persuasion in Presidential Elections¶

One interesting implication of the electoral college is that some individuals have more voting power than others. This, combined with this the fact that some states are more likely to swing an election means the importance of a vote varies considerably by location. Andrew Gelman has done some work focusing on calculating the probability a voter will swing an election. Below, I ask a different but related question: "Given an election turned out the way it did, how powerful were individual voters in each state?" This alternative metric, called the Voting Power Index, is discussed more here. Rather than rely on predicted probabilities of electoral outcomes, it simply divides the state's electoral votes by the realized vote margin:

$$state\_vpi = \frac{state\_electoral\_votes}{\lvert dem\_voters - rep\_voters\rvert}$$

I decided to take this metric a step further and calculate a county VPI, which is the fraction of the state's voting power that resides with the voting age population of each county:

$$county\_vpi = state\_vpi*\frac{county\_vap}{state\_vap}$$

These numbers can then provide some insight on the ongoing persuasion vs. turnout debate because the county voting power can be further disaggregated by voting status:

$$persuasion\_vpi = county\_vpi*\frac{voting\_vap}{county\_vap}$$$$turnout\_vpi = county\_vpi*\frac{nonvoting\_vap}{county\_vap}$$

This turnout_vpi value could hopefully act as an adjustment on the numbers from The Missing Obama Millions article, which don't take the electoral college into account. To be honest, I'm mainly using this metric because it's straightforward to calculate and easy to understand. I haven't fully considered all the implications, but it seems to produce results that are in-line with other analyses [5]. I'm definietly open to critique and could use input from a political scientist to "kick the tires" of this metric.

In [53]:

%matplotlib inline

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib
import numpy as np

import folium
import branca.colormap as cm

import seaborn as sns

import altair as alt
from vega_datasets import data

import statsmodels.formula.api as smf
import statsmodels.api as sm

matplotlib.style.use('ggplot') 

In [2]:

#For the notebook only (not for JupyterLab) run this command once per session
alt.renderers.enable('notebook')

Out[2]:

RendererRegistry.enable('notebook')

In [3]:

county_df = pd.read_csv('./inputs/US_County_Level_Presidential_Results_04-16.csv', dtype={'fips_code': object})

county_df = county_df[county_df['county_name'] != 'Alaska']

In [4]:

state_df = county_df.groupby(['state','year']).agg({'vap':'sum', 'dem_num':'sum', 'rep_num':'sum', 'county_num': 'sum'})
state_df = state_df.reset_index()
state_df.rename(columns={'vap':'state_vap', 'dem_num': 'state_dem_num',
    'rep_num': 'state_rep_num', 'county_num': 'state_num'}, inplace=True)
state_df['state_dem_margin'] = (state_df['state_dem_num'] - state_df['state_rep_num']) / state_df['state_num']

#Calculate the standard deviation of the margin
state_deviation_df = state_df.groupby(['state']).agg({'state_dem_margin': 'std'})
state_deviation_df = state_deviation_df.reset_index()
state_deviation_df.rename(columns={'state_dem_margin': 'state_dem_margin_std'}, inplace=True)

state_df = pd.merge(state_df, state_deviation_df, on=['state'])

In [5]:

county_df = pd.merge(county_df, state_df, on=['state','year'])

Initial County Dataframe¶

In [6]:

county_df['county_vpi'] = (county_df['vap']/county_df['state_vap'])*(county_df['state_electoral_votes']/
                    abs(county_df['state_dem_num'] - county_df['state_rep_num']))

#Rescale column so median value is 1
county_df['county_vpi'] = county_df['county_vpi']*(1/county_df['county_vpi'].median())

#The fraction of county VPI that is tapped by increasing turnout
county_df['turnout_vpi'] = ((county_df['vap'] - county_df['county_num'])/county_df['vap'])*county_df['county_vpi']

#The fraction of county VPI tapped by persuading voters to switch sides
county_df['persuasion_vpi'] = (county_df['county_num']/county_df['vap'])*county_df['county_vpi']

#Adjusting turnout for party margin
county_df['partisanturnout_vpi'] = county_df['dem_margin']*county_df['turnout_vpi']

#The advantage of turnout over persuasion in a county:
county_df['turnout_advantage'] = county_df['turnout_vpi'] - county_df['persuasion_vpi'] 

county_df = county_df.sort_values(by='county_vpi', ascending=False)

cols = ['county_name','state', 'year', 'dem_margin', 'turnout', 'turnout_vpi', 'persuasion_vpi', 'county_vpi']

county_df[cols].head(30)

#Maybe include whole table in sortable iframe here:

Out[6]:

	county_name	state	year	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi
9743	Bernalillo County	NM	2004	0.0420	0.5911	1157.042729	1672.443141	2829.485870
9332	Saint Louis County	MO	2008	0.1990	0.7670	557.343621	1834.764478	2392.108099
4847	Hillsborough County	NH	2016	-0.0020	0.6545	763.202291	1446.068221	2209.270512
4849	Rockingham County	NH	2016	-0.0583	0.7383	435.852891	1229.390599	1665.243490
9284	Jackson County	MO	2008	0.2539	0.6831	515.814695	1111.849170	1627.663865
4390	Wayne County	MI	2016	0.3734	0.5837	539.384492	756.163891	1295.548383
4371	Oakland County	MI	2016	0.0811	0.6801	303.963423	646.094828	950.058250
9351	Saint Louis City	MO	2008	0.6815	0.6079	335.774084	520.612544	856.386628
4848	Merrimack County	NH	2016	0.0308	0.6848	259.213077	563.095587	822.308664
9328	Saint Charles County	MO	2008	-0.0974	0.7592	195.964621	617.850926	813.815546
9648	Clark County	NV	2004	0.0485	0.4836	419.452537	392.824048	812.276584
4826	Clark County	NV	2016	0.1096	0.4527	443.659050	366.924796	810.583846
9750	Dona Ana County	NM	2004	0.0357	0.5151	379.832819	403.505212	783.338031
12280	Milwaukee County	WI	2004	0.2434	0.7217	197.975668	513.273876	711.249544
4850	Strafford County	NH	2016	0.0856	0.6590	241.359028	466.455912	707.814940
9275	Greene County	MO	2008	-0.1585	0.6661	222.883702	444.702308	667.586010
4358	Macomb County	MI	2016	-0.1153	0.6148	255.384842	407.628058	663.012901
9768	Santa Fe County	NM	2004	0.4321	0.6560	226.076234	431.117576	657.193810
9686	Hillsborough County	NH	2004	-0.0287	0.6764	205.819444	430.221261	636.040705
9766	San Juan County	NM	2004	-0.3262	0.5327	257.068508	293.094828	550.163336
9765	Sandoval County	NM	2004	-0.0271	0.5356	251.487424	290.047049	541.534472
9286	Jefferson County	MO	2008	0.0252	0.6630	176.088284	346.415216	522.503500
9260	Clay County	MO	2008	-0.0069	0.6937	158.581377	359.202927	517.784304
4846	Grafton County	NH	2016	0.1896	0.6738	166.484551	343.872287	510.356838
9688	Rockingham County	NH	2004	-0.0418	0.7205	135.650672	349.624258	485.274930
10090	Mecklenburg County	NC	2008	0.2437	0.7202	131.616304	338.832514	470.448818
4349	Kent County	MI	2016	-0.0308	0.6372	170.557276	299.596590	470.153866
7811	Polk County	IA	2004	0.0463	0.6887	145.247415	321.383619	466.631034
10122	Wake County	NC	2008	0.1445	0.7990	91.760593	364.753614	456.514206
4844	Cheshire County	NH	2016	0.1262	0.6652	142.008996	282.158481	424.167478

In [7]:

# out_cols = list(cols)
# out_cols.append('partisanturnout_vpi')
# county_df[out_cols].to_csv('./outputs/countyvpi_2004-2016.csv', index=False)

Visualizing the County Data¶

In [8]:

fig, ax = plt.subplots(figsize=(9,8))
ax.set_yscale('log')
sns.boxplot(data=county_df, x='year', y='county_vpi')  #, whis=np.inf

Out[8]:

<matplotlib.axes._subplots.AxesSubplot at 0x1c1a5e09e8>

In [9]:

plt_df = county_df[['year', 'county_vpi']].copy()
plt_df['county_vpi'] = np.log(plt_df['county_vpi'])

#col="time",  col="year", col_wrap=2, size=4
g = sns.FacetGrid(plt_df,  col="year", col_wrap=2,  size=4)  
g = g.map(plt.hist, "county_vpi", bins=30, color='#4682b4')
g.set_axis_labels("log(county_vpi)", "Count")

Out[9]:

<seaborn.axisgrid.FacetGrid at 0x1c1b52a390>

In [10]:

map_df = county_df[['fips_code', 'county_vpi', 'year']].copy()

#Change scale so easier to see on chloropleth:
# Also available in altair, but less customizable, see here: 
# https://vega.github.io/vega-lite/docs/scale.html
map_df['county_vpi'] = np.power(map_df['county_vpi'], 0.3)

#Arrange in columns
map_df = map_df.pivot(index='fips_code', columns='year', values='county_vpi')
map_df.reset_index(inplace=True)
#Convert to ints, because the topojson strips the leading zeros
map_df['fips_code'] = map_df['fips_code'].astype(int)

In [11]:

counties = alt.topo_feature(data.us_10m.url, 'counties')

years = ['2004', '2008', '2012', '2016']

chart = alt.Chart(
    data=counties,
).mark_geoshape(
    stroke='white',
    strokeWidth=0.25
).encode(
    alt.Color(
        alt.repeat('row'), 
        type='quantitative',
        legend=alt.Legend(title="Voter Power Index, 2004-2016"),
        scale= {"scheme": "oranges"}
    )
).transform_lookup(
    lookup='id',
    from_=alt.LookupData(map_df, 'fips_code', years),
).project(
    type='albersUsa'
).properties(
    width=600,
    height=400
).repeat(
    row=years
)

chart.save('./outputs/vpi_2004-2016.html')

chart

Out[11]:

A Closer Look at 2016¶

Below, I select out just the 2016 values for a look at those. One important point to make is that the VPI only shows that, given the way the election turned out, these counties were important. It's entirely possible that a different set of counties would show up if campaigns focused on their resources elsewhere, depending on how powerful you think campaigns are. So this 2016 data is more useful for providing a picture of what happened, rather than saying what should have been done instead. The averages I look at later can provide more of a general picture of places that tend to be important.

In [12]:

county2016_df = county_df[county_df['year'] == 2016]

# 'state_dem_margin',
cols_2016 = ['county_name','state', 'year', 'dem_margin',
             'turnout', 'turnout_vpi', 'persuasion_vpi', 'county_vpi',]

county2016_df[cols_2016].sort_values(by='county_vpi', ascending=False).head(30)

Out[12]:

	county_name	state	year	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi
4847	Hillsborough County	NH	2016	-0.0020	0.6545	763.202291	1446.068221	2209.270512
4849	Rockingham County	NH	2016	-0.0583	0.7383	435.852891	1229.390599	1665.243490
4390	Wayne County	MI	2016	0.3734	0.5837	539.384492	756.163891	1295.548383
4371	Oakland County	MI	2016	0.0811	0.6801	303.963423	646.094828	950.058250
4848	Merrimack County	NH	2016	0.0308	0.6848	259.213077	563.095587	822.308664
4826	Clark County	NV	2016	0.1096	0.4527	443.659050	366.924796	810.583846
4850	Strafford County	NH	2016	0.0856	0.6590	241.359028	466.455912	707.814940
4358	Macomb County	MI	2016	-0.1153	0.6148	255.384842	407.628058	663.012901
4846	Grafton County	NH	2016	0.1896	0.6738	166.484551	343.872287	510.356838
4349	Kent County	MI	2016	-0.0308	0.6372	170.557276	299.596590	470.153866
4844	Cheshire County	NH	2016	0.1262	0.6652	142.008996	282.158481	424.167478
3183	Maricopa County	AZ	2016	-0.0289	0.4787	191.532380	175.913839	367.446219
6165	Milwaukee County	WI	2016	0.3701	0.6069	144.012269	222.310555	366.322824
4842	Belknap County	NH	2016	-0.1678	0.7013	100.996831	237.125381	338.122212
4333	Genesee County	MI	2016	0.0946	0.6238	115.104725	190.826300	305.931025
4389	Washtenaw County	MI	2016	0.4128	0.6460	100.443951	183.323358	283.767309
5372	Philadelphia County	PA	2016	0.6698	0.5792	115.593496	159.124457	274.717953
4843	Carroll County	NH	2016	-0.0566	0.7359	71.663218	199.688143	271.351361
4418	Hennepin County	MN	2016	0.3493	0.7074	74.660455	180.535170	255.195626
4851	Sullivan County	NH	2016	-0.0265	0.6410	85.057186	151.841761	236.898946
4257	Cumberland County	ME	2016	0.2635	0.7273	61.043883	162.818433	223.862316
4341	Ingham County	MI	2016	0.2687	0.5697	96.279322	127.483898	223.763221
5323	Allegheny County	PA	2016	0.1645	0.6601	75.873263	147.367799	223.241062
6137	Dane County	WI	2016	0.4717	0.7387	55.366688	156.548600	211.915288
4378	Ottawa County	MI	2016	-0.3047	0.6671	69.241319	138.756781	207.998100
4347	Kalamazoo County	MI	2016	0.1276	0.6177	75.981134	122.779735	198.760869
4845	Coos County	NH	2016	-0.0909	0.5992	70.160513	104.873671	175.034185
3396	New Castle County	DE	2016	0.2960	0.5970	70.338304	104.203257	174.541561
4839	Washoe County	NV	2016	0.0129	0.5729	74.024668	99.279717	173.304385
3441	Miami-Dade County	FL	2016	0.2960	0.4532	92.740626	76.856905	169.597531

In [13]:

# Create Interactive Map
map_1 = folium.Map(location=[39, -96], zoom_start=4)

vpi_series = county2016_df.set_index('fips_code')['county_vpi']

colorscale = cm.LinearColormap(
    ['white', 'orange', 'red'],
    index=[0, 0.05, 1.0],
    caption = 'Voter Power Index'
).scale(0, vpi_series.max())

def style_function(feature):
    #Add leading zeros, because they're not included
    feature['id'] = feature['id'].zfill(5)
    vpi = vpi_series.get(feature['id'], None)
    return {
        'fillOpacity': 0.8,
        'weight': 0.05,
        'fillColor': "#ffffff" if vpi is None else colorscale(vpi)
    }

folium.GeoJson(
    open('./inputs/us-counties.json').read(),
    style_function=style_function,
    name='2016'
).add_to(map_1)

map_1.save('./outputs/vpi_folium-2016.html')

map_1

Out[13]:

In [14]:

#Important counties for Democrats, in states they lost:
county2016_df[(county2016_df['state_dem_margin'] < 0)] \
    .sort_values(by='county_vpi', ascending=False)[cols_2016].head(20)
#Important counties for Democrats, in states with margins within 1 standard deviation
# county2016_df[(county2016_df['state_dem_margin']  < 0)] \
#     .sort_values(by='county_vpi', ascending=False).head(50)

Out[14]:

	county_name	state	year	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi
4390	Wayne County	MI	2016	0.3734	0.5837	539.384492	756.163891	1295.548383
4371	Oakland County	MI	2016	0.0811	0.6801	303.963423	646.094828	950.058250
4358	Macomb County	MI	2016	-0.1153	0.6148	255.384842	407.628058	663.012901
4349	Kent County	MI	2016	-0.0308	0.6372	170.557276	299.596590	470.153866
3183	Maricopa County	AZ	2016	-0.0289	0.4787	191.532380	175.913839	367.446219
6165	Milwaukee County	WI	2016	0.3701	0.6069	144.012269	222.310555	366.322824
4333	Genesee County	MI	2016	0.0946	0.6238	115.104725	190.826300	305.931025
4389	Washtenaw County	MI	2016	0.4128	0.6460	100.443951	183.323358	283.767309
5372	Philadelphia County	PA	2016	0.6698	0.5792	115.593496	159.124457	274.717953
4341	Ingham County	MI	2016	0.2687	0.5697	96.279322	127.483898	223.763221
5323	Allegheny County	PA	2016	0.1645	0.6601	75.873263	147.367799	223.241062
6137	Dane County	WI	2016	0.4717	0.7387	55.366688	156.548600	211.915288
4378	Ottawa County	MI	2016	-0.3047	0.6671	69.241319	138.756781	207.998100
4347	Kalamazoo County	MI	2016	0.1276	0.6177	75.981134	122.779735	198.760869
3441	Miami-Dade County	FL	2016	0.2960	0.4532	92.740626	76.856905	169.597531
6192	Waukesha County	WI	2016	-0.2649	0.7690	36.264767	120.708511	156.973278
4381	Saginaw County	MI	2016	-0.0114	0.6276	54.409449	91.691815	146.101264
5367	Montgomery County	PA	2016	0.2128	0.6807	46.131019	98.363149	144.494168
4355	Livingston County	MI	2016	-0.2956	0.7180	40.418475	102.916091	143.334567
4369	Muskegon County	MI	2016	0.0151	0.5879	53.208862	75.912113	129.120974

In [15]:

# Where was more turnout especially important for democrats?  
# This multiplies the democratic margin for the county times 
# the fraction of county's voting power that resided with non-voters
# in states that democrats lost:
temp_cols = list(cols_2016)
temp_cols.append('partisanturnout_vpi')

county2016_df[(county2016_df['state_dem_margin'] < 0)] \
    .sort_values(by='partisanturnout_vpi', ascending=False) \
    [temp_cols].head(20)

Out[15]:

	county_name	state	year	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi	partisanturnout_vpi
4390	Wayne County	MI	2016	0.3734	0.5837	539.384492	756.163891	1295.548383	201.406169
5372	Philadelphia County	PA	2016	0.6698	0.5792	115.593496	159.124457	274.717953	77.424524
6165	Milwaukee County	WI	2016	0.3701	0.6069	144.012269	222.310555	366.322824	53.298941
4389	Washtenaw County	MI	2016	0.4128	0.6460	100.443951	183.323358	283.767309	41.463263
3441	Miami-Dade County	FL	2016	0.2960	0.4532	92.740626	76.856905	169.597531	27.451225
6137	Dane County	WI	2016	0.4717	0.7387	55.366688	156.548600	211.915288	26.116467
4341	Ingham County	MI	2016	0.2687	0.5697	96.279322	127.483898	223.763221	25.870254
4371	Oakland County	MI	2016	0.0811	0.6801	303.963423	646.094828	950.058250	24.651434
3404	Broward County	FL	2016	0.3514	0.5507	53.225032	65.232522	118.457554	18.703276
5323	Allegheny County	PA	2016	0.1645	0.6601	75.873263	147.367799	223.241062	12.481152
4333	Genesee County	MI	2016	0.0946	0.6238	115.104725	190.826300	305.931025	10.888907
5367	Montgomery County	PA	2016	0.2128	0.6807	46.131019	98.363149	144.494168	9.816681
4347	Kalamazoo County	MI	2016	0.1276	0.6177	75.981134	122.779735	198.760869	9.695193
3446	Orange County	FL	2016	0.2465	0.5331	37.509184	42.832926	80.342110	9.246014
3509	DeKalb County	GA	2016	0.6397	0.5486	12.546856	15.246383	27.793239	8.026224
3525	Fulton County	GA	2016	0.4163	0.5436	17.638595	21.007978	38.646573	7.342947
5344	Delaware County	PA	2016	0.2229	0.6805	31.517100	67.122985	98.640085	7.025162
3448	Palm Beach County	FL	2016	0.1544	0.5674	39.589691	51.932850	91.522541	6.112648
3186	Pima County	AZ	2016	0.1383	0.5184	44.010083	47.380936	91.391018	6.086594
5027	Mecklenburg County	NC	2016	0.2941	0.5899	18.240876	26.238808	44.479683	5.364642

County Averages, 2004-2016¶

The above 2016 analysis is interesting but if we want values that are more generalizable to future elections it makes sense to look at averages. If you average over too few elections you risk overfitting to a specific point in time, but averaging over too many years will make the results irrelevant to the present. I was only able to compile data from 2004 forward anyways, so these are the years I went with.

In [16]:

# Which counties, on average across all elections, have highest VPIs?   
# fips codes changed?:'county_name', 'state' , 'fips_code'
# county names aren't consistent: 'county_name', 
avg_df = county_df.groupby(['state', 'fips_code']) \
    .agg({'county_vpi':'mean',
        'vap':'mean',
        'turnout_vpi': 'mean',
        'dem_margin': 'mean',
        'turnout': 'mean',
        'county_vpi': 'mean',
        'persuasion_vpi':'mean', 
        'partisanturnout_vpi': 'mean',
        'turnout_advantage': 'mean',
        'state_dem_margin': 'mean',
        'state_dem_margin_std': 'mean'})
    
avg_df = avg_df.reset_index()
#avg_df['fips_code'].count() #3114
countynames_df = county2016_df[['county_name', 'fips_code']]
avg_df = avg_df.merge(countynames_df, how='left', on='fips_code')

#avg_df['fips_code'].count() #3114
avg_cols = ['county_name','state', 'dem_margin',
    'turnout', 'turnout_vpi', 'persuasion_vpi', 'county_vpi']

avg_df[avg_cols].sort_values(by='county_vpi', ascending=False).head(30)

Out[16]:

	county_name	state	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi
1901	Bernalillo County	NM	0.149150	0.574300	322.191780	458.901725	781.093505
1875	Hillsborough County	NH	0.004500	0.673875	261.531237	509.284778	770.816015
1466	St. Louis County County	MO	0.147975	0.713775	146.267445	475.652972	621.920417
1877	Rockingham County	NH	-0.033725	0.736700	154.517507	427.941955	582.459462
1935	Clark County	NV	0.123775	0.491850	274.776186	247.379861	522.156047
1418	Jackson County	MO	0.198300	0.625425	135.126185	287.882299	423.008484
1282	Wayne County	MI	0.432750	0.613300	148.977818	212.273072	361.250891
1876	Merrimack County	NH	0.086950	0.690625	91.522184	198.694873	290.217058
3004	Milwaukee County	WI	0.333100	0.681100	89.948122	193.970429	283.918551
1263	Oakland County	MI	0.077700	0.722425	82.209028	179.036279	261.245307
1878	Strafford County	NH	0.138775	0.653925	85.666172	161.759279	247.425451
1485	St. Louis City County	MO	0.648300	0.562800	87.590113	134.671075	222.261188
1908	Dona Ana County	NM	0.134450	0.487375	106.915843	111.046813	217.962656
1462	St. Charles County	MO	-0.186850	0.691450	51.804907	160.086089	211.890996
149	Maricopa County	AZ	-0.096975	0.504300	102.560581	101.472468	204.033050
1250	Macomb County	MI	-0.001075	0.646625	69.591618	112.881508	182.473126
1927	Santa Fe County	NM	0.500950	0.638625	62.969974	118.248456	181.218430
1874	Grafton County	NH	0.207425	0.695250	56.945682	121.479588	178.425270
289	New Castle County	DE	0.315725	0.629775	64.782894	109.726453	174.509347
1409	Greene County	MO	-0.230925	0.603550	58.499603	115.185390	173.684993
518	Honolulu County	HI	0.282000	0.424475	87.682310	69.460802	157.143112
2976	Dane County	WI	0.426900	0.766800	37.693819	118.256630	155.950449
1683	Mecklenburg County	NC	0.199550	0.623500	46.769953	105.776832	152.546785
1872	Cheshire County	NH	0.210625	0.679900	49.880695	101.502992	151.383687
1925	San Juan County	NM	-0.286475	0.532225	70.316299	80.104796	150.421096
1924	Sandoval County	NM	0.045575	0.587850	68.888435	81.126173	150.014608
1715	Wake County	NC	0.109925	0.694975	34.137508	114.221262	148.358770
597	Polk County	IA	0.111850	0.689775	44.707537	99.109343	143.816880
1420	Jefferson County	MO	-0.115050	0.613525	46.144611	89.681424	135.826035
1394	Clay County	MO	-0.067175	0.635425	41.664370	93.034411	134.698781

In [17]:

# Create map
map_2 = folium.Map(location=[39, -96], zoom_start=4)

vpi_series = avg_df.set_index('fips_code')['county_vpi']

colorscale = cm.LinearColormap(
    ['white', 'orange', 'red'], 
    index=[0, 0.1, 1.0]
).scale(0, vpi_series.max())

def style_function(feature):
    #Add leading zeros, because they're not included for all GeoJson
    feature['id'] = feature['id'].zfill(5)
    vpi = vpi_series.get(feature['id'], None)
    return {
        'fillOpacity': 0.8,
        'weight': 0.05,
        'fillColor': "#ffffff" if vpi is None else colorscale(vpi)
    }

county_geo = './inputs/us-counties.json'

folium.GeoJson(
    open(county_geo).read(), style_function=style_function
).add_to(map_2)


folium.LayerControl().add_to(map_2)
# Save as index.html
map_2.save('./outputs/vpi_folium-2004-2016.html')

map_2 

Out[17]:

In [18]:

fig, ax = plt.subplots(figsize=(8,7))
plt.title('')
ax.set_yscale('log')
#ax.set_xscale('log')
ax.set_xlabel('County VPI')
ax.set_ylabel('Count (log)')
avg_df.hist(column='county_vpi', bins=30, ax=ax, color='#4682b4')  #bins=10, , bottom=0.1

Out[18]:

array([<matplotlib.axes._subplots.AxesSubplot object at 0x106c12dd8>],
      dtype=object)

In [19]:

#So let's repeat the same analysis as above using these average values now. 
# Counties that are cruicial for Democrats in states that they lose, or are often close: #0.05
# avg_df[avg_df['state_dem_margin'] < 0.05] \
#     .sort_values(by='county_vpi', ascending=False) \
#     [avg_cols].head(30)
# Note that at a cutoff of 0.05, WI and MI are left out, which might provide some insight
# into why they weren't the focus of the 2016 campaign, although their average margin
# is still within a standard deviation of 0:  

#Using stdev as cutoff:
avg_df[(avg_df['state_dem_margin'] - avg_df['state_dem_margin_std'])  < 0] \
    .sort_values(by='county_vpi', ascending=False) \
    [avg_cols].head(30)

Out[19]:

	county_name	state	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi
1875	Hillsborough County	NH	0.004500	0.673875	261.531237	509.284778	770.816015
1466	St. Louis County County	MO	0.147975	0.713775	146.267445	475.652972	621.920417
1877	Rockingham County	NH	-0.033725	0.736700	154.517507	427.941955	582.459462
1935	Clark County	NV	0.123775	0.491850	274.776186	247.379861	522.156047
1418	Jackson County	MO	0.198300	0.625425	135.126185	287.882299	423.008484
1282	Wayne County	MI	0.432750	0.613300	148.977818	212.273072	361.250891
1876	Merrimack County	NH	0.086950	0.690625	91.522184	198.694873	290.217058
3004	Milwaukee County	WI	0.333100	0.681100	89.948122	193.970429	283.918551
1263	Oakland County	MI	0.077700	0.722425	82.209028	179.036279	261.245307
1878	Strafford County	NH	0.138775	0.653925	85.666172	161.759279	247.425451
1485	St. Louis City County	MO	0.648300	0.562800	87.590113	134.671075	222.261188
1462	St. Charles County	MO	-0.186850	0.691450	51.804907	160.086089	211.890996
149	Maricopa County	AZ	-0.096975	0.504300	102.560581	101.472468	204.033050
1250	Macomb County	MI	-0.001075	0.646625	69.591618	112.881508	182.473126
1874	Grafton County	NH	0.207425	0.695250	56.945682	121.479588	178.425270
1409	Greene County	MO	-0.230925	0.603550	58.499603	115.185390	173.684993
2976	Dane County	WI	0.426900	0.766800	37.693819	118.256630	155.950449
1683	Mecklenburg County	NC	0.199550	0.623500	46.769953	105.776832	152.546785
1872	Cheshire County	NH	0.210625	0.679900	49.880695	101.502992	151.383687
1715	Wake County	NC	0.109925	0.694975	34.137508	114.221262	148.358770
597	Polk County	IA	0.111850	0.689775	44.707537	99.109343	143.816880
1420	Jefferson County	MO	-0.115050	0.613525	46.144611	89.681424	135.826035
1394	Clay County	MO	-0.067175	0.635425	41.664370	93.034411	134.698781
333	Miami-Dade County	FL	0.189300	0.530975	70.069633	62.784885	132.854518
1241	Kent County	MI	-0.072175	0.676250	46.230380	82.839295	129.069675
3031	Waukesha County	WI	-0.304775	0.806300	23.639061	96.444778	120.083840
1870	Belknap County	NH	-0.081125	0.698725	36.262529	83.674006	119.936536
814	Marion County	IN	0.188175	0.543075	46.917241	71.846952	118.764192
1948	Washoe County	NV	0.033750	0.595925	49.434472	69.178636	118.613108
2264	Philadelphia County	PA	0.665325	0.609925	42.629915	61.745824	104.375739

In [20]:

# Where is more turnout on average important for democrats?  
# This multiplies the average democratic margin for the county times 
# the fraction of county's voting power that resides with non-voters
# in states that democrats lose on average:
temp_cols = list(avg_cols)
temp_cols.append('partisanturnout_vpi')

# Hard cutoff
# avg_df[(avg_df['state_dem_margin'] < 0.05)] \
#     .sort_values(by='partisanturnout_vpi', ascending=False) \
#     [temp_cols].head(20)

# Using standard deviation as cutoff:
avg_df[(avg_df['state_dem_margin'] - avg_df['state_dem_margin_std'])  < 0] \
    .sort_values(by='partisanturnout_vpi', ascending=False) \
    [temp_cols].head(20)

Out[20]:

	county_name	state	dem_margin	turnout	turnout_vpi	persuasion_vpi	county_vpi	partisanturnout_vpi
1485	St. Louis City County	MO	0.648300	0.562800	87.590113	134.671075	222.261188	59.512773
1282	Wayne County	MI	0.432750	0.613300	148.977818	212.273072	361.250891	56.343820
1418	Jackson County	MO	0.198300	0.625425	135.126185	287.882299	423.008484	33.846646
1466	St. Louis County County	MO	0.147975	0.713775	146.267445	475.652972	621.920417	28.579127
2264	Philadelphia County	PA	0.665325	0.609925	42.629915	61.745824	104.375739	28.265886
3004	Milwaukee County	WI	0.333100	0.681100	89.948122	193.970429	283.918551	26.973063
1935	Clark County	NV	0.123775	0.491850	274.776186	247.379861	522.156047	26.547873
333	Miami-Dade County	FL	0.189300	0.530975	70.069633	62.784885	132.854518	16.912560
2976	Dane County	WI	0.426900	0.766800	37.693819	118.256630	155.950449	14.620923
296	Broward County	FL	0.335900	0.597550	41.629796	54.008480	95.638276	14.402219
814	Marion County	IN	0.188175	0.543075	46.917241	71.846952	118.764192	12.542131
1683	Mecklenburg County	NC	0.199550	0.623500	46.769953	105.776832	152.546785	11.133419
1281	Washtenaw County	MI	0.364925	0.678475	27.339354	50.613277	77.952632	11.065435
1874	Grafton County	NH	0.207425	0.695250	56.945682	121.479588	178.425270	10.346608
810	Lake County	IN	0.271225	0.557900	27.210686	39.655590	66.866276	9.038743
1878	Strafford County	NH	0.138775	0.653925	85.666172	161.759279	247.425451	8.503564
1872	Cheshire County	NH	0.210625	0.679900	49.880695	101.502992	151.383687	7.501982
231	Denver County	CO	0.491850	0.631100	15.214315	25.033011	40.247326	7.348741
1233	Ingham County	MI	0.261250	0.618050	26.172508	35.480233	61.652741	6.918139
2030	Cuyahoga County	OH	0.367400	0.669800	17.778616	37.946032	55.724648	6.505297

Turnout vs. Persuasion¶

Next, I sum the persuasion and turnout values for each election, and then calculate their ratio. This shows that persuasion wins out in every election, but that these ratios can vary considerably. The average pesuasion to turnout ratio is 1.6:1, but the ratio was only 1.3:1 in 2012, a year with low turnout and few close states.

An interesting point Nate Cohn made is that when you persuade someone, it's implied that you're probably switching their vote from an opponent. On the face of it, this would make persuasion doubly effective:

These results are interesting, but aren't informative until we have data on the cost effectiveness of each approach, which probably depends on things like the population density and price of ad buys in different locations. Rather than thinking in binary terms, I think it makes sense to ask which strategy is better depending on the location.

In [21]:

yr_df = county_df.groupby(['year']) \
    .agg({'persuasion_vpi': 'sum',
          'turnout_vpi': 'sum', 'turnout_advantage':'sum'})
yr_df['ratio'] = yr_df['persuasion_vpi'] / yr_df['turnout_vpi']

yr_df

Out[21]:

	turnout_vpi	persuasion_vpi	turnout_advantage	ratio
year
2004	11808.359012	19373.813817	-7565.454805	1.640686
2008	11373.802537	21449.705764	-10075.903226	1.885887
2012	5122.170093	6897.855664	-1775.685570	1.346667
2016	11810.178369	19269.847592	-7459.669223	1.631631

In [22]:

yr_df.mean()

Out[22]:

turnout_vpi          10028.627503
persuasion_vpi       16747.805709
turnout_advantage    -6719.178206
ratio                    1.626218
dtype: float64

State Averages, 2004-2016¶

In [23]:

#Finally, a look at which states have the highest VPIs on average:
stateavg_df = avg_df.groupby(['state']) \
    .agg({'county_vpi':'sum', 'state_dem_margin': 'mean',
          'state_dem_margin_std': 'mean'})
    
stateavg_df = stateavg_df.reset_index()
stateavg_df.rename(columns={'county_vpi': 'state_vpi'}, inplace=True)

#state2016_df.head()
stateavg_df.sort_values(by='state_vpi', ascending=False) \
    [['state', 'state_dem_margin', 'state_dem_margin_std', 'state_vpi']]

Out[23]:

	state	state_dem_margin	state_dem_margin_std	state_vpi
23	MO	-0.088309	0.076329	3723.507729
29	NH	0.042422	0.042528	2583.739105
31	NM	0.082859	0.064740	2420.283067
21	MI	0.072843	0.073145	2081.528255
47	WI	0.051164	0.067599	1762.649753
26	NC	-0.044506	0.055681	1689.238845
11	IA	0.013179	0.083005	1068.553786
8	FL	-0.006281	0.033483	1047.400807
37	PA	0.043789	0.046965	875.171984
14	IN	-0.123273	0.097718	873.010964
32	NV	0.047860	0.064288	733.761064
25	MT	-0.142498	0.087199	517.434294
22	MN	0.057319	0.039571	507.139352
34	OH	-0.006852	0.055957	500.556711
20	ME	0.109764	0.065559	416.638357
2	AZ	-0.079020	0.030036	342.266591
4	CO	0.036412	0.058286	331.796870
44	VA	0.018062	0.067559	318.843362
7	DE	0.156620	0.077219	292.939386
9	GA	-0.086934	0.054129	288.144129
27	ND	-0.228343	0.115341	272.820570
40	SD	-0.194021	0.088569	259.209226
36	OR	0.108927	0.050525	225.742887
10	HI	0.321944	0.166591	223.493998
38	RI	0.228223	0.063615	216.777214
48	WV	-0.235889	0.136793	187.958074
39	SC	-0.127968	0.035746	186.380243
45	VT	0.298291	0.079442	172.258431
30	NJ	0.134955	0.047948	169.003730
42	TX	-0.148490	0.060181	167.444501
24	MS	-0.155411	0.038428	163.852290
1	AR	-0.200546	0.074477	162.990378
46	WA	0.137353	0.044792	155.471282
49	WY	-0.396539	0.055923	154.001139
5	CT	0.159225	0.051560	149.951731
28	NE	-0.238572	0.076248	143.174601
17	LA	-0.174428	0.023537	127.490992
15	KS	-0.207398	0.043104	127.423478
41	TN	-0.189735	0.054267	125.180037
13	IL	0.173189	0.060589	122.598985
3	CA	0.217719	0.084518	116.419163
16	KY	-0.221563	0.057683	104.420891
12	ID	-0.317518	0.051835	99.442684
19	MD	0.227307	0.065130	94.385269
0	AL	-0.242762	0.029066	92.337821
43	UT	-0.349195	0.142572	90.717261
33	NY	0.239724	0.044825	87.296743
35	OK	-0.331157	0.024607	75.709216
18	MA	0.252962	0.017399	73.806510
6	DC	0.840280	0.030791	66.658087

How well does the VPI predict the future?¶

The whole point of this metric is to use it to prioritize resources in future elections. So how well do the results from one election predict the next? Actually, it turns out it does a pretty horrible job, so maybe this metric is more useful as a way to describe the results of an election rather than predict the future.

In [45]:

fig, ax = plt.subplots(figsize=(9,8))

years = [2004, 2008, 2012]
colors = ['blue', 'green', 'red']
for i, year in enumerate(years):
    title1 = 'year_' + str(year)
    title2 = 'year_' + str(year+4)
    cols = ['county_name', 'state', 'fips_code', 'year', 'county_vpi']
    temp1_df = county_df[county_df['year'] == year]
    temp1_df = temp1_df[cols]
    temp1_df['county_vpi'] = np.log(temp1_df['county_vpi'])
    temp1_df.rename(columns={'county_vpi': title1}, inplace=True)
    temp2_df = county_df[county_df['year'] == (year + 4)]
    temp2_df = temp2_df[cols]
    temp2_df['county_vpi'] = np.log(temp2_df['county_vpi'])
    temp2_df.rename(columns={'county_vpi': title2}, inplace=True)
    temp_df = temp1_df.merge(temp2_df, on=['county_name', 'state', 'fips_code'])
    
    plt.scatter(temp_df[title1], temp_df[title2], color=colors[i], alpha=0.6)
    #x = pd.DataFrame({title1: np.linspace(0, 3000, 100)})
    x = pd.DataFrame({title1: np.linspace(-5, 10, 100)})
    formula = title2 + ' ~ 1 + ' + title1
    poly_1 = smf.ols(formula=formula, data=temp_df, missing='drop').fit()
    plt.plot(x, poly_1.predict(x), 'k-',
        label='y={1:.2f}x+{2:.2f}, $R^2$={0:.2f}'.format(poly_1.rsquared,
        poly_1.params[title1], poly_1.params.Intercept), alpha=0.6, color=colors[i])  
    
ax.set_xlabel('log(VPI Previous Election)')
ax.set_ylabel('log(VPI Next Election)')
#ax.set_ylim(0,2600)
plt.legend()
plt.show()

#Interesting, they're straight lines because each county is derived from same state value,
#and city/county populations possibly follow something like a power law. 
#This regression makes no sense because the individual data points aren't independent
#And a mixed model makes no sense because it's the equivalent of fitting lines to single points

In [30]:

statesum_df = county_df.groupby(['state', 'year']).agg({'county_vpi':'sum'})   
statesum_df = statesum_df.reset_index()
statesum_df.rename(columns={'county_vpi': 'state_vpi'}, inplace=True)

In [49]:

fig, ax = plt.subplots(figsize=(9,8))
years = [2004, 2008, 2012]
colors = ['blue', 'green', 'red']
for i, year in enumerate(years):
    title1 = 'year_' + str(year)
    title2 = 'year_' + str(year+4)
    cols = ['state','year', 'state_vpi']
    temp1_df = statesum_df[statesum_df['year'] == year]
    temp1_df = temp1_df[cols]
    temp1_df['state_vpi'] = np.log(temp1_df['state_vpi'])
    temp1_df.rename(columns={'state_vpi': title1}, inplace=True)
    temp2_df = statesum_df[statesum_df['year'] == (year + 4)]
    temp2_df = temp2_df[cols]
    temp2_df['state_vpi'] = np.log(temp2_df['state_vpi'])
    temp2_df.rename(columns={'state_vpi': title2}, inplace=True)
    temp_df = temp1_df.merge(temp2_df, on=['state'])
    
    plt.scatter(temp_df[title1], temp_df[title2], color=colors[i], alpha=0.6)
    #x = pd.DataFrame({title1: np.linspace(0, 3000, 100)})
    x = pd.DataFrame({title1: np.linspace(4, 10, 100)})
    #formula = 'np.log({0}) ~ 1 + np.log({1})'.format(title2, title1)
    formula = '{0} ~ 1 + {1}'.format(title2, title1)
    poly_1 = smf.ols(formula=formula, data=temp_df, missing='drop').fit()
    plt.plot(x, poly_1.predict(x), 'k-',
        label='y={1:.2f}x+{2:.2f}, $R^2$={0:.2f}'.format(poly_1.rsquared,
        poly_1.params[title1], poly_1.params.Intercept), alpha=0.6, color=colors[i])  
    
ax.set_xlabel('log(VPI Previous Election)')
ax.set_ylabel('log(VPI Next Election)')
#ax.set_ylim(0,2600)
ax.set_ylim(3,10)
plt.legend(loc='upper right')
plt.show()

In [57]:

year = 2004
cols = ['state', 'year', 'state_vpi']
temp1_df = statesum_df[statesum_df['year'] == year]
temp1_df = temp1_df[cols]
temp1_df['state_vpi'] = np.log(temp1_df['state_vpi'])
temp1_df.rename(columns={'state_vpi': 'year_' + str(year)}, inplace=True)
temp2_df = statesum_df[statesum_df['year'] == year + 4]
temp2_df = temp2_df[cols]
temp2_df['state_vpi'] = np.log(temp2_df['state_vpi'])
temp2_df.rename(columns={'state_vpi': 'year_'+ str(year + 4)}, inplace=True)
temp3_df = statesum_df[statesum_df['year'] == year + 8]
temp3_df = temp3_df[cols]
temp3_df['state_vpi'] = np.log(temp3_df['state_vpi'])
temp3_df.rename(columns={'state_vpi': 'year_'+ str(year + 8)}, inplace=True)
temp4_df = statesum_df[statesum_df['year'] == year + 12]
temp4_df = temp4_df[cols]
temp4_df['state_vpi'] = np.log(temp4_df['state_vpi'])
temp4_df.rename(columns={'state_vpi': 'year_'+ str(year + 12)}, inplace=True)
temp_df = temp1_df.merge(temp2_df, on=['state'])
temp_df = temp_df.merge(temp3_df, on=['state'])
temp_df = temp_df.merge(temp4_df, on=['state'])
temp_df['year_mean'] = (temp_df['year_2004']+temp_df['year_2008']+temp_df['year_2012'])/3

fig, ax = plt.subplots(figsize=(9,8))
title1 = 'year_mean'
title2 = 'year_2016'
temp_df[title1] = np.log(temp_df[title1])
temp_df[title2] = np.log(temp_df[title2])
plt.scatter(temp_df[title1], temp_df[title2], alpha=0) 
ax.set_xlabel('log(mean 2004 to 2012 VPI)')
ax.set_ylabel('log(2016 VPI)')
x = pd.DataFrame({title1: np.linspace(temp_df[title1].min(),
    temp_df[title1].max(), 100)})
formula = title2 + ' ~ 1 + ' + title1
poly_1 = smf.ols(formula=formula,
    data=temp_df, missing='drop').fit()
plt.plot(x, poly_1.predict(x), 'k-',
    label='y={1:.2f}x+{2:.2f}, $R^2$={0:.2f}'.format(poly_1.rsquared,
    poly_1.params[title1],poly_1.params.Intercept), alpha=0.7)  

A = temp_df[title1]
B = temp_df[title2]
C = temp_df['state']
for a,b,c in zip(A, B, C):
    ax.annotate('%s' % c, xy=(a,b), textcoords='data') 

plt.legend()
plt.show()

Conclusion¶

Voting power is lognormally distributed, so some counties/voters are much more powerful than others.
Certain locations in NH, NM, MO, NV, MI, WI, PA, and FL repeatedly top the list.
At least according to this metric, persuasion has more power than turnout (an average 1.6:1 ratio), but the cost of each approach needs to be considered before drawing any conclusions.
The VPI doesn't do a very good job of predicting future elections, so it's probably more useful as a way to describe past results.

Next steps¶

This analysis only considers the power of a location in the presidential elections, but Senate, House and State Level elections are crucial too. It wouldn't be very hard to include results from Senate elections as those are statewide, but House districts don't match county lines and have ever-changing boundaries. As a result, it would be hard to get Voting Age Population data, and then distribute the voting power amongst the counties.

One thing I could do is create a synthetic population of the US using census data. I could then distribute voting power to every individual based on their location in Presidential, Congressional and State elections, then aggregate it by any boundary I choose. I'd need accurate district boundaries and results for every election along with some model of how power is shared across branches of government to make this work though. . .

References¶

[1] What is the chance that your vote will decide the election? Ask Stan! http://andrewgelman.com/2016/11/07/chance-vote-will-decide-election/

[2] What is the Chance That Your Vote Will Decide the Election? https://pkremp.github.io/pr_decisive_vote.html

[3] Nate Cohn. https://twitter.com/Nate_Cohn/status/972608738631868416

[4] The Missing Obama Millions. https://www.nytimes.com/2018/03/10/opinion/sunday/obama-trump-voters-democrats.html

[5] Voter Power Index: Just How Much Does the Electoral College Distort the Value of Your Vote? https://www.dailykos.com/stories/2016/12/19/1612252/-Voter-Power-Index-Just-How-Much-Does-the-Electoral-College-Distort-the-Value-of-Your-Vote

[6] Synthpop: Synthetic populations from census data. https://github.com/UDST/synthpop

[7] Is Voting Habit Forming? New Evidence from Experiments and Regression Discontinuities. https://onlinelibrary.wiley.com/doi/abs/10.1111/ajps.12210