I've seen a claim that the early results from IEBC were too "smooth" (too many random variables in play, the argument goes, for the candidates' totals to track each other that closely). The point of this exercise is to see whether the claim holds.

First, a bit of a flashback. We'll use the previous election's results as a baseline to predict vote share and turnout. The numbers might not be completely accurate, but that's not important for the exercise: I could only find the IEBC report as a PDF, and it's tedious to verify each line.

In [25]:

```
import pandas as pd
import numpy as np
%matplotlib inline
import matplotlib.pyplot as plt
elections_2013 = pd.read_csv('election_2013.csv', index_col=False)
elections_2013['COUNTY'] = elections_2013['COUNTY'].apply(lambda x: x.strip().lower())
elections_2013 = elections_2013.set_index('COUNTY')
elections_2013
```

Out[25]:

In [115]:

```
elections_2013.plot(kind='bar', figsize=(20,10))
```

Out[115]:

In [6]:

```
registration_2017 = pd.read_csv('registration.csv')
registration_2017['COUNTY'] = registration_2017['COUNTY'].apply(lambda x: x.strip().lower())
registration_2017 = registration_2017.set_index('COUNTY')
registration_2017
```

Out[6]:

In [7]:

```
registration_2017.plot(kind='bar', figsize=(20,10))
```

Out[7]:

In [35]:

```
# Per-county 2013 turnout, converted from percent to a fraction
turnout_rate = np.array([0.01 * elections_2013['TURNOUT'][county]
                         for county in registration_2017.index])
spoil = 0.03  # assume 3% of ballots are spoilt

# Expected valid votes per county: registered voters * (turnout - spoilage)
adj_voters = registration_2017.copy()
adj_voters['NUM VOTERS'] = np.floor(adj_voters['NUM VOTERS'] * (turnout_rate - spoil))
adj_voters
```

Out[35]:

The idea here is very simple and could probably be accomplished without real data. We assume that IEBC releases results from N polling stations at a time, sampled at random across counties (in the code a "station" is a draw of up to 600 voters from a county that still has unreported voters). From each station we assume a candidate's share of the votes equals the share he received in that county in 2013, with Mudavadi's 2013 share split evenly between the two. What I'm trying to show is that the difference between the candidates need not be volatile.
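As a quick sanity check on that split, here's the per-station arithmetic on made-up percentages (not real county numbers):

```
# Hypothetical 2013 county shares, in percent (illustration only)
uhuru_2013, odinga_2013, mudavadi_2013 = 40.0, 50.0, 10.0
mdvd_share = 0.5   # half of Mudavadi's 2013 votes to each candidate
station_size = 600

pred_u = uhuru_2013 + mdvd_share * mudavadi_2013          # 45.0%
pred_o = odinga_2013 + (1 - mdvd_share) * mudavadi_2013   # 55.0%
u_votes = pred_u * station_size * 0.01  # 270 votes
o_votes = pred_o * station_size * 0.01  # 330 votes
```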

In [54]:

```
def time_step(adj_voters_cpy, num_stations=100):
    station_size = 600
    mdvd_share = 0.5  # share of Mudavadi's 2013 votes credited to Uhuru
    u = np.zeros(num_stations)
    o = np.zeros(num_stations)
    # Only counties with unreported voters left can be sampled
    weights = [1 if x > 0 else 0 for x in adj_voters_cpy['NUM VOTERS']]
    num_samples = np.min([num_stations, np.array(weights).sum()])
    sample = adj_voters_cpy['NUM VOTERS'].sample(num_samples, weights=weights)
    for i in range(num_samples):
        rand_station = sample.index[i]  # stations come in randomly
        voters = np.min([station_size, adj_voters_cpy['NUM VOTERS'][rand_station]])
        # Predicted shares: 2013 share plus a cut of Mudavadi's 2013 votes
        pred_u = elections_2013['UHURU'][rand_station] + mdvd_share * elections_2013['MUDAVADI'][rand_station]
        pred_o = elections_2013['ODINGA'][rand_station] + (1 - mdvd_share) * elections_2013['MUDAVADI'][rand_station]
        u[i] = pred_u * voters * 0.01
        o[i] = pred_o * voters * 0.01
        # .loc avoids pandas' chained-assignment pitfall
        adj_voters_cpy.loc[rand_station, 'NUM VOTERS'] -= voters
    return (u, o)

def exp(batch_size=100):
    adj_voters_cpy = adj_voters.copy()
    uhuru_data = [0]
    raila_data = [0]
    diff = [0]
    # Keep drawing batches until every county's voters are exhausted
    while (adj_voters_cpy['NUM VOTERS'] > 0).any():
        u, o = time_step(adj_voters_cpy, batch_size)
        uhuru_data.append(uhuru_data[-1] + u.sum())
        raila_data.append(raila_data[-1] + o.sum())
        diff.append(np.abs((uhuru_data[-1] - raila_data[-1]) / float(uhuru_data[-1])))
    return uhuru_data, raila_data, diff

uhuru_data, raila_data, diff = exp()
plt.plot(uhuru_data, label='uhuru')
plt.plot(raila_data, label='raila')
plt.ylabel('Total Votes')
plt.xlabel('Time (ticks)')
plt.legend(loc="upper left")
```

Out[54]:

In [39]:

```
plt.plot(diff, label='pct diff')
plt.legend()
```

Out[39]:

In [62]:

```
# Re-run the simulation with increasing batch sizes and plot the running
# std-dev of the percentage difference
num_stations = 1
while num_stations < 200:
    num_stations += 30
    uhuru_data, raila_data, diff = exp(num_stations)
    # start at 1 to avoid taking the std-dev of an empty slice
    var = [np.std(diff[:i]) for i in range(1, len(diff))]
    plt.plot(var, label="num stations: " + str(num_stations))
plt.xlabel("Time")
plt.ylabel("std-dev")
plt.legend(loc='upper right')
```

Out[62]:

Obviously a LOT of assumptions were made, so feel free to correct me if anything is clearly wrong. I can't verify the authenticity of a screenshot floating around claiming that Uhuru's votes were a constant multiple (1.12) of Odinga's. However, the idea that the difference should be volatile, and that the lack of volatility implies rigging, doesn't make much sense to me. It also seems to me that there are much easier ways to prove rigging (presumably ones that don't require us to understand the gambler's fallacy or to be expert DB admins).
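To make the "smoothness" point concrete without any real data at all: here's a toy simulation (entirely made-up numbers: batches of equal size, each with a noisy vote split around a fixed 52/48 mean) showing that the cumulative ratio between two candidates settles down almost immediately, even though every individual batch is random. A near-constant ratio is what the law of large numbers predicts, not evidence of a multiplier.

```
import numpy as np

rng = np.random.default_rng(0)
n_batches = 290         # pretend 290 batches of results come in
batch_votes = 60000     # votes reported per batch (made up)

# Per-batch share for candidate A: mean 0.52 with batch-level noise
shares = rng.normal(0.52, 0.03, n_batches).clip(0, 1)
a = np.cumsum(shares * batch_votes)        # running total, candidate A
b = np.cumsum((1 - shares) * batch_votes)  # running total, candidate B
ratio = a / b

# After the first few batches the cumulative ratio barely moves
print(ratio[10], ratio[100], ratio[-1])
```

The noisy per-batch shares average out in the cumulative totals, so the ratio hugs 0.52/0.48 ≈ 1.08 for almost the entire run.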