In [69]:
%matplotlib inline

import numpy as np
import pandas as pd

from matplotlib import pyplot as plt
import prettyplotlib as pplt
import seaborn as sns

from statsmodels.formula.api import glm
import statsmodels.api as sm

np.set_printoptions(precision=2)

# local imports from the http://github.com/pjbull/civil_conflict
# repository
from ConflictMap import ConflictMap

import ConflictModel
reload(ConflictModel)
#from ConflictModel import ConflictModel

from graphics import plot_locations
from graphics import plot_route_with_colors

from models import Configuration
from models import Route
from models import HqLocator
from models import HqLocatorUganda

from optimization import sim_anneal

from SliceSampler import SliceSampler

Modeling and Simulating Political Violence and Optimizing Aid Distribution in Uganda

Peter J. Bull & Isaac M. Slavitt

Institute for Applied Computational Science
Harvard University
52 Oxford Street
Cambridge, MA 02139

Using MCMC techniques, we model civil conflict in Uganda and optimize the distribution of the limited relief aid available in these scenarios. We describe a method to simulate civil conflict events in space and time given historical data about these events. We also optimize the distribution of limited aid resources to refugees from these crises. Modeling aid delivery as a combination of the traveling salesman problem and the knapsack problem, both NP-hard, we find acceptable solutions using stochastic metaheuristics.

Additional code can be found in our civil_conflict GitHub repository.

Additional details can be found in our technical writeup.

Loading in the Data

The data comes from ACLED (the Armed Conflict Location and Event Data Project), a dataset of civil conflicts in Africa that records locations, dates, fatalities, motivations, the actors involved, and other information. Their collection of data on Uganda covers 1997-2013, and they maintain a real-time tracker of events reported in 2014.

In [3]:
# Load in the data
with open("data/ACLED-Uganda_19970101-to-20131231_final.xlsx") as f:
    uganda_file = pd.ExcelFile(f)
    
uganda_data = uganda_file.parse('Sheet1')
uganda_data.head()
Out[3]:
   EVENT_DATE  EVENT_TYPE                     ACTOR1                             ACTOR2                                     LOCATION    LATITUDE
0  1997-01-01  Battle-No change of territory  Military Forces of Uganda (1986-)  UFDF: Uganda Federal Democratic Front      Buganda     0.50000
1  1997-01-01  Battle-No change of territory  Military Forces of Uganda (1986-)  LRA: Lord's Resistance Army                Gulu        2.76667
2  1997-01-07  Battle-No change of territory  Military Forces of Uganda (1986-)  ADF-NALU: Allied Democratic Forces-Nat...  Nyabani     0.13580
3  1997-01-08  Battle-No change of territory  Military Forces of Uganda (1986-)  ADF-NALU: Allied Democratic Forces-Nat...  Kasese      0.18333
4  1997-01-11  Violence against civilians     LRA: Lord's Resistance Army        Civilians (Uganda)                         Acholi-Bur  3.12583

5 rows × 25 columns (selected columns shown; the full frame also includes GWNO, EVENT_ID_CNTY, YEAR, TIME_PRECISION, INTERACTION, COUNTRY, ADMIN1-ADMIN3, and other fields)

We can start by visualizing these conflicts on a map of Uganda. We've abstracted out a set of utility functions for manipulating and loading the data. We'll make an effort to explain any logical or mathematical intricacies in code that is not directly in this notebook. The full code can be found in our civil_conflict GitHub repository.

In [7]:
# create the conflict map object
# based on the uganda data and plot it
conflict_map = ConflictMap(uganda_data)
fig, ax = conflict_map.plot_map()
fig.show()

Modeling the Spatial Dimension

We treat the entire country of Uganda as a probability distribution from which geospatial conflict events can be sampled. We took historical conflict location data from the entire ACLED dataset and smoothed it using a Matérn covariance function. The figures below show this smoothing applied to the conflicts above.
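For intuition, the Matérn covariance with smoothness ν = 3/2 has a simple closed form. The sketch below is illustrative only (the function name and parameters are ours, not from the ConflictModel code): covariance is 1 at zero distance and decays smoothly as points move apart.

```python
import numpy as np

def matern_32(d, length_scale=1.0, sigma2=1.0):
    """Matern covariance with smoothness nu = 3/2 as a function of distance d."""
    r = np.sqrt(3.0) * np.asarray(d, dtype=float) / length_scale
    return sigma2 * (1.0 + r) * np.exp(-r)

# covariance is largest at zero distance and decays with separation
print(matern_32(0.0))  # 1.0
print(matern_32(0.5))
print(matern_32(1.0))
```

The `length_scale` parameter controls how far a historical conflict's influence spreads on the map; a larger value yields a smoother estimated intensity surface.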

In [59]:
# overplot the kde onto the map of the conflict data
conflict_model = ConflictModel.ConflictModel(uganda_data)
fig, ax = conflict_map.plot_map()
conflict_model.plot_kde(fig=fig, ax=ax)
fig.show()

Sampling from the Distribution

This estimate (i.e., the empirical distribution of the conflict data) has a complex functional form, which makes it challenging to sample from directly. However, evaluating the probability of an event at any given coordinate is simple. Given this property of our smoothed estimate, we can apply MCMC sampling techniques to generate samples from this probability distribution. The figure shows the distribution of the samples as a two-dimensional histogram. We also plot some of the diagnostics for our sampler.
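The idea behind our SliceSampler can be sketched in one dimension. This is a simplified illustration with hypothetical names, not the repository code: given only an (unnormalized) log density, slice sampling draws an auxiliary height under the curve, steps out an interval, and shrinks it until a proposal lands on the slice.

```python
import numpy as np

def slice_sample_1d(logpdf, x0, n_samples, w=1.0, seed=None):
    """Univariate slice sampling with a stepping-out window of width w."""
    rng = np.random.RandomState(seed)
    x, samples = x0, []
    for _ in range(n_samples):
        # draw the auxiliary slice height under the (log) density
        log_y = logpdf(x) + np.log(rng.uniform())
        # step out an interval that contains the slice
        left = x - rng.uniform() * w
        right = left + w
        while logpdf(left) > log_y:
            left -= w
        while logpdf(right) > log_y:
            right += w
        # shrink the interval until a proposal lands on the slice
        while True:
            x_new = rng.uniform(left, right)
            if logpdf(x_new) > log_y:
                x = x_new
                break
            if x_new < x:
                left = x_new
            else:
                right = x_new
        samples.append(x)
    return np.array(samples)

# example: draw from a standard normal via its unnormalized log density
draws = slice_sample_1d(lambda x: -0.5 * x ** 2, 0.0, 5000, seed=0)
```

Note that the sampler never needs the normalizing constant of the density, which is exactly the property that makes it suitable for our smoothed conflict surface.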

In [14]:
slice_samples = conflict_model.draw_samples(50000)
SliceSampler.run_diagnostics(slice_samples)
The Gelman-Rubin potential scale reduction factor is:  [ 1.  1.]  (< 1.1 indicates good mixing)
The Geweke Diagnostic Value is:  [ 0.] (< 1.96 indicates convergence)
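For reference, the Gelman-Rubin statistic reported above compares between-chain and within-chain variance. A minimal sketch (ours, not the `SliceSampler.run_diagnostics` implementation) for multiple chains over a scalar quantity:

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor for an (m_chains, n_draws) array."""
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    B = n * chains.mean(axis=1).var(ddof=1)  # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1.0) / n * W + B / n      # pooled variance estimate
    return np.sqrt(var_hat / W)

# four well-mixed chains from the same distribution: R-hat should be near 1
rng = np.random.RandomState(0)
r_hat = gelman_rubin(rng.normal(size=(4, 2000)))
print(r_hat)
```

Values well above 1.1 indicate the chains have not yet converged to a common distribution, which is why we report the < 1.1 rule of thumb above.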
In [17]:
# Visualize the samples on our map of Uganda
fig, ax = conflict_map.plot_map()
pplt.scatter(ax, slice_samples[:1000,0], 
             slice_samples[:1000,1], 
             zorder=20, 
             s=20, 
             color='b', 
             alpha=0.3)
fig.show()

Modeling Events in Time

As a modeling assumption, we treat the dimensions of space and time as independent. To model events in time across the country, we use an autoregressive Poisson GLM. While standard autoregressive models posit a linear relation between a future value and previous values, the Poisson GLM posits a linear relation between previous data and the (log) mean of a Poisson distribution. This allows us to retain a probabilistic interpretation of the events in time.
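Concretely, with a log link the model says λ_t = exp(β₀ + Σᵢ βᵢ y_{t−i}) and y_t ~ Poisson(λ_t). A hypothetical simulator for such a process (names and coefficients ours, chosen to be small and positive in the spirit of the fit below) looks like:

```python
import numpy as np

def simulate_ar_poisson(beta0, betas, n_steps, y_init, seed=None):
    """Simulate a count series where log(mean_t) is linear in past counts."""
    rng = np.random.RandomState(seed)
    y = list(y_init)
    for _ in range(n_steps):
        lags = y[-len(betas):][::-1]               # most recent count first
        lam = np.exp(beta0 + np.dot(betas, lags))  # log link: linear on log scale
        y.append(rng.poisson(lam))
    return np.array(y[len(y_init):])

# small positive lag coefficients keep the simulated series stable
counts = simulate_ar_poisson(2.0, [0.01, 0.005], 100, [10, 10], seed=1)
```

Because the lag coefficients enter through the exponential, they must be small for the process to remain stable; large positive coefficients would make the simulated counts explode.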

We'll start by creating a data frame that, for each time step, contains the previous 5 time steps as features.

In [39]:
# strip the dataframe just to the variables we care about
dt = uganda_data[['EVENT_DATE', 'FATALITIES']]
dt.set_index('EVENT_DATE', inplace=True)

# lambda for our groupby
by = lambda x: lambda y: getattr(y, x)

# get a column with the count of the number of events
fatality_df = dt.groupby([by('year'), by('month')]).count()

# get a column with the sum of the fatalities
fatality_df['sum'] = dt.groupby([by('year'), by('month')]).sum()

# adds a column for each lag that we want to the dataframe
def add_lags(orig_df, num_lags):
    df = orig_df.copy()
    
    lag_names = []
    for i in range(1, num_lags+1):
        fat_name = 'prev{}fat'.format(i)
        
        df[fat_name] = np.zeros(df.shape[0])
        df[fat_name][i:] = df['FATALITIES'][:-i]
        lag_names.append(fat_name)
        
    return df.iloc[i:,:], lag_names

fatality_df, new_lags = add_lags(fatality_df, 5)
fatality_df.head()
Out[39]:
          FATALITIES  sum  prev1fat  prev2fat  prev3fat  prev4fat  prev5fat
1997  6           12  245         2         5         4         5         5
      7           24  130        12         2         5         4         5
      8            9   28        24        12         2         5         4
      9           14   78         9        24        12         2         5
      10           5   42        14         9        24        12         2

5 rows × 7 columns

Next, we'll fit a Poisson GLM to this model.

In [19]:
# fit the poisson model and summarize it
model_string = "FATALITIES ~ " + "+".join(new_lags) 
poisson_glm_lags = glm(model_string, data=fatality_df,
                       family=sm.families.Poisson(sm.families.links.log)).fit()
print poisson_glm_lags.summary()
                 Generalized Linear Model Regression Results                  
==============================================================================
Dep. Variable:             FATALITIES   No. Observations:                  197
Model:                            GLM   Df Residuals:                      191
Model Family:                 Poisson   Df Model:                            5
Link Function:                    log   Scale:                             1.0
Method:                          IRLS   Log-Likelihood:                -1169.8
Date:                Sun, 04 May 2014   Deviance:                       1448.4
Time:                        17:34:22   Pearson chi2:                 1.57e+03
No. Iterations:                     7                                         
==============================================================================
                 coef    std err          t      P>|t|      [95.0% Conf. Int.]
------------------------------------------------------------------------------
Intercept      2.3284      0.027     84.733      0.000         2.275     2.382
prev1fat       0.0137      0.001     16.691      0.000         0.012     0.015
prev2fat       0.0085      0.001      8.890      0.000         0.007     0.010
prev3fat       0.0021      0.001      1.973      0.048      1.36e-05     0.004
prev4fat      -0.0014      0.001     -1.343      0.179        -0.004     0.001
prev5fat       0.0051      0.001      5.410      0.000         0.003     0.007
==============================================================================

We can now plot how this autoregressive model responds to the past and predicts the current time step.

In [21]:
# create figure
fig, ax = plt.subplots(1)
fig.set_size_inches(16,4)
plt.xlim(0, 195)

# update reset index and get tick labels
melt = fatality_df.reset_index()
melt["ticks"] = ["{}-{}".format(melt.level_0.iloc[i], melt.level_1.iloc[i]) for i in range(melt.shape[0])]

# plot the fitted values of the GLM
pplt.plot(poisson_glm_lags.fittedvalues, label="Auto-regressive Poisson GLM")
pplt.plot(fatality_df.FATALITIES, label="True Values")
plt.ylabel("Count of Violent Events")
loc, label = plt.xticks()

# update the ticks
plt.xticks(np.array(loc[:-1], dtype=int), melt.ticks.iloc[np.array(loc[:-1], dtype=int)].values)
pplt.legend()
Out[21]:
<matplotlib.legend.Legend at 0x10e88e790>

Given the fitted Poisson GLM, we can predict what the distribution of event counts will look like for the next month. Since we don't have data for that month yet, randomly sampling events in space and time can help us plan for aid distribution.

In [45]:
# We can now predict the next time step
newrow = pd.DataFrame([dict(prev1fat=fatality_df.FATALITIES[-1],
                            prev2fat=fatality_df.FATALITIES[-2],
                            prev3fat=fatality_df.FATALITIES[-3],
                            prev4fat=fatality_df.FATALITIES[-4],
                            prev5fat=fatality_df.FATALITIES[-5])])

new_fatality_df = fatality_df.append(newrow, ignore_index=True)
poisson_mean = poisson_glm_lags.predict(exog=new_fatality_df[-1:])

# draw a simulated number of conflicts for next month
n_conflicts = np.random.poisson(poisson_mean)
n_conflicts
Out[45]:
15

Optimization of the Distribution of Aid Resources

In the second part of this project, we use the temporal/geospatial conflict occurrence model from the first section both as inspiration for the aid-delivery analogy and as a source of randomly sampled data points representing geospatially distributed conflicts.

Framing the Problem

One question of particular interest is how to route emergency aid to locations where it is needed. For concreteness, let’s postulate a Red Cross medical or food supply caravan that originates from the organization’s in-country headquarters. This caravan wishes to visit all n emergent locations in order to deliver needed supplies. They wish to do so in the most efficient manner possible.
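This routing component is the classic traveling salesman problem. As a toy illustration (not the repository's Route/loss code), a tour can be scored by its total length, and a quick greedy baseline built with the nearest-neighbor heuristic:

```python
import numpy as np

def tour_length(points, order):
    """Total length of the closed tour that visits points in the given order."""
    path = points[list(order) + [order[0]]]
    return np.linalg.norm(np.diff(path, axis=0), axis=1).sum()

def nearest_neighbor_tour(points, start=0):
    """Greedy heuristic: repeatedly visit the closest unvisited location."""
    unvisited = set(range(len(points)))
    unvisited.remove(start)
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - last))
        order.append(nxt)
        unvisited.remove(nxt)
    return order

# fifteen random "conflict" coordinates on the unit square
rng = np.random.RandomState(0)
pts = rng.uniform(size=(15, 2))
greedy = nearest_neighbor_tour(pts)
```

Greedy construction gives a reasonable starting tour, but it gets stuck in local structure; that is why we turn to simulated annealing below rather than relying on a single deterministic heuristic.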

We extend the TSP into a multi-objective optimization problem in which the contents of the aid trucks also have an optimization component. Therein lies the knapsack problem: subject to a volume or weight constraint, and given that different locations might have very different needs such as food, vaccinations, or emergency medical supplies, which supplies do we pack on the trucks?
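For small instances the knapsack component has an exact dynamic-programming solution. The sketch below (values and weights are hypothetical supply types, not data from this project) shows the standard 0/1 knapsack recurrence with a traceback to recover which items were packed:

```python
def knapsack(values, weights, capacity):
    """0/1 knapsack by dynamic programming: best value and the items chosen."""
    n = len(values)
    best = [[0] * (capacity + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for w in range(capacity + 1):
            best[i][w] = best[i - 1][w]            # skip item i-1
            if weights[i - 1] <= w:                # or take it, if it fits
                take = best[i - 1][w - weights[i - 1]] + values[i - 1]
                if take > best[i][w]:
                    best[i][w] = take
    # trace back which items were packed
    chosen, w = [], capacity
    for i in range(n, 0, -1):
        if best[i][w] != best[i - 1][w]:
            chosen.append(i - 1)
            w -= weights[i - 1]
    return best[n][capacity], sorted(chosen)

# hypothetical supplies (e.g., food, vaccines, medical kits)
value, items = knapsack(values=[60, 100, 120], weights=[10, 20, 30], capacity=50)
# value == 220: items 1 and 2 fill the capacity exactly
```

The DP runs in O(n × capacity) time, which is fine here but becomes impractical for large capacities; combined with the TSP objective, it motivates the stochastic metaheuristic approach we use instead.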

First, we set up a set of simulated violent events drawn from the space and time models that we created in the first part.

In [65]:
# use slice sampling, and get a number of points drawn from the Poisson distribution
c = Configuration(sample_method=conflict_model.sample_fake_uniform, n=n_conflicts)

# put headquarters in the capital city, Kampala
c.set_hq(np.array([[32.5811, 0.3136]]))

# initialize the supplies
r = Route(c, np.array([5]*3))
In [66]:
# now we can use simulated annealing to find the best
# path between the simulated events.
x0 = np.arange(c.n)
res, losses, dists = sim_anneal(r.loss, r.perturb, x0,
                                t0=10.0, ntrial=55000, reanneal=1000,
                                other_func=r.dist, verbose=True)
reannealing; i[0] exp(dE/t)[4.03201760503] eprev[27997.7144308], enew[27997.7144308]
reannealing; i[1000] exp(dE/t)[2.78731844161e-167] eprev[21694.7454187], enew[25146.3050934]
reannealing; i[2000] exp(dE/t)[8.92159126122e-280] eprev[21677.7152407], enew[26882.2515896]
reannealing; i[3000] exp(dE/t)[3.35599817263e-197] eprev[21723.3067821], enew[25021.2919497]
reannealing; i[4000] exp(dE/t)[1.6461569002e-09] eprev[21677.7152407], enew[21810.4103005]
reannealing; i[5000] exp(dE/t)[2.09481298587e-40] eprev[21677.7152407], enew[22217.2101667]
reannealing; i[6000] exp(dE/t)[1.40963735116e-32] eprev[21677.7152407], enew[22067.4708309]
reannealing; i[7000] exp(dE/t)[5.30657588505e-241] eprev[21677.7152407], enew[24323.9122618]
reannealing; i[8000] exp(dE/t)[4.98370502808e-59] eprev[21686.9703402], enew[22264.8568442]
reannealing; i[9000] exp(dE/t)[1.88434353491e-97] eprev[21677.7152407], enew[22540.5672073]
reannealing; i[10000] exp(dE/t)[0.0] eprev[21677.7152407], enew[25275.1659066]
reannealing; i[11000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24275.7758529]
reannealing; i[12000] exp(dE/t)[8.07756245387e-25] eprev[21677.7152407], enew[21834.3945432]
reannealing; i[13000] exp(dE/t)[1.08906213526e-30] eprev[21677.7152407], enew[21853.0842475]
reannealing; i[14000] exp(dE/t)[1.13159924923e-114] eprev[21677.7152407], enew[22277.9360892]
reannealing; i[15000] exp(dE/t)[2.94998364102e-140] eprev[21677.7152407], enew[22339.2025033]
reannealing; i[16000] exp(dE/t)[5.88632866918e-281] eprev[21677.7152407], enew[22873.3835194]
reannealing; i[17000] exp(dE/t)[0.0] eprev[21677.7152407], enew[23876.8463418]
reannealing; i[18000] exp(dE/t)[4.43007824785e-106] eprev[21677.7152407], enew[22041.8232162]
reannealing; i[19000] exp(dE/t)[5.82756117011e-49] eprev[21677.7152407], enew[21827.7463317]
reannealing; i[20000] exp(dE/t)[7.62716380148e-26] eprev[21677.7152407], enew[21748.0297023]
reannealing; i[21000] exp(dE/t)[0.0] eprev[21677.7152407], enew[23076.1465571]
reannealing; i[22000] exp(dE/t)[6.69966139553e-43] eprev[21677.7152407], enew[21773.3454587]
reannealing; i[23000] exp(dE/t)[1.67989871821e-77] eprev[21677.7152407], enew[21834.3945432]
reannealing; i[24000] exp(dE/t)[5.19614031036e-245] eprev[21677.7152407], enew[22126.3898623]
reannealing; i[25000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22297.8362454]
reannealing; i[26000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22277.2151614]
reannealing; i[27000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22136.612459]
reannealing; i[28000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22094.3760293]
reannealing; i[29000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22067.4708309]
reannealing; i[30000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24910.4406026]
reannealing; i[31000] exp(dE/t)[0.0] eprev[21677.7152407], enew[28288.0918844]
reannealing; i[32000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22277.9360892]
reannealing; i[33000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24421.3200963]
reannealing; i[34000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22046.6892686]
reannealing; i[35000] exp(dE/t)[0.0] eprev[21677.7152407], enew[21887.0951005]
reannealing; i[36000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24002.8330844]
reannealing; i[37000] exp(dE/t)[0.0] eprev[21677.7152407], enew[21984.9537953]
reannealing; i[38000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22162.7546538]
reannealing; i[39000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24060.6751542]
reannealing; i[40000] exp(dE/t)[0.0] eprev[21677.7152407], enew[23919.6153363]
reannealing; i[41000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22969.3523674]
reannealing; i[42000] exp(dE/t)[0.0] eprev[21677.7152407], enew[23325.9300884]
reannealing; i[43000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24108.185788]
reannealing; i[44000] exp(dE/t)[0.0] eprev[21677.7152407], enew[26778.0329025]
reannealing; i[45000] exp(dE/t)[1.60386116979e-145] eprev[21677.7152407], enew[21827.7463317]
reannealing; i[46000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22096.1216065]
reannealing; i[47000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22277.2151614]
reannealing; i[48000] exp(dE/t)[0.0] eprev[21677.7152407], enew[22191.0682701]
reannealing; i[49000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24355.6878199]
reannealing; i[50000] exp(dE/t)[0.0] eprev[21677.7152407], enew[24525.7897699]
reannealing; i[51000] exp(dE/t)[1.05992152299e-241] eprev[21677.7152407], enew[21810.4103005]
reannealing; i[52000] exp(dE/t)[0.0] eprev[21677.7152407], enew[23011.7984338]
reannealing; i[53000] exp(dE/t)[0.0] eprev[21677.7152407], enew[21963.3663758]
reannealing; i[54000] exp(dE/t)[0.0] eprev[21677.7152407], enew[23067.7464462]
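The `exp(dE/t)` values logged above are Metropolis acceptance probabilities. A minimal sketch of an annealing loop in the same spirit as `sim_anneal` (a toy illustration on a permutation problem with hypothetical names, not the code from our optimization module):

```python
import numpy as np

def anneal(loss, perturb, x0, t0=10.0, ntrial=2000, cool=0.999, seed=None):
    """Minimal simulated annealing: accept worse moves with prob exp(-dE / t)."""
    rng = np.random.RandomState(seed)
    x, e, t = x0, loss(x0), t0
    best_x, best_e = x, e
    for _ in range(ntrial):
        x_new = perturb(x, rng)
        e_new = loss(x_new)
        # always accept improvements; sometimes accept uphill moves
        if e_new < e or rng.uniform() < np.exp(-(e_new - e) / t):
            x, e = x_new, e_new
            if e < best_e:
                best_x, best_e = x, e
        t *= cool  # geometric cooling schedule
    return best_x, best_e

# toy objective over permutations: total displacement from sorted order
def displacement(p):
    return float(np.abs(np.arange(len(p)) - p).sum())

def swap_two(p, rng):
    q = p.copy()
    i, j = rng.randint(len(p), size=2)
    q[i], q[j] = q[j], q[i]
    return q

x0 = np.random.RandomState(0).permutation(10)
best, best_loss = anneal(displacement, swap_two, x0, seed=0)
```

Early on, the high temperature makes uphill moves likely, letting the search escape local minima; as `t` cools, the acceptance probabilities shrink toward zero, which matches the many `exp(dE/t)[0.0]` entries in the later iterations of the log above.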
In [67]:
# plot the results
fig, ax = conflict_map.plot_map(outline_only=True)
ax = plot_route_with_colors(c, res, ax)
ax.legend()
plt.show()

fig = plt.figure(figsize=(14, 4))
plt.plot(losses, label='loss')
plt.semilogy()
plt.legend()
plt.xlabel('accepted proposal')
plt.ylabel('loss')
plt.show()