Indonesian conflicts 2012

This is just a simple data analysis exercise based on the GDELT database, don't take the results seriously, and try not to draw any conclusions from it :)

In [1]:
from IPython.core.display import HTML
styles = open("Style.css").read()
HTML(styles)
Out[1]:
In [2]:
import pandas as pd
import numpy as np

from mpl_toolkits.basemap import Basemap

Connect to the database:

In [3]:
%load_ext sql
In [4]:
%sql mysql://root:[email protected]:3306/gdelt
Out[4]:
u'Connected: [email protected]'

This will query the GDELT database for events involving Indonesia in the year 2012:

In [5]:
qry = '''select * from Events 
         where 
         ((substring(Actor1Code, 1, 3)="IDN" or substring(Actor2Code, 1, 3)="IDN") 
         and substring(Day, 1, 4)="2012")'''
In [6]:
res = %sql $qry

len(res)
Out[6]:
42734

This creates a pandas DataFrame out of the query result, and filter out the data even more to include only conflict-related events directed towards Indonesia:

In [9]:
indn = pd.DataFrame(res, columns=res.keys)

# automatic datatype conversion stopped working somehow
float_cols = ['Actor1Geo_Lat', 'Actor1Geo_Long',
              'Actor2Geo_Lat', 'Actor2Geo_Long',
              'ActionGeo_Lat', 'ActionGeo_Long',
              'GoldsteinScale']

indn.QuadCategory = indn.QuadCategory.astype(int)
indn[float_cols] = indn[float_cols].astype(float)

# those starting verbal and material conflict with indonesia
to_indn = indn[(indn.Actor2Code.apply(lambda x: x[:3])=='IDN') & 
               ((indn.QuadCategory==3) | (indn.QuadCategory==4))]

to_indn.iloc[:5,:6]
Out[9]:
Day Actor1Code Actor2Code EventCode QuadCategory GoldsteinScale
0 20120101 AUSELI IDN 138 3 -7
10 20120101 GOV IDN 112 3 -2
54 20120101 JPN IDN 173 4 -5
61 20120101 NGA IDN 100 3 -5
62 20120101 NGA IDN 1724 4 -5

We then transform the data to aggregate the number of events at a particular location ((lat, lon) coordinate):

In [10]:
pivot = pd.pivot_table(to_indn, values='Day', rows=['Actor1Geo_Lat', 'Actor1Geo_Long'],
                       cols='QuadCategory', aggfunc=len)

pivot = pivot.fillna(0)
pivot = pivot.reset_index()
pivot.head(10)
Out[10]:
QuadCategory Actor1Geo_Lat Actor1Geo_Long 3 4
0 -43.5333 172.633 1 0
1 -42.8832 147.332 0 1
2 -42.8667 147.683 0 3
3 -42.6839 147.266 0 1
4 -41.3000 174.783 1 2
5 -41.1333 175.033 0 1
6 -41.0000 174.000 0 7
7 -38.1667 145.800 1 0
8 -37.8167 144.967 6 5
9 -37.5627 143.863 1 1

Now we sort the transformed data for each conflict event (verbal and material), and get the top 100 locations (interestingly the most conflict actually comes by far from the capital, so I filter it out so as not to swamp the others):

In [11]:
top_100_1 = pivot.sort(columns=[3], ascending=False)[1:101]
top_100_2 = pivot.sort(columns=[4], ascending=False)[1:101]

top_100_2[:10]
Out[11]:
QuadCategory Actor1Geo_Lat Actor1Geo_Long 3 4
350 -5.00000 140.000 86 171
349 -5.00000 120.000 119 116
35 -27.00000 133.000 54 69
519 3.16667 101.700 22 53
321 -6.17806 106.630 17 42
262 -6.75000 107.500 33 41
90 -8.65000 115.217 11 40
104 -8.51630 120.767 23 40
501 2.50000 112.500 18 39
822 38.00000 -97.000 22 34
In [12]:
top_100_1 = top_100_1.rename(columns={3: 'Verbal Conflict',
                                      4: 'Material Conflict'})

top_100_2 = top_100_2.rename(columns={3: 'Verbal Conflict',
                                      4: 'Material Conflict'})

Now plot the locations on a global map. The circle sizes correspond to the number of events at the location, i.e. the country/location on earth that exert conflict towards Indonesia, red being material conflict, yellow being verbal conflict. The bigger the circle, the more conflicts are initiated by the country at the location of the circle towards Indonesia in the year 2012.

In [13]:
figure(figsize=(15,15))

# create the map object
id_map = Basemap()

# draw important features
id_map.drawcoastlines(linewidth=.3) 
id_map.drawcountries(linewidth=.2)
id_map.fillcontinents(color='0.8') # Light gray
id_map.drawmapboundary()

# transform from (lat, lon) to (x, y) 
x1, y1 = id_map(top_100_1['Actor1Geo_Long'].values, top_100_1['Actor1Geo_Lat'].values)
x2, y2 = id_map(top_100_2['Actor1Geo_Long'].values, top_100_2['Actor1Geo_Lat'].values)

# plot material conflict locations
for i in xrange(len(x2)):
    id_map.plot(x2[i], y2[i], 'or', 
                markersize=top_100_2['Material Conflict'].values[i] / 2, 
                alpha=.3) 

# plot verbal conflict locations
for i in xrange(len(x1)):
    id_map.plot(x1[i], y1[i], 'oy', 
                markersize=top_100_1['Verbal Conflict'].values[i] / 2, 
                alpha=.8)

Focusing on a more specific region:

In [14]:
figure(figsize=(10,10))

# çreate the map object with boundaries
id_map = Basemap(llcrnrlon=30, llcrnrlat=-40, # lower left corner
                 urcrnrlon=155, urcrnrlat=50) # upper right corner

# draw important features
id_map.drawcoastlines(linewidth=.3) 
id_map.drawcountries(linewidth=.2)
id_map.fillcontinents(color='0.8') # Light gray
id_map.drawmapboundary()

x1, y1 = id_map(top_100_1['Actor1Geo_Long'].values, top_100_1['Actor1Geo_Lat'].values)
x2, y2 = id_map(top_100_2['Actor1Geo_Long'].values, top_100_2['Actor1Geo_Lat'].values)

# plot material conflict locations
for i in xrange(len(x2)):
    id_map.plot(x2[i], y2[i], 'or', 
                markersize=top_100_2['Material Conflict'].values[i] / 2, 
                alpha=.3) 
    
# plot verbal conflict locations
for i in xrange(len(x1)):
    id_map.plot(x1[i], y1[i], 'oy', 
                markersize=top_100_1['Verbal Conflict'].values[i] / 2, 
                alpha=.8)
In [ ]: