Indonesian conflicts 2012¶

This is just a simple data analysis exercise based on the GDELT database, don't take the results seriously, and try not to draw any conclusions from it :)

In [1]:

from IPython.core.display import HTML
styles = open("Style.css").read()
HTML(styles)

Out[1]:

In [2]:

import pandas as pd
import numpy as np

from mpl_toolkits.basemap import Basemap

Connect to the database:

In [3]:

%load_ext sql

In [4]:

%sql mysql://root:mysqlpassword@localhost:3306/gdelt

Out[4]:

u'Connected: root@gdelt'

This will query the GDELT database for events involving Indonesia in the year 2012:

In [5]:

qry = '''select * from Events 
         where 
         ((substring(Actor1Code, 1, 3)="IDN" or substring(Actor2Code, 1, 3)="IDN") 
         and substring(Day, 1, 4)="2012")'''

In [6]:

res = %sql $qry

len(res)

Out[6]:

This creates a pandas DataFrame out of the query result, and filter out the data even more to include only conflict-related events directed towards Indonesia:

In [9]:

indn = pd.DataFrame(res, columns=res.keys)

# automatic datatype conversion stopped working somehow
float_cols = ['Actor1Geo_Lat', 'Actor1Geo_Long',
              'Actor2Geo_Lat', 'Actor2Geo_Long',
              'ActionGeo_Lat', 'ActionGeo_Long',
              'GoldsteinScale']

indn.QuadCategory = indn.QuadCategory.astype(int)
indn[float_cols] = indn[float_cols].astype(float)

# those starting verbal and material conflict with indonesia
to_indn = indn[(indn.Actor2Code.apply(lambda x: x[:3])=='IDN') & 
               ((indn.QuadCategory==3) | (indn.QuadCategory==4))]

to_indn.iloc[:5,:6]

Out[9]:

	Day	Actor1Code	Actor2Code	EventCode	QuadCategory	GoldsteinScale
0	20120101	AUSELI	IDN	138	3	-7
10	20120101	GOV	IDN	112	3	-2
54	20120101	JPN	IDN	173	4	-5
61	20120101	NGA	IDN	100	3	-5
62	20120101	NGA	IDN	1724	4	-5

We then transform the data to aggregate the number of events at a particular location ((lat, lon) coordinate):

In [10]:

pivot = pd.pivot_table(to_indn, values='Day', rows=['Actor1Geo_Lat', 'Actor1Geo_Long'],
                       cols='QuadCategory', aggfunc=len)

pivot = pivot.fillna(0)
pivot = pivot.reset_index()
pivot.head(10)

Out[10]:

QuadCategory	Actor1Geo_Lat	Actor1Geo_Long	3	4
0	-43.5333	172.633	1	0
1	-42.8832	147.332	0	1
2	-42.8667	147.683	0	3
3	-42.6839	147.266	0	1
4	-41.3000	174.783	1	2
5	-41.1333	175.033	0	1
6	-41.0000	174.000	0	7
7	-38.1667	145.800	1	0
8	-37.8167	144.967	6	5
9	-37.5627	143.863	1	1

Now we sort the transformed data for each conflict event (verbal and material), and get the top 100 locations (interestingly the most conflict actually comes by far from the capital, so I filter it out so as not to swamp the others):

In [11]:

top_100_1 = pivot.sort(columns=[3], ascending=False)[1:101]
top_100_2 = pivot.sort(columns=[4], ascending=False)[1:101]

top_100_2[:10]

Out[11]:

QuadCategory	Actor1Geo_Lat	Actor1Geo_Long	3	4
350	-5.00000	140.000	86	171
349	-5.00000	120.000	119	116
35	-27.00000	133.000	54	69
519	3.16667	101.700	22	53
321	-6.17806	106.630	17	42
262	-6.75000	107.500	33	41
90	-8.65000	115.217	11	40
104	-8.51630	120.767	23	40
501	2.50000	112.500	18	39
822	38.00000	-97.000	22	34

In [12]:

top_100_1 = top_100_1.rename(columns={3: 'Verbal Conflict',
                                      4: 'Material Conflict'})

top_100_2 = top_100_2.rename(columns={3: 'Verbal Conflict',
                                      4: 'Material Conflict'})

Now plot the locations on a global map. The circle sizes correspond to the number of events at the location, i.e. the country/location on earth that exert conflict towards Indonesia, red being material conflict, yellow being verbal conflict. The bigger the circle, the more conflicts are initiated by the country at the location of the circle towards Indonesia in the year 2012.

In [13]:

figure(figsize=(15,15))

# create the map object
id_map = Basemap()

# draw important features
id_map.drawcoastlines(linewidth=.3) 
id_map.drawcountries(linewidth=.2)
id_map.fillcontinents(color='0.8') # Light gray
id_map.drawmapboundary()

# transform from (lat, lon) to (x, y) 
x1, y1 = id_map(top_100_1['Actor1Geo_Long'].values, top_100_1['Actor1Geo_Lat'].values)
x2, y2 = id_map(top_100_2['Actor1Geo_Long'].values, top_100_2['Actor1Geo_Lat'].values)

# plot material conflict locations
for i in xrange(len(x2)):
    id_map.plot(x2[i], y2[i], 'or', 
                markersize=top_100_2['Material Conflict'].values[i] / 2, 
                alpha=.3) 

# plot verbal conflict locations
for i in xrange(len(x1)):
    id_map.plot(x1[i], y1[i], 'oy', 
                markersize=top_100_1['Verbal Conflict'].values[i] / 2, 
                alpha=.8)

Focusing on a more specific region:

In [14]:

figure(figsize=(10,10))

# çreate the map object with boundaries
id_map = Basemap(llcrnrlon=30, llcrnrlat=-40, # lower left corner
                 urcrnrlon=155, urcrnrlat=50) # upper right corner

# draw important features
id_map.drawcoastlines(linewidth=.3) 
id_map.drawcountries(linewidth=.2)
id_map.fillcontinents(color='0.8') # Light gray
id_map.drawmapboundary()

x1, y1 = id_map(top_100_1['Actor1Geo_Long'].values, top_100_1['Actor1Geo_Lat'].values)
x2, y2 = id_map(top_100_2['Actor1Geo_Long'].values, top_100_2['Actor1Geo_Lat'].values)

# plot material conflict locations
for i in xrange(len(x2)):
    id_map.plot(x2[i], y2[i], 'or', 
                markersize=top_100_2['Material Conflict'].values[i] / 2, 
                alpha=.3) 
    
# plot verbal conflict locations
for i in xrange(len(x1)):
    id_map.plot(x1[i], y1[i], 'oy', 
                markersize=top_100_1['Verbal Conflict'].values[i] / 2, 
                alpha=.8)

In [ ]: