This is just a simple data analysis exercise based on the GDELT database, don't take the results seriously, and try not to draw any conclusions from it :)
from IPython.core.display import HTML
styles = open("Style.css").read()
HTML(styles)
import pandas as pd
import numpy as np
from mpl_toolkits.basemap import Basemap
Connect to the database:
%load_ext sql
%sql mysql://root:mysqlpassword@localhost:3306/gdelt
u'Connected: root@gdelt'
This will query the GDELT database for events involving Indonesia in the year 2012:
qry = '''select * from Events
where
((substring(Actor1Code, 1, 3)="IDN" or substring(Actor2Code, 1, 3)="IDN")
and substring(Day, 1, 4)="2012")'''
res = %sql $qry
len(res)
42734
This creates a pandas
DataFrame out of the query result, and filter out the data even more to include only conflict-related events directed towards Indonesia:
indn = pd.DataFrame(res, columns=res.keys)
# automatic datatype conversion stopped working somehow
float_cols = ['Actor1Geo_Lat', 'Actor1Geo_Long',
'Actor2Geo_Lat', 'Actor2Geo_Long',
'ActionGeo_Lat', 'ActionGeo_Long',
'GoldsteinScale']
indn.QuadCategory = indn.QuadCategory.astype(int)
indn[float_cols] = indn[float_cols].astype(float)
# those starting verbal and material conflict with indonesia
to_indn = indn[(indn.Actor2Code.apply(lambda x: x[:3])=='IDN') &
((indn.QuadCategory==3) | (indn.QuadCategory==4))]
to_indn.iloc[:5,:6]
Day | Actor1Code | Actor2Code | EventCode | QuadCategory | GoldsteinScale | |
---|---|---|---|---|---|---|
0 | 20120101 | AUSELI | IDN | 138 | 3 | -7 |
10 | 20120101 | GOV | IDN | 112 | 3 | -2 |
54 | 20120101 | JPN | IDN | 173 | 4 | -5 |
61 | 20120101 | NGA | IDN | 100 | 3 | -5 |
62 | 20120101 | NGA | IDN | 1724 | 4 | -5 |
We then transform the data to aggregate the number of events at a particular location ((lat, lon) coordinate):
pivot = pd.pivot_table(to_indn, values='Day', rows=['Actor1Geo_Lat', 'Actor1Geo_Long'],
cols='QuadCategory', aggfunc=len)
pivot = pivot.fillna(0)
pivot = pivot.reset_index()
pivot.head(10)
QuadCategory | Actor1Geo_Lat | Actor1Geo_Long | 3 | 4 |
---|---|---|---|---|
0 | -43.5333 | 172.633 | 1 | 0 |
1 | -42.8832 | 147.332 | 0 | 1 |
2 | -42.8667 | 147.683 | 0 | 3 |
3 | -42.6839 | 147.266 | 0 | 1 |
4 | -41.3000 | 174.783 | 1 | 2 |
5 | -41.1333 | 175.033 | 0 | 1 |
6 | -41.0000 | 174.000 | 0 | 7 |
7 | -38.1667 | 145.800 | 1 | 0 |
8 | -37.8167 | 144.967 | 6 | 5 |
9 | -37.5627 | 143.863 | 1 | 1 |
Now we sort the transformed data for each conflict event (verbal and material), and get the top 100 locations (interestingly the most conflict actually comes by far from the capital, so I filter it out so as not to swamp the others):
top_100_1 = pivot.sort(columns=[3], ascending=False)[1:101]
top_100_2 = pivot.sort(columns=[4], ascending=False)[1:101]
top_100_2[:10]
QuadCategory | Actor1Geo_Lat | Actor1Geo_Long | 3 | 4 |
---|---|---|---|---|
350 | -5.00000 | 140.000 | 86 | 171 |
349 | -5.00000 | 120.000 | 119 | 116 |
35 | -27.00000 | 133.000 | 54 | 69 |
519 | 3.16667 | 101.700 | 22 | 53 |
321 | -6.17806 | 106.630 | 17 | 42 |
262 | -6.75000 | 107.500 | 33 | 41 |
90 | -8.65000 | 115.217 | 11 | 40 |
104 | -8.51630 | 120.767 | 23 | 40 |
501 | 2.50000 | 112.500 | 18 | 39 |
822 | 38.00000 | -97.000 | 22 | 34 |
top_100_1 = top_100_1.rename(columns={3: 'Verbal Conflict',
4: 'Material Conflict'})
top_100_2 = top_100_2.rename(columns={3: 'Verbal Conflict',
4: 'Material Conflict'})
Now plot the locations on a global map. The circle sizes correspond to the number of events at the location, i.e. the country/location on earth that exert conflict towards Indonesia, red being material conflict, yellow being verbal conflict. The bigger the circle, the more conflicts are initiated by the country at the location of the circle towards Indonesia in the year 2012.
figure(figsize=(15,15))
# create the map object
id_map = Basemap()
# draw important features
id_map.drawcoastlines(linewidth=.3)
id_map.drawcountries(linewidth=.2)
id_map.fillcontinents(color='0.8') # Light gray
id_map.drawmapboundary()
# transform from (lat, lon) to (x, y)
x1, y1 = id_map(top_100_1['Actor1Geo_Long'].values, top_100_1['Actor1Geo_Lat'].values)
x2, y2 = id_map(top_100_2['Actor1Geo_Long'].values, top_100_2['Actor1Geo_Lat'].values)
# plot material conflict locations
for i in xrange(len(x2)):
id_map.plot(x2[i], y2[i], 'or',
markersize=top_100_2['Material Conflict'].values[i] / 2,
alpha=.3)
# plot verbal conflict locations
for i in xrange(len(x1)):
id_map.plot(x1[i], y1[i], 'oy',
markersize=top_100_1['Verbal Conflict'].values[i] / 2,
alpha=.8)
Focusing on a more specific region:
figure(figsize=(10,10))
# çreate the map object with boundaries
id_map = Basemap(llcrnrlon=30, llcrnrlat=-40, # lower left corner
urcrnrlon=155, urcrnrlat=50) # upper right corner
# draw important features
id_map.drawcoastlines(linewidth=.3)
id_map.drawcountries(linewidth=.2)
id_map.fillcontinents(color='0.8') # Light gray
id_map.drawmapboundary()
x1, y1 = id_map(top_100_1['Actor1Geo_Long'].values, top_100_1['Actor1Geo_Lat'].values)
x2, y2 = id_map(top_100_2['Actor1Geo_Long'].values, top_100_2['Actor1Geo_Lat'].values)
# plot material conflict locations
for i in xrange(len(x2)):
id_map.plot(x2[i], y2[i], 'or',
markersize=top_100_2['Material Conflict'].values[i] / 2,
alpha=.3)
# plot verbal conflict locations
for i in xrange(len(x1)):
id_map.plot(x1[i], y1[i], 'oy',
markersize=top_100_1['Verbal Conflict'].values[i] / 2,
alpha=.8)