Exploratory Analysis of the Battles of the War of Five Kings Dataset

Preliminary

In [23]:
# Import required modules
import pandas as pd
from ggplot import *
%matplotlib inline

# Set pandas' maximum number of displayed rows to 1000
pd.set_option('display.max_rows', 1000)

# Set pandas' maximum number of displayed columns to 50
pd.set_option('display.max_columns', 50)

Loading the data

In [24]:
# Load the dataset
df = pd.read_csv('5kings_battles_v1.csv')

Quick check on the data

In [25]:
# View the top five observations
df.head()
Out[25]:
 | name | year | battle_number | attacker_king | defender_king | attacker_1 | attacker_2 | attacker_3 | attacker_4 | defender_1 | defender_2 | defender_3 | defender_4 | attacker_outcome | battle_type | major_death | major_capture | attacker_size | defender_size | attacker_commander | defender_commander | summer | location | region | note
0 | Battle of the Golden Tooth | 298 | 1 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Tully | NaN | NaN | NaN | win | pitched battle | 1 | 0 | 15000 | 4000 | Jaime Lannister | Clement Piper, Vance | 1 | Golden Tooth | The Westerlands | NaN
1 | Battle at the Mummer's Ford | 298 | 2 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Baratheon | NaN | NaN | NaN | win | ambush | 1 | 0 | NaN | 120 | Gregor Clegane | Beric Dondarrion | 1 | Mummer's Ford | The Riverlands | NaN
2 | Battle of Riverrun | 298 | 3 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | NaN | NaN | NaN | Tully | NaN | NaN | NaN | win | pitched battle | 0 | 1 | 15000 | 10000 | Jaime Lannister, Andros Brax | Edmure Tully, Tytos Blackwood | 1 | Riverrun | The Riverlands | NaN
3 | Battle of the Green Fork | 298 | 4 | Robb Stark | Joffrey/Tommen Baratheon | Stark | NaN | NaN | NaN | Lannister | NaN | NaN | NaN | loss | pitched battle | 1 | 1 | 18000 | 20000 | Roose Bolton, Wylis Manderly, Medger Cerwyn, H... | Tywin Lannister, Gregor Clegane, Kevan Lannist... | 1 | Green Fork | The Riverlands | NaN
4 | Battle of the Whispering Wood | 298 | 5 | Robb Stark | Joffrey/Tommen Baratheon | Stark | Tully | NaN | NaN | Lannister | NaN | NaN | NaN | win | ambush | 1 | 1 | 1875 | 6000 | Robb Stark, Brynden Tully | Jaime Lannister | 1 | Whispering Wood | The Riverlands | NaN
In [26]:
# View the bottom five observations
df.tail()
Out[26]:
 | name | year | battle_number | attacker_king | defender_king | attacker_1 | attacker_2 | attacker_3 | attacker_4 | defender_1 | defender_2 | defender_3 | defender_4 | attacker_outcome | battle_type | major_death | major_capture | attacker_size | defender_size | attacker_commander | defender_commander | summer | location | region | note
33 | Second Seige of Storm's End | 300 | 34 | Joffrey/Tommen Baratheon | Stannis Baratheon | Baratheon | NaN | NaN | NaN | Baratheon | NaN | NaN | NaN | win | siege | 0 | 0 | NaN | 200 | Mace Tyrell, Mathis Rowan | Gilbert Farring | 0 | Storm's End | The Stormlands | NaN
34 | Siege of Dragonstone | 300 | 35 | Joffrey/Tommen Baratheon | Stannis Baratheon | Baratheon | NaN | NaN | NaN | Baratheon | NaN | NaN | NaN | win | siege | 0 | 0 | 2000 | NaN | Loras Tyrell, Raxter Redwyne | Rolland Storm | 0 | Dragonstone | The Stormlands | NaN
35 | Siege of Riverrun | 300 | 36 | Joffrey/Tommen Baratheon | Robb Stark | Lannister | Frey | NaN | NaN | Tully | NaN | NaN | NaN | win | siege | 0 | 0 | 3000 | NaN | Daven Lannister, Ryman Fey, Jaime Lannister | Brynden Tully | 0 | Riverrun | The Riverlands | NaN
36 | Siege of Raventree | 300 | 37 | Joffrey/Tommen Baratheon | Robb Stark | Bracken | Lannister | NaN | NaN | Blackwood | NaN | NaN | NaN | win | siege | 0 | 1 | 1500 | NaN | Jonos Bracken, Jaime Lannister | Tytos Blackwood | 0 | Raventree | The Riverlands | NaN
37 | Siege of Winterfell | 300 | 38 | Stannis Baratheon | Joffrey/Tommen Baratheon | Baratheon | Karstark | Mormont | Glover | Bolton | Frey | NaN | NaN | NaN | NaN | NaN | NaN | 5000 | 8000 | Stannis Baratheon | Roose Bolton | 0 | Winterfell | The North | NaN
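
The head and tail outputs show many NaN entries, so counting missing values per column is a natural next check. The sketch below is only an illustration: `sample` is a hand-built frame holding a few columns of the five rows printed by `df.head()`, not the full dataset, and the names `sample` and `missing` are not part of the original notebook.

```python
import numpy as np
import pandas as pd

# Small frame copied by hand from a few columns of the df.head() output above
sample = pd.DataFrame({
    'name': ['Battle of the Golden Tooth', "Battle at the Mummer's Ford",
             'Battle of Riverrun', 'Battle of the Green Fork',
             'Battle of the Whispering Wood'],
    'attacker_2': [np.nan, np.nan, np.nan, np.nan, 'Tully'],
    'attacker_size': [15000, np.nan, 15000, 18000, 1875],
    'defender_size': [4000, 120, 10000, 20000, 6000],
})

# Count missing values in each column, most incomplete column first
missing = sample.isnull().sum().sort_values(ascending=False)
print(missing)
```

On the real data, the same call would presumably be `df.isnull().sum()`.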

Exploratory Data Analysis

Which year had the most battles?

In [27]:
# Count the number of observations for each value
df['year'].value_counts()
Out[27]:
299    20
300    11
298     7
dtype: int64
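
`value_counts()` sorts by frequency, so the years above are not in chronological order. A minimal sketch of two variations follows, assuming a `years` Series rebuilt from the counts printed above (7, 20 and 11 battles) rather than read from the CSV; `years`, `by_year` and `share` are illustrative names.

```python
import pandas as pd

# Rebuild a year column from the counts shown in Out[27]
years = pd.Series([298] * 7 + [299] * 20 + [300] * 11, name='year')

# sort_index() orders the counts by year instead of by frequency
by_year = years.value_counts().sort_index()

# normalize=True gives each year's share of all battles
share = years.value_counts(normalize=True).sort_index()

print(by_year)
print(share.round(2))
```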

Which region had the most battles?

In [28]:
# Count the number of observations for each value, then make a bar plot
df['region'].value_counts().plot(kind='bar')
Out[28]:
<matplotlib.axes.AxesSubplot at 0x10c33e810>

What were the outcomes of all battles?

In [29]:
# Count the number of observations for each value, then make a bar plot
df['attacker_outcome'].value_counts().plot(kind='bar')
Out[29]:
<matplotlib.axes.AxesSubplot at 0x10c376710>

How common were the different types of battles?

In [30]:
# Count the number of observations for each value, then make a bar plot
df['battle_type'].value_counts().plot(kind='bar')
Out[30]:
<matplotlib.axes.AxesSubplot at 0x10c46eb10>

Which king attacked the most?

In [31]:
# Count the number of observations for each value, then make a bar plot
df['attacker_king'].value_counts().plot(kind='bar')
Out[31]:
<matplotlib.axes.AxesSubplot at 0x10c549350>

Which king was the most attacked?

In [32]:
# Count the number of observations for each value, then make a bar plot
df['defender_king'].value_counts().plot(kind='bar')
Out[32]:
<matplotlib.axes.AxesSubplot at 0x10c90c7d0>

Which kings were most active in the war?

In [33]:
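# Count each king's battles as attacker plus as defender; fill_value=0 keeps
# kings who appear on only one side instead of turning their total into NaN
war_action = df['attacker_king'].value_counts().add(df['defender_king'].value_counts(), fill_value=0)
war_action.plot(kind='bar')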
Out[33]:
<matplotlib.axes.AxesSubplot at 0x10c9c8610>
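
Summing two `value_counts()` results aligns them on their index labels, so a king who appears in only one of the two columns ends up as NaN. The sketch below shows this on made-up counts; the kings and numbers are hypothetical and not taken from the dataset.

```python
import pandas as pd

# Hypothetical counts, for illustration only
attacks = pd.Series({'King A': 3, 'King B': 2})
defences = pd.Series({'King A': 1, 'King C': 4})

# Plain + aligns on labels; a label missing from either side becomes NaN
naive = attacks + defences

# add(..., fill_value=0) treats a missing label as zero battles
total = attacks.add(defences, fill_value=0).astype(int)

print(naive)
print(total)
```

Filling the NaN values with a constant afterwards only gives the right totals if every such king fought exactly that many battles, which is why `fill_value=0` is the safer choice.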

Is there any relationship between troop size and battle outcome?

In [34]:
# Create a ggplot scatter plot of attacker_size against defender_size (if not NaN), 
# with the color of each dot being determined by the outcome of the battle
ggplot(aes(x='attacker_size', y='defender_size', colour='attacker_outcome'), 
       data=df[df['attacker_size'].notnull() & df['defender_size'].notnull() & df['attacker_outcome'].notnull()]) + \
    geom_point()
Out[34]:
<ggplot: (281727157)>
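
A numeric complement to the scatter plot is to tabulate outcomes by whether the attacker was outnumbered. This is a sketch under stated assumptions: `sample` holds only the sizes and outcomes of the five rows printed by `df.head()`, and the `size_ratio` and `outnumbered` columns are introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Sizes and outcomes copied by hand from the df.head() output above
sample = pd.DataFrame({
    'attacker_size': [15000, np.nan, 15000, 18000, 1875],
    'defender_size': [4000, 120, 10000, 20000, 6000],
    'attacker_outcome': ['win', 'win', 'win', 'loss', 'win'],
})

# Keep only battles where both sizes and the outcome are known
complete = sample.dropna(
    subset=['attacker_size', 'defender_size', 'attacker_outcome']).copy()

# Ratio above 1 means the attacker had the larger force
complete['size_ratio'] = complete['attacker_size'] / complete['defender_size']
complete['outnumbered'] = complete['size_ratio'] < 1

# Cross-tabulate outnumbered status against the outcome
table = pd.crosstab(complete['outnumbered'], complete['attacker_outcome'])
print(table)
```

Five rows are far too few to draw a conclusion; the same steps applied to `df` would use every battle with complete size data.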