Exploratory Analysis of the Battles of the War of the Five Kings Dataset¶

Author: Chris Albon (@ChrisAlbon)
Date: August 17, 2014
Repo: https://github.com/chrisalbon/war_of_the_five_kings_dataset/

Preliminary¶

In [23]:

# Import required modules
import pandas as pd
from ggplot import *
%matplotlib inline

# Set the maximum number of rows pandas displays
pd.set_option('display.max_rows', 1000)

# Set the maximum number of columns pandas displays to 50
pd.set_option('display.max_columns', 50)

Loading the data¶

In [24]:

# Load the dataset
df = pd.read_csv('5kings_battles_v1.csv')

Quick check on the data¶

In [25]:

# View the top five observations
df.head()

Out[25]:

	name	year	battle_number	attacker_king	defender_king	attacker_1	attacker_2	attacker_3	attacker_4	defender_1	defender_2	defender_3	defender_4	attacker_outcome	battle_type	major_death	major_capture	attacker_size	defender_size	attacker_commander	defender_commander	summer	location	region	note
0	Battle of the Golden Tooth	298	1	Joffrey/Tommen Baratheon	Robb Stark	Lannister	NaN	NaN	NaN	Tully	NaN	NaN	NaN	win	pitched battle	1	0	15000	4000	Jaime Lannister	Clement Piper, Vance	1	Golden Tooth	The Westerlands	NaN
1	Battle at the Mummer's Ford	298	2	Joffrey/Tommen Baratheon	Robb Stark	Lannister	NaN	NaN	NaN	Baratheon	NaN	NaN	NaN	win	ambush	1	0	NaN	120	Gregor Clegane	Beric Dondarrion	1	Mummer's Ford	The Riverlands	NaN
2	Battle of Riverrun	298	3	Joffrey/Tommen Baratheon	Robb Stark	Lannister	NaN	NaN	NaN	Tully	NaN	NaN	NaN	win	pitched battle	0	1	15000	10000	Jaime Lannister, Andros Brax	Edmure Tully, Tytos Blackwood	1	Riverrun	The Riverlands	NaN
3	Battle of the Green Fork	298	4	Robb Stark	Joffrey/Tommen Baratheon	Stark	NaN	NaN	NaN	Lannister	NaN	NaN	NaN	loss	pitched battle	1	1	18000	20000	Roose Bolton, Wylis Manderly, Medger Cerwyn, H...	Tywin Lannister, Gregor Clegane, Kevan Lannist...	1	Green Fork	The Riverlands	NaN
4	Battle of the Whispering Wood	298	5	Robb Stark	Joffrey/Tommen Baratheon	Stark	Tully	NaN	NaN	Lannister	NaN	NaN	NaN	win	ambush	1	1	1875	6000	Robb Stark, Brynden Tully	Jaime Lannister	1	Whispering Wood	The Riverlands	NaN

In [26]:

# View the bottom five observations
df.tail()

Out[26]:

	name	year	battle_number	attacker_king	defender_king	attacker_1	attacker_2	attacker_3	attacker_4	defender_1	defender_2	defender_3	defender_4	attacker_outcome	battle_type	major_death	major_capture	attacker_size	defender_size	attacker_commander	defender_commander	location	region	note
33	Second Seige of Storm's End	300	34	Joffrey/Tommen Baratheon	Stannis Baratheon	Baratheon	NaN	NaN	NaN	Baratheon	NaN	NaN	NaN	win	siege	0	0	NaN	200	Mace Tyrell, Mathis Rowan	Gilbert Farring	Storm's End	The Stormlands	NaN
34	Siege of Dragonstone	300	35	Joffrey/Tommen Baratheon	Stannis Baratheon	Baratheon	NaN	NaN	NaN	Baratheon	NaN	NaN	NaN	win	siege	0	0	2000	NaN	Loras Tyrell, Raxter Redwyne	Rolland Storm	Dragonstone	The Stormlands	NaN
35	Siege of Riverrun	300	36	Joffrey/Tommen Baratheon	Robb Stark	Lannister	Frey	NaN	NaN	Tully	NaN	NaN	NaN	win	siege	0	0	3000	NaN	Daven Lannister, Ryman Fey, Jaime Lannister	Brynden Tully	Riverrun	The Riverlands	NaN
36	Siege of Raventree	300	37	Joffrey/Tommen Baratheon	Robb Stark	Bracken	Lannister	NaN	NaN	Blackwood	NaN	NaN	NaN	win	siege	0	1	1500	NaN	Jonos Bracken, Jaime Lannister	Tytos Blackwood	Raventree	The Riverlands	NaN
37	Siege of Winterfell	300	38	Stannis Baratheon	Joffrey/Tommen Baratheon	Baratheon	Karstark	Mormont	Glover	Bolton	Frey	NaN	NaN	NaN	NaN	NaN	NaN	5000	8000	Stannis Baratheon	Roose Bolton	Winterfell	The North	NaN
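
The head and tail outputs show NaN in several columns, so it can help to count missing values per column before analysing. A minimal sketch, assuming a small frame rebuilt by hand from a few columns of the five df.head() rows above (the name `sample` is illustrative and not part of the original notebook):

```python
import numpy as np
import pandas as pd

# A few columns copied from the five rows of the df.head() output (NaN = not recorded)
sample = pd.DataFrame({
    'name': ['Battle of the Golden Tooth', "Battle at the Mummer's Ford",
             'Battle of Riverrun', 'Battle of the Green Fork',
             'Battle of the Whispering Wood'],
    'attacker_2': [np.nan, np.nan, np.nan, np.nan, 'Tully'],
    'attacker_size': [15000, np.nan, 15000, 18000, 1875],
    'defender_size': [4000, 120, 10000, 20000, 6000],
})

# Count the missing values in each column
missing_per_column = sample.isnull().sum()
print(missing_per_column)
```

On the full dataset the same call would be df.isnull().sum().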

Exploratory Data Analysis¶

Which year had the most battles?¶

In [27]:

# Count the number of observations for each value
df['year'].value_counts()

Out[27]:

299    20
300    11
298     7
dtype: int64
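
The raw counts can also be expressed as shares of all battles. A minimal sketch, assuming a year column rebuilt from the counts printed above (20, 11 and 7) rather than read from the CSV; the name `years` is illustrative:

```python
import pandas as pd

# Rebuild a year column from the counts shown in Out[27]
years = pd.Series([299] * 20 + [300] * 11 + [298] * 7)

# normalize=True returns each value's proportion instead of its count
year_shares = years.value_counts(normalize=True)
print(year_shares)
```

On the full dataset the equivalent call would be df['year'].value_counts(normalize=True).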

Which region had the most battles?¶

In [28]:

# Count the number of observations for each value, then make a bar plot
df['region'].value_counts().plot(kind='bar')

Out[28]:

<matplotlib.axes.AxesSubplot at 0x10c33e810>

What were the outcomes of all battles?¶

In [29]:

# Count the number of observations for each value, then make a bar plot
df['attacker_outcome'].value_counts().plot(kind='bar')

Out[29]:

<matplotlib.axes.AxesSubplot at 0x10c376710>

How common were the different types of battles?¶

In [30]:

# Count the number of observations for each value, then make a bar plot
df['battle_type'].value_counts().plot(kind='bar')

Out[30]:

<matplotlib.axes.AxesSubplot at 0x10c46eb10>

Which king attacked the most?¶

In [31]:

# Count the number of observations for each value, then make a bar plot
df['attacker_king'].value_counts().plot(kind='bar')

Out[31]:

<matplotlib.axes.AxesSubplot at 0x10c549350>

Which king was the most attacked?¶

In [32]:

# Count the number of observations for each value, then make a bar plot
df['defender_king'].value_counts().plot(kind='bar')

Out[32]:

<matplotlib.axes.AxesSubplot at 0x10c90c7d0>

Which kings were most active in the war?¶

In [33]:

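# Add each king's attack and defense counts; fill_value=0 keeps the real count
# for kings who appear in only one of the two roles
war_action = df['attacker_king'].value_counts().add(df['defender_king'].value_counts(), fill_value=0)
war_action.plot(kind='bar')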

Out[33]:

<matplotlib.axes.AxesSubplot at 0x10c9c8610>
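
Adding two value_counts results aligns them on their index labels, so a king who appears in only one of the two roles comes out as NaN under plain `+`. A minimal sketch of that behaviour, using hypothetical kings and counts (not taken from the dataset):

```python
import pandas as pd

# Hypothetical counts: 'King C' only ever defends
attacks = pd.Series({'King A': 5, 'King B': 2})
defenses = pd.Series({'King A': 1, 'King B': 4, 'King C': 3})

# Plain addition yields NaN for labels missing from either side
plain_sum = attacks + defenses

# Series.add with fill_value=0 treats a missing label as zero
aligned_sum = attacks.add(defenses, fill_value=0)

print(plain_sum)
print(aligned_sum)
```

Filling the NaN with a constant afterwards is only correct if every one-role king has exactly that count, which is why the fill_value form is the safer choice.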

Is there any relationship between troop size and battle outcome?¶

In [34]:

# Create a ggplot scatter plot of attacker_size against defender_size,
# dropping battles where either size or the outcome is NaN,
# with the color of each dot determined by the outcome of the battle
ggplot(aes(x='attacker_size', y='defender_size', colour='attacker_outcome'),
       data=df.dropna(subset=['attacker_size', 'defender_size', 'attacker_outcome'])) + \
    geom_point()

Out[34]:

<ggplot: (281727157)>
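
The scatter plot gives only a visual impression. One way to put numbers on it is to cross-tabulate the outcome against whether the attacker outnumbered the defender. A minimal sketch, assuming a frame rebuilt by hand from the five df.head() rows shown earlier; the names `sample`, `size_ratio` and `advantage` are illustrative, and five battles are far too few to support any conclusion:

```python
import numpy as np
import pandas as pd

# Sizes and outcomes copied from the five rows of the df.head() output
sample = pd.DataFrame({
    'attacker_size': [15000, np.nan, 15000, 18000, 1875],
    'defender_size': [4000, 120, 10000, 20000, 6000],
    'attacker_outcome': ['win', 'win', 'win', 'loss', 'win'],
})

# Keep only battles where both sizes and the outcome are known
known = sample.dropna(subset=['attacker_size', 'defender_size', 'attacker_outcome']).copy()

# A ratio above 1 means the attacker outnumbered the defender
known['size_ratio'] = known['attacker_size'] / known['defender_size']
known['advantage'] = np.where(known['size_ratio'] > 1, 'attacker larger', 'defender larger')

# Count outcomes for each side of the numerical advantage
outcome_by_advantage = pd.crosstab(known['advantage'], known['attacker_outcome'])
print(outcome_by_advantage)
```

Replacing `sample` with df would apply the same tabulation to every battle with recorded sizes.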
