Game of Thrones (GoT) is an American fantasy drama TV series, created by D. Benioff and D.B. Weiss for the American television network HBO. It is the screen adaption of the series of fantasy novels A Song of Ice and Fire, written by George R.R. Martin. The series premiered on HBO in the United States on April 17, 2011, and concluded on May 19, 2019, with 73 episodes broadcast over eight seasons. With its 12 million viewers during season 8 and a plethora of awards---according to Wikipedia, Game of Thrones has attracted record viewership on HBO and has a broad, active, and international fan base.
The intricate world narrated by George R.R. Martin and scripted by Benioff and Weiss has stimulated the curiosity of ranks of scientists, delighted by the opportunity to study complex social phenomena. In this notebook, we delve into the study of GoT relationships to discover what the hypergraphs they generate reveal about the story.
In this notebook, we replicate some of the analysis you can read in our paper at this link!
using Pkg
#pkg"add PyCall Conda SimpleHypergraphs PyPlot GraphPlot ColorSchemes"
using PyCall
using Conda
#Conda.runconda(`install matplotlib --yes`)
#Conda.runconda(`install networkx --yes`)
#run(`$(PyCall.python) -m pip install hypernetx==1.2.5`)
using SimpleHypergraphs
using Graphs
using PyPlot
using GraphPlot, ColorSchemes
This study is based on the dataset at the GitHub repository of Jeffrey Lancaster Game of Thrones Datasets and Visualizations. We will thus focus on the GoT TV series.
We studied GoT characters' co-occurrences with different levels of granularity. We modeled the GoT data set building three different types of hypergraphs, each one reporting whether the GoT characters have appeared in the same season, in the same episode, or in the same scene together.
First, we load a hypergraph studying characters' co-occurences within seasons. Here, the hyperedges are the GoT seasons and the characters who appear in each eason are the nodes.
pth = joinpath(dirname(pathof(SimpleHypergraphs)),"..","tutorials", "basics", "data", "hg_seasons_all.json");
h = SimpleHypergraphs.hg_load(pth; format=JSON_Format(), T=Bool, V=Symbol, E=Symbol);
# how many characters did we see during the overall TV series?
size(h)[1]
577
# how many characters does each season have?
# we can ask this way...
map(he -> println("Season $he has $(length(getvertices(h, he))) characters"), 1:nhe(h));
Season 1 has 125 characters Season 2 has 137 characters Season 3 has 137 characters Season 4 has 152 characters Season 5 has 175 characters Season 6 has 208 characters Season 7 has 75 characters Season 8 has 66 characters
# ... or this way
length.(h.he2v)
8-element Vector{Int64}: 125 137 137 152 175 208 75 66
SimpleHypergraphs integates the Python library HyperNetX to let the user visualize a hypergraph h
exploiting an Euler-diagram visualization. For more details, please refer to the library HyperNetX.
pth = joinpath(dirname(pathof(SimpleHypergraphs)),"..","tutorials", "basics", "data", "hg_seasons_min.json");
# Let's visualize (a smaller parte of) the hypergraph we built
# To build this smaller hypergraph, we considered only those characters
# appearing at least in 10 scenes in the whole series
h1 = SimpleHypergraphs.hg_load(pth; format=JSON_Format(), T=Int, V=Symbol, E=Symbol);
map(he ->
println("Season $he has *$(length(getvertices(h1, he)))* characters appearing in at least 10 scenes"),
1:nhe(h));
Season 1 has *69* characters appearing in at least 10 scenes Season 2 has *74* characters appearing in at least 10 scenes Season 3 has *83* characters appearing in at least 10 scenes Season 4 has *83* characters appearing in at least 10 scenes Season 5 has *75* characters appearing in at least 10 scenes Season 6 has *98* characters appearing in at least 10 scenes Season 7 has *56* characters appearing in at least 10 scenes Season 8 has *44* characters appearing in at least 10 scenes
length.(h1.he2v)
8-element Vector{Int64}: 69 74 83 83 75 98 56 44
# viz params: edge labels
edge_labels = Dict{Int, String}(map(x -> x=>"S$x", 1:nhe(h)))
edge_labels_kwargs = Dict{String,Any}("fontsize" => "x-large")
;
SimpleHypergraphs.draw(
h1,
HyperNetX;
no_border = true,
width = 7,
height = 7,
collapse_nodes = true,
with_node_counts = true,
with_node_labels = true,
edge_labels = edge_labels,
edge_labels_kwargs = edge_labels_kwargs
)
# who are the characters appearing in all 8 seasons?
most_important_character_ids = findall(x->x==1, (length.(h.v2he) .== 8))
for id in most_important_character_ids
println(SimpleHypergraphs.get_vertex_meta(h, id))
end
White_Walker Jon_Snow Sansa_Stark Arya_Stark Theon_Greyjoy Cersei_Lannister Jaime_Lannister Tyrion_Lannister Daenerys_Targaryen Jorah_Mormont Drogon Rhaegal Viserion Lord_Varys Samwell_Tarly Bronn
# Let's have a closer look of what's happening in season 1
pth = joinpath(dirname(pathof(SimpleHypergraphs)),"..","tutorials", "basics", "data", "hg_season1.json");
hg = SimpleHypergraphs.hg_load(pth; format=JSON_Format(), T=Bool, V=Symbol, E=Symbol);
# how many characters do we have? How many scenes?
"$(nhv(hg)) characters and $(nhe(hg)) scenes"
"125 characters and 286 scenes"
At this point, we had an overview about how many characters appeared over the whole GoT TV series and which one of them made it till the end.
Let's find out how these characters interacted with each other in season 1. To gather insights within these complex relationships, we run a community detection algorithm on the hypergraph built considering scenes as hyperedges.
Running this algorithm, we expect to find coherent plotlines and, therefore, groups of characters frequently appearing in a scene together.
# Let's assure to have a single connected component
# Otherwise, we pick the largest one
cc = get_connected_components(hg)
2-element Vector{Vector{Int64}}: [1, 5, 3, 4, 2, 16, 12, 29, 34, 49 … 37, 87, 48, 44, 40, 106, 93, 101, 118, 119] [99]
remove_vertex!(hg, 99)
cc = get_connected_components(hg)
1-element Vector{Vector{Int64}}: [1, 5, 3, 4, 2, 16, 12, 29, 34, 49 … 37, 87, 48, 44, 40, 106, 93, 101, 118, 119]
# We used the label propagation (LP) algorithm
cflp = CFLabelPropagationFinder(100,1234)
comms = findcommunities(hg, cflp)
"We found $(length(comms.np)) communities in the hypergraph of the 1st season."
"We found 18 communities in the hypergraph of the 1st season."
length.(comms.np)
18-element Vector{Int64}: 1 1 4 37 1 1 1 18 3 1 16 4 13 7 12 1 1 2
# Who are they?
for c in comms.np
for character in c
println(get_vertex_meta(hg, character))
end
println("-----")
end
White_Walker ----- Ros ----- Shaggydog Nymeria Grey_Wind Lady ----- Loras_Tyrell Renly_Baratheon Grand_Maester_Pycelle Septa_Mordane Old_Nan Mycah Hugh_of_the_Vale Sansa_Stark Hodor Joffrey_Baratheon Gregor_Clegane Jon_Arryn Jory_Cassel Royal_Steward Myrcella_Baratheon Benjen_Stark Rickon_Stark Varly Eddard_Stark Robert_Baratheon Beric_Dondarrion Lancel_Lannister Ilyn_Payne Sandor_Clegane Tommen_Baratheon Cersei_Lannister Joss Janos_Slynt Yoren Petyr_Baelish Arya_Stark Lord_Varys Barristan_Selmy Summer Tomard Jaime_Lannister Meryn_Trant ----- Mhaegen ----- Armeca ----- Tobho_Mott ----- Mirri_Maz_Duur Viserys_Targaryen Irri Drogon Viserion Daenerys_Targaryen Rhaegal Mago Khal_Drogo Wine_Merchant Doreah Qotho Jhiqui Rakharo Illyrio_Mopatis Rhaego Jorah_Mormont Dothraki_Crone ----- Osha Wallen Stiv ----- Three-Eyed_Raven ----- Vardis_Egen Addam_Marbrand Leo_Lefford Chella Robin_Arryn Lannister_Messenger Shae Mord Lysa_Arryn Kevan_Lannister Bronn Timett Marillion Tyrion_Lannister Shagga Tywin_Lannister ----- Walder_Frey Joyeuse_Erenford Stevron_Frey Ryger_Rivers ----- Alliser_Thorne Ghost Rast Jon_Snow Maester_Aemon Jaremy_Rykker Pypar Othell_Yarwyck Jafer_Flowers Othor Jeor_Mormont Samwell_Tarly Grenn ----- Hot_Pie Syrio_Forel Lommy_Greenhands Gendry King's_Landing_Baker Street_Urchin Red_Keep_Stableboy ----- Robb_Stark Jonos_Bracken Theon_Greyjoy Will Lannister_Scout Bran_Stark Greatjon_Umber Catspaw_Assassin Rodrik_Cassel Catelyn_Stark Maester_Luwin Rickard_Karstark ----- Wight_Wildling_Girl ----- Little_Bird ----- Waymar_Royce Gared -----
Based on the communities above, we can say that the LP algorithm ran on such hypergraph revealed:
The presence of minor communities of characters, appearing only in few scenes of the whole season. It is interesting to note that the algorithm correctly identifies background characters that do heavily not contribute to the main storyline (for now).
The other communities embody the different sub-plotlines happening in the first season. We can list:
The last more significant community contains the majority of the characters appearing in the first season. This result makes sense as all these characters have been forced to stay together at the Red Keep. Thus, they appear in many scenes together.
modularity(hg, comms.np)
0.5040026252111678
# Here, we used a CNM-Like algorithm for finding communities.
cnm = CFModularityCNMLike(500)
cnm_comms = findcommunities(hg, cnm)
"We found $(length(cnm_comms.bp)) communities in the hypergraph of the 1st season with a modularity value of $(cnm_comms.bm)."
"We found 16 communities in the hypergraph of the 1st season with a modularity value of 0.5121486201080864."
# How many characters are there in each community?
# size of each community
comms_size = length.(cnm_comms.bp)
# how many community of that size do exist?
values = unique(length.(cnm_comms.bp))
data = Dict([(i, count(x->x==i, comms_size)) for i in values])
Dict{Int64, Int64} with 5 entries: 5 => 1 12 => 1 76 => 1 1 => 12 19 => 1
# Who are they?
for c in cnm_comms.bp
for character in c
println(get_vertex_meta(hg, character))
end
println("-----")
end
White_Walker Wight_Wildling_Girl Waymar_Royce Will Gared ----- Khal_Drogo Wine_Merchant Mirri_Maz_Duur Doreah Viserys_Targaryen Qotho Jhiqui Little_Bird Irri Rakharo Drogon Viserion Daenerys_Targaryen Rhaegal Illyrio_Mopatis Rhaego Jorah_Mormont Dothraki_Crone Mago ----- Renly_Baratheon Grand_Maester_Pycelle Nymeria Ros Joyeuse_Erenford Septa_Mordane Alliser_Thorne Hodor Lady Myrcella_Baratheon Rast Jon_Snow Maester_Aemon Eddard_Stark Robert_Baratheon Stevron_Frey Lannister_Scout Lancel_Lannister Pypar Sandor_Clegane Greatjon_Umber Joss Petyr_Baelish Othor Samwell_Tarly Arya_Stark Summer Lord_Varys Barristan_Selmy Wallen Grenn Red_Keep_Stableboy Old_Nan Mycah Hugh_of_the_Vale Sansa_Stark Jon_Arryn Joffrey_Baratheon Robb_Stark Jory_Cassel Gregor_Clegane Ghost Royal_Steward Shaggydog Mhaegen Armeca Rickon_Stark Syrio_Forel Three-Eyed_Raven Varly Stiv Beric_Dondarrion Theon_Greyjoy King's_Landing_Baker Street_Urchin Jaremy_Rykker Ilyn_Payne Osha Bran_Stark Cersei_Lannister Othell_Yarwyck Tommen_Baratheon Janos_Slynt Yoren Marillion Jafer_Flowers Jeor_Mormont Catspaw_Assassin Walder_Frey Rodrik_Cassel Maester_Luwin Grey_Wind Jaime_Lannister Catelyn_Stark Ryger_Rivers Meryn_Trant ----- Benjen_Stark ----- Lommy_Greenhands ----- Lysa_Arryn ----- Hot_Pie ----- Addam_Marbrand Leo_Lefford Bronn Chella Timett Lannister_Messenger Tyrion_Lannister Shae Shagga Tywin_Lannister Mord Kevan_Lannister ----- Tobho_Mott ----- Gendry ----- Tomard ----- Vardis_Egen ----- Robin_Arryn ----- Jonos_Bracken ----- Rickard_Karstark ----- Loras_Tyrell -----
We will visualize the obtained communities through a two-section representation of the hypergraph.
# As a TwoSectionView can be instantiated only for hypergraphs with non-overlapping edges,
# here we will use Graphs.jl directly.
m = get_twosection_adjacency_mx(hg;replace_weights=1)
t = Graph(m)
{124, 886} undirected simple Int64 graph
my_colors = vcat(ColorSchemes.rainbow[range(1, stop=length(ColorSchemes.rainbow), step=2)], ColorSchemes.rainbow[2]);
function get_color(i, comms, colors)
for j in 1:length(comms)
if i in comms[j]
return colors[j % length(colors) + 1]
end
end
return "black"
end;
degrees = [Graphs.degree(t, v) for v in Graphs.vertices(t)];
dsize = fill(1.3, 124)
dsize += degrees/maximum(degrees);
# evaluate comms labels
function get_labels(comms)
labels = fill("", 124)
index = 1
for c in comms
labels[first(c)] = "C$(index)"
index += 1
end
labels
end
labels = get_labels(comms.np)
cnm_labels = get_labels(cnm_comms.bp);
gplot(
t,
nodesize = dsize,
nodelabel = labels,
nodelabeldist=2.5,
#nodelabelsize=size,
nodefillc=get_color.(1:Graphs.nv(t), Ref(comms.np), Ref(reverse(my_colors)))
)
gplot(
t,
nodesize = dsize,
nodelabel = cnm_labels,
nodelabeldist = 5.5,
nodelabelsize = dsize,
nodefillc = get_color.(1:Graphs.nv(t), Ref(cnm_comms.bp), Ref(my_colors))
)
Identifying truly important and influential characters in a vast narrative like GoT may not be a trivial task, as it depends on the considered level of granularity. In these cases, the main character(s) in each plotline is referred with the term fractal protagonist(s), to indicate that the answer to "who is the protagonist" depends on the specific plotline.
Who are the characters that apper in the majority of the scenes?
degrees = Dict{Int, Int}()
for v=1:nhv(hg)
degrees[v] = length(gethyperedges(hg, v))
end
sorted_degrees = sort(collect(degrees), by=x->x[2], rev=true);
# Let's plot these data
characters = Array{String, 1}()
degrees = Array{Int, 1}()
max_c = 0
# we will visualize only characters appearing in at least 15 scenes
for c in sorted_degrees
max_c = max_c > 15 ? break : max_c + 1
push!(characters, string(get_vertex_meta(hg, c.first)))
push!(degrees, c.second)
end
pos = collect(1:length(characters))
fig = plt.figure(figsize=(6, 4))
ax = fig.add_subplot(111)
rects = ax.barh(
pos,
degrees,
align="center",
tick_label=characters
)
#plt.tight_layout(.5)
plt.tick_params(labelsize="large")
plt.title("Season 1", pad=10, fontweight="semibold", fontsize="x-large")
plt.tight_layout()
We investigated the importance of the characters also evaluating the betweenness centrality (BC) metric of hypergraph nodes. BC measures the centrality of a node by computing the number of times that a node acts as a bridge between the other two nodes, considering the shortest paths between them.
Here, we used the concept of s-beetwennes-centrality. Check out the paper for more detail about this metric.
# Here we evaluate betweennes value considering 1-paths, 2-paths, and 3-paths.
betweeness = Dict{Int, Dict{Symbol, Float64}}()
for s=1:3
A = SimpleHypergraphs.adjacency_matrix(hg; s=s)
G = Graphs.SimpleGraph(A)
bc = Graphs.betweenness_centrality(G)
for v=1:nhv(hg)
push!(
get!(betweeness, s, Dict{Symbol, Int}()),
get_vertex_meta(hg, v) => bc[v]
)
end
end
sorted_betweeness = Dict{Int, Any}()
for s=1:3
d = get!(betweeness, s, Dict{Symbol, Int}())
d_sorted = sort(collect(d), by=x->x[2], rev=true)
push!(
sorted_betweeness,
s => d_sorted
)
end
sorted_betweeness
Dict{Int64, Any} with 3 entries: 2 => [:Tyrion_Lannister=>0.0842826, :Jon_Snow=>0.070162, :Eddard_Stark=>0.054… 3 => [:Jon_Snow=>0.056747, :Eddard_Stark=>0.0513413, :Tyrion_Lannister=>0.048… 1 => [:Arya_Stark=>0.271413, :Illyrio_Mopatis=>0.251899, :Tyrion_Lannister=>0…
# Getting top 10 characters for each s-value
#character => (HG_degree, G_degree)
data = Dict{Symbol, Array{Float64, 1}}()
for s=1:3
higher_degree_characters = sorted_betweeness[s][1:10]
for elem in higher_degree_characters
if !haskey(data, elem.first)
if s==1
push!(
get!(data, elem.first, Array{Float64, 1}()),
elem.second,
betweeness[2][elem.first],
betweeness[3][elem.first]
)
elseif s==2
push!(
get!(data, elem.first, Array{Float64, 1}()),
betweeness[1][elem.first],
elem.second,
betweeness[3][elem.first]
)
else
push!(
get!(data, elem.first, Array{Float64, 1}()),
betweeness[1][elem.first],
betweeness[2][elem.first],
elem.second
)
end
end
end
end
#by highest betweenes degree in the graph, s=1
sorted_data = sort(collect(data), by=x->x[2][1], rev=true)
13-element Vector{Pair{Symbol, Vector{Float64}}}: :Arya_Stark => [0.27141303430561975, 0.020811742624320874, 0.025567689757446024] :Illyrio_Mopatis => [0.25189924030387845, 0.0, 0.0] :Tyrion_Lannister => [0.15323433877389522, 0.08428255933571452, 0.048736052575128745] :Eddard_Stark => [0.11451260512166041, 0.05462031924487055, 0.051341324021137794] :Catelyn_Stark => [0.11139671344052993, 0.029487862630240228, 0.033451404486765125] :Jon_Snow => [0.11122931698056152, 0.07016200324650089, 0.05674696356665995] :Will => [0.06384113021458084, 0.03825136612021858, 0.0] :Lord_Varys => [0.06377540542298163, 0.0009193736202932523, 0.001578733585930707] :Rodrik_Cassel => [0.05228209175473733, 0.023788449409135447, 0.017966399768193867] :Bran_Stark => [0.042489406954219586, 0.0441890894890005, 0.029071370024299194] :Theon_Greyjoy => [0.038163554220999756, 0.030162976584305612, 0.023194820979115627] :Robert_Baratheon => [0.01766998385978779, 0.014594012937620571, 0.014715908661845302] :Sansa_Stark => [0.015381909177319377, 0.020828998970610812, 0.012916363931724326]
labels = Array{String, 1}()
s1 = Array{Float64, 1}()
s2 = Array{Float64, 1}()
s3 = Array{Float64, 1}()
for elem in sorted_data
push!(labels, string(elem.first))
push!(s1, elem.second[1])
push!(s2, elem.second[2])
push!(s3, elem.second[3])
end
clf()
fig = plt.figure(figsize=(7.5, 4.5))
ax = plt.axes([0.12, 0.32, 0.85, 0.6])
pos = collect(1:length(labels))# the x locations for the groups
width = 0.3 # the width of the bars
s1_rects = ax.bar(pos .- width/2, s1, width, label=L"$1-path$")
s2_rects = ax.bar(pos .+ width/2, s2, width, label=L"$2-path$")
s3_rects = ax.bar(pos .+ (width+width/2), s2, width, label=L"$3-path$")
# Add some text for labels, title and custom x-axis tick labels, etc.
plt.title("Season 8", pad=10, fontweight="semibold", fontsize="x-large")
plt.ylabel("Betweeness Centrality Score", fontweight="semibold", fontsize="x-large", labelpad=10)
ax.set_xticks(pos)
ax.set_xticklabels(labels)
ax.set_yticks(collect(range(0, stop=(.40 > maximum(s1) ? .4 : maximum(s1)), step=0.05))) #(season == 8 ? 25 : 10)
ax.legend()
plt.setp(ax.xaxis.get_majorticklabels(), rotation=65, ha="right", rotation_mode="anchor")
plt.tick_params(labelsize="large")
The plot above, showing the betweenness centrality scores using (1,2,3)-paths, reveals that removing "shallow" relationships can bring to light a completely different insight about the structure of the network we are analyzing. Specifically, in this case, it is worth noting that:
This example highlight how just using 2-paths, we can grasp how characters really interact in the plot and which are the guys who tie all characters together.