This is a short example demonstrating network analysis with DICES data. I'm using NetworkX to build the network models and Matplotlib's pyplot to visualize them.
I'm by no means an expert in network tools. If you have more complex case studies you'd like to share, please get in touch!
# import statements
import pickle
import pandas as pd
import os
from dicesapi import DicesAPI
from dicesapi.jupyter import NotebookPBar
from collections import Counter
from matplotlib import pyplot as plt
import networkx as nx
# initialize connection to database
api = DicesAPI(
    progress_class = NotebookPBar,
    logfile = 'dices.log',
)
In this case, I'd like to organize every conversation in the corpus according to which parties talk to which other parties.
Nodes in my network will be character instances, and edges will be speaker-addressee relationships. I'm not going to consider how many times they speak throughout the conversation, simply whether person A ever speaks to person B.
I'll assign numbers to the participants in the order in which they appear.
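As a minimal sketch of that numbering scheme, here is the same `dict.setdefault` idiom applied to a made-up list of names (the names are just sample data, not pulled from DICES). Each new name gets the next unused number; a name seen before keeps the number it already has.

```python
# Hypothetical participant names, in the order they appear in a conversation.
names = ['Achilles', 'Agamemnon', 'Achilles', 'Nestor', 'Agamemnon']

persons = {}
# setdefault() stores the proposed number only if the name is not yet a key;
# otherwise it returns the number assigned earlier.
ids = [persons.setdefault(name, len(persons) + 1) for name in names]

print(ids)      # [1, 2, 1, 3, 2]
print(persons)  # {'Achilles': 1, 'Agamemnon': 2, 'Nestor': 3}
```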
The function below produces a dictionary with three components:
key
: a shorthand representation of who speaks to whom

turns
: a table of all the speeches in the cluster

graph
: a networkx graph representing speaker-addressee relationships

def convo_graph(cluster):
    persons = dict()

    def get_id(inst):
        # number each participant in order of first appearance
        name = inst.name if inst is not None else 'N/A'
        return persons.setdefault(name, len(persons) + 1)

    # one row per speech; speakers and addressees as lists of numeric ids
    turns = pd.DataFrame(dict(
        id = cluster.id,
        source = [get_id(inst) for inst in (s.spkr or [None])],
        target = [get_id(inst) for inst in (s.addr or [None])],
    ) for s in cluster.getSpeeches())

    # expand multi-party speeches to one row per speaker-addressee pair
    all_edges = turns.explode('source').explode('target')
    flat_with_weights = (
        all_edges.groupby(['source', 'target'])
        .size()
        .reset_index(name='weight')
        .sort_values(['source', 'target'])
    )

    graph = nx.from_pandas_edgelist(flat_with_weights, create_using=nx.DiGraph,
        source='source', target='target')
    key = tuple((e.source, e.target) for i, e in flat_with_weights.iterrows())
    return dict(key=key, graph=graph, turns=turns)
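To see what the pandas steps inside the function do without connecting to the database, here is a rough sketch on a hand-made table of turns. The data are invented, and passing `edge_attr='weight'` is my own addition for illustration: the function above leaves it out, so its graphs carry no weights.

```python
import pandas as pd
import networkx as nx

# Hypothetical turns: one row per speech, participants as lists of numeric ids.
turns = pd.DataFrame([
    dict(source=[1], target=[2]),
    dict(source=[2], target=[1]),
    dict(source=[1], target=[2, 3]),  # one speech with two addressees
])

# One row per speaker-addressee pair, then count how often each pair occurs.
edges = (
    turns.explode('source')
    .explode('target')
    .groupby(['source', 'target'])
    .size()
    .reset_index(name='weight')
    .sort_values(['source', 'target'])
)

# Directed graph; edge_attr='weight' keeps the counts on the edges (assumption,
# not what convo_graph() does).
graph = nx.from_pandas_edgelist(
    edges, source='source', target='target',
    edge_attr='weight', create_using=nx.DiGraph,
)

# The key is just the sorted list of distinct pairs.
key = tuple((int(s), int(t)) for s, t in zip(edges['source'], edges['target']))
print(key)
print(graph[1][2]['weight'])
```

Here the pair (1, 2) occurs twice, so it collapses to a single edge, and the speech with two addressees contributes both (1, 2) and (1, 3).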
clusters = api.getClusters(work_title='Iliad')
print(len(clusters), 'clusters')
Let's try building a couple of graphs to see what they're like. I'm starting with item 10, the eleventh cluster. Try picking other numbers to compare the results.
cl = clusters[10]
print(cl)
pd.DataFrame(dict(
    cluster = cl.id,
    speech = s.id,
    work = f'{s.author.name} {s.work.title}',
    first = s.l_fi,
    last = s.l_la,
    spkr = s.getSpkrString(),
    addr = s.getAddrString(),
) for s in cl.getSpeeches())
Run our custom function to produce key, turns, and graph as a dict.
bundle = convo_graph(cl)
Let's start with the turns, since that's the easiest for us to interpret. The speeches are still in order, but the names have been replaced by numbers.
bundle['turns']
The key is a flattened form of this, combining turns that are identical in speaker-addressee relation.
bundle['key']
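One consequence of flattening is that conversations of different lengths, between different people, can share a key. The sketch below mimics the idea in plain Python; `morphology_key` is a hypothetical helper written for this illustration, not part of the DICES API, and the conversations are invented.

```python
from itertools import product

def morphology_key(speeches):
    """Return the sorted, de-duplicated (speaker, addressee) id pairs.

    `speeches` is a list of (speakers, addressees) tuples of names.
    Hypothetical helper: it only imitates the idea behind bundle['key'].
    """
    persons = {}
    pairs = set()
    for speakers, addressees in speeches:
        # Within each speech, speakers are numbered before addressees.
        src = [persons.setdefault(n, len(persons) + 1) for n in speakers]
        tgt = [persons.setdefault(n, len(persons) + 1) for n in addressees]
        pairs.update(product(src, tgt))
    return tuple(sorted(pairs))

# Three turns between two people...
a = [(['Hector'], ['Paris']), (['Paris'], ['Hector']), (['Hector'], ['Paris'])]
# ...and two turns between two other people.
b = [(['Zeus'], ['Hera']), (['Hera'], ['Zeus'])]

print(morphology_key(a))  # ((1, 2), (2, 1))
print(morphology_key(a) == morphology_key(b))  # True
```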
pbar = NotebookPBar(max=len(clusters))
graphs = []
for i, cl in enumerate(clusters):
    graphs.append(convo_graph(cl))
    pbar.update(i)
Here we create two dictionaries. One stores all the graphs based on key, the flat representation of the map. The other stores all the turn-taking tables in the same way.
graph_index = {}
turns_index = {}
for graph in graphs:
    k = graph['key']
    g = graph['graph']
    m = graph['turns']
    if k not in graph_index:
        graph_index[k] = []
    graph_index[k].append(g)
    if k not in turns_index:
        turns_index[k] = []
    turns_index[k].append(m)
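If the membership checks feel repetitive, the same grouping could be written with `collections.defaultdict`. This is only an alternative sketch; the bundles here are stand-ins with strings in place of real graphs and tables.

```python
from collections import defaultdict

# Hypothetical stand-ins for the dicts returned by convo_graph().
bundles = [
    dict(key=((1, 2),), graph='g0', turns='t0'),
    dict(key=((1, 2), (2, 1)), graph='g1', turns='t1'),
    dict(key=((1, 2),), graph='g2', turns='t2'),
]

# A missing key starts out as an empty list, so no "if k not in ..." is needed.
graph_index = defaultdict(list)
turns_index = defaultdict(list)
for b in bundles:
    graph_index[b['key']].append(b['graph'])
    turns_index[b['key']].append(b['turns'])

print(graph_index[((1, 2),)])  # ['g0', 'g2']
```

One thing to watch: looking up a key that doesn't exist in a `defaultdict` silently creates an empty entry, so use `in` or `.get()` when probing for keys later.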
Make a quick counter of how many graphs are organized under each key, so we can see which morphologies are most common.
key_count = Counter([g['key'] for g in graphs])
key_count.most_common()
We use the counter to take each successive map in order, from most common down. Then we check the graph_index for an example of the graph representing that morphology and plot it. The final line below also saves a copy of the image.
fig, ax = plt.subplots(3, 4, figsize=(22,12))
plt.subplots_adjust(wspace=1, hspace=.5)
for i, rec in enumerate(key_count.most_common(12)):
    key, count = rec
    # fill the 3x4 grid left to right, top to bottom
    row = i // 4
    col = i % 4
    plt.sca(ax[row, col])
    g = graph_index[key][0]
    nx.draw_spring(g, node_color='pink', width=4, with_labels=True)
    ax[row, col].set_title(f'n={count}', fontsize=18)
plt.savefig('foo.pdf')
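The only fiddly part of the plotting loop is turning a running index into a grid position. As a small sketch, `divmod` does both halves of that arithmetic at once; the 3x4 layout matches the figure above, and the variable names are mine.

```python
n_rows, n_cols = 3, 4

# divmod(i, n_cols) returns (i // n_cols, i % n_cols), i.e. (row, column)
# when the grid is filled left to right, top to bottom.
positions = [divmod(i, n_cols) for i in range(n_rows * n_cols)]

print(positions[0])   # (0, 0)
print(positions[4])   # (1, 0)
print(positions[11])  # (2, 3)
```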
We can also go the other direction: specify a key and look for examples of it in the corpus by using the indices we built.
key = ((1, 2), (3, 1))
# look at first graph
graph = graph_index[key][0]
fig, ax = plt.subplots(figsize=(8,6))
nx.draw(graph, node_color='pink', width=2, with_labels=True)
ax.set_title(f'n={len(graph_index[key])}')
fig.savefig('chain.pdf')
cl_ids = [turns.loc[0,'id'] for turns in turns_index[key]]
for cl in clusters.filterIDs(cl_ids):
    display(
        pd.DataFrame(dict(
            author = s.author.name,
            work = s.work.title,
            lines = s.l_range,
            speaker = ', '.join([i.name for i in s.spkr]),
            addressee = ', '.join([i.name for i in s.addr]),
        ) for s in cl.getSpeeches())
    )