Prof. Götz Pfeiffer

School of Mathematics, Statistics and Applied Mathematics

NUI Galway

... continuing from last time. First, load the libraries.

In [ ]:

```
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt
```

Next, recover the graph `G` of marital ties between Florentine families, together with the node attributes we have already determined.

In [ ]:

```
G = nx.read_yaml("florentine.yml")
print(G.number_of_nodes())
G.nodes['Medici']
```

In [ ]:

```
G.number_of_nodes()
```

In [ ]:

```
cc = list(nx.connected_components(G))[0]
GG = G.subgraph(cc)
```

Recall that
$$
\left( \begin{array}{cc} 3 & 1 \\ 2 & 2 \end{array} \right) \,
\left( \begin{array}{c} 1 \\ 1 \end{array} \right) =
\left( \begin{array}{c} 4 \\ 4 \end{array} \right) =
4 \, \left( \begin{array}{c} 1 \\ 1 \end{array} \right),
$$
making the vector $(^1_1)$ an **eigenvector** for the **eigenvalue** $\lambda = 4$ of the matrix $A$.

In this example

- all entries $a_{ij}$ of the matrix $A = (a_{ij})$ are positive;
- the eigenvalue $4$ is strictly larger than the magnitude $|\lambda'|$ of all the other (complex or real) eigenvalues of $A$ (here, $\lambda' = 1$);
- and the eigenvalue $\lambda = 4$ has an eigenvector with all its entries positive.
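These observations are easily verified numerically, e.g. with `numpy` (assumed to be available alongside the libraries loaded above):

```python
import numpy as np

# the matrix from the example above
A = np.array([[3, 1], [2, 2]])

# its eigenvalues: 4 and 1
eigenvalues, eigenvectors = np.linalg.eig(A)
print(sorted(eigenvalues.real))

# (1, 1) is an eigenvector for the eigenvalue 4
v = np.array([1, 1])
print(A @ v)  # [4 4], i.e., 4 * v
```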

The theoretical foundation for eigenvector centrality is provided by the Perron–Frobenius theorem, a Linear Algebra theorem from 1907/1912, which basically states that the above observations are no coincidence: an irreducible square matrix with non-negative entries always has a dominant eigenvalue with an eigenvector whose entries are all positive.

Here, a matrix $A$ is called **reducible** if, for some simultaneous
permutation of its rows and columns, it has the block form
$$
A = \left(
\begin{array}{cc}
A_{11} & A_{12} \\
0 & A_{22}
\end{array}
\right).
$$
And $A$ is **irreducible** if it is not reducible.

The adjacency matrix of a simple graph $G$ is irreducible if and only if $G$ is connected.
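To illustrate this with a small assumed example (not part of the Florentine data): the adjacency matrix of a disconnected graph, with the nodes listed component by component, visibly decomposes into blocks.

```python
import networkx as nx
import numpy as np

# a disconnected graph: two disjoint edges
H = nx.Graph([(0, 1), (2, 3)])
print(nx.is_connected(H))  # False

# with nodes ordered by component, the adjacency matrix is block
# diagonal: the off-diagonal blocks are zero, so it is reducible
A = nx.to_numpy_array(H, nodelist=[0, 1, 2, 3])
print(A)
```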

**Definition (Eigenvector centrality).** In a simple, connected graph $G$,
the **eigenvector centrality** $c_i^E$ of node $i$ is defined as
$$
c_i^E = u_i,
$$
where $u = (u_1, \dots, u_n)$ is the (unique) normalized eigenvector
of the adjacency matrix $A$ of $G$
with eigenvalue $\lambda$, and where $\lambda > |\lambda'|$
for all eigenvalues $\lambda' \neq \lambda$ of $A$.
The **normalised eigenvector centrality** of node $i$ is defined as
$$
C_i^E = \frac{c_i^E}{C^E},
$$
where $C^E = \sum_j c_j^E$.
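Such a dominant eigenvector can be approximated by **power iteration**: repeatedly apply $A$ to a start vector and renormalize. This is, in essence, also how `nx.eigenvector_centrality` works. A minimal sketch (the function name and the shift by the identity are our own choices, not part of networkx):

```python
import networkx as nx
import numpy as np

def power_iteration_centrality(graph, steps=100):
    """Approximate the eigenvector centralities by power iteration."""
    A = nx.to_numpy_array(graph)
    u = np.ones(A.shape[0])        # start with the all-ones vector
    for _ in range(steps):
        u = A @ u + u              # apply A + I: the shift by I avoids
        u = u / np.linalg.norm(u)  # oscillation on bipartite graphs
    return dict(zip(graph.nodes(), u))

# compare with networkx on a small example
P = nx.path_graph(4)
print(power_iteration_centrality(P))
print(nx.eigenvector_centrality(P))
```

Note that the entries are normalized to Euclidean length $1$, as in `nx.eigenvector_centrality`; dividing by their sum instead yields the normalised centralities $C_i^E$.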

Let's attach the eigenvector centralities as node attributes and display the resulting table.

In [ ]:

```
eigen_cen = nx.eigenvector_centrality(GG)
nx.set_node_attributes(G, eigen_cen, '$c_i^E$')
```

In [ ]:

```
pd.DataFrame.from_dict(dict(GG.nodes(data=True)), orient='index').sort_values('degree', ascending=False)
```

A node $x$ in a network can be regarded as central if it is **close** to (many) other nodes,
as it can then quickly interact with them. A simple way to measure closeness in this sense
is based on the sum of all the distances to the other nodes, as follows.

**Definition (Closeness Centrality).**
In a simple, connected graph $G$, the **closeness centrality** $c_i^C$ of node $i$
is defined as
$$
c_i^C = \Bigl(\sum_j d_{ij}\Bigr)^{-1}.
$$
The **normalized closeness centrality** of node $i$, defined as
$$
C_i^C = (n-1) c_i^C
$$
takes values in the interval $[0, 1]$.

Why is $0 \leq C_i^C \leq 1$? When is $C_i^C = 1$?
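As a hint, consider a star graph (a toy example, not the Florentine data): the centre is adjacent to every other node, so all its distances are $1$ and its normalized closeness centrality attains the maximum value $1$.

```python
import networkx as nx

# star graph: node 0 is adjacent to all 4 leaves
S = nx.star_graph(4)

close = nx.closeness_centrality(S)
print(close[0])  # 1.0: all distances from the centre are 1
```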

**BFS again.** This time as a python function, which takes a graph $G = (X, E)$ and a vertex $x \in X$
as its arguments. It returns a **dictionary** that has each node as a key, with the distance from that node to
$x$ as its value.

In [ ]:

```
from queue import Queue

def dists(graph, node):
    """BFS distances from node to every vertex of graph."""
    # 1. init: set up the dictionary and queue
    dists = { x: None for x in graph.nodes() }
    dists[node] = 0
    q = Queue()
    q.put(node)
    # 2. loop
    while not q.empty():
        x = q.get()
        for y in graph.neighbors(x):
            if dists[y] is None:
                dists[y] = dists[x] + 1
                q.put(y)
    # 3. stop here
    return dists
```

In [ ]:

```
d = dists(G, 'Medici')
```

In [ ]:

```
d
```

In [ ]:

```
cc = list(nx.connected_components(G))[0]
GG = G.subgraph(cc)
```

In [ ]:

```
nx.draw(GG)
```

In [ ]:

```
d = dists(GG, 'Medici')
```

In [ ]:

```
sum(d.values())
```

In [ ]:

```
n = GG.number_of_nodes()
close_cen = { x : (n-1)/sum(dists(GG, x).values()) for x in GG.nodes() }
```

In [ ]:

```
close_cen
```

In [ ]:

```
nx.closeness_centrality(GG)
```

In [ ]:

```
nx.set_node_attributes(G, close_cen, '$C_i^C$')
```

In [ ]:

```
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index').sort_values('degree', ascending=False)
```

**BFS once more.** This time as a python function that returns a **dictionary** which contains, for each node $y$, the list of **immediate predecessors** of $y$
on a shortest path from $x$ to $y$. Yes, that's another piece of information BFS can determine
on the fly. From this, one can recursively reconstruct **all shortest paths** from $x$ to $y$.
We still need to compute the shortest path lengths as part of the algorithm.

In [ ]:

```
from queue import Queue

def preds(graph, node):
    """BFS shortest-path predecessors of every vertex, seen from node."""
    # 1. init: set up the two dictionaries and the queue
    dists = { x: None for x in graph.nodes() }
    preds = { x: [] for x in graph.nodes() }
    dists[node] = 0
    q = Queue()
    q.put(node)
    # 2. loop
    while not q.empty():
        x = q.get()
        for y in graph.neighbors(x):
            if dists[y] is None:
                dists[y] = dists[x] + 1
                q.put(y)
                preds[y].append(x)
            elif dists[y] == dists[x] + 1:
                preds[y].append(x)
    # 3. stop here
    return preds
```

In [ ]:

```
p = preds(GG, 'Medici')
p
```

When interactions between non-adjacent agents in a network depend on middlemen (on the shortest paths between these agents), power comes to those in the middle. Betweenness centrality measures centrality in terms of the number of shortest paths a node lies on.

**Definition (Betweenness Centrality).**
In a simple, connected graph $G$, the **betweenness centrality** $c_i^B$ of node $i$
is defined as
$$
c_i^B = \sum_{j \neq i} \sum_{k \neq i} \frac{n_{jk}(i)}{n_{jk}},
$$
where $n_{jk}$ denotes the **number** of shortest paths from
node $j$ to node $k$, and where $n_{jk}(i)$ denotes the
number of those shortest paths **passing through** node $i$.
The **normalized betweenness centrality** of node $i$, defined as
$$
C_i^B = \frac{c_i^B}{(n-1)(n-2)}
$$
takes values in the interval $[0, 1]$.
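As a quick sanity check (on a toy path graph, not the Florentine data): in a path $0 - 1 - 2$, the unique shortest path between the two end nodes passes through the middle node, which therefore gets normalized betweenness centrality $1$.

```python
import networkx as nx

# path graph 0 - 1 - 2
P = nx.path_graph(3)

between = nx.betweenness_centrality(P)
print(between)  # node 1 scores 1.0, the end nodes score 0.0
```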

Using the predecessor lists, shortest paths can be enumerated recursively: the only shortest path from $x$ to itself is the empty path starting and ending at $x$. Otherwise, if $y \neq x$, then each shortest path from $x$ to $y$ travels through exactly one of $y$'s predecessors ... and ends in $y$.

In [ ]:

```
def shortest_paths(x, y, pre):
    """All shortest paths from x to y, given the predecessor lists pre."""
    if x == y:
        return [[x]]
    else:
        paths = []
        for p in pre[y]:
            for path in shortest_paths(x, p, pre):
                paths.append(path + [y])
        return paths
```

In [ ]:

```
shortest_paths('Medici', 'Bischeri', p)
```

In [ ]:

```
between = { x : 0 for x in GG.nodes() }
```

In [ ]:

```
n = GG.number_of_nodes()
for x in GG.nodes():
    pre = preds(GG, x)
    for y in GG.nodes():
        paths = shortest_paths(x, y, pre)
        njk = len(paths) * (n-1) * (n-2)  # path count times normalization factor
        for p in paths:
            for z in p[1:-1]:  # exclude the endpoints x and y
                between[z] += 1/njk
```

In [ ]:

```
between
```

In [ ]:

```
nx.betweenness_centrality(GG)
```

In [ ]:

```
nx.draw(GG, with_labels=True)
```

Finally, let's add the normalized betweenness centralities as attributes to the nodes of the graph, and display the resulting table.

In [ ]:

```
nx.set_node_attributes(G, between, '$C_i^B$')
```

In [ ]:

```
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index').sort_values('degree', ascending=False)
```
