### CS4423 - Networks

Prof. Götz Pfeiffer
School of Mathematics, Statistics and Applied Mathematics
NUI Galway

# Lecture 8: Closeness and Betweenness Centrality

... continuing from last time. First, load the libraries.

In [ ]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt


Next, recover the graph G of marital ties between Florentine families, together with the node attributes we have already determined.

In [ ]:
G = nx.read_yaml("florentine.yml")
print(G.number_of_nodes())
G.nodes['Medici']

In [ ]:
G.number_of_nodes()

In [ ]:
cc = list(nx.connected_components(G))[0]
GG = G.subgraph(cc)


Recall that $$\left( \begin{array}{cc} 3 & 1 \\ 2 & 2 \end{array} \right) \, \left( \begin{array}{c} 1 \\ 1 \end{array} \right) = \left( \begin{array}{c} 4 \\ 4 \end{array} \right) = 4 \, \left( \begin{array}{c} 1 \\ 1 \end{array} \right),$$ making the vector $(^1_1)$ an **eigenvector** for the **eigenvalue** $\lambda = 4$ of the matrix $A$.

In this example

• all entries $a_{ij}$ of the matrix $A = (a_{ij})$ are positive;
• the eigenvalue $4$ is strictly larger than the magnitude $|\lambda'|$ of all the other (complex or real) eigenvalues of $A$ (here, $\lambda' = 1$);
• and the eigenvalue $\lambda = 4$ has an eigenvector with all its entries positive.
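These observations can be checked numerically; a quick sketch using numpy (which is not loaded in the notebook above):

```python
import numpy as np

A = np.array([[3, 1],
              [2, 2]])

eigvals, eigvecs = np.linalg.eig(A)
print(np.sort(eigvals))             # eigenvalues 1 and 4: the dominant one is 4

v = eigvecs[:, np.argmax(eigvals)]  # eigenvector for lambda = 4
print(v / v[0])                     # proportional to (1, 1): all entries positive
```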

The theoretical foundation for eigenvector centrality is provided by the following linear algebra theorem (due to Perron, 1907, and Frobenius, 1912), which states that the above observations are no coincidence.

**Theorem (Perron-Frobenius for irreducible matrices).** Suppose that $A$ is a square, nonnegative, irreducible matrix. Then:

• $A$ has a real eigenvalue $\lambda > 0$ with $\lambda \geq |\lambda'|$ for all eigenvalues $\lambda'$ of $A$;
• $\lambda$ is a simple root of the characteristic polynomial of $A$;
• there is a $\lambda$-eigenvector $v$ with $v > 0$.

Here, a matrix $A$ is called reducible if, for some simultaneous permutation of its rows and columns, it has the block form $$A = \left( \begin{array}{cc} A_{11} & A_{12} \\ 0 & A_{22} \end{array} \right).$$ And $A$ is irreducible if it is not reducible.

The adjacency matrix of a simple graph $G$ is irreducible if and only if $G$ is connected.
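A minimal sketch of this connectedness test (my own illustration, using numpy rather than networkx): a nonnegative $n \times n$ matrix $A$ is irreducible if and only if $(I + A)^{n-1}$ has no zero entries.

```python
import numpy as np

def is_irreducible(A):
    """Check irreducibility of a nonnegative square matrix A."""
    n = A.shape[0]
    M = np.linalg.matrix_power(np.eye(n) + A, n - 1)
    return bool(np.all(M > 0))

# path on 3 nodes (connected) vs. a single edge plus an isolated node (disconnected)
path = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
split = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]])
print(is_irreducible(path), is_irreducible(split))  # True False
```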

**Definition (Eigenvector centrality).** In a simple, connected graph $G$, the **eigenvector centrality** $c_i^E$ of node $i$ is defined as $$c_i^E = u_i,$$ where $u = (u_1, \dots, u_n)$ is the (unique) normalized eigenvector of the adjacency matrix $A$ of $G$ for its largest eigenvalue $\lambda$, the Perron root. The **normalized eigenvector centrality** of node $i$ is defined as $$C_i^E = \frac{c_i^E}{C^E},$$ where $C^E = \sum_j c_j^E$.
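As a sketch of how such an eigenvector can be found (my own illustration with plain numpy, not the method networkx uses internally), power iteration repeatedly multiplies a positive start vector by the matrix; iterating with $I + A$ instead of $A$ avoids oscillation on bipartite graphs:

```python
import numpy as np

def power_iteration(A, steps=100):
    """Approximate the dominant eigenvector of a nonnegative irreducible matrix."""
    v = np.ones(A.shape[0])
    for _ in range(steps):
        v = A @ v + v              # iterate with I + A: the shift ensures convergence
        v = v / np.linalg.norm(v)  # renormalize to prevent overflow
    return v

# path graph 1 - 2 - 3: the middle node should come out most central
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
u = power_iteration(A)
c = u / u.sum()                    # normalized eigenvector centralities C_i^E
print(c)
```

For this path graph the dominant eigenvector is proportional to $(1, \sqrt{2}, 1)$, so the middle node gets normalized centrality $\sqrt{2} - 1 \approx 0.414$.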

Let's attach the eigenvector centralities as node attributes and display the resulting table.

In [ ]:
eigen_cen = nx.eigenvector_centrality(GG)
nx.set_node_attributes(G, eigen_cen, '$c_i^E$')

In [ ]:
pd.DataFrame.from_dict(dict(GG.nodes(data=True)), orient='index').sort_values('degree', ascending=False)


### Closeness centrality

A node $x$ in a network can be regarded as central if it is close to (many) other nodes, since it can then quickly interact with them. A simple way to measure closeness in this sense is based on the sum of all the distances to the other nodes, as follows.

**Definition (Closeness Centrality).** In a simple, connected graph $G$, the **closeness centrality** $c_i^C$ of node $i$ is defined as $$c_i^C = \Bigl(\sum_j d_{ij}\Bigr)^{-1}.$$ The **normalized closeness centrality** of node $i$, defined as $$C_i^C = (n-1) c_i^C$$ takes values in the interval $[0, 1]$.

Why is $0 \leq C_i^C \leq 1$? When is $C_i^C = 1$?
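To get a feel for the numbers, here is a small hand-worked sketch (my own example, not from the lecture): on a path graph with $4$ nodes, the two inner nodes are closer to the rest than the endpoints.

```python
# Closeness on the path graph 0 - 1 - 2 - 3, with all distances written out by hand.
dist = {
    0: {0: 0, 1: 1, 2: 2, 3: 3},
    1: {0: 1, 1: 0, 2: 1, 3: 2},
    2: {0: 2, 1: 1, 2: 0, 3: 1},
    3: {0: 3, 1: 2, 2: 1, 3: 0},
}
n = len(dist)

# normalized closeness: C_i^C = (n-1) / sum_j d_ij
close = {i: (n - 1) / sum(d.values()) for i, d in dist.items()}
print(close)  # {0: 0.5, 1: 0.75, 2: 0.75, 3: 0.5}
```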

BFS again, this time as a Python function that takes a graph $G = (X, E)$ and a vertex $x \in X$ as its arguments. It returns a dictionary that, for each node as key, has the distance to $x$ as its value.

In [ ]:
from queue import Queue

def dists(graph, node):

    # 1. init: set up the dictionary and queue
    dists = { x : None for x in graph.nodes() }
    dists[node] = 0
    q = Queue()
    q.put(node)

    # 2. loop
    while not q.empty():
        x = q.get()
        for y in graph.neighbors(x):
            if dists[y] is None:
                dists[y] = dists[x] + 1
                q.put(y)

    # 3. stop here
    return dists

In [ ]:
d = dists(G, 'Medici')

In [ ]:
d

In [ ]:
cc = list(nx.connected_components(G))[0]
GG = G.subgraph(cc)

In [ ]:
nx.draw(GG)

In [ ]:
d = dists(GG, 'Medici')

In [ ]:
sum(d.values())

In [ ]:
n = GG.number_of_nodes()
close_cen = { x : (n-1)/sum(dists(GG, x).values()) for x in GG.nodes() }

In [ ]:
close_cen

In [ ]:
nx.closeness_centrality(GG)

In [ ]:
nx.set_node_attributes(G, close_cen, '$C_i^C$')

In [ ]:
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index').sort_values('degree', ascending=False)


### Betweenness Centrality

BFS once more. This time as a Python function that returns a dictionary containing, for each node $y$, a list of immediate predecessors of $y$ on a shortest path from $x$ to $y$. Yes, that's another piece of information BFS can determine on the fly. From this, one can recursively reconstruct all shortest paths from $x$ to $y$. We still need to compute the shortest path lengths as part of the algorithm.

In [ ]:
from queue import Queue

def preds(graph, node):

    # 1. init: set up the two dictionaries and queue
    dists = { x : None for x in graph.nodes() }
    preds = { x : [] for x in graph.nodes() }
    dists[node] = 0
    q = Queue()
    q.put(node)

    # 2. loop
    while not q.empty():
        x = q.get()
        for y in graph.neighbors(x):
            if dists[y] is None:
                dists[y] = dists[x] + 1
                q.put(y)
                preds[y].append(x)
            elif dists[y] == dists[x] + 1:
                preds[y].append(x)

    # 3. stop here
    return preds

In [ ]:
p = preds(GG, 'Medici')
p


When interactions between non-adjacent agents in a network depend on middle men (on shortest paths between these agents), power comes to those in the middle. Betweenness centrality measures centrality in terms of the number of shortest paths a node lies on.

**Definition (Betweenness Centrality).** In a simple, connected graph $G$, the **betweenness centrality** $c_i^B$ of node $i$ is defined as $$c_i^B = \sum_{j \neq i} \sum_{k \neq i} \frac{n_{jk}(i)}{n_{jk}},$$ where $n_{jk}$ denotes the **number** of shortest paths from node $j$ to node $k$, and where $n_{jk}(i)$ denotes the number of those shortest paths **passing through** node $i$. The **normalized betweenness centrality** of node $i$, defined as $$C_i^B = \frac{c_i^B}{(n-1)(n-2)}$$ takes values in the interval $[0, 1]$.
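As a sanity check on the normalization (my own example, not from the lecture): in a star graph the hub lies on the unique shortest path between any two leaves, so its normalized betweenness is exactly $1$.

```python
from itertools import permutations

# Star graph: hub 0 joined to leaves 1, ..., n-1. Every shortest path between
# two distinct leaves passes through the hub, and there is exactly one such path.
n = 4
leaves = range(1, n)

c_B = sum(1 for j, k in permutations(leaves, 2))  # n_jk = n_jk(0) = 1 for each ordered pair
C_B = c_B / ((n - 1) * (n - 2))                   # normalized betweenness of the hub
print(C_B)  # 1.0: the hub is maximally between
```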

Using the predecessor lists, shortest paths can be enumerated recursively: the only shortest path from $x$ to itself is the empty path starting and ending at $x$. Otherwise, if $y \neq x$, then each shortest path from $x$ to $y$ travels through exactly one of $y$'s predecessors ... and ends in $y$.

In [ ]:
def shortest_paths(x, y, pre):
    if x == y:
        return [[x]]
    else:
        paths = []
        for p in pre[y]:
            for path in shortest_paths(x, p, pre):
                paths.append(path + [y])

        return paths

In [ ]:
shortest_paths('Medici', 'Bischeri', p)

In [ ]:
between = { x : 0 for x in GG.nodes() }

In [ ]:
n = GG.number_of_nodes()
for x in GG.nodes():
    pre = preds(GG, x)
    for y in GG.nodes():
        paths = shortest_paths(x, y, pre)
        njk = len(paths)*(n-1)*(n-2)  # number of shortest paths, times normalization
        for p in paths:
            for z in p[1:-1]:  # exclude endpoints
                between[z] += 1/njk

In [ ]:
between

In [ ]:
nx.betweenness_centrality(GG)

In [ ]:
nx.draw(GG, with_labels=True)


Finally, let's add the normalized betweenness centralities as attributes to the nodes of the graph, and display the resulting table.

In [ ]:
nx.set_node_attributes(G, between, '$C_i^B$')

In [ ]:
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index').sort_values('degree', ascending=False)
