CS4423 - Networks

Prof. Götz Pfeiffer
School of Mathematics, Statistics and Applied Mathematics
NUI Galway

2. Centrality Measures

Lecture 8: Closeness and Betweenness Centrality

... continuing from last time. First, load the libraries.

In [ ]:
import networkx as nx
import pandas as pd
import matplotlib.pyplot as plt

Next, recover the graph G of marital ties between Florentine families, together with the node attributes we have already determined.

In [ ]:
G = nx.read_yaml("florentine.yml")
print(G.number_of_nodes())
G.nodes['Medici']
In [ ]:
cc = list(nx.connected_components(G))[0]
GG = G.subgraph(cc)

Eigenvectors and Centrality (cont'd.)

Recall that $$ \left( \begin{array}{cc} 3 & 1 \\ 2 & 2 \end{array} \right) \, \left( \begin{array}{c} 1 \\ 1 \end{array} \right) = \left( \begin{array}{c} 4 \\ 4 \end{array} \right) = 4 \, \left( \begin{array}{c} 1 \\ 1 \end{array} \right), $$ making the vector $(^1_1)$ an **eigenvector** for the **eigenvalue** $\lambda = 4$ of the matrix $A$.

In this example

  • all entries $a_{ij}$ of the matrix $A = (a_{ij})$ are positive;
  • the eigenvalue $4$ is strictly larger than the magnitude $|\lambda'|$ of all the other (complex or real) eigenvalues of $A$ (here, $\lambda' = 1$, since the characteristic polynomial is $\lambda^2 - 5\lambda + 4 = (\lambda - 4)(\lambda - 1)$);
  • and the eigenvalue $\lambda = 4$ has an eigenvector with all its entries positive.

The theoretical foundation for eigenvector centrality is provided by the following Linear Algebra theorem, due to Perron (1907) and Frobenius (1912), which states that the above observations are no coincidence.

Theorem. (Perron-Frobenius for irreducible matrices.) Suppose that $A$ is a square, nonnegative, irreducible matrix. Then:

  • $A$ has a real eigenvalue $\lambda > 0$ with $\lambda \geq |\lambda'|$ for all eigenvalues $\lambda'$ of $A$;
  • $\lambda$ is a simple root of the characteristic polynomial of $A$;
  • there is a $\lambda$-eigenvector $v$ with $v > 0$.

Here, a matrix $A$ is called reducible if, for some simultaneous permutation of its rows and columns, it has the block form $$ A = \left( \begin{array}{cc} A_{11} & A_{12} \\ 0 & A_{22} \end{array} \right). $$ And $A$ is irreducible if it is not reducible.

The adjacency matrix of a simple graph $G$ is irreducible if and only if $G$ is connected.
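For instance (a small sketch, using networkx's `to_numpy_array` to inspect the adjacency matrix), a disconnected graph yields a reducible adjacency matrix:

```python
import networkx as nx

# A disconnected graph on two components, {0, 1} and {2, 3}.
H = nx.Graph([(0, 1), (2, 3)])
A = nx.to_numpy_array(H)

# With the nodes ordered by component, A is block diagonal: the
# off-diagonal blocks coupling {0, 1} to {2, 3} vanish, so A is
# reducible -- matching the fact that H is not connected.
print(A)
print(nx.is_connected(H))
```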

**Definition (Eigenvector centrality).** In a simple, connected graph $G$, the **eigenvector centrality** $c_i^E$ of node $i$ is defined as $$ c_i^E = u_i, $$ where $u = (u_1, \dots, u_n)$ is the (by Perron-Frobenius, unique) normalized eigenvector with positive entries of the adjacency matrix $A$ of $G$, for the eigenvalue $\lambda$ with $\lambda \geq |\lambda'|$ for all eigenvalues $\lambda'$ of $A$. The **normalized eigenvector centrality** of node $i$ is defined as $$ C_i^E = \frac{c_i^E}{C^E}, $$ where $C^E = \sum_j c_j^E$.
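In practice, the dominant eigenvector can be found by power iteration, which is also what `nx.eigenvector_centrality` uses. A minimal sketch on the $2 \times 2$ matrix from above, assuming only numpy:

```python
import numpy as np

A = np.array([[3.0, 1.0], [2.0, 2.0]])

# Power iteration: repeatedly apply A and rescale.  For a nonnegative
# irreducible matrix, the iterates converge to the Perron-Frobenius
# eigenvector, i.e. the one with all entries positive.
u = np.array([1.0, 0.0])
for _ in range(50):
    u = A @ u
    u = u / u.sum()

lam = (A @ u)[0] / u[0]   # estimate of the dominant eigenvalue
print(u, lam)             # approx. [0.5 0.5] and 4.0
```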

Let's attach the eigenvector centralities as node attributes and display the resulting table.

In [ ]:
eigen_cen = nx.eigenvector_centrality(GG)
nx.set_node_attributes(G, eigen_cen, '$c_i^E$')
In [ ]:
pd.DataFrame.from_dict(dict(GG.nodes(data=True)), orient='index').sort_values('degree', ascending=False)

Closeness centrality

A node $x$ in a network can be regarded as central if it is close to (many) other nodes, as it can then quickly interact with them. A simple way to measure closeness in this sense is based on the sum of all the distances to the other nodes, as follows.

**Definition (Closeness Centrality).** In a simple, connected graph $G$, the **closeness centrality** $c_i^C$ of node $i$ is defined as $$ c_i^C = \Bigl(\sum_j d_{ij}\Bigr)^{-1}. $$ The **normalized closeness centrality** of node $i$, defined as $$ C_i^C = (n-1) c_i^C $$ takes values in the interval $[0, 1]$.

Why is $0 \leq C_i^C \leq 1$? When is $C_i^C = 1$?
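A quick sanity check (a sketch on a small star graph, using networkx's built-in `closeness_centrality`, which returns the normalized variant): the centre of a star is adjacent to every other node, so it attains $C_i^C = 1$.

```python
import networkx as nx

# Star graph: centre 0, leaves 1..4.
S = nx.star_graph(4)
close = nx.closeness_centrality(S)

# The centre is at distance 1 from all n-1 other nodes, so its
# distance sum is n-1 and its normalized closeness is exactly 1.
print(close[0])
# A leaf is at distance 1 from the centre and 2 from each of the
# other three leaves: C = (n-1) / (1 + 2*3) = 4/7.
print(close[1])
```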

BFS again, this time as a Python function, which takes a graph $G = (X, E)$ and a vertex $x \in X$ as its arguments. It returns a dictionary with the nodes as keys, where the value for each node is its distance to $x$.

In [ ]:
from queue import Queue

def dists(graph, node):
    
    # 1. init: set up the dictionary and the queue
    dists = { x: None for x in graph.nodes() }
    dists[node] = 0
    q = Queue()
    q.put(node)
    
    # 2. loop: visit the nodes in BFS order
    while not q.empty():
        x = q.get()
        for y in graph.neighbors(x):
            if dists[y] is None:
                dists[y] = dists[x] + 1
                q.put(y)
    
    # 3. stop here
    return dists
In [ ]:
d = dists(G, 'Medici')
In [ ]:
d
In [ ]:
nx.draw(GG)
In [ ]:
d = dists(GG, 'Medici')
In [ ]:
sum(d.values())
In [ ]:
n = GG.number_of_nodes()
close_cen = { x : (n-1)/sum(dists(GG, x).values()) for x in GG.nodes() }
In [ ]:
close_cen
In [ ]:
nx.closeness_centrality(GG)
In [ ]:
nx.set_node_attributes(G, close_cen, '$C_i^C$')
In [ ]:
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index').sort_values('degree', ascending=False)

Betweenness Centrality

BFS once more, this time as a Python function, which returns a dictionary containing, for each node $y$, the list of immediate predecessors of $y$ on shortest paths from $x$ to $y$. That is another piece of information BFS can determine on the fly. From it, all shortest paths from $x$ to $y$ can be reconstructed recursively. We still need to compute the shortest path lengths as part of the algorithm.

In [ ]:
from queue import Queue

def preds(graph, node):
    
    # 1. init: set up the two dictionaries and the queue
    dists = { x: None for x in graph.nodes() }
    preds = { x: [] for x in graph.nodes() }
    dists[node] = 0
    q = Queue()
    q.put(node)
    
    # 2. loop: visit the nodes in BFS order
    while not q.empty():
        x = q.get()
        for y in graph.neighbors(x):
            if dists[y] is None:
                dists[y] = dists[x] + 1
                q.put(y)
                preds[y].append(x)
            elif dists[y] == dists[x] + 1:
                preds[y].append(x)
    
    # 3. stop here
    return preds
In [ ]:
p = preds(GG, 'Medici')
p

When interactions between non-adjacent agents in a network depend on middle men (that is, on the agents lying on shortest paths between them), power comes to those in the middle. Betweenness centrality measures centrality in terms of the number of shortest paths a node lies on.

**Definition (Betweenness Centrality).** In a simple, connected graph $G$, the **betweenness centrality** $c_i^B$ of node $i$ is defined as $$ c_i^B = \sum_{j \neq i} \sum_{k \neq i} \frac{n_{jk}(i)}{n_{jk}}, $$ where $n_{jk}$ denotes the **number** of shortest paths from node $j$ to node $k$, and where $n_{jk}(i)$ denotes the number of those shortest paths **passing through** node $i$. The **normalized betweenness centrality** of node $i$, defined as $$ C_i^B = \frac{c_i^B}{(n-1)(n-2)} $$ takes values in the interval $[0, 1]$.
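As a quick check (a sketch on a small path graph, using networkx's `betweenness_centrality`; for undirected graphs networkx normalizes by $(n-1)(n-2)/2$, which agrees with the definition above because each unordered pair $\{j, k\}$ is then counted only once):

```python
import networkx as nx

# Path graph 0 - 1 - 2 - 3: all traffic between the two ends must
# pass through the interior nodes 1 and 2.
P = nx.path_graph(4)
bc = nx.betweenness_centrality(P)
print(bc)

# Node 1 lies on the (unique) shortest paths for the pairs {0, 2}
# and {0, 3}, but not {2, 3}: its normalized betweenness is
# 2 / ((n-1)(n-2)/2) = 2/3.  The endpoints 0 and 3 lie on no
# shortest path as interior nodes, so their betweenness is 0.
```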

Using the predecessor lists, shortest paths can be enumerated recursively: the only shortest path from $x$ to itself is the empty path starting and ending at $x$. Otherwise, if $y \neq x$, then each shortest path from $x$ to $y$ travels through exactly one of $y$'s predecessors and ends in $y$.

In [ ]:
def shortest_paths(x, y, pre):
    if x == y:
        return [[x]]
    else:
        paths = []
        for p in pre[y]:
            for path in shortest_paths(x, p, pre):
                paths.append(path + [y])
        
        return paths
In [ ]:
shortest_paths('Medici', 'Bischeri', p)
In [ ]:
between = { x : 0 for x in GG.nodes() }
In [ ]:
n = GG.number_of_nodes()
for x in GG.nodes():
    pre = preds(GG, x)
    for y in GG.nodes():
        paths = shortest_paths(x, y, pre)
        # n_{jk}, pre-scaled by the normalization factor (n-1)(n-2)
        njk = len(paths)*(n-1)*(n-2)
        for p in paths:
            for z in p[1:-1]:  # interior nodes only: exclude endpoints
                between[z] += 1/njk
In [ ]:
between
In [ ]:
nx.betweenness_centrality(GG)
In [ ]:
nx.draw(GG, with_labels=True)

Finally, let's add the normalized betweenness centralities as attributes to the nodes of the graph, and display the resulting table.

In [ ]:
nx.set_node_attributes(G, between, '$C_i^B$')
In [ ]:
pd.DataFrame.from_dict(dict(G.nodes(data=True)), orient='index').sort_values('degree', ascending=False)