CS4423 - Networks¶

Prof. Götz Pfeiffer
School of Mathematics, Statistics and Applied Mathematics
NUI Galway

Lecture 19: Hubs and Authorities¶

In [ ]:
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt


In-Degree vs. Out-Degree¶

Recall in-degree and out-degree centrality: $$c_i^{D^{\text{in}}} = k_i^{\text{in}} = \sum_{j=1}^n a_{ji}, \quad c_i^{D^{\text{out}}} = k_i^{\text{out}} = \sum_{j=1}^n a_{ij},$$ where $A = (a_{ij})$ is the adjacency matrix of a directed graph $G = (X, E)$, with $a_{ij} = 1$ if $x_i \to x_j$: the in-degree of node $i$ is the $i$-th column sum of $A$, and its out-degree is the $i$-th row sum ...

... and the corresponding eigenvector centralities: $$A^{T} c^{E^{\text{in}}} = \lambda c^{E^{\text{in}}}, \quad A c^{E^{\text{out}}} = \lambda c^{E^{\text{out}}}.$$
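As a quick sanity check of these definitions, networkx computes both kinds of degree centrality directly (the small digraph below is illustrative, not the lecture's example):

```python
import networkx as nx

# a small illustrative digraph (made up for this check)
G = nx.DiGraph([(1, 2), (1, 3), (2, 3), (3, 1)])

# in-degree and out-degree centralities (networkx normalizes by n - 1)
print(nx.in_degree_centrality(G))   # node 3 receives the most links
print(nx.out_degree_centrality(G))  # node 1 sends the most links

# raw in/out-degrees agree with column/row sums of the adjacency matrix
A = nx.adjacency_matrix(G).todense()
print([G.out_degree(v) for v in G])  # row sums of A
print([G.in_degree(v) for v in G])   # column sums of A
```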

Hub Centrality and Authority Centrality¶

In a network of nodes connected by directed edges, each node plays two different roles, one as a receiver of links, and one as a sender of links. A first measure of importance, or recognition, of a node in this network might be the number of links it receives, i.e., its in-degree in the underlying graph. If in a collection of web pages relating to a search query on the subject of "networks", say, a particular page receives a high number of links, this page might count as an authority on that subject, with authority score measured by its in-degree.

In turn, the web pages linking to an authority in some sense know where to find valuable information and thus serve as good "lists" for the subject. A high-value list is called a hub for this query. It makes sense to measure the value of a page as a list in terms of the values of the pages it points at, assigning as its hub score the sum of the authority scores of those pages.

Now the authority score of a page could take the hub scores of the list pages into account, by using the sum of the hub scores of the pages that point at it as an updated authority score.

Then again, applying the Principle of Repeated Improvement, the hub scores can be updated on the basis of the new authority scores, and so on.

This suggests a ranking procedure that tries to estimate, for each page $p$, its value as an authority and its value as a hub in the form of numerical scores, $a(p)$ and $h(p)$.

Starting off with values all equal to $1$, the estimates are updated by applying the following two rules in an alternating fashion.

**Authority Update Rule:** For each page $p$, update $a(p)$ to be the sum of the hub scores of all the pages pointing to it.
**Hub Update Rule:** For each page $p$, update $h(p)$ to be the sum of the authority scores of all the pages that it points to.

In order to keep the numbers from growing too large, the score vectors should be normalized after each step, e.g., by replacing $h$ with the scalar multiple $\hat{h} = sh$ whose entries add up to $100$, representing relative percentage values; similarly for $a$.
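In plain Python, one round of the two update rules followed by normalization might be sketched as follows (on a hypothetical three-page toy web, with each page mapped to the pages it links to):

```python
# hypothetical toy web: each page maps to the list of pages it links to
links = {"p1": ["p3"], "p2": ["p3"], "p3": ["p1"]}
pages = sorted(links)

h = {p: 1 for p in pages}  # initial hub scores
a = {p: 1 for p in pages}  # initial authority scores

# Authority Update Rule: a(p) = sum of hub scores of pages pointing to p
a = {p: sum(h[q] for q in pages if p in links[q]) for p in pages}
# Hub Update Rule: h(p) = sum of the new authority scores of pages p points to
h = {p: sum(a[q] for q in links[p]) for p in pages}

# normalize so each score vector adds up to 100 (percentage values)
sa, sh = sum(a.values()), sum(h.values())
a = {p: 100 * v / sa for p, v in a.items()}
h = {p: 100 * v / sh for p, v in h.items()}
print(a)
print(h)
```

Iterating these four steps is exactly the procedure applied to the larger example below, where the loops become matrix products.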

After a number of iterations, the values $a(p)$ and $h(p)$ stabilize, in the sense that further applications of the update rules no longer change the relative scores.

Example. Continuing the example above ...

In [ ]:
nodes = list(range(1,10)) + ["A%s" % (i+1) for i in range(7)]
print(nodes)

In [ ]:
edges = [
(1,"A1"),(2,"A1"),(3,"A1"),(3,"A2"),(4,"A2"),(5,"A3"),
(5,"A5"),(6,"A2"),(6,"A4"),(7,"A4"),(7,"A5"),(8,"A4"),
(8,"A5"),(8,"A6"),(8,"A7"),(9,"A5"),(9,"A6"),(9,"A7")
]

In [ ]:
G = nx.DiGraph()
G.add_nodes_from(nodes)  # fix the node order: pages 1..9 first, then A1..A7
G.add_edges_from(edges)

In [ ]:
pos = nx.circular_layout(G)
for i in [1, 2, 3, 4]:
    j = 10 - i
    pos[i], pos[j] = pos[j], pos[i]
colors = 9 * ['y'] + 7 * ['w']

In [ ]:
nx.draw(G, with_labels=True, node_color=colors, pos=pos)

In [ ]:
A = nx.adjacency_matrix(G)
print(A.todense())

In [ ]:
AT = A.transpose()

In [ ]:
h = np.ones(len(G))
a = np.ones(len(G))
print("a = ", a)
print("h = ", h)

In [ ]:
a = AT @ h
a = 100/sum(a) * a
print("a = ", a[9:])

In [ ]:
h = A @ a
h = 100/sum(h) * h
print("h = ", h[:9])

In [ ]:
aaa = [a]
hhh = [h]

In [ ]:
for i in range(9):
    a = AT @ h
    a = 100/sum(a) * a
    print("a = ", a[9:])
    aaa.append(a)
    h = A @ a
    h = 100/sum(h) * h
    print("h = ", h[:9])
    hhh.append(h)

In [ ]:
pd.DataFrame(hhh)

In [ ]:
pd.DataFrame(aaa)
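networkx also implements this iteration directly, as nx.hits, which returns hub and authority scores normalized to sum to $1$ rather than $100$. Rebuilding the example graph, we can compare with the hand-rolled iteration above:

```python
import networkx as nx

nodes = list(range(1, 10)) + ["A%s" % (i + 1) for i in range(7)]
edges = [
    (1, "A1"), (2, "A1"), (3, "A1"), (3, "A2"), (4, "A2"), (5, "A3"),
    (5, "A5"), (6, "A2"), (6, "A4"), (7, "A4"), (7, "A5"), (8, "A4"),
    (8, "A5"), (8, "A6"), (8, "A7"), (9, "A5"), (9, "A6"), (9, "A7"),
]
G = nx.DiGraph(edges)

hubs, auths = nx.hits(G)
# pages 1..9 receive no links, so all authority weight sits on A1..A7
print({p: round(100 * auths[p], 2) for p in nodes[9:]})
# pages A1..A7 send no links, so all hub weight sits on pages 1..9
print({p: round(100 * hubs[p], 2) for p in nodes[:9]})
```

Multiplying by $100$ recovers the percentage values of the tables above.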


In terms of matrix algebra this effect can be described as follows.

Spectral Analysis of Hubs and Authorities¶

Let $M = (m_{ij})$ be the adjacency matrix of the directed graph $G = (X, E)$, that is, $m_{ij} = 1$ if $x_i \to x_j$ and $m_{ij} = 0$ otherwise, where $X = \{x_1, \dots, x_n\}$.

We write $h = (h_1, \dots, h_n)$ for a list of hub scores, with $h_i = h(x_i)$, the hub score of node $x_i$. Similarly, we write $a = (a_1, \dots, a_n)$ for a list of authority scores.

The hub update rule can now be expressed as a matrix multiplication: $$h \gets M a$$ and similarly, the authority update rule, using the transpose of the matrix $M$: $$a \gets M^{T} h$$
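As a small numerical check (using numpy, on a made-up $3 \times 3$ adjacency matrix), the product $M^T h$ computes exactly the elementwise authority sums of the update rule:

```python
import numpy as np

# made-up adjacency matrix: M[i, j] = 1 means page i links to page j
M = np.array([[0, 1, 1],
              [0, 0, 1],
              [1, 0, 0]])
h = np.array([1.0, 1.0, 1.0])

# authority rule, elementwise: a_j = sum over i of m_ij * h_i
a_loop = np.array([sum(M[i, j] * h[i] for i in range(3)) for j in range(3)])
# the same rule as a single matrix product
a_matrix = M.T @ h
print(a_loop, a_matrix)  # identical vectors
```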

Applying two steps of the procedure at once yields update rules $$h \gets M M^T h$$ and $$a \gets M^T M \, a$$ for $h$ and $a$, respectively.

In the limit, one expects to get vectors $h^{\ast}$ and $a^{\ast}$ whose directions do not change under the latter rules, i.e., $$(M M^T) h^{\ast} = c h^{\ast}$$ and $$(M^T M) a^{\ast} = d a^{\ast}$$ for constants $c$ and $d$, meaning that $h^{\ast}$ and $a^{\ast}$ are eigenvectors for the matrices $M M^T$ and $M^T M$, respectively.

Using the fact that $M M^T$ and $M^T M$ are symmetric matrices ($(M M^T)^T = (M^T)^T M^T = M M^T$), it can indeed be shown that any sequence of hub score vectors $h$ under repeated application of the above update rule converges to a real-valued eigenvector $h^{\ast}$ of $M M^T$ for the real eigenvalue $c$. A similar result exists for any sequence of authority score vectors $a$.
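This convergence can be checked numerically; the sketch below (with numpy, on a made-up $4 \times 4$ adjacency matrix) iterates $h \gets M M^T h$ with normalization and confirms that the result is, up to numerical error, an eigenvector of $M M^T$:

```python
import numpy as np

# made-up adjacency matrix of a small digraph
M = np.array([[0, 1, 1, 0],
              [0, 0, 1, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 0]], dtype=float)

h = np.ones(4)
for _ in range(100):
    h = M @ M.T @ h
    h = h / h.sum()          # normalize to keep the entries bounded

# h should now satisfy (M M^T) h = c h, with c the dominant eigenvalue
c = (M @ M.T @ h)[0] / h[0]
print(np.allclose(M @ M.T @ h, c * h))  # True: h is an eigenvector
```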
