Prof. Götz Pfeiffer

School of Mathematics, Statistics and Applied Mathematics

NUI Galway

In [ ]:

```
import numpy as np
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt
```

Recall **in-degree** and **out-degree centrality**:
$$
c_i^{D^{\text{in}}} = k_i^{\text{in}} = \sum_{j=1}^n a_{ij},
\quad
c_i^{D^{\text{out}}} = k_i^{\text{out}} = \sum_{j=1}^n a_{ji},
$$
where $A = (a_{ij})$ is the adjacency matrix of a directed graph
$G = (X, E)$ ...

... and the corresponding **eigenvector centralities**:
$$
A c^{E^{\text{in}}} = \lambda c^{E^{\text{in}}},
\quad
A^{T} c^{E^{\text{out}}} = \lambda c^{E^{\text{out}}}.
$$
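
For a small example, the degree centralities can be read off the adjacency matrix directly: under the convention above, the in-degrees are the row sums and the out-degrees the column sums of $A$. A minimal sketch with a made-up $3 \times 3$ matrix:

In [ ]:

```
import numpy as np

# Toy adjacency matrix for 3 nodes; by the convention above,
# a[i, j] == 1 records a link from node j to node i.
A = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])

c_in = A.sum(axis=1)   # in-degrees: row sums
c_out = A.sum(axis=0)  # out-degrees: column sums
print("in: ", c_in)    # [2 1 0]
print("out:", c_out)   # [0 1 2]
```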

In a network of nodes connected by directed edges, each node
plays two different roles, one as a receiver of links, and one as
a sender of links. A first measure of importance, or recognition, of
a node in this network might be the number of
links it receives, i.e., its **in-degree** in the underlying graph.
If in a collection of web pages relating to a search query on the
subject of "networks", say, a particular page receives a high number
of links, this page might count as an **authority** on that subject,
with **authority score** measured by its in-degree.

In turn, the web pages linking to an authority in some sense know
where to find valuable information and thus serve as good "lists" for
the subject.
A high-value list is called a **hub** for this query.
It makes sense to measure the value of a page as a list in
terms of the values of the pages it points at, by assigning to its
**hub score** the sum of the authority scores of the pages it points
at.

Now the authority score of a page could take the hub scores of the list pages into account, by using the sum of the hub scores of the pages that point at it as an updated authority score.

Then again, applying the **Principle of Repeated Improvement**,
the hub scores can be updated on the basis of the new authority scores,
and so on.

This suggests a ranking procedure that tries to estimate, for each page $p$, its value as an authority and its value as a hub in the form of numerical scores, $a(p)$ and $h(p)$.

Starting off with values all equal to $1$, the estimates are updated by applying the following two rules in an alternating fashion.

**Authority Update Rule:**
For each page $p$, update $a(p)$
to be the sum of the hub scores of all the pages pointing to it.

**Hub Update Rule:**
For each page $p$,
update $h(p)$
to be the sum of the authority
scores of all the pages
that it points to.
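
One round of the two rules can be sketched in plain Python. The link structure below is made up for illustration: `links[p]` lists the pages that `p` points to.

In [ ]:

```
# Hypothetical tiny web: x and y are candidate hubs, a and b are
# candidate authorities.
links = {"x": ["a", "b"], "y": ["b"], "a": [], "b": []}
pages = list(links)

h = {p: 1 for p in pages}  # all hub scores start at 1

# Authority Update Rule: a(p) = sum of hub scores of pages pointing to p.
a = {p: sum(h[q] for q in pages if p in links[q]) for p in pages}

# Hub Update Rule: h(p) = sum of the new authority scores of the pages
# that p points to.
h = {p: sum(a[q] for q in links[p]) for p in pages}

print(a)  # {'x': 0, 'y': 0, 'a': 1, 'b': 2}
print(h)  # {'x': 3, 'y': 2, 'a': 0, 'b': 0}
```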

In order to keep the numbers from growing too large,
score vectors should be **normalized** after each step,
in a way that replaces $h$ by a scalar multiple $\hat{h} = sh$
so that the entries in $\hat{h}$ add up to $100$, say,
representing relative percentage values,
similarly for $a$.

After a number of iterations, the values $a(p)$ and $h(p)$ stabilize, in the sense that further applications of the update rules do not yield essentially better relative estimates.
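
This stabilization can be observed on a small made-up example: with `M[i, j] == 1` recording a link $i \to j$, the normalized scores settle after a few rounds, here to a golden-ratio split of roughly $61.8$ versus $38.2$.

In [ ]:

```
import numpy as np

# Made-up graph: page 0 links to pages 2 and 3, page 1 links to page 3.
M = np.array([[0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])

h = np.ones(4)
for _ in range(20):
    a = M.T @ h            # authority update
    a = 100 * a / a.sum()  # normalize: entries sum to 100
    h = M @ a              # hub update
    h = 100 * h / h.sum()

print(np.round(a, 1))  # approx. [ 0.   0.  38.2 61.8]
print(np.round(h, 1))  # approx. [61.8 38.2  0.   0. ]
```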

**Example.**
Continuing the example above ...

In [ ]:

```
nodes = list(range(1,10)) + ["A%s" % (i+1) for i in range(7)]
print(nodes)
```

In [ ]:

```
edges = [
(1,"A1"),(2,"A1"),(3,"A1"),(3,"A2"),(4,"A2"),(5,"A3"),
(5,"A5"),(6,"A2"),(6,"A4"),(7,"A4"),(7,"A5"),(8,"A4"),
(8,"A5"),(8,"A6"),(8,"A7"),(9,"A5"),(9,"A6"),(9,"A7")
]
```

In [ ]:

```
G = nx.DiGraph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)
```

In [ ]:

```
pos = nx.circular_layout(G)
for i in [1,2,3,4]:
j = 10 - i
pos[i], pos[j] = pos[j], pos[i]
colors = 9 * ['y'] + 7 * ['w']
```

In [ ]:

```
nx.draw(G, with_labels=True, node_color=colors, pos=pos)
```

In [ ]:

```
A = nx.adjacency_matrix(G)
print(A.todense())
```

In [ ]:

```
AT = A.transpose()
```

In [ ]:

```
h = [1 for node in G]
a = [1 for node in G]
print("a = ", a)
print("h = ", h)
```

In [ ]:

```
a = AT @ h             # authority update: a <- A^T h
a = 100/sum(a) * a     # normalize so the entries sum to 100
print("a = ", a[9:])   # authority scores of the pages A1..A7
```

In [ ]:

```
h = A @ a              # hub update: h <- A a
h = 100/sum(h) * h     # normalize so the entries sum to 100
print("h = ", h[:9])   # hub scores of the pages 1..9
```

In [ ]:

```
aaa = [a]
hhh = [h]
```

In [ ]:

```
for i in range(9):
    a = AT @ h
    a = 100/sum(a) * a
    print("a = ", a[9:])
    aaa.append(a)
    h = A @ a
    h = 100/sum(h) * h
    print("h = ", h[:9])
    hhh.append(h)
```

In [ ]:

```
pd.DataFrame(hhh)
```

In [ ]:

```
pd.DataFrame(aaa)
```

In terms of matrix algebra, this stabilizing behavior can be described as follows.

Let $M = (m_{ij})$ be the **adjacency matrix** of the directed graph
$G = (X, E)$,
that is, $m_{ij} = 1$ if $x_i \to x_j$ and $m_{ij} = 0$ otherwise,
where $X = \{x_1, \dots, x_n\}$.

We write $h = (h_1, \dots, h_n)$ for a list of hub scores, with $h_i = h(x_i)$, the hub score of node $x_i$. Similarly, we write $a = (a_1, \dots, a_n)$ for a list of authority scores.

The **hub update rule** can now be expressed as
a matrix multiplication:
$$
h \gets M a
$$
and similarly, the **authority update rule**, using the transpose of the matrix $M$:
$$
a \gets M^{T} h
$$

Applying two steps of the procedure at once yields update rules $$ h \gets M M^T h $$ and $$ a \gets M^T M \, a $$ for $h$ and $a$, respectively.
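
Since matrix multiplication is associative, two alternating steps without normalization agree exactly with one application of $M M^T$; a quick check on a made-up matrix:

In [ ]:

```
import numpy as np

# Made-up adjacency matrix with m[i, j] == 1 if page i links to page j.
M = np.array([[0, 1, 1],
              [0, 0, 1],
              [0, 0, 0]])
h = np.ones(3)

a = M.T @ h        # authority update
h_two = M @ a      # hub update: two alternating steps in total

# One application of M M^T yields the same hub vector.
print(np.array_equal(h_two, M @ M.T @ h))  # True
```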

**In the limit**, one expects
to get vectors $h^{\ast}$ and $a^{\ast}$ whose directions do not change
under the latter rules, i.e.,
$$
(M M^T) h^{\ast} = c h^{\ast}
$$
and
$$
(M^T M) a^{\ast} = d a^{\ast}
$$
for constants $c$ and $d$, meaning that $h^{\ast}$ and $a^{\ast}$
are **eigenvectors** for the matrices $M M^T$ and $M^T M$,
respectively.

Using the fact that $M M^T$ and $M^T M$ are **symmetric** matrices
($(M M^T)^T = (M^T)^T M^T = M M^T$),
it can indeed be shown that any sequence of hub score vectors
$h$ under repeated application of the above update rule
converges to a real-valued eigenvector $h^{\ast}$ of $M M^T$ for the dominant real eigenvalue $c$.
A similar result holds for the sequence of authority score vectors $a$.
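
This convergence can be checked numerically: power iteration with $M M^T$ lines up with the dominant eigenvector that `numpy.linalg.eigh` computes for this symmetric matrix. The matrix below is made up for illustration.

In [ ]:

```
import numpy as np

M = np.array([[0, 0, 1, 1],
              [0, 0, 0, 1],
              [0, 0, 0, 0],
              [0, 0, 0, 0]])

# Power iteration: repeatedly apply M M^T and rescale to unit length.
h = np.ones(4)
for _ in range(50):
    h = M @ M.T @ h
    h = h / np.linalg.norm(h)

# eigh returns the eigenvalues of a symmetric matrix in ascending order,
# so the last column of vecs is the dominant eigenvector.
vals, vecs = np.linalg.eigh(M @ M.T)
h_star = vecs[:, -1]
h_star = h_star * np.sign(h_star[np.argmax(np.abs(h_star))])  # fix the sign

print(np.round(h, 4), np.round(h_star, 4))
```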
