Prof. Götz Pfeiffer

School of Mathematics, Statistics and Applied Mathematics

NUI Galway

Sequences of interconnected edges in a graph are called
*paths*, leading to notions of *connectivity* and *distance*. A *tree* is a particularly useful kind of connected graph.
Many questions on networks concerning distance
or connectivity in a graph can be *algorithmically* answered
be a versatile strategy called *Breadth First Search (BFS)*.

Start by importing the necessary python libraries into this jupyter notebook.

In [ ]:

```
import networkx as nx
import matplotlib.pyplot as plt
```

The fundamental notion of *connectivity* in a network is closely
related to the notion of *paths* in a graph.

In [ ]:

```
nodes = list('ABCDEFGHIJKLM')
edges = [
'AB', 'CE', 'FG', 'FH', 'GI', 'GJ', 'HJ', 'HL', 'HM',
'IK', 'JK', 'KL', 'LM'
]
G = nx.Graph()
G.add_nodes_from(nodes)
G.add_edges_from(edges)
```

In [ ]:

```
nx.draw(G, with_labels = True)
```

$(F, G, I)$ is a path in the graph above, and $(H, J, K, L, H)$ is a cycle

A cycle in a simple graph provides, for any two nodes on that cycle, (at least) two different paths from one to the other.

Note that each edge (and node) of the 1970 Internet graph belongs to a cycle. This makes the other way around the cycle an alternative route in case one of the edges should fail.

In a *directed* network, paths are directed, too.
A path from a vertex $x$ to a vertex $y$ is
a sequence of vertices $x = x_0, x_1, \dots, x_k = y$
such that, for any $i = 1, \dots, k$, there is
an edge from $x_{i-1}$ to $x_i$ in the graph.

Communication and transportation networks tend to be connected, as this is their main purpose: to connect.

The connected components of the graph below are the node sets $\{A, B\}$, $\{C, E\}$, $\{D\}$, and $\{F,G,H,I,J,K,L,M\}$. Note that a component can consist of a single node only.

In [ ]:

```
list(nx.connected_components(G))
```

**Note.**
The relation 'there is a path from $x$ to $y$ on the node set $X$ of a
graph is the **transitive closure** of the graph relation 'there is an
*edge* between $x$ and $y$'. It is

**reflexive**(as each node $x$ is connected to itself by the zero length path starting and ending at $x$),**symmetric**(as a path from $x$ to $y$ can be used backwards as a path from $y$ to $x$),and

**transitive**(as a path from $x$ to $y$ and a path from $y$ to $z$ together make up a path from $x$ to $z$),

hence
an **equivalence relation**.
The connected components of the graph are
the parts (equivalence classes) of the corresponding **partition** of $X$.

Trees appear in the analysis of networks and many other applications.

**Definition.** A graph is called **acyclic** if it does not contain
any cycles.
A **tree** is a connected, acyclic graph.
A **forest** is a graph whose connected components are all trees.

Trees are naturally bipartite.

The (bipartite) graph $B$ from the previous lecture is in fact a tree.

In [ ]:

```
B = nx.Graph([(1,6), (2,6), (2,7), (2,8), (2,9),
(3,9), (4,10), (5,9), (5,10)])
nx.draw(B, with_labels=True)
```

**Theorem.** Let $G = (X, E)$ be a graph of degree $n = |X|$
and size $m = |E|$.
Then the following are equivalent:
* $G$ is a tree (i.e. acyclic and connected);
* $G$ is connected and $m = n-1$;
* $G$ is a minimal connected graph (i.e., removing an edge will disconnect $G$);
* $G$ is acyclic and $m = n-1$;
* $G$ is a maximal acyclic graph (i.e., adding an edge will introduce a cycle in $G$).

**Corollary.** If $G = (X, E)$ is a tree then,
for any vertices $x, y \in X$, there is *exactly one path* from
$x$ to $y$ in $G$.

Recall that a **path** in a network $G = (X, E)$
is a sequence $p = (x_0, x_1, \dots, x_k)$ of
nodes $x_i \in X$, $i = 0, \dots, k$, such that any
pair of consecutive nodes forms an edge in $G$, i.e.,
$\{x_{i-1}, x_i\} \in E$ for all $i = 1, \dots, k$.
The **length** $l(p)$ of the path $p$ is the
number of edges, $l(p) = k$.

In many practical applications it is of interest to find for a pair $x, y$ of nodes, one or all the paths form $x$ to $y$ connecting the two nodes with the fewest number of edges possible. This is a more complex measure on a network than, say, the degree of a node, and we will need a more complex procedure, that is: an algorithm, in order to answer such questions systematically. Let's start with a proper definition.

**Definition.** Let $G = (X, E)$ be a simple graph and let
$x, y \in X$. Let $P(x, y)$ be the set of all paths from $x$ to $y$.
Then the __distance__ $d(x, y)$ from $x$ to $y$ is
$$d(x, y) = \min \{ l(p) : p \in P(x, y) \},$$
the shortest possible length of a path from $x$ to $y$, and a __shortest path__ from $x$ to $y$ is a path $p \in P(x, y)$ of length $l(p) = d(x, y)$.
The __diameter__ $\mathrm{diam}(G)$ of the network $G$ is the length of the longest shortest path between any two nodes,
$$\mathrm{diam}(G) = \max \{ d(x, y) : x, y \in X \}$$.

Now we consider the following problem: Given a node $x \in X$, what
are the distances $d(x, y)$ for all nodes $y \in X$? A systematic
procedure for finding these distances, and the shortest paths through
which they are realized, is given by the algorithm which is know in
Computer Science as **Breadth First Search** (BFS).

In order to describe the algorithm step by step, let's call a node $y$
a **neighbor** of node $x$, if $\{x, y\}$ is an
edge, and let's denote by
$$N(x) = \{ y \in X : \{x, y\} \in E \}$$
the set of all neighbors of node $x$. The algorithm works through the
network layer by layer, starting with the given vertex $x$ at layer
$0$ and all its friends at layer $1$. It then finds the friends of the
friends at layer $2$, and so on, until every node that can be reached
from $x$ by a path has been recorded, taking care that no node gets
recorded twice. The layer of a node then corresponds to its distance
from the given node $x$.

In practice, as in the following example, the layer does not need to be made explicit.

In [ ]:

```
G = nx.read_adjlist("bfs.adj")
```

In [ ]:

```
nx.draw(G, with_labels=True)
```

In [ ]:

```
for x in G:
G.nodes[x]['seen'] = False
```

In [ ]:

```
G.nodes['A']
```

In [ ]:

```
seen = []
```

In [ ]:

```
seen.append('A')
G.nodes['A']['seen'] = True
seen
```

In [ ]:

```
list(G.neighbors('A'))
```

In [ ]:

```
for x in G.neighbors('A'):
seen.append(x)
G.nodes[x]['seen'] = True
```

In [ ]:

```
seen
```

In [ ]:

```
node = 'B'
for x in G.neighbors(node):
if not G.nodes[x]['seen']:
seen.append(x)
G.nodes[x]['seen'] = True
seen
```

In [ ]:

```
node = 'K'
for x in G.neighbors(node):
if not G.nodes[x]['seen']:
seen.append(x)
G.nodes[x]['seen'] = True
seen
```

... and so on, until there are no more nodes to be processed.

When this process is formulated as an algorithm, we use a **queue**
(a first-in first-out buffer) to keep track of the node
whose neighbors are currently under consideration.
A queue is an array of values that comes with two basic operations:

- one can
**push**a value to the end of the queue, and - one can
**pop**a value off the top of the queue (provided the queue is not empty).

It can be shown that this version of the algorithm in the common case of a sparse network has complexity $O(n)$, which is as good as one could hope for.

**Breadth First Search.**
Given a simple graph
$G = (X, E)$ and a vertex $x \in X$,
determine $d(x, y)$ for all nodes $y \in X$.
1. [Initialize.] Suppose that $X = \{x_1, x_2, \ldots, x_n\}$
and that $x = x_j$. Set $d_i \gets \perp$ (undefined) for $i = 1, \dots, n$.
Set $d_j \gets 0$ and initialize a queue $Q \gets (x_j)$.
2. [Loop.]
While $Q \neq \emptyset$:
* pop node $x_k$ off $Q$
* for each neighbor $x_l$ of $x_k$ with $d_l = \perp$:
* push $x_l$ onto $Q$ and set $d_l \gets d_k + 1$.
3. [Stop.] Return the array $(d_1, \dots, d_n)$.

In [ ]:

```
from queue import Queue
```

In [ ]:

```
for x in G:
G.nodes[x]['d'] = -1 # undefined
x = 'B'
G.nodes[x]['d'] = 0
q = Queue()
q.put(x)
```

In [ ]:

```
list(q.queue)
```

In [ ]:

```
while not q.empty():
x = q.get()
for y in G.neighbors(x):
if G.nodes[y]['d'] < 0: # undefined?
G.nodes[y]['d'] = G.nodes[x]['d'] + 1
q.put(y)
print(x, ": ", list(q.queue))
```

In [ ]:

```
print(list(map(lambda x: G.nodes[x]['d'], G)))
```

In [ ]:

```
print(list(G.nodes.data('d')))
```

**Variants.**
BFS is an extremely versatile algorithm, which applies in many different
situations and can be adapted to produce additional information
on a network.

For example, BFS run on a node $x$ in a network $G = (X, E)$
determines the **connected component** of $X$ in $G$
(as the set of all nodes that get a distance value assigned).

With little more work (and an additional array) BFS can produce
a **spanning tree** (or **shortest path tree**).
Here, whenever node $x_l$ is pushed onto $Q$, it is assigned
the current node $x_k$ (in the additional array)
as its predecessor on a shortest path from $x_j$ to $x_l$.
The subgraph of the network consisting of these edges is a tree.
As a tree, it has exactly one path between the given node $x$
and any of its
vertices $y$ and, by construction, this path is a shortest path
between $x$ and $y$.

In [ ]:

```
for x in G:
G.nodes[x]['d'] = -1 # undefined
x = 'A'
G.nodes[x]['d'] = 0
q = Queue()
q.put(x)
for e in G.edges:
G.edges[e]['seen'] = False
```

In [ ]:

```
print(list(G.edges.data()))
```

In [ ]:

```
while not q.empty():
x = q.get()
for y in G.neighbors(x):
if G.nodes[y]['d'] < 0: # undefined?
G.nodes[y]['d'] = G.nodes[x]['d'] + 1
q.put(y)
G.edges[x, y]['seen'] = True
print(x, ": ", list(q.queue))
```

In [ ]:

```
sub = [e for e in G.edges if G.edges[e]['seen']]
```

In [ ]:

```
nx.draw(G.edge_subgraph(sub), with_labels=True)
```

Or, one could highlight the spanning tree inside the graph by using, say, red as color for the spanning edges (and blue for the rest).

In [ ]:

```
colors = ['red' if G.edges[e]['seen'] else 'blue' for e in G.edges]
nx.draw(G, edge_color = colors, with_labels = True, width=2.0)
```

Of course, in order to find distances, or shortest paths
between **all pairs** of nodes $x$ and $y$ in a network, one can
perform BFS for each of the vertices $x \in X$ in turn.

The algorithm and its variants also works on directed networks, but the results then will have to be interpreted in the context of directed networks.

More about BFS can be found in [Newman, Section 10.3].

- Compute the distances $d(x, y)$ for all vertices $x$ and $y$ in the above graph
`G`

. - Hence determine the diameter $\mathrm{diam}(G)$.
- Construct a (simple) graph $H$ with edges
1-9, 9-3, 9-12, 9-15, 9-2, 9-13, 5-11, 5-14, 5-3, 11-14, 11-4, 14-12, 14-4, 12-15, 15-7, 2-6, 2-7, 13-10, 4-7, 7-8

- Using BFS, construct a spanning tree of $H$, starting with vertex $1$.
- Compute a matrix $D= (d_{ij})$ with entries $$ d_{ij} = d(i, j), $$ the distance between nodes $i$ and $j$ in $H$.

In [ ]:

```
```