CS4423 - Networks

Prof. Götz Pfeiffer
School of Mathematics, Statistics and Applied Mathematics
NUI Galway

Assignment 3

Provide answers to the problems in the cells provided.
The buttons at the top of the page can be used to create more cells if needed. The type of cell can be changed from Code to Markdown (and back). Code cells take (and execute) python code. Markdown cells take (and format nicely) text input. In this way, you can provide answers, ask questions, or raise issues, in words.

Marks will be awarded for participation and engagement.

When finished, print this notebook into a pdf file and submit this to blackboard.

Deadline is next Monday at 5pm.

Setup

This is a jupyter notebook. Find an environment that allows you to work with it. You can either install jupyter as a python package on your own laptop or PC. Or you can use a suitable website on the internet, such as nbviewer and binder.

In order to execute the code in a cell, use the mouse or arrow keys to highlight the box and then press SHIFT-RETURN.

Should it ever happen that the notebook becomes unusable, start again with a fresh copy.

The following packages need to be loaded for this notebook to work.

In [ ]:
import pandas as pd
import networkx as nx
import matplotlib.pyplot as plt

1. The Brain-of-a-Worm Network

The connectome of an organism is a comprehensive map of all neural connections between the neurons in the brain. C. Elegans is a small worm (1mm long) whose neural network has been completely determined by the South African biologist Sydney Brenner, who won a Nobel prize for this work in 2002.

An undirected connected version of this network on 279 nodes with 2287 connections is available from the book's website and is copied here into a single file c_elegans_undir.net. This file is in the pajek format, and can be imported into this notebook with the nx.read_pajek command. This command constructs a multigraph, which can easily be converted into a (simple) graph by applying the nx.Graph constructor to it.

In [ ]:
G = nx.read_pajek("c_elegans_undir.net")
G = nx.Graph(G)

Don't draw this graph! It is too big to produce a meaningful picture.

Use appropriate networkx commands to determine:

  • the number of vertices $n$ of G,
  • the number of edges $m$ of G,
  • the characteristic path length $L$ of G,
  • the clustering coefficient $C$ of G,
  • the number of triangles $n_{\Delta}$ in G and
  • the transitivity $T = 3 n_{\Delta} / n_{\wedge}$ of G.
In [ ]:
 

2. Degree Distribution

Any graph $G$ with $n$ nodes and $m$ edges can be compared to a random $G(n, m)$ graph with the same parameters, or to a random $G(n, p)$ graph with parameter $p = m/\binom{n}{2}$. One attribute of interest is the degree distribution of $G$. We know that the degree distribution of a random $G(n, p)$ graph is binomial. How does the worm's brain compare to that?

For example, with $n = 100$ and $m = 292$, one can generate a random graph and plot its node degrees as a histogram, as follows:

In [ ]:
R = nx.gnm_random_graph(100, 292)
hist = nx.degree_histogram(R)
df = pd.DataFrame(hist)
df.plot.bar()

Does that look like a binomial distribution?

  • For parameters $n$ and $m$ chosen identical to those of the worm brain graph G, construct a random $G(n, m)$ graph R.

  • Determine and plot the degree histogram of R.

  • Determine and plot the degree histogram of G.

  • In your own words, describe the difference in appearance between the two plots (in the text cell below, the one that does not have In []: written against it).

In [ ]:
 

... text goes here ...

3. Clustering by Degree

Another interesting exercise is studying how the (node) clustering coefficient of a graph $G = (X, E)$ depends on the degree of the nodes. For each value of $k$, denote by $X_k$ the set of nodes of degree $X_k$. Then one can define the average clustering at degree $k$ as $$ C_k = \frac 1 {|X_k|} \sum_{x \in X_K} c(x), $$ where $c(x)$ is the node clustering coefficient of vertex $x$.

To plot $C_k$ against $k$ will need some preparation. In order to determine the sets $X_k$, we can use the map constructed by G.degree(), which assigns to each node its degree, and revert it in such a way that each degree $k$ gets assigned the list of nodes of degree $k$. This can be done with a general purpose python function that works with any python dictionary kv.

In [ ]:
def reversed_dict(kv):
    reverse = {}
    for k, v in kv.items():
        reverse.setdefault(v, []).append(k)
    return reverse

For example, a dictionary of the degrees of a random $G(n, p)$-graph R with $n = 16$ and $p = 0.125$ is constructed like so.

In [ ]:
R = nx.gnp_random_graph(16, 0.125)
degs = dict(R.degree())
print(degs)

This dictionary has as keys the 16 nodes $0, 1, \dots, 15$, and for each key as value the corresponding node degree, most of the time $1$ or $2$. We want a dictionary that works in the opposite direction, i.e.. has keys $1$ and $2$ (and whatever else occurs as node degree), and as value for key $2$, say, the list of all nodes that have degree $2$. That's what the above function reversed_dict constructs:

In [ ]:
nodes_by_degree = reversed_dict(degs)
print(nodes_by_degree)

Next, we apply the map returned (as python dictionary) by the networkx command nx.clustering to the node lists in nodes_by_degree to obtain a new dictionary clustering_by_degree which contains lists of node clustering coefficients in place of node names, one for each node degree $k$ that occurs in the graph.

In [ ]:
clust = nx.clustering(R)
print(clust)
In [ ]:
clustering_by_degree = { d : [clust[x] for x in v] for d, v in nodes_by_degree.items() }
print(clustering_by_degree)

Finally, we can determine the average value sum(v)/len(v) of each list v in this dictionary, and obtain a list of pairs $(k, C_k)$.

In [ ]:
avg_clust_degree = [ (d, sum(v)/len(v)) for d, v in clustering_by_degree.items() ]
print(avg_clust_degree)

Unsurprisingly, the value of $C_k$ for both $k = 0$ and $k = 1$ is $0$. Using a pandas data frame, the resulting values can be plotted as a scatter plot like this:

In [ ]:
df = pd.DataFrame(avg_clust_degree, columns = ["k", "C_k"] )
df.plot.scatter(x = "k", y = "C_k")
  • For a random $G(n, m)$ graph R of the same degree and size as the worm brain network G, determine and plot the average degree clustering coefficients $C_k$.

  • Use an additional argument loglog = True in the plotting command (i.e., df.plot.scatter(x = "k", y = "C_k", loglog = True)) to obtain a log-log plot of the same data.

  • Determine and plot the average degree clustering coefficients $C_k$ for the worm brain graph G, again as a log-log plot.

  • In your own words, describe the difference, if any, between the two data sets in the text cell below.

In [ ]:
 

... text goes here ...

4. Small World Models.

Random graphs in the Watts-Strogatz model are obtained from a regular $(n, d)$-circle graph by randomly rewiring all edges with a given probability $p$. Such a graph can be generated with the command nx.watts_strogatz_graph(n, 2*d, p) (note that the second argument is actually 2*d).

In [ ]:
n, d, p = 16, 3, 0.16
G = nx.watts_strogatz_graph(n, 2*d, p)
nx.draw_circular(G, with_labels=True, node_color='y')

Watts-Strogatz random graphs are supposed to be like real world networks in the sense that they combine relatively short characteristic path lengths with relatively high clustering coefficients.

  • For values $n = 1000$ and $d = 5$, produce a sequence of 50 (or so) $(n, d, p)$-WS graphs for different values of $p$ between $0$ and $1$ (including the extreme cases $p = 0$ and $p = 1$). Compute and compare their (graph) clustering coefficients and their characteristic path lengths. (Use smaller values for $n$ if $n = 1000$ turns out to be too demanding on resources.)

  • In your own words, in which range of values for $p$, do the generated graphs indeed have high clustering and short paths?

In [ ]:
 

... text goes here ...