Notebook

NTDS'18 milestone 2: network models¶

Students¶

Team: 47
Students: Leo Capiod ; Frank Schmutz ; Harold Sussmilch ; Josef Weber
Dataset: Flight Routes : airports and routes for ms 1

Rules¶

Milestones have to be completed by teams. No collaboration between teams is allowed.
Textual answers shall be short. Typically one to two sentences.
Code has to be clean.
In the first part, you cannot import any other library than we imported. In the second part, you are allowed to import any library you want.
When submitting, the notebook is executed and the results are stored. I.e., if you open the notebook again it should show numerical results and plots. We won't be able to execute your notebooks.
The notebook is re-executed from a blank state before submission. That is to be sure it is reproducible. You can click "Kernel" then "Restart & Run All" in Jupyter.

Objective¶

The purpose of this milestone is to explore various random network models, analyse their properties and compare them to your network. In the first part of the milestone you will implement two random graph models and try to fit them to your network. In this part you are not allowed to use any additional package. In the second part of the milestone you will choose a third random graph model that you think shares some properties with your network. You will be allowed to use additional packages to construct this network, but you must explain your network choice. Finally, make your code as clean as possible, and keep your textual answers short.

Part 0¶

Import the adjacency matrix of your graph that you constructed in milestone 1, as well as the number of nodes and edges of your network.

In [1]:

%matplotlib inline

import random

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import scipy

In [2]:

adjacency =  np.load('../data/adj_matr.npy')
n_nodes =  len(adjacency)
n_edges =  np.count_nonzero(np.tril(adjacency, k = -1))

Part 1¶

For the computation of this part of the milestone you are only allowed to use the packages that have been imported in the cell below.

Question 1¶

Create a function that constructs an Erdős–Rényi graph.

In [3]:

def erdos_renyi(n, p, seed=None):
    """Create an instance from the Erdos-Renyi graph model.
    
    Parameters
    ----------
    n: int
        Size of the graph.
    p: float
        Edge probability. A number between 0 and 1.
    seed: int (optional)
        Seed for the random number generator. To get reproducible results.
    
    Returns
    -------
    adjacency
        The adjacency matrix of a graph.
    """
    
    np.random.seed(seed)
    adjacency = np.vectorize(lambda n: 1 if n > 1 - p else 0)(np.tril(np.random.random((n, n)), k = -1))
    
    return adjacency + adjacency.T

In [4]:

er = erdos_renyi(5, 0.6, 9765)
plt.spy(er)
plt.title('Erdos-Renyi (5, 0.6)')

Out[4]:

Text(0.5,1.05,'Erdos-Renyi (5, 0.6)')

In [5]:

er = erdos_renyi(10, 0.4, 7648)
plt.spy(er)
plt.title('Erdos-Renyi (10, 0.4)')

Out[5]:

Text(0.5,1.05,'Erdos-Renyi (10, 0.4)')

Question 2¶

Use the function to create a random Erdos-Renyi graph. Choose the parameters such that number of nodes is the same as in your graph, and the number of edges similar. You don't need to set the random seed. Comment on your choice of parameters.

In [6]:

Lmax = n_nodes * (n_nodes - 1) / 2
p_edge = n_edges / Lmax
er = erdos_renyi(n_nodes, p_edge)
er_n_edges = np.count_nonzero(np.tril(er))

In [7]:

print("The Erdos-Rényi graph has", er_n_edges, "edges, our actual graph has", n_edges, "edges.")

The Erdos-Rényi graph has 18641 edges, our actual graph has 18579 edges.

We choose $n = n_{nodes}$ since we want the same number of nodes as in our graph, and we choose $p$ to be $n_{edges}$ in our graph divided by theoretical maximum number of edges $L_{max}$ our graph could have.

In [8]:

plt.figure(figsize=(20, 20))
plt.subplot(121)
plt.spy(er, markersize=1)
plt.title('Erdos-Renyi (n_nodes, n_edges / Lmax)')
plt.subplot(122)
plt.spy(adjacency, markersize=1)
plt.title('Flight routes graph')

Out[8]:

Text(0.5,1.05,'Flight routes graph')

Question 3¶

Create a function that constructs a Barabási-Albert graph.

In [9]:

def barabasi_albert(n, m, m0=2, seed=None):
    """Create an instance from the Barabasi-Albert graph model.
    
    Parameters
    ----------
    n: int
        Size of the graph.
    m: int
        Number of edges to attach from a new node to existing nodes.
    seed: int (optional)
        Seed for the random number generator. To get reproducible results.
    
    Returns
    -------
    adjacency
        The adjacency matrix of a graph.
    """
    
    if m0 < m:
        m0 = m
    
    # appply seed to random
    np.random.seed(seed)
    
    # create empty adjacency matrix
    adjacency = np.zeros([n,n])
    
    # connect m initial nodes
    adjacency[:m0, :m0] = 1
    adjacency[np.diag_indices_from(adjacency)] = 0
    
    for node_index in range(m0, n):
        # twice the current number of edges = m0 * (m0 - 1) + (node_index - m0) * 2 * m
        denom = m0 * (m0 - 1) + (node_index - m0) * 2 * m
        edge_prob = np.vectorize(lambda n: 1 if denom == 0 else adjacency[n].sum() / denom)(np.array(range(node_index)))
        for to in np.random.choice(np.array(range(node_index)), size=m, replace=False, p=edge_prob):
            adjacency[node_index, to] = 1
            adjacency[to, node_index] = 1
    
    return adjacency

In [10]:

ba = barabasi_albert(5, 1, seed=9087)
plt.spy(ba)
plt.title('Barabasi-Albert (5, 1)')

Out[10]:

Text(0.5,1.05,'Barabasi-Albert (5, 1)')

In [11]:

ba = barabasi_albert(10, 2, seed=8708)
plt.spy(ba)
plt.title('Barabasi-Albert (10, 2)')

Out[11]:

Text(0.5,1.05,'Barabasi-Albert (10, 2)')

Question 4¶

Use the function to create a random Barabási-Albert graph. Choose the parameters such that number of nodes is the same as in your graph, and the number of edges similar. You don't need to set the random seed. Comment on your choice of parameters.

In [12]:

# need to solve for m: n_edges = m0 * (m0 - 1) / 2 + (n_nodes - m0) * m
# since there are two variables we should fix one of them. Let's fix m0 = 10
m0 = 10
m = np.round((n_edges - m0 * (m0 - 1) / 2) / (n_nodes - m0)).astype(int)
print("We take m =", m)

We take m = 6

In [13]:

ba = barabasi_albert(n_nodes, m, m0).astype(int)
ba_n_edges = np.count_nonzero(np.tril(ba))

In [14]:

print("The Barabási-Albert graph has", ba_n_edges, "edges, our actual graph has", n_edges, "edges.")

The Barabási-Albert graph has 18861 edges, our actual graph has 18579 edges.

We choose $n = n_{nodes}$ obviously, and we chose $m$ by solving the following equation: $m_{0}(m_{0} - 1)/2 + (n_{nodes} - m_{0})m = n_{edges}$

In [15]:

plt.figure(figsize=(20, 20))
plt.subplot(121)
plt.spy(ba, markersize=1)
plt.title('Barabasi-Albert (n_nodes, m)')
plt.subplot(122)
plt.spy(adjacency, markersize=1)
plt.title('Flight routes graph')

Out[15]:

Text(0.5,1.05,'Flight routes graph')

Question 5¶

Compare the number of edges in all three networks (your real network, the Erdős–Rényi network, and the Barabási-Albert netowk).

In [16]:

print("Our real network has", n_edges, "edges, the Erdos-Rényi netwrok has", er_n_edges, "edges and the Barabási-Albert network has", ba_n_edges, "edges.")

Our real network has 18579 edges, the Erdos-Rényi netwrok has 18641 edges and the Barabási-Albert network has 18861 edges.

Question 6¶

Implement a function that computes the Kullback–Leibler (KL) divergence between two probability distributions. We'll use it to compare the degree distributions of networks.

In [17]:

def kl_divergence(p, q):
    """Compute the KL divergence between probability distributions of degrees of two networks.
    
    Parameters
    ----------
    p: np.array
        Probability distribution of degrees of the 1st graph.
    q: np.array
        Probability distribution of degrees of the 2nd graph.
    
    Returns
    -------
    kl
        The KL divergence between the two distributions.
    """
        
    # We extend p or q with zeros so that they have the same length
    if len(p) > len(q):
        zeros = np.zeros([len(p) - len(q),])
        q = np.hstack((q, zeros))
    elif len(q) > len(p):
        zeros = np.zeros([len(q) - len(p),])
        p = np.hstack((p, zeros))
    
    bins = []
    last_idx_bin = 0
    last_p_idx_nonzero = 0
    last_q_idx_nonzero = 0
    # Create bins
    for i in range(0, len(p)):
        if p[i] != 0 and q[i] != 0:
            bins.append(i)
            last_idx_bin = i
            last_p_idx_nonzero = i
            last_q_idx_nonzero = i
        elif p[i] != 0:
            last_p_idx_nonzero = i
            if last_q_idx_nonzero > last_idx_bin:
                bins.append(i)
                last_idx_bin = i
        elif q[i] != 0:
            last_q_idx_nonzero = i
            if last_p_idx_nonzero > last_idx_bin:
                bins.append(i)
                last_idx_bin = i
    
    # remove the last element in bins and by last index of p
    bins.pop()
    bins.append(len(p))
    
    bins = np.array(bins)
    
    # Create new arrays with number of elements corresponding to number of bins
    p_ = np.zeros((len(bins),))
    q_ = np.zeros((len(bins),))
    for i in range(0, len(p)):
        bin_nb = len(bins[bins < i])
        p_[bin_nb] += p[i]
        q_[bin_nb] += q[i]
    
    kl_func = np.vectorize(lambda n: 0 if p_[n] == 0 or q_[n] == 0 else p_[n] * np.log(p_[n] / q_[n]), otypes=[np.float])
    kl = kl_func(np.array(range(len(p_)))).sum()
    
    return kl

In [18]:

p_test = np.array([0.2, 0.2, 0.2, 0.4])
q_test = np.array([0.3, 0.3, 0.1, 0.3])
kl_divergence(p_test, q_test)

Out[18]:

0.09151622184943575

Question 7¶

Compare the degree distribution of your network to each of the two synthetic ones, in terms of KL divergence. Hint: Make sure you normalise your degree distributions to make them valid probability distributions.

In [19]:

mask = adjacency.copy()
mask[np.nonzero(adjacency)] = 1
real_degrees = mask.sum(axis=0)
real_weights = np.ones_like(real_degrees) / float(n_nodes)
real_degree_distribution = np.bincount(real_degrees) / len(real_degrees)

In [20]:

er_degrees = er.sum(axis=0)
er_weights = np.ones_like(er_degrees) / float(n_nodes)
er_degree_distribution = np.bincount(er_degrees) / len(er_degrees)

In [21]:

ba_degrees = ba.sum(axis=0)
ba_weights = np.ones_like(ba_degrees) / float(n_nodes)
ba_degree_distribution = np.bincount(ba_degrees) / len(ba_degrees)

In [22]:

print('kl_divergence(er, real) =', kl_divergence(er_degree_distribution, real_degree_distribution))

kl_divergence(er, real) = 1.779615608521806

In [23]:

print('kl_divergence(ba, real) =', kl_divergence(ba_degree_distribution, real_degree_distribution))

kl_divergence(ba, real) = 0.7109164200449662

Question 8¶

Plot the degree distribution historgrams for all three networks. Are they consistent with the KL divergence results? Explain.

In [24]:

plt.figure(figsize=(20,6))
plt.subplot(131)
plt.axis([-10, 250, 0, 0.65])
plt.title('Real network')
_ = plt.hist(real_degrees, bins=50, weights=real_weights, range=(0,249))
plt.subplot(132)
plt.axis([-10, 250, 0, 0.65])
plt.title('Erdos-Rényi network')
_ = plt.hist(er_degrees, bins=50, weights=er_weights, range=(0,249))
plt.subplot(133)
plt.axis([-10, 250, 0, 0.65])
plt.title('Barabási-Albert network')
_ = plt.hist(ba_degrees, bins=50, weights=ba_weights, range=(0,249))

They are consistent with the KL divergence results: the BA model has a degree distribution which is more similar to the real network than the ER model, hence the lower KL divergence value for the BA model.

Question 9¶

Imagine you got equal degree distributions. Would that guarantee you got the same graph? Explain.

No it wouldn't. It's very easy to come up with a counter example. Imagine the following degree distribution for a graph with 5 nodes: 3 nodes have a degree of 2 and the other 2 nodes have a degree of 1. Now consider the following two scenarios.
Scenario 1:
$V_1 = \{0, 1, 2, 3, 4\}$
$E_1 = \{(0, 4), (1, 2), (1, 3), (2, 3)\}$
Scenario 2:
$V_2 = \{0, 1, 2, 3, 4\}$
$E_2 = \{(0, 1), (1, 2), (2, 3), (3, 4)\}$
In the first scenario vertices 1, 2 and 3 have a degree of 2 and vertices 0 and 4 have a degree of 1 and the graph is not connected.
In the second scenario vertices 1, 2, and 3 have a degree of 2 and vertices 0 and 4 have a degree of 1 and the graph is connected.
Since we have the same degree for each vertex in both scenario but the graph is connected in one of them and not in the other, those graphs are cleary not the same. Therefore equal degree distributions does not guarantee that we have the same graph.

Part 2¶

You are allowed to use any additional library here (e.g., NetworkX, PyGSP, etc.). Be careful not to include something here and use it in part 1!

Question 10 & Question 11¶

Choose a random network model that fits you network well. Explain your choice.

Hint: Check lecture notes for different network models and their properties. Your choice should be made based on at least one property you'd expect to be similar.

Explain (in short) how the chosen model works.

In [25]:

import networkx as nx
import collections

In [26]:

# function to plot adjacency matrix and draw network
def plot_Adjacency_w_Network(A, G, title="none"):
    f, axs = plt.subplots(1,2,figsize=(13,13))
    plt.subplot(221)
    plt.spy(A, markersize=1)
    plt.title('Adjacency '+ title)
    
    plt.subplot(222)
    nx.draw(G, pos=nx.shell_layout(G), with_labels=True, font_weight='bold', node_color="yellow")
    plt.title(title)
    plt.show()

In [27]:

# function to plot adjacency comparison btw model and real world model
def compare_adjacency(model_A, title):
    adjacency =  np.load('../data/adj_matr.npy')
    adjacency_watts_strogatz = np.load('../data/'+model_A+'')
    edges = np.count_nonzero(np.tril(adjacency_watts_strogatz))
    
    # ba-preferential-only
    plt.figure(figsize=(20, 20))
    plt.subplot(121)
    plt.spy(adjacency_watts_strogatz, markersize=1)
    plt.title(''+ title +' ('+str(n)+', '+str(edges)+')')

    # real model
    plt.subplot(122)
    plt.spy(adjacency, markersize=1)
    plt.title('Flight routes graph ('+str(n_nodes)+', '+str(n_edges)+')')
    plt.show()

Network properties to discuss on each model¶

Degree distribution
size & growth
randomness

1. Erdös-Rényi model¶

2. Watts-Strogatz model¶

3. Barabási-Albert Model¶

3.1. Barabási-Albert Model - Growth-only model¶

3.1. Barabási-Albert Model - Preferential attachment-only model¶

1. Erdös-Rényi model¶

In [28]:

n = 30
p = 0.5
seed = 12
'''
    n : int
        The number of nodes.
    p : float
        Probability for edge creation.
'''
G = nx.gnp_random_graph(n, p, seed=12, directed=False)
A = nx.adj_matrix(G)
plot_Adjacency_w_Network(A,G,"Erdös-Rényi model")

Erdös-Rényi model does not fit good.¶

a random network model is a network where each pair of nodes is connected with probability p
random: YES

Model fit¶

bad representation of real world
model does not generate local clusters
in general ER graphs have a low clustering coefficient, which we have in our real model
have a constant, random, and independent probability of two nodes being connected
no hubs as in scale free networks
- The existence of hubs is the biggest difference between random networks and scale-free networks.

2. Watts-Strogatz model¶

In [29]:

n = 30
k = 8
p = 0.3
seed = 12
'''
    n : int
        The number of nodes
    k : int
        Each node is connected to k nearest neighbors in ring topology
    p : float
        The probability of rewiring each edge (0 = only neighbours, 1 = all nodes)
'''
G = nx.watts_strogatz_graph(n, k, p, seed)
A = nx.adj_matrix(G)
plot_Adjacency_w_Network(A,G,"Watts-Strogatz model")

Watts-Strogatz model fits better than the first one.¶

Extension of the random network model for 
    1. small world property
    2. high clustering
random: YES

Model Fit:¶

We can interpret it as follows: Each airport node is connected to k other neighbouring airports. k represents the average degree of each airpot node.

Furthermore the probability p (between 0 and 1) describes the rewiring probability that allows each node to connect to any other airport node of the whole network exept itself. This is basically how clusters evolve in the undirected graph using this model.

If p is zero the airport is connected to its closest k airport nodes. If p is one each connection gets rewired/redirected.

(+) in a real world model redirections probably happen but not so often as in this model.
(+) the idea of the closest neighbours is pretty similar and logic to the airport-flight network
(-) there is no network growth

3. Barabási-Albert model¶

In [30]:

n = 30
m = 5
seed = 12
'''
    n : int
        Number of nodes
    m : int
        Number of edges to attach from a new node to existing nodes
'''
G = nx.barabasi_albert_graph(n, m, seed)
A = nx.adj_matrix(G)
plot_Adjacency_w_Network(A,G,"Barabási-Albert model")

The Barabási-Albert Model fits better than the first two models.¶

Growth and preferential attachment cannot be captured by random networks
    1. growth only model
    2. preferential attachment only model
random: NO -> scale free

The The Barabási-Albert Model starts off by creating a m times m matrix in which each node is connected to each other. After the initial generation of the mxm matrix, for each iteration, a new node with m edges/connections gets added to the network, until the desired network size of n gets reached.

To decide to which node a new node connects this model uses preferential attachment probability, which is based on the each degree of the currently existing nodes of the network. Therefore the new node tends to connect to nodes with higher degree as their preferential attachment probability is higher.

Model Fit:¶

(+) growth & size: a new airport gets added and creates not necessarily m nodes but at least one
(+) preferential attachment: similar to our real network as big airports are more "attractive" probably for

economic reasons

(-) scale-free and therefore not a random model

3.1. Barabási-Albert Model - Growth-only model¶

In [31]:

# no preferential process!
def barabasi_albert_growth_only(n, m, seed=None):

    np.random.seed(seed)

    # create m*m matrix with m connections -> full
    adjacency = np.ones((m+1,m+1))
    
    # delete diagonal
    np.fill_diagonal(adjacency, 0)
    
    # loop till graph is n
    for i in range(m+2, n):

        # add node each iteration and resize array by one col and row
        b = np.zeros((i,i))
        b[:-1,:-1] = adjacency
        adjacency = b
                
        # sum of all degrees of current adj
        sum_deg = np.count_nonzero(adjacency)

        # equal pref attachment probabilities for each old node
        p = [1/adjacency.shape[0]-1] * (adjacency.shape[0]-1)
        
        # normalize so that the sum equals one for np.choice
        pp = np.array(p)
        pp /= pp.sum()

        # add m edges to the new node
        while np.count_nonzero(adjacency[i-1]) <= m:  
            # random choice of the preferential attachment
            new_edge_index = np.random.choice(len(pp), 1, p=pp)
            adjacency[int(new_edge_index)][i-1] = 1
            adjacency[i-1][int(new_edge_index)] = 1
            
    np.fill_diagonal(adjacency, 0)
    return adjacency

In [32]:

n = 50
m = 5
seed = 12
ba_go = barabasi_albert_growth_only(n, m, seed)
G = nx.from_numpy_matrix(ba_go)
A = nx.adj_matrix(G)
plot_Adjacency_w_Network(A,G,"Barabási-Albert Model - Growth-only model")

Barabási-Albert Model - Growth-only model¶

How does the BA - Growth-only model?¶

the lack of growth eliminates its originally scale free character and is therefore random

we start of with a fully connected m times m matrix
we grow the matrix by one node for each iteration
each node has the same probability to be chosen
the chosen node gets assigned m edges

Model Fit:¶

(+) growth & size: a new airport gets added and creates not necessarily m nodes but at least one
(+/-) preferential attachment: no preferential attachment but therefore a random network
(+) random

3.2. The Barabási-Albert Model - Preferential attachment-only model¶

In [33]:

# no growth process!
def barabasi_albert_preferential_attachment_only(n, m, seed=None):
    
    #print("max m: ",(n*(n-1))/2)
    if m>((n*(n-1))/2):
        print("too large")
        return False
    
    np.random.seed(seed)

    # create empty nXn matrix with diagonal inital connection = 1 for each node
    adjacency = np.zeros((n,n))
    np.fill_diagonal(adjacency, 1)

    # add m edges to random nodes
    for i in range(m):

        # sum of all degrees of current adj
        sum_deg = np.count_nonzero(adjacency)

        # calc pref attachment probabilities for each old node
        p = []
        for k in range(0, adjacency.shape[0]):
            x = np.count_nonzero(adjacency[k])/sum_deg
            p.append(x)
            
        # normalize so that the sum equals one for np.choice
        pp = np.array(p)
        pp /= pp.sum()

        # select a random node
        node_select = np.random.randint(0,n-1)
        #print(node_select)
        if np.count_nonzero(adjacency[node_select]) >= n-1:
            while True:
                node_select = np.random.randint(0,n-1)
                if np.count_nonzero(adjacency[node_select]) is not n:
                    break
        
        # add m edges to the new node
        new_edge_index = np.random.choice(len(pp), 1, p=pp)
        
        # add 1 connection per iteration
        if (adjacency[int(new_edge_index)][node_select] == 1):            
            while True:  
                # random choice of the preferential attachment
                new_edge_index = np.random.choice(len(pp), 1, p=pp)
                if (adjacency[int(new_edge_index)][node_select] == 0):
                    adjacency[int(new_edge_index)][node_select] = 1
                    adjacency[node_select][int(new_edge_index)] = 1
                    break
        else:     
            adjacency[int(new_edge_index)][node_select] = 1
            adjacency[node_select][int(new_edge_index)] = 1

    np.fill_diagonal(adjacency, 0)
    return adjacency

In [34]:

n = 50
m = 200
seed = 12
ba_go = barabasi_albert_preferential_attachment_only(n,m)
G = nx.from_numpy_matrix(ba_go)
A = nx.adj_matrix(G)
plot_Adjacency_w_Network(A,G,"The Barabási-Albert Model - Preferential attachment-only model")

count_edges = np.count_nonzero(np.tril(ba_go))
print("edges: ", count_edges, "nodes:", n)

edges:  200 nodes: 50

Barabási-Albert Model - Preferential attachment-only model¶

How does the BA - Preferential attachment-only model work?¶

the lack of preferential attachment eliminates its originally scale free character and is therefore random

this model takes two arguments, the size of the network n which is static, m edges that get added to the network
each iteration we add a new edge to a randomly with preferential attachment chosen node
- this means our network has n nodes and m edges
we only add a single edge per iteration until we reach m edges in the network

Model Fit:¶

(+/-) fixed growth and therefore fixed size, taking away this aspect from the OG model makes it a random network
(+) preferential attachment: similar to our real network as big airports are more "attractive" probably for

economic reasons

(+) random

Model ranking according to best fit¶

Barabási-Albert (not random)
making Barabási-Albert random
- The Barabási-Albert Model - Growth-only model
- The Barabási-Albert Model - Preferential attachment-only model
Watts-Strogatz

Erdös-Rényi

 - The Barabási-Albert Model - Preferential attachment-only model

Nowadays new airports do not get built too often as they already existed for years or decades. According to Wikipedia 30 airports are currently under construction. Most of the time airports get expanded which leads to more connectivity and therefore we consider the growth aspect as secondary.

In reality it is rather the case that a airport expands its connections to other airports. That is due to the large variety of airlines, which adapt their strategies according to new touristic destinations or in general economic reasons.

        - Watts Strogatz Model

The chance that a airport is connected to its "neighbours" is high in real world.

.. therefore we decided to compare multiple models...¶

Question 12¶

Create a random graph from that model, such that the number of nodes is the same as in your graph.

In [35]:

print("The real model has ", n_nodes, " nodes and ", n_edges, " edges.")

The real model has  3146  nodes and  18579  edges.

Watts Strogatz¶

In [36]:

n = n_nodes
k = 12
p = 0.15
seed = None

def watts_strogatz(n,k,p,seed):
    G = nx.watts_strogatz_graph(n, k, p, seed)
    A = nx.adj_matrix(G)
    np.save("../data/watts_strogatz", A.toarray())
    return G,A

#G,A = watts_strogatz(n,k,p,seed)

In [37]:

compare_adjacency("watts_strogatz.npy", "watts_strogatz")

Barabási-Albert Model - Preferential attachment-only model¶

In [38]:

# uncomment to run - takes 5min
n = n_nodes
m = n_edges
def ba_preferential_only(n,m):
    ba_pa = barabasi_albert_preferential_attachment_only(n,m)
    G = nx.from_numpy_matrix(ba_pa)
    A = nx.adj_matrix(G)
    np.save("../data/adj_matr_ba_preferential_attachment_only", A.toarray())
    return G,A

#G, A, ba_pa_n_edges = ba_preferential_only(n,m)

In [39]:

compare_adjacency("adj_matr_ba_preferential_attachment_only.npy", "barabasi_albert_preferential_attachment_only")

Barabási-Albert Model - Growth-only model¶

In [40]:

# need to solve for m: n_edges = m * (m - 1) / 2 + (n_nodes - m) * m
# m^2 * (-1/2) + m * (-1/2 + n_nodes) - n_edges == 0
roots = np.roots([-1/2, -1/2 + n_nodes, -n_edges])
m_calc = roots[roots < n_nodes][0].round().astype(int)
print("We take m =", m_calc)

We take m = 6

In [41]:

n = int(n_nodes)
m = m_calc

def ba_growth_only(n,m):
    ba_go = barabasi_albert_growth_only(n, m).astype(int)
    G = nx.from_numpy_matrix(ba_go)
    A = nx.adj_matrix(G)
    np.save("../data/adjacency_ba_growth_only", A.toarray())
    return G,A

#G,A = ba_growth_only(n,m) 

In [42]:

compare_adjacency("adjacency_ba_growth_only.npy", "barabasi_albert_preferential_attachment_only")

Question 13¶

Check the properties you expected to be similar, and compare to your network.

Are the results what you expected? Explain.

In [43]:

adjacency =  np.load('../data/adj_matr.npy')
adjacency_watts_strogatz = np.load('../data/watts_strogatz.npy')
adj_matr_ba_preferential_attachment_only = np.load('../data/adj_matr_ba_preferential_attachment_only.npy')
adjacency_ba_growth_only = np.load('../data/adjacency_ba_growth_only.npy')

Clustering Coeficcient¶

In [44]:

G_og = nx.from_numpy_matrix(adjacency)
C_og = nx.average_clustering(G_og)
print("clustering-coeficcient target real model                :", C_og)

G_stro = nx.from_numpy_matrix(adjacency_watts_strogatz)
C_stro = nx.average_clustering(G_stro)
print("clustering-coeficcient watts strogatz                   :", C_stro)

G_ba_pa = nx.from_numpy_matrix(adj_matr_ba_preferential_attachment_only)
C_ba_pa = nx.average_clustering(G_ba_pa)
print("clustering-coeficcient ba_preferential only model       :", C_ba_pa)

G_ba_go = nx.from_numpy_matrix(adjacency_ba_growth_only)
C_ba_go = nx.average_clustering(G_ba_go)
print("clustering-coeficcient ba_growth only model             :", C_ba_go)

clustering-coeficcient target real model                : 0.4959643683521253
clustering-coeficcient watts strogatz                   : 0.4237120796093915
clustering-coeficcient ba_preferential only model       : 0.004973441166319747
clustering-coeficcient ba_growth only model             : 0.0063111294026478035

We expected the watts strogatz model to have a similar clustering-coefficient since in this models nodes tend to be connected the their closest neighbour, same as in our real world model.

Degree Distribution & Degree Rank¶

In [45]:

def plot_degree_dist(liste, titles):
    colors = ["r","g","b","y"]
    subplot = 221
    plt.figure(figsize=(20, 5))
    
    for graph in range(0,len(liste)):
        degree_sequence = sorted([d for n, d in liste[graph].degree()], reverse=True)  # degree sequence
        degreeCount = collections.Counter(degree_sequence)
        deg, cnt = zip(*degreeCount.items())
        
        plt.subplot(subplot)
        subplot=subplot+1
        plt.title("degree distribution "+titles[graph])        
        plt.bar(deg, cnt, width=0.80, color=colors[graph])
    plt.tight_layout()

    subplot = 221
    plt.figure(figsize=(20, 5))
    for graph in range(0,len(liste)):
        degree_sequence = sorted([d for n, d in liste[graph].degree()], reverse=True)
        # print "Degree sequence", degree_sequence
        dmax = max(degree_sequence)

        plt.subplot(subplot)
        subplot=subplot+1
        plt.loglog(degree_sequence, 'b-', marker='o', color=colors[graph])
        plt.title("Degree rank "+titles[graph])
        plt.ylabel("degree")
        plt.xlabel("rank")
    plt.tight_layout()

    plt.show()

In [46]:

plot_degree_dist([G_og, G_ba_pa, G_ba_go, G_stro], ["Original Graph","G_ba_preferential_attachment","G_ba_growth_only", "Watts Strogatz"])

We can see that the only model with a similar degree distribution is the barabasi albert with no preferential attachment. This was to be expected since this model forms in a similar way as our real world network: Bigger airports appeared first and then smaller airports were connected to them. The difference is that the model is missing some sort of preferential attachment: in the real world small airports tend to connect with bigger airport with much higher probability than with other small airports. This is why the scale-free barabasi albert model with both growth and preferential attachment would have best modeled the degree distribution.

In [48]:

og_degree_dist = np.array([d for n,d in G_og.degree])
ba_pa_degree_dist = np.array([d for n,d in G_ba_pa.degree])
ba_go_degree_dist = np.array([d for n,d in G_ba_go.degree])
stro_degree_dist = np.array([d for n,d in G_stro.degree])

In [49]:

print('Degree distributions:\n')
print('Original graph:\t\t\tmean:', np.mean(og_degree_dist), '\tstandard deviation:', np.std(og_degree_dist))
print('BA_preferential_attachment:\tmean:', np.mean(ba_pa_degree_dist), '\tstandard deviation:', np.std(ba_pa_degree_dist))
print('BA_growth_only:\t\t\tmean:', np.mean(ba_go_degree_dist), '\tstandard deviation:', np.std(ba_go_degree_dist))
print('Watts Strogatz:\t\t\tmean:', np.mean(stro_degree_dist), '\t\t\tstandard deviation:', np.std(stro_degree_dist))
print('Barabasi-Albert:\t\tmean:', np.mean(ba_degrees), '\tstandard deviation:', np.std(ba_degrees))

Degree distributions:

Original graph:			mean: 11.811188811188812 	standard deviation: 24.967543115080108
BA_preferential_attachment:	mean: 11.811188811188812 	standard deviation: 5.686050064689753
BA_growth_only:			mean: 13.982193958664547 	standard deviation: 7.385307522752407
Watts Strogatz:			mean: 12.0 			standard deviation: 1.2792042981336627
Barabasi-Albert:		mean: 11.990464081373172 	standard deviation: 14.139345062824072

It was to be expected that the mean degree of each node would be similar for each model since fix the number of nodes and try to have a similar number of edges, but here the standard deviation represents the size and number of hubs in the network. As expected both random ba models represent the hubs much better than the watts strogatz model. Also as previously stated the Barabasi-Albert model with both growth and preferential attachment, even though it is not a random model, represents hubs much better than any random network model.