#!/usr/bin/env python # coding: utf-8 # # [AMLD'19 Learning and Processing over Networks](https://github.com/rodrigo-pena/amld2019-graph-workshop) # # # 2 Network Science # # In this notebook we will examine a couple of basic network properties, taking a flight route graph and a road graph as objects of study. # # It will be useful to check the documentation of the [`networkx`](https://networkx.github.io/documentation/networkx-1.10/index.html) package as we go along, since a lot of the properties we will see can be easily computed via calls to this package. # In[1]: import numpy as np import scipy as sp import osmnx as ox import networkx as nx import pandas as pd import matplotlib.pyplot as plt import collections import utils # The data used for constructing the flight route network was taken from [OpenFlights](https://openflights.org/data.html). Each node in the network represents an airport. Edges are drawn between nodes if there is a flight route connecting the corresponding nodes. # # The cell code below loads the flight route network that we are going to use. The graph itself, represented as a `networkx` graph, is returned in the variable `flight_graph`. The varible `pos`, useful for plotting the graph against a world map, is a dictionary containing airport acronyms as keys and (longitude, latitude) pairs as values. # In[2]: routes, airports, pos, flight_graph = utils.preprocess_flight_routes() # The `routes` dataframe contains information on the flight routes, such as source and destination airports, and distance. You can get a glimpse of its contents by running the cell below. # In[3]: routes.head() # Similarly, the `airports` dataframe contains information on the various airports in the dataset. Recall that they represent the nodes in our first graph. # In[4]: airports.head() # The line of code below allows us to overlay the graph onto the world map. The flight routes graph is an instance of a network with a "natural" embedding (the surface of the earth). Such is not always the case and sometimes we need to come up with our own embeddings, as we will see in the next notebook. # In[5]: utils.display_map(flight_graph, pos) # Our second graph is going to be a road network constructed from [Open Street Map](https://en.wikipedia.org/wiki/OpenStreetMap) data, made possible thanks to the [`OSMnx`](https://osmnx.readthedocs.io/en/stable/index.html) Python package. # # The cell code below will load a road graph whose nodes represent intersections and whose edges represent roads. We set it up so that the actual loaded graph covers roads within a 1 km radius of a point within EPFL. # In[21]: cache = '../data/osmnx' ox.config(cache_folder=cache, use_cache=True) # In[84]: point = (46.52019, 6.56591) # EPFL road_graph = ox.graph_from_point(point, distance=1000) # It's possible to query by city name as well. Note, however, that the graph will be larger. #places = dict(city='Lausanne') #road_graph = ox.graph_from_place(places, network_type='drive') road_graph = road_graph.to_undirected() # Work with its undirected version for convenience # The `osmnx` package also allows us to plot the graph using cartographic coordinates, as well as drawing the edges atop the actual roads. # # Run the cell below to see such a drawing of the graph we have just loaded. Can you identify where are we in the plot? # In[85]: ox.plot_graph(road_graph, fig_height=9) plt.show() # ## 2.1. Degree distribution # # The first thing we will examine is the degree distribution of the graphs we have loaded. That is, we want to compute and plot the histogram of the degree vectors of the networks above and compare their shapes. Try to mark as well on the histogram plot where the average degree is located. *Hints:* 1) Because both networks are represented as `networkx` graphs, you can access their degree vector via the `.degree()` function. 2) The [`collections`](https://docs.python.org/2/library/collections.html) module has a `Counter` function to help construct the histogram. # In[55]: degree_sequence = sorted([d for n, d in flight_graph.degree()], reverse=True) # degree sequence # print "Degree sequence", degree_sequence degreeCount = collections.Counter(degree_sequence) deg, cnt = zip(*degreeCount.items()) fig, ax = plt.subplots(figsize = (20,10)) plt.bar(deg, cnt, width=0.80, color='b') mean = np.array(degree_sequence).mean() plt.axvline(mean, color='r', linestyle='dashed', linewidth=1) plt.title("Degree Histogram") plt.ylabel("Count") plt.xlabel("Degree") plt.text(20,480,'Average = {}'.format(mean),bbox=dict(facecolor='white', alpha=0.5), fontsize=16) plt.show() degree_sequence = sorted([d for n, d in road_graph.degree()], reverse=True) # degree sequence # print "Degree sequence", degree_sequence degreeCount = collections.Counter(degree_sequence) deg, cnt = zip(*degreeCount.items()) fig, ax = plt.subplots(figsize = (20,10)) plt.bar(deg, cnt, width=0.80, color='b') mean = np.array(degree_sequence).mean() plt.axvline(mean, color='r', linestyle='dashed', linewidth=1) plt.title("Degree Histogram") plt.ylabel("Count") plt.xlabel("Degree") plt.text(mean,48,'Average = {}'.format(mean),bbox=dict(facecolor='white', alpha=0.5), fontsize=16) plt.show() # ## 2.2. Sparsity # # When we speak of the sparsity of a graph, we mean the number of edges that it has, in comparison to the number of edges that a complete (dense) graph with the same number of nodes would have. # # Compute and print in the cell below the following quantities, for both the flight and road graphs: # # - Actual number of edges. # - Maximum number of edges for a graph with the same number of nodes. # - The ratio between the two previous quantities. # # After examining the numbers, can you say that the graphs we have here are sparse? # In[132]: print("Number of edges in the Flight graph: {}".format(flight_graph.number_of_edges())) n = flight_graph.number_of_nodes() print("Maximum number of edges in a graph with the same number of nodes as the Flight graph: {}" .format(.5 * n * (n-1))) print("Ratio: {}".format(flight_graph.number_of_edges() / (.5 * n * (n-1)))) print("Number of edges in the Flight graph: {}".format(road_graph.number_of_edges())) n = road_graph.number_of_nodes() print("Maximum number of edges in a graph with the same number of nodes as the Flight graph: {}" .format(.5 * n * (n-1))) print("Ratio: {}".format(road_graph.number_of_edges() / (.5 * n * (n-1)))) # In order to visualize the sparsity pattern of both graphs, we ask you to plot the respective adjacency matrices in the cell below. *Hint:* the function [`matplotlib.pyplot.spy`](https://matplotlib.org/api/_as_gen/matplotlib.pyplot.spy.html) might be useful here. # In[133]: fig, ax = plt.subplots(1, 1, figsize=(6, 6)) ax.spy(nx.adjacency_matrix(flight_graph).todense()) ax.set_title("Adjacency matrix of flight graph") fig, ax = plt.subplots(1, 1, figsize=(6, 6)) ax.spy(nx.adjacency_matrix(road_graph).todense()) ax.set_title("Adjacency matrix of road graph") plt.show() # ## 2.3. Connectedness # # Print the number of connected components for each graph. *Hint:* you can use `networkx` functions. # In[40]: print('Number of connected components in the Flight graph: {}'.format(nx.number_connected_components(flight_graph))) print('Number of connected components in the Road graph: {}'.format(nx.number_connected_components(road_graph))) # If any of those graphs has more than 1 connected component, use the cell below to restrict this graph to its **largest** such component. Then, use the next cell to plot the graph. Compare this with the original plots. Can you identify which parts of the original graph stayed and which were removed? # In[45]: flight_graph = max(nx.connected_component_subgraphs(flight_graph), key=len) # In[46]: utils.display_map(flight_graph, pos) # ## 2.4. Shortest paths # # Note that functions depending on polling shortest paths may take a while to compute. # # ### 2.4.1 Diameter # # Recall that the diameter of the graph is the length of the shortest path between any pair of nodes. Use the cell below to print the diameter of the Flight and Road graphs. *Hint:* you can use `networkx` functions. # In[57]: print("Diameter of the Flight graph: {}".format(nx.diameter(flight_graph))) print("Diameter of the Road graph: {}".format(nx.diameter(road_graph))) # ### 2.4.2. Average shortest path length # # Compute and print below the average shortest path lengths of both graphs. How do those quantities compare with the respective diameters? What does this say about the graphs themselves? # In[88]: print("Average shortest path length on the Flight graph: {}".format(nx.average_shortest_path_length(flight_graph))) print("Average shortest path length on the Road graph: {}".format(nx.average_shortest_path_length(road_graph))) # ### 2.4.3. Betweenness centrality # # In the cell below, compute the betweenness centralities of the nodes in the Flight graph. Remember that betweenness centrality is a measure of the fraction of shortest paths that pass though the given node. *Hint:* you can use `networkx` functions. # In[58]: betweenness = nx.betweenness_centrality(flight_graph) # Find the node with largest betweenness centrality and check its corresponding airport acronym. Does it make sense? *Hint:* you can sort the values of the `betweenness` dictionary to find the key corresponding to the largest value. # In[77]: i = np.argsort(list(betweenness.values()))[-1] print('Airport with the largest betweenness centrality: {}'.format(list(betweenness.keys())[i])) # ## 2.6. Clustering coefficient # # The clustering coefficient of each node $i$ is a measure of the fraction of neighbors of $i$ that are also connected themselves. In this last subsection we will ask you to compute the average clustering coefficient of the networks under study. But first, in case you intend to use the `networkx` implementation of the average clustering coefficient, we suggest using the following `road_graph_simple` graph as a representation of the road graph. This line of code is simply a trick to "flatten" the information content on the edges of the graph so that the average clustering coefficient can be computed by the `networkx` implementation. # In[86]: road_graph_simple = nx.Graph(road_graph) # MultiGraph to Graph # Compute and print the average clustering coefficient of the Flight and Road networks in the cell below. # In[87]: print("Average clustering coefficient on the Flight graph: {}".format(nx.average_clustering(flight_graph))) print("Average clustering coefficient on the Road graph: {}".format(nx.average_clustering(nx.Graph(road_graph_simple)))) # In[ ]: