This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

14.3. Resolving dependencies in a Directed Acyclic Graph with a topological sort

You need the python-apt package in order to build the package dependency graph. (https://pypi.python.org/pypi/python-apt/)

We also assume that this notebook is executed on a Debian system (like Ubuntu). If you don't have such a system, you can download the data Debian directly on the book's website. Extract it in the current directory, and start this notebook directly at step 7. (http://ipython-books.github.io)

  1. We import the apt module and we build the list of packages.
In [ ]:
import json
import apt
cache = apt.Cache()
  1. The graph dictionary will contain the adjacency list of a small portion of the dependency graph.
In [ ]:
graph = {}
  1. We define a function that returns the list of dependencies of a package.
In [ ]:
def get_dependencies(package):
    if package not in cache:
        return []
    pack = cache[package]
    ver = pack.candidate or pack.versions[0]
    # We flatten the list of dependencies,
    # and we remove the duplicates.
    return sorted(set([item.name 
            for sublist in ver.dependencies 
            for item in sublist]))
  1. We now define a recursive function that builds the dependency graph for a particular package. This function updates the graph variable.
In [ ]:
def get_dep_recursive(package):
    if package not in cache:
        return []
    if package not in graph:
        dep = get_dependencies(package)
        graph[package] = dep
    for dep in graph[package]:
        if dep not in graph:
            graph[dep] = get_dep_recursive(dep)
    return graph[package]
  1. Let's build the dependency graph for IPython.
In [ ]:
get_dep_recursive('ipython');
  1. Finally, let's save the adjacency list in JSON.
In [ ]:
with open('data/apt.json', 'w') as f:
    json.dump(graph, f, indent=1)
  1. We import a few packages.
In [ ]:
import json
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline
  1. Let's load the adjacency list from the JSON file.
In [ ]:
with open('data/apt.json', 'r') as f:
    graph = json.load(f)
  1. Now, we create a directed graph (DiGraph in NetworkX) from our adjacency list. We reverse the graph to get a more natural ordering.
In [ ]:
g = nx.DiGraph(graph).reverse()
  1. A topological sort only exists when the graph is a directed acyclic graph (DAG). This means that there is no cycle in the graph, i.e. no circular dependency here. Is our graph a DAG?
In [ ]:
nx.is_directed_acyclic_graph(g)
  1. What are the packages responsible for the cycles? We can find it out with the simple_cycles function.
In [ ]:
set([cycle[0] for cycle in nx.simple_cycles(g)])
  1. Here, we can try to remove these packages. In an actual package manager, these cycles need to be carefully taken into account.
In [ ]:
g.remove_nodes_from(_)
In [ ]:
nx.is_directed_acyclic_graph(g)
  1. The graph is now a DAG. Let's display it first.
In [ ]:
ug = g.to_undirected()
deg = ug.degree()
In [ ]:
plt.figure(figsize=(4,4));
# The size of the nodes depends on the number of dependencies.
nx.draw(ug, font_size=6, 
        node_size=[20*deg[k] for k in ug.nodes()]);
  1. Finally, we can perform the topological sort, thereby obtaining a linear installation order satisfying all dependencies.
In [ ]:
nx.topological_sort(g)

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).