This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

# 14.3. Resolving dependencies in a Directed Acyclic Graph with a topological sort¶

You need the python-apt package in order to build the package dependency graph. (https://pypi.python.org/pypi/python-apt/)

We also assume that this notebook is executed on a Debian system (like Ubuntu). If you don't have such a system, you can download the data Debian directly on the book's website. Extract it in the current directory, and start this notebook directly at step 7. (http://ipython-books.github.io)

1. We import the apt module and we build the list of packages.
In [ ]:
import json
import apt
cache = apt.Cache()

1. The graph dictionary will contain the adjacency list of a small portion of the dependency graph.
In [ ]:
graph = {}

1. We define a function that returns the list of dependencies of a package.
In [ ]:
def get_dependencies(package):
if package not in cache:
return []
pack = cache[package]
ver = pack.candidate or pack.versions[0]
# We flatten the list of dependencies,
# and we remove the duplicates.
return sorted(set([item.name
for sublist in ver.dependencies
for item in sublist]))

1. We now define a recursive function that builds the dependency graph for a particular package. This function updates the graph variable.
In [ ]:
def get_dep_recursive(package):
if package not in cache:
return []
if package not in graph:
dep = get_dependencies(package)
graph[package] = dep
for dep in graph[package]:
if dep not in graph:
graph[dep] = get_dep_recursive(dep)
return graph[package]

1. Let's build the dependency graph for IPython.
In [ ]:
get_dep_recursive('ipython');

1. Finally, let's save the adjacency list in JSON.
In [ ]:
with open('data/apt.json', 'w') as f:
json.dump(graph, f, indent=1)

1. We import a few packages.
In [ ]:
import json
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt
%matplotlib inline

In [ ]:
with open('data/apt.json', 'r') as f:

1. Now, we create a directed graph (DiGraph in NetworkX) from our adjacency list. We reverse the graph to get a more natural ordering.
In [ ]:
g = nx.DiGraph(graph).reverse()

1. A topological sort only exists when the graph is a directed acyclic graph (DAG). This means that there is no cycle in the graph, i.e. no circular dependency here. Is our graph a DAG?
In [ ]:
nx.is_directed_acyclic_graph(g)

1. What are the packages responsible for the cycles? We can find it out with the simple_cycles function.
In [ ]:
set([cycle[0] for cycle in nx.simple_cycles(g)])

1. Here, we can try to remove these packages. In an actual package manager, these cycles need to be carefully taken into account.
In [ ]:
g.remove_nodes_from(_)

In [ ]:
nx.is_directed_acyclic_graph(g)

1. The graph is now a DAG. Let's display it first.
In [ ]:
ug = g.to_undirected()
deg = ug.degree()

In [ ]:
plt.figure(figsize=(4,4));
# The size of the nodes depends on the number of dependencies.
nx.draw(ug, font_size=6,
node_size=[20*deg[k] for k in ug.nodes()]);

1. Finally, we can perform the topological sort, thereby obtaining a linear installation order satisfying all dependencies.
In [ ]:
nx.topological_sort(g)


You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).