Programming with Python (lab classes)

Vedran Šego, vsego.org

Content:

  1. Flow networks

Note: Some older versions of Anaconda have an old version of NetworkX that lacks some of the functionality needed to solve the problems below. Execute this program:

import networkx
print(networkx.__version__)

and if it prints the version smaller than 1.9.1, uninstall your Anaconda and download and install the new version.

As a quick fix, you can instead download networkx-anaconda_patch.zip (3.1MiB) and unzip it in the same directory in which you will write the solutions for the problems below.

A flow network is a directed graph in which every arc (a directed edge) has a real-numbered, nonnegative capacity, and in which two nodes are distinguished from the others: the source (a node in which no arcs enter) and the sink (a node which no arcs leave). Here is an example of a flow:

Flow

The basic idea is to see how much of something can be "sent" through the network in a way that

  1. we never overflow the capacity of any of the edges, and

  2. each node except the source and the sink recieves and sends the same amount of something (i.e., nothing is retained nor created).

This has a plethora of applications in modeling of electrical networks, internet traffic, public transportation, water supply networks, transportation of goods, international travel, etc.

The most common questions here are:

  1. What is the capacity of the network, i.e., how much can we send from the source to the sink?

  2. What are the weak links in the network, i.e., where can we improve the network?

The answer to the first one is the maximum flow of the network, while the answer to the second one is its minimum cut (the set of edges of the minimum total capacity that, when cut, split the graph in two or more unconnected components). Incidentally, these two are the same. For the above flow, the minimum cut is marked on the following image:

Minimum cut

The maximum flow is simply the sum of the capacities of all the edges of the minimum cut. In our example, this is $5$.

While it is nice to know how much we can send via network, it would also be nice to find out how to actually achieve it. For our problem, here is one such solution:

Maximum flow

The numbers on the arcs now represent how much is transported through them. The function that assigns these numbers to the arcs (visualised in the above image) is called a flow.

Problem 1. Write a module with a function

def csv2digraph(fname, encoding="utf8", delimiter=",", has_header=True, default_capacity=1):

that reads the CSV file fname with the columns "From", "To", and "Weight", creates a networkx.DiGraph with the arcs (directed edges) corresponding to that info, and returns it.

If has_header is True, the first line of the file is considered a header and is skipped. In case some of the capacity data is missing or malformed (not convertible to a number), the value default_capacity is used instead.

If something is wrong with the file, the function should raise (actually, just pass on) the appropriate OSError exception.

Hint: To add an arc from node_from to node_to with a capacity cap, use

networkx.add_edge(node_from, node_to, capacity=cap)

Since the data from the CSV files is loaded as strings, convert the capacities to integers. Those for which this conversion failes, should be converted to floats, and -- if that fails as well -- their value should be set to default_capacity.

Example: The graphs that are plotted above are defined with the following CSV file:

From,To,Weight
01,02,2
01,03,5
02,04,1
02,05,2
03,05,2
03,06,1
04,07,2
05,07,5
05,08,2
05,09,1
06,10,1
04,11,2
06,11,5
07,12,2
08,12,5
09,12,2
10,12,3
11,12,7

Problem 2. Add to the module the function is_network(G) that returns True if G is a flow network and False otherwise.

You need to check that G has exactly one source and one sink, and that all of the capacities are convertible to floats (the csv2digraph function should make sure of that, but it is possible to create G without using that function).

Hint: Additional functions sources(G), source(G), sinks(G), and sink(G) could be useful for implementing this function and for solving the remaining problems in this document.

Problem 3. Add to the module the function network_layout(G) that returns the layout for a network-like plotting of G (if G is not a network, it should raise a custom NotNetworkError exception).

A layout is a dictionary in which the keys are the node labels and the values are tuples of their positions (in any scale). For example, the graphs above use the following layout:

{
    '01': (0, 2.0),
    '02': (1, 1.5),
    '03': (1, 2.5),
    '04': (2, 1.0),
    '05': (2, 2.0),
    '06': (2, 3.0),
    '07': (3, 0.0),
    '08': (3, 1.0),
    '09': (3, 2.0),
    '10': (3, 3.0),
    '11': (3, 4.0),
    '12': (4, 2.0)
}

The following problems involve plotting graphs. In order to do that more easily, find what is required (minimum cut arcs and maximum flow) and print them first to see the format of the data. Then use that data for plotting.

Problem 4. Write a program that inputs the name of a CSV file, creates a digraph from its data, and plots it in a manner similar to the first image above. The graph should be shown on the screen and saved to the file with the same name as the CSV file, but with the extension changed to "png".

If you make a plotting function in the module, it should always check if the graph that it gets is a network.

Problem 5. Write a program that inputs the name of a CSV file, creates a digraph from its data, finds its maximum flow value (how much can be sent through the network) and minimal cut, prints the former and plots the network in a manner similar to the second image above. The graph should be shown on the screen and saved to the file with the same name as the CSV file, but with the extension changed to "png".

Note that there can be more than one minimum cut (for example, in some of the graphs that have all their capacities set to the same value), so your drawing might differ from the above. However, the value of the maximum flow is unique for every flow network (as it must correspond to the total capacity of the edges in the minimum cut).

Problem 6. Write a program that inputs the name of a CSV file, creates a digraph from its data, finds one of its maximum flows and plots it in a manner similar to the third image above. The graph should be shown on the screen and saved to the file with the same name as the CSV file, but with the extension changed to "png".

Note that there can be more than one maximum flow (for example, in most of the graphs that have all their capacities set to the same value), so your drawing might differ from the above.

An additional example

For the following CSV file:

From,To,Capacity
01,02,1
01,03,3
01,04,5
01,05,5
01,06,3
01,07,1
02,08,1
02,09,2
03,09,3
03,10,4
04,10,5
04,11,6
05,11,6
05,12,5
06,12,4
06,13,3
07,13,2
07,14,1
08,15,2
09,15,2
10,15,2
10,15,2
11,15,2
11,16,2
12,15,2
12,16,2
13,16,2
14,16,2
09,16,2
15,17,11
16,17,11

the graphs plotted by the solutions of the above problems should look like this: A flow network

A flow network
Minimum cut

A minimum cut
Maximum flow

A maximum flow
The maximum flow value is $17$.