This notebook gives a fairly complicated example of building a Sankey diagram from the sample "fruit" database. Other examples (TODO) break this process down into simpler stages.

In [1]:
from sankeyview import *
from sankeyview.jupyter import show_sankey, show_view_graph
from IPython.display import SVG

Load the dataset:

In [2]:
dataset = Dataset.from_csv('fruit_flows.csv', 'fruit_processes.csv')

This made-up dataset describes flows from farms to consumers:

In [3]:
   source  target              material  time        value
0  farm1   eat1                apples    2011-08-01  2.720691
1  eat1    landfill Cambridge  apples    2011-08-01  1.904484
2  eat1    composting Cambridge  apples  2011-08-01  0.816207
3  farm1   eat1                apples    2011-08-02  8.802195
4  eat1    landfill Cambridge  apples    2011-08-02  6.161537

Additional information is available in the process dimension table:

In [4]:
        type     location   function    sector
inputs  stock    *          inputs      NaN
farm1   process  Cambridge  small farm  farming
farm2   process  Cambridge  small farm  farming
farm3   process  Ely        small farm  farming
farm4   process  Ely        allotment   farming

We'll also define some partitions that will be useful:

In [5]:
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]

farm_partition_5 = Partition.Simple('process', [('Other farms', farm_ids[5:])] + farm_ids[:5])
partition_fruit = Partition.Simple('material', ['bananas', 'apples', 'oranges'])
partition_sector = Partition.Simple('process.sector', ['government', 'industry', 'domestic'])
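To make the structure of farm_partition_5 concrete, the following sketch uses plain Python (no sankeyview calls) to list the groups it describes. It assumes that a bare id in the value list stands for a group containing just that process, and that a (label, ids) tuple names a group covering several processes. The `groups` dictionary is purely illustrative and is not something the library exposes.

```python
farm_ids = ['farm{}'.format(i) for i in range(1, 16)]

# Same value list as passed to Partition.Simple for farm_partition_5.
values = [('Other farms', farm_ids[5:])] + farm_ids[:5]

# Illustrative only: map each group label to the process ids it covers.
# Assumption: a bare string is a one-member group labelled by its own id.
groups = {}
for value in values:
    if isinstance(value, tuple):
        label, ids = value
    else:
        label, ids = value, [value]
    groups[label] = list(ids)

for label, ids in groups.items():
    print(label, '->', ids)
```

Under that assumption there are six groups: farm1 to farm5 individually, and farm6 to farm15 lumped together as "Other farms".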

Now define the nodes of the Sankey diagram. There are two kinds:

  • Process groups represent sets of processes in the underlying database. The underlying processes can be specified as a list of ids (e.g. ['inputs']) or as a Pandas query expression (e.g. 'function == "landfill"').
  • Waypoints allow extra control over the partitioning and placement of flows.
In [6]:
nodes = {
    'inputs':     ProcessGroup(['inputs'], title='Other inputs'),
    'compost':    ProcessGroup('function == "composting stock"', title='Compost'),
    'farms':      ProcessGroup('function in ["allotment", "large farm", "small farm"]', farm_partition_5),
    'eat':        ProcessGroup('function == "consumers" and location != "London"', partition_sector,
                               title='consumers by sector'),
    'landfill':   ProcessGroup('function == "landfill" and location != "London"', title='Landfill'),
    'composting': ProcessGroup('function == "composting process" and location != "London"', title='Composting'),

    'fruit':        Waypoint(partition_fruit, title='fruit type'),
    'w1':           Waypoint(direction='L', title=''),
    'w2':           Waypoint(direction='L', title=''),
    'export fruit': Waypoint(Partition.Simple('material', ['apples', 'bananas', 'oranges'])),
    'exports':      Waypoint(title='Exports'),
}
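The two ways of specifying a process group can be illustrated without sankeyview. The sketch below copies the five rows of the process table shown earlier into plain dictionaries and selects process ids either from an explicit list or with a predicate that mirrors the query 'function in ["allotment", "large farm", "small farm"]'. The `select` helper is hypothetical; the library takes a query string rather than a Python function.

```python
# The five rows of the process table shown above (sector column omitted).
processes = {
    'inputs': {'type': 'stock', 'location': '*', 'function': 'inputs'},
    'farm1': {'type': 'process', 'location': 'Cambridge', 'function': 'small farm'},
    'farm2': {'type': 'process', 'location': 'Cambridge', 'function': 'small farm'},
    'farm3': {'type': 'process', 'location': 'Ely', 'function': 'small farm'},
    'farm4': {'type': 'process', 'location': 'Ely', 'function': 'allotment'},
}


def select(processes, ids=None, predicate=None):
    """Hypothetical helper: pick process ids by explicit list or by predicate."""
    if ids is not None:
        return [p for p in processes if p in ids]
    return [p for p, attrs in processes.items() if predicate(attrs)]


# Selection by a list of ids, like ProcessGroup(['inputs']).
by_ids = select(processes, ids=['inputs'])

# Selection by attribute, mirroring the query used for the 'farms' group.
farm_functions = ['allotment', 'large farm', 'small farm']
by_query = select(processes, predicate=lambda a: a['function'] in farm_functions)

print(by_ids)
print(by_query)
```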

The ordering defines how the process groups and waypoints are arranged in the final diagram. It is structured as a list of vertical layers (from left to right), each containing a list of horizontal bands (from top to bottom), each containing a list of process group and waypoint ids (from top to bottom).

In [7]:
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]
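The nested structure of the ordering can be unpacked with a few lines of plain Python. This sketch walks the layers, bands and ids and records where each node sits; the `layer_of` and `band_of` dictionaries are illustrative names, not part of sankeyview.

```python
ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]

layer_of = {}  # node id -> index of its vertical layer (left to right)
band_of = {}   # node id -> index of its horizontal band (top to bottom)

for layer_index, layer in enumerate(ordering):
    for band_index, band in enumerate(layer):
        for node_id in band:
            layer_of[node_id] = layer_index
            band_of[node_id] = band_index
            print('layer {}, band {}: {}'.format(layer_index, band_index, node_id))
```

Here every layer has three bands: the top band holds the export waypoints, the middle band holds the main flow, and the bottom band holds the waypoints w1 and w2.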

Bundles represent flows in the underlying database:

In [8]:
bundles = [
    Bundle('inputs', 'farms'),
    Bundle('compost', 'farms'),
    Bundle('farms', 'eat', waypoints=['fruit']),
    Bundle('farms', 'compost', waypoints=['w2']),
    Bundle('eat', 'landfill'),
    Bundle('eat', 'composting'),
    Bundle('composting', 'compost', waypoints=['w1', 'w2']),
    Bundle('farms', Elsewhere, waypoints=['exports', 'export fruit']),
]
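Because nodes, ordering and bundles refer to each other by id, a quick consistency check can catch typos before the diagram is built. The sketch below is not part of sankeyview: it restates the bundles as plain (source, target, waypoints) tuples, uses None as a stand-in for Elsewhere, and checks that every id a bundle mentions is a defined node and that every node has a place in the ordering.

```python
node_ids = {'inputs', 'compost', 'farms', 'eat', 'landfill', 'composting',
            'fruit', 'w1', 'w2', 'export fruit', 'exports'}

ordering = [
    [[], ['inputs', 'compost'], []],
    [[], ['farms'], ['w2']],
    [['exports'], ['fruit'], []],
    [[], ['eat'], []],
    [['export fruit'], ['landfill', 'composting'], ['w1']],
]

# (source, target, waypoints); None stands in for Elsewhere (an assumption
# made for this sketch only).
bundles = [
    ('inputs', 'farms', []),
    ('compost', 'farms', []),
    ('farms', 'eat', ['fruit']),
    ('farms', 'compost', ['w2']),
    ('eat', 'landfill', []),
    ('eat', 'composting', []),
    ('composting', 'compost', ['w1', 'w2']),
    ('farms', None, ['exports', 'export fruit']),
]

placed = {n for layer in ordering for band in layer for n in band}
used = set()
for source, target, waypoints in bundles:
    used.update(n for n in [source, target] + waypoints if n is not None)

missing = used - node_ids     # ids used by bundles but never defined
unplaced = node_ids - placed  # nodes defined but absent from the ordering
print('missing:', missing, 'unplaced:', unplaced)
```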

Finally, the process groups, waypoints, bundles and ordering are combined into a Sankey diagram definition (SDD). When applied to the dataset, the result is a Sankey diagram!

In [9]:
sdd = SankeyDefinition(nodes, bundles, ordering, flow_partition=dataset.partition('material'))
sankey = show_sankey(sdd, dataset, width=800, height=500)
In [10]:
# For viewing on nbviewer, save a static version of the diagram
[Figure: the resulting Sankey diagram. Nodes: "Other inputs", "Compost", farm1 to farm5 and "Other farms", "fruit type" (apples, bananas, oranges), "consumers by sector" (government, industry, domestic), "Landfill", "Composting" and "Exports" (apples, bananas, oranges).]

To get a better understanding of what's going on, it may help to look at the intermediate "view graph". Displaying it requires graphviz to be installed.

In [11]:
[Figure: the view graph. Nodes: inputs, compost, farms, w2, exports, fruit, eat, export fruit, landfill, composting and w1, plus the dummy nodes __w2_compost_0, __exports_export fruit_3, __w1_w2_3 and __w1_w2_2.]

Waypoints are shown with dashed borders. The black dots are "dummy nodes", added so that each link in the Sankey diagram has to pass only between adjacent layers.
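The idea behind dummy nodes can be sketched in a few lines: when two consecutive nodes of a bundle are more than one layer apart, one placeholder is added in each layer in between. The function below is an illustration only and not the library's implementation; the naming pattern `__source_target_layer` is an assumption modelled on the dummy-node names in the view graph output, and the sketch ignores the extra dummy nodes needed when a flow reverses direction.

```python
# Layer indices taken from the ordering defined earlier.
layer_of = {'w1': 4, 'w2': 1, 'exports': 2, 'export fruit': 4,
            'fruit': 2, 'eat': 3}


def add_dummy_nodes(source, target, layer_of):
    """Return the chain source -> dummies -> target, one node per layer."""
    start, end = layer_of[source], layer_of[target]
    step = 1 if end > start else -1
    chain = [source]
    # Visit only the layers strictly between the two endpoints.
    for layer in range(start + step, end, step):
        chain.append('__{}_{}_{}'.format(source, target, layer))
    chain.append(target)
    return chain


print(add_dummy_nodes('w1', 'w2', layer_of))
print(add_dummy_nodes('exports', 'export fruit', layer_of))
print(add_dummy_nodes('fruit', 'eat', layer_of))
```

The link from w1 (layer 4) back to w2 (layer 1) picks up two dummy nodes, while fruit and eat are already in adjacent layers and need none.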