matta - view and scaffold d3.js visualizations in IPython notebooks

basic examples

By @carnby.

This notebook showcases the basic matta visualizations, as well as their usage.

Note that the init_javascript call is not needed when running on a local server, provided the JavaScript code has been added to your IPython profile.

In [1]:
import pandas as pd
import networkx as nx
import matta
import json
import requests

from networkx.readwrite import json_graph

# we do this to load the required libraries when viewing on NBViewer
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')
Out[1]:
matta Javascript code added.

Wordclouds

Wordclouds are implemented using the d3.layout.cloud layout by Jason Davies. They work with bags of words, and the Python Counter class is well suited to building them.
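As a minimal sketch of what a bag of words looks like, the snippet below counts the tokens of a short sentence using only the standard library. The sentence and the `\w+` tokenization rule are choices made for this illustration; the cells that follow apply a similar idea to the full text.

```python
import re
from collections import Counter

text = "To be, or not to be: that is the question."

# findall keeps only runs of word characters, so no empty tokens appear
tokens = re.findall(r"\w+", text.lower())

# the bag of words: each distinct token mapped to its number of occurrences
bag = Counter(tokens)

# most_common returns (word, count) pairs sorted by descending count
top = bag.most_common(2)
print(top)
```

A Counter built this way can be turned into (word, frequency) records, which is the shape the wordcloud expects.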

In [2]:
hamlet = requests.get('http://www.gutenberg.org/cache/epub/2265/pg2265.txt').text
hamlet[0:100]
Out[2]:
u"\ufeff***The Project Gutenberg's Etext of Shakespeare's First Folio***\r\n*********************The Tragedie"
In [3]:
import re
from collections import Counter

words = re.split(r'[\W]+', hamlet.lower())
counts = Counter(words)
In [4]:
df = pd.DataFrame.from_records(counts.iteritems(), columns=['word', 'frequency'])
df.sort_values(['frequency'], ascending=False, inplace=True)
df.head()
Out[4]:
      word  frequency
995    the       1108
1877   and        920
2656    to        762
2437    of        698
4951   you        593
In [5]:
matta.wordcloud(dataframe=df.head(500), text='word', font_size='frequency',
                typeface='Helvetica', font_weight='bold',
                font_color={'value': 'frequency', 'palette': 'cubehelix', 'scale': 'threshold'})
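The 'threshold' scale named in the call above maps a continuous value to one of a few discrete outputs. How matta chooses its cut points is not shown in this notebook, so the sketch below only illustrates the general idea behind a threshold scale as found in d3.js. The function name, the cut points, and the color names are all hypothetical.

```python
from bisect import bisect_right

def threshold_scale(thresholds, outputs):
    """Return a function mapping a number to one of len(thresholds) + 1 outputs."""
    if len(outputs) != len(thresholds) + 1:
        raise ValueError("need exactly one more output than thresholds")
    cuts = sorted(thresholds)

    def scale(value):
        # bisect_right counts how many cut points are <= value
        return outputs[bisect_right(cuts, value)]
    return scale

# hypothetical cut points and color names, chosen only for illustration
color_for = threshold_scale([100, 500], ["lightgrey", "grey", "black"])
print(color_for(42))
print(color_for(1108))
```

With this convention a value equal to a cut point falls into the upper bin.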

Treemaps

Treemaps use the Treemap Layout from d3.js. They work with trees, which we construct through networkx.DiGraph.

In [6]:
flare_data = requests.get('https://gist.githubusercontent.com/mbostock/4063582/raw/a05a94858375bd0ae023f6950a2b13fac5127637/flare.json').json()
In [7]:
flare_data['name']
Out[7]:
u'flare'
In [8]:
tree = nx.DiGraph()

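def add_node(node):
    # ids are assigned in visiting order, starting at 1
    node_id = tree.number_of_nodes() + 1
    tree.add_node(node_id, name=node['name'])

    # not every node has a size attribute
    if 'size' in node:
        tree.node[node_id]['size'] = node['size']

    # recurse into the children and link each one to its parent
    if 'children' in node:
        for child in node['children']:
            child_id = add_node(child)
            tree.add_edge(node_id, child_id)

    return node_id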

root = add_node(flare_data)
# treemap requires this attribute
tree.graph['root'] = root
In [9]:
nx.is_arborescence(tree)
Out[9]:
True
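For readers unfamiliar with the term, an arborescence is a directed tree: one root with no parent, every other node with exactly one parent, and all nodes reachable from the root. The following is a standard-library sketch of that check, written for illustration; it is not the networkx implementation, and it treats an empty graph as not being a tree.

```python
from collections import deque

def is_arborescence(nodes, edges):
    """Check that (nodes, edges) form a directed tree with a single root."""
    nodes = list(nodes)
    if not nodes:
        return False
    in_degree = {n: 0 for n in nodes}
    children = {n: [] for n in nodes}
    for source, target in edges:
        in_degree[target] += 1
        children[source].append(target)

    roots = [n for n in nodes if in_degree[n] == 0]
    # exactly one root, and no node may have more than one parent
    if len(roots) != 1 or any(d > 1 for d in in_degree.values()):
        return False

    # every node must be reachable from the root
    seen = {roots[0]}
    queue = deque(roots)
    while queue:
        for child in children[queue.popleft()]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return len(seen) == len(nodes)

print(is_arborescence([1, 2, 3, 4], [(1, 2), (1, 3), (3, 4)]))
```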
In [10]:
import seaborn as sns
In [11]:
matta.treemap(tree=tree, node_value='size', node_label='name',
              node_color={'value': 'parent.name', 'scale': 'ordinal', 'palette': sns.husl_palette(15, l=.4, s=.9)})

Sankey

Sankey or flow diagrams use the Sankey plugin by Mike Bostock. They work with digraphs, just like treemaps. Note that graphs with loops are not supported.
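Since graphs with loops are not supported, it can help to check a graph before drawing it. The sketch below assumes that "loops" covers both self-loops and longer directed cycles, and detects either with Kahn's algorithm using only the standard library. The function name and the node names in the example are hypothetical.

```python
from collections import deque

def has_cycle(edges):
    """Return True if the directed graph given as (source, target) pairs has a cycle."""
    successors = {}
    in_degree = {}
    for source, target in edges:
        successors.setdefault(source, []).append(target)
        successors.setdefault(target, [])
        in_degree[target] = in_degree.get(target, 0) + 1
        in_degree.setdefault(source, 0)

    # Kahn's algorithm: repeatedly remove nodes that have no incoming links
    queue = deque(n for n, d in in_degree.items() if d == 0)
    removed = 0
    while queue:
        node = queue.popleft()
        removed += 1
        for nxt in successors[node]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                queue.append(nxt)
    # nodes on or downstream of a cycle never reach in-degree 0
    return removed != len(in_degree)

print(has_cycle([("coal", "electricity"), ("electricity", "heating")]))
```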

In [12]:
sankey_data = requests.get('http://bost.ocks.org/mike/sankey/energy.json')
In [13]:
sankey_graph = json_graph.node_link_graph(json.loads(sankey_data.text), directed=True)
In [14]:
sankey_graph.nodes_iter(data=True).next(), sankey_graph.edges_iter(data=True).next()
Out[14]:
((0, {u'name': u"Agricultural 'waste'"}), (0, 1, {u'value': 124.729}))
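The output above suggests the shape of the node-link format: a list of nodes identified by position, and a list of links that refer to those positions. The sketch below parses a tiny hand-written document of that shape with the standard library. Only the first node name and the link value come from the output above; the second node name is made up, and the code is an illustration rather than what json_graph does internally.

```python
import json

raw = '''
{"nodes": [{"name": "Agricultural 'waste'"}, {"name": "hypothetical sink"}],
 "links": [{"source": 0, "target": 1, "value": 124.729}]}
'''
data = json.loads(raw)

# nodes are identified by their position in the "nodes" list
nodes = list(enumerate(data["nodes"]))

# each link becomes a (source, target, attributes) triple
edges = [(link["source"], link["target"], {"value": link["value"]})
         for link in data["links"]]

print(nodes[0])
print(edges[0])
```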
In [22]:
matta.flow(graph=sankey_graph, node_label='name', link_weight='value', node_color='indigo',
           node_width=12, node_padding=13,
           link_color={'value': 'value', 'palette': 'Greys', 'scale': 'threshold'}, link_opacity=0.8)

Parallel Coordinates

Parallel Coordinates are based on the code by Jason Davies. They work with pandas.DataFrame.

In [23]:
df = pd.read_csv('http://bl.ocks.org/jasondavies/raw/1341281/cars.csv', index_col='name')
df.head()
Out[23]:
                         economy (mpg)  cylinders  displacement (cc)  power (hp)  weight (lb)  0-60 mph (s)  year
name
AMC Ambassador Brougham           13.0          8                360         175         3821          11.0    73
AMC Ambassador DPL                15.0          8                390         190         3850           8.5    70
AMC Ambassador SST                17.0          8                304         150         3672          11.5    72
AMC Concord DL 6                  20.2          6                232          90         3265          18.2    79
AMC Concord DL                    18.1          6                258         120         3410          15.1    78
In [24]:
matta.parcoords(dataframe=df)
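In a parallel coordinates plot each column gets its own vertical axis, and each row becomes a line crossing every axis at the position of its value. As a rough sketch of that idea, and not of matta's implementation, the snippet below scales each column independently to the range [0, 1]. The function name is hypothetical; the two rows are taken from the table above, restricted to three of its columns.

```python
def axis_positions(rows, columns):
    """Scale each column independently to [0, 1], one value per axis."""
    lows = {c: min(r[c] for r in rows) for c in columns}
    highs = {c: max(r[c] for r in rows) for c in columns}
    scaled = []
    for r in rows:
        scaled.append([
            # a constant column would divide by zero, so place it mid-axis
            0.5 if highs[c] == lows[c]
            else (r[c] - lows[c]) / float(highs[c] - lows[c])
            for c in columns
        ])
    return scaled

cars = [
    {"cylinders": 8, "power (hp)": 175, "year": 73},
    {"cylinders": 6, "power (hp)": 90, "year": 79},
]
positions = axis_positions(cars, ["cylinders", "power (hp)", "year"])
print(positions)
```

Because every axis is scaled on its own, columns with very different units can share one plot.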

Parallel Sets

In [25]:
df = pd.read_csv('https://www.jasondavies.com/parallel-sets/titanic.csv')
df.head()
Out[25]:
          Class    Age     Sex  Survived
0  Second Class  Child  Female  Survived
1  Second Class  Child  Female  Survived
2  Second Class  Child  Female  Survived
3  Second Class  Child  Female  Survived
4  Second Class  Child  Female  Survived
In [27]:
matta.parsets(dataframe=df, columns=['Survived', 'Sex', 'Age', 'Class'])
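Parallel sets show how many records share each combination of categories across the chosen dimensions. The sketch below illustrates that counting step with the standard library, assuming the order of the dimensions determines how the combinations are nested. The records and the "Perished" label are made up for the example and are not taken from the Titanic file; the function name is hypothetical.

```python
from collections import Counter

# a few made-up records in the same shape as the table; not the real data
rows = [
    {"Survived": "Survived", "Sex": "Female", "Age": "Child"},
    {"Survived": "Survived", "Sex": "Female", "Age": "Adult"},
    {"Survived": "Perished", "Sex": "Male", "Age": "Adult"},
    {"Survived": "Survived", "Sex": "Female", "Age": "Child"},
]
dimensions = ["Survived", "Sex", "Age"]

def ribbon_counts(rows, dimensions, depth):
    """Count rows sharing the same values on the first `depth` dimensions."""
    return Counter(tuple(r[d] for d in dimensions[:depth]) for r in rows)

two_levels = ribbon_counts(rows, dimensions, 2)
three_levels = ribbon_counts(rows, dimensions, 3)
print(two_levels)
```

Each count would correspond to the width of one ribbon between two adjacent dimensions.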

Graph

Graphs built with networkx are visualized using the Force Layout in d3.js. The example below uses an undirected networkx.Graph.

In [28]:
graph = nx.davis_southern_women_graph()
In [29]:
for node_id, attrs in graph.nodes_iter(data=True):
    # color by bipartite set, size by number of connections
    graph.node[node_id]['color'] = 'purple' if attrs['bipartite'] else 'green'
    graph.node[node_id]['size'] = graph.degree(node_id)
In [30]:
matta.force(graph=graph, link_distance=100, height=600,
            node_ratio='size',
            node_color={'value': 'bipartite', 'scale': 'ordinal', 'palette': 'Set2'})