matta - Introduction: Let's Scaffold a Barchart

By @carnby

Introduction

You have probably seen the barchart example by Mike Bostock. It is here. In this notebook I explain how to use matta to implement that barchart.

Why use matta?

Having an example of a visualization is one thing; having a reusable implementation is another. A reusable implementation is not just a specific function that draws. In my opinion, it is an entire context in which you can easily use your visualization with other datasets.

How do we do it?

In this notebook we look at the basic scaffolding matta does to reproduce the example chart from a pandas DataFrame. Because the visualization accepts a DataFrame, you can forget about converting the dataset to the specific layout the visualization designer had in mind and instead focus on converting it to a DataFrame (which will probably be very, very easy).
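
As a minimal sketch of that last point, assume the raw data arrives as a list of per-row records. The records list below is illustrative; its values are copied from the letter-frequency dataset used later in this notebook. Building a DataFrame from it takes a single call:

```python
import pandas as pd

# Illustrative input: one dict per row, values copied from the
# letter-frequency dataset.
records = [
    {"letter": "A", "frequency": 0.08167},
    {"letter": "B", "frequency": 0.01492},
    {"letter": "C", "frequency": 0.02782},
]

# pandas infers the column names from the dict keys.
df_example = pd.DataFrame(records)
print(df_example)
```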

Let's begin.

Initial Setup

Here we load matta.

If you read the README, you will notice that you can install matta's javascript and css into your IPython profile. That way you do not need to issue an init_javascript call. It is here just for demonstration. However, if you use a core matta visualization and export the notebook to NBViewer, you will need to execute it so that your visitors' browsers can load the required js/css files.

If you installed matta into your profile, then using the function will do no harm - it detects that matta was loaded and does nothing.

In [1]:
import matta
# we do this to load the required libraries when viewing on NBViewer
matta.init_javascript(path='https://rawgit.com/carnby/matta/master/matta/libs')
/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/cartography/template.css
/home/egraells/Dropbox/phd/apps/matta/matta/visualizations/parsets/template.css
Out[1]:
matta Javascript code added.

Data

Mike's example loads a TSV (Tab Separated Values) file with letter frequencies. We can load it directly into a pandas DataFrame.

In [2]:
import pandas as pd

df = pd.read_csv('http://bl.ocks.org/mbostock/raw/3885304/964f9100166627a89c7e6c23ce8128f5aefd5510/data.tsv', delimiter='\t')
df.head()
Out[2]:
letter frequency
0 A 0.08167
1 B 0.01492
2 C 0.02782
3 D 0.04253
4 E 0.12702

Sketching the Visualization

First, let's sketch the visualization by defining its options and code.

The visualization options or arguments are contained in a dictionary. Note that the dictionary contains a subdictionary named variables. Those variables will be exposed as methods of the scaffolded visualization, and are available in code as _variable_name.
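
To make the naming convention concrete, here is a small sketch in plain Python. It is not matta's implementation: the helper template_names is hypothetical and only illustrates how each key under variables corresponds to an underscore-prefixed name in the visualization code.

```python
# Hypothetical helper, not part of matta: it only illustrates the naming rule
# described above (a variable named `x` is referenced as `_x` in the code).
def template_names(config):
    return {name: "_" + name for name in config.get("variables", {})}

# A cut-down configuration with just a few variables.
example_config = {
    "variables": {"width": 960, "height": 500, "x": "x", "y": "y"},
}

for name, js_name in template_names(example_config).items():
    print(name, "->", js_name)
```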

Note also the data dictionary. It indicates that the visualization receives a pandas DataFrame. This dataframe is available internally as the _data_dataframe variable.

The visualization code is almost a copy-and-paste version of the original example. We just renamed the variables to _variable_name and used auxiliary variables such as _vis_width, which are exposed by matta.

Note that the code is not strictly javascript. Actually, the file is expected to be a jinja2 template.
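
Because the file is a jinja2 template, blocks such as {% if options.x_axis %} in the code below are resolved when the template is rendered, before any javascript runs in the browser. The sketch below uses jinja2 directly on a one-line stand-in for the template. How matta itself invokes jinja2 is not shown in this notebook, so treat this only as an illustration of the mechanism.

```python
from jinja2 import Template

# A tiny stand-in for the real template: the axis code is emitted only when
# the x_axis option is truthy.
source = "{% if options.x_axis %}var xAxis = d3.svg.axis();{% endif %}"
template = Template(source)

with_axis = template.render(options={"x_axis": True})
without_axis = template.render(options={"x_axis": False})

print(repr(with_axis))     # the javascript line is present
print(repr(without_axis))  # the javascript line is gone
```

The practical consequence is that a disabled option leaves no trace in the generated javascript, rather than being checked at runtime.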

We save this template as barchart.js, since the 'visualization_js' entry of the configuration dictionary points to it.

skeleton/__init__.py

VISUALIZATION_CONFIG = {
    'requirements': ['d3'],
    'visualization_name': 'barchart',
    'visualization_js': './barchart.js',
    'figure_id': None,
    'container_type': 'svg',
    'data': {
        'dataframe': None,
    },
    'options': {
        'background_color': None,
        'x_axis': True,
        'y_axis': True,
    },
    'variables': {
        'width': 960,
        'height': 500,
        'padding': {'left': 30, 'top': 20, 'right': 30, 'bottom': 30},
        'x': 'x',
        'y': 'y',
        'y_axis_ticks': 10,
        'color': 'steelblue',
        'y_label': None,
        'rotate_label': True,
    },
}

skeleton/template.js

var x = d3.scale.ordinal()
    .rangeRoundBands([0, _vis_width], .1);

var y = d3.scale.linear()
    .range([_vis_height, 0]);

if (_y_label == null) {
    _y_label = _y;
}

x.domain(_data_dataframe.map(function(d) { return d[_x]; }));
y.domain([0, d3.max(_data_dataframe, function(d) { return d[_y]; })]);

{% if options.x_axis %}
    var xAxis = d3.svg.axis()
        .scale(x)
        .orient("bottom");

    container.append("g")
        .attr("class", "x axis")
        .attr("transform", "translate(0," + _vis_height + ")")
        .call(xAxis);
{% endif %}

{% if options.y_axis %}
    var yAxis = d3.svg.axis()
        .scale(y)
        .orient("left");

    if (_y_axis_ticks != null) {
        yAxis.ticks(_y_axis_ticks);
    }

    var y_label = container.append("g")
        .attr("class", "y axis")
        .call(yAxis)
        .append("text");

    if (_rotate_label) {
        y_label.attr("transform", "rotate(-90)")
            .attr("y", 6)
            .attr("dy", ".71em")
            .style("text-anchor", "end");
    } else {
        y_label
            .attr("y", 6)
            .attr('x', 12)
            .attr("dy", ".71em")
            .style("text-anchor", "start");
    }

    y_label.text(_y_label);
{% endif %}

// NOTE: this is needed for the internal color scale manager.
_bar_color_update_scale_func(_data_dataframe);

var bar = container.selectAll(".bar")
    .data(_data_dataframe);

bar.enter().append('rect').classed('bar', true);

bar.exit().remove();

bar.attr("x", function(d) { return x(d[_x]); })
    .attr("width", x.rangeBand())
    .attr("y", function(d) { return y(d[_y]); })
    .attr("height", function(d) { return _vis_height - y(d[_y]); })
    .attr('fill', _bar_color);
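
The bar geometry in the last statement follows from the inverted range of the y scale: larger values map to smaller pixel coordinates, because the SVG origin is at the top left. Here is a minimal Python sketch of the same arithmetic. It assumes a drawing height of 450 pixels (an illustrative value, not one computed by matta) and the same [0, max] domain as the code above.

```python
def linear_scale(domain, pixel_range):
    """Return a function mapping domain values linearly onto pixel_range."""
    d0, d1 = domain
    r0, r1 = pixel_range
    return lambda v: r0 + (v - d0) / (d1 - d0) * (r1 - r0)

vis_height = 450  # illustrative drawing height in pixels
frequencies = [0.08167, 0.01492, 0.02782, 0.04253, 0.12702]

# Same shape as the template: domain [0, max], range [height, 0].
y = linear_scale((0, max(frequencies)), (vis_height, 0))

for f in frequencies:
    top = y(f)                 # the rect's "y" attribute
    height = vis_height - top  # the rect's "height" attribute
    print(f, round(top, 1), round(height, 1))
```

The tallest bar starts at pixel 0 and spans the full height, while a value of 0 would start at the bottom with zero height.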

This is the actual matta code to display the visualization in the notebook.

Importing the Visualization

In [3]:
barchart = matta.import_visualization('skeleton')
In [4]:
barchart(dataframe=df, x='letter', y='frequency', rotate_label=False, bar_color='purple')

Note that the keyword arguments are keys from the VISUALIZATION_CONFIG dictionary. If you use a keyword argument not present in the dictionary, an Exception will be raised.
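
The kind of check described here can be sketched in a few lines of plain Python. This is a hypothetical illustration, not matta's code: the helper check_kwargs, the sections it inspects, and the choice of ValueError are all assumptions.

```python
# Hypothetical sketch, not matta's code: collect the names declared in the
# nested sections of a configuration and reject anything else.
def check_kwargs(config, **kwargs):
    allowed = set()
    for section in ("data", "options", "variables"):
        allowed.update(config.get(section, {}))
    unknown = sorted(set(kwargs) - allowed)
    if unknown:
        # The exception type is an assumption; the text above only says
        # "an Exception".
        raise ValueError("unknown arguments: " + ", ".join(unknown))
    return kwargs

example_config = {
    "data": {"dataframe": None},
    "options": {"x_axis": True, "y_axis": True},
    "variables": {"x": "x", "y": "y", "rotate_label": True},
}

check_kwargs(example_config, x="letter", y="frequency")  # accepted

try:
    check_kwargs(example_config, colour="purple")  # not declared anywhere
except ValueError as error:
    print(error)
```

Failing early on unknown names catches typos such as colour instead of color before anything is rendered.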

The visualization configuration also has a "colorables" section, which is where bar_color comes from. In the previous chart bar_color was set to "purple", but we can also make it dynamic by specifying a source column from the dataframe, a color palette and a scale type:

In [7]:
barchart(dataframe=df, x='letter', y='frequency', rotate_label=False, 
         bar_color={'value': 'letter', 'palette': 'cubehelix', 'n_colors': df.shape[0], 'scale': 'ordinal'})
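
Conceptually, an ordinal scale assigns each distinct value of the source column its own color. The sketch below shows that idea in plain Python. The helper ordinal_colors is hypothetical, the palette values are placeholders rather than real cubehelix colors, and matta's internal color scale manager may work differently.

```python
# Hypothetical sketch of an ordinal color scale: each distinct value gets the
# next color in the palette, wrapping around if the palette is too short.
def ordinal_colors(values, palette):
    mapping = {}
    for value in values:
        if value not in mapping:
            mapping[value] = palette[len(mapping) % len(palette)]
    return mapping

letters = ["A", "B", "C", "A"]
palette = ["#1b1b3a", "#2e6f40", "#c0a060"]  # placeholder colors, not cubehelix

print(ordinal_colors(letters, palette))
```

Under this reading, passing n_colors=df.shape[0] asks for one color per row, so no two letters would need to share a color.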

Visualization Scaffolding

The next step is to scaffold a reusable visualization. Actually, the code is very similar:

In [6]:
barchart(x='letter', y='frequency').scaffold(filename='./scaffolded_barchart.js')

This creates a file named scaffolded_barchart.js, which contains a reusable visualization. All variables declared in the arguments dictionary are available as property methods. The values specified when defining the arguments or when scaffolding serve as defaults, but everything is changeable. Note that we did not specify a DataFrame this time!
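
The precedence of those defaults can be pictured as layered dictionaries. The following Python sketch is only an analogy, since the generated javascript exposes the values through methods rather than dictionaries; the runtime_values layer is an illustrative assumption.

```python
from collections import ChainMap

# Defaults declared in the arguments dictionary.
config_defaults = {"width": 960, "height": 500, "x": "x", "y": "y"}
# Values passed when scaffolding, as in barchart(x='letter', y='frequency').
scaffold_values = {"x": "letter", "y": "frequency"}
# Values a user of the scaffolded chart might set later (illustrative).
runtime_values = {"width": 600}

# ChainMap looks keys up from left to right, so the leftmost layer wins.
effective = dict(ChainMap(runtime_values, scaffold_values, config_defaults))
print(effective)
```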

Testing the Visualization

To test the visualization, we will serialize the DataFrame and then display an IFrame with the visualization using a very simple template (which we, again, copied from the original source by Mike).

matta includes a dump_data function that calls a JSON serializer under the hood. This serializer is able to handle DataFrames and other typical Python data structures.

In [ ]:
from matta import dump_data
dump_data(df, './data.json')
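
The exact output of dump_data is not shown here. Since the visualization code calls _data_dataframe.map(...) and reads d[_x] from each element, the chart expects an array of row objects. Assuming that layout, the first rows of the dataset would serialize as in this sketch, which uses only the standard json module and not matta:

```python
import json

# First rows of the letter-frequency data, one object per DataFrame row.
rows = [
    {"letter": "A", "frequency": 0.08167},
    {"letter": "B", "frequency": 0.01492},
]

serialized = json.dumps(rows)
print(serialized)

# Reading it back yields the same list of records.
assert json.loads(serialized) == rows
```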

Now let's write the HTML file:

<!DOCTYPE html>
<meta charset="utf-8">
<style>
.bar { fill: steelblue; }
.bar:hover { fill: brown; }
.axis { font: 10px sans-serif; }
.axis path, .axis line { fill: none; stroke: #000; shape-rendering: crispEdges; }
.x.axis path { display: none; }
</style>
<body>
<script src="require.js"></script>
<script>

require.config({
    shim: {
        'legend': {
            'deps': ['d3'],
            'exports': 'd3.legend'
        }
    },
    paths: {
        'matta': 'matta',
        'd3': 'd3.min',
        'legend': 'd3-legend.min',
        'barchart': 'scaffolded_barchart'
    }
});

require(['d3', 'barchart'], function(d3, matta_barchart) {
    d3.json('data.json', function (json) {
        var barchart = matta_barchart();
        d3.select('body').datum({dataframe: json}).call(barchart)
    });
});
</script>

Note that we include d3-legend because it is required by the core matta library, which is required under the hood by our barchart.

I uploaded the result to the following gist. You can see it on bl.ocks.org.

Conclusions

That's it! :)

We implemented a barchart mostly by copying and pasting. The cool thing is that we didn't have to worry about data formats, since we knew the data was a DataFrame. We also didn't have to worry about dependencies such as loading d3.js, or about making the visualization reusable, because matta does all that.

With matta you can have readymade visualizations :)