Data Visualization With D3

This lab is adapted from the fantastic D3 Tutorial by Scott Murray. We’ll work through a quick version of his tutorial, and then build a simple interactive visualization dashboard for a sample dataset. If you're interested in a more thorough introduction to D3, take a look at his book Interactive Data Visualization for the Web, which is available for free online. Additionally, if you're thinking of working with D3 for your project, have a look at this fantastic gallery for some really good starting points!

We're going to use this lab to learn and practice D3 fundamentals, so that the core concepts are solid before we move on to fancier visualizations.

Don't forget to fill out the response form!

Basics

  • HTML - By now, hopefully you’ve heard about HTML. HTML, invented by lazy physicists, is the language of the web, and we’ll use it for laying out and displaying documents - in this case, our visualizations.
  • CSS - In modern webpages, HTML is used for layout of elements, while CSS is used to style the visual presentation of HTML, doing things like setting fonts and line widths, etc. We'll make heavy use of CSS to style our visualizations.
  • DOM - The DOM (Document Object Model) is an object model used to represent HTML documents as well as documents in other markup formats. A DOM can be thought of as a tree of nodes, where each node represents an element in the document, with the tree rooted at the document itself. A visualization of the DOM for a simple table looks like this:

  • JavaScript - A client-side scripting language embedded in most web browsers. It integrates closely with the current page's DOM and can manipulate it.

  • SVG - Scalable Vector Graphics, a format for describing vector graphics in XML. It is very low level: it describes basic shapes. A green rectangle with a black border in SVG looks like this:

      <svg xmlns="http://www.w3.org/2000/svg" version="1.1" width="200" height="200">
          <rect width="150" height="150" fill="rgb(0, 255, 0)" stroke-width="1" stroke="rgb(0, 0, 0)" transform="translate(5,5)"/>
      </svg>
    
    

    Which, rendered by your web browser, looks like this:

    Since it’s XML - we can load it as a DOM! And we can manipulate it with Javascript!

  • D3 - A JavaScript library for generating visualizations. It works with things other than SVG, but SVG lets us do things we can’t do with raster images, including interactivity.

The key idea behind D3 is that we can programmatically manipulate SVG elements in a webpage to create great visualizations, and with the help of a little javascript, we can make them interactive.

Setup

Since an IPython notebook is just a webpage, we can actually modify elements inside it while it's running. Let's take a look at what we mean - create a new HTML element in the middle of this webpage by running the following snippet.

In [2]:
%%html
<div id="d3-lab"></div>

What did we just do? We created an empty element (a div) with the ID 'd3-lab' in the middle of this very web page.

Hmm... so what can we do with it? Well, let's execute some JavaScript inline as well.

In [3]:
%%javascript

require.config({paths: {d3: "/files/d3.v3.min"}});

require(["d3"], function(d3) {
  d3.select("#d3-lab").append("b").text("My insanely cool visualization.");
});

You should see some text right beneath your html block above. OK! This is a building block for creating visualizations.

Using D3

Ignoring the boilerplate - let's decompose the line that really matters - d3.select("#d3-lab").append("b").text("My insanely cool visualization."); What is this thing doing?

First - d3 is an object which contains a number of methods. One of them is select which allows us to select DOM elements.

One of the key things in D3 is that almost all methods return a selection. Thus, we can chain together any methods that are available on a selection, one after another.

In this example, once we've selected the element with the ID we care about (d3-lab), we append a b element to it (b is bold in HTML). The append method returns another selection - namely, the element it just created. We then chain the text method onto the append, which inserts text inside that new selection.
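
To make the chaining idea concrete, here is a minimal sketch in plain JavaScript. It is not D3's implementation: the makeSelection helper and the node objects it wraps are hypothetical stand-ins for D3 selections and DOM elements. The only point is that append hands back a new selection for the child it created, while text hands back the selection it was called on.

```javascript
// Hypothetical stand-in for a D3 selection; real selections wrap DOM elements.
function makeSelection(node) {
  return {
    node: node,
    // append creates a child and returns a NEW selection wrapping that child.
    append: function (tag) {
      var child = { tag: tag, text: "", children: [] };
      node.children.push(child);
      return makeSelection(child);
    },
    // text sets the content and returns the SAME selection, so chaining can continue.
    text: function (value) {
      node.text = value;
      return this;
    }
  };
}

var root = { tag: "div", text: "", children: [] };
var bold = makeSelection(root).append("b").text("My insanely cool visualization.");
console.log(root.children[0].tag, "-", bold.node.text);
```

Note that the text ends up on the b child, not on the div itself, because text was chained onto the selection returned by append.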

We now have the building blocks we need to build real visualizations with D3.

Binding to Data and Creating Visual Elements

One of the central concepts in D3 is binding data to visual elements. This comes from the concept that a visualization is a transformation of data elements to some visual representation of that data. This might be transforming a bunch of numbers into a bar chart, or using a statistical function to extract important words from Shakespeare's text and projecting them onto blades of a fancy chandelier above a bar in a museum.

First, let's clear up some terminology:

  • Data Elements - Think of these as a set of data points. They can be individual numbers, records in a data frame, or some other kind of data.
  • DOM Nodes - These are a set of items in the DOM of the canvas that we're manipulating. For example, a bar chart might contain several rect elements - one for each bar.

Binding Data Elements to DOM Nodes is the process of associating Data Elements with a selection of DOM Nodes in a one-to-one mapping. You can think of this like a join in a database system.

Let's look at a concrete example.

In [4]:
%%html
<div id="d3-bind"></div>
In [5]:
%%javascript

require.config({paths: {d3: "/files/d3.v3.min"}});

require(["d3"], function(d3) {
  //Setup local variables.
  var w = 500;
  var h = 100;
  var barPadding = 1;
    
  //Set up our dataset.
  var dataset = [15,12,21,42,12,10,1];
    
  //Create an svg element that's w x h in size.
  var svg = d3.select("#d3-bind")
    .append("svg")
    .attr("width", w)
    .attr("height", h);
    
  //Bind our data to SVG and create a rectangle for each one
  //This is where the magic happens!
  var bars = svg.selectAll("rect")
    .data(dataset)
    .enter()
    .append("rect");
    
  //For each bar, set its attributes as a function of its position
  //in the dataset and its value.
  bars.attr("x", function(d, i) { return i * (w / dataset.length); })
    .attr("y", function(d) { return h - (d * 4); })
    .attr("width", w / dataset.length - barPadding)
    .attr("height", function(d) { return d * 4; })
    .attr("fill", "teal");
});

Alright, this function is a bit more substantial. What did we do here?

First, we set up some local variables - width and height of the SVG element that we're going to manipulate. Then - we create our dataset. Here, it's just a list of numbers.

Next, we bind our SVG to our dataset - more specifically, we bind the rect elements inside our SVG to the elements of our dataset. But we don't actually have any rect elements in our SVG yet, so won't that selection be empty? This is where the mysterious enter function comes in. The selection it returns acts as a placeholder for every data item that doesn't have an element yet, and the append that follows creates a rect for each of them. This picture from the D3 paper you read for class summarizes what this function does pretty well.

That is, we can think of the binding process as a database full outer join. enter gives us the items in the result where the data is not yet bound to an element, update gives us the items where the data is bound to an element, and exit gives us the elements that are no longer bound to any data. Another way to think of this is as the Constructor/Update/Destructor pattern from Object Oriented Programming: the first time a data item comes in, we run our enter code on it, and when it is deleted we run our exit code.
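
The join described above can be sketched in plain JavaScript. This is an illustration of the idea, not D3's code: the join function is a hypothetical helper, and it matches data to nodes by index, which is what D3 does when no key function is supplied.

```javascript
// Plain-JavaScript sketch of an index-based data join (not D3's implementation).
// `nodes` stands in for existing DOM elements, `data` for the new dataset.
function join(nodes, data) {
  var result = { enter: [], update: [], exit: [] };
  var n = Math.max(nodes.length, data.length);
  for (var i = 0; i < n; i++) {
    if (i < data.length && i < nodes.length) {
      result.update.push({ node: nodes[i], datum: data[i] }); // both exist
    } else if (i < data.length) {
      result.enter.push({ datum: data[i] });                  // data without a node
    } else {
      result.exit.push({ node: nodes[i] });                   // node without data
    }
  }
  return result;
}

// First render: no rects exist yet, so every datum lands in enter.
var first = join([], [15, 12, 21]);
// Later render with fewer data points: two updates, one exit.
var later = join(["rect0", "rect1", "rect2"], [7, 8]);
console.log(first.enter.length, later.update.length, later.exit.length); // prints 3 2 1
```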

Finally, we set some visual attributes of each rect item in our dataset. These attributes: x and y position of each bar, height and width of each bar, and fill color - are defined either as values (as in the case of width and fill), functions of each data item (as in the case of height and y), or functions of the data item and its index in the dataset (as in the case of x). D3 figures out how to set the attribute based on the type of the second argument you pass to the attr method.
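
As a rough sketch of that last point, the hypothetical resolveAttr helper below mimics how a constant and a function are treated differently. It is plain JavaScript written for illustration and is not how D3 is implemented internally.

```javascript
// Hypothetical helper mimicking how attr treats its second argument.
function resolveAttr(value, data) {
  return data.map(function (d, i) {
    // A function is called once per element with the datum and its index;
    // anything else is used as a constant for every element.
    return typeof value === "function" ? value(d, i) : value;
  });
}

var sample = [15, 12, 21];
var fills = resolveAttr("teal", sample);                                // constant
var heights = resolveAttr(function (d) { return d * 4; }, sample);      // function of the datum
var xs = resolveAttr(function (d, i) { return i * 100; }, sample);      // function of the index
console.log(fills, heights, xs);
```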

DIY

  1. Execute the code above again - you should see another copy of the bar chart. Why?
  2. Now try changing the color to something other than teal - generate a new chart.
  3. Now make the bars 1/10th as wide with the same width between them (Hint: you'll need to modify the width and x attribute definitions).

Scales and Axes

At this point, you've seen how to manipulate D3 selections, bind data to SVG elements, and update visual attributes based on our data. So far the result is a simple bar chart, but we're reasoning about everything in pixel space, and everything seems pretty low level.

Fortunately, D3 provides a bunch of tools to help us plot data on different scales, with different axes, or with a different layout. We'll have a look at each of these now.

Scales

"Scales are functions that map from an input domain to an output range." - Mike Bostock

You can think about scales as the things that let you load up data and manipulate it in one unit - inches, feet, tons - and translate it into some desired output unit (often pixels on the screen of our web browser).

D3 provides a set of functionality to create these functions. Let's take a look at a simple linear scale; D3 also contains other kinds of scale - logarithmic, power, quantile, etc.

A scale can be constructed with code that looks like the following:

var scale = d3.scale.linear().domain([x1,x2]).range([y1,y2]);

After executing this code, the value of scale is a function which translates values from the domain [x1,x2] to the range [y1,y2] with a linear transformation.
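
To see what that linear transformation amounts to, here is a minimal plain-JavaScript sketch. The linearScale function and the numbers passed to it are assumptions made for illustration; this is not D3's code, and it skips features such as clamping.

```javascript
// Plain-JavaScript sketch of what a linear scale computes (not D3's code).
function linearScale(domain, range) {
  return function (x) {
    var t = (x - domain[0]) / (domain[1] - domain[0]); // fraction of the way through the domain
    return range[0] + t * (range[1] - range[0]);        // same fraction of the way through the range
  };
}

// Hypothetical example: data values 0..100 mapped onto pixels 50..450.
var scale = linearScale([0, 100], [50, 450]);
console.log(scale(0), scale(50), scale(100)); // prints 50 250 450

// Reversing the range flips the direction, which is handy because
// SVG's y coordinate grows downward.
var flipped = linearScale([0, 100], [450, 50]);
console.log(flipped(25)); // prints 350
```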

In the code below, we've created a few example scales - designed to help with sizing of the bars, as well as positioning them - these scales take care of adding in some padding on either side of the chart.

DIY

  1. In the last exercise, you compressed the x-scale manually. Now do it simply by manipulating the x and width scales below. You should be able to make this change with just a few characters in the definitions of xScale and widthScale.
  2. Now try making the y-scale a log scale. This change will require a few more changes than the last one, but should be contained in definitions of the heightScale and yScale variables. (Hint: you'll also need to change the left end of the domain.)
In [336]:
%%html
<div id="d3-scale"></div>
In [266]:
%%javascript

require.config({paths: {d3: "/files/d3.v3.min"}});

require(["d3"], function(d3) {
  //Setup local variables.
  var w = 500;
  var h = 400;
  var barPadding = 20;
  var padding = 50;
    
  //Set up our dataset.
  var dataset = [15,12,21,42,12,10,5,85];
    
  //Define your scales here
  var xScale = d3.scale.linear()
    .domain([0, dataset.length])
    .range([padding,(w-padding)]); 
    
  var widthScale = d3.scale.linear()
    .domain([0, dataset.length])
    .range([0, (w-2*padding)]);  
    
  var yScale = d3.scale.linear()
    .domain([0, d3.max(dataset)])
    .range([h-padding,padding]);

  var heightScale = d3.scale.linear()
    .domain([0, d3.max(dataset)])
    .range([0,h-2*padding]);
    
  //Create an svg element that's w x h in size.
  var svg = d3.select("#d3-scale")
    .append("svg")
    .attr("width", w)
    .attr("height", h);
    
  //Bind our data to SVG and create a rectangle for each one
  //This is where the magic happens!
  var bars = svg.selectAll("rect")
    .data(dataset)
    .enter()
    .append("rect");
    
  //For each bar, set its attributes as a function of its position
  //in the dataset and its value.
  bars.attr("x", function(d, i) { return xScale(i); })
    .attr("y", function(d) { return yScale(d); })
    .attr("width", widthScale(0.8))
    .attr("height", function(d) { return heightScale(d); })
    .attr("fill", "teal");
});

Linear scales may seem trivial - but constructing these translations automatically can be super powerful when you're working on more complicated visualizations.

Axes

Axes are functions which generate the visual elements of an axis - the x-axis, y-axis, gridlines, etc. that you see in many visualizations.

In order to create an axis, we should have a scale for that axis. Luckily, we've already defined a few.

Creating a new axis looks something like this:

var yAxis = d3.svg.axis()
    .scale(yScale)
    .orient("left")
    .ticks(5);

svg.append("g")
    .attr("class", "axis")
    .attr("transform", "translate(" + xScale(0) + ",0)")
    .call(yAxis);

In that code, we first create a new axis and assign it a scale (in this case, yScale). We then orient it to the left, so that the ticks and labels are drawn on the left side of the axis line, and ask for roughly 5 ticks (D3 treats the count as a hint and picks evenly spaced, round values). Finally, we append a g element to the svg we've already selected, translate it to the right place, and draw the axis into it. The call in the last bit of code reflects the fact that yAxis is a function: when called on a selection, it generates the visual elements of the axis inside it.

DIY

  1. Add an axis to your visualization above by adding this code to your javascript.
  2. With the same basic code - add an x axis. It can be aligned to the top or the bottom of the chart - but make sure the number of ticks makes sense, and that the xAxis gets translated to the right place.

We could style this axis by adding some CSS to the page that would control the line width, what the tick marks look like, etc.

Updates, Transitions, and Motion

So far, we've looked at static visualizations. But the web is an incredibly dynamic medium, and we've already seen that we can change it. D3 provides some nice tools for describing what to do when data changes, and ways to control movement in our visualizations - let's make our charts move!

Updates

In this section, we'll work with our original bar chart again, but with an additional element inserted to act like a button.

In [335]:
%%html
<div id="d3-update"></div>
In [334]:
%%javascript

require.config({paths: {d3: "/files/d3.v3.min"}});

require(["d3"], function(d3) {
  //Setup local variables.
  var w = 500;
  var h = 400;
  var barPadding = 20;
  var padding = 50;
    
  //Set up our dataset.
  var dataset = [ 15, 12, 21, 42, 12, 10, 5, 85 ];
    
  //Define your scales here
  var xScale = d3.scale.linear()
    .domain([0, dataset.length])
    .range([padding,(w-padding)]); 
    
  var widthScale = d3.scale.linear()
    .domain([0, dataset.length])
    .range([0, (w-2*padding)]);  
    
  var yScale = d3.scale.linear()
    .domain([0, d3.max(dataset)])
    .range([h-padding,padding]);

  var heightScale = d3.scale.linear()
    .domain([0, d3.max(dataset)])
    .range([0,h-2*padding]);
    
  //Clear out old content, then create an svg element that's w x h in size.
  d3.select("#d3-update").selectAll("*").remove();  
    
  var svg = d3.select("#d3-update")
    .append("svg")
    .attr("width", w)
    .attr("height", h);
    
  //Bind our data to SVG and create a rectangle for each one
  //This is where the magic happens!
  var bars = svg.selectAll("rect")
    .data(dataset)
    .enter()
    .append("rect");
    
  //For each bar, set its attributes as a function of its position
  //in the dataset and its value.
  bars.attr("x", function(d, i) { return xScale(i); })
    .attr("y", function(d) { return yScale(d); })
    .attr("width", widthScale(0.8))
    .attr("height", function(d) { return heightScale(d); })
    .attr("fill", "teal");
    
  //Add a text element to our div
  d3.select("#d3-update").append("p").text("Click here to update our data.");
  
  //On click, update with new data.
  d3.select("#d3-update").select("p")
    .on("click", function() {

        //New values for dataset
        dataset = [ 11, 12, 62, 20, 18, 17, 16, 18 ];

        //Update all rects
        svg.selectAll("rect")
           .data(dataset)
           .attr("y", function(d) {
                return yScale(d);
           })
           .attr("height", function(d) {
                return heightScale(d);
           });

    });
});

What did we do here? We simply added a new paragraph element to the end of our div, then we bound a javascript listener to that paragraph.

When we bind a listener to the element, it means that we specify a function to be executed when the listener is triggered - in this case when the text in the paragraph element is clicked.

In this function, we define a new dataset and then bind it to all of the rect elements in our svg. That is, we overwrite the old data values with the new ones, and then update the y and height attributes of the bars to match.

Transitions

Ok, so we can make the data change, but that change was kind of abrupt.

D3 has some black magic built in that lets us animate the data.

Try adding the following two lines after ".data(dataset)" in the "Update all rects" block.

    .transition()
    .duration(2000)

Did you see that? Two extremely simple lines of code and D3 animated our chart for us.

What's more - you can change the functions it uses to animate. Try adding:

    .ease("bounce")

below duration(2000).

D3 provides several easing functions, and transitions can animate not just shape changes but color changes, too.
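
Under the hood, a transition boils down to interpolating each attribute between its old and new value as time passes, with the easing function bending the pace. The sketch below shows that idea in plain JavaScript. The valueAt, linear, and quadInOut names are made up for this illustration and are not D3's API; D3's own easing functions differ in detail.

```javascript
// Plain-JavaScript sketch of an eased transition (not D3's implementation).
function valueAt(start, end, elapsed, duration, ease) {
  var t = Math.min(1, Math.max(0, elapsed / duration)); // normalized time in [0, 1]
  return start + (end - start) * ease(t);
}

// Two illustrative easing functions; the names are for this sketch only.
function linear(t) { return t; }
function quadInOut(t) { return t < 0.5 ? 2 * t * t : 1 - 2 * (1 - t) * (1 - t); }

// A bar growing from height 60 to height 168 over 2000 ms, sampled at 500 ms.
console.log(valueAt(60, 168, 500, 2000, linear));    // prints 87
console.log(valueAt(60, 168, 500, 2000, quadInOut)); // prints 73.5
```

Both versions start at 60 and end at 168; the easing function only changes how quickly the value moves in between.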

DIY

  1. Try a different argument to ease. Available options include "linear", "circle", and "elastic". Can you describe how these are different from "bounce"?
  2. D3 also lets us introduce delays, which can be set on a per-element basis. Add .delay(function (d,i) { return i*200; }) beneath transition above. What happens?

Interactivity

Now that we know how to make static visualizations, and we have an understanding about how we can handle updates and changes to our charts - let's put that knowledge together to make interactive visualizations.

Let's make a final version of our chart in which we'll add some interactivity. The effect we're creating is the following:

  1. When a bar is moused over, we're going to make it "red".
  2. When a bar is moused over, we're going to display a tooltip with the current data value next to the bar.
  3. When the mouse moves off a bar, the tooltip should be hidden.

We've actually already seen a little bit of interactivity - when we clicked the paragraph text the chart updated, and we're going to use a very similar trick to handle the mouse interaction.

In [302]:
%%html
<div id="d3-interactive"></div>
In [330]:
%%javascript

require.config({paths: {d3: "/files/d3.v3.min"}});

require(["d3"], function(d3) {
  //Setup local variables.
  var w = 500;
  var h = 400;
  var barPadding = 20;
  var padding = 50;
    
  //Set up our dataset.
  var dataset = [ 15, 12, 21, 42, 12, 10, 5, 85 ];
    
  //Define your scales here
  var xScale = d3.scale.linear()
    .domain([0, dataset.length])
    .range([padding,(w-padding)]); 
    
  var widthScale = d3.scale.linear()
    .domain([0, dataset.length])
    .range([0, (w-2*padding)]);  
    
  var yScale = d3.scale.linear()
    .domain([0, d3.max(dataset)])
    .range([h-padding,padding]);

  var heightScale = d3.scale.linear()
    .domain([0, d3.max(dataset)])
    .range([0,h-2*padding]);
    
  //Clear out old content, then create an svg element that's w x h in size.
  d3.select("#d3-interactive").selectAll("*").remove();  
    
  var svg = d3.select("#d3-interactive")
    .append("svg")
    .attr("width", w)
    .attr("height", h);
    
  //Bind our data to SVG and create a rectangle for each one
  //This is where the magic happens!
  var bars = svg.selectAll("rect")
    .data(dataset)
    .enter()
    .append("rect");
    
  //For each bar, set its attributes as a function of its position
  //in the dataset and its value.
  bars.attr("x", function(d, i) { return xScale(i); })
    .attr("y", function(d) { return yScale(d); })
    .attr("width", widthScale(0.8))
    .attr("height", function(d) { return heightScale(d); })
    .attr("fill", "teal");
    
  //Create a tooltip.
  var tip = d3.select("#d3-interactive")
    .append("div")
    .attr("id", "tooltip");
    
  tip.append("p")
    .attr("id", "value")
    .style("text-align", "center");
  
  //Style the tooltip, and keep it hidden until a bar is moused over.
  tip.style("width", "30px")
    .style("background-color", "white")
    .style("position", "absolute")
    .style("border", "2px solid")
    .style("display", "none");
  
  
  
  //Your code for the DIY goes here.
});

The code should look pretty familiar - with one catch - we've added a visual element called "tooltip" that is hidden.

Making The Right Selection

We can bind events to a set of elements by making a selection, which we've already seen before.

DIY

  1. Beneath the code above, select all of the rect objects in the SVG.

Handling the mouse ins

The code for handling mouse ins is not too difficult. It should look something like this:

.on("mouseover", function(d,i) {
  tip
    .style("left", (padding+xScale(i))+"px")
    .style("top", yScale(d/2)+"px")
    .style("display", null)
    .select("#value")
    .text(d);

  d3.select(this).attr("fill", "red");
})


Basically, we make the tooltip visible by setting its "display" style to null (which removes any inline "display" value, so the element falls back to its default, visible display), update its horizontal and vertical position and its text, and set the color of the current bar to "red".

DIY

  1. Chain your selection with this code to handle mouse-ins on your selection above.

Handling the mouse outs

With the mouse outs, we'll simply hide the tooltip and then make the bar teal again. The event for a mouseout is called "mouseout".

DIY

  1. Update the code you just wrote to handle the mouseouts. To hide the tooltip, set its "display" style to "none". Don't forget your semicolon!

What else can D3 do for me?

So, we've seen some of the features that D3 provides out of the box, but there's much more - here are a few other things D3 can help you do.

Layouts

Contrary to what the name implies, a layout does not actually handle how visual elements are laid out on the screen. Instead, layouts can be thought of as helper functions that take your input data and transform it into new data that is easier for certain visualizations to work with.

Example layouts include functions to translate your data (bar height) into information for a pie chart (the start and end angle of each wedge), a layout for stacked bar charts, a layout for graph data (nodes and edges), and many cartographic layouts for drawing maps.
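
As an illustration of the "data in, easier data out" idea, here is a plain-JavaScript sketch of the kind of transformation a pie layout performs. The pieAngles function is hypothetical and simplified: it works in radians, keeps the input order, and is not D3's own pie layout.

```javascript
// Plain-JavaScript sketch of what a pie layout computes (not D3's code).
function pieAngles(values) {
  var total = values.reduce(function (sum, v) { return sum + v; }, 0);
  var angle = 0;
  return values.map(function (v) {
    var start = angle;
    angle += (v / total) * 2 * Math.PI; // wedge width proportional to the value
    return { value: v, startAngle: start, endAngle: angle };
  });
}

// Hypothetical data: the last value gets half of the circle.
var wedges = pieAngles([1, 1, 2]);
wedges.forEach(function (w) {
  console.log(w.value, w.startAngle.toFixed(2), w.endAngle.toFixed(2));
});
```

The output is still just data. Drawing the wedges is a separate step that uses these angles.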

Utilities

D3 also includes tools to load up CSV and JSON files from a URL. In this case, your data becomes a record per row of your file, rather than a list of numbers. Handling this is slightly more complicated, but it's very similar to what we did above.
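
To show what "a record per row" means, the sketch below turns CSV text into an array of objects in plain JavaScript. The parseCsv function and the sample text are assumptions made for illustration. It does not call D3's loaders, and it ignores quoted fields and other CSV subtleties.

```javascript
// Simplistic CSV-to-records sketch; it does not handle quoted fields or escaped commas.
function parseCsv(text) {
  var lines = text.trim().split("\n");
  var header = lines[0].split(",");
  return lines.slice(1).map(function (line) {
    var cells = line.split(",");
    var record = {};
    header.forEach(function (name, i) { record[name] = cells[i]; });
    return record;
  });
}

// Hypothetical file contents.
var rows = parseCsv("name,value\na,15\nb,12\n");
// Values arrive as strings, so convert them before using them as numbers.
var values = rows.map(function (r) { return +r.value; });
console.log(rows[0].name, values);
```

With records like these, accessor functions read a field, for example function(d) { return +d.value; }, instead of using d directly.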

More Complicated Shapes

To handle shapes beyond basic circles and lines, SVG supports the concept of paths. A path is like a digital line. You start your pen on a canvas, move to some other point, and continue until you pick your pen up. Optionally you might choose to fill in the area created by your curve.

A triangle in SVG looks like this:

    <?xml version="1.0" standalone="no"?>
    <!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
    <svg width="4cm" height="4cm" viewBox="0 0 400 400"
         xmlns="http://www.w3.org/2000/svg" version="1.1">
      <title>Example triangle01- simple example of a 'path'</title>
      <desc>A path that draws a triangle</desc>
      <rect x="1" y="1" width="398" height="398"
            fill="none" stroke="blue" />
      <path d="M 100 100 L 300 100 L 200 300 z"
            fill="red" stroke="blue" stroke-width="3" />
    </svg>

As you can imagine, writing out complicated curves like this by hand would be tricky, so D3 provides a line generator (d3.svg.line in the version used here) that builds the path description from your data for you.
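
The core of that job, turning a list of points into the d attribute of a path, can be sketched in a few lines of plain JavaScript. The pathFromPoints function is a hypothetical helper for illustration and only handles straight segments; it is not D3's line generator.

```javascript
// Plain-JavaScript sketch of building a path "d" string from points
// (straight segments only; not D3's line generator).
function pathFromPoints(points, close) {
  var d = points.map(function (p, i) {
    // Move to the first point, then draw lines to the rest.
    return (i === 0 ? "M" : "L") + p[0] + "," + p[1];
  }).join("");
  return close ? d + "Z" : d;
}

// The triangle from the SVG example above.
var d = pathFromPoints([[100, 100], [300, 100], [200, 300]], true);
console.log(d); // prints M100,100L300,100L200,300Z
```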

Moving Up The Stack

Ok - so D3 is pretty cool, but manipulating shapes and positioning them on the canvas just so feels pretty low-level. You might be asking yourself "Aren't there libraries like Matplotlib where I can say, 'here's my data, give me a bar chart.'?" The answer is, YES!

Some options:

  1. Matplotlib - There is a backend for Matplotlib, mpld3, which converts Matplotlib Figure objects to D3 visualizations. It has IPython notebook integration built in.
  2. Vega by Trifacta - A declarative layer for designing visualizations, built on top of D3. It follows Wilkinson's Grammar of Graphics. There is an early Python wrapper called Vincent that lets you generate visualizations from Python. The Grammar of Graphics is what allows R to build sophisticated plots in readable, concise one-liners with the ggplot2 package. ggplot2's inventor, Hadley Wickham, now actively contributes to Vega.

So - try making those your first stop when designing an interactive visualization - it may be that you can create what you want in just a few lines of code, rather than spending hours fiddling with Javascript and SVG.
