Creating Interactive Visualizations with Bokeh

Bokeh is a Python package for creating interactive, browser-based visualizations, and is well-suited for "big data" applications.

  • Bindings can be (and have been) created for other languages.

Bokeh allows users to create interactive HTML visualizations without writing any JavaScript.

Bokeh is a language-based visualization system. This allows for:

  • high-level commands for data binding, transformation, interaction
  • low-level power to deeply customize

Bokeh philosophy:

Make a smart choice when it is possible to do so automatically, and expose low-level capabilities when it is not.

How does Bokeh work?

Bokeh renders plots in the browser with BokehJS, a custom-built HTML5 Canvas library, which affords it high performance and lets it integrate with other web tools, such as Google Maps.

Bokeh plots are based on visual elements called glyphs that are bound to data objects.

Installation

Bokeh can be installed easily either via pip or conda (if using Anaconda):

pip install bokeh

conda install bokeh

A Simple Example

First we'll import the bokeh.plotting module, which defines the graphical functions and primitives.

In [1]:
import bokeh.plotting as bk

Next, we'll tell Bokeh to display its plots directly in the notebook. This causes all of the JavaScript and data to be embedded directly into the HTML of the notebook itself. (Bokeh can also output straight to HTML files, or use a server, which we'll look at later.)

In [2]:
bk.output_notebook()
Loading BokehJS ...

Next, we'll import NumPy and create some simple data.

In [3]:
import numpy as np

x = np.linspace(-6, 6, 100)
y = np.random.normal(0.3*x, 1)

Now we'll create a figure and call its circle() method to render a red circle at each of the points in x and y.

We can immediately interact with the plot:

  • Click-and-drag will pan the plot around.
  • Shift + mousewheel will zoom in and out.
In [4]:
fig = bk.figure(plot_width=500, plot_height=500)
fig.circle(x, y, color="red")
bk.show(fig)
Out[4]:

<Bokeh Notebook handle for In[4]>

Statistical Plots

Let's try plotting multiple series on the same axes.

First, we generate some data from an exponential distribution with mean $\theta=1$.

In [5]:
from scipy.stats import expon

theta = 1

measured = np.random.exponential(theta, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)

Next, create our figure, which is not displayed until we ask Bokeh to do so explicitly. We will customize the interactive toolbar as well as the background color.

In [6]:
fig = bk.figure(title="Exponential Distribution (θ=1)", tools="previewsave",
       background_fill_color="#E8DDCB")

The quad glyph displays axis-aligned rectangles with the given attributes.

In [7]:
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="#036564", line_color="#033649")
Out[7]:
<bokeh.models.renderers.GlyphRenderer at 0x113934908>
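
The way the histogram output maps onto quad rectangles can be sketched in plain Python. This is a minimal illustration, not the notebook's code: the edges and counts below are hypothetical stand-ins for what np.histogram returns, and the density normalisation is written out by hand so the arithmetic is visible.

```python
# Hypothetical bin edges and counts, standing in for np.histogram output.
edges = [0.0, 0.5, 1.0, 1.5, 2.0]   # 5 edges describe 4 bins
counts = [6, 3, 2, 1]               # observations falling in each bin

# Density normalisation: divide each count by (total count * bin width).
total = sum(counts)
widths = [hi - lo for lo, hi in zip(edges[:-1], edges[1:])]
density = [c / (total * w) for c, w in zip(counts, widths)]

# Each rectangle spans one bin: all edges but the last are the left sides,
# all edges but the first are the right sides, and the height is the density.
left, right = edges[:-1], edges[1:]
rects = list(zip(left, right, density))

# With density normalisation, the rectangle areas sum to 1.
area = sum((r - l) * top for l, r, top in rects)
```

This is why the call above passes edges[:-1] as left and edges[1:] as right: n + 1 edges always describe n rectangles.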

Next, add lines showing the form of the probability distribution function (PDF) and cumulative distribution function (CDF).

In [8]:
x = np.linspace(0, 10, 1000)
fig.line(x, expon.pdf(x, scale=1), line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
fig.line(x, expon.cdf(x, scale=1), line_color="white", line_width=2, alpha=0.7, legend="CDF")
Out[8]:
<bokeh.models.renderers.GlyphRenderer at 0x113a590f0>
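
For reference, the two curves have simple closed forms. The sketch below writes them out with the standard library only; the function names expon_pdf and expon_cdf are illustrative and are not part of SciPy, although they are intended to agree with expon.pdf and expon.cdf when scale is set to theta.

```python
import math

def expon_pdf(x, theta=1.0):
    """Density of an exponential distribution with mean theta."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0

def expon_cdf(x, theta=1.0):
    """Probability that an exponential variable with mean theta is <= x."""
    return 1.0 - math.exp(-x / theta) if x >= 0 else 0.0

# Evaluate both curves on a coarse grid, as the plot does on a fine one.
grid = [i * 0.5 for i in range(5)]          # 0.0, 0.5, 1.0, 1.5, 2.0
pdf_values = [expon_pdf(x) for x in grid]
cdf_values = [expon_cdf(x) for x in grid]
```

The PDF starts at 1/theta and decays, while the CDF rises from 0 towards 1, which matches the shapes of the two lines drawn over the histogram.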

Finally, position the legend and display the complete plot.

In [9]:
fig.legend.location = "top_right"

bk.show(fig)
Out[9]:

<Bokeh Notebook handle for In[9]>

Bar Plot Example

Bokeh's core display model relies on composing graphical primitives which are bound to data series. A more sophisticated example demonstrates this idea.

Bokeh ships with a small set of interesting "sample data" in the bokeh.sampledata package. We'll load up some historical automobile fuel efficiency data, which is returned as a Pandas DataFrame.

In [10]:
from bokeh.sampledata.autompg import autompg

We first need to reshape the data. We group it by model year to compute the mean and standard deviation of MPG for each year, and then split it by country of origin (here, USA or Japan).

In [11]:
grouped = autompg.groupby("yr")
mpg = grouped["mpg"]
mpg_avg = mpg.mean()
mpg_std = mpg.std()
years = np.asarray(list(grouped.groups.keys()))
american = autompg[autompg["origin"]==1]
japanese = autompg[autompg["origin"]==3]

american.head(10)
Out[11]:
    mpg  cyl  displ   hp  weight  accel  yr  origin  name
0  18.0    8  307.0  130    3504   12.0  70       1  chevrolet chevelle malibu
1  15.0    8  350.0  165    3693   11.5  70       1  buick skylark 320
2  18.0    8  318.0  150    3436   11.0  70       1  plymouth satellite
3  16.0    8  304.0  150    3433   12.0  70       1  amc rebel sst
4  17.0    8  302.0  140    3449   10.5  70       1  ford torino
5  15.0    8  429.0  198    4341   10.0  70       1  ford galaxie 500
6  14.0    8  454.0  220    4354    9.0  70       1  chevrolet impala
7  14.0    8  440.0  215    4312    8.5  70       1  plymouth fury iii
8  14.0    8  455.0  225    4425   10.0  70       1  pontiac catalina
9  15.0    8  390.0  190    3850    8.5  70       1  amc ambassador dpl

For each year, we want to plot the distribution of MPG within that year. As a guide, we will include a box that represents the mean efficiency, plus and minus one standard deviation. We will make these boxes partly transparent.

In [12]:
fig = bk.figure(title='Automobile mileage by year and country')

fig.quad(left=years-0.4, right=years+0.4, bottom=mpg_avg-mpg_std, top=mpg_avg+mpg_std, fill_alpha=0.4)
Out[12]:
<bokeh.models.renderers.GlyphRenderer at 0x113ab8470>
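
The geometry of these boxes can be worked through without pandas. The sketch below uses a handful of hypothetical (year, mpg) records in place of the autompg rows and the standard library's sample standard deviation, which uses the same n - 1 denominator as the pandas default.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (year, mpg) records, standing in for rows of autompg.
records = [(70, 18.0), (70, 15.0), (70, 16.0),
           (71, 25.0), (71, 21.0), (71, 23.0)]

# Group the MPG values by year.
by_year = defaultdict(list)
for yr, mpg in records:
    by_year[yr].append(mpg)

# One box per year: 0.8 units wide, centred on the year,
# spanning the mean minus and plus one standard deviation.
boxes = {}
for yr, values in sorted(by_year.items()):
    avg, sd = mean(values), stdev(values)
    boxes[yr] = dict(left=yr - 0.4, right=yr + 0.4,
                     bottom=avg - sd, top=avg + sd)
```

For the hypothetical year 71, the mean is 23 and the sample standard deviation is 2, so the box runs from 21 to 25.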

Next, we overplot the actual data points, using contrasting symbols for American and Japanese cars.

In [13]:
# Add Japanese cars as circles
fig.circle(x=np.asarray(japanese["yr"]), 
       y=np.asarray(japanese["mpg"]), 
       size=8, alpha=0.4, line_color="red", fill_color=None, line_width=2)

# Add American cars as triangles
fig.triangle(x=np.asarray(american["yr"]), 
         y=np.asarray(american["mpg"]),
         size=8, alpha=0.4, line_color="blue", fill_color=None, line_width=2)
Out[13]:
<bokeh.models.renderers.GlyphRenderer at 0x113ab8f60>

We can add axis labels by binding them to the axis_label attribute of each axis.

In [14]:
fig.xaxis.axis_label = 'Year'
fig.yaxis.axis_label = 'MPG'
In [15]:
bk.show(fig)
Out[15]:

<Bokeh Notebook handle for In[15]>

Linked Brushing

To link plots together at a data level, we can explicitly wrap the data in a ColumnDataSource. This allows us to reference columns by name.

In [16]:
variables = autompg.to_dict("list")
variables.update({'yr':autompg["yr"]})
source = bk.ColumnDataSource(variables)

The gridplot function takes a 2-dimensional list containing elements to be arranged in a grid on the same canvas.

In [17]:
plot_config = dict(plot_width=300, plot_height=300, tools="box_select,lasso_select,help")

left = bk.figure(title="MPG by Year", **plot_config)
left.circle("yr", "mpg", color="blue", source=source)

center = bk.figure(title="HP vs. Displacement", **plot_config)
center.circle("hp", "displ", color="green", source=source)

right = bk.figure(title="MPG vs. Displacement", **plot_config)
right.circle("mpg", "displ", size="cyl", line_color="red", source=source)
Out[17]:
<bokeh.models.renderers.GlyphRenderer at 0x113ae88d0>

We can use the select tool to select points on one plot, and the linked points on the other plots will automagically highlight.

In [18]:
p = bk.gridplot([[left, center, right]])
bk.show(p)
Out[18]:

<Bokeh Notebook handle for In[18]>

Visualization of US unemployment rates

This example generates a heat map of US unemployment rates by month and year. The plot is made interactive by using a HoverTool, which displays information as the pointer hovers over any cell within the plot.

First, we import the data with Pandas and manipulate it as needed.

In [20]:
import pandas as pd
from bokeh.models import HoverTool
from bokeh.sampledata.unemployment1948 import data
from collections import OrderedDict

data['Year'] = [str(x) for x in data['Year']]
years = list(data['Year'])
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
data = data.set_index('Year')

Specify a color map. (Where do we get color maps, you ask? Try Color Brewer.)

In [21]:
colors = [
    "#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d"
]

Set up the data for plotting. We will need to have values for every pair of year/month names. Map the rate to a color.

In [22]:
month = []
year = []
color = []
rate = []
for y in years:
    for m in months:
        month.append(m)
        year.append(y)
        monthly_rate = data[m][y]
        rate.append(monthly_rate)
        color.append(colors[min(int(monthly_rate)-2, 8)])
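
The same flattening can be sketched on a tiny, hypothetical table to make the rate-to-color mapping explicit. The helper rate_to_color is an illustrative name, not part of Bokeh. It adds a lower clamp that the loop above does not have: without one, a rate below 2 would produce a negative index, which Python resolves by counting from the end of the palette.

```python
from itertools import product

colors = [
    "#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d"
]

def rate_to_color(rate, palette, offset=2):
    """Map a rate to a palette entry, clamping the index at both ends."""
    idx = int(rate) - offset
    idx = max(0, min(idx, len(palette) - 1))
    return palette[idx]

# Hypothetical rates keyed by (year, month), standing in for the DataFrame.
rates = {("1948", "Jan"): 3.4, ("1948", "Feb"): 3.8,
         ("1949", "Jan"): 4.3, ("1949", "Feb"): 11.2}
years = ["1948", "1949"]
months = ["Jan", "Feb"]

# One entry per (year, month) pair, in the same order as the nested loops.
year, month, rate, color = [], [], [], []
for y, m in product(years, months):
    year.append(y)
    month.append(m)
    rate.append(rates[(y, m)])
    color.append(rate_to_color(rates[(y, m)], colors))
```

All four lists end up the same length, which is what a ColumnDataSource requires of its columns.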

Create a ColumnDataSource with columns: month, year, color, rate

In [23]:
source = bk.ColumnDataSource(
    data=dict(
        month=month,
        year=year,
        color=color,
        rate=rate,
    )
)

Create a new figure.

In [32]:
fig = bk.figure(plot_width=900, plot_height=400, x_axis_location="above", tools="resize,hover",
     x_range=years, y_range=list(reversed(months)), title="US Unemployment (1948 - 2013)")

Use the rect renderer with the following attributes:

  • x_range is years, y_range is months (reversed)
  • fill color for the rectangles is the 'color' field
  • line_color for the rectangles is None
  • tools are resize and hover tools
  • add a nice title, and set the plot_width and plot_height
In [33]:
fig.rect('year', 'month', 0.95, 0.95, source=source,
     color='color', line_color=None)
Out[33]:
<bokeh.models.renderers.GlyphRenderer at 0x113d0b400>

Style the plot, including:

  • remove the axis and grid lines
  • remove the major ticks
  • make the tick labels smaller
  • set the x-axis orientation to vertical, or angled
In [35]:
fig.grid.grid_line_color = None
fig.axis.axis_line_color = None
fig.axis.major_tick_line_color = None
fig.axis.major_label_text_font_size = "5pt"
fig.axis.major_label_standoff = 0
fig.xaxis.major_label_orientation = np.pi/3

Configure the hover tool to display the month, year and rate

In [40]:
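# Configure the hover tool already attached to the figure via tools="resize,hover"
hover = fig.select_one({'type': HoverTool})
hover.tooltips = OrderedDict([
    ('date', '@month @year'),
    ('rate', '@rate'),
])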

Now we can display the plot. Try moving your pointer over different cells in the plot.

In [41]:
bk.show(fig)
Out[41]:

<Bokeh Notebook handle for In[41]>

Similarly, we can draw a geographic heat map, here using data just from Texas.

In [44]:
from bokeh.sampledata import us_counties, unemployment
from collections import OrderedDict

# Longitude and latitude values for county boundaries
county_xs=[
    us_counties.data[code]['lons'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]
county_ys=[
    us_counties.data[code]['lats'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]

# Color palette from colorbrewer2.org
colors = ['#ffffd4','#fee391','#fec44f','#fe9929','#d95f0e','#993404']

# Assign colors based on unemployment
county_colors = []
for county_id in us_counties.data:
    if us_counties.data[county_id]['state'] != 'tx':
        continue
    try:
        rate = unemployment.data[county_id]
        idx = min(int(rate/2), 5)
        county_colors.append(colors[idx])
    except KeyError:
        county_colors.append("black")
        
fig = bk.figure(tools="pan,wheel_zoom,box_zoom,reset,hover,previewsave", title="Texas Unemployment 2009")

# Here are the polygons for plotting
fig.patches(county_xs, county_ys, fill_color=county_colors, fill_alpha=0.7, 
           line_color="white", line_width=0.5)

# Configure the hover tool already attached to the figure via the tools string
hover = fig.select_one({'type': HoverTool})
hover.tooltips = OrderedDict([
    ("index", "$index"),
    ("(x,y)", "($x, $y)"),
    ("fill color", "$color[hex, swatch]:fill_color"),
])


bk.show(fig)
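
The color assignment for counties can also be written as a small helper that handles missing data without try/except. This is a sketch under stated assumptions: county_color is an illustrative name, and the tuple keys and rates below are made up for the example rather than taken from the sample data.

```python
# Color palette from the cell above (colorbrewer2.org)
colors = ['#ffffd4', '#fee391', '#fec44f', '#fe9929', '#d95f0e', '#993404']

def county_color(county_id, rates, palette, missing="black"):
    """Pick a palette entry from 2-point-wide rate bins, or a fallback color."""
    rate = rates.get(county_id)
    if rate is None:
        return missing          # no unemployment figure for this county
    return palette[min(int(rate / 2), len(palette) - 1)]

# Hypothetical rates keyed by made-up county identifiers.
rates = {(48, 1): 7.9, (48, 3): 15.0}

known = county_color((48, 1), rates, colors)     # 7.9 falls in bin 3
capped = county_color((48, 3), rates, colors)    # 15.0 is capped at the last bin
unknown = county_color((48, 5), rates, colors)   # not in rates, so the fallback
```

Using dict.get keeps the lookup and the fallback in one place, at the cost of treating a stored None the same as a missing key.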