Creating Interactive Visualizations with Bokeh

Bokeh is a Python package for creating interactive, browser-based visualizations, and is well-suited for "big data" applications.

  • Bindings can be (and have been) created for other languages.

Bokeh allows users to create interactive HTML visualizations without writing JavaScript.

Bokeh is a language-based visualization system. This allows for:

  • high-level commands for data binding, transformation, interaction
  • low-level power to deeply customize

Bokeh philosophy:

Make a smart choice when it is possible to do so automatically, and expose low-level capabilities when it is not.

How does Bokeh work?

Bokeh renders its output with BokehJS, a custom-built JavaScript library that draws on the HTML5 Canvas, which affords it high performance. Because it runs in the browser, Bokeh can also integrate with other web tools, such as Google Maps.

Bokeh plots are based on visual elements called glyphs that are bound to data objects.

Installation

Bokeh can be installed easily either via pip or conda (if using Anaconda):

pip install bokeh

conda install bokeh

A Simple Example

First we'll import the bokeh.plotting module, which defines the graphical functions and primitives.

In [1]:
import bokeh.plotting as bk

Next, we'll tell Bokeh to display its plots directly in the notebook. This embeds all of the JavaScript and data directly in the HTML of the notebook itself. (Bokeh can also output straight to HTML files, or use a server, which we'll look at later.)

In [2]:
bk.output_notebook()
Loading BokehJS ...

Next, we'll import NumPy and create some simple data.

In [3]:
import numpy as np

x = np.linspace(-6, 6, 100)
y = np.random.normal(0.3*x, 1)

Now we'll call the figure's circle() method to render a red circle at each (x, y) point.

We can immediately interact with the plot:

  • click-and-drag pans the plot around.
  • the mouse wheel zooms in and out (in many Bokeh versions you must first activate the wheel-zoom tool in the toolbar).
In [4]:
fig = bk.figure(plot_width=500, plot_height=500)
fig.circle(x, y, color="red")
bk.show(fig)

Statistical Plots

Let's try plotting multiple series on the same axes.

First, we generate some data from an exponential distribution with mean $\theta=1$.

In [5]:
from scipy.stats import expon

theta = 1

measured = np.random.exponential(theta, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)
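Passing density=True makes np.histogram return bar heights normalized so that the total bar area is 1, which is what allows the histogram to be overlaid on the PDF later. A minimal stdlib-only sketch of that normalization follows; it uses random.expovariate in place of NumPy (note that its argument is the rate 1/θ, not the mean), and the helper name density_histogram is our own.

```python
import random


def density_histogram(samples, bins):
    """Return (heights, edges), normalized in the spirit of np.histogram(..., density=True).

    Each height is count / (n * bin_width), so the bar areas sum to 1.
    """
    lo, hi = min(samples), max(samples)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for s in samples:
        # The maximum value is put in the last bin (the last bin is closed on the right).
        idx = min(int((s - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(samples)
    heights = [c / (n * width) for c in counts]
    return heights, edges


random.seed(0)
theta = 1.0
# expovariate takes the rate parameter, i.e. 1 / mean
measured = [random.expovariate(1 / theta) for _ in range(1000)]
heights, edges = density_histogram(measured, bins=50)

# Total area under the bars: height times bin width, summed over bins
area = sum(h * (edges[i + 1] - edges[i]) for i, h in enumerate(heights))
print(round(area, 6))  # 1.0
```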

Next, we create our figure, which is not displayed until we explicitly ask Bokeh to show it. We will customize the interactive toolbar, as well as the background color.

In [6]:
fig = bk.figure(title="Exponential Distribution (θ=1)", tools="save",
                background_fill_color="#E8DDCB")

The quad glyph displays axis-aligned rectangles with the given attributes.

In [7]:
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="#036564", line_color="#033649")

Next, add lines showing the form of the probability distribution function (PDF) and cumulative distribution function (CDF).

In [8]:
x = np.linspace(0, 10, 1000)
fig.line(x, expon.pdf(x, scale=theta), line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
fig.line(x, expon.cdf(x, scale=theta), line_color="white", line_width=2, alpha=0.7, legend="CDF")
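For reference, the two curves are the PDF and CDF of an exponential distribution with mean θ; with the default loc=0, scipy's expon.pdf(x, scale=θ) and expon.cdf(x, scale=θ) evaluate

```latex
f(x;\theta) = \frac{1}{\theta}\, e^{-x/\theta}, \qquad
F(x;\theta) = \int_0^x f(t;\theta)\,dt = 1 - e^{-x/\theta}, \qquad x \ge 0
```

With θ = 1 the PDF starts at f(0) = 1 and the CDF rises toward 1, so both curves share the vertical scale of the density-normalized histogram.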

Finally, position the legend and display the complete plot.

In [9]:
fig.legend.location = "top_right"

bk.show(fig)

Bar Plot Example

Bokeh's core display model relies on composing graphical primitives which are bound to data series. A more sophisticated example demonstrates this idea.

Bokeh ships with a small set of interesting "sample data" in the bokeh.sampledata package. We'll load up some historical automobile fuel efficiency data, which is returned as a Pandas DataFrame.

In [10]:
from bokeh.sampledata.autompg import autompg

We first need to reshape the data: we group it by model year to get the mean and standard deviation of MPG for each year, and separately subset it by country of origin (here, USA or Japan).

In [11]:
grouped = autompg.groupby("yr")
mpg = grouped["mpg"]
mpg_avg = mpg.mean()
mpg_std = mpg.std()
years = mpg_avg.index.values  # sorted years, aligned with mpg_avg and mpg_std
american = autompg[autompg["origin"]==1]  # origin code 1: USA
japanese = autompg[autompg["origin"]==3]  # origin code 3: Japan

american.head(10)
Out[11]:
    mpg  cyl  displ   hp  weight  accel  yr  origin  name
0  18.0    8  307.0  130    3504   12.0  70       1  chevrolet chevelle malibu
1  15.0    8  350.0  165    3693   11.5  70       1  buick skylark 320
2  18.0    8  318.0  150    3436   11.0  70       1  plymouth satellite
3  16.0    8  304.0  150    3433   12.0  70       1  amc rebel sst
4  17.0    8  302.0  140    3449   10.5  70       1  ford torino
5  15.0    8  429.0  198    4341   10.0  70       1  ford galaxie 500
6  14.0    8  454.0  220    4354    9.0  70       1  chevrolet impala
7  14.0    8  440.0  215    4312    8.5  70       1  plymouth fury iii
8  14.0    8  455.0  225    4425   10.0  70       1  pontiac catalina
9  15.0    8  390.0  190    3850    8.5  70       1  amc ambassador dpl


For each year, we want to plot the distribution of MPG within that year. As a guide, we will include a box that represents the mean efficiency, plus and minus one standard deviation. We will make these boxes partly transparent.

In [12]:
fig = bk.figure(title='Automobile mileage by year and country')

fig.quad(left=years-0.4, right=years+0.4, bottom=mpg_avg-mpg_std, top=mpg_avg+mpg_std, fill_alpha=0.4)
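The box bounds passed to quad above are simply, for each year, the mean MPG plus or minus one sample standard deviation. A minimal stdlib-only sketch of that computation follows, using a few hypothetical (year, mpg) records rather than the real autompg data; statistics.stdev stands in for pandas' default .std(), since both divide by n − 1.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (year, mpg) records standing in for the autompg columns
records = [(70, 18.0), (70, 15.0), (70, 24.0), (71, 27.0), (71, 14.0), (71, 22.0)]

# Group MPG values by year
by_year = defaultdict(list)
for yr, mpg in records:
    by_year[yr].append(mpg)

# One box per year: 0.8 wide, spanning mean - std to mean + std
boxes = {}
for yr in sorted(by_year):
    values = by_year[yr]
    m, s = mean(values), stdev(values)  # stdev divides by n - 1, like pandas .std()
    boxes[yr] = dict(left=yr - 0.4, right=yr + 0.4, bottom=m - s, top=m + s)

for yr, box in boxes.items():
    print(yr, {k: round(v, 2) for k, v in box.items()})
```

Deriving the year axis from the sorted group keys (as mpg_avg.index does) keeps each box aligned with the right mean and standard deviation.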

Next, we overplot the actual data points, using contrasting symbols for American and Japanese cars.

In [13]:
# Add Japanese cars as circles
fig.circle(x=np.asarray(japanese["yr"]), 
       y=np.asarray(japanese["mpg"]), 
       size=8, alpha=0.4, line_color="red", fill_color=None, line_width=2)

# Add American cars as triangles
fig.triangle(x=np.asarray(american["yr"]), 
         y=np.asarray(american["mpg"]),
         size=8, alpha=0.4, line_color="blue", fill_color=None, line_width=2)

We can add axis labels by setting the axis_label attribute of each axis.

In [14]:
fig.xaxis.axis_label = 'Year'
fig.yaxis.axis_label = 'MPG'
In [15]:
bk.show(fig)

Linked Brushing

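To link plots together at a data level, we can explicitly wrap the data in a ColumnDataSource. This allows us to reference columns by name, and because all of the plots below draw from the same source, points selected in one plot are also selected in the others.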

In [16]:
variables = autompg.to_dict("list")
variables.update({'yr':autompg["yr"]})
source = bk.ColumnDataSource(variables)
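A ColumnDataSource holds its data as a mapping from column names to equal-length sequences, which is why the output of autompg.to_dict("list") can be passed to it directly. Because every glyph indexes into the same rows, row i in one plot is row i in every other plot. Below is a minimal stdlib-only sketch of the row-to-column pivot that to_dict("list") performs, with an explicit length check; the helper to_columns and the sample rows are hypothetical.

```python
# Hypothetical row records, e.g. as they might come from a CSV reader
rows = [
    {"mpg": 18.0, "hp": 130, "displ": 307.0, "yr": 70},
    {"mpg": 15.0, "hp": 165, "displ": 350.0, "yr": 70},
    {"mpg": 24.0, "hp": 95, "displ": 113.0, "yr": 70},
]


def to_columns(records):
    """Pivot a list of row dicts into {column_name: [values, ...]}."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    # Every column needs one entry per row so that glyphs can share row indices
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError(f"columns have unequal lengths: {sorted(lengths)}")
    return columns


columns = to_columns(rows)
print(columns["mpg"])  # [18.0, 15.0, 24.0]
```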

The gridplot function takes a 2-dimensional list containing elements to be arranged in a grid on the same canvas.

In [17]:
plot_config = dict(plot_width=300, plot_height=300, tools="box_select,lasso_select,help")

left = bk.figure(title="MPG by Year", **plot_config)
left.circle("yr", "mpg", color="blue", source=source)

center = bk.figure(title="HP vs. Displacement", **plot_config)
center.circle("hp", "displ", color="green", source=source)

right = bk.figure(title="MPG vs. Displacement", **plot_config)
right.circle("mpg", "displ", size="cyl", line_color="red", source=source)

We can use either selection tool (box or lasso) to select points on one plot, and the linked points on the other plots will automagically highlight.

In [18]:
p = bk.gridplot([[left, center, right]])
bk.show(p)

Visualization of US unemployment rates

Next, we generate a heat map of US unemployment rates by month and year. We make it interactive by adding a HoverTool that displays information as the pointer hovers over any cell within the plot.

First, we import the data with Pandas and manipulate it as needed.

In [19]:
import pandas as pd
from bokeh.models import HoverTool
from bokeh.sampledata.unemployment1948 import data
from collections import OrderedDict

data['Year'] = [str(x) for x in data['Year']]
years = list(data['Year'])
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
data = data.set_index('Year')

Specify a color map. (Where do we get color maps, you ask? Try ColorBrewer, at colorbrewer2.org.)

In [20]:
colors = [
    "#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d"
]

Set up the data for plotting. We need a value for every year/month pair, and we map each rate to a color from the palette.

In [21]:
month = []
year = []
color = []
rate = []
for y in years:
    for m in months:
        month.append(m)
        year.append(y)
        monthly_rate = data[m][y]
        rate.append(monthly_rate)
        # Clamp the index to 0..8 so very low rates cannot wrap to the end of the palette
        color.append(colors[max(0, min(int(monthly_rate) - 2, 8))])
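The loop above turns the wide year × month table into long-form columns (one entry per year/month pair) and bins each rate into a palette index of int(rate) - 2, capped at 8. A minimal stdlib-only sketch of the same reshaping is shown below, with itertools.product replacing the nested loops and the index also clamped at 0 so that a rate below 2% could not wrap around to the end of the palette. The helper rate_to_color and the table values are hypothetical.

```python
from itertools import product

colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
          "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]


def rate_to_color(rate, palette=colors):
    """Map an unemployment rate (percent) to a palette entry.

    Rates below 3% share the first color, each whole percentage point above
    that moves one step along the palette, and 10% or more share the last color.
    """
    idx = int(rate) - 2
    idx = max(0, min(idx, len(palette) - 1))
    return palette[idx]


# Hypothetical wide table: one entry per year, one key per month
wide = {
    "1948": {"Jan": 4.0, "Feb": 4.7, "Mar": 4.5},
    "1949": {"Jan": 5.0, "Feb": 5.8, "Mar": 5.6},
}
months = ["Jan", "Feb", "Mar"]

# Long form: years in the outer loop, months in the inner loop
long_form = {"month": [], "year": [], "color": [], "rate": []}
for year, month in product(wide, months):
    rate = wide[year][month]
    long_form["year"].append(year)
    long_form["month"].append(month)
    long_form["rate"].append(rate)
    long_form["color"].append(rate_to_color(rate))
```

The resulting dict has exactly the shape that the next cell passes to ColumnDataSource: four columns of equal length.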

Create a ColumnDataSource with the columns month, year, color, and rate.

In [22]:
source = bk.ColumnDataSource(
    data=dict(
        month=month,
        year=year,
        color=color,
        rate=rate,
    )
)

Create a new figure.

In [23]:
fig = bk.figure(plot_width=900, plot_height=400, x_axis_location="above", tools="resize,hover",
     x_range=years, y_range=list(reversed(months)), title="US Unemployment (1948 - 2013)")

Use the rect renderer to draw one cell per year/month pair. Together with the figure created above, the plot uses the following settings:

  • x_range is years, y_range is months (reversed)
  • fill color for the rectangles is the 'color' field
  • line_color for the rectangles is None
  • tools are resize and hover tools
  • add a nice title, and set the plot_width and plot_height
In [24]:
fig.rect('year', 'month', 0.95, 0.95, source=source,
     color='color', line_color=None)

Style the plot, including:

  • remove the axis and grid lines
  • remove the major ticks
  • make the tick labels smaller
  • set the x-axis orientation to vertical, or angled
In [25]:
fig.grid.grid_line_color = None
fig.axis.axis_line_color = None
fig.axis.major_tick_line_color = None
fig.axis.major_label_text_font_size = "5pt"
fig.axis.major_label_standoff = 0
fig.xaxis.major_label_orientation = np.pi/3

Configure the hover tool to display the month, year, and rate.

In [26]:
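# "hover" in the figure's tools string already added a HoverTool to fig;
# look that tool up and set its tooltips (a new HoverTool would never be attached)
hover = fig.select_one({"type": HoverTool})
hover.tooltips = OrderedDict([
    ('date', '@month @year'),
    ('rate', '@rate'),
])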
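In Bokeh tooltips, a name prefixed with @ refers to the data-source column of that name at the hovered index, while $-prefixed names such as $index, $x and $y are special values (the hovered index and the cursor position). The toy function below imitates only the @-substitution step, to show how the (label, template) pairs above turn into tooltip text. It is a hypothetical helper, not Bokeh's implementation, and the row values are made up.

```python
import re

# Made-up stand-in for one row of the heat map's ColumnDataSource
row = {"month": "Mar", "year": "1950", "rate": 6.3}


def fill_tooltip(template, row):
    """Replace each @name placeholder with the value of column `name` in `row`.

    Toy illustration only: ignores $-variables and format specifiers.
    """
    return re.sub(r"@(\w+)", lambda m: str(row[m.group(1)]), template)


tooltips = [("date", "@month @year"), ("rate", "@rate")]
for label, template in tooltips:
    print(f"{label}: {fill_tooltip(template, row)}")
```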

Now we can display the plot. Try moving your pointer over different cells in the plot.

In [27]:
bk.show(fig)

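Similarly, we can draw a geographic heat map of 2009 unemployment rates, here using data for Texas counties only. Some of the sample datasets used here, such as the county boundaries, must first be downloaded with bokeh.sampledata.download(); this is a one-time step.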

In [28]:
from bokeh.sampledata import download
download()
In [29]:
from bokeh.sampledata import us_counties, unemployment
from collections import OrderedDict

# Longitude and latitude values for county boundaries
county_xs=[
    us_counties.data[code]['lons'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]
county_ys=[
    us_counties.data[code]['lats'] for code in us_counties.data
    if us_counties.data[code]['state'] == 'tx'
]

# Color palette from colorbrewer2.org
colors = ['#ffffd4','#fee391','#fec44f','#fe9929','#d95f0e','#993404']

# Assign colors based on unemployment
county_colors = []
for county_id in us_counties.data:
    if us_counties.data[county_id]['state'] != 'tx':
        continue
    try:
        rate = unemployment.data[county_id]
        idx = min(int(rate/2), 5)
        county_colors.append(colors[idx])
    except KeyError:
        county_colors.append("black")
        
fig = bk.figure(tools="pan,wheel_zoom,box_zoom,reset,hover,save", title="Texas Unemployment 2009")

# Here are the polygons for plotting
fig.patches(county_xs, county_ys, fill_color=county_colors, fill_alpha=0.7, 
           line_color="white", line_width=0.5)

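# Configure hover tool: "hover" in the tools string above already added one,
# so look it up and set its tooltips instead of creating an unattached HoverTool
hover = fig.select_one({"type": HoverTool})
hover.tooltips = OrderedDict([
    ("index", "$index"),
    ("(x,y)", "($x, $y)"),
    ("fill color", "$color[hex, swatch]:fill_color"),
])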


bk.show(fig)
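The cell above assigns each county one of six colors with min(int(rate/2), 5), so each color covers a 2-percentage-point band and every rate of 10% or more shares the darkest color; counties missing from the unemployment data fall back to black. A minimal stdlib sketch of that rule follows; the helper county_color and the use of None for a missing rate are our own illustrative choices, not part of the original code.

```python
palette = ['#ffffd4', '#fee391', '#fec44f', '#fe9929', '#d95f0e', '#993404']


def county_color(rate, palette=palette, missing="black"):
    """Color for a county unemployment rate in percent (None means no data)."""
    if rate is None:
        return missing
    # One color per 2-point band, capped at the last palette entry
    return palette[min(int(rate / 2), len(palette) - 1)]


# Show which band of rates each color covers: [0, 2), [2, 4), ..., [10, inf)
for idx, color in enumerate(palette):
    upper = "inf" if idx == len(palette) - 1 else str(2 * (idx + 1))
    print(f"[{2 * idx}, {upper}) -> {color}")
```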

High-level Plots

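The examples so far have been relatively low-level, in that individual elements of plots need to be specified by hand. The bokeh.charts interface makes it easy to get up and running with a high-level API that tries to make smart layout and design decisions by default. (Note that bokeh.charts was later deprecated and moved out of Bokeh into the separate bkcharts package, so the code below requires an older Bokeh release.)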

To use it, import the chart type you need from bokeh.charts, for example:

  • Bar
  • BoxPlot
  • HeatMap
  • Histogram
  • Scatter
  • TimeSeries

To illustrate, let's create some random data and display it as histograms.

In [30]:
normal = np.random.standard_normal(1000)
student_t = np.random.standard_t(6, 1000)
distributions = pd.DataFrame({'Normal': normal, 'Student-T': student_t})
In [31]:
from bokeh.charts import Histogram

# Roughly sqrt(n) bins for n samples
hist = Histogram(distributions, bins=int(np.sqrt(len(normal))), notebook=True)

# A chart is a plot, so it can be shown directly (hplot is deprecated)
bk.show(hist)
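The cell above chooses the number of histogram bins with the square-root rule, int(np.sqrt(len(normal))), i.e. roughly √n bins for n samples. A one-function stdlib version is below; sqrt_bins is a hypothetical helper name.

```python
import math


def sqrt_bins(n_samples):
    """Square-root rule: about sqrt(n) bins, rounded down, and at least 1."""
    return max(1, int(math.sqrt(n_samples)))


print(sqrt_bins(1000))  # 31
```

For the 1000 samples drawn above this gives 31 bins; the rule is a simple heuristic, and other choices would also be reasonable.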