Bokeh is a Python package for creating interactive, browser-based visualizations, and is well-suited for "big data" applications.
Bokeh allows users to create interactive HTML visualizations without writing JavaScript.
Bokeh is a language-based visualization system.
Bokeh philosophy:
Make a smart choice when it is possible to do so automatically, and expose low-level capabilities when it is not.
Bokeh writes to a custom-built HTML5 Canvas library, which affords it high performance. This allows it to integrate with other web tools, such as Google Maps.
Bokeh plots are based on visual elements called glyphs that are bound to data objects.
Bokeh can be installed easily either via pip or conda (if using Anaconda):
pip install bokeh
conda install bokeh
First we'll import the bokeh.plotting module, which defines the graphical functions and primitives.
import bokeh.plotting as bk
Next, we'll tell Bokeh to display its plots directly in the notebook. This will cause all of the JavaScript and data to be embedded directly into the HTML of the notebook itself. (Bokeh can also output straight to HTML files, or use a server, which we'll look at later.)
bk.output_notebook()
Next, we'll import NumPy and create some simple data.
import numpy as np
x = np.linspace(-6, 6, 100)
y = np.random.normal(0.3*x, 1)
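The call np.random.normal(0.3*x, 1) draws one value per element of x, each from a normal distribution centered on 0.3*x with standard deviation 1, so the points scatter around a line of slope 0.3. A minimal standard-library sketch of the same idea follows; the helper name linspace and the fixed seed are assumptions added here for reproducibility and are not part of the original notebook.

```python
import random

def linspace(start, stop, num):
    """Return num evenly spaced values from start to stop, inclusive."""
    return [start + (stop - start) * i / (num - 1) for i in range(num)]

random.seed(42)  # assumed seed, only so the sketch is repeatable

x = linspace(-6, 6, 100)
# One draw per point: mean 0.3 * xi, standard deviation 1
y = [random.gauss(0.3 * xi, 1) for xi in x]

# The residuals around the line y = 0.3 * x should average close to zero
residuals = [yi - 0.3 * xi for xi, yi in zip(x, y)]
```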
Now we'll call Bokeh's circle() function to render a red circle at each of the points in x and y. We can immediately interact with the plot:
fig = bk.figure(plot_width=500, plot_height=500)
fig.circle(x, y, color="red")
bk.show(fig)
Let's try plotting multiple series on the same axes.
First, we generate some data from an exponential distribution with mean $\theta=1$.
from scipy.stats import expon
theta = 1
measured = np.random.exponential(theta, 1000)
hist, edges = np.histogram(measured, density=True, bins=50)
Next, create our figure, which is not displayed until we ask Bokeh to do so explicitly. We will customize the interactive toolbar as well as the background color.
fig = bk.figure(title="Exponential Distribution (θ=1)",tools="previewsave",
background_fill="#E8DDCB")
The quad glyph displays axis-aligned rectangles with the given attributes.
fig.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:], fill_color="#036564", line_color="#033649")
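The quad call above pairs each bin's left edge (edges[:-1]) with its right edge (edges[1:]) and uses the density-normalized height as the top. The following is a rough pure-Python sketch of what a density histogram computes; the function density_histogram is a hypothetical helper written for illustration and is not NumPy's implementation.

```python
import random

def density_histogram(values, bins):
    """Return (heights, edges) so that the bar areas sum to 1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins + 1)]
    counts = [0] * bins
    for v in values:
        # The maximum value belongs to the last bin, so clamp the index
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    n = len(values)
    heights = [c / (n * width) for c in counts]
    return heights, edges

random.seed(0)  # assumed seed, for repeatability only
measured = [random.expovariate(1.0) for _ in range(1000)]
heights, edges = density_histogram(measured, bins=50)

# Same slicing as in the quad call: 50 bars from 51 edges
left, right = edges[:-1], edges[1:]
area = sum(h * (r - l) for h, l, r in zip(heights, left, right))
```

Because the heights are divided by both the sample size and the bin width, the total bar area is 1, which is what lets the histogram share an axis with the PDF.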
Next, add lines showing the form of the probability distribution function (PDF) and cumulative distribution function (CDF).
x = np.linspace(0, 10, 1000)
fig.line(x, expon.pdf(x, scale=1), line_color="#D95B43", line_width=8, alpha=0.7, legend="PDF")
fig.line(x, expon.cdf(x, scale=1), line_color="white", line_width=2, alpha=0.7, legend="CDF")
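For reference, the two curves have simple closed forms for an exponential distribution with mean θ: the PDF is exp(-x/θ)/θ and the CDF is 1 - exp(-x/θ) for x ≥ 0. A small sketch using only the math module is shown below; the function names expon_pdf and expon_cdf are chosen here for illustration and are not SciPy's API.

```python
import math

def expon_pdf(x, theta=1.0):
    """Density of the exponential distribution with mean theta."""
    return math.exp(-x / theta) / theta if x >= 0 else 0.0

def expon_cdf(x, theta=1.0):
    """Probability that an exponential variable with mean theta is <= x."""
    return 1.0 - math.exp(-x / theta) if x >= 0 else 0.0

theta = 1.0
# The median is where the CDF crosses 0.5, i.e. theta * ln(2)
median = theta * math.log(2)
half = expon_cdf(median, theta)
```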
Finally, position the legend and display the complete plot.
fig.legend.location = "top_right"
bk.show(fig)
Bokeh's core display model relies on composing graphical primitives which are bound to data series. A more sophisticated example demonstrates this idea.
Bokeh ships with a small set of interesting "sample data" in the bokeh.sampledata package. We'll load up some historical automobile fuel efficiency data, which is returned as a Pandas DataFrame.
from bokeh.sampledata.autompg import autompg
We first need to reshape the data by grouping it according to the year of the car, and then split it by country of origin (here, USA or Japan).
grouped = autompg.groupby("yr")
mpg = grouped["mpg"]
mpg_avg = mpg.mean()
mpg_std = mpg.std()
years = np.asarray(list(grouped.groups.keys()))
american = autompg[autompg["origin"]==1]
japanese = autompg[autompg["origin"]==3]
american.head(10)
For each year, we want to plot the distribution of MPG within that year. As a guide, we will include a box that represents the mean efficiency, plus and minus one standard deviation. We will make these boxes partly transparent.
fig = bk.figure(title='Automobile mileage by year and country')
fig.quad(left=years-0.4, right=years+0.4, bottom=mpg_avg-mpg_std, top=mpg_avg+mpg_std, fill_alpha=0.4)
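To make the box geometry concrete, here is a minimal sketch of the same computation without pandas. The records are made-up values standing in for the autompg rows, and the dictionary layout is an assumption for illustration only.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical (year, mpg) records standing in for the autompg rows
records = [(70, 18.0), (70, 15.0), (70, 18.0), (71, 25.0), (71, 30.0)]

# Group the mpg values by year
by_year = defaultdict(list)
for year, mpg in records:
    by_year[year].append(mpg)

# One box per year: 0.8 wide, spanning mean - std to mean + std
boxes = {}
for year, values in sorted(by_year.items()):
    avg = mean(values)
    std = stdev(values)  # sample standard deviation (n - 1), as pandas .std() uses by default
    boxes[year] = dict(left=year - 0.4, right=year + 0.4,
                       bottom=avg - std, top=avg + std)
```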
Next, we overplot the actual data points, using contrasting symbols for American and Japanese cars.
# Add Japanese cars as circles
fig.circle(x=np.asarray(japanese["yr"]),
y=np.asarray(japanese["mpg"]),
size=8, alpha=0.4, line_color="red", fill_color=None, line_width=2)
# Add American cars as triangles
fig.triangle(x=np.asarray(american["yr"]),
y=np.asarray(american["mpg"]),
size=8, alpha=0.4, line_color="blue", fill_color=None, line_width=2)
We can add axis labels by binding them to the axis_label attribute of each axis.
fig.xaxis.axis_label = 'Year'
fig.yaxis.axis_label = 'MPG'
bk.show(fig)
To link plots together at a data level, we can explicitly wrap the data in a ColumnDataSource. This allows us to reference columns by name.
variables = autompg.to_dict("list")
variables.update({'yr':autompg["yr"]})
source = bk.ColumnDataSource(variables)
The gridplot function takes a 2-dimensional list containing elements to be arranged in a grid on the same canvas.
plot_config = dict(plot_width=300, plot_height=300, tools="box_select,lasso_select,help")
left = bk.figure(title="MPG by Year", **plot_config)
left.circle("yr", "mpg", color="blue", source=source)
center = bk.figure(title="HP vs. Displacement", **plot_config)
center.circle("hp", "displ", color="green", source=source)
right = bk.figure(title="MPG vs. Displacement", **plot_config)
right.circle("mpg", "displ", size="cyl", line_color="red", source=source)
We can use the select tool to select points on one plot, and the linked points on the other plots will automagically highlight.
p = bk.gridplot([[left, center, right]])
bk.show(p)
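The linking works because all three plots read from the same data source, so a selection is a set of row indices rather than a property of any one plot. The toy model below illustrates that idea in plain Python; ToySource and selected_points are hypothetical names, the numbers are made up, and this is not how Bokeh is implemented internally.

```python
class ToySource:
    """Hypothetical shared column store; a stand-in, not Bokeh's ColumnDataSource."""

    def __init__(self, columns):
        lengths = {len(values) for values in columns.values()}
        if len(lengths) > 1:
            raise ValueError("all columns must have the same length")
        self.columns = columns
        self.selected = []  # row indices shared by every plot

def selected_points(source, x_name, y_name):
    """Return the (x, y) pairs a plot would highlight for the current selection."""
    xs, ys = source.columns[x_name], source.columns[y_name]
    return [(xs[i], ys[i]) for i in source.selected]

# Made-up rows using the same column names as the plots above
source = ToySource({
    "yr": [70, 71, 72],
    "mpg": [18.0, 25.0, 30.0],
    "hp": [130, 90, 70],
    "displ": [307.0, 113.0, 97.0],
})

# Selecting rows 0 and 2 in one plot changes what every plot highlights
source.selected = [0, 2]
left_highlight = selected_points(source, "yr", "mpg")
center_highlight = selected_points(source, "hp", "displ")
```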
Our first example of an interactive chart involves generating a heat map of US unemployment rates by month and year. This plot will be made interactive by invoking a HoverTool that displays information as the pointer hovers over any cell within the plot.
First, we import the data with Pandas and manipulate it as needed.
import pandas as pd
from bokeh.models import HoverTool
from bokeh.sampledata.unemployment1948 import data
from collections import OrderedDict
data['Year'] = [str(x) for x in data['Year']]
years = list(data['Year'])
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
data = data.set_index('Year')
Specify a color map (where do we get color maps, you ask? -- Try Color Brewer)
colors = [
"#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
"#ddb7b1", "#cc7878", "#933b41", "#550b1d"
]
Set up the data for plotting. We will need to have values for every pair of year/month names. Map the rate to a color.
month = []
year = []
color = []
rate = []
for y in years:
    for m in months:
        month.append(m)
        year.append(y)
        monthly_rate = data[m][y]
        rate.append(monthly_rate)
        color.append(colors[min(int(monthly_rate)-2, 8)])
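The color lookup in the loop truncates the rate to an integer, subtracts 2 so that rates between 2% and 3% map to the first color, and caps the index at 8. A sketch of that mapping as a standalone function follows. The name rate_to_color and the extra lower clamp are additions made here: without the lower clamp, a rate below 2 would produce a negative index and wrap around to the darkest color.

```python
colors = [
    "#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
    "#ddb7b1", "#cc7878", "#933b41", "#550b1d"
]

def rate_to_color(rate, palette=colors, offset=2):
    """Map a rate to a palette entry, clamping at both ends of the palette."""
    idx = int(rate) - offset
    idx = max(0, min(idx, len(palette) - 1))
    return palette[idx]

low = rate_to_color(2.5)     # first color
mid = rate_to_color(5.9)     # int(5.9) - 2 = 3
high = rate_to_color(10.8)   # int(10.8) - 2 = 8, the last color
capped = rate_to_color(15.0) # above the range, still the last color
floor = rate_to_color(1.2)   # below the range, clamped to the first color
```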
Create a ColumnDataSource with columns: month, year, color, rate
source = bk.ColumnDataSource(
data=dict(
month=month,
year=year,
color=color,
rate=rate,
)
)
Create a new figure.
fig = bk.figure(plot_width=900, plot_height=400, x_axis_location="above", tools="resize,hover",
x_range=years, y_range=list(reversed(months)), title="US Unemployment (1948 - 2013)")
Use the rect renderer with the following attributes:
x_range is years, y_range is months (reversed)
line_color for the rectangles is None
plot_width and plot_height
fig.rect('year', 'month', 0.95, 0.95, source=source,
color='color', line_color=None)
Style the plot by removing the grid and axis lines, shrinking the tick labels, and rotating the x-axis labels:
fig.grid.grid_line_color = None
fig.axis.axis_line_color = None
fig.axis.major_tick_line_color = None
fig.axis.major_label_text_font_size = "5pt"
fig.axis.major_label_standoff = 0
fig.xaxis.major_label_orientation = np.pi/3
Configure the hover tool to display the month, year and rate
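# Look up the hover tool that tools="resize,hover" added to the figure,
# then set its tooltips (a newly constructed HoverTool would not be attached)
hover = fig.select(dict(type=HoverTool))
hover.tooltips = OrderedDict([
    ('date', '@month @year'),
    ('rate', '@rate'),
])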
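In the tooltip strings, each @name refers to a column of the ColumnDataSource, and the value displayed is the one for the cell under the pointer. The toy function below mimics that substitution in plain Python; render_tooltip and the sample columns are hypothetical, and the sketch says nothing about how Bokeh performs the substitution internally.

```python
import re

def render_tooltip(template, columns, index):
    """Replace each @name in template with the value of columns[name][index]."""
    return re.sub(r"@(\w+)",
                  lambda match: str(columns[match.group(1)][index]),
                  template)

# Made-up columns with the same names as the data source above
columns = {
    "month": ["Jan", "Feb"],
    "year": ["1948", "1948"],
    "rate": [4.0, 4.7],
}

date_text = render_tooltip("@month @year", columns, 1)
rate_text = render_tooltip("@rate", columns, 0)
```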
Now we can display the plot. Try moving your pointer over different cells in the plot.
bk.show(fig)
Similarly, we can provide a geographic heatmap, here using data just from Texas.
from bokeh.sampledata import download
download()
from bokeh.sampledata import us_counties, unemployment
from collections import OrderedDict
# Longitude and latitude values for county boundaries
county_xs=[
us_counties.data[code]['lons'] for code in us_counties.data
if us_counties.data[code]['state'] == 'tx'
]
county_ys=[
us_counties.data[code]['lats'] for code in us_counties.data
if us_counties.data[code]['state'] == 'tx'
]
# Color palette from colorbrewer2.org
colors = ['#ffffd4','#fee391','#fec44f','#fe9929','#d95f0e','#993404']
# Assign colors based on unemployment
county_colors = []
for county_id in us_counties.data:
    if us_counties.data[county_id]['state'] != 'tx':
        continue
    try:
        rate = unemployment.data[county_id]
        idx = min(int(rate/2), 5)
        county_colors.append(colors[idx])
    except KeyError:
        county_colors.append("black")
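The loop above bins each rate into 2-point steps, caps the index at the last palette entry, and falls back to black when a county has no unemployment figure. The same logic can be sketched as a small function that uses dict.get instead of try/except. The function name county_color and the county keys and rates below are made up for illustration.

```python
palette = ['#ffffd4', '#fee391', '#fec44f', '#fe9929', '#d95f0e', '#993404']

def county_color(county_id, rates, palette=palette, missing="black"):
    """Return the palette color for a county, or `missing` if it has no rate."""
    rate = rates.get(county_id)
    if rate is None:
        return missing
    # 2-point bins, capped at the last palette entry
    return palette[min(int(rate / 2), len(palette) - 1)]

# Hypothetical county keys and rates
rates = {(48, 1): 7.3, (48, 3): 12.9}

moderate = county_color((48, 1), rates)  # int(7.3 / 2) = 3
severe = county_color((48, 3), rates)    # int(12.9 / 2) = 6, capped at 5
unknown = county_color((48, 5), rates)   # no data for this key
```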
fig = bk.figure(tools="pan,wheel_zoom,box_zoom,reset,hover,previewsave", title="Texas Unemployment 2009")
# Here are the polygons for plotting
fig.patches(county_xs, county_ys, fill_color=county_colors, fill_alpha=0.7,
line_color="white", line_width=0.5)
# Configure hover tool
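# Look up the hover tool added via the tools string and set its tooltips
# (a newly constructed HoverTool would not be attached to the figure)
hover = fig.select(dict(type=HoverTool))
hover.tooltips = OrderedDict([
    ("index", "$index"),
    ("(x,y)", "($x, $y)"),
    ("fill color", "$color[hex, swatch]:fill_color"),
])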
bk.show(fig)
The examples so far have been relatively low-level, in that individual elements of plots need to be specified by hand. The bokeh.charts interface makes it easy to get up-and-running with a high-level API that tries to make smart layout and design decisions by default.
To use them, you simply import the chart type you need from bokeh.charts:
Bar
BoxPlot
HeatMap
Histogram
Scatter
TimeSeries
To illustrate, let's create some random data and display it as histograms.
normal = np.random.standard_normal(1000)
student_t = np.random.standard_t(6, 1000)
distributions = pd.DataFrame({'Normal': normal, 'Student-T': student_t})
from bokeh.charts import Histogram
from bokeh.charts import hplot
hist = Histogram(distributions, bins=int(np.sqrt(len(normal))), notebook=True)
bk.show(hplot(hist))
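The bins argument above follows the square-root rule of thumb: use roughly the square root of the number of observations as the number of bins. A short sketch is given below; the helper name sqrt_bins and the minimum of one bin are assumptions added here.

```python
import math

def sqrt_bins(n):
    """Square-root rule: floor(sqrt(n)) bins, with at least one bin."""
    return max(1, math.isqrt(n))

bins_for_sample = sqrt_bins(1000)  # same value as int(np.sqrt(1000))
```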