You may need to install some libraries first for this notebook to run smoothly. Edit the commands below down to the subset of libraries you need, then execute them. Skip this step if you already have all of them.
!pip install six pytz python-dateutil Flask redis
!pip install numpy scipy statsmodels patsy pandas networkx
!pip install matplotlib mpld3 seaborn bokeh rpy2
#git clone https://github.com/vispy/vispy.git
#cd vispy
#python setup.py install
Matplotlib provides a stateful scripting interface for generating graphics, modeled on MATLAB's syntax and appearance. Matplotlib renders to a "back end", which is usually a raster graphics canvas. The strength of this approach is that, once rendered, the plot loads into a web page as an image and therefore displays very quickly. For IPython notebooks with lots of plots, this is the way to go because web browsers are optimized for displaying large numbers of images.
Can be used to:
Primary use case: Very nice "publication"-quality plots with a customized look, typefaces, and annotations that can be exported to PDF.
Strengths: Very widely adopted, very flexible, tight integration with the IPython notebook, great for making presentation-quality graphs and figures.
Weaknesses: Uses MATLAB-inspired plotting syntax, renders things slowly without GPU acceleration, steep learning curve, have to specify/customize a lot of variables to make it look good, and the defaults are pretty terrible.
Matplotlib Example Gallery: http://matplotlib.org/gallery.html
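To make the "renders to a raster back end" idea concrete, here is a minimal sketch that draws a figure with the non-interactive Agg backend and captures the result as PNG bytes in memory, which is the kind of image a notebook embeds in the page. The figure size, data and dpi are arbitrary choices for illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # raster backend that needs no display
import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(4, 3))
ax.plot([0, 1, 2, 3], [0, 1, 4, 9])

# Render the figure into an in-memory PNG instead of a file on disk.
buf = io.BytesIO()
fig.savefig(buf, format="png", dpi=100)
plt.close(fig)

png_bytes = buf.getvalue()
print(len(png_bytes), "bytes of PNG data")
```

Once the bytes exist, the browser only has to display an image; none of the plotting work is repeated on the client side.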
%pylab inline
import numpy as np
import pandas as pd
matplotlib.rcParams['figure.figsize'] = 15, 5 #set default image size for this interactive session
matplotlib.rcParams.update({'font.size': 16, 'font.family': 'serif'}) #update the matplotlib configuration parameters
x = linspace(0, 5, 10)
y = x ** 2
fig, ax1 = subplots()
ax1.plot(x, x**2, lw=2, color="blue", label="test")
ax1.set_ylabel(r"area $(m^2)$", fontsize=18, color="blue")
for label in ax1.get_yticklabels():
    label.set_color("blue")
ax2 = ax1.twinx()
ax2.plot(x, x**3, lw=2, color="red", label="test")
ax2.set_ylabel(r"volume $(m^3)$", fontsize=18, color="red")
for label in ax2.get_yticklabels():
    label.set_color("red")
n = array([0,1,2,3,4,5])
xx = np.linspace(-0.75, 1., 100)
fig, axes = subplots(1, 4, figsize=(15,5))
axes[0].scatter(xx, xx + 0.25*randn(len(xx)), label="scatter")
axes[1].step(n, n**2, lw=2, label="step")
axes[2].bar(n, n**2, align="center", width=0.5, alpha=0.5, label="bar")
axes[3].fill_between(x, x**2, x**3, color="green", alpha=0.5);
axes[0].legend(loc=2); # upper left corner
axes[1].legend(loc=2);
axes[2].legend(loc=2);
IPython has recently introduced interactive widgets that tie together Python code running in the kernel and JavaScript/HTML/CSS running in the browser. Like the Manipulate command in Mathematica, a widget works by repeatedly calling the function that renders the graphics with the new variable values, generating a new image on each call. This can be slow.
from IPython.html.widgets import interact
import networkx as nx
matplotlib.rcParams['figure.figsize'] = 5, 5
# wrap a few graph generation functions so they have the same signature
def random_lobster(n, m, k, p):
    return nx.random_lobster(n, p, p / m)

def powerlaw_cluster(n, m, k, p):
    return nx.powerlaw_cluster_graph(n, m, p)

def erdos_renyi(n, m, k, p):
    return nx.erdos_renyi_graph(n, p)

def newman_watts_strogatz(n, m, k, p):
    return nx.newman_watts_strogatz_graph(n, k, p)

def plot_random_graph(n, m, k, p, generator):
    g = generator(n, m, k, p)
    nx.draw(g)
    plt.show()

interact(plot_random_graph, n=(2,30), m=(1,10), k=(1,10), p=(0.0, 1.0, 0.001),
         generator={'lobster': random_lobster,
                    'power law': powerlaw_cluster,
                    'Newman-Watts-Strogatz': newman_watts_strogatz,
                    u'Erdős-Rényi': erdos_renyi,
                    });
To save publication-quality figures, we use matplotlib's PDF backend and generate our figure as a vector graphic rather than a raster image. We can call pdf.savefig() multiple times, and each call saves another page to the PDF file specified.
from matplotlib.backends.backend_pdf import PdfPages
with PdfPages('polarplot.pdf') as pdf:
    # polar plot using add_axes and polar projection
    fig = plt.figure(figsize=(5,5))
    ax = fig.add_axes([0.0, 0.0, .6, .6], polar=True)
    t = linspace(0, 2 * pi, 100)
    ax.plot(t, t, color='blue', lw=3);
    pdf.savefig(fig)
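The cell above writes a single page. As a sketch of the multi-page case, the loop below calls pdf.savefig() three times and then asks the PdfPages object how many pages it holds. The file name multipage_example.pdf and the three spiral curves are made up for illustration.

```python
import math
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # no display needed to write a PDF
import matplotlib.pyplot as plt
from matplotlib.backends.backend_pdf import PdfPages

# Hypothetical output location, placed in a temporary directory.
pdf_path = os.path.join(tempfile.mkdtemp(), "multipage_example.pdf")
t = [2 * math.pi * i / 99 for i in range(100)]

with PdfPages(pdf_path) as pdf:
    for k in (1, 2, 3):
        fig, ax = plt.subplots(figsize=(5, 5),
                               subplot_kw={"projection": "polar"})
        ax.plot(t, [k * v for v in t], lw=3)
        ax.set_title("page %d" % k)
        pdf.savefig(fig)   # each call appends one page
        plt.close(fig)     # free the figure once it is written
    page_count = pdf.get_pagecount()

print(page_count, "pages written to", pdf_path)
```

Closing each figure after saving keeps memory use flat when a report has many pages.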
Seaborn is a library for making attractive and informative statistical graphics in Python. It is built on top of matplotlib and tightly integrated with the PyData stack, including numpy, pandas data structures, and statistical routines from scipy and statsmodels. It offers nice color palettes and simple built-in defaults for statistical plots, so not much tweaking is needed compared to matplotlib to get a sophisticated result.
Strengths: well-suited to statistical plots (like those in R). Can be used to "freshen up" the look of matplotlib plots. Runs on top of matplotlib, so you get all of that functionality as well, including exporting to vector graphic pdfs.
Weaknesses: Relatively new codebase, API is still emerging / evolving. Has most of the same limitations as matplotlib in terms of rendering speed.
import seaborn as sns
from scipy import stats, optimize
#Diverging: useful when data has natural, meaningful break-point
#like correlation values which are spread around zero
sns.palplot(sns.color_palette("coolwarm", 7))
#Qualitative: useful when data is categorical
sns.palplot(sns.color_palette("Set2", 10))
Sequential: useful when data range from 'low' to 'high' values. The cubehelix color palette system makes sequential palettes with a linear increase or decrease in brightness and some variation in hue. This means that the information in your colormap will be preserved when converted to black and white (for printing) or when viewed by a colorblind individual. Matplotlib has the default cubehelix version built into it:
sns.palplot(sns.color_palette("cubehelix", 8))
Seaborn adds an interface to the cubehelix system so that you can make a variety of palettes that all have a well-behaved linear brightness ramp. The default palette returned by the seaborn cubehelix_palette() function is a bit different from the matplotlib default in that it does not rotate as far around the hue wheel or cover as wide a range of intensities. It also reverses the order so that more important values are darker:
sns.palplot(sns.cubehelix_palette(8, start=2))
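To see why a cubehelix palette survives conversion to black and white, the sketch below evaluates the cubehelix formula directly, without seaborn or matplotlib. The coefficients and the 0.30/0.59/0.11 brightness weights are written from memory of Green's published description of the scheme, so treat them as an assumption; the function name and its defaults are illustrative and are not seaborn's API.

```python
import math

def cubehelix(lam, start=0.5, rotations=-1.5, hue=1.0, gamma=1.0):
    """Unclipped (r, g, b) for a position lam in [0, 1] along the ramp."""
    lg = lam ** gamma
    phi = 2 * math.pi * (start / 3.0 + rotations * lam)
    amp = hue * lg * (1 - lg) / 2.0
    c, s = math.cos(phi), math.sin(phi)
    r = lg + amp * (-0.14861 * c + 1.78277 * s)
    g = lg + amp * (-0.29227 * c - 0.90649 * s)
    b = lg + amp * (1.97294 * c)
    return r, g, b

def perceived_brightness(rgb):
    """Weighted sum of the channels used as a greyscale estimate."""
    r, g, b = rgb
    return 0.30 * r + 0.59 * g + 0.11 * b

levels = [i / 7 for i in range(8)]
brightness = [perceived_brightness(cubehelix(lam)) for lam in levels]
for lam, value in zip(levels, brightness):
    print("%.3f -> %.3f" % (lam, value))
```

The hue terms are constructed so that they cancel in the weighted sum, which leaves the brightness equal to the ramp position. Real implementations also clip each channel to [0, 1] before display, a step this sketch skips.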
One of my favorite functions within Seaborn is the joint distribution plot, which visualizes a bivariate distribution together with its marginals.
The hexbin variant (kind="hex") is similar to a histogram, except that instead of coding the number of observations in each bin with a position on one of the axes, it uses a color mapping, which gives the plot three quantitative dimensions.
x = stats.gamma(3).rvs(5000)
y = stats.gamma(5).rvs(5000)
with sns.axes_style("white"):
    sns.jointplot(x, y, kind="hex", color="#4CB391")  # change kind="reg" to show a regression line
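The counting step behind such a plot can be sketched with the standard library alone. The example below uses square bins rather than hexagons to keep the arithmetic short, and the bin width of 1.0 is an arbitrary assumption; seaborn and matplotlib do the real hexagonal binning for you.

```python
import random
from collections import Counter

random.seed(0)
# Two gamma-distributed samples, mirroring the shapes used above.
xs = [random.gammavariate(3, 1) for _ in range(5000)]
ys = [random.gammavariate(5, 1) for _ in range(5000)]

bin_width = 1.0  # assumed bin size for this sketch

# Count how many observations land in each (column, row) cell.
counts = Counter((int(x // bin_width), int(y // bin_width))
                 for x, y in zip(xs, ys))

# Normalise counts to (0, 1]; a colormap would turn each value into a color.
max_count = max(counts.values())
intensity = {cell: n / max_count for cell, n in counts.items()}

densest_cell = max(counts, key=counts.get)
print("densest cell:", densest_cell, "holds", counts[densest_cell], "points")
```

The third quantitative dimension is the intensity value: x and y pick the cell, and the count picks the color.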
Another great statistical plot is the violin plot. It conveys the same information as a boxplot (the median and the 25th and 75th percentiles) and adds the shape of the distribution. We can also overlay the individual observations in two different ways:
d1 = stats.norm(0, 5).rvs(100)
d2 = np.concatenate([stats.gamma(4).rvs(50),
-1 * stats.gamma(4).rvs(50)])
data = pd.DataFrame(dict(d1=d1, d2=d2))
data = pd.melt(data.ix[:50], value_name="y", var_name="group")
f, (ax_l, ax_r) = plt.subplots(1, 2)
sns.violinplot(data.y, data.group, "points", positions=[1, 2], color="RdBu", ax=ax_l)
sns.violinplot(data.y, data.group, "stick", positions=[3, 4], color="PRGn", ax=ax_r)
plt.tight_layout()
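For reference, the three summary values that a boxplot and a violin plot share can be computed with the standard library. This is only a sketch on a made-up normal sample; seaborn's own quantile convention may differ slightly from the default used by statistics.quantiles.

```python
import random
import statistics

random.seed(1)
# Made-up sample: 100 draws from a normal distribution with sd 5.
sample = [random.gauss(0, 5) for _ in range(100)]

# n=4 splits the data into quarters and returns the three cut points.
q1, median, q3 = statistics.quantiles(sample, n=4)

print("25th percentile: %.2f" % q1)
print("median:          %.2f" % median)
print("75th percentile: %.2f" % q3)
```

A violin plot draws these same markers and wraps a density estimate around them, which is where the extra information about the shape comes from.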
Seaborn switches many graphics defaults that will affect other graphics from libraries based on matplotlib, like mpld3 below. We need to reset the defaults with the following command.
sns.reset_orig()
mpld3 is a package that displays matplotlib plots in the browser using the D3.js JavaScript renderer. D3.js is a popular JavaScript library for interactive data visualizations on the web. This means you can keep the same syntax as within matplotlib, add custom JavaScript plugins for interactivity, and then view your graphics in the browser, on a web page, or in IPython. Note that D3 draws the figure as SVG elements in the page rather than on an HTML canvas.
Figures can be saved to file as stand-alone HTML (save_html()) or as JSON (save_json()). Note that custom plugins which are not built into mpld3 will not be part of the JSON serialization.
Strengths: familiar matplotlib syntax, instantly turn any matplotlib graphic into an interactive in-browser graphic.
Weaknesses: need familiarity with Javascript to add most interactivity, can only export graphics for the web
import mpld3
from mpld3 import plugins, utils
For example, here is the built-in Linked Brushing plugin that allows exploration of multi-dimensional datasets. Selecting points with the brush lets you quickly explore the relationships between the points in many different 2D projections.
fig, ax = plt.subplots(3, 3, figsize=(6, 6))
fig.subplots_adjust(hspace=0.1, wspace=0.1)
ax = ax[::-1]
X = np.random.normal(size=(3, 100))
for i in range(3):
    for j in range(3):
        ax[i, j].xaxis.set_major_formatter(plt.NullFormatter())
        ax[i, j].yaxis.set_major_formatter(plt.NullFormatter())
        points = ax[i, j].scatter(X[j], X[i])

plugins.connect(fig, plugins.LinkedBrush(points))
mpld3.display()
My next example is borrowed from the Pythonic Perambulations blog. Here we can hover over points in our scatterplot to see the associated sinusoid above. This is clearly a bit more complicated, requiring knowledge of Javascript to create a custom plugin for the interactivity.
Use the toolbar buttons at the bottom-right of the plot to enable zooming and panning, and to reset the view.
class LinkedView(plugins.PluginBase):
    """A simple plugin showing how multiple axes can be linked"""

    JAVASCRIPT = """
    mpld3.register_plugin("linkedview", LinkedViewPlugin);
    LinkedViewPlugin.prototype = Object.create(mpld3.Plugin.prototype);
    LinkedViewPlugin.prototype.constructor = LinkedViewPlugin;
    LinkedViewPlugin.prototype.requiredProps = ["idpts", "idline", "data"];
    LinkedViewPlugin.prototype.defaultProps = {}
    function LinkedViewPlugin(fig, props){
        mpld3.Plugin.call(this, fig, props);
    };

    LinkedViewPlugin.prototype.draw = function(){
      var pts = mpld3.get_element(this.props.idpts);
      var line = mpld3.get_element(this.props.idline);
      var data = this.props.data;

      function mouseover(d, i){
        line.data = data[i];
        line.elements().transition()
            .attr("d", line.datafunc(line.data))
            .style("stroke", this.style.fill);
      }
      pts.elements().on("mouseover", mouseover);
    };
    """

    def __init__(self, points, line, linedata):
        # Line2D markers and scatter collections use different element ids
        if isinstance(points, matplotlib.lines.Line2D):
            suffix = "pts"
        else:
            suffix = None

        self.dict_ = {"type": "linkedview",
                      "idpts": utils.get_id(points, suffix),
                      "idline": utils.get_id(line),
                      "data": linedata}
fig, ax = plt.subplots(2)
# scatter periods and amplitudes
np.random.seed(0)
P = 0.2 + np.random.random(size=20)
A = np.random.random(size=20)
x = np.linspace(0, 10, 100)
data = np.array([[x, Ai * np.sin(x / Pi)]
for (Ai, Pi) in zip(A, P)])
points = ax[1].scatter(P, A, c=P + A,
s=200, alpha=0.5)
ax[1].set_xlabel('Period')
ax[1].set_ylabel('Amplitude')
# create the line object
lines = ax[0].plot(x, 0 * x, '-w', lw=3, alpha=0.5)
ax[0].set_ylim(-1, 1)
ax[0].set_title("Hover over points to see lines")
# transpose line data and add plugin
linedata = data.transpose(0, 2, 1).tolist()
plugins.connect(fig, LinkedView(points, lines[0], linedata))
mpld3.display()
Bokeh is an interactive web visualization library for Python with a modern "grammar for graphics". It provides d3-like html canvas graphics for large or streaming datasets, all without requiring any knowledge of Javascript. Bokeh makes it really fun and easy to interactively explore your data.
Bokeh renders vector graphics directives to an intermediate JSON representation that it sends over a communications socket to the JavaScript interpreter in your web browser. A JavaScript layer (called BokehJS) then deserializes the JSON and draws it onto the HTML5 canvas.
Strengths: uses a modern "visual grammar" for programming graphics, uses the HTML5 canvas object and thus has GPU acceleration in modern browsers, plots can be interactive and easier to explore.
Weaknesses: API and examples are still evolving
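The serialize-then-draw pipeline described above can be illustrated with a plain JSON round trip. The dictionary layout below is invented for this sketch and is not Bokeh's actual message schema; it only shows the idea of describing a plot as data on the Python side and parsing it again on the receiving side.

```python
import json

# Hypothetical plot description; Bokeh's real wire format is different.
plot_spec = {
    "glyph": "circle",
    "data": {"x": [1.0, 2.0, 3.0], "y": [4.0, 5.0, 6.0]},
    "props": {"radius": 0.5, "fill_alpha": 0.6},
}

# Python side: turn the description into a text message.
wire_message = json.dumps(plot_spec)

# Receiving side: BokehJS would parse the text in JavaScript.
# json.loads stands in for that step here.
received = json.loads(wire_message)

print(received["glyph"], "with", len(received["data"]["x"]), "points")
```

Because only a description travels to the browser, the drawing itself happens client-side, which is what makes panning and zooming possible without another trip to Python.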
from __future__ import division
from collections import OrderedDict
from six.moves import zip
from bokeh.plotting import *
from bokeh.objects import Range1d, ColumnDataSource, HoverTool
from bokeh.sampledata.unemployment1948 import data
output_notebook()
First import the bokeh.plotting module, which defines the graphical functions and primitives. Next tell Bokeh to display its plots directly to the notebook. This causes all of the Javascript and data to be embedded directly into the HTML of the notebook itself (or output straight to HTML files, or use a server).
Use Bokeh's circle() function to render a glyph at each of the points in x and y. We can interact with the plot immediately: click-and-drag to pan, shift + mousewheel to zoom. The toolbar is shown by default and can be configured dynamically via the 'tools' keyword argument.
# Let's plot 4000 circles; you can play around with this number if you like
N = 4000
# Create a bunch of random points, radii and colors for plotting
x = np.random.random(size=N) * 100
y = np.random.random(size=N) * 100
radii = np.random.random(size=N) * 1.5
colors = ["#%02x%02x%02x" % (int(r), int(g), 150) for r, g in zip(np.floor(50+2*x), np.floor(30+2*y))]  # %02x needs ints, not floats
figure()
hold()
circle(x, y, radius=radii, fill_color=colors, fill_alpha=0.6, line_color=None, title="Colorful Scatter")
show()
# Read in the data with pandas. Convert the year column to string
data['Year'] = [str(x) for x in data['Year']]
years = list(data['Year'])
months = ["Jan","Feb","Mar","Apr","May","Jun","Jul","Aug","Sep","Oct","Nov","Dec"]
data = data.set_index('Year')
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce", "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]
# need to have values for every pair of year/month names, map the rate to a color
month = []
year = []
color = []
rate = []
for y in years:
    for m in months:
        month.append(m)
        year.append(y)
        monthly_rate = data[m][y]
        rate.append(monthly_rate)
        color.append(colors[min(int(monthly_rate)-2, 8)])
# create a `ColumnDataSource` with columns: month, year, color, rate
source = ColumnDataSource(
data=dict(
month=month,
year=year,
color=color,
rate=rate,
)
)
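The rate-to-color step in the loop above can be pulled out into a small helper, sketched here with the same nine-color palette. The function name rate_to_color and its offset parameter are made up for this example. It also clamps at the low end: the expression int(monthly_rate)-2 goes negative for any rate below 2, and Python would then silently index from the end of the list.

```python
colors = ["#75968f", "#a5bab7", "#c9d9d3", "#e2e2e2", "#dfccce",
          "#ddb7b1", "#cc7878", "#933b41", "#550b1d"]

def rate_to_color(rate, palette=colors, offset=2):
    """Map a rate to a palette entry (hypothetical helper).

    Rates in [offset, offset + 1) get the first color, and every
    further whole point moves one step up the palette.
    """
    index = int(rate) - offset
    # Clamp both ends so out-of-range rates reuse the edge colors.
    index = max(0, min(index, len(palette) - 1))
    return palette[index]

for example_rate in (1.5, 2.9, 5.4, 25.0):
    print(example_rate, "->", rate_to_color(example_rate))
```

With this helper the loop body could append rate_to_color(monthly_rate) instead of indexing the palette inline.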
This example uses the built-in hover tool from Bokeh. Bokeh understands the semantics of pandas dataframes and can easily show the data in the tooltip, so you don't need to write a complicated JavaScript plugin.
figure() # x_range is years, y_range is months (reversed)
rect('year', 'month', 0.95, 0.95, source=source,
x_range=years, y_range=list(reversed(months)),
color='color', line_color=None,
tools="resize,hover", title="US Unemployment (1948 - 2013)",
plot_width=900, plot_height=400)
# remove axis and grid lines, remove major ticks, make tick labels smaller, set x-axis orientation to angled
grid().grid_line_color = None
axis().axis_line_color = None
axis().major_tick_line_color = None
axis().major_label_text_font_size = "5pt"
axis().major_label_standoff = 0
xaxis().major_label_orientation = np.pi/3
#configure the hover tool to display the month, year and rate
hover = [t for t in curplot().tools if isinstance(t, HoverTool)][0]
hover.tooltips = OrderedDict([
('date', '@month @year'),
('rate', '@rate'),
])
show()
import rpy2
%load_ext rpy2.ipython
##### A random sample from a few normal distributions #####
#%R install.packages("bcp")
%R library(bcp)
%R testdata <- c(rnorm(50), rnorm(50, 5, 1), rnorm(50))
%R bcp.0 <- bcp(testdata)
%R plot.bcp(bcp.0)
%R legacyplot(bcp.0)
You can use ggplot within R from IPython as above; alternatively, you can use the ggplot stylesheet within matplotlib (see the matplotlib example gallery).
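As a minimal sketch of the matplotlib route, the style can be applied temporarily with a context manager so that it does not leak into later plots. The data points are arbitrary, and the figure is rendered to memory only to keep the example self-contained.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# The ggplot look applies only inside this block.
with plt.style.context("ggplot"):
    fig, ax = plt.subplots(figsize=(4, 3))
    ax.plot([0, 1, 2, 3], [1, 3, 2, 4], label="demo")
    ax.legend()
    buf = io.BytesIO()
    fig.savefig(buf, format="png")
    plt.close(fig)

ggplot_png = buf.getvalue()
print("ggplot" in plt.style.available, len(ggplot_png))
```

Calling plt.style.use("ggplot") instead would switch the style for the rest of the session.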
There are many OpenGL-based libraries for Python, including visvis, galry, glumpy, and PyQtGraph. The developers of those projects recently got together and started a single "master" Python OpenGL visualization library, Vispy. Vispy is under heavy development at this time and has limited examples and built-in plotting tools. Notably, Vispy now ships a very basic, experimental OpenGL backend for matplotlib. Vispy targets two categories of users:
Users knowing OpenGL who want to create beautiful, fast, interactive 2D/3D visualizations in Python
Scientists without any knowledge of OpenGL who are seeking a high-level, high-performance plotting toolkit that takes advantage of GPU acceleration through OpenGL.
Vispy takes vector graphics directives or raw OpenGL commands and renders them in an OpenGL context. This context can be a local window running natively; the developers are also starting work on an approach similar to Bokeh's that plots interactively in the web browser using WebGL.
Strengths: Fast interactive rendering of many points (100k - millions) using OpenGL. Great for understanding time dynamics in data, since data can animate in real time.
Weaknesses: new (but promising) framework born out of four other efforts at OpenGL plotting in Python. Rapidly evolving API. Buggy.
import time, sys
from timeit import default_timer
import numpy as np
from vispy import app, use, gloo, scene
from vispy.util.transforms import perspective, translate, rotate, ortho
from vispy.geometry import create_cube
from vispy.io import load_data_file
from vispy.gloo import (Program, VertexBuffer, IndexBuffer, Texture2D, clear,
FrameBuffer, DepthBuffer, set_viewport, set_state)
#use('ipynb_static')
use('ipynb_vnc')
NOTE: this backend requires the Chromium browser
render_vertex = """
attribute vec2 position;
attribute vec2 texcoord;
varying vec2 v_texcoord;
void main()
{
gl_Position = vec4(position, 0.0, 1.0);
v_texcoord = texcoord;
}
"""
render_fragment = """
uniform int pingpong;
uniform sampler2D texture;
varying vec2 v_texcoord;
void main()
{
float v;
v = texture2D(texture, v_texcoord)[pingpong];
gl_FragColor = vec4(1.0-v, 1.0-v, 1.0-v, 1.0);
}
"""
compute_vertex = """
attribute vec2 position;
attribute vec2 texcoord;
varying vec2 v_texcoord;
void main()
{
gl_Position = vec4(position, 0.0, 1.0);
v_texcoord = texcoord;
}
"""
compute_fragment = """
uniform int pingpong;
uniform sampler2D texture;
uniform float dx; // horizontal distance between texels
uniform float dy; // vertical distance between texels
varying vec2 v_texcoord;
void main(void)
{
vec2 p = v_texcoord;
float old_state, new_state, count;
old_state = texture2D(texture, p)[pingpong];
count = texture2D(texture, p + vec2(-dx,-dy))[pingpong]
+ texture2D(texture, p + vec2( dx,-dy))[pingpong]
+ texture2D(texture, p + vec2(-dx, dy))[pingpong]
+ texture2D(texture, p + vec2( dx, dy))[pingpong]
+ texture2D(texture, p + vec2(-dx, 0.0))[pingpong]
+ texture2D(texture, p + vec2( dx, 0.0))[pingpong]
+ texture2D(texture, p + vec2(0.0,-dy))[pingpong]
+ texture2D(texture, p + vec2(0.0, dy))[pingpong];
new_state = old_state;
if( old_state > 0.5 ) {
// Any live cell with fewer than two live neighbours dies
// as if caused by under-population.
if( count < 1.5 )
new_state = 0.0;
// Any live cell with two or three live neighbours
// lives on to the next generation.
// Any live cell with more than three live neighbours dies,
// as if by overcrowding.
else if( count > 3.5 )
new_state = 0.0;
} else {
// Any dead cell with exactly three live neighbours becomes
// a live cell, as if by reproduction.
if( (count > 2.5) && (count < 3.5) )
new_state = 1.0;
}
if( pingpong == 0) {
gl_FragColor[1] = new_state;
gl_FragColor[0] = old_state;
} else {
gl_FragColor[1] = old_state;
gl_FragColor[0] = new_state;
}
}
"""
class Canvas(app.Canvas):

    def __init__(self):
        app.Canvas.__init__(self, title="Conway game of life",
                            size=(512, 512), keys='interactive')
        self._timer = app.Timer('auto', connect=self.update, start=True)

    def on_initialize(self, event):
        # Build programs
        # --------------
        self.comp_size = (512, 512)
        size = self.comp_size + (4,)
        Z = np.zeros(size, dtype=np.float32)
        Z[...] = np.random.randint(0, 2, size)
        Z[:256, :256, :] = 0
        gun = """
........................O...........
......................O.O...........
............OO......OO............OO
...........O...O....OO............OO
OO........O.....O...OO..............
OO........O...O.OO....O.O...........
..........O.....O.......O...........
...........O...O....................
............OO......................"""
        # Stamp the glider gun pattern into the cleared corner of the grid
        x, y = 0, 0
        for i in range(len(gun)):
            if gun[i] == '\n':
                y += 1
                x = 0
            elif gun[i] == 'O':
                Z[y, x] = 1
            x += 1

        self.pingpong = 1
        self.compute = Program(compute_vertex, compute_fragment, 4)
        self.compute["texture"] = Z
        self.compute["position"] = [(-1, -1), (-1, +1), (+1, -1), (+1, +1)]
        self.compute["texcoord"] = [(0, 0), (0, 1), (1, 0), (1, 1)]
        self.compute['dx'] = 1.0 / size[1]
        self.compute['dy'] = 1.0 / size[0]
        self.compute['pingpong'] = self.pingpong

        self.render = Program(render_vertex, render_fragment, 4)
        self.render["position"] = [(-1, -1), (-1, +1), (+1, -1), (+1, +1)]
        self.render["texcoord"] = [(0, 0), (0, 1), (1, 0), (1, 1)]
        self.render["texture"] = self.compute["texture"]
        self.render['pingpong'] = self.pingpong

        self.fbo = FrameBuffer(self.compute["texture"],
                               DepthBuffer(self.comp_size))
        set_state(depth_test=False, clear_color='black')

    def on_draw(self, event):
        # Compute the next generation into the framebuffer
        with self.fbo:
            set_viewport(0, 0, *self.comp_size)
            self.compute["texture"].interpolation = 'nearest'
            self.compute.draw('triangle_strip')
        # Draw the current state to the screen
        clear()
        set_viewport(0, 0, *self.size)
        self.render["texture"].interpolation = 'linear'
        self.render.draw('triangle_strip')
        # Swap which texture channel holds the current state
        self.pingpong = 1 - self.pingpong
        self.compute["pingpong"] = self.pingpong
        self.render["pingpong"] = self.pingpong

    def on_reshape(self, event):
        set_viewport(0, 0, *event.size)
canvas = Canvas()
canvas.show()
canvas.close()
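The rules encoded in the compute_fragment shader above are easier to check in plain Python. The sketch below applies one generation of the same birth and survival rules to a small list-of-lists grid on the CPU. Wrapping at the grid edges is an assumption made here for simplicity; how the shader behaves at the borders depends on the texture wrapping mode, which the notebook does not set explicitly.

```python
def life_step(grid):
    """Return the next Game of Life generation (edges wrap around)."""
    rows, cols = len(grid), len(grid[0])
    new = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count the eight neighbours, like the eight texture lookups.
            count = sum(grid[(r + dr) % rows][(c + dc) % cols]
                        for dr in (-1, 0, 1)
                        for dc in (-1, 0, 1)
                        if (dr, dc) != (0, 0))
            if grid[r][c]:
                # A live cell survives with two or three live neighbours.
                new[r][c] = 1 if count in (2, 3) else 0
            else:
                # A dead cell becomes alive with exactly three.
                new[r][c] = 1 if count == 3 else 0
    return new

# A "blinker": three cells in a row that flip between horizontal
# and vertical on every step.
blinker = [[0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0],
           [0, 1, 1, 1, 0],
           [0, 0, 0, 0, 0],
           [0, 0, 0, 0, 0]]

after_one = life_step(blinker)
after_two = life_step(after_one)
for row in after_one:
    print("".join("O" if cell else "." for cell in row))
```

The shader does the same neighbour count for every texel in parallel and uses the pingpong flag to alternate which color channel holds the old and new state, so no second grid has to be allocated.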