Plotly is a collaborative data analysis and graphing platform. Plotly's Scientific Graphing Libraries interface Plotly's online graphing tools with scientific computing languages such as Python, R, and Julia.
You can think of Plotly as "Graphics as a Service". It generates interactive, publication-quality plots that can be embedded in the location of your choice. You can style them locally with code or via the online interface. Plots can be shared publicly or privately with a URL, and your graphs are accessible from anywhere.
You can install Plotly on Python via pip:
pip install plotly
or in R via the devtools library:
> install.packages("devtools")
> library("devtools")
> install_github("R-api", "plotly")
import plotly
In order to use Plotly, you need an account. You can sign up using the API, without visiting the website:
# You should replace these values with your own registration information
reg = plotly.signup("foo12345", "fake_email_address@vanderbilt.edu")
Thanks for signing up to plotly!
Your username is: foo12345
Your temporary password is: la9di. You use this to log into your plotly account at https://plot.ly/plot.
Your API key is: daia9p1ryi. You use this to access your plotly account through the API.
To get started, initialize a plotly object with your username and api_key, e.g.
>>> py = plotly.plotly('foo12345', 'daia9p1ryi')
Then, make a graph!
>>> res = py.plot([1,2,3],[4,2,1])
>>> print(res['url'])
The resulting dict contains information including your user name (un) and an API key (api_key).
reg
{u'api_key': u'daia9p1ryi', u'error': u'', u'message': u"Thanks for signing up to plotly!\n\nYour username is: foo12345\n\nYour temporary password is: la9di. You use this to log into your plotly account at https://plot.ly/plot.\n\nYour API key is: daia9p1ryi. You use this to access your plotly account through the API.\n\nTo get started, initialize a plotly object with your username and api_key, e.g. \n>>> py = plotly.plotly('foo12345', 'daia9p1ryi')\nThen, make a graph!\n>>> res = py.plot([1,2,3],[4,2,1])\n\n>>> print(res['url'])\n", u'tmp_pw': u'la9di', u'un': u'foo12345'}
Your login information can be used to generate a plotly instance.
ply = plotly.plotly(username_or_email=reg['un'], key=reg['api_key'])
The easiest way to get started is to pass a dict including data and other plotting information to the iplot method:
import numpy as np
data = {
'x': np.random.randn(1000),
'y': np.random.randn(1000),
"type": "scatter",
"name": "Random Numbers",
'mode': 'markers'
}
ply.iplot(data)
High five! You successfuly sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~foo12345/0 or inside your plot.ly account where it is named 'plot from API'
Since plotly uses lists and dicts to generate plots, one can employ Python idioms like list comprehensions to easily generate more substantial output.
import numpy as np
ply.iplot([{'y': np.random.randn(10), 'type':'box'} for i in range(20)],
layout={'showlegend':False})
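As a self-contained illustration of the same idiom, the sketch below builds the list of box-plot trace dicts without sending anything to Plotly. It uses the standard library's random module in place of NumPy, and the helper name make_box_traces is an assumption introduced here for illustration only.

```python
import random


def make_box_traces(n_traces, n_points, seed=0):
    """Build one box-plot trace dict per series (hypothetical helper)."""
    rng = random.Random(seed)
    return [
        # Each trace carries its own list of standard-normal draws.
        {'y': [rng.gauss(0, 1) for _ in range(n_points)], 'type': 'box'}
        for _ in range(n_traces)
    ]


traces = make_box_traces(20, 10)
print(len(traces), traces[0]['type'])
```

A list built this way could then be passed to ply.iplot together with a layout dict, as in the call above.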
Let's walk through a few simple plots. Many of these are taken from the Plot.ly API examples.
We first specify the dataset(s). In this case, we will have two series, each on its own y-axis.
data = [
{
"x":[1,2,3],
"y":[40,50,60],
"name":"yaxis data"
},
{
"x":[2,3,4],
"y":[4,5,6],
"yaxis":"y2",
"name": "yaxis2 data"
}
]
A separate dictionary takes care of layout specification:
layout = {
"yaxis":{
"title": "yaxis title",
},
"yaxis2":{
"title": "yaxis2 title",
"titlefont":{
"color":"rgb(148, 103, 189)"
},
"tickfont":{
"color":"rgb(148, 103, 189)"
},
"overlaying":"y",
"side":"right",
},
"title": "Double Y Axis Example",
}
ply.iplot(data, layout=layout)
We can add as many axes as we deem necessary:
data = [
{
"x":[1,2,3],
"y":[4,5,6],
"name":"yaxis1 data"
},
{
"x":[2,3,4],
"y":[40,50,60],
"name":"yaxis2 data",
"yaxis":"y2"
},
{
"x":[3,4,5],
"y":[400,500,600],
"name":"yaxis3 data",
"yaxis":"y3"
}
]
c = ['#1f77b4', # muted blue
'#ff7f0e', # safety orange
'#2ca02c'] # cooked asparagus green
layout = {
"width":800,
"xaxis":{
"domain":[0.3,0.7]
},
"yaxis":{
"title": "yaxis title",
"titlefont":{
"color":c[0]
},
"tickfont":{
"color":c[0]
},
},
"yaxis2":{
"overlaying":"y",
"side":"right",
"anchor":"free",
"position":0.15,
"title": "yaxis2 title",
"titlefont":{
"color":c[1]
},
"tickfont":{
"color":c[1]
},
},
"yaxis3":{
"overlaying":"y",
"side":"left",
"anchor":"free",
"position":0,
"title": "yaxis3 title",
"titlefont":{
"color":c[2]
},
"tickfont":{
"color":c[2]
},
},
"title": "multiple y-axes example"
}
ply.iplot(data, layout=layout)
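The per-axis entries in the layout above repeat the same keys, so they can be generated by a small function. The following is a minimal sketch rather than part of the Plotly API: overlay_axis is a hypothetical helper, and it assumes the layout keys behave as they do in the example above.

```python
def overlay_axis(title, color, side, position):
    """Return a layout entry for a y-axis drawn over the primary one.

    Hypothetical helper; the keys mirror those used in the layout dict above.
    """
    return {
        "overlaying": "y",      # share the plotting area with the primary y-axis
        "side": side,           # which side of the axis line the ticks sit on
        "anchor": "free",       # position the axis independently of the x-axis
        "position": position,   # horizontal placement as a fraction of the width
        "title": title,
        "titlefont": {"color": color},
        "tickfont": {"color": color},
    }


colors = ['#1f77b4', '#ff7f0e', '#2ca02c']
layout = {
    "xaxis": {"domain": [0.3, 0.7]},
    "yaxis": {"title": "yaxis title",
              "titlefont": {"color": colors[0]},
              "tickfont": {"color": colors[0]}},
    "yaxis2": overlay_axis("yaxis2 title", colors[1], "right", 0.15),
    "yaxis3": overlay_axis("yaxis3 title", colors[2], "left", 0),
}
```

Adding a fourth axis then only requires one more overlay_axis call and a matching "yaxis" entry in the corresponding trace.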
It is similarly straightforward to generate subplots. This is done by allocating a proportion of the x or y axis to the domain of each subplot. Here is an example that illustrates the use of subplots and multiple y-axes to compare changes in vessel speeds according to a number of speed enforcement programs. In this case, we place a timeline of the enforcement programs on a smaller subplot above a larger subplot containing the estimates of ship behavior.
'''
Created January 14, 2014 by Chris Fonnesbeck
Licensed under a Creative Commons Attribution 4.0 International License.
'''
import pandas as pd
dates = pd.date_range(start='11/1/2008', end='7/31/2012', freq='M')
c = ['#fbb4ae',
'#b3cde3',
'#ccebc5',
'#decbe4']
band_width = 15
opacity = 0.8
# Management interventions in seasonal management areas (SMA)
SMA = [{'name': 'Intermittent, at-sea radio',
'x': pd.date_range(start='2/1/2009', end='5/1/2009', freq='M'),
'y': ['USCG']*100,
'mode':'lines',
'line':{'color': c[0], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
},
{'name': 'Intermittent, at-sea radio',
'x': pd.date_range(start='1/1/2010', end='7/1/2010', freq='M'),
'y': ['USCG']*100,
'mode':'lines',
'line':{'color': c[0], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
},
{'name': 'Intermittent, at-sea radio',
'x': pd.date_range(start='11/1/2010', end='6/1/2011', freq='M'),
'y': ['USCG']*100,
'mode':'lines',
'line':{'color': c[0], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
},
{'name': 'Intermittent, at-sea radio',
'x': pd.date_range(start='1/1/2012', end='3/1/2012', freq='M'),
'y': ['USCG']*100,
'mode':'lines',
'line':{'color': c[0], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
}
,{'name': 'Intermittent, letter',
'x': pd.date_range(start='10/1/2009', end='12/31/2009', freq='M'),
'y': ['COPPS']*100,
'mode':'lines',
'line':{'color': c[1], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
},
{'name': 'Certified mail, ongoing litigation',
'x': pd.date_range(start='11/1/2010', end='8/1/2012', freq='M'),
'y': ['NOVA']*100,
'mode':'lines',
'line':{'color': c[2], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
},
{'name': 'E-mail, monthly summaries',
'x': pd.date_range(start='12/1/2010', end='8/1/2012', freq='M'),
'y': ['WSC']*100,
'mode':'lines',
'line':{'color': c[3], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity
},
{'name': 'E-mail, monthly summaries',
'x': pd.date_range(start='2/1/2011', end='8/1/2012', freq='M'),
'y': ['CSA']*100,
'mode':'lines',
'line':{'color': c[3], 'width': band_width},
"yaxis": "y2",
'showlegend': False,
'opacity': opacity,
}
][::-1]
sma_dates = pd.date_range(start='2/1/2009', end='11/1/2012', freq='12M')
sds = 2
line_width = 2
line_color = "rgb(3,78,123)"
# Parameter estimates from model
estimates = [{'x': sma_dates,
'y': np.array([0, -0.04, 0.16, -0.67]),
'name': 'Passenger',
'showlegend': True,
"line":{"color": line_color,
"width": line_width,
"dash":"dashdot"},
'error_y': {'type': 'data',
'array': np.array([0, 0.08, 0.08, 0.08]) * sds,
'visible': True,
"color": line_color}},
{'x': sma_dates,
'y': np.array([0, -0.26, -0.92, -1.35]),
'name': 'Cargo',
'showlegend': True,
"line":{"color": line_color,
"width": line_width,
"dash":"dot"},
'error_y': {'type': 'data',
'array': np.array([0, 0.02, 0.02, 0.02]) * sds,
'visible': True,
"color": line_color}},
{'x': sma_dates,
'y': np.array([0, -0.02, -0.41, -0.62]),
'name': 'Tanker',
'showlegend': True,
"line":{"color": line_color,
"width": line_width,
"dash":"solid"},
'error_y': {'type': 'data',
'array': np.array([0, 0.03, 0.03, 0.03]) * sds,
'visible': True,
"color": line_color}},]
legendstyle = {"x" : 0,
"y" : 0,
"bgcolor" : "#F0F0F0",
"bordercolor" : "#FFFFFF",}
layout = {
"yaxis2":{'showgrid': False,
'zeroline': True,
'side': 'right',
"showticklabels" : True,
'domain': [2./3., 1],
'title': 'Program'
},
"yaxis":{"title": "Speed change from 1st season",
'showgrid': False,
'zeroline': True,
"zerolinecolor" : "#F0F0F0",
"zerolinewidth" : 4,
'mode': 'markers',
'domain': [0, 2./3.]
},
"xaxis": {'showgrid':False,
'zeroline':False,
'title': 'Date',
'range': [dates[0], dates[-1]]},
"title": "Vessel speed change in response to notification programs",
'showlegend': True,
"legend": legendstyle
}
ply.iplot(SMA + estimates, layout=layout)
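The domain values in the layout above ([0, 2/3] and [2/3, 1]) split the vertical axis in a 2:1 ratio. A minimal sketch of how such domains could be computed for any number of stacked subplots follows; split_domain and its gap parameter are hypothetical names introduced here, not Plotly functions.

```python
def split_domain(weights, gap=0.0):
    """Split the [0, 1] axis range into domains proportional to weights.

    Hypothetical helper: gap is the blank fraction left between
    neighbouring subplots.
    """
    n_gaps = len(weights) - 1
    if gap < 0 or gap * n_gaps >= 1:
        raise ValueError("gaps must leave room for the subplots")
    usable = 1.0 - gap * n_gaps
    total = float(sum(weights))
    domains, start = [], 0.0
    for weight in weights:
        end = start + usable * weight / total
        domains.append([start, end])
        start = end + gap
    return domains


# Lower subplot twice as tall as the upper one, as in the layout above.
lower, upper = split_domain([2, 1])
print(lower, upper)
```

The two results could be assigned to the 'domain' entries of "yaxis" and "yaxis2" respectively.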
We can use Pandas to manage larger datasets and visualize them with Plotly. Here is a simple boxplot with the data points overplotted.
titanic = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv")
titanic.head()
 | pclass | survived | name | sex | age | sibsp | parch | ticket | fare | cabin | embarked | boat | body | home.dest |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1 | 1 | Allen, Miss. Elisabeth Walton | female | 29.00 | 0 | 0 | 24160 | 211.3375 | B5 | S | 2 | NaN | St Louis, MO |
1 | 1 | 1 | Allison, Master. Hudson Trevor | male | 0.92 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | 11 | NaN | Montreal, PQ / Chesterville, ON |
2 | 1 | 0 | Allison, Miss. Helen Loraine | female | 2.00 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
3 | 1 | 0 | Allison, Mr. Hudson Joshua Creighton | male | 30.00 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | 135 | Montreal, PQ / Chesterville, ON |
4 | 1 | 0 | Allison, Mrs. Hudson J C (Bessie Waldo Daniels) | female | 25.00 | 1 | 2 | 113781 | 151.5500 | C22 C26 | S | NaN | NaN | Montreal, PQ / Chesterville, ON |
5 rows × 14 columns
age_by_class = [{'y': data.values,
'name': pclass,
'type': 'box',
'boxpoints': 'all',
'jitter': 0.3} for pclass,data in list(titanic.groupby('pclass')['age'])]
layout = {'xaxis': {'showgrid':False,'zeroline':False,
'title': 'Passenger class'},
'yaxis': {'zeroline':False,'gridcolor':'white', 'title': 'Age'},
'plot_bgcolor': 'rgb(233,233,233)',
'showlegend':False}
ply.iplot(age_by_class, layout=layout)
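To make the grouping step explicit, here is a rough pure-Python equivalent of the groupby comprehension above. It is a sketch only: box_traces_by_group is a hypothetical helper, and the records are made-up toy values, not rows from the Titanic data.

```python
from collections import defaultdict


def box_traces_by_group(records, group_key, value_key):
    """Group values by a key and emit one box trace per group (hypothetical)."""
    groups = defaultdict(list)
    for record in records:
        value = record.get(value_key)
        if value is not None:          # skip missing values
            groups[record[group_key]].append(value)
    return [{'y': groups[name],
             'name': name,
             'type': 'box',
             'boxpoints': 'all',       # overplot the individual points
             'jitter': 0.3}
            for name in sorted(groups)]


# Toy records, invented for illustration only.
records = [
    {'pclass': 1, 'age': 29.0},
    {'pclass': 1, 'age': None},
    {'pclass': 2, 'age': 34.0},
    {'pclass': 3, 'age': 21.0},
    {'pclass': 3, 'age': 4.0},
]
toy_traces = box_traces_by_group(records, 'pclass', 'age')
print([(t['name'], t['y']) for t in toy_traces])
```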
Along with the Python API, Plotly includes APIs for several other languages suitable for statistical analysis, such as R and Julia. In the IPython notebook, we can run R code via the R magic command, which gives us access to R's rich collection of packages without having to open a separate R session.
%load_ext rmagic
%%R -o hist_url
library(plotly)
py <- plotly('foo12345', key='daia9p1ryi')
# Sample data
samples <- rnorm(50)
# Normal density
x_norm <- seq(-5,5,length=100)
y_norm <- 1./sqrt(2*pi)*exp(-x_norm**2/2.)
# layout
l <- list(showlegend = FALSE,
xaxis = list(zeroline = FALSE),
yaxis = list(zeroline = FALSE))
# Histogram data
dataHistogram <- list(y = samples,
type = 'histogramy',
histnorm = 'probability density')
# Curve data
dataArea <- list(x = x_norm,
y = y_norm,
fill = 'tozeroy')
# Call plotly
response <- py$plotly(list(dataHistogram, dataArea), kwargs = list(layout = l))
# url and filename
hist_url <- response$url
Loading required package: RCurl
Loading required package: bitops
Loading required package: RJSONIO
To view the resulting plot in the IPython notebook, we can use the URL returned by plotly and place it within an HTML object.
hist_url
array(['https://plot.ly/~foo12345/7'], dtype='|S27')
from IPython.display import HTML
HTML('<iframe src={0} width=700 height=500></iframe>'.format(hist_url[0]))
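The embedding string can also be built by a small function that quotes and escapes its attributes. This is a sketch under the assumption of Python 3 (for html.escape); iframe_html is a hypothetical name, and the URL is the one returned in the output above.

```python
from html import escape


def iframe_html(url, width=700, height=500):
    """Return an iframe tag with quoted, escaped attributes (hypothetical helper)."""
    return '<iframe src="{0}" width="{1}" height="{2}"></iframe>'.format(
        escape(url, quote=True), int(width), int(height))


snippet = iframe_html('https://plot.ly/~foo12345/7')
print(snippet)
```

In a notebook, the resulting string could be wrapped in IPython.display.HTML exactly as in the call above.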
Running this example requires that PyMC 2.3 be installed.
In Bayesian modeling using Markov chain Monte Carlo (MCMC) methods, we make inferences based on samples drawn from the posterior distribution of unknown variables in our model. Therefore, it is useful to plot these samples to visually assess whether the model has converged.
PyMC includes an example dataset, which is a time series of recorded coal mining disasters in the UK from 1851 to 1962.
from pymc.examples import disaster_model
from pymc import MCMC, graph
M = MCMC(disaster_model)
ply.iplot({'y': disaster_model.disasters_array,
'x': range(1851, 1963),
"type": "scatter",
"mode": "lines",
"name": "UK coal mining disasters, 1851-1962"})
Occurrences of disasters in the time series are thought to be derived from a Poisson process with a large rate parameter in the early part of the time series, and from one with a smaller rate in the later part. We are interested in locating the change point in the series, which perhaps is related to changes in mining safety regulations.
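To give a feel for this kind of change-point process, the sketch below simulates yearly counts whose Poisson rate drops at a switch year. It is not the PyMC disaster_model: the rates (3.0 and 1.0), the switch index (40), and the function names are assumptions chosen purely for illustration, and the Poisson draws use Knuth's multiplication method from the standard library only.

```python
import math
import random


def poisson_draw(rate, rng):
    """Draw one Poisson variate using Knuth's multiplication method."""
    threshold = math.exp(-rate)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def simulate_switchpoint(n_years, switch, early_rate, late_rate, seed=0):
    """Simulate yearly counts whose rate changes at index `switch`."""
    rng = random.Random(seed)
    return [poisson_draw(early_rate if t < switch else late_rate, rng)
            for t in range(n_years)]


# 112 years, matching the span 1851-1962; the other values are illustrative.
counts = simulate_switchpoint(112, 40, 3.0, 1.0)
early_mean = sum(counts[:40]) / 40.0
late_mean = sum(counts[40:]) / 72.0
print(early_mean, late_mean)
```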
nchains = 3
for i in range(nchains):
    M.sample(5000, progress_bar=False)
Two simple visualizations are the trace, which is essentially a time series of the samples, and the histogram of the samples.
trace = M.early_mean.trace()
data = [{'y': trace}]
data.append({'y': trace,
'xaxis': 'x2',
'yaxis': 'y2',
'type': 'histogramy'})
layout = {
"xaxis":{
"domain":[0,0.5],
"title": "Iteration"
},
"yaxis":{
"title": "Value"
},
"xaxis2":{
"domain":[0.55,1],
"title": "Value"
},
"yaxis2":{
"anchor":"x2",
"side": "right",
"title": "Frequency"
},
"showlegend": False,
"title": "Posterior samples of early Poisson mean"
}
ply.iplot(data, layout=layout, width=850,height=400)
This shows that the chain is relatively homogeneous (the mean and variance of the samples do not appear to change much over the course of sampling).
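A crude numerical counterpart to this visual check is to compare the mean and variance of the first and second halves of the trace. The sketch below is an informal illustration, not one of PyMC's diagnostics; half_summaries is a hypothetical helper, and the trace is a stand-in of simulated normal draws rather than the early_mean samples.

```python
import random
import statistics


def half_summaries(trace):
    """Return (mean, variance) for the first and second halves of a trace."""
    mid = len(trace) // 2
    first, second = trace[:mid], trace[mid:]
    return ((statistics.mean(first), statistics.variance(first)),
            (statistics.mean(second), statistics.variance(second)))


# Stand-in trace: 1000 independent standard-normal draws.
rng = random.Random(42)
stand_in_trace = [rng.gauss(0, 1) for _ in range(1000)]
(first_mean, first_var), (second_mean, second_var) = half_summaries(stand_in_trace)
print(first_mean, second_mean, first_var, second_var)
```

Similar values in both halves are consistent with a homogeneous chain, though they do not by themselves establish convergence.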
However, stronger evidence can be obtained by comparing multiple independent samples, which is why we ran three chains above. We can use Plotly to compare samples from each of the chains.
nparams = len(M.stochastics)
mcmc_data = {'chain {0}'.format(i): {s.__name__:s.trace(chain=i)[:100] for s in M.stochastics} for i in range(nchains)}
attr = list(mcmc_data['chain 1'].keys())  # a list, so that attr[i] can be indexed
colors = {'chain 0': 'rgb(31, 119, 180)',
'chain 1': 'rgb(255, 127, 14)',
'chain 2': 'rgb(44, 160, 44)'}
data = []
for i in range(nparams):
    for j in range(nparams):
        for chain in mcmc_data.keys():
            data.append({"name": chain,
                         "x": mcmc_data[chain][attr[i]], "y": mcmc_data[chain][attr[j]],
                         "type":"scatter","mode":"markers",
                         'marker': {'color': colors[chain], 'opacity':0.2},
                         "xaxis": "x"+(str(i) if i!=0 else ''), "yaxis": "y"+(str(j) if j!=0 else '')})
padding = 0.04;
domains = [[i*padding + i*(1-3*padding)/nparams, i*padding + ((i+1)*(1-3*padding)/nparams)] for i in range(nparams)]
layout = {
"xaxis":{"domain":domains[0], "title":attr[0],
'zeroline':False,'showline':False,'ticks':'',
'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
"yaxis":{"domain":domains[0], "title":attr[0],
'zeroline':False,'showline':False,'ticks':'',
'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
"xaxis1":{"domain":domains[1], "title":attr[1],
'zeroline':False,'showline':False,'ticks':'',
'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
"yaxis1":{"domain":domains[1], "title":attr[1],
'zeroline':False,'showline':False,'ticks':'',
'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
"xaxis2":{"domain":domains[2], "title":attr[2],
'zeroline':False,'showline':False,'ticks':'',
'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
"yaxis2":{"domain":domains[2], "title":attr[2],
'zeroline':False,'showline':False,'ticks':'',
'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
"showlegend":False,
"title":"Posterior samples for coal mining disasters model",
"titlefont":{'color':'rgb(67,67,67)', 'size': 20}
}
ply.iplot(data,layout=layout)
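The visual comparison of chains can be complemented by a number. One common choice, not used in the original example and named here plainly as a swapped-in technique, is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variance. The sketch below assumes Python 3 and equal-length chains of a single parameter, and it runs on simulated stand-in chains rather than on the PyMC traces.

```python
import math
import random
import statistics


def potential_scale_reduction(chains):
    """Gelman-Rubin statistic for equal-length chains of one parameter."""
    n = len(chains[0])
    if len(chains) < 2 or n < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two equal-length chains")
    chain_means = [statistics.mean(c) for c in chains]
    within = statistics.mean([statistics.variance(c) for c in chains])
    between_over_n = statistics.variance(chain_means)   # B / n
    pooled = (n - 1) / float(n) * within + between_over_n
    return math.sqrt(pooled / within)


rng = random.Random(1)
# Stand-in chains: three sets of draws from the same distribution ...
mixed = [[rng.gauss(3.0, 0.3) for _ in range(500)] for _ in range(3)]
# ... and three chains stuck around different values.
stuck = [[rng.gauss(3.0 + shift, 0.3) for _ in range(500)] for shift in range(3)]
r_mixed = potential_scale_reduction(mixed)
r_stuck = potential_scale_reduction(stuck)
print(r_mixed, r_stuck)
```

Values close to 1 are consistent with the chains sampling the same distribution, while values well above 1 suggest they have not mixed.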