Using Plotly for Interactive and Collaborative Data Visualization

Plotly is a collaborative data analysis and graphing platform. Plotly's Scientific Graphing Libraries interface Plotly's online graphing tools with the following scientific computing languages:

  • Python
  • R
  • MATLAB
  • Julia

You can think of Plotly as "Graphics as a Service". It generates interactive, publication-quality plots that can be embedded in the location of your choice. You can style them locally with code or through the online interface. Plots can be shared publicly or privately via a URL, and your graphs are accessible from anywhere.

You can install Plotly on Python via pip:

pip install plotly

or in R via the devtools library:

> install.packages("devtools")
> library("devtools")
> install_github("R-api", "plotly")
In [1]:
import plotly

In order to use Plotly, you need an account. You can sign up using the API, without visiting the website:

In [2]:
# You should replace these values with your own registration information
reg = plotly.signup("foo12345", "[email protected]")
Thanks for signing up to plotly!

Your username is: foo12345

Your temporary password is: la9di. You use this to log into your plotly account at https://plot.ly/plot.

Your API key is: daia9p1ryi. You use this to access your plotly account through the API.

To get started, initialize a plotly object with your username and api_key, e.g. 
>>> py = plotly.plotly('foo12345', 'daia9p1ryi')
Then, make a graph!
>>> res = py.plot([1,2,3],[4,2,1])

>>> print(res['url'])

The resulting dict contains information including your user name (un) and an API key (api_key).

In [3]:
reg
Out[3]:
{u'api_key': u'daia9p1ryi',
 u'error': u'',
 u'message': u"Thanks for signing up to plotly!\n\nYour username is: foo12345\n\nYour temporary password is: la9di. You use this to log into your plotly account at https://plot.ly/plot.\n\nYour API key is: daia9p1ryi. You use this to access your plotly account through the API.\n\nTo get started, initialize a plotly object with your username and api_key, e.g. \n>>> py = plotly.plotly('foo12345', 'daia9p1ryi')\nThen, make a graph!\n>>> res = py.plot([1,2,3],[4,2,1])\n\n>>> print(res['url'])\n",
 u'tmp_pw': u'la9di',
 u'un': u'foo12345'}

Your login information can be used to generate a plotly instance.

In [5]:
ply = plotly.plotly(username_or_email=reg['un'], key=reg['api_key'])

The easiest way to get started is to pass a dict including data and other plotting information to the iplot method:

In [6]:
import numpy as np

data = {
    'x': np.random.randn(1000), 
    'y': np.random.randn(1000),
    "type": "scatter",
    "name": "Random Numbers",
    'mode': 'markers'
}

ply.iplot(data) 
High five! You successfuly sent some data to your account on plotly. View your plot in your browser at https://plot.ly/~foo12345/0 or inside your plot.ly account where it is named 'plot from API'
Out[6]:

Since plotly uses lists and dicts to generate plots, one can employ Python idioms like list comprehensions to easily generate more substantial output.

In [7]:
import numpy as np
In [8]:
ply.iplot([{'y': np.random.randn(10), 'type':'box'} for i in range(20)],
          layout={'showlegend':False})
Out[8]:
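
The same idiom can be tried without any plotting call. The following is a minimal sketch using only the standard library; the helper name make_box_traces is hypothetical and not part of Plotly, and only the 'y' and 'type' keys are taken from the example above. It builds the list of box-plot dicts separately so that it can be inspected before being handed to iplot:

```python
import random


def make_box_traces(n_traces, n_points, seed=0):
    """Build a list of box-plot trace dicts filled with Gaussian noise.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    return [
        {'y': [rng.gauss(0, 1) for _ in range(n_points)], 'type': 'box'}
        for _ in range(n_traces)
    ]


traces = make_box_traces(20, 10)
print(len(traces), traces[0]['type'])
```

Keeping trace construction in its own function makes it easy to check the number and shape of the traces before anything is uploaded.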

Simple Examples

Let's walk through a few simple plots. Many of these are taken from the Plot.ly API examples.

We first specify the dataset(s). In this case, we will have two series, each on its own y-axis.

In [9]:
data = [
   {
    "x":[1,2,3],
    "y":[40,50,60],
      "name":"yaxis data"
   },
   {
    "x":[2,3,4],
    "y":[4,5,6],
    "yaxis":"y2",
    "name": "yaxis2 data"
   }   
]

A separate dictionary takes care of layout specification:

In [10]:
layout = {
   "yaxis":{
      "title": "yaxis title", 
   }, 

   "yaxis2":{
      "title": "yaxis2 title",
      "titlefont":{
         "color":"rgb(148, 103, 189)"
      },
      "tickfont":{
         "color":"rgb(148, 103, 189)"
      },
      "overlaying":"y",
      "side":"right",
   },  

   "title": "Double Y Axis Example",
}
In [12]:
ply.iplot(data, layout=layout)
Out[12]:

We can add as many axes as we deem necessary:

In [14]:
data = [
    {
    "x":[1,2,3],
    "y":[4,5,6],
    "name":"yaxis1 data"
   },
   {
    "x":[2,3,4],
    "y":[40,50,60],
    "name":"yaxis2 data",
    "yaxis":"y2" 
   },
   {
    "x":[3,4,5],
    "y":[400,500,600],
    "name":"yaxis3 data",
    "yaxis":"y3"
    }
]

c = ['#1f77b4', # muted blue
    '#ff7f0e', # safety orange
    '#2ca02c'] # cooked asparagus green

layout = {
    "width":800,
    "xaxis":{
        "domain":[0.3,0.7]
    },
   "yaxis":{
      "title": "yaxis title",
      "titlefont":{
         "color":c[0]
      },
      "tickfont":{
         "color":c[0]
      },
   },
   "yaxis2":{
      "overlaying":"y",
      "side":"right",
      "anchor":"free",
      "position":0.15,
      
      "title": "yaxis2 title",
      "titlefont":{
         "color":c[1]
      },
      "tickfont":{
         "color":c[1]
      },
   },
   "yaxis3":{
      "overlaying":"y",
      "side":"left",
      "anchor":"free",
      "position":0,
      
      "title": "yaxis3 title",
      "titlefont":{
         "color":c[2]
      },
      "tickfont":{
         "color":c[2]
      },
   },
    "title": "multiple y-axes example"
}

ply.iplot(data, layout=layout)
Out[14]:
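
The titlefont and tickfont entries repeat for every colored axis in the layouts above. As a minimal sketch, a small helper can build each axis dict. The name colored_axis is hypothetical and not part of the Plotly API; the keys and values are copied from the double y-axis example:

```python
def colored_axis(title, color, **extra):
    """Return an axis dict whose title and tick labels share one color.

    Hypothetical helper; any extra keyword arguments are copied into
    the axis dict unchanged.
    """
    axis = {
        'title': title,
        'titlefont': {'color': color},
        'tickfont': {'color': color},
    }
    axis.update(extra)
    return axis


purple = 'rgb(148, 103, 189)'

layout = {
    'yaxis': {'title': 'yaxis title'},
    'yaxis2': colored_axis('yaxis2 title', purple,
                           overlaying='y', side='right'),
    'title': 'Double Y Axis Example',
}
```

The resulting dict has the same structure as the hand-written layout, so it can be passed as the layout argument in the same way.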

Subplots

It is similarly straightforward to generate subplots. This is done by allocating a proportion of the x or y axis to the domain of each subplot. Here is an example that illustrates the use of subplots and multiple y-axes to compare changes in vessel speeds according to a number of speed enforcement programs. In this case, we place a timeline of the enforcement programs on a smaller subplot above a larger subplot containing the estimates of ship behavior.

In [15]:
'''
Created January 14, 2014 by Chris Fonnesbeck

Licensed under a Creative Commons Attribution 4.0 International License.
'''

import pandas as pd

dates = pd.date_range(start='11/1/2008', end='7/31/2012', freq='M')

c = ['#fbb4ae',
     '#b3cde3',
     '#ccebc5',
     '#decbe4'] 

band_width = 15
opacity = 0.8

# Management interventions in seasonal management areas (SMA)
SMA = [{'name': 'Intermittent, at-sea radio',
    'x': pd.date_range(start='2/1/2009', end='5/1/2009', freq='M'),
    'y': ['USCG']*100,
    'mode':'lines',
    'line':{'color': c[0], 'width': band_width},
    "yaxis": "y2", 
    'showlegend': False, 
    'opacity': opacity
    },
    {'name': 'Intermittent, at-sea radio',
        'x': pd.date_range(start='1/1/2010', end='7/1/2010', freq='M'),
        'y': ['USCG']*100,
        'mode':'lines',
        'line':{'color': c[0], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity
    },
    {'name': 'Intermittent, at-sea radio',
        'x': pd.date_range(start='11/1/2010', end='6/1/2011', freq='M'),
        'y': ['USCG']*100,
        'mode':'lines',
        'line':{'color': c[0], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity
    },
    {'name': 'Intermittent, at-sea radio',
        'x': pd.date_range(start='1/1/2012', end='3/1/2012', freq='M'),
        'y': ['USCG']*100,
        'mode':'lines',
        'line':{'color': c[0], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity
    }
    ,{'name': 'Intermittent, letter',
        'x': pd.date_range(start='10/1/2009', end='12/31/2009', freq='M'),
        'y': ['COPPS']*100,
        'mode':'lines',
        'line':{'color': c[1], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity
    },
    {'name': 'Certified mail, ongoing litigation',
        'x': pd.date_range(start='11/1/2010', end='8/1/2012', freq='M'),
        'y': ['NOVA']*100,
        'mode':'lines',
        'line':{'color': c[2], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity
    },
    {'name': 'E-mail, monthly summaries',
        'x': pd.date_range(start='12/1/2010', end='8/1/2012', freq='M'),
        'y': ['WSC']*100,
        'mode':'lines',
        'line':{'color': c[3], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity
    },
    {'name': 'E-mail, monthly summaries',
        'x': pd.date_range(start='2/1/2011', end='8/1/2012', freq='M'),
        'y': ['CSA']*100,
        'mode':'lines',
        'line':{'color': c[3], 'width': band_width},
        "yaxis": "y2", 
        'showlegend': False, 
        'opacity': opacity,
    }
][::-1]

sma_dates = pd.date_range(start='2/1/2009', end='11/1/2012', freq='12M')

sds = 2
line_width = 2
line_color = "rgb(3,78,123)"

# Parameter estimates from model
estimates = [{'x': sma_dates, 
              'y': np.array([0, -0.04, 0.16, -0.67]), 
              'name': 'Passenger',
              'showlegend': True,
              "line":{"color": line_color, 
                      "width": line_width, 
                      "dash":"dashdot"},
              'error_y': {'type': 'data', 
                          'array': np.array([0, 0.08, 0.08, 0.08]) * sds, 
                          'visible': True, 
                          "color": line_color}},
              {'x': sma_dates, 
               'y': np.array([0, -0.26, -0.92, -1.35]), 
               'name': 'Cargo',
               'showlegend': True,
               "line":{"color": line_color, 
                      "width": line_width, 
                      "dash":"dot"},
               'error_y': {'type': 'data', 
                          'array': np.array([0, 0.02, 0.02, 0.02]) * sds, 
                          'visible': True, 
                          "color": line_color}},
              {'x': sma_dates, 
               'y': np.array([0, -0.02, -0.41, -0.62]), 
               'name': 'Tanker',
              'showlegend': True,
              "line":{"color": line_color, 
                      "width": line_width, 
                      "dash":"solid"},
              'error_y': {'type': 'data', 
                          'array': np.array([0, 0.03, 0.03, 0.03]) * sds, 
                          'visible': True, 
                          "color": line_color}},]


legendstyle = {"x" : 0, 
               "y" : 0, 
               "bgcolor" : "#F0F0F0",
               "bordercolor" : "#FFFFFF",}

layout = {
    "yaxis2":{'showgrid': False,
             'zeroline': True,
             'side': 'right',
             "showticklabels" : True,
             'domain': [2./3., 1],
             'title': 'Program'
   }, 

    "yaxis":{"title": "Speed change from 1st season",
            'showgrid': False,
            'zeroline': True,
            "zerolinecolor" : "#F0F0F0",
            "zerolinewidth" : 4,
            'mode': 'markers',
            'domain': [0, 2./3.]
   }, 

    "xaxis": {'showgrid':False,
              'zeroline':False, 
              'title': 'Date',
              'range': [dates[0], dates[-1]]},

   "title": "Vessel speed change in response to notification programs",
   'showlegend': True,
   "legend": legendstyle
}


ply.iplot(SMA + estimates, layout=layout) 
Out[15]:
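
The two y-axis domains in the layout above, [0, 2/3] and [2/3, 1], were written by hand. As a rough sketch, they can also be computed from relative heights. The helper stacked_domains is a hypothetical name, not a Plotly function:

```python
def stacked_domains(weights):
    """Split the [0, 1] range into consecutive domains proportional to weights.

    The first weight gets the bottom slice and the last weight the top slice.
    """
    total = float(sum(weights))
    domains = []
    start = 0.0
    for w in weights:
        end = start + w / total
        domains.append([start, end])
        start = end
    # Pin the top edge to exactly 1 to guard against floating-point drift
    domains[-1][1] = 1.0
    return domains


# Lower subplot twice as tall as the upper one, as in the example above
lower, upper = stacked_domains([2, 1])
layout = {'yaxis': {'domain': lower}, 'yaxis2': {'domain': upper}}
```

Changing the weights, for example to [3, 1], resizes both subplots without touching the rest of the layout.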

Titanic Dataset

We can use Pandas to manage larger datasets and generate visualizations of them using Plotly. Here is a simple boxplot with the data points overplotted.

In [16]:
titanic = pd.read_csv("http://biostat.mc.vanderbilt.edu/wiki/pub/Main/DataSets/titanic3.csv")
In [17]:
titanic.head()
Out[17]:
pclass survived name sex age sibsp parch ticket fare cabin embarked boat body home.dest
0 1 1 Allen, Miss. Elisabeth Walton female 29.00 0 0 24160 211.3375 B5 S 2 NaN St Louis, MO
1 1 1 Allison, Master. Hudson Trevor male 0.92 1 2 113781 151.5500 C22 C26 S 11 NaN Montreal, PQ / Chesterville, ON
2 1 0 Allison, Miss. Helen Loraine female 2.00 1 2 113781 151.5500 C22 C26 S NaN NaN Montreal, PQ / Chesterville, ON
3 1 0 Allison, Mr. Hudson Joshua Creighton male 30.00 1 2 113781 151.5500 C22 C26 S NaN 135 Montreal, PQ / Chesterville, ON
4 1 0 Allison, Mrs. Hudson J C (Bessie Waldo Daniels) female 25.00 1 2 113781 151.5500 C22 C26 S NaN NaN Montreal, PQ / Chesterville, ON

5 rows × 14 columns

In [18]:
age_by_class = [{'y': data.values, 
                 'name': pclass,
                 'type': 'box',
                 'boxpoints': 'all', 
                 'jitter': 0.3} for pclass,data in list(titanic.groupby('pclass')['age'])]
In [19]:
layout = {'xaxis': {'showgrid':False,'zeroline':False, 
                    'title': 'Passenger class'},
          'yaxis': {'zeroline':False,'gridcolor':'white', 'title': 'Age'},
          'plot_bgcolor': 'rgb(233,233,233)',
          'showlegend':False}
In [20]:
ply.iplot(age_by_class, layout=layout)
Out[20]:
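
The group-then-build-traces pattern used above does not depend on Pandas. The following sketch shows the same idea with the standard library only. The first five records are the class and age values from titanic.head() above; the remaining rows are made up for illustration, and box_traces_by_group is a hypothetical helper name:

```python
import math
from collections import defaultdict

records = [
    (1, 29.0), (1, 0.92), (1, 2.0), (1, 30.0), (1, 25.0),  # from titanic.head()
    (2, 34.0), (3, float('nan')), (3, 22.0),               # made-up rows
]


def box_traces_by_group(records):
    """Group (key, value) pairs and return one box-plot trace dict per key."""
    groups = defaultdict(list)
    for key, value in records:
        if math.isnan(value):
            continue  # skip missing values
        groups[key].append(value)
    return [
        {'y': values, 'name': str(key), 'type': 'box',
         'boxpoints': 'all', 'jitter': 0.3}
        for key, values in sorted(groups.items())
    ]


traces = box_traces_by_group(records)
print([t['name'] for t in traces])
```

Unlike the Pandas version, this sketch drops missing ages explicitly and converts the group key to a string for the trace name.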

Plotly from R

Along with the Python API, Plotly provides APIs for several other languages suitable for statistical analysis, such as R and Julia. In the IPython notebook, we can run R code via the R magic command, which gives us access to R's rich library of packages without having to open a separate R session.

In [22]:
%load_ext rmagic
In [29]:
%%R -o hist_url

library(plotly)

py <- plotly('foo12345', key='daia9p1ryi')

# Sample data
samples <- rnorm(50)

# Normal density
x_norm <- seq(-5,5,length=100)
y_norm <- 1./sqrt(2*pi)*exp(-x_norm**2/2.)

# layout
l <- list(showlegend = FALSE, 
        xaxis = list(zeroline = FALSE),
        yaxis = list(zeroline = FALSE))

# Histogram data
dataHistogram <- list(y = samples, 
                      type = 'histogramy',
                      histnorm = 'probability density')

# Curve data
dataArea <- list(x = x_norm,
                 y = y_norm,
                 fill = 'tozeroy')

# Call plotly
response <- py$plotly(list(dataHistogram, dataArea), kwargs = list(layout = l))

# url and filename
hist_url <- response$url
Loading required package: RCurl
Loading required package: bitops
Loading required package: RJSONIO

To view the resulting plot in IPython notebook, we can use the URL returned by plotly and place it within an HTML object.

In [30]:
hist_url
Out[30]:
array(['https://plot.ly/~foo12345/7'], 
      dtype='|S27')
In [32]:
from IPython.display import HTML
HTML('<iframe src={0} width=700 height=500></iframe>'.format(hist_url[0]))
Out[32]:
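
The iframe string above leaves its attribute values unquoted. A slightly more defensive variant is sketched below, assuming Python 3 (for html.escape); plot_iframe is a hypothetical helper name. It only builds the string, which can then be wrapped in IPython's HTML object as before:

```python
from html import escape


def plot_iframe(url, width=700, height=500):
    """Return an <iframe> snippet for a plot URL with quoted, escaped attributes."""
    return '<iframe src="{0}" width="{1}" height="{2}"></iframe>'.format(
        escape(str(url), quote=True), int(width), int(height))


snippet = plot_iframe('https://plot.ly/~foo12345/7')
print(snippet)
```

Quoting the attributes keeps the markup valid even if the URL contains characters such as ampersands or spaces.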

Displaying MCMC output

Running this example requires that PyMC 2.3 be installed.

In Bayesian modeling using Markov chain Monte Carlo (MCMC) methods, we make inferences based on samples drawn from the posterior distribution of unknown variables in our model. Therefore, it is useful to plot these samples to visually assess whether the model has converged.

PyMC includes an example dataset, which is a time series of recorded coal mining disasters in the UK from 1851 to 1962.

In [47]:
from pymc.examples import disaster_model
from pymc import MCMC, graph

M = MCMC(disaster_model)

ply.iplot({'y': disaster_model.disasters_array, 
           'x': range(1851, 1963),
           "type": "scatter", 
           "mode": "lines", 
           "name": "UK coal mining disasters, 1851-1962"})
Out[47]:

Occurrences of disasters in the time series are thought to be derived from a Poisson process with a large rate parameter in the early part of the time series, and from one with a smaller rate in the later part. We are interested in locating the change point in the series, which is perhaps related to changes in mining safety regulations.

In [41]:
nchains = 3
for i in range(nchains):
    M.sample(5000, progress_bar=False)

Two simple visualizations are the trace, essentially a time series of the samples, and the histogram of the samples.

In [43]:
trace = M.early_mean.trace()

data = [{'y': trace}]
data.append({'y': trace,
             'xaxis': 'x2',
             'yaxis': 'y2',
             'type': 'histogramy'})

layout = {
    "xaxis":{
        "domain":[0,0.5],
        "title": "Iteration"
    },
    "yaxis":{
        "title": "Value"
    },
    "xaxis2":{
        "domain":[0.55,1],
        "title": "Value"
    },
    "yaxis2":{
        "anchor":"x2",
        "side": "right",
        "title": "Frequency"
    },
    "showlegend": False,
    "title": "Posterior samples of early Poisson mean"
}

ply.iplot(data, layout=layout, width=850,height=400)
Out[43]:

This shows that the chain is relatively homogeneous (the mean and variance of the samples do not appear to change much over the course of sampling).
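
The visual impression of homogeneity can be backed by a quick numerical check. The sketch below compares the mean and variance of the first and second halves of a trace. It is an informal check rather than a formal convergence diagnostic, it assumes Python 3 (for the statistics module), and it runs on a synthetic stand-in with made-up values rather than on the actual output of M.early_mean.trace():

```python
import random
import statistics


def split_half_summary(trace):
    """Return (mean, variance) pairs for the first and second half of a trace."""
    half = len(trace) // 2
    first, second = trace[:half], trace[half:]
    return ((statistics.mean(first), statistics.variance(first)),
            (statistics.mean(second), statistics.variance(second)))


# Synthetic stand-in for a real trace; these values are made up
rng = random.Random(42)
fake_trace = [rng.gauss(3.0, 0.3) for _ in range(2000)]

(m1, v1), (m2, v2) = split_half_summary(fake_trace)
print(abs(m1 - m2), abs(v1 - v2))
```

Large differences between the two halves would suggest that the chain was still drifting during sampling.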

However, stronger evidence can be obtained by comparing multiple independent samples, which is why we ran three chains above. We can use Plotly to compare samples from each of the chains.

In [45]:
nparams = len(M.stochastics)
mcmc_data = {'chain {0}'.format(i): {s.__name__:s.trace(chain=i)[:100] for s in M.stochastics} for i in range(nchains)}

attr = list(mcmc_data['chain 1'].keys())  # list() keeps attr indexable on Python 3
colors = {'chain 0': 'rgb(31, 119, 180)', 
          'chain 1': 'rgb(255, 127, 14)',
          'chain 2': 'rgb(44, 160, 44)'}

data = []
for i in range(nparams):
    for j in range(nparams):
        for chain in mcmc_data.keys():
            data.append({"name": chain, 
                     "x": mcmc_data[chain][attr[i]], "y": mcmc_data[chain][attr[j]],
                     "type":"scatter","mode":"markers",
                     'marker': {'color': colors[chain], 'opacity':0.2},
                     "xaxis": "x"+(str(i) if i!=0 else ''), "yaxis": "y"+(str(j) if j!=0 else '')})
padding = 0.04
domains = [[i*padding + i*(1-3*padding)/nparams, i*padding + ((i+1)*(1-3*padding)/nparams)] for i in range(nparams)]

layout = {
    "xaxis":{"domain":domains[0], "title":attr[0], 
             'zeroline':False,'showline':False,'ticks':'', 
             'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
    "yaxis":{"domain":domains[0], "title":attr[0], 
             'zeroline':False,'showline':False,'ticks':'', 
             'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
    "xaxis1":{"domain":domains[1], "title":attr[1], 
              'zeroline':False,'showline':False,'ticks':'', 
              'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
    "yaxis1":{"domain":domains[1], "title":attr[1], 
              'zeroline':False,'showline':False,'ticks':'', 
              'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
    "xaxis2":{"domain":domains[2], "title":attr[2], 
              'zeroline':False,'showline':False,'ticks':'', 
              'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
    "yaxis2":{"domain":domains[2], "title":attr[2], 
              'zeroline':False,'showline':False,'ticks':'', 
              'titlefont':{'color': "rgb(67, 67, 67)"},'tickfont':{'color': 'rgb(102,102,102)'}},
    
    "showlegend":False,
    "title":"Posterior samples for coal mining disasters model",
    "titlefont":{'color':'rgb(67,67,67)', 'size': 20}
    }

ply.iplot(data,layout=layout)
Out[45]: