Plotly's Python API User Guide

Section 4: Histograms and Box Plots

Welcome to Plotly's Python API User Guide.

Links to the other sections are on the User Guide's homepage.
The GitHub repository is available here.

Quickstart (make vertical histogram of 100 normally distributed random numbers):

>>> import plotly.plotly as py
>>> from plotly.graph_objs import *
>>> # auto sign-in with credentials or use py.sign_in()
>>> import numpy as np
>>> x = np.random.randn(100)
>>> trace1 = Histogram(x=x)
>>> data = Data([trace1])
>>> py.plot(data)


Check which version is installed on your machine and please upgrade if needed.

In [1]:
# (*) Import plotly package
import plotly

# Check plotly version (if not latest, please upgrade)
plotly.__version__
Out[1]:
'1.6.6'

See the User Guide's homepage for more info on installation and upgrading.


In this section, imagine that you are a teaching assistant. You have the course's grades in a file and you would like to generate basic statistics and analyze them using Plotly histograms and box plots. Along the way, subsection 4.3 also covers subplots with shared axes and annotations.

We first import a few modules and sign in to Plotly using our credentials file:

In [2]:
# (*) To communicate with Plotly's server, sign in with credentials file
import plotly.plotly as py  

# (*) Useful Python/Plotly tools
import plotly.tools as tls   

# (*) Graph objects to piece together plots
from plotly.graph_objs import *

import numpy as np  # (*) numpy for math functions and arrays

If you are not familiar with credentials files, refer to the User Guide's homepage.

As in section 2 (Bar Charts and Error Bars), we read and plot data from a csv file. The csv file in question, grades.csv, can be found in the section's GitHub folder. It contains students' grades in percent. In this file, each row corresponds to a student and each column corresponds to a particular component of the total grade. These components (Final, Midterm and Homeworks) are labelled in the top row of the file.

To work with the data, let's first define a csv column reader function.

In [3]:
# (*) csv file read/write
import csv          

# Define a csv column reader function
def get_csv_col(filepath, col_id):
    ''' 
    Read column of csv file, return a Numpy array where
    each entry corresp. to a particular student
    pos. arg (1) filepath: relative path to csv file 
    pos. arg (2) col_id: id of column requested, found in first row (a string)
    '''
    with open(filepath, 'r') as data_file:
        reader = csv.reader(data_file)         # define reader object
        the_col = next(reader).index(col_id)   # find 'col_id' in row 1, then skip that row

        # Retrieve all entries of column 'the_col', 
        # put in numpy array
        return np.array([float(row[the_col]) for row in reader])
    
# (-) The 'with' statement automatically closes  
#    'filepath' at the end of its block of code
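
As a quick sanity check of the column-reading logic, the sketch below runs the same steps on a small in-memory csv. The sample rows, the helper name read_csv_col and the use of the literal nan for missing grades are assumptions made for illustration; they are not taken from grades.csv. The helper returns a plain list so that the check needs nothing beyond the standard library.

```python
import csv
import io
import math

# Hypothetical stand-in for grades.csv (all values are made up)
SAMPLE_CSV = (
    "Homeworks,Midterm Exam,Final Exam\n"
    "82,75,nan\n"
    "90,nan,88\n"
    "68,71,64\n"
)

def read_csv_col(file_obj, col_id):
    """Return the column labelled 'col_id' as a list of floats."""
    reader = csv.reader(file_obj)
    header = next(reader)            # first row holds the column labels
    the_col = header.index(col_id)   # position of the requested column
    return [float(row[the_col]) for row in reader]

midterm_sample = read_csv_col(io.StringIO(SAMPLE_CSV), 'Midterm Exam')
n_valid = sum(not math.isnan(g) for g in midterm_sample)
print("%i grades out of %i entries" % (n_valid, len(midterm_sample)))
```

Note that float() accepts the string 'nan', which is why a missing grade written that way ends up as a NaN entry rather than raising an error.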

Let's extract each column and check how many (non-NaN) grades they contain:

In [4]:
# Extract columns for 'grades.csv' 
homeworks = get_csv_col('grades.csv', 'Homeworks')
midterm = get_csv_col('grades.csv', 'Midterm Exam')
final = get_csv_col('grades.csv', 'Final Exam')

The data in grades.csv contains some NaN (not a number) entries. For example,

In [5]:
homeworks[11:20]
Out[5]:
array([  82.,   90.,   68.,   85.,   nan,   82.,   40.,  100.,   92.])

That said, Plotly makes plotting histograms and box plots with arrays containing NaNs a breeze. Please note that:

All NaNs and string entries are ignored in Plotly's built-in computations.

Check exactly how many NaNs are contained in the arrays with:

In [6]:
# Compare number of non-NaN entries to total number of entries in column
def check_nan(X, X_name):
    '''
    Identify which item is a NaN, compare to total number of items,
    pos. arg (1) X: array in question (must be a numpy array)
    pos. arg (2) X_name: name of array (a string)
    '''
    X_no_nan = X[~np.isnan(X)]
    print("%s: %i grades out of %i entries" % (X_name, len(X_no_nan), len(X)))

check_nan(homeworks, 'Homeworks')
check_nan(midterm, 'Midterm Exam')
check_nan(final, 'Final Exam')
Homeworks: 62 grades out of 68 entries
Midterm Exam: 64 grades out of 68 entries
Final Exam: 63 grades out of 68 entries

4.1 Overlaid histograms

For our first histogram plot, we compare the grade distributions of the midterm exam and final exam. Plotly allows users to choose the histogram normalization (i.e. the units of the generated y-axis) of each histogram object:

The histogram object key 'histnorm' is set to '' by default (i.e. no normalization), the other possibilities are: 'percent', 'probability', 'density', 'probability density' (all in lower case). We compare the different 'histnorm' values in fig. 4.2.

Plot customization for the histogram object (named Histogram) is similar to the customization for the bar object:

  • Users can select different bar modes (the 'barmode' key in the layout object). As for the bar object, the available values are 'stack' (default), 'overlay' or 'group'.

  • Spacing between bars can be modified using 'bargap' and 'bargroupgap' in the layout object.

  • Users can make a histogram with horizontal bars by setting 'orientation' to 'h' in the histogram object.

For more information on these options, see section 2 of the User Guide.

Our first histogram plot will feature:

  • Two overlaid and partly transparent histograms. Opacity for histograms is set with the 'opacity' key in the Histogram object (not in Marker, unlike for scatter plots).

  • The default 'histnorm' value (i.e. no normalization), with all other histogram options left at their default values.
In [7]:
# Define colors for the midterm and final histograms
color_midterm = 'rgb(231,41,138)'  # a nice fuchsia
color_final = 'rgb(230,171,2)'     # a nice yellow

# Define a trace-generating function (returns a Histogram object)
def make_trace(x, name, color):
    x_cnt = len(x[~np.isnan(x)])     # find number of non-NaN grades
    return Histogram(
        x=x,  # distribution bound to x-axis
        name="<b>{}</b> ({} grades)".format(name, x_cnt),  # legend/hover text 
        opacity=0.55,               # set partly transparent bars
        marker=Marker(color=color)  # set bar color
    )

# Make data object using make_trace()
data = Data([
    make_trace(midterm, 'Midterm', color_midterm),
    make_trace(final, 'Final', color_final)
])

On to the layout:

In [8]:
title = "Fig 4.1: Midterm and Final Exam Grade Distribution"  # plot's title

# Define dictionary of axis style options
axis_style = dict(
    zeroline=False,       # remove thick zero line
    showgrid=True,        # show grid lines (not default on bar/histogram)
    gridcolor='#FFFFFF',  # white grid lines
    ticks='outside',      # draw ticks outside axes 
    ticklen=8,            # tick length
    tickwidth=1.5         #   and width
)

# Make layout object
layout = Layout(
    barmode='overlay',  # (!) overlay barmode
    title=title,        # set plot title
    xaxis=XAxis(
        axis_style,               # style options
        title='<b>Grade [%]</b>'  # x-axis title
    ),  
    yaxis=YAxis(
        axis_style,               # style options
        title='<b>Count</b>'      # y-axis title 
    ),
    legend=Legend(
        x=0, 
        y=1   # legend at upper left corner of plot
    ),
    plot_bgcolor='#EFECEA'   # set plot color to grey
) 
In [9]:
# Make Figure object
fig = Figure(data=data, layout=layout)

# (@) Send to Plotly and show in notebook
py.iplot(fig, filename='s4_midterm-final')
Out[9]:

Reducing the opacity of each histogram, together with Plotly's hover capabilities, makes comparing two sets of data easy.

4.2 Histogram normalization

In this subsection, we compare Plotly's histogram normalizations.

Let $y_{\text{count}}(i)$ be the number of non-NaN data points in bin $i$, $w_i$ be the width of bin $i$ and $N$ be the number of data points in the sample. Then, Plotly's available histogram normalizations are:

$$ y_{\text{probability}}(i) = \frac{1}{N} y_{\text{count}}(i) $$

$$ y_{\text{percent}}(i) = \frac{1}{N} y_{\text{count}}(i) \times 100 $$

$$ y_{\text{density}}(i) = \frac{1}{w_i} y_{\text{count}}(i) $$

$$ y_{\text{prob. density}}(i) = \frac{1}{N} \frac{1}{w_i} y_{\text{count}}(i) $$
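
To make the four normalizations concrete, here is a minimal sketch that applies the formulas above by hand to a small made-up sample. The grades, the bin start, the bin size and the number of bins are all hypothetical, and N is taken to be the number of non-NaN points that fall in the bins. This is an illustration of the formulas, not a reproduction of Plotly's own binning code.

```python
import math

grades = [55., 62., 68., 71., 75., 75., 82., 90., float('nan')]  # made-up sample
valid = [g for g in grades if not math.isnan(g)]   # NaNs are left out of the counts

start, size, nbins = 50., 10., 5     # hypothetical bins [50,60), ..., [90,100)
counts = [0] * nbins
for g in valid:
    i = int((g - start) // size)     # index of the bin containing g
    counts[i] += 1

N = sum(counts)                      # number of binned (non-NaN) data points
y_probability = [c / float(N) for c in counts]
y_percent = [100 * p for p in y_probability]
y_density = [c / size for c in counts]
y_prob_density = [p / size for p in y_probability]

print(counts)
print(y_percent)
```

Two checks follow directly from the definitions: the 'probability' values sum to 1, and the 'probability density' values sum to 1 once each is multiplied by its bin width.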

With the same data as the previous plot, let's compare these four histogram normalizations in a 2 by 2 plot. So, first we define a simple subplot grid:

In [10]:
# Generate a figure object with 4 axes on 2 rows and 2 columns
fig = tls.make_subplots(rows=2, cols=2, start_cell='bottom-left')
This is the format of your plot grid:
[ (2,1) x3,y3 ]  [ (2,2) x4,y4 ]
[ (1,1) x1,y1 ]  [ (1,2) x2,y2 ]

In [11]:
# Print fig in notebook
print(fig.to_string())
Figure(
    data=Data(),
    layout=Layout(
        xaxis1=XAxis(
            domain=[0.0, 0.45],
            anchor='y1'
        ),
        xaxis2=XAxis(
            domain=[0.55, 1.0],
            anchor='y2'
        ),
        xaxis3=XAxis(
            domain=[0.0, 0.45],
            anchor='y3'
        ),
        xaxis4=XAxis(
            domain=[0.55, 1.0],
            anchor='y4'
        ),
        yaxis1=YAxis(
            domain=[0.0, 0.425],
            anchor='x1'
        ),
        yaxis2=YAxis(
            domain=[0.0, 0.425],
            anchor='x2'
        ),
        yaxis3=YAxis(
            domain=[0.575, 1.0],
            anchor='x3'
        ),
        yaxis4=YAxis(
            domain=[0.575, 1.0],
            anchor='x4'
        )
    )
)
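
The domain values printed above can be reproduced with a few lines of arithmetic. The sketch below splits the interval [0, 1] into equal parts separated by a fixed gap. The gap values used here (0.2/cols horizontally and 0.3/rows vertically) are an assumption inferred from the printed output, and axis_domains is a hypothetical helper, not part of plotly.tools.

```python
def axis_domains(n, spacing):
    """Split [0, 1] into n equal domains separated by gaps of width 'spacing'."""
    size = (1.0 - spacing * (n - 1)) / n
    return [(i * (size + spacing), i * (size + spacing) + size)
            for i in range(n)]

cols, rows = 2, 2
x_domains = axis_domains(cols, 0.2 / cols)   # compare with xaxis1 and xaxis2
y_domains = axis_domains(rows, 0.3 / rows)   # compare with yaxis1 and yaxis3

for lo, hi in x_domains + y_domains:
    print(round(lo, 3), round(hi, 3))
```

Each axis 'domain' is therefore just the fraction of the plotting area that the subplot occupies along that direction, and it can be overwritten by hand if a different split is needed.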

Make a few definitions:

In [12]:
# List of subplot [1],[2],[3],[4] labels
splts = range(1, 5)

# Define a dictionary of histogram normalization 
#   (values of the 'histnorm' key)
histnorms = {
    1: 'percent', 
    2: 'probability',
    3: 'density', 
    4: 'probability density'
}

# Define a dictionary of flags setting which trace appears on the legend
#   (values of 'showlegend' key in each trace object)
showlegends = {
    1: True,
    2: False,
    3: False, 
    4: False
}

Make eight histogram trace objects (with midterm and final data for four histogram normalizations):

In [13]:
# Define a trace-generating function (returns a Histogram object)
def make_trace(splt, x, name, color):
    return Histogram(
        x=x,                      # distribution to be plotted
        histnorm=histnorms[splt], # (!) histogram normalization
        name="<b>{}</b>".format(name),  # label for legend/hover text
        opacity=0.55,                   # partly transparent bars
        marker=Marker(color=color),     # set bar color
        showlegend=showlegends[splt],   # (!) show only 2 traces in legend
        xaxis="x{}".format(splt),  # (!) plot on 'splt' x-axis
        yaxis="y{}".format(splt),  # (!) plot on 'splt' y-axis
    )

# Fill in data object with 8 traces using make_trace()
#   and data arrays and colors defined in 4.1
fig['data'] = Data(
    [make_trace(splt, midterm, 'Midterm', color_midterm) for splt in splts] +
    [make_trace(splt, final, 'Final', color_final) for splt in splts]
)

Add a few layout features:

In [14]:
# (a) Update x and y axis of all subplots
#     using the axis_style dictionary defined in subsection 4.1
fig['layout'].update(
    {'xaxis{}'.format(splt): axis_style for splt in splts}
)
fig['layout'].update(
    {'yaxis{}'.format(splt): axis_style for splt in splts}
)

# (b) Add title to the 2 bottom x axes
fig['layout'].update(
    {'xaxis{}'.format(splt): {'title':'<b>Grade [%]</b>'} for splt in [1,2]}
)

# (c) Make y-axis title correspond to 'histnorm' key
#     with 1st letter in upper case (using .title() string method)
fig['layout'].update(
    {'yaxis{}'.format(splt): {'title': histnorms[splt].title()} for splt in splts})

title = "Fig 4.2: Midterm and Final Exam \
Grade Distribution"     # plot's title

# (d) Set barmode, plot title, global font, plot background and legend
fig['layout'].update(
    barmode='overlay',  # (!) overlay barmode
    title=title,        # plot title
    legend=Legend(
        x=100,  # outside plotting area to the right
        y=1     # top of plotting area 
    ),
    plot_bgcolor='#EFECEA' # set plot color to grey
)      

And finally,

In [15]:
# (@) Send figure object to Plotly and show in notebook
py.iplot(fig, filename='s4_midterm-final-histnorm')
Out[15]:

4.3 Custom histogram bins

Plotly allows users to generate histograms without having to set every bin's size and location, which makes quick data visualization easy. Users looking for more control can easily modify the bin options inside Plotly's web GUI.

Here's a snapshot of the web GUI:


[Screenshot: custom histogram bin options in Plotly's web GUI]

Alternatively, histogram bin options can be modified by first setting 'autobinx' to False in the Histogram graph object and then using the XBins graph object (or 'autobiny' and YBins for histograms with horizontal bars).

In [16]:
help(XBins)  # call help()!
Help on class XBins in module plotly.graph_objs.graph_objs:

class XBins(PlotlyDict)
 |  A dictionary-like object containing specifications of the bins lying along
 |      the x-axis.
 |  
 |  Online examples:
 |  
 |      https://plot.ly/python/histograms/
 |      https://plot.ly/python/2D-Histograms/
 |  
 |  Parent key:
 |  
 |      xbins
 |  
 |  Quick method reference:
 |  
 |      XBins.update(changes)
 |      XBins.strip_style()
 |      XBins.get_data()
 |      XBins.to_graph_objs()
 |      XBins.validate()
 |      XBins.to_string()
 |      XBins.force_clean()
 |  
 |  Valid keys:
 |  
 |      start [required=False] (value=number: x > 0):
 |          Sets the starting point on the x-axis for the first bin.
 |  
 |      end [required=False] (value=number: x > 0):
 |          Sets the end point on the x-axis for the last bin.
 |  
 |      size [required=False] (value=number: x > 0) (streamable):
 |          Sets the size (i.e. their width) of each x-axis bin.
 |  
 |  Method resolution order:
 |      XBins
 |      PlotlyDict
 |      __builtin__.dict
 |      __builtin__.object
 |  
 |  Methods inherited from PlotlyDict:
 |  
 |  __init__(self, *args, **kwargs)
 |  
 |  __setitem__(self, key, value)
 |  
 |  force_clean(self, caller=True)
 |      Attempts to convert to graph_objs and call force_clean() on values.
 |      
 |      Calling force_clean() on a PlotlyDict will ensure that the object is
 |      valid and may be sent to plotly. This process will also remove any
 |      entries that end up with a length == 0.
 |      
 |      Careful! This will delete any invalid entries *silently*.
 |  
 |  get_data(self)
 |      Returns the JSON for the plot with non-data elements stripped.
 |  
 |  get_ordered(self, caller=True)
 |  
 |  strip_style(self)
 |      Strip style from the current representation.
 |      
 |      All PlotlyDicts and PlotlyLists are guaranteed to survive the
 |      stripping process, though they made be left empty. This is allowable.
 |      
 |      Keys that will be stripped in this process are tagged with
 |      `'type': 'style'` in graph_objs_meta.json.
 |      
 |      This process first attempts to convert nested collections from dicts
 |      or lists to subclasses of PlotlyList/PlotlyDict. This process forces
 |      a validation, which may throw exceptions.
 |      
 |      Then, each of these objects call `strip_style` on themselves and so
 |      on, recursively until the entire structure has been validated and
 |      stripped.
 |  
 |  to_graph_objs(self, caller=True)
 |      Walk obj, convert dicts and lists to plotly graph objs.
 |      
 |      For each key in the object, if it corresponds to a special key that
 |      should be associated with a graph object, the ordinary dict or list
 |      will be reinitialized as a special PlotlyDict or PlotlyList of the
 |      appropriate `kind`.
 |  
 |  to_string(self, level=0, indent=4, eol='\n', pretty=True, max_chars=80)
 |      Returns a formatted string showing graph_obj constructors.
 |      
 |      Example:
 |      
 |          print(obj.to_string())
 |      
 |      Keyword arguments:
 |      level (default = 0) -- set number of indentations to start with
 |      indent (default = 4) -- set indentation amount
 |      eol (default = '\n') -- set end of line character(s)
 |      pretty (default = True) -- curtail long list output with a '...'
 |      max_chars (default = 80) -- set max characters per line
 |  
 |  update(self, dict1=None, **dict2)
 |      Update current dict with dict1 and then dict2.
 |      
 |      This recursively updates the structure of the original dictionary-like
 |      object with the new entries in the second and third objects. This
 |      allows users to update with large, nested structures.
 |      
 |      Note, because the dict2 packs up all the keyword arguments, you can
 |      specify the changes as a list of keyword agruments.
 |      
 |      Examples:
 |      # update with dict
 |      obj = Layout(title='my title', xaxis=XAxis(range=[0,1], domain=[0,1]))
 |      update_dict = dict(title='new title', xaxis=dict(domain=[0,.8]))
 |      obj.update(update_dict)
 |      obj
 |      {'title': 'new title', 'xaxis': {'range': [0,1], 'domain': [0,.8]}}
 |      
 |      # update with list of keyword arguments
 |      obj = Layout(title='my title', xaxis=XAxis(range=[0,1], domain=[0,1]))
 |      obj.update(title='new title', xaxis=dict(domain=[0,.8]))
 |      obj
 |      {'title': 'new title', 'xaxis': {'range': [0,1], 'domain': [0,.8]}}
 |      
 |      This 'fully' supports duck-typing in that the call signature is
 |      identical, however this differs slightly from the normal update
 |      method provided by Python's dictionaries.
 |  
 |  validate(self, caller=True)
 |      Recursively check the validity of the keys in a PlotlyDict.
 |      
 |      The valid keys constitute the entries in each object
 |      dictionary in graph_objs_meta.json
 |      
 |      The validation process first requires that all nested collections be
 |      converted to the appropriate subclass of PlotlyDict/PlotlyList. Then,
 |      each of these objects call `validate` and so on, recursively,
 |      until the entire object has been validated.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from PlotlyDict:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from __builtin__.dict:
 |  
 |  __cmp__(...)
 |      x.__cmp__(y) <==> cmp(x,y)
 |  
 |  __contains__(...)
 |      D.__contains__(k) -> True if D has a key k, else False
 |  
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __le__(...)
 |      x.__le__(y) <==> x<=y
 |  
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |  
 |  __lt__(...)
 |      x.__lt__(y) <==> x<y
 |  
 |  __ne__(...)
 |      x.__ne__(y) <==> x!=y
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __sizeof__(...)
 |      D.__sizeof__() -> size of D in memory, in bytes
 |  
 |  clear(...)
 |      D.clear() -> None.  Remove all items from D.
 |  
 |  copy(...)
 |      D.copy() -> a shallow copy of D
 |  
 |  fromkeys(...)
 |      dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v.
 |      v defaults to None.
 |  
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
 |  
 |  has_key(...)
 |      D.has_key(k) -> True if D has a key k, else False
 |  
 |  items(...)
 |      D.items() -> list of D's (key, value) pairs, as 2-tuples
 |  
 |  iteritems(...)
 |      D.iteritems() -> an iterator over the (key, value) items of D
 |  
 |  iterkeys(...)
 |      D.iterkeys() -> an iterator over the keys of D
 |  
 |  itervalues(...)
 |      D.itervalues() -> an iterator over the values of D
 |  
 |  keys(...)
 |      D.keys() -> list of D's keys
 |  
 |  pop(...)
 |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised
 |  
 |  popitem(...)
 |      D.popitem() -> (k, v), remove and return some (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |  
 |  setdefault(...)
 |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
 |  
 |  values(...)
 |      D.values() -> list of D's values
 |  
 |  viewitems(...)
 |      D.viewitems() -> a set-like object providing a view on D's items
 |  
 |  viewkeys(...)
 |      D.viewkeys() -> a set-like object providing a view on D's keys
 |  
 |  viewvalues(...)
 |      D.viewvalues() -> an object providing a view on D's values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from __builtin__.dict:
 |  
 |  __hash__ = None
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

Take note:

The 'start' key sets the starting point of the first bin and 'end' key sets the end point of the final bin.
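
To see how 'start', 'end' and 'size' fit together, the sketch below lists the bin edges and bin midpoints for one hypothetical choice of values, picked so that the distance from 'start' to 'end' is an exact multiple of 'size'. How Plotly treats a last bin that does not fit exactly is not covered here.

```python
start, end, size = 20.0, 100.0, 4.0   # hypothetical values, (end - start) is a multiple of size

nbins = int(round((end - start) / size))
edges = [start + i * size for i in range(nbins + 1)]   # bin boundaries
centers = [edge + size / 2 for edge in edges[:-1]]     # bin midpoints

print(nbins)
print(edges[0], edges[-1])
print(centers[0], centers[-1])
```

The midpoints are the same values that np.arange(start + size/2, end, size) produces, which is the expression used later in this subsection to evaluate the Gaussian fits at the bin centers.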

For our next plot, we compare the grade distribution of the homeworks, the midterm, the final exam as well as the (total) course grade. The first step is to compute the course grade from the three other grades:

In [17]:
# Compute course grade (ignoring NaNs in sum)
def get_course(data, weight):
    d = np.vstack(data)  # 2d array, one row per grade component
    w = np.array(weight)[:, np.newaxis] # 2d column array of weights
    return np.nansum(d*w, axis=0)       # sum over components (NaNs ignored)

# Send grades and weights to get_course()
course = get_course([homeworks, midterm, final], [0.3, 0.3, 0.4])

# Check how many NaNs are in the 'course' array
check_nan(course,'Course Grade')
Course Grade: 64 grades out of 68 entries
/usr/lib/python2.7/dist-packages/numpy/lib/nanfunctions.py:514: FutureWarning:

In Numpy 1.9 the sum along empty slices will be zero.
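
It is worth seeing what "ignoring NaNs in the sum" means for an individual student. The sketch below repeats get_course() on three made-up students; the grades are hypothetical and chosen so the arithmetic is easy to follow.

```python
import numpy as np

def get_course(data, weight):
    d = np.vstack(data)                  # one row per grade component
    w = np.array(weight)[:, np.newaxis]  # column of weights, broadcast over students
    return np.nansum(d * w, axis=0)      # NaN terms contribute nothing to the sum

# Made-up grades for three students (columns) and three components (rows)
hw = np.array([80., 90., np.nan])
mid = np.array([70., np.nan, np.nan])
fin = np.array([60., 100., np.nan])

course = get_course([hw, mid, fin], [0.3, 0.3, 0.4])
print(course[:2])
```

The first student gets 0.3*80 + 0.3*70 + 0.4*60 = 69. The second student, who has no midterm grade, gets 0.3*90 + 0.4*100 = 67: the missing component counts as zero rather than being re-weighted. The third student has no grades at all; as the FutureWarning above indicates, NumPy 1.9 and later return 0 for such an entry instead of NaN, which would change the count reported by check_nan().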

Now, build a 4-row, 1-column subplot grid with shared x-axes using make_subplots():

In [18]:
# Generate Figure object with 4 axes on 4 rows and 1 column
fig = tls.make_subplots(rows=4, cols=1, shared_xaxes=True)
This is the format of your plot grid:
[ (1,1) x1,y1 ]
[ (2,1) x1,y2 ]
[ (3,1) x1,y3 ]
[ (4,1) x1,y4 ]

In [19]:
# Print Figure object to stdout
print(fig.to_string())
Figure(
    data=Data(),
    layout=Layout(
        xaxis1=XAxis(
            domain=[0.0, 1.0],
            anchor='y4'
        ),
        yaxis1=YAxis(
            domain=[0.8062499999999999, 0.9999999999999999],
            anchor='free',
            position=0.0
        ),
        yaxis2=YAxis(
            domain=[0.5375, 0.73125],
            anchor='free',
            position=0.0
        ),
        yaxis3=YAxis(
            domain=[0.26875, 0.4625],
            anchor='free',
            position=0.0
        ),
        yaxis4=YAxis(
            domain=[0.0, 0.19375],
            anchor='x1'
        )
    )
)

Next, we fill in the data and layout objects. The following code cells are split into four sections: (0) a few definitions, (1) data and trace style options, (2) layout options and (3) a call to Plotly.

-0- A few definitions
In [20]:
splts = range(1, 5)  # list of subplot labels

# Concatenate all grade arrays
grades_all = np.concatenate((homeworks, midterm, final, course))

# Set bin start, end and size (for all subplots)
xbin_start = np.nanmin(grades_all)  # min of arrays
xbin_end = 100                      # max possible grade
xbin_size = 4                       # size of each bin (in %)

histnorm = 'probability density'  # prob. density as histogram norm. 
color = 'rgb(106,204,101)'        # bin color 
x_range = [5, 105]          # x-axis range 
y_range = [-0.009, 0.0509]  # (!) to include tick at range limits
width = 650    # plot width (in pixels)
height = 800   # plot height (in pixels)
-1- Data and trace options
In [21]:
# Define a trace-generating function (returns a Histogram object)
def make_trace(x, splt):
    return Histogram(
        x=x,                # distribution to be plotted
        histnorm=histnorm,  # histogram normalization
        name='<b>Grades</b>',        # label for legend/hover
        marker=Marker(color=color), # bar color
        autobinx=False,        # (!) custom bins
        xbins=XBins(
            start=xbin_start,  # first bin start
            end=xbin_end,      # last bin end
            size=xbin_size     # size of each bin
        ),
        xaxis='x1',          # (!) all subplots on same x-axis
        yaxis='y'+str(splt), # (!) plot on 'splt' y-axis
    )

# (1.1b) Update 'data' key in figure object with Histogram objects
fig['data'] = Data([
    make_trace(homeworks, 1),
    make_trace(midterm, 2),
    make_trace(final, 3),
    make_trace(course, 4)
])
-2- Layout options

We use annotation objects to display the plot's title and the subplot titles:

In [22]:
title = "Fig. 4.3a: <b>Course Grade Distributions</b>"  # plot's title

# (2.1) Make 'title' annotation object
anno_title = Annotation(
    text=title,    # set plot's title
    xref='paper',  # use paper coordinates
    yref='paper',  #   for both x and y coords
    x=0,        # x and y position 
    y=1.15,        #   in norm. coord.     
    font=Font(size=22),  # text font size
    showarrow=False,       # no arrow (default is True)
    bgcolor='#F5F3F2',     # light grey background color
    bordercolor='#FFFFFF', # white borders
    borderwidth=1,         # set border width
    borderpad=22           # set border-text space
)

# Define an annotation-generating function, for subplot titles
def make_anno_splt(text, yref):
    return Annotation(
        text=text,   # set subplot title
        xref='x1',     # (!) subplots share the same x-axis
        yref=yref,     # (!) ref on y-axis
        x=8,            # set x position
        xanchor='left', #   and anchor
        y=y_range[1],   # set y position
        yanchor='top',  #   and anchor
        font=Font(size=14),  # text font size
        showarrow=False,       # no arrow (default is True)
        bgcolor='#F5F3F2',     # light grey background color
        bordercolor='#FFFFFF', # white borders
        borderwidth=1,         # set border width
        borderpad=5           # set border-text space
    )

# Define dictionary of subplot titles
splt_title = {
    1: '<b>Homework</b> (30% of course grade)', 
    2: '<b>Midterm Exam</b> (30%)',
    3: '<b>Final Exam</b> (40%)', 
    4: '<b>Course Grade</b>'
}

# (2.3a) Make Annotations object
annotations = Annotations(
    [anno_title] +
    [make_anno_splt(splt_title[splt], "y{}".format(splt)) for splt in splts]
)

# (2.3b) Link 'annotations' to Annotations object
fig['layout'].update(annotations=annotations)

# (2.4) Add (an invisible) 'title' to be placed in the plot's URL
fig['layout'].update(
    title="course-grade-distribution",    # placed in plot's URL
    titlefont=Font(color='rgba(0,0,0,0)') # (!) invisible color
)

On to the axis style options:

In [23]:
# Define an axis style dictionary
axis_style=dict(
    tickfont=Font(size=14),   # font size (default is 12)
    titlefont=Font(size=14),  # title font size (default is 12)
    showgrid=True,            # show grid lines
    gridcolor='#FFFFFF',      # white grid lines
    zeroline=False,           # remove thick zero line
    autotick=False            # turn off autotick
)

# (2.5a) Update x axis with style options
fig['layout']['xaxis1'].update(
    axis_style,  # link style options dict, (!) must be first argument
    title='<b>Grades [%]</b>', # set axis title
    range=x_range,             # x-axis range, defined in (0)
    ticks='outside',   # draw ticks outside axes 
    dtick=10,          # set distance between ticks
    ticklen=8,         # set tick length
    tickwidth=1.5      #   and width 
)
      
# Define dictionary of style options for all y axes
yaxis_style = YAxis(
    axis_style,   # link style options dict, (!) must be first argument
    title='<b>Prob. dist.</b>',  # y-axis title  
    range=y_range,               # y-axis range, defined in (0)
    ticks='',     # do not draw ticks  
    dtick=0.01    # set distance between grid lines 
)

# (2.5b) Update all four y axes
fig['layout'].update(
    {'yaxis{}'.format(splt): yaxis_style for splt in splts}
)

Set a few other layout features:

In [24]:
fig['layout'].update(
    bargap=0.01,     # norm. spacing between bars
    showlegend=False, # remove legend
    font=Font(
        family="Droid Serif, serif", 
        color='#635F5D'
    ),
    margin=Margin(  # set frame/axes margins
        t=100,
        b=100,
        r=25,
        l=70
    ), 
    plot_bgcolor='#EFECEA',   # set plot and 
    paper_bgcolor='#EFECEA',  #   frame background color to grey
    autosize=False,  # turn off autosize 
    height=height,   # plot's height in pixels (defined in -0-) 
    width=width,     # plot's width in pixels (defined in -0-)
)
-3- Call to Plotly
In [25]:
# (@) Send figure object to Plotly and show in notebook
py.iplot(fig, filename='s4_grades-bins', width=width, height=height)  

# adjust output cell with 'width' and 'height'
Out[25]:

Note that Plotly allows users to customize histogram bins in another way: by setting the total number of bins.

This is done using the 'nbinsx' key in Histogram (or correspondingly the 'nbinsy' key for histograms with horizontal bars).

Adding Gaussian fits to Plotly histograms

The above figure is pretty good, but let's make it even better by overlaying Gaussian fits on top of the histograms.

Hover around on the plot above and notice that the x values corresponding to the bins are placed in the middle of each bin. So, after finding the best-fit Gaussian curves, we will evaluate them at these points. This will allow us to compare the probability density values of each bin and the fit by hovering the cursor on the plot.

The following code makes use of the scipy module.

In [26]:
# (*) Import normal distribution best fits function from scipy
from scipy.stats import norm
In [27]:
# Space on which Gaussian fit will be evaluated
x_space = np.arange(xbin_start + xbin_size/2, xbin_end, xbin_size)

# Gaussian (or Normal) distribution
def gaussian(x, mu, sig):
    return 1./(sig * np.sqrt(2*np.pi)) * np.exp(-(x-mu)**2 / (2*sig**2))
    
# Define a text-generating function (returns a string)
def make_text(mu, sig):
    return "<br>\
    <b>Mean</b>: {:5.2f}\
    <br><b>Standard deviation</b>: {:5.2f}".format(mu, sig)
    
# Define a trace-generating function (returns a Scatter object)
def make_gaussian_trace(x, splt):
    mu, sig=norm.fit(x[~np.isnan(x)])  # compute fit coeffs (removing NaNs first)
    y_gaussian=gaussian(x_space,mu,sig) # eval fit at x_space
    return Scatter(
        x=x_space,      # x-coords corresp. to bins'
        y=y_gaussian,   # gaussian fit
        mode='lines',         # interpolate in-between points
        name='Gaussian fit',  # set the hover label
        text=np.repeat(make_text(mu,sig),len(x_space)), # (!) hover text for all pts 
        textfont=Font(size=14),      # increase text font size
        line=Line(color='#7570b3'),  # line color (a nice purple)
        xaxis='x1',           # (!) all subplots on same x-axis
        yaxis='y'+str(splt)   # (!) plot on y-axis of 'splt'
    )
    
# (3.3b) Update 'data' key in figure object with Scatter objects
fig['data'] += Data([
    make_gaussian_trace(homeworks, 1),
    make_gaussian_trace(midterm, 2),
    make_gaussian_trace(final, 3),
    make_gaussian_trace(course, 4)
])
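
For readers who want to check the fit by hand: for a normal distribution, the maximum-likelihood fit that norm.fit() computes should amount to the sample mean and the standard deviation with the 1/N convention. The sketch below computes those two values directly on a made-up sample and checks that the resulting curve integrates to 1, which is what makes it comparable to a 'probability density' histogram. The grades and the integration grid are assumptions for illustration only.

```python
import math

def gaussian(x, mu, sig):
    return 1. / (sig * math.sqrt(2 * math.pi)) * math.exp(-(x - mu)**2 / (2 * sig**2))

grades = [55., 62., 68., 71., 75., 75., 82., 90., float('nan')]  # made-up sample
valid = [g for g in grades if not math.isnan(g)]                  # remove NaNs first

mu = sum(valid) / len(valid)                                   # sample mean
sig = math.sqrt(sum((g - mu)**2 for g in valid) / len(valid))  # std with 1/N convention

# Riemann sum of the fitted curve over mu +/- 8 standard deviations
step = sig / 100
area = sum(gaussian(mu - 8 * sig + k * step, mu, sig) for k in range(1601)) * step

print("mean = %.2f, std = %.2f, area = %.4f" % (mu, sig, area))
```

Because the curve is a density, its height depends on the spread of the data: a narrow grade distribution gives a tall peak and a wide one gives a flat peak, just as for the histogram bars underneath.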

Finally, update the plot's title:

In [28]:
# Update title annotation (1st entry in Annotations)
title = 'Fig. 4.3b: <b>Course Grade Distributions</b>'
fig['layout']['annotations'][0].update(text=title)

This gives

In [29]:
# (@) Send figure object to Plotly and show in notebook
py.iplot(fig, filename='s4_grades-bins-lines', width=width, height=height)
Out[29]:

Someone could win TA of the year with this plot.

4.4 Box plots

Box plots are a great way to present the distribution of an array of numbers. While the shape of a histogram depends strongly on the specified bin size and locations, box plots have no such parameters, which could be advantageous in some situations.

For readers unfamiliar with box plots:

  • The bottom and top of the box are the first and third quartiles respectively.
  • The band inside the box is the second quartile (i.e. the median).

Plotly box plots have whiskers:

  • The end of the bottom whisker corresponds to the smallest data value that is greater than or equal to:
$$ Q_1 - 1.5\,(\mathrm{IQR}) \;.$$
  • The end of the top whisker corresponds to the largest data value that is less than or equal to:
$$ Q_3 + 1.5\,(\mathrm{IQR}) \;.$$

Where $Q_1$ is the first quartile, $Q_3$ is the third quartile and $\mathrm{IQR}$ is the interquartile range (defined as $Q_3-Q_1$).
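
To make the whisker rule concrete, here is a small worked example on made-up grades (the `example_grades` list is an assumption for illustration). The whiskers stop at the most extreme data points that still lie inside the fences $Q_1 - 1.5\,(\mathrm{IQR})$ and $Q_3 + 1.5\,(\mathrm{IQR})$. Several quartile conventions exist, so the quartiles Plotly reports on hover may differ slightly from the ones computed here with the standard library:

```python
import statistics

# Made-up grades (in percent) with one unusually low value
example_grades = [74, 62, 90, 20, 81, 65, 85, 70, 78]

# Quartiles (one of several common conventions)
q1, median, q3 = statistics.quantiles(example_grades, n=4, method='inclusive')
iqr = q3 - q1

# Fences: the farthest a whisker is allowed to reach
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points still inside the fences
bottom_whisker = min(g for g in example_grades if g >= lower_fence)
top_whisker = max(g for g in example_grades if g <= upper_fence)

# Points beyond the fences are the ones drawn as individual markers
outliers = [g for g in example_grades if g < lower_fence or g > upper_fence]

print(q1, median, q3, bottom_whisker, top_whisker, outliers)
```

Note that the whiskers land on actual data values, not on the fences themselves; the fences only decide which points count as outliers.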

So, let's make a simple box plot comparing the course grade distributions. First, a look at the documentation for Box:

In [30]:
help(Box)  # call help()!
Help on class Box in module plotly.graph_objs.graph_objs:

class Box(PlotlyTrace)
 |  A dictionary-like object for representing a box trace in plotly.
 |  
 |  Example:
 |  
 |      >>> py.plot([Box(name='boxy', y=[1,3,9,2,4,2,3,5,2])])
 |  
 |  Online example:
 |  
 |      https://plot.ly/python/box-plots/
 |  
 |  Quick method reference:
 |  
 |      Box.update(changes)
 |      Box.strip_style()
 |      Box.get_data()
 |      Box.to_graph_objs()
 |      Box.validate()
 |      Box.to_string()
 |      Box.force_clean()
 |  
 |  Valid keys:
 |  
 |      y [required=True] (value=list or 1d numpy array of numbers, strings,
 |      datetimes) (streamable):
 |          This array is used to define an individual box plot, or, a
 |          concatenation of multiple box plots. Statistics from these numbers
 |          define the bounds of the box, the length of the whiskers, etc. For
 |          details on defining multiple boxes with locations see 'x'. Each box
 |          spans from the first quartile to the third. The second quartile is
 |          marked by a line inside the box. By default, the whiskers are
 |          correspond to box' edges +/- 1.5 times the interquartile range. See
 |          also 'boxpoints' for more info
 |  
 |      x0 [required=False] (value=number):
 |          The location of this box. When 'y' defines a single box, 'x0' can be
 |          used to set where this box is centered on the x-axis. If many boxes
 |          are set to appear at the same 'x0' location, they will form a box
 |          group.
 |  
 |      x [required=False] (value=list or 1d numpy array of numbers, strings,
 |      datetimes) (streamable):
 |          Usually, you do not need to set this value as plotly will handle box
 |          locations for you. However this allows you to have fine control over
 |          the location data for the box. Unlike making a bar, a box plot is
 |          made of many y values. Therefore, to give location data to the
 |          values you place in 'y', the length of 'x' must equal the length of
 |          'y'. when making multiple box plots, you can concatenate the data
 |          sets for each box into a single 'y' array. then, the entries in 'x'
 |          define which box plot each entry in 'y' belongs to. When making a
 |          single box plot, you must set each entry in 'x' to the same value,
 |          see 'x0' for a more practical way to handle this case. If you don't
 |          include 'x', the box will simply be assigned a location.
 |  
 |      name [required=False] (value=a string):
 |          The label associated with this trace. This name will appear in the
 |          legend, on hover and in the column header in the online spreadsheet.
 |  
 |      boxmean [required=False] (value=False | True | 'sd'):
 |          Choose between add-on features for this box trace. If True then the
 |          mean of the data linked to 'y' is shown as a dashed line in the box.
 |          If 'sd', then the standard deviation is also shown. If False (the
 |          default), then no line are shown.
 |  
 |      boxpoints [required=False] (value='outliers' | 'all' |
 |      'suspectedoutliers' | False):
 |          Choose between boxpoints options for this box trace. If 'outliers'
 |          (the default), then only the points lying outside the box' whiskers
 |          (more info in 'y') are shown. If 'all', then all data points linked
 |          'y' are shown. If 'suspectedoutliers', then outliers points are
 |          shown and points either less than 4*Q1-3*Q3 or greater than
 |          4*Q3-3*Q1 are highlighted (with 'outliercolor' in Marker). If False,
 |          then only the boxes are shown and the whiskers correspond to the
 |          minimum and maximum value linked to 'y'.
 |  
 |      jitter [required=False] (value=number: x in [0, 1]):
 |          Sets the width of the jitter in the boxpoints scatter in this trace.
 |          Has an no effect if 'boxpoints' is set to False. If 0, then the
 |          boxpoints are aligned vertically. If 1 then the boxpoints are placed
 |          in a random horizontal jitter of width equal to the width of the
 |          boxes.
 |  
 |      pointpos [required=False] (value=number: x in [-2, 2]):
 |          Sets the horizontal position of the boxpoints in relation to the
 |          boxes in this trace. Has an no effect if 'boxpoints' is set to
 |          False. If 0, then the boxpoints are placed over the center of each
 |          box. If 1 (-1), then the boxpoints are placed on the right (left)
 |          each box border. If 2 (-2), then the boxpoints are  placed 1 one box
 |          width to right (left) of each box.
 |  
 |      whiskerwidth [required=False] (value=number: x in [0, 1]):
 |          Sets the width of the whisker of the box relative to the box' width
 |          (in normalized coordinates, e.g. if 'whiskerwidth' set 1, then the
 |          whiskers are as wide as the box.
 |  
 |      fillcolor [required=False] (value=a string describing color):
 |          Sets the color of the box interior.
 |  
 |          Examples:
 |              'green' | 'rgb(0, 255, 0)' | 'rgba(0, 255, 0, 0.3)' |
 |              'hsl(120,100%,50%)' | 'hsla(120,100%,50%,0.3)' | '#434F1D'
 |  
 |      marker [required=False] (value=Marker object | dictionary-like object)
 |      (streamable):
 |          Links a dictionary-like object containing marker style parameters
 |          for this the boxpoints of box trace. Has an effect only 'boxpoints'
 |          is set to 'outliers', 'suspectedoutliers' or 'all'.
 |  
 |          For more, run `help(plotly.graph_objs.Marker)`
 |  
 |      line [required=False] (value=Line object | dictionary-like object)
 |      (streamable):
 |          Links a dictionary-like object containing line parameters for the
 |          border of this box trace (including the whiskers).
 |  
 |          For more, run `help(plotly.graph_objs.Line)`
 |  
 |      opacity [required=False] (value=number: x in [0, 1]):
 |          Sets the opacity, or transparency, of the entire object, also known
 |          as the alpha channel of colors. If the object's color is given in
 |          terms of 'rgba' color model, 'opacity' is redundant.
 |  
 |      xaxis [required=False] (value='x1' | 'x2' | 'x3' | etc.):
 |          This key determines which x-axis the x-coordinates of this trace
 |          will reference in the figure.  Values 'x1' and 'x' reference to
 |          'xaxis' in 'layout', 'x2' references to 'xaxis2' in 'layout', and so
 |          on. Note that 'x1' will always refer to 'xaxis' or 'xaxis1' in
 |          'layout', they are the same.
 |  
 |      yaxis [required=False] (value='y1' | 'y2' | 'y3' | etc.):
 |          This key determines which y-axis the y-coordinates of this trace
 |          will reference in the figure.  Values 'y1' and 'y' reference to
 |          'yaxis' in 'layout', 'y2' references to 'yaxis2' in 'layout', and so
 |          on. Note that 'y1' will always refer to 'yaxis' or 'yaxis1' in
 |          'layout', they are the same.
 |  
 |      showlegend [required=False] (value=a boolean: True | False):
 |          Toggle whether or not this trace will be labeled in the legend.
 |  
 |      stream [required=False] (value=Stream object | dictionary-like object):
 |          Links a dictionary-like object that initializes this trace as a
 |          writable-stream, for use with the streaming API.
 |  
 |          For more, run `help(plotly.graph_objs.Stream)`
 |  
 |      visible [required=False] (value=a boolean: True | False):
 |          Toggles whether or not this object will be visible on the rendered
 |          figure.
 |  
 |      xsrc [required=False] (value=a string equal to the unique identifier of
 |      a plotly grid column) (streamable):
 |          Usually, you do not need to set this value as plotly will handle box
 |          locations for you. However this allows you to have fine control over
 |          the location data for the box. Unlike making a bar, a box plot is
 |          made of many y values. Therefore, to give location data to the
 |          values you place in 'y', the length of 'x' must equal the length of
 |          'y'. when making multiple box plots, you can concatenate the data
 |          sets for each box into a single 'y' array. then, the entries in 'x'
 |          define which box plot each entry in 'y' belongs to. When making a
 |          single box plot, you must set each entry in 'x' to the same value,
 |          see 'x0' for a more practical way to handle this case. If you don't
 |          include 'x', the box will simply be assigned a location.
 |  
 |      ysrc [required=True] (value=a string equal to the unique identifier of a
 |      plotly grid column) (streamable):
 |          This array is used to define an individual box plot, or, a
 |          concatenation of multiple box plots. Statistics from these numbers
 |          define the bounds of the box, the length of the whiskers, etc. For
 |          details on defining multiple boxes with locations see 'x'. Each box
 |          spans from the first quartile to the third. The second quartile is
 |          marked by a line inside the box. By default, the whiskers are
 |          correspond to box' edges +/- 1.5 times the interquartile range. See
 |          also 'boxpoints' for more info
 |  
 |      type [required=False] (value='box'):
 |          Plotly identifier for this data's trace type.
 |  
 |  Method resolution order:
 |      Box
 |      PlotlyTrace
 |      PlotlyDict
 |      __builtin__.dict
 |      __builtin__.object
 |  
 |  Methods inherited from PlotlyTrace:
 |  
 |  __init__(self, *args, **kwargs)
 |  
 |  to_string(self, level=0, indent=4, eol='\n', pretty=True, max_chars=80)
 |      Returns a formatted string showing graph_obj constructors.
 |      
 |      Example:
 |      
 |          print(obj.to_string())
 |      
 |      Keyword arguments:
 |      level (default = 0) -- set number of indentations to start with
 |      indent (default = 4) -- set indentation amount
 |      eol (default = '\n') -- set end of line character(s)
 |      pretty (default = True) -- curtail long list output with a '...'
 |      max_chars (default = 80) -- set max characters per line
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from PlotlyDict:
 |  
 |  __setitem__(self, key, value)
 |  
 |  force_clean(self, caller=True)
 |      Attempts to convert to graph_objs and call force_clean() on values.
 |      
 |      Calling force_clean() on a PlotlyDict will ensure that the object is
 |      valid and may be sent to plotly. This process will also remove any
 |      entries that end up with a length == 0.
 |      
 |      Careful! This will delete any invalid entries *silently*.
 |  
 |  get_data(self)
 |      Returns the JSON for the plot with non-data elements stripped.
 |  
 |  get_ordered(self, caller=True)
 |  
 |  strip_style(self)
 |      Strip style from the current representation.
 |      
 |      All PlotlyDicts and PlotlyLists are guaranteed to survive the
 |      stripping process, though they made be left empty. This is allowable.
 |      
 |      Keys that will be stripped in this process are tagged with
 |      `'type': 'style'` in graph_objs_meta.json.
 |      
 |      This process first attempts to convert nested collections from dicts
 |      or lists to subclasses of PlotlyList/PlotlyDict. This process forces
 |      a validation, which may throw exceptions.
 |      
 |      Then, each of these objects call `strip_style` on themselves and so
 |      on, recursively until the entire structure has been validated and
 |      stripped.
 |  
 |  to_graph_objs(self, caller=True)
 |      Walk obj, convert dicts and lists to plotly graph objs.
 |      
 |      For each key in the object, if it corresponds to a special key that
 |      should be associated with a graph object, the ordinary dict or list
 |      will be reinitialized as a special PlotlyDict or PlotlyList of the
 |      appropriate `kind`.
 |  
 |  update(self, dict1=None, **dict2)
 |      Update current dict with dict1 and then dict2.
 |      
 |      This recursively updates the structure of the original dictionary-like
 |      object with the new entries in the second and third objects. This
 |      allows users to update with large, nested structures.
 |      
 |      Note, because the dict2 packs up all the keyword arguments, you can
 |      specify the changes as a list of keyword agruments.
 |      
 |      Examples:
 |      # update with dict
 |      obj = Layout(title='my title', xaxis=XAxis(range=[0,1], domain=[0,1]))
 |      update_dict = dict(title='new title', xaxis=dict(domain=[0,.8]))
 |      obj.update(update_dict)
 |      obj
 |      {'title': 'new title', 'xaxis': {'range': [0,1], 'domain': [0,.8]}}
 |      
 |      # update with list of keyword arguments
 |      obj = Layout(title='my title', xaxis=XAxis(range=[0,1], domain=[0,1]))
 |      obj.update(title='new title', xaxis=dict(domain=[0,.8]))
 |      obj
 |      {'title': 'new title', 'xaxis': {'range': [0,1], 'domain': [0,.8]}}
 |      
 |      This 'fully' supports duck-typing in that the call signature is
 |      identical, however this differs slightly from the normal update
 |      method provided by Python's dictionaries.
 |  
 |  validate(self, caller=True)
 |      Recursively check the validity of the keys in a PlotlyDict.
 |      
 |      The valid keys constitute the entries in each object
 |      dictionary in graph_objs_meta.json
 |      
 |      The validation process first requires that all nested collections be
 |      converted to the appropriate subclass of PlotlyDict/PlotlyList. Then,
 |      each of these objects call `validate` and so on, recursively,
 |      until the entire object has been validated.
 |  
 |  ----------------------------------------------------------------------
 |  Data descriptors inherited from PlotlyDict:
 |  
 |  __dict__
 |      dictionary for instance variables (if defined)
 |  
 |  __weakref__
 |      list of weak references to the object (if defined)
 |  
 |  ----------------------------------------------------------------------
 |  Methods inherited from __builtin__.dict:
 |  
 |  __cmp__(...)
 |      x.__cmp__(y) <==> cmp(x,y)
 |  
 |  __contains__(...)
 |      D.__contains__(k) -> True if D has a key k, else False
 |  
 |  __delitem__(...)
 |      x.__delitem__(y) <==> del x[y]
 |  
 |  __eq__(...)
 |      x.__eq__(y) <==> x==y
 |  
 |  __ge__(...)
 |      x.__ge__(y) <==> x>=y
 |  
 |  __getattribute__(...)
 |      x.__getattribute__('name') <==> x.name
 |  
 |  __getitem__(...)
 |      x.__getitem__(y) <==> x[y]
 |  
 |  __gt__(...)
 |      x.__gt__(y) <==> x>y
 |  
 |  __iter__(...)
 |      x.__iter__() <==> iter(x)
 |  
 |  __le__(...)
 |      x.__le__(y) <==> x<=y
 |  
 |  __len__(...)
 |      x.__len__() <==> len(x)
 |  
 |  __lt__(...)
 |      x.__lt__(y) <==> x<y
 |  
 |  __ne__(...)
 |      x.__ne__(y) <==> x!=y
 |  
 |  __repr__(...)
 |      x.__repr__() <==> repr(x)
 |  
 |  __sizeof__(...)
 |      D.__sizeof__() -> size of D in memory, in bytes
 |  
 |  clear(...)
 |      D.clear() -> None.  Remove all items from D.
 |  
 |  copy(...)
 |      D.copy() -> a shallow copy of D
 |  
 |  fromkeys(...)
 |      dict.fromkeys(S[,v]) -> New dict with keys from S and values equal to v.
 |      v defaults to None.
 |  
 |  get(...)
 |      D.get(k[,d]) -> D[k] if k in D, else d.  d defaults to None.
 |  
 |  has_key(...)
 |      D.has_key(k) -> True if D has a key k, else False
 |  
 |  items(...)
 |      D.items() -> list of D's (key, value) pairs, as 2-tuples
 |  
 |  iteritems(...)
 |      D.iteritems() -> an iterator over the (key, value) items of D
 |  
 |  iterkeys(...)
 |      D.iterkeys() -> an iterator over the keys of D
 |  
 |  itervalues(...)
 |      D.itervalues() -> an iterator over the values of D
 |  
 |  keys(...)
 |      D.keys() -> list of D's keys
 |  
 |  pop(...)
 |      D.pop(k[,d]) -> v, remove specified key and return the corresponding value.
 |      If key is not found, d is returned if given, otherwise KeyError is raised
 |  
 |  popitem(...)
 |      D.popitem() -> (k, v), remove and return some (key, value) pair as a
 |      2-tuple; but raise KeyError if D is empty.
 |  
 |  setdefault(...)
 |      D.setdefault(k[,d]) -> D.get(k,d), also set D[k]=d if k not in D
 |  
 |  values(...)
 |      D.values() -> list of D's values
 |  
 |  viewitems(...)
 |      D.viewitems() -> a set-like object providing a view on D's items
 |  
 |  viewkeys(...)
 |      D.viewkeys() -> a set-like object providing a view on D's keys
 |  
 |  viewvalues(...)
 |      D.viewvalues() -> an object providing a view on D's values
 |  
 |  ----------------------------------------------------------------------
 |  Data and other attributes inherited from __builtin__.dict:
 |  
 |  __hash__ = None
 |  
 |  __new__ = <built-in method __new__ of type object>
 |      T.__new__(S, ...) -> a new object with type S, a subtype of T

In [31]:
# Define a color for the boxes (a nice blue-green)
color = '#1c9099'

# Define a trace-generating function (returns a Box object)
def make_trace(y, name):
    return Box(
        y=y,        # list/array of values to be used for box plot
        name=name,  # label (on hover and x-axis)
        line=Line(color=color)  # line color, fills in box with lighter hue
                                # (!) use 'fillcolor' to manually set area color 
    )                           

# Make Data object made up of 4 Box objects
data = Data([
    make_trace(homeworks, 'Homeworks'),
    make_trace(midterm, 'Midterm Exam'),
    make_trace(final, 'Final Exam'),
    make_trace(course, 'Course Grade')
])

# A few layout options in Layout
layout = Layout(
    title='Fig 4.4a: Course Grade Distributions',
    yaxis=YAxis(title='Grade [%]'),
    showlegend=False
) 
In [32]:
# Make Figure object
fig = Figure(data=data, layout=layout)

# (@) Send to Plotly and show in notebook
py.iplot(fig, filename='s4_grades-box')
Out[32]:

Plotly automatically separates and labels individual Box objects on the x-axis. That said, Plotly also allows users to specify the position of each box using the 'x' and 'x0' keys. For more info, refer to the help(Box) documentation.

By default, data points outside the whiskers are plotted as scatter points. These outliers can be removed by setting 'boxpoints':False in the Box object.
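
The help(Box) output above also lists a 'suspectedoutliers' option, with thresholds written as 4*Q1-3*Q3 and 4*Q3-3*Q1. Rewriting them with the interquartile range $\mathrm{IQR} = Q_3 - Q_1$ makes their meaning easier to see:

$$ 4\,Q_1 - 3\,Q_3 = Q_1 - 3\,(Q_3 - Q_1) = Q_1 - 3\,(\mathrm{IQR}) \;,$$
$$ 4\,Q_3 - 3\,Q_1 = Q_3 + 3\,(Q_3 - Q_1) = Q_3 + 3\,(\mathrm{IQR}) \;.$$

In other words, the highlighted points are those lying more than 3 interquartile ranges beyond the box, twice the 1.5 factor used for the whiskers.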

Plotly box plots have a custom hovermode. Hovering over any part of the box object labels the quartiles and the whiskers. Note that resetting 'hovermode' will disable it.

Now, let's add a couple of style options to our box plot:

In [33]:
# Make dictionary of style updates
box_style = dict(
    boxpoints='all', # show all data pts on plot (default is 'outliers')
    jitter=0.5,      # width of random horiz. jitter (0=aligned, 1=box width)
    pointpos=-2,     # norm. pos. of boxpoints w.r.t. boxes
    marker=Marker(
        color='#feb24c', # set boxpoints' color
        size=10,         # set boxpoints' size
        line=Line(
            color='#FFFFFF',  # white line around boxpoints,
            width=1           #   1 pixel wide
        )
    )
)

# Update each (yes all 4 of them) Box objs 
#   using the graph object update() method
fig['data'].update(box_style)
In [34]:
# Update plot's title
fig['layout'].update(title='Fig 4.4b: Course Grade Distributions')

# (@) Send to Plotly and show in notebook
py.iplot(fig, filename='s4_grades-box-styled')
Out[34]:

Box plots are really a great way to visualize data distributions.
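
The `fig['data'].update(box_style)` call used above applies one style dictionary to every trace, merging nested keys rather than replacing them. The sketch below mimics that behaviour with plain dictionaries. It is an illustration only, not Plotly's implementation: `deep_update`, `example_traces` and `example_style` are hypothetical names, and the recursive merge is assumed from the description in the help(Box) output:

```python
def deep_update(target, changes):
    """Recursively merge `changes` into `target` (plain dicts only)."""
    for key, value in changes.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            deep_update(target[key], value)   # merge nested dicts key by key
        else:
            target[key] = value               # otherwise overwrite or add
    return target

# Two stand-in traces written as plain dicts (hypothetical data)
example_traces = [
    dict(type='box', name='Homeworks', line=dict(color='#1c9099')),
    dict(type='box', name='Midterm Exam', line=dict(color='#1c9099')),
]

# One style dictionary applied to every trace in the list
example_style = dict(boxpoints='all', jitter=0.5, line=dict(width=2))
for trace in example_traces:
    deep_update(trace, example_style)

print(example_traces[0])
```

After the loop, each trace keeps its own name and line color and gains the new keys, including the extra 'width' entry inside the existing 'line' dictionary.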

Let's add one more feature. When the 'boxmean' key in Box is set to True, Plotly draws a dashed line inside the boxes representing the mean of the distribution. With 'boxmean':'sd', Plotly adds two triangles representing +/- one standard deviation.

Next, we also take the chance to style our plot further. Consider:

In [35]:
# Add 'boxmean' key (set to 'sd'; use True for just a line at the mean)
fig['data'].update(dict(boxmean='sd'))  

# (!) boxmean='sd' must be sent inside a dict here, as .update() for list-like
#     Plotly graph objects accepts only dictionaries or lists of dictionaries.

# Define a dictionary of axis style options
axis_style = dict(
    zeroline=False,       # remove thick zero line
    gridcolor='#FFFFFF',  # white grid lines
    ticks='outside',      # draw ticks outside axes 
    showgrid=True,        # show grid line (for x-axis)
    ticklen=8,            # set tick length
    tickwidth=1.5         #   and width
)

# Update 'layout', add 'xaxis' key and link it to style dict
fig['layout'].update(xaxis=axis_style)

# Update 'yaxis'
fig['layout']['yaxis'].update(axis_style)

# Update a few frame style options
fig['layout'].update(
    plot_bgcolor='#EFECEA',  # set plot color to grey
    autosize=False,          # manual size
    width=650,               # plot's width
    height=500               #  and height
)
In [36]:
# Update plot's title
fig['layout'].update(title='Fig 4.4c: Course Grade Distributions')

# (@) Send to Plotly and show in notebook
py.iplot(fig, filename='s4_grades-box-styled2')
Out[36]:

Go to [Section 5 --- Heatmaps, Contours and 2D Histograms](https://plot.ly/python/heatmaps-contours-and-2dhistograms-tutorial)

Go to [Section 3 --- Bubble Charts](https://plot.ly/python/bubble-charts-tutorial)

Go back to [top of page](#Plotly's-Python-API-User-Guide)

Go back to User Guide's [homepage](https://plot.ly/python/user-guide)


Got Questions or Feedback?

About Plotly

  • tweet: @plotlygraphs

About the User Guide

  • tweet: @etpinard
