matplotlib

Week 10 - Advanced Visualization

Today's Agenda

Today we will focus on plotting and on how to produce publication-ready plots.

  • Modifying your matplotlibrc file
  • Matplotlib Style sheets
  • Seaborn
  • Bokeh

Modifying your matplotlibrc file

Matplotlib reads its default settings from a special file called matplotlibrc. This file is usually located at ~/.matplotlib/matplotlibrc

You can check the location of this file by typing:

In [1]:
%matplotlib inline
import matplotlib
matplotlib.matplotlib_fname()
Out[1]:
u'/Users/victor2/.matplotlib/matplotlibrc'
In [2]:
import numpy as np
import matplotlib.pyplot as plt

You can also change these parameters at runtime, without editing the matplotlibrc file itself.

In [3]:
# mpl.rcParams lists every parameter that can be changed
import matplotlib as mpl
mpl.rcParams
print('Linewidth: {0}   Color: {1}'.format(
                                    mpl.rcParams['lines.linewidth'],
                                    mpl.rcParams['lines.color']))
Linewidth: 1.5   Color: k
In [4]:
# Changing line properties
mpl.rc('lines', linewidth=10, color='g')
print('Linewidth: {0}   Color: {1}'.format(
                                    mpl.rcParams['lines.linewidth'],
                                    mpl.rcParams['lines.color']))
Linewidth: 10.0   Color: g
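
If a change should apply to only part of a session, one option is mpl.rc_context, which restores the previous values when the block exits. The following is a minimal sketch; the variable names are illustrative and not part of the original notebook.

```python
import matplotlib as mpl

# Remember the current value so we can confirm it is restored afterwards
original_width = mpl.rcParams['lines.linewidth']

# Settings passed to rc_context apply only inside the `with` block
with mpl.rc_context(rc={'lines.linewidth': 3, 'lines.color': 'g'}):
    inside_width = mpl.rcParams['lines.linewidth']

# Outside the block the earlier value is back in place
restored_width = mpl.rcParams['lines.linewidth']
print('Inside: {0}   After: {1}'.format(inside_width, restored_width))
```

To undo every change at once, mpl.rcdefaults() resets all parameters to matplotlib's built-in defaults.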

Matplotlib Style sheets

Newer versions of matplotlib let you set up a style sheet. For example, you can make a plot look as if it had been produced with ggplot or SuperMongo.

In [5]:
def plotting(stylename='classic'):
    # Defining data
    x = np.arange(0,10)
    y = np.random.randint(20,30,x.size)
    # Defining style sheet
    try:
        plt.style.use(stylename)
    except IOError:
        msg = '{0} not found'.format(stylename)
        raise IOError(msg)
    # Plotting
    plt.clf()
    plt.plot(x,y,'-ro', label=stylename)
    plt.xlabel('X label')
    plt.ylabel('Y label')
    plt.legend(loc=1)
    plt.title('Plot using "{0}" Style'.format(stylename), fontsize=20 )
    plt.show()
In [6]:
plotting(stylename='classic')

Now with a different style sheet:

In [7]:
plotting(stylename='ggplot')
In [8]:
for style in plt.style.available[0:5]:
    plotting(stylename=style)

To get a list of all the available styles:

In [9]:
plt.style.available
Out[9]:
[u'seaborn-darkgrid',
 u'seaborn-notebook',
 u'classic',
 u'seaborn-ticks',
 u'grayscale',
 u'bmh',
 u'seaborn-talk',
 u'dark_background',
 u'ggplot',
 u'fivethirtyeight',
 u'seaborn-colorblind',
 u'seaborn-deep',
 u'seaborn-whitegrid',
 u'seaborn-bright',
 u'seaborn-poster',
 u'seaborn-muted',
 u'seaborn-paper',
 u'seaborn-white',
 u'seaborn-pastel',
 u'seaborn-dark',
 u'seaborn',
 u'seaborn-dark-palette']
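
Note that plt.style.use changes the style for the rest of the session. To apply a style to a single figure only, one possibility is plt.style.context. The sketch below assumes the 'ggplot' style is installed and uses made-up data points; it also checks the name against plt.style.available instead of catching an error.

```python
import matplotlib as mpl
import matplotlib.pyplot as plt

stylename = 'ggplot'

# Check availability up front rather than waiting for an exception
style_is_available = stylename in plt.style.available
background_before = mpl.rcParams['axes.facecolor']

# The style is active only inside the `with` block
with plt.style.context(stylename):
    background_inside = mpl.rcParams['axes.facecolor']
    fig, ax = plt.subplots()
    ax.plot([0, 1, 2], [20, 25, 22], '-o', label=stylename)  # illustrative data
    ax.legend(loc=1)
    plt.close(fig)  # frees the figure; omit this line to display it inline

# The previous settings are restored automatically
background_after = mpl.rcParams['axes.facecolor']
```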

Seaborn

Seaborn is a visualization library based on matplotlib. It provides a high-level interface for drawing attractive statistical graphics.

Some of the data here was taken from: http://blog.insightdatalabs.com/advanced-functionality-in-seaborn/

The Data

We'll be using the UCI "Auto MPG" data for the purpose of this module.

We'll be using pandas along with Seaborn.

In [10]:
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

%matplotlib inline
import seaborn as sns
sns.set_style('darkgrid')

Reading in the data that we'll be using:

In [11]:
names = [
       'mpg'
    ,  'cylinders'
    ,  'displacement'
    ,  'horsepower'
    ,  'weight'
    ,  'acceleration'
    ,  'model_year'
    ,  'origin'
    ,  'car_name'
]
df = pd.read_csv("http://archive.ics.uci.edu/ml/machine-learning-databases/auto-mpg/auto-mpg.data", sep=r'\s+', names=names)
df['maker'] = df.car_name.map(lambda x: x.split()[0])
df.origin = df.origin.map({1: 'America', 2: 'Europe', 3: 'Asia'})
df=df.applymap(lambda x: np.nan if x == '?' else x).dropna()
df['horsepower'] = df.horsepower.astype(float)
df.head()
Out[11]:
    mpg  cylinders  displacement  horsepower  weight  acceleration  model_year   origin                   car_name      maker
0  18.0          8         307.0       130.0  3504.0          12.0          70  America  chevrolet chevelle malibu  chevrolet
1  15.0          8         350.0       165.0  3693.0          11.5          70  America          buick skylark 320      buick
2  18.0          8         318.0       150.0  3436.0          11.0          70  America         plymouth satellite   plymouth
3  16.0          8         304.0       150.0  3433.0          12.0          70  America              amc rebel sst        amc
4  17.0          8         302.0       140.0  3449.0          10.5          70  America                ford torino       ford
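
The '?' placeholders can also be handled while the file is being parsed, by passing na_values to pd.read_csv. The sketch below is one possible alternative to the applymap step. It reads a small inline sample instead of the real file, and the third row is made up to show a missing horsepower value.

```python
import io
import pandas as pd

# Illustrative rows in the same whitespace-separated layout as the data file;
# the third row is invented to show a missing horsepower value
sample = io.StringIO(
    '18.0  8  307.0  130.0  3504.  12.0  70  1  "chevrolet chevelle malibu"\n'
    '15.0  8  350.0  165.0  3693.  11.5  70  1  "buick skylark 320"\n'
    '25.0  4   98.0  ?      2046.  19.0  71  1  "example car"\n'
)
names = ['mpg', 'cylinders', 'displacement', 'horsepower', 'weight',
         'acceleration', 'model_year', 'origin', 'car_name']

# na_values='?' converts the placeholder to NaN during parsing,
# so horsepower is read as a float column directly
sample_df = pd.read_csv(sample, sep=r'\s+', names=names, na_values='?')

# Drop the rows that contain missing values
clean_df = sample_df.dropna()
print(clean_df[['mpg', 'horsepower', 'car_name']])
```

With this approach the separate astype(float) conversion is no longer needed.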

factorplot and FacetGrid

In [12]:
# sns.set_style(None)
sns.set_context('notebook')
sns.factorplot(data=df, x="model_year", y="mpg")
Out[12]:
<seaborn.axisgrid.FacetGrid at 0x111179e10>

We can start off by visualizing 'model_year' vs 'mpg' separately for each 'origin' class.
We can do this by passing col="origin" to the factorplot command:

In [13]:
sns.factorplot(data=df, x="model_year", y="mpg", col="origin")
Out[13]:
<seaborn.axisgrid.FacetGrid at 0x10d4d3390>
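
In seaborn 0.9 and later, factorplot was renamed catplot, and its default kind changed from "point" to "strip". The following sketch shows what the equivalent call might look like on a recent seaborn. The toy DataFrame is made up for illustration and stands in for df.

```python
import pandas as pd
import seaborn as sns

# Small made-up frame with the same column names as `df`
toy = pd.DataFrame({
    "model_year": [70, 70, 71, 71, 70, 70, 71, 71],
    "mpg": [18.0, 15.0, 25.0, 19.0, 24.0, 26.0, 27.0, 31.0],
    "origin": ["America"] * 4 + ["Asia"] * 4,
})

# kind="point" reproduces the point plot that factorplot drew by default
g = sns.catplot(data=toy, x="model_year", y="mpg", col="origin", kind="point")
```

Like factorplot, catplot returns a FacetGrid with one column of axes per value of origin.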

You can easily change the type of graph that you're plotting:

In [14]:
g = sns.FacetGrid(df, col="origin")
g.map(sns.distplot, "mpg")
Out[14]:
<seaborn.axisgrid.FacetGrid at 0x111f341d0>

Or look at a scatter plot of the data:

In [15]:
g = sns.FacetGrid(df, col="origin")
g.map(plt.scatter, "horsepower", "mpg")
Out[15]:
<seaborn.axisgrid.FacetGrid at 0x112136650>

You can easily compute and plot a regression of the data:

In [16]:
g = sns.FacetGrid(df, col="origin")
g.map(sns.regplot, "horsepower", "mpg")
plt.xlim(0, 250)
plt.ylim(0, 60)
Out[16]:
(0, 60)
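
By default, regplot draws a straight-line (first-order) fit. To see roughly what that line represents, here is a numpy sketch of an ordinary least-squares fit on the five rows shown by df.head() above. It illustrates the idea and is not seaborn's own implementation.

```python
import numpy as np

# horsepower and mpg for the five rows shown by df.head()
horsepower = np.array([130.0, 165.0, 150.0, 150.0, 140.0])
mpg = np.array([18.0, 15.0, 18.0, 16.0, 17.0])

# Degree-1 polynomial fit: mpg is modelled as slope * horsepower + intercept
slope, intercept = np.polyfit(horsepower, mpg, 1)

# Values of the fitted line at the observed horsepower values
predicted = slope * horsepower + intercept
print('slope: {0:.4f}   intercept: {1:.2f}'.format(slope, intercept))
```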

Let's say you want to visualize a kernel density estimate (KDE) for each combination of origin and weight class:

In [17]:
# Define new variable `tons`
df['tons'] = (df.weight/2000).astype(int)
# Create grid to plot your data
g = sns.FacetGrid(df, col="origin", row="tons")
# 1) Specify type of function
# 2) Specify 'x' and 'y' for each plot
g.map(sns.kdeplot, "horsepower", "mpg")
# Define the x- and y-limits for each subplot
plt.xlim(0, 250)
plt.ylim(0, 60)
Out[17]:
(0, 60)
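
For readers new to kernel density estimation: the estimate is an average of smooth bumps (kernels), one centred on each data point. The plots above are two-dimensional, but the idea is easier to see in one dimension. The sketch below uses a Gaussian kernel with a fixed bandwidth of 1.0, which is an arbitrary choice for illustration; seaborn selects a bandwidth automatically. The function name gaussian_kde_1d is hypothetical.

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth):
    """Average of Gaussian kernels centred on each sample, evaluated on grid."""
    samples = np.asarray(samples, dtype=float)
    grid = np.asarray(grid, dtype=float)
    # Scaled distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    # One normalised Gaussian bump per sample
    kernels = np.exp(-0.5 * z ** 2) / (bandwidth * np.sqrt(2 * np.pi))
    # Averaging the bumps gives a density that integrates to one
    return kernels.mean(axis=1)

mpg_sample = np.array([18.0, 15.0, 18.0, 16.0, 17.0])  # mpg values from df.head()
grid = np.linspace(0, 40, 801)
density = gaussian_kde_1d(mpg_sample, grid, bandwidth=1.0)

# Riemann-sum check that the estimate behaves like a probability density
area = np.sum(density) * (grid[1] - grid[0])
```

A smaller bandwidth gives a spikier estimate, and a larger one gives a smoother estimate.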

pairplot and PairGrid

These functions allow you to plot pairwise relations in a dataset.
Let's say we want to plot the relation between mpg, horsepower, weight, and origin.
And we also want to separate them based on the origin.

Without separating by origin:

In [18]:
g = sns.PairGrid(df[["mpg", "horsepower", "weight", "origin"]])
g.map_upper(sns.regplot)
g.map_lower(sns.residplot)
g.map_diag(plt.hist)
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
g.set(alpha=0.5)
Out[18]:
<seaborn.axisgrid.PairGrid at 0x113471510>

We can color the points by origin:

In [19]:
g = sns.PairGrid(df[["mpg", "horsepower", "weight", "origin"]], hue="origin")
g.map_upper(sns.regplot)
g.map_lower(sns.residplot)
g.map_diag(plt.hist)
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
g.add_legend()
g.set(alpha=0.5)
Out[19]:
<seaborn.axisgrid.PairGrid at 0x113d733d0>

Or by maker:

In [20]:
g = sns.PairGrid(df[["mpg", "horsepower", "weight", "origin","maker"]], hue="maker")
g.map_upper(sns.regplot)
g.map_lower(sns.residplot)
g.map_diag(plt.hist)
for ax in g.axes.flat:
    plt.setp(ax.get_xticklabels(), rotation=45)
g.add_legend()
g.set(alpha=0.5)
Out[20]:
<seaborn.axisgrid.PairGrid at 0x1144435d0>

As you can see, you have some freedom when it comes to what you want to plot.

In [21]:
g = sns.PairGrid(df[["mpg", "horsepower", "weight", "origin"]])
g.map_diag(sns.kdeplot)
g.map_offdiag(sns.kdeplot, cmap="Blues_d", n_levels=6);
/Users/victor2/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.py:545: UserWarning: No labelled objects found. Use label='...' kwarg on individual plots.
  warnings.warn("No labelled objects found. "