This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

6.2. Creating beautiful statistical plots with seaborn

  1. Let's import NumPy, pandas, matplotlib, and seaborn.
In [ ]:
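import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
# Recent seaborn versions no longer restyle matplotlib on import;
# set_theme() applies seaborn's default look explicitly.
sns.set_theme()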
  2. We generate a random dataset (following this example on seaborn's website: http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/linear_models.ipynb).
In [ ]:
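x1 = np.random.randn(80)  # 80 standard normal samples
x2 = np.random.randn(80)
x3 = x1 * x2              # interaction term between x1 and x2
# y1 depends linearly on x1, x2, and the interaction x3, plus noise.
y1 = .5 + 2 * x1 - x2 + 2.5 * x3 + 3 * np.random.randn(80)
# y2 depends on x1 and x2 only (no interaction), plus noise.
y2 = .5 + 2 * x1 - x2 + 2.5 * np.random.randn(80)
# y3 is a noisier copy of y2, so the two are strongly correlated.
y3 = y2 + np.random.randn(80)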
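The dataset is built from known linear models: y1 includes the interaction term x3 = x1 * x2, while y2 does not. As a sketch that is not part of the original recipe, ordinary least squares with NumPy's `np.linalg.lstsq` recovers these coefficients; the seeded generator `rng` and the helper name `fit_ols` are illustrative assumptions.

```python
import numpy as np

def fit_ols(columns, y):
    """Least-squares coefficients for y ~ intercept + columns (hypothetical helper)."""
    # Design matrix: a column of ones for the intercept, then each predictor.
    X = np.column_stack([np.ones_like(y)] + list(columns))
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs

rng = np.random.default_rng(0)  # seeded so the sketch is reproducible
n = 80
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
x3 = x1 * x2
y1 = .5 + 2 * x1 - x2 + 2.5 * x3 + 3 * rng.standard_normal(n)

# True coefficients are (0.5, 2, -1, 2.5); with n = 80 the estimates are noisy.
print(fit_ols([x1, x2, x3], y1).round(2))
```

Without the noise term, the fit returns the true coefficients exactly, which is a quick way to check the design matrix.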
  3. Seaborn implements many easy-to-use statistical plotting functions. For example, here is how to create a violin plot, which shows the distribution of each of several sets of points.
In [ ]:
plt.figure(figsize=(4, 3))
# Wide-form input: a list of arrays gives one violin per array.
sns.violinplot(data=[x1, x2, x3]);
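Each violin is a mirrored kernel density estimate (KDE) of one sample. The sketch below computes a Gaussian KDE by hand with NumPy, using Scott's rule of thumb for the bandwidth. Seaborn's own implementation and bandwidth handling may differ in details, so `gaussian_kde_1d` is a hypothetical illustration, not seaborn's code.

```python
import numpy as np

def gaussian_kde_1d(sample, grid):
    """Evaluate a Gaussian KDE of `sample` at the points in `grid` (illustrative)."""
    sample = np.asarray(sample, dtype=float)
    n = sample.size
    # Scott's rule of thumb for the bandwidth in one dimension.
    h = sample.std(ddof=1) * n ** (-1 / 5)
    # Standardized distances between every grid point and every sample point.
    u = (grid[:, None] - sample[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2 * np.pi)
    # Average the kernels and rescale so the density integrates to 1.
    return kernels.mean(axis=1) / h

rng = np.random.default_rng(0)
x1 = rng.standard_normal(80)
grid = np.linspace(-6, 6, 601)
density = gaussian_kde_1d(x1, grid)
# A violin's outline is this density mirrored on both sides of a vertical axis.
print(grid[density.argmax()])
```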
  4. Seaborn also implements all-in-one statistical visualization functions. For example, a single function (regplot) performs a linear regression between two variables and displays the result.
In [ ]:
plt.figure(figsize=(4, 3))
# Recent seaborn versions require x and y as keyword arguments.
sns.regplot(x=x2, y=y2);
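By default, regplot fits an ordinary least-squares line and shades a 95% confidence band estimated by bootstrap resampling (its `ci` and `n_boot` parameters). The rough NumPy sketch below reproduces both ingredients under a simplifying assumption: it bootstraps only the slope, whereas seaborn bootstraps the whole fitted line, so the numbers are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 80
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)
y2 = .5 + 2 * x1 - x2 + 2.5 * rng.standard_normal(n)

# Least-squares line of y2 on x2 (polyfit returns [slope, intercept]).
slope, intercept = np.polyfit(x2, y2, deg=1)

# Bootstrap: refit on resampled (x, y) pairs to gauge slope uncertainty.
n_boot = 1000
boot_slopes = np.empty(n_boot)
for i in range(n_boot):
    idx = rng.integers(0, n, size=n)  # sample indices with replacement
    boot_slopes[i] = np.polyfit(x2[idx], y2[idx], deg=1)[0]
ci_low, ci_high = np.percentile(boot_slopes, [2.5, 97.5])
print(f"slope = {slope:.2f}, 95% bootstrap CI = [{ci_low:.2f}, {ci_high:.2f}]")
```

Since x1 is ignored in this regression, it acts as extra noise, which widens the interval around the true slope of -1.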
  5. Seaborn has built-in support for pandas data structures. Here, we display the pairwise correlations between all variables defined in a DataFrame.
In [ ]:
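df = pd.DataFrame(dict(x1=x1, x2=x2, x3=x3,
                       y1=y1, y2=y2, y3=y3))
# sns.corrplot no longer exists in current seaborn releases;
# a heatmap of the correlation matrix shows the same information.
sns.heatmap(df.corr(), annot=True, fmt=".2f",
            cmap="coolwarm", vmin=-1, vmax=1);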
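The values in this plot are Pearson correlation coefficients. As a sketch, they can be computed by hand by standardizing each column and averaging the pairwise products, which matches `df.corr()`. The boolean `mask` hiding the redundant upper triangle is an optional illustration; it could be passed to the `mask` parameter of `sns.heatmap`.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 80
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
x3 = x1 * x2
y1 = .5 + 2 * x1 - x2 + 2.5 * x3 + 3 * rng.standard_normal(n)
y2 = .5 + 2 * x1 - x2 + 2.5 * rng.standard_normal(n)
y3 = y2 + rng.standard_normal(n)
df = pd.DataFrame(dict(x1=x1, x2=x2, x3=x3, y1=y1, y2=y2, y3=y3))

# Pearson correlation by hand: standardize each column, then average products.
z = (df - df.mean()) / df.std(ddof=0)
corr = (z.T @ z) / len(df)

# True on and above the diagonal: the cells a triangular heatmap would hide,
# e.g. sns.heatmap(corr, mask=mask).
mask = np.triu(np.ones(corr.shape, dtype=bool))
print(corr.round(2))
```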

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).