#!/usr/bin/env python
# coding: utf-8

# > This is one of the 100 recipes of the [IPython Cookbook](http://ipython-books.github.io/), the definitive guide to high-performance scientific computing and data science in Python.
# 

# # 6.2. Creating beautiful statistical plots with seaborn

# 1. Let's import NumPy, pandas, matplotlib, and seaborn.

# In[ ]:


import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
get_ipython().run_line_magic('matplotlib', 'inline')


# 2. We generate a random dataset (following this example on seaborn's website: http://nbviewer.ipython.org/github/mwaskom/seaborn/blob/master/examples/linear_models.ipynb)

# In[ ]:


x1 = np.random.randn(80)
x2 = np.random.randn(80)
x3 = x1 * x2
y1 = .5 + 2 * x1 - x2 + 2.5 * x3 + 3 * np.random.randn(80)
y2 = .5 + 2 * x1 - x2 + 2.5 * np.random.randn(80)
y3 = y2 + np.random.randn(80)
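

# The draws above change on every run. As a minimal sketch, assuming reproducibility is wanted, the same dataset can be built from a seeded generator. The `make_dataset` helper, its `seed` and `n` parameters, and the seed value are illustrative assumptions, not part of the original recipe.

# In[ ]:


import numpy as np


def make_dataset(seed=0, n=80):
    """Hypothetical helper: rebuild the recipe's six variables from a seeded generator."""
    rng = np.random.default_rng(seed)
    x1 = rng.standard_normal(n)
    x2 = rng.standard_normal(n)
    x3 = x1 * x2  # interaction term
    y1 = .5 + 2 * x1 - x2 + 2.5 * x3 + 3 * rng.standard_normal(n)
    y2 = .5 + 2 * x1 - x2 + 2.5 * rng.standard_normal(n)
    y3 = y2 + rng.standard_normal(n)
    return dict(x1=x1, x2=x2, x3=x3, y1=y1, y2=y2, y3=y3)


# Same seed, same data: the two calls return identical arrays.
demo_a = make_dataset(seed=0)
demo_b = make_dataset(seed=0)
print(np.array_equal(demo_a['y1'], demo_b['y1']))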


# 3. Seaborn implements many easy-to-use statistical plotting functions. For example, here is how to create a violin plot (showing the distribution of several sets of points).

# In[ ]:


plt.figure(figsize=(4,3));
# One violin per array: x1, x2, and x3.
sns.violinplot(data=[x1, x2, x3]);
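

# In recent seaborn versions, the thin box drawn by default inside each violin marks the quartiles of the sample. Here is a small sketch of those numbers with NumPy, computed on a stand-in sample (the `_demo` names and the seed are assumptions for illustration).

# In[ ]:


import numpy as np

rng_demo = np.random.default_rng(0)          # arbitrary seed
sample_demo = rng_demo.standard_normal(80)   # stand-in for x1

# First quartile, median, and third quartile of the sample.
q1_demo, med_demo, q3_demo = np.percentile(sample_demo, [25, 50, 75])
print(q1_demo, med_demo, q3_demo)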


# 4. Seaborn also implements all-in-one statistical visualization functions. For example, a single function (`regplot`) can perform *and* display a linear regression between two variables.

# In[ ]:


plt.figure(figsize=(4,3));
# Scatter plot of (x2, y2) with the fitted regression line and its confidence band.
sns.regplot(x=x2, y=y2);
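

# For intuition about what the line in the plot represents, here is a minimal sketch of an ordinary least-squares fit done by hand with `np.polyfit`. It is not seaborn's internal code; the seeded stand-in data and the `_demo` names are assumptions for illustration.

# In[ ]:


import numpy as np

rng_fit = np.random.default_rng(0)  # arbitrary seed
x1_demo = rng_fit.standard_normal(80)
x2_demo = rng_fit.standard_normal(80)
y2_demo = .5 + 2 * x1_demo - x2_demo + 2.5 * rng_fit.standard_normal(80)

# Degree-1 least-squares fit: y2 is approximated by slope * x2 + intercept.
slope_demo, intercept_demo = np.polyfit(x2_demo, y2_demo, 1)
residuals_demo = y2_demo - (slope_demo * x2_demo + intercept_demo)

# The generating model has a coefficient of -1 on x2, so the estimate should
# land in that neighborhood, with noticeable scatter at only 80 points.
print(slope_demo, intercept_demo)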


# 5. Seaborn has built-in support for pandas data structures. Here, we display the pairwise correlations between all variables defined in a `DataFrame`.

# In[ ]:


df = pd.DataFrame(dict(x1=x1, x2=x2, x3=x3, 
                       y1=y1, y2=y2, y3=y3))
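# Plot the pairwise Pearson correlation matrix as an annotated heatmap.
sns.heatmap(df.corr(), annot=True, vmin=-1, vmax=1, cmap='coolwarm');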
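

# The numbers behind such a plot can be inspected directly. This is a minimal sketch, assuming pandas' default Pearson method, on a small stand-in `DataFrame` (the `_demo` names, the columns `a` and `b`, and the seed are assumptions for illustration).

# In[ ]:


import numpy as np
import pandas as pd

rng_corr = np.random.default_rng(0)  # arbitrary seed
a_demo = rng_corr.standard_normal(80)
b_demo = a_demo + rng_corr.standard_normal(80)  # correlated with a_demo by construction
df_demo = pd.DataFrame(dict(a=a_demo, b=b_demo))

# Pearson correlation matrix: symmetric, with ones on the diagonal.
corr_demo = df_demo.corr()
print(corr_demo)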


# > You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
# 
# > [IPython Cookbook](http://ipython-books.github.io/), by [Cyrille Rossant](http://cyrille.rossant.net), Packt Publishing, 2014 (500 pages).