Seaborn Release 0.9 Highlights

Full post on Practical Business Python

In [1]:
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt

Now that imports are done, enable inline display, set seaborn style and read in the file

In [2]:
%matplotlib inline
In [3]:
sns.set()
In [4]:
# File is in the data directory of the repo
df = pd.read_csv("https://raw.githubusercontent.com/chris1610/pbpython/master/data/MN_Traffic_Fatalities.csv")
In [5]:
df
Out[5]:
County Twin_Cities Pres_Election Public_Transport(%) Travel_Time Population 2012 2013 2014 2015 2016
0 Hennepin Yes Clinton 7.2 23.2 1237604 33 42 34 33 45
1 Dakota Yes Clinton 3.3 24.0 418432 19 19 10 11 28
2 Anoka Yes Trump 3.4 28.2 348652 25 12 16 11 20
3 St. Louis No Clinton 2.4 19.5 199744 11 19 8 16 19
4 Ramsey Yes Clinton 6.4 23.6 540653 19 12 12 18 15
5 Washington Yes Clinton 2.3 25.8 253128 8 10 8 12 13
6 Olmsted No Clinton 5.2 17.5 153039 2 12 8 14 12
7 Cass No Trump 0.9 23.3 28895 6 5 6 4 10
8 Pine No Trump 0.8 30.3 28879 14 7 4 9 10
9 Becker No Trump 0.5 22.7 33766 4 3 3 1 9

Create a basic scatter plot

In [6]:
sns.scatterplot(x='2016', y='Travel_Time', data=df)
Out[6]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f4542cb0828>

Try a different style based on the Pres_Election column

In [7]:
sns.scatterplot(x='2016', y='Travel_Time', style='Pres_Election', data=df)
Out[7]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f4543531128>

Size of the marks can be controlled via the size parameter

In [8]:
sns.scatterplot(x='2016', y='Travel_Time', size='Population', data=df)
Out[8]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a3b7048>

Show size and hue

In [9]:
sns.scatterplot(x='2016', y='Travel_Time', size='Population', hue='Twin_Cities', data=df)
Out[9]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a22a6d8>

Need to create a tidy data frame in order to be most effective with the subsequent plots

In [10]:
df_melted = pd.melt(df, id_vars=['County', 'Twin_Cities', 'Pres_Election', 
                               'Public_Transport(%)', 'Travel_Time', 'Population'], 
                  value_vars=['2016', '2015', '2014', '2013', '2012'], 
                  value_name='Fatalities',
                  var_name=['Year']
                 )

Here's what the data looks like for Hennepin County

In [11]:
df_melted[df_melted.County == "Hennepin"]
Out[11]:
County Twin_Cities Pres_Election Public_Transport(%) Travel_Time Population Year Fatalities
0 Hennepin Yes Clinton 7.2 23.2 1237604 2016 45
10 Hennepin Yes Clinton 7.2 23.2 1237604 2015 33
20 Hennepin Yes Clinton 7.2 23.2 1237604 2014 34
30 Hennepin Yes Clinton 7.2 23.2 1237604 2013 42
40 Hennepin Yes Clinton 7.2 23.2 1237604 2012 33

Now that we have tidy data, let's plot some line plots

In [12]:
sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities')
Out[12]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a241cf8>

Disable the confidence interval

In [13]:
sns.lineplot(x='Year', y='Fatalities', data=df_melted, hue='Twin_Cities', ci=False)
Out[13]:
<matplotlib.axes._subplots.AxesSubplot at 0x7f453a14e780>

relplot supports line and scatter plots on a FacetGrid. relplots provide more plotting options than the basic line or scatter plot

In [14]:
sns.relplot(x='2016', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), style='Pres_Election', data=df, legend='brief')
Out[14]:
<seaborn.axisgrid.FacetGrid at 0x7f453a110eb8>
In [15]:
# Here's how to review 2016 only data
df_melted.query("Year == '2016'")
Out[15]:
County Twin_Cities Pres_Election Public_Transport(%) Travel_Time Population Year Fatalities
0 Hennepin Yes Clinton 7.2 23.2 1237604 2016 45
1 Dakota Yes Clinton 3.3 24.0 418432 2016 28
2 Anoka Yes Trump 3.4 28.2 348652 2016 20
3 St. Louis No Clinton 2.4 19.5 199744 2016 19
4 Ramsey Yes Clinton 6.4 23.6 540653 2016 15
5 Washington Yes Clinton 2.3 25.8 253128 2016 13
6 Olmsted No Clinton 5.2 17.5 153039 2016 12
7 Cass No Trump 0.9 23.3 28895 2016 10
8 Pine No Trump 0.8 30.3 28879 2016 10
9 Becker No Trump 0.5 22.7 33766 2016 9
In [16]:
sns.relplot(x='Fatalities', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), data=df_melted.query("Year == '2016'"))
Out[16]:
<seaborn.axisgrid.FacetGrid at 0x7f4539fe17b8>

We can split the data into two columns with the col keyword

In [17]:
sns.relplot(x='Fatalities', y='Travel_Time', size='Population', hue='Twin_Cities',
            sizes=(100, 200), col='Pres_Election', data=df_melted.query("Year == '2016'"))
Out[17]:
<seaborn.axisgrid.FacetGrid at 0x7f4539fc1588>

We can use kind='line' to create line plots on the FacetGrid

In [18]:
sns.relplot(x='Year', y='Fatalities', data=df_melted, kind='line', hue='Twin_Cities', col='Pres_Election')
Out[18]:
<seaborn.axisgrid.FacetGrid at 0x7f4539f85208>

This example has rows and columns

In [19]:
sns.relplot(x='Year', y='Fatalities', data=df_melted, kind='line', size='Population',
            row='Twin_Cities', col='Pres_Election')
Out[19]:
<seaborn.axisgrid.FacetGrid at 0x7f4539dba6d8>

factorplot is deprecated and replaced with catplot

In [20]:
sns.factorplot(x='Fatalities', y='County', data=df_melted)
/home/chris/miniconda3/envs/pbp3/lib/python3.6/site-packages/seaborn/categorical.py:3666: UserWarning: The `factorplot` function has been renamed to `catplot`. The original name will be removed in a future release. Please update your code. Note that the default `kind` in `factorplot` (`'point'`) has changed `'strip'` in `catplot`.
  warnings.warn(msg)
Out[20]:
<seaborn.axisgrid.FacetGrid at 0x7f4539d0c6d8>

Here's how to use the category plot to replicate the factorplot

In [21]:
sns.catplot(x='Fatalities', y='County', data=df_melted, kind='point')
Out[21]:
<seaborn.axisgrid.FacetGrid at 0x7f45399c8c50>

Try a boxplot

In [22]:
sns.catplot(x='Year', y='Fatalities', kind='box', data=df_melted)
Out[22]:
<seaborn.axisgrid.FacetGrid at 0x7f453993b2b0>

A default catplot with two columns

In [23]:
sns.catplot(x='Year', y='Fatalities', data=df_melted, col='Twin_Cities')
Out[23]:
<seaborn.axisgrid.FacetGrid at 0x7f4539934668>

Change colors with hue

In [24]:
sns.catplot(y='County', x='Fatalities', data=df_melted, col='Twin_Cities', hue='Year')
Out[24]:
<seaborn.axisgrid.FacetGrid at 0x7f45384ff780>

Use a catplot with the newly named boxen plot

In [25]:
sns.catplot(x='Year', y='Fatalities', data=df_melted, col='Twin_Cities', kind='boxen')
Out[25]:
<seaborn.axisgrid.FacetGrid at 0x7f453849f438>
In [26]:
# Here's a little easter egg
sns.dogplot()