E. Tufte Slope Graphs contest

So here is my entry for the slope graph contest. (You can find the initial bounty description here.)

Installation

Dependencies

This script is written in Python and relies on NumPy, pandas and Matplotlib.
The easiest way to get a clean and robust install is to download one of the great Scientific Python distributions.

Everything you'll need is included. I personally use Anaconda from the guys at Continuum Analytics. All of them should work on Linux, Windows and Mac.
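
If you are not sure whether your Python install already has the three dependencies, a quick check along these lines can help. This is only a sketch: the helper name `check_dependencies` is made up for illustration and is not part of plotSlope.py.

```python
import importlib

# Packages this README lists as dependencies.
REQUIRED = ("numpy", "pandas", "matplotlib")

def check_dependencies(names=REQUIRED):
    """Return a dict mapping each package name to its version, or None if missing."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            found[name] = None
        else:
            # Not every module exposes __version__, so fall back to a placeholder.
            found[name] = getattr(module, "__version__", "unknown")
    return found

if __name__ == "__main__":
    for name, version in check_dependencies().items():
        print(name, "missing" if version is None else version)
```

Any package reported as missing can be installed through the distribution's own package manager.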

Sources

  1. Grab the sources at https://github.com/pascal-schetelat/Slope

  2. Launch Spyder, the scientific Python IDE bundled with Anaconda

  3. Set the Spyder working directory to the folder containing plotSlope.py and import it in the console:

    >>> from plotSlope import slope
    >>> import pandas as pd
    

    You are good to go.
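
If you work outside Spyder, changing the working directory is not the only way to make the import succeed. A possible alternative, sketched below, is to add the folder that contains plotSlope.py to Python's import path. The helper name `add_source_dir` and the `~/Slope` location are assumptions for illustration; use the path of your own checkout.

```python
import os
import sys

def add_source_dir(path):
    """Put the folder containing plotSlope.py at the front of the import path."""
    path = os.path.abspath(path)
    if path not in sys.path:
        sys.path.insert(0, path)
    return path

# Hypothetical location of the cloned repository; adjust to your own checkout.
add_source_dir(os.path.join(os.path.expanduser("~"), "Slope"))

# from plotSlope import slope   # should now resolve from any working directory
```

Note that the examples below read their CSV files through relative paths such as `Data/...`, so those still depend on the working directory.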

Interactive example

In [14]:
# Initial import. Load code to the console
from plotSlope import slope
import pandas as pd
import os
In [20]:
# Load data from a csv file using pandas and display it 
data = pd.read_csv(os.path.join('Data','EU_GDP_2007_2013.csv'),index_col=0,na_values='-')
data.head()/1000
Out[20]:
              2007      2008      2009      2010      2011      2012      2013
Austria     274.0198  282.7460  274.8182  286.1973  300.8913  310.1333  322.1904
Belgium     335.6100  346.1300  340.3980  354.3780  370.4364  381.7799  396.2738
Bulgaria     30.7724   35.4305   34.9328   36.0335   38.9899       NaN       NaN
Croatia      43.3804   47.7602   45.6661   45.8992   46.0216   46.7810   48.1752
Cyprus       15.9015   17.1571   16.8535   17.3336   17.9286   18.4096   19.1675
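
A side note on the `read_csv` call above: `index_col=0` turns the first column into the row labels, and `na_values='-'` tells pandas to read a lone dash as a missing value, which is why Bulgaria shows NaN for 2012 and 2013. The sketch below reproduces this on a tiny made-up CSV (the country names and numbers are invented, not taken from the EUROSTAT file).

```python
import io
import pandas as pd

# A tiny made-up CSV in the same shape as the file above:
# first column = row labels, '-' = missing value.
csv_text = u"""Country,2007,2008,2009
Aland,100.0,110.0,-
Borduria,50.0,-,55.0
"""

demo = pd.read_csv(io.StringIO(csv_text), index_col=0, na_values='-')

print(demo)
print(demo.isnull().sum())  # number of missing values per year
```

Because the dashes become NaN, the year columns stay numeric and can be divided by 1000 as in the cell above.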
In [24]:
# Specify an optional dictionary of colors
color  = {"France":'blue',
          'Germany':'red',
          'Ireland':'chocolate',
          'United Kingdom': 'purple', 
          'Poland':'green'}
color
Out[24]:
{'France': 'blue',
 'Germany': 'red',
 'Ireland': 'chocolate',
 'Poland': 'green',
 'United Kingdom': 'purple'}
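
A color dictionary like this one is easy to mistype. Assuming the keys are meant to match the row labels of the data (this looks like the intent here, but it is an assumption about how `slope` uses them), a small check such as the following can catch names that do not appear in the table. The helper `unknown_color_keys` and the miniature frame are made up for illustration.

```python
import pandas as pd

def unknown_color_keys(color, frame):
    """Return the keys of `color` that are not row labels of `frame`."""
    return sorted(set(color) - set(frame.index))

# Made-up miniature frame standing in for the GDP table.
frame = pd.DataFrame({'2007': [1.0, 2.0]}, index=['France', 'Germany'])
color = {'France': 'blue', 'Germany': 'red', 'Irland': 'chocolate'}  # deliberate typo

print(unknown_color_keys(color, frame))  # ['Irland']
```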
In [32]:
f = slope(data/1000,kind='interval',color=color,
          height= 18,width=20,font_size=15,font_family='GillSans',dpi=150,savename='test.png',
          title = u'European GDP until 2010 and forecasts at market prices (billions of Euro) source : EUROSTAT')

Notice that when the vertical density of lines is greater than the maximal density of text, labels are lumped together.
The layout is mainly controlled by the ratio of the font size to the figure size.
For instance, let's increase the font size from 15 to 20:

In [34]:
# Changing font and figure size.
# Notice how some labels get lumped together
f = slope(data/1000,width =30,height= 12,kind='interval',font_size=20,marker='%0.f',color=None,savename=None,dpi=200) 
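
The font size to figure size ratio can be estimated with simple arithmetic. Matplotlib measures figure size in inches and font size in points (1 point = 1/72 inch), so a figure of a given height can hold at most about `height * 72 / font_size` rows of text before labels start to collide. The sketch below is a back-of-the-envelope estimate only, not the layout code used in plotSlope.py; the function name and the `line_spacing` factor are assumptions.

```python
def max_label_rows(height_in, font_size_pt, line_spacing=1.2):
    """Rough upper bound on non-overlapping text rows in a figure.

    height_in    : figure height in inches (as passed to `height=`)
    font_size_pt : font size in points (1 pt = 1/72 inch)
    line_spacing : assumed line height as a multiple of the font size
    """
    return int(height_in * 72 / (font_size_pt * line_spacing))

# The two layouts used above
print(max_label_rows(18, 15))  # height=18, font_size=15
print(max_label_rows(12, 20))  # height=12, font_size=20
```

The second layout has room for far fewer rows than the first. This is only an upper bound: margins and the title take space too, and labels still collide whenever several values fall within one row height of each other, whatever the total count.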

The only issue is that lumped labels tend to be very long and make the figure very large.
Another option is to suppress the marker. At the same time, let's reduce the figure width:

In [37]:
# Without numbers
f = slope(data/1000,width =15,height= 12,kind='interval',marker = None,font_size=12,color=color,savename=None,dpi=200) 

Other data:

In [41]:
cancer_data = pd.read_csv(os.path.join('Data','cancer_survival_rate.csv'),index_col=0)
cancer_data.head()
Out[41]:
              5 year  10 year  15 year  20 year
Cancer type
Prostate          99       95       87       81
Thyroid           96       96       94       95
Testis            95       94       91       88
Melanomas         89       87       84       83
Breast            86       78       71       65
In [42]:
f = slope(cancer_data,height= 18,width=18,font_size=20,savename='cancer.png')   

Let's add some color

In [46]:
cancer_color = {'Prostate':'chocolate','Leukemia':'red'}
In [47]:
f = slope(cancer_data,height= 18,width=18,font_size=20,color=cancer_color,savename='cancer.png')