Pandas + Plotly!

Aim: To show how simple it is to plot a Pandas DataFrame using Plotly's IPython interface

Importing Pandas, NumPy and Plotly (also importing getpass so the API key can be typed in the console running IPython instead of being saved in the notebook)

In [1]:
import pandas as pd
import numpy as np
import plotly
import getpass

Instantiating Plotly with username and API key

In [ ]:
api_key=getpass.getpass()
p = plotly.plotly('nipun.batra.1', api_key)

Create a random Pandas DataFrame consisting of 3 columns and 100 rows

In [2]:
df = pd.DataFrame({'A': np.random.rand(100), 'B': np.random.rand(100), 'C': np.random.rand(100)}, index=np.arange(100))
In [3]:
df
Out[3]:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 100 entries, 0 to 99
Data columns (total 3 columns):
A    100  non-null values
B    100  non-null values
C    100  non-null values
dtypes: float64(3)
In [4]:
df.describe()
Out[4]:
A B C
count 100.000000 100.000000 100.000000
mean 0.466735 0.497376 0.506264
std 0.272216 0.286994 0.266733
min 0.007997 0.007294 0.000326
25% 0.234741 0.243540 0.296997
50% 0.462034 0.521211 0.511778
75% 0.690845 0.711072 0.730888
max 0.986756 0.993439 0.979052
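
The summary statistics above change on every run because the random data is unseeded. A minimal sketch of a reproducible variant, assuming NumPy's Generator API is available; the seed value and the name df_seeded are illustrative and not part of the original notebook:

```python
import numpy as np
import pandas as pd

# Hypothetical seed: any fixed integer gives repeatable draws.
rng = np.random.default_rng(0)

# 100 rows x 3 columns of uniform samples in [0, 1), mirroring the cell above.
df_seeded = pd.DataFrame(rng.random((100, 3)), columns=['A', 'B', 'C'])

print(df_seeded.shape)
```

Re-running this cell should give the same frame each time, which makes the describe() output comparable across sessions.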

Standard plot produced by Matplotlib/Pandas

In [5]:
df.plot()
Out[5]:
<matplotlib.axes.AxesSubplot at 0x4a24e50>

Function to create Plotly series from the Pandas DataFrame: a list with one dictionary per column, each holding that column's "x", "y" and "name"

In [31]:
def df_to_iplot(df):
    
    '''
    Converting a Pandas DataFrame to the Plotly interface
    '''
    # All columns share the DataFrame index as their x values
    x = df.index.values
    lines={}
    for key in df:
        lines[key]={}
        lines[key]["x"]=x
        lines[key]["y"]=df[key].values
        lines[key]["name"]=key

    #Appending all lines, in column order
    lines_plotly=[lines[key] for key in df]
    return lines_plotly
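
To make the returned structure concrete, here is a small sketch on a two-row DataFrame. The helper frame_to_traces is a hypothetical stand-in that builds the same list of dictionaries as df_to_iplot, so the example runs without a Plotly account:

```python
import pandas as pd

def frame_to_traces(frame):
    """Hypothetical stand-in for df_to_iplot: one dict per column."""
    x = frame.index.values
    return [{"x": x, "y": frame[col].values, "name": col} for col in frame]

small = pd.DataFrame({'A': [0.1, 0.2], 'B': [0.3, 0.4]})
traces = frame_to_traces(small)

# Each trace carries the shared index as x and one column as y.
for trace in traces:
    print(trace["name"], trace["x"].tolist(), trace["y"].tolist())
```

Every column becomes one line in the plot, and all lines share the same x values taken from the index.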

Plotting the DataFrame using iplot

In [32]:
p.iplot(df_to_iplot(df))

Out[32]:

That is it! You can pan, zoom and do a bunch more now!

Now let us try some time series

In [47]:
date_rng = pd.date_range('2013-01-01 00:00', '2013-01-03 10:00', freq='300s')
In [48]:
df2 = pd.DataFrame({'A': np.random.rand(len(date_rng)), 'B': np.random.rand(len(date_rng))}, index=date_rng)
In [49]:
df2
Out[49]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 697 entries, 2013-01-01 00:00:00 to 2013-01-03 10:00:00
Freq: 300S
Data columns (total 2 columns):
A    697  non-null values
B    697  non-null values
dtypes: float64(2)
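
The 697 entries follow from the range itself: 58 hours at one sample every 300 seconds, plus one because both endpoints are included. A quick sketch of that arithmetic (the variable names are illustrative):

```python
import pandas as pd

start = pd.Timestamp('2013-01-01 00:00')
end = pd.Timestamp('2013-01-03 10:00')

# Number of 300 s steps between the endpoints, plus one for the inclusive end.
expected = int((end - start).total_seconds() // 300) + 1

date_rng = pd.date_range(start, end, freq='300s')
print(expected, len(date_rng))
```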

Standard Pandas plot

In [50]:
df2.plot()
Out[50]:
<matplotlib.axes.AxesSubplot at 0x504c150>
In [51]:
p.iplot(df_to_iplot(df2))

Out[51]:

We now observe that the x axis shows the epoch in nanoseconds rather than datetimes.
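
A short sketch of where such numbers come from, assuming the index values reach Plotly as raw datetime64 values. Viewed as integers at nanosecond resolution, they are counts of nanoseconds since 1970-01-01; the names idx, raw_ns and as_text are illustrative:

```python
import pandas as pd

idx = pd.date_range('2013-01-01 00:00', periods=3, freq='300s')

# Cast explicitly to nanosecond resolution, then reinterpret as integers:
# nanoseconds elapsed since 1970-01-01 00:00:00.
raw_ns = idx.values.astype('datetime64[ns]').astype('int64')

# The same instants as readable strings.
as_text = idx.strftime('%Y-%m-%d %H:%M:%S').tolist()

print(raw_ns[0], as_text[0])
```

Consecutive values differ by 300 * 10**9, which matches the 300 second sampling interval.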

We modify the function defined above so that a DatetimeIndex is converted to datetime strings before plotting.

In [92]:
def df_to_iplot(df):
    
    '''
    Converting a Pandas DataFrame to the Plotly interface
    '''
    
    if isinstance(df.index, pd.DatetimeIndex):
        #Convert the index to MySQL DATETIME-like strings
        x=df.index.format()
        #Alternatively, use the values directly, since a DatetimeIndex holds np.datetime64
        #see http://nbviewer.ipython.org/gist/cparmer/7721116  
        #x=df.index.values.astype('datetime64[s]')
    else:
        x = df.index.values        
        
    lines={}
    for key in df:
        lines[key]={}
        lines[key]["x"]=x
        lines[key]["y"]=df[key].values
        lines[key]["name"]=key

    #Appending all lines, in column order
    lines_plotly=[lines[key] for key in df]
    return lines_plotly
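
The index handling can also be tried on its own. The sketch below uses a hypothetical helper, index_to_x, and swaps index.format() for strftime with an explicit pattern, since format() appears to be deprecated in recent pandas releases; treat that substitution as an assumption rather than the notebook's method:

```python
import numpy as np
import pandas as pd

def index_to_x(index):
    """Hypothetical helper: x values for either kind of index."""
    if isinstance(index, pd.DatetimeIndex):
        # Assumed replacement for index.format(): 'YYYY-MM-DD HH:MM:SS' strings.
        return index.strftime('%Y-%m-%d %H:%M:%S').tolist()
    # Any other index is passed through as a NumPy array.
    return index.values

dates = pd.date_range('2013-01-01 00:00', periods=2, freq='300s')
print(index_to_x(dates))
print(index_to_x(pd.Index(np.arange(3))))
```

Spelling out the pattern keeps the string format independent of pandas' default rendering.
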
In [93]:
p.iplot(df_to_iplot(df2))

Out[93]:
In [34]:
from IPython.display import HTML
import requests

Styling the IPython notebook. Stylesheet courtesy of Cam Davidson-Pilon and his book Bayesian Methods for Hackers

In [35]:
styles = requests.get("https://raw.github.com/CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/master/styles/custom.css")
HTML(styles.text)
Out[35]:

Contact:

Nipun Batra, PhD Student, IIIT Delhi

Webpage Twitter