Dražen's humble introduction to

Becoming a master statistician with IPython notebook

What's IPython notebook? A web interface for a scientific Python shell.

Started by Fernando Pérez to bring modern, open-source tools to the research community.

Main features:

  • the power of Python at your fingertips :)
  • IPython, a "smarter" shell specialized for scientific workflows
  • Markdown annotations and inline plots as a perfect "lab notebook"

IPython

Basically a Python interpreter with some nice features...

  • Auto-completion and documentation
In [1]:
a = list(range(5))  # list() keeps .remove() working on Python 3 as well
a
Out[1]:
[0, 1, 2, 3, 4]
In [2]:
a.remove(3)
In [3]:
a.remove?
In [4]:
a
Out[4]:
[0, 1, 2, 4]
  • Clean stack traces
In [5]:
def divide():
    a = 1
    b = 4
    c = b / (a-1)

divide()
---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-5-a8416233c05c> in <module>()
      4     c = b / (a-1)
      5 
----> 6 divide()

<ipython-input-5-a8416233c05c> in divide()
      2     a = 1
      3     b = 4
----> 4     c = b / (a-1)
      5 
      6 divide()

ZeroDivisionError: integer division or modulo by zero
  • shell commands
In [ ]:
pwd
In [ ]:
! open unpause_action.pdf
  • magic commands (enter %magic for some extra info)
In [ ]:
%timeit a*3
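The `?` lookup and the `%timeit` magic shown above both have rough counterparts in plain Python. The following is only a sketch, assuming Python 3.5 or later (the cells above were recorded on Python 2); IPython's own output is richer and it picks the loop count automatically, whereas the count here is fixed by hand.

```python
import inspect
import timeit

# Rough stand-in for `a.remove?`: fetch the docstring of list.remove.
doc = inspect.getdoc(list.remove)
print(doc)

# Rough stand-in for `%timeit a*3`: time list repetition on the same data.
a = [0, 1, 2, 4]
loops = 100000  # fixed by hand; %timeit chooses this value itself
seconds = timeit.timeit("a * 3", globals={"a": a}, number=loops)
print("%.3f microseconds per loop" % (seconds / loops * 1e6))
```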

You can easily install it on your server to get a consistent environment:

sudo apt-get install ipython

IPython notebook

The ultimate lab notebook! Inside it you can use cool features such as:

  • Markdown annotations
  • Syntax highlighting

Can you believe it? I used to code in Java!

private static int maxValue(char[] chars) {
    int max = chars[0];
    for (int ktr = 0; ktr < chars.length; ktr++) {
        if (chars[ktr] > max) {
            max = chars[ktr];
        }
    }
    return max;
}

  • $\LaTeX$ formulæ

$E = mc^2 \ne \sum_{i \in N}e^i + \int_0^\infty e^{-x}dx$

  • Videos!
In [6]:
from IPython.lib.display import YouTubeVideo
YouTubeVideo('HaS4NXxL5Qc')
Out[6]:
  • inline plots + imported scipy libraries = MATLAB replacement
In [7]:
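# assumes pylab mode (e.g. ipython notebook --pylab inline), which provides linspace, pi, sin, plot and show
x = linspace(0, 2*pi)
y = sin(x)
plot(x,y)
show()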

Cool libraries

Pandas

Analysing time series data on wind farm power generation (from a Kaggle competition).

In [15]:
! head data/kaggle/wind_forecast/train.csv
date,wp1,wp2,wp3,wp4,wp5,wp6,wp7
2009070100,0.045,0.233,0.494,0.105,0.056,0.118,0.051
2009070101,0.085,0.249,0.257,0.105,0.066,0.066,0.051
2009070102,0.02,0.175,0.178,0.033,0.015,0.026,0
2009070103,0.06,0.085,0.109,0.022,0.01,0.013,0
2009070104,0.045,0.032,0.079,0.039,0.01,0,0
2009070105,0.035,0.011,0.099,0.066,0.015,0.013,0
2009070106,0.005,0,0.069,0.105,0.015,0.079,0
2009070107,0,0.011,0,0.017,0.025,0.013,0.025
2009070108,0,0.016,0,0.017,0.046,0,0
In [9]:
import pandas as pd

def format_timestamp(raw):
    # '2009070100' -> '20090701 00:00'
    return '%s %s:00' % (raw[:-2], raw[-2:])

wind = pd.read_csv('data/kaggle/wind_forecast/train.csv',
                   parse_dates=['date'], index_col=['date'],
                   converters={'date': format_timestamp})
wind
Out[9]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 18757 entries, 2009-07-01 00:00:00 to 2012-06-26 12:00:00
Data columns:
wp1    18757  non-null values
wp2    18757  non-null values
wp3    18757  non-null values
wp4    18757  non-null values
wp5    18757  non-null values
wp6    18757  non-null values
wp7    18757  non-null values
dtypes: float64(7)
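To see what the converter does to a single value, here is a small standalone sketch. It uses only the standard library; the `strptime` format string is an assumption for illustration, since pandas does its own date parsing when reading the file.

```python
from datetime import datetime

def format_timestamp(raw):
    # Same converter as in the cell above: the last two digits are the hour.
    return '%s %s:00' % (raw[:-2], raw[-2:])

formatted = format_timestamp('2009070100')
print(formatted)  # 20090701 00:00

# Assumed equivalent parse with the standard library.
parsed = datetime.strptime(formatted, '%Y%m%d %H:%M')
print(parsed)  # 2009-07-01 00:00:00
```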
In [10]:
wind.index
Out[10]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-07-01 00:00:00, ..., 2012-06-26 12:00:00]
Length: 18757, Freq: None, Timezone: None
In [11]:
a = '2009-07-01'
b = '2009-07-03'
wind[a:b].plot()
ylabel('normalized wind power')
Out[11]:
<matplotlib.text.Text at 0x10ddab450>
In [12]:
wind['wp1'][a:b].plot(color='red', label='raw power')
pd.ewma(wind['wp1'], span=100)[a:b].plot(color='blue', label='smoothed')
legend()
Out[12]:
<matplotlib.legend.Legend at 0x10ea42f90>
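A note on the smoothing: `pd.ewma` was the pandas API when this notebook was written; later pandas releases replaced it with `wind['wp1'].ewm(span=100).mean()`. The sketch below shows the idea in plain Python. The `ewma` helper is hypothetical, and both the span-to-alpha conversion and the normalisation by the sum of weights are assumed to match the pandas defaults rather than taken from its source.

```python
def ewma(values, span):
    """Exponentially weighted moving average, normalised by the sum of weights."""
    alpha = 2.0 / (span + 1)  # span-to-alpha convention assumed from pandas
    decay = 1.0 - alpha
    smoothed = []
    num = 0.0  # running weighted sum of observations
    den = 0.0  # running sum of weights
    for x in values:
        num = x + decay * num
        den = 1.0 + decay * den
        smoothed.append(num / den)
    return smoothed

sample = [0.045, 0.085, 0.02, 0.06, 0.045]  # first wp1 readings from train.csv
print(ewma(sample, span=3))
```

A larger span means a smaller alpha, so older readings keep more weight and the curve gets smoother.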

Sympy

In [13]:
from sympy import *
v = symbols('v')
integrate(log(v), v)
Out[13]:
v*log(v) - v
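A quick sanity check on that result: differentiating v*log(v) - v should give back log(v). The sketch below checks this numerically with the standard library only; the helper names, step size, and tolerance are arbitrary choices for illustration.

```python
import math

def antiderivative(v):
    # Result reported by sympy above: v*log(v) - v
    return v * math.log(v) - v

def central_difference(f, v, h=1e-6):
    # Symmetric finite-difference approximation of f'(v)
    return (f(v + h) - f(v - h)) / (2 * h)

for v in (0.5, 1.0, 2.0, 10.0):
    error = abs(central_difference(antiderivative, v) - math.log(v))
    print(v, error < 1e-6)
```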

The future of IPython?

$1.15M Sloan Foundation Grant

  • more focus on notebook interactivity through JavaScript
  • integrated version control
  • multi-user support
  • other languages

Installation

Installing everything on Ubuntu is easy:

sudo apt-get install ipython-notebook

On other operating systems it can be a headache, so it is easier to install a Python distribution such as Enthought or Python(x,y).

Or take a stroll to the cheese shop (PyPI) for the necessary packages:

sudo pip install "ipython[notebook]"
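Once the install finishes, you can check which of the libraries used in this talk are importable. This is a minimal sketch assuming Python 3; `missing_packages` is a hypothetical helper, and the package list is simply the set of libraries shown above.

```python
import importlib.util

def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]

# Libraries used in this talk
wanted = ['IPython', 'pandas', 'sympy', 'matplotlib']
print('missing:', missing_packages(wanted))
```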

Conclusion


If you're not using IPython, you're doing something wrong.

-- [Wes McKinney](http://blog.wesmckinney.com/), creator of Pandas and author of "Python for Data Analysis"

Thank you!

http://nbviewer.ipython.org/5792121
