Dražen's humble introduction to

Becoming a master statistician with IPython notebook

What's the IPython notebook? A web interface to a scientific Python shell.

Started by Fernando Pérez to bring modern, open-source tools to the research community.

Main features:

• the power of Python at your fingertips :)
• IPython, a "smarter" shell specialized for scientific workflows
• Markdown annotations and inline plots as a perfect "lab notebook"

IPython

Basically a Python interpreter with some nice features...

• Auto-completion (press Tab) and documentation (append ? to a name)
In [1]:
a = list(range(5))
a

Out[1]:
[0, 1, 2, 3, 4]
In [2]:
a.remove(3)

In [3]:
a.remove?

In [4]:
a

Out[4]:
[0, 1, 2, 4]
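Outside IPython you can get roughly the same information from the standard library. A minimal sketch: `inspect.getdoc` returns the docstring that `a.remove?` displays, and `dir` lists the attribute names that Tab completion draws from (the prefix filter below is only an illustration, not how IPython's completer is implemented). Note that `remove` deletes the first element *equal to* 3, not the element at index 3; in this list the two happen to coincide.

```python
import inspect

a = list(range(5))  # list(...) so this also works on Python 3, where range is lazy
a.remove(3)         # removes the first element equal to 3, not index 3
print(a)            # [0, 1, 2, 4]

# Roughly what `a.remove?` shows: the method's docstring
doc = inspect.getdoc(a.remove)
print(doc)

# Roughly what Tab offers after typing `a.re`: attribute names with that prefix
completions = [name for name in dir(a) if name.startswith('re')]
print(completions)  # ['remove', 'reverse']
```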
• Clean stack traces
In [5]:
def divide():
    a = 1
    b = 4
    c = b / (a-1)

divide()

---------------------------------------------------------------------------
ZeroDivisionError                         Traceback (most recent call last)
<ipython-input-5-a8416233c05c> in <module>()
      4     c = b / (a-1)
      5
----> 6 divide()

<ipython-input-5-a8416233c05c> in divide()
      2     a = 1
      3     b = 4
----> 4     c = b / (a-1)
      5
      6 divide()

ZeroDivisionError: integer division or modulo by zero
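Plain Python can produce the same kind of report through the standard `traceback` module. A minimal sketch that catches the error and formats it as a string; the failing line is `b / (a - 1)` because `a - 1` is zero. The exact message wording differs between Python versions.

```python
import traceback


def divide():
    a = 1
    b = 4
    c = b / (a - 1)  # a - 1 == 0, so this raises ZeroDivisionError
    return c


try:
    divide()
except ZeroDivisionError:
    # format_exc() returns the "most recent call last" report as a string
    report = traceback.format_exc()

print(report)
```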
• shell commands (prefix them with !; a few common ones such as pwd also work without it)
In [ ]:
pwd

In [ ]:
! open unpause_action.pdf
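In a plain Python script the equivalents live in the `os` and `subprocess` modules. A minimal sketch, assuming a POSIX system where an `echo` executable exists; inside IPython you can also capture a command's output directly, e.g. `files = !ls`.

```python
import os
import subprocess

# What `pwd` prints
cwd = os.getcwd()
print(cwd)

# What `!echo ...` does: run an external command and collect its output
result = subprocess.run(['echo', 'hello from the shell'],
                        capture_output=True, text=True, check=True)
print(result.stdout.strip())  # hello from the shell
```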

• magic commands (run %magic for more information)
In [ ]:
%timeit a*3
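`%timeit` builds on the standard `timeit` module, so outside IPython you can take similar measurements yourself. A minimal sketch: run `a * 3` many times, repeat that a few times, and keep the fastest repeat, which the `timeit` documentation suggests as the most meaningful figure (recent IPython versions report a mean and standard deviation instead).

```python
import timeit

a = [0, 1, 2, 4]
loops = 100_000

# Five repeats of 100,000 executions each; each entry is a total in seconds
times = timeit.repeat('a * 3', globals={'a': a}, repeat=5, number=loops)

best_per_loop = min(times) / loops
print(f'best of {len(times)}: {best_per_loop * 1e9:.1f} ns per loop')
```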


You can easily install it on your server to get a consistent environment:

sudo apt-get install ipython

IPython notebook

The ultimate lab notebook! Inside it you can use cool features such as:

• Markdown annotations
• Syntax highlighting

Can you believe it? I used to code in Java!

private static int maxValue(char[] chars) {
    int max = chars[0];
    for (int ktr = 0; ktr < chars.length; ktr++) {
        if (chars[ktr] > max) {
            max = chars[ktr];
        }
    }
    return max;
}
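For comparison, a Python sketch of the same function. `max_value` is a name chosen here for illustration; like the Java version it returns the largest character code, and like it, it fails on empty input (Python raises `ValueError`).

```python
def max_value(chars):
    """Return the largest character code in chars (hypothetical port of maxValue)."""
    return max(ord(ch) for ch in chars)


print(max_value('ipython'))  # 121, the code point of 'y'
```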


• $\LaTeX$ formulæ

$E = mc^2 \ne \sum_{i \in N}e^i + \int_0^\infty e^{-x}dx$

• Videos!
In [6]:
from IPython.lib.display import YouTubeVideo

Out[6]:
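• inline plots + imported SciPy libraries = MATLAB replacement
In [7]:
# Import NumPy and matplotlib names (linspace, sin, pi, plot, ...) into the
# namespace and draw figures inline; later cells (ylabel, legend) rely on this
%pylab inline
x = linspace(0, 2*pi)
y = sin(x)
plot(x, y)
show()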


Cool libraries

Pandas

Analysing time-series data on wind farm power generation (from a Kaggle competition).

In [15]:
! head data/kaggle/wind_forecast/train.csv

date,wp1,wp2,wp3,wp4,wp5,wp6,wp7
2009070100,0.045,0.233,0.494,0.105,0.056,0.118,0.051
2009070101,0.085,0.249,0.257,0.105,0.066,0.066,0.051
2009070102,0.02,0.175,0.178,0.033,0.015,0.026,0
2009070103,0.06,0.085,0.109,0.022,0.01,0.013,0
2009070104,0.045,0.032,0.079,0.039,0.01,0,0
2009070105,0.035,0.011,0.099,0.066,0.015,0.013,0
2009070106,0.005,0,0.069,0.105,0.015,0.079,0
2009070107,0,0.011,0,0.017,0.025,0.013,0.025
2009070108,0,0.016,0,0.017,0.046,0,0

In [9]:
import pandas as pd

def format_timestamp(raw):
    # '2009070113' -> '20090701 13:00', a string pandas can parse as a date
    return '%s %s:00' % (raw[:-2], raw[-2:])

wind = pd.read_csv('data/kaggle/wind_forecast/train.csv',
                   parse_dates=['date'], index_col=['date'],
                   converters={'date': format_timestamp})
wind

Out[9]:
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 18757 entries, 2009-07-01 00:00:00 to 2012-06-26 12:00:00
Data columns:
wp1    18757  non-null values
wp2    18757  non-null values
wp3    18757  non-null values
wp4    18757  non-null values
wp5    18757  non-null values
wp6    18757  non-null values
wp7    18757  non-null values
dtypes: float64(7)
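The `date` column packs year, month, day and hour into a single `YYYYMMDDHH` value, and `format_timestamp` only inserts a space and `:00` so that pandas recognises it as a date. The same parse can be written with an explicit format string; a minimal standard-library sketch, where `parse_hour_stamp` is a hypothetical helper name (`pd.to_datetime` accepts the same format string).

```python
from datetime import datetime


def format_timestamp(raw):
    # From the notebook: '2009070113' -> '20090701 13:00'
    return '%s %s:00' % (raw[:-2], raw[-2:])


def parse_hour_stamp(raw):
    # Hypothetical helper: parse 'YYYYMMDDHH' directly, e.g. '2009070113'
    return datetime.strptime(raw, '%Y%m%d%H')


raw = '2009070113'
direct = parse_hour_stamp(raw)
via_string = datetime.strptime(format_timestamp(raw), '%Y%m%d %H:%M')
print(direct)                # 2009-07-01 13:00:00
print(direct == via_string)  # True
```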
In [10]:
wind.index

Out[10]:
<class 'pandas.tseries.index.DatetimeIndex'>
[2009-07-01 00:00:00, ..., 2012-06-26 12:00:00]
Length: 18757, Freq: None, Timezone: None
In [11]:
a = '2009-07-01'
b = '2009-07-03'
wind[a:b].plot()
ylabel('normalized wind power')

Out[11]:
<matplotlib.text.Text at 0x10ddab450>
In [12]:
wind['wp1'][a:b].plot(color='red', label='raw power')
# pd.ewma has been removed from recent pandas; .ewm(span=...).mean() replaces it
wind['wp1'].ewm(span=100).mean()[a:b].plot(color='blue', label='smoothed')
legend()

Out[12]:
<matplotlib.legend.Legend at 0x10ea42f90>
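An exponentially weighted moving average gives each new observation weight α and decays the previous average by (1 − α); with pandas' `span` parameter, α = 2 / (span + 1), so `span=100` gives α ≈ 0.0198. A minimal pure-Python sketch of the recursive form, which corresponds to pandas' `adjust=False` option for data without missing values; the default `adjust=True` uses normalised weights and differs slightly over the first few points.

```python
def ewma(values, span):
    """Recursive EWMA, as in pandas' ewm(span=span, adjust=False).mean() without NaNs."""
    alpha = 2.0 / (span + 1)
    smoothed = []
    for x in values:
        if not smoothed:
            smoothed.append(x)  # the first output equals the first input
        else:
            smoothed.append(alpha * x + (1 - alpha) * smoothed[-1])
    return smoothed


print(ewma([0.0, 2.0, 2.0], span=3))  # [0.0, 1.0, 1.5]
```

A small `span` reacts quickly to changes; a large one, such as the 100 hours used above, smooths out short gusts.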

SymPy

In [13]:
from sympy import *
v = symbols('v')
integrate(log(v), v)

Out[13]:
v*log(v) - v
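SymPy's answer can be checked by hand with integration by parts, taking $u = \log v$ and $dw = dv$ (so $du = dv/v$ and $w = v$); SymPy leaves out the constant of integration $C$. Differentiating the result recovers the integrand:

```latex
\int \log v \, dv = v \log v - \int v \cdot \frac{1}{v} \, dv = v \log v - v + C

\frac{d}{dv}\left(v \log v - v\right) = \log v + 1 - 1 = \log v
```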

The future of IPython?

$1.15M Sloan Foundation grant

• more focus on notebook interactivity through JavaScript
• integrated version control
• multi-user support
• other languages

Installation

Installing everything on Ubuntu is easy:

sudo apt-get install ipython-notebook



On other operating systems it can be more of a headache; you may be better off installing a Python distribution such as Enthought or Python(x,y).

Alternatively, take a stroll to the Cheese Shop (PyPI) for the necessary packages:

sudo pip install "ipython[notebook]"



Conclusion

If you're not using IPython, you're doing something wrong.

-- [Wes McKinney](http://blog.wesmckinney.com/), creator of Pandas and author of "Python for Data Analysis"