Python for scientific computing

Marcos Duarte
Laboratory of Biomechanics and Motor Control http://demotu.org
Federal University of ABC, Brazil

This talk

The Python programming language with its ecosystem for scientific programming has features, maturity, and a community of developers and users that makes it the ideal environment for the scientific community.

This talk will show some of these features and usage examples.

If you are viewing this notebook online (served by http://nbviewer.ipython.org), you can click the button 'View as Slides' on the toolbar above to start the slide show.

The lifecycle of a scientific idea

In [1]:
from IPython.display import Image
Image(filename='../images/lifecycle_FPerez.png')  # From F. Perez
Out[1]:

About Python

Python is a programming language that lets you work more quickly and integrate your systems more effectively. You can learn to use Python and see almost immediate gains in productivity and lower maintenance costs [python.org].

Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built in data structures, combined with dynamic typing and dynamic binding, well suited for Rapid Application Development and for scripting or glue language to connect existing components. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and standard libraries are available without charge for all major platforms, and can be freely distributed [Python documentation].

About me

As a scientist, what I do it's similar to this other fellow:

In [2]:
from IPython.display import YouTubeVideo
YouTubeVideo('9ZlBUglE6Hc', width=480, height=360, rel=0)
Out[2]:

Python ecosystem for scientific computing (main libraries)

  • Numpy: fundamental package for scientific computing with a N-dimensional array package.
  • Scipy: numerical routines for scientific computing.
  • Matplotlib: comprehensive 2D Plotting.
  • Sympy: symbolic mathematics.
  • Pandas: data structures and data analysis tools.
  • Jupyter Notebook: web application for creating and sharing documents with live code, equations, visualizations and text.
  • Statsmodels: to explore data, estimate statistical models, and perform statistical tests.
  • Scikit-learn: tools for data mining and data analysis (including machine learning).
  • Pillow: Python Imaging Library.
  • Spyder: interactive development environment.

Why Python and not 'X' (put any other language here)

Python is not the best programming language for all needs and for all people. There is no such language. But, if you are doing scientific computing, chances are that Python is perfect for you because:

  1. Python is free, open source, and cross-platform.
  2. Python is easy to learn, with readable code, well documented, and with a huge and friendly user community.
  3. Python is a real programming language, able to handle a variety of problems, easy to scale from small to huge problems, and easy to integrate with other systems (including other programming languages).
  4. Python code is not the fastest but Python is one the fastest languages for programming. It is not uncommon in science to care more about the time we spend programming than the time the program took to run. But if code speed is important, one can easily integrate in different ways a code written in other languages (such as C and Fortran) with Python.
  5. The Jupyter Notebook is a versatile tool for programming, data visualization, plotting, simulation, numeric and symbolic mathematics, and writing for daily use.

Popularity of Python for teaching

In [3]:
from IPython.display import IFrame
IFrame('http://cacm.acm.org/blogs/blog-cacm/176450-python-is-now-the-most-popular-' +
       'introductory-teaching-language-at-top-us-universities/fulltext',
       width='100%', height=450)
Out[3]:

The Jupyter Notebook

The Jupyter Notebook App is a server-client application that allows editing and running notebook documents via a web browser. The Jupyter Notebook App can be executed on a local desktop requiring no internet access (as described in this document) or installed on a remote server and accessed through the internet.

Notebook documents (or “notebooks”, all lower case) are documents produced by the Jupyter Notebook App which contain both computer code (e.g. python) and rich text elements (paragraph, equations, figures, links, etc...). Notebook documents are both human-readable documents containing the analysis description and the results (figures, tables, etc..) as well as executable documents which can be run to perform data analysis.

Try Jupyter Notebook in your browser.

In [4]:
from IPython.display import IFrame
IFrame('https://jupyter.org/', width='100%', height=450)
Out[4]:

Jupyter Notebook and IPython kernel architectures

Jupyter Notebook and IPython kernel architectures

Installing the Python ecosystem

The easy way
The easiest way to get Python and the most popular packages for scientific programming is to install them with a Python distribution such as Anaconda. In fact, you don't even need to install Python in your computer, you can run Python for scientific programming in the cloud using python.org, SageMathCloud, Wakari, pythonanywhere, or repl.it.

The hard way
You can download Python and all individual packages you need and install them one by one. In general, it's not that difficult, but it can become challenging and painful for certain big packages heavily dependent on math, image visualization, and your operating system (i.e., Microsoft Windows).

Anaconda

Go to the Anaconda website and download the appropriate version for your computer (but download Anaconda3! for Python 3.x). The file is big (about 350 MB). From their website:
Linux Install
In your terminal window type and follow the instructions:

bash Anaconda3-4.1.1-Linux-x86_64.sh

OS X Install
For the graphical installer, double-click the downloaded .pkg file and follow the instructions
For the command-line installer, in your terminal window type and follow the instructions:

bash Anaconda3-4.1.1-MacOSX-x86_64.sh

Windows
Double-click the .exe file to install Anaconda and follow the instructions on the screen

Miniconda

A variation of Anaconda is Miniconda (Miniconda3 for Python 3.x), which contains only the Conda package manager and Python.

Once Miniconda is installed, you can use the conda command to install any other packages and create environments, etc.

My current installation

In [5]:
# pip install version_information
%load_ext version_information
%version_information numpy, scipy, matplotlib, sympy, pandas, ipython, jupyter
Out[5]:
SoftwareVersion
Python3.5.2 64bit [MSC v.1900 64 bit (AMD64)]
IPython5.1.0
OSWindows 10 10.0.10586 SP0
numpy1.11.1
scipy0.18.0
matplotlib1.5.3
sympy1.0
pandas0.18.1
ipython5.1.0
jupyter1.0.0
Thu Sep 22 01:04:43 2016 E. South America Standard Time

To learn more about Python

There is a lot of good material in the internet about Python for scientific computing, some of them are:

Brief tutorial on Python

Python as a calculator

Once in the IPython notebook, if you type a simple mathematical expression and press Shift+Enter it will give the result of the expression:

In [6]:
1 + 2 - 5
Out[6]:
-2
In [7]:
import math  # use the import function to import the math library
math.sqrt(12)
Out[7]:
3.4641016151377544
In [8]:
x = 1
y = 1 + math.pi
y
Out[8]:
4.141592653589793

Main built-in datatypes in Python

  • Bolleans: True, False
  • NoneType: None
  • Numbers: int, float, complex
  • Sequences: list, tuple, range
  • Text sequence: str
  • Binary sequence: bytes, bytearray, memoryview
  • Mapping: dict
  • Set: set, frozenset
  • Boolean operations: and, or, not
  • Comparisons: <, <=, >, >=, ==, !=, is, is not
  • Math operations: +, -, *, /, //, %, **
  • Bitwise operations: |, ^, &, <<, >>, ~

Example: strings

In [9]:
s = 'P' + 'y' + 't' + 'h' + 'o' + 'n'
print(s)
print(s*5)
Python
PythonPythonPythonPythonPython

Strings can be subscripted (indexed); like in C, the first character of a string has subscript (index) 0:

In [10]:
print('s[0] = ', s[0], '  (s[index], start at 0)')
print('s[5] = ', s[5])
print('s[-1] = ', s[-1], '  (last element)')
print('s[:] = ', s[:], '  (all elements)')
print('s[1:] = ', s[1:], '  (from this index (inclusive) till the last (inclusive))')
print('s[2:4] = ', s[2:4], '  (from 1st index (inclusive) till 2nd index (exclusive))')
print('s[:2] = ', s[:2], '  (till this index, exclusive)')
print('s[:10] = ', s[:10], '  (Python handles the index if it''s larger than length)')
print('s[-10:] = ', s[-10:])
print('s[0:5:2] = ', s[0:5:2], '  (s[ini:end:step])')
print('s[::2] = ', s[::2], '  (s[::step], initial and final indexes can be omitted)')
print('s[0:5:-1] = ', s[::-1], '  (s[::-step] reverses the string)')
print('s[:2] + s[2:] = ', s[:2] + s[2:], '  (this sounds natural with Python indexing)')
s[0] =  P   (s[index], start at 0)
s[5] =  n
s[-1] =  n   (last element)
s[:] =  Python   (all elements)
s[1:] =  ython   (from this index (inclusive) till the last (inclusive))
s[2:4] =  th   (from 1st index (inclusive) till 2nd index (exclusive))
s[:2] =  Py   (till this index, exclusive)
s[:10] =  Python   (Python handles the index if its larger than length)
s[-10:] =  Python
s[0:5:2] =  Pto   (s[ini:end:step])
s[::2] =  Pto   (s[::step], initial and final indexes can be omitted)
s[0:5:-1] =  nohtyP   (s[::-step] reverses the string)
s[:2] + s[2:] =  Python   (this sounds natural with Python indexing)

Defining a function in Python

In [11]:
def fibo(N):
    """Fibonacci series: the sum of two elements defines the next.
    
    The series is calculated till the input parameter N and
    returned as an ouput variable.
    
    """
    
    a, b, c = 0, 1, []
    while b < N:
        c.append(b)
        a, b = b, a + b
        
    return c
In [12]:
fibo(9)
Out[12]:
[1, 1, 2, 3, 5, 8]

Defining a function in Python II

In [13]:
def bmi(weight, height):
    """Body mass index calculus and categorization.
    Enter the weight in kg and the height in m.
    See http://en.wikipedia.org/wiki/Body_mass_index
    """
    bmi = weight / height**2
    if bmi < 15:
        c = 'very severely underweight'
    elif 15 <= bmi < 16:
        c = 'severely underweight'
    elif 16 <= bmi < 18.5:
        c = 'underweight'
    elif 18.5 <= bmi < 25:
        c = 'normal'
    elif 25 <= bmi < 30:
        c = 'overweight'
    elif 30 <= bmi < 35:
        c = 'moderately obese'
    elif 35 <= bmi < 40:
        c = 'severely obese'
    else:
        c = 'very severely obese'
     
    s = 'For a weight of {0:.1f} kg and a height of {1:.2f} m,\n\
    the body mass index (bmi) is {2:.1f} kg/m2,\n\
    which is considered {3:s}.'\
    .format(weight, height, bmi, c)
    print(s)
In [14]:
bmi(70, 1.90);
For a weight of 70.0 kg and a height of 1.90 m,
    the body mass index (bmi) is 19.4 kg/m2,
    which is considered normal.

Numeric data manipulation with Numpy

Numpy is the fundamental package for scientific computing in Python and has a N-dimensional array package convenient to work with numerical data. With Numpy it's much easier and faster to work with numbers grouped as 1-D arrays (a vector), 2-D arrays (like a table or matrix), or higher dimensions.

In [15]:
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6])
print(x)
x = np.random.randn(2,4)
print(x)
[1 2 3 4 5 6]
[[ 0.6504986   1.21639113  0.06680213  0.43133861]
 [ 0.35556254  0.43596075  1.17614962  1.0677548 ]]

Moving-average filter (Numpy use for performance)

A moving-average filter has the general formula:

$$ y[i] = \sum_{j=0}^{m-1} x[i+j] \;\;\;\; for \;\;\; i=1, \; \dots, \; n-m+1 $$

Here are two different versions of a function to implement the moving-average filter:

In [16]:
import numpy as np
def mav1(x, window):
    """Moving average of 'x' with window size 'window'."""
    y = np.empty(len(x)-window+1)
    for i in range(len(y)):
        y[i] = np.sum(x[i:i+window])/window
    return y

def mav2(x, window):
    """Moving average of 'x' with window size 'window'."""
    xsum = np.cumsum(x)
    xsum[window:] = xsum[window:] - xsum[:-window]
    return xsum[window-1:]/window
In [17]:
x = np.random.randn(300)/10
x[100:200] += 1
window = 10

print('Performance of mav1:')
%timeit mav1(x, window)
print('Performance of mav2:')
%timeit mav2(x, window)
Performance of mav1:
1000 loops, best of 3: 1.56 ms per loop
Performance of mav2:
The slowest run took 5.70 times longer than the fastest. This could mean that an intermediate result is being cached.
100000 loops, best of 3: 11.6 µs per loop

Ploting with matplotlib

Matplotlib is the most-widely used packge for plotting data in Python. Let's see some examples of it.

In [18]:
import matplotlib.pyplot as plt
#%matplotlib notebook
%matplotlib inline
import numpy as np
In [19]:
y1 = mav1(x, window)
y2 = mav2(x, window)
# plot
fig, ax = plt.subplots(1, 1, figsize=(8, 4))
ax.plot(x,  'b-',  linewidth=1, label = 'raw data')
ax.plot(y1, 'y-',  linewidth=2, label = 'moving average 1')
ax.plot(y2, 'g--', linewidth=2, label = 'moving average 2')
ax.legend(frameon=False, loc='upper right', fontsize=12)
ax.set_xlabel("Data #")
ax.set_ylabel("Amplitude")
ax.grid();
In [20]:
plt.figure(figsize=(8, 4))
plt.plot(x,  'b-',  linewidth=1, label = 'raw data')
plt.plot(y1, 'y-',  linewidth=2, label = 'moving average 1')
plt.plot(y2, 'g--', linewidth=2, label = 'moving average 2')
plt.legend(frameon=False, loc='upper right', fontsize=12)
plt.xlabel("Data #")
plt.ylabel("Amplitude")
plt.grid()
plt.show()

Ploting with matplotlib II

Plot figure in an external window (outside the ipython notebook area):

In [21]:
#%matplotlib qt
In [22]:
mu, sigma = 10, 2
x = mu + sigma * np.random.randn(1000)
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 4))
ax1.plot(x, 'ro')
ax1.set_title('Data')
ax1.grid()

n, bins, patches = ax2.hist(x, 25, normed=True, facecolor='r') # histogram
ax2.set_xlabel('Bins')
ax2.set_ylabel('Probability')
ax2.set_title('Histogram')
fig.suptitle('Another example using matplotlib', fontsize=18, y=1.02)
ax2.grid()

plt.tight_layout()
plt.show()
In [23]:
# get back the inline plot
#%matplotlib inline
#%matplotlib notebook

Instead of "%matplotlib inline" you can use "%matplotlib notebook" which gives you a nice toolbar for zooming, panning, etc. The caveat is that once "%matplotlib notebook" is used you can't alternate between matplotlib backends as we just did.

Symbolic mathematics with Sympy

Sympy is a package to perform symbolic mathematics in Python. Let's see some of its features:

In [24]:
from IPython.display import display
import sympy as sym
from sympy.interactive import printing
printing.init_printing()

Define some symbols and the create a second-order polynomial function (a.k.a., parabola), plot, and find the roots:

In [25]:
x, y = sym.symbols('x y')
y = -x**3 + 4*x
y
Out[25]:
$$- x^{3} + 4 x$$
In [26]:
from sympy.plotting import plot
%matplotlib inline
plot(y, (x, -3, 3));
In [27]:
sym.solve(y, x)
Out[27]:
$$\left [ -2, \quad 0, \quad 2\right ]$$

This entire document was written in the Jupyter Notebook (which can be statically viewed here or downloaded here). If you are watching my presentation right now, these slides are just a visualization of the same notebook (probably using the RISE: "Live" Reveal.js Jupyter/IPython Slideshow Extension).