In [ ]:
%run ../../DataFiles_and_Notebooks/00_AdvancedPythonConcepts/talktools.py

Speeding up scientific Python code using Cython

UC Berkeley AY 250 'Python Computing for Data Science'

materials prepared by Paul Ivanov (2013-11-21); Stefan van der Walt; JBloom

Motivation

Cython allows us to cross the gap

Cython

Cython is good for two things:

  1. Wrapping legacy code
  2. Making your python code go faster

This is good news because

  • we get to keep coding in Python (or, at least, a superset)
  • but with the speed advantage of C
  • You can’t have your cake and eat it. Or can you?
  • Conditions / loops approx. 2–8x speed increase, 30% overall; with annotations: hundreds of times faster

Let's take a step back...

Python is a C program (yes, there are other implementations, but CPython is the dominant one)

  • you can write "C extensions" that can be imported as python modules
In [ ]:
import math
math??

Deep introspection of math will not show us the source code, because it is actually implemented in C, compiled in a special way using the Python C API, and exposed to us (mere mortals) using a python interface.

In [ ]:
import os
os.path??

os.path on the other hand, is written in pure Python, so we can see the sourcecode

In [ ]:
!cython --version

What's Cython?

Cython is a programming language that makes writing C extensions for the Python language as easy as Python itself.

Cython is a superset of Python (i.e. all Python programs are valid Cython programs).

Cython allows you to:

  1. get convenient handles on C libraries, objects, and functions using in your Python code.
  2. "sprinkle in" type annotations into Python code to get speedups

Use Cases

  • Optimize execution of Python code (profile, if possible! – demo)
  • Wrap existing C and C++ code
  • Breaking out of the Global Interpreter Lock; openmp
  • Mixing C and Python, but without the pain of the Python C API

Overview

For this introduction, we’ll take the following approach:

  1. Take a piece of pure Python code and benchmark (we’ll find that it is too slow)
  2. Run the code through Cython, compile and benchmark (we’ll find that it is somewhat faster)
  3. Annotate the types and benchmark (we’ll find that it is much faster)

Then we’ll look at how Cython allows us to

  • Work with NumPy arrays
  • Use multiple threads from Python
  • Wrap native C libraries

Benchmark Python Code

We want to approximate the integral: $$ \int_a^b f(x) dx $$

Or more a finer grid...

In [ ]:
cd demos/integrate
In [ ]:
# %load integrate.py
from __future__ import division

def f(x):
    return x**4 - 3 * x

def integrate_f(a, b, N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : ints
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
In [ ]:
import integrate
In [ ]:
integrate.integrate_f(1,10,10000)
In [ ]:
%timeit integrate.integrate_f(1,10,100000)

Let's compile the code with Cython

cython filename.[py|pyx]
In [ ]:
!cython integrate.py 

What is happening behind the scenes?

Cython translates Python to C, using the Python C API (let’s have a look)

In [ ]:
!cat integrate.c

Three ways of making use of Cython

1. Compiling

This C code can be compiled directly (e.g. using gcc) requiring the right library paths. We can also use a setup.py file to help us.

I copied integrate.py to integrate_compiled.pyx just to make a distinction between the compiled version and the pure python version.

In [ ]:
# %load setup.py
from distutils.core import setup
from Cython.Build import cythonize
from distutils.extension import Extension

setup(
    ext_modules=cythonize(
       [Extension("integrate_compiled", ["integrate_compiled.pyx"])], 
        compiler_directives={'language_level': 3}
    )
)
In [ ]:
!python setup.py build_ext --inplace
In [ ]:
!ls integrate_compiled*
In [ ]:
import integrate_compiled
integrate_compiled.integrate_f(1,10,10000)
In [ ]:
!rm integrate_compiled.*.so
In [ ]:
cd demos/integrate/

2. pyximport

A little bit magical, it will grab and compile .pyx files the first time they are imported, if a compilation is necessary.

In [ ]:
import pyximport
pyximport.install()

import integrate_compiled
In [ ]:
integrate_compiled.integrate_f(1,10,10000)

Benchmark the new code

In [ ]:
%timeit integrate_compiled.integrate_f(1,10,100000)
In [ ]:
import integrate
In [ ]:
%timeit integrate.integrate_f(1,10,100000)

caveat emptor! C extensions cannot be reloaded in Python, see this bug marked as WON'T FIX, so you have to either restart the kernel (i.e. quit python and start it again), or you could use the %cython magic...

3. %cython magic

The most magical of all

In [ ]:
%load_ext Cython
In [ ]:
%%cython

from __future__ import division

def f(x):
    return x**4 - 3 * x

def inline_integrate_f(a, b, N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : ints
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
In [ ]:
inline_integrate_f(1,10,1000)

Slight speed increase (≈ 1.4×) probably not worth it.

  • Can we help Cython to do even better?
    • Yes—by giving it some clues.
    • Cython has a basic type inferencing engine, but it is very conservative for safety reasons.
  • Why does type information allow such vast speed increases?

Python is great, but slow at some things...

Dinner with Guido van Rossum, 2009

Slow things worth knowing about:

  1. for loops
  2. function calls

Making your code go faster

"Premature optimization is the root of all evil" -- Don Knuth

  1. Use version control and have a test suite
  2. Profile code (don't just add type declarations everwhere)
  3. Sprinkle in optimizations (refacoring code as necessary)

Let's tell Cython about the types...

In [ ]:
%%writefile integrate_types.pyx
from __future__ import division

def f(double x):
    return x**4 - 3 * x

def types_integrate_f(double a, double b, int N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : ints
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    cdef:
        double s = 0
        double dx = (b - a) / N
        int i

    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
In [ ]:
%%cython

from __future__ import division

def f(double x):
    return x**4 - 3 * x

def types_integrate_f(double a, double b, int N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : ints
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    cdef:
        double s = 0
        double dx = (b - a) / N
        int i

    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
In [ ]:
%timeit types_integrate_f(1,10,100000)
In [ ]:
%timeit integrate_compiled.integrate_f(1,10,100000)
In [ ]:
%timeit integrate.integrate_f(1,10,100000)

There's even more bottlenecks: We need to define f(x) as a C function.

In [ ]:
# %load integrate_cy.pyx
# cython: cdivision=True

# ^^^ Could also use @cython.cdivision(True) decorator

cdef double f(double x):
    return x*x*x*x - 3 * x

def integrate_f(double a, double b, int N):
    cdef:
        double s = 0
        double dx = (b - a) / N
        size_t i

    for i in range(N):
        s += f(a + i * dx)

    return s * dx
In [ ]:
%%cython
# cython: cdivision=True

# ^^^ Could also use @cython.cdivision(True) decorator

cdef double f(double x):
    return x*x*x*x - 3 * x

def cy_integrate_f(double a, double b, int N):
    cdef:
        double s = 0
        double dx = (b - a) / N
        size_t i

    for i in range(N):
        s += f(a + i * dx)

    return s * dx
In [ ]:
%timeit cy_integrate_f(1,10,100000)
In [ ]:
%timeit integrate.integrate_f(1,10,100000)
In [ ]:
49.9/0.270

With annotations, Cython gives us a nice path to find bottlenecks. Use the -a.

In [ ]:
%load_ext Cython
In [ ]:
%%cython -a
def omg(c):
    wtf = 0
    for a in range(c):
        wtf += a
        
    return wtf

def lol(x):
    for i in range(x):
        omg(x*2)

The color allows us to see the density of the C code generated for a given line of Python. These lines will take longer to run, but we need not eliminate all of the unoptimized lines in order to get performance. We'll talk about profiling soon to help us find which lines are worth it.

In [ ]:
%timeit lol(5000)
In [ ]:
%%cython -a
cdef int omg(int c):
    cdef int wtf = 0
    cdef int a
    for a in range(c):
        wtf += a
        
    return wtf

def lola(int x):
    cdef i
    for i in range(x):
        omg(x*2)
In [ ]:
%timeit lola(5000)

Exercise

  1. Here's a python function that performs an elementwise computations:

    $y_i = x_i^3 - 4x_i + 2$

  2. Write a Cython equivalent.

In [ ]:
%%cython
import numpy as np
cdef my_poly(double [:] x):
    out = np.zeros_like(x)
    L = x.shape[0]

    for i in range(L):
        out[i] = x[i] * x[i] * x[i] - 4 * x[i] + 2

    return np.asarray(out)

def other_poly(x):
    return my_poly(x)
In [ ]:
import numpy as np
x = np.linspace(-10,10,100000)
%timeit my_poly(x)
In [ ]:
%%cython -a
import numpy as np

def blah(float y):
    return y * y * y - 4 * y + 2
    
def my_poly(x):
    out = np.zeros_like(x)
    L = x.shape[0]

    for i in range(L,x):
        out[i] = blah(x[i])

    return np.asarray(out)
In [ ]:
import numpy as np
x = np.linspace(-10,10,100000)
%timeit my_poly(x)

Exercise

  1. With a partner, have one of you log into Github, fork this gist

  2. modify it together so that all functions perform some computation, but the whole thing doesn't take too long to run (1-5 second).

    • You can use numpy in some of them, if you'd like, but be sure to include some for loops, as well.
    • The template is there just to get you started, feel free to modify the parameters, or add new functions as you see fit.
  3. Update your gist with this new code, and hand it off to another pair (who will give you theirs).

  4. Fork and clone the other pair's program.

  5. Now, without looking at the code, profile this code to identify the bottleneck.

  6. Put a comment next to the function that eats up the most time.

  7. Time permitting, numpy-fy and Cython-ize the offending function

  8. Time permitting, change the parameters used in main to make a different function the bottleneck.

Now you know enough about how to wrap C code

See also Calling C functions and Using C Libraries in the Cython documentation.

What about Fortran?

Export the Fortran code to C and then tell Cython to pretend it's calling C code:

Notice that we didn’t write any C code — we only told fortran to use the C calling convention when producing the ".o" files, and then we pretended in Cython, that the function is implemented in C, but in fact, it is linked in from Fortran directly

References:

Cython Profiling Tutorial : from Cython's official docs

In [ ]: