In [ ]:

```
%run ../../DataFiles_and_Notebooks/00_AdvancedPythonConcepts/talktools.py
```

materials prepared by Paul Ivanov (2013-11-21); Stefan van der Walt; JBloom

Cython allows us to cross the gap between Python and C

Cython is good for two things:

- Wrapping legacy code
- Making your Python code go faster

This is good news because

- we get to keep coding in Python (or, at least, a superset)
- but with the speed advantage of C
- You can’t have your cake and eat it. Or can you?
- Compiling conditions and loops alone gives roughly a 2–8× speed increase (about 30% overall); with type annotations, speedups of hundreds of times are possible

Python is a C program (yes, there are other implementations, but CPython is the dominant one)

- you can write "C extensions" that can be imported as Python modules

In [ ]:

```
import math
math??
```

`math`

will not show us the source code, because it is actually implemented in C, compiled in a special way using the Python C API, and exposed to us (mere mortals) using a Python interface.

In [ ]:

```
import os
os.path??
```

`os.path`

on the other hand, is written in pure Python, so we can see the source code

In [ ]:

```
!cython --version
```

*Cython
is a programming language that makes writing C extensions for the Python language as easy as Python itself.*

Cython is a superset of Python (i.e. all Python programs are valid Cython programs).

Cython allows you to:

- get convenient handles on C libraries, objects, and functions for use in your Python code.
- "sprinkle in" type annotations into Python code to get speedups

- Optimize execution of Python code (profile first, if possible! – demo)
- Wrap existing C and C++ code
- Break out of the Global Interpreter Lock (e.g. via OpenMP)
- Mix C and Python, without the pain of the Python C API

For this introduction, we’ll take the following approach:

- Take a piece of pure Python code and benchmark it (we’ll find that it is too slow)
- Run the code through Cython, compile and benchmark (we’ll find that it is somewhat faster)
- Annotate the types and benchmark (we’ll find that it is much faster)

Then we’ll look at how Cython allows us to

- Work with NumPy arrays
- Use multiple threads from Python
- Wrap native C libraries

We want to approximate the integral: $$ \int_a^b f(x) dx $$
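The rectangle rule approximates this integral by summing the function's value at $N$ left endpoints, each weighted by the interval width:

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{N-1} f\left(a + i\,\Delta x\right)\,\Delta x, \qquad \Delta x = \frac{b - a}{N}$$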

Or, more accurately, on a finer grid...

In [ ]:

```
cd demos/integrate
```

In [ ]:

```
# %load integrate.py
from __future__ import division

def f(x):
    return x**4 - 3 * x

def integrate_f(a, b, N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : ints
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```

In [ ]:

```
import integrate
```

In [ ]:

```
integrate.integrate_f(1,10,10000)
```

In [ ]:

```
%timeit integrate.integrate_f(1,10,100000)
```
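As a sanity check (a hedged sketch, restating the pure-Python function inline so it runs standalone), we can compare the rectangle-rule result against the analytic antiderivative $x^5/5 - 3x^2/2$:

```python
def f(x):
    return x**4 - 3 * x

def integrate_f(a, b, N):
    # left-endpoint rectangle rule
    s = 0.0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx

# exact value of the integral of x^4 - 3x from 1 to 10
exact = (10**5 / 5 - 3 * 10**2 / 2) - (1**5 / 5 - 3 * 1**2 / 2)
approx = integrate_f(1, 10, 100000)
print(abs(approx - exact) / exact < 1e-3)  # True: small relative error
```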

Let's compile the code with Cython

```
cython filename.[py|pyx]
```

In [ ]:

```
!cython integrate.py
```

What is happening behind the scenes?

Cython translates Python to C, using the Python C API (let’s have a look)

In [ ]:

```
!cat integrate.c
```

Three ways of making use of Cython

**1. Compiling**

This C code can be compiled directly (e.g. using gcc), provided the right library paths are given. We can also use a `setup.py` file to help us.

I copied `integrate.py` to `integrate_compiled.pyx` just to make a distinction between the compiled version and the pure Python version.

In [ ]:

```
# %load setup.py
from distutils.core import setup
from Cython.Build import cythonize
from distutils.extension import Extension

setup(
    ext_modules=cythonize(
        [Extension("integrate_compiled", ["integrate_compiled.pyx"])],
        compiler_directives={'language_level': 3}
    )
)
```

In [ ]:

```
!python setup.py build_ext --inplace
```

In [ ]:

```
!ls integrate_compiled*
```

In [ ]:

```
import integrate_compiled
integrate_compiled.integrate_f(1,10,10000)
```

In [ ]:

```
!rm integrate_compiled.*.so
```

In [ ]:

```
cd demos/integrate/
```

**2. pyximport**

pyximport is a little bit magical: it will grab and compile `.pyx` files the first time they are imported, if a compilation is necessary.

In [ ]:

```
import pyximport
pyximport.install()
import integrate_compiled
```

In [ ]:

```
integrate_compiled.integrate_f(1,10,10000)
```

Benchmark the new code

In [ ]:

```
%timeit integrate_compiled.integrate_f(1,10,100000)
```

In [ ]:

```
import integrate
```

In [ ]:

```
%timeit integrate.integrate_f(1,10,100000)
```

**caveat emptor!** C extensions **cannot** be reloaded in Python (see the CPython bug marked as WON'T FIX), so you have to either *restart the kernel* (i.e. quit Python and start it again), or use the `%%cython` magic...

**3. The `%%cython` magic**

The most magical of all.

In [ ]:

```
%load_ext Cython
```

In [ ]:

```
%%cython
from __future__ import division

def f(x):
    return x**4 - 3 * x

def inline_integrate_f(a, b, N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : ints
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    s = 0
    dx = (b - a) / N
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```

In [ ]:

```
inline_integrate_f(1,10,1000)
```

Slight speed increase (≈ 1.4×); probably not worth it on its own.

- Can we help Cython to do even better?
- Yes—by giving it some clues.
- Cython has a basic type inferencing engine, but it is very conservative for safety reasons.

- Why does type information allow such vast speed increases?

Slow things worth knowing about:

- for loops
- function calls
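To see the cost of function calls in pure CPython, here is a minimal sketch (the names `with_calls` and `inlined` are illustrative, not from the lecture code):

```python
import timeit

def square(x):
    return x * x

def with_calls(n):
    # one Python function call per iteration
    s = 0
    for i in range(n):
        s += square(i)
    return s

def inlined(n):
    # same arithmetic with the call inlined by hand
    s = 0
    for i in range(n):
        s += i * i
    return s

t_call = timeit.timeit(lambda: with_calls(10_000), number=50)
t_inline = timeit.timeit(lambda: inlined(10_000), number=50)
print(with_calls(100) == inlined(100))  # True: same result, different cost
```

On typical CPython builds the call-per-iteration version is noticeably slower, which is exactly the overhead Cython removes when `f` becomes a `cdef` C function.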

Let's tell Cython about the types...

In [ ]:

```
%%writefile integrate_types.pyx
from __future__ import division

def f(double x):
    return x**4 - 3 * x

def types_integrate_f(double a, double b, int N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : doubles
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    cdef:
        double s = 0
        double dx = (b - a) / N
        int i
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```

In [ ]:

```
%%cython
from __future__ import division

def f(double x):
    return x**4 - 3 * x

def types_integrate_f(double a, double b, int N):
    """Rectangle integration of a function.

    Parameters
    ----------
    a, b : doubles
        Interval over which to integrate.
    N : int
        Number of intervals to use in the discretisation.

    """
    cdef:
        double s = 0
        double dx = (b - a) / N
        int i
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```

In [ ]:

```
%timeit types_integrate_f(1,10,100000)
```

In [ ]:

```
%timeit integrate_compiled.integrate_f(1,10,100000)
```

In [ ]:

```
%timeit integrate.integrate_f(1,10,100000)
```

There are even more bottlenecks: we need to define `f(x)` as a C function.

In [ ]:

```
# %load integrate_cy.pyx
# cython: cdivision=True
# ^^^ Could also use the @cython.cdivision(True) decorator

cdef double f(double x):
    return x*x*x*x - 3 * x

def integrate_f(double a, double b, int N):
    cdef:
        double s = 0
        double dx = (b - a) / N
        size_t i
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```

In [ ]:

```
%%cython
# cython: cdivision=True
# ^^^ Could also use the @cython.cdivision(True) decorator

cdef double f(double x):
    return x*x*x*x - 3 * x

def cy_integrate_f(double a, double b, int N):
    cdef:
        double s = 0
        double dx = (b - a) / N
        size_t i
    for i in range(N):
        s += f(a + i * dx)
    return s * dx
```
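A quick reminder of why `cdivision` matters: C integer division truncates toward zero and never raises `ZeroDivisionError`, whereas Python's `//` floors toward negative infinity, so with `cdivision=True` Cython can skip the per-division sign fix-up and zero check:

```python
# Python floor division rounds toward negative infinity...
print(-7 // 2)      # -4

# ...while C-style division truncates toward zero, which is what
# cdivision=True lets Cython emit for typed operands.
print(int(-7 / 2))  # -3
```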

In [ ]:

```
%timeit cy_integrate_f(1,10,100000)
```

In [ ]:

```
%timeit integrate.integrate_f(1,10,100000)
```

In [ ]:

```
49.9 / 0.270  # rough speedup: pure-Python time divided by Cython time
```

With annotations, Cython gives us a nice way to find bottlenecks: use the `-a` flag.

In [ ]:

```
%load_ext Cython
```

In [ ]:

```
%%cython -a
def omg(c):
    wtf = 0
    for a in range(c):
        wtf += a
    return wtf

def lol(x):
    for i in range(x):
        omg(x*2)
```

In [ ]:

```
%timeit lol(5000)
```

In [ ]:

```
%%cython -a
cdef int omg(int c):
    cdef int wtf = 0
    cdef int a
    for a in range(c):
        wtf += a
    return wtf

def lola(int x):
    cdef int i
    for i in range(x):
        omg(x*2)
```

In [ ]:

```
%timeit lola(5000)
```

Here's a Python function that performs an elementwise computation:

$y_i = x_i^3 - 4x_i + 2$

Write a Cython equivalent.

In [ ]:

```
%%cython
import numpy as np

cdef my_poly(double[:] x):
    out = np.zeros_like(x)
    L = x.shape[0]
    for i in range(L):
        out[i] = x[i] * x[i] * x[i] - 4 * x[i] + 2
    return np.asarray(out)

def other_poly(x):
    return my_poly(x)
```

In [ ]:

```
import numpy as np
x = np.linspace(-10,10,100000)
%timeit other_poly(x)
```
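For comparison, the same elementwise polynomial vectorized in NumPy (a sketch; `np_poly` is a name introduced here, not from the exercise):

```python
import numpy as np

def np_poly(x):
    # y_i = x_i^3 - 4*x_i + 2, in one vectorized expression
    return x**3 - 4 * x + 2

x = np.linspace(-10, 10, 100000)
y = np_poly(x)
print(y[0])  # (-10)^3 - 4*(-10) + 2 = -958.0
```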

In [ ]:

```
%%cython -a
import numpy as np

def blah(float y):
    return y * y * y - 4 * y + 2

def my_poly(x):
    out = np.zeros_like(x)
    L = x.shape[0]
    for i in range(L):
        out[i] = blah(x[i])
    return np.asarray(out)
```

In [ ]:

```
import numpy as np
x = np.linspace(-10,10,100000)
%timeit my_poly(x)
```

With a partner, have one of you log into GitHub and fork this gist.

Modify it together so that all functions perform some computation, but the whole thing doesn't take too long to run (1–5 seconds).

- You can use numpy in some of them, if you'd like, but be sure to include some for loops, as well.
- The template is there just to get you started, feel free to modify the parameters, or add new functions as you see fit.

Update your gist with this new code, and hand it off to another pair (who will give you theirs).

Fork and clone the other pair's program.

Now, without looking at the code, profile this code to identify the bottleneck.

Put a comment next to the function that eats up the most time.

Time permitting, numpy-fy and Cython-ize the offending function.

Time permitting, change the parameters used in `main` to make a *different* function the bottleneck.
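A minimal profiling sketch using the standard library's `cProfile` (the function names here are illustrative, not from anyone's gist):

```python
import cProfile
import io
import pstats

def slow_part(n):
    # deliberate O(n) Python loop: the bottleneck
    return sum(i * i for i in range(n))

def fast_part(n):
    # closed form for the same sum of squares, O(1)
    return n * (n - 1) * (2 * n - 1) // 6

pr = cProfile.Profile()
pr.enable()
slow_part(200_000)
fast_part(200_000)
pr.disable()

# print the top entries, sorted by cumulative time
buf = io.StringIO()
pstats.Stats(pr, stream=buf).sort_stats("cumulative").print_stats(5)
first = next(line for line in buf.getvalue().splitlines() if line.strip())
print(first)  # header with total call count and time
```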

See also *Calling C functions* and *Using C Libraries* in the Cython documentation.
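As a rough stand-in that runs from plain Python, here is a hedged `ctypes` sketch of calling a C library function directly (Cython's `cdef extern` achieves the same binding at compile time; this assumes a Unix-like system where libm can be located):

```python
import ctypes
import ctypes.util
import math

# Locate and load the C math library (Unix-like systems)
libm = ctypes.CDLL(ctypes.util.find_library("m"))
libm.sin.restype = ctypes.c_double
libm.sin.argtypes = [ctypes.c_double]

# C's sin agrees with Python's math.sin
print(abs(libm.sin(1.0) - math.sin(1.0)) < 1e-12)  # True
```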

Export the Fortran code to C and then tell Cython to pretend it's calling C code:

*Notice that we didn't write any C code: we only told Fortran to use the C calling convention when producing the ".o" files, and then pretended in Cython that the function is implemented in C, when in fact it is linked in from Fortran directly.*

See also the Cython Profiling Tutorial in Cython's official docs.
