This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

4.2. Profiling your code easily with cProfile and IPython

Standard imports.

In [ ]:

import numpy as np
import matplotlib.pyplot as plt

In [ ]:

%matplotlib inline

This function generates an array of random values, each equal to +1 or -1 with the same probability.

In [ ]:

def step(*shape):
    # Create a random array of the given shape with +1 or -1 values.
    return 2 * (np.random.random_sample(shape) < .5) - 1
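As a quick sanity check, the sketch below repeats the definition of step so that it runs on its own, then confirms that the output has the requested shape and contains only +1 and -1. The variable names sample and walk are illustrative and not part of the recipe.

```python
import numpy as np

def step(*shape):
    # Same definition as in the recipe: a random array of +1 / -1 values.
    return 2 * (np.random.random_sample(shape) < .5) - 1

# A small 3x4 array of steps (names here are illustrative only).
sample = step(3, 4)
print(sample.shape)        # (3, 4)
print(np.unique(sample))   # only -1 and/or 1 can appear

# Cumulating the steps along an axis turns them into a random walk.
walk = np.cumsum(step(1000))
print(walk[:10])
```

Because the comparison with .5 yields a boolean array, multiplying by 2 and subtracting 1 maps False to -1 and True to +1.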

We simulate $n$ independent random walks and compute, at each time step, the histogram of the walkers' positions.

In [ ]:

%%prun -s cumulative -q -l 10 -T prun0
# We profile the cell, sort the report by "cumulative time",
# limit it to 10 lines, and save it to a file "prun0".
n = 10000
iterations = 50
x = np.cumsum(step(iterations, n), axis=0)
bins = np.arange(-30, 30, 1)
y = np.vstack([np.histogram(x[i,:], bins)[0] for i in range(iterations)])

In [ ]:

with open('prun0', 'r') as f:
    print(f.read())

In this run, the most expensive functions are, in order, histogram (37 ms), the random number generation in step (19 ms), and cumsum (5 ms). The exact timings depend on the machine.
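Outside of IPython, a similar report can be produced with the standard library modules cProfile and pstats. The following is a minimal sketch, not the recipe's own method: it wraps the cell's code in a hypothetical function named simulate, profiles one call, sorts by cumulative time, and limits the report to 10 lines, mirroring the -s cumulative and -l 10 options used above.

```python
import cProfile
import io
import pstats

import numpy as np

def step(*shape):
    # Same definition as in the recipe.
    return 2 * (np.random.random_sample(shape) < .5) - 1

def simulate(n=10000, iterations=50):
    # Hypothetical wrapper around the profiled cell's code.
    x = np.cumsum(step(iterations, n), axis=0)
    bins = np.arange(-30, 30, 1)
    return np.vstack([np.histogram(x[i, :], bins)[0]
                      for i in range(iterations)])

# Profile a single call to simulate().
profiler = cProfile.Profile()
profiler.enable()
y = simulate()
profiler.disable()

# Sort by cumulative time and keep the first 10 lines of the report.
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats('cumulative').print_stats(10)
report = buffer.getvalue()
print(report)
```

Writing the report to a string buffer plays the same role as the -T option: the text can then be printed, saved to a file, or inspected later.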

We plot the array y, representing the distribution of the particles over time.

In [ ]:

plt.figure(figsize=(6,6));
plt.imshow(y, cmap='hot');

We now run the same code with 10 times more iterations.

In [ ]:

%%prun -s cumulative -q -l 10 -T prun1
n = 10000
iterations = 500
x = np.cumsum(step(iterations, n), axis=0)
bins = np.arange(-30, 30, 1)
y = np.vstack([np.histogram(x[i,:], bins)[0] for i in range(iterations)])

In [ ]:

with open('prun1', 'r') as f:
    print(f.read())

This time, the most expensive functions are, in order, histogram (566 ms), cumsum (388 ms), and the random number generation in step (241 ms). cumsum's execution time was negligible in the first case, but it is significant here because of the larger number of iterations.
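To see how each stage scales on a given machine, one rough option is to time the three stages separately for both iteration counts. This is only a sketch using time.perf_counter rather than the profiler, and the helper time_stages is a hypothetical name introduced for illustration. The measured values will differ from the figures quoted above.

```python
import time

import numpy as np

def step(*shape):
    # Same definition as in the recipe.
    return 2 * (np.random.random_sample(shape) < .5) - 1

def time_stages(n, iterations):
    # Hypothetical helper: time each stage of the simulation separately.
    timings = {}

    t0 = time.perf_counter()
    steps = step(iterations, n)
    timings['step'] = time.perf_counter() - t0

    t0 = time.perf_counter()
    x = np.cumsum(steps, axis=0)
    timings['cumsum'] = time.perf_counter() - t0

    bins = np.arange(-30, 30, 1)
    t0 = time.perf_counter()
    y = np.vstack([np.histogram(x[i, :], bins)[0]
                   for i in range(iterations)])
    timings['histogram'] = time.perf_counter() - t0

    return y, timings

for iterations in (50, 500):
    y, timings = time_stages(10000, iterations)
    summary = ', '.join('%s: %.1f ms' % (name, 1000 * t)
                        for name, t in timings.items())
    print('%d iterations -> %s' % (iterations, summary))
```

Single wall-clock measurements like these are noisy, so they are best read as orders of magnitude rather than precise figures.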


IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).