This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.

1.3. Introducing the multidimensional array in NumPy for fast array computations¶

Let's import the built-in random Python module and NumPy.

In [ ]:

import random
import numpy as np

We use the %precision magic (defined in IPython) to show only 3 decimals in the Python output. This is just to alleviate the text.

In [ ]:

%precision 3

We generate two Python lists x and y, each one containing one million random numbers between 0 and 1.

In [ ]:

n = 1000000
x = [random.random() for _ in range(n)]
y = [random.random() for _ in range(n)]

In [ ]:

x[:3], y[:3]

Let's compute the element-wise sum of all these numbers: the first element of x plus the first element of y, and so on. We use a for loop in a list comprehension.

In [ ]:

z = [x[i] + y[i] for i in range(n)]
z[:3]

How long does this computation take? IPython defines a handy %timeit magic command to quickly evaluate the time taken by a single command.

In [ ]:

%timeit [x[i] + y[i] for i in range(n)]

Now, we will perform the same operation with NumPy. NumPy works on multidimensional arrays, so we need to convert our lists to arrays. The np.array() function does just that.

In [ ]:

xa = np.array(x)
ya = np.array(y)

In [ ]:

xa[:3]

The arrays xa and ya contain the exact same numbers than our original lists x and y. Whereas those lists where instances of a built-in class list, our arrays are instances of a NumPy class ndarray. Those types are implemented very differently in Python and NumPy. We will see that, in this example, using arrays instead of lists leads to drastic performance improvements.

Now, to compute the element-wise sum of these arrays, we don't need to do a for loop anymore. In NumPy, adding two arrays means adding the elements of the arrays component by component.

In [ ]:

za = xa + ya
za[:3]

We see that the list z and the array za contain the same elements (the sum of the numbers in x and y).

Let's compare the performance of this NumPy operation with the native Python loop.

In [ ]:

%timeit xa + ya

We observe that this operation is more than one order of magnitude faster in NumPy than in pure Python!

Now, we will compute something else: the sum of all elements in x or xa. Although this is not an element-wise operation, NumPy is still highly efficient here. The pure Python version uses the built-in sum function on an iterable. The NumPy version uses the np.sum() function on a NumPy array.

In [ ]:

%timeit sum(x)  # pure Python
%timeit np.sum(xa)  # NumPy

We also observe an impressive speedup here.

Finally, let's perform a last operation: computing the arithmetic distance between any pair of numbers in our two lists (we only consider the first 1000 elements to keep computing times reasonable). First, we implement this in pure Python with two nested for loops.

In [ ]:

d = [abs(x[i] - y[j]) 
     for i in range(1000) for j in range(1000)]

In [ ]:

d[:3]

Now, we use a NumPy implementation, bringing out two slightly more advanced notions. First, we consider a two-dimensional array (or matrix). This is how we deal with two indices i and j. Second, we use broadcasting to perform an operation between a 2D array and a 1D array. We will give more details in How it works...

In [ ]:

da = np.abs(xa[:1000,None] - ya[:1000])

In [ ]:

da

In [ ]:

%timeit [abs(x[i] - y[j]) for i in range(1000) for j in range(1000)]
%timeit np.abs(xa[:1000, None] - ya[:1000])

Here again, observe observe the significant speedups.

You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).

IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).