This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
import numpy as np
# Address of the array's data buffer (note: this name shadows the built-in id).
id = lambda x: x.__array_interface__['data'][0]
We create a large array.
n, d = 100000, 100
a = np.random.random_sample((n, d)); aid = id(a)
We select every tenth row using two different methods: basic slicing, which returns a view, and fancy indexing.
b1 = a[::10]
b2 = a[np.arange(0, n, 10)]
np.array_equal(b1, b2)
The view refers to the original data buffer, whereas fancy indexing yields a copy.
id(b1) == aid, id(b2) == aid
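The data-pointer comparison above only matches when the slice starts at the first element: a[1::10] is still a view, but its buffer address differs from that of a. A more general check is np.shares_memory. The following is a minimal sketch on a small array; the names view and copy are illustrative and not part of the recipe.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(4, 3)

view = a[1::2]                # basic slicing: a view that starts at row 1
copy = a[np.array([1, 3])]    # fancy indexing: a new buffer with the same rows

# np.shares_memory reports whether two arrays overlap in memory.
view_shares = np.shares_memory(a, view)
copy_shares = np.shares_memory(a, copy)
print(view_shares, copy_shares)

# Writing through the view changes the original; writing to the copy does not.
view[0, 0] = -1.0
copy[0, 1] = -2.0
print(a[1])
```

Because the view aliases the original buffer, any in-place change to it is visible in a, which is worth keeping in mind when choosing slicing for speed.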
Fancy indexing can be several orders of magnitude slower here because it copies the selected rows into a new array, whereas slicing only creates a view. In return, fancy indexing is more general: it can select any portion of an array (using any list of indices), not just a strided selection.
%timeit a[::10]
%timeit a[np.arange(0, n, 10)]
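%timeit is an IPython magic command. Outside IPython, a comparable measurement can be sketched with the standard timeit module. The array below is deliberately smaller than the one in the recipe so that the sketch runs quickly, and the names idx, picked, t_slice, and t_fancy are illustrative. The example also shows what a strided slice cannot express: rows in arbitrary order, with repeats.

```python
import timeit
import numpy as np

n, d = 10000, 100  # smaller than the recipe's array (assumption, for speed)
a = np.random.random_sample((n, d))
idx = np.arange(0, n, 10)

# Fancy indexing accepts any index list: arbitrary order and repeated rows.
picked = a[[5, 0, 5, n - 1]]

# Average time per call, in seconds, over 100 calls each.
t_slice = timeit.timeit(lambda: a[::10], number=100) / 100
t_fancy = timeit.timeit(lambda: a[idx], number=100) / 100
print(f"slice: {t_slice:.2e} s, fancy: {t_fancy:.2e} s")
```

The absolute numbers depend on the machine and the NumPy version, so they are best read as a ratio rather than as fixed values.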
Given a list of indices, there are two ways of selecting the corresponding sub-array: fancy indexing, or the np.take function.
i = np.arange(0, n, 10)
b1 = a[i]
b2 = np.take(a, i, axis=0)
np.array_equal(b1, b2)
%timeit a[i]
%timeit np.take(a, i, axis=0)
In this benchmark, np.take is faster than fancy indexing.
Note: Performance of fancy indexing has been improved in recent versions of NumPy; this trick is especially useful on older versions of NumPy.
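The equivalence between np.take and fancy indexing depends on the axis argument. A small sketch, with illustrative names (rows, cols) that are not part of the recipe:

```python
import numpy as np

a = np.arange(20).reshape(5, 4)
rows = np.array([4, 0, 2])
cols = np.array([3, 1])

# axis=0 selects rows, like a[rows].
rows_by_index = a[rows]
rows_by_take = np.take(a, rows, axis=0)

# axis=1 selects columns, like a[:, cols].
cols_by_index = a[:, cols]
cols_by_take = np.take(a, cols, axis=1)

# Without axis, np.take indexes into the flattened array instead.
flat_take = np.take(a, rows)
print(flat_take)
```

Omitting axis is an easy mistake to make when replacing a[i] with np.take, since the call still succeeds but returns elements of the flattened array rather than rows.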
Let's create a mask of booleans, where each value indicates whether the corresponding row of a needs to be selected.
i = np.random.random_sample(n) < .5
The selection can be made using fancy indexing with the boolean mask or the np.compress function.
b1 = a[i]
b2 = np.compress(i, a, axis=0)
np.array_equal(b1, b2)
%timeit a[i]
%timeit np.compress(i, a, axis=0)
Once again, the alternative to fancy indexing, here np.compress, is faster in this benchmark.
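The mask-based selection can be checked on a small array. The sketch below also shows a third route that the recipe does not use: converting the mask to integer positions with np.flatnonzero and passing them to np.take. The names mask and positions are illustrative.

```python
import numpy as np

a = np.arange(12).reshape(4, 3)
mask = np.array([True, False, True, False])

# Boolean mask indexing and np.compress select the same rows.
by_mask = a[mask]
by_compress = np.compress(mask, a, axis=0)

# Convert the mask to integer positions, then reuse np.take.
positions = np.flatnonzero(mask)
by_take = np.take(a, positions, axis=0)
print(by_compress)
```

All three results are copies, so which one to prefer is purely a matter of speed on the NumPy version at hand.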
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).