This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
import numpy as np
def id(x):
    # This function returns the memory
    # block address of an array.
    return x.__array_interface__['data'][0]
a = np.zeros(10); aid = id(a); aid
b = a.copy(); id(b) == aid
a *= 2; id(a) == aid
c = a * 2; id(c) == aid
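Comparing buffer addresses only inspects the first byte of each array; as a sketch, the same view-versus-copy distinctions can be cross-checked with NumPy's `np.shares_memory` function, which tests for actual buffer overlap.

```python
import numpy as np

a = np.zeros(10)

# a.copy() allocates a new buffer: no shared memory.
b = a.copy()
assert not np.shares_memory(a, b)

# A slice is a view into the same buffer.
v = a[2:5]
assert np.shares_memory(a, v)

# a * 2 allocates a new array, unlike a *= 2.
c = a * 2
assert not np.shares_memory(a, c)
```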
In-place operation.
%%timeit a = np.zeros(10000000)
a *= 2
With memory copy.
%%timeit a = np.zeros(10000000)
b = a * 2
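Besides augmented assignment, NumPy ufuncs accept an `out` argument that writes the result into an existing array, which also avoids a temporary allocation. A minimal sketch, reusing the address helper from above (renamed `aid` here so it does not shadow Python's built-in `id`):

```python
import numpy as np

def aid(x):
    # Memory block address of the array's buffer.
    return x.__array_interface__['data'][0]

a = np.zeros(10000000)
addr = aid(a)

# np.multiply(..., out=a) writes the result into a's own
# buffer, just like a *= 2, with no intermediate copy.
np.multiply(a, 2, out=a)
assert aid(a) == addr
```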
a = np.zeros((10, 10)); aid = id(a); aid
Reshaping an array while preserving its order does not trigger a copy.
b = a.reshape((1, -1)); id(b) == aid
Transposing an array changes its memory order, so that reshaping the transpose triggers a copy.
c = a.T.reshape((1, -1)); id(c) == aid
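Another way to see why, as a sketch: a view's `base` attribute points back to the array owning the data, and the transpose's contiguity flags show that its elements cannot be traversed with simple strides in C order, forcing a copy.

```python
import numpy as np

a = np.zeros((10, 10))

# An order-preserving reshape returns a view whose .base
# attribute is the original array.
b = a.reshape((1, -1))
assert b.base is a

# a.T is a non-contiguous view; reshaping it cannot be
# expressed with strides alone, so NumPy must copy.
assert not a.T.flags['C_CONTIGUOUS']
c = a.T.reshape((1, -1))
assert not np.shares_memory(c, a)
```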
To return a flattened (1D) version of a multidimensional array, one can use flatten or ravel. The former always returns a copy, whereas the latter makes a copy only if necessary.
d = a.flatten(); id(d) == aid
e = a.ravel(); id(e) == aid
%timeit a.flatten()
%timeit a.ravel()
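The "only if necessary" part can be checked directly with `np.shares_memory`, a sketch:

```python
import numpy as np

a = np.arange(100).reshape(10, 10)

# ravel returns a view when the data is already contiguous...
assert np.shares_memory(a.ravel(), a)

# ...while flatten always copies.
assert not np.shares_memory(a.flatten(), a)

# On a transposed array, ravel has no choice but to copy too.
assert not np.shares_memory(a.T.ravel(), a)
```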
When performing operations on arrays with different shapes, you don't necessarily have to make copies to make their shapes match. Broadcasting rules allow you to perform computations on arrays with different but compatible shapes. Two dimensions are compatible if they are equal or if one of them is 1. If the arrays have different numbers of dimensions, the shape of the smaller array is padded with dimensions of size 1 on the leading (left) side, so that shapes are aligned from the trailing dimensions.
n = 1000
a = np.arange(n)
ac = a[:, np.newaxis]
ar = a[np.newaxis, :]
%timeit np.tile(ac, (1, n)) * np.tile(ar, (n, 1))
%timeit ar * ac
Can you explain the performance discrepancy between the following two similar operations?
a = np.random.rand(5000, 5000)
%timeit a[0, :].sum()
%timeit a[:, 0].sum()
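Without spoiling the full explanation, one clue lies in the array's strides, a sketch:

```python
import numpy as np

a = np.random.rand(5000, 5000)

# For a C-ordered float64 array, consecutive elements of a row
# are adjacent in memory (8 bytes apart), while consecutive
# elements of a column are a full row (40000 bytes) apart.
assert a.strides == (5000 * 8, 8)
```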
You'll find all the explanations, figures, references, and much more in the book (to be released later this summer).
IPython Cookbook, by Cyrille Rossant, Packt Publishing, 2014 (500 pages).