This is one of the 100 recipes of the IPython Cookbook, the definitive guide to high-performance scientific computing and data science in Python.
import numpy as np
def id(x):
    # This function returns the memory
    # block address of an array.
    return x.__array_interface__['data']
a = np.zeros(10); aid = id(a); aid
b = a.copy(); id(b) == aid
a *= 2; id(a) == aid
c = a * 2; id(c) == aid
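Comparing memory addresses works here, but NumPy also provides `np.shares_memory()`, which directly checks whether two arrays overlap in memory. A minimal sketch of the same copy-versus-view distinction:

```python
import numpy as np

a = np.zeros(10)
b = a.copy()   # copy() allocates a new memory block
v = a[::2]     # slicing returns a view into a's buffer

# np.shares_memory() checks for actual memory overlap.
print(np.shares_memory(a, b))  # False: b is an independent copy
print(np.shares_memory(a, v))  # True: v is a view on a
```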
In-place operation, without a memory copy:
%%timeit a = np.zeros(10000000)
a *= 2
With a memory copy:
%%timeit a = np.zeros(10000000)
b = a * 2
a = np.zeros((10, 10)); aid = id(a); aid
Reshaping an array while preserving its order does not trigger a copy.
b = a.reshape((1, -1)); id(b) == aid
Transposing an array changes its order so that a reshape triggers a copy.
c = a.T.reshape((1, -1)); id(c) == aid
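One way to see why the transposed reshape copies is to inspect the contiguity flags: `a.T` is a strided view that is no longer C-contiguous, so its elements cannot be reinterpreted as a flat buffer without moving data. A sketch:

```python
import numpy as np

a = np.zeros((10, 10))

# a is C-contiguous, so reshape can reuse the same buffer.
b = a.reshape((1, -1))
print(np.shares_memory(b, a))   # True: b is a view on a

# a.T is a view with swapped strides; it is not C-contiguous,
# so reshaping it forces a copy into a new buffer.
t = a.T
print(t.flags['C_CONTIGUOUS'])  # False
c = t.reshape((1, -1))
print(np.shares_memory(c, a))   # False: c lives in new memory
```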
To obtain a flattened (1D) version of a multidimensional array, one can use flatten or ravel. The former always returns a copy, whereas the latter only makes a copy when necessary.
d = a.flatten(); id(d) == aid
e = a.ravel(); id(e) == aid
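The "only when necessary" clause can be made concrete with `np.shares_memory()`: ravel returns a view on a C-contiguous array, but must copy when the data is not laid out contiguously, for example after a transpose. A sketch:

```python
import numpy as np

a = np.zeros((10, 10))

d = a.flatten()   # flatten() always returns a copy
e = a.ravel()     # ravel() returns a view when possible
f = a.T.ravel()   # a.T is not C-contiguous, so ravel() must copy

print(np.shares_memory(d, a))  # False
print(np.shares_memory(e, a))  # True
print(np.shares_memory(f, a))  # False
```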
When performing operations on arrays with different shapes, you don't necessarily have to make copies to make the shapes match. Broadcasting rules allow you to perform computations on arrays with different but compatible shapes. Two dimensions are compatible if they are equal or if one of them is 1. If the arrays have different numbers of dimensions, their shapes are aligned from the trailing dimensions, and the smaller array is treated as if dimensions of size 1 were prepended on the left.
n = 1000
a = np.arange(n)
ac = a[:, np.newaxis]
ar = a[np.newaxis, :]
%timeit np.tile(ac, (1, n)) * np.tile(ar, (n, 1))
%timeit ar * ac
Can you explain the performance discrepancy between the following two similar operations?
a = np.random.rand(5000, 5000)
%timeit a[0, :].sum()
%timeit a[:, 0].sum()
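A likely explanation is memory layout: NumPy arrays are C-ordered (row-major) by default, so the elements of a row are adjacent in memory while consecutive elements of a column are a full row apart. Summing a row is one cache-friendly sweep; summing a column makes large strided jumps. The strides make this visible:

```python
import numpy as np

a = np.random.rand(5000, 5000)

# In C order, moving along a row advances 8 bytes (one float64);
# moving down a column jumps 5000 * 8 = 40000 bytes.
print(a.strides)                       # (40000, 8)
print(a[0, :].flags['C_CONTIGUOUS'])   # True: contiguous row
print(a[:, 0].strides)                 # (40000,): strided column
```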