How are numpy arrays stored?

In [1]:

import numpy as np

NumPy presents an n-dimensional abstraction that has to be fitted into one-dimensional computer memory.

Even in two dimensions (matrices), this forces a choice that often causes confusion: row-major or column-major storage.
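As a small illustration of the two conventions, the sketch below flattens a 2x2 matrix both ways. The name `M` is only an example; `ravel(order=...)` is used here to show the order in which the entries would be laid out, not to inspect the actual memory.

```python
import numpy as np

# A tiny example matrix (the name M is just for illustration).
M = np.array([[1, 2],
              [3, 4]])

# Row-major ("C") order: walk along each row first.
row_major = M.ravel(order="C")

# Column-major ("Fortran") order: walk down each column first.
col_major = M.ravel(order="F")

print(row_major)  # [1 2 3 4]
print(col_major)  # [1 3 2 4]
```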

In [4]:

A = np.arange(9).reshape(3, 3)
print(A)

[[0 1 2]
 [3 4 5]
 [6 7 8]]

Strides and in-memory representation

How is this represented in memory?

In [6]:

A.strides

Out[6]:

(24, 8)

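strides stores, for each axis, the number of bytes to jump to get from one entry to the next along that axis.
So how is the array above stored? With strides (24, 8) and 8-byte integers, moving one column to the right advances 8 bytes (one entry), while moving one row down advances 24 bytes (three entries). The rows therefore sit one after another in memory: row-major order.
The stride model captures row-major ("C" order) and column-major ("Fortran" order), but is actually much more general.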
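To make the stride arithmetic concrete, here is a minimal sketch that computes the byte offset of one entry by hand. The helper `byte_offset` is hypothetical (it is not part of NumPy), and the dtype is pinned to `int64` on the assumption that this matches the 8-byte entries seen in the output above.

```python
import numpy as np

# Pin the dtype so the strides are (24, 8) regardless of platform defaults.
A = np.arange(9, dtype=np.int64).reshape(3, 3)

def byte_offset(strides, index):
    """Hypothetical helper: offset in bytes of `index` from the first entry."""
    return sum(s * i for s, i in zip(strides, index))

offset = byte_offset(A.strides, (1, 2))   # 1*24 + 2*8
flat_position = offset // A.itemsize      # position in the linear buffer

print(offset)         # 40
print(flat_position)  # 5
print(A[1, 2])        # 5, the entry stored at that position
```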

We can also ask for Fortran order:

In [10]:

A2 = np.arange(9).reshape(3, 3, order="F")
A2

Out[10]:

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

NumPy defaults to row-major order, so Fortran order has to be requested explicitly with order="F".

In [11]:

A2.strides

Out[11]:

(8, 24)
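The swapped strides suggest that the two layouts are just different ways of walking the same linear buffer. The sketch below checks this under the assumption that both reshapes return views of one shared 1-D array; the names `base`, `A_c`, and `A_f` are only illustrative.

```python
import numpy as np

base = np.arange(9, dtype=np.int64)

A_c = base.reshape(3, 3)             # row-major view of the buffer
A_f = base.reshape(3, 3, order="F")  # column-major view of the same buffer

print(np.shares_memory(A_c, A_f))  # True: no data was copied
print(A_c.strides)                 # (24, 8)
print(A_f.strides)                 # (8, 24)

# Same bytes, opposite strides: one array is the transpose of the other.
print(np.array_equal(A_f, A_c.T))  # True
```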

How is the stride model more general than just saying "row major" or "column major"?

In [15]:

A = np.arange(16).reshape(4, 4)
A

Out[15]:

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15]])

In [18]:

A.strides

Out[18]:

(32, 8)

In [14]:

Asub = A[:3, :3]
Asub

Out[14]:

array([[ 0,  1,  2],
       [ 4,  5,  6],
       [ 8,  9, 10]])

Recall that Asub constitutes a view of the original data in A.
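A quick way to convince yourself of the view relationship is to write through the slice and look at the original. This is a minimal sketch; `Acopy` is an illustrative name for an explicit copy made for contrast.

```python
import numpy as np

A = np.arange(16).reshape(4, 4)
Asub = A[:3, :3]

print(np.shares_memory(A, Asub))  # True: Asub points into A's data

# Writing through the view changes the original array.
Asub[0, 0] = -1
print(A[0, 0])  # -1

# An explicit copy owns its own data, so A is unaffected.
Acopy = A[:3, :3].copy()
Acopy[0, 0] = 100
print(A[0, 0])  # still -1
```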

In [19]:

Asub.strides

Out[19]:

(32, 8)

Now Asub is no longer a contiguous array!

In the linear-memory representation (as shown by the increasing numbers in A), the entries 3, 7, and 11 are missing.
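The gaps can be recovered from the strides alone. The following sketch computes which positions of the linear buffer Asub touches and which ones it skips within the first three rows of A; the variable names are illustrative, and the dtype is pinned to `int64` so the strides match the output above.

```python
import numpy as np

A = np.arange(16, dtype=np.int64).reshape(4, 4)
Asub = A[:3, :3]

# Position in the linear buffer of every entry of Asub, from its strides.
rows, cols = np.indices(Asub.shape)
flat_positions = (rows * Asub.strides[0] + cols * Asub.strides[1]) // Asub.itemsize

touched = set(flat_positions.ravel().tolist())

# Positions covered by the first three full rows of A that Asub never visits.
skipped = sorted(set(range(A[:3].size)) - touched)

print(sorted(touched))  # [0, 1, 2, 4, 5, 6, 8, 9, 10]
print(skipped)          # [3, 7, 11]
```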

This is easy to check with the flags attribute:

In [20]:

Asub.flags

Out[20]:

  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False
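If a contiguous array is needed (for example, before handing the data to code that assumes a dense layout), one option is `np.ascontiguousarray`, which copies the data when the input is not already C-contiguous. A minimal sketch, with `Acontig` as an illustrative name:

```python
import numpy as np

A = np.arange(16, dtype=np.int64).reshape(4, 4)
Asub = A[:3, :3]

# Copies the data into a fresh, densely packed row-major buffer.
Acontig = np.ascontiguousarray(Asub)

print(Asub.flags["C_CONTIGUOUS"])     # False
print(Acontig.flags["C_CONTIGUOUS"])  # True
print(Acontig.strides)                # (24, 8): no gaps between rows
print(np.shares_memory(A, Acontig))   # False: this is a copy, not a view
```

The trade-off is that the result no longer aliases A, so writes to `Acontig` are not reflected in the original array.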