# Numpy - multidimensional data arrays¶

Modified from J.R. Johansson, http://dml.riken.jp/~rob/

## Introduction¶

The numpy package (module) is used in almost all numerical computation using Python. It provides high-performance vector, matrix and higher-dimensional data structures for Python. It is implemented in C and Fortran so when calculations are vectorized (formulated with vectors and matrices), performance is very good.

To use numpy, import the module first, for example:

In [1]:
import numpy as np
np.__version__

Out[1]:
'1.9.1'

## Creating numpy arrays¶

There are a number of ways to initialize new numpy arrays, for example:

• from a Python list or tuples
• using functions that are dedicated to generating numpy arrays, such as arange, linspace, etc.
• by reading data from files

### From lists¶

To create new vector and matrix arrays from Python lists we can use the array function.

In [2]:
# a vector: the argument to the array function is a Python list
v = np.array([1,2,3,4])
v

Out[2]:
array([1, 2, 3, 4])
In [3]:
# a matrix: the argument to the array function is a nested Python list
M = np.array([[1, 2], [3, 4]])
M.dtype

Out[3]:
dtype('int64')

The v and M objects are both of the type ndarray that the numpy module provides.

In [4]:
(type(v), type(M))

Out[4]:
(numpy.ndarray, numpy.ndarray)

The difference between the v and M arrays is only their shapes. We can get information about the shape of an array by using the ndarray.shape property.

In [5]:
v.shape

Out[5]:
(4,)
In [6]:
M.shape

Out[6]:
(2, 2)

The number of elements in the array is available through the ndarray.size property:

In [7]:
M.size

Out[7]:
4

So far the numpy.ndarray looks awfully much like a Python list (or nested list). Why not simply use Python lists for computations instead of creating a new array type?

There are several reasons:

• Python lists are very general. Each element can be any kind of object. They are dynamically typed. They do not support mathematical functions such as matrix and dot multiplications, etc. Implementing such functions for Python lists would not be very efficient because of the dynamic typing.
• Numpy arrays are statically typed and homogeneous. The type of the elements is determined when the array is created.
• Numpy arrays are memory efficient and element access is fast.
• Because of the static typing, fast implementations of mathematical functions such as multiplication and addition of numpy arrays can be written in a compiled language (C and Fortran are used).
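
The points above can be seen in a few lines. This is a minimal sketch with illustrative variable names that are not part of the original notebook: multiplying a list by 2 repeats it, multiplying an array by 2 is arithmetic, and mixed numeric input is coerced to one common element type.

```python
import numpy as np

values = [1, 2, 3]          # a plain Python list
arr = np.array(values)      # a homogeneous numpy array

doubled_list = values * 2   # list repetition, not arithmetic
doubled_arr = arr * 2       # element-wise multiplication

# mixing ints and floats gives one common element type for the whole array
mixed = np.array([1, 2.5, 3])

print(doubled_list)
print(doubled_arr)
print(mixed.dtype)
```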

Using the dtype (data type) property of an ndarray, we can see the type of an array:

In [8]:
M.dtype

Out[8]:
dtype('int64')

We get an error if we try to assign a value of an uncastable type to an element in a numpy array:

In [9]:
M[0, 0] = "hello"

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-9-ca39caffd164> in <module>()
----> 1 M[0, 0] = "hello"

ValueError: invalid literal for long() with base 10: 'hello'

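But beware: a value that can be cast to the array's type is converted silently, so assigning a float to an integer array discards the fractional part: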

In [ ]:
M[0, 0] = 1.2345
M[0, 0]
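
A minimal sketch of this pitfall, using a fresh integer array so that it can be run on its own. The assignment raises no error; the stored value is simply the integer part.

```python
import numpy as np

M = np.array([[1, 2], [3, 4]])   # integer array
M[0, 0] = 1.2345                 # the float is cast to the array's integer type

print(M[0, 0])                   # the fractional part has been dropped
print(M.dtype)                   # the array is still an integer array
```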


If we want, we can explicitly define the data type of the array during creation, using the dtype keyword argument:

In [ ]:
M = np.array([[1, 2], [3, 4]], dtype=complex)  # the np.complex alias has been removed from recent numpy releases
M


Common types that can be used with dtype are: int, float, complex, bool, object, etc.

We can also explicitly define the bit size of the data types, for example: int64, int16, float128, complex128 (float128 is only available on some platforms).
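
As a rough illustration of what the bit size means in practice, the sketch below creates small arrays with a few explicitly sized types and prints the bytes per element and the total size. The choice of types and the array length are arbitrary.

```python
import numpy as np

for name in ("int16", "int64", "float32", "complex128"):
    a = np.zeros(4, dtype=name)
    # itemsize is the number of bytes per element, nbytes the total for the array
    print(name, a.itemsize, a.nbytes)
```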

### Using array-generating functions¶

For larger arrays it is impractical to initialize the data manually, using explicit Python lists. Instead we can use one of the many functions in numpy that generate arrays of different forms. Some of the more common ones are:

#### arange¶

In [10]:
x = np.arange(10) # creates a range, arguments: [start=0], stop, [step=1]
x

Out[10]:
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [11]:
x = np.arange(2, -1, -0.5)
x

Out[11]:
array([ 2. ,  1.5,  1. ,  0.5,  0. , -0.5])

Notice that the interval is half-open: [start, stop)

#### linspace and logspace¶

In [12]:
np.linspace(0, 10, 11) # using linspace, both end points ARE included

Out[12]:
array([  0.,   1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.,  10.])
In [13]:
np.logspace(0, 10, 11, base=10)

Out[13]:
array([  1.00000000e+00,   1.00000000e+01,   1.00000000e+02,
1.00000000e+03,   1.00000000e+04,   1.00000000e+05,
1.00000000e+06,   1.00000000e+07,   1.00000000e+08,
1.00000000e+09,   1.00000000e+10])
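
The difference between the half-open arange interval and the closed linspace interval is easy to check side by side. The sketch below uses arbitrary example values, and also shows that logspace samples are powers of the base taken at linspace exponents.

```python
import numpy as np

a = np.arange(0, 1, 0.25)    # stop value 1 is excluded
b = np.linspace(0, 1, 5)     # both end points included, 5 samples in total
print(a)
print(b)

# logspace(start, stop, n) is base ** linspace(start, stop, n)
c = np.logspace(0, 2, 3)
print(np.allclose(c, 10.0 ** np.linspace(0, 2, 3)))
```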

#### random data¶

In [14]:
from numpy import random

In [15]:
random.rand(5, 5) # each element is drawn from the uniform distribution on [0, 1)

Out[15]:
array([[ 0.55145756,  0.66399271,  0.23810615,  0.10627829,  0.46366511],
[ 0.12843482,  0.48869407,  0.65794146,  0.13472381,  0.02173146],
[ 0.97220006,  0.80735363,  0.70041631,  0.74411871,  0.89026678],
[ 0.88920258,  0.92238375,  0.13881647,  0.24055108,  0.68132384],
[ 0.7308841 ,  0.7221326 ,  0.7609473 ,  0.46243407,  0.41222076]])

The standard normal distribution is available as random.randn

#### diag¶

In [16]:
x = np.diag([1, 2, 3]) # specifying the diagonal of an otherwise zero matrix
x

Out[16]:
array([[1, 0, 0],
[0, 2, 0],
[0, 0, 3]])
In [17]:
np.diag(x) # and back again

Out[17]:
array([1, 2, 3])
In [18]:
np.diag([1, 2, 3], k=1) # diagonal with offset from the main diagonal

Out[18]:
array([[0, 1, 0, 0],
[0, 0, 2, 0],
[0, 0, 0, 3],
[0, 0, 0, 0]])

#### zeros and ones¶

In [19]:
np.zeros(3, dtype=int)

Out[19]:
array([0, 0, 0])
In [20]:
np.ones((3, 3), dtype=float)

Out[20]:
array([[ 1.,  1.,  1.],
[ 1.,  1.,  1.],
[ 1.,  1.,  1.]])

## File I/O¶

### Comma-separated values (CSV)¶

A very common file format for data files is the comma-separated values (CSV) format. To read data from such files into Numpy arrays we can use the numpy.genfromtxt function. For example,

In [21]:
!head stockholm_td_adj.dat




In [22]:
data = np.genfromtxt('stockholm_td_adj.dat')

In [23]:
data.shape

Out[23]:
(77431, 7)
In [24]:
import matplotlib.pyplot as plt
# Turning on inline plots -- just for use in ipython notebooks.
%matplotlib inline
plt.plot(data[:100, 3])
plt.title('Temperatures in Stockholm')
plt.xlabel('day index')  # the x axis is the row index of the first 100 records
plt.ylabel('temperature (C)');


Using numpy.savetxt we can store a Numpy array to a file in CSV format:

In [25]:
M = np.random.rand(3, 3)
M

Out[25]:
array([[ 0.34330501,  0.29104987,  0.0532824 ],
[ 0.16291446,  0.40062912,  0.89767174],
[ 0.28197415,  0.88723002,  0.57370789]])
In [26]:
np.savetxt("random-matrix.csv", M)

In [27]:
!cat random-matrix.csv

3.433050072650738471e-01 2.910498745436623791e-01 5.328240041943832495e-02
1.629144620841614932e-01 4.006291159271071489e-01 8.976717418854102126e-01
2.819741453096537009e-01 8.872300177031857693e-01 5.737078875182647364e-01

In [28]:
np.savetxt("random-matrix.csv", M, delimiter=',', fmt='%.5f') # fmt specifies the format

!cat random-matrix.csv

0.34331,0.29105,0.05328
0.16291,0.40063,0.89767
0.28197,0.88723,0.57371


### Numpy's native file format¶

Useful when storing and reading back numpy array data. Use the functions numpy.save and numpy.load:

In [29]:
np.save("random-matrix.npy", M)

!file random-matrix.npy

random-matrix.npy: data

In [30]:
np.load("random-matrix.npy")

Out[30]:
array([[ 0.34330501,  0.29104987,  0.0532824 ],
[ 0.16291446,  0.40062912,  0.89767174],
[ 0.28197415,  0.88723002,  0.57370789]])
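
To compare the two formats, here is a hedged round-trip sketch. It writes to a temporary directory (an assumption made so the example does not touch the working directory) and reads both files back. The text file only keeps the digits that fmt asks for, while the .npy file restores the array exactly.

```python
import os
import tempfile
import numpy as np

M = np.random.rand(3, 3)
tmpdir = tempfile.mkdtemp()                        # scratch directory for the demo
csv_path = os.path.join(tmpdir, "random-matrix.csv")
npy_path = os.path.join(tmpdir, "random-matrix.npy")

np.savetxt(csv_path, M, delimiter=",", fmt="%.5f")
np.save(npy_path, M)

from_csv = np.genfromtxt(csv_path, delimiter=",")  # the delimiter must match the file
from_npy = np.load(npy_path)

print(np.allclose(from_csv, M, atol=1e-5))  # text copy agrees to about 5 decimals
print(np.array_equal(from_npy, M))          # binary copy is exact
```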

## More properties of the numpy arrays¶

In [31]:
print(M.dtype)
M.itemsize # bytes per element

float64

Out[31]:
8
In [32]:
M.nbytes # number of bytes

Out[32]:
72
In [33]:
M.ndim # number of dimensions

Out[33]:
2

## Manipulating arrays¶

### Indexing¶

We can index elements in an array using the square bracket and indices:

In [34]:
# v is a vector, and has only one dimension, taking one index
v[0]

Out[34]:
1
In [35]:
# M is a matrix, or a 2 dimensional array, taking two indices
M[1, 1] # same as M[1][1]

Out[35]:
0.40062911592710715

If we omit an index of a multidimensional array, it returns the whole row (or, in general, an (N-1)-dimensional array):

In [36]:
M

Out[36]:
array([[ 0.34330501,  0.29104987,  0.0532824 ],
[ 0.16291446,  0.40062912,  0.89767174],
[ 0.28197415,  0.88723002,  0.57370789]])
In [37]:
M[1]

Out[37]:
array([ 0.16291446,  0.40062912,  0.89767174])

The same thing can be achieved by using : instead of an index:

In [38]:
M[1, :] # second row

Out[38]:
array([ 0.16291446,  0.40062912,  0.89767174])
In [39]:
M[:, 1] # second column

Out[39]:
array([ 0.29104987,  0.40062912,  0.88723002])

We can assign new values to elements in an array using indexing:

In [40]:
M[0, 0] = 1

In [41]:
M

Out[41]:
array([[ 1.        ,  0.29104987,  0.0532824 ],
[ 0.16291446,  0.40062912,  0.89767174],
[ 0.28197415,  0.88723002,  0.57370789]])
In [42]:
# also works for rows and columns
M[1, :] = 0
M[:, 2] = -1

In [43]:
M

Out[43]:
array([[ 1.        ,  0.29104987, -1.        ],
[ 0.        ,  0.        , -1.        ],
[ 0.28197415,  0.88723002, -1.        ]])
In [44]:
# or everything
M[:] = 42
M

Out[44]:
array([[ 42.,  42.,  42.],
[ 42.,  42.,  42.],
[ 42.,  42.,  42.]])

### Index slicing¶

Index slicing is the technical name for the syntax M[lower:upper:step] to extract part of an array:

In [45]:
A = np.array([1, 2, 3, 4, 5])
A

Out[45]:
array([1, 2, 3, 4, 5])
In [46]:
A[1:3]

Out[46]:
array([2, 3])

Array slices are views into the original array: if a slice is assigned new values, the original array from which the slice was extracted is modified (unlike a slice of a Python list, which is always a copy):

In [47]:
A[1:3] = [-2, -3]

A

Out[47]:
array([ 1, -2, -3,  4,  5])
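
The contrast with lists is easiest to see when the slice is stored in its own variable first. In this small sketch (the variable names are illustrative), writing through the array slice changes the original array, while writing to the list slice does not change the original list.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5])
arr_slice = arr[1:3]        # a view into arr, no data is copied
arr_slice[0] = -2
print(arr)                  # arr[1] has changed as well

lst = [1, 2, 3, 4, 5]
lst_slice = lst[1:3]        # a new, independent list
lst_slice[0] = -2
print(lst)                  # the original list is unchanged
```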

We can omit any of the three parameters in M[lower:upper:step]:

In [48]:
A[::] # lower, upper, step all take the default values

Out[48]:
array([ 1, -2, -3,  4,  5])
In [49]:
A[::2] # step is 2, lower and upper default to the beginning and end of the array

Out[49]:
array([ 1, -3,  5])
In [50]:
A[:3] # first three elements

Out[50]:
array([ 1, -2, -3])
In [51]:
A[3:] # elements from index 3

Out[51]:
array([4, 5])

Negative indices count from the end of the array (positive indices from the beginning):

In [52]:
A = np.array([1, 2, 3, 4, 5])

In [53]:
A[-1] # the last element in the array

Out[53]:
5
In [54]:
A[-3:] # the last three elements

Out[54]:
array([3, 4, 5])

Index slicing works exactly the same way for multidimensional arrays:

In [55]:
A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
A

Out[55]:
array([[ 0,  1,  2,  3,  4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]])
In [56]:
# a block from the original array
A[1:4, 1:4]

Out[56]:
array([[11, 12, 13],
[21, 22, 23],
[31, 32, 33]])
In [57]:
# strides
A[::2, ::2]

Out[57]:
array([[ 0,  2,  4],
[20, 22, 24],
[40, 42, 44]])

### Fancy indexing¶

Fancy indexing is the name for when an array or list is used in place of an index:

In [58]:
row_indices = [3, 2, 1]
A[row_indices]

Out[58]:
array([[30, 31, 32, 33, 34],
[20, 21, 22, 23, 24],
[10, 11, 12, 13, 14]])
In [59]:
col_indices = [1, 2, -1] # remember, index -1 means the last element
A[row_indices, col_indices]

Out[59]:
array([31, 22, 14])
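
Note that two index lists are paired up element by element, so the result above is three single elements, not a 3x3 block. If the block of all row and column combinations is wanted, one option is the numpy helper np.ix_. The sketch below rebuilds the same array so it can be run on its own.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
rows = [3, 2, 1]
cols = [1, 2, -1]

pairs = A[rows, cols]            # elements (3, 1), (2, 2) and (1, -1)
block = A[np.ix_(rows, cols)]    # every combination of rows and cols, shape (3, 3)

print(pairs)
print(block)
```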

We can also use index masks: if the index mask is a Numpy array of data type bool, then an element is selected (True) or not (False) depending on the value of the index mask at the position of each element:

In [60]:
B = np.arange(5)
B

Out[60]:
array([0, 1, 2, 3, 4])
In [61]:
row_mask = np.array([True, False, True, False, False])
B[row_mask]

Out[61]:
array([0, 2])
In [62]:
# same thing
row_mask = np.array([1, 0, 1, 0, 0], dtype=bool)
B[row_mask]

Out[62]:
array([0, 2])

This feature is very useful to conditionally select elements from an array, using for example comparison operators:

In [63]:
x = np.arange(0, 10, 0.5)
x

Out[63]:
array([ 0. ,  0.5,  1. ,  1.5,  2. ,  2.5,  3. ,  3.5,  4. ,  4.5,  5. ,
5.5,  6. ,  6.5,  7. ,  7.5,  8. ,  8.5,  9. ,  9.5])
In [64]:
mask = (5 < x) & (x <= 7.5)

mask

Out[64]:
array([False, False, False, False, False, False, False, False, False,
False, False,  True,  True,  True,  True,  True, False, False,
False, False], dtype=bool)
In [65]:
x[mask]

Out[65]:
array([ 5.5,  6. ,  6.5,  7. ,  7.5])

## Functions for extracting data from arrays and creating arrays¶

### where¶

The index mask can be converted to position indices using the where function:

In [66]:
indices = np.where(mask)

indices

Out[66]:
(array([11, 12, 13, 14, 15]),)
In [67]:
x[indices] # this indexing is equivalent to the fancy indexing x[mask]

Out[67]:
array([ 5.5,  6. ,  6.5,  7. ,  7.5])

### diag¶

With the diag function we can also extract the diagonal and subdiagonals of an array:

In [68]:
np.diag(A)

Out[68]:
array([ 0, 11, 22, 33, 44])
In [69]:
np.diag(A, -1)

Out[69]:
array([10, 21, 32, 43])

### take (for completeness' sake)¶

The take function is similar to fancy indexing described above:

In [70]:
v2 = np.arange(-3, 3)
v2

Out[70]:
array([-3, -2, -1,  0,  1,  2])
In [71]:
row_indices = [1, 3, 5]
v2[row_indices] # fancy indexing

Out[71]:
array([-2,  0,  2])
In [72]:
v2.take(row_indices)

Out[72]:
array([-2,  0,  2])

But take also works on lists and other objects:

In [73]:
np.take([-3, -2, -1,  0,  1,  2], row_indices)

Out[73]:
array([-2,  0,  2])

### choose¶

Constructs an array by picking elements from several arrays:

In [74]:
which = [1, 0, 1, 0]
choices = [[-2, -2, -2, -2], [5, 5, 5, 5]]
np.choose(which, choices)

Out[74]:
array([ 5, -2,  5, -2])
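
The selection rule may be easier to read when written out by hand: position i of the result is taken from choices[which[i]][i]. The sketch below checks this against np.choose for the same inputs as above; the name by_hand is illustrative.

```python
import numpy as np

which = [1, 0, 1, 0]
choices = [[-2, -2, -2, -2], [5, 5, 5, 5]]

picked = np.choose(which, choices)
# the same rule spelled out: position i takes its value from choices[which[i]][i]
by_hand = [choices[w][i] for i, w in enumerate(which)]

print(picked)
print(by_hand)
```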

## Linear algebra¶

Vectorizing code is the key to writing efficient numerical calculations with Python/Numpy. That means that as much as possible of a program should be formulated in terms of matrix and vector operations, like matrix-matrix multiplication.

### Scalar-array operations¶

We can use the usual arithmetic operators to multiply, add, subtract, and divide arrays with scalar numbers.

In [75]:
v1 = np.arange(5)

In [76]:
v1 * 2

Out[76]:
array([0, 2, 4, 6, 8])
In [77]:
v1 + 2

Out[77]:
array([2, 3, 4, 5, 6])
In [78]:
A * 2, A + 2, A / 2, A % 2

Out[78]:
(array([[ 0,  2,  4,  6,  8],
[20, 22, 24, 26, 28],
[40, 42, 44, 46, 48],
[60, 62, 64, 66, 68],
[80, 82, 84, 86, 88]]), array([[ 2,  3,  4,  5,  6],
[12, 13, 14, 15, 16],
[22, 23, 24, 25, 26],
[32, 33, 34, 35, 36],
[42, 43, 44, 45, 46]]), array([[ 0,  0,  1,  1,  2],
[ 5,  5,  6,  6,  7],
[10, 10, 11, 11, 12],
[15, 15, 16, 16, 17],
[20, 20, 21, 21, 22]]), array([[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0],
[0, 1, 0, 1, 0]]))

### Element-wise array-array operations¶

When we add, subtract, multiply and divide arrays with each other, the default behaviour is element-wise operations:

In [79]:
A * A # element-wise multiplication

Out[79]:
array([[   0,    1,    4,    9,   16],
[ 100,  121,  144,  169,  196],
[ 400,  441,  484,  529,  576],
[ 900,  961, 1024, 1089, 1156],
[1600, 1681, 1764, 1849, 1936]])
In [80]:
v1 * v1

Out[80]:
array([ 0,  1,  4,  9, 16])

If we multiply arrays with compatible shapes, we get an element-wise multiplication of each row:

In [81]:
(A.shape, v1.shape)

Out[81]:
((5, 5), (5,))
In [82]:
(A, v1, A * v1)

Out[82]:
(array([[ 0,  1,  2,  3,  4],
[10, 11, 12, 13, 14],
[20, 21, 22, 23, 24],
[30, 31, 32, 33, 34],
[40, 41, 42, 43, 44]]),
array([0, 1, 2, 3, 4]),
array([[  0,   1,   4,   9,  16],
[  0,  11,  24,  39,  56],
[  0,  21,  44,  69,  96],
[  0,  31,  64,  99, 136],
[  0,  41,  84, 129, 176]]))
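
In the result above, element j of v1 scales column j of A, because the one-dimensional array is matched against the last axis. A hedged sketch of how to scale the rows instead is to give v1 an explicit second axis with np.newaxis; the names by_columns and by_rows are illustrative.

```python
import numpy as np

A = np.array([[n + m * 10 for n in range(5)] for m in range(5)])
v1 = np.arange(5)

by_columns = A * v1                  # v1[j] scales column j (the case shown above)
by_rows = A * v1[:, np.newaxis]      # v1[i] scales row i

print(by_columns[1])   # row 1 of A multiplied element-wise by v1
print(by_rows[1])      # row 1 of A multiplied by the single value v1[1]
```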

### Matrix algebra¶

What about matrix multiplication? There are two ways. We can either use the dot function, which applies a matrix-matrix, matrix-vector, or inner vector multiplication to its two arguments:

In [83]:
np.dot(A, A)

Out[83]:
array([[ 300,  310,  320,  330,  340],
[1300, 1360, 1420, 1480, 1540],
[2300, 2410, 2520, 2630, 2740],
[3300, 3460, 3620, 3780, 3940],
[4300, 4510, 4720, 4930, 5140]])
In [84]:
(np.dot(A, v1), np.dot(v1, A))

Out[84]:
(array([ 30, 130, 230, 330, 430]), array([300, 310, 320, 330, 340]))
In [85]:
np.dot(v1, v1)

Out[85]:
30

Alternatively, we can cast the array objects to the type matrix. This changes the behavior of the standard arithmetic operators +, -, * to use matrix algebra.

In [86]:
M = np.matrix(A)
v = np.matrix(v1).T # make it a column vector

In [87]:
v

Out[87]:
matrix([[0],
[1],
[2],
[3],
[4]])
In [88]:
M * M

Out[88]:
matrix([[ 300,  310,  320,  330,  340],
[1300, 1360, 1420, 1480, 1540],
[2300, 2410, 2520, 2630, 2740],
[3300, 3460, 3620, 3780, 3940],
[4300, 4510, 4720, 4930, 5140]])
In [89]:
M * v

Out[89]:
matrix([[ 30],
[130],
[230],
[330],
[430]])
In [90]:
# inner product
v.T * v

Out[90]:
matrix([[30]])
In [91]:
# with matrix objects, standard matrix algebra applies
v + M * v

Out[91]:
matrix([[ 30],
[131],
[232],
[333],
[434]])

If we try to add, subtract or multiply objects with incompatible shapes we get an error:

In [92]:
v = np.matrix([1,2,3,4,5,6]).T
(M.shape, v.shape)

Out[92]:
((5, 5), (6, 1))
In [93]:
M * v

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-93-995fb48ad0cc> in <module>()
----> 1 M * v

/Users/gormanky/.venvs/ipython/lib/python2.7/site-packages/numpy/matrixlib/defmatrix.pyc in __mul__(self, other)
339         if isinstance(other, (N.ndarray, list, tuple)) :
340             # This promotes 1-D vectors to row vectors
--> 341             return N.dot(self, asmatrix(other))
342         if isscalar(other) or not hasattr(other, '__rmul__') :
343             return N.dot(self, other)

ValueError: shapes (5,5) and (6,1) not aligned: 5 (dim 1) != 6 (dim 0)

See also the related functions: inner, outer, cross, kron, tensordot. Try for example help(kron).

### Array/Matrix transformations¶

Above we have used the .T to transpose the matrix object v. We could also have used the transpose function to accomplish the same thing.

Other mathematical functions that transform matrix objects are:

In [153]:
C = np.matrix([[1j, 2j], [3j, 4j]])
C

Out[153]:
matrix([[ 0.+1.j,  0.+2.j],
[ 0.+3.j,  0.+4.j]])
In [154]:
np.conjugate(C)

Out[154]:
matrix([[ 0.-1.j,  0.-2.j],
[ 0.-3.j,  0.-4.j]])

Hermitian conjugate: transpose + conjugate

In [155]:
C.H

Out[155]:
matrix([[ 0.-1.j,  0.-3.j],
[ 0.-2.j,  0.-4.j]])

We can extract the real and imaginary parts of complex-valued arrays using real and imag:

In [156]:
C.real

Out[156]:
matrix([[ 0.,  0.],
[ 0.,  0.]])
In [157]:
C.imag

Out[157]:
matrix([[ 1.,  2.],
[ 3.,  4.]])

Or the complex argument and absolute value

In [158]:
np.angle(C + 1) # heads up MATLAB Users, angle is used instead of arg

Out[158]:
array([[ 0.78539816,  1.10714872],
[ 1.24904577,  1.32581766]])
In [159]:
np.abs(C)

Out[159]:
matrix([[ 1.,  2.],
[ 3.,  4.]])

### Matrix computations¶

#### Inverse¶

In [160]:
np.linalg.inv(C) # equivalent to C.I

Out[160]:
matrix([[ 0.+2.j ,  0.-1.j ],
[ 0.-1.5j,  0.+0.5j]])
In [161]:
C.I * C

Out[161]:
matrix([[  1.00000000e+00+0.j,   4.44089210e-16+0.j],
[  0.00000000e+00+0.j,   1.00000000e+00+0.j]])

#### Determinant¶

In [162]:
np.linalg.det(C)

Out[162]:
(2.0000000000000004+0j)
In [163]:
np.linalg.det(C.I)

Out[163]:
(0.50000000000000011+0j)
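
The small deviations from exact values in the outputs above are floating-point rounding. A minimal sketch of how such results can be checked, using a plain ndarray and np.dot instead of the matrix type (a choice made here for illustration), compares against the expected values with np.allclose:

```python
import numpy as np

C = np.array([[1j, 2j], [3j, 4j]])
C_inv = np.linalg.inv(C)

# the inverse times C should be the identity, up to rounding error
print(np.allclose(np.dot(C_inv, C), np.eye(2)))

# det(C^-1) should equal 1 / det(C)
print(np.allclose(np.linalg.det(C_inv), 1.0 / np.linalg.det(C)))
```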

### Data processing¶

Often it is useful to store datasets in Numpy arrays. Numpy provides a number of functions to calculate statistics of datasets in arrays.

For example, let's calculate some properties of the Stockholm temperature dataset used above.

In [164]:
# reminder, the temperature dataset is stored in the data variable:
data.shape

Out[164]:
(77431, 7)

#### mean¶

In [165]:
# the temperature data is in column 3
np.mean(data[:, 3])

Out[165]:
6.1971096847515854

The daily mean temperature in Stockholm over the last 200 years or so has been about 6.2 C.

#### standard deviations and variance¶

In [166]:
np.std(data[:, 3]), np.var(data[:, 3])

Out[166]:
(8.2822716213405734, 68.596023209663414)
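
By default np.std and np.var divide by the number of samples N (the population formulas). The sketch below uses a small made-up dataset, not the Stockholm data, to show that the variance is the square of the standard deviation and that the optional ddof argument switches to the N - 1 divisor.

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])  # made-up sample values

print(np.mean(x))            # 5.0
print(np.std(x))             # 2.0, divides by N
print(np.var(x))             # 4.0, the square of the standard deviation
print(np.var(x, ddof=1))     # divides by N - 1 instead
```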

#### min and max¶

In [167]:
# lowest daily average temperature
data[:, 3].min()

Out[167]:
-25.800000000000001
In [168]:
# highest daily average temperature
data[:, 3].max()

Out[168]:
28.300000000000001

#### sum, prod, and trace¶

In [169]:
d = np.arange(0, 10)
# sum up all elements
d.sum()

Out[169]:
45
In [170]:
# product of all elements
np.prod(d + 1)

Out[170]:
3628800
In [171]:
# cumulative sum
np.cumsum(d)

Out[171]:
array([ 0,  1,  3,  6, 10, 15, 21, 28, 36, 45])
In [172]:
# cumulative product
np.cumprod(d + 1)

Out[172]:
array([      1,       2,       6,      24,     120,     720,    5040,
40320,  362880, 3628800])
In [173]:
(np.trace(A), np.diag(A).sum())

Out[173]:
(110, 110)

### Calculations with higher-dimensional data¶

When functions such as min, max, etc., are applied to multidimensional arrays, it is sometimes useful to apply the calculation to the entire array, and sometimes only on a row or column basis. Using the axis argument we can specify how these functions should behave:

In [174]:
m = np.random.rand(3, 3)
m

Out[174]:
array([[ 0.20022724,  0.6398153 ,  0.69120939],
[ 0.02217061,  0.88314449,  0.17230136],
[ 0.34125876,  0.85354688,  0.55107958]])
In [175]:
# global max
m.max()

Out[175]:
0.88314449000001616
In [176]:
# max in each column
m.max(axis=0)

Out[176]:
array([ 0.34125876,  0.88314449,  0.69120939])
In [177]:
# max in each row
m.max(axis=1)

Out[177]:
array([ 0.69120939,  0.88314449,  0.85354688])

Many other functions and methods in the array and matrix classes accept the same (optional) axis keyword argument.
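
For instance, sum takes the same argument. In this small sketch with a fixed 2x3 array (chosen so the results are easy to verify by hand), axis=0 collapses the rows and gives one value per column, while axis=1 collapses the columns and gives one value per row.

```python
import numpy as np

m = np.array([[1, 2, 3],
              [4, 5, 6]])

total = m.sum()               # all elements: 21
column_sums = m.sum(axis=0)   # one value per column: 5, 7, 9
row_sums = m.sum(axis=1)      # one value per row: 6, 15

print(total, column_sums, row_sums)
```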

## Reshaping, resizing and stacking arrays¶

The shape of a Numpy array can usually be modified without copying the underlying data, which makes it a fast operation even for large arrays.

In [178]:
A2 = np.array([[1, 2], [3, 4]])  # a small 2x2 array, matching the outputs below
A2

Out[178]:
array([[1, 2],
[3, 4]])
In [179]:
B = A2.reshape((2, -1))
B

Out[179]:
array([[1, 2],
[3, 4]])
In [180]:
B = A2.reshape((1, -1))
B

Out[180]:
array([[1, 2, 3, 4]])

We can also use the function flatten to make a higher-dimensional array into a vector.

In [181]:
B = A2.flatten()
B

Out[181]:
array([1, 2, 3, 4])
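
One practical difference between the two is that reshape returns a view when it can, whereas flatten always returns a copy. A minimal sketch, assuming a freshly created (contiguous) array so that reshape does not need to copy:

```python
import numpy as np

A2 = np.array([[1, 2], [3, 4]])
view = A2.reshape((1, -1))   # shares memory with A2 here, since A2 is contiguous
flat = A2.flatten()          # always a fresh copy

view[0, 0] = 100             # visible through A2
flat[1] = -1                 # not visible through A2

print(A2)
print(np.may_share_memory(A2, view), np.may_share_memory(A2, flat))
```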

## Adding a new dimension: newaxis¶

With newaxis, we can insert new dimensions in an array, for example converting a vector to a column or row matrix:

In [182]:
v = np.array([1, 2, 3])

In [183]:
v.shape

Out[183]:
(3,)
In [184]:
# make a column matrix of the vector v
v[:, np.newaxis]

Out[184]:
array([[1],
[2],
[3]])
In [185]:
# column matrix
v[:, np.newaxis].shape

Out[185]:
(3, 1)
In [186]:
# row matrix
v[np.newaxis, :].shape

Out[186]:
(1, 3)
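
One common use of these shapes is to combine a column and a row through broadcasting. As a sketch, multiplying the (3, 1) column by the (1, 3) row gives a (3, 3) table of all pairwise products, which matches np.outer:

```python
import numpy as np

v = np.array([1, 2, 3])
column = v[:, np.newaxis]    # shape (3, 1)
row = v[np.newaxis, :]       # shape (1, 3)

table = column * row         # broadcasts to shape (3, 3)
print(table)
print(np.array_equal(table, np.outer(v, v)))
```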

## Stacking and repeating arrays¶

Using the functions repeat, tile, vstack, hstack, and concatenate we can create larger vectors and matrices from smaller ones:

### tile and repeat¶

In [187]:
a = np.array([[1, 2], [3, 4]])

In [188]:
# repeat each element 3 times
np.repeat(a, 3)

Out[188]:
array([1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4])
In [189]:
# tile the matrix 3 times
np.tile(a, 3)

Out[189]:
array([[1, 2, 1, 2, 1, 2],
[3, 4, 3, 4, 3, 4]])

### concatenate¶

In [190]:
b = np.array([[5, 6]])

In [191]:
np.concatenate((a, b), axis=0)

Out[191]:
array([[1, 2],
[3, 4],
[5, 6]])
In [192]:
np.concatenate((a, b.T), axis=1)

Out[192]:
array([[1, 2, 5],
[3, 4, 6]])

### hstack and vstack¶

In [193]:
np.vstack((a, b))

Out[193]:
array([[1, 2],
[3, 4],
[5, 6]])
In [194]:
np.hstack((a, b.T))

Out[194]:
array([[1, 2, 5],
[3, 4, 6]])

## Copy and "deep copy"¶

To achieve high performance, assignments in Python usually do not copy the underlying objects. This is important, for example, when objects are passed between functions, to avoid an excessive amount of memory copying when it is not necessary (technical term: pass by reference).

In [195]:
A = np.array([[1, 2], [3, 4]])
A

Out[195]:
array([[1, 2],
[3, 4]])
In [196]:
# now B is referring to the same array data as A
B = A

In [197]:
# changing B affects A
B[0,0] = 10
B

Out[197]:
array([[10,  2],
[ 3,  4]])
In [198]:
A

Out[198]:
array([[10,  2],
[ 3,  4]])

If we want to avoid this behavior, so that we get a new, completely independent object B copied from A, then we need to do a so-called "deep copy" using the function copy:

In [199]:
A = np.array([[1, 2], [3, 4]])
B = A.copy()

In [200]:
# now, if we modify B, A is not affected
B[0, 0] = -5
(A, B)

Out[200]:
(array([[1, 2],
[3, 4]]), array([[-5,  2],
[ 3,  4]]))
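
It can help to tell three cases apart: a second name for the same array, a slice that is a new array object sharing the same data, and a copy with its own data. The sketch below uses `is` and np.may_share_memory to distinguish them; the variable names are illustrative.

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
alias = A            # same object, no new array is created
view = A[:, 0]       # new array object that shares A's data
copied = A.copy()    # new array object with independent data

print(alias is A)
print(np.may_share_memory(A, view))
print(np.may_share_memory(A, copied))
```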

## Iterating over array elements¶

Generally, we want to avoid iterating over the elements of arrays whenever we can (at all costs). The reason is that in an interpreted language like Python (or MATLAB), iterations are really slow compared to vectorized operations.

However, sometimes iterations are unavoidable. For such cases, the Python for loop is the most convenient way to iterate over an array:

In [201]:
v = np.array([1,2,3,4])

for element in v:
    print(element)

1
2
3
4

In [202]:
M = np.array([[1,2], [3,4]])

for row in M:
    print("row", row)
    for element in row:
        print(element)

('row', array([1, 2]))
1
2
('row', array([3, 4]))
3
4


## Vectorizing functions¶

As mentioned several times by now, to get good performance we should try to avoid looping over elements in our vectors and matrices, and instead use vectorized algorithms. The first step in converting a scalar algorithm to a vectorized algorithm is to make sure that the functions we write work with vector inputs.

In [203]:
def Theta(x):
    """
    Scalar implementation of the Heaviside step function.
    """
    if x >= 0:
        return 1
    else:
        return 0

In [204]:
Theta(np.array([-3, -2, -1, 0, 1, 2, 3]))

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-204-8f3eabb588dd> in <module>()
----> 1 Theta(np.array([-3, -2, -1, 0, 1, 2, 3]))

<ipython-input-203-9a0cb13d93d4> in Theta(x)
3     Scalar implemenation of the Heaviside step function.
4     """
----> 5     if x >= 0:
6         return 1
7     else:

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

OK, that didn't work because we didn't write the Theta function so that it can handle vector input...

To get a vectorized version of Theta we can use the Numpy function vectorize. In many cases it can automatically vectorize a function:

In [205]:
Theta_vec = np.vectorize(Theta)

In [206]:
Theta_vec(np.array([-3, -2, -1, 0, 1, 2, 3]))

Out[206]:
array([0, 0, 0, 1, 1, 1, 1])

We can also implement the function to accept vector input from the beginning (requires more effort but might give better performance):

In [207]:
def Theta(x):
    """
    Vector-aware implementation of the Heaviside step function.
    """
    return 1 * (x >= 0)

In [208]:
Theta(np.array([-3, -2, -1, 0, 1, 2, 3]))

Out[208]:
array([0, 0, 0, 1, 1, 1, 1])
In [209]:
# still works for scalars as well
(Theta(-1.2), Theta(2.6))

Out[209]:
(0, 1)
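
Another way to write a vector-aware step function is np.where, which selects between two values element by element. The sketch below is an alternative for illustration, not part of the original notebook, and the helper name theta_scalar is made up. It checks that np.where agrees with the np.vectorize approach, which still calls the Python function once per element.

```python
import numpy as np

def theta_scalar(x):
    """Scalar Heaviside step function (same logic as the first Theta)."""
    return 1 if x >= 0 else 0

x = np.array([-3, -2, -1, 0, 1, 2, 3])
via_vectorize = np.vectorize(theta_scalar)(x)   # loops over elements in Python
via_where = np.where(x >= 0, 1, 0)              # a single array expression

print(via_where)
print(np.array_equal(via_vectorize, via_where))
```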

## Using arrays in conditions¶

When using arrays in conditions, for example in if statements and other boolean expressions, one needs to use either any or all, which require that any or all elements in the array evaluate to True, respectively:

In [210]:
M

Out[210]:
array([[1, 2],
[3, 4]])
In [211]:
if (M > 5).any():
    print("at least one element in M is larger than 5")
else:
    print("no element in M is larger than 5")

no element in M is larger than 5

In [212]:
if (M > 5).all():
    print("all elements in M are larger than 5")
else:
    print("not all elements in M are larger than 5")

not all elements in M are larger than 5


## Type casting¶

Since Numpy arrays are statically typed, the type of an array does not change once created. But we can explicitly cast an array of some type to another using the astype method (see also the similar asarray function). This always creates a new array of the new type:

In [213]:
M.dtype

Out[213]:
dtype('int64')
In [214]:
M2 = M.astype(float)
M2

Out[214]:
array([[ 1.,  2.],
[ 3.,  4.]])
In [215]:
M2.dtype

Out[215]:
dtype('float64')
In [216]:
M3 = M.astype(bool)
M3

Out[216]:
array([[ True,  True],
[ True,  True]], dtype=bool)
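
Casting can lose information, so it is worth checking what a conversion does to the values. This minimal sketch uses made-up values to show that casting floats to int drops the fractional part, that casting to bool maps zero to False and everything else to True, and that the original array is left untouched.

```python
import numpy as np

M2 = np.array([[1.9, -2.7], [3.5, 0.0]])   # made-up float values
M_int = M2.astype(int)     # fractional parts are dropped (truncation toward zero)
M_bool = M2.astype(bool)   # zero becomes False, everything else True

M_int[0, 0] = 99           # astype returned a new array, so M2 is unchanged

print(M_int)
print(M_bool)
print(M2[0, 0])
```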