Introduction to NumPy¶

Tamás Gál (tamas.gal@fau.de)

The latest version of this notebook is available at https://github.com/escape2020/school2021

In [1]:

import numpy as np
import sys

print(f"Python version:  {sys.version}\n"
      f"NumPy version:   {np.__version__}")

rng = np.random.default_rng(42)  # initialise our random number generator

Python version:  3.9.4 | packaged by conda-forge | (default, May 10 2021, 22:10:52) 
[Clang 11.1.0 ]
NumPy version:   1.20.3

In [2]:

def describe(np_obj):
    """Print some information about a NumPy object"""
    print("object type: {0}\n"
          "size: {o.size}\n"
          "ndim: {o.ndim}\n"
          "shape: {o.shape}\n"
          "dtype: {o.dtype}"
          .format(type(np_obj), o=np_obj))

In [3]:

from IPython.core.magic import register_line_magic

@register_line_magic
def shorterr(line):
    """Show only the exception message if one is raised."""
    try:
        output = eval(line)
    except Exception as e:
        print("\x1b[31m\x1b[1m{e.__class__.__name__}: {e}\x1b[0m".format(e=e))
    else:
        return output
    
del shorterr

The basic datastructure in NumPy: `ndarray`¶

In [4]:

a = np.array([1, 2, 3, 4, 5, 6])
a

Out[4]:

array([1, 2, 3, 4, 5, 6])

In [5]:

type(a)

Out[5]:

numpy.ndarray

Array properties¶

In [6]:

a.size  # number of elements

Out[6]:

6

In [7]:

a.ndim

Out[7]:

1

In [8]:

a.shape

Out[8]:

(6,)

In [9]:

a.dtype

Out[9]:

dtype('int64')

Multi-Dimensional Arrays¶

In [10]:

b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])
b

Out[10]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [11]:

describe(b)

object type: <class 'numpy.ndarray'>
size: 10
ndim: 2
shape: (2, 5)
dtype: int64

Array Methods¶

In [12]:

a.min(), a.max(), a.mean(), a.sum()

Out[12]:

(1, 6, 3.5, 21)

In [13]:

b

Out[13]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [14]:

b.sum()

Out[14]:

55

In [15]:

b.sum(axis=0)

Out[15]:

array([ 7,  9, 11, 13, 15])

In [16]:

b.sum(axis=1)

Out[16]:

array([15, 40])
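
The `axis` argument names the dimension that is collapsed by the reduction. A minimal sketch, re-creating `b` from above; `keepdims=True` is a standard NumPy option that the notebook itself does not use and is shown here only for illustration:

```python
import numpy as np

b = np.array([[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]])

# Summing over axis 0 collapses the rows, leaving one value per column.
col_sums = b.sum(axis=0)
print(col_sums.shape)     # (5,)

# Summing over axis 1 collapses the columns, leaving one value per row.
row_sums = b.sum(axis=1)
print(row_sums.shape)     # (2,)

# keepdims=True keeps the collapsed axis with length 1, so the result
# can still be combined with the original array.
row_sums_2d = b.sum(axis=1, keepdims=True)
print(row_sums_2d.shape)  # (2, 1)

normalised = b / row_sums_2d  # every row of `normalised` sums to 1
print(normalised.sum(axis=1))
```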

Operations with Arrays¶

In [17]:

a

Out[17]:

array([1, 2, 3, 4, 5, 6])

In [18]:

a - 42

Out[18]:

array([-41, -40, -39, -38, -37, -36])

In [19]:

a * 42 / np.pi

Out[19]:

array([13.36901522, 26.73803044, 40.10704566, 53.47606088, 66.8450761 ,
       80.21409132])

In [20]:

a**np.e, np.e**a

Out[20]:

(array([  1.        ,   6.58088599,  19.81299075,  43.30806043,
         79.43235917, 130.38703324]),
 array([  2.71828183,   7.3890561 ,  20.08553692,  54.59815003,
        148.4131591 , 403.42879349]))

In [21]:

a * a  # element-wise

Out[21]:

array([ 1,  4,  9, 16, 25, 36])

In [22]:

a @ a  # dot product; use np.dot(a, a) on Python < 3.5

Out[22]:

91

In [23]:

a

Out[23]:

array([1, 2, 3, 4, 5, 6])

In [24]:

a < 3

Out[24]:

array([ True,  True, False, False, False, False])

In [25]:

a == 4

Out[25]:

array([False, False, False,  True, False, False])

In [26]:

(a > 3) & (a < 5)  # element-wise AND via the bitwise `&` operator; the parentheses are required

Out[26]:

array([False, False, False,  True, False, False])
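
Comparisons return boolean arrays, which can be combined element by element. The sketch below re-creates `a`, shows `|` and `~` alongside `&`, and uses `np.logical_and` as an equivalent spelling; the variable names are chosen for illustration only.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5, 6])

# Parentheses are needed because & binds more tightly than < and >.
both = (a > 3) & (a < 5)
either = (a < 2) | (a > 5)
neither = ~either

print(both)     # [False False False  True False False]
print(either)   # [ True False False False False  True]
print(neither)  # [False  True  True  True  True False]

# np.logical_and is the function form of the same element-wise operation.
same = np.logical_and(a > 3, a < 5)
print(np.array_equal(both, same))  # True
```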

In [27]:

a < np.array([2, 3, 5, 2, 1, 5])

Out[27]:

array([ True,  True,  True, False, False, False])

In [28]:

np.sum(a > 2)

Out[28]:

4

Basic Indexing and Slicing¶

In [29]:

a[0]  # indexing starts at 0

Out[29]:

1

In [30]:

a[-1]  # -1 refers to the last element

Out[30]:

6

In [31]:

a[2:6:3]  # just like in Python: [start:end:step]

Out[31]:

array([3, 6])

In [32]:

a[::-1]  # reversing an array

Out[32]:

array([6, 5, 4, 3, 2, 1])

In [33]:

b[::-1]  # reverses axis 0

Out[33]:

array([[ 6,  7,  8,  9, 10],
       [ 1,  2,  3,  4,  5]])

Indexing and Slicing in Multiple Dimensions¶

In [34]:

b

Out[34]:

array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10]])

In [35]:

b[0, 2]

Out[35]:

3

In [36]:

b[0, 1:4]

Out[36]:

array([2, 3, 4])

In [37]:

b[:, 1:4]  # the `:` selects the whole axis

Out[37]:

array([[2, 3, 4],
       [7, 8, 9]])

In [38]:

b[:, 2:5:2]

Out[38]:

array([[ 3,  5],
       [ 8, 10]])

In [39]:

b[::-1, ::-1]  # reverses both axes

Out[39]:

array([[10,  9,  8,  7,  6],
       [ 5,  4,  3,  2,  1]])

Advanced Indexing¶

In [40]:

d = np.array([4, 3, 2, 5, 4, 5, 4, 4])

In [41]:

mask = np.array([True, False, False, True, False, False, True, True])
mask

Out[41]:

array([ True, False, False,  True, False, False,  True,  True])

In [42]:

d[mask]

Out[42]:

array([4, 5, 4, 4])

In [43]:

d[[1, 3, 1, 6]]

Out[43]:

array([3, 5, 3, 4])

Be careful with boolean indexing: the mask has to be a boolean array or a list of booleans.¶

In [44]:

d

Out[44]:

array([4, 3, 2, 5, 4, 5, 4, 4])

In [45]:

d[[False, True, False, False, True, False, False, True]]

Out[45]:

array([3, 4, 4])

In [46]:

d[[0, 1, 0, 0, 1, 0, 0, 1]]  # integers are used as indices, not as a mask, even though True == 1 and False == 0

Out[46]:

array([4, 3, 4, 4, 3, 4, 4, 3])

In [47]:

d[np.array([0, 1, 0, 0, 1, 0, 0, 1], dtype=bool)] 

Out[47]:

array([3, 4, 4])
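
In practice a mask is rarely typed by hand. The sketch below, which re-creates `d` and introduces a hypothetical `flags` array of 0/1 integers, contrasts the two interpretations and shows a mask derived from a condition:

```python
import numpy as np

d = np.array([4, 3, 2, 5, 4, 5, 4, 4])
flags = np.array([0, 1, 0, 0, 1, 0, 0, 1])  # hypothetical 0/1 integer flags

# Integer arrays are interpreted as positions: only d[0] and d[1] are picked.
as_indices = d[flags]
print(as_indices)      # [4 3 4 4 3 4 4 3]

# Converting to bool turns the same values into a mask.
as_mask = d[flags.astype(bool)]
print(as_mask)         # [3 4 4]

# A comparison already yields a boolean array of the right shape.
from_condition = d[d > 3]
print(from_condition)  # [4 5 4 5 4 4]
```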

The `dtype`¶

In [48]:

np.dtype

Out[48]:

numpy.dtype

In [49]:

a, a.dtype

Out[49]:

(array([1, 2, 3, 4, 5, 6]), dtype('int64'))

In [50]:

e = a * 42 / np.pi  # NumPy will choose the "right" `dtype` automatically
e, e.dtype

Out[50]:

(array([13.36901522, 26.73803044, 40.10704566, 53.47606088, 66.8450761 ,
        80.21409132]),
 dtype('float64'))

Some Basic `dtype`s¶

In [51]:

np.dtype('f')

Out[51]:

dtype('float32')

In [52]:

np.dtype('f8')

Out[52]:

dtype('float64')

In [53]:

np.dtype('i')

Out[53]:

dtype('int32')

In [54]:

np.dtype('i2')

Out[54]:

dtype('int16')

In [55]:

np.dtype('c16')

Out[55]:

dtype('complex128')

In [56]:

np.dtype('S8')  # String with a fixed length of 8

Out[56]:

dtype('S8')

Properties of `dtype`s¶

In [57]:

dt = np.dtype('>i4')

In [58]:

dt.byteorder  # endianness: '>' is big-endian, '<' is little-endian, '=' is native

Out[58]:

'>'

In [59]:

dt.itemsize

Out[59]:

4

In [60]:

dt.name

Out[60]:

'int32'

Structured `dtypes`¶

In [61]:

dt = np.dtype([('x', 'f8'), ('y', 'f8'), ('E', 'i4')])

In [62]:

dt.itemsize

Out[62]:

20

In [63]:

dt['x']

Out[63]:

dtype('float64')

In [64]:

np.dtype("i4, (3,4)f8, c8")  # three fields, second field has shape (3, 4)

Out[64]:

dtype([('f0', '<i4'), ('f1', '<f8', (3, 4)), ('f2', '<c8')])

Using `dtype`s¶

In [65]:

np.array([1, 2, 3], dtype='c8')

Out[65]:

array([1.+0.j, 2.+0.j, 3.+0.j], dtype=complex64)

In [66]:

dt = np.dtype([('x', 'f8'), ('y', 'f8'), ('E', 'i4')])

In [67]:

f = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], dtype=dt)
f

Out[67]:

array([(1., 2., 3), (4., 5., 6), (7., 8., 9)],
      dtype=[('x', '<f8'), ('y', '<f8'), ('E', '<i4')])

In [68]:

f['x']

Out[68]:

array([1., 4., 7.])

In [69]:

f['E']

Out[69]:

array([3, 6, 9], dtype=int32)

In [70]:

f[2]['y']

Out[70]:

8.0
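
Fields of a structured array behave like ordinary arrays, so they can be used in masks and modified in place. A short sketch under that assumption, re-creating `dt` and `f` from above (the name `high_energy` is illustrative only):

```python
import numpy as np

dt = np.dtype([('x', 'f8'), ('y', 'f8'), ('E', 'i4')])
f = np.array([(1, 2, 3), (4, 5, 6), (7, 8, 9)], dtype=dt)

print(dt.names)          # ('x', 'y', 'E')

# A boolean mask built from one field selects whole records.
high_energy = f[f['E'] > 3]
print(high_energy['x'])  # [4. 7.]

# Accessing a single field returns a view, so in-place changes end up in `f`.
f['y'] *= 10
print(f['y'])            # [20. 50. 80.]
```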

Helper Functions to Create Arrays¶

In [71]:

np.arange(7)

Out[71]:

array([0, 1, 2, 3, 4, 5, 6])

In [72]:

np.ones(10)

Out[72]:

array([1., 1., 1., 1., 1., 1., 1., 1., 1., 1.])

In [73]:

np.zeros(5)

Out[73]:

array([0., 0., 0., 0., 0.])

In [74]:

np.zeros((2, 4))

Out[74]:

array([[0., 0., 0., 0.],
       [0., 0., 0., 0.]])

In [75]:

np.empty(20)

Out[75]:

array([2.68156159e+154, 2.68156159e+154, 4.94065646e-323, 0.00000000e+000,
       0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
       0.00000000e+000, 0.00000000e+000, 0.00000000e+000, 0.00000000e+000,
       0.00000000e+000, 0.00000000e+000, 2.68156159e+154, 1.72723384e-077,
       2.68156159e+154, 1.72723384e-077, 9.88131292e-324, 1.39067116e-308])

In [76]:

np.eye(5)

Out[76]:

array([[1., 0., 0., 0., 0.],
       [0., 1., 0., 0., 0.],
       [0., 0., 1., 0., 0.],
       [0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 1.]])

In [77]:

np.linspace(1, 2, 10)

Out[77]:

array([1.        , 1.11111111, 1.22222222, 1.33333333, 1.44444444,
       1.55555556, 1.66666667, 1.77777778, 1.88888889, 2.        ])

In [78]:

np.ones_like(b)

Out[78]:

array([[1, 1, 1, 1, 1],
       [1, 1, 1, 1, 1]])

In [79]:

np.ones(10, dtype='i2')

Out[79]:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1], dtype=int16)

Random numbers¶

In [80]:

rng = np.random.default_rng(42)  # always create a generator with a seed!

In [81]:

rng.integers(1, 10, (2, 20))

Out[81]:

array([[1, 7, 6, 4, 4, 8, 1, 7, 2, 1, 5, 9, 7, 7, 7, 8, 5, 2, 8, 5],
       [5, 4, 2, 9, 8, 6, 4, 8, 5, 4, 5, 3, 1, 5, 8, 1, 8, 8, 3, 6]])

In [82]:

rng.random((3, 4))

Out[82]:

array([[0.75808774, 0.35452597, 0.97069802, 0.89312112],
       [0.7783835 , 0.19463871, 0.466721  , 0.04380377],
       [0.15428949, 0.68304895, 0.74476216, 0.96750973]])

In [83]:

rng.uniform(0, 5, 10)

Out[83]:

array([1.62912679, 1.85229853, 2.34777906, 0.9473568 , 0.64960753,
       2.37852463, 1.13454675, 3.34906997, 2.18575959, 4.16339098])
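
The reason for seeding is reproducibility: two generators created with the same seed yield the same sequence. A minimal sketch (the names `rng_a`, `rng_b` and `rng_c` are illustrative; no specific values are claimed for the unseeded generator):

```python
import numpy as np

# Two generators with the same seed produce identical streams.
rng_a = np.random.default_rng(42)
rng_b = np.random.default_rng(42)
first = rng_a.random(5)
second = rng_b.random(5)
print(np.array_equal(first, second))  # True

# Without a seed, fresh entropy is pulled from the operating system,
# so the values differ from run to run.
rng_c = np.random.default_rng()
unseeded = rng_c.random(5)
print(unseeded)
```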

Broadcasting¶

In [84]:

g = np.array([1, 2, 3, 4])
h = np.array([5, 6, 7, 8])
g * h  # if the shapes match, operations are usually done element-by-element

Out[84]:

array([ 5, 12, 21, 32])

In [85]:

g * 23  # as we have already seen, the shapes need not match exactly: the scalar is applied to every element

Out[85]:

array([23, 46, 69, 92])

Broadcasting rules¶

NumPy compares the shapes element-wise, starting with the trailing dimension.
Two dimensions are compatible if they are equal or if one of them is 1.
If the shapes are incompatible, a ValueError is raised (operands could not be broadcast together).
The size of a successfully broadcast result is the maximum size along each dimension of the input arrays.

Operation on two arrays with different shapes¶

A      (4d array):  5 x 1 x 4 x 1
B      (3d array):      7 x 1 x 5
Result (4d array):  5 x 7 x 4 x 5
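
The shape table above can be checked directly. The sketch below uses arrays of ones as placeholders (`a4` and `b3` are illustrative names) and `np.broadcast_shapes`, which is available from NumPy 1.20 onwards:

```python
import numpy as np

a4 = np.ones((5, 1, 4, 1))  # A (4d array): 5 x 1 x 4 x 1
b3 = np.ones((7, 1, 5))     # B (3d array):     7 x 1 x 5

# Shapes are aligned at the trailing dimension; every length-1 axis is stretched.
result = a4 + b3
print(result.shape)  # (5, 7, 4, 5)

# The resulting shape can also be computed without allocating the arrays.
predicted = np.broadcast_shapes(a4.shape, b3.shape)
print(predicted)     # (5, 7, 4, 5)

# Incompatible trailing dimensions (4 vs. 3) raise a ValueError.
try:
    np.ones(4) + np.ones(3)
    failed = False
except ValueError as err:
    failed = True
    print(err)
```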

In [86]:

arr_1 = np.array([[1, 2, 3], [4, 5, 6]])
arr_2 = np.array([[1], [2]])

print('arr_1 shape:', arr_1.shape)
print('arr_2 shape:', arr_2.shape)

arr_3 = arr_1 + arr_2
print('arr_3 shape:', arr_3.shape)

arr_3

arr_1 shape: (2, 3)
arr_2 shape: (2, 1)
arr_3 shape: (2, 3)

Out[86]:

array([[2, 3, 4],
       [6, 7, 8]])

In [87]:

i = np.arange(20).reshape(4, 5)
i

Out[87]:

array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19]])

In [88]:

describe(i)

object type: <class 'numpy.ndarray'>
size: 20
ndim: 2
shape: (4, 5)
dtype: int64

In [89]:

i * np.array([0, 1, 2, 4, 5])

Out[89]:

array([[ 0,  1,  4, 12, 20],
       [ 0,  6, 14, 32, 45],
       [ 0, 11, 24, 52, 70],
       [ 0, 16, 34, 72, 95]])

In [90]:

j = np.array([0, 10, 20, 30])
k = np.array([7, 8, 9])

In [91]:

%shorterr j+k

ValueError: operands could not be broadcast together with shapes (4,) (3,)

In [92]:

j[:, np.newaxis]  # inserts a new axis, making it two dimensional

Out[92]:

array([[ 0],
       [10],
       [20],
       [30]])

In [93]:

j[:, np.newaxis] + k

Out[93]:

array([[ 7,  8,  9],
       [17, 18, 19],
       [27, 28, 29],
       [37, 38, 39]])

Universal Functions (`ufunc`)¶

A `ufunc` is a "vectorized" wrapper for a function that takes a fixed number of scalar inputs and produces a fixed number of scalar outputs.¶

NumPy provides a bunch of ufuncs:

Math operations (add(), subtract(), square(), log10(), ...)
Trigonometric functions (sin(), cos(), tan(), deg2rad(), ...)
Bit-twiddling functions (bitwise_and(), right_shift(), ...)
Comparison functions (greater(), less_equal(), fmax(), ...)
Floating functions (isnan(), isinf(), floor(), ...)

They are all instances of np.ufunc

In [94]:

type(np.cos)  # every ufunc is an instance of np.ufunc

Out[94]:

numpy.ufunc
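
Because every ufunc is an `np.ufunc` object, it carries a few common attributes and methods in addition to being callable. A brief sketch using `np.add` and `np.multiply` (the array `x` is made up for illustration):

```python
import numpy as np

x = np.array([1, 2, 3, 4])

print(isinstance(np.add, np.ufunc))  # True
print(np.add.nin, np.add.nout)       # 2 1  (two inputs, one output)

# Calling the ufunc applies it element by element; this is what `x + 10` does.
shifted = np.add(x, 10)
print(shifted)   # [11 12 13 14]

# Binary ufuncs also provide reduce, accumulate and outer.
total = np.add.reduce(x)
running = np.add.accumulate(x)
table = np.multiply.outer(x, x)
print(total)     # 10
print(running)   # [ 1  3  6 10]
print(table.shape)  # (4, 4)
```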

Create your own `ufunc` with `np.frompyfunc(func, nin, nout)`¶

In [95]:

m = rng.integers(0, 100, 17)
m

Out[95]:

array([62, 70,  9, 31, 76, 83, 43, 80, 84, 38, 89, 28, 23, 68, 63, 13, 83])

In [96]:

def step_23(x):
    return 1 if x > 23 else 0

In [97]:

%shorterr step_23(m)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

In [98]:

ustep_23 = np.frompyfunc(step_23, 1, 1)

In [99]:

ustep_23(42)

Out[99]:

1

In [100]:

ustep_23(5)

Out[100]:

0

In [101]:

ustep_23(m)

Out[101]:

array([1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1], dtype=object)

In [102]:

ustep_23(rng.integers(0, 100, (2, 3, 4)))

Out[102]:

array([[[0, 1, 0, 1],
        [1, 1, 1, 1],
        [1, 1, 1, 1]],

       [[1, 1, 1, 0],
        [0, 1, 0, 1],
        [1, 1, 1, 1]]], dtype=object)
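
Note the `dtype=object` in the outputs above: `np.frompyfunc` always returns object arrays. The sketch below shows one way to get a numeric result back with `astype`, and, as a swapped-in alternative that the notebook does not use, the same step function written with a plain comparison. The sample `values` are made up for illustration.

```python
import numpy as np

def step_23(x):
    return 1 if x > 23 else 0

ustep_23 = np.frompyfunc(step_23, 1, 1)
values = np.array([62, 9, 31, 23, 24])  # illustrative sample data

as_object = ustep_23(values)
print(as_object.dtype)  # object

# Cast the object array to a numeric dtype for further computation.
as_int = as_object.astype(np.int64)
print(as_int)           # [1 0 1 0 1]

# Alternative: a comparison is already vectorised and avoids the
# per-element Python function call.
vectorised = (values > 23).astype(np.int64)
print(np.array_equal(as_int, vectorised))  # True
```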

Views and Copies¶

In [103]:

original = np.arange(10)
original

Out[103]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [104]:

ref_to_original = original   # will point to `original`
ref_to_original[2] = 99
original             # changing `ref_to_original` has changed `original`

Out[104]:

array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

In [105]:

single_value = original[5]      # single element access returns a copy
single_value

Out[105]:

5

In [106]:

single_value = 9999
original             # not affected when `single_value` is changed

Out[106]:

array([ 0,  1, 99,  3,  4,  5,  6,  7,  8,  9])

Slices return (memory) views¶

In [107]:

original = np.arange(10)
original

Out[107]:

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [108]:

a_slice = original[2:4]    # slices return (memory) views
a_slice

Out[108]:

array([2, 3])

In [109]:

a_slice[1] = 1000  # changing an element of `a_slice` also changes `original`
original

Out[109]:

array([   0,    1,    2, 1000,    4,    5,    6,    7,    8,    9])

In [110]:

original[3:6] = [101, 102, 103]   # changing multiple elements at once
original

Out[110]:

array([  0,   1,   2, 101, 102, 103,   6,   7,   8,   9])
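
Whether two arrays share memory can be checked with `np.shares_memory`. The sketch below contrasts a slice with an explicit `.copy()` and with integer-list (advanced) indexing, both of which return independent data; the variable names are illustrative.

```python
import numpy as np

original = np.arange(10)

view = original[2:4]           # basic slice: a view
copied = original[2:4].copy()  # explicit copy
fancy = original[[2, 3]]       # advanced indexing: always a copy

print(np.shares_memory(original, view))    # True
print(np.shares_memory(original, copied))  # False
print(np.shares_memory(original, fancy))   # False

# Writing to the copies leaves `original` untouched ...
copied[0] = -1
fancy[1] = -1
untouched = original[2:4].tolist()
print(untouched)    # [2, 3]

# ... while writing to the view changes it.
view[0] = 99
print(original[2])  # 99
```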