Now that we know the basics of using NumPy's arrays, we can move on to doing some mathematical operations with them. As mentioned in the previous section on benchmarking, NumPy arrays offer more functionality than Python lists when it comes to manipulating their entries.
To compare the semantics, look at what happens when we multiply a list and an array by some factor:
import numpy as np
python_list = [1, 2, 3, 4, 5, 6]
python_list * 3
[1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6, 1, 2, 3, 4, 5, 6]
numpy_array = np.array([1, 2, 3, 4, 5, 6])
numpy_array * 3
array([ 3, 6, 9, 12, 15, 18])
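You can see that multiplication means very different things to the two types: multiplying a list repeats its contents, while multiplying an array multiplies each element.
Applying an operation to every element at once is at the very core of how NumPy makes mathematical operations fast.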
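To make the contrast concrete, here is a minimal sketch showing that the array expression produces the same values as an explicit Python loop over the list. The names `looped` and `vectorised` are introduced here purely for illustration.

```python
import numpy as np

python_list = [1, 2, 3, 4, 5, 6]
numpy_array = np.array(python_list)

# With a list, elementwise multiplication needs an explicit loop
looped = [x * 3 for x in python_list]

# With an array, the same work is a single expression; the loop runs inside NumPy
vectorised = numpy_array * 3

print(looped)               # [3, 6, 9, 12, 15, 18]
print(vectorised.tolist())  # [3, 6, 9, 12, 15, 18]
```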
As we just saw, any mathematical operation will be applied to the whole array elementwise:
a = np.array([1, 2, 3, 4])
a + 1
array([2, 3, 4, 5])
2**a # two to the power of a
array([ 2, 4, 8, 16])
b = np.ones(4) + 1 # [2, 2, 2, 2]
a - b
array([-1., 0., 1., 2.])
a * b
array([2., 4., 6., 8.])
j = np.arange(5)
2**(j + 1) - j
array([ 2, 3, 6, 13, 28])
Do note, however, that array multiplication is not the same as matrix multiplication:
c = np.ones((3, 3))
c * c # This will do element-wise multiplication
array([[1., 1., 1.], [1., 1., 1.], [1., 1., 1.]])
To do matrix multiplication you must either use the dot() method to calculate the dot product:
c.dot(c)
array([[3., 3., 3.], [3., 3., 3.], [3., 3., 3.]])
or use the @ operator, which was added in Python 3.5:
c @ c
array([[3., 3., 3.], [3., 3., 3.], [3., 3., 3.]])
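With a matrix of ones the difference is easy to miss, so the following sketch uses a small matrix whose entries differ. The arrays `m` and `identity` are made up for illustration.

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4]])
identity = np.eye(2, dtype=int)

elementwise = m * identity     # multiplies matching entries
matrix_product = m @ identity  # rows-by-columns matrix product

print(elementwise.tolist())     # [[1, 0], [0, 4]]
print(matrix_product.tolist())  # [[1, 2], [3, 4]]
```

Multiplying by the identity matrix returns `m` unchanged only for the matrix product; the elementwise product zeroes the off-diagonal entries.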
Try some simple elementwise arithmetic: add even elements with odd elements.
Comparisons between arrays give an array containing booleans:
a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
a == b
array([False, True, False, True])
a > b
array([False, False, True, False])
If you want to compare two whole arrays and get a single True or False, you can use np.array_equal().
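As a short sketch, np.array_equal() returns a single boolean rather than an array of booleans. The array `c` and the shorter array are hypothetical additions for the example.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])
c = np.array([1, 2, 3, 4])

print(np.array_equal(a, b))  # False: some elements differ
print(np.array_equal(a, c))  # True: same shape and same elements
print(np.array_equal(a, np.array([1, 2, 3])))  # False: shapes differ
```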
NumPy also has a series of more complicated functions which can be applied to an array, such as:
a = np.arange(5)
np.sin(a)
array([ 0. , 0.84147098, 0.90929743, 0.14112001, -0.7568025 ])
np.sqrt(a)
array([0. , 1. , 1.41421356, 1.73205081, 2. ])
np.exp(a)
array([ 1. , 2.71828183, 7.3890561 , 20.08553692, 54.59815003])
Look at the help for np.allclose(). When might this be useful?
In programming, a reduction takes some compound object and reduces it down to some basic property of itself. For example, the sum of all the elements of an array is a reduction, as is computing its maximum value.
The sum of an array can be calculated with the sum() method or the np.sum() function:
x = np.array([1, 2, 3, 4])
np.sum(x)
10
x.sum()
10
The maximum and minimum can also be calculated:
x = np.array([1, 3, 2])
x.min()
1
x.max()
3
x.argmin() # index of minimum
0
x.argmax() # index of maximum
1
Logical operations can be performed over the whole array. Like the built-in Python functions, all() returns whether all the items in the array are True and any() returns whether any of the items are True:
np.all([True, True, False])
False
np.any([True, True, False])
True
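These reductions combine naturally with the comparisons shown earlier. The sketch below reuses the `a` and `b` arrays from the comparison example; `matches` is a name introduced for illustration.

```python
import numpy as np

a = np.array([1, 2, 3, 4])
b = np.array([4, 2, 2, 4])

matches = a == b  # array([False,  True, False,  True])

print(np.any(matches))  # True: at least one position agrees
print(np.all(matches))  # False: not every position agrees
print(matches.sum())    # 2: each True counts as 1 when summed
```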
Finally, there are some simple statistics that can be gleaned:
x = np.array([1, 2, 3, 1])
x.mean()
1.75
np.median(x)
1.5
x.std() # full population standard dev.
0.82915619758885
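The comment above notes that std() gives the full population standard deviation, meaning it divides by the number of elements N. If the sample standard deviation (dividing by N - 1) is wanted instead, the `ddof` argument can be set to 1, as in this minimal sketch:

```python
import numpy as np

x = np.array([1, 2, 3, 1])

population_std = x.std()    # default ddof=0: divides by N
sample_std = x.std(ddof=1)  # ddof=1: divides by N - 1

print(population_std)
print(sample_std)  # larger than population_std, since the divisor is smaller
```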
What is the difference between sum() and cumsum()?
Basic operations on NumPy arrays (addition, etc.) are elementwise. This means that if you are operating on two arrays, they must be the same size.
Nevertheless, it's also possible to do operations on arrays of different sizes if NumPy can transform these arrays so that they all have the same size. This conversion is called broadcasting.
Here's an example to demonstrate:
a = np.tile(np.arange(0, 40, 10), (3, 1)).transpose()
a
array([[ 0, 0, 0], [10, 10, 10], [20, 20, 20], [30, 30, 30]])
b = np.array([0, 1, 2])
b
array([0, 1, 2])
a.shape
(4, 3)
b.shape
(3,)
You can see that the shapes of the two arrays are different: one is 4×3 and the other is one-dimensional of size 3.
NumPy can look at the two arrays and see that the width of a is 3 and the width of b is 3, and so infers that you want to match those together. The rule works by checking that the lengths of the trailing dimensions either match or that one of them is 1.
In our case here, the trailing dimension of a is 3 and the trailing dimension of b is also 3, so broadcasting can occur:
a + b
array([[ 0, 1, 2], [10, 11, 12], [20, 21, 22], [30, 31, 32]])
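It can help to see a case where broadcasting fails. NumPy compares shapes starting from the trailing dimension, and two dimensions are compatible when they are equal or one of them is 1. The sketch below uses a hypothetical length-4 array `col`, which cannot be broadcast against the 4×3 array until it is given a trailing axis of length 1.

```python
import numpy as np

a = np.tile(np.arange(0, 40, 10), (3, 1)).transpose()  # shape (4, 3)
col = np.array([0, 1, 2, 3])                            # shape (4,)

# Trailing dimensions are 3 and 4: they differ and neither is 1, so this fails
try:
    a + col
    broadcast_ok = True
except ValueError:
    broadcast_ok = False

# Adding a trailing axis gives shape (4, 1), which is compatible with (4, 3)
result = a + col[:, np.newaxis]

print(broadcast_ok)  # False
print(result.shape)  # (4, 3)
print(result[1])     # [11 11 11]
```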
We have already used broadcasting without knowing it:
a = np.ones((4, 5))
a[0] = 2 # we assign an array of dimension 0 to an array of dimension 1
a
array([[2., 2., 2., 2., 2.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.], [1., 1., 1., 1., 1.]])
Try creating a number of different NumPy arrays of different sizes and dimensions and try broadcasting them amongst each other. Make sure you understand the rules concerning what can be broadcast and what cannot.
Know how to create arrays: array, arange, ones, zeros.
Know the shape of the array with array.shape, then use slicing to obtain different views of the array: array[::2], etc. Adjust the shape of the array using reshape or flatten it with ravel and understand the difference between views and copies.
Obtain a subset of the elements of an array and/or modify their values with masks:
a[a < 0] = 0
Know miscellaneous operations on arrays, such as finding the mean or max (array.max(), array.mean()). No need to retain everything, but have the reflex to search in the documentation (online docs, help(), lookfor()).
For advanced use: master the indexing with arrays of integers, as well as broadcasting. Know more NumPy functions to handle various array operations.
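The points above can be sketched in one short example. The variable names are made up for illustration, and np.shares_memory() is used only to check whether two arrays are views of the same data.

```python
import numpy as np

a = np.arange(-3, 3)   # array([-3, -2, -1,  0,  1,  2])
print(a.shape)         # (6,)

view = a[::2]          # slicing gives a view onto the same data
grid = a.reshape(2, 3) # same data seen with a new shape
flat = grid.ravel()    # a view here, because the data is contiguous
independent = a.copy() # an explicit copy with its own data

print(np.shares_memory(a, view))         # True
print(np.shares_memory(a, flat))         # True
print(np.shares_memory(a, independent))  # False

a[a < 0] = 0           # mask: set every negative entry to zero
print(a.tolist())      # [0, 0, 0, 0, 1, 2]
print(grid.max(), grid.mean())  # computed on the modified data, since grid is a view
```

Because `view`, `grid` and `flat` all share memory with `a`, the masked assignment changes them too, while `independent` keeps the original values.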
This is the end of the prepared material for this course but there is plenty more good material online. There are two main routes to take from here:
If you want to learn more about the numerical side of things, with a focus on SciPy and NumPy, look through the free notes at scipy-lectures.org, probably starting at chapter 1.5.
If you want to learn more about pandas then the best book for that is Python for Data Analysis, 2nd Edition by Wes McKinney, one of the authors of pandas. For free material, there are some excellent tutorials on the pandas website.