Version 1.0. Prepared by Makzan. Updated in March 2021.
NumPy provides essential vector and matrix computation. NumPy comes with its own array type, similar to a Python list. The difference is that all elements of a NumPy array share a single data type (its dtype).
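A minimal sketch of the single-type rule: when integers and floats are mixed, NumPy converts every element to one common dtype, whereas a Python list keeps each element's own type.

```python
import numpy as np

# A Python list keeps each element's own type
py_list = [1, 2.5, 3]
print([type(x).__name__ for x in py_list])  # ['int', 'float', 'int']

# A NumPy array converts all elements to one common dtype
mixed = np.array(py_list)
print(mixed.dtype)  # float64
print(mixed)
```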
import numpy as np
arr1 = np.array([1,2,3,4,5])
print(arr1)
[1 2 3 4 5]
Array from a Python range:
arr2 = np.array(range(10))
print(arr2)
[0 1 2 3 4 5 6 7 8 9]
The same kind of array with np.arange:
arr2b = np.arange(10)
print(arr2b)
[0 1 2 3 4 5 6 7 8 9]
arr2c = np.arange(10,20)
print(arr2c)
[10 11 12 13 14 15 16 17 18 19]
arr2d = np.arange(1,20,2)
print(arr2d)
[ 1 3 5 7 9 11 13 15 17 19]
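np.arange(start, stop, step) follows the same rules as Python's range: start is included and stop is excluded. Unlike range, it also accepts a float step, although the NumPy documentation recommends np.linspace for non-integer steps because rounding can make the number of elements hard to predict. A small sketch:

```python
import numpy as np

# Same start/stop/step semantics as range(): stop (20) is excluded
odds = np.arange(1, 20, 2)
print(np.array_equal(odds, np.array(range(1, 20, 2))))  # True

# arange also accepts a float step (range() does not)
quarters = np.arange(0, 1, 0.25)
print(len(quarters))  # 4 values: 0.0, 0.25, 0.5, 0.75
```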
We can specify the data type by using the dtype parameter.
arr3 = np.array(range(10), dtype='float')
print(arr3)
[0. 1. 2. 3. 4. 5. 6. 7. 8. 9.]
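The dtype can also be changed after an array exists. A short sketch using astype(), which returns a new, converted array and leaves the original untouched; note that converting floats to integers truncates toward zero rather than rounding.

```python
import numpy as np

ints = np.arange(5)
print(ints.dtype.kind)   # i (an integer type; its width depends on the platform)

# astype() returns a NEW array converted to the requested dtype
floats = ints.astype('float')
print(floats.dtype)      # float64

# Converting float -> int truncates toward zero
print(np.array([2.7, -2.7]).astype('int'))  # [ 2 -2]
```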
Please generate an array of [10,20,30,40,50,60,70,80,90,100], in int type.
Expected result |
---|
array([ 10, 20, 30, 40, 50, 60, 70, 80, 90, 100]) |
Please generate an array of [10,20,30,40,50,60,70,80,90,100], in float type.
Expected result |
---|
array([ 10., 20., 30., 40., 50., 60., 70., 80., 90., 100.]) |
arr6 = np.zeros(10)
print(arr6)
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
arr6b = np.zeros(10, dtype='int')
print(arr6b)
[0 0 0 0 0 0 0 0 0 0]
arr7 = np.ones(10, dtype='float')
print(arr7)
[1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
arr8 = np.full(3, 3.14)
print(arr8)
[3.14 3.14 3.14]
arr9 = np.full( (3,5), 3.14)
print(arr9)
[[3.14 3.14 3.14 3.14 3.14] [3.14 3.14 3.14 3.14 3.14] [3.14 3.14 3.14 3.14 3.14]]
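Like np.full, the np.zeros and np.ones constructors accept a shape tuple instead of a single length, and the *_like variants copy the shape and dtype of an existing array. A brief sketch:

```python
import numpy as np

# A shape tuple (rows, columns) creates a 2-D array
z = np.zeros((2, 3))
print(z.shape)  # (2, 3)

# zeros_like copies both shape and dtype from an existing array
template = np.arange(6).reshape(2, 3)
z_like = np.zeros_like(template)
print(z_like.shape, z_like.dtype == template.dtype)  # (2, 3) True
```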
arr10 = np.random.rand(100)
print(arr10)
[0.79172504 0.52889492 0.56804456 0.92559664 0.07103606 0.0871293 0.0202184 0.83261985 0.77815675 0.87001215 0.97861834 0.79915856 0.46147936 0.78052918 0.11827443 0.63992102 0.14335329 0.94466892 0.52184832 0.41466194 0.26455561 0.77423369 0.45615033 0.56843395 0.0187898 0.6176355 0.61209572 0.616934 0.94374808 0.6818203 0.3595079 0.43703195 0.6976312 0.06022547 0.66676672 0.67063787 0.21038256 0.1289263 0.31542835 0.36371077 0.57019677 0.43860151 0.98837384 0.10204481 0.20887676 0.16130952 0.65310833 0.2532916 0.46631077 0.24442559 0.15896958 0.11037514 0.65632959 0.13818295 0.19658236 0.36872517 0.82099323 0.09710128 0.83794491 0.09609841 0.97645947 0.4686512 0.97676109 0.60484552 0.73926358 0.03918779 0.28280696 0.12019656 0.2961402 0.11872772 0.31798318 0.41426299 0.0641475 0.69247212 0.56660145 0.26538949 0.52324805 0.09394051 0.5759465 0.9292962 0.31856895 0.66741038 0.13179786 0.7163272 0.28940609 0.18319136 0.58651293 0.02010755 0.82894003 0.00469548 0.67781654 0.27000797 0.73519402 0.96218855 0.24875314 0.57615733 0.59204193 0.57225191 0.22308163 0.95274901]
arr10b = np.random.rand(3,3)
print(arr10b)
[[0.44712538 0.84640867 0.69947928] [0.29743695 0.81379782 0.39650574] [0.8811032 0.58127287 0.88173536]]
In programming languages, random numbers are not truly random; we call them pseudorandom. Given the same seed, the generator always produces the same sequence of numbers.
np.random.seed(0)
arr11 = np.random.rand(10,1)
print(arr11)
[[0.5488135 ] [0.71518937] [0.60276338] [0.54488318] [0.4236548 ] [0.64589411] [0.43758721] [0.891773 ] [0.96366276] [0.38344152]]
If we keep executing the following random function, we get new numbers each time. However, they still follow the same sequence determined by the seed.
Re-run the previous cell with np.random.seed(0) and the same sequence starts over again.
np.random.rand(10,1)
array([[0.79172504], [0.52889492], [0.56804456], [0.92559664], [0.07103606], [0.0871293 ], [0.0202184 ], [0.83261985], [0.77815675], [0.87001215]])
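A small sketch of this reproducibility: resetting the seed before each call gives identical arrays. Newer NumPy code (1.17 and later) often uses a Generator from np.random.default_rng instead of the global seed; it keeps its state inside the rng object and uses a different algorithm, so its numbers differ from the legacy np.random.seed ones shown above.

```python
import numpy as np

np.random.seed(0)
first = np.random.rand(5)

np.random.seed(0)          # reset to the same seed
second = np.random.rand(5)

print(np.array_equal(first, second))  # True: same seed, same sequence

# Generator API: the seed lives in the rng object, not in global state
rng_a = np.random.default_rng(540)
rng_b = np.random.default_rng(540)
print(np.array_equal(rng_a.random(5), rng_b.random(5)))  # True
```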
Exercise: Please try the seed 540 and see if you can generate the following expected result.
np.random.seed(0)
arr = np.random.rand(10,1)
print(arr)
[[0.5488135 ] [0.71518937] [0.60276338] [0.54488318] [0.4236548 ] [0.64589411] [0.43758721] [0.891773 ] [0.96366276] [0.38344152]]
Expected Result for seed(540) |
---|
[[0.71688165] [0.50553693] [0.18142109] [0.70069925] [0.81784415] [0.28708016] [0.97490719] [0.09495503] [0.84069722] [0.06900928]] |
arr4 = np.linspace(0,10,3)
print(arr4)
[ 0. 5. 10.]
arr4b = np.linspace(0,100,5)
print(arr4b)
[ 0. 25. 50. 75. 100.]
arr4c = np.linspace(0,1,4)
print(arr4c)
[0. 0.33333333 0.66666667 1. ]
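np.linspace(start, stop, num) returns num evenly spaced values and, unlike arange, includes stop by default. A sketch of the two spacing rules, plus the retstep option that also returns the step size:

```python
import numpy as np

# stop is included: 4 values, step = (1 - 0) / (4 - 1)
with_end = np.linspace(0, 1, 4)

# endpoint=False excludes stop: 4 values, step = (1 - 0) / 4
without_end = np.linspace(0, 1, 4, endpoint=False)

# retstep=True also returns the spacing between values
values, step = np.linspace(0, 100, 5, retstep=True)
print(step)  # 25.0
```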
arr5 = np.arange(1,13).reshape([3,4])
print(arr5)
[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12]]
arr5.shape
(3, 4)
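When reshaping, one dimension can be given as -1 and NumPy infers it from the total number of elements. The total size must stay the same, otherwise reshape raises a ValueError. A quick sketch:

```python
import numpy as np

arr = np.arange(1, 13)         # 12 elements

# -1 lets NumPy infer that dimension from the total size
print(arr.reshape(3, -1).shape)   # (3, 4)
print(arr.reshape(-1, 6).shape)   # (2, 6)

# The total number of elements must stay the same (5 * 4 != 12)
try:
    arr.reshape(5, 4)
except ValueError as err:
    print("ValueError:", err)
```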
Try to create a random array with shape (5,4).
np.random.seed(0) # Reset the seed, in order to re-create the same result.
Expected result |
---|
array([[0.5488135 , 0.71518937, 0.60276338, 0.54488318], [0.4236548 , 0.64589411, 0.43758721, 0.891773 ], [0.96366276, 0.38344152, 0.79172504, 0.52889492], [0.56804456, 0.92559664, 0.07103606, 0.0871293 ], [0.0202184 , 0.83261985, 0.77815675, 0.87001215]]) |
Try to create an array of ones with shape (5,4).
Expected result |
---|
array([[1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.], [1., 1., 1., 1.]]) |
grid = np.arange(1,10).reshape([3,3])
print(grid)
[[1 2 3] [4 5 6] [7 8 9]]
grid2 = np.arange(1,4)
print(grid2)
[1 2 3]
grid2 = np.tile(grid2, (3,1))
print(grid2)
[[1 2 3] [1 2 3] [1 2 3]]
print(grid+grid2)
[[ 2 4 6] [ 5 7 9] [ 8 10 12]]
print(grid-grid2)
[[0 0 0] [3 3 3] [6 6 6]]
print(grid*grid2)
[[ 1 4 9] [ 4 10 18] [ 7 16 27]]
print(grid/grid2)
[[1. 1. 1. ] [4. 2.5 2. ] [7. 4. 3. ]]
print(grid//grid2)
[[1 1 1] [4 2 2] [7 4 3]]
print(grid ** grid2)
[[ 1 4 27] [ 4 25 216] [ 7 64 729]]
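All of these operators work element by element: grid * grid2 multiplies matching positions. It is not the matrix product, which NumPy writes as np.dot or the @ operator. A sketch comparing the first rows of each:

```python
import numpy as np

grid = np.arange(1, 10).reshape([3, 3])
grid2 = np.tile(np.arange(1, 4), (3, 1))

elementwise = grid * grid2     # multiplies matching positions
matrix_product = grid @ grid2  # rows of grid dotted with columns of grid2

print(elementwise[0])      # [1 4 9]
print(matrix_product[0])   # [ 6 12 18]
```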
grid = np.arange(1,10).reshape([3,3])
print(grid)
[[1 2 3] [4 5 6] [7 8 9]]
grid.shape
(3, 3)
print(grid + 3)
[[ 4 5 6] [ 7 8 9] [10 11 12]]
print(grid*3)
[[ 3 6 9] [12 15 18] [21 24 27]]
print(grid/10)
[[0.1 0.2 0.3] [0.4 0.5 0.6] [0.7 0.8 0.9]]
print(grid/3)
[[0.33333333 0.66666667 1. ] [1.33333333 1.66666667 2. ] [2.33333333 2.66666667 3. ]]
print(grid//3)
[[0 0 1] [1 1 2] [2 2 3]]
print(grid+1)
[[ 2 3 4] [ 5 6 7] [ 8 9 10]]
grid2 = np.arange(1,4)
print(grid2)
[1 2 3]
grid2.shape
(3,)
grid = np.arange(1,10).reshape([3,3])
print(grid)
[[1 2 3] [4 5 6] [7 8 9]]
print(grid+grid2)
[[ 2 4 6] [ 5 7 9] [ 8 10 12]]
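Here grid2 has shape (3,) and is added to a (3, 3) array without np.tile. This is broadcasting: NumPy virtually stretches the smaller array along the missing dimension. The sketch below checks that the result matches the explicit np.tile version, and shows that a (3, 1) column is stretched across the columns instead.

```python
import numpy as np

grid = np.arange(1, 10).reshape([3, 3])
row = np.arange(1, 4)                 # shape (3,)

# Broadcasting gives the same result as tiling the row explicitly
print(np.array_equal(grid + row, grid + np.tile(row, (3, 1))))  # True

# A column vector of shape (3, 1) is stretched across the columns
col = np.array([[10], [20], [30]])
print(grid + col)
```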
print(grid ** 2)
[[ 1 4 9] [16 25 36] [49 64 81]]
print(grid % 5)
[[1 2 3] [4 0 1] [2 3 4]]
arr = np.random.random(10000)
print(arr)
[0.5488135 0.71518937 0.60276338 ... 0.75842952 0.02378743 0.81357508]
print(np.sum(arr))
4964.588916200894
print(np.max(arr))
0.9999779517807228
print(np.min(arr))
7.2449638492178e-05
print(np.mean(arr))
0.49645889162008944
print(np.median(arr))
0.49350103035904186
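With a 2-D array, these aggregation functions also accept an axis argument: axis=0 collapses the rows (one result per column) and axis=1 collapses the columns (one result per row). The array methods such as grid.sum() and grid.max() are equivalent shortcuts. A sketch:

```python
import numpy as np

grid = np.arange(1, 10).reshape([3, 3])

print(np.sum(grid))          # 45, all elements
print(np.sum(grid, axis=0))  # [12 15 18], one sum per column
print(grid.sum(axis=1))      # [ 6 15 24], one sum per row
print(grid.max(axis=1))      # [3 6 9]
```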
print(len(arr[arr<0.2]))
2060
print(len(arr[(arr>0.2) & (arr<0.3)]))
995
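A comparison such as arr < 0.2 produces a boolean mask, and indexing with it keeps only the True positions. A sketch on a small hand-made array: np.count_nonzero counts matches without building the filtered array, & means "and", | means "or" and ~ means "not". The parentheses are required because & and | bind more tightly than < and >.

```python
import numpy as np

a = np.array([0.1, 0.25, 0.5, 0.05, 0.29])

mask = a < 0.2                   # [ True False False  True False]
print(a[mask])                   # keeps only the True positions
print(np.count_nonzero(mask))    # 2

between = (a > 0.2) & (a < 0.3)  # parentheses are required
print(np.count_nonzero(between))  # 2

print(np.count_nonzero((a < 0.1) | (a > 0.4)))  # 2
print(np.count_nonzero(~mask))   # 3
```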
Exercise: Given the following NumPy array, please find all the records with a negative value.
arr = np.array((-3, 10, 20, -5, -2, 50, 34, -12, 10))
arr
array([ -3, 10, 20, -5, -2, 50, 34, -12, 10])
Expected Result |
---|
array([ -3, -5, -2, -12]) |
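Slicing a NumPy array does NOT make a copy: the slice is a view on the same data, so changes through one are visible through the other.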
# [i, j]
# [i, :]
# [:, j]
# [i_start:i_end, j_start:j_end]
grid = np.arange(1,13).reshape([3,4])
print(grid)
[[ 1 2 3 4] [ 5 6 7 8] [ 9 10 11 12]]
print(grid[0,:])
[1 2 3 4]
print(grid[:,0])
[1 5 9]
print(grid[:,1:3])
[[ 2 3] [ 6 7] [10 11]]
grid2 = grid[:,:]
Let’s see if the slicing is a copy or not:
grid[0,0] = 100
print(grid)
print(grid2)
[[100 2 3 4] [ 5 6 7 8] [ 9 10 11 12]] [[100 2 3 4] [ 5 6 7 8] [ 9 10 11 12]]
grid[:,1:3] = 99
print(grid)
print(grid2)
[[100 99 99 4] [ 5 99 99 8] [ 9 99 99 12]] [[100 99 99 4] [ 5 99 99 8] [ 9 99 99 12]]
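When an independent array is needed, call .copy() explicitly. A sketch that compares a slice with a copy; np.shares_memory reports whether two arrays use the same underlying data.

```python
import numpy as np

grid = np.arange(1, 13).reshape([3, 4])
view = grid[:, :]            # slice: shares memory with grid
independent = grid.copy()    # separate copy of the data

grid[0, 0] = 100
print(view[0, 0])         # 100, the view sees the change
print(independent[0, 0])  # 1, the copy does not

print(np.shares_memory(grid, view))         # True
print(np.shares_memory(grid, independent))  # False
```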
data = np.genfromtxt('visitors.csv',delimiter=',', dtype='datetime64[D],uint8', skip_header=0, names=('date','visitors'))
print(data)
[('2018-12-18', 22) ('2018-12-17', 0) ('2018-12-16', 4) ('2018-12-15', 218) ('2018-12-14', 11) ('2018-12-13', 11) ('2018-12-12', 14) ('2018-12-11', 4) ('2018-12-10', 5) ('2018-12-09', 15) ('2018-12-08', 104) ('2018-12-07', 19) ('2018-12-06', 8) ('2018-12-05', 3) ('2018-12-04', 24) ('2018-12-03', 66) ('2018-12-02', 40) ('2018-12-01', 69) ('2018-11-30', 8) ('2018-11-29', 13) ('2018-11-28', 10) ('2018-11-27', 18) ('2018-11-26', 72) ('2018-11-25', 31) ('2018-11-24', 146) ('2018-11-23', 42) ('2018-11-22', 56) ('2018-11-21', 19) ('2018-11-20', 76) ('2018-11-19', 11) ('2018-11-18', 0) ('2018-11-17', 0) ('2018-11-16', 6) ('2018-11-15', 7) ('2018-11-14', 32) ('2018-11-13', 102) ('2018-11-12', 198) ('2018-11-11', 22) ('2018-11-10', 82) ('2018-11-09', 213) ('2018-11-08', 52) ('2018-11-07', 13) ('2018-11-06', 0) ('2018-11-05', 6) ('2018-11-04', 0) ('2018-11-03', 7) ('2018-11-02', 25) ('2018-11-01', 29) ('2018-10-31', 9) ('2018-10-30', 14) ('2018-10-29', 4) ('2018-10-28', 4)]
data['date']
array(['2018-12-18', '2018-12-17', '2018-12-16', '2018-12-15', '2018-12-14', '2018-12-13', '2018-12-12', '2018-12-11', '2018-12-10', '2018-12-09', '2018-12-08', '2018-12-07', '2018-12-06', '2018-12-05', '2018-12-04', '2018-12-03', '2018-12-02', '2018-12-01', '2018-11-30', '2018-11-29', '2018-11-28', '2018-11-27', '2018-11-26', '2018-11-25', '2018-11-24', '2018-11-23', '2018-11-22', '2018-11-21', '2018-11-20', '2018-11-19', '2018-11-18', '2018-11-17', '2018-11-16', '2018-11-15', '2018-11-14', '2018-11-13', '2018-11-12', '2018-11-11', '2018-11-10', '2018-11-09', '2018-11-08', '2018-11-07', '2018-11-06', '2018-11-05', '2018-11-04', '2018-11-03', '2018-11-02', '2018-11-01', '2018-10-31', '2018-10-30', '2018-10-29', '2018-10-28'], dtype='datetime64[D]')
data['visitors']
array([ 22, 0, 4, 218, 11, 11, 14, 4, 5, 15, 104, 19, 8, 3, 24, 66, 40, 69, 8, 13, 10, 18, 72, 31, 146, 42, 56, 19, 76, 11, 0, 0, 6, 7, 32, 102, 198, 22, 82, 213, 52, 13, 0, 6, 0, 7, 25, 29, 9, 14, 4, 4], dtype=uint8)
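Because genfromtxt was given names=('date','visitors'), the result is a structured array: each element is a record, and each name selects a column. The sketch below builds a tiny structured array by hand from three rows of the output above (so visitors.csv is not needed) and filters records by date with a boolean mask.

```python
import numpy as np

# Same field names and dtypes as the genfromtxt call above
row_type = np.dtype([('date', 'datetime64[D]'), ('visitors', 'uint8')])
data = np.array([('2018-12-18', 22),
                 ('2018-12-17', 0),
                 ('2018-12-16', 4)], dtype=row_type)

print(data['visitors'])     # one column, selected by name
print(data[0])              # one whole record

# Keep records on or after 17 Dec 2018
recent = data[data['date'] >= np.datetime64('2018-12-17')]
print(recent['visitors'])   # [22  0]
```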
What is the shape of the loaded CSV data?
Expected result |
---|
(52,) |
What are the last 3 records in the data?
Expected result |
---|
array([('2018-10-30', 14), ('2018-10-29', 4), ('2018-10-28', 4)], dtype=[('date', '<M8[D]'), ('visitors', 'u1')]) |
What is the maximum visitors count for a single day?
Expected result |
---|
218 |
What is the minimum visitor count for a single day?
np.min(data['visitors'])
0
Expected result |
---|
0 |
If we exclude the days with 0 visitors, what is the minimum number of visitors in a day?
First, try to create an array of visitors that excludes all the 0 values.
Expected result |
---|
array([ 22, 4, 218, 11, 11, 14, 4, 5, 15, 104, 19, 8, 3, 24, 66, 40, 69, 8, 13, 10, 18, 72, 31, 146, 42, 56, 19, 76, 11, 6, 7, 32, 102, 198, 22, 82, 213, 52, 13, 6, 7, 25, 29, 9, 14, 4, 4], dtype=uint8) |
Next, we find the minimum value.
Expected result |
---|
3 |
v1 = [2,3]
v2 = [5,3]
np.dot(v1, v2)
19
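The dot product multiplies matching components and adds the results: 2×5 + 3×3 = 19. A sketch that computes it by hand and with the @ operator:

```python
import numpy as np

v1 = np.array([2, 3])
v2 = np.array([5, 3])

# Multiply component-wise, then sum: 2*5 + 3*3
manual = np.sum(v1 * v2)
print(manual)   # 19
print(v1 @ v2)  # 19, the @ operator gives the same result
```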
A=[[2,-1],
[0,3],
[1,0]]
B=[[0,1,4,-1],
[-2,0,0,2]]
C = np.dot(A, B)
print(C)
[[ 2 2 8 -4] [-6 0 0 6] [ 0 1 4 -1]]
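For 2-D inputs, np.dot performs matrix multiplication. The inner dimensions must match: A is 3×2 and B is 2×4, so C is 3×4, and each entry C[i, j] is the dot product of row i of A and column j of B. A sketch that checks one entry and shows that reversing the order fails:

```python
import numpy as np

A = np.array([[2, -1],
              [0, 3],
              [1, 0]])          # shape (3, 2)
B = np.array([[0, 1, 4, -1],
              [-2, 0, 0, 2]])   # shape (2, 4)

C = A @ B                       # same as np.dot(A, B) for 2-D arrays
print(C.shape)                  # (3, 4)

# C[0, 3] is row 0 of A dotted with column 3 of B: 2*(-1) + (-1)*2
print(C[0, 3], np.dot(A[0, :], B[:, 3]))  # -4 -4

# B @ A fails: B has 4 columns but A has only 3 rows
try:
    B @ A
except ValueError as err:
    print("ValueError:", err)
```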
For instance, we can calculate the angle (in degrees) between two vectors by using the dot product and the norm.
$ a \cdot b = |a|\,|b|\cos(\theta) $
def theta(v1, v2):
    # cos(theta) = a.b / (|a||b|)
    dot_product = np.dot(v1, v2)
    norms = np.linalg.norm(v1) * np.linalg.norm(v2)
    rad = np.arccos(dot_product / norms)  # angle in radians
    return np.rad2deg(rad)                # converted to degrees
v1 = [0,1]
v2 = [1,0]
theta(v1,v2)
90.0
v1 = [1,4,5]
v2 = [2,1,5]
v3 = [3,5,6]
Which two vectors are more "similar" to each other?
theta(v1,v2)
29.152519407030084
theta(v1,v3)
12.186074922100465
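The smaller angle (about 12.2 degrees) means v1 and v3 point in more similar directions than v1 and v2. The cosine itself is often used directly as a similarity score. One caution: because of floating-point rounding, the ratio passed to np.arccos can land slightly outside [-1, 1] (for example when a vector is compared with itself), and np.arccos then returns nan. A sketch with np.clip as a guard; cosine_similarity and theta_safe are illustrative names, not NumPy functions.

```python
import numpy as np

def cosine_similarity(v1, v2):
    # cos(theta) = a.b / (|a||b|); 1 means same direction, 0 means perpendicular
    return np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2))

def theta_safe(v1, v2):
    # Clip into arccos's valid domain [-1, 1] before converting to degrees
    cos = np.clip(cosine_similarity(v1, v2), -1.0, 1.0)
    return np.rad2deg(np.arccos(cos))

v1, v2, v3 = [1, 4, 5], [2, 1, 5], [3, 5, 6]
print(cosine_similarity(v1, v3) > cosine_similarity(v1, v2))  # True
print(theta_safe(v1, v1))  # about 0, never nan
```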
NumPy can also solve systems of linear equations, for example:
$ x+2y=7 $
$ 3x+4y=15 $
We can express the equations in matrix form.
$ Ar=s $
$ r = A^{-1}s $
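Written out, A holds the coefficients, r the unknowns and s the right-hand sides. Using the 2×2 inverse formula, the solution can be checked by hand:

$$
\underbrace{\begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}}_{A}
\underbrace{\begin{bmatrix} x \\ y \end{bmatrix}}_{r}
=
\underbrace{\begin{bmatrix} 7 \\ 15 \end{bmatrix}}_{s}
$$

$$
A^{-1} = \frac{1}{1\cdot 4 - 2\cdot 3}\begin{bmatrix} 4 & -2 \\ -3 & 1 \end{bmatrix} = \begin{bmatrix} -2 & 1 \\ 1.5 & -0.5 \end{bmatrix},
\qquad
r = A^{-1}s = \begin{bmatrix} -2\cdot 7 + 1\cdot 15 \\ 1.5\cdot 7 - 0.5\cdot 15 \end{bmatrix} = \begin{bmatrix} 1 \\ 3 \end{bmatrix}
$$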
A = [[1, 2],
[3, 4]]
s = [7, 15]
Ainv = np.linalg.inv(A)
Ainv
array([[-2. , 1. ], [ 1.5, -0.5]])
np.dot(Ainv, s)
array([1., 3.])
Let's try to solve the equations directly by using np.linalg.solve.
np.linalg.solve(A, s)
array([1., 3.])
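Both approaches give x = 1 and y = 3. In practice np.linalg.solve is generally preferred over computing the inverse explicitly, because it solves the system directly (through an LU factorization in LAPACK), which is cheaper and numerically more reliable. Either way, a quick sanity check is to substitute the solution back:

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
s = np.array([7, 15])

r = np.linalg.solve(A, s)

# Substitute the solution back: A @ r should reproduce s
print(np.allclose(A @ r, s))  # True
```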
Another example: solving a system of three equations in three variables with NumPy.
$ 2x+y+3z=20 $
$ x+2y+4z=21 $
$ x+y+2z=13 $
We can express the equations in matrix form.
A = [[2, 1, 3],
[1, 2, 4],
[1, 1, 2]]
s = [20, 21, 13]
r = np.linalg.solve(A, s)
r
array([5., 4., 2.])
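np.linalg.solve only works when the system has exactly one solution, i.e. A is square and non-singular. If one equation is just a multiple of another, NumPy raises np.linalg.LinAlgError. A sketch with a made-up (hypothetical) singular system:

```python
import numpy as np

# Hypothetical system: the second row is 2 x the first, so A is singular
A = np.array([[1, 2],
              [2, 4]])
s = np.array([3, 6])

print(np.linalg.matrix_rank(A))  # 1, fewer independent rows than unknowns

try:
    np.linalg.solve(A, s)
except np.linalg.LinAlgError as err:
    print("LinAlgError:", err)
```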
Given the following equations, please calculate the values of x, y and z.
$ x+y+z=14 $
$ 2x+y+2z=25 $
$ 3y+z=16 $
Expected result |
---|
array([4., 3., 7.]) |
In this lesson, we learned to express vectors and matrices with NumPy. We also covered essential operations and got a glimpse of how NumPy helps with numerical computation.
In the next lesson, we will use Pandas to process our data into tabular form with Series.