The core scientific libraries in Python are NumPy, SciPy, Matplotlib and Pandas. NumPy provides homogeneous multidimensional arrays, SciPy provides functions and operators for mathematical computation, Matplotlib can be used to plot functions, and Pandas is used for manipulating non-homogeneous data and for reading and writing files.
%matplotlib inline
Python offers the following built-in data structures: tuple, list, dictionary and set. We will discuss the array data structure in a following section about the NumPy numerical package.
A tuple is an immutable collection of items that can be of different types, e.g. int, float, bool, string. Once created, a tuple cannot be extended, reduced or reassigned element by element. A tuple provides only a few methods, such as count() and index().
t = 1, 2.6, 'New York', 'New York'
c = t.count('New York')
i = t.index('New York')
print("Number of occurrences: {0:d}\nIndex of 1st occurrence: {1:d}\n".format(c, i))
Number of occurrences: 2 Index of 1st occurrence: 2
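To make the immutability of tuples concrete, here is a minimal sketch. The variable names are illustrative and are not used elsewhere in this notebook. Assigning to an element of a tuple raises a TypeError, while building a new tuple by concatenation is allowed.

```python
# Illustrative names only, not part of the original notebook
city_tuple = (1, 2.6, 'New York', 'New York')

try:
    city_tuple[0] = 10            # tuples do not support item assignment
    assignment_failed = False
except TypeError:
    assignment_failed = True

# A "modified" tuple has to be built as a new object
longer_tuple = city_tuple + ('Rome',)

print(assignment_failed)          # True
print(len(longer_tuple))          # 5
```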
A Python list is a mutable collection of items that can be of different types. In other words, a list can be expanded and reduced, and its items can be replaced. The methods used to expand a list are append(), extend() and insert().
alist = [1,2,5]
print(alist) # prints the list
print(alist[0]) # prints the 1st element of the list
[1, 2, 5] 1
We can add one item at a time to the end of the list using append()
alist.append(7)
print(alist)
[1, 2, 5, 7]
We can use extend() to add more items
alist.extend([8, 3, 2])
print(alist)
[1, 2, 5, 7, 8, 3, 2]
Finally, we can add an element at a certain index in the list
alist.insert(1, 'New York')
print(alist)
[1, 'New York', 2, 5, 7, 8, 3, 2]
We can remove the 1st occurrence of an item that contains a particular value from the list
alist.remove('New York')
print(alist)
[1, 2, 5, 7, 8, 3, 2]
We can also return an item with a certain index value and remove it from the list
v = alist.pop(3)
v
7
print(alist)
[1, 2, 5, 8, 3, 2]
We can slice a tuple or a list to select subsets of those collections
alist[2:4]
[5, 8]
alist2d = [[0.1, 2], [3, 4]]
print(alist2d)
alist2d[0][0]
[[0.1, 2], [3, 4]]
0.1
A list comprehension can be used to apply a function to each object in a list. It is a compact alternative to a for loop, or to the map() function combined with a named or lambda function.
families = ["Pippo/Pluto", "Qui/Quo/Qua"]
Let's define a function that splits the tokens in a string and extracts the last element
def leaf(family):
    # split the string on "/" and return the last token
    return family.split("/")[-1]
Now we apply the function to a list of strings using a list comprehension
[leaf(family) for family in families]
['Pluto', 'Qua']
We can achieve the same result with a single inline expression, without defining a separate function
[family.split("/")[len(family.split("/")) -1] for family in families ]
['Pluto', 'Qua']
We can easily create nested list comprehensions that work similarly to nested for loops
[[(i,j) for j in range(1,5)] for i in range(1,5)]
[[(1, 1), (1, 2), (1, 3), (1, 4)], [(2, 1), (2, 2), (2, 3), (2, 4)], [(3, 1), (3, 2), (3, 3), (3, 4)], [(4, 1), (4, 2), (4, 3), (4, 4)]]
We can get the same result by applying the leaf() function to each object in the families list using the map() function.
list(map(leaf, families))
['Pluto', 'Qua']
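The lambda variant mentioned earlier can be sketched as follows. This is only an illustration: the anonymous function replaces leaf(), and the names families_demo and last_names are introduced here and do not appear in the original notebook.

```python
# Same data as above, under a new name so the notebook state is untouched
families_demo = ["Pippo/Pluto", "Qui/Quo/Qua"]

# map() applies the anonymous function to every string in the list
last_names = list(map(lambda family: family.split("/")[-1], families_demo))

print(last_names)   # ['Pluto', 'Qua']
```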
A dictionary is a symbol table: a collection of objects in which each object is associated with a unique key. The collection can be expanded and reduced.
cartoon_characters = {"Pippo": 5,
"Pluto": 4,
"Topolino": 2}
for name, score in cartoon_characters.items():
    print("Name: " + name + ", score: " + str(score))
Name: Pippo, score: 5
Name: Pluto, score: 4
Name: Topolino, score: 2
We can print the objects' keys
list(cartoon_characters.keys())
['Pippo', 'Pluto', 'Topolino']
and the values
list(cartoon_characters.values())
[5, 4, 2]
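The statement that a dictionary can be expanded and reduced can be illustrated with a short sketch. The dictionary scores and the key "Paperino" are made up for this example.

```python
# Illustrative copy of the data used above
scores = {"Pippo": 5, "Pluto": 4, "Topolino": 2}

scores["Paperino"] = 3            # expand: add a new key/value pair
scores["Pluto"] = 1               # update the value of an existing key
removed = scores.pop("Pippo")     # reduce: remove a key and return its value
del scores["Topolino"]            # reduce: remove a key without returning it

print(removed)                    # 5
print(scores)                     # {'Pluto': 1, 'Paperino': 3}
```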
A list can be used as an array: a mutable collection of objects of the same type, such as characters, integers, floats or booleans. NumPy provides a dedicated array type for this purpose, so let's import the NumPy library
import numpy as np
A NumPy array can be created from a list
l = [1, 2, 4]
array = np.array(l)
print(array) # prints the full array
type(array)
[1 2 4]
numpy.ndarray
or from a range of numbers with start, end (excluded) and step
arange = np.arange(0, 9, 1) # an array from 0 to 8 (9 is excluded) with increment 1
print(arange)
[0 1 2 3 4 5 6 7 8]
A NumPy array is an object, an instance of the ndarray class, that contains data of the same type and offers many methods to transform the data. We can write the data into a file
path = 'physics/data/myarray.txt'
with open(path, 'wb') as f:
    arange.tofile(f)
Then we can read the file and put the data into a new array
with open(path, 'rb') as f:
    c = np.fromfile(f, dtype='int')
print(c)
[0 1 2 3 4 5 6 7 8]
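Note that tofile() writes only the raw values, so the data type and the shape have to be known when reading the file back. As a possible alternative, the sketch below uses np.save() and np.load(), which store that information in the file. The file name matrix.npy and the temporary directory are assumptions made for this example.

```python
import os
import tempfile
import numpy as np

matrix = np.arange(6, dtype=np.float32).reshape(2, 3)

with tempfile.TemporaryDirectory() as tmp_dir:
    npy_path = os.path.join(tmp_dir, "matrix.npy")   # hypothetical file name
    np.save(npy_path, matrix)       # dtype and shape are stored in the file header
    restored = np.load(npy_path)    # no dtype or shape argument needed

print(restored.dtype, restored.shape)     # float32 (2, 3)
print(np.array_equal(matrix, restored))   # True
```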
We can compute the sum, the mean, and the standard deviation of the elements in the array
s = c.sum()
m = c.mean()
e = c.std()
print("Sum: {0:.1f}\nMean: {1:.1f}\nStandard deviation: {2:.3f}".format(s, m, e))
Sum: 36.0 Mean: 4.0 Standard deviation: 2.582
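By default std() divides by the number of elements N (population standard deviation). If the data are a sample, the estimate that divides by N - 1 can be obtained with the ddof argument, as in this short sketch (variable names are illustrative).

```python
import numpy as np

values = np.arange(0, 9)

population_std = values.std()         # ddof=0 (default): divide by N
sample_std = values.std(ddof=1)       # divide by N - 1

print("{0:.3f} {1:.3f}".format(population_std, sample_std))   # 2.582 2.739
```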
array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(array2d) # prints the 2D array
[[1 2 3] [4 5 6] [7 8 9]]
Let's print the element in the 3rd row and 2nd column
print(array2d[2, 1])
8
An n-dimensional array has n axes; a two-dimensional array, for example, can represent a matrix. The number of axes is stored in the ndim attribute.
d = array2d.ndim # number of dimensions (axes) of the array
print("Array dimension: {0:d}".format(d))
Array dimension: 2
The shape of an array is a tuple that contains the number of elements along each dimension.
array2d.shape # shape of the multidimensional array
(3, 3)
We can compute the sum of the values of the array along one dimension, or axis. With axis=0 the sum runs down the rows, so we get one total per column
array2d.sum(axis=0)
array([12, 15, 18])
or, with axis=1, one total per row
array2d.sum(axis=1)
array([ 6, 15, 24])
We may want to initialize an array with zeros
zeros2d = np.zeros((3,3))
print(zeros2d)
[[0. 0. 0.] [0. 0. 0.] [0. 0. 0.]]
or with ones
ones2d = np.ones((3, 3))
print(ones2d)
[[1. 1. 1.] [1. 1. 1.] [1. 1. 1.]]
or we might need an identity matrix, a square matrix whose diagonal values are ones and all the rest zeros
np.eye(3)
array([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.]])
Sometimes we need a sample of real numbers from an interval that are equally spaced. The linspace() function can be used to create such samples. It takes as input the two endpoints of the interval and the number of sample data points equally spaced within that interval. It is similar to range() but instead of setting the step we set the number of data points in the sample.
x = np.linspace(0., 1., 11)
x
array([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
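The relation between the number of points and the spacing can be checked with a small sketch. The retstep and endpoint arguments are standard linspace() options; the variable names are illustrative.

```python
import numpy as np

# retstep=True also returns the spacing between consecutive samples
samples, step = np.linspace(0.0, 1.0, 11, retstep=True)

# endpoint=False excludes the right endpoint, like range() does
half_open = np.linspace(0.0, 1.0, 10, endpoint=False)

print(len(samples))                             # 11
print(np.isclose(step, 0.1))                    # True
print(np.allclose(samples[:-1], half_open))     # True
```

With 11 points on [0, 1] there are 10 intervals, hence a spacing of (1 - 0) / 10.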
The data points to be represented in a matrix, or in another multi-dimensional array, may come from a sequence. Once they are in a one-dimensional array we have to change its shape, adding the missing dimensions and putting the data points in the right place. After a reshaping operation the number of elements stays the same but the shape of the array is different.
l = 1, 2, 3, 4, 5, 6, 7, 8, 9
m1 = np.array(l)
m2 = m1.reshape(3,3) # this is the same array but represented as a 3x3 matrix
print(m2)
[[1 2 3] [4 5 6] [7 8 9]]
We can also flatten the data back to its original shape
m2.flatten()
array([1, 2, 3, 4, 5, 6, 7, 8, 9])
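Two details of reshaping are worth a quick sketch (names are illustrative): one dimension can be passed as -1 and NumPy infers it from the number of elements, and flatten() returns a copy, so changing the flattened array leaves the matrix untouched.

```python
import numpy as np

seq = np.arange(1, 10)

grid = seq.reshape(3, -1)   # -1 lets NumPy infer the missing dimension (3 here)
flat = grid.flatten()       # flatten() always returns a copy of the data
flat[0] = 100               # modifies the copy only

print(grid.shape)           # (3, 3)
print(grid[0, 0])           # 1
```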
As said at the beginning of this section, an array contains mutable objects, that is, each element can change its value (but not its type)
m2[1, 1] = 0
print(m2)
[[1 2 3] [4 0 6] [7 8 9]]
We may need to create a new array, for example a vector from a matrix, by resizing an array. After the resizing the shape will be different but the number of dimensions will stay the same, so to obtain a one-dimensional vector we have to create a new array and copy the values of the resized array (or of the original matrix) into it.
v_shape = 3
v1 = np.resize(m2, (v_shape, 1))
v1
v2 = np.zeros(3)
for i in range(0, 3):
    v2[i] = v1[i, 0]  # copy the scalar value, not a 1-element array
v2
array([1., 2., 3.])
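Note that np.resize() takes the elements in row-major order, so the vector above holds the first three elements of the first row. If the goal is an actual column of the matrix, slicing may be a more direct route. The following is a sketch with illustrative names, not the notebook's own method.

```python
import numpy as np

mat = np.array([[1, 2, 3], [4, 0, 6], [7, 8, 9]])

first_col = mat[:, 0]                  # one-dimensional array, shape (3,)
first_col_2d = mat[:, [0]]             # column vector, shape (3, 1)
as_column = first_col.reshape(-1, 1)   # same column vector via reshape

print(first_col)                       # [1 4 7]
print(first_col_2d.shape)              # (3, 1)
```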
We can apply comparison and logical operators to arrays as well as to numbers. Here we apply a comparison operator to a random matrix, a matrix whose elements are a sample of pseudo-random numbers from a uniform distribution in the interval [0, 1)
R = np.random.rand(3, 3)
A = R > 0.5
A
array([[False, True, True], [ True, True, True], [False, False, False]])
Instead of representing the logical values with True or False we can represent them with the integers 1 and 0 respectively
A.astype(int)
array([[0, 1, 1], [1, 1, 1], [0, 0, 0]])
We can filter the elements of an array according to the logical rule defined before
R[A]
array([0.66377115, 0.70955479, 0.96854003, 0.76779624, 0.77286988])
We can apply different rules on the array elements depending on the result of the logical operator
np.where(R > 0.5, R - 0.5, R + 0.5) > 0.5
array([[ True, False, False], [False, False, False], [ True, True, True]])
I = 5000
%time mat = np.random.standard_normal((I, I))
CPU times: total: 1.11 s Wall time: 1.13 s
Let's say we have four 2x2 matrices A, B, C and D and we want to concatenate them in one 4x4 matrix M
$$ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix} $$
A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
C = np.zeros([2, 2])
D = np.ones([2, 2])
A, B, C, D
(array([[1, 2], [3, 4]]), array([[5, 6], [7, 8]]), array([[0., 0.], [0., 0.]]), array([[1., 1.], [1., 1.]]))
M_up = np.concatenate([A, B], axis=1)
M_down = np.concatenate([C, D], axis=1)
M = np.concatenate([M_up, M_down], axis=0)
M
array([[1., 2., 5., 6.], [3., 4., 7., 8.], [0., 0., 1., 1.], [0., 0., 1., 1.]])
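The same block matrix can probably be assembled in one step with np.block(), which takes a nested list laid out like the matrix itself. The names with the _blk suffix are introduced only for this sketch.

```python
import numpy as np

A_blk = np.array([[1, 2], [3, 4]])
B_blk = np.array([[5, 6], [7, 8]])
C_blk = np.zeros((2, 2))
D_blk = np.ones((2, 2))

# The nested list mirrors the layout [[A, B], [C, D]]
M_blk = np.block([[A_blk, B_blk],
                  [C_blk, D_blk]])

print(M_blk.shape)   # (4, 4)
```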
An important feature of NumPy arrays is vectorization: mathematical operators on arrays do not need explicit loops, because an operator is applied to all the elements of an operand array
x = np.arange(0., 9., 1)
y = 3 * x + 5
y
array([ 5., 8., 11., 14., 17., 20., 23., 26., 29.])
Vectorization is even more useful for multi-dimensional arrays
A = np.random.randint(0, 10, (3, 3))
A
array([[6, 3, 3], [3, 1, 0], [8, 5, 0]])
B = np.random.randint(0, 10, (3, 3))
B
array([[9, 1, 9], [0, 5, 2], [8, 5, 1]])
We can compute the sum of two matrices with one single statement without loops. The operator is applied to an element of the first matrix and the corresponding element in the second matrix
A + B
array([[15, 4, 12], [ 3, 6, 2], [16, 10, 1]])
Broadcasting allows us to use an operator with operands of different shapes, for example a matrix and a scalar
B + 3
array([[12, 4, 12], [ 3, 8, 5], [11, 8, 4]])
2 * B
array([[18, 2, 18], [ 0, 10, 4], [16, 10, 2]])
Two arrays A and B can be used as operands of an operator O if the one with the smaller shape can be broadcast over the bigger one by repeating it along one index or both.
A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
A
array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array((1, 2, 3))
B
array([1, 2, 3])
Clearly we can move B over A one row at a time.
A * B
array([[ 1, 4, 9], [ 4, 10, 18], [ 7, 16, 27]])
Broadcasting doesn't affect the commutativity of the operator.
B * A
array([[ 1, 4, 9], [ 4, 10, 18], [ 7, 16, 27]])
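The compatibility rule can be explored with a small sketch (names are illustrative): shapes are compared from the last axis backwards, and two sizes are compatible when they are equal or one of them is 1. A row of shape (3,) is repeated over the rows, a column of shape (3, 1) over the columns, and an array of shape (2,) cannot be combined with a 3x3 matrix.

```python
import numpy as np

mat = np.arange(1, 10).reshape(3, 3)
row = np.array([1, 2, 3])       # shape (3,), treated as (1, 3)
col = row[:, np.newaxis]        # shape (3, 1)

by_row = mat * row              # column j is multiplied by row[j]
by_col = mat * col              # row i is multiplied by row[i]

try:
    mat * np.array([1, 2])      # shapes (3, 3) and (2,) are not compatible
    incompatible = False
except ValueError:
    incompatible = True

print(by_col)
print(incompatible)             # True
```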
C = np.array([[1, -2], [5, -9]])
C
array([[ 1, -2], [ 5, -9]])
D = np.array([[-9, 2], [-5, 1]])
D
array([[-9, 2], [-5, 1]])
C @ D
array([[1, 0], [0, 1]])
Since D is the inverse of C, that is $D = C^{-1}$, we have $CD = DC = I$, where I is the identity matrix.
D @ C
array([[1, 0], [0, 1]])
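We can apply an operator C to a vector x in two ways, directly or through their transposes: $(\hat{C}\vec{x})^T = \vec{x}^T\hat{C}^T$. For a one-dimensional NumPy array the transpose has no effect, so both expressions return the same array.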
x = np.array([1, 1])
C @ x
array([-1, -4])
x.T @ C.T
array([-1, -4])
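Both statements can be checked numerically. The sketch below uses np.linalg.inv() to confirm the inverse and an explicit 2D column vector, so that the transposition is visible in the shapes. The names C_mat, D_mat and x_col are introduced here for illustration.

```python
import numpy as np

C_mat = np.array([[1, -2], [5, -9]])
D_mat = np.array([[-9, 2], [-5, 1]])

# D should coincide with the numerically computed inverse of C
print(np.allclose(np.linalg.inv(C_mat), D_mat))   # True

x_col = np.array([[1], [1]])      # column vector, shape (2, 1)
left = (C_mat @ x_col).T          # (C x)^T, shape (1, 2)
right = x_col.T @ C_mat.T         # x^T C^T, shape (1, 2)

print(left, right)                # [[-1 -4]] [[-1 -4]]
```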
A use case is the computation of item similarities. Given an interaction matrix A where the rows represent users and the columns represent items, a 1 represents an interaction between a user and an item. A similarity measure between items can be computed from the interaction matrix A as shown below.
A = np.array([[1,1,0,1], [1,1,1,0], [1,0,0,1]]) # 3 users and 4 items
AT = A.transpose()
S = AT.dot(A) # S is symmetric
S
array([[3, 2, 1, 2], [2, 2, 1, 1], [1, 1, 1, 0], [2, 1, 0, 2]])
We can also consider the ratings of the users on the items, e.g. on a scale from 1 to 5 (0 means not rated)
A = np.array([[5,3,0,2], [4,0,3,1], [3,0,0,1]])
AT = A.transpose()
S = AT.dot(A)
S
array([[50, 15, 12, 17], [15, 9, 0, 6], [12, 0, 9, 3], [17, 6, 3, 6]])
We can now recommend items to a user by multiplying the user's ratings by the similarity matrix. The output is a vector of values that represent how much the user should rate each item. We will have to remove from the list the items that have been already rated by the user in order to recommend previously unseen items.
u1 = A[0] # user 1 interactions
r1 = S.dot(u1)
print(r1)
[329 114 66 115]
u1_nr = (u1 == 0) # find items not rated by user 1
item_i = np.where(u1_nr) # find the indices of the items not rated by user 1
print(r1.dot(u1_nr), item_i)
66 (array([2], dtype=int64),)
u2 = A[1]
r2 = S.dot(u2)
print(r2)
[253 66 78 83]
u2_nr = (u2 == 0)
item_i = np.where(u2_nr)
print(r2.dot(u2_nr), item_i)
66 (array([1], dtype=int64),)
u3 = A[2]
S.dot(u3)
array([167, 51, 39, 57])
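The steps above (score the items, discard those already rated, pick the best remaining one) can be collected in a small helper. This is a minimal sketch: the function recommend() and the name ratings_demo are hypothetical and not part of the original notebook.

```python
import numpy as np

def recommend(ratings, user_index):
    """Return the index of the unrated item with the highest score (hypothetical helper)."""
    similarity = ratings.T @ ratings          # item-item similarity matrix
    user = ratings[user_index]
    scores = similarity @ user                # predicted score for every item
    # Items already rated get -inf so they can never be selected
    masked = np.where(user == 0, scores, -np.inf)
    return int(np.argmax(masked))

ratings_demo = np.array([[5, 3, 0, 2],
                         [4, 0, 3, 1],
                         [3, 0, 0, 1]])

print(recommend(ratings_demo, 0))   # 2
print(recommend(ratings_demo, 1))   # 1
print(recommend(ratings_demo, 2))   # 1
```

If a user has rated every item, all scores are masked and the helper returns index 0, so a real implementation would need to handle that case explicitly.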
A record array can contain objects of different types in different columns, e.g. all integers in one column and all strings in another column
recarray = np.zeros((2,), dtype=[('Integers','i4'),('Float','f4'),('Strings','S10')]) # a record array of two empty records; 'S10' is a byte string of up to 10 characters
print(recarray)
[(0, 0., b'') (0, 0., b'')]
# prepare three arrays each of the same type: integer, float, string
column1 = np.array((1,2), dtype='int')
column2 = np.array((1.5,2.5), dtype='float')
column3 = np.array(('Rome','Paris'), dtype='str')
recarray = [column1, column2, column3] # puts the three arrays together in a plain Python list (one array per column), not in a true record array
print(recarray[:])
print(recarray[2])
[array([1, 2]), array([1.5, 2.5]), array(['Rome', 'Paris'], dtype='<U5')] ['Rome' 'Paris']
# another way of creating a record array
recarray2 = np.rec.array([(1,1.5,'New York'),(2,3.7,'Moscow')],dtype=[('Integers','i4'),('Float','f4'),('Strings','S10')])
print(recarray2.Strings) # prints all the values in column 'Strings'
[b'New York' b'Moscow']
import pandas as pd
path = 'covid19/data/tui-dat.csv'
df = pd.read_csv(path)
df[0:2]
| | open | high | low | close | volume |
|---|---|---|---|---|---|
| 0 | 29.01 | 29.29 | 28.70 | 29.00 | 422606 |
| 1 | 28.81 | 29.15 | 28.71 | 28.93 | 265548 |
# Solve the linear system AX = B
A = np.array([[3, 6, -5],
[1, -3, 2],
[5, -1, 4]])
B = np.array([[12],
[-2],
[10]])
Ainv = np.linalg.inv(A) # inverse of A
X = Ainv.dot(B) # X = A^(-1)B
print(X)
print(A.dot(X)) # returns B
[[1.75] [1.75] [0.75]] [[12.] [-2.] [10.]]
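Computing the inverse explicitly is not required to solve the system. As a sketch of an alternative, np.linalg.solve() solves AX = B directly, which is generally considered faster and numerically more stable. The names with the _sys suffix are introduced only for this example.

```python
import numpy as np

A_sys = np.array([[3, 6, -5],
                  [1, -3, 2],
                  [5, -1, 4]])
B_sys = np.array([[12],
                  [-2],
                  [10]])

X_sys = np.linalg.solve(A_sys, B_sys)       # solves A X = B without forming A^(-1)

print(np.allclose(A_sys @ X_sys, B_sys))    # True
```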
# Determinant and eigenvalues of a square matrix
# 1 2 3
# S = 0 2 1
# 0 0 -1
S = np.array([[1,2,3],
[0,2,1],
[0,0,-1]])
det = np.linalg.det(S)
print(det)
l = np.linalg.eigvals(S) # eigenvalues
print(l)
-2.0 [ 1. 2. -1.]
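The two results are related: the determinant equals the product of the eigenvalues and the trace equals their sum. Since the matrix is upper triangular, the eigenvalues are simply its diagonal entries. A short check, with illustrative names:

```python
import numpy as np

S_tri = np.array([[1, 2, 3],
                  [0, 2, 1],
                  [0, 0, -1]])

eig = np.linalg.eigvals(S_tri)

print(np.isclose(np.prod(eig), np.linalg.det(S_tri)))   # True
print(np.isclose(np.sum(eig), np.trace(S_tri)))         # True
```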
import matplotlib.pyplot as plt
mu, sigma = 0.0, 1.0 # mean value and standard deviation
sample_size = 10000
v = np.random.normal(mu,sigma, sample_size) # Normal distribution
fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(5, 5))
n = ax.hist(v, bins=100) # n contains the number of samples in each bin, the one-dimensional grid values and the plot object
Let's import the curve_fit function from the SciPy library to fit a model to noisy data; Matplotlib is used for visualization
from scipy.optimize import curve_fit
# Creating a linear function to model and create data
def linearfunc(x, a, b):
    return a * x + b
x = np.linspace(0, 10, 100) # start = 0, stop = 10, samples = 100
y = linearfunc(x, 1, 2) # linear function defined in [0,10]
plt.figure()
plt.plot(x, y)
# Adding noise to the data
yn = y + 0.9 * np.random.normal(size=len(x))
plt.plot(x,yn)
# Executing curve_fit on noisy data
popt, pcov = curve_fit(linearfunc, x, yn) # estimates the parameters of the linear function a, b
print(popt)
yfit = linearfunc(x,popt[0],popt[1]) # the fitted linear function overlaps with the original one
plt.plot(x,yfit)
[0.99569401 2.03516151]
[<matplotlib.lines.Line2D at 0x2bd49935130>]
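For a model that is linear in its parameters, an ordinary least-squares fit with np.polyfit() should give essentially the same estimates as curve_fit. The sketch below regenerates similar noisy data with a seeded generator; the seed and the variable names are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)          # arbitrary seed, for reproducibility only
x_lin = np.linspace(0, 10, 100)
y_noisy = 1.0 * x_lin + 2.0 + 0.9 * rng.normal(size=x_lin.size)

# A degree-1 polynomial fit returns the coefficients from the highest power down
slope, intercept = np.polyfit(x_lin, y_noisy, deg=1)

print(slope, intercept)                 # expected to be close to 1 and 2
```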
# Creating a Gaussian function to model and create data
def gaussfunc(x, a, b, c):
    return a*np.exp(-(x-b)**2/(2*c**2))
# Generating clean data
x = np.linspace(0, 10, 100)
y = gaussfunc(x, 1, 5, 2)
# Adding noise (gaussian) to the data (also gaussian)
yn = y + 0.2 * np.random.normal(size=len(x))
plt.figure()
plt.plot(x,yn) # plot the gaussian function with random noise
# Executing curve_fit on noisy data
popt, pcov = curve_fit(gaussfunc, x, yn) # estimates the parameters of the gaussian function a, b, c
print(popt)
yfit = gaussfunc(x,popt[0],popt[1],popt[2]) # plot the fitted gaussian
plt.plot(x,yfit)
[ 0.97061079 5.08490729 -2.02934528]
[<matplotlib.lines.Line2D at 0x2bd49dbbf70>]
from scipy.optimize import fsolve
curve = lambda x: (x - 1)*(x - 2)
solution1 = fsolve(curve, 0)
print(solution1)
solution2 = fsolve(curve,3)
print(solution2)
[1.] [2.]
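fsolve() finds one root per starting point. Because this particular curve is a polynomial, x^2 - 3x + 2, all its roots can also be obtained at once with np.roots(), as in this sketch (names are illustrative).

```python
import numpy as np

# (x - 1)(x - 2) = x^2 - 3x + 2, coefficients from the highest power down
coeffs = [1, -3, 2]
roots = np.sort(np.roots(coeffs))

print(np.allclose(roots, [1.0, 2.0]))   # True
```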
from scipy.interpolate import interp1d
x = np.linspace(0, 3*np.pi, 10)
y = np.sin(x)
# create a linear interpolation function
linearfunc = interp1d(x, y, kind='linear')
# create a quadratic interpolation function
quadraticfunc = interp1d(x, y, kind='quadratic')
# interpolate on a grid of 100 points
x_interp = np.linspace(0, 3*np.pi, 100)
linear_interp = linearfunc(x_interp)
quadratic_interp = quadraticfunc(x_interp)
# plot the results
plt.figure() # new figure
plt.plot(x, y,'o') # plot the data points
plt.plot(x_interp, linear_interp, x_interp, quadratic_interp); # plot the linear and quadratic interpolations
plt.legend(['data', 'linear', 'quadratic'], loc='best')
<matplotlib.legend.Legend at 0x2bd49efe0a0>
# Interpolation of noisy data
from scipy.interpolate import UnivariateSpline
sample = 30
x = np.linspace(0.5, 10*np.pi, sample)
y = np.cos(x) + np.log10(x) + np.random.randn(sample) / 10
linearfunc = interp1d(x, y, kind='linear')
splinefunc = UnivariateSpline(x, y, s=1)
x_interp = np.linspace(0.5, 10*np.pi, 1000)
linear_interp = linearfunc(x_interp)
spline_interp = splinefunc(x_interp)
plt.figure()
plt.plot(x,y,'o')
plt.plot(x_interp, linear_interp, x_interp, spline_interp)
plt.legend(['data', 'linear', 'spline'], loc='best')
<matplotlib.legend.Legend at 0x2bd4af4fbb0>
def func(x, y):
    return np.sqrt(x**2 + y**2)+np.sin(x**2 + y**2)
# creates a 2D grid of 100x100 points with coordinate values from 0 to 5 for both x and y
grid_x, grid_y = np.mgrid[0:5:100j, 0:5:100j]
# sample data points
xy = np.random.rand(1000, 2)
z = func(xy[:,0]*5, xy[:,1]*5)
from scipy.interpolate import griddata
# interpolating data
grid_z0 = griddata(xy*5, z, (grid_x, grid_y), method='cubic')
plt.subplot(121)
plt.imshow(func(grid_x, grid_y).T, extent=(0,1,0,1), origin='lower') # shows the image generated on the grid points
plt.plot(xy[:,0], xy[:,1], 'k.', ms=1) # plot the random sample points
plt.title('Original')
plt.subplot(122)
plt.imshow(grid_z0.T, extent=(0,1,0,1), origin='lower') # shows the interpolated image
plt.title('Interpolated')
Text(0.5, 1.0, 'Interpolated')
def f(x, y):
    return np.exp(-(x*x + y*y) / 1.0)
    #return 1 / (1 + np.exp(-5*x - 4*y))
x = np.linspace(-1, 1, 100)
y = np.linspace(-1, 1, 100)
X, Y = np.meshgrid(x, y) # generates a grid from one-dimensional arrays
z = f(X, Y)
fig = plt.figure()
ax = plt.axes(projection='3d')
ax.contour3D(X, Y, z, 100, cmap='binary')
<matplotlib.contour.QuadContourSet at 0x215ea2bfc40>