This section is under construction. Please refer to the official documentation.
A class is probably the most complex and powerful concept in Python (it also exists in many other programming languages). It is a template used to create objects (also called instances of the class in question), which generally have many attributes and methods.
Let us remark that Python is a very flexible programming language, which consequently comes with rules we have to follow in order to work in accordance with standard practice. Concerning attributes:
- attributes whose name ends with an underscore _ (like mark_ below) are normally only aimed at developers, not at users;
- attributes are aimed at defining the object and informing the user; in order to modify an object, you should use methods.
We start with a first example, aimed at creating an object from the class np.ndarray.
Topics to be addressed:
'rr'.capitalize()
'Rr'
import numpy as np
#help(np.ndarray) # Uncomment to see
In order to create the object arr (or instantiate the class np.ndarray), the constructor of the class is called with a set of parameters. This constructor returns the desired instance.
arr = np.ndarray(shape=(2, 2), dtype=int)
arr
array([[94094548068595, 0], [94082351606976, 94082351567200]])
As seen in the documentation, the object arr has attributes:
print(arr.shape, arr.size)
(2, 2) 4
which characterize the object, and methods (these are functions), often aimed at modifying the object or at returning a piece of information computed from it:
arr.sort()
arr
array([[ 0, 94094548068595], [94082351567200, 94082351606976]])
arr.mean(axis=0)
array([4.70411758e+13, 9.40884498e+13])
All attributes and methods can be discovered in the documentation or with dir:
dir(arr)[-10:]
['swapaxes', 'take', 'tobytes', 'tofile', 'tolist', 'tostring', 'trace', 'transpose', 'var', 'view']
At first glance, attributes are features (floats, lists, strings…) stored in the object and methods are functions aimed at acting on the object. However, there may be an overlap between the roles of attributes and methods, as exemplified by the transposition of arrays:
arr.T
array([[ 0, 94082351567200], [94094548068595, 94082351606976]])
arr.transpose()
array([[ 0, 94082351567200], [94094548068595, 94082351606976]])
arr
array([[ 0, 94094548068595], [94082351567200, 94082351606976]])
Both commands return exactly the same result, without modifying the object or requiring any argument, since transpose() is used without arguments (.T is actually defined via transpose()).
This is so because the transposition of an array may be seen either as a feature of the object (the user's point of view) or as information requiring a pass over the array (the developer's point of view).
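This dual nature can be reproduced in our own classes with the @property decorator, which exposes a method as if it were an attribute. Below is a minimal sketch; the Signal class and its names are made up for illustration:

```python
import numpy as np

class Signal:
    """Toy class exposing the same computation as a method and as an attribute."""
    def __init__(self, values):
        self.values = np.asarray(values)

    def reversed(self):
        # Method form: called with parentheses, like arr.transpose()
        return self.values[::-1]

    @property
    def rev(self):
        # Attribute form: accessed without parentheses, like arr.T
        return self.reversed()

s = Signal([1, 2, 3])
print(s.reversed())  # method call
print(s.rev)         # property access: same result, no parentheses
```

From the user's point of view, s.rev looks like stored data; from the developer's point of view, it is computed on demand.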
Let us start with a naive example.
class Student():
"""
name: student name
num: student number
"""
def __init__(self, name, num=0):
self.name = name
self.num = num
self.mark_ = None
def __str__(self):
return f'Student {self.name} (number {self.num})'
def set_mark(self, mark):
self.mark_ = mark if type(mark) is str else None
def get_mark(self):
return 'not filled' if self.mark_ is None else self.mark_
help(Student)
Help on class Student in module __main__: class Student(builtins.object) | Student(name, num=0) | | name: student name | num: student number | | Methods defined here: | | __init__(self, name, num=0) | Initialize self. See help(type(self)) for accurate signature. | | __str__(self) | Return str(self). | | get_mark(self) | | set_mark(self, mark) | | ---------------------------------------------------------------------- | Data descriptors defined here: | | __dict__ | dictionary for instance variables (if defined) | | __weakref__ | list of weak references to the object (if defined)
roger = Student('Roger', 42)
print(roger)
print('Mark:', roger.get_mark())
Student Roger (number 42) Mark: not filled
roger.set_mark('A')
print('Mark:', roger.get_mark())
Mark: A
Inheritance is a powerful mechanism that makes it possible to create a new class (called the child class) with the same functionalities as another class (called the parent class), but with some modifications. Very often, the child class has extra properties (to make it different from the parent class).
In the following example, StudentM inherits from Student but has two extra properties: spe and year.
By construction, StudentM also has the properties name and num, as well as the methods __str__, get_mark and set_mark.
The new properties have to be initialized in the constructor __init__.
For this reason, the constructor is redefined (we say that inheritance of the parent constructor is overridden).
However, instead of redefining the behavior of the constructor for the inherited properties (name and num), we just call the parent's constructor with the super() function.
The super() function is a quite complex object, used mainly for multiple inheritance: when the child class depends on multiple parents, we could think of a depth-first, left-to-right search of the inherited attributes in the parent classes. super().__init__(*args) follows a more elaborate search order, called the call-next-method approach, which is especially useful and powerful for multiple inheritance.
For single inheritance (which will generally be the case for us), just remember that the super() function represents the parent class.
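As a small illustration of this call-next-method behavior (the class names A, B, C and D below are arbitrary), note how super() follows the method resolution order of the class rather than a plain depth-first search:

```python
class A:
    def who(self):
        return 'A'

class B(A):
    def who(self):
        return 'B -> ' + super().who()

class C(A):
    def who(self):
        return 'C -> ' + super().who()

class D(B, C):
    def who(self):
        return 'D -> ' + super().who()

# super() inside B forwards to C (the next class in D's MRO), not directly to A:
print(D().who())                        # D -> B -> C -> A
print([k.__name__ for k in D.__mro__])  # ['D', 'B', 'C', 'A', 'object']
```

A depth-first, left-to-right search would visit A right after B; the call-next-method approach instead visits every parent exactly once, in the MRO order.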
class StudentM(Student):
"""
Master student
name: student name
spe: scientific specialization
num: student number
year: graduation year
"""
def __init__(self, name, spe, num=0, year=0):
self.spe = spe
self.year = year
super().__init__(name, num)
isabella = StudentM('Isabella', 'Statistics', 35, 2023)
print(isabella)
Student Isabella (number 35)
In the next cell, we complete the previous definition and:
- redefine __str__ such that it prints the specialization and the graduation year (again, this definition overrides inheritance);
- override get_mark();
- add a new method has_graduated.
class StudentM(Student):
"""
Master student
name: student name
spe: scientific specialization
num: student number
year: graduation year
"""
def __init__(self, name, spe, num=0, year=0):
self.spe = spe
self.year = year
super().__init__(name, num)
def __str__(self):
return f'Student {self.name} (number {self.num}, specialization {self.spe}, graduation year {self.year})'
def get_mark(self):
pass
def has_graduated(self):
return type(self.mark_) is str and 'A' <= self.mark_ <= 'C'
isabella = StudentM('Isabella', 'Statistics', 35, 2023)
isabella.set_mark('B')
print(isabella.get_mark())
print(isabella.has_graduated())
None True
This section is under construction. Please refer to the official documentation.
The main idea of multiprocessing is to split a large amount of computation among different computers, or among threads inside a single computer. For now, we will focus on the latter case, called multithreading. It has the advantage of sharing memory natively, so we do not have to worry about broadcasting the data among various machines. In addition, it can quickly be set up on a powerful personal server or on a virtual machine in the cloud.
Parallel computing particularly shines when the tasks are independent and computationally intensive.
Below is a naive example of sequential versus parallel computation in a favorable situation.
n = 1000 # Matrix size
make_psd = lambda x: x.dot(x.T)
matrices = [make_psd(np.random.randn(n, n)) for _ in range(4)]
def naive_work(M):
R = np.ndarray((n, n))
for i in range(M.size):
R.flat[i] = np.exp(-M.flat[i])
return R
from time import time
t_in = time()
res = []
for M in matrices:
res.append(naive_work(M))
t_out = time()
print(f'{t_out - t_in:.0f} seconds')
3 seconds
from multiprocessing import Pool
t_in = time()
with Pool() as p:
res = p.map(naive_work, matrices)
t_out = time()
print(f'{t_out - t_in:.0f} seconds')
1 seconds
EDIT (thanks to M. Anakök):
On Windows, the execution of the previous cell may not end or may produce an error. If so, defining the workers in a separate Python file and guarding the calling code may solve the issue:
%%writefile workers.py
# The module must be self-contained: it imports NumPy itself
# and does not rely on variables defined in the notebook.
import numpy as np

def naive_work(M):
    R = np.ndarray(M.shape)
    for i in range(M.size):
        R.flat[i] = np.exp(-M.flat[i])
    return R
Overwriting workers.py
import workers
if __name__ == '__main__':
    t_in = time()
    with Pool() as p:
        res = p.map(workers.naive_work, matrices)
    t_out = time()
    print(f'{t_out - t_in:.0f} seconds')
2 seconds
When the tasks are already multithreaded internally (as many vectorized NumPy operations are), parallel computing at the Python level is useless.
def np_work(M):
return np.exp(-M)
t_in = time()
res = []
for M in matrices:
res.append(np_work(M))
t_out = time()
print(f'{t_out - t_in:.1f} seconds')
0.1 seconds
t_in = time()
with Pool() as p:
res = p.map(np_work, matrices)
t_out = time()
print(f'{t_out - t_in:.1f} seconds')
0.2 seconds
Machine learning is a vibrant scientific field at the intersection of computer science, statistics, optimization and functional analysis.
The tasks tackled in machine learning include supervised learning (regression, classification), unsupervised learning (clustering, dimensionality reduction) and reinforcement learning.
The rest of this document will focus on supervised learning.
Machine learning has deep roots in statistics, since its main goal is to estimate a property of a distribution based on an observed sample. However, machine learning defines a new paradigm of inference, proposing an answer to the difficulty of modeling the distribution of the data.
Unlike inferential statistics, in which practitioners define a statistical model (an assumption on the distribution from which the observed sample is drawn), machine learners define a model for the estimator (with no assumption on the underlying distribution).
To illustrate this, let $(X, Y)$ be a pair of random variables with unknown joint distribution. The main tasks of supervised learning are:
- regression: estimating the regression function $x \mapsto \mathbb{E}[Y \mid X = x]$ when $Y$ is quantitative;
- classification: estimating the classification function $x \mapsto \operatorname{argmax}_k \mathbb{P}(Y = k \mid X = x)$ when $Y$ is categorical.
These tasks of estimating the regression or classification function are performed within a hypothesis class of functions (a model on the estimator), regardless of the distribution of $(X, Y)$.
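To make this concrete, here is a minimal sketch of estimation within a hypothesis class; the data-generating model below is made up for illustration and is unknown to the learner. We fit the class of affine functions by minimizing the empirical squared risk (least squares), without assuming anything about the distribution of $(X, Y)$:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2 * x + 1 + rng.normal(scale=0.1, size=100)  # hidden from the learner

# Hypothesis class: affine functions f(x) = a*x + b.
# Empirical risk minimization: least squares on the observed sample.
X = np.c_[x, np.ones_like(x)]
a, b = np.linalg.lstsq(X, y, rcond=None)[0]
print(a, b)  # estimates of the slope and intercept
```

The only modeling choice here is the hypothesis class itself; a richer class (e.g. polynomials) would change the estimator, not any assumption on the data.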
Several packages are available for doing machine learning in Python, depending on the kind of methods the practitioner opts for.
Here, we focus on Scikit-learn, which offers a broad range of machine learning methods and has a simple and well documented API.
The general procedure for using a model is:
creation (for instance of a decision tree, assuming from sklearn import tree has been run):
clf = tree.DecisionTreeClassifier()
learning (estimation with data corresponding to $X$ and labels to $Y$):
clf.fit(data, labels)
prediction (of unknown labels):
clf.predict(data)
We provide here a (counter)example of a machine learning method for classification. This one assumes that, within each class, the features are independent and normally distributed, and it estimates the Gaussian parameters using maximum likelihood.
import numpy as np
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
%matplotlib inline
# Generate data
n = 200 # Sample size
x = np.r_[np.random.randn(n//2, 2) + [0, 1],
np.random.randn(n//2, 2) + [3, -0.5]] # Data
y = np.r_[np.ones(n//2), -np.ones(n//2)] # Labels
# Fit and evaluate the model (classifier)
clf = GaussianNB() # Model creation
clf.fit(x, y) # Fit the model
y_pred = clf.predict(x) # Predict with the model
s = clf.score(x, y) # Compute the score
print("Classification score on training data is {0:0.0f}/100.".format(s*100))
# Plot data and prediction
fig, ax = plt.subplots()
mks = 10 # Marker size
alpha = 0.1
#ax.plot(x[y_pred>0, 0], x[y_pred>0, 1], 'bo', markersize=mks+5, alpha=alpha) # Pred
#ax.plot(x[y_pred<0, 0], x[y_pred<0, 1], 'ro', markersize=mks+5, alpha=alpha) # Pred
ax.plot(x[y>0, 0], x[y>0, 1], 'b.', markersize=mks) # Ground truth
ax.plot(x[y<0, 0], x[y<0, 1], 'r.', markersize=mks); # Ground truth
# Plot the frontier
X, Y = np.meshgrid(np.linspace(-5, 5, num=500), np.linspace(-5, 5, num=500))
X, Y = X.ravel(), Y.ravel()
Z = clf.predict_proba(np.c_[X, Y])[:, 0]
ind = np.where(np.fabs(Z-0.5) < 1e-2)
ax.plot(X[ind], Y[ind], 'g.', label="Frontier")
ax.legend(loc="best");
Classification score on training data is 98/100.
Inspired by scipy.stats, define a class Normal that represents a random variable following a Gaussian distribution.
The object should have attributes:
- loc: location;
- scale: scale;
- mean: theoretical mean;
- std: theoretical standard deviation;
- var: theoretical variance;
and methods:
- pdf(x): returns the pdf value at x;
- rvs(size=1): returns an iid random sample of the specified size.
# Answer
How to redefine the previous class using inheritance?
# Answer
Compare time to solve 4 linear systems sequentially and using multithreading.
# Answer
We focus here on a classification problem. Propose a procedure to evaluate the accuracy of a classifier. Apply this procedure to compare a k-nearest neighbors algorithm with a decision tree on the Iris dataset.
# Answer