This section is under construction. Please refer to the official documentation.

A class is probably the most complex and powerful concept you will encounter in Python (it also exists in many other programming languages).
It is a template used to create objects (also called *instances* of the class), which generally have many attributes and methods.

Let us remark that Python is a very flexible programming language, which consequently comes with conventions we have to follow in accordance with standard practice:

- class names are often capitalized, in order to differentiate them from functions;
- private attributes are suffixed with `_`; they are normally aimed at developers, not at users;
- attributes should never be modified directly (or only very carefully); they are aimed at defining the object and informing the user. In order to modify an object, you should use methods.
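A minimal sketch of these conventions (the class `Point` and its members are made up for illustration):

```python
class Point:  # class name capitalized, to differentiate it from a function
    def __init__(self, x, y):
        self.x = x            # public attributes: they define the object
        self.y = y
        self.history_ = []    # private attribute (trailing `_`): aimed at developers

    def translate(self, dx, dy):
        # modify the object through a method, not by touching attributes directly
        self.history_.append((self.x, self.y))
        self.x += dx
        self.y += dy

p = Point(1, 2)
p.translate(3, 0)
print(p.x, p.y)  # 4 2
```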

We start with a first example, aimed at creating an object from the class `np.ndarray`.

Topics to be addressed:

- parameters of constructor
- methods

In [1]:

```
'rr'.capitalize()
```

Out[1]:

In [2]:

```
import numpy as np
help(np.ndarray)
```

In order to create the object `arr` (or instantiate the class `np.ndarray`), the constructor of the class is called with a set of parameters.
This constructor returns the desired instance.

In [3]:

```
arr = np.ndarray(shape=(2, 2), dtype=int)
arr
```

Out[3]:

As seen in the documentation, the object `arr` has attributes:

In [4]:

```
print(arr.shape, arr.size)
```

which characterize the object, and methods (these are functions), often aimed at modifying the object or at returning information based on its contents:

In [5]:

```
arr.sort()
arr
```

Out[5]:

In [6]:

```
arr.mean(axis=0)
```

Out[6]:

All attributes and methods can be discovered in the documentation or with `dir`:

In [7]:

```
dir(arr)[-10:]
```

Out[7]:

At first glance, attributes are features (floats, lists, strings…) stored in the object, and methods are functions aimed at acting on the object. However, the roles of attributes and methods may overlap, as exemplified by the transposition of arrays:

In [8]:

```
arr.T
```

Out[8]:

In [9]:

```
arr.transpose()
```

Out[9]:

In [10]:

```
arr
```

Out[10]:

Both commands return exactly the same result, without modifying the object or requiring any argument, since `transpose()` is used without arguments (`.T` is actually defined via `transpose()`).
This is because the transposition of an array may be seen either as a feature of the object (user's point of view) or as a piece of information requiring to browse the array (developer's point of view).
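In a custom class, this duality can be reproduced with the built-in `property` decorator, which exposes a method as if it were an attribute (the class `Vector` below is a hypothetical illustration, not how NumPy actually implements `.T`):

```python
class Vector:
    """A hypothetical container, used to mimic the `.T`/`transpose()` duality."""
    def __init__(self, values):
        self.values = list(values)

    def reversed(self):
        # method point of view: compute and return the reversed vector
        return Vector(self.values[::-1])

    @property
    def R(self):
        # attribute point of view: same computation, accessed without parentheses
        return self.reversed()

v = Vector([1, 2, 3])
print(v.reversed().values)  # [3, 2, 1]
print(v.R.values)           # [3, 2, 1]
```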

Let us start with a naive example.

In [11]:

```
class Student():
    """
    name: student name
    num: student number
    """
    def __init__(self, name, num=0):
        self.name = name
        self.num = num
        self.mark_ = None

    def __str__(self):
        return 'Student {} (number {})'.format(self.name, self.num)

    def set_mark(self, mark):
        self.mark_ = mark if type(mark) is str else None

    def get_mark(self):
        return 'not filled' if self.mark_ is None else self.mark_
```

In [12]:

```
help(Student)
```

In [13]:

```
roger = Student('Roger', 42)
print(roger)
print('Mark:', roger.get_mark())
```

In [14]:

```
roger.set_mark('A')
print('Mark:', roger.get_mark())
```

This section is under construction. Please refer to the official documentation.

The main idea of multiprocessing is to split a large amount of computation among different computers or among several processes inside a single computer. For now, we will focus on the latter case. It has the advantage that memory is shared natively, with no need to broadcast the data among various machines. In addition, it can quickly be set up on a powerful personal server or on a virtual machine in the cloud.

Parallel computing particularly shines when:

- there are a few expensive tasks rather than many cheap ones;
- tasks are single threaded rather than multithreaded.

Below is a naive example of sequential versus parallel computation in a favorable situation.

In [15]:

```
n = 1000  # Matrix size
make_psd = lambda x: x.dot(x.T)
matrices = [make_psd(np.random.randn(n, n)) for _ in range(4)]

def naive_work(M):
    R = np.ndarray((n, n))
    for i in range(M.size):
        R.flat[i] = np.exp(-M.flat[i])
    return R
```

In [16]:

```
from time import time

t_in = time()
res = []
for M in matrices:
    res.append(naive_work(M))
t_out = time()
print(f'{t_out - t_in:.0f} seconds')
```

In [17]:

```
from multiprocessing import Pool

t_in = time()
with Pool() as p:
    res = p.map(naive_work, matrices)
t_out = time()
print(f'{t_out - t_in:.0f} seconds')
```

**EDIT (thanks to M. Anakök):**

On Windows, the execution of the previous cell may not end or may produce an error. If so, defining the workers in a separate Python file and guarding the scope of the calling code may solve the issue:

In [19]:

```
%%writefile workers.py
import numpy as np

def naive_work(M):
    R = np.ndarray(M.shape)
    for i in range(M.size):
        R.flat[i] = np.exp(-M.flat[i])
    return R
```

In [20]:

```
import workers

if __name__ == '__main__':
    t_in = time()
    with Pool() as p:
        res = p.map(workers.naive_work, matrices)
    t_out = time()
    print(f'{t_out - t_in:.0f} seconds')
```

When tasks are already multithreaded (as many NumPy vectorized operations are), parallel computing is useless.

In [18]:

```
def np_work(M):
    return np.exp(-M)

t_in = time()
res = []
for M in matrices:
    res.append(np_work(M))
t_out = time()
print(f'{t_out - t_in:.1f} seconds')
```

In [19]:

```
t_in = time()
with Pool() as p:
    res = p.map(np_work, matrices)
t_out = time()
print(f'{t_out - t_in:.1f} seconds')
```

Machine learning is a vibrant scientific field at the connection between computer science, statistics, optimization and functional analysis:

- computer science: machine learning is mainly interested in *algorithms* for solving *tangible problems* (hand-written text recognition, image classification, etc.);
- statistics: it is all about estimating a property of the distribution from which the observed data is drawn;
- optimization: estimation often relies on minimizing an empirical risk;
- functional analysis: convergence guarantees for the algorithms as well as models for the estimator rely on this mathematical field.
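To make the optimization item concrete, minimizing an empirical risk can be sketched as follows (the loss $\ell$, the sample $(x_i, y_i)_{1 \le i \le n}$ and the hypothesis class $\mathcal F$ are generic notation, not fixed by this document):

$$\hat f \in \operatorname*{arg\,min}_{f \in \mathcal F} \; \frac{1}{n} \sum_{i=1}^n \ell\big(f(x_i), y_i\big).$$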

The tasks tackled in machine learning are:

- (semi-)supervised learning: discovering a link between an input data (observed) and an outcome (supposed unknown);
- unsupervised learning: discovering hidden patterns in data;
- reinforcement learning: learning to achieve a goal based on interacting with the environment.

The rest of this document will focus on supervised learning.

Machine learning has deep roots in statistics, since its main goal is to estimate a property of a distribution based on an observed sample. However, machine learning defines a new paradigm of inference, proposed as an answer to the difficulty of modeling the distribution of the data.

Unlike inferential statistics, in which practitioners define a statistical model (an assumption on the distribution from which the observed sample is drawn), machine learners define a model for the estimator (with no assumption on the underlying distribution).

To illustrate this, let $(X, Y)$ be a pair of random variables with unknown joint distribution. The main tasks of supervised learning are:

- regression: estimating $x \mapsto \mathbb E[Y ~|~ X=x]$, given that $Y$ has values in $\mathbb R$;
- classification: estimating $x \mapsto \operatorname{sign}\left(\frac{\mathbb P (Y=1 ~|~ X=x)}{\mathbb P (Y=-1 ~|~ X=x)} -1\right)$, given that $Y$ has values in $\{-1, 1\}$.

These tasks of estimating the regression or classification function are done within a hypothesis class of functions (model on the estimator), regardless of the distribution of $(X, Y)$.

Several packages are available for doing machine learning in Python, depending on the kind of methods the practitioner opts for:

- Scikit-learn and Shogun are for general machine learning;
- PyTorch, TensorFlow, Theano and Caffe are designed for neural networks and deep learning.

Here, we focus on Scikit-learn, which offers a broad range of machine learning methods and has a simple and well documented API.

The general procedure for using a model is:

- creation (for instance of a decision tree):

```
clf = tree.DecisionTreeClassifier()
```

- learning (estimation with data corresponding to $X$ and labels to $Y$):

```
clf.fit(data, labels)
```

- prediction (of unknown labels):

```
clf.predict(data)
```
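Putting the three steps together on a tiny made-up dataset (the data and labels below are purely illustrative: the label simply equals the first feature):

```python
from sklearn import tree

# Hypothetical toy data: two features, two classes
data = [[0, 0], [1, 1], [0, 1], [1, 0]]
labels = [0, 1, 0, 1]

clf = tree.DecisionTreeClassifier()  # creation
clf.fit(data, labels)                # learning
print(clf.predict([[0, 0], [1, 1]]))  # prediction
```

Since the first feature perfectly separates the two classes here, the fitted tree predicts `[0, 1]` for these two inputs.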

We provide here a (counter)example of a machine learning method for classification. This one assumes that each class is normally distributed and estimates the Gaussian parameters using maximum likelihood.

In [20]:

```
import numpy as np
from sklearn.naive_bayes import GaussianNB
import matplotlib.pyplot as plt
%matplotlib inline

# Generate data
n = 200  # Sample size
x = np.r_[np.random.randn(n//2, 2) + [0, 1],
          np.random.randn(n//2, 2) + [3, -0.5]]  # Data
y = np.r_[np.ones(n//2), -np.ones(n//2)]  # Labels

# Fit and evaluate the model (classifier)
clf = GaussianNB()  # Model creation
clf.fit(x, y)  # Fit the model
y_pred = clf.predict(x)  # Predict with the model
s = clf.score(x, y)  # Compute the score
print("Classification score on training data is {0:0.0f}/100.".format(s*100))

# Plot data and prediction
fig, ax = plt.subplots()
mks = 10  # Marker size
alpha = 0.1
#ax.plot(x[y_pred>0, 0], x[y_pred>0, 1], 'bo', markersize=mks+5, alpha=alpha)  # Prediction
#ax.plot(x[y_pred<0, 0], x[y_pred<0, 1], 'ro', markersize=mks+5, alpha=alpha)  # Prediction
ax.plot(x[y>0, 0], x[y>0, 1], 'b.', markersize=mks)  # Ground truth
ax.plot(x[y<0, 0], x[y<0, 1], 'r.', markersize=mks)  # Ground truth

# Plot the frontier
X, Y = np.meshgrid(np.linspace(-5, 5, num=500), np.linspace(-5, 5, num=500))
X, Y = X.ravel(), Y.ravel()
Z = clf.predict_proba(np.c_[X, Y])[:, 0]
ind = np.where(np.fabs(Z - 0.5) < 1e-2)
ax.plot(X[ind], Y[ind], 'g.', label="Frontier")
ax.legend(loc="best");
```

Inspired by `scipy.stats`, define a class `Normal` that represents a random variable following a Gaussian distribution.
The object should have attributes:

- `loc`: location;
- `scale`: scale;
- `mean`: theoretical mean;
- `std`: theoretical standard deviation;
- `var`: theoretical variance;

and methods:

- `pdf(x)`: returns the pdf value at `x`;
- `rvs(size=1)`: returns an iid random sample of the specified size.

In [ ]:

```
# Answer
```

How to redefine the previous class using inheritance?

In [ ]:

```
# Answer
```

Compare time to solve 4 linear systems sequentially and using multithreading.

In [ ]:

```
# Answer
```

We focus here on a classification problem. Propose a procedure to evaluate the accuracy of a classifier. Apply this procedure to compare a k-nearest neighbors algorithm with a decision tree on the Iris dataset.

In [ ]:

```
# Answer
```