from mllab import *
Packages: numpy as np, matplotlib.pyplot as plt, seaborn as sns. Functions: plotXY, plot_frontiere, map_regions, covariance, plot_cov, sample_gmm, scatter, plot_level_set, gaussian_sample.
covariance?
Signature: covariance(sigma1=1.0, sigma2=1.0, theta=0.0)
Docstring: Covariance matrix with eigenvalues sigma1 and sigma2, rotated by the angle theta.
File:      ~/bitbucket/class/2017/5MS102_Apprentissage_non-supervisé/nb/m2/mllab.py
Type:      function
Based on the Cholesky decomposition of a $2 \times 2$ covariance matrix $\Sigma$, write a function that generates a multivariate Gaussian $n$-sample of mean $\mu \in \mathbb R^2$ and covariance $\Sigma$.
The corresponding numpy array should be of size $(n, 2)$.
Compute the mean and the empirical covariance of the sample using Numpy routines.
# Answer
def gaussian_sample(mu=[0, 0], sigma1=1., sigma2=1., theta=0., n=50):
    # Todo
    # End todo
    return X
# Todo
# End todo
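A possible completion (a sketch, assuming the covariance helper from mllab returns the rotated $2 \times 2$ matrix described above):
def gaussian_sample(mu=[0, 0], sigma1=1., sigma2=1., theta=0., n=50):
    # Cholesky factor L of Sigma: if Z ~ N(0, I), then mu + Z @ L.T ~ N(mu, Sigma)
    Sigma = covariance(sigma1, sigma2, theta)
    L = np.linalg.cholesky(Sigma)
    Z = np.random.randn(n, 2)
    X = np.asarray(mu) + Z @ L.T
    return X

X = gaussian_sample(mu=[1, 2], sigma1=2., sigma2=0.5, theta=np.pi/4, n=500)
print(X.mean(axis=0))            # empirical mean
print(np.cov(X, rowvar=False))   # empirical covariance (unbiased)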
Generate two multivariate Gaussian samples of size $n_1 = n_2 = 50$ with different means and equal covariance matrices.
Plot both samples with different markers by using the function plotXY.
# Answer
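A possible sketch; plotXY is assumed to plot the rows of a data matrix with a different marker per label, as in the plotXY(X, Y) call further below:
n1, n2 = 50, 50
X1 = gaussian_sample(mu=[0, 0], sigma1=1., sigma2=3., theta=np.pi/6, n=n1)
X2 = gaussian_sample(mu=[5, 3], sigma1=1., sigma2=3., theta=np.pi/6, n=n2)
X = np.vstack([X1, X2])
y = np.concatenate([np.ones(n1), -np.ones(n2)])
plotXY(X, y)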
Write a function that creates and returns two Numpy arrays X and y such that X is the data matrix (of size $n \times 2$) and y is the vector of labels (of size $n$, with values $\pm 1$) corresponding to i.i.d. copies of $(X, Y)$ distributed such that:
$$ \begin{cases} \forall i \in \{\pm 1\}: X~|~Y=i \sim \mathcal N (\mu_i, \Sigma_i)\\ \mathbb P(Y=1) = \pi, \quad \mathbb P(Y=-1)=1-\pi. \end{cases} $$
Create a dataset of size $n=100$.
# Answer
def sample_classif(weight=0.5,
                   param1=dict(mu=[0, 0], sigma1=1., sigma2=1.),
                   param2=dict(mu=[0, 0], sigma1=1., sigma2=1.),
                   n=50):
    Y = 2 * np.random.binomial(n=1, p=weight, size=n) - 1  # Labels 1 or -1
    # Todo
    # End todo
    return X, Y
# Todo
# End todo
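One possible completion of sample_classif and the requested dataset of size $n=100$ (a sketch; gaussian_sample is the function from the previous question):
def sample_classif(weight=0.5,
                   param1=dict(mu=[0, 0], sigma1=1., sigma2=1.),
                   param2=dict(mu=[0, 0], sigma1=1., sigma2=1.),
                   n=50):
    Y = 2 * np.random.binomial(n=1, p=weight, size=n) - 1  # Labels 1 or -1
    n_pos = np.sum(Y == 1)
    X = np.empty((n, 2))
    X[Y == 1] = gaussian_sample(n=n_pos, **param1)        # class +1
    X[Y == -1] = gaussian_sample(n=n - n_pos, **param2)   # class -1
    return X, Y

X, Y = sample_classif(weight=0.5,
                      param1=dict(mu=[0, 0], sigma1=1., sigma2=1.),
                      param2=dict(mu=[5, 3], sigma1=1., sigma2=1.),
                      n=100)
plotXY(X, Y)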
Based on the following code, implement a linear discriminant classifier, taking as parameters an $n \times 2$ Numpy array as data and a size-$n$ array of labels.
# Answer
from sklearn.base import BaseEstimator
from sklearn.discriminant_analysis import LinearClassifierMixin
class LDA(BaseEstimator, LinearClassifierMixin):
    """
    LDA classifier for two classes.
    """
    def __init__(self, bias=False):
        """
        bias: if False (default), the pooled covariance matrix estimator
        is normalized by ``(N - 2)``, where ``N`` is the number of
        observations (unbiased estimate). If True, normalization is
        by ``N``.
        """
        self.bias = bias
        self.yvalues_ = None
        self.coef_ = None
        self.intercept_ = None
    def fit(self, X, y):
        self.yvalues_ = np.unique(y)
        assert self.yvalues_.size == 2
        # Estimate covariance matrix and means
        n_pos, n = np.sum(y == self.yvalues_[1]), X.shape[0]  # Number of positive labels and size of the dataset
        # Todo
        # End todo
        # Compute direction and intercept
        # Todo
        # End todo
        return self
    def decision_function(self, X):
        # Compute decisions
        # Todo
        # End todo
        return decisions
    def predict(self, X):
        # Compute predictions
        predictions = np.ones(X.shape[0]) * self.yvalues_[0]  # Negative label
        # Todo
        # End todo
        return predictions
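One way to fill in the fit step (a sketch, not part of the original skeleton): with pooled covariance estimate $\hat\Sigma$, class means $\hat\mu_\pm$ and class proportions $\hat\pi_\pm$, LDA uses
$$ w = \hat\Sigma^{-1}(\hat\mu_+ - \hat\mu_-), \qquad b = -\tfrac12 (\hat\mu_+ + \hat\mu_-)^\top w + \log\frac{\hat\pi_+}{\hat\pi_-}, $$
and the decision function is $x \mapsto w^\top x + b$. The hypothetical helper below sketches these computations:
def _lda_fit_sketch(X, y, bias=False):
    # Hypothetical helper: key computations of the fit method
    yvals = np.unique(y)
    n = X.shape[0]
    n_pos = np.sum(y == yvals[1])
    mu_neg = X[y == yvals[0]].mean(axis=0)
    mu_pos = X[y == yvals[1]].mean(axis=0)
    Xc = X.astype(float).copy()
    Xc[y == yvals[0]] -= mu_neg
    Xc[y == yvals[1]] -= mu_pos
    norm = n if bias else n - 2                     # matches the `bias` convention above
    Sigma = Xc.T @ Xc / norm                        # pooled covariance estimate
    coef = np.linalg.solve(Sigma, mu_pos - mu_neg)  # direction w
    intercept = -0.5 * (mu_pos + mu_neg) @ coef + np.log(n_pos / (n - n_pos))
    return coef, intercept                          # decision function: X @ coef + intercept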
Fit a linear discriminant classifier on the data X, y.
Plot the data along with the classifier frontier (use the function plot_frontiere).
# Answer
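A possible answer (plot_frontiere comes from mllab; its signature is assumed here to be the data followed by a classifier or a list of classifiers):
lda = LDA().fit(X, Y)
plotXY(X, Y)
plot_frontiere(X, lda)  # assumed signature: plot_frontiere(data, classifier)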
Compare with the result of scikit-learn's LDA (decision function and frontier).
# Answer
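Sketch of the comparison with scikit-learn (same plot_frontiere assumption as above):
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
sk_lda = LinearDiscriminantAnalysis().fit(X, Y)
# The two decision functions should be close
print(np.abs(lda.decision_function(X) - sk_lda.decision_function(X)).max())
plotXY(X, Y)
plot_frontiere(X, [lda, sk_lda])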
Analyze the behavior of LDA and QDA when they are faced with anisotropic Gaussian samples (in particular, check whether the frontier is the perpendicular bisector of the line segment whose extremities are the two class centers), and then with Gaussian samples having different covariance matrices (you can use plot_frontiere with a list of classifiers).
# Answer
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis
qda = QuadraticDiscriminantAnalysis()
# Gaussian parameters
mu1 = [0, 0]
mu2 = [5, 3]
plt.figure(figsize=(20, 20))
for ifig, (param1, param2) in enumerate([(dict(mu=mu1, sigma1=1, sigma2=1, theta=0), dict(mu=mu2, sigma1=1, sigma2=1, theta=0)),
                                         (dict(mu=mu1, sigma1=1, sigma2=5, theta=0), dict(mu=mu2, sigma1=1, sigma2=5, theta=0)),
                                         (dict(mu=mu1, sigma1=1, sigma2=5, theta=np.pi/6), dict(mu=mu2, sigma1=1, sigma2=5, theta=np.pi/6)),
                                         (dict(mu=mu1, sigma1=1, sigma2=5, theta=0), dict(mu=mu2, sigma1=5, sigma2=1, theta=0)),
                                         (dict(mu=mu1, sigma1=1, sigma2=5, theta=0), dict(mu=mu2, sigma1=5, sigma2=1, theta=np.pi/3))]):
    # Dataset
    # Todo
    # End todo
    # Discriminant analysis
    # Todo
    # End todo
    # Class means
    # Todo
    # End todo
    # Plot frontieres and class means
    # Todo
    # End todo
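The loop body could be completed along these lines (a sketch; plotXY/plot_frontiere assumptions as above):
    # Dataset
    X, Y = sample_classif(weight=0.5, param1=param1, param2=param2, n=200)
    # Discriminant analysis
    lda = LDA().fit(X, Y)
    qda.fit(X, Y)
    # Class means
    m_pos, m_neg = X[Y == 1].mean(axis=0), X[Y == -1].mean(axis=0)
    # Plot frontiers, data and the segment joining the class means
    plt.subplot(3, 2, ifig + 1)
    plotXY(X, Y)
    plot_frontiere(X, [lda, qda])
    plt.plot([m_neg[0], m_pos[0]], [m_neg[1], m_pos[1]], 'k--o')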
Implement the Fisher discriminant analysis based on the following code.
In practice, what is the difference between LDA and FisherDA?
# Answer
class FisherDA(BaseEstimator, LinearClassifierMixin):
    """
    Fisher discriminant analysis for two classes.
    """
    def __init__(self):
        self.coef_ = None
        self.intercept_ = None
    def fit(self, X, y):
        # Estimate prior, covariance matrix and means
        # Todo
        # End todo
        # Compute direction and intercept
        # Todo
        # End todo
        self.intercept_ = 0
        ypred = self.decision_function(X)
        ind = np.argsort(ypred)
        err = np.cumsum(y[ind]) + np.sum(y == y.min())
        # plt.figure()
        # plt.plot(ypred[ind], err)  # Error
        iintercept = np.argmin(err)
        if iintercept < y.size - 1:
            self.intercept_ = -0.5 * (ypred[ind[iintercept]] + ypred[ind[iintercept + 1]])
        else:
            self.intercept_ = -ypred[ind[iintercept]]
        return self
    def decision_function(self, X):
        # Compute decisions
        # Todo
        # End todo
        return decisions
    def predict(self, X):
        # Compute predictions
        # Todo
        # End todo
        return predictions
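For the direction, a possible sketch of the classical Fisher computation (within-class scatter $S_W$, then $w \propto S_W^{-1}(\hat\mu_+ - \hat\mu_-)$); up to scaling this is the same direction as LDA, so in practice the two classifiers mainly differ in how the intercept/threshold is chosen (here by minimizing the training error along the projected axis):
def _fisher_direction_sketch(X, y):
    # Hypothetical helper illustrating the direction computation
    mu_pos, mu_neg = X[y == 1].mean(axis=0), X[y == -1].mean(axis=0)
    Xc = X.astype(float).copy()
    Xc[y == 1] -= mu_pos
    Xc[y == -1] -= mu_neg
    Sw = Xc.T @ Xc                      # within-class scatter matrix
    return np.linalg.solve(Sw, mu_pos - mu_neg)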
# Answer
We consider that
$$ X|Y=1 \sim \mathcal N(0, I) \qquad \text{and} \qquad X|Y=-1 \sim 0.5\, \mathcal N\left(\begin{pmatrix} 5 \\ 3 \end{pmatrix}, I\right) + 0.5\, \mathcal N\left(\begin{pmatrix} 8 \\ 9 \end{pmatrix}, I\right) \quad \text{(non-Gaussian class)}. $$
Compare LDA and logistic regression.
# Answer
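A possible sketch for building the dataset above and comparing the two classifiers (the non-Gaussian class is drawn by picking one of the two mixture components for each point; sample_gmm from mllab could also be used):
from sklearn.linear_model import LogisticRegression
n = 200
Y = 2 * np.random.binomial(n=1, p=0.5, size=n) - 1
X = np.empty((n, 2))
X[Y == 1] = np.random.randn(np.sum(Y == 1), 2)          # class +1: N(0, I)
neg = np.flatnonzero(Y == -1)
comp = np.random.binomial(n=1, p=0.5, size=neg.size)    # mixture component per negative point
means = np.where(comp[:, None] == 1, np.array([8., 9.]), np.array([5., 3.]))
X[neg] = np.random.randn(neg.size, 2) + means           # class -1: Gaussian mixture
lda = LDA().fit(X, Y)
logreg = LogisticRegression().fit(X, Y)
plotXY(X, Y)
plot_frontiere(X, [lda, logreg])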
What about with this dataset (class $-1$ is Gaussian but with an outlier)?
# Dataset
X, Y = sample_classif(weight=.5,
                      param1=dict(mu=[0, 0], sigma1=1, sigma2=1, theta=0),
                      param2=dict(mu=[5, 3], sigma1=1, sigma2=1, theta=0),
                      n=100)
X[np.argmin(Y)] = np.random.randn(2) + 20
# Answer
We consider the dataset defined below.
# Dataset
X, Y = sample_classif(weight=.3,
                      param1=dict(mu=[0, 0], sigma1=10, sigma2=1, theta=np.pi/6),
                      param2=dict(mu=[0, 0], sigma1=1, sigma2=1, theta=0),
                      n=100)
X_ng, Y_ng = sample_classif(weight=0.5,
                            param1=dict(mu=[5, 3], sigma1=3, sigma2=10, theta=np.pi/6),
                            param2=dict(mu=[-5, -2], sigma1=3, sigma2=10, theta=np.pi/10),
                            n=X.shape[0])
X[Y==-1] = X_ng[:np.sum(Y==-1)]
plotXY(X, Y)
Fit an AdaBoost classifier with $100$ weak learners and the SAMME algorithm.
Map the classifier regions on a figure.
# Answer
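A possible sketch (map_regions comes from mllab; its signature is assumed here to be the data followed by a fitted classifier, like plot_frontiere):
from sklearn.ensemble import AdaBoostClassifier
ada = AdaBoostClassifier(n_estimators=100, algorithm='SAMME').fit(X, Y)
plotXY(X, Y)
map_regions(X, ada)  # assumed signature: map_regions(data, classifier)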
Plot on a new figure the estimator errors (attribute estimator_errors_).
What do you observe?
# Answer
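Sketch:
plt.figure()
plt.plot(ada.estimator_errors_)   # weighted training error of each weak learner
plt.xlabel("weak learner index")
plt.ylabel("estimator error")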
Load the digits dataset.
How many observations, covariates, and classes does it have? Split the dataset into two equally sized subsets (one for training, the other for testing, i.e., estimating the true error).
# Answer
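A possible sketch using scikit-learn:
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
digits = load_digits()
print(digits.data.shape)                # (n_observations, n_covariates)
print(np.unique(digits.target).size)    # number of classes
X_train, X_test, y_train, y_test = train_test_split(digits.data, digits.target, test_size=0.5)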
Plot the train and test errors of both algorithms SAMME and SAMME.R with respect to the number of iterations (from 1 to 200) for the digits dataset.
For this purpose, use DecisionTreeClassifier(max_depth=5) as the base learner.
# Answer
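A possible sketch, using staged_predict to recover the error after each boosting iteration:
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import AdaBoostClassifier
plt.figure()
for algo in ['SAMME', 'SAMME.R']:
    ada = AdaBoostClassifier(DecisionTreeClassifier(max_depth=5),
                             n_estimators=200, algorithm=algo)
    ada.fit(X_train, y_train)
    train_err = [np.mean(pred != y_train) for pred in ada.staged_predict(X_train)]
    test_err = [np.mean(pred != y_test) for pred in ada.staged_predict(X_test)]
    plt.plot(range(1, len(train_err) + 1), train_err, label=algo + ' (train)')
    plt.plot(range(1, len(test_err) + 1), test_err, label=algo + ' (test)')
plt.xlabel("number of iterations")
plt.ylabel("misclassification error")
plt.legend()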