from mllab import *
from sklearn import datasets as data
Packages: numpy as np, matplotlib.pyplot as plt, seaborn as sns.
Functions: plotXY, plot_frontiere, map_regions, covariance, plot_cov, sample_gm, scatter, plot_level_set, gaussian_sample.
Draw a sample of size 200 from a Gaussian mixture model with parameters
$$ \begin{cases} \pi_1 &= 0.33\\ \mu_1 &= (0, 0), \end{cases} \qquad \begin{cases} \pi_2 &= 0.33\\ \mu_2 &= (5, 0), \end{cases} \qquad \begin{cases} \pi_3 &= 0.34\\ \mu_3 &= (2, -5), \end{cases} $$
and with the same identity covariance matrix for each component. Plot the "contours" of the three clusters and their centers.
# Answer
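One possible sketch (not necessarily the intended solution): sample_gm comes from mllab, with the call signature used later in this notebook; the contours of the identity-covariance Gaussians are drawn directly with matplotlib rather than with the mllab helpers.
weights = [0.33, 0.33, 0.34]
means = [[0, 0], [5, 0], [2, -5]]
covs = [np.eye(2) for _ in range(3)]           # identity covariance for each component

X = sample_gm(weights, means, covs, size=200)  # mllab sampler, signature as used below

plt.scatter(X[:, 0], X[:, 1], s=10)
xs, ys = np.meshgrid(np.linspace(-4, 9, 200), np.linspace(-9, 4, 200))
for mu in means:
    # Density of a standard bivariate Gaussian centered at mu
    dens = np.exp(-0.5 * ((xs - mu[0])**2 + (ys - mu[1])**2)) / (2 * np.pi)
    plt.contour(xs, ys, dens, levels=3)
    plt.scatter(*mu, marker='x', color='red')  # cluster center
plt.show()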
Complete the following implementation of soft k-means.
# Answer
class SoftKMeans(object):
    def __init__(self, n_components=1, n_iter=100):
        self.n_components = n_components
        self.n_iter = n_iter
        self.weights_ = None
        self.means_ = None
        self.covariances_ = None
        self.log_likelihood_ = None
        self.em_log_likelihood_ = None

    def fit(self, X):
        # Initialization
        n_components = self.n_components

        # List of initial weights, means and covariances
        # (initial means can be taken at random among the training points)
        # To do
        # End to do

        # Multivariate Gaussian pdf
        def pdf(X, mean, cov):
            # Small ridge added for numerical stability of the inversion
            invcov = np.linalg.inv(cov + 1e-6*np.eye(cov.shape[0]))
            r = np.exp(-0.5*np.diag((X-mean) @ invcov @ (X-mean).T))
            r *= np.sqrt(np.linalg.det(invcov/(2*np.pi)))
            return r

        # Loop
        log_likelihood = []     # Marginal log-likelihood at each iteration
        em_log_likelihood = []  # Average joint log-likelihood at each iteration
        # Compute the matrix of joint density values (size #components x #points)
        # and update weights, means and covariances
        for it in range(self.n_iter):
            # Parameter update
            # To do
            # End to do

            # Log-likelihoods computation
            # To do
            # End to do

        self.weights_ = np.array(weights)
        self.means_ = np.array(means)
        self.covariances_ = np.array(covariances)
        self.log_likelihood_ = log_likelihood
        self.em_log_likelihood_ = em_log_likelihood
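For reference, here is one way the To do blocks could be filled in; this is a sketch of the EM updates, not necessarily the intended solution. joint[k, i] holds the joint density π_k · N(x_i; μ_k, Σ_k) and tau the posterior responsibilities.
def em_step(X, weights, means, covariances, pdf):
    # E-step: joint densities (size #components x #points) and responsibilities
    joint = np.array([w * pdf(X, m, c)
                      for w, m, c in zip(weights, means, covariances)])
    marginal = joint.sum(axis=0)          # mixture density at each point
    tau = joint / marginal                # posterior probabilities of the components
    nk = tau.sum(axis=1)                  # effective number of points per component
    # M-step: re-estimate the parameters from the responsibilities
    weights = nk / X.shape[0]
    means = [tau[k] @ X / nk[k] for k in range(len(nk))]
    covariances = [(tau[k][:, None] * (X - means[k])).T @ (X - means[k]) / nk[k]
                   for k in range(len(nk))]
    # Marginal log-likelihood and average joint log-likelihood (the EM lower-bound term)
    ll = np.log(marginal).sum()
    em_ll = (tau * np.log(joint + 1e-300)).sum()
    return weights, means, covariances, ll, em_ll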
Fit a soft k-means with 3 components and 20 iterations on the data.
Print the prior probabilities. Plot the training dataset along with the estimated means and covariance matrices.
Are the results consistent with the way the data has been generated?
# Answer
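A possible sketch (the covariance ellipses could also be drawn with the mllab helper plot_cov from skm.covariances_):
skm = SoftKMeans(n_components=3, n_iter=20)
skm.fit(X)
print("Prior probabilities:", skm.weights_)

plt.scatter(X[:, 0], X[:, 1], s=10)
plt.scatter(skm.means_[:, 0], skm.means_[:, 1], marker='x', color='red')
plt.show()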
Plot the two log-likelihoods versus the number of iterations.
Is the marginal log-likelihood non-decreasing? Is it bounded from below by the average joint log-likelihood?
# Answer
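A possible sketch, using the attributes stored by fit:
plt.plot(skm.log_likelihood_, label="marginal log-likelihood")
plt.plot(skm.em_log_likelihood_, label="average joint log-likelihood")
plt.xlabel("iteration")
plt.legend()
plt.show()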
With the help of scikit-learn's GaussianMixture, estimate the parameters of a 3-component Gaussian mixture.
Print the prior probabilities and the maximal value of the log-likelihood. Plot the training dataset along with the estimated means and covariance matrices.
Are the results consistent with your own implementation?
# Answer
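A possible sketch with scikit-learn: score returns the average log-likelihood per sample, and gmm.covariances_ holds the estimated covariance matrices (which could be displayed with plot_cov).
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=3).fit(X)
print("Prior probabilities:", gmm.weights_)
print("Log-likelihood per sample:", gmm.score(X))

plt.scatter(X[:, 0], X[:, 1], s=10)
plt.scatter(gmm.means_[:, 0], gmm.means_[:, 1], marker='x', color='red')
plt.show()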
Repeat the estimation several (let us say 9) times.
Are the results stable?
# Answer
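A possible sketch, comparing the fitted weights and log-likelihoods across runs:
for rep in range(9):
    gmm = GaussianMixture(n_components=3).fit(X)
    print(gmm.weights_.round(3), "log-likelihood per sample:", round(gmm.score(X), 3))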
What if the initial parameters are set at random (look for the suitable parameter of GaussianMixture)?
# Answer
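A possible sketch, using the init_params parameter of GaussianMixture:
for rep in range(9):
    gmm = GaussianMixture(n_components=3, init_params='random').fit(X)
    print(gmm.weights_.round(3), "log-likelihood per sample:", round(gmm.score(X), 3))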
Complete the following script in order to fit a 2-component Gaussian mixture on each of the four datasets below and display the results.
Analyze the results (there should be "unexpected" results).
# Answer
from sklearn.mixture import GaussianMixture

gmm = GaussianMixture(n_components=2)
for rep in range(6):  # repeat the experiment to see the variability of the fits
    plt.figure(figsize=(10, 3))
    for it, (weights, means, covariances) in enumerate([
            ([0.5, 0.5], [[0, 0], [5, 0]], [(1, 1, 0), (1, 1, 0)]),
            ([0.05, 0.95], [[0, 0], [5, 0]], [(1, 1, 0), (1, 1, 0)]),
            ([0.5, 0.5], [[0, 0], [0, 0]], [(10, 1, 0), (1, 10, 0)]),
            ([0.5, 0.5], [[0, 0], [5, -5]], [(10, 1, 0), (1, 10, 0)])]):
        X = sample_gm(weights, means, [covariance(*c) for c in covariances], size=100)
        # To do
        # End to do
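For reference, one possible completion of the To do block above (to be placed inside the inner loop): each fit gets its own panel, points are colored by predicted label, and crosses mark the estimated means.
plt.subplot(1, 4, it + 1)
labels = gmm.fit(X).predict(X)
plt.scatter(X[:, 0], X[:, 1], s=5, c=labels)
plt.scatter(gmm.means_[:, 0], gmm.means_[:, 1], marker='x', color='red')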
Given the following data, fit a Gaussian mixture.
Display the cluster centers along with the partitioning (use the function map_regions).
(weights, means, covariances) = ([0.3, 0.2, 0.5], [[-5, -1], [5, 0], [2, -5]],
[(1, 5, np.pi/3), (1, 5, np.pi/3), (5, 1, np.pi/3)])
X = sample_gm(weights, means, [covariance(*c) for c in covariances], size=200)
# Answer
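A possible sketch; here the partition is drawn directly from gmm.predict on a grid (the mllab helper map_regions could be used instead):
gmm = GaussianMixture(n_components=3).fit(X)
xs, ys = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
regions = gmm.predict(np.c_[xs.ravel(), ys.ravel()]).reshape(xs.shape)
plt.contourf(xs, ys, regions, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], s=10)
plt.scatter(gmm.means_[:, 0], gmm.means_[:, 1], marker='x', color='red')
plt.show()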
Do the same with k-means.
What is the difference?
# Answer
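A possible sketch with scikit-learn's KMeans, displaying the partition the same way as above:
from sklearn.cluster import KMeans

km = KMeans(n_clusters=3).fit(X)
xs, ys = np.meshgrid(np.linspace(X[:, 0].min() - 1, X[:, 0].max() + 1, 300),
                     np.linspace(X[:, 1].min() - 1, X[:, 1].max() + 1, 300))
regions = km.predict(np.c_[xs.ravel(), ys.ravel()]).reshape(xs.shape)
plt.contourf(xs, ys, regions, alpha=0.3)
plt.scatter(X[:, 0], X[:, 1], s=10)
plt.scatter(km.cluster_centers_[:, 0], km.cluster_centers_[:, 1], marker='x', color='red')
plt.show()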
Given the following dataset, perform several k-means clusterings with random initialization (the original version of k-means).
What do you observe?
(weights, means, covariances) = ([0.05, 0.2, 0.75], [[-5, -1], [5, 0], [2, -5]],
[(1, 5, np.pi/3), (1, 5, np.pi/3), (5, 1, np.pi/3)])
X = sample_gm(weights, means, [covariance(*c) for c in covariances], size=100)
# Answer
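A possible sketch, using init='random' and a single initialization per run (n_init=1) so that the runs can differ:
for rep in range(6):
    km = KMeans(n_clusters=3, init='random', n_init=1).fit(X)
    print("inertia:", round(km.inertia_, 2), "centers:", km.cluster_centers_.round(2).tolist())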
Here, we aim at analyzing Gaussian mixture and k-means for non-convex clusters.
For this purpose:
- generate a dataset with non-convex clusters (for instance with the sklearn datasets module imported above);
- plot it (function plotXY);
- display the partitionings (function map_regions) obtained with Gaussian mixture and k-means.

What do you observe?
# Answer
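A possible sketch, taking the two-moons dataset as an example of non-convex clusters (make_moons, via the datasets module imported at the top of the notebook); the partitions are again drawn from predictions on a grid rather than with the mllab helpers:
X, _ = data.make_moons(n_samples=200, noise=0.05)
xs, ys = np.meshgrid(np.linspace(-1.5, 2.5, 300), np.linspace(-1, 1.5, 300))
grid = np.c_[xs.ravel(), ys.ravel()]
for i, model in enumerate([GaussianMixture(n_components=2), KMeans(n_clusters=2)]):
    model.fit(X)
    plt.subplot(1, 2, i + 1)
    plt.contourf(xs, ys, model.predict(grid).reshape(xs.shape), alpha=0.3)
    plt.scatter(X[:, 0], X[:, 1], s=10, c=model.predict(X))
plt.show()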