In this session, we will review some concepts from probability, linear algebra, and multivariate calculus (essentially gradient descent), and start deriving some of the simplest models in machine learning.
1a. Write a function that takes a 2D symmetric matrix as input and returns its largest eigenvalue.
# put your code here
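A possible sketch, using NumPy's `eigvalsh` (which assumes a symmetric/Hermitian input and returns the eigenvalues in ascending order):

```python
import numpy as np

def largest_eigenvalue(A):
    """Largest eigenvalue of a symmetric matrix A.

    np.linalg.eigvalsh exploits the symmetry and returns the
    eigenvalues in ascending order, so the last one is the largest.
    """
    return np.linalg.eigvalsh(A)[-1]
```

For example, on the diagonal matrix [[2, 0], [0, 3]] this returns 3.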
1b. Do the same for an arbitrary rectangular matrix and return the largest singular value.
# put your code here
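A possible sketch: `np.linalg.svd` works for any rectangular matrix and returns the singular values in descending order, so the first one is the largest.

```python
import numpy as np

def largest_singular_value(X):
    """Largest singular value of an arbitrary rectangular matrix X."""
    # compute_uv=False returns only the singular values, in descending order
    return np.linalg.svd(X, compute_uv=False)[0]
```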
1c. Modify the function above and write a second function that now takes two arguments. The first argument is a matrix $X$ with singular value decomposition $(\Sigma, U, V)$. The second argument is an integer $K$. The function should then output the matrix $X_{\text{approx}}$ defined from the singular value decomposition of your input as
$$X_{\text{approx}} = \sum_{i=1}^K \sigma_i u_iv_i^T$$
# put your code here
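One possible sketch of this truncated SVD (using the reduced decomposition `full_matrices=False` so that the shapes line up):

```python
import numpy as np

def rank_k_approximation(X, K):
    """Rank-K approximation of X: sum_{i=1}^K sigma_i u_i v_i^T."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # keep only the K leading singular triplets
    return U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]
```

On a rank-1 matrix, the approximation with $K = 1$ recovers the matrix exactly.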
1d. Consider the image shown below. Using the fact that every grayscale image is a 2D matrix, apply your function to this image for various values of $K$. How large do you need to take $K$ to recover a clear image?
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
img = mpimg.imread('subject03.gif')
noisy = np.random.normal(img, 10)
imgplot = plt.imshow(noisy,cmap='gray')
plt.show()
# put your code here
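A possible sketch for this part. Since 'subject03.gif' may not be at hand, a synthetic low-rank matrix stands in for the image below so that the sketch is self-contained; with the real image, reuse the `noisy` array computed above instead.

```python
import numpy as np

# Truncated-SVD reconstructions for several values of K.
# A synthetic rank-8 "image" stands in for subject03.gif.
rng = np.random.default_rng(0)
img = rng.standard_normal((64, 8)) @ rng.standard_normal((8, 64))
noisy = rng.normal(img, 1.0)

U, s, Vt = np.linalg.svd(noisy, full_matrices=False)
for K in (1, 4, 8, 16):
    approx = U[:, :K] @ np.diag(s[:K]) @ Vt[:K, :]
    err = np.linalg.norm(approx - img) / np.linalg.norm(img)
    print(f"K = {K:2d}: relative error to the clean image = {err:.3f}")
    # with the real image: plt.imshow(approx, cmap='gray'); plt.show()
```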
1e. Same question, but now change the level of noise in the image.
# put your code here
3. MAP and MLE. Consider a univariate Gaussian distribution.
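Recall that the density of a univariate Gaussian with mean $\mu$ and variance $\sigma^2$ is
$$p(x\mid\mu,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$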
Pick a value for the variance and a value for the mean. Then generate a few samples $\left\{x_i\right\}$ from the distribution.
Start by deriving the log-likelihood function. Plot this function for various values of $\mu$ at fixed $\sigma$. Then vary $\sigma$ and generate a plot for each value of $\sigma$. How do the plots compare to each other?
As a second step, compute the MLE estimators for $\sigma$ and $\mu$. Repeat the estimation for various draws of samples $\left\{x_i\right\}_{i=1}^n$ (keep the size of the sample fixed).
# put the code here
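A possible sketch of the sampling and estimation steps; the true $\mu$, $\sigma$ and the sample size $n$ below are arbitrary choices.

```python
import numpy as np

# Draw Gaussian samples and compute the MLE estimators.
rng = np.random.default_rng(0)
mu_true, sigma_true, n = 2.0, 1.5, 200
x = rng.normal(mu_true, sigma_true, size=n)

def log_likelihood(x, mu, sigma):
    """Log-likelihood of i.i.d. Gaussian samples x under N(mu, sigma^2)."""
    m = x.size
    return -m / 2 * np.log(2 * np.pi * sigma**2) - np.sum((x - mu)**2) / (2 * sigma**2)

# MLE: sample mean and (biased) sample variance
mu_hat = x.mean()
sigma2_hat = np.mean((x - mu_hat)**2)
print(mu_hat, np.sqrt(sigma2_hat))
# to visualize: evaluate log_likelihood on a grid of mu values at fixed
# sigma and plot with matplotlib, then repeat for several sigma values
```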
3b From MLE to MAP: Coin flipping
As a second exercise, we consider flipping a coin. In this case, the distribution of heads (resp. tails) can be modelled as a Bernoulli distribution. We thus have
$$p(x|\theta) = \theta^x(1 - \theta)^{1-x}$$
for $x = 0$ or $1$.
# put your code here
Derive the expression for the MLE and compute its value from the samples you generated above. Justify your derivation.
Still using your parameter $\theta$, imagine that you get the following sequence
What would the MLE be in this case? Does that seem reasonable?
# put your code here
# put your code here
# put your code here
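A possible sketch for this subsection. The Beta$(a, b)$ prior used for the MAP is an assumption introduced here to illustrate the MLE-to-MAP step, and the degenerate all-heads sequence is a stand-in for the sequence mentioned above.

```python
import numpy as np

# MLE and MAP for the Bernoulli parameter theta.
# MLE: theta_hat = (number of heads) / n
# MAP with a Beta(a, b) prior (posterior mode):
#   theta_map = (heads + a - 1) / (n + a + b - 2)
rng = np.random.default_rng(0)
theta_true = 0.7
x = rng.binomial(1, theta_true, size=100)

theta_mle = x.mean()

def theta_map(x, a=2.0, b=2.0):
    """Mode of the Beta posterior for Bernoulli observations x."""
    n, heads = x.size, x.sum()
    return (heads + a - 1) / (n + a + b - 2)

# A degenerate sequence (e.g. three heads in a row) gives MLE = 1,
# while the MAP stays away from the boundary:
all_heads = np.ones(3)
print(all_heads.mean(), theta_map(all_heads))  # MLE = 1.0, MAP = 0.8 with a = b = 2
```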
4a. Towards regression. We now want to extend the previous result to a multivariate normal distribution.
Generate samples $\{\boldsymbol x_i\}$ from a multivariate normal distribution and define $y_i = \boldsymbol b^T \boldsymbol x_i + b_0$ for some $\boldsymbol b\in\mathbb{R}^n$ and bias $b_0$ that you can choose as you like.
What is the distribution of the $y_i$? Find the expression of the likelihood function in this case. How about the log-likelihood?
How can you compute the MLE?
# put your code here
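One way to sketch this numerically; the mean, covariance, $\boldsymbol b$ and $b_0$ below are arbitrary choices. Since $y_i$ is a linear function of a Gaussian vector, it is Gaussian with mean $\boldsymbol b^T\boldsymbol\mu + b_0$ and variance $\boldsymbol b^T\Sigma\boldsymbol b$, which can be checked empirically:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5000, 3
mean = np.zeros(d)
cov = np.array([[2.0, 0.5, 0.0],
                [0.5, 1.0, 0.3],
                [0.0, 0.3, 1.5]])
b = np.array([1.0, -2.0, 0.5])
b0 = 3.0

# draw x_i from the multivariate normal and form y_i = b^T x_i + b0
X = rng.multivariate_normal(mean, cov, size=n)
y = X @ b + b0

print(y.mean(), b @ mean + b0)  # empirical vs theoretical mean
print(y.var(), b @ cov @ b)     # empirical vs theoretical variance
```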
4b. From theory to practice. Download and open (using pandas) the following salary dataset.
Extract the columns corresponding to the salaries and years of experience.
Start by plotting the data with the function scatter from pyplot.
We will use the linear Gaussian model that we discussed above and assume that the salary is directly proportional to the number of years of experience, i.e.
$$y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \sim \mathcal{N}(0, \sigma^2),$$
where we further assume that the samples are independent. Using the MLE, derive an expression for the regression coefficients $\beta_1$ and $\beta_0$. What problem do you need to solve to get $\beta_1$ and $\beta_0$? (Hint: start by deriving the expressions for $\beta_0$ and $\beta_1$ by computing the derivatives and setting them to $0$. Then substitute the expression for $\beta_0$ into the expression for $\beta_1$ and solve this last equation.)
# put your code here
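A minimal sketch of the resulting closed-form solution. Since the dataset link is not reproduced here, synthetic (years, salary) data stand in; with the real data, load the two columns with pandas first.

```python
import numpy as np

# Closed-form MLE for simple linear regression:
#   beta1 = cov(x, y) / var(x),  beta0 = mean(y) - beta1 * mean(x)
rng = np.random.default_rng(0)
years = rng.uniform(0, 10, size=50)
salary = 30000 + 5000 * years + rng.normal(0, 2000, size=50)

beta1 = np.sum((years - years.mean()) * (salary - salary.mean())) \
        / np.sum((years - years.mean())**2)
beta0 = salary.mean() - beta1 * years.mean()
print(beta0, beta1)
```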
4c. From univariate to multivariate
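A sketch of the multivariate case, assuming (as the heading suggests) least squares with several input features; the design matrix, true coefficients and noise level below are arbitrary choices. Stacking the inputs in a matrix $X$ with a column of ones for the intercept, the MLE solves the normal equations $X^TX\beta = X^Ty$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.standard_normal((n, d))
beta_true = np.array([1.0, -2.0, 0.5])
y = 4.0 + X @ beta_true + rng.normal(0, 0.1, size=n)

X1 = np.column_stack([np.ones(n), X])            # prepend intercept column
beta_hat = np.linalg.lstsq(X1, y, rcond=None)[0]  # least-squares solution
print(beta_hat)  # close to [4.0, 1.0, -2.0, 0.5]
```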
5. Gradient descent: Walking down the hill
As we have seen in class, except when the models are very simple and can be solved explicitly, the parameters of most models in machine learning are set through optimization. As a first exercise, we will consider a simple model. Let us first consider the simple linear regression landscape.
Exercise 5.a. Consider the following function, whose plot is given below. You can uncomment the first line to activate interactive plotting in Jupyter, and use the "plt.ioff() / %matplotlib inline" cell below to turn the interactive mode off.
$$f(x,y) = 3(1-x)^2 e^{-x^2 - (y+1)^2} - 10\left(\frac{x}{5} - x^3 - y^5\right)e^{-x^2-y^2} - \frac{1}{3} e^{-(x+1)^2 - y^2}$$
#%matplotlib notebook  # uncomment to activate interactive plotting
import numpy as np
import matplotlib.pyplot as plt

# grid on which the surface is evaluated (range chosen to cover the bumps)
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
xx, yy = np.meshgrid(x, y, indexing='xy')
z = 3*((1 - xx)**2) * np.exp(-(xx**2) - (yy + 1)**2) \
    - 10*(xx/5 - xx**3 - yy**5) * np.exp(-xx**2 - yy**2) \
    - (1/3) * np.exp(-(xx + 1)**2 - yy**2)
from matplotlib import cm
from mpl_toolkits.mplot3d import Axes3D
from mpl_toolkits.mplot3d import axes3d
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.plot_surface(xx, yy, z, cmap=cm.viridis,alpha=.4,
linewidth=0, antialiased=False)
plt.show()
plt.ioff()
%matplotlib inline
Code a simple gradient descent algorithm (see the corresponding note on the website). Set the parameters as follows
Then start your algorithm at
Then plot the iterations on top of the function (you can use the 3D plotting tools from matplotlib or the simpler 'contourf' function). What do you see?
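A minimal sketch of such a gradient descent, using a central finite-difference gradient so that $\nabla f$ need not be derived by hand. The step size, iteration count and starting point below are arbitrary stand-ins for the values given in the exercise.

```python
import numpy as np

def f(x, y):
    """The surface from the exercise."""
    return (3 * (1 - x)**2 * np.exp(-x**2 - (y + 1)**2)
            - 10 * (x / 5 - x**3 - y**5) * np.exp(-x**2 - y**2)
            - (1 / 3) * np.exp(-(x + 1)**2 - y**2))

def num_grad(x, y, h=1e-6):
    """Central finite-difference approximation of the gradient of f."""
    gx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    gy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return np.array([gx, gy])

pt = np.array([0.5, -0.5])     # starting point (arbitrary choice)
path = [pt.copy()]
for _ in range(500):
    pt = pt - 0.02 * num_grad(*pt)  # fixed step size (arbitrary choice)
    path.append(pt.copy())
path = np.array(path)
print(pt, f(*pt))
# plot over the surface: plt.contourf(xx, yy, z); plt.plot(path[:, 0], path[:, 1], 'r.-')
```

With a fixed step size, the iterates slide down into the nearest valley; restarting from different points generally ends in different local minima.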