DS4420: Exercise on Dimensionality Reduction via PCA

Your name:

In general, dimensionality reduction may be viewed as encoding data $x$ into latent (lower-dimensional) representations $z$ that one can then use to recover, noisily, the original $x$'s.

One may view this as an encoding-decoding process, where we encode $x$ into $z$, and then decode $z$ into a reconstruction $\tilde{x}$. We saw that if our objective is to capture as much variance as possible in the compressed space, then we can use eigenvectors: namely, the top $m$ eigenvectors of the data covariance matrix (ranked by their corresponding eigenvalues) may be used to form an $m$-dimensional representation of $x$. In this exercise, we see this in practice.
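Before turning to Fashion-MNIST, here is a minimal self-contained sketch of this encode-decode pipeline on hypothetical synthetic 2-D data (the names and sizes here are illustrative, not part of the exercise):

```python
import numpy as np

rng = np.random.default_rng(0)
# hypothetical data: 200 correlated 2-D points, then zero-meaned
x = rng.normal(size=(200, 2)) @ np.array([[2.0, 0.0], [1.5, 0.5]])
x = x - x.mean(axis=0)

# eigh returns eigenvalues in ascending order for a symmetric matrix
eigvals, eigvecs = np.linalg.eigh(x.T @ x)
b = eigvecs[:, -1:]   # top eigenvector: our 2 -> 1 dimensional "encoder"

z = x @ b             # encode: (200, 1) latent representation
x_tilde = z @ b.T     # decode: noisy reconstruction back in 2-D

# the total squared reconstruction error equals the discarded eigenvalue
err = np.sum((x - x_tilde) ** 2)
```

The same three lines (`eigh`, project, project back) are what the exercise below does at scale on the 784-dimensional image data.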

In [1]:
import numpy as np 
import matplotlib.pyplot as plt

import torch
# conda install -c pytorch torchvision
import torchvision

# note: if you cannot get torchvision installed 
# using the above sequence, you can resort to 
# the colab version here: 
# -- just be sure to download and then upload
# the notebook to blackboard when complete.
fMNIST = torchvision.datasets.FashionMNIST(
    root = './data/FashionMNIST',
    train = True,
    download = True)                                
In [2]:
fMNIST.data.shape
Out[2]:
torch.Size([60000, 28, 28])
In [3]:
from IPython.display import Image 
from matplotlib.pyplot import imshow
%matplotlib inline
imshow(np.asarray(fMNIST.data[6]), cmap='gray')
Out[3]:
<matplotlib.image.AxesImage at 0x7ff8fa4edc50>
In [4]:
X = fMNIST.data
X = np.array([x_i.flatten().numpy() for x_i in X])
X = X / 255 # normalize
X.shape
Out[4]:
(60000, 784)

TODO 1

Zero-mean $X$, and then construct the (unscaled) covariance matrix $S = X^T X$. (Strictly, the covariance matrix carries a $\frac{1}{N}$ factor, but this scaling changes neither the eigenvectors nor the variance ratios below.) What is the shape of $S$?

In [5]:
X_mu = X.mean(axis=0)
X = X - X_mu # zero-mean
S = (X.transpose() @ X)
S.shape
Out[5]:
(784, 784)
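As a quick sanity check (on hypothetical random data, separate from the exercise): $S$ built this way is always symmetric and positive semi-definite, which is exactly what justifies using np.linalg.eigh in the next step.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=(500, 20))
x = x - x.mean(axis=0)      # zero-mean, as in TODO 1
s = x.T @ x                 # (20, 20) scaled covariance matrix

symmetric = np.allclose(s, s.T)
# eigenvalues of a PSD matrix are non-negative (up to floating-point noise)
psd = bool(np.all(np.linalg.eigvalsh(s) >= -1e-8))
```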

TODO 2

Use numpy.linalg.eigh to retrieve the eigenvectors and eigenvalues of $S$ (these are returned in ascending order of eigenvalue). Then, take the top $M$ of these -- say $M=50$ -- and project $X$ into $Z$. How much variance is captured with this value of $M$?

In [6]:
eigvals, eigvecs = np.linalg.eigh(S)
In [188]:
M = 50
B = eigvecs[:, -M:]
B.shape
Out[188]:
(784, 50)
In [189]:
Z = X @ B
Z.shape
Out[189]:
(60000, 50)
In [190]:
sum(eigvals)
Out[190]:
4092975.6596677555
In [191]:
var_captured = np.sum(eigvals[-M:])/np.sum(eigvals)
print("{} components capture {:.2%} of the variance".format(M, var_captured))
50 components capture 86.27% of the variance

TODO 3

Plot the variance captured as a function of the number of components. Remember: this is given by the sum of the top $k$ eigenvalues over the total sum of the eigenvalues. Based on this plot, what do you think is a 'good' number of components to pick?

In [192]:
total_var = np.sum(eigvals)
component_range = list(range(S.shape[0]))
vars_captured = np.array([np.sum(eigvals[-(k+1):])/total_var for k in component_range])
plt.plot(component_range, vars_captured)
Out[192]:
[<matplotlib.lines.Line2D at 0x7f81a2b4a390>]
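A side note on the cell above: the list comprehension re-sums overlapping prefixes, which is quadratic in the number of components. np.cumsum gives the same curve in one pass, and also makes it easy to read off the smallest $k$ reaching a variance threshold. A sketch on hypothetical synthetic eigenvalues (the 95% threshold is an illustrative choice, not part of the exercise):

```python
import numpy as np

# hypothetical non-negative eigenvalues, ascending, as np.linalg.eigh returns them
eigvals = np.sort(np.random.default_rng(2).exponential(size=784))

# fraction of variance captured by the top-k components, for k = 1..784
vars_captured = np.cumsum(eigvals[::-1]) / np.sum(eigvals)

# smallest k capturing at least 95% of the variance
k_95 = int(np.argmax(vars_captured >= 0.95)) + 1
```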

TODO 4

Now "decode" an image back from $Z$ to $\tilde{X}$. Visualize it (you can follow the above code, using imshow).

In [193]:
X_tilde = Z @ B.transpose()
X_tilde.shape
Out[193]:
(60000, 784)
In [194]:
imshow(np.asarray(X_tilde[20]+X_mu).reshape((28,28)), cmap='gray')
Out[194]:
<matplotlib.image.AxesImage at 0x7f81c0469be0>
In [195]:
imshow(np.asarray(X[20] + X_mu).reshape(28,28), cmap='gray')
Out[195]:
<matplotlib.image.AxesImage at 0x7f81e404dc88>
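Comparing the two images above shows the reconstruction is blurrier than the original. This trade-off can be quantified: the mean squared reconstruction error shrinks as $M$ grows and vanishes when $M$ equals the full dimension. A self-contained sketch on hypothetical random data (names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=(300, 50)) @ rng.normal(size=(50, 50))  # correlated data
xc = x - x.mean(axis=0)
_, eigvecs = np.linalg.eigh(xc.T @ xc)

def recon_error(m):
    # encode with the top-m eigenvectors, decode, and measure the error
    b = eigvecs[:, -m:]
    x_tilde = (xc @ b) @ b.T
    return np.mean((xc - x_tilde) ** 2)

errors = [recon_error(m) for m in (5, 20, 50)]
# errors is non-increasing, and essentially zero at m = 50 (full dimension)
```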

Bonus 1: Scatter-plot the two-dimensional representation of two item types (classes).

Do they separate well in this reduced space?

In [197]:
y = fMNIST.targets.numpy()
ones = np.argwhere(y == 1).flatten()    # class 1: trousers
sevens = np.argwhere(y == 7).flatten()  # class 7: sneakers
In [198]:
Z1 = Z[ones]
Z2 = Z[sevens]
Zc = np.concatenate((Z1, Z2))
colors = ["blue"] * Z1.shape[0] + ["red"] * Z2.shape[0]
In [199]:
Zc.shape
Out[199]:
(12000, 50)
In [200]:
plt.clf()
%matplotlib inline
plt.scatter(Zc[:,0], Zc[:,1], c=colors, alpha=.1)
Out[200]:
<matplotlib.collections.PathCollection at 0x7f81b1111470>

Bonus 2: Fun with Probabilistic PCA

In the MML reading, they derived that the posterior distribution over $z$ (the latent representation) given $x$ is:

$\mathcal{N}(m, C)$

Where:

$m = B^T (B B^T + \sigma^2 I)^{-1} (x-\mu)$

$C = I - B^T (B B^T + \sigma^2 I)^{-1} B$

Here $\mu$ is the data mean. In general we will assume $\sigma^2$ is known.
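One practical aside (not required by the exercise): the posterior mean as written inverts the $D \times D$ matrix $B B^T + \sigma^2 I$, which for our data is $784 \times 784$. The standard "push-through" identity $B^T (B B^T + \sigma^2 I)^{-1} = (B^T B + \sigma^2 I)^{-1} B^T$ lets you invert an $M \times M$ matrix instead. A numerical check on a hypothetical random $B$:

```python
import numpy as np

rng = np.random.default_rng(4)
d, m, var = 40, 5, 0.1        # illustrative sizes (the notebook uses d=784, m=50)
b = rng.normal(size=(d, m))

# form appearing in the posterior mean: inverts a d x d matrix
lhs = b.T @ np.linalg.inv(b @ b.T + var * np.eye(d))
# push-through identity: the same matrix via a much smaller m x m inverse
rhs = np.linalg.inv(b.T @ b + var * np.eye(m)) @ b.T
```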

In [201]:
def sample_from_posterior_given_x(x, var=0.1, n_samples=10):
    # shared term: B^T (B B^T + var * I)^{-1}
    B_term = B.transpose() @ np.linalg.inv(B @ B.transpose() + var * np.eye(B.shape[0]))
    # note: X was zero-meaned in place above, so x is already centered
    m = B_term @ x                       # posterior mean
    C = np.eye(B.shape[1]) - B_term @ B  # posterior covariance
    return np.array([np.random.multivariate_normal(m, C) for _ in range(n_samples)])
In [209]:
samples = sample_from_posterior_given_x(X[1,:], var=1)
In [210]:
decoded_samples = samples @ B.transpose()
decoded_samples.shape
Out[210]:
(10, 784)
In [211]:
imshow(np.asarray(X[1] + X_mu).reshape(28,28), cmap='gray')
Out[211]:
<matplotlib.image.AxesImage at 0x7f818113b1d0>
In [214]:
imshow(np.asarray(decoded_samples[5] + X_mu).reshape(28,28), cmap='gray')
Out[214]:
<matplotlib.image.AxesImage at 0x7f8180335470>
In [ ]: