A7.1 Autoencoder for Classification

We have talked in lecture about how an Autoencoder nonlinearly reduces the dimensionality of data. In this assignment you will

  1. load an autoencoder network already trained on the MNIST data,
  2. apply it to the MNIST training set to obtain the outputs of the units in the bottleneck layer as a new representation of each training-set image with greatly reduced dimensionality,
  3. train a fully-connected classification network on this new representation, and
  4. report the percentage of training and testing images correctly classified, comparing with the accuracy you get with the original images.

Download nn_torch.zip and extract the files.

In [35]:
import numpy as np
import matplotlib.pyplot as plt
import pandas
import pickle
import gzip
import torch
import neuralnetworks_torch as nntorch

First, let's load the MNIST data. You may download it here: mnist.pkl.gz.

In [36]:
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

Xtrain = train_set[0]
Ttrain = train_set[1]

Xtest = test_set[0]
Ttest = test_set[1]

Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
((50000, 784), (50000,), (10000, 784), (10000,))
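Each row of Xtrain is a flattened 28 x 28 grayscale image. Here is a quick sanity check of that reshaping; it is a sketch that uses a random stand-in vector so it runs even without mnist.pkl.gz loaded:

```python
import numpy as np

# Stand-in for one row of Xtrain: 784 pixel intensities in [0, 1].
image_vector = np.random.rand(784)

# Each MNIST row reshapes to a 28 x 28 image.
image = image_vector.reshape(28, 28)
print(image.shape)  # (28, 28)

# To view a real digit with the plt imported above, e.g. the first training image:
# plt.imshow(Xtrain[0].reshape(28, 28), cmap='gray')
# plt.title(f'Label: {Ttrain[0]}')
# plt.show()
```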

To load the network saved in Lecture Notes 21, run the following code. It loads the saved torch neural network that was trained on a GPU, copying its state (its weights) into a new network of the same structure allocated on the CPU.

First download mnist_autoencoder.pt.

In [43]:
n_in = Xtrain.shape[1]
n_hiddens_per_layer = [500, 100, 50, 50, 20, 50, 50, 100, 500]
nnet_autoencoder = nntorch.NeuralNetwork(n_in, n_hiddens_per_layer, n_in, device='cpu')
nnet_autoencoder.standardize = ''

nnet_autoencoder.load_state_dict(torch.load('mnist_autoencoder.pt', map_location=torch.device('cpu')))
<All keys matched successfully>

To get the output of the units in the middle hidden layer, run the use_to_middle function, implemented for you in neuralnetworks_torch.

In [44]:
Xtrain_reduced = nnet_autoencoder.use_to_middle(Xtrain)
(50000, 20)
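Conceptually, use_to_middle propagates each image through the encoder half of the network and returns the 20 bottleneck activations. A minimal numpy sketch of the idea, with hypothetical random weights and tanh units standing in for the trained network's actual parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# Encoder half of the layer widths used above: 784 -> 500 -> 100 -> 50 -> 50 -> 20.
encoder_widths = [784, 500, 100, 50, 50, 20]

# Random stand-in weights; the real network would use its trained weights.
weights = [rng.normal(0, 0.1, size=(n_in, n_out))
           for n_in, n_out in zip(encoder_widths[:-1], encoder_widths[1:])]

def to_middle(X, weights):
    """Forward pass through the encoder layers with tanh units."""
    Y = X
    for W in weights:
        Y = np.tanh(Y @ W)
    return Y

X = rng.random((5, 784))   # five fake images
Z = to_middle(X, weights)
print(Z.shape)             # (5, 20): one 20-dimensional code per image
```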

And while we are here, let's get the reduced representation of Xtest also.

In [46]:
Xtest_reduced = nnet_autoencoder.use_to_middle(Xtest)
(10000, 20)


Your jobs are now to

  1. train one fully-connected classifier using Xtrain_reduced and Ttrain and test it with Xtest_reduced and Ttest, and
  2. train a second fully-connected classifier using Xtrain and Ttrain and test it with Xtest and Ttest.

Try to find parameters (hidden network structure, number of epochs, and learning rate) for which the classifier given the reduced representation does almost as well as the other classifier with the original data. Discuss your results.
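One way to organize that search is a simple grid over the candidate settings. This is only a sketch; train_and_score is a hypothetical placeholder you would replace with code that builds a NeuralNetwork_Classifier, trains it, and returns percent correct on the test set:

```python
import itertools

def train_and_score(hiddens, n_epochs, learning_rate):
    # Placeholder: replace with an actual nntorch training run
    # that returns percent correct on the test data.
    return 0.0

hidden_options = [[50], [100, 50], [200, 100]]
epoch_options = [50, 200]
lr_options = [0.01, 0.001]

results = []
for hiddens, n_epochs, lr in itertools.product(hidden_options, epoch_options, lr_options):
    pct = train_and_score(hiddens, n_epochs, lr)
    results.append((pct, hiddens, n_epochs, lr))

# Highest test accuracy first.
best = max(results)
print('Best:', best)
```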

Here is an example for part of Step 1. It shows a brief training session (a small number of epochs and a simple hidden-layer structure) using the reduced data.

In [54]:
n_in = Xtrain_reduced.shape[1]
reduced_classifier = nntorch.NeuralNetwork_Classifier(n_in, [50], 10, device='cpu')

n_epochs = 50
reduced_classifier.train(Xtrain_reduced, Ttrain, n_epochs, 0.01, method='adam', standardize='')

Classes, _ = reduced_classifier.use(Xtest_reduced)

def percent_correct(Predicted, Target):
    return 100 * np.mean(Predicted == Target)

print(f'% Correct  Ttest {percent_correct(Classes, Ttest):.2f}')
Epoch 5: RMSE 2.123
Epoch 10: RMSE 1.858
Epoch 15: RMSE 1.532
Epoch 20: RMSE 1.204
Epoch 25: RMSE 0.932
Epoch 30: RMSE 0.735
Epoch 35: RMSE 0.600
Epoch 40: RMSE 0.510
Epoch 45: RMSE 0.450
Epoch 50: RMSE 0.410
% Correct  Ttest 89.86
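When you discuss your results, a per-class breakdown often shows which digits the reduced representation confuses most. A sketch using synthetic labels so it is self-contained; in your report you would substitute Classes and Ttest from above:

```python
import numpy as np

rng = np.random.default_rng(42)

# Stand-ins for Classes (predictions) and Ttest (true labels).
predicted = rng.integers(0, 10, size=1000)
target = predicted.copy()
flip = rng.random(1000) < 0.1                        # corrupt ~10% of the labels
target[flip] = rng.integers(0, 10, size=flip.sum())

# Percent correct for each digit class.
for digit in range(10):
    mask = target == digit
    pct = 100 * np.mean(predicted[mask] == target[mask])
    print(f'digit {digit}: {pct:.1f}% correct')

# A 10 x 10 confusion matrix: rows are true classes, columns are predictions.
conf = np.zeros((10, 10), dtype=int)
np.add.at(conf, (target, predicted), 1)
print(conf.sum())  # 1000: every sample lands in exactly one cell
```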

Extra Credit

For 1 point of extra credit, repeat this assignment using a second data set, one that we have not used in class before. This will require you to train a new autoencoder network to use for this part.