A4 Classifying Hand-Drawn Digits

Overview

For this assignment you will apply a neural network classifier to the problem of classifying handwritten digits. You will use the provided classifier code, which is implemented with PyTorch. You will experiment with various parameter values and describe your results.

If you plan to use PyTorch on the workstations in the Department of Computer Science, you must first execute this command:

export PYTHONPATH=$PYTHONPATH:/usr/local/anaconda/lib/python3.6/site-packages/

A better solution is to add this line to your shell startup script, such as .bashrc, so that it runs in every new shell.

Provided Code and Data

Download A4.zip and unzip it. You should have two files, neuralnetworks_pytorch.py and mnist.pkl.gz.

Load the data using the following code.

In [1]:
import numpy as np
import gzip
import pickle
In [2]:
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')
    
Xtrain = train_set[0].reshape((-1, 1, 28, 28))
Ttrain = train_set[1]
Xtest = test_set[0].reshape((-1, 1, 28, 28))
Ttest = test_set[1]

Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
Out[2]:
((50000, 1, 28, 28), (50000,), (10000, 1, 28, 28), (10000,))

The second dimension of Xtrain and Xtest is 1, representing the number of values, or channels, in each pixel. These images are grayscale, so each pixel has just one intensity value.
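A convolutional network can consume these 4-D arrays as loaded, but for a fully connected network n_inputs is the total number of pixels, 1 * 28 * 28 = 784. The sketch below flattens each image into one row, assuming the provided fully connected network expects one row of 784 values per sample (check neuralnetworks_pytorch.py to confirm). The helper name flatten_images is hypothetical, and a small random array stands in for Xtrain.

```python
import numpy as np

def flatten_images(X):
    """Reshape (N, channels, height, width) images into (N, channels*height*width) rows."""
    return X.reshape(X.shape[0], -1)

# Stand-in for Xtrain: 5 random one-channel 28x28 images.
X = np.random.rand(5, 1, 28, 28).astype(np.float32)
Xflat = flatten_images(X)
print(Xflat.shape)         # (5, 784)
n_inputs = Xflat.shape[1]  # 784 = 1 * 28 * 28
```

With the real data, flatten_images(Xtrain) would have shape (50000, 784).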

Import the provided neural network code.

In [1]:
import neuralnetworks_pytorch as nn

This code defines the class NeuralNetworkClassifier_Pytorch. Its constructor accepts the following arguments:

  • n_inputs: (int) number of input components, which is the number of channels for a convolutional net, or the total number of pixels for a fully connected network
  • n_hiddens_by_layer: (list of ints) number of units in each hidden layer
  • n_outputs: (int) number of classes in the data
  • relu: (boolean, default False) if True, relu is used as the activation function. If False, the activation function is tanh
  • gpu: (boolean, default False) If True and this machine has a compatible GPU, run the network on the GPU
  • n_conv_layers: (int, default 0) 0 to create all layers as fully connected, else create this many convolutional layers as the initial layers in the network
  • windows: (list of ints, default [ ]) if all layers are fully connected, this should be empty. If network contains convolutional layers this must be a list of length equal to n_conv_layers, with an int for each layer specifying the height and width of the convolution window
  • strides: (list of ints, default [ ]) if all layers are fully connected, this should be empty. If network contains convolutional layers this must be a list of length equal to n_conv_layers, with an int for each layer specifying the horizontal and vertical stride of the convolution window
  • input_height_width: (int or None) height and width of the input image; needed only for a convolutional network
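When choosing windows and strides, it helps to know how large the image is after each convolutional layer. The sketch below assumes each layer is an unpadded ("valid") convolution, so the output size is (in - window) // stride + 1; this is an assumption about the provided code, which may use padding. The function conv_output_sizes is a hypothetical helper.

```python
def conv_output_sizes(input_height_width, windows, strides):
    """Height/width of the output of each convolutional layer.

    ASSUMPTION: every layer is an unpadded convolution, so
    out = (in - window) // stride + 1. Check the provided code.
    """
    sizes = []
    size = input_height_width
    for window, stride in zip(windows, strides):
        if window > size:
            raise ValueError(f"window {window} is larger than input size {size}")
        size = (size - window) // stride + 1
        sizes.append(size)
    return sizes

# The example network below uses windows=[5, 7], strides=[1, 2] on 28x28 images.
print(conv_output_sizes(28, [5, 7], [1, 2]))  # [24, 9]
```

Under this assumption, a combination that raises ValueError cannot be built, so such combinations can be skipped before training.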

Then train the network with its train function, which accepts the arguments:

  • Xtrain: (np.ndarray of floats) training samples along first dimension
  • Ttrain: (np.ndarray of longs) one-dimensional vector of integers indicating class of each training sample
  • Xtest: (np.ndarray of floats) testing samples along first dimension
  • Ttest: (np.ndarray of longs) one-dimensional vector of integers indicating class of each testing sample
  • n_iterations: (int) number of optimization steps, sometimes called epochs
  • batch_size: (int) number of samples in each batch to calculate gradient for and update all weights
  • learning_rate: (float) factor multiplying gradient to determine step size

Once a neural net is created, with a line like

 nnet = nn.NeuralNetworkClassifier_Pytorch(1, [10, 20, 5], 10, 
           n_conv_layers=2, windows=[5, 7], strides=[1, 2], input_height_width=28)

it can be trained with a line like

 nnet.train(Xtrain, Ttrain, Xtest, Ttest, 200, 100, 0.001)

and predictions are made with

 classes, probs = nnet.use(Xtrain)

where classes holds the predicted class for each sample and probs holds the probability of each class for each sample.

To compute the percent of predicted classes that are correct, use the following function.

In [4]:
def percent_correct(actual, predicted):
    return 100 * np.mean(actual == predicted)
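percent_correct compares element by element, so both arguments should be 1-D arrays of the same length. If the predicted classes come back as a column of shape (N, 1) (check what nnet.use returns), NumPy broadcasting turns actual == predicted into an (N, N) matrix and the result is meaningless. A slightly more defensive drop-in version flattens both arguments first:

```python
import numpy as np

def percent_correct(actual, predicted):
    # Flatten both so that shapes (N,) and (N, 1) are compared element by element.
    return 100 * np.mean(np.ravel(actual) == np.ravel(predicted))

actual = np.array([3, 1, 4, 1])
predicted_column = np.array([[3], [1], [4], [0]])  # shape (4, 1)

print(100 * np.mean(actual == predicted_column))  # broadcasts to (4, 4): 25.0
print(percent_correct(actual, predicted_column))  # 75.0
```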

Required Section 1: Fully-Connected Networks (40 points)

  1. (15 points) Using batch_size of 100, learning_rate of 0.001 and n_hiddens_by_layer of [20, 20, 20], try a variety of n_iterations values and plot the percent of testing data correctly classified versus n_iterations.

  2. (15 points) Using the best value of n_iterations, try at least five different values of n_hiddens_by_layer and plot the percent of testing data correctly classified for each.

  3. (10 points) Describe what you see in your plots with at least two sentences for each plot.
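Both experiments in this section follow the same pattern: train once per parameter value, record the test accuracy, and plot. The sketch below is one way to organize that. The helpers sweep, fc_test_accuracy and plot_sweep are hypothetical; fc_test_accuracy assumes Xtrain, Ttrain, Xtest, Ttest and percent_correct are defined as above, and that a fully connected network takes flattened 784-pixel rows (check the provided code). The values in the commented usage lines are placeholders, not recommendations.

```python
import numpy as np

def sweep(train_and_score, values):
    """Call train_and_score(value) for each value and collect the returned accuracies."""
    return [train_and_score(v) for v in values]

def fc_test_accuracy(n_iterations, hiddens=(20, 20, 20)):
    # Hypothetical helper: uses the globals Xtrain, Ttrain, Xtest, Ttest and percent_correct.
    import neuralnetworks_pytorch as nn
    Xtr = Xtrain.reshape(Xtrain.shape[0], -1)  # assumed: one 784-value row per sample
    Xte = Xtest.reshape(Xtest.shape[0], -1)
    nnet = nn.NeuralNetworkClassifier_Pytorch(Xtr.shape[1], list(hiddens), 10)
    nnet.train(Xtr, Ttrain, Xte, Ttest, n_iterations, 100, 0.001)
    classes, _ = nnet.use(Xte)
    return percent_correct(Ttest, np.ravel(classes))

def plot_sweep(values, accuracies, xlabel):
    import matplotlib.pyplot as plt  # imported here so sweep works without matplotlib
    positions = range(len(values))   # categorical x-axis, so lists of layer sizes work too
    plt.figure()
    plt.plot(positions, accuracies, 'o-')
    plt.xticks(positions, [str(v) for v in values], rotation=45)
    plt.xlabel(xlabel)
    plt.ylabel('Percent of test samples correct')
    plt.grid(True)

# Example usage (commented out because training takes time):
# iteration_values = [1, 5, 10, 20, 50]
# plot_sweep(iteration_values, sweep(fc_test_accuracy, iteration_values), 'n_iterations')
# structures = [[10], [50], [20, 20], [100, 50], [20, 20, 20, 20]]
# accs = sweep(lambda h: fc_test_accuracy(best_n_iterations, h), structures)
# plot_sweep(structures, accs, 'n_hiddens_by_layer')
```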

Required Section 2: Convolutional Networks (40 points)

  1. (15 points) Using batch_size of 100, learning_rate of 0.001, n_hiddens_by_layer of [20, 20, 20], n_iterations of 10, and n_conv_layers of 2, try several values of windows and of strides, and plot the percent of testing data correctly classified versus windows and strides.

  2. (15 points) Try several more variations of n_hiddens_by_layer, n_iterations, n_conv_layers, windows and strides.

  3. (10 points) Describe what you see in your plots from step 1 and in the variations you tried in step 2, with at least two sentences for each.
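For step 1, it can help to enumerate (windows, strides) combinations and drop those that shrink the image below 1x1 before spending time on training. The sketch below assumes unpadded convolutions, so each layer's output size is (in - window) // stride + 1; the provided code may differ. valid_conv_configs and conv_test_accuracy are hypothetical helpers, and conv_test_accuracy assumes the 4-D Xtrain, Ttrain, Xtest, Ttest and percent_correct defined earlier.

```python
import itertools
import numpy as np

def valid_conv_configs(window_choices, stride_choices, n_conv_layers=2, input_height_width=28):
    """List (windows, strides) pairs whose output stays at least 1x1.

    ASSUMPTION: unpadded convolutions, out = (in - window) // stride + 1.
    """
    configs = []
    for windows in itertools.product(window_choices, repeat=n_conv_layers):
        for strides in itertools.product(stride_choices, repeat=n_conv_layers):
            size = input_height_width
            for w, s in zip(windows, strides):
                if w > size:
                    break  # window no longer fits; skip this combination
                size = (size - w) // s + 1
            else:
                configs.append((list(windows), list(strides)))
    return configs

def conv_test_accuracy(windows, strides, hiddens=(20, 20, 20), n_iterations=10):
    # Hypothetical helper: trains one convolutional net and returns its test accuracy.
    import neuralnetworks_pytorch as nn
    nnet = nn.NeuralNetworkClassifier_Pytorch(
        1, list(hiddens), 10, n_conv_layers=len(windows),
        windows=windows, strides=strides, input_height_width=28)
    nnet.train(Xtrain, Ttrain, Xtest, Ttest, n_iterations, 100, 0.001)
    classes, _ = nnet.use(Xtest)
    return percent_correct(Ttest, np.ravel(classes))

# Example usage (commented out because training takes time):
# configs = valid_conv_configs([3, 5, 7], [1, 2])
# accs = [conv_test_accuracy(w, s) for w, s in configs]
```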

Required Section 3: Cost and Acc (20 points)

In the output from the train function we see two values, one called cost and one called acc. What is the meaning of each and why are their values so different? Study the code to help you answer this question.

ANSWER: (20 points) (type your answer here)

Extra Credit

Compare performance and training times for some of the parameter variations used in your main report when run on a GPU versus a CPU.
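One simple way to measure training time is to wrap the call with a wall-clock timer, as in the hypothetical helper timed below. One caution: CUDA operations run asynchronously, so when timing raw PyTorch code on a GPU, call torch.cuda.synchronize() before stopping the clock. Whether the provided train function already waits for the GPU to finish should be checked in the code, and torch.cuda.is_available() reports whether a usable GPU is present.

```python
import time

def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Example usage (commented out because training takes time):
# nnet_cpu = nn.NeuralNetworkClassifier_Pytorch(1, [10, 20, 5], 10, n_conv_layers=2,
#                windows=[5, 7], strides=[1, 2], input_height_width=28, gpu=False)
# _, cpu_seconds = timed(nnet_cpu.train, Xtrain, Ttrain, Xtest, Ttest, 10, 100, 0.001)
# Repeat with gpu=True and compare the times and the resulting test accuracies.
```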