For this assignment you will apply a neural network classifier to the problem of classifying handwritten digits. You will use the provided classifier code, which is implemented using PyTorch. You will experiment with various parameter values and describe your results.
If you are planning to use PyTorch on the workstations in the Department of Computer Science, you must execute this command
A better solution is to add it to your startup script, such as
Download A4.zip and unzip it. You should have two files,
Load the data using the following code.
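import numpy as np
import gzip
import pickle

# mnist.pkl.gz holds three (images, labels) pairs: training, validation, and testing.
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

# Images are stored as flat rows of 784 pixels; reshape to (samples, channels, height, width).
Xtrain = train_set[0].reshape((-1, 1, 28, 28))
Ttrain = train_set[1]
Xtest = test_set[0].reshape((-1, 1, 28, 28))
Ttest = test_set[1]

Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape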
((50000, 1, 28, 28), (50000,), (10000, 1, 28, 28), (10000,))
The second dimension of Xtrain and Xtest is 1, representing the number of values, or channels, in each pixel. These images are grayscale, so each pixel has just one intensity value.
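You can check the channel dimension with NumPy alone. The sketch below substitutes random arrays for the MNIST data so that it runs without mnist.pkl.gz. It shows the two layouts implied by the constructor description: (N, 1, 28, 28) for a convolutional net, and a flattened (N, 784) array whose width would be n_inputs for a fully connected net. The assignment does not say whether the provided code flattens images itself, so check that before relying on either layout.

```python
import numpy as np

# Stand-in for the MNIST arrays: 5 random "images" stored as flat 784-pixel rows,
# the same layout as the image arrays in mnist.pkl.gz before reshaping.
rng = np.random.default_rng(0)
flat = rng.random((5, 28 * 28)).astype(np.float32)

# Layout for a convolutional net: (samples, channels, height, width).
# Grayscale images have a single channel, hence the 1.
X_conv = flat.reshape((-1, 1, 28, 28))

# Layout for a fully connected net: one row of pixel values per sample.
# Here n_inputs would be X_fc.shape[1], which is 784.
X_fc = X_conv.reshape((X_conv.shape[0], -1))

print(X_conv.shape, X_fc.shape)    # (5, 1, 28, 28) (5, 784)
print(np.array_equal(flat, X_fc))  # True: reshaping does not reorder pixels
```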
Import the provided neural network code.
import neuralnetworks_pytorch as nn
This code defines the class NeuralNetworkClassifier_Pytorch. Its constructor accepts the following arguments:
n_inputs: (int) number of input components, which is the number of channels for a convolutional net, or the total number of pixels for a fully connected network
n_hiddens_by_layer: (list of ints) number of units in each hidden layer
n_outputs: (int) number of classes in the data
relu: (boolean, default False) if True, relu is used as the activation function. If False, the activation function is tanh
gpu: (boolean, default False) If True and this machine has a compatible GPU, run the network on the GPU
n_conv_layers: (int, default 0) 0 to create all layers as fully connected, else create this many convolutional layers as the initial layers in the network
windows: (list of ints, default [ ]) if all layers are fully connected, this should be empty. If the network contains convolutional layers, this must be a list of length n_conv_layers, with an int for each layer specifying the height and width of the convolution window
strides: (list of ints, default [ ]) if all layers are fully connected, this should be empty. If the network contains convolutional layers, this must be a list of length n_conv_layers, with an int for each layer specifying the horizontal and vertical stride of the convolution window
input_height_width: (int or None, default None) height and width of the input image; only needed for a convolutional network
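Not every combination of windows and strides fits a 28 by 28 image. This sketch assumes the convolutional layers use no padding, which is PyTorch's Conv2d default; check the provided code to confirm. Under that assumption, each layer shrinks the image side from s to (s - window) // stride + 1. The hypothetical helper below applies that rule layer by layer, so you can check a setting before training.

```python
def conv_output_sizes(input_height_width, windows, strides):
    """Return the image side length after each convolutional layer.

    Assumes square windows, no padding, and no dilation, so one layer maps
    side s to (s - window) // stride + 1.
    """
    if len(windows) != len(strides):
        raise ValueError("windows and strides must have the same length")
    sizes = []
    side = input_height_width
    for window, stride in zip(windows, strides):
        if window > side:
            raise ValueError(f"window {window} is larger than the input side {side}")
        side = (side - window) // stride + 1
        sizes.append(side)
    return sizes


# windows=[5, 7], strides=[1, 2] on a 28x28 image: 28 -> 24 -> 9
print(conv_output_sizes(28, [5, 7], [1, 2]))  # [24, 9]
```

A ValueError here suggests the same setting would also fail, or behave unexpectedly, in the real network.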
Then train this neural network using its train method, which accepts the following arguments:
Xtrain: (np.ndarray of floats) training samples along first dimension
Ttrain: (np.ndarray of longs) one-dimensional vector of integers indicating class of each training sample
Xtest: (np.ndarray of floats) testing samples along first dimension
Ttest: (np.ndarray of longs) one-dimensional vector of integers indicating class of each testing sample
n_iterations: (int) number of optimization steps, sometimes called epochs
batch_size: (int) number of samples in each batch to calculate gradient for and update all weights
learning_rate: (float) factor multiplying gradient to determine step size
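To make batch_size concrete: with 50,000 training samples and a batch_size of 100, one pass through the training data yields 500 weight updates. The sketch below is a generic shuffled minibatch loop in NumPy, not the provided train method. Confirm in neuralnetworks_pytorch whether that method shuffles, and whether one n_iterations unit is one step or one full pass.

```python
import math
import numpy as np


def minibatches(X, T, batch_size, rng):
    """Yield (X_batch, T_batch) pairs that visit every sample once, in random order."""
    order = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = order[start:start + batch_size]
        yield X[idx], T[idx]


# One pass over the 50,000 MNIST training samples with batch_size=100:
print(math.ceil(50000 / 100))  # 500

# Small synthetic example: 1,050 samples do not divide evenly by 100,
# so the last batch holds the remaining 50 samples.
rng = np.random.default_rng(0)
X = rng.random((1050, 1, 28, 28)).astype(np.float32)
T = rng.integers(0, 10, size=1050)
sizes = [xb.shape[0] for xb, tb in minibatches(X, T, 100, rng)]
print(len(sizes), sizes[-1])  # 11 50
```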
Once a neural net is created, with a line like
nnet = nn.NeuralNetworkClassifier_Pytorch(1, [10, 20, 5], 10, n_conv_layers=2, windows=[5, 7], strides=[1, 2], input_height_width=28)
it can be trained with a line like
nnet.train(Xtrain, Ttrain, Xtest, Ttest, 200, 100, 0.001)
and predictions are made with
classes, probs = nnet.use(Xtrain)
classes holds the predicted class for each sample, and probs holds the probability of each class for each sample.
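Assume probs has one row per sample and one column per class (n_outputs columns). Then each row should sum to 1, and the predicted class is the column with the largest probability. The made-up example below does not call the provided code. It illustrates that relationship and how to read the model's confidence in each prediction.

```python
import numpy as np

# Hypothetical output for 3 samples and 4 classes; each row sums to 1.
probs = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.15, 0.60, 0.20],
    [0.25, 0.25, 0.20, 0.30],
])

classes = np.argmax(probs, axis=1)                  # most probable class per sample
confidence = probs[np.arange(len(probs)), classes]  # probability of that class

print(classes)                            # [0 2 3]
print(confidence)                         # [0.7 0.6 0.3]
print(np.allclose(probs.sum(axis=1), 1))  # True
```

The third sample is a low-confidence prediction. Samples like this are useful to inspect when you analyze misclassified digits.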
To determine the percentage of predicted classes that are correct, use the following function.
def percent_correct(actual, predicted):
    return 100 * np.mean(actual == predicted)
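One pitfall is worth checking. Suppose the predicted classes come back as a column of shape (N, 1) while Ttest has shape (N,). NumPy broadcasting then compares every pair and produces an (N, N) array, so the percentage is silently wrong. Whether nnet.use returns a column is not stated here. The small made-up arrays below demonstrate the effect, and percent_correct_flat, a hypothetical variant, flattens both inputs to avoid it.

```python
import numpy as np


def percent_correct(actual, predicted):
    return 100 * np.mean(actual == predicted)


def percent_correct_flat(actual, predicted):
    """Like percent_correct, but flattens both inputs so a column of
    predictions with shape (N, 1) is compared element by element."""
    actual = np.asarray(actual).ravel()
    predicted = np.asarray(predicted).ravel()
    return 100 * np.mean(actual == predicted)


T = np.array([0, 1, 2, 3])             # true classes, shape (4,)
pred = np.array([[0], [1], [0], [3]])  # predictions as a column, shape (4, 1)

print(percent_correct(T, pred))        # 25.0, from a (4, 4) broadcast comparison
print(percent_correct_flat(T, pred))   # 75.0, the intended answer
```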
1. (15 points) Using batch_size of 100, learning_rate of 0.001, and n_hiddens_by_layer of [20, 20, 20], try a variety of n_iterations values and plot the percent of testing data correctly classified versus n_iterations.
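Here is a sketch of the sweep. It assumes a hypothetical helper, train_and_score(n_iterations), that builds a fresh network with the settings above, trains it, and returns percent_correct on the test data. Building a new network for each value keeps each result independent of earlier runs. The placeholder below returns 0.0 so the loop runs without data; replace it with real training code.

```python
def sweep(values, train_and_score):
    """Call train_and_score(v) for each value and collect (value, percent) pairs."""
    return [(v, train_and_score(v)) for v in values]


def plot_sweep(results, xlabel):
    # Imported here so sweep() can be used without matplotlib installed.
    import matplotlib.pyplot as plt
    xs, ys = zip(*results)
    plt.plot(xs, ys, "o-")
    plt.xlabel(xlabel)
    plt.ylabel("% of test samples correctly classified")
    plt.grid(True)
    plt.show()


def placeholder_train_and_score(n_iterations):
    # Hypothetical placeholder: replace with code that builds a new fully connected
    # network (n_hiddens_by_layer=[20, 20, 20]), calls
    # nnet.train(Xtrain, Ttrain, Xtest, Ttest, n_iterations, 100, 0.001),
    # and returns percent_correct(Ttest, classes) for classes from nnet.use(Xtest).
    return 0.0


results = sweep([1, 5, 10, 20, 50], placeholder_train_and_score)
print(results[0])  # (1, 0.0)
# plot_sweep(results, "n_iterations")
```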
2. (15 points) Using the best value of n_iterations, try at least five different values of n_hiddens_by_layer and plot the percent of testing data correctly classified for each.
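One way to choose the n_hiddens_by_layer values is to vary depth and width separately, so the plot separates the two effects. The widths and depths below are illustrative choices, not values from the assignment. The x-axis is categorical, so a bar chart labeled with each layer list is easier to read than a line plot.

```python
import itertools


def candidate_architectures(widths, depths):
    """Return n_hiddens_by_layer lists for every combination of depth and width."""
    return [[w] * d for d, w in itertools.product(depths, widths)]


def plot_architectures(labels, percents):
    # Bar chart: one bar per architecture, labeled with its layer list.
    import matplotlib.pyplot as plt
    plt.bar(range(len(labels)), percents)
    plt.xticks(range(len(labels)), labels, rotation=45)
    plt.ylabel("% of test samples correctly classified")
    plt.tight_layout()
    plt.show()


archs = candidate_architectures(widths=[10, 50], depths=[1, 2, 3])
labels = [str(a) for a in archs]
print(labels)
# ['[10]', '[50]', '[10, 10]', '[50, 50]', '[10, 10, 10]', '[50, 50, 50]']
```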
3. (10 points) Describe what you see in your plots, with at least two sentences for each plot.
1. (15 points) Using batch_size of 100, learning_rate of 0.001, n_hiddens_by_layer of [20, 20, 20], n_iterations of 10, and n_conv_layers of 2, try several values of windows and of strides, and plot the percent of testing data correctly classified for each combination of windows and strides.
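The sketch below generates windows and strides settings for two convolutional layers and keeps only those that fit a 28 by 28 image. It assumes unpadded convolutions, where one layer maps side s to (s - window) // stride + 1; verify this against the provided code. The candidate values are illustrative only. With window choices 5 and 15 and stride choices 1 and 2, 10 of the 16 combinations fit.

```python
import itertools


def conv_side(side, window, stride):
    # Output side length of one unpadded convolution layer.
    return (side - window) // stride + 1


def valid_conv_settings(input_side, window_choices, stride_choices, n_layers=2):
    """Return (windows, strides) pairs whose windows fit at every layer."""
    settings = []
    for windows in itertools.product(window_choices, repeat=n_layers):
        for strides in itertools.product(stride_choices, repeat=n_layers):
            side = input_side
            fits = True
            for w, s in zip(windows, strides):
                if w > side:  # window larger than the image reaching this layer
                    fits = False
                    break
                side = conv_side(side, w, s)
            if fits:
                settings.append((list(windows), list(strides)))
    return settings


settings = valid_conv_settings(28, window_choices=[5, 15], stride_choices=[1, 2])
print(len(settings))  # 10
```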
2. (15 points) Try several more variations of
3. (10 points) Describe what you see in your plots from 1. and in the variations you tried in 2., with at least two sentences for each.
In the output from the train function we see two values, one called cost and one called acc. What is the meaning of each, and why are their values so different? Study the code to help you answer this question.
ANSWER: (20 points) (type your answer here)
Compare performance and training times for some of the parameter variations used in your main report when run on a GPU versus a CPU.
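To time training runs, wrap the call in time.perf_counter. The helper below is a hypothetical sketch; the commented loop shows how you might use it with the provided constructor's gpu argument. Two points are worth checking. First, torch.cuda.is_available() reports whether PyTorch can see a compatible GPU at all. Second, CUDA work runs asynchronously, so if train returns before the GPU finishes, calling torch.cuda.synchronize() before stopping the clock avoids underestimating GPU time.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn(*args, **kwargs) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


# Sketch of use with the provided code (not run here); one new network per device:
# for use_gpu in (False, True):
#     nnet = nn.NeuralNetworkClassifier_Pytorch(1, [10, 20, 5], 10, gpu=use_gpu,
#                                               n_conv_layers=2, windows=[5, 7],
#                                               strides=[1, 2], input_height_width=28)
#     _, seconds = timed(nnet.train, Xtrain, Ttrain, Xtest, Ttest, 10, 100, 0.001)
#     print('GPU' if use_gpu else 'CPU', seconds)

total, seconds = timed(sum, range(1000))
print(total)  # 499500
```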