For this assignment you will apply a neural network classifier to the problem of classifying handwritten digits. You will use the provided classifier code, which is implemented using `pytorch`. You will experiment with various parameter values and describe your results.

If you are planning to use `pytorch` on the workstations installed in the Department of Computer Science, you must execute this command

`export PYTHONPATH=$PYTHONPATH:/usr/local/anaconda/lib/python3.6/site-packages/`

A better solution is to add it to your startup script, such as `.bashrc`.

Download A4.zip and unzip it. You should have two files, `neuralnetworks_pytorch.py` and `mnist.pkl.gz`.

Load the data using the following code.

In [1]:

```
import numpy as np
import gzip
import pickle
```

In [2]:

```
with gzip.open('mnist.pkl.gz', 'rb') as f:
    train_set, valid_set, test_set = pickle.load(f, encoding='latin1')

Xtrain = train_set[0].reshape((-1, 1, 28, 28))
Ttrain = train_set[1]
Xtest = test_set[0].reshape((-1, 1, 28, 28))
Ttest = test_set[1]

Xtrain.shape, Ttrain.shape, Xtest.shape, Ttest.shape
```

Out[2]:

The second dimension of `Xtrain` and `Xtest` is 1, representing the number of values, or channels, in each pixel. These images are grayscale, so each pixel has just one intensity value.
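For instance, the reshape used above turns each flattened 784-pixel image into a 1×28×28 array without changing any pixel values. A quick check with synthetic data (not the actual MNIST arrays):

```python
import numpy as np

# Synthetic stand-in for the MNIST arrays: 5 flattened 28x28 images.
flat = np.arange(5 * 784, dtype=np.float32).reshape(5, 784)

# Same reshape as used for Xtrain and Xtest above:
# (n_samples, n_channels, height, width).
images = flat.reshape((-1, 1, 28, 28))

print(images.shape)          # (5, 1, 28, 28)
print(images[0, 0, 0, :3])   # first three pixels of the first image: [0. 1. 2.]
```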

Import the provided neural network code.

In [1]:

```
import neuralnetworks_pytorch as nn
```

This code defines the class `NeuralNetworkClassifier_Pytorch`. The constructor for this class accepts the arguments

- `n_inputs`: (int) number of input components, which is the number of channels for a convolutional net, or the total number of pixels for a fully connected network
- `n_hiddens_by_layer`: (list of ints) number of units in each hidden layer
- `n_outputs`: (int) number of classes in the data
- `relu`: (boolean, default False) if True, relu is used as the activation function; if False, the activation function is tanh
- `gpu`: (boolean, default False) if True and this machine has a compatible GPU, run the network on the GPU
- `n_conv_layers`: (int, default 0) 0 to create all layers as fully connected; otherwise, create this many convolutional layers as the initial layers in the network
- `windows`: (list of ints, default `[]`) if all layers are fully connected, this should be empty; if the network contains convolutional layers, this must be a list of length equal to `n_conv_layers`, with an int for each layer specifying the height and width of the convolution window
- `strides`: (list of ints, default `[]`) if all layers are fully connected, this should be empty; if the network contains convolutional layers, this must be a list of length equal to `n_conv_layers`, with an int for each layer specifying the horizontal and vertical stride of the convolution window
- `input_height_width`: (int or None, default None) height and width of the input image; only needed for a convolutional network

Then train this neural network using the `train` function, which accepts the arguments

- `Xtrain`: (np.ndarray of floats) training samples along the first dimension
- `Ttrain`: (np.ndarray of longs) one-dimensional vector of integers indicating the class of each training sample
- `Xtest`: (np.ndarray of floats) testing samples along the first dimension
- `Ttest`: (np.ndarray of longs) one-dimensional vector of integers indicating the class of each testing sample
- `n_iterations`: (int) number of optimization steps, sometimes called epochs
- `batch_size`: (int) number of samples in each batch used to calculate the gradient and update all weights
- `learning_rate`: (float) factor multiplying the gradient to determine the step size

Once a neural net is created, with a line like

```
nnet = nn.NeuralNetworkClassifier_Pytorch(1, [10, 20, 5], 10,
                                          n_conv_layers=2, windows=[5, 7],
                                          strides=[1, 2], input_height_width=28)
```

it can be trained with a line like

```
nnet.train(Xtrain, Ttrain, Xtest, Ttest, 200, 100, 0.001)
```

and predictions are made with

```
classes, probs = nnet.use(Xtrain)
```

where `classes` contains the predicted class for each sample and `probs` contains the probability of each class for each sample.
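If `probs` has one row per sample and one column per class, then the predicted class is typically the column with the highest probability in each row (an assumption about the provided code, illustrated here with made-up probabilities and plain numpy):

```python
import numpy as np

# Hypothetical class probabilities for 3 samples over 4 classes.
probs = np.array([[0.1, 0.7, 0.1, 0.1],
                  [0.6, 0.2, 0.1, 0.1],
                  [0.2, 0.2, 0.2, 0.4]])

# Predicted class = index of the highest probability in each row.
classes = np.argmax(probs, axis=1)
print(classes)   # [1 0 3]
```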

To determine the percent of predicted classes that are correct, use the following function.

In [4]:

```
def percent_correct(actual, predicted):
    return 100 * np.mean(actual == predicted)
```
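As a quick check of `percent_correct` on made-up labels (not actual network output):

```python
import numpy as np

def percent_correct(actual, predicted):
    return 100 * np.mean(actual == predicted)

actual    = np.array([3, 1, 4, 1, 5])
predicted = np.array([3, 1, 4, 2, 5])   # one of the five labels is wrong

print(percent_correct(actual, predicted))   # 80.0
```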

1. (15 points) Using a `batch_size` of 100, a `learning_rate` of 0.001, and `n_hiddens_by_layer` of `[20, 20, 20]`, try a variety of `n_iterations` values and plot the percent of testing data correctly classified versus `n_iterations`.
2. (15 points) Using the best value of `n_iterations`, try at least five different values of `n_hiddens_by_layer` and plot the percent of testing data correctly classified.
3. (10 points) Describe what you see in your plots with at least two sentences for each plot.
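One way to organize the `n_iterations` sweep is a simple loop that records one accuracy per setting. In this sketch, `run_trial` is a hypothetical placeholder for creating the network, calling `train`, and scoring the test set; it is not part of the provided code:

```python
def run_trial(n_iterations):
    # Hypothetical placeholder: in the real experiment this would build
    # nn.NeuralNetworkClassifier_Pytorch(...), call
    # nnet.train(Xtrain, Ttrain, Xtest, Ttest, n_iterations, 100, 0.001),
    # and return percent_correct(Ttest, nnet.use(Xtest)[0]).
    return 0.0

iteration_values = [10, 20, 50, 100, 200]
accuracies = [run_trial(n) for n in iteration_values]

# iteration_values and accuracies can then be plotted, for example with
# matplotlib's plt.plot(iteration_values, accuracies).
print(len(accuracies))   # 5
```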

1. (15 points) Using a `batch_size` of 100, a `learning_rate` of 0.001, `n_hiddens_by_layer` of `[20, 20, 20]`, `n_iterations` of 10, and `n_conv_layers` of 2, try several values of `windows` and of `strides`, and plot the percent of testing data correctly classified versus `windows` and `strides`.
2. (15 points) Try several more variations of `n_hiddens_by_layer`, `n_iterations`, `n_conv_layers`, `windows`, and `strides`.
3. (10 points) Describe what you see in your plots in 1., and the variations you see in 2., with at least two sentences for each.

In the output from the `train` function we see two values, one called `cost` and one called `acc`. What is the meaning of each, and why are their values so different? Study the code to help you answer this question.

**ANSWER:** (20 points) (type your answer here)

Compare performance and training times for some of the parameter variations used in your main report when run on a GPU versus a CPU.
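A simple timing pattern for this comparison is to wrap each `train` call with `time.perf_counter`. Shown here timing a stand-in workload, since the network code is not included in this sketch; in the real comparison you would time identical `train` calls on networks created with `gpu=False` and `gpu=True`:

```python
import time
import numpy as np

def stand_in_training():
    # Placeholder workload standing in for a call like
    # nnet.train(Xtrain, Ttrain, Xtest, Ttest, 200, 100, 0.001).
    a = np.random.rand(200, 200)
    b = np.random.rand(200, 200)
    return a @ b

start = time.perf_counter()
stand_in_training()
elapsed = time.perf_counter() - start
print(f'Training took {elapsed:.3f} seconds')
```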