In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.
# A bit of setup
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.neural_net import TwoLayerNet
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
We will use the class TwoLayerNet in the file cs231n/classifiers/neural_net.py to represent instances of our network. The network parameters are stored in the instance variable self.params, where keys are string parameter names and values are numpy arrays. Below, we initialize toy data and a toy model that we will use to develop your implementation.
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.
input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5
def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y
net = init_toy_model()
X, y = init_toy_data()
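To make the layout of self.params concrete, here is a minimal sketch of how such a parameter dictionary might be initialized for a two-layer net. The key names W1, b1, W2, b2 match those used later in this notebook; the helper name init_params and the small-Gaussian initialization scale std are illustrative assumptions, not the assignment's implementation.

# Hedged sketch: roughly how a two-layer net's parameters could be stored in a
# dict, mirroring the keys (W1, b1, W2, b2) used elsewhere in this notebook.
# The initialization scale `std` is an assumption, not the graded code.
import numpy as np

def init_params(input_size, hidden_size, output_size, std=1e-4):
    params = {}
    params['W1'] = std * np.random.randn(input_size, hidden_size)   # first-layer weights
    params['b1'] = np.zeros(hidden_size)                            # first-layer biases
    params['W2'] = std * np.random.randn(hidden_size, output_size)  # second-layer weights
    params['b2'] = np.zeros(output_size)                            # second-layer biases
    return params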
Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: it takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.
Implement the first part of the forward pass, which uses the weights and biases to compute the scores for all inputs.
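As a reference for the shape bookkeeping, here is a hedged sketch of a score computation, assuming a ReLU nonlinearity in the hidden layer; the function and variable names are illustrative, and this is not the graded TwoLayerNet.loss implementation.

# Hedged sketch of the score computation for a two-layer net, assuming a ReLU
# hidden nonlinearity (not the graded implementation).
import numpy as np

def compute_scores(X, params):
    W1, b1 = params['W1'], params['b1']
    W2, b2 = params['W2'], params['b2']
    hidden = np.maximum(0, X.dot(W1) + b1)  # (N, H): affine layer followed by ReLU
    scores = hidden.dot(W2) + b2            # (N, C): one row of class scores per input
    return scores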
scores = net.loss(X)
print 'Your scores:'
print scores
print
print 'correct scores:'
correct_scores = np.asarray([
[-0.81233741, -1.27654624, -0.70335995],
[-0.17129677, -1.18803311, -0.47310444],
[-0.51590475, -1.01354314, -0.8504215 ],
[-0.15419291, -0.48629638, -0.52901952],
[-0.00618733, -0.12435261, -0.15226949]])
print correct_scores
print
# The difference should be very small. We get < 1e-7
print 'Difference between your scores and correct scores:'
print np.sum(np.abs(scores - correct_scores))
Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.68027207103e-08
In the same function, implement the second part that computes the data and regularization loss.
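For orientation, here is a hedged sketch of a softmax data loss plus an L2 regularization term on the weights. The exact regularization convention (for example, whether the 0.5 factor is included) depends on the assignment code in neural_net.py, so treat these details as assumptions.

# Hedged sketch of softmax data loss plus L2 regularization; conventions such
# as the 0.5 factor are assumptions (check neural_net.py for the exact ones).
import numpy as np

def softmax_loss(scores, y, W1, W2, reg):
    N = scores.shape[0]
    shifted = scores - np.max(scores, axis=1, keepdims=True)    # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
    data_loss = -np.sum(np.log(probs[np.arange(N), y])) / N     # average cross-entropy
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))  # L2 penalty on the weights
    return data_loss + reg_loss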
loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133
# should be very small, we get < 1e-12
print 'Difference between your loss and correct loss:'
print np.sum(np.abs(loss - correct_loss))
Difference between your loss and correct loss: 1.79412040779e-13
Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:
from cs231n.gradient_check import eval_numerical_gradient
# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.
loss, grads = net.loss(X, y, reg=0.1)
# these should all be less than 1e-8 or so
for param_name in grads:
    f = lambda W: net.loss(X, y, reg=0.1)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
    print '%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name]))
W1 max relative error: 3.669857e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 4.447677e-11
b1 max relative error: 2.738421e-09
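If you need a starting point for the gradients, here is a hedged sketch of backpropagation through the softmax loss and the two affine layers, again assuming a ReLU hidden layer; it mirrors the forward-pass and loss sketches above and is not the assignment's reference code.

# Hedged sketch of the backward pass matching the sketches above
# (softmax loss, ReLU hidden layer); not the assignment's reference code.
import numpy as np

def two_layer_gradients(X, y, params, reg):
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]

    # Forward pass (same computations as in the sketches above).
    hidden = np.maximum(0, X.dot(W1) + b1)
    scores = hidden.dot(W2) + b2
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

    # Gradient of the averaged softmax loss with respect to the scores.
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N

    grads = {}
    grads['W2'] = hidden.T.dot(dscores) + reg * W2  # includes d(reg_loss)/dW2
    grads['b2'] = np.sum(dscores, axis=0)
    dhidden = dscores.dot(W2.T)
    dhidden[hidden <= 0] = 0                        # backprop through the ReLU
    grads['W1'] = X.T.dot(dhidden) + reg * W1
    grads['b1'] = np.sum(dhidden, axis=0)
    return grads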
To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement TwoLayerNet.predict, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.
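As a rough outline of what these two methods need to do, here is a hedged sketch of a single minibatch SGD step and a predict function. The helper names (sgd_step, net_loss_fn), the assumption that the loss function has the same (X, y, reg) signature as net.loss, and the choice to sample with replacement are all illustrative, not the assignment's API.

# Hedged sketch of one minibatch SGD step and a predict function; names,
# the loss-function signature, and sampling with replacement are assumptions.
import numpy as np

def sgd_step(net_loss_fn, params, X, y, batch_size, learning_rate, reg):
    # Sample a minibatch, then take a gradient step on every parameter.
    idx = np.random.choice(X.shape[0], batch_size, replace=True)
    loss, grads = net_loss_fn(X[idx], y[idx], reg=reg)
    for name in params:
        params[name] -= learning_rate * grads[name]
    return loss

def predict(X, params):
    # Predicted labels are the argmax of the class scores
    # (compute_scores is the forward-pass sketch shown earlier).
    return np.argmax(compute_scores(X, params), axis=1)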
Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.
net = init_toy_model()
stats = net.train(X, y, X, y,
learning_rate=1e-1, reg=1e-5,
num_iters=100, verbose=False)
print 'Final training loss: ', stats['loss_history'][-1]
# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()
Final training loss: 0.0171496079387
Now that you have implemented a two-layer network that passes gradient checks and works on toy data, it's time to load up our favorite CIFAR-10 data so we can use it to train a classifier on a real dataset.
from cs231n.data_utils import load_CIFAR10
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
Train data shape: (49000, 3072)
Train labels shape: (49000,)
Validation data shape: (1000, 3072)
Validation labels shape: (1000,)
Test data shape: (1000, 3072)
Test labels shape: (1000,)
To train our network we will use SGD with momentum. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.
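For reference, here is a hedged sketch of the update rule this describes: SGD with momentum keeps a per-parameter velocity, and the learning rate is multiplied by a decay factor once per epoch. The momentum coefficient mu and the helper names are assumptions; check TwoLayerNet.train in neural_net.py for the exact update it uses.

# Hedged sketch of an SGD-with-momentum update plus exponential learning rate
# decay; the momentum coefficient mu is an assumption, not the graded code.
import numpy as np

def momentum_update(param, grad, velocity, learning_rate, mu=0.9):
    velocity = mu * velocity - learning_rate * grad  # integrate the gradient into a velocity
    param += velocity                                # move the parameter along the velocity
    return param, velocity

def decay_learning_rate(learning_rate, learning_rate_decay=0.95):
    # Called once per epoch: shrink the step size multiplicatively.
    return learning_rate * learning_rate_decay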
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)
# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
num_iters=1000, batch_size=200,
learning_rate=1e-4, learning_rate_decay=0.95,
reg=0.5, verbose=True)
# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print 'Validation accuracy: ', val_acc
iteration 0 / 1000: loss 2.302954
iteration 100 / 1000: loss 2.302550
iteration 200 / 1000: loss 2.297648
iteration 300 / 1000: loss 2.259602
iteration 400 / 1000: loss 2.204170
iteration 500 / 1000: loss 2.118565
iteration 600 / 1000: loss 2.051535
iteration 700 / 1000: loss 1.988466
iteration 800 / 1000: loss 2.006591
iteration 900 / 1000: loss 1.951473
Validation accuracy: 0.287
With the default parameters we provided above, you should get a validation accuracy of about 0.29. This isn't very good.
One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.
Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()
from cs231n.vis_utils import visualize_grid
# Visualize the weights of the network
def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()
show_net_weights(net)
What's wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.
Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.
Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.
Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected Neural Network. For every 1% above 52% on the test set we will award you one extra bonus point. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, adding features to the solver, etc.).
best_net = None # store the best model into this
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained #
# model in best_net. #
# #
# To help debug your network, it may help to use visualizations similar to the #
# ones we used above; these visualizations will have significant qualitative #
# differences from the ones we saw above for the poorly tuned network. #
# #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to #
# write code to sweep through possible combinations of hyperparameters #
# automatically like we did on the previous exercises. #
#################################################################################
best_val = -1
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
learning_rates = [1e-3, 1e-4, 5e-4, 1e-5, 5e-5]
regularization_strengths = [1e-5, 1e-4, 1e-3]
results = {}
for rate in learning_rates:
    for strength in regularization_strengths:
        net = TwoLayerNet(input_size, hidden_size, num_classes)
        stats = net.train(X_train, y_train, X_val, y_val,
                          num_iters=4000, batch_size=1000,
                          learning_rate=rate, learning_rate_decay=0.95,
                          reg=strength, verbose=True)
        learning_accuracy = np.mean(net.predict(X_train) == y_train)
        validation_accuracy = np.mean(net.predict(X_val) == y_val)
        print rate, strength, learning_accuracy, validation_accuracy
        if validation_accuracy > best_val:
            best_val = validation_accuracy
            best_net = net
        results[(rate, strength)] = (learning_accuracy, validation_accuracy)
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy)
print 'best validation accuracy achieved during cross-validation: %f' % best_val
#################################################################################
# END OF YOUR CODE #
#################################################################################
iteration 0 / 4000: loss 2.302583 iteration 100 / 4000: loss 1.991260 iteration 200 / 4000: loss 1.775281 iteration 300 / 4000: loss 1.685591 iteration 400 / 4000: loss 1.636356 iteration 500 / 4000: loss 1.580622 iteration 600 / 4000: loss 1.524742 iteration 700 / 4000: loss 1.504215 iteration 800 / 4000: loss 1.518042 iteration 900 / 4000: loss 1.440999 iteration 1000 / 4000: loss 1.484518 iteration 1100 / 4000: loss 1.445671 iteration 1200 / 4000: loss 1.501369 iteration 1300 / 4000: loss 1.497817 iteration 1400 / 4000: loss 1.465981 iteration 1500 / 4000: loss 1.465990 iteration 1600 / 4000: loss 1.413547 iteration 1700 / 4000: loss 1.409602 iteration 1800 / 4000: loss 1.420250 iteration 1900 / 4000: loss 1.363839 iteration 2000 / 4000: loss 1.380997 iteration 2100 / 4000: loss 1.362503 iteration 2200 / 4000: loss 1.371016 iteration 2300 / 4000: loss 1.385116 iteration 2400 / 4000: loss 1.413836 iteration 2500 / 4000: loss 1.336518 iteration 2600 / 4000: loss 1.388470 iteration 2700 / 4000: loss 1.366604 iteration 2800 / 4000: loss 1.321478 iteration 2900 / 4000: loss 1.341668 iteration 3000 / 4000: loss 1.380588 iteration 3100 / 4000: loss 1.372859 iteration 3200 / 4000: loss 1.383042 iteration 3300 / 4000: loss 1.356748 iteration 3400 / 4000: loss 1.389328 iteration 3500 / 4000: loss 1.393579 iteration 3600 / 4000: loss 1.309941 iteration 3700 / 4000: loss 1.471550 iteration 3800 / 4000: loss 1.363161 iteration 3900 / 4000: loss 1.419592 iteration 0 / 4000: loss 2.302579 iteration 100 / 4000: loss 1.953719 iteration 200 / 4000: loss 1.767800 iteration 300 / 4000: loss 1.652734 iteration 400 / 4000: loss 1.622537 iteration 500 / 4000: loss 1.560656 iteration 600 / 4000: loss 1.554930 iteration 700 / 4000: loss 1.486025 iteration 800 / 4000: loss 1.487059 iteration 900 / 4000: loss 1.403388 iteration 1000 / 4000: loss 1.445652 iteration 1100 / 4000: loss 1.581631 iteration 1200 / 4000: loss 1.466240 iteration 1300 / 4000: loss 1.453394 iteration 1400 / 4000: loss 1.451255 iteration 1500 / 4000: loss 1.450092 iteration 1600 / 4000: loss 1.364694 iteration 1700 / 4000: loss 1.429886 iteration 1800 / 4000: loss 1.425794 iteration 1900 / 4000: loss 1.430862 iteration 2000 / 4000: loss 1.401274 iteration 2100 / 4000: loss 1.413930 iteration 2200 / 4000: loss 1.363495 iteration 2300 / 4000: loss 1.375438 iteration 2400 / 4000: loss 1.464351 iteration 2500 / 4000: loss 1.439878 iteration 2600 / 4000: loss 1.398308 iteration 2700 / 4000: loss 1.340024 iteration 2800 / 4000: loss 1.368054 iteration 2900 / 4000: loss 1.369013 iteration 3000 / 4000: loss 1.355864 iteration 3100 / 4000: loss 1.375931 iteration 3200 / 4000: loss 1.322514 iteration 3300 / 4000: loss 1.364359 iteration 3400 / 4000: loss 1.374361 iteration 3500 / 4000: loss 1.388159 iteration 3600 / 4000: loss 1.367475 iteration 3700 / 4000: loss 1.426817 iteration 3800 / 4000: loss 1.402098 iteration 3900 / 4000: loss 1.340263 iteration 0 / 4000: loss 2.302593 iteration 100 / 4000: loss 1.979821 iteration 200 / 4000: loss 1.793122 iteration 300 / 4000: loss 1.652701 iteration 400 / 4000: loss 1.668863 iteration 500 / 4000: loss 1.583747 iteration 600 / 4000: loss 1.591026 iteration 700 / 4000: loss 1.534686 iteration 800 / 4000: loss 1.513759 iteration 900 / 4000: loss 1.468964 iteration 1000 / 4000: loss 1.455680 iteration 1100 / 4000: loss 1.437290 iteration 1200 / 4000: loss 1.446390 iteration 1300 / 4000: loss 1.431270 iteration 1400 / 4000: loss 1.433307 iteration 1500 / 4000: loss 1.407405 iteration 1600 / 4000: loss 1.399802 
iteration 1700 / 4000: loss 1.422070 iteration 1800 / 4000: loss 1.434525 iteration 1900 / 4000: loss 1.381993 iteration 2000 / 4000: loss 1.383511 iteration 2100 / 4000: loss 1.374440 iteration 2200 / 4000: loss 1.420267 iteration 2300 / 4000: loss 1.371797 iteration 2400 / 4000: loss 1.377633 iteration 2500 / 4000: loss 1.381026 iteration 2600 / 4000: loss 1.438926 iteration 2700 / 4000: loss 1.324090 iteration 2800 / 4000: loss 1.331116 iteration 2900 / 4000: loss 1.366819 iteration 3000 / 4000: loss 1.392021 iteration 3100 / 4000: loss 1.378007 iteration 3200 / 4000: loss 1.378634 iteration 3300 / 4000: loss 1.351343 iteration 3400 / 4000: loss 1.397372 iteration 3500 / 4000: loss 1.363175 iteration 3600 / 4000: loss 1.358633 iteration 3700 / 4000: loss 1.427065 iteration 3800 / 4000: loss 1.417329 iteration 3900 / 4000: loss 1.354736 iteration 0 / 4000: loss 2.302604 iteration 100 / 4000: loss 2.302159 iteration 200 / 4000: loss 2.299237 iteration 300 / 4000: loss 2.280363 iteration 400 / 4000: loss 2.222496 iteration 500 / 4000: loss 2.188501 iteration 600 / 4000: loss 2.154259 iteration 700 / 4000: loss 2.102640 iteration 800 / 4000: loss 2.092599 iteration 900 / 4000: loss 2.059757 iteration 1000 / 4000: loss 2.033296 iteration 1100 / 4000: loss 2.049668 iteration 1200 / 4000: loss 2.033248 iteration 1300 / 4000: loss 2.030999 iteration 1400 / 4000: loss 2.005833 iteration 1500 / 4000: loss 1.986753 iteration 1600 / 4000: loss 2.010519 iteration 1700 / 4000: loss 1.976191 iteration 1800 / 4000: loss 1.978606 iteration 1900 / 4000: loss 1.989667 iteration 2000 / 4000: loss 1.989466 iteration 2100 / 4000: loss 1.991465 iteration 2200 / 4000: loss 1.974892 iteration 2300 / 4000: loss 1.937419 iteration 2400 / 4000: loss 1.924777 iteration 2500 / 4000: loss 1.941007 iteration 2600 / 4000: loss 1.951206 iteration 2700 / 4000: loss 1.974104 iteration 2800 / 4000: loss 1.957977 iteration 2900 / 4000: loss 1.946540 iteration 3000 / 4000: loss 1.899332 iteration 3100 / 4000: loss 1.945131 iteration 3200 / 4000: loss 1.947768 iteration 3300 / 4000: loss 1.926949 iteration 3400 / 4000: loss 1.926096 iteration 3500 / 4000: loss 1.986117 iteration 3600 / 4000: loss 1.919890 iteration 3700 / 4000: loss 1.951008 iteration 3800 / 4000: loss 1.946074 iteration 3900 / 4000: loss 1.940454 iteration 0 / 4000: loss 2.302577 iteration 100 / 4000: loss 2.302041 iteration 200 / 4000: loss 2.298161 iteration 300 / 4000: loss 2.276714 iteration 400 / 4000: loss 2.227625 iteration 500 / 4000: loss 2.183338 iteration 600 / 4000: loss 2.152535 iteration 700 / 4000: loss 2.104467 iteration 800 / 4000: loss 2.102025 iteration 900 / 4000: loss 2.056442 iteration 1000 / 4000: loss 2.053660 iteration 1100 / 4000: loss 2.058710 iteration 1200 / 4000: loss 2.027170 iteration 1300 / 4000: loss 1.995004 iteration 1400 / 4000: loss 1.982414 iteration 1500 / 4000: loss 2.023218 iteration 1600 / 4000: loss 1.969810 iteration 1700 / 4000: loss 1.956546 iteration 1800 / 4000: loss 1.987838 iteration 1900 / 4000: loss 1.956324 iteration 2000 / 4000: loss 1.994102 iteration 2100 / 4000: loss 1.934595 iteration 2200 / 4000: loss 1.952064 iteration 2300 / 4000: loss 1.972918 iteration 2400 / 4000: loss 1.966378 iteration 2500 / 4000: loss 1.968966 iteration 2600 / 4000: loss 1.954609 iteration 2700 / 4000: loss 1.990847 iteration 2800 / 4000: loss 1.972864 iteration 2900 / 4000: loss 1.918136 iteration 3000 / 4000: loss 1.993241 iteration 3100 / 4000: loss 1.948581 iteration 3200 / 4000: loss 1.941726 iteration 3300 / 4000: 
loss 1.967197 iteration 3400 / 4000: loss 1.980261 iteration 3500 / 4000: loss 1.953736 iteration 3600 / 4000: loss 1.922499 iteration 3700 / 4000: loss 1.928117 iteration 3800 / 4000: loss 1.920787 iteration 3900 / 4000: loss 1.964791 iteration 0 / 4000: loss 2.302583 iteration 100 / 4000: loss 2.302186 iteration 200 / 4000: loss 2.299911 iteration 300 / 4000: loss 2.289584 iteration 400 / 4000: loss 2.240959 iteration 500 / 4000: loss 2.194659 iteration 600 / 4000: loss 2.147484 iteration 700 / 4000: loss 2.105903 iteration 800 / 4000: loss 2.077687 iteration 900 / 4000: loss 2.070988 iteration 1000 / 4000: loss 2.057552 iteration 1100 / 4000: loss 2.054473 iteration 1200 / 4000: loss 2.014233 iteration 1300 / 4000: loss 1.980386 iteration 1400 / 4000: loss 2.034329 iteration 1500 / 4000: loss 1.991258 iteration 1600 / 4000: loss 2.011089 iteration 1700 / 4000: loss 1.960384 iteration 1800 / 4000: loss 1.968818 iteration 1900 / 4000: loss 1.955082 iteration 2000 / 4000: loss 1.969226 iteration 2100 / 4000: loss 1.996775 iteration 2200 / 4000: loss 1.976854 iteration 2300 / 4000: loss 1.962374 iteration 2400 / 4000: loss 1.935568 iteration 2500 / 4000: loss 2.000669 iteration 2600 / 4000: loss 1.934205 iteration 2700 / 4000: loss 1.888278 iteration 2800 / 4000: loss 1.924791 iteration 2900 / 4000: loss 1.936377 iteration 3000 / 4000: loss 1.950651 iteration 3100 / 4000: loss 1.922994 iteration 3200 / 4000: loss 1.964252 iteration 3300 / 4000: loss 1.931348 iteration 3400 / 4000: loss 1.943417 iteration 3500 / 4000: loss 1.942194 iteration 3600 / 4000: loss 1.931666 iteration 3700 / 4000: loss 1.937381 iteration 3800 / 4000: loss 1.972889 iteration 3900 / 4000: loss 1.906907 iteration 0 / 4000: loss 2.302578 iteration 100 / 4000: loss 2.146089 iteration 200 / 4000: loss 1.961279 iteration 300 / 4000: loss 1.821556 iteration 400 / 4000: loss 1.785176 iteration 500 / 4000: loss 1.745873 iteration 600 / 4000: loss 1.723188 iteration 700 / 4000: loss 1.699136 iteration 800 / 4000: loss 1.656032 iteration 900 / 4000: loss 1.633901 iteration 1000 / 4000: loss 1.640450 iteration 1100 / 4000: loss 1.613893 iteration 1200 / 4000: loss 1.646916 iteration 1300 / 4000: loss 1.584392 iteration 1400 / 4000: loss 1.593374 iteration 1500 / 4000: loss 1.570534 iteration 1600 / 4000: loss 1.593367 iteration 1700 / 4000: loss 1.547839 iteration 1800 / 4000: loss 1.559422 iteration 1900 / 4000: loss 1.631229 iteration 2000 / 4000: loss 1.549596 iteration 2100 / 4000: loss 1.515912 iteration 2200 / 4000: loss 1.588543 iteration 2300 / 4000: loss 1.562227 iteration 2400 / 4000: loss 1.531671 iteration 2500 / 4000: loss 1.550642 iteration 2600 / 4000: loss 1.492808 iteration 2700 / 4000: loss 1.511688 iteration 2800 / 4000: loss 1.533244 iteration 2900 / 4000: loss 1.520584 iteration 3000 / 4000: loss 1.554804 iteration 3100 / 4000: loss 1.544910 iteration 3200 / 4000: loss 1.565163 iteration 3300 / 4000: loss 1.463295 iteration 3400 / 4000: loss 1.500455 iteration 3500 / 4000: loss 1.508728 iteration 3600 / 4000: loss 1.600252 iteration 3700 / 4000: loss 1.528342 iteration 3800 / 4000: loss 1.521581 iteration 3900 / 4000: loss 1.482106 iteration 0 / 4000: loss 2.302574 iteration 100 / 4000: loss 2.105837 iteration 200 / 4000: loss 1.936087 iteration 300 / 4000: loss 1.850773 iteration 400 / 4000: loss 1.809952 iteration 500 / 4000: loss 1.760602 iteration 600 / 4000: loss 1.700470 iteration 700 / 4000: loss 1.669701 iteration 800 / 4000: loss 1.715360 iteration 900 / 4000: loss 1.625905 iteration 1000 / 4000: 
loss 1.612379 iteration 1100 / 4000: loss 1.684409 iteration 1200 / 4000: loss 1.580097 iteration 1300 / 4000: loss 1.584557 iteration 1400 / 4000: loss 1.611433 iteration 1500 / 4000: loss 1.558695 iteration 1600 / 4000: loss 1.592766 iteration 1700 / 4000: loss 1.572762 iteration 1800 / 4000: loss 1.538258 iteration 1900 / 4000: loss 1.553126 iteration 2000 / 4000: loss 1.557430 iteration 2100 / 4000: loss 1.555179 iteration 2200 / 4000: loss 1.532023 iteration 2300 / 4000: loss 1.516409 iteration 2400 / 4000: loss 1.507545 iteration 2500 / 4000: loss 1.572807 iteration 2600 / 4000: loss 1.545169 iteration 2700 / 4000: loss 1.579024 iteration 2800 / 4000: loss 1.540756 iteration 2900 / 4000: loss 1.545628 iteration 3000 / 4000: loss 1.561428 iteration 3100 / 4000: loss 1.566211 iteration 3200 / 4000: loss 1.502305 iteration 3300 / 4000: loss 1.617657 iteration 3400 / 4000: loss 1.574238 iteration 3500 / 4000: loss 1.511990 iteration 3600 / 4000: loss 1.548783 iteration 3700 / 4000: loss 1.590014 iteration 3800 / 4000: loss 1.505331 iteration 3900 / 4000: loss 1.544370 iteration 0 / 4000: loss 2.302594 iteration 100 / 4000: loss 2.140417 iteration 200 / 4000: loss 1.934765 iteration 300 / 4000: loss 1.870320 iteration 400 / 4000: loss 1.824629
# visualize the weights of the best network
show_net_weights(best_net)
When you are done experimenting, you should evaluate your final trained network on the test set; you should get above 48%.
We will give you extra bonus point for every 1% of accuracy above 52%.
test_acc = (best_net.predict(X_test) == y_test).mean()
print 'Test accuracy: ', test_acc