In this exercise we will develop a neural network with fully-connected layers to perform classification, and test it out on the CIFAR-10 dataset.
# A bit of setup
import numpy as np
import matplotlib.pyplot as plt
from cs231n.classifiers.neural_net import TwoLayerNet
%matplotlib inline
plt.rcParams['figure.figsize'] = (10.0, 8.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'
# for auto-reloading external modules
# see http://stackoverflow.com/questions/1907993/autoreload-of-modules-in-ipython
%load_ext autoreload
%autoreload 2
def rel_error(x, y):
    """ returns relative error """
    return np.max(np.abs(x - y) / (np.maximum(1e-8, np.abs(x) + np.abs(y))))
We will use the class TwoLayerNet in the file cs231n/classifiers/neural_net.py to represent instances of our network. The network parameters are stored in the instance variable self.params, where keys are string parameter names and values are numpy arrays. Below, we initialize toy data and a toy model that we will use to develop your implementation.
# Create a small net and some toy data to check your implementations.
# Note that we set the random seed for repeatable experiments.
input_size = 4
hidden_size = 10
num_classes = 3
num_inputs = 5
def init_toy_model():
    np.random.seed(0)
    return TwoLayerNet(input_size, hidden_size, num_classes, std=1e-1)

def init_toy_data():
    np.random.seed(1)
    X = 10 * np.random.randn(num_inputs, input_size)
    y = np.array([0, 1, 2, 2, 1])
    return X, y
net = init_toy_model()
X, y = init_toy_data()
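To make the layout of self.params concrete, here is a minimal sketch of how such a parameter dictionary might be initialized for a two-layer net. The key names W1, b1, W2, b2 match those used later in this notebook; the helper name init_params and the small-Gaussian initialization scale std are illustrative assumptions, not the assignment's implementation.

# Hedged sketch: roughly how a two-layer net's parameters could be stored in a
# dict, mirroring the keys (W1, b1, W2, b2) used elsewhere in this notebook.
# The initialization scale `std` is an assumption, not the graded code.
import numpy as np

def init_params(input_size, hidden_size, output_size, std=1e-4):
    params = {}
    params['W1'] = std * np.random.randn(input_size, hidden_size)   # first-layer weights
    params['b1'] = np.zeros(hidden_size)                            # first-layer biases
    params['W2'] = std * np.random.randn(hidden_size, output_size)  # second-layer weights
    params['b2'] = np.zeros(output_size)                            # second-layer biases
    return params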
Open the file cs231n/classifiers/neural_net.py and look at the method TwoLayerNet.loss. This function is very similar to the loss functions you have written for the SVM and Softmax exercises: it takes the data and weights and computes the class scores, the loss, and the gradients on the parameters.
Implement the first part of the forward pass, which uses the weights and biases to compute the scores for all inputs.
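As a reference for the shape bookkeeping, here is a hedged sketch of a score computation, assuming a ReLU nonlinearity in the hidden layer; the function and variable names are illustrative, and this is not the graded TwoLayerNet.loss implementation.

# Hedged sketch of the score computation for a two-layer net, assuming a ReLU
# hidden nonlinearity (not the graded implementation).
import numpy as np

def compute_scores(X, params):
    W1, b1 = params['W1'], params['b1']
    W2, b2 = params['W2'], params['b2']
    hidden = np.maximum(0, X.dot(W1) + b1)  # (N, H): affine layer followed by ReLU
    scores = hidden.dot(W2) + b2            # (N, C): one row of class scores per input
    return scores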
scores = net.loss(X)
print 'Your scores:'
print scores
print
print 'correct scores:'
correct_scores = np.asarray([
[-0.81233741, -1.27654624, -0.70335995],
[-0.17129677, -1.18803311, -0.47310444],
[-0.51590475, -1.01354314, -0.8504215 ],
[-0.15419291, -0.48629638, -0.52901952],
[-0.00618733, -0.12435261, -0.15226949]])
print correct_scores
print
# The difference should be very small. We get < 1e-7
print 'Difference between your scores and correct scores:'
print np.sum(np.abs(scores - correct_scores))
Your scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

correct scores:
[[-0.81233741 -1.27654624 -0.70335995]
 [-0.17129677 -1.18803311 -0.47310444]
 [-0.51590475 -1.01354314 -0.8504215 ]
 [-0.15419291 -0.48629638 -0.52901952]
 [-0.00618733 -0.12435261 -0.15226949]]

Difference between your scores and correct scores:
3.68027207103e-08
In the same function, implement the second part that computes the data and regularization loss.
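For orientation, here is a hedged sketch of a softmax data loss plus an L2 regularization term on the weights. The exact regularization convention (for example, whether the 0.5 factor is included) depends on the assignment code in neural_net.py, so treat these details as assumptions.

# Hedged sketch of softmax data loss plus L2 regularization; conventions such
# as the 0.5 factor are assumptions (check neural_net.py for the exact ones).
import numpy as np

def softmax_loss(scores, y, W1, W2, reg):
    N = scores.shape[0]
    shifted = scores - np.max(scores, axis=1, keepdims=True)    # shift for numerical stability
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)
    data_loss = -np.sum(np.log(probs[np.arange(N), y])) / N     # average cross-entropy
    reg_loss = 0.5 * reg * (np.sum(W1 * W1) + np.sum(W2 * W2))  # L2 penalty on the weights
    return data_loss + reg_loss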
loss, _ = net.loss(X, y, reg=0.1)
correct_loss = 1.30378789133
# should be very small, we get < 1e-12
print 'Difference between your loss and correct loss:'
print np.sum(np.abs(loss - correct_loss))
Difference between your loss and correct loss: 1.79412040779e-13
Implement the rest of the function. This will compute the gradient of the loss with respect to the variables W1, b1, W2, and b2. Now that you (hopefully!) have a correctly implemented forward pass, you can debug your backward pass using a numeric gradient check:
from cs231n.gradient_check import eval_numerical_gradient
# Use numeric gradient checking to check your implementation of the backward pass.
# If your implementation is correct, the difference between the numeric and
# analytic gradients should be less than 1e-8 for each of W1, W2, b1, and b2.
loss, grads = net.loss(X, y, reg=0.1)
# these should all be less than 1e-8 or so
for param_name in grads:
    f = lambda W: net.loss(X, y, reg=0.1)[0]
    param_grad_num = eval_numerical_gradient(f, net.params[param_name], verbose=False)
    print '%s max relative error: %e' % (param_name, rel_error(param_grad_num, grads[param_name]))
W1 max relative error: 3.669857e-09
W2 max relative error: 3.440708e-09
b2 max relative error: 4.447677e-11
b1 max relative error: 2.738421e-09
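If you need a starting point for the gradients, here is a hedged sketch of backpropagation through the softmax loss and the two affine layers, again assuming a ReLU hidden layer; it mirrors the forward-pass and loss sketches above and is not the assignment's reference code.

# Hedged sketch of the backward pass matching the sketches above
# (softmax loss, ReLU hidden layer); not the assignment's reference code.
import numpy as np

def two_layer_gradients(X, y, params, reg):
    W1, b1, W2, b2 = params['W1'], params['b1'], params['W2'], params['b2']
    N = X.shape[0]

    # Forward pass (same computations as in the sketches above).
    hidden = np.maximum(0, X.dot(W1) + b1)
    scores = hidden.dot(W2) + b2
    shifted = scores - np.max(scores, axis=1, keepdims=True)
    probs = np.exp(shifted) / np.sum(np.exp(shifted), axis=1, keepdims=True)

    # Gradient of the averaged softmax loss with respect to the scores.
    dscores = probs.copy()
    dscores[np.arange(N), y] -= 1
    dscores /= N

    grads = {}
    grads['W2'] = hidden.T.dot(dscores) + reg * W2  # includes d(reg_loss)/dW2
    grads['b2'] = np.sum(dscores, axis=0)
    dhidden = dscores.dot(W2.T)
    dhidden[hidden <= 0] = 0                        # backprop through the ReLU
    grads['W1'] = X.T.dot(dhidden) + reg * W1
    grads['b1'] = np.sum(dhidden, axis=0)
    return grads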
To train the network we will use stochastic gradient descent (SGD), similar to the SVM and Softmax classifiers. Look at the function TwoLayerNet.train and fill in the missing sections to implement the training procedure. This should be very similar to the training procedure you used for the SVM and Softmax classifiers. You will also have to implement TwoLayerNet.predict, as the training process periodically performs prediction to keep track of accuracy over time while the network trains.
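As a rough outline of what these two methods need to do, here is a hedged sketch of a single minibatch SGD step and a predict function. The helper names (sgd_step, net_loss_fn), the assumption that the loss function has the same (X, y, reg) signature as net.loss, and the choice to sample with replacement are all illustrative, not the assignment's API.

# Hedged sketch of one minibatch SGD step and a predict function; names,
# the loss-function signature, and sampling with replacement are assumptions.
import numpy as np

def sgd_step(net_loss_fn, params, X, y, batch_size, learning_rate, reg):
    # Sample a minibatch, then take a gradient step on every parameter.
    idx = np.random.choice(X.shape[0], batch_size, replace=True)
    loss, grads = net_loss_fn(X[idx], y[idx], reg=reg)
    for name in params:
        params[name] -= learning_rate * grads[name]
    return loss

def predict(X, params):
    # Predicted labels are the argmax of the class scores
    # (compute_scores is the forward-pass sketch shown earlier).
    return np.argmax(compute_scores(X, params), axis=1)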
Once you have implemented the method, run the code below to train a two-layer network on toy data. You should achieve a training loss less than 0.2.
net = init_toy_model()
stats = net.train(X, y, X, y,
learning_rate=1e-1, reg=1e-5,
num_iters=100, verbose=False)
print 'Final training loss: ', stats['loss_history'][-1]
# plot the loss history
plt.plot(stats['loss_history'])
plt.xlabel('iteration')
plt.ylabel('training loss')
plt.title('Training Loss history')
plt.show()
Final training loss: 0.0171496079387
Now that you have implemented a two-layer network that passes gradient checks and works on toy data, it's time to load up our favorite CIFAR-10 data so we can use it to train a classifier on a real dataset.
from cs231n.data_utils import load_CIFAR10
def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=1000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    # Reshape data to rows
    X_train = X_train.reshape(num_training, -1)
    X_val = X_val.reshape(num_validation, -1)
    X_test = X_test.reshape(num_test, -1)

    return X_train, y_train, X_val, y_val, X_test, y_test
# Invoke the above function to get our data.
X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print 'Train data shape: ', X_train.shape
print 'Train labels shape: ', y_train.shape
print 'Validation data shape: ', X_val.shape
print 'Validation labels shape: ', y_val.shape
print 'Test data shape: ', X_test.shape
print 'Test labels shape: ', y_test.shape
Train data shape: (49000, 3072)
Train labels shape: (49000,)
Validation data shape: (1000, 3072)
Validation labels shape: (1000,)
Test data shape: (1000, 3072)
Test labels shape: (1000,)
To train our network we will use SGD with momentum. In addition, we will adjust the learning rate with an exponential learning rate schedule as optimization proceeds; after each epoch, we will reduce the learning rate by multiplying it by a decay rate.
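For reference, here is a hedged sketch of the update rule this describes: SGD with momentum keeps a per-parameter velocity, and the learning rate is multiplied by a decay factor once per epoch. The momentum coefficient mu and the helper names are assumptions; check TwoLayerNet.train in neural_net.py for the exact update it uses.

# Hedged sketch of an SGD-with-momentum update plus exponential learning rate
# decay; the momentum coefficient mu is an assumption, not the graded code.
import numpy as np

def momentum_update(param, grad, velocity, learning_rate, mu=0.9):
    velocity = mu * velocity - learning_rate * grad  # integrate the gradient into a velocity
    param += velocity                                # move the parameter along the velocity
    return param, velocity

def decay_learning_rate(learning_rate, learning_rate_decay=0.95):
    # Called once per epoch: shrink the step size multiplicatively.
    return learning_rate * learning_rate_decay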
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
net = TwoLayerNet(input_size, hidden_size, num_classes)
# Train the network
stats = net.train(X_train, y_train, X_val, y_val,
num_iters=1000, batch_size=200,
learning_rate=1e-4, learning_rate_decay=0.95,
reg=0.5, verbose=True)
# Predict on the validation set
val_acc = (net.predict(X_val) == y_val).mean()
print 'Validation accuracy: ', val_acc
iteration 0 / 1000: loss 2.302954
iteration 100 / 1000: loss 2.302550
iteration 200 / 1000: loss 2.297648
iteration 300 / 1000: loss 2.259602
iteration 400 / 1000: loss 2.204170
iteration 500 / 1000: loss 2.118565
iteration 600 / 1000: loss 2.051535
iteration 700 / 1000: loss 1.988466
iteration 800 / 1000: loss 2.006591
iteration 900 / 1000: loss 1.951473
Validation accuracy: 0.287
With the default parameters we provided above, you should get a validation accuracy of about 0.29. This isn't very good.
One strategy for getting insight into what's wrong is to plot the loss function and the accuracies on the training and validation sets during optimization.
Another strategy is to visualize the weights that were learned in the first layer of the network. In most neural networks trained on visual data, the first layer weights typically show some visible structure when visualized.
# Plot the loss function and train / validation accuracies
plt.subplot(2, 1, 1)
plt.plot(stats['loss_history'])
plt.title('Loss history')
plt.xlabel('Iteration')
plt.ylabel('Loss')
plt.subplot(2, 1, 2)
plt.plot(stats['train_acc_history'], label='train')
plt.plot(stats['val_acc_history'], label='val')
plt.title('Classification accuracy history')
plt.xlabel('Epoch')
plt.ylabel('Classification accuracy')
plt.show()
from cs231n.vis_utils import visualize_grid
# Visualize the weights of the network
def show_net_weights(net):
    W1 = net.params['W1']
    W1 = W1.reshape(32, 32, 3, -1).transpose(3, 0, 1, 2)
    plt.imshow(visualize_grid(W1, padding=3).astype('uint8'))
    plt.gca().axis('off')
    plt.show()
show_net_weights(net)
What's wrong? Looking at the visualizations above, we see that the loss is decreasing more or less linearly, which seems to suggest that the learning rate may be too low. Moreover, there is no gap between the training and validation accuracy, suggesting that the model we used has low capacity and that we should increase its size. On the other hand, with a very large model we would expect to see more overfitting, which would manifest itself as a very large gap between the training and validation accuracy.
Tuning. Tuning the hyperparameters and developing intuition for how they affect the final performance is a large part of using Neural Networks, so we want you to get a lot of practice. Below, you should experiment with different values of the various hyperparameters, including hidden layer size, learning rate, number of training epochs, and regularization strength. You might also consider tuning the learning rate decay, but you should be able to get good performance using the default value.
Approximate results. You should aim to achieve a classification accuracy of greater than 48% on the validation set. Our best network gets over 52% on the validation set.
Experiment: Your goal in this exercise is to get as good a result on CIFAR-10 as you can with a fully-connected Neural Network. For every 1% above 52% on the test set we will award you one extra bonus point. Feel free to implement your own techniques (e.g. PCA to reduce dimensionality, adding dropout, adding features to the solver, etc.).
best_net = None # store the best model into this
#################################################################################
# TODO: Tune hyperparameters using the validation set. Store your best trained #
# model in best_net. #
# #
# To help debug your network, it may help to use visualizations similar to the #
# ones we used above; these visualizations will have significant qualitative #
# differences from the ones we saw above for the poorly tuned network. #
# #
# Tweaking hyperparameters by hand can be fun, but you might find it useful to #
# write code to sweep through possible combinations of hyperparameters #
# automatically like we did on the previous exercises. #
#################################################################################
best_val = -1
input_size = 32 * 32 * 3
hidden_size = 50
num_classes = 10
learning_rates = [1e-3, 1e-4, 5e-4, 1e-5, 5e-5]
regularization_strengths = [1e-5, 1e-4, 1e-3]
results = {}
for rate in learning_rates:
    for strength in regularization_strengths:
        net = TwoLayerNet(input_size, hidden_size, num_classes)
        stats = net.train(X_train, y_train, X_val, y_val,
                          num_iters=4000, batch_size=1000,
                          learning_rate=rate, learning_rate_decay=0.95,
                          reg=strength, verbose=True)
        learning_accuracy = np.mean(net.predict(X_train) == y_train)
        validation_accuracy = np.mean(net.predict(X_val) == y_val)
        print rate, strength, learning_accuracy, validation_accuracy
        if validation_accuracy > best_val:
            best_val = validation_accuracy
            best_net = net
        results[(rate, strength)] = (learning_accuracy, validation_accuracy)
# Print out results.
for lr, reg in sorted(results):
    train_accuracy, val_accuracy = results[(lr, reg)]
    print 'lr %e reg %e train accuracy: %f val accuracy: %f' % (
        lr, reg, train_accuracy, val_accuracy)
print 'best validation accuracy achieved during cross-validation: %f' % best_val
#################################################################################
# END OF YOUR CODE #
#################################################################################
iteration 0 / 4000: loss 2.302583 iteration 100 / 4000: loss 1.991260 iteration 200 / 4000: loss 1.775281 iteration 300 / 4000: loss 1.685591 iteration 400 / 4000: loss 1.636356 iteration 500 / 4000: loss 1.580622 iteration 600 / 4000: loss 1.524742 iteration 700 / 4000: loss 1.504215 iteration 800 / 4000: loss 1.518042 iteration 900 / 4000: loss 1.440999 iteration 1000 / 4000: loss 1.484518 iteration 1100 / 4000: loss 1.445671 iteration 1200 / 4000: loss 1.501369 iteration 1300 / 4000: loss 1.497817 iteration 1400 / 4000: loss 1.465981 iteration 1500 / 4000: loss 1.465990 iteration 1600 / 4000: loss 1.413547 iteration 1700 / 4000: loss 1.409602 iteration 1800 / 4000: loss 1.420250 iteration 1900 / 4000: loss 1.363839 iteration 2000 / 4000: loss 1.380997 iteration 2100 / 4000: loss 1.362503 iteration 2200 / 4000: loss 1.371016 iteration 2300 / 4000: loss 1.385116 iteration 2400 / 4000: loss 1.413836 iteration 2500 / 4000: loss 1.336518 iteration 2600 / 4000: loss 1.388470 iteration 2700 / 4000: loss 1.366604 iteration 2800 / 4000: loss 1.321478 iteration 2900 / 4000: loss 1.341668 iteration 3000 / 4000: loss 1.380588 iteration 3100 / 4000: loss 1.372859 iteration 3200 / 4000: loss 1.383042 iteration 3300 / 4000: loss 1.356748 iteration 3400 / 4000: loss 1.389328 iteration 3500 / 4000: loss 1.393579 iteration 3600 / 4000: loss 1.309941 iteration 3700 / 4000: loss 1.471550 iteration 3800 / 4000: loss 1.363161 iteration 3900 / 4000: loss 1.419592 iteration 0 / 4000: loss 2.302579 iteration 100 / 4000: loss 1.953719 iteration 200 / 4000: loss 1.767800 iteration 300 / 4000: loss 1.652734 iteration 400 / 4000: loss 1.622537 iteration 500 / 4000: loss 1.560656 iteration 600 / 4000: loss 1.554930 iteration 700 / 4000: loss 1.486025 iteration 800 / 4000: loss 1.487059 iteration 900 / 4000: loss 1.403388 iteration 1000 / 4000: loss 1.445652 iteration 1100 / 4000: loss 1.581631 iteration 1200 / 4000: loss 1.466240 iteration 1300 / 4000: loss 1.453394 iteration 1400 / 4000: loss 1.451255 iteration 1500 / 4000: loss 1.450092 iteration 1600 / 4000: loss 1.364694 iteration 1700 / 4000: loss 1.429886 iteration 1800 / 4000: loss 1.425794 iteration 1900 / 4000: loss 1.430862 iteration 2000 / 4000: loss 1.401274 iteration 2100 / 4000: loss 1.413930 iteration 2200 / 4000: loss 1.363495 iteration 2300 / 4000: loss 1.375438 iteration 2400 / 4000: loss 1.464351 iteration 2500 / 4000: loss 1.439878 iteration 2600 / 4000: loss 1.398308 iteration 2700 / 4000: loss 1.340024 iteration 2800 / 4000: loss 1.368054 iteration 2900 / 4000: loss 1.369013 iteration 3000 / 4000: loss 1.355864 iteration 3100 / 4000: loss 1.375931 iteration 3200 / 4000: loss 1.322514 iteration 3300 / 4000: loss 1.364359 iteration 3400 / 4000: loss 1.374361 iteration 3500 / 4000: loss 1.388159 iteration 3600 / 4000: loss 1.367475 iteration 3700 / 4000: loss 1.426817 iteration 3800 / 4000: loss 1.402098 iteration 3900 / 4000: loss 1.340263 iteration 0 / 4000: loss 2.302593 iteration 100 / 4000: loss 1.979821 iteration 200 / 4000: loss 1.793122 iteration 300 / 4000: loss 1.652701 iteration 400 / 4000: loss 1.668863 iteration 500 / 4000: loss 1.583747 iteration 600 / 4000: loss 1.591026 iteration 700 / 4000: loss 1.534686 iteration 800 / 4000: loss 1.513759 iteration 900 / 4000: loss 1.468964 iteration 1000 / 4000: loss 1.455680 iteration 1100 / 4000: loss 1.437290 iteration 1200 / 4000: loss 1.446390 iteration 1300 / 4000: loss 1.431270 iteration 1400 / 4000: loss 1.433307 iteration 1500 / 4000: loss 1.407405 iteration 1600 / 4000: loss 1.399802 
iteration 1700 / 4000: loss 1.422070 iteration 1800 / 4000: loss 1.434525 iteration 1900 / 4000: loss 1.381993 iteration 2000 / 4000: loss 1.383511 iteration 2100 / 4000: loss 1.374440 iteration 2200 / 4000: loss 1.420267 iteration 2300 / 4000: loss 1.371797 iteration 2400 / 4000: loss 1.377633 iteration 2500 / 4000: loss 1.381026 iteration 2600 / 4000: loss 1.438926 iteration 2700 / 4000: loss 1.324090 iteration 2800 / 4000: loss 1.331116 iteration 2900 / 4000: loss 1.366819 iteration 3000 / 4000: loss 1.392021 iteration 3100 / 4000: loss 1.378007 iteration 3200 / 4000: loss 1.378634 iteration 3300 / 4000: loss 1.351343 iteration 3400 / 4000: loss 1.397372 iteration 3500 / 4000: loss 1.363175 iteration 3600 / 4000: loss 1.358633 iteration 3700 / 4000: loss 1.427065 iteration 3800 / 4000: loss 1.417329 iteration 3900 / 4000: loss 1.354736 iteration 0 / 4000: loss 2.302604 iteration 100 / 4000: loss 2.302159 iteration 200 / 4000: loss 2.299237 iteration 300 / 4000: loss 2.280363 iteration 400 / 4000: loss 2.222496 iteration 500 / 4000: loss 2.188501 iteration 600 / 4000: loss 2.154259 iteration 700 / 4000: loss 2.102640 iteration 800 / 4000: loss 2.092599 iteration 900 / 4000: loss 2.059757 iteration 1000 / 4000: loss 2.033296 iteration 1100 / 4000: loss 2.049668 iteration 1200 / 4000: loss 2.033248 iteration 1300 / 4000: loss 2.030999 iteration 1400 / 4000: loss 2.005833 iteration 1500 / 4000: loss 1.986753 iteration 1600 / 4000: loss 2.010519 iteration 1700 / 4000: loss 1.976191 iteration 1800 / 4000: loss 1.978606 iteration 1900 / 4000: loss 1.989667 iteration 2000 / 4000: loss 1.989466 iteration 2100 / 4000: loss 1.991465 iteration 2200 / 4000: loss 1.974892 iteration 2300 / 4000: loss 1.937419 iteration 2400 / 4000: loss 1.924777 iteration 2500 / 4000: loss 1.941007 iteration 2600 / 4000: loss 1.951206 iteration 2700 / 4000: loss 1.974104 iteration 2800 / 4000: loss 1.957977 iteration 2900 / 4000: loss 1.946540 iteration 3000 / 4000: loss 1.899332 iteration 3100 / 4000: loss 1.945131 iteration 3200 / 4000: loss 1.947768 iteration 3300 / 4000: loss 1.926949 iteration 3400 / 4000: loss 1.926096 iteration 3500 / 4000: loss 1.986117 iteration 3600 / 4000: loss 1.919890 iteration 3700 / 4000: loss 1.951008 iteration 3800 / 4000: loss 1.946074 iteration 3900 / 4000: loss 1.940454 iteration 0 / 4000: loss 2.302577 iteration 100 / 4000: loss 2.302041 iteration 200 / 4000: loss 2.298161 iteration 300 / 4000: loss 2.276714 iteration 400 / 4000: loss 2.227625 iteration 500 / 4000: loss 2.183338 iteration 600 / 4000: loss 2.152535 iteration 700 / 4000: loss 2.104467 iteration 800 / 4000: loss 2.102025 iteration 900 / 4000: loss 2.056442 iteration 1000 / 4000: loss 2.053660 iteration 1100 / 4000: loss 2.058710 iteration 1200 / 4000: loss 2.027170 iteration 1300 / 4000: loss 1.995004 iteration 1400 / 4000: loss 1.982414 iteration 1500 / 4000: loss 2.023218 iteration 1600 / 4000: loss 1.969810 iteration 1700 / 4000: loss 1.956546 iteration 1800 / 4000: loss 1.987838 iteration 1900 / 4000: loss 1.956324 iteration 2000 / 4000: loss 1.994102 iteration 2100 / 4000: loss 1.934595 iteration 2200 / 4000: loss 1.952064 iteration 2300 / 4000: loss 1.972918 iteration 2400 / 4000: loss 1.966378 iteration 2500 / 4000: loss 1.968966 iteration 2600 / 4000: loss 1.954609 iteration 2700 / 4000: loss 1.990847 iteration 2800 / 4000: loss 1.972864 iteration 2900 / 4000: loss 1.918136 iteration 3000 / 4000: loss 1.993241 iteration 3100 / 4000: loss 1.948581 iteration 3200 / 4000: loss 1.941726 iteration 3300 / 4000: 
loss 1.967197 iteration 3400 / 4000: loss 1.980261 iteration 3500 / 4000: loss 1.953736 iteration 3600 / 4000: loss 1.922499 iteration 3700 / 4000: loss 1.928117 iteration 3800 / 4000: loss 1.920787 iteration 3900 / 4000: loss 1.964791 iteration 0 / 4000: loss 2.302583 iteration 100 / 4000: loss 2.302186 iteration 200 / 4000: loss 2.299911 iteration 300 / 4000: loss 2.289584 iteration 400 / 4000: loss 2.240959 iteration 500 / 4000: loss 2.194659 iteration 600 / 4000: loss 2.147484 iteration 700 / 4000: loss 2.105903 iteration 800 / 4000: loss 2.077687 iteration 900 / 4000: loss 2.070988 iteration 1000 / 4000: loss 2.057552 iteration 1100 / 4000: loss 2.054473 iteration 1200 / 4000: loss 2.014233 iteration 1300 / 4000: loss 1.980386 iteration 1400 / 4000: loss 2.034329 iteration 1500 / 4000: loss 1.991258 iteration 1600 / 4000: loss 2.011089 iteration 1700 / 4000: loss 1.960384 iteration 1800 / 4000: loss 1.968818 iteration 1900 / 4000: loss 1.955082 iteration 2000 / 4000: loss 1.969226 iteration 2100 / 4000: loss 1.996775 iteration 2200 / 4000: loss 1.976854 iteration 2300 / 4000: loss 1.962374 iteration 2400 / 4000: loss 1.935568 iteration 2500 / 4000: loss 2.000669 iteration 2600 / 4000: loss 1.934205 iteration 2700 / 4000: loss 1.888278 iteration 2800 / 4000: loss 1.924791 iteration 2900 / 4000: loss 1.936377 iteration 3000 / 4000: loss 1.950651 iteration 3100 / 4000: loss 1.922994 iteration 3200 / 4000: loss 1.964252 iteration 3300 / 4000: loss 1.931348 iteration 3400 / 4000: loss 1.943417 iteration 3500 / 4000: loss 1.942194 iteration 3600 / 4000: loss 1.931666 iteration 3700 / 4000: loss 1.937381 iteration 3800 / 4000: loss 1.972889 iteration 3900 / 4000: loss 1.906907 iteration 0 / 4000: loss 2.302578 iteration 100 / 4000: loss 2.146089 iteration 200 / 4000: loss 1.961279 iteration 300 / 4000: loss 1.821556 iteration 400 / 4000: loss 1.785176 iteration 500 / 4000: loss 1.745873 iteration 600 / 4000: loss 1.723188 iteration 700 / 4000: loss 1.699136 iteration 800 / 4000: loss 1.656032 iteration 900 / 4000: loss 1.633901 iteration 1000 / 4000: loss 1.640450 iteration 1100 / 4000: loss 1.613893 iteration 1200 / 4000: loss 1.646916 iteration 1300 / 4000: loss 1.584392 iteration 1400 / 4000: loss 1.593374 iteration 1500 / 4000: loss 1.570534 iteration 1600 / 4000: loss 1.593367 iteration 1700 / 4000: loss 1.547839 iteration 1800 / 4000: loss 1.559422 iteration 1900 / 4000: loss 1.631229 iteration 2000 / 4000: loss 1.549596 iteration 2100 / 4000: loss 1.515912 iteration 2200 / 4000: loss 1.588543 iteration 2300 / 4000: loss 1.562227 iteration 2400 / 4000: loss 1.531671 iteration 2500 / 4000: loss 1.550642 iteration 2600 / 4000: loss 1.492808 iteration 2700 / 4000: loss 1.511688 iteration 2800 / 4000: loss 1.533244 iteration 2900 / 4000: loss 1.520584 iteration 3000 / 4000: loss 1.554804 iteration 3100 / 4000: loss 1.544910 iteration 3200 / 4000: loss 1.565163 iteration 3300 / 4000: loss 1.463295 iteration 3400 / 4000: loss 1.500455 iteration 3500 / 4000: loss 1.508728 iteration 3600 / 4000: loss 1.600252 iteration 3700 / 4000: loss 1.528342 iteration 3800 / 4000: loss 1.521581 iteration 3900 / 4000: loss 1.482106 iteration 0 / 4000: loss 2.302574 iteration 100 / 4000: loss 2.105837 iteration 200 / 4000: loss 1.936087 iteration 300 / 4000: loss 1.850773 iteration 400 / 4000: loss 1.809952 iteration 500 / 4000: loss 1.760602 iteration 600 / 4000: loss 1.700470 iteration 700 / 4000: loss 1.669701 iteration 800 / 4000: loss 1.715360 iteration 900 / 4000: loss 1.625905 iteration 1000 / 4000: 
loss 1.612379 iteration 1100 / 4000: loss 1.684409 iteration 1200 / 4000: loss 1.580097 iteration 1300 / 4000: loss 1.584557 iteration 1400 / 4000: loss 1.611433 iteration 1500 / 4000: loss 1.558695 iteration 1600 / 4000: loss 1.592766 iteration 1700 / 4000: loss 1.572762 iteration 1800 / 4000: loss 1.538258 iteration 1900 / 4000: loss 1.553126 iteration 2000 / 4000: loss 1.557430 iteration 2100 / 4000: loss 1.555179 iteration 2200 / 4000: loss 1.532023 iteration 2300 / 4000: loss 1.516409 iteration 2400 / 4000: loss 1.507545 iteration 2500 / 4000: loss 1.572807 iteration 2600 / 4000: loss 1.545169 iteration 2700 / 4000: loss 1.579024 iteration 2800 / 4000: loss 1.540756 iteration 2900 / 4000: loss 1.545628 iteration 3000 / 4000: loss 1.561428 iteration 3100 / 4000: loss 1.566211 iteration 3200 / 4000: loss 1.502305 iteration 3300 / 4000: loss 1.617657 iteration 3400 / 4000: loss 1.574238 iteration 3500 / 4000: loss 1.511990 iteration 3600 / 4000: loss 1.548783 iteration 3700 / 4000: loss 1.590014 iteration 3800 / 4000: loss 1.505331 iteration 3900 / 4000: loss 1.544370 iteration 0 / 4000: loss 2.302594 iteration 100 / 4000: loss 2.140417 iteration 200 / 4000: loss 1.934765 iteration 300 / 4000: loss 1.870320 iteration 400 / 4000: loss 1.824629
# visualize the weights of the best network
show_net_weights(best_net)
When you are done experimenting, you should evaluate your final trained network on the test set; you should get above 48%.
We will give you extra bonus point for every 1% of accuracy above 52%.
test_acc = (best_net.predict(X_test) == y_test).mean()
print 'Test accuracy: ', test_acc