Deadline: May 22, 9pm
Late Penalty: There is a penalty-free grace period of one hour past the deadline. Any work that is submitted between 1 hour and 24 hours past the deadline will receive a 20% grade deduction. No other late work is accepted. Quercus submission time will be used, not your local computer time. You can submit your labs as many times as you want before the deadline, so please submit often and early.
TA: Huan Ling
This lab is partially based on an assignment developed by Prof. Jonathan Rose and Harris Chan.
In this lab, you will train a convolutional neural network to classify an image into one of two classes: "cat" or "dog". The code for the neural networks you train will be written for you, and you are not (yet!) expected to understand all provided code. However, by the end of the lab, you should be able to:
Submit a PDF file containing all your code, outputs, and write-up from parts 1-5. You can produce a PDF of your Google Colab file by going to File > Print and then saving as a PDF. The Colab instructions have more information.
Do not submit any other files produced by your code.
Include a link to your colab file in your submission.
Please use Google Colab to complete this assignment. If you want to use Jupyter Notebook, please complete the assignment and upload your Jupyter Notebook file to Google Colab for submission.
With Colab, you can export a PDF file using the menu option File -> Print and saving as a PDF file.
import numpy as np
import time
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import torchvision
from torch.utils.data.sampler import SubsetRandomSampler
import torchvision.transforms as transforms
We will be making use of the following helper functions. You will be asked to look at and possibly modify some of these, but you are not expected to understand all of them.
You should look at the function names and read the docstrings. If you are curious, come back and explore the code after making some progress on the lab.
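As a warm-up, here is a toy illustration of the index filtering that get_relevant_indices below performs. The four-item dataset is hypothetical, standing in for CIFAR-10:

```python
# A Dataset only needs indexing and a length, so a plain list of
# (input, label_index) pairs stands in for CIFAR-10 here.
toy_dataset = [("img0", 3), ("img1", 5), ("img2", 0), ("img3", 3)]
classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
target_classes = ["cat", "dog"]

# Keep only the positions whose label names fall in target_classes
indices = [i for i in range(len(toy_dataset))
           if classes[toy_dataset[i][1]] in target_classes]
print(indices)  # [0, 1, 3] -- the 'cat', 'dog', 'cat' entries
```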
###############################################################################
# Data Loading
def get_relevant_indices(dataset, classes, target_classes):
    """ Return the indices for datapoints in the dataset that belong to the
    desired target classes, a subset of all possible classes.
    Args:
        dataset: Dataset object
        classes: A list of strings denoting the name of each class
        target_classes: A list of strings denoting the names of the desired
            classes. Should be a subset of 'classes'
    Returns:
        indices: list of indices that have labels corresponding to one of the
            target classes
    """
    indices = []
    for i in range(len(dataset)):
        # Check if the label is in the target classes
        label_index = dataset[i][1]         # ex: 3
        label_class = classes[label_index]  # ex: 'cat'
        if label_class in target_classes:
            indices.append(i)
    return indices
def get_data_loader(target_classes, batch_size):
    """ Load the CIFAR-10 dataset and return DataLoader objects for the
    training, validation, and test sets, restricted to the target classes.
    Args:
        target_classes: A list of strings denoting the names of the desired
            classes. Should be a subset of the CIFAR-10 class names
        batch_size: The number of datapoints in each batch
    Returns:
        train_loader, val_loader, test_loader: DataLoader objects for the
            train/validation/test sets
        classes: The tuple of all CIFAR-10 class names
    """
    classes = ('plane', 'car', 'bird', 'cat',
               'deer', 'dog', 'frog', 'horse', 'ship', 'truck')
    ########################################################################
    # The outputs of torchvision datasets are PILImage images of range [0, 1].
    # We transform them to Tensors of normalized range [-1, 1].
    transform = transforms.Compose(
        [transforms.ToTensor(),
         transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])
    trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                            download=True, transform=transform)
    # Get the list of indices to sample from
    relevant_train_indices = get_relevant_indices(
        trainset, classes, target_classes)
    # Split into train and validation
    np.random.seed(1000)  # Fixed numpy random seed for reproducible shuffling
    np.random.shuffle(relevant_train_indices)
    split = int(len(relevant_train_indices) * 0.8)
    relevant_train_indices, relevant_val_indices = \
        relevant_train_indices[:split], relevant_train_indices[split:]
    train_sampler = SubsetRandomSampler(relevant_train_indices)
    train_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                               num_workers=1, sampler=train_sampler)
    val_sampler = SubsetRandomSampler(relevant_val_indices)
    val_loader = torch.utils.data.DataLoader(trainset, batch_size=batch_size,
                                             num_workers=1, sampler=val_sampler)
    testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                           download=True, transform=transform)
    relevant_test_indices = get_relevant_indices(testset, classes, target_classes)
    test_sampler = SubsetRandomSampler(relevant_test_indices)
    test_loader = torch.utils.data.DataLoader(testset, batch_size=batch_size,
                                              num_workers=1, sampler=test_sampler)
    return train_loader, val_loader, test_loader, classes
###############################################################################
# Training
def get_model_name(name, batch_size, learning_rate, epoch):
    """ Generate a name for the model consisting of all the hyperparameter values
    Args:
        name: Name of the model (e.g. "small" or "large")
        batch_size: The batch size used during training
        learning_rate: The learning rate used during training
        epoch: The (zero-indexed) epoch number of the checkpoint
    Returns:
        path: A string with the hyperparameter names and values concatenated
    """
    path = "model_{0}_bs{1}_lr{2}_epoch{3}".format(name,
                                                   batch_size,
                                                   learning_rate,
                                                   epoch)
    return path
def normalize_label(labels):
    """
    Given a tensor containing 2 possible values, normalize this to 0/1
    Args:
        labels: a 1D tensor containing two possible scalar values
    Returns:
        A tensor normalized to 0/1 values
    """
    max_val = torch.max(labels)
    min_val = torch.min(labels)
    norm_labels = (labels - min_val) / (max_val - min_val)
    return norm_labels
def evaluate(net, loader, criterion):
    """ Evaluate the network on the validation set.
    Args:
        net: PyTorch neural network object
        loader: PyTorch data loader for the validation set
        criterion: The loss function
    Returns:
        err: A scalar for the avg classification error over the validation set
        loss: A scalar for the average loss function over the validation set
    """
    total_loss = 0.0
    total_err = 0.0
    total_epoch = 0
    for i, data in enumerate(loader, 0):
        inputs, labels = data
        labels = normalize_label(labels)  # Convert labels to 0/1
        outputs = net(inputs)
        loss = criterion(outputs, labels.float())
        # A logit > 0 corresponds to a predicted probability > 0.5;
        # `corr` marks the predictions that disagree with the labels
        corr = (outputs > 0.0).squeeze().long() != labels
        total_err += int(corr.sum())
        total_loss += loss.item()
        total_epoch += len(labels)
    err = float(total_err) / total_epoch
    loss = float(total_loss) / (i + 1)
    return err, loss
###############################################################################
# Training Curve
def plot_training_curve(path):
    """ Plots the training curve for a model run, given the csv files
    containing the train/validation error/loss.
    Args:
        path: The base path of the csv files produced during training
    """
    import matplotlib.pyplot as plt
    train_err = np.loadtxt("{}_train_err.csv".format(path))
    val_err = np.loadtxt("{}_val_err.csv".format(path))
    train_loss = np.loadtxt("{}_train_loss.csv".format(path))
    val_loss = np.loadtxt("{}_val_loss.csv".format(path))
    n = len(train_err)  # number of epochs
    plt.title("Train vs Validation Error")
    plt.plot(range(1, n + 1), train_err, label="Train")
    plt.plot(range(1, n + 1), val_err, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Error")
    plt.legend(loc='best')
    plt.show()
    plt.title("Train vs Validation Loss")
    plt.plot(range(1, n + 1), train_loss, label="Train")
    plt.plot(range(1, n + 1), val_loss, label="Validation")
    plt.xlabel("Epoch")
    plt.ylabel("Loss")
    plt.legend(loc='best')
    plt.show()
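As a self-contained sketch of how normalize_label and the error count in evaluate fit together (the logits below are made up, not real network outputs):

```python
import torch

# CIFAR-10 tags 'cat' as class 3 and 'dog' as class 5; the min-max
# scaling in normalize_label maps these to 0 and 1.
labels = torch.tensor([5., 5., 3., 3.])
labels = (labels - labels.min()) / (labels.max() - labels.min())
print(labels)  # tensor([1., 1., 0., 0.])

# The network emits one raw logit per image. Since sigmoid(0) = 0.5,
# comparing the logit against 0.0 thresholds the predicted
# probability at 0.5, exactly as `evaluate` does.
outputs = torch.tensor([2.3, -1.1, 0.4, -0.2])  # hypothetical logits
corr = (outputs > 0.0).long() != labels.long()
print(int(corr.sum()))  # 2 predictions disagree with the labels
```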
We will make use of some of the CIFAR-10 data set, which consists of colour images of size 32x32 pixels belonging to 10 categories. You can find out more about the dataset at https://www.cs.toronto.edu/~kriz/cifar.html
For this assignment, we will only be using the cat and dog categories. We have included code that automatically downloads the dataset the first time that the main script is run.
# This will download the CIFAR-10 dataset to a folder called "data"
# the first time you run this code.
train_loader, val_loader, test_loader, classes = get_data_loader(
    target_classes=["cat", "dog"],
    batch_size=1)  # One image per batch
Visualize some of the data by running the code below. Include the visualization in your writeup.
(You don't need to submit anything else.)
import matplotlib.pyplot as plt

k = 0
for images, labels in train_loader:
    # since batch_size = 1, there is only 1 image in `images`
    image = images[0]
    # place the colour channel at the end, instead of at the beginning
    img = np.transpose(image, [1, 2, 0])
    # un-normalize pixel intensity values from [-1, 1] back to [0, 1]
    img = img / 2 + 0.5
    plt.subplot(3, 5, k + 1)
    plt.axis('off')
    plt.imshow(img)
    k += 1
    if k > 14:
        break
How many training examples do we have for the combined cat and dog classes?
What about validation examples?
What about test examples?
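One way to count the examples a DataLoader yields is to sum the batch sizes. The sketch below uses a synthetic TensorDataset (so it runs without downloading CIFAR-10), but the same loop works on train_loader, val_loader, and test_loader:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.sampler import SubsetRandomSampler

# Synthetic stand-in: 100 "images" with labels, sampled through a
# SubsetRandomSampler just like the CIFAR-10 loaders above.
data = TensorDataset(torch.randn(100, 3, 32, 32),
                     torch.zeros(100, dtype=torch.long))
sampler = SubsetRandomSampler(list(range(80)))  # hypothetical 80/20 split
loader = DataLoader(data, batch_size=16, sampler=sampler)

# Summing batch sizes counts every example the loader yields.
n_examples = sum(len(labels) for _, labels in loader)
print(n_examples)  # 80
```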
Why do we need a validation set when training our model? What happens if we judge the performance of our models using the training set loss/error instead of the validation set loss/error?
We define two neural networks, a LargeNet and a SmallNet.
We'll be training the networks in this section.
You won't understand fully what these networks are doing until the next few classes, and that's okay. For this assignment, please focus on learning how to train networks, and how hyperparameters affect training.
class LargeNet(nn.Module):
    def __init__(self):
        super(LargeNet, self).__init__()
        self.name = "large"
        self.conv1 = nn.Conv2d(3, 5, 5)
        self.pool = nn.MaxPool2d(2, 2)
        self.conv2 = nn.Conv2d(5, 10, 5)
        self.fc1 = nn.Linear(10 * 5 * 5, 32)
        self.fc2 = nn.Linear(32, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv1(x)))
        x = self.pool(F.relu(self.conv2(x)))
        x = x.view(-1, 10 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = self.fc2(x)
        x = x.squeeze(1)  # Flatten to [batch_size]
        return x
class SmallNet(nn.Module):
    def __init__(self):
        super(SmallNet, self).__init__()
        self.name = "small"
        self.conv = nn.Conv2d(3, 5, 3)
        self.pool = nn.MaxPool2d(2, 2)
        self.fc = nn.Linear(5 * 7 * 7, 1)

    def forward(self, x):
        x = self.pool(F.relu(self.conv(x)))
        x = self.pool(x)
        x = x.view(-1, 5 * 7 * 7)
        x = self.fc(x)
        x = x.squeeze(1)  # Flatten to [batch_size]
        return x
small_net = SmallNet()
large_net = LargeNet()
The methods small_net.parameters() and large_net.parameters() produce an iterator over all the trainable parameters of the network. These parameters are torch tensors containing many scalar values. We haven't learned how the parameters in these high-dimensional tensors will be used, but we should be able to count the number of parameters. Measuring the number of parameters in a network is one way of measuring the "size" of a network.

What is the total number of parameters in small_net and in large_net? (Hint: how many numbers are in each tensor?)
for param in small_net.parameters():
    print(param.shape)
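Each parameter tensor holds .numel() scalar values, so summing .numel() over .parameters() counts a network's parameters. A toy sketch with a single hypothetical linear layer (not one of the lab's networks):

```python
import torch.nn as nn

# Toy module: a single linear layer mapping 3 inputs to 2 outputs.
# Its weight is a 2x3 tensor (6 values) and its bias holds 2 values.
layer = nn.Linear(3, 2)

for param in layer.parameters():
    print(param.shape)  # torch.Size([2, 3]) then torch.Size([2])

total = sum(p.numel() for p in layer.parameters())
print(total)  # 6 + 2 = 8
```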
The function train_net below takes an untrained neural network (like small_net and large_net) and several other parameters. You should be able to understand how this function works.

The figure below shows the high level training loop for a machine learning model:
def train_net(net, batch_size=64, learning_rate=0.01, num_epochs=30):
    ########################################################################
    # Train a classifier on cats vs dogs
    target_classes = ["cat", "dog"]
    ########################################################################
    # Fixed PyTorch random seed for reproducible results
    torch.manual_seed(1000)
    ########################################################################
    # Obtain the PyTorch data loader objects to load batches of the datasets
    train_loader, val_loader, test_loader, classes = get_data_loader(
        target_classes, batch_size)
    ########################################################################
    # Define the loss function and optimizer.
    # The loss function will be Binary Cross Entropy (BCE). In this case we
    # will use BCEWithLogitsLoss, which takes the unnormalized output from
    # the neural network and a scalar label.
    # The optimizer will be SGD with momentum.
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.SGD(net.parameters(), lr=learning_rate, momentum=0.9)
    ########################################################################
    # Set up some numpy arrays to store the training/validation loss/error
    train_err = np.zeros(num_epochs)
    train_loss = np.zeros(num_epochs)
    val_err = np.zeros(num_epochs)
    val_loss = np.zeros(num_epochs)
    ########################################################################
    # Train the network
    # Loop over the data iterator and sample a new batch of training data
    # Get the output from the network, and optimize our loss function.
    start_time = time.time()
    for epoch in range(num_epochs):  # loop over the dataset multiple times
        total_train_loss = 0.0
        total_train_err = 0.0
        total_epoch = 0
        for i, data in enumerate(train_loader, 0):
            # Get the inputs
            inputs, labels = data
            labels = normalize_label(labels)  # Convert labels to 0/1
            # Zero the parameter gradients
            optimizer.zero_grad()
            # Forward pass, backward pass, and optimize
            outputs = net(inputs)
            loss = criterion(outputs, labels.float())
            loss.backward()
            optimizer.step()
            # Calculate the statistics
            corr = (outputs > 0.0).squeeze().long() != labels
            total_train_err += int(corr.sum())
            total_train_loss += loss.item()
            total_epoch += len(labels)
        train_err[epoch] = float(total_train_err) / total_epoch
        train_loss[epoch] = float(total_train_loss) / (i + 1)
        val_err[epoch], val_loss[epoch] = evaluate(net, val_loader, criterion)
        print(("Epoch {}: Train err: {}, Train loss: {} | " +
               "Validation err: {}, Validation loss: {}").format(
                   epoch + 1,
                   train_err[epoch],
                   train_loss[epoch],
                   val_err[epoch],
                   val_loss[epoch]))
        # Save the current model (checkpoint) to a file
        model_path = get_model_name(net.name, batch_size, learning_rate, epoch)
        torch.save(net.state_dict(), model_path)
    print('Finished Training')
    end_time = time.time()
    elapsed_time = end_time - start_time
    print("Total time elapsed: {:.2f} seconds".format(elapsed_time))
    # Write the train/validation loss/err into CSV files for plotting later
    epochs = np.arange(1, num_epochs + 1)
    np.savetxt("{}_train_err.csv".format(model_path), train_err)
    np.savetxt("{}_train_loss.csv".format(model_path), train_loss)
    np.savetxt("{}_val_err.csv".format(model_path), val_err)
    np.savetxt("{}_val_loss.csv".format(model_path), val_loss)
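A note on the loss: train_net feeds raw logits into BCEWithLogitsLoss, which applies the sigmoid internally (in a numerically stable way). A quick sanity check with made-up logits shows it agrees with applying the sigmoid manually and then using BCELoss:

```python
import torch
import torch.nn as nn

logits = torch.tensor([1.5, -0.3, 0.0])   # hypothetical raw outputs
targets = torch.tensor([1.0, 0.0, 1.0])   # normalized 0/1 labels

# BCEWithLogitsLoss fuses the sigmoid into the loss for numerical
# stability; applying sigmoid by hand and using BCELoss should agree.
loss_fused = nn.BCEWithLogitsLoss()(logits, targets)
loss_manual = nn.BCELoss()(torch.sigmoid(logits), targets)
print(torch.allclose(loss_fused, loss_manual))  # True
```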
The parameters to the function train_net are hyperparameters of our neural network. We made these hyperparameters easy to modify so that we can tune them later on.

What are the default values of the parameters batch_size, learning_rate, and num_epochs?

What files are written to disk when we call train_net with small_net, and train for 5 epochs? Provide a list of all the files written to disk, and what information the files contain.
Train both small_net and large_net using the function train_net and its default parameters. The function will write many files to disk, including a model checkpoint (saved values of model weights) at the end of each epoch.

If you are using Google Colab, you will need to mount Google Drive so that the files generated by train_net get saved. We will be using these files in part (d). (See the Google Colab tutorial for more information about this.)
Report the total time elapsed when training each network. Which network took longer to train? Why?
# Since the function writes files to disk, you will need to mount
# your Google Drive. If you are working on the lab locally, you
# can comment out this code.
from google.colab import drive
drive.mount('/content/gdrive')
Use the function plot_training_curve to display the trajectory of the training/validation error and the training/validation loss. You will need to use the function get_model_name to generate the argument to the plot_training_curve function.

Do this for both the small network and the large network. Include both plots in your writeup.
#model_path = get_model_name("small", batch_size=??, learning_rate=??, epoch=29)
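For instance, assuming (hypothetically) that a model named "small" was trained with batch_size=64 and learning_rate=0.01, the base path for the final checkpoint (zero-indexed epoch 29) would be built like this; substitute the hyperparameters you actually used:

```python
# Mirror of get_model_name's format string; the values here are
# placeholders, not the "right" answer for your runs.
model_path = "model_{0}_bs{1}_lr{2}_epoch{3}".format("small", 64, 0.01, 29)
print(model_path)  # model_small_bs64_lr0.01_epoch29
```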
Describe what you notice about the training curve. How do the curves differ for small_net and large_net? Identify any occurrences of underfitting and overfitting.
For this section, we will work with large_net only.

Train large_net with all default parameters, except set learning_rate=0.001.
Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of lowering the learning rate.
# Note: When we re-construct the model, we start the training
# with *random weights*. If we omit this code, the values of
# the weights will still be the previously trained values.
large_net = LargeNet()
Train large_net with all default parameters, except set learning_rate=0.1.
Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of increasing the learning rate.
Train large_net with all default parameters, including learning_rate=0.01.
Now, set batch_size=512
. Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of increasing the batch size.
Train large_net with all default parameters, including learning_rate=0.01.
Now, set batch_size=16
. Does the model take longer/shorter to train?
Plot the training curve. Describe the effect of decreasing the batch size.
Train the model with the hyperparameters you chose in part (a), and include the training curve.

Based on your result from part (a), suggest another set of hyperparameter values to try. Justify your choice.

Train the model with the hyperparameters you chose in part (c), and include the training curve.
Choose the best model that you have so far. This means choosing the best model checkpoint, including the choice of small_net vs large_net, the batch_size, learning_rate, and the epoch number.

Modify the code below to load your chosen set of weights to the model object net.
net = SmallNet()
model_path = get_model_name(net.name, batch_size=64, learning_rate=0.01, epoch=10)
state = torch.load(model_path)
net.load_state_dict(state)
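To see what load_state_dict does, here is a self-contained round-trip with a toy single-layer module and a temporary file (not one of the lab's checkpoints):

```python
import os
import tempfile
import torch
import torch.nn as nn

# Round-trip sketch: save a module's weights, then load them into a
# freshly constructed (randomly initialized) copy of the same module.
net_a = nn.Linear(4, 1)
path = os.path.join(tempfile.mkdtemp(), "checkpoint")
torch.save(net_a.state_dict(), path)

net_b = nn.Linear(4, 1)              # starts with different random weights
net_b.load_state_dict(torch.load(path))

# After loading, both modules hold identical parameters.
same = all(torch.equal(pa, pb)
           for pa, pb in zip(net_a.parameters(), net_b.parameters()))
print(same)  # True
```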
Justify your choice of model from part (a).
Using the code in Part 0, any code from lecture notes, or any code that you write, compute and report the test classification error for your chosen model.
# If you use the `evaluate` function provided in part 0, you will need to
# set batch_size > 1
train_loader, val_loader, test_loader, classes = get_data_loader(
    target_classes=["cat", "dog"],
    batch_size=64)
How does the test classification error compare with the validation error? Explain why you would expect the test error to be higher than the validation error.
Why did we only use the test data set at the very end? Why is it important that we use the test data as little as possible?