Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch
Sebastian Raschka
CPython 3.6.8
IPython 7.2.0
torch 1.0.0
For some exotic research projects, you may want to share the weights in certain layers. For this example, suppose you want to share the weights across all output units but want to have unique bias units for each output unit.
The illustration below shows the last hidden layer and the output layer of a regular multilayer neural network:
What we are trying to achieve is to have the same weight for each output unit, i.e.,
One approach to achieve this is to share the weight columns in the weight matrix of the hidden layer that connects to the output layer. A more efficient approach is to replace the matrix-matrix multiplication (with shared weight columns) by a matrix-vector multiplication that produces a single output unit, which we can then duplicate across all output units before adding the bias vector.
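To see why the two formulations are equivalent, here is a minimal NumPy sketch (the batch size of 5 is an arbitrary illustration value; 7*7*8 matches the flattened feature-map size used later in this notebook):

```python
import numpy as np

# Illustrative shapes: 5 examples, 7*7*8 features, 10 output units
rng = np.random.default_rng(1)
x = rng.standard_normal((5, 7*7*8))
w = rng.standard_normal(7*7*8)
num_classes = 10

# Naive weight sharing: a weight matrix whose columns are all copies of w
W = np.tile(w[:, None], (1, num_classes))
out_matrix = x @ W                                        # matrix-matrix product

# Efficient version: one matrix-vector product, duplicated across classes
out_vector = np.tile((x @ w)[:, None], (1, num_classes))  # matrix-vector product

assert np.allclose(out_matrix, out_vector)
```

Both versions produce identical `(5, 10)` outputs, but the second one only computes a single column and copies it.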
In other words, the first step is to modify the hidden layer such that it only contains a single vector:
# Replace this by the uncommented code below:
#self.linear_1 = torch.nn.Linear(7*7*8, num_classes)

# Use only a weight vector instead of a weight matrix:
self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)

# Define the bias manually:
self.linear_1_bias = torch.nn.Parameter(
    torch.zeros(num_classes, dtype=self.linear_1.weight.dtype))
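As a quick sanity check, we can confirm that a bias defined this way is actually registered as a learnable parameter. The `TinyHead` module below is a made-up minimal sketch for illustration, not part of the model in this notebook:

```python
import torch

# Hypothetical minimal module illustrating the manually registered bias
class TinyHead(torch.nn.Module):
    def __init__(self, num_classes):
        super(TinyHead, self).__init__()
        # single shared weight vector instead of a weight matrix
        self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)
        # manually defined bias, one entry per output unit
        self.linear_1_bias = torch.nn.Parameter(
            torch.zeros(num_classes, dtype=self.linear_1.weight.dtype))

head = TinyHead(num_classes=10)

# Both the shared weight vector and the manual bias appear in
# .parameters(), so the optimizer will update them during training
names = {n for n, _ in head.named_parameters()}
print(names)
```

Because `linear_1_bias` is wrapped in `torch.nn.Parameter`, it shows up in `named_parameters()` alongside `linear_1.weight` and receives gradients like any other parameter.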
Next, in the forward method, we compute the single output, duplicate it over the number of classes, and then add the bias:
# Compute a single output value per example
logits = self.linear_1(out.view(-1, 7*7*8))

# then manually add the bias; broadcasting duplicates the single
# output over all num_classes output units
logits = logits + self.linear_1_bias
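The duplication itself happens implicitly through broadcasting when the bias vector is added. A minimal sketch with made-up shapes (a batch size of 4 and 10 classes):

```python
import torch

batch_size, num_classes = 4, 10

single_logit = torch.randn(batch_size, 1)  # output of the 1-unit linear layer
bias = torch.zeros(num_classes)            # the manually defined bias vector

# (batch_size, 1) + (num_classes,) broadcasts to (batch_size, num_classes)
logits = single_logit + bias
print(logits.shape)  # torch.Size([4, 10])
```

Each row of `logits` contains the same shared value repeated `num_classes` times (here shifted by a zero bias), which is exactly the duplication described above.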
The following code in this notebook illustrates this using a convnet and the 10-class MNIST dataset.
The classification performance will obviously be poor, because weight sharing is not ideal in this case; this notebook is meant as a technical reference/demo rather than a real-world use case for this dataset.
import time
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader
if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True
##########################
### SETTINGS
##########################
# Device
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")
# Hyperparameters
random_seed = 1
learning_rate = 0.1
num_epochs = 10
batch_size = 128
# Architecture
num_classes = 10
##########################
### MNIST DATASET
##########################
# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.MNIST(root='data',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data',
                              train=False,
                              transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)
# Checking the dataset
for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break
Image batch dimensions: torch.Size([128, 1, 28, 28])
Image label dimensions: torch.Size([128])
##########################
### MODEL
##########################
class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()

        # calculate same padding:
        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2

        # 28x28x1 => 28x28x4
        self.conv_1 = torch.nn.Conv2d(in_channels=1,
                                      out_channels=4,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)  # (1(28-1) - 28 + 3) / 2 = 1
        # 28x28x4 => 14x14x4
        self.pool_1 = torch.nn.MaxPool2d(kernel_size=(2, 2),
                                         stride=(2, 2),
                                         padding=0)  # (2(14-1) - 28 + 2) = 0
        # 14x14x4 => 14x14x8
        self.conv_2 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=8,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1)  # (1(14-1) - 14 + 3) / 2 = 1
        # 14x14x8 => 7x7x8
        self.pool_2 = torch.nn.MaxPool2d(kernel_size=(2, 2),
                                         stride=(2, 2),
                                         padding=0)  # (2(7-1) - 14 + 2) = 0

        ##############################################################################
        ### WEIGHT SHARING IN LAST LAYER
        #self.linear_1 = torch.nn.Linear(7*7*8, num_classes)
        # Use only a weight vector instead of a weight matrix:
        self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)
        # Define the bias manually:
        self.linear_1_bias = torch.nn.Parameter(
            torch.zeros(num_classes, dtype=self.linear_1.weight.dtype))
        ##############################################################################

    def forward(self, x):
        out = self.conv_1(x)
        out = F.relu(out)
        out = self.pool_1(out)

        out = self.conv_2(out)
        out = F.relu(out)
        out = self.pool_2(out)

        ##############################################################################
        ### WEIGHT SHARING IN LAST LAYER
        # Compute a single output per example; adding the bias vector
        # broadcasts (duplicates) it over all output units
        logits = self.linear_1(out.view(-1, 7*7*8))
        logits = logits + self.linear_1_bias
        ##############################################################################

        probas = F.softmax(logits, dim=1)
        return logits, probas
torch.manual_seed(random_seed)
model = ConvNet(num_classes=num_classes)
model = model.to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    for features, targets in data_loader:
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100
start_time = time.time()

for epoch in range(num_epochs):
    model = model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.to(device)
        targets = targets.to(device)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()

        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        ### LOGGING
        if not batch_idx % 50:
            print('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f'
                  % (epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))

    model = model.eval()
    with torch.set_grad_enabled(False):  # save memory during inference
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs,
              compute_accuracy(model, train_loader)))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))
Epoch: 001/010 | Batch 000/469 | Cost: 2.3026
Epoch: 001/010 | Batch 050/469 | Cost: 2.2990
Epoch: 001/010 | Batch 100/469 | Cost: 2.3003
Epoch: 001/010 | Batch 150/469 | Cost: 2.2959
Epoch: 001/010 | Batch 200/469 | Cost: 2.3057
Epoch: 001/010 | Batch 250/469 | Cost: 2.2986
Epoch: 001/010 | Batch 300/469 | Cost: 2.3015
Epoch: 001/010 | Batch 350/469 | Cost: 2.3060
Epoch: 001/010 | Batch 400/469 | Cost: 2.3028
Epoch: 001/010 | Batch 450/469 | Cost: 2.2964
Epoch: 001/010 training accuracy: 11.24%
Time elapsed: 0.20 min
Epoch: 002/010 | Batch 000/469 | Cost: 2.2972
Epoch: 002/010 | Batch 050/469 | Cost: 2.3077
Epoch: 002/010 | Batch 100/469 | Cost: 2.3085
Epoch: 002/010 | Batch 150/469 | Cost: 2.3044
Epoch: 002/010 | Batch 200/469 | Cost: 2.2997
Epoch: 002/010 | Batch 250/469 | Cost: 2.2986
Epoch: 002/010 | Batch 300/469 | Cost: 2.2935
Epoch: 002/010 | Batch 350/469 | Cost: 2.3029
Epoch: 002/010 | Batch 400/469 | Cost: 2.3011
Epoch: 002/010 | Batch 450/469 | Cost: 2.3057
Epoch: 002/010 training accuracy: 11.24%
Time elapsed: 0.41 min
Epoch: 003/010 | Batch 000/469 | Cost: 2.3035
Epoch: 003/010 | Batch 050/469 | Cost: 2.3138
Epoch: 003/010 | Batch 100/469 | Cost: 2.3072
Epoch: 003/010 | Batch 150/469 | Cost: 2.3000
Epoch: 003/010 | Batch 200/469 | Cost: 2.3023
Epoch: 003/010 | Batch 250/469 | Cost: 2.2958
Epoch: 003/010 | Batch 300/469 | Cost: 2.2970
Epoch: 003/010 | Batch 350/469 | Cost: 2.3000
Epoch: 003/010 | Batch 400/469 | Cost: 2.3005
Epoch: 003/010 | Batch 450/469 | Cost: 2.2929
Epoch: 003/010 training accuracy: 11.24%
Time elapsed: 0.61 min
Epoch: 004/010 | Batch 000/469 | Cost: 2.2920
Epoch: 004/010 | Batch 050/469 | Cost: 2.2986
Epoch: 004/010 | Batch 100/469 | Cost: 2.3005
Epoch: 004/010 | Batch 150/469 | Cost: 2.2923
Epoch: 004/010 | Batch 200/469 | Cost: 2.3051
Epoch: 004/010 | Batch 250/469 | Cost: 2.3060
Epoch: 004/010 | Batch 300/469 | Cost: 2.3073
Epoch: 004/010 | Batch 350/469 | Cost: 2.3064
Epoch: 004/010 | Batch 400/469 | Cost: 2.3052
Epoch: 004/010 | Batch 450/469 | Cost: 2.2975
Epoch: 004/010 training accuracy: 11.24%
Time elapsed: 0.81 min
Epoch: 005/010 | Batch 000/469 | Cost: 2.3027
Epoch: 005/010 | Batch 050/469 | Cost: 2.3005
Epoch: 005/010 | Batch 100/469 | Cost: 2.2946
Epoch: 005/010 | Batch 150/469 | Cost: 2.2892
Epoch: 005/010 | Batch 200/469 | Cost: 2.2958
Epoch: 005/010 | Batch 250/469 | Cost: 2.3036
Epoch: 005/010 | Batch 300/469 | Cost: 2.3003
Epoch: 005/010 | Batch 350/469 | Cost: 2.3015
Epoch: 005/010 | Batch 400/469 | Cost: 2.3057
Epoch: 005/010 | Batch 450/469 | Cost: 2.2972
Epoch: 005/010 training accuracy: 11.24%
Time elapsed: 1.01 min
Epoch: 006/010 | Batch 000/469 | Cost: 2.3029
Epoch: 006/010 | Batch 050/469 | Cost: 2.2979
Epoch: 006/010 | Batch 100/469 | Cost: 2.3027
Epoch: 006/010 | Batch 150/469 | Cost: 2.3028
Epoch: 006/010 | Batch 200/469 | Cost: 2.3010
Epoch: 006/010 | Batch 250/469 | Cost: 2.3025
Epoch: 006/010 | Batch 300/469 | Cost: 2.3054
Epoch: 006/010 | Batch 350/469 | Cost: 2.2972
Epoch: 006/010 | Batch 400/469 | Cost: 2.3037
Epoch: 006/010 | Batch 450/469 | Cost: 2.3064
Epoch: 006/010 training accuracy: 11.24%
Time elapsed: 1.22 min
Epoch: 007/010 | Batch 000/469 | Cost: 2.2983
Epoch: 007/010 | Batch 050/469 | Cost: 2.2979
Epoch: 007/010 | Batch 100/469 | Cost: 2.3077
Epoch: 007/010 | Batch 150/469 | Cost: 2.3047
Epoch: 007/010 | Batch 200/469 | Cost: 2.2998
Epoch: 007/010 | Batch 250/469 | Cost: 2.2993
Epoch: 007/010 | Batch 300/469 | Cost: 2.2966
Epoch: 007/010 | Batch 350/469 | Cost: 2.2967
Epoch: 007/010 | Batch 400/469 | Cost: 2.2916
Epoch: 007/010 | Batch 450/469 | Cost: 2.3016
Epoch: 007/010 training accuracy: 11.24%
Time elapsed: 1.42 min
Epoch: 008/010 | Batch 000/469 | Cost: 2.2992
Epoch: 008/010 | Batch 050/469 | Cost: 2.2953
Epoch: 008/010 | Batch 100/469 | Cost: 2.3018
Epoch: 008/010 | Batch 150/469 | Cost: 2.3053
Epoch: 008/010 | Batch 200/469 | Cost: 2.2983
Epoch: 008/010 | Batch 250/469 | Cost: 2.3089
Epoch: 008/010 | Batch 300/469 | Cost: 2.3048
Epoch: 008/010 | Batch 350/469 | Cost: 2.3065
Epoch: 008/010 | Batch 400/469 | Cost: 2.3037
Epoch: 008/010 | Batch 450/469 | Cost: 2.2966
Epoch: 008/010 training accuracy: 11.24%
Time elapsed: 1.62 min
Epoch: 009/010 | Batch 000/469 | Cost: 2.3060
Epoch: 009/010 | Batch 050/469 | Cost: 2.2945
Epoch: 009/010 | Batch 100/469 | Cost: 2.3037
Epoch: 009/010 | Batch 150/469 | Cost: 2.3064
Epoch: 009/010 | Batch 200/469 | Cost: 2.3043
Epoch: 009/010 | Batch 250/469 | Cost: 2.2991
Epoch: 009/010 | Batch 300/469 | Cost: 2.2945
Epoch: 009/010 | Batch 350/469 | Cost: 2.2999
Epoch: 009/010 | Batch 400/469 | Cost: 2.3131
Epoch: 009/010 | Batch 450/469 | Cost: 2.3079
Epoch: 009/010 training accuracy: 11.24%
Time elapsed: 1.82 min
Epoch: 010/010 | Batch 000/469 | Cost: 2.2977
Epoch: 010/010 | Batch 050/469 | Cost: 2.2996
Epoch: 010/010 | Batch 100/469 | Cost: 2.2979
Epoch: 010/010 | Batch 150/469 | Cost: 2.3038
Epoch: 010/010 | Batch 200/469 | Cost: 2.3062
Epoch: 010/010 | Batch 250/469 | Cost: 2.2935
Epoch: 010/010 | Batch 300/469 | Cost: 2.3027
Epoch: 010/010 | Batch 350/469 | Cost: 2.3006
Epoch: 010/010 | Batch 400/469 | Cost: 2.3051
Epoch: 010/010 | Batch 450/469 | Cost: 2.3077
Epoch: 010/010 training accuracy: 11.24%
Time elapsed: 2.02 min
Total Training Time: 2.02 min
Check that bias units updated correctly (should be all different):
model.linear_1_bias
Parameter containing: tensor([-0.0202, 0.1097, -0.0029, 0.0253, -0.0269, -0.1026, -0.0194, 0.0572, -0.0027, -0.0175], device='cuda:3', requires_grad=True)
with torch.set_grad_enabled(False):  # save memory during inference
    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))
Test accuracy: 11.35%
The classification performance is obviously poor, because weight sharing is not ideal in this case; this notebook is meant as a technical reference/demo rather than a real-world use case for this dataset.