Deep Learning Models -- A collection of various deep learning architectures, models, and tips for TensorFlow and PyTorch in Jupyter Notebooks.

In :
%load_ext watermark
%watermark -a 'Sebastian Raschka' -v -p torch

Sebastian Raschka

CPython 3.6.8
IPython 7.2.0

torch 1.0.0

• Runs on CPU or GPU (if available)

# Model Zoo -- Weight Sharing Within a Layer

For some exotic research projects, you may want to share the weights in certain layers. For this example, suppose you want to share the weights across all output units but want to have unique bias units for each output unit.

Consider the last hidden layer and the output layer of a regular multilayer neural network. What we are trying to achieve is to have the same weights for each output unit, i.e., every output unit receives the same weight vector from the last hidden layer. One approach to achieve this is to share the columns of the weight matrix that connects the hidden layer to the output layer (one column per output unit). A more efficient approach is to replace the matrix-matrix multiplication with shared weights by a matrix-vector multiplication that produces a single output unit, which we can then duplicate before adding the bias vector.
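A minimal sketch of why both approaches are equivalent, assuming small illustrative sizes and random stand-in activations (none of these tensors come from the notebook): a regular `torch.nn.Linear` layer whose weights are tied across all output units produces the same logits as a single-output matrix-vector product that is duplicated before the bias is added.

```python
import torch

torch.manual_seed(0)
n_examples, n_features, n_outputs = 4, 5, 3  # illustrative sizes only
x = torch.randn(n_examples, n_features)      # stand-in for the last hidden layer's activations

w = torch.randn(1, n_features)  # the single shared weight vector
b = torch.randn(n_outputs)      # one separate bias per output unit

# Approach 1: a regular linear layer whose weights are tied across output units.
# PyTorch stores the weight as (out_features, in_features), so the shared
# "columns" of the text are the rows of full.weight here.
full = torch.nn.Linear(n_features, n_outputs)
with torch.no_grad():
    full.weight.copy_(w.repeat(n_outputs, 1))
    full.bias.copy_(b)
logits_full = full(x)

# Approach 2: a matrix-vector product gives one value per example,
# which is duplicated across the output units before adding the bias
single = x @ w.t()                               # shape: (n_examples, 1)
logits_shared = single.repeat(1, n_outputs) + b  # shape: (n_examples, n_outputs)

print(torch.allclose(logits_full, logits_shared, atol=1e-6))  # True
```

The second approach stores and multiplies only one weight vector instead of `n_outputs` identical copies.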

In other words, the first step is to modify the last linear layer so that its weights form a single vector instead of a matrix:

# Replace this by the uncommented code below:
#self.linear_1 = torch.nn.Linear(7*7*8, num_classes)

# Use only a weight vector instead of weight matrix:
self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)

# Define bias manually:
self.linear_1_bias = torch.nn.Parameter(torch.zeros(num_classes,
                                                    dtype=self.linear_1.weight.dtype))
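To make the savings concrete, here is a small sketch that counts trainable parameters for a regular output layer and for the shared-weight version, assuming the 7*7*8 = 392 flattened features and 10 classes of the convnet used later (the variable names are illustrative). It also creates the bias directly with `torch.zeros(..., dtype=...)`, which avoids the copy-construct `UserWarning` that wrapping an existing tensor in `torch.tensor(...)` triggers.

```python
import torch

n_in, n_out = 7 * 7 * 8, 10  # flattened 7x7x8 feature maps and 10 MNIST classes

# Regular output layer: a weight matrix plus one bias per output unit
regular = torch.nn.Linear(n_in, n_out)

# Shared-weight output layer: a single weight vector plus a manual bias vector;
# torch.zeros(..., dtype=...) creates the bias directly (no torch.tensor(tensor) copy)
shared = torch.nn.Linear(n_in, 1, bias=False)
shared_bias = torch.nn.Parameter(torch.zeros(n_out, dtype=shared.weight.dtype))

n_regular = sum(p.numel() for p in regular.parameters())
n_shared = sum(p.numel() for p in shared.parameters()) + shared_bias.numel()
print(n_regular, n_shared)  # 3930 402
```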


Next, in the forward method, we compute the single output and duplicate it over the number of classes, then we add the bias vector:

# Duplicate outputs over all output units
logits = self.linear_1(out.view(-1, 7*7*8))
ones = torch.ones(num_classes, dtype=logits.dtype, device=logits.device)
logits = logits * ones

# Add the separate bias for each output unit
logits = logits + self.linear_1_bias
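Multiplying by a vector of ones is only one way to duplicate the single output. A minimal sketch with random stand-in tensors (not the notebook's actual activations) shows that `expand` or plain broadcasting give exactly the same logits:

```python
import torch

torch.manual_seed(0)
n_outputs = 10
single = torch.randn(4, 1)     # stand-in for the linear_1 output, shape (batch, 1)
bias = torch.randn(n_outputs)  # stand-in for linear_1_bias, shape (n_outputs,)

# Explicit duplication with a vector of ones; creating `ones` on the same
# device as the layer output avoids a device mismatch when training on a GPU
ones = torch.ones(n_outputs, dtype=single.dtype, device=single.device)
explicit = single * ones + bias

# Equivalent alternatives: expand() returns a view without copying data,
# and plain broadcasting duplicates the single column implicitly
expanded = single.expand(-1, n_outputs) + bias
broadcast = single + bias

print(explicit.shape)  # torch.Size([4, 10])
print(torch.equal(explicit, expanded), torch.equal(explicit, broadcast))  # True True
```

Because of broadcasting, `logits + self.linear_1_bias` alone already turns a (N, 1) tensor into a (N, num_classes) tensor; the explicit duplication mainly makes the intent visible.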


The following code in this notebook illustrates this using a convnet and the 10-class MNIST dataset.

The classification performance will obviously be poor, because weight sharing is not ideal in this case, but this notebook is meant as a technical reference/demo, not as a real-world use case for this dataset.
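In this particular setup, the reason is visible in the logits themselves: each logit is the same input-dependent value z plus its own bias, z_k = z + b_k. Softmax does not change when the same constant is added to all logits, so the predicted probabilities equal softmax(b) for every input, and the gradient of the cross-entropy loss with respect to z is sum_k (p_k - y_k) = 1 - 1 = 0. A minimal pure-Python check with made-up biases and a made-up target (the `softmax` helper is written here for illustration only):

```python
import math

def softmax(logits):
    # Subtract the max for numerical stability (itself an instance of shift invariance)
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

bias = [0.3, -0.1, 0.5]  # made-up per-class biases
y = [0.0, 1.0, 0.0]      # made-up one-hot target (class 1)

for z in (-4.0, 0.0, 2.5):  # different shared outputs, i.e., different inputs
    probas = softmax([z + b for b in bias])
    # dL/dz = sum_k dL/dlogit_k * dlogit_k/dz = sum_k (p_k - y_k) * 1
    grad_z = sum(p - t for p, t in zip(probas, y))
    print([round(p, 4) for p in probas], round(grad_z, 12))
```

The probabilities are identical for every z and the gradient with respect to z vanishes, so with a single shared weight vector only the bias units receive a learning signal from the loss, and they can at best reflect how often each class occurs.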

## Imports

In :
import time
import numpy as np
import torch
import torch.nn.functional as F
from torchvision import datasets
from torchvision import transforms
from torch.utils.data import DataLoader

if torch.cuda.is_available():
    torch.backends.cudnn.deterministic = True


## Settings and Dataset

In :
##########################
### SETTINGS
##########################

# Device
device = torch.device("cuda:3" if torch.cuda.is_available() else "cpu")

# Hyperparameters
random_seed = 1
learning_rate = 0.1
num_epochs = 10
batch_size = 128

# Architecture
num_classes = 10

##########################
### MNIST DATASET
##########################

# Note transforms.ToTensor() scales input images
# to 0-1 range
train_dataset = datasets.MNIST(root='data',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='data',
                              train=False,
                              transform=transforms.ToTensor())

train_loader = DataLoader(dataset=train_dataset,
                          batch_size=batch_size,
                          shuffle=True)

test_loader = DataLoader(dataset=test_dataset,
                         batch_size=batch_size,
                         shuffle=False)

# Checking the dataset
for images, labels in train_loader:
    print('Image batch dimensions:', images.shape)
    print('Image label dimensions:', labels.shape)
    break

Image batch dimensions: torch.Size([128, 1, 28, 28])
Image label dimensions: torch.Size([128])


## Model

In :
##########################
### MODEL
##########################

class ConvNet(torch.nn.Module):

    def __init__(self, num_classes):
        super(ConvNet, self).__init__()
        self.num_classes = num_classes

        # (w - k + 2*p)/s + 1 = o
        # => p = (s(o-1) - w + k)/2

        # 28x28x1 => 28x28x4
        self.conv_1 = torch.nn.Conv2d(in_channels=1,
                                      out_channels=4,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1) # (1(28-1) - 28 + 3) / 2 = 1
        # 28x28x4 => 14x14x4
        self.pool_1 = torch.nn.MaxPool2d(kernel_size=(2, 2),
                                         stride=(2, 2),
                                         padding=0) # (2(14-1) - 28 + 2) / 2 = 0
        # 14x14x4 => 14x14x8
        self.conv_2 = torch.nn.Conv2d(in_channels=4,
                                      out_channels=8,
                                      kernel_size=(3, 3),
                                      stride=(1, 1),
                                      padding=1) # (1(14-1) - 14 + 3) / 2 = 1
        # 14x14x8 => 7x7x8
        self.pool_2 = torch.nn.MaxPool2d(kernel_size=(2, 2),
                                         stride=(2, 2),
                                         padding=0) # (2(7-1) - 14 + 2) / 2 = 0

        ##############################################################################
        ### WEIGHT SHARING IN LAST LAYER

        #self.linear_1 = torch.nn.Linear(7*7*8, num_classes)

        # Use only a weight vector instead of weight matrix:
        self.linear_1 = torch.nn.Linear(7*7*8, 1, bias=False)

        # Define bias manually:
        self.linear_1_bias = torch.nn.Parameter(torch.zeros(num_classes,
                                                            dtype=self.linear_1.weight.dtype))
        ##############################################################################

    def forward(self, x):
        out = self.conv_1(x)
        out = F.relu(out)
        out = self.pool_1(out)

        out = self.conv_2(out)
        out = F.relu(out)
        out = self.pool_2(out)

        ##############################################################################
        ### WEIGHT SHARING IN LAST LAYER

        # Duplicate outputs over all output units
        logits = self.linear_1(out.view(-1, 7*7*8))
        ones = torch.ones(self.num_classes, dtype=logits.dtype, device=logits.device)
        logits = logits * ones

        # Add the separate bias for each output unit
        logits = logits + self.linear_1_bias
        ##############################################################################

        probas = F.softmax(logits, dim=1)
        return logits, probas


torch.manual_seed(random_seed)
model = ConvNet(num_classes=num_classes)

model = model.to(device)

optimizer = torch.optim.SGD(model.parameters(), lr=learning_rate)
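The shape comments in the model cell follow the output-size formula (w - k + 2*p)/s + 1 = o. As a quick pure-Python sanity check (the helpers `conv_output_size` and `same_padding` are hypothetical, not part of PyTorch), the spatial size shrinks from 28 to 7, which gives the 7*7*8 = 392 input features of `linear_1`:

```python
def conv_output_size(w, k, p, s):
    # (w - k + 2*p) / s + 1 = o, with floor division as in PyTorch (dilation 1)
    return (w - k + 2 * p) // s + 1

def same_padding(w, k, s, o):
    # The formula solved for p: p = (s*(o-1) - w + k) / 2
    return (s * (o - 1) - w + k) / 2

# Spatial sizes through the ConvNet (MNIST images are 28x28)
size = 28
size = conv_output_size(size, k=3, p=1, s=1)  # conv_1: 28 -> 28
size = conv_output_size(size, k=2, p=0, s=2)  # pool_1: 28 -> 14
size = conv_output_size(size, k=3, p=1, s=1)  # conv_2: 14 -> 14
size = conv_output_size(size, k=2, p=0, s=2)  # pool_2: 14 -> 7
print(size, size * size * 8)             # 7 392
print(same_padding(28, k=3, s=1, o=28))  # 1.0
```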


## Training

In :
def compute_accuracy(model, data_loader):
    correct_pred, num_examples = 0, 0
    for features, targets in data_loader:
        features = features.to(device)
        targets = targets.to(device)
        logits, probas = model(features)
        _, predicted_labels = torch.max(probas, 1)
        num_examples += targets.size(0)
        correct_pred += (predicted_labels == targets).sum()
    return correct_pred.float()/num_examples * 100


start_time = time.time()
for epoch in range(num_epochs):
    model = model.train()
    for batch_idx, (features, targets) in enumerate(train_loader):

        features = features.to(device)
        targets = targets.to(device)

        ### FORWARD AND BACK PROP
        logits, probas = model(features)
        cost = F.cross_entropy(logits, targets)
        optimizer.zero_grad()  # reset gradients from the previous iteration

        cost.backward()

        ### UPDATE MODEL PARAMETERS
        optimizer.step()

        ### LOGGING
        if not batch_idx % 50:
            print ('Epoch: %03d/%03d | Batch %03d/%03d | Cost: %.4f'
                   %(epoch+1, num_epochs, batch_idx,
                     len(train_loader), cost))

    model = model.eval()
    with torch.set_grad_enabled(False): # save memory during inference
        print('Epoch: %03d/%03d training accuracy: %.2f%%' % (
              epoch+1, num_epochs,
              compute_accuracy(model, train_loader)))

    print('Time elapsed: %.2f min' % ((time.time() - start_time)/60))

print('Total Training Time: %.2f min' % ((time.time() - start_time)/60))

Epoch: 001/010 | Batch 000/469 | Cost: 2.3026
Epoch: 001/010 | Batch 050/469 | Cost: 2.2990
Epoch: 001/010 | Batch 100/469 | Cost: 2.3003
Epoch: 001/010 | Batch 150/469 | Cost: 2.2959
Epoch: 001/010 | Batch 200/469 | Cost: 2.3057
Epoch: 001/010 | Batch 250/469 | Cost: 2.2986
Epoch: 001/010 | Batch 300/469 | Cost: 2.3015
Epoch: 001/010 | Batch 350/469 | Cost: 2.3060
Epoch: 001/010 | Batch 400/469 | Cost: 2.3028
Epoch: 001/010 | Batch 450/469 | Cost: 2.2964
Epoch: 001/010 training accuracy: 11.24%
Time elapsed: 0.20 min
Epoch: 002/010 | Batch 000/469 | Cost: 2.2972
Epoch: 002/010 | Batch 050/469 | Cost: 2.3077
Epoch: 002/010 | Batch 100/469 | Cost: 2.3085
Epoch: 002/010 | Batch 150/469 | Cost: 2.3044
Epoch: 002/010 | Batch 200/469 | Cost: 2.2997
Epoch: 002/010 | Batch 250/469 | Cost: 2.2986
Epoch: 002/010 | Batch 300/469 | Cost: 2.2935
Epoch: 002/010 | Batch 350/469 | Cost: 2.3029
Epoch: 002/010 | Batch 400/469 | Cost: 2.3011
Epoch: 002/010 | Batch 450/469 | Cost: 2.3057
Epoch: 002/010 training accuracy: 11.24%
Time elapsed: 0.41 min
Epoch: 003/010 | Batch 000/469 | Cost: 2.3035
Epoch: 003/010 | Batch 050/469 | Cost: 2.3138
Epoch: 003/010 | Batch 100/469 | Cost: 2.3072
Epoch: 003/010 | Batch 150/469 | Cost: 2.3000
Epoch: 003/010 | Batch 200/469 | Cost: 2.3023
Epoch: 003/010 | Batch 250/469 | Cost: 2.2958
Epoch: 003/010 | Batch 300/469 | Cost: 2.2970
Epoch: 003/010 | Batch 350/469 | Cost: 2.3000
Epoch: 003/010 | Batch 400/469 | Cost: 2.3005
Epoch: 003/010 | Batch 450/469 | Cost: 2.2929
Epoch: 003/010 training accuracy: 11.24%
Time elapsed: 0.61 min
Epoch: 004/010 | Batch 000/469 | Cost: 2.2920
Epoch: 004/010 | Batch 050/469 | Cost: 2.2986
Epoch: 004/010 | Batch 100/469 | Cost: 2.3005
Epoch: 004/010 | Batch 150/469 | Cost: 2.2923
Epoch: 004/010 | Batch 200/469 | Cost: 2.3051
Epoch: 004/010 | Batch 250/469 | Cost: 2.3060
Epoch: 004/010 | Batch 300/469 | Cost: 2.3073
Epoch: 004/010 | Batch 350/469 | Cost: 2.3064
Epoch: 004/010 | Batch 400/469 | Cost: 2.3052
Epoch: 004/010 | Batch 450/469 | Cost: 2.2975
Epoch: 004/010 training accuracy: 11.24%
Time elapsed: 0.81 min
Epoch: 005/010 | Batch 000/469 | Cost: 2.3027
Epoch: 005/010 | Batch 050/469 | Cost: 2.3005
Epoch: 005/010 | Batch 100/469 | Cost: 2.2946
Epoch: 005/010 | Batch 150/469 | Cost: 2.2892
Epoch: 005/010 | Batch 200/469 | Cost: 2.2958
Epoch: 005/010 | Batch 250/469 | Cost: 2.3036
Epoch: 005/010 | Batch 300/469 | Cost: 2.3003
Epoch: 005/010 | Batch 350/469 | Cost: 2.3015
Epoch: 005/010 | Batch 400/469 | Cost: 2.3057
Epoch: 005/010 | Batch 450/469 | Cost: 2.2972
Epoch: 005/010 training accuracy: 11.24%
Time elapsed: 1.01 min
Epoch: 006/010 | Batch 000/469 | Cost: 2.3029
Epoch: 006/010 | Batch 050/469 | Cost: 2.2979
Epoch: 006/010 | Batch 100/469 | Cost: 2.3027
Epoch: 006/010 | Batch 150/469 | Cost: 2.3028
Epoch: 006/010 | Batch 200/469 | Cost: 2.3010
Epoch: 006/010 | Batch 250/469 | Cost: 2.3025
Epoch: 006/010 | Batch 300/469 | Cost: 2.3054
Epoch: 006/010 | Batch 350/469 | Cost: 2.2972
Epoch: 006/010 | Batch 400/469 | Cost: 2.3037
Epoch: 006/010 | Batch 450/469 | Cost: 2.3064
Epoch: 006/010 training accuracy: 11.24%
Time elapsed: 1.22 min
Epoch: 007/010 | Batch 000/469 | Cost: 2.2983
Epoch: 007/010 | Batch 050/469 | Cost: 2.2979
Epoch: 007/010 | Batch 100/469 | Cost: 2.3077
Epoch: 007/010 | Batch 150/469 | Cost: 2.3047
Epoch: 007/010 | Batch 200/469 | Cost: 2.2998
Epoch: 007/010 | Batch 250/469 | Cost: 2.2993
Epoch: 007/010 | Batch 300/469 | Cost: 2.2966
Epoch: 007/010 | Batch 350/469 | Cost: 2.2967
Epoch: 007/010 | Batch 400/469 | Cost: 2.2916
Epoch: 007/010 | Batch 450/469 | Cost: 2.3016
Epoch: 007/010 training accuracy: 11.24%
Time elapsed: 1.42 min
Epoch: 008/010 | Batch 000/469 | Cost: 2.2992
Epoch: 008/010 | Batch 050/469 | Cost: 2.2953
Epoch: 008/010 | Batch 100/469 | Cost: 2.3018
Epoch: 008/010 | Batch 150/469 | Cost: 2.3053
Epoch: 008/010 | Batch 200/469 | Cost: 2.2983
Epoch: 008/010 | Batch 250/469 | Cost: 2.3089
Epoch: 008/010 | Batch 300/469 | Cost: 2.3048
Epoch: 008/010 | Batch 350/469 | Cost: 2.3065
Epoch: 008/010 | Batch 400/469 | Cost: 2.3037
Epoch: 008/010 | Batch 450/469 | Cost: 2.2966
Epoch: 008/010 training accuracy: 11.24%
Time elapsed: 1.62 min
Epoch: 009/010 | Batch 000/469 | Cost: 2.3060
Epoch: 009/010 | Batch 050/469 | Cost: 2.2945
Epoch: 009/010 | Batch 100/469 | Cost: 2.3037
Epoch: 009/010 | Batch 150/469 | Cost: 2.3064
Epoch: 009/010 | Batch 200/469 | Cost: 2.3043
Epoch: 009/010 | Batch 250/469 | Cost: 2.2991
Epoch: 009/010 | Batch 300/469 | Cost: 2.2945
Epoch: 009/010 | Batch 350/469 | Cost: 2.2999
Epoch: 009/010 | Batch 400/469 | Cost: 2.3131
Epoch: 009/010 | Batch 450/469 | Cost: 2.3079
Epoch: 009/010 training accuracy: 11.24%
Time elapsed: 1.82 min
Epoch: 010/010 | Batch 000/469 | Cost: 2.2977
Epoch: 010/010 | Batch 050/469 | Cost: 2.2996
Epoch: 010/010 | Batch 100/469 | Cost: 2.2979
Epoch: 010/010 | Batch 150/469 | Cost: 2.3038
Epoch: 010/010 | Batch 200/469 | Cost: 2.3062
Epoch: 010/010 | Batch 250/469 | Cost: 2.2935
Epoch: 010/010 | Batch 300/469 | Cost: 2.3027
Epoch: 010/010 | Batch 350/469 | Cost: 2.3006
Epoch: 010/010 | Batch 400/469 | Cost: 2.3051
Epoch: 010/010 | Batch 450/469 | Cost: 2.3077
Epoch: 010/010 training accuracy: 11.24%
Time elapsed: 2.02 min
Total Training Time: 2.02 min


Check that the bias units were updated correctly (they should all be different):

In :
model.linear_1_bias

Out:
Parameter containing:
tensor([-0.0202,  0.1097, -0.0029,  0.0253, -0.0269, -0.1026, -0.0194,  0.0572,
-0.0027, -0.0175], device='cuda:3', requires_grad=True)
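Because every logit is the same shared value plus its own bias, the class with the largest bias is the one predicted for every input. A small sketch that applies argmax to the bias values printed above (copied by hand, rounded to four decimals) shows which class that is:

```python
# Bias values copied from the output above (rounded to four decimals)
bias = [-0.0202, 0.1097, -0.0029, 0.0253, -0.0269,
        -0.1026, -0.0194, 0.0572, -0.0027, -0.0175]

# All logits share the same input-dependent term, so the argmax depends on the bias alone
predicted_class = max(range(len(bias)), key=lambda k: bias[k])
print(predicted_class)              # 1
print(len(set(bias)) == len(bias))  # True: all bias units differ
```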

## Evaluation

In :
with torch.set_grad_enabled(False): # save memory during inference
    print('Test accuracy: %.2f%%' % (compute_accuracy(model, test_loader)))

Test accuracy: 11.35%


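As expected, the classification performance is poor: the accuracy stays at 11.24% on the training set and 11.35% on the test set. These values match the shares of digit 1 in the MNIST training set (6,742 of 60,000 images) and test set (1,135 of 10,000 images), so the model appears to predict class 1 for every input. Weight sharing is clearly not a good fit here; this notebook is meant as a technical reference/demo, not as a real-world use case for this dataset.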