Notebook created by Felipe Oyarce, felipe.oyarce94@gmail.com
In this project we’ve implemented a strategy presented by Skolik et al., 2020 (see the implementation in TensorFlow Quantum) for effectively training quantum neural networks. In layerwise learning, the number of parameters is increased gradually: a few layers at a time are added and trained while the parameters of previously trained layers are frozen. An easy way to understand this technique is to think of dividing the problem into smaller circuits so as to avoid falling into barren plateaus. Here, we provide a proof-of-concept implementation of this technique in PennyLane’s PyTorch interface.
The task selected for this proof-of-concept is the same one used in the original paper: binary classification between the handwritten digits 3 and 6 from the MNIST dataset.
import random
import matplotlib.pyplot as plt
# Pennylane
import pennylane as qml
from pennylane import numpy as np
# Pytorch
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
n_qubits = 9
n_layer_steps = 3
n_layers_to_add = 2
batch_size = 128
epochs = 5
We configure PyTorch to use CUDA only if available. Otherwise the CPU is used.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
We initialize a PennyLane device with a lightning.qubit backend.
dev = qml.device("lightning.qubit", wires=n_qubits)
In data_transforms, we compose several transformations that reduce the size of the images and produce a flattened vector, while keeping enough information for a quantum neural network to be able to "learn" the difference between the digits. Feel free to explore different representations of the data, such as learned embeddings or dimensionality-reduction approaches.
data_transforms = transforms.Compose([
    transforms.CenterCrop(18),                      # crop to an 18x18 image
    transforms.Resize(3),                           # resize to a 3x3 image
    transforms.ToTensor(),                          # convert to tensor
    transforms.Lambda(lambda x: torch.flatten(x)),  # flatten the image into a vector
])
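To get a feel for what this pipeline does, here is a pure-NumPy sketch of the same crop-then-downsample idea. It is an approximation only: torchvision's Resize uses bilinear interpolation, whereas this stand-in uses 6x6 average pooling, and the random "image" is hypothetical.

```python
import numpy as np

# Fake 28x28 grayscale "digit" with values in [0, 1].
img = np.random.default_rng(0).random((28, 28))

# CenterCrop(18): keep the central 18x18 window.
m = (28 - 18) // 2
cropped = img[m:m + 18, m:m + 18]

# Resize(3) is roughly a 6x6 average pooling down to 3x3
# (torchvision actually interpolates; this is a crude stand-in).
pooled = cropped.reshape(3, 6, 3, 6).mean(axis=(1, 3))

# Flatten to a 9-dimensional feature vector, one angle per qubit.
features = pooled.flatten()
print(features.shape)  # (9,)
```

The key point is the output shape: nine features, matching the nine qubits of the circuit.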
# Download the MNIST dataset and apply the composition of transformations.
train_set = datasets.MNIST(root='./data', train=True, download=True, transform=data_transforms)
test_set = datasets.MNIST(root='./data', train=False, download=True, transform=data_transforms)
# Relabel digits '3' and '6' as 0 and 1, respectively.
# Note that we must first move the existing '0' and '1' labels
# out of the way so they don't collide with the new ones.
train_set.targets[train_set.targets == 1] = 10
train_set.targets[train_set.targets == 0] = 10
train_set.targets[train_set.targets == 3] = 0
train_set.targets[train_set.targets == 6] = 1
test_set.targets[test_set.targets == 1] = 10
test_set.targets[test_set.targets == 0] = 10
test_set.targets[test_set.targets == 3] = 0
test_set.targets[test_set.targets == 6] = 1
# Filter to just images of '3's and '6's
subset_indices_train = ((train_set.targets == 0) | (train_set.targets == 1)).nonzero(as_tuple=False).view(-1)
subset_indices_test = ((test_set.targets == 0) | (test_set.targets == 1)).nonzero(as_tuple=False).view(-1)
print(len(subset_indices_test))
# Select just a subset of the training set.
# Increase the number of examples for more accurate results
NUM_EXAMPLES = 1000
subset_indices_train = subset_indices_train[:NUM_EXAMPLES]
# DataLoaders
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=False,
sampler=SubsetRandomSampler(subset_indices_train))
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False,
sampler=SubsetRandomSampler(subset_indices_test))
1968
k = 0
for x, y in train_loader:
    for i in range(y.shape[0]):
        if y[i].item() == 0:
            k += 1
print(f"{k} images of digit '3'.")
print(f"{NUM_EXAMPLES - k} images of digit '6'.")
495 images of digit '3'. 505 images of digit '6'.
def set_random_gates(n_qubits):
    """Utility function for creating a list of random gates
    chosen from gate_set. The returned list has length n_qubits.

    Arguments:
        n_qubits (int): Number of qubits of the quantum circuit.

    Returns:
        chosen_gates (list): List of length n_qubits containing
            RX, RY and RZ rotations randomly chosen.
    """
    gate_set = [qml.RX, qml.RY, qml.RZ]
    chosen_gates = []
    for i in range(n_qubits):
        chosen_gate = random.choice(gate_set)
        chosen_gates.append(chosen_gate)
    return chosen_gates
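As a quick illustration of the gate draw, here is a pure-Python sketch using string placeholders instead of the PennyLane gate classes (set_random_gate_names and gate_names are stand-ins introduced here, not part of the notebook):

```python
import random

random.seed(0)  # fix the draw for reproducibility
gate_names = ["RX", "RY", "RZ"]  # stand-ins for qml.RX, qml.RY, qml.RZ

def set_random_gate_names(n_qubits):
    # Same selection logic as set_random_gates, with string placeholders.
    return [random.choice(gate_names) for _ in range(n_qubits)]

# Each Phase I step draws fresh random gates for the layers it adds:
new_gates = [set_random_gate_names(9) for _ in range(2)]
print(len(new_gates), len(new_gates[0]))  # 2 9
```

With n_layers_to_add = 2 and n_qubits = 9, each Phase I step therefore introduces 18 new single-qubit rotations.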
def total_elements(array_list):
    """Utility function that returns the total number
    of elements in a list of lists.

    Arguments:
        array_list (list[list]): List of lists.

    Returns:
        (int): Total number of elements in array_list.
    """
    flattened = [val for sublist in array_list for val in sublist]
    return len(flattened)
# Lists to store the gates and trained weights of frozen layers.
layer_gates = []
layer_weights = []

def apply_layer(gates, weights):
    """Apply one layer: a single randomly chosen rotation
    (RX, RY or RZ) on each qubit of the circuit with its
    respective parameter, followed by CZ gates in a ladder
    structure.

    Arguments:
        gates: List of single-qubit gates to apply, of length
            equal to the number of qubits of the circuit.
        weights: List of parameters, one per gate in gates.

    Returns:
        None
    """
    # Apply single-qubit gates with their weights.
    for i in range(n_qubits):
        gates[i](weights[i], wires=i)
    # Apply CZ gates to neighbouring qubits in a ladder structure.
    for i in range(n_qubits - 1):
        qml.CZ(wires=[i, i + 1])
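The "ladder" of CZ entanglers only connects neighbouring wires; for n_qubits = 9 the pairs it acts on can be listed directly:

```python
n_qubits = 9
# Wire pairs touched by the CZ ladder in apply_layer.
cz_pairs = [(i, i + 1) for i in range(n_qubits - 1)]
print(cz_pairs[0], cz_pairs[-1])  # (0, 1) (7, 8)
```

So each layer applies 9 parameterized rotations followed by 8 CZ gates.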
# Function for the non-trainable part of the quantum circuit.
def apply_frozen_layers(frozen_layer_gates, frozen_layer_weights):
    """Apply multiple layers to the quantum circuit. The main
    purpose of this function is to apply the layers already
    trained during Phase I of layerwise learning.

    Arguments:
        frozen_layer_gates: List of lists containing the qubit
            rotations per layer to apply to the circuit.
            List of "shape" (number of layers, number of qubits).
        frozen_layer_weights: List of lists containing the
            parameters (angles) for each rotation in
            frozen_layer_gates. Same "shape" as frozen_layer_gates.

    Returns:
        None
    """
    for i in range(len(frozen_layer_gates)):
        apply_layer(frozen_layer_gates[i], frozen_layer_weights[i])
@qml.qnode(dev, interface="torch")
def quantum_net(inputs, new_weights):
    """Quantum network to train during Phase I of layerwise
    learning. The data inputs are encoded using an Angle
    Embedding with X rotations. Then, we apply the non-trainable
    (frozen) layers using the two lists layer_gates and
    layer_weights, which store the randomly selected single-qubit
    rotations and the weights trained in previous steps of
    layerwise learning. Finally, n_layers_to_add is an integer
    indicating the number of trainable layers to add in each
    step of Phase I.

    Arguments:
        inputs: Tensor data.
        new_weights: New parameters to be trained, of shape
            (n_layers_to_add, n_qubits).

    Returns:
        (float): Expectation value of a Z measurement on the
            last qubit of the circuit.
    """
    # Encode the data with Angle Embedding
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    # Apply frozen layers
    apply_frozen_layers(layer_gates, layer_weights)
    # Apply layers with trainable parameters
    for i in range(n_layers_to_add):
        apply_layer(new_gates[i], new_weights[i])
    # Expectation value of the last qubit
    return qml.expval(qml.PauliZ(n_qubits - 1))
# Sigmoid function and Binary Cross Entropy loss
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()

for step in range(n_layer_steps):
    print(f"Phase I step: {step+1}")
    # Obtain random gates for each new layer.
    new_gates = [set_random_gates(n_qubits) for i in range(n_layers_to_add)]
    # Define the shape of the weights
    weight_shapes = {"new_weights": (n_layers_to_add, n_qubits)}
    # Quantum net as a TorchLayer
    qlayer = qml.qnn.TorchLayer(quantum_net, weight_shapes, init_method=nn.init.zeros_)
    # Create Sequential model
    model = torch.nn.Sequential(qlayer, sigmoid)
    # Optimizer
    opt = optim.Adam(model.parameters(), lr=0.01)
    batches = NUM_EXAMPLES // batch_size
    for epoch in range(epochs):
        running_loss = 0
        for x, y in train_loader:
            opt.zero_grad()
            y = y.to(torch.float32)
            loss_evaluated = loss(model(x), y)
            loss_evaluated.backward()
            running_loss += loss_evaluated.item()
            opt.step()
        avg_loss = running_loss / batches
        print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))
    # Extract the weights after optimization to be saved in layer_weights.
    for param in model.parameters():
        new_weights = param.data
    new_weights = new_weights.tolist()
    print(f"Trained parameters: {total_elements(new_weights)}")
    layer_gates += new_gates
    layer_weights += new_weights
    print(f"Layer weights: {total_elements(layer_weights)}")
    print(f"Number of layers: {len(layer_gates)}")
    print("")
Phase I step: 1
Average loss over epoch 1: 0.9071
Average loss over epoch 2: 0.9058
Average loss over epoch 3: 0.9077
Average loss over epoch 4: 0.9057
Average loss over epoch 5: 0.9094
Trained parameters: 18
Layer weights: 18
Number of layers: 2

Phase I step: 2
Average loss over epoch 1: 0.9037
Average loss over epoch 2: 0.8973
Average loss over epoch 3: 0.8875
Average loss over epoch 4: 0.8786
Average loss over epoch 5: 0.8689
Trained parameters: 18
Layer weights: 36
Number of layers: 4

Phase I step: 3
Average loss over epoch 1: 0.8596
Average loss over epoch 2: 0.8481
Average loss over epoch 3: 0.8380
Average loss over epoch 4: 0.8293
Average loss over epoch 5: 0.8183
Trained parameters: 18
Layer weights: 54
Number of layers: 6
# Define partition of the circuit to train in each step.
# Here we train the circuit by halves.
partition_percentage = 0.5
partition_size = int(n_layer_steps*n_layers_to_add*partition_percentage)
n_partition_weights = partition_size*n_qubits
n_sweeps = 2
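With the hyperparameters above, the partition arithmetic works out as follows (just plugging in the notebook's values):

```python
n_qubits = 9
n_layer_steps = 3
n_layers_to_add = 2
partition_percentage = 0.5

total_layers = n_layer_steps * n_layers_to_add             # 6 layers after Phase I
partition_size = int(total_layers * partition_percentage)  # 3 layers per half
n_partition_weights = partition_size * n_qubits            # 27 trainable angles
print(total_layers, partition_size, n_partition_weights)   # 6 3 27
```

So each Phase II optimization step trains 27 of the 54 parameters while the other half stays frozen.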
def edit_model_parameters(model, new_parameters):
    """Overwrite the initial parameters of a PyTorch Sequential
    model with a given tensor. This function is useful for
    Phase II, where the initial parameters are the trained
    weights from Phase I.

    Arguments:
        model (torch.nn.Sequential): Sequential model in PyTorch
            containing a TorchLayer from PennyLane; our quantum
            neural network.
        new_parameters (torch.nn.Parameter): The parameters we
            want the model to start from.

    Returns:
        model (torch.nn.Sequential): The model with the new
            parameters.
    """
    old_params = {}
    for name, params in model.named_parameters():
        old_params[name] = params.clone()
    old_params["0.partition_weights"] = new_parameters
    for name, params in model.named_parameters():
        params.data.copy_(old_params[name])
    return model
def get_partition(layer_weights, partition, partition_size):
    """Get the first or second partition of an array given a
    partition size. This function helps avoid repeating code
    in Phase II.

    Arguments:
        layer_weights: List of lists containing the parameters
            (angles) for each rotation in layer_gates. List of
            "shape" (number of layers, number of qubits).
        partition (int): 1 or 2, indicating which partition
            to return.
        partition_size (int): Number of layers in the first
            partition, i.e. the layer at which the split is made.

    Returns:
        The requested partition of layer_weights.
    """
    if partition == 1:
        return layer_weights[:partition_size]
    if partition == 2:
        return layer_weights[partition_size:]
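A minimal usage sketch of the slicing logic, on toy weights (the zeros are placeholders, not trained values):

```python
def get_partition(layer_weights, partition, partition_size):
    # First half: layers [0, partition_size); second half: the rest.
    if partition == 1:
        return layer_weights[:partition_size]
    if partition == 2:
        return layer_weights[partition_size:]

# Toy weights: 6 layers of 9 angles each.
layer_weights = [[0.0] * 9 for _ in range(6)]
first = get_partition(layer_weights, 1, 3)
second = get_partition(layer_weights, 2, 3)
print(len(first), len(second))  # 3 3
```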
def save_trained_partition(layer_weights, trained_weights, partition, partition_size):
    """Update layer_weights after training a partition.

    Arguments:
        layer_weights: List of lists containing the parameters
            (angles) for each rotation in layer_gates. List of
            "shape" (number of layers, number of qubits).
        trained_weights: Trained weights of a partition of the
            circuit; can be the first or second partition.
        partition (int): 1 or 2, indicating which partition
            was trained.
        partition_size (int): Number of layers in the first
            partition, i.e. the layer at which the split is made.

    Returns:
        None
    """
    if partition == 1:
        layer_weights[:partition_size] = trained_weights
    if partition == 2:
        layer_weights[partition_size:] = trained_weights
@qml.qnode(dev, interface="torch")
def train_partition(inputs, partition_weights):
    """QNode defined to train just one partition of the quantum
    circuit after Phase I. Only a split into two pieces is
    supported: if partition == 1, the first portion of the
    circuit is treated as trainable; if partition == 2, the
    second portion is trainable.

    Arguments:
        inputs: Tensor data.
        partition_weights: Partition of the weights to be
            trained. Shape (len(partition_weights), n_qubits).

    Returns:
        (float): Expectation value of a Z measurement on the
            last qubit of the circuit.
    """
    # Encode the data with Angle Embedding
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    if partition == 1:
        # Apply the trainable partition first
        for i in range(len(layer_gates[:partition_size])):
            apply_layer(layer_gates[:partition_size][i], partition_weights[i])
        # Apply the non-trainable partition
        for i in range(len(layer_gates[partition_size:])):
            apply_layer(layer_gates[partition_size:][i], layer_weights[partition_size:][i])
    elif partition == 2:
        # Apply the non-trainable partition first
        for i in range(len(layer_gates[:partition_size])):
            apply_layer(layer_gates[:partition_size][i], layer_weights[:partition_size][i])
        # Apply the trainable partition
        for i in range(len(layer_gates[partition_size:])):
            apply_layer(layer_gates[partition_size:][i], partition_weights[i])
    # Expectation value of the last qubit
    return qml.expval(qml.PauliZ(n_qubits - 1))
for sweep in range(n_sweeps):
    for partition in [1, 2]:
        print(f"Sweep: {sweep+1}, partition: {partition}")
        # Get the partition to train
        trainable_weights = get_partition(layer_weights, partition, partition_size)
        # Define the shape of the weights
        weight_shapes = {"partition_weights": (len(trainable_weights), n_qubits)}
        # Quantum net as a TorchLayer
        qlayer = qml.qnn.TorchLayer(train_partition, weight_shapes, init_method=nn.init.zeros_)
        init_weights = nn.Parameter(torch.tensor(trainable_weights))
        # Create Sequential model
        model = torch.nn.Sequential(qlayer, sigmoid)
        # Set the model's initial parameters to init_weights
        model = edit_model_parameters(model, init_weights)
        # Optimizer
        opt = optim.Adam(model.parameters(), lr=0.01)
        batches = NUM_EXAMPLES // batch_size
        for epoch in range(epochs):
            running_loss = 0
            for x, y in train_loader:
                opt.zero_grad()
                y = y.to(torch.float32)
                loss_evaluated = loss(model(x), y)
                loss_evaluated.backward()
                running_loss += loss_evaluated.item()
                opt.step()
            avg_loss = running_loss / batches
            print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))
        # Extract the trained weights and save them back into layer_weights.
        for param in model.parameters():
            trained_weights = param.data
        trained_weights = trained_weights.tolist()
        print(f"Trained parameters: {total_elements(trained_weights)}")
        save_trained_partition(layer_weights, trained_weights, partition, partition_size)
Sweep: 1, partition: 1
Average loss over epoch 1: 0.8106
Average loss over epoch 2: 0.8023
Average loss over epoch 3: 0.7974
Average loss over epoch 4: 0.7922
Average loss over epoch 5: 0.7887
Trained parameters: 27

Sweep: 1, partition: 2
Average loss over epoch 1: 0.7857
Average loss over epoch 2: 0.7806
Average loss over epoch 3: 0.7768
Average loss over epoch 4: 0.7734
Average loss over epoch 5: 0.7697
Trained parameters: 27

Sweep: 2, partition: 1
Average loss over epoch 1: 0.7668
Average loss over epoch 2: 0.7653
Average loss over epoch 3: 0.7639
Average loss over epoch 4: 0.7626
Average loss over epoch 5: 0.7618
Trained parameters: 27

Sweep: 2, partition: 2
Average loss over epoch 1: 0.7607
Average loss over epoch 2: 0.7572
Average loss over epoch 3: 0.7538
Average loss over epoch 4: 0.7509
Average loss over epoch 5: 0.7483
Trained parameters: 27
train_accuracy = 0
for x, y in train_loader:
    probs = model(x)
    preds = (probs > 0.5).float()
    train_accuracy += torch.sum(preds == y).item() / preds.shape[0]
print(f"Train accuracy: {train_accuracy/len(train_loader)}")
Train accuracy: 0.7672776442307693
test_accuracy = 0
for x, y in test_loader:
    probs = model(x)
    preds = (probs > 0.5).float()
    test_accuracy += torch.sum(preds == y).item() / preds.shape[0]
print(f"Test accuracy: {test_accuracy/len(test_loader)}")
Test accuracy: 0.7674153645833334
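The prediction rule used in the accuracy loops above is a simple threshold on the sigmoid output; it can be checked on toy values (these numbers are illustrative, not from the trained model):

```python
# Thresholding the sigmoid output at 0.5 gives the class prediction.
probs = [0.3, 0.7, 0.55, 0.1]   # toy model outputs
labels = [0, 1, 0, 0]           # toy true labels
preds = [1 if p > 0.5 else 0 for p in probs]
accuracy = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
print(accuracy)  # 0.75
```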
[1] McClean et al., 2018. Barren plateaus in quantum neural network training landscapes.
[2] Skolik et al., 2020. Layerwise learning for quantum neural networks.
[3] Notebook with the TensorFlow Quantum implementation by the paper's authors.
[4] TensorFlow Quantum blog post about layerwise learning.
[5] TensorFlow Quantum YouTube video about layerwise learning.