Notebook created by Felipe Oyarce, felipe.oyarce94@gmail.com
In this project we’ve implemented a strategy presented by Skolik et al., 2020 (see the implementation in TensorFlow Quantum) for effectively training quantum neural networks. In layerwise learning, the number of parameters is increased gradually: a few layers at a time are added and trained while the parameters of previously trained layers are frozen. An easy way to understand this technique is to think of dividing the problem into smaller circuits so as to avoid falling into barren plateaus. Here, we provide a proof-of-concept implementation of this technique in PennyLane’s PyTorch interface.
The task selected for this proof-of-concept is the same one used in the original paper: binary classification between the handwritten digits 3 and 6 from the MNIST dataset.
import random
import matplotlib.pyplot as plt
# Pennylane
import pennylane as qml
from pennylane import numpy as np
# Pytorch
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data.sampler import SubsetRandomSampler
n_qubits = 9
n_layer_steps = 3
n_layers_to_add = 2
batch_size = 128
epochs = 5
We configure PyTorch to use CUDA only if available. Otherwise the CPU is used.
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
We initialize a PennyLane device with a lightning.qubit backend.
dev = qml.device("lightning.qubit", wires=n_qubits)
In data_transforms, we compose several transformations that reduce the size of the images and produce a flattened vector, while keeping enough information for a quantum neural network to be able to "learn" the difference between the digits. Feel free to explore different representations of the data, such as learned embeddings or dimensionality-reduction approaches.
data_transforms = transforms.Compose([
    transforms.CenterCrop(18),                      # crop to an 18x18 image
    transforms.Resize(3),                           # resize to a 3x3 image
    transforms.ToTensor(),                          # convert to tensor
    transforms.Lambda(lambda x: torch.flatten(x)),  # flatten the image into a vector
])
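To get a feel for what this pipeline does, here is a pure-NumPy sketch of the same crop-then-downsample idea. It is an approximation only: torchvision's Resize uses bilinear interpolation, whereas this stand-in uses 6x6 average pooling, and the random "image" is hypothetical.

```python
import numpy as np

# Fake 28x28 grayscale "digit" with values in [0, 1].
img = np.random.default_rng(0).random((28, 28))

# CenterCrop(18): keep the central 18x18 window.
m = (28 - 18) // 2
cropped = img[m:m + 18, m:m + 18]

# Resize(3) is roughly a 6x6 average pooling down to 3x3
# (torchvision actually interpolates; this is a crude stand-in).
pooled = cropped.reshape(3, 6, 3, 6).mean(axis=(1, 3))

# Flatten to a 9-dimensional feature vector, one angle per qubit.
features = pooled.flatten()
print(features.shape)  # (9,)
```

The key point is the output shape: nine features, matching the nine qubits of the circuit.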
# Download the MNIST dataset and apply the composition of transformations.
train_set = datasets.MNIST(root='./data', train=True, download=True, transform=data_transforms)
test_set = datasets.MNIST(root='./data', train=False, download=True, transform=data_transforms)
# Relabel digits '3' and '6' as 0 and 1, respectively.
# Note that we must first move the existing '0' and '1' labels
# out of the way so they don't collide with the new ones.
train_set.targets[train_set.targets == 1] = 10
train_set.targets[train_set.targets == 0] = 10
train_set.targets[train_set.targets == 3] = 0
train_set.targets[train_set.targets == 6] = 1
test_set.targets[test_set.targets == 1] = 10
test_set.targets[test_set.targets == 0] = 10
test_set.targets[test_set.targets == 3] = 0
test_set.targets[test_set.targets == 6] = 1
# Filter to just images of '3's and '6's
subset_indices_train = ((train_set.targets == 0) | (train_set.targets == 1)).nonzero(as_tuple=False).view(-1)
subset_indices_test = ((test_set.targets == 0) | (test_set.targets == 1)).nonzero(as_tuple=False).view(-1)
print(len(subset_indices_test))
# Select just a subset of the training set.
# Increase the number of examples for more accurate results
NUM_EXAMPLES = 1000
subset_indices_train = subset_indices_train[:NUM_EXAMPLES]
# DataLoaders
train_loader = torch.utils.data.DataLoader(train_set, batch_size=batch_size, shuffle=False,
sampler=SubsetRandomSampler(subset_indices_train))
test_loader = torch.utils.data.DataLoader(test_set, batch_size=batch_size, shuffle=False,
sampler=SubsetRandomSampler(subset_indices_test))
1968
k = 0
for x, y in train_loader:
    for i in range(y.shape[0]):
        if y[i].item() == 0:
            k += 1
print(f"{k} images of digit '3'.")
print(f"{NUM_EXAMPLES - k} images of digit '6'.")
495 images of digit '3'. 505 images of digit '6'.
def set_random_gates(n_qubits):
    """Utility function for creating a list of random gates
    chosen from gate_set. The returned list has length n_qubits.

    Arguments:
        n_qubits (int): Number of qubits of the quantum circuit.

    Returns:
        chosen_gates (list): List of length n_qubits containing
            RX, RY and RZ rotations randomly chosen.
    """
    gate_set = [qml.RX, qml.RY, qml.RZ]
    chosen_gates = []
    for i in range(n_qubits):
        chosen_gate = random.choice(gate_set)
        chosen_gates.append(chosen_gate)
    return chosen_gates
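As a quick illustration of the gate draw, here is a pure-Python sketch using string placeholders instead of the PennyLane gate classes (set_random_gate_names and gate_names are stand-ins introduced here, not part of the notebook):

```python
import random

random.seed(0)  # fix the draw for reproducibility
gate_names = ["RX", "RY", "RZ"]  # stand-ins for qml.RX, qml.RY, qml.RZ

def set_random_gate_names(n_qubits):
    # Same selection logic as set_random_gates, with string placeholders.
    return [random.choice(gate_names) for _ in range(n_qubits)]

# Each Phase I step draws fresh random gates for the layers it adds:
new_gates = [set_random_gate_names(9) for _ in range(2)]
print(len(new_gates), len(new_gates[0]))  # 2 9
```

With n_layers_to_add = 2 and n_qubits = 9, each Phase I step therefore introduces 18 new single-qubit rotations.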
def total_elements(array_list):
    """Utility function that returns the total number
    of elements in a list of lists.

    Arguments:
        array_list (list[list]): List of lists.

    Returns:
        (int): Total number of elements in array_list.
    """
    flattened = [val for sublist in array_list for val in sublist]
    return len(flattened)
# Lists to store the gates and trained weights of frozen layers.
layer_gates = []
layer_weights = []

def apply_layer(gates, weights):
    """Apply one layer: a single randomly chosen rotation
    (RX, RY or RZ) on each qubit of the circuit with its
    respective parameter, followed by CZ gates in a ladder
    structure.

    Arguments:
        gates: List of single-qubit gates to apply, of length
            equal to the number of qubits of the circuit.
        weights: List of parameters, one per gate in gates.

    Returns:
        None
    """
    # Apply single-qubit gates with their weights.
    for i in range(n_qubits):
        gates[i](weights[i], wires=i)
    # Apply CZ gates to neighbouring qubits in a ladder structure.
    for i in range(n_qubits - 1):
        qml.CZ(wires=[i, i + 1])
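The "ladder" of CZ entanglers only connects neighbouring wires; for n_qubits = 9 the pairs it acts on can be listed directly:

```python
n_qubits = 9
# Wire pairs touched by the CZ ladder in apply_layer.
cz_pairs = [(i, i + 1) for i in range(n_qubits - 1)]
print(cz_pairs[0], cz_pairs[-1])  # (0, 1) (7, 8)
```

So each layer applies 9 parameterized rotations followed by 8 CZ gates.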
# Function for the non-trainable part of the quantum circuit.
def apply_frozen_layers(frozen_layer_gates, frozen_layer_weights):
    """Apply multiple layers to the quantum circuit. The main
    purpose of this function is to apply the layers already
    trained during Phase I of layerwise learning.

    Arguments:
        frozen_layer_gates: List of lists containing the qubit
            rotations per layer to apply to the circuit.
            List of "shape" (number of layers, number of qubits).
        frozen_layer_weights: List of lists containing the
            parameters (angles) for each rotation in
            frozen_layer_gates. Same "shape" as frozen_layer_gates.

    Returns:
        None
    """
    for i in range(len(frozen_layer_gates)):
        apply_layer(frozen_layer_gates[i], frozen_layer_weights[i])
@qml.qnode(dev, interface="torch")
def quantum_net(inputs, new_weights):
    """Quantum network to train during Phase I of layerwise
    learning. The data inputs are encoded using an Angle
    Embedding with X rotations. Then, we apply the non-trainable
    (frozen) layers using the two lists layer_gates and
    layer_weights, which store the randomly selected single-qubit
    rotations and the weights trained in previous steps of
    layerwise learning. Finally, n_layers_to_add is an integer
    indicating the number of trainable layers to add in each
    step of Phase I.

    Arguments:
        inputs: Tensor data.
        new_weights: New parameters to be trained, of shape
            (n_layers_to_add, n_qubits).

    Returns:
        (float): Expectation value of a Z measurement on the
            last qubit of the circuit.
    """
    # Encode the data with Angle Embedding
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    # Apply frozen layers
    apply_frozen_layers(layer_gates, layer_weights)
    # Apply layers with trainable parameters
    for i in range(n_layers_to_add):
        apply_layer(new_gates[i], new_weights[i])
    # Expectation value of the last qubit
    return qml.expval(qml.PauliZ(n_qubits - 1))
# Sigmoid function and Binary Cross Entropy loss
sigmoid = nn.Sigmoid()
loss = nn.BCELoss()

for step in range(n_layer_steps):
    print(f"Phase I step: {step+1}")
    # Obtain random gates for each new layer.
    new_gates = [set_random_gates(n_qubits) for i in range(n_layers_to_add)]
    # Define the shape of the weights
    weight_shapes = {"new_weights": (n_layers_to_add, n_qubits)}
    # Quantum net as a TorchLayer
    qlayer = qml.qnn.TorchLayer(quantum_net, weight_shapes, init_method=nn.init.zeros_)
    # Create Sequential model
    model = torch.nn.Sequential(qlayer, sigmoid)
    # Optimizer
    opt = optim.Adam(model.parameters(), lr=0.01)
    batches = NUM_EXAMPLES // batch_size
    for epoch in range(epochs):
        running_loss = 0
        for x, y in train_loader:
            opt.zero_grad()
            y = y.to(torch.float32)
            loss_evaluated = loss(model(x), y)
            loss_evaluated.backward()
            running_loss += loss_evaluated.item()
            opt.step()
        avg_loss = running_loss / batches
        print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))
    # Extract the weights after optimization to be saved in layer_weights.
    for param in model.parameters():
        new_weights = param.data
    new_weights = new_weights.tolist()
    print(f"Trained parameters: {total_elements(new_weights)}")
    layer_gates += new_gates
    layer_weights += new_weights
    print(f"Layer weights: {total_elements(layer_weights)}")
    print(f"Number of layers: {len(layer_gates)}")
    print("")
Phase I step: 1
Average loss over epoch 1: 0.9071
Average loss over epoch 2: 0.9058
Average loss over epoch 3: 0.9077
Average loss over epoch 4: 0.9057
Average loss over epoch 5: 0.9094
Trained parameters: 18
Layer weights: 18
Number of layers: 2

Phase I step: 2
Average loss over epoch 1: 0.9037
Average loss over epoch 2: 0.8973
Average loss over epoch 3: 0.8875
Average loss over epoch 4: 0.8786
Average loss over epoch 5: 0.8689
Trained parameters: 18
Layer weights: 36
Number of layers: 4

Phase I step: 3
Average loss over epoch 1: 0.8596
Average loss over epoch 2: 0.8481
Average loss over epoch 3: 0.8380
Average loss over epoch 4: 0.8293
Average loss over epoch 5: 0.8183
Trained parameters: 18
Layer weights: 54
Number of layers: 6
# Define partition of the circuit to train in each step.
# Here we train the circuit by halves.
partition_percentage = 0.5
partition_size = int(n_layer_steps*n_layers_to_add*partition_percentage)
n_partition_weights = partition_size*n_qubits
n_sweeps = 2
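With the hyperparameters above, the partition arithmetic works out as follows (just plugging in the notebook's values):

```python
n_qubits = 9
n_layer_steps = 3
n_layers_to_add = 2
partition_percentage = 0.5

total_layers = n_layer_steps * n_layers_to_add             # 6 layers after Phase I
partition_size = int(total_layers * partition_percentage)  # 3 layers per half
n_partition_weights = partition_size * n_qubits            # 27 trainable angles
print(total_layers, partition_size, n_partition_weights)   # 6 3 27
```

So each Phase II optimization step trains 27 of the 54 parameters while the other half stays frozen.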
def edit_model_parameters(model, new_parameters):
    """Overwrite the initial parameters of a PyTorch Sequential
    model with a given tensor. This function is useful for
    Phase II, where the initial parameters are the trained
    weights from Phase I.

    Arguments:
        model (torch.nn.Sequential): Sequential model in PyTorch
            containing a TorchLayer from PennyLane; our quantum
            neural network.
        new_parameters (torch.nn.Parameter): The parameters we
            want the model to start from.

    Returns:
        model (torch.nn.Sequential): The model with the new
            parameters.
    """
    old_params = {}
    for name, params in model.named_parameters():
        old_params[name] = params.clone()
    old_params["0.partition_weights"] = new_parameters
    for name, params in model.named_parameters():
        params.data.copy_(old_params[name])
    return model
def get_partition(layer_weights, partition, partition_size):
    """Get the first or second partition of an array given a
    partition size. This function helps avoid repeating code
    in Phase II.

    Arguments:
        layer_weights: List of lists containing the parameters
            (angles) for each rotation in layer_gates. List of
            "shape" (number of layers, number of qubits).
        partition (int): 1 or 2, indicating which partition
            to return.
        partition_size (int): Number of layers in the first
            partition, i.e. the layer at which the split is made.

    Returns:
        The requested partition of layer_weights.
    """
    if partition == 1:
        return layer_weights[:partition_size]
    if partition == 2:
        return layer_weights[partition_size:]
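A minimal usage sketch of the slicing logic, on toy weights (the zeros are placeholders, not trained values):

```python
def get_partition(layer_weights, partition, partition_size):
    # First half: layers [0, partition_size); second half: the rest.
    if partition == 1:
        return layer_weights[:partition_size]
    if partition == 2:
        return layer_weights[partition_size:]

# Toy weights: 6 layers of 9 angles each.
layer_weights = [[0.0] * 9 for _ in range(6)]
first = get_partition(layer_weights, 1, 3)
second = get_partition(layer_weights, 2, 3)
print(len(first), len(second))  # 3 3
```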
def save_trained_partition(layer_weights, trained_weights, partition, partition_size):
    """Update layer_weights after training a partition.

    Arguments:
        layer_weights: List of lists containing the parameters
            (angles) for each rotation in layer_gates. List of
            "shape" (number of layers, number of qubits).
        trained_weights: Trained weights of a partition of the
            circuit; can be the first or second partition.
        partition (int): 1 or 2, indicating which partition
            was trained.
        partition_size (int): Number of layers in the first
            partition, i.e. the layer at which the split is made.

    Returns:
        None
    """
    if partition == 1:
        layer_weights[:partition_size] = trained_weights
    if partition == 2:
        layer_weights[partition_size:] = trained_weights
@qml.qnode(dev, interface="torch")
def train_partition(inputs, partition_weights):
    """QNode defined to train just one partition of the quantum
    circuit after Phase I. Only a split into two pieces is
    supported: if partition == 1, the first portion of the
    circuit is treated as trainable; if partition == 2, the
    second portion is trainable.

    Arguments:
        inputs: Tensor data.
        partition_weights: Partition of the weights to be
            trained. Shape (len(partition_weights), n_qubits).

    Returns:
        (float): Expectation value of a Z measurement on the
            last qubit of the circuit.
    """
    # Encode the data with Angle Embedding
    qml.templates.AngleEmbedding(inputs, wires=range(n_qubits))
    if partition == 1:
        # Apply the trainable partition first
        for i in range(len(layer_gates[:partition_size])):
            apply_layer(layer_gates[:partition_size][i], partition_weights[i])
        # Apply the non-trainable partition
        for i in range(len(layer_gates[partition_size:])):
            apply_layer(layer_gates[partition_size:][i], layer_weights[partition_size:][i])
    elif partition == 2:
        # Apply the non-trainable partition first
        for i in range(len(layer_gates[:partition_size])):
            apply_layer(layer_gates[:partition_size][i], layer_weights[:partition_size][i])
        # Apply the trainable partition
        for i in range(len(layer_gates[partition_size:])):
            apply_layer(layer_gates[partition_size:][i], partition_weights[i])
    # Expectation value of the last qubit
    return qml.expval(qml.PauliZ(n_qubits - 1))
for sweep in range(n_sweeps):
    for partition in [1, 2]:
        print(f"Sweep: {sweep+1}, partition: {partition}")
        # Get the partition to train
        trainable_weights = get_partition(layer_weights, partition, partition_size)
        # Define the shape of the weights
        weight_shapes = {"partition_weights": (len(trainable_weights), n_qubits)}
        # Quantum net as a TorchLayer
        qlayer = qml.qnn.TorchLayer(train_partition, weight_shapes, init_method=nn.init.zeros_)
        init_weights = nn.Parameter(torch.tensor(trainable_weights))
        # Create Sequential model
        model = torch.nn.Sequential(qlayer, sigmoid)
        # Set the model's initial parameters to init_weights
        model = edit_model_parameters(model, init_weights)
        # Optimizer
        opt = optim.Adam(model.parameters(), lr=0.01)
        batches = NUM_EXAMPLES // batch_size
        for epoch in range(epochs):
            running_loss = 0
            for x, y in train_loader:
                opt.zero_grad()
                y = y.to(torch.float32)
                loss_evaluated = loss(model(x), y)
                loss_evaluated.backward()
                running_loss += loss_evaluated.item()
                opt.step()
            avg_loss = running_loss / batches
            print("Average loss over epoch {}: {:.4f}".format(epoch + 1, avg_loss))
        # Extract the trained weights and save them back into layer_weights.
        for param in model.parameters():
            trained_weights = param.data
        trained_weights = trained_weights.tolist()
        print(f"Trained parameters: {total_elements(trained_weights)}")
        save_trained_partition(layer_weights, trained_weights, partition, partition_size)
Sweep: 1, partition: 1
Average loss over epoch 1: 0.8106
Average loss over epoch 2: 0.8023
Average loss over epoch 3: 0.7974
Average loss over epoch 4: 0.7922
Average loss over epoch 5: 0.7887
Trained parameters: 27

Sweep: 1, partition: 2
Average loss over epoch 1: 0.7857
Average loss over epoch 2: 0.7806
Average loss over epoch 3: 0.7768
Average loss over epoch 4: 0.7734
Average loss over epoch 5: 0.7697
Trained parameters: 27

Sweep: 2, partition: 1
Average loss over epoch 1: 0.7668
Average loss over epoch 2: 0.7653
Average loss over epoch 3: 0.7639
Average loss over epoch 4: 0.7626
Average loss over epoch 5: 0.7618
Trained parameters: 27

Sweep: 2, partition: 2
Average loss over epoch 1: 0.7607
Average loss over epoch 2: 0.7572
Average loss over epoch 3: 0.7538
Average loss over epoch 4: 0.7509
Average loss over epoch 5: 0.7483
Trained parameters: 27
train_accuracy = 0
for x, y in train_loader:
    probs = model(x)
    preds = (probs > 0.5).float()
    train_accuracy += torch.sum(preds == y).item() / preds.shape[0]
print(f"Train accuracy: {train_accuracy/len(train_loader)}")
Train accuracy: 0.7672776442307693
test_accuracy = 0
for x, y in test_loader:
    probs = model(x)
    preds = (probs > 0.5).float()
    test_accuracy += torch.sum(preds == y).item() / preds.shape[0]
print(f"Test accuracy: {test_accuracy/len(test_loader)}")
Test accuracy: 0.7674153645833334
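The prediction rule used in the accuracy loops above is a simple threshold on the sigmoid output; it can be checked on toy values (these numbers are illustrative, not from the trained model):

```python
# Thresholding the sigmoid output at 0.5 gives the class prediction.
probs = [0.3, 0.7, 0.55, 0.1]   # toy model outputs
labels = [0, 1, 0, 0]           # toy true labels
preds = [1 if p > 0.5 else 0 for p in probs]
accuracy = sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)
print(accuracy)  # 0.75
```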
[1] McClean et al., 2018. Barren plateaus in quantum neural network training landscapes.
[2] Skolik et al., 2020. Layerwise learning for quantum neural networks.
[3] Notebook with the TensorFlow Quantum implementation by the paper's authors.
[4] TensorFlow Quantum blog post about layerwise learning.
[5] TensorFlow Quantum YouTube video about layerwise learning.