Chapter 17 – Autoencoders and GANs

This notebook contains all the sample code in chapter 17.

Setup

First, let's import a few common modules, ensure Matplotlib plots figures inline, and prepare a function to save the figures. We also check that Python 3.5 or later is installed (although Python 2.x may work, it is deprecated so we strongly recommend you use Python 3 instead), as well as Scikit-Learn ≥0.20 and TensorFlow ≥2.0.

In [1]:
# Python ≥3.5 is required
import sys
assert sys.version_info >= (3, 5)

# Scikit-Learn ≥0.20 is required
import sklearn
assert sklearn.__version__ >= "0.20"

try:
    # %tensorflow_version only exists in Colab.
    %tensorflow_version 2.x
    IS_COLAB = True
except Exception:
    IS_COLAB = False

# TensorFlow ≥2.0 is required
import tensorflow as tf
from tensorflow import keras
assert tf.__version__ >= "2.0"

if not tf.config.list_physical_devices('GPU'):
    print("No GPU was detected. LSTMs and CNNs can be very slow without a GPU.")
    if IS_COLAB:
        print("Go to Runtime > Change runtime and select a GPU hardware accelerator.")

# Common imports
import numpy as np
import os

# to make this notebook's output stable across runs
np.random.seed(42)
tf.random.set_seed(42)

# To plot pretty figures
%matplotlib inline
import matplotlib as mpl
import matplotlib.pyplot as plt
mpl.rc('axes', labelsize=14)
mpl.rc('xtick', labelsize=12)
mpl.rc('ytick', labelsize=12)

# Where to save the figures
PROJECT_ROOT_DIR = "."
CHAPTER_ID = "autoencoders"
IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, "images", CHAPTER_ID)
os.makedirs(IMAGES_PATH, exist_ok=True)

def save_fig(fig_id, tight_layout=True, fig_extension="png", resolution=300):
    path = os.path.join(IMAGES_PATH, fig_id + "." + fig_extension)
    print("Saving figure", fig_id)
    if tight_layout:
        plt.tight_layout()
    plt.savefig(path, format=fig_extension, dpi=resolution)

A utility function to plot a grayscale 28×28 image:

In [2]:
def plot_image(image):
    plt.imshow(image, cmap="binary")
    plt.axis("off")

PCA with a linear Autoencoder

Build a 3D dataset:

In [3]:
np.random.seed(4)

def generate_3d_data(m, w1=0.1, w2=0.3, noise=0.1):
    angles = np.random.rand(m) * 3 * np.pi / 2 - 0.5
    data = np.empty((m, 3))
    data[:, 0] = np.cos(angles) + np.sin(angles)/2 + noise * np.random.randn(m) / 2
    data[:, 1] = np.sin(angles) * 0.7 + noise * np.random.randn(m) / 2
    data[:, 2] = data[:, 0] * w1 + data[:, 1] * w2 + noise * np.random.randn(m)
    return data

X_train = generate_3d_data(60)
X_train = X_train - X_train.mean(axis=0, keepdims=False)

Now let's build the Autoencoder...

In [4]:
np.random.seed(42)
tf.random.set_seed(42)

encoder = keras.models.Sequential([keras.layers.Dense(2, input_shape=[3])])
decoder = keras.models.Sequential([keras.layers.Dense(3, input_shape=[2])])
autoencoder = keras.models.Sequential([encoder, decoder])

autoencoder.compile(loss="mse", optimizer=keras.optimizers.SGD(learning_rate=1.5))
In [5]:
history = autoencoder.fit(X_train, X_train, epochs=20)
Epoch 1/20
2/2 [==============================] - 0s 3ms/step - loss: 0.2547
Epoch 2/20
2/2 [==============================] - 0s 2ms/step - loss: 0.1032
Epoch 3/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0551
Epoch 4/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0503
Epoch 5/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0839
Epoch 6/20
2/2 [==============================] - 0s 2ms/step - loss: 0.2223
Epoch 7/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0913
Epoch 8/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0320
Epoch 9/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0242
Epoch 10/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0189
Epoch 11/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0142
Epoch 12/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0102
Epoch 13/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0066
Epoch 14/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0056
Epoch 15/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0057
Epoch 16/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0053
Epoch 17/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0050
Epoch 18/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0048
Epoch 19/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0049
Epoch 20/20
2/2 [==============================] - 0s 2ms/step - loss: 0.0047
In [6]:
codings = encoder.predict(X_train)
In [7]:
fig = plt.figure(figsize=(4,3))
plt.plot(codings[:,0], codings[:, 1], "b.")
plt.xlabel("$z_1$", fontsize=18)
plt.ylabel("$z_2$", fontsize=18, rotation=0)
plt.grid(True)
save_fig("linear_autoencoder_pca_plot")
plt.show()
Saving figure linear_autoencoder_pca_plot
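
Since a linear autoencoder trained with the MSE loss ends up learning the same subspace as PCA, you can sanity-check it against Scikit-Learn's PCA. This is just a quick sketch, not part of the chapter's code: the codings may differ from PCA's projection by a rotation or scaling of the plane, but the reconstruction errors should be very close.

from sklearn.decomposition import PCA

pca = PCA(n_components=2)
X_pca_rec = pca.inverse_transform(pca.fit_transform(X_train))
X_ae_rec = autoencoder.predict(X_train)
print("PCA reconstruction MSE:", np.mean(np.square(X_train - X_pca_rec)))
print("AE reconstruction MSE: ", np.mean(np.square(X_train - X_ae_rec)))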

Stacked Autoencoders

Let's use Fashion MNIST:

In [8]:
(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.fashion_mnist.load_data()
X_train_full = X_train_full.astype(np.float32) / 255
X_test = X_test.astype(np.float32) / 255
X_train, X_valid = X_train_full[:-5000], X_train_full[-5000:]
y_train, y_valid = y_train_full[:-5000], y_train_full[-5000:]

Train all layers at once

Let's build a stacked Autoencoder with 3 hidden layers and 1 output layer (i.e., 2 stacked Autoencoders).

In [9]:
def rounded_accuracy(y_true, y_pred):
    # round pixel intensities to 0 or 1 and measure binary accuracy:
    # an easier metric to interpret than the cross-entropy loss
    return keras.metrics.binary_accuracy(tf.round(y_true), tf.round(y_pred))
In [10]:
tf.random.set_seed(42)
np.random.seed(42)

stacked_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu"),
])
stacked_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
stacked_ae = keras.models.Sequential([stacked_encoder, stacked_decoder])
stacked_ae.compile(loss="binary_crossentropy",
                   optimizer=keras.optimizers.SGD(learning_rate=1.5), metrics=[rounded_accuracy])
history = stacked_ae.fit(X_train, X_train, epochs=20,
                         validation_data=(X_valid, X_valid))
Epoch 1/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3381 - rounded_accuracy: 0.8870 - val_loss: 0.3164 - val_rounded_accuracy: 0.9007
Epoch 2/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3056 - rounded_accuracy: 0.9152 - val_loss: 0.3022 - val_rounded_accuracy: 0.9197
Epoch 3/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2986 - rounded_accuracy: 0.9215 - val_loss: 0.2984 - val_rounded_accuracy: 0.9201
Epoch 4/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2948 - rounded_accuracy: 0.9249 - val_loss: 0.2938 - val_rounded_accuracy: 0.9285
Epoch 5/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2923 - rounded_accuracy: 0.9272 - val_loss: 0.2919 - val_rounded_accuracy: 0.9285
Epoch 6/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2904 - rounded_accuracy: 0.9289 - val_loss: 0.2914 - val_rounded_accuracy: 0.9306
Epoch 7/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2890 - rounded_accuracy: 0.9301 - val_loss: 0.2907 - val_rounded_accuracy: 0.9281
Epoch 8/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2877 - rounded_accuracy: 0.9314 - val_loss: 0.2944 - val_rounded_accuracy: 0.9298
Epoch 9/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2869 - rounded_accuracy: 0.9321 - val_loss: 0.2905 - val_rounded_accuracy: 0.9259
Epoch 10/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2859 - rounded_accuracy: 0.9331 - val_loss: 0.2878 - val_rounded_accuracy: 0.9309
Epoch 11/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2851 - rounded_accuracy: 0.9337 - val_loss: 0.2866 - val_rounded_accuracy: 0.9348
Epoch 12/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2845 - rounded_accuracy: 0.9341 - val_loss: 0.2856 - val_rounded_accuracy: 0.9355
Epoch 13/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2839 - rounded_accuracy: 0.9346 - val_loss: 0.2847 - val_rounded_accuracy: 0.9355
Epoch 14/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2835 - rounded_accuracy: 0.9350 - val_loss: 0.2844 - val_rounded_accuracy: 0.9365
Epoch 15/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2830 - rounded_accuracy: 0.9355 - val_loss: 0.2841 - val_rounded_accuracy: 0.9353
Epoch 16/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2826 - rounded_accuracy: 0.9359 - val_loss: 0.2846 - val_rounded_accuracy: 0.9353
Epoch 17/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2823 - rounded_accuracy: 0.9360 - val_loss: 0.2836 - val_rounded_accuracy: 0.9363
Epoch 18/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2819 - rounded_accuracy: 0.9363 - val_loss: 0.2831 - val_rounded_accuracy: 0.9364
Epoch 19/20
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2816 - rounded_accuracy: 0.9365 - val_loss: 0.2827 - val_rounded_accuracy: 0.9374
Epoch 20/20
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2813 - rounded_accuracy: 0.9367 - val_loss: 0.2843 - val_rounded_accuracy: 0.9364
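
To check how training went, you can plot the learning curves stored in the history object. A minimal sketch (it uses pandas, which the setup cell above does not import):

import pandas as pd

pd.DataFrame(history.history).plot(figsize=(8, 5))
plt.grid(True)
plt.show()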

This function processes a few validation images through the autoencoder and displays the original images and their reconstructions:

In [11]:
def show_reconstructions(model, images=X_valid, n_images=5):
    reconstructions = model.predict(images[:n_images])
    fig = plt.figure(figsize=(n_images * 1.5, 3))
    for image_index in range(n_images):
        plt.subplot(2, n_images, 1 + image_index)
        plot_image(images[image_index])
        plt.subplot(2, n_images, 1 + n_images + image_index)
        plot_image(reconstructions[image_index])
In [12]:
show_reconstructions(stacked_ae)
save_fig("reconstruction_plot")
Saving figure reconstruction_plot

Visualizing Fashion MNIST

Let's use t-SNE to reduce the 30-dimensional codings of the validation set down to 2D. Running t-SNE on the compressed codings is much faster than running it on the raw 784-pixel images:

In [13]:
np.random.seed(42)

from sklearn.manifold import TSNE

X_valid_compressed = stacked_encoder.predict(X_valid)
tsne = TSNE()
X_valid_2D = tsne.fit_transform(X_valid_compressed)
X_valid_2D = (X_valid_2D - X_valid_2D.min()) / (X_valid_2D.max() - X_valid_2D.min())
In [14]:
plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap="tab10")
plt.axis("off")
plt.show()

Let's make this diagram a bit prettier:

In [15]:
# adapted from https://scikit-learn.org/stable/auto_examples/manifold/plot_lle_digits.html
plt.figure(figsize=(10, 8))
cmap = plt.cm.tab10
plt.scatter(X_valid_2D[:, 0], X_valid_2D[:, 1], c=y_valid, s=10, cmap=cmap)
image_positions = np.array([[1., 1.]])
for index, position in enumerate(X_valid_2D):
    dist = np.sum((position - image_positions) ** 2, axis=1)
    if np.min(dist) > 0.02: # if far enough from other images
        image_positions = np.r_[image_positions, [position]]
        imagebox = mpl.offsetbox.AnnotationBbox(
            mpl.offsetbox.OffsetImage(X_valid[index], cmap="binary"),
            position, bboxprops={"edgecolor": cmap(y_valid[index]), "lw": 2})
        plt.gca().add_artist(imagebox)
plt.axis("off")
save_fig("fashion_mnist_visualization_plot")
plt.show()
Saving figure fashion_mnist_visualization_plot

Tying weights

It is common to tie the weights of the encoder and the decoder by simply using the transpose of the encoder's weights as the decoder's weights. This roughly halves the number of parameters in the model. For this, we need a custom layer:

In [16]:
class DenseTranspose(keras.layers.Layer):
    def __init__(self, dense, activation=None, **kwargs):
        self.dense = dense  # the Dense layer whose kernel we reuse
        self.activation = keras.activations.get(activation)
        super().__init__(**kwargs)
    def build(self, batch_input_shape):
        # only the biases are new weights; the kernel is shared with `dense`
        self.biases = self.add_weight(name="bias",
                                      shape=[self.dense.input_shape[-1]],
                                      initializer="zeros")
        super().build(batch_input_shape)
    def call(self, inputs):
        # multiply by the transpose of the tied Dense layer's kernel
        z = tf.matmul(inputs, self.dense.weights[0], transpose_b=True)
        return self.activation(z + self.biases)
In [17]:
keras.backend.clear_session()
tf.random.set_seed(42)
np.random.seed(42)

dense_1 = keras.layers.Dense(100, activation="selu")
dense_2 = keras.layers.Dense(30, activation="selu")

tied_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    dense_1,
    dense_2
])

tied_decoder = keras.models.Sequential([
    DenseTranspose(dense_2, activation="selu"),
    DenseTranspose(dense_1, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])

tied_ae = keras.models.Sequential([tied_encoder, tied_decoder])

tied_ae.compile(loss="binary_crossentropy",
                optimizer=keras.optimizers.SGD(learning_rate=1.5), metrics=[rounded_accuracy])
history = tied_ae.fit(X_train, X_train, epochs=10,
                      validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3269 - rounded_accuracy: 0.8960 - val_loss: 0.3083 - val_rounded_accuracy: 0.9075
Epoch 2/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2975 - rounded_accuracy: 0.9223 - val_loss: 0.2953 - val_rounded_accuracy: 0.9284
Epoch 3/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2920 - rounded_accuracy: 0.9274 - val_loss: 0.3022 - val_rounded_accuracy: 0.9079
Epoch 4/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2889 - rounded_accuracy: 0.9302 - val_loss: 0.2880 - val_rounded_accuracy: 0.9333
Epoch 5/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2865 - rounded_accuracy: 0.9325 - val_loss: 0.2874 - val_rounded_accuracy: 0.9313
Epoch 6/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2850 - rounded_accuracy: 0.9340 - val_loss: 0.2861 - val_rounded_accuracy: 0.9353
Epoch 7/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2838 - rounded_accuracy: 0.9350 - val_loss: 0.2856 - val_rounded_accuracy: 0.9365
Epoch 8/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2830 - rounded_accuracy: 0.9358 - val_loss: 0.2838 - val_rounded_accuracy: 0.9362
Epoch 9/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2823 - rounded_accuracy: 0.9364 - val_loss: 0.2863 - val_rounded_accuracy: 0.9294
Epoch 10/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2817 - rounded_accuracy: 0.9369 - val_loss: 0.2845 - val_rounded_accuracy: 0.9324
In [18]:
show_reconstructions(tied_ae)
plt.show()
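
Weight tying roughly halves the number of parameters, since the decoder reuses the encoder's kernels and only adds its own bias terms. A quick way to check (the exact figures depend on the architecture; here it is roughly 82,000 weights, versus roughly 164,000 for the untied stacked_ae):

print(tied_ae.count_params())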

Training one Autoencoder at a Time

In [19]:
def train_autoencoder(n_neurons, X_train, X_valid, loss, optimizer,
                      n_epochs=10, output_activation=None, metrics=None):
    n_inputs = X_train.shape[-1]
    encoder = keras.models.Sequential([
        keras.layers.Dense(n_neurons, activation="selu", input_shape=[n_inputs])
    ])
    decoder = keras.models.Sequential([
        keras.layers.Dense(n_inputs, activation=output_activation),
    ])
    autoencoder = keras.models.Sequential([encoder, decoder])
    autoencoder.compile(optimizer, loss, metrics=metrics)
    autoencoder.fit(X_train, X_train, epochs=n_epochs,
                    validation_data=(X_valid, X_valid))
    return encoder, decoder, encoder(X_train), encoder(X_valid)
In [20]:
tf.random.set_seed(42)
np.random.seed(42)

K = keras.backend
X_train_flat = K.batch_flatten(X_train) # equivalent to .reshape(-1, 28 * 28)
X_valid_flat = K.batch_flatten(X_valid)
enc1, dec1, X_train_enc1, X_valid_enc1 = train_autoencoder(
    100, X_train_flat, X_valid_flat, "binary_crossentropy",
    keras.optimizers.SGD(learning_rate=1.5), output_activation="sigmoid",
    metrics=[rounded_accuracy])
enc2, dec2, _, _ = train_autoencoder(
    30, X_train_enc1, X_valid_enc1, "mse", keras.optimizers.SGD(learning_rate=0.05),
    output_activation="selu")
Epoch 1/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3445 - rounded_accuracy: 0.8874 - val_loss: 0.3123 - val_rounded_accuracy: 0.9146
Epoch 2/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3039 - rounded_accuracy: 0.9203 - val_loss: 0.3006 - val_rounded_accuracy: 0.9246
Epoch 3/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2949 - rounded_accuracy: 0.9286 - val_loss: 0.2934 - val_rounded_accuracy: 0.9317
Epoch 4/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2891 - rounded_accuracy: 0.9342 - val_loss: 0.2888 - val_rounded_accuracy: 0.9363
Epoch 5/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2853 - rounded_accuracy: 0.9378 - val_loss: 0.2857 - val_rounded_accuracy: 0.9392
Epoch 6/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2827 - rounded_accuracy: 0.9403 - val_loss: 0.2834 - val_rounded_accuracy: 0.9409
Epoch 7/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2807 - rounded_accuracy: 0.9422 - val_loss: 0.2817 - val_rounded_accuracy: 0.9427
Epoch 8/10
1719/1719 [==============================] - 4s 3ms/step - loss: 0.2792 - rounded_accuracy: 0.9437 - val_loss: 0.2803 - val_rounded_accuracy: 0.9440
Epoch 9/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2779 - rounded_accuracy: 0.9449 - val_loss: 0.2792 - val_rounded_accuracy: 0.9450
Epoch 10/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2769 - rounded_accuracy: 0.9459 - val_loss: 0.2783 - val_rounded_accuracy: 0.9462
Epoch 1/10
1719/1719 [==============================] - 4s 2ms/step - loss: 0.5619 - val_loss: 0.3474
Epoch 2/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.2610 - val_loss: 0.2372
Epoch 3/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.2252 - val_loss: 0.2173
Epoch 4/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.2109 - val_loss: 0.2059
Epoch 5/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.2035 - val_loss: 0.1973
Epoch 6/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.1987 - val_loss: 0.1978
Epoch 7/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.1971 - val_loss: 0.2000
Epoch 8/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.1955 - val_loss: 0.2002
Epoch 9/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.1949 - val_loss: 0.1932
Epoch 10/10
1719/1719 [==============================] - 3s 2ms/step - loss: 0.1940 - val_loss: 0.1952
In [21]:
stacked_ae_1_by_1 = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    enc1, enc2, dec2, dec1,
    keras.layers.Reshape([28, 28])
])
In [22]:
show_reconstructions(stacked_ae_1_by_1)
plt.show()
In [23]:
stacked_ae_1_by_1.compile(loss="binary_crossentropy",
                          optimizer=keras.optimizers.SGD(learning_rate=0.1), metrics=[rounded_accuracy])
history = stacked_ae_1_by_1.fit(X_train, X_train, epochs=10,
                                validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2867 - rounded_accuracy: 0.9343 - val_loss: 0.2883 - val_rounded_accuracy: 0.9341
Epoch 2/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2863 - rounded_accuracy: 0.9347 - val_loss: 0.2881 - val_rounded_accuracy: 0.9347
Epoch 3/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2861 - rounded_accuracy: 0.9349 - val_loss: 0.2879 - val_rounded_accuracy: 0.9347
Epoch 4/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2859 - rounded_accuracy: 0.9351 - val_loss: 0.2877 - val_rounded_accuracy: 0.9349
Epoch 5/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2858 - rounded_accuracy: 0.9353 - val_loss: 0.2876 - val_rounded_accuracy: 0.9351
Epoch 6/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2857 - rounded_accuracy: 0.9354 - val_loss: 0.2874 - val_rounded_accuracy: 0.9350
Epoch 7/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2856 - rounded_accuracy: 0.9355 - val_loss: 0.2874 - val_rounded_accuracy: 0.9352
Epoch 8/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2854 - rounded_accuracy: 0.9356 - val_loss: 0.2873 - val_rounded_accuracy: 0.9353
Epoch 9/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2854 - rounded_accuracy: 0.9357 - val_loss: 0.2871 - val_rounded_accuracy: 0.9353
Epoch 10/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2853 - rounded_accuracy: 0.9358 - val_loss: 0.2871 - val_rounded_accuracy: 0.9356
In [24]:
show_reconstructions(stacked_ae_1_by_1)
plt.show()

Using Convolutional Layers Instead of Dense Layers

Let's build a convolutional autoencoder: the encoder reduces the spatial dimensionality using Conv2D and MaxPool2D layers, and the decoder upsamples back to 28×28 using Conv2DTranspose layers.

In [25]:
tf.random.set_seed(42)
np.random.seed(42)

conv_encoder = keras.models.Sequential([
    keras.layers.Reshape([28, 28, 1], input_shape=[28, 28]),
    keras.layers.Conv2D(16, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(32, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2),
    keras.layers.Conv2D(64, kernel_size=3, padding="SAME", activation="selu"),
    keras.layers.MaxPool2D(pool_size=2)
])
conv_decoder = keras.models.Sequential([
    keras.layers.Conv2DTranspose(32, kernel_size=3, strides=2, padding="VALID", activation="selu",
                                 input_shape=[3, 3, 64]),
    keras.layers.Conv2DTranspose(16, kernel_size=3, strides=2, padding="SAME", activation="selu"),
    keras.layers.Conv2DTranspose(1, kernel_size=3, strides=2, padding="SAME", activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
conv_ae = keras.models.Sequential([conv_encoder, conv_decoder])

conv_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0),
                metrics=[rounded_accuracy])
history = conv_ae.fit(X_train, X_train, epochs=5,
                      validation_data=(X_valid, X_valid))
Epoch 1/5
1719/1719 [==============================] - 11s 6ms/step - loss: 0.3018 - rounded_accuracy: 0.9187 - val_loss: 0.2848 - val_rounded_accuracy: 0.9287
Epoch 2/5
1719/1719 [==============================] - 9s 5ms/step - loss: 0.2756 - rounded_accuracy: 0.9413 - val_loss: 0.2729 - val_rounded_accuracy: 0.9455
Epoch 3/5
1719/1719 [==============================] - 9s 5ms/step - loss: 0.2708 - rounded_accuracy: 0.9461 - val_loss: 0.2696 - val_rounded_accuracy: 0.9497
Epoch 4/5
1719/1719 [==============================] - 9s 5ms/step - loss: 0.2682 - rounded_accuracy: 0.9490 - val_loss: 0.2686 - val_rounded_accuracy: 0.9491
Epoch 5/5
1719/1719 [==============================] - 9s 5ms/step - loss: 0.2664 - rounded_accuracy: 0.9509 - val_loss: 0.2671 - val_rounded_accuracy: 0.9509
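
Note how the decoder inverts the encoder's shape reductions. For a Conv2DTranspose layer, "SAME" padding produces an output of size input × stride, while "VALID" produces (input − 1) × stride + kernel_size. So the first decoder layer maps the 3×3 feature maps to (3 − 1) × 2 + 3 = 7×7, and the next two "SAME" layers map 7×7 to 14×14 to 28×28, as the summaries below confirm.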
In [26]:
conv_encoder.summary()
conv_decoder.summary()
Model: "sequential_10"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
reshape_2 (Reshape)          (None, 28, 28, 1)         0         
_________________________________________________________________
conv2d (Conv2D)              (None, 28, 28, 16)        160       
_________________________________________________________________
max_pooling2d (MaxPooling2D) (None, 14, 14, 16)        0         
_________________________________________________________________
conv2d_1 (Conv2D)            (None, 14, 14, 32)        4640      
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 7, 7, 32)          0         
_________________________________________________________________
conv2d_2 (Conv2D)            (None, 7, 7, 64)          18496     
_________________________________________________________________
max_pooling2d_2 (MaxPooling2 (None, 3, 3, 64)          0         
=================================================================
Total params: 23,296
Trainable params: 23,296
Non-trainable params: 0
_________________________________________________________________
Model: "sequential_11"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv2d_transpose (Conv2DTran (None, 7, 7, 32)          18464     
_________________________________________________________________
conv2d_transpose_1 (Conv2DTr (None, 14, 14, 16)        4624      
_________________________________________________________________
conv2d_transpose_2 (Conv2DTr (None, 28, 28, 1)         145       
_________________________________________________________________
reshape_3 (Reshape)          (None, 28, 28)            0         
=================================================================
Total params: 23,233
Trainable params: 23,233
Non-trainable params: 0
_________________________________________________________________
In [27]:
show_reconstructions(conv_ae)
plt.show()

Recurrent Autoencoders

Here each 28×28 image is treated as a sequence of 28 rows of 28 pixels: the encoder's LSTMs compress this sequence into a single 30-dimensional coding, and the decoder's RepeatVector layer feeds that coding to an LSTM 28 times so it can reconstruct the rows one by one.

In [28]:
recurrent_encoder = keras.models.Sequential([
    keras.layers.LSTM(100, return_sequences=True, input_shape=[28, 28]),
    keras.layers.LSTM(30)
])
recurrent_decoder = keras.models.Sequential([
    keras.layers.RepeatVector(28, input_shape=[30]),
    keras.layers.LSTM(100, return_sequences=True),
    keras.layers.TimeDistributed(keras.layers.Dense(28, activation="sigmoid"))
])
recurrent_ae = keras.models.Sequential([recurrent_encoder, recurrent_decoder])
recurrent_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(0.1),
                     metrics=[rounded_accuracy])
In [29]:
history = recurrent_ae.fit(X_train, X_train, epochs=10, validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 26s 15ms/step - loss: 0.5192 - rounded_accuracy: 0.7492 - val_loss: 0.4581 - val_rounded_accuracy: 0.8081
Epoch 2/10
1719/1719 [==============================] - 25s 14ms/step - loss: 0.4050 - rounded_accuracy: 0.8432 - val_loss: 0.3747 - val_rounded_accuracy: 0.8690
Epoch 3/10
1719/1719 [==============================] - 25s 15ms/step - loss: 0.3653 - rounded_accuracy: 0.8710 - val_loss: 0.3603 - val_rounded_accuracy: 0.8767
Epoch 4/10
1719/1719 [==============================] - 25s 15ms/step - loss: 0.3507 - rounded_accuracy: 0.8809 - val_loss: 0.3525 - val_rounded_accuracy: 0.8771
Epoch 5/10
1719/1719 [==============================] - 25s 14ms/step - loss: 0.3405 - rounded_accuracy: 0.8875 - val_loss: 0.3362 - val_rounded_accuracy: 0.8924
Epoch 6/10
1719/1719 [==============================] - 25s 14ms/step - loss: 0.3335 - rounded_accuracy: 0.8922 - val_loss: 0.3305 - val_rounded_accuracy: 0.8967
Epoch 7/10
1719/1719 [==============================] - 25s 15ms/step - loss: 0.3286 - rounded_accuracy: 0.8955 - val_loss: 0.3329 - val_rounded_accuracy: 0.8933
Epoch 8/10
1719/1719 [==============================] - 25s 15ms/step - loss: 0.3250 - rounded_accuracy: 0.8981 - val_loss: 0.3275 - val_rounded_accuracy: 0.8920
Epoch 9/10
1719/1719 [==============================] - 25s 15ms/step - loss: 0.3222 - rounded_accuracy: 0.9002 - val_loss: 0.3248 - val_rounded_accuracy: 0.9011
Epoch 10/10
1719/1719 [==============================] - 25s 15ms/step - loss: 0.3198 - rounded_accuracy: 0.9021 - val_loss: 0.3234 - val_rounded_accuracy: 0.8985
In [30]:
show_reconstructions(recurrent_ae)
plt.show()

Stacked denoising Autoencoder

Using Gaussian noise (note that the GaussianNoise layer is only active during training, so below we call it with training=True to visualize what the noisy inputs look like):

In [31]:
tf.random.set_seed(42)
np.random.seed(42)

denoising_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.GaussianNoise(0.2),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu")
])
denoising_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
denoising_ae = keras.models.Sequential([denoising_encoder, denoising_decoder])
denoising_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0),
                     metrics=[rounded_accuracy])
history = denoising_ae.fit(X_train, X_train, epochs=10,
                           validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3499 - rounded_accuracy: 0.8774 - val_loss: 0.3174 - val_rounded_accuracy: 0.9052
Epoch 2/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3122 - rounded_accuracy: 0.9095 - val_loss: 0.3086 - val_rounded_accuracy: 0.9122
Epoch 3/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3063 - rounded_accuracy: 0.9146 - val_loss: 0.3043 - val_rounded_accuracy: 0.9182
Epoch 4/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3028 - rounded_accuracy: 0.9177 - val_loss: 0.3002 - val_rounded_accuracy: 0.9217
Epoch 5/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2998 - rounded_accuracy: 0.9203 - val_loss: 0.2976 - val_rounded_accuracy: 0.9237
Epoch 6/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2977 - rounded_accuracy: 0.9222 - val_loss: 0.2955 - val_rounded_accuracy: 0.9265
Epoch 7/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2961 - rounded_accuracy: 0.9234 - val_loss: 0.2945 - val_rounded_accuracy: 0.9270
Epoch 8/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.2949 - rounded_accuracy: 0.9247 - val_loss: 0.2941 - val_rounded_accuracy: 0.9283
Epoch 9/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2939 - rounded_accuracy: 0.9255 - val_loss: 0.2934 - val_rounded_accuracy: 0.9255
Epoch 10/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.2929 - rounded_accuracy: 0.9264 - val_loss: 0.2918 - val_rounded_accuracy: 0.9271
In [32]:
tf.random.set_seed(42)
np.random.seed(42)

noise = keras.layers.GaussianNoise(0.2)
show_reconstructions(denoising_ae, noise(X_valid, training=True))
plt.show()

Using dropout (again, the Dropout layer is only active during training):

In [33]:
tf.random.set_seed(42)
np.random.seed(42)

dropout_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="selu")
])
dropout_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
dropout_ae = keras.models.Sequential([dropout_encoder, dropout_decoder])
dropout_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0),
                   metrics=[rounded_accuracy])
history = dropout_ae.fit(X_train, X_train, epochs=10,
                         validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3568 - rounded_accuracy: 0.8710 - val_loss: 0.3200 - val_rounded_accuracy: 0.9041
Epoch 2/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3182 - rounded_accuracy: 0.9032 - val_loss: 0.3125 - val_rounded_accuracy: 0.9110
Epoch 3/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3128 - rounded_accuracy: 0.9075 - val_loss: 0.3075 - val_rounded_accuracy: 0.9153
Epoch 4/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3092 - rounded_accuracy: 0.9102 - val_loss: 0.3041 - val_rounded_accuracy: 0.9178
Epoch 5/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3067 - rounded_accuracy: 0.9123 - val_loss: 0.3015 - val_rounded_accuracy: 0.9193
Epoch 6/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3048 - rounded_accuracy: 0.9139 - val_loss: 0.3014 - val_rounded_accuracy: 0.9172
Epoch 7/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3033 - rounded_accuracy: 0.9151 - val_loss: 0.2995 - val_rounded_accuracy: 0.9210
Epoch 8/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3022 - rounded_accuracy: 0.9159 - val_loss: 0.2978 - val_rounded_accuracy: 0.9229
Epoch 9/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3012 - rounded_accuracy: 0.9167 - val_loss: 0.2971 - val_rounded_accuracy: 0.9221
Epoch 10/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3003 - rounded_accuracy: 0.9175 - val_loss: 0.2959 - val_rounded_accuracy: 0.9238
In [34]:
tf.random.set_seed(42)
np.random.seed(42)

dropout = keras.layers.Dropout(0.5)
show_reconstructions(dropout_ae, dropout(X_valid, training=True))
save_fig("dropout_denoising_plot", tight_layout=False)
Saving figure dropout_denoising_plot

Sparse Autoencoder

Let's build a simple stacked autoencoder, so we can compare it to the sparse autoencoders we will build. This time we will use the sigmoid activation function for the coding layer, to ensure that the coding values range from 0 to 1:

In [35]:
tf.random.set_seed(42)
np.random.seed(42)

simple_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(30, activation="sigmoid"),
])
simple_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[30]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
simple_ae = keras.models.Sequential([simple_encoder, simple_decoder])
simple_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.),
                  metrics=[rounded_accuracy])
history = simple_ae.fit(X_train, X_train, epochs=10,
                        validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.4329 - rounded_accuracy: 0.7950 - val_loss: 0.3773 - val_rounded_accuracy: 0.8492
Epoch 2/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3612 - rounded_accuracy: 0.8668 - val_loss: 0.3514 - val_rounded_accuracy: 0.8797
Epoch 3/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3410 - rounded_accuracy: 0.8852 - val_loss: 0.3367 - val_rounded_accuracy: 0.8912
Epoch 4/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3288 - rounded_accuracy: 0.8954 - val_loss: 0.3263 - val_rounded_accuracy: 0.8992
Epoch 5/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3213 - rounded_accuracy: 0.9012 - val_loss: 0.3210 - val_rounded_accuracy: 0.9032
Epoch 6/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3176 - rounded_accuracy: 0.9038 - val_loss: 0.3179 - val_rounded_accuracy: 0.9050
Epoch 7/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3150 - rounded_accuracy: 0.9060 - val_loss: 0.3161 - val_rounded_accuracy: 0.9090
Epoch 8/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3129 - rounded_accuracy: 0.9079 - val_loss: 0.3154 - val_rounded_accuracy: 0.9108
Epoch 9/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3109 - rounded_accuracy: 0.9096 - val_loss: 0.3133 - val_rounded_accuracy: 0.9085
Epoch 10/10
1719/1719 [==============================] - 5s 3ms/step - loss: 0.3092 - rounded_accuracy: 0.9112 - val_loss: 0.3098 - val_rounded_accuracy: 0.9118
In [36]:
show_reconstructions(simple_ae)
plt.show()

Let's create a couple of functions to plot nice activation histograms:

In [37]:
def plot_percent_hist(ax, data, bins):
    counts, _ = np.histogram(data, bins=bins)
    widths = bins[1:] - bins[:-1]
    x = bins[:-1] + widths / 2
    ax.bar(x, counts / len(data), width=widths*0.8)
    ax.xaxis.set_ticks(bins)
    ax.yaxis.set_major_formatter(mpl.ticker.FuncFormatter(
        lambda y, position: "{}%".format(int(np.round(100 * y)))))
    ax.grid(True)
In [38]:
def plot_activations_histogram(encoder, height=1, n_bins=10):
    X_valid_codings = encoder(X_valid).numpy()
    activation_means = X_valid_codings.mean(axis=0)
    mean = activation_means.mean()
    bins = np.linspace(0, 1, n_bins + 1)

    fig, [ax1, ax2] = plt.subplots(figsize=(10, 3), nrows=1, ncols=2, sharey=True)
    plot_percent_hist(ax1, X_valid_codings.ravel(), bins)
    ax1.plot([mean, mean], [0, height], "k--", label="Overall Mean = {:.2f}".format(mean))
    ax1.legend(loc="upper center", fontsize=14)
    ax1.set_xlabel("Activation")
    ax1.set_ylabel("% Activations")
    ax1.axis([0, 1, 0, height])
    plot_percent_hist(ax2, activation_means, bins)
    ax2.plot([mean, mean], [0, height], "k--")
    ax2.set_xlabel("Neuron Mean Activation")
    ax2.set_ylabel("% Neurons")
    ax2.axis([0, 1, 0, height])

Let's use these functions to plot histograms of the activations of the encoding layer. The histogram on the left shows the distribution of all the activations. You can see that values close to 0 or 1 are more frequent overall, which is consistent with the saturating nature of the sigmoid function. The histogram on the right shows the distribution of mean neuron activations: you can see that most neurons have a mean activation close to 0.5. Both histograms tell us that each neuron tends to either fire close to 0 or 1, with about 50% probability each. However, some neurons fire almost all the time (right side of the right histogram).

In [39]:
plot_activations_histogram(simple_encoder, height=0.35)
plt.show()

Now let's add $\ell_1$ regularization to the coding layer:

In [40]:
tf.random.set_seed(42)
np.random.seed(42)

sparse_l1_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(300, activation="sigmoid"),
    keras.layers.ActivityRegularization(l1=1e-3)  # Alternatively, you could add
                                                  # activity_regularizer=keras.regularizers.l1(1e-3)
                                                  # to the previous layer.
])
sparse_l1_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[300]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
sparse_l1_ae = keras.models.Sequential([sparse_l1_encoder, sparse_l1_decoder])
sparse_l1_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0),
                     metrics=[rounded_accuracy])
history = sparse_l1_ae.fit(X_train, X_train, epochs=10,
                           validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.4310 - rounded_accuracy: 0.8129 - val_loss: 0.3808 - val_rounded_accuracy: 0.8555
Epoch 2/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3690 - rounded_accuracy: 0.8689 - val_loss: 0.3638 - val_rounded_accuracy: 0.8741
Epoch 3/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3545 - rounded_accuracy: 0.8799 - val_loss: 0.3502 - val_rounded_accuracy: 0.8857
Epoch 4/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3437 - rounded_accuracy: 0.8876 - val_loss: 0.3418 - val_rounded_accuracy: 0.8898
Epoch 5/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3372 - rounded_accuracy: 0.8920 - val_loss: 0.3368 - val_rounded_accuracy: 0.8949
Epoch 6/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3320 - rounded_accuracy: 0.8968 - val_loss: 0.3316 - val_rounded_accuracy: 0.8992
Epoch 7/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3280 - rounded_accuracy: 0.8999 - val_loss: 0.3278 - val_rounded_accuracy: 0.9030
Epoch 8/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3246 - rounded_accuracy: 0.9027 - val_loss: 0.3268 - val_rounded_accuracy: 0.9052
Epoch 9/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3221 - rounded_accuracy: 0.9049 - val_loss: 0.3232 - val_rounded_accuracy: 0.9028
Epoch 10/10
1719/1719 [==============================] - 6s 3ms/step - loss: 0.3202 - rounded_accuracy: 0.9064 - val_loss: 0.3210 - val_rounded_accuracy: 0.9083
In [41]:
show_reconstructions(sparse_l1_ae)
In [42]:
plot_activations_histogram(sparse_l1_encoder, height=1.)
plt.show()

Let's use the KL Divergence loss instead to ensure sparsity, and target 10% sparsity rather than 0%:

In [43]:
p = 0.1
q = np.linspace(0.001, 0.999, 500)
kl_div = p * np.log(p / q) + (1 - p) * np.log((1 - p) / (1 - q))
mse = (p - q)**2
mae = np.abs(p - q)
plt.plot([p, p], [0, 0.3], "k:")
plt.text(0.05, 0.32, "Target\nsparsity", fontsize=14)
plt.plot(q, kl_div, "b-", label="KL divergence")
plt.plot(q, mae, "g--", label=r"MAE ($\ell_1$)")
plt.plot(q, mse, "r--", linewidth=1, label=r"MSE ($\ell_2$)")
plt.legend(loc="upper left", fontsize=14)
plt.xlabel("Actual sparsity")
plt.ylabel("Cost", rotation=0)
plt.axis([0, 1, 0, 0.95])
save_fig("sparsity_loss_plot")
Saving figure sparsity_loss_plot
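
For a target sparsity $p$ and an actual mean activation $q$, the sparsity loss plotted above is the KL divergence $D_{\mathrm{KL}}(p \parallel q) = p \log\frac{p}{q} + (1 - p)\log\frac{1 - p}{1 - q}$. The regularizer below applies it to each coding neuron's mean activation over the batch, sums across neurons, and scales the result by a weight hyperparameter: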
In [44]:
K = keras.backend
kl_divergence = keras.losses.kullback_leibler_divergence

class KLDivergenceRegularizer(keras.regularizers.Regularizer):
    def __init__(self, weight, target=0.1):
        self.weight = weight
        self.target = target
    def __call__(self, inputs):
        # each neuron's mean activation over the batch estimates its actual sparsity
        mean_activities = K.mean(inputs, axis=0)
        return self.weight * (
            kl_divergence(self.target, mean_activities) +
            kl_divergence(1. - self.target, 1. - mean_activities))
In [45]:
tf.random.set_seed(42)
np.random.seed(42)

kld_reg = KLDivergenceRegularizer(weight=0.05, target=0.1)
sparse_kl_encoder = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(300, activation="sigmoid", activity_regularizer=kld_reg)
])
sparse_kl_decoder = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[300]),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
sparse_kl_ae = keras.models.Sequential([sparse_kl_encoder, sparse_kl_decoder])
sparse_kl_ae.compile(loss="binary_crossentropy", optimizer=keras.optimizers.SGD(learning_rate=1.0),
              metrics=[rounded_accuracy])
history = sparse_kl_ae.fit(X_train, X_train, epochs=10,
                           validation_data=(X_valid, X_valid))
Epoch 1/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.4150 - rounded_accuracy: 0.8121 - val_loss: 0.3716 - val_rounded_accuracy: 0.8564
Epoch 2/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3531 - rounded_accuracy: 0.8763 - val_loss: 0.3442 - val_rounded_accuracy: 0.8847
Epoch 3/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3340 - rounded_accuracy: 0.8918 - val_loss: 0.3293 - val_rounded_accuracy: 0.8975
Epoch 4/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3224 - rounded_accuracy: 0.9018 - val_loss: 0.3213 - val_rounded_accuracy: 0.9043
Epoch 5/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3169 - rounded_accuracy: 0.9063 - val_loss: 0.3171 - val_rounded_accuracy: 0.9078
Epoch 6/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3135 - rounded_accuracy: 0.9093 - val_loss: 0.3140 - val_rounded_accuracy: 0.9105
Epoch 7/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3107 - rounded_accuracy: 0.9117 - val_loss: 0.3115 - val_rounded_accuracy: 0.9130
Epoch 8/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3084 - rounded_accuracy: 0.9137 - val_loss: 0.3092 - val_rounded_accuracy: 0.9149
Epoch 9/10
1719/1719 [==============================] - 6s 4ms/step - loss: 0.3063 - rounded_accuracy: 0.9155 - val_loss: 0.3074 - val_rounded_accuracy: 0.9144
Epoch 10/10
1719/1719 [==============================] - 7s 4ms/step - loss: 0.3043 - rounded_accuracy: 0.9171 - val_loss: 0.3054 - val_rounded_accuracy: 0.9169
In [46]:
show_reconstructions(sparse_kl_ae)
In [47]:
plot_activations_histogram(sparse_kl_encoder)
save_fig("sparse_autoencoder_plot")
plt.show()
Saving figure sparse_autoencoder_plot

Variational Autoencoder

In [48]:
class Sampling(keras.layers.Layer):
    def call(self, inputs):
        mean, log_var = inputs
        # reparameterization trick: z = mean + sigma * epsilon, with sigma = exp(log_var / 2)
        return K.random_normal(tf.shape(log_var)) * K.exp(log_var / 2) + mean
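
The latent loss added below is the KL divergence between the coding distribution and a standard Gaussian. For each instance it equals $-\frac{1}{2}\sum_i\left(1 + \log\sigma_i^2 - \sigma_i^2 - \mu_i^2\right)$, where the sum runs over the coding dimensions; it is then averaged over the batch and divided by 784 to put it on the same per-pixel scale as the reconstruction loss.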
In [49]:
tf.random.set_seed(42)
np.random.seed(42)

codings_size = 10

inputs = keras.layers.Input(shape=[28, 28])
z = keras.layers.Flatten()(inputs)
z = keras.layers.Dense(150, activation="selu")(z)
z = keras.layers.Dense(100, activation="selu")(z)
codings_mean = keras.layers.Dense(codings_size)(z)
codings_log_var = keras.layers.Dense(codings_size)(z)
codings = Sampling()([codings_mean, codings_log_var])
variational_encoder = keras.models.Model(
    inputs=[inputs], outputs=[codings_mean, codings_log_var, codings])

decoder_inputs = keras.layers.Input(shape=[codings_size])
x = keras.layers.Dense(100, activation="selu")(decoder_inputs)
x = keras.layers.Dense(150, activation="selu")(x)
x = keras.layers.Dense(28 * 28, activation="sigmoid")(x)
outputs = keras.layers.Reshape([28, 28])(x)
variational_decoder = keras.models.Model(inputs=[decoder_inputs], outputs=[outputs])

_, _, codings = variational_encoder(inputs)
reconstructions = variational_decoder(codings)
variational_ae = keras.models.Model(inputs=[inputs], outputs=[reconstructions])

latent_loss = -0.5 * K.sum(
    1 + codings_log_var - K.exp(codings_log_var) - K.square(codings_mean),
    axis=-1)
variational_ae.add_loss(K.mean(latent_loss) / 784.)
variational_ae.compile(loss="binary_crossentropy", optimizer="rmsprop", metrics=[rounded_accuracy])
history = variational_ae.fit(X_train, X_train, epochs=25, batch_size=128,
                             validation_data=(X_valid, X_valid))
Epoch 1/25
430/430 [==============================] - 3s 7ms/step - loss: 0.3893 - rounded_accuracy: 0.8611 - val_loss: 0.3486 - val_rounded_accuracy: 0.8952
Epoch 2/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3422 - rounded_accuracy: 0.8982 - val_loss: 0.3372 - val_rounded_accuracy: 0.9051
Epoch 3/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3326 - rounded_accuracy: 0.9054 - val_loss: 0.3338 - val_rounded_accuracy: 0.9082
Epoch 4/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3279 - rounded_accuracy: 0.9091 - val_loss: 0.3305 - val_rounded_accuracy: 0.9069
Epoch 5/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3249 - rounded_accuracy: 0.9119 - val_loss: 0.3291 - val_rounded_accuracy: 0.9117
Epoch 6/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3227 - rounded_accuracy: 0.9136 - val_loss: 0.3267 - val_rounded_accuracy: 0.9085
Epoch 7/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3210 - rounded_accuracy: 0.9149 - val_loss: 0.3215 - val_rounded_accuracy: 0.9139
Epoch 8/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3196 - rounded_accuracy: 0.9162 - val_loss: 0.3204 - val_rounded_accuracy: 0.9181
Epoch 9/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3186 - rounded_accuracy: 0.9169 - val_loss: 0.3242 - val_rounded_accuracy: 0.9063
Epoch 10/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3177 - rounded_accuracy: 0.9176 - val_loss: 0.3208 - val_rounded_accuracy: 0.9182
Epoch 11/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3170 - rounded_accuracy: 0.9183 - val_loss: 0.3183 - val_rounded_accuracy: 0.9202
Epoch 12/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3164 - rounded_accuracy: 0.9189 - val_loss: 0.3199 - val_rounded_accuracy: 0.9199
Epoch 13/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3159 - rounded_accuracy: 0.9192 - val_loss: 0.3176 - val_rounded_accuracy: 0.9205
Epoch 14/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3153 - rounded_accuracy: 0.9195 - val_loss: 0.3168 - val_rounded_accuracy: 0.9185
Epoch 15/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3148 - rounded_accuracy: 0.9198 - val_loss: 0.3166 - val_rounded_accuracy: 0.9180
Epoch 16/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3145 - rounded_accuracy: 0.9201 - val_loss: 0.3175 - val_rounded_accuracy: 0.9205
Epoch 17/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3141 - rounded_accuracy: 0.9205 - val_loss: 0.3162 - val_rounded_accuracy: 0.9216
Epoch 18/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3138 - rounded_accuracy: 0.9206 - val_loss: 0.3170 - val_rounded_accuracy: 0.9206
Epoch 19/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3135 - rounded_accuracy: 0.9209 - val_loss: 0.3181 - val_rounded_accuracy: 0.9204
Epoch 20/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3132 - rounded_accuracy: 0.9210 - val_loss: 0.3145 - val_rounded_accuracy: 0.9222
Epoch 21/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3130 - rounded_accuracy: 0.9212 - val_loss: 0.3176 - val_rounded_accuracy: 0.9174
Epoch 22/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3127 - rounded_accuracy: 0.9214 - val_loss: 0.3150 - val_rounded_accuracy: 0.9207
Epoch 23/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3125 - rounded_accuracy: 0.9214 - val_loss: 0.3162 - val_rounded_accuracy: 0.9218
Epoch 24/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3123 - rounded_accuracy: 0.9218 - val_loss: 0.3144 - val_rounded_accuracy: 0.9228
Epoch 25/25
430/430 [==============================] - 3s 6ms/step - loss: 0.3120 - rounded_accuracy: 0.9219 - val_loss: 0.3150 - val_rounded_accuracy: 0.9218
In [50]:
show_reconstructions(variational_ae)
plt.show()

Generate Fashion Images

In [51]:
def plot_multiple_images(images, n_cols=None):
    n_cols = n_cols or len(images)
    n_rows = (len(images) - 1) // n_cols + 1
    if images.shape[-1] == 1:
        images = np.squeeze(images, axis=-1)
    plt.figure(figsize=(n_cols, n_rows))
    for index, image in enumerate(images):
        plt.subplot(n_rows, n_cols, index + 1)
        plt.imshow(image, cmap="binary")
        plt.axis("off")

Let's generate a few random codings, decode them and plot the resulting images:

In [52]:
tf.random.set_seed(42)

codings = tf.random.normal(shape=[12, codings_size])
images = variational_decoder(codings).numpy()
plot_multiple_images(images, 4)
save_fig("vae_generated_images_plot", tight_layout=False)
Saving figure vae_generated_images_plot

Now let's perform semantic interpolation between these images: we arrange the 12 codings in a 3×4 grid and use tf.image.resize() (bilinear interpolation by default) to upsample the grid to 5×7, which interpolates between neighboring codings before decoding:

In [53]:
tf.random.set_seed(42)
np.random.seed(42)

codings_grid = tf.reshape(codings, [1, 3, 4, codings_size])
larger_grid = tf.image.resize(codings_grid, size=[5, 7])
interpolated_codings = tf.reshape(larger_grid, [-1, codings_size])
images = variational_decoder(interpolated_codings).numpy()

plt.figure(figsize=(7, 5))
for index, image in enumerate(images):
    plt.subplot(5, 7, index + 1)
    if index % 7 % 2 == 0 and index // 7 % 2 == 0:
        plt.gca().get_xaxis().set_visible(False)
        plt.gca().get_yaxis().set_visible(False)
    else:
        plt.axis("off")
    plt.imshow(image, cmap="binary")
save_fig("semantic_interpolation_plot", tight_layout=False)
Saving figure semantic_interpolation_plot

Generative Adversarial Networks

In [54]:
np.random.seed(42)
tf.random.set_seed(42)

codings_size = 30

generator = keras.models.Sequential([
    keras.layers.Dense(100, activation="selu", input_shape=[codings_size]),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(28 * 28, activation="sigmoid"),
    keras.layers.Reshape([28, 28])
])
discriminator = keras.models.Sequential([
    keras.layers.Flatten(input_shape=[28, 28]),
    keras.layers.Dense(150, activation="selu"),
    keras.layers.Dense(100, activation="selu"),
    keras.layers.Dense(1, activation="sigmoid")
])
gan = keras.models.Sequential([generator, discriminator])
In [55]:
discriminator.compile(loss="binary_crossentropy", optimizer="rmsprop")
discriminator.trainable = False  # taken into account only when compiling: the discriminator
                                 # is frozen inside gan, but still learns when trained directly
gan.compile(loss="binary_crossentropy", optimizer="rmsprop")
In [56]:
batch_size = 32
dataset = tf.data.Dataset.from_tensor_slices(X_train).shuffle(1000)
dataset = dataset.batch(batch_size, drop_remainder=True).prefetch(1)  # constant batch size,
                                                                      # as the labels below assume
In [57]:
def train_gan(gan, dataset, batch_size, codings_size, n_epochs=50):
    generator, discriminator = gan.layers
    for epoch in range(n_epochs):
        print("Epoch {}/{}".format(epoch + 1, n_epochs))              # not shown in the book
        for X_batch in dataset:
            # phase 1 - training the discriminator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            generated_images = generator(noise)
            X_fake_and_real = tf.concat([generated_images, X_batch], axis=0)
            y1 = tf.constant([[0.]] * batch_size + [[1.]] * batch_size)
            discriminator.trainable = True
            discriminator.train_on_batch(X_fake_and_real, y1)
            # phase 2 - training the generator
            noise = tf.random.normal(shape=[batch_size, codings_size])
            y2 = tf.constant([[1.]] * batch_size)  # label fakes as "real": the generator
                                                   # is trained to fool the discriminator
            discriminator.trainable = False
            gan.train_on_batch(noise, y2)
        plot_multiple_images(generated_images, 8)                     # not shown
        plt.show()                                                    # not shown
In [58]:
train_gan(gan, dataset, batch_size, codings_size, n_epochs=1)
Epoch 1/1
In [59]:
tf.random.set_seed(42)
np.random.seed(42)

noise = tf.random.normal(shape=[batch_size, codings_size])
generated_images = generator(noise)
plot_multiple_images(generated_images, 8)
save_fig("gan_generated_images_plot", tight_layout=False)
Saving figure gan_generated_images_plot
In [60]:
train_gan(gan, dataset, batch_size, codings_size)
Epoch 1/50