Fashion-MNIST is a dataset of Zalando's article images—consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from 10 classes. The dataset serves as a direct drop-in replacement for the original MNIST dataset for benchmarking machine learning algorithms. It shares the same image size and structure of training and testing splits.
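For reference, the ten labels map to the following article types (this list comes from the Fashion-MNIST documentation); I'll come back to it when looking at per-class performance:
# Fashion-MNIST class names, indexed 0-9 (from the dataset documentation)
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']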
After downloading the dataset, I checked that the files are in the appropriate folder.
import numpy as np
import pandas as pd
# Input data files are available in the "data/" directory.
# For example, running this will list the files in the input directory
from subprocess import check_output
print(check_output(["ls", "data"]).decode("utf8"))
fashion-mnist_test.csv fashion-mnist_train.csv t10k-images-idx3-ubyte t10k-labels-idx1-ubyte train-images-idx3-ubyte train-labels-idx1-ubyte
In this work, I will train a very simple Convolutional Neural Network classifier with one convolution layer using the Keras deep learning library. The model is first trained for 10 epochs with a batch size of 256, compiled with the categorical_crossentropy loss function and the Adam optimizer. Then I add data augmentation, which generates new training samples by rotating, shifting, and zooming in on the existing samples, and train for another 50 epochs.
I will first split the original training data (60,000 images) into 80% training (48,000 images) and 20% validation (12,000 images) to optimize the classifier, while keeping the test data (10,000 images) aside to finally evaluate the accuracy of the model on data it has never seen. This helps me see whether I'm over-fitting on the training data: if validation accuracy stays higher than training accuracy, I can lower the learning rate and train for more epochs; if training accuracy climbs above validation accuracy, it's time to stop training.
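If I wanted to automate that stopping decision instead of eyeballing the curves, Keras provides an EarlyStopping callback. Here is a minimal sketch, not used in the runs below:
from keras.callbacks import EarlyStopping
# Sketch: stop training once val_loss stops improving for 3 consecutive epochs.
# Not used in this notebook; pass callbacks=[early_stop] to fit() to enable it.
early_stop = EarlyStopping(monitor='val_loss', patience=3, verbose=1)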
# Import libraries
from keras.utils import to_categorical
from sklearn.model_selection import train_test_split
# Load training and test data into dataframes
data_train = pd.read_csv('data/fashion-mnist_train.csv')
data_test = pd.read_csv('data/fashion-mnist_test.csv')
# X forms the training images, and y forms the training labels
X = np.array(data_train.iloc[:, 1:])
y = to_categorical(np.array(data_train.iloc[:, 0]))
# Here I split original training data to sub-training (80%) and validation data (20%)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=13)
# X_test forms the test images, and y_test forms the test labels
X_test = np.array(data_test.iloc[:, 1:])
y_test = to_categorical(np.array(data_test.iloc[:, 0]))
After loading and splitting the data, I preprocess them by reshaping them into the shape the network expects and scaling them so that all values fall in the [0, 1] interval. As loaded from the CSV, the training images are stored in an array of shape (48000, 784) with integer values in the [0, 255] interval. I transform it into a float32 array of shape (48000, 28, 28, 1) with values between 0 and 1.
# Each image's dimension is 28 x 28
img_rows, img_cols = 28, 28
input_shape = (img_rows, img_cols, 1)
# Prepare the training images
X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
X_train = X_train.astype('float32')
X_train /= 255
# Prepare the test images
X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
X_test = X_test.astype('float32')
X_test /= 255
# Prepare the validation images
X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1)
X_val = X_val.astype('float32')
X_val /= 255
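A quick sanity check confirms the arrays now have the shape and value range the network expects (expected outputs noted as comments):
# Shapes should be (48000, 28, 28, 1), (12000, 28, 28, 1), (10000, 28, 28, 1)
print(X_train.shape, X_val.shape, X_test.shape)
# Pixel values should now lie in [0.0, 1.0]
print(X_train.min(), X_train.max())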
This CNN takes as input tensors of shape (image_height, image_width, image_channels). In this case, I configure the CNN to process inputs of size (28, 28, 1), which is the format of the Fashion-MNIST images. I do this by passing the argument input_shape=(28, 28, 1) to the first layer.
# Import Keras libraries
import keras
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten
from keras.layers import Conv2D, MaxPooling2D
cnn1 = Sequential()
cnn1.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=input_shape))
cnn1.add(MaxPooling2D(pool_size=(2, 2)))
cnn1.add(Dropout(0.2))
cnn1.add(Flatten())
cnn1.add(Dense(128, activation='relu'))
cnn1.add(Dense(10, activation='softmax'))
When compiling the model, I choose categorical_crossentropy as the loss function (the appropriate choice for a multiclass, single-label classification problem) and the Adam optimizer.
cnn1.compile(loss=keras.losses.categorical_crossentropy,
optimizer=keras.optimizers.Adam(),
metrics=['accuracy'])
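As an aside, the same model could be compiled with sparse_categorical_crossentropy, in which case the labels stay as integer class indices and the to_categorical() step above becomes unnecessary. A sketch of that alternative, not used here:
# Alternative sketch: integer labels instead of one-hot vectors (not used here)
# cnn1.compile(loss='sparse_categorical_crossentropy',
#              optimizer=keras.optimizers.Adam(),
#              metrics=['accuracy'])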
Let's display the architecture of this simple CNN model:
cnn1.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_1 (Conv2D)            (None, 26, 26, 32)        320
_________________________________________________________________
max_pooling2d_1 (MaxPooling2 (None, 13, 13, 32)        0
_________________________________________________________________
dropout_1 (Dropout)          (None, 13, 13, 32)        0
_________________________________________________________________
flatten_1 (Flatten)          (None, 5408)              0
_________________________________________________________________
dense_1 (Dense)              (None, 128)               692352
_________________________________________________________________
dense_2 (Dense)              (None, 10)                1290
=================================================================
Total params: 693,962
Trainable params: 693,962
Non-trainable params: 0
_________________________________________________________________
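The parameter counts are easy to verify by hand: the convolution layer has (3 × 3 × 1 + 1) × 32 = 320 weights, the 13 × 13 × 32 feature map flattens to 5,408 units, so the first dense layer holds 5,408 × 128 + 128 = 692,352 parameters, and the output layer adds 128 × 10 + 10 = 1,290, for 693,962 in total.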
As previously mentioned, I train the model with a batch size of 256 for 10 epochs, validating on the held-out validation data after each epoch.
history1 = cnn1.fit(X_train, y_train,
batch_size=256,
epochs=10,
verbose=1,
validation_data=(X_val, y_val))
Train on 48000 samples, validate on 12000 samples
Epoch 1/10
48000/48000 [==============================] - 30s 621us/step - loss: 0.5460 - acc: 0.8085 - val_loss: 0.3817 - val_acc: 0.8690
Epoch 2/10
48000/48000 [==============================] - 31s 649us/step - loss: 0.3627 - acc: 0.8712 - val_loss: 0.3390 - val_acc: 0.8828
Epoch 3/10
48000/48000 [==============================] - 30s 635us/step - loss: 0.3191 - acc: 0.8868 - val_loss: 0.3118 - val_acc: 0.8906
Epoch 4/10
48000/48000 [==============================] - 31s 645us/step - loss: 0.2957 - acc: 0.8940 - val_loss: 0.3012 - val_acc: 0.8938
Epoch 5/10
48000/48000 [==============================] - 42s 865us/step - loss: 0.2718 - acc: 0.9034 - val_loss: 0.2881 - val_acc: 0.8968
Epoch 6/10
48000/48000 [==============================] - 38s 800us/step - loss: 0.2587 - acc: 0.9087 - val_loss: 0.2890 - val_acc: 0.8938
Epoch 7/10
48000/48000 [==============================] - 30s 623us/step - loss: 0.2461 - acc: 0.9108 - val_loss: 0.2727 - val_acc: 0.9018
Epoch 8/10
48000/48000 [==============================] - 32s 670us/step - loss: 0.2318 - acc: 0.9159 - val_loss: 0.2592 - val_acc: 0.9048
Epoch 9/10
48000/48000 [==============================] - 32s 671us/step - loss: 0.2200 - acc: 0.9210 - val_loss: 0.2691 - val_acc: 0.9046
Epoch 10/10
48000/48000 [==============================] - 33s 688us/step - loss: 0.2109 - acc: 0.9241 - val_loss: 0.2559 - val_acc: 0.9081
score1 = cnn1.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score1[0])
print('Test accuracy:', score1[1])
Test loss: 0.2484699842095375
Test accuracy: 0.9104
The test accuracy is 91.04%, which is quite good for such a simple model!
Overfitting can be caused by having too few samples to learn from, which makes it impossible to train a model that generalizes to new data. Given infinite training data, the model would be exposed to every possible aspect of the data distribution at hand and would never overfit.
Data augmentation takes the approach of generating more training data from existing training samples, by augmenting the samples via a number of random transformations that yield believable-looking images. The goal is that at training time, my model will never see the exact same picture twice. This helps expose the model to more aspects of the data and generalize better.
In Keras, this can be done by configuring a number of random transformations to be performed on the images read by the ImageDataGenerator instance.
from keras.preprocessing.image import ImageDataGenerator
gen = ImageDataGenerator(rotation_range=8, width_shift_range=0.08, shear_range=0.3,
height_shift_range=0.08, zoom_range=0.08)
batches = gen.flow(X_train, y_train, batch_size=256)
val_batches = gen.flow(X_val, y_val, batch_size=256)
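Before training, it's worth peeking at what the generator actually produces. A short sketch (reusing the gen and X_train defined above) that plots nine randomly augmented variants of a single training image:
import matplotlib.pyplot as plt
# Sketch: show nine randomly augmented variants of one training image
aug_iter = gen.flow(X_train[:1], batch_size=1)
for i in range(9):
    plt.subplot(3, 3, i + 1)
    plt.imshow(next(aug_iter)[0].reshape(28, 28), cmap='gray')
    plt.axis('off')
plt.show()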
Let's train the network using data augmentation.
history1 = cnn1.fit_generator(batches, steps_per_epoch=48000//256, epochs=50,
validation_data=val_batches, validation_steps=12000//256, use_multiprocessing=True)
Epoch 1/50
187/187 [==============================] - 35s 188ms/step - loss: 0.3706 - acc: 0.8633 - val_loss: 0.3606 - val_acc: 0.8665
Epoch 2/50
187/187 [==============================] - 40s 214ms/step - loss: 0.3636 - acc: 0.8654 - val_loss: 0.3414 - val_acc: 0.8747
Epoch 3/50
187/187 [==============================] - 45s 243ms/step - loss: 0.3503 - acc: 0.8685 - val_loss: 0.3401 - val_acc: 0.8750
...
Epoch 49/50
187/187 [==============================] - 32s 170ms/step - loss: 0.2486 - acc: 0.9061 - val_loss: 0.2755 - val_acc: 0.8988
Epoch 50/50
187/187 [==============================] - 31s 166ms/step - loss: 0.2439 - acc: 0.9103 - val_loss: 0.2682 - val_acc: 0.9004
score1 = cnn1.evaluate(X_test, y_test, verbose=0)
print('Test loss:', score1[0])
print('Test accuracy:', score1[1])
Test loss: 0.22102486880123615
Test accuracy: 0.9229
Data augmentation improved the test accuracy to 92.29%!
Let's plot training and validation accuracy as well as training and validation loss.
import matplotlib.pyplot as plt
%matplotlib inline
accuracy = history1.history['acc']
val_accuracy = history1.history['val_acc']
loss = history1.history['loss']
val_loss = history1.history['val_loss']
epochs = range(len(accuracy))
plt.plot(epochs, accuracy, 'bo', label='Training accuracy')
plt.plot(epochs, val_accuracy, 'b', label='Validation accuracy')
plt.title('Training and validation accuracy')
plt.legend()
plt.figure()
plt.plot(epochs, loss, 'bo', label='Training loss')
plt.plot(epochs, val_loss, 'b', label='Validation loss')
plt.title('Training and validation loss')
plt.legend()
plt.show()
These plots look good: the training curves closely track the validation curves.
I can summarize the performance of my classifier as follows:
# get the predictions for the test data
predicted_classes = cnn1.predict_classes(X_test)
# get the indices to be plotted
y_true = data_test.iloc[:, 0]
correct = np.nonzero(predicted_classes==y_true)[0]
incorrect = np.nonzero(predicted_classes!=y_true)[0]
from sklearn.metrics import classification_report
target_names = ["Class {}".format(i) for i in range(10)]
print(classification_report(y_true, predicted_classes, target_names=target_names))
              precision    recall  f1-score   support

     Class 0       0.88      0.88      0.88      1000
     Class 1       0.99      0.99      0.99      1000
     Class 2       0.93      0.83      0.87      1000
     Class 3       0.92      0.94      0.93      1000
     Class 4       0.86      0.90      0.88      1000
     Class 5       0.98      0.98      0.98      1000
     Class 6       0.77      0.79      0.78      1000
     Class 7       0.96      0.97      0.96      1000
     Class 8       0.98      0.99      0.99      1000
     Class 9       0.98      0.96      0.97      1000

 avg / total       0.92      0.92      0.92     10000
It's apparent that the classifier is underperforming for class 6 (shirts) in terms of both precision and recall. For class 4 (coats), the classifier is slightly lacking precision, whereas for class 2 (pullovers), it is slightly lacking recall.
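A confusion matrix makes those pairwise mix-ups explicit; a quick sketch using scikit-learn, reusing y_true and predicted_classes from above:
from sklearn.metrics import confusion_matrix
# Rows are true classes, columns are predicted classes
print(confusion_matrix(y_true, predicted_classes))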
Perhaps I would gain more insight after visualizing the correct and incorrect predictions.
Here is a subset of correctly predicted classes.
# Plot nine correctly classified test images
for i, idx in enumerate(correct[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[idx], y_true[idx]))
plt.tight_layout()
And here is a subset of incorrectly predicted classes.
# Plot nine misclassified test images
for i, idx in enumerate(incorrect[:9]):
    plt.subplot(3, 3, i + 1)
    plt.imshow(X_test[idx].reshape(28, 28), cmap='gray', interpolation='none')
    plt.title("Predicted {}, Class {}".format(predicted_classes[idx], y_true[idx]))
plt.tight_layout()
It's often said that deep-learning models are "black boxes" that learn representations which are difficult to extract and present in a human-readable form. Although this is partially true for certain types of deep-learning models, it's definitely not true for convnets: the representations they learn are highly amenable to visualization, in large part because they're representations of visual concepts.
Here I attempt to visualize the intermediate CNN outputs (intermediate activations). Visualizing intermediate activations consists of displaying the feature maps that are output by various convolution and pooling layers in a network, given a certain input (the output of a layer is often called its activation, the output of the activation function). This gives a view into how an input is decomposed into the different filters learned by the network.
I want to visualize feature maps with three dimensions: width, height, and depth (channels). Each channel encodes relatively independent features, so the proper way to visualize these feature maps is by independently plotting the contents of every channel as a 2D image.
I first pick an input image (training image #100).
test_im1 = X_train[100]
plt.imshow(test_im1.reshape(28,28), cmap='viridis', interpolation='none')
plt.show()
In order to extract the feature maps I want to look at, I create a Keras model that takes batches of images as input, and outputs the activations of all convolution and pooling layers. To do this, I use the Keras class Model. A model is instantiated using two arguments: an input tensor (or list of input tensors) and an output tensor (or list of output tensors). The resulting class is a Keras model, mapping the specified inputs to the specified outputs. When fed an image input, this model returns the values of the layer activations in the original model.
from keras import models
# extracts the outputs of all the layers (this model has six)
layer_outputs = [layer.output for layer in cnn1.layers]
# creates a model that will return these outputs, given the model input
activation_model = models.Model(inputs=cnn1.input, outputs=layer_outputs)
# returns a list of Numpy arrays: one array per layer activation
activations = activation_model.predict(test_im1.reshape(1,28,28,1))
# activation of the 1st convolution layer
first_layer_activation = activations[0]
# display the 4th channel of the activation of the 1st layer of the original model
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
Great, let's do the same thing for two other images: training images #500 and #1000.
test_im2 = X_train[500]
plt.imshow(test_im2.reshape(28,28), cmap='viridis', interpolation='none')
plt.show()
activations = activation_model.predict(test_im2.reshape(1,28,28,1))
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
test_im3 = X_train[1000]
plt.imshow(test_im3.reshape(28,28), cmap='viridis', interpolation='none')
plt.show()
activations = activation_model.predict(test_im3.reshape(1,28,28,1))
first_layer_activation = activations[0]
plt.matshow(first_layer_activation[0, :, :, 4], cmap='viridis')
Let's plot a more complete visualization: for the convolution layer, I extract every channel of its activation map and tile the channels side by side in one big image grid.
layer_names = []
for layer in cnn1.layers[:-1]:
    layer_names.append(layer.name)

images_per_row = 16
for layer_name, layer_activation in zip(layer_names, activations):
    if layer_name.startswith('conv'):
        n_features = layer_activation.shape[-1]   # number of channels
        size = layer_activation.shape[1]          # feature-map height/width
        n_cols = n_features // images_per_row
        display_grid = np.zeros((size * n_cols, images_per_row * size))
        for col in range(n_cols):
            for row in range(images_per_row):
                channel_image = layer_activation[0, :, :, col * images_per_row + row]
                # normalize each channel to a displayable 0-255 range
                channel_image -= channel_image.mean()
                channel_image /= channel_image.std()
                channel_image *= 64
                channel_image += 128
                channel_image = np.clip(channel_image, 0, 255).astype('uint8')
                display_grid[col * size : (col + 1) * size,
                             row * size : (row + 1) * size] = channel_image
        scale = 1. / size
        plt.figure(figsize=(scale * display_grid.shape[1],
                            scale * display_grid.shape[0]))
        plt.title(layer_name)
        plt.grid(False)
        plt.imshow(display_grid, aspect='auto', cmap='viridis')