Code samples adapted from Francois Chollet: https://github.com/fchollet/deep-learning-with-python-notebooks/
SMU Contributors: Ian Johnson, Eric C. Larson
Deep learning models are frequently treated with a "black-box" mentality: users care only about the outputs and rarely look inside to see what is going on. Since convolutional neural networks are, abstractly speaking, "representations of visual concepts" (Chollet), they lend themselves well to visualization.
There are numerous ways to visualize the underlying structure and state of a CNN. Chollet explores three particularly useful and intuitive techniques for visualizing convolutional networks:

1. Visualizing intermediate activations (the feature maps each layer outputs for a given input)
2. Visualizing convnet filters, by synthesizing an input image that maximally activates each filter
3. Visualizing heatmaps of class activation (Grad-CAM), which show which parts of an image drive a classification
The following visualization code is adapted from Chollet's Jupyter Notebooks for his book: https://github.com/fchollet/deep-learning-with-python-notebooks/
import keras
keras.__version__
from keras.models import load_model
model = load_model('models/cats_and_dogs_small_2.h5')
model.summary() # As a reminder.
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_21 (Conv2D)           (None, 148, 148, 32)      896
_________________________________________________________________
max_pooling2d_21 (MaxPooling (None, 74, 74, 32)        0
_________________________________________________________________
conv2d_22 (Conv2D)           (None, 72, 72, 64)        18496
_________________________________________________________________
max_pooling2d_22 (MaxPooling (None, 36, 36, 64)        0
_________________________________________________________________
conv2d_23 (Conv2D)           (None, 34, 34, 128)       73856
_________________________________________________________________
max_pooling2d_23 (MaxPooling (None, 17, 17, 128)       0
_________________________________________________________________
conv2d_24 (Conv2D)           (None, 15, 15, 128)       147584
_________________________________________________________________
max_pooling2d_24 (MaxPooling (None, 7, 7, 128)         0
_________________________________________________________________
flatten_6 (Flatten)          (None, 6272)              0
_________________________________________________________________
dropout_3 (Dropout)          (None, 6272)              0
_________________________________________________________________
dense_11 (Dense)             (None, 512)               3211776
_________________________________________________________________
dense_12 (Dense)             (None, 1)                 513
=================================================================
Total params: 3,453,121
Trainable params: 3,453,121
Non-trainable params: 0
_________________________________________________________________
Here we load an image to use for our visualization. Feel free to try an image of your own choosing.
img_url = 'https://raw.githubusercontent.com/8000net/LectureNotes/master/images/dog.jpg'
# We preprocess the image into a 4D tensor
from keras.preprocessing import image
import numpy as np
import requests
from io import BytesIO
from PIL import Image
def load_image_as_array(url, size=(150, 150)):
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img = img.resize(size)
return np.array(img).astype(float)
img_tensor = load_image_as_array(img_url)
img_tensor = np.expand_dims(img_tensor, axis=0)
# Remember that the model was trained on inputs
# that were preprocessed in the following way:
img_tensor /= 255.
# Its shape is (1, 150, 150, 3)
print(img_tensor.shape)
import matplotlib.pyplot as plt
plt.imshow(img_tensor[0])
plt.show()
(1, 150, 150, 3)
Given an arbitrary input to a CNN, we can visualize the activation of each layer of the network by displaying the feature maps that each layer outputs. For each layer, there is a 3-dimensional feature map which can be visualized as a set of 2D images, one per channel. The resulting images are representations of the filter activations of individual channels in a convolutional or pooling layer. Chollet uses the example of a cat image to visualize the contents of a CNN trained to discriminate between cats and dogs; here we use a dog image instead.
Now that we've loaded an image to visualize the network with, we can create a Model that accepts inputs of image batches and returns outputs of all the layers of the original network.
"To do this, we will use the Keras class Model. A Model is instantiated using two arguments: an input tensor (or list of input tensors), and an output tensor (or list of output tensors). The resulting class is a Keras model, just like the Sequential models that you are familiar with, mapping the specified inputs to the specified outputs. What sets the Model class apart is that it allows for models with multiple outputs, unlike Sequential. For more information about the Model class, see Chapter 7, Section 1." (Chollet)
from keras import models
# Extracts the outputs of the first 8 layers:
layer_outputs = [layer.output for layer in model.layers[:8]]
# Creates a model that will return these outputs, given the model input:
activation_model = models.Model(inputs=model.input, outputs=layer_outputs)
Notice that this is not a traditional Keras model, in that it has multiple outputs. Generally speaking, Keras supports an arbitrary number of inputs and outputs per model, but we have used one-input, one-output models up to this point.
# This will return a list of 8 Numpy arrays:
# one array per layer activation
activations = activation_model.predict(img_tensor)
first_layer_activation = activations[0]
print(first_layer_activation.shape)
(1, 148, 148, 32)
Using this multi-output model, we can visualize the activation of any arbitrary channel of any layer of the network. For example, here is a channel of the first layer that appears to function as a diagonal edge detector.
import matplotlib.pyplot as plt
plt.matshow(first_layer_activation[0, :, :, 29], cmap='viridis')
plt.show()
This can be extended to visualize every channel of every layer in the network, which gives us eyes into the black box of the convolutional cats-and-dogs network. The following code (Chollet) plots every single channel side-by-side for each layer of the network.
import keras
# These are the names of the layers, so we can have them as part of our plot
layer_names = []
for layer in model.layers[:8]:
layer_names.append(layer.name)
images_per_row = 16
# Now let's display our feature maps
for layer_name, layer_activation in zip(layer_names, activations):
# This is the number of features in the feature map
n_features = layer_activation.shape[-1]
# The feature map has shape (1, size, size, n_features)
size = layer_activation.shape[1]
# We will tile the activation channels in this matrix
n_cols = n_features // images_per_row
display_grid = np.zeros((size * n_cols, images_per_row * size))
# We'll tile each filter into this big horizontal grid
for col in range(n_cols):
for row in range(images_per_row):
channel_image = layer_activation[0,
:, :,
col * images_per_row + row]
# Post-process the feature to make it visually palatable
channel_image -= channel_image.mean()
channel_image /= (channel_image.std() + 1e-5)  # epsilon avoids dividing by zero on dead channels
channel_image *= 64
channel_image += 128
channel_image = np.clip(channel_image, 0, 255).astype('uint8')
display_grid[col * size : (col + 1) * size,
row * size : (row + 1) * size] = channel_image
# Display the grid
scale = 1. / size
plt.figure(figsize=(scale * display_grid.shape[1],
scale * display_grid.shape[0]))
plt.title(layer_name)
plt.grid(False)
plt.imshow(display_grid, aspect='auto', cmap='viridis')
plt.show()
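The double loop that tiles channels into display_grid can also be written as one reshape/transpose, which is convenient when experimenting with grid layouts. A minimal numpy sketch on a dummy activation (the per-channel normalization from above is omitted; only the tiling is shown):

```python
import numpy as np

images_per_row = 16
# Dummy activation with shape (1, size, size, n_features)
layer_activation = np.random.rand(1, 8, 8, 32)
n_features = layer_activation.shape[-1]
size = layer_activation.shape[1]
n_cols = n_features // images_per_row

# Split the channel axis into (n_cols, images_per_row), then move the
# grid axes ahead of the pixel axes and flatten into one big 2D image
grid = (layer_activation[0]
        .reshape(size, size, n_cols, images_per_row)
        .transpose(2, 0, 3, 1)
        .reshape(n_cols * size, images_per_row * size))

# Channel (col * images_per_row + row) lands in block (col, row),
# matching the explicit double loop above
print(grid.shape)  # (16, 128)
```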
Notice that as we move downward through this figure, into deeper layers of the network, the activations retain less and less of the original input content. The early layers' channels act mostly as edge detectors and the like, while later layers retain very little of the original form of the image. Moreover, some later-layer channels do not activate at all. For example, in max_pooling2d_23, there are two channels with no activation, which indicates that the patterns those channels encode are not present in the source image.
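Dead channels like the ones in max_pooling2d_23 can also be found programmatically by checking each channel's peak activation. A small numpy sketch (in this notebook, `activations[5]` would hold the max_pooling2d_23 output; a toy array stands in here):

```python
import numpy as np

def dead_channels(layer_activation, tol=1e-8):
    """Indices of channels that never activate anywhere in the feature map.

    layer_activation: array of shape (1, height, width, n_channels).
    """
    peak = np.abs(layer_activation).max(axis=(0, 1, 2))  # per-channel peak
    return np.where(peak <= tol)[0].tolist()

# Toy stand-in: zero out channels 1 and 3 of a random activation map
demo = np.random.rand(1, 17, 17, 4)
demo[..., 1] = 0.0
demo[..., 3] = 0.0
print(dead_channels(demo))  # [1, 3]
```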
It is important to note that, as you move deeper into the network, the features that activate a given channel become more and more abstract. Chollet describes this behaviour by comparing a convolutional neural network to an information distillation pipeline, which iteratively transforms raw data so that irrelevant information is "filtered out and useful information is magnified."
I encourage you to try your own image of a dog or cat and examine the activations with that image. Spend some time looking at what all the activations look like. Can you identify any high-level abstract concepts that are identified by later-level layers? Perhaps dog ears or cat eyes?
Now let's apply this analysis to a live webcam feed with VGG. You can run the following scripts in Python, assuming you have OpenCV installed.
cd activation-demo
python Activations.py
We can also perform the inverse operation, in some sense, by synthesizing images that maximize the response of a specific filter. This lets us visualize the pattern that a given channel responds to. It is done with gradient ascent: starting from a gray image with random noise, we repeatedly adjust the input image in the direction of the gradient of the filter's activation. The final image, if the ascent converges well, maximally activates the chosen filter.
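Gradient ascent is simply gradient descent with the update sign flipped: we step along the gradient to increase an objective, rather than against it to decrease a loss. A minimal one-dimensional illustration, maximizing f(x) = -(x - 3)^2 from a starting point of 0 (standing in for the initial image):

```python
# Maximize f(x) = -(x - 3)^2, whose unique maximum is at x = 3
def grad_f(x):
    return -2.0 * (x - 3.0)  # df/dx

x = 0.0     # analogous to the initial image
step = 0.1  # analogous to the gradient-ascent step size used below
for _ in range(100):
    x += step * grad_f(x)  # ascent: add the gradient (descent would subtract)

print(round(x, 3))  # 3.0, the maximizer
```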
Chollet performs this with the following code, which uses gradient ascent to synthesize an image that maximally activates an arbitrary filter.
from keras.applications import VGG16
from keras import backend as K
# Load the pre-trained VGG16 model
model = VGG16(weights='imagenet', include_top=False)
# Selecting a layer and channel to visualize
layer_name = 'block3_conv1'
filter_index = 0
# Isolate the output and loss for the given channel
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])
# We take the gradient of this loss using keras backend.gradients
grads = K.gradients(loss, model.input)[0]
# Before performing gradient descent, we divide the gradient tensor by its L2 norm (square root
# of the mean of the square of values in the tensor). We add a small epsilon term to the L2 norm
# to avoid division by zero.
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
# We use a keras backend function to accept a numpy tensor and return a loss and gradient for that tensor.
iterate = K.function([model.input], [loss, grads])
# To quickly test the interface:
import numpy as np
loss_value, grads_value = iterate([np.zeros((1, 150, 150, 3))])
When we perform gradient ascent and iteratively update the input image, the result is not guaranteed to be a valid, displayable image. We can fix that using a simple utility function below:
def deprocess_image(x):
# normalize tensor: center on 0., ensure std is 0.1
x -= x.mean()
x /= (x.std() + 1e-5)
x *= 0.1
# clip to [0, 1]
x += 0.5
x = np.clip(x, 0, 1)
# convert to RGB array
x *= 255
x = np.clip(x, 0, 255).astype('uint8')
return x
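As a quick sanity check, deprocess_image maps any real-valued tensor into a valid uint8 RGB image. The function is restated here so the snippet runs standalone:

```python
import numpy as np

def deprocess_image(x):
    # Normalize: center on 0., set std to 0.1
    x = x - x.mean()
    x = x / (x.std() + 1e-5)
    x *= 0.1
    # Shift to [0, 1] and clip
    x += 0.5
    x = np.clip(x, 0, 1)
    # Scale to displayable uint8 RGB
    x *= 255
    return np.clip(x, 0, 255).astype('uint8')

raw = np.random.randn(150, 150, 3) * 40.0 + 7.0  # arbitrary float "image"
img = deprocess_image(raw)
print(img.dtype, img.shape)  # uint8 (150, 150, 3)
```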
Using this deprocess_image utility function and the code above, we can write a generalized function to generate an optimal image for activating any arbitrary filter in the network. Notice the additional code at the bottom of the function which, for 40 iterations, performs gradient ascent on the input image to maximize the filter activation.
def generate_pattern(layer_name, filter_index, size=150):
# Build a loss function that maximizes the activation
# of the nth filter of the layer considered.
layer_output = model.get_layer(layer_name).output
loss = K.mean(layer_output[:, :, :, filter_index])
# Compute the gradient of the input picture wrt this loss
grads = K.gradients(loss, model.input)[0]
# Normalization trick: we normalize the gradient
grads /= (K.sqrt(K.mean(K.square(grads))) + 1e-5)
# This function returns the loss and grads given the input picture
iterate = K.function([model.input], [loss, grads])
# We start from a gray image with some noise
input_img_data = np.random.random((1, size, size, 3)) * 20 + 128.
# Run gradient ascent for 40 steps
step = 1.
for i in range(40):
loss_value, grads_value = iterate([input_img_data])
input_img_data += grads_value * step
# Convert the gradient-ascent result into a displayable image
img = input_img_data[0]
return deprocess_image(img)
plt.imshow(generate_pattern('block3_conv1', 7))
plt.show()
This can be repeated for any filter in the network. The following code from Chollet plots 64 of the filters from each of four convolutional layers, as four 8x8 grids of filters with black margins between them.
for layer_name in ['block1_conv1', 'block2_conv1', 'block3_conv1', 'block4_conv1']:
size = 64
margin = 5
# This is an empty (black) image where we will store our results.
results = np.zeros((8 * size + 7 * margin, 8 * size + 7 * margin, 3))
for i in range(8): # iterate over the rows of our results grid
for j in range(8): # iterate over the columns of our results grid
# Generate the pattern for filter `i + (j * 8)` in `layer_name`
filter_img = generate_pattern(layer_name, i + (j * 8), size=size)
# Put the result in the square `(i, j)` of the results grid
horizontal_start = i * size + i * margin
horizontal_end = horizontal_start + size
vertical_start = j * size + j * margin
vertical_end = vertical_start + size
results[horizontal_start: horizontal_end, vertical_start: vertical_end, :] = filter_img
# Display the results grid
plt.figure(figsize=(20, 20))
# Cast to uint8 so imshow interprets the 0-255 values correctly
plt.imshow(results.astype('uint8'))
plt.show()
Notice that the trend observed in the first set of visualizations is very evident here: as we move into deeper layers of the network, the convolutional filters resemble increasingly abstract ideas. The first layer consists of primarily simple textures, and the filters get increasingly complex until we reach the final layer, where filters have abstract meanings recognizable to the human eye (feathers, eyes, bricks, etc.)
Now try to manipulate the above code to generate images like DeepDream. To achieve this, start from an existing photograph rather than random noise, maximize the mean activation of an entire layer rather than a single filter, and repeat the process at a few increasing image scales.
For classification CNNs, it can be useful to identify which parts of an input image have the most influence on the final classification. This general visualization technique is referred to as Class Activation Maps (CAMs), and we will use Chollet's implementation of Grad-CAM (Selvaraju et al., "Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization") to show some examples of CAMs for VGG16.
from keras.applications.vgg16 import VGG16
from keras import backend as K
K.clear_session()
# Note that we are including the densely-connected classifier on top;
# all previous times, we were discarding it.
model = VGG16(weights='imagenet')
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
input_1 (InputLayer)         (None, 224, 224, 3)       0
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312
_________________________________________________________________
predictions (Dense)          (None, 1000)              4097000
=================================================================
Total params: 138,357,544
Trainable params: 138,357,544
Non-trainable params: 0
_________________________________________________________________
Now we will use some pre-processing code to convert an arbitrary image into the correct format for VGG. The sample input image here is of Dallas Hall and lawn on a sunny day.
from keras.preprocessing import image
from keras.applications.vgg16 import preprocess_input, decode_predictions
import numpy as np
from matplotlib import pyplot as plt
%matplotlib inline
import requests
from io import BytesIO
from PIL import Image
def load_image_as_array(url, size=(150, 150)):
response = requests.get(url)
img = Image.open(BytesIO(response.content))
img = img.resize(size)
return np.array(img).astype(float)
# The local path to our target image
img_url = 'https://raw.githubusercontent.com/8000net/LectureNotes/master/images/dallas_hall.jpg'
img = load_image_as_array(img_url, size=(224, 224))
# We add a dimension to transform our array into a "batch"
# of size (1, 224, 224, 3)
x = np.expand_dims(img, axis=0)
# Finally we preprocess the batch
# (this does channel-wise color normalization)
x = preprocess_input(x)
plt.imshow(np.squeeze(x)/256.0+0.5)
preds = model.predict(x)
print('Predicted:', decode_predictions(preds, top=3)[0])
predicted_class = np.argmax(preds)
Predicted: [('n03877845', 'palace', 0.4713494), ('n03733281', 'maze', 0.41759244), ('n04355338', 'sundial', 0.043737043)]
The model interprets Dallas Hall (and lawn) to be a palace. (A reasonable prediction, if you ask me)
To visualize why the maximal class was chosen, we use the following Grad-CAM implementation from Chollet. Grad-CAM is a visualization technique originally introduced by Selvaraju et al., which uses the gradient of the predicted class with respect to the final convolutional layer to generate a map of the input-image locations that drive the predicted class.
predicted_class_output = model.output[:, predicted_class] # defines class of interest
# This is the output feature map of the `block5_conv3` layer,
# the last convolutional layer in VGG16
last_conv_layer = model.get_layer('block5_conv3')
# This is the gradient of the predicted class with regard to
# the output feature map of `block5_conv3`
grads = K.gradients(predicted_class_output, last_conv_layer.output)[0]
# This is a vector of shape (512,), where each entry
# is the mean intensity of the gradient over a specific feature map channel
pooled_grads = K.mean(grads, axis=(0, 1, 2))
# This function allows us to access the values of the quantities we just defined:
# `pooled_grads` and the output feature map of `block5_conv3`,
# given a sample image
iterate = K.function([model.input], [pooled_grads, last_conv_layer.output[0]])
# These are the values of these two quantities, as Numpy arrays,
# given our sample image
pooled_grads_value, conv_layer_output_value = iterate([x])
# We multiply each channel in the feature map array
# by "how important this channel is" with regard to the predicted class
for i in range(512):
conv_layer_output_value[:, :, i] *= pooled_grads_value[i]
# The channel-wise mean of the resulting feature map
# is our heatmap of class activation
heatmap = np.mean(conv_layer_output_value, axis=-1)
# We then normalize the heatmap 0-1 for visualization:
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
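The per-channel weighting loop above can be collapsed into a single broadcast multiply. A numpy sketch, with dummy arrays standing in for the real conv_layer_output_value (14, 14, 512) and pooled_grads_value (512,):

```python
import numpy as np

# Dummy stand-ins with the VGG16 block5_conv3 shapes
conv_layer_output_value = np.random.rand(14, 14, 512)
pooled_grads_value = np.random.rand(512)

# Broadcasting weights every channel by its pooled gradient in one step
weighted = conv_layer_output_value * pooled_grads_value

# Equivalent to the explicit loop over the 512 channels
looped = conv_layer_output_value.copy()
for i in range(512):
    looped[:, :, i] *= pooled_grads_value[i]

# Channel-wise mean, ReLU, and 0-1 normalization, as above
heatmap = np.mean(weighted, axis=-1)
heatmap = np.maximum(heatmap, 0)
heatmap /= np.max(heatmap)
print(heatmap.shape)  # (14, 14)
```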
import cv2
# The notebook loaded the image from a URL, so rather than cv2.imread
# we reuse the downloaded array (reversing channels from RGB to BGR for cv2)
img = load_image_as_array(img_url, size=(224, 224)).astype('uint8')[:, :, ::-1]
# We resize the heatmap to have the same size as the original image
heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
# We convert the heatmap to RGB
heatmap = np.uint8(255 * heatmap)
# We apply the heatmap to the original image
heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
# 0.4 here is a heatmap intensity factor
superimposed_img = heatmap * 0.4 + img
cv2.imwrite('images/dallas_hall_palace_heatmap.jpg', superimposed_img)
True
The following heatmap shows that the center of the building's facade is what most identifies it as a palace.
VGG also identifies that Dallas Hall looks like a maze. The heatmap of the activation for the maze class shows that the crossing sidewalks are the culprit. This is quite intuitive, as a series of crossing paths is likely to be indicative of a maze.
Finally, VGG identifies that the image may be a sundial. This is a slightly less intuitive reaction to the image (and, accordingly, the sundial class activates to a much lesser degree than the previous classes). However, the class is activated by the fountain and, to a lesser extent, by the top of the dome on Dallas Hall. Upon further inspection, it is reasonable to see how those two parts of the image could be confused for a sundial.
I encourage you to try this with an image of your own choosing and explore the code in detail.
Now let's apply this analysis to a live webcam feed with VGG. You can run the following scripts in Python, assuming you have OpenCV installed.
cd activation-demo
python Heatmap.py
The following software versions were used to run this Jupyter notebook:
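A small cell like the following (a sketch; extend it with whichever libraries you actually use) reports the versions:

```python
import sys
import numpy as np

print('python:', sys.version.split()[0])
print('numpy: ', np.__version__)

# keras is only reported if it is installed in the environment
try:
    import keras
    print('keras: ', keras.__version__)
except ImportError:
    pass
```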