Objectives: turn a pre-trained ResNet50 classifier into a fully convolutional network and use it to build class heatmaps that localize objects (here, dogs) in images of arbitrary size.
%matplotlib inline
import warnings
import numpy as np
import matplotlib.pyplot as plt
np.random.seed(1)
# Load a pre-trained ResNet50
# We use include_top=False for now; we will load the weights
# of the output Dense layer later and reuse them
import tensorflow as tf
from tensorflow.keras.applications.resnet50 import ResNet50
base_model = ResNet50(include_top=False)
print(base_model.output_shape)
(None, None, None, 2048)
#print(base_model.summary())
res5c = base_model.layers[-1]
type(res5c)
tensorflow.python.keras.layers.core.Activation
res5c.output_shape
(None, None, None, 2048)
res5c
Out of the last residual block, the ResNet outputs a tensor of shape $W \times H \times 2048$. The regular ResNet head after the base model is as follows:
from tensorflow.keras.layers import GlobalAveragePooling2D, Dense, Softmax

x = base_model.output
x = GlobalAveragePooling2D()(x)
x = Dense(1000)(x)
x = Softmax()(x)
Here is the full definition of the model: https://github.com/keras-team/keras-applications/blob/master/keras_applications/resnet50.py
We build the following Custom Layer to apply a softmax only to the last dimension of a tensor:
from tensorflow.keras import layers

# A custom layer in Keras must implement the following four methods:
class SoftmaxMap(layers.Layer):
    # Init function
    def __init__(self, axis=-1, **kwargs):
        self.axis = axis
        super(SoftmaxMap, self).__init__(**kwargs)

    # There's no parameter, so we don't need this one
    def build(self, input_shape):
        pass

    # This is the method we're interested in: it is very similar to the
    # regular softmax, but note that we accept
    # x.shape == (batch_size, w, h, n_classes),
    # which is not the case in Keras by default.
    # Note also that we subtract the maximum logit from the logits to
    # make the softmax numerically stable.
    def call(self, x, mask=None):
        e = tf.exp(x - tf.math.reduce_max(x, axis=self.axis, keepdims=True))
        s = tf.math.reduce_sum(e, axis=self.axis, keepdims=True)
        return e / s

    # The output shape is the same as the input shape
    def compute_output_shape(self, input_shape):
        return input_shape
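As a quick aside (not part of the original notebook flow), the built-in softmax already accepts an axis argument, so we can sanity-check our layer against tf.nn.softmax; a minimal sketch:
# Sanity check (sketch): SoftmaxMap should match the built-in softmax
# applied along the same axis.
_x = tf.random.normal((2, 3, 4, 5))
np.testing.assert_allclose(
    SoftmaxMap(axis=-1)(_x).numpy(),
    tf.nn.softmax(_x, axis=-1).numpy(),
    rtol=1e-5, atol=1e-6,
)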
Let's check that we can use this layer to normalize the class probabilities of some random spatial predictions:
n_samples, w, h, n_classes = 10, 3, 4, 5
random_data = np.random.randn(n_samples, w, h, n_classes).astype("float32")
random_data.shape
(10, 3, 4, 5)
Because those predictions are random, if we sum across the class dimension we get random values instead of class probabilities that would need to sum to 1:
random_data[0].sum(axis=-1)
array([[ 0.27685624, -1.248265 , -0.17073488, -1.5251491 ], [ 2.3490455 , -1.4799201 , -3.292039 , 1.5062871 ], [-0.08336051, 2.3915372 , -1.7528347 , 3.5274372 ]], dtype=float32)
Let's instantiate a SoftmaxMap layer and process our test data:
softmaxMap = SoftmaxMap()
softmax_mapped_data = softmaxMap(random_data).numpy()
softmax_mapped_data.shape
(10, 3, 4, 5)
All the values are now in the [0, 1] range:
softmax_mapped_data[0]
array([[[0.56862915, 0.06077185, 0.06606976, 0.0383178 , 0.2662114 ], [0.01185074, 0.6777243 , 0.05529718, 0.16287254, 0.09225518], [0.48184162, 0.01423021, 0.08089 , 0.07605473, 0.3469834 ], [0.07524367, 0.19022258, 0.09395018, 0.23576528, 0.4048183 ]], [[0.03309341, 0.31252265, 0.24507003, 0.16442387, 0.24489002], [0.11887496, 0.20828551, 0.09239109, 0.180172 , 0.40027645], [0.19135876, 0.25699514, 0.19221951, 0.16412118, 0.19530542], [0.09936516, 0.03292248, 0.12721542, 0.5291481 , 0.21134889]], [[0.1007165 , 0.05022516, 0.05779989, 0.66288376, 0.12837464], [0.04103367, 0.09390692, 0.6337432 , 0.08749201, 0.1438242 ], [0.34717003, 0.18080053, 0.08203337, 0.18132693, 0.20866917], [0.17282423, 0.22243509, 0.24389896, 0.1278988 , 0.23294285]]], dtype=float32)
The values along the last dimension now approximately sum to one; they can therefore be used as class probabilities (or parameters of a multinoulli distribution):
softmax_mapped_data[0].sum(axis=-1)
array([[0.99999994, 0.99999994, 1. , 1. ], [1. , 1. , 0.99999994, 1. ], [0.99999994, 1. , 1. , 0.9999999 ]], dtype=float32)
Note that the highest activated channel for each spatial location is still the same before and after the softmax map. The ranking of the activations is preserved as softmax is a monotonic function (when considered element-wise):
random_data[0].argmax(axis=-1)
array([[0, 1, 0, 4], [1, 4, 1, 3], [3, 2, 0, 2]])
softmax_mapped_data[0].argmax(axis=-1)
array([[0, 1, 0, 4], [1, 4, 1, 3], [3, 2, 0, 2]])
Exercise: build a fully convolutional ResNet that outputs per-location class probabilities. You may introspect base_model.layers to find which layer to remove, and combine a Convolution2D layer with the SoftmaxMap defined above.
from tensorflow.keras.layers import Convolution2D
from tensorflow.keras.models import Model
input = base_model.layers[0].input
# TODO: compute per-area class probabilities
output = input
fully_conv_ResNet = Model(inputs=input, outputs=output)
# %load solutions/fully_conv.py
from tensorflow.keras.layers import Convolution2D
from tensorflow.keras.models import Model
input = base_model.layers[0].input
# Take the output of the last layer of the convnet:
x = base_model.layers[-1].output
# A 1x1 convolution, with 1000 output channels, one per class
x = Convolution2D(1000, (1, 1), name='conv1000')(x)
# Softmax on last axis of tensor to normalize the class
# predictions in each spatial area
output = SoftmaxMap(axis=-1)(x)
fully_conv_ResNet = Model(inputs=input, outputs=output)
# A 1x1 convolution applies a Dense to each spatial grid location
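To see why this is equivalent, here is a small numerical check (a sketch with random weights, not part of the provided solution): a 1x1 convolution applies the same channel-mixing matrix independently at every spatial location, exactly like a Dense layer would.
# Hypothetical check: 1x1 convolution vs. per-location matrix product
c_in, c_out = 8, 3
feat = np.random.randn(1, 4, 5, c_in).astype("float32")
kernel = np.random.randn(1, 1, c_in, c_out).astype("float32")
conv_out = tf.nn.conv2d(feat, kernel, strides=1, padding="SAME").numpy()
dense_out = feat @ kernel[0, 0]  # same matrix applied at each (row, col)
np.testing.assert_allclose(conv_out, dense_out, rtol=1e-4, atol=1e-5)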
You can use the following random data to check that it's possible to run a forward pass on a random RGB image:
prediction_maps = fully_conv_ResNet(np.random.randn(1, 200, 300, 3)).numpy()
prediction_maps.shape
WARNING:tensorflow:Layer conv1_pad is casting an input tensor from dtype float64 to the layer's dtype of float32, which is new behavior in TensorFlow 2. The layer has dtype float32 because it's dtype defaults to floatx. If you intended to run this layer in float32, you can safely ignore this warning. If in doubt, this warning is likely only an issue if you are porting a TensorFlow 1.X model to TensorFlow 2. To change all layers to have dtype float64 by default, call `tf.keras.backend.set_floatx('float64')`. To change just this layer, pass dtype='float64' to the layer constructor. If you are the author of this layer, you can disable autocasting by passing autocast=False to the base Layer constructor.
(1, 7, 10, 1000)
How do you explain the resulting output shape?
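Hint (an aside, not a full answer): ResNet50 reduces the spatial resolution by an overall factor of 32, so a 200 x 300 input yields roughly ceil(200/32) x ceil(300/32) spatial locations, each with 1000 class probabilities:
import math
print(math.ceil(200 / 32), math.ceil(300 / 32))  # 7 10, matching the (1, 7, 10, 1000) output above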
The class probabilities should sum to one in each area of the output map:
prediction_maps.sum(axis=-1)
array([[[1. , 1.0000001 , 1. , 1. , 1. , 1. , 1. , 1. , 0.9999999 , 1. ], [1. , 0.99999994, 1. , 0.9999999 , 0.9999999 , 0.9999999 , 1.0000001 , 1. , 1. , 1. ], [0.99999994, 1. , 1. , 1. , 0.9999999 , 0.99999994, 1.0000001 , 0.9999999 , 1. , 1. ], [1. , 0.99999994, 0.99999994, 1. , 1.0000002 , 1. , 0.99999994, 0.9999999 , 1.0000002 , 1.0000001 ], [1. , 0.99999994, 1.0000001 , 0.9999999 , 1. , 1.0000001 , 1. , 1.0000001 , 0.99999994, 1. ], [1. , 1. , 1. , 1. , 1. , 1. , 0.99999994, 1. , 1. , 1. ], [1. , 1. , 0.9999999 , 1. , 1. , 1. , 1. , 0.9999999 , 0.99999994, 0.99999994]]], dtype=float32)
The file weights_dense.h5 contains the weights and bias of the last Dense layer of the original ResNet50 classification head; we load them and transplant them into our 1x1 convolution:
import h5py
with h5py.File('weights_dense.h5', 'r') as h5f:
    w = h5f['w'][:]
    b = h5f['b'][:]
last_layer = fully_conv_ResNet.layers[-2]
print("Loaded weight shape:", w.shape)
print("Last conv layer weights shape:", last_layer.get_weights()[0].shape)
Loaded weight shape: (2048, 1000)
Last conv layer weights shape: (1, 1, 2048, 1000)
# reshape the weights
w_reshaped = w.reshape((1, 1, 2048, 1000))
# set the conv layer weights
last_layer.set_weights([w_reshaped, b])
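As an optional sanity check (a sketch, assuming w and b loaded above are still in scope): the transplanted 1x1 convolution should reproduce the original Dense computation on a single 2048-dimensional feature vector.
# Hypothetical check: one feature vector placed at a single spatial location
feature = np.random.randn(1, 1, 1, 2048).astype("float32")
conv_logits = last_layer(feature).numpy().ravel()  # 1x1 conv with transplanted (w, b)
dense_logits = feature.ravel() @ w + b             # original Dense computation
np.testing.assert_allclose(conv_logits, dense_logits, rtol=1e-4, atol=1e-4)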
The following function resizes an image, preprocesses it and runs a forward pass through the fully convolutional model (you may also use model.predict to compute the output):
from tensorflow.keras.applications.imagenet_utils import preprocess_input
from skimage.io import imread
from skimage.transform import resize
def forward_pass_resize(img_path, img_size):
    img_raw = imread(img_path)
    print("Image shape before resizing: %s" % (img_raw.shape,))
    img = resize(img_raw, img_size, mode='reflect', preserve_range=True)
    img = preprocess_input(img[np.newaxis])
    print("Image batch size shape before forward pass:", img.shape)
    prediction_map = fully_conv_ResNet(img).numpy()
    return prediction_map
output = forward_pass_resize("dog.jpg", (800, 600))
print("prediction map shape", output.shape)
Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 800, 600, 3)
prediction map shape (1, 25, 19, 1000)
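Note (an aside, not from the original notebook): preserve_range=True matters in forward_pass_resize because skimage's resize would otherwise rescale the pixel values to [0, 1], while preprocess_input expects them in the original [0, 255] range before subtracting the ImageNet channel means. A tiny illustration:
demo = (np.random.rand(8, 8, 3) * 255).astype("uint8")
print(resize(demo, (4, 4)).max())                       # <= 1.0: values rescaled to [0, 1]
print(resize(demo, (4, 4), preserve_range=True).max())  # stays in the 0-255 range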
# Helper file for importing synsets from imagenet
import imagenet_tool
synset = "n02084071" # synset corresponding to dogs
ids = imagenet_tool.synset_to_dfs_ids(synset)
print("All dog classes ids (%d):" % len(ids))
print(ids)
All dog classes ids (118): [251, 268, 256, 253, 255, 254, 257, 159, 211, 210, 212, 214, 213, 216, 215, 219, 220, 221, 217, 218, 207, 209, 206, 205, 208, 193, 202, 194, 191, 204, 187, 203, 185, 192, 183, 199, 195, 181, 184, 201, 186, 200, 182, 188, 189, 190, 197, 196, 198, 179, 180, 177, 178, 175, 163, 174, 176, 160, 162, 161, 164, 168, 173, 170, 169, 165, 166, 167, 172, 171, 264, 263, 266, 265, 267, 262, 246, 242, 243, 248, 247, 229, 233, 234, 228, 231, 232, 230, 227, 226, 235, 225, 224, 223, 222, 236, 252, 237, 250, 249, 241, 239, 238, 240, 244, 245, 259, 261, 260, 258, 154, 153, 158, 152, 155, 151, 157, 156]
for dog_id in ids[:10]:
print(imagenet_tool.id_to_words(dog_id))
print('...')
dalmatian, coach dog, carriage dog
Mexican hairless
Newfoundland, Newfoundland dog
basenji
Leonberg
pug, pug-dog
Great Pyrenees
Rhodesian ridgeback
vizsla, Hungarian pointer
German short-haired pointer
...
The following function builds a heatmap from a forward pass. It sums the probability maps over all the ids corresponding to a synset:
def build_heatmap(prediction_map, synset):
    class_ids = imagenet_tool.synset_to_dfs_ids(synset)
    class_ids = np.array([id_ for id_ in class_ids if id_ is not None])
    each_dog_proba_map = prediction_map[0, :, :, class_ids]
    # Indexing a tensor with another integer array has the following shape
    # effect (see the small check after these helper functions):
    # (H, W, 1000) indexed by (118,) ==> (118, H, W)
    any_dog_proba_map = each_dog_proba_map.sum(axis=0)
    print("size of heatmap: " + str(any_dog_proba_map.shape))
    return any_dog_proba_map
def display_img_and_heatmap(img_path, heatmap):
    dog = imread(img_path)
    plt.figure(figsize=(12, 8))
    plt.subplot(1, 2, 1)
    plt.imshow(dog)
    plt.axis('off')
    plt.subplot(1, 2, 2)
    plt.imshow(heatmap, interpolation='nearest', cmap="viridis")
    plt.axis('off')
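The fancy-indexing trick used in build_heatmap can be checked on a small array (a sketch, not part of the solution): indexing the last axis with an integer array moves that axis to the front.
arr = np.arange(1 * 6 * 7 * 10).reshape(1, 6, 7, 10)
idx = np.array([0, 2, 4, 9])
print(arr[0, :, :, idx].shape)  # (4, 6, 7): the indexed class axis comes first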
Exercise
Build dog heatmaps from "dog.jpg" with the following sizes:
(200, 320)
(400, 640)
(800, 1280)
(1600, 2560) (optional, requires a lot of memory)
You may plot a heatmap using the above function display_img_and_heatmap. You might also want to reuse forward_pass_resize to compute the class maps themselves.
# dog synset
s = "n02084071"
# TODO
# %load solutions/build_heatmaps.py
s = "n02084071"
probas_1 = forward_pass_resize("dog.jpg", (200, 320))
heatmap_1 = build_heatmap(probas_1, synset=s)
display_img_and_heatmap("dog.jpg", heatmap_1)
probas_2 = forward_pass_resize("dog.jpg", (400, 640))
heatmap_2 = build_heatmap(probas_2, synset=s)
display_img_and_heatmap("dog.jpg", heatmap_2)
probas_3 = forward_pass_resize("dog.jpg", (800, 1280))
heatmap_3 = build_heatmap(probas_3, synset=s)
display_img_and_heatmap("dog.jpg", heatmap_3)
# We observe that heatmap_1 and heatmap_2 gave coarser
# segmentations than heatmap_3. However, heatmap_3
# has small artifacts outside of the dog area
# heatmap_3 encodes more local, texture level information
# about the dog, while lower resolutions will encode more
# semantic information about the full object
# combining them is probably a good idea!
Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 200, 320, 3)
size of heatmap: (7, 10)
Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 400, 640, 3)
size of heatmap: (13, 20)
Image shape before resizing: (1600, 2560, 3)
Image batch size shape before forward pass: (1, 800, 1280, 3)
size of heatmap: (25, 40)
By combining the heatmaps at different scales, we obtain much better information about the location of the dog.
Bonus: resize the heatmaps to a common shape and combine them by taking their geometric average.
from skimage.transform import resize
# TODO
# %load solutions/geom_avg.py
from skimage.transform import resize
heatmap_1_r = resize(heatmap_1, (50, 80), mode='reflect',
                     preserve_range=True, anti_aliasing=True)
heatmap_2_r = resize(heatmap_2, (50, 80), mode='reflect',
                     preserve_range=True, anti_aliasing=True)
heatmap_3_r = resize(heatmap_3, (50, 80), mode='reflect',
                     preserve_range=True, anti_aliasing=True)
heatmap_geom_avg = np.power(heatmap_1_r * heatmap_2_r * heatmap_3_r, 0.333)
display_img_and_heatmap("dog.jpg", heatmap_geom_avg)
Bonus
Experiment with semantic segmentation. You may train on the COCO dataset: http://mscoco.org/dataset/#overview
To go further, consider open source implementations of models rather than building your own from scratch. For instance, FAIR's detection library (in Caffe2) provides many state-of-the-art models: https://github.com/facebookresearch/Detectron