| Arash Tavassoli | May-June 2019 |
This is the third notebook in a series of four:
Part 1 - Exploratory Data Analysis
Part 2 - Data Preprocessing
Part 3 - Model Training and Analysis
Part 4 - Real-Time Facial Expression Recognition
Convolutional Neural Networks (CNNs) are currently considered one of the top choices for image classification problems, primarily because they can pick up on patterns in small parts of an image. However, this deep learning approach shows its true power only when presented with large enough data at the training stage. It is also computationally expensive, given the large number of trainable parameters, especially in larger (deeper) network architectures.
In this project we will use Python's high-level neural networks API, Keras (using TensorFlow backend) to build and tune a Convolutional Neural Network on the images that we processed in the previous notebooks.
Considering the large number of hyperparameters that can be tuned for a Convolutional Neural Network, the approach for this project is to build a set of helper functions that, given a model number, read the hyperparameters for each convolutional (CONV) and fully connected (FC) layer, as well as the optimization parameters, from an Excel sheet saved as Model Params.xlsx. The functions then compile the sequential model and save the trained model, the training history and some accuracy metrics in a new folder (named after the model number), and also update the Excel sheet with the training time, training accuracy and validation accuracy.
Training will focus on one model with 3 and another model with 5 classes of facial expressions.
The Model Params.xlsx file is saved in the same folder as the current notebook and hosts the values for the following hyperparameters, which will be used during model training:
For each CONV layer:
For each FC layer:
Other parameters:
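As a rough illustration of how the helper functions below consume this sheet, here is a minimal sketch of its layout: one row per layer, keyed by the model column. The column names match those used later in the notebook, but the values in this snippet are made up:

```python
import pandas as pd

# Hypothetical two-row sketch of the sheet: one CONV row and one FC row for model 1.
rows = [
    {'model': 1, 'layer': 1, 'type': 'CONV', 'size': 64, 'filter_size': 3,
     'activ_fun': 'relu', 'dropout_rate': 0.2},
    {'model': 1, 'layer': 2, 'type': 'FC', 'size': 128,
     'activ_fun': 'relu', 'dropout_rate': 0.2},
]
params_sheet = pd.DataFrame(rows)

# The helpers filter on the 'model' column to pull out one architecture:
print(params_sheet[params_sheet['model'] == 1]['type'].tolist())  # → ['CONV', 'FC']
```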
Let's start by importing the required libraries:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
import seaborn as sns
from sklearn.metrics import confusion_matrix
from sklearn.metrics import classification_report
import keras
from keras.models import Sequential
from keras.layers import Dense, Conv2D, Flatten, MaxPooling2D, Dropout, SpatialDropout2D, AveragePooling2D, BatchNormalization, Activation
from keras import regularizers
from keras.utils import plot_model
from keras import models
from google.colab import drive
from IPython.display import clear_output
import json
import time
import os
Using TensorFlow backend.
The entire model training process was done on Google Colab (to use its GPUs); the following root directory is therefore specific to my personal Google Drive, where the notebook was saved:
# This root_dir is used after mounting the Google Drive on Google Colab environment
root_dir = '/gdrive/My Drive/Colab Notebooks/BrainStation Capstone - Colab'
# Mount Google Drive
drive.mount('/gdrive')
Drive already mounted at /gdrive; to attempt to forcibly remount, call drive.mount("/gdrive", force_remount=True).
Let's load the Excel sheet, with hyperparameters already saved on it:
# Loading the Excel sheet with pre-defined hyperparameters:
params_sheet = pd.read_excel(root_dir + '/Model Params.xlsx').drop(['Unnamed: 0'],axis=1)
# Optional preview of the parameters for a selected model:
model_number = 9
display(params_sheet[params_sheet['model'] == model_number])
| | model | layer | type | size | filter_size | filter_s | filter_p | activ_fun | kernel_init | L2_reg | pool_type | pool_size | pool_s | pool_p | dropout_rate | num_classes | num_epochs | batch_size | optimizer | learning_rate | input_pixel | train_time | train_acc | val_acc |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 95 | 9 | 1 | CONV | 64 | 3 | 1 | same | relu | Xavier | 0.0001 | None | - | - | - | 0.2 | 3 | 30 | 50 | Adam | 5e-05 | 100 | 208 | 86.12 | 90.23 |
| 96 | 9 | 2 | CONV | 64 | 3 | 2 | same | relu | Xavier | 0.0001 | Max | 3 | 1 | valid | 0.2 | - | - | - | - | - | - | - | - | - |
| 97 | 9 | 3 | CONV | 128 | 3 | 1 | same | relu | Xavier | 0.0001 | None | - | - | - | 0.3 | - | - | - | - | - | - | - | - | - |
| 98 | 9 | 4 | CONV | 128 | 3 | 2 | same | relu | Xavier | 0.0001 | Max | 3 | 1 | valid | 0.3 | - | - | - | - | - | - | - | - | - |
| 99 | 9 | 5 | CONV | 256 | 3 | 1 | same | relu | Xavier | 0.0001 | None | - | - | - | 0.3 | - | - | - | - | - | - | - | - | - |
| 100 | 9 | 6 | CONV | 256 | 3 | 2 | same | relu | Xavier | 0.0001 | Max | 3 | 1 | valid | 0.4 | - | - | - | - | - | - | - | - | - |
| 101 | 9 | 7 | FC | 128 | - | - | - | relu | - | 0.0001 | - | - | - | - | 0.2 | - | - | - | - | - | - | - | - | - |
| 102 | 9 | 8 | FC | 64 | - | - | - | relu | - | 0.0001 | - | - | - | - | 0.2 | - | - | - | - | - | - | - | - | - |
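As a sanity check on the table above, the trainable parameter count of the first CONV layer can be worked out by hand: 64 filters of size 3x3 over a single grayscale input channel, plus one bias per filter:

```python
# First CONV layer from the table: 64 filters, 3x3 kernel, 1 input channel (grayscale)
filters, kernel, in_channels = 64, 3, 1

# Each filter has kernel*kernel*in_channels weights plus one bias
conv_params = (kernel * kernel * in_channels + 1) * filters
print(conv_params)  # → 640
```

(BatchNormalization and the later, wider layers add further parameters on top of this.)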
We can now start building the helper functions that will load the data, compile the model and visualize the performance:
Given a model number (from the available models on the Excel sheet), this function reads the hyperparameters from the Excel sheet and compiles a sequential model in Keras:
# A function to construct the network based on parameters saved on the excel sheet:
def model_compiler(params_sheet, model_number):
# Filtering the params_sheet to parameters specific to one given model number
param_df = params_sheet[params_sheet['model'] == model_number].reset_index()
    # Loading hyperparameters
num_epochs = int(param_df.loc[0, 'num_epochs'])
mini_batch_size = int(param_df.loc[0, 'batch_size'])
num_classes = int(param_df.loc[0, 'num_classes'])
optimizer_type = param_df.loc[0, 'optimizer']
learning_rate = param_df.loc[0, 'learning_rate']
input_pixel = param_df.loc[0, 'input_pixel']
# Starting to construct the sequential model
model = Sequential()
# The CONV layers
for i in param_df[param_df['type'] == 'CONV'].index:
if param_df.loc[i, 'kernel_init'] == 'Xavier':
initializer = keras.initializers.glorot_normal(seed = 1)
else:
print('Initializer Undefined')
break
# For layer 1 we need to define the input_shape
if i == 0:
model.add(Conv2D(filters = param_df.loc[i, 'size'],
kernel_size = param_df.loc[i, 'filter_size'],
strides = (param_df.loc[i, 'filter_s'], param_df.loc[i, 'filter_s']),
padding = param_df.loc[i, 'filter_p'],
kernel_initializer = initializer,
kernel_regularizer = regularizers.l2(param_df.loc[i, 'L2_reg']),
input_shape = (input_pixel, input_pixel, 1)))
else:
model.add(Conv2D(filters = param_df.loc[i, 'size'],
kernel_size = param_df.loc[i, 'filter_size'],
strides = (param_df.loc[i, 'filter_s'], param_df.loc[i, 'filter_s']),
padding = param_df.loc[i, 'filter_p'],
kernel_initializer = initializer,
kernel_regularizer = regularizers.l2(param_df.loc[i, 'L2_reg'])))
# Using batch normalization
model.add(BatchNormalization())
model.add(Activation(param_df.loc[i, 'activ_fun']))
# Add pooling
if param_df.loc[i, 'pool_type'] == 'Max':
model.add(MaxPooling2D(pool_size = (param_df.loc[i, 'pool_size'], param_df.loc[i, 'pool_size']),
strides = (param_df.loc[i, 'pool_s'], param_df.loc[i, 'pool_s']),
padding = param_df.loc[i, 'pool_p']))
if param_df.loc[i, 'pool_type'] == 'Average':
model.add(AveragePooling2D(pool_size = (param_df.loc[i, 'pool_size'], param_df.loc[i, 'pool_size']),
strides = (param_df.loc[i, 'pool_s'], param_df.loc[i, 'pool_s']),
padding = param_df.loc[i, 'pool_p']))
model.add(SpatialDropout2D(rate = param_df.loc[i, 'dropout_rate']))
# Flattening
model.add(Flatten())
# The fully connected layers
for i in param_df[param_df['type'] == 'FC'].index:
model.add(Dense(units = param_df.loc[i, 'size'],
activation = param_df.loc[i, 'activ_fun']))
model.add(Dropout(rate = param_df.loc[i, 'dropout_rate']))
# SoftMax layer as the output
model.add(Dense(units = num_classes, activation='softmax'))
# Compiling the model
if optimizer_type == 'Adam':
model.compile(optimizer = keras.optimizers.Adam(lr = learning_rate, epsilon = 1e-8),
loss = 'categorical_crossentropy',
metrics = ['accuracy'])
return model, num_epochs, mini_batch_size, input_pixel
Given the history output from the trained model, this function prints out the training, validation and test accuracies, as well as the accuracy plots and confusion matrix for the trained model:
# A function to visualize the accuracies, confusion matrix and classification report:
def summary_visualizer(history, path, X_test, y_test, params_sheet, model_number):
param_df = params_sheet[params_sheet['model'] == model_number].reset_index()
num_classes = int(param_df.loc[0, 'num_classes'])
# Selecting the right expression names based on num of classes
if num_classes == 3:
expression_names = ['Happy', 'Sad', 'Surprised']
if num_classes == 5:
expression_names = ['Neutral','Happy', 'Sad', 'Surprised','Angry']
clear_output(wait = True)
# Defining true and predicted labels
y_true = np.argmax(y_test, axis = 1)
y_pred = np.argmax(model.predict(X_test), axis = 1)
print(f"Training accuracy:\t({(history.history['acc'][-1]*100):.2f}%, {(max(history.history['acc'])*100):.2f}%) (last epoch, highest)")
print(f"Validation accuracy:\t({(history.history['val_acc'][-1]*100):.2f}%, {(max(history.history['val_acc'])*100):.2f}%) (last epoch, highest)")
print(f"Test accuracy:\t\t({(sum(y_true == y_pred) / len(y_true) * 100):.2f}%)\n")
# Constructing the confusion matrix and calculating percentage of correct and incorrect predictions
conf_matrix = confusion_matrix(y_true = y_true, y_pred = y_pred)
conf_matrix_df = pd.DataFrame(data = conf_matrix, columns = expression_names, index = expression_names)
    # Dividing each row by its own total (axis = 0 aligns the row sums with the index)
    conf_matrix_percent = round(conf_matrix_df.div(conf_matrix_df.sum(axis = 1), axis = 0)*100, 2)
print('Model accuracy and loss function:\n')
plt.figure(figsize = (14,4))
gridspec.GridSpec(1,2)
# Summarize history for accuracy
plt.subplot2grid((1,2), (0,0))
plt.plot(history.history['acc'])
plt.plot(history.history['val_acc'])
plt.title('Model Accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='lower right')
# Summarize history for loss
plt.subplot2grid((1,2), (0,1))
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('Model Loss')
plt.ylabel('Loss')
plt.xlabel('Epoch')
plt.legend(['Train', 'Test'], loc='upper right');
plt.savefig(path + os.sep + "Accuracy_Plot.png")
plt.show();
print('The confusion matrix:\n')
plt.figure(figsize = (12,4))
gridspec.GridSpec(1,2)
# Plotting the confusion matrix
plt.subplot2grid((1,2), (0,0))
sns.heatmap(conf_matrix_df, annot=True, cmap = 'Blues', fmt='g', linewidths = .8)
plt.ylabel('True label', labelpad = 5)
plt.xlabel('Predicted label', labelpad = 20)
plt.title('Confusion matrix')
plt.yticks(rotation=0);
# Plotting the Percentage of correct and incorrect predictions for each class
plt.subplot2grid((1,2), (0,1))
sns.heatmap(conf_matrix_percent, annot=True, cmap = 'Blues', fmt='g', linewidths = .8)
plt.ylabel('True label', labelpad = 5)
plt.xlabel('Predicted label', labelpad = 20)
plt.title('Percentage of correct/incorrect predictions for each class\n(Predicted / Total in True Class)', fontsize = 10)
plt.yticks(rotation=0);
plt.tight_layout(w_pad=8)
plt.savefig(path + os.sep + "Confusion_matrix.png")
plt.show()
# Printing the classification report
print('\nThe classification report:\n')
print(classification_report(y_true = y_true,
y_pred = y_pred,
target_names = expression_names))
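A note on the percentage matrix above: each row must be divided by its own total so that every row (true class) sums to 100%. A minimal numpy sketch with made-up numbers shows the intended normalization:

```python
import numpy as np

# Made-up 3x3 confusion matrix: rows are true classes, columns are predictions
conf = np.array([[90,  5,  5],
                 [10, 80, 10],
                 [ 4,  6, 90]])

# Divide each row by its row total; keepdims=True broadcasts along rows
conf_percent = conf / conf.sum(axis=1, keepdims=True) * 100
print(conf_percent.round(2)[0])  # → [90.  5.  5.]
```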
This function simply loads the .npy files that we exported in the previous notebook, based on the number of classes defined on the Excel sheet (for a given model number):
# A function to load the appropriate data files:
def data_loader(params_sheet, model_number):
param_df = params_sheet[params_sheet['model'] == model_number].reset_index()
num_classes = int(param_df.loc[0, 'num_classes'])
if num_classes == 3:
X_train = np.load(root_dir + '/Data/3 Expressions/X_train_3classes.npy')
X_test = np.load(root_dir + '/Data/3 Expressions/X_test_3classes.npy')
X_val = np.load(root_dir + '/Data/3 Expressions/X_val_3classes.npy')
y_train = np.load(root_dir + '/Data/3 Expressions/y_train_3classes.npy')
y_test = np.load(root_dir + '/Data/3 Expressions/y_test_3classes.npy')
y_val = np.load(root_dir + '/Data/3 Expressions/y_val_3classes.npy')
elif num_classes == 5:
X_train = np.load(root_dir + '/Data/5 Expressions/X_train_5classes.npy')
X_test = np.load(root_dir + '/Data/5 Expressions/X_test_5classes.npy')
X_val = np.load(root_dir + '/Data/5 Expressions/X_val_5classes.npy')
y_train = np.load(root_dir + '/Data/5 Expressions/y_train_5classes.npy')
y_test = np.load(root_dir + '/Data/5 Expressions/y_test_5classes.npy')
y_val = np.load(root_dir + '/Data/5 Expressions/y_val_5classes.npy')
print(f'X_train shape:\t{X_train.shape}')
print(f'y_train shape:\t{y_train.shape}')
print(f'X_val shape:\t{X_val.shape}')
print(f'y_val shape:\t{y_val.shape}')
print(f'X_test shape:\t{X_test.shape}')
print(f'y_test shape:\t{y_test.shape}')
return X_train, X_test, X_val, y_train, y_test, y_val
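The two branches above differ only in the folder name and file suffix, so the path construction could be factored out. This is a hedged sketch (the function name and return shape are mine; the directory layout matches the paths used above):

```python
import os

def expression_paths(root_dir, num_classes):
    """Build the six .npy file paths for a given class count (3 or 5)."""
    folder = os.path.join(root_dir, 'Data', f'{num_classes} Expressions')
    names = ['X_train', 'X_test', 'X_val', 'y_train', 'y_test', 'y_val']
    return {name: os.path.join(folder, f'{name}_{num_classes}classes.npy')
            for name in names}

paths = expression_paths('/gdrive/My Drive/project', 3)
print(paths['X_train'])
```

Each path could then be passed to np.load in a single loop instead of six near-identical lines per branch.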
With the helper functions defined, we can now train the CNNs, first with 3 expressions (Happy, Sad, Surprised) and then with 5 expressions (Happy, Sad, Surprised, Angry and Neutral):
For 3 classes, training is done on different combinations of hyperparameters, and the best-performing model is found to be Model #9 on the Excel sheet, with the following key parameters and performance metrics:
# Loading the data (3 classes):
model_number = 9 # See the excel sheet for list of other models
X_train, X_test, X_val, y_train, y_test, y_val = data_loader(params_sheet, model_number)
# Compiling the model (3 classes):
model, num_epochs, mini_batch_size, input_pixel = model_compiler(params_sheet, model_number)
# Training the model (3 classes):
start_time = time.time()
history = model.fit(X_train, y_train, validation_data = (X_val, y_val),
batch_size = mini_batch_size,
epochs = num_epochs, verbose = 2)
end_time = time.time()
# Creating a new directory to save model and history:
path = root_dir + '/Models/' + str(model_number)
if not os.path.exists(path):
os.mkdir(path)
model.save(path + os.sep + 'model.h5') # Saving the model
json.dump(history.history, open(path + os.sep + 'Model_history.json', 'w')) # Saving the history
plot_model(model, to_file=path + os.sep + 'Model.png', show_shapes = True) # Saving the model visualization
# Updating the Excel sheet:
row_index = params_sheet[params_sheet['model'] == model_number].index[0]
params_sheet.loc[row_index, 'train_acc'] = round(max(history.history['acc'])*100, 2)
params_sheet.loc[row_index, 'val_acc'] = round(max(history.history['val_acc'])*100, 2)
params_sheet.loc[row_index, 'train_time'] = round((end_time - start_time)/60)
params_sheet.to_excel(root_dir + '/Model Params.xlsx')
# Plotting training history:
summary_visualizer(history, path, X_test, y_test, params_sheet, model_number)
Training accuracy:	(86.12%, 86.12%) (last epoch, highest)
Validation accuracy:	(90.15%, 90.23%) (last epoch, highest)
Test accuracy:		(90.19%)

Model accuracy and loss function:
The confusion matrix:
The classification report:

              precision    recall  f1-score   support

     Neutral       0.89      0.94      0.92      3500
       Happy       0.92      0.88      0.90      3500
         Sad       0.89      0.89      0.89      3205

    accuracy                           0.90     10205
   macro avg       0.90      0.90      0.90     10205
weighted avg       0.90      0.90      0.90     10205
Similarly, for 5 classes, training is done on different combinations of hyperparameters, and the best-performing model is found to be Model #11 on the Excel sheet, with the following key parameters and performance metrics:
# Loading the data (5 Classes):
model_number = 11 # See the excel sheet for list of other models
X_train, X_test, X_val, y_train, y_test, y_val = data_loader(params_sheet, model_number)
# Compiling the model (5 Classes):
model, num_epochs, mini_batch_size, input_pixel = model_compiler(params_sheet, model_number)
# Training the model (5 Classes):
start_time = time.time()
history = model.fit(X_train, y_train, validation_data = (X_val, y_val),
batch_size = mini_batch_size,
epochs = num_epochs, verbose = 2)
end_time = time.time()
# Creating a new directory to save model and history files:
path = root_dir + '/Models/' + str(model_number)
if not os.path.exists(path):
os.mkdir(path)
model.save(path + os.sep + 'model.h5') # Saving the model
json.dump(history.history, open(path + os.sep + 'Model_history.json', 'w')) # Saving the history
plot_model(model, to_file=path + os.sep + 'Model.png', show_shapes = True) # Saving the model visualization
# Updating the Excel sheet:
row_index = params_sheet[params_sheet['model'] == model_number].index[0]
params_sheet.loc[row_index, 'train_acc'] = round(max(history.history['acc'])*100, 2)
params_sheet.loc[row_index, 'val_acc'] = round(max(history.history['val_acc'])*100, 2)
params_sheet.loc[row_index, 'train_time'] = round((end_time - start_time)/60)
params_sheet.to_excel(root_dir + '/Model Params.xlsx')
# Plotting training history:
summary_visualizer(history, path, X_test, y_test, params_sheet, model_number)
Training accuracy:	(71.24%, 71.24%) (last epoch, highest)
Validation accuracy:	(75.69%, 75.96%) (last epoch, highest)
Test accuracy:		(75.66%)

Model accuracy and loss function:
The confusion matrix:
The classification report:

              precision    recall  f1-score   support

     Neutral       0.63      0.66      0.64      3500
       Happy       0.90      0.83      0.86      3500
         Sad       0.67      0.75      0.71      3500
   Surprised       0.82      0.83      0.83      3205
       Angry       0.80      0.72      0.76      3500

    accuracy                           0.76     17205
   macro avg       0.76      0.76      0.76     17205
weighted avg       0.76      0.76      0.76     17205
One interesting observation above is that for both the 3-class and 5-class models, the validation accuracy ('Test' on the plot) is consistently higher than the training accuracy, although the reverse is expected in most model training processes (with a similar observation for the loss values). This is in fact expected behaviour when training with Keras, for the following two reasons (source: Keras FAQ):
Regularization mechanisms, such as Dropout and L1/L2 weight regularization, are turned off at testing/validation time.
The training loss is the average of the losses over each batch of training data. Because the model is changing over time, the loss over the first batches of an epoch is generally higher than over the last batches. On the other hand, the testing/validation loss for an epoch is computed using the model as it is at the end of the epoch, resulting in a lower loss.
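The second point can be made concrete with a toy example: if the per-batch losses fall during an epoch as the weights update, the reported training loss (their average) sits above the loss of the final weights, which are what validation is evaluated with. The numbers below are made up purely for illustration:

```python
# Hypothetical per-batch training losses within one epoch of an improving model
batch_losses = [1.2, 1.0, 0.8, 0.6, 0.5]

train_loss = sum(batch_losses) / len(batch_losses)  # what Keras logs for the epoch
end_of_epoch_loss = batch_losses[-1]                # roughly what validation sees

print(round(train_loss, 2), end_of_epoch_loss)  # → 0.82 0.5
```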
As the last section of this notebook, we look back at the human accuracies that we defined in the first notebook (EDA) and compare our model accuracies with those we got from hand-labelling a set of 1000 sample images for 5 classes and 600 sample images for 3 classes:
# Loading the data for 3 Classes:
model_number = 9
with open(root_dir + '/Models/9/Model_history.json') as json_file:
history = json.load(json_file)
plt.plot(history['val_acc'])
plt.title('Model Accuracy (3-Class)')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.axhline(0.8483, color = 'red', ls = ':', linewidth=2)
plt.text(11.5, 0.83, 'Human Accuracy = 84.8%', color = 'red', size = 14)
plt.text(12.3, 0.915, 'Model Accuracy = 90.2%', color = '#186aa3', size = 14)
plt.ylim((0.7, 0.95));
# Loading the data for 5 Classes:
model_number = 11
with open(root_dir + '/Models/11/Model_history.json') as json_file:
history = json.load(json_file)
plt.plot(history['val_acc'])
plt.title('Model Accuracy (5-Class)')
plt.ylabel('Validation Accuracy')
plt.xlabel('Epoch')
plt.axhline(0.6930, color = 'red', ls = ':', linewidth=2)
plt.text(16.68, 0.775, 'Model Accuracy = 75.9%', color = '#186aa3', size = 14)
plt.text(15.5, 0.67, 'Human Accuracy = 69.3%', color = 'red', size = 14)
plt.ylim((0.54, 0.82));
Although the validation accuracies of 90.2% and 75.9% for the 3-class and 5-class models may not sound very impressive, both models still outperform a human on the same set of data. This is once again explained by the fact that many images in this dataset are difficult to classify, even for a human.
We can now use the saved models in the next notebook to build a real-time facial expression recognizer!