Part 4 of 4 - Real-Time Facial Expression Recognition

| Final Capstone Project for the Diploma Program in Data Science | BrainStation Vancouver |

| Arash Tavassoli | May-June 2019 |


This is the fourth notebook in a series of four:

  • Part 1 - Exploratory Data Analysis

  • Part 2 - Data Preprocessing

  • Part 3 - Model Training and Analysis

  • Part 4 - Real-Time Facial Expression Recognition

What to expect in this notebook:

A real-time facial expression recognizer that uses the trained models from Part 3 to perform real-time expression classification on the video captured by the computer's webcam.


Building a Real-Time Facial Expression Recognizer with OpenCV

In this part we take the trained models from Part 3 and pair their prediction capabilities with OpenCV to build a simple, but fun, tool that takes live video from the computer's webcam and performs real-time prediction on the detected face.

Let's start with building a function that:

  1. Takes the trained model and the list of expected classes as inputs
  2. Starts capturing video from the computer's webcam
  3. Detects the largest face in each frame (with a minimum size of 1/20 of the frame's smaller dimension)
  4. Predicts the facial expression using the trained model
  5. Annotates the frame with a box around the face and the predicted expression (plus probabilities for each class)
  6. Renders the annotated frame to the screen
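Steps 4 and 5 boil down to reshaping the cropped face into the 4-D tensor the model expects and taking an argmax over the predicted probabilities. A minimal numpy-only sketch (with a made-up face crop and a made-up softmax output standing in for `model.predict`):

```python
import numpy as np

# A made-up 100x100 grayscale face crop, already scaled to [0, 1]:
im = np.random.rand(100, 100)

# Reshape to the (batch, height, width, channels) shape the model expects:
img_array = im.reshape(1, 100, 100, 1)
print(img_array.shape)  # → (1, 100, 100, 1)

# A made-up softmax output standing in for model.predict(img_array)[0]:
prediction = np.array([0.10, 0.75, 0.15])
classes = ['Happy', 'Sad', 'Surprised']

# The predicted expression is the class with the highest probability:
print(classes[int(np.argmax(prediction))])  # → Sad
```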
In [1]:
import cv2
import numpy as np
from keras import models
Using TensorFlow backend.
In [2]:
root_dir = '/Users/Arash/Google Drive/Colab Notebooks/BrainStation Capstone - Colab'
In [3]:
# A function to get the trained model and start a real-time 
# facial expression recognizer using computer's webcam (OpenCV):

def RealTimeExpressionRecognizer(model, classes):
    # Load the CascadeClassifier:
    face_cascade = cv2.CascadeClassifier('Data/haarcascade_frontalface_alt.xml')

    video = cv2.VideoCapture(0)
    
    # Setting color and font params:
    cv_color = (33, 83, 244)
    cv_font = cv2.FONT_HERSHEY_SIMPLEX

    while True:
        _, frame = video.read()

        # Converting the frame to grayscale:
        im = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

        # Detecting faces in the frame (min size limit of 1/20 of the frame's smaller
        # dimension; frame.shape is (height, width, channels), so only the first two
        # dimensions are considered):
        min_acceptable = int(min(frame.shape[:2]) / 20)
        detected_faces = face_cascade.detectMultiScale(im, minSize = (min_acceptable, min_acceptable))

        # Checking if no face was detected (detectMultiScale returns an empty tuple):
        if len(detected_faces) == 0:
                
            # Text box to instruct the user of how to quit the live stream:
            cv2.putText(img = frame, 
                        text = 'Press Q to quit', 
                        org = (10, 20), 
                        fontFace = cv_font, fontScale = 0.7, thickness = 1,
                        color = (255,255,255))
                
            # Text box for when there is no face detected in the frame:
            no_face_text = "I'm waiting! Show me your smile! :)"
            textsize = cv2.getTextSize(no_face_text, cv_font, 1, 2)[0]
            textX = int((frame.shape[1] - textsize[0]) / 2)
            textY = int((frame.shape[0] + textsize[1]) / 2)
            cv2.putText(img = frame, 
                        text = no_face_text, 
                        org = (textX, textY), 
                        fontFace = cv_font, fontScale = 1, thickness = 2,
                        color = (255,255,255))
        else:
                
            # Picking the largest face, cropping, resizing and scaling it:
            x, y, width, height = detected_faces[np.argmax(detected_faces[:,2])]
            im = im[y:y + height, x:x + width]
            im = cv2.resize(im, (100, 100))
            im = im/255

            # preparing for model prediction, feeding to model:
            img_array = np.array(im)
            img_array = img_array.reshape(1,100,100,1)
            prediction = model.predict(img_array)[0]

            # Drawing the rectangle and inserting texts:
            cv2.rectangle(frame, (x, y), (x + width, y + height), cv_color, 3)
                
            # Text box to instruct the user of how to quit the live stream:
            cv2.putText(img = frame, 
                        text = 'Press Q to quit', 
                        org = (10, 20), 
                        fontFace = cv_font, fontScale = 0.5, thickness = 1,
                        color = (255,255,255))
                
            # Adding prediction probabilities to the frame (one line per class,
            # which handles both the 3-class and 5-class models):
            for i, class_name in enumerate(classes):
                cv2.putText(img = frame, 
                            text = f'{class_name}: {int(prediction[i]*100)}%', 
                            org = (x + width + 10, y + 10 + 20 * i), 
                            fontFace = cv_font, fontScale = 0.5, thickness = 1, 
                            color = cv_color)
            
            # Adding the predicted expression:
            cv2.putText(img = frame, 
                        text = f'You are {classes[np.argmax(prediction)]}', 
                        org = (x, y - 20), 
                        fontFace = cv_font, fontScale = 1, thickness = 2,
                        color = cv_color)

        # Showing the frame with annotations:
        cv2.imshow("Real-Time Facial Expression Recognizer", frame)
            
        # Allowing quit with the Q key (masking to the low byte, as waitKey may
        # return extra bits on some platforms):
        key = cv2.waitKey(1) & 0xFF
        if key == ord('q'):
            break
    video.release()
    cv2.destroyAllWindows()
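Note how the largest face is picked: `detectMultiScale` returns one `(x, y, width, height)` row per detection, so an `np.argmax` over the width column selects the widest, and hence largest, face. A small standalone sketch with made-up detections:

```python
import numpy as np

# Made-up detections in OpenCV's (x, y, width, height) format:
detected_faces = np.array([[ 40,  50,  80,  80],
                           [200,  30, 150, 150],
                           [ 10, 300,  60,  60]])

# Pick the row with the largest width (column index 2):
x, y, width, height = detected_faces[np.argmax(detected_faces[:, 2])]
print(x, y, width, height)  # → 200 30 150 150
```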

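One subtlety worth keeping in mind: a colour frame has shape `(height, width, channels)`, so the minimum face size should be computed over the first two dimensions only, or the channel count (3) would win the `min`. A quick sketch with a made-up frame size:

```python
import numpy as np

# A made-up 480x640 BGR frame:
frame = np.zeros((480, 640, 3), dtype=np.uint8)

# min(frame.shape) would be 3 (the channel count), not a pixel dimension;
# restricting the min to the first two axes gives the intended result:
min_acceptable = int(min(frame.shape[:2]) / 20)
print(min_acceptable)  # → 24
```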
All we need to do now is load the trained models and feed them into the RealTimeExpressionRecognizer function:

1. Model with 3 Classes (Happy, Sad and Surprised):

In [ ]:
# Loading the trained model:
model = models.load_model(root_dir + '/Models/9/model.h5')

# List of expected classes:
classes = ['Happy', 'Sad', 'Surprised']

RealTimeExpressionRecognizer(model, classes)

2. Model with 5 Classes (Neutral, Happy, Sad, Surprised and Angry):

In [ ]:
# Loading the trained model:
model = models.load_model(root_dir + '/Models/11/model.h5')

# List of expected classes:
classes = ['Neutral', 'Happy', 'Sad', 'Surprised', 'Angry']

RealTimeExpressionRecognizer(model, classes)
WARNING:tensorflow:From /Users/Arash/anaconda3/lib/python3.7/site-packages/tensorflow/python/framework/op_def_library.py:263: colocate_with (from tensorflow.python.framework.ops) is deprecated and will be removed in a future version.
Instructions for updating:
Colocations handled automatically by placer.
WARNING:tensorflow:From /Users/Arash/anaconda3/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3445: calling dropout (from tensorflow.python.ops.nn_ops) with keep_prob is deprecated and will be removed in a future version.
Instructions for updating:
Please use `rate` instead of `keep_prob`. Rate should be set to `rate = 1 - keep_prob`.
WARNING:tensorflow:From /Users/Arash/anaconda3/lib/python3.7/site-packages/tensorflow/python/ops/math_ops.py:3066: to_int32 (from tensorflow.python.ops.math_ops) is deprecated and will be removed in a future version.
Instructions for updating:
Use tf.cast instead.

Heads-up: Due to compatibility issues between Jupyter Notebook and OpenCV, it is recommended to run this code as a stand-alone .py file from the terminal or an IDE such as PyCharm. Running in Jupyter Notebook is expected to work, but you may not be able to close the capture window with the Q key (in that case you will need to force quit and restart the Python kernel).