Welcome to the final Computer Vision project in the Artificial Intelligence Nanodegree program!
In this project, you’ll combine your knowledge of computer vision techniques and deep learning to build an end-to-end facial keypoint recognition system! Facial keypoints include points around the eyes, nose, and mouth on any face and are used in many applications, from facial tracking to emotion recognition.
There are three main parts to this project:
Part 1 : Investigating OpenCV, pre-processing, and face detection
Part 2 : Training a Convolutional Neural Network (CNN) to detect facial keypoints
Part 3 : Putting parts 1 and 2 together to identify facial keypoints on any image!
Here's what you need to know to complete the project:
In this notebook, some template code has already been provided for you, and you will need to implement additional functionality to successfully complete this project. You will not need to modify the included code beyond what is requested.
a. Sections that begin with '(IMPLEMENTATION)' in the header indicate that the following block of code will require additional functionality which you must provide. Instructions will be provided for each section, and the specifics of the implementation are marked in the code block with a 'TODO' statement. Please be sure to read the instructions carefully!
In addition to implementing code, there will be questions that you must answer which relate to the project and your implementation.
a. Each section where you will answer a question is preceded by a 'Question X' header.
b. Carefully read each question and provide thorough answers in the following text boxes that begin with 'Answer:'.
Note: Code and Markdown cells can be executed using the Shift + Enter keyboard shortcut. Markdown cells can be edited by double-clicking the cell to enter edit mode.
The rubric contains optional suggestions for enhancing the project beyond the minimum requirements. If you decide to pursue the "(Optional)" sections, you should include the code in this IPython notebook.
Your project submission will be evaluated based on your answers to each of the questions and the code implementations you provide.
Each part of the notebook is further broken down into separate steps. Feel free to use the links below to navigate the notebook.
In this project you will get to explore a few of the many computer vision algorithms built into the OpenCV library. This expansive computer vision library is now almost 20 years old and still growing!
The project itself is broken down into three large parts, then even further into separate steps. Make sure to read through each step, and complete any sections that begin with '(IMPLEMENTATION)' in the header; these implementation sections may contain multiple TODOs that will be marked in code. For convenience, we provide links to each of these steps below.
Part 1 : Investigating OpenCV, pre-processing, and face detection
Part 2 : Training a Convolutional Neural Network (CNN) to detect facial keypoints
Part 3 : Putting parts 1 and 2 together to identify facial keypoints on any image!
Have you ever wondered how Facebook automatically tags images with your friends' faces? Or how high-end cameras automatically find and focus on a certain person's face? Applications like these depend heavily on the machine learning task known as face detection - which is the task of automatically finding faces in images containing people.
At its root, face detection is a classification problem - that is, a problem of distinguishing between distinct classes of things. With face detection these distinct classes are 1) images of human faces and 2) everything else.
We use OpenCV's implementation of Haar feature-based cascade classifiers to detect human faces in images. OpenCV provides many pre-trained face detectors, stored as XML files on GitHub. We have downloaded one of these detectors and stored it in the detector_architectures directory.
In the next Python cell, we load in the required libraries for this section of the project.
# Import required libraries for this section
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
import math
import cv2 # OpenCV library for computer vision
from PIL import Image
import time
Next, we load in and display a test image for performing face detection.
Note: by default OpenCV assumes the ordering of our image's color channels is Blue, then Green, then Red. This is slightly out of order with most image types we'll use in these experiments, whose color channels are ordered Red, then Green, then Blue. In order to switch the Blue and Red channels of our test image around we will use OpenCV's cvtColor function, which you can read more about by checking out some of its documentation located here. This is a general utility function that can do other transformations too, like converting a color image to grayscale and transforming a standard color image to HSV color space.
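For intuition, the Blue/Red swap amounts to reversing the channel axis. Here is a minimal NumPy sketch on a made-up two-pixel array (`bgr` is hypothetical, not one of the project images); for 3-channel images this should produce the same channel order as the cvtColor conversion used in this notebook.

```python
import numpy as np

# A tiny 1x2 "image" in OpenCV's BGR order: one pure-blue and one pure-red pixel.
bgr = np.array([[[255, 0, 0], [0, 0, 255]]], dtype=np.uint8)

# Reversing the last axis swaps the first and third channels (B <-> R).
rgb = bgr[..., ::-1]

print(rgb[0, 0])  # the blue pixel, now stored as (R, G, B)
print(rgb[0, 1])  # the red pixel, now stored as (R, G, B)
```

Plotting a BGR array directly with matplotlib would show red and blue exchanged, which is why the conversion comes before every imshow call here.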
# Load in color image for face detection
image = cv2.imread('images/test_image_1.jpg')
# Convert the image to RGB colorspace
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Plot our image using subplots to specify a size and title
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Original Image')
ax1.imshow(image)
There are a lot of people - and faces - in this picture. 13 faces to be exact! In the next code cell, we demonstrate how to use a Haar Cascade classifier to detect all the faces in this test image.
This face detector uses information about patterns of intensity in an image to reliably detect faces under varying light conditions. So, to use this face detector, we'll first convert the image from color to grayscale.
Then, we load in the fully trained architecture of the face detector - found in the file haarcascade_frontalface_default.xml - and use it on our image to find faces!
To learn more about the parameters of the detector see this post.
# Convert the RGB image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
# Extract the pre-trained face detector from an xml file
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
# Detect the faces in image
faces = face_cascade.detectMultiScale(gray, 4, 6)
# Print the number of faces detected in the image
print('Number of faces detected:', len(faces))
# Make a copy of the original image to draw face detections on
image_with_detections = np.copy(image)
# Get the bounding box for each detected face
for (x,y,w,h) in faces:
    # Add a red bounding box to the detections image
    cv2.rectangle(image_with_detections, (x,y), (x+w,y+h), (255,0,0), 3)
# Display the image with the detections
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Image with Face Detections')
ax1.imshow(image_with_detections)
Number of faces detected: 13
In the above code, faces is a numpy array of detected faces, where each row corresponds to a detected face. Each detected face is a 1D array with four entries that specifies the bounding box of the detected face. The first two entries in the array (extracted in the above code as x and y) specify the horizontal and vertical positions of the top left corner of the bounding box. The last two entries in the array (extracted here as w and h) specify the width and height of the box.
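To make the box layout concrete, here is a minimal sketch using made-up detections and a blank stand-in image (`boxes` and `img` are hypothetical). It shows how each (x, y, w, h) row maps to corner points and to a NumPy crop.

```python
import numpy as np

# Hypothetical detections in the same (x, y, w, h) layout as `faces`.
boxes = np.array([[10, 20, 30, 40],
                  [50, 5, 16, 16]])

# Stand-in grayscale image: 100 rows (height) by 120 columns (width).
img = np.zeros((100, 120), dtype=np.uint8)

crop_shapes = []
for (x, y, w, h) in boxes:
    # Corner points in the (x, y) order that cv2.rectangle expects.
    top_left, bottom_right = (int(x), int(y)), (int(x + w), int(y + h))
    # NumPy indexes rows first, so y (vertical) comes before x (horizontal).
    crop = img[y:y + h, x:x + w]
    crop_shapes.append(crop.shape)

print(crop_shapes)
```

Note that the crop shape is (height, width), the reverse of the (w, h) order in the box, which is an easy place to slip when slicing out a face region.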
There are other pre-trained detectors available that use a Haar Cascade Classifier - including full human body detectors, license plate detectors, and more. A full list of the pre-trained architectures can be found here.
To test your eye detector, we'll first read in a new test image with just a single face.
# Load in color image for face detection
image = cv2.imread('images/james.jpg')
# Convert the image to RGB colorspace
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Plot the RGB image
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Original Image')
ax1.imshow(image)
Notice that even though the image is a black and white image, we have read it in as a color image and so it will still need to be converted to grayscale in order to perform the most accurate face detection.
So, the next steps will be to convert this image to grayscale, then load OpenCV's face detector and run it with parameters that detect this face accurately.
# Convert the RGB image to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
# Extract the pre-trained face detector from an xml file
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
# Detect the faces in image
faces = face_cascade.detectMultiScale(gray, 1.25, 6)
# Print the number of faces detected in the image
print('Number of faces detected:', len(faces))
# Make a copy of the original image to draw face detections on
image_with_detections = np.copy(image)
# Get the bounding box for each detected face
for (x,y,w,h) in faces:
    # Add a red bounding box to the detections image
    cv2.rectangle(image_with_detections, (x,y), (x+w,y+h), (255,0,0), 3)
# Display the image with the detections
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Image with Face Detection')
ax1.imshow(image_with_detections)
Number of faces detected: 1
A Haar-cascade eye detector can be included in the same way that the face detector was and, in this first task, it will be your job to do just this.
To set up an eye detector, use the stored parameters of the eye cascade detector, called haarcascade_eye.xml, located in the detector_architectures subdirectory. In the next code cell, create your eye detector and store its detections.
A few notes before you get started:
First, make sure to give your loaded eye detector the variable name eye_cascade and give the list of eye regions you detect the variable name eyes.
Second, since we've already run the face detector over this image, you should only search for eyes within the rectangular face regions detected in faces. This will minimize false detections.
Lastly, once you've run your eye detector over the facial detection region, you should display the RGB image with both the face detection boxes (in red) and your eye detections (in green) to verify that everything works as expected.
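Searching only inside a face region means the detector returns coordinates relative to that region, so they have to be shifted back before drawing on the full image. The sketch below illustrates just that bookkeeping; `fake_eye_detector` is a hypothetical stand-in for a real detector call, and the image and face box are made up.

```python
import numpy as np

def fake_eye_detector(roi):
    """Hypothetical stand-in for a detector; returns ROI-relative (x, y, w, h) boxes."""
    h, w = roi.shape[:2]
    return [(w // 4, h // 4, 10, 10)]

gray = np.zeros((200, 200), dtype=np.uint8)   # stand-in grayscale image
faces = [(50, 60, 80, 80)]                    # made-up face box (x, y, w, h)

eyes = []
for (fx, fy, fw, fh) in faces:
    face_roi = gray[fy:fy + fh, fx:fx + fw]   # rows are y, columns are x
    for (ex, ey, ew, eh) in fake_eye_detector(face_roi):
        # Shift ROI-relative coordinates back into full-image coordinates.
        eyes.append((fx + ex, fy + ey, ew, eh))

print(eyes)  # [(70, 80, 10, 10)]
```

In the notebook, the stand-in would be replaced by a call such as eye_cascade.detectMultiScale on the face region, with parameters tuned for your image.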
# Make a copy of the original image to plot rectangle detections
image_with_detections = np.copy(image)
# Loop over the detections and draw their corresponding face detection boxes
for (x,y,w,h) in faces:
    cv2.rectangle(image_with_detections, (x,y), (x+w,y+h),(255,0,0), 3)
# Do not change the code above this comment!
## TODO: Add eye detection, using haarcascade_eye.xml, to the current face detector algorithm
## TODO: Loop over the eye detections and draw their corresponding boxes in green on image_with_detections
eye_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_eye.xml')
eyes = eye_cascade.detectMultiScale(gray, 1.01, 6)
for (x,y,w,h) in eyes:
    cv2.rectangle(image_with_detections, (x,y), (x+w, y+h), (0,255,0), 3)
# Plot the image with both faces and eyes detected
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Image with Face and Eye Detection')
ax1.imshow(image_with_detections)
It's time to kick it up a notch, and add face and eye detection to your laptop's camera! Afterwards, you'll be able to show off your creation like in the gif shown below - made with a completed version of the code!
Notice that not all of the detections here are perfect - and your result need not be perfect either. You should spend a small amount of time tuning the parameters of your detectors to get reasonable results, but don't hold out for perfection. If we wanted perfection we'd need to spend a ton of time tuning the parameters of each detector, cleaning up the input image frames, etc. You can think of this as more of a rapid prototype.
The next cell contains code for a wrapper function called laptop_camera_go that, when called, will activate your laptop's camera. You will place the relevant face and eye detection code in this wrapper function to implement face/eye detection and mark those detections on each image frame that your camera captures.
Before adding anything to the function, you can run it to get an idea of how it works - a small window should pop up showing you the live feed from your camera; you can press any key to close this window.
Note: Mac users may find that activating this function kills the kernel of their notebook every once in a while. If this happens to you, just restart your notebook's kernel, activate cell(s) containing any crucial import statements, and you'll be good to go!
### Add face and eye detection to this laptop camera function
# Make sure to draw out all faces/eyes found in each frame on the shown video feed
import cv2
import time
# wrapper function for face/eye detection with your laptop camera
def laptop_camera_go():
    # Create instance of video capturer
    cv2.namedWindow("face detection activated")
    vc = cv2.VideoCapture(0)
    # Try to get the first frame
    if vc.isOpened():
        rval, frame = vc.read()
    else:
        rval = False
    # Keep the video stream open
    while rval:
        # Plot the image from camera with all the face and eye detections marked
        # Detect the faces and eyes in image
        faces = face_cascade.detectMultiScale(frame, 1.25, 6)
        eyes = eye_cascade.detectMultiScale(frame, 1.1, 12)
        for (x,y,w,h) in faces:
            cv2.rectangle(frame, (x,y), (x+w,y+h),(255,0,0), 3)
        for (x,y,w,h) in eyes:
            cv2.rectangle(frame, (x,y), (x+w, y+h), (0,255,0), 3)
        cv2.imshow("face detection activated", frame)
        # Exit functionality - press any key to exit laptop video
        key = cv2.waitKey(20)
        if key > 0: # Exit by pressing any key
            # Release the camera, then destroy windows
            vc.release()
            cv2.destroyAllWindows()
            # Make sure window closes on OSx
            for i in range (1,5):
                cv2.waitKey(1)
            return
        # Read next frame
        time.sleep(0.05) # control framerate for computation - default 20 frames per sec
        rval, frame = vc.read()
# Call the laptop camera face/eye detector function above
laptop_camera_go()
Image quality is an important aspect of any computer vision task. Typically, when creating a set of images to train a deep learning network, significant care is taken to ensure that training images are free of visual noise or artifacts that hinder object detection. While computer vision algorithms - like a face detector - are typically trained on 'nice' data such as this, new test data doesn't always look so nice!
When applying a trained computer vision algorithm to a new piece of test data one often cleans it up first before feeding it in. This sort of cleaning - referred to as pre-processing - can include a number of cleaning phases like blurring, de-noising, color transformations, etc., and many of these tasks can be accomplished using OpenCV.
In this short subsection we explore OpenCV's noise-removal functionality to see how we can clean up a noisy image, which we then feed into our trained face detector.
In the next cell, we create an artificially noisy version of the previous multi-face image. This is a little exaggerated - we don't typically get images that are this noisy - but image noise, or 'graininess' in a digital image, is a fairly common phenomenon.
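As a small standalone sketch of this kind of noise model, the snippet below adds Gaussian noise to a made-up bright patch (`pixels` is hypothetical, not the project image) and shows why values are clipped to the valid range before converting back to 8-bit. The fixed seed is only there to make the sketch repeatable.

```python
import numpy as np

rng = np.random.default_rng(0)                  # fixed seed, an arbitrary choice
pixels = np.full((4, 4), 250, dtype=np.uint8)   # bright, made-up image patch
noise = rng.normal(loc=0.0, scale=40.0, size=pixels.shape)

# uint8 + float64 promotes to float64, so values can fall outside [0, 255].
noisy = pixels + noise

# Clip to the valid range first, then convert back to 8-bit pixels.
clipped = np.clip(noisy, 0, 255).astype(np.uint8)

print(noisy.dtype, clipped.dtype, clipped.max())
```

Clipping first keeps every value inside the range that uint8 can represent, so the conversion does not depend on how out-of-range floats happen to be cast.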
# Load in the multi-face test image again
image = cv2.imread('images/test_image_1.jpg')
# Convert the image copy to RGB colorspace
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# View the image as an array (the addition below creates a new array, so the original is untouched)
image_with_noise = np.asarray(image)
# Create noise - here we add noise sampled randomly from a Gaussian distribution: a common model for noise
noise_level = 40
noise = np.random.randn(*image.shape) * noise_level
# Add this noise to the image array
image_with_noise = image_with_noise + noise
# Clip to the valid range and convert back to uint8 format
image_with_noise = np.clip(image_with_noise, 0, 255).astype(np.uint8)
# Plot our noisy image!
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Noisy Image')
ax1.imshow(image_with_noise)
In the context of face detection, the problem with an image like this is that - due to noise - we may miss some faces or get false detections.
In the next cell we apply the same trained OpenCV detector with the same settings as before, to see what sort of detections we get.
# Convert the RGB image to grayscale
gray_noise = cv2.cvtColor(image_with_noise, cv2.COLOR_RGB2GRAY)
# Extract the pre-trained face detector from an xml file
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
# Detect the faces in image
faces = face_cascade.detectMultiScale(gray_noise, 4, 6)
# Print the number of faces detected in the image
print('Number of faces detected:', len(faces))
# Make a copy of the noisy image to draw face detections on
image_with_detections = np.copy(image_with_noise)
# Get the bounding box for each detected face
for (x,y,w,h) in faces:
    # Add a red bounding box to the detections image
    cv2.rectangle(image_with_detections, (x,y), (x+w,y+h), (255,0,0), 3)
# Display the image with the detections
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Noisy Image with Face Detections')
ax1.imshow(image_with_detections)
Number of faces detected: 12
With this added noise we now miss one of the faces!
Time to get your hands dirty: using OpenCV's built-in color image de-noising function, fastNlMeansDenoisingColored, de-noise this image enough so that all the faces in the image are properly detected. Once you have cleaned the image in the next cell, use the cell that follows to run our trained face detector over the cleaned image to check out its detections.
You can find its official documentation here, and a useful example here.
Note: you can keep all parameters except photo_render fixed as shown in the second link above. Play around with the value of this parameter - see how it affects the resulting cleaned image.
## TODO: Use OpenCV's built in color image de-noising function to clean up our noisy image!
denoised = cv2.fastNlMeansDenoisingColored(image_with_noise, None, 17, 25,7, 21)
# Plot our de-noised image!
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('De-Noised Image')
ax1.imshow(denoised)
## TODO: Run the face detector on the de-noised image to improve your detections and display the result
# Convert the RGB image to grayscale
gray_noise = cv2.cvtColor(denoised, cv2.COLOR_RGB2GRAY)
# Extract the pre-trained face detector from an xml file
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
# Detect the faces in image
faces = face_cascade.detectMultiScale(gray_noise, 4, 6)
# Print the number of faces detected in the image
print('Number of faces detected:', len(faces))
# Make a copy of the de-noised image to draw face detections on
image_with_detections = np.copy(denoised)
# Get the bounding box for each detected face
for (x,y,w,h) in faces:
    # Add a red bounding box to the detections image
    cv2.rectangle(image_with_detections, (x,y), (x+w,y+h), (255,0,0), 3)
# Display the image with the detections
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('De-Noised Image with Face Detections')
ax1.imshow(image_with_detections)
Number of faces detected: 13
Now that we have developed a simple pipeline for detecting faces using OpenCV - let's start playing around with a few fun things we can do with all those detected faces!
Edge detection is a concept that pops up almost everywhere in computer vision applications, as edge-based features (as well as features built on top of edges) are often some of the best features for e.g., object detection and recognition problems.
Edge detection is a dimension reduction technique - by keeping only the edges of an image we get to throw away a lot of non-discriminating information. And typically the most useful kind of edge-detection is one that preserves only the important, global structures (ignoring local structures that aren't very discriminative). So removing local structures / retaining global structures is a crucial pre-processing step to performing edge detection in an image, and blurring can do just that.
Below is an animated gif showing the result of an edge-detected cat taken from Wikipedia, where the image is gradually blurred more and more prior to edge detection. When the animation begins you can't quite make out what it's a picture of, but as the animation evolves and local structures are removed via blurring the cat becomes visible in the edge-detected image.
Edge detection starts with convolutions performed on the image itself (to estimate intensity gradients), and you can read about Canny edge detection on this OpenCV documentation page.
In the cell below we load in a test image, then apply Canny edge detection on it. The original image is shown on the left panel of the figure, while the edge-detected version of the image is shown on the right. Notice how the result looks very busy - there are too many little details preserved in the image before it is sent to the edge detector. When applied in computer vision applications, edge detection should preserve global structure, doing away with local structures that don't help describe what objects are in the image.
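To illustrate the convolution idea on its own, here is a minimal NumPy sketch that slides a 3x3 Sobel kernel over a made-up image containing a single vertical step edge. This is only the gradient step, written out by hand; it is not the full Canny algorithm, which adds further stages such as thresholding.

```python
import numpy as np

# Made-up 5x6 image with a vertical step edge between columns 2 and 3.
img = np.zeros((5, 6), dtype=np.float32)
img[:, 3:] = 1.0

# 3x3 Sobel kernel that responds to horizontal intensity changes.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=np.float32)

# "Valid" sliding-window filtering (cross-correlation), written out explicitly.
out_h, out_w = img.shape[0] - 2, img.shape[1] - 2
grad_x = np.zeros((out_h, out_w), dtype=np.float32)
for r in range(out_h):
    for c in range(out_w):
        grad_x[r, c] = np.sum(img[r:r + 3, c:c + 3] * sobel_x)

print(grad_x[0])  # [0. 4. 4. 0.]
```

The response is zero in the flat regions and large only where the window straddles the step, which is the behaviour an edge detector builds on.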
# Load in the image
image = cv2.imread('images/fawzia.jpg')
# Convert to RGB colorspace
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Convert to grayscale
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
# Perform Canny edge detection
edges = cv2.Canny(gray,100,200)
# Dilate the image to amplify edges
edges = cv2.dilate(edges, None)
# Plot the RGB and edge-detected image
fig = plt.figure(figsize = (15,15))
ax1 = fig.add_subplot(121)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Original Image')
ax1.imshow(image)
ax2 = fig.add_subplot(122)
ax2.set_xticks([])
ax2.set_yticks([])
ax2.set_title('Canny Edges')
ax2.imshow(edges, cmap='gray')
Without first blurring the image, and removing small, local structures, a lot of irrelevant edge content gets picked up and amplified by the detector (as shown in the right panel above).
In the next cell, you will repeat this experiment - blurring the image first to remove these local structures, so that only the important boundary details remain in the edge-detected image.
Blur the image by using OpenCV's filter2D functionality - which is discussed in this documentation page - and use an averaging kernel of width equal to 4.
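As a quick sanity check on what an averaging kernel is, the sketch below builds a 4x4 kernel whose weights sum to 1 and applies it at a single position of a made-up patch (`patch` is hypothetical). The filter response there is simply the mean of the pixels under the kernel.

```python
import numpy as np

k = 4
kernel = np.ones((k, k), dtype=np.float32) / (k * k)   # every weight is 1/16

# Made-up 4x4 patch holding the values 0..15.
patch = np.arange(16, dtype=np.float32).reshape(4, 4)

# Response of the filter at this one position: a weighted sum with equal weights.
response = float(np.sum(patch * kernel))

print(float(kernel.sum()), response, float(patch.mean()))  # 1.0 7.5 7.5
```

Because the weights sum to 1, the overall brightness is preserved; a wider kernel averages over more pixels and therefore blurs more strongly.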
### TODO: Blur the test image using OpenCV's filter2D functionality,
# Use an averaging kernel, and a kernel width equal to 4
image = cv2.imread('images/fawzia.jpg')
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
blur = np.copy(gray)
kernal = np.ones((4,4), dtype=np.float32)/16
blur = cv2.filter2D(blur, -1, kernal)
## TODO: Then perform Canny edge detection and display the output
edges = cv2.Canny(blur, 100, 200)
edges = cv2.dilate(edges, None)
fig = plt.figure(figsize=(15,15))
ax1 = fig.add_subplot(121)
ax2 = fig.add_subplot(122)
ax1.set_title('Blurred Image')
ax1.imshow(blur, cmap='gray')
ax2.set_title('Canny Edges')
ax2.imshow(edges, cmap='gray')
If you film something like a documentary or reality TV, you must get permission from every individual shown on film before you can show their face; otherwise you need to blur it out, and blur it a lot (so much so that even the global structures are obscured)! This is also true for projects like Google's StreetView maps - an enormous collection of mapping images taken from a fleet of Google vehicles. Because it would be impossible for Google to get the permission of every single person accidentally captured in one of these images, faces detected in the images are automatically blurred to hide people's identities. Here are a few examples of folks caught in the camera of a Google street view vehicle.
Let's try this out for ourselves. Use the face detection pipeline built above and what you know about using filter2D to blur an image, and use these in tandem to hide the identity of the person in the following image - loaded in and printed in the next cell.
# Load in the image
image = cv2.imread('images/gus.jpg')
# Convert the image to RGB colorspace
image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Display the image
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Original Image')
ax1.imshow(image)
The idea here is to 1) automatically detect the face in this image, and then 2) blur it out! Make sure to adjust the parameters of the averaging blur filter to completely obscure this person's identity.
## TODO: Implement face detection
gray = cv2.cvtColor(image, cv2.COLOR_RGB2GRAY)
face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
faces = face_cascade.detectMultiScale(gray,4,4)
print('Number of faces detected:', len(faces))
image_blurred = np.copy(image)
kernal = np.ones((100,100), dtype=np.float32)/10000
## TODO: Blur the bounding box around each detected face using an averaging filter and display the result
for (x,y,w,h) in faces:
    # Rows span the box height (h) and columns span its width (w)
    image_blurred[y:y+h, x:x+w] = cv2.filter2D(image_blurred[y:y+h, x:x+w], -1, kernal)
    cv2.rectangle(image_blurred, (x,y), (x+w,y+h),(150,120,255), 10)
# Display Image
fig = plt.figure(figsize=(10,10))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Face Blurred Image')
plt.imshow(image_blurred)
Number of faces detected: 1
In this optional task you can add identity protection to your laptop camera, using the previously completed code where you added face detection to your laptop camera - and the task above. You should be able to get reasonable results with little parameter tuning - like the one shown in the gif below.
As with the previous video task, to make this perfect would require significant effort - so don't strive for perfection here, strive for reasonable quality.
The next cell contains code for a wrapper function called laptop_camera_go that - when called - will activate your laptop's camera. You need to place the relevant face detection and blurring code developed above in this function in order to blur faces entering your laptop camera's field of view.
Before adding anything to the function you can call it to get the hang of how it works - a small window will pop up showing you the live feed from your camera; you can press any key to close this window.
Note: Mac users may find that activating this function kills the kernel of their notebook every once in a while. If this happens to you, just restart your notebook's kernel, activate cell(s) containing any crucial import statements, and you'll be good to go!
### Insert face detection and blurring code into the wrapper below to create an identity protector on your laptop!
import cv2
import time
def laptop_camera_go():
    # Create instance of video capturer
    cv2.namedWindow("face detection activated")
    vc = cv2.VideoCapture(0)
    # Try to get the first frame
    if vc.isOpened():
        rval, frame = vc.read()
    else:
        rval = False
    # Keep video stream open
    while rval:
        # Plot image from camera with detections marked
        faces = face_cascade.detectMultiScale(frame, 1.25, 6)
        for (x,y,w,h) in faces:
            # Rows span the box height (h) and columns span its width (w)
            frame[y:y+h, x:x+w] = cv2.filter2D(frame[y:y+h, x:x+w], -1, kernal)
            cv2.rectangle(frame, (x,y), (x+w,y+h),(255,120,150), 5)
        cv2.imshow("face detection activated", frame)
        # Exit functionality - press any key to exit laptop video
        key = cv2.waitKey(20)
        if key > 0: # Exit by pressing any key
            # Release the camera, then destroy windows
            vc.release()
            cv2.destroyAllWindows()
            for i in range (1,5):
                cv2.waitKey(1)
            return
        # Read next frame
        time.sleep(0.05) # control framerate for computation - default 20 frames per sec
        rval, frame = vc.read()
# Run laptop identity hider
laptop_camera_go()
OpenCV is often used in practice with other machine learning and deep learning libraries to produce interesting results. In this stage of the project you will create your own end-to-end pipeline - employing convolutional networks in Keras along with OpenCV - to apply a "selfie" filter to streaming video and images.
You will start by creating and then training a convolutional network that can detect facial keypoints in a small dataset of cropped images of human faces. We then guide you in using OpenCV to expand your detection algorithm to more general images. What are facial keypoints? Let's take a look at some examples.
Facial keypoints (also called facial landmarks) are the small blue-green dots shown on each of the faces in the image above - there are 15 keypoints marked in each image. They mark important areas of the face - the eyes, corners of the mouth, the nose, etc. Facial keypoints can be used in a variety of machine learning applications from face and emotion recognition to commercial applications like the image filters popularized by Snapchat.
Below we illustrate a filter that, using the results of this section, automatically places sunglasses on people in images (using the facial keypoints to place the glasses correctly on each face). Here, the facial keypoints have been colored lime green for visualization purposes.
But first things first: how can we make a facial keypoint detector? Well, at a high level, notice that facial keypoint detection is a regression problem. A single face corresponds to a set of 15 facial keypoints (a set of 15 corresponding $(x, y)$ coordinates, i.e., an output point of 30 values). Because our input data are images, we can employ a convolutional neural network to recognize patterns in our images and learn how to identify these keypoints given sets of labeled data.
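To make the regression framing concrete, here is a minimal sketch with made-up label and prediction vectors (`y_true` and `y_pred` are hypothetical). It treats each 30-value output as 15 (x, y) pairs and scores a prediction with mean squared error, which is one common loss for regression and is an assumption here rather than a requirement of the project.

```python
import numpy as np

# Made-up label vector with 30 entries laid out as x0, y0, x1, y1, ...
y_true = np.linspace(-0.5, 0.5, 30).astype(np.float32)
y_pred = y_true + 0.1   # pretend the network is off by 0.1 everywhere

# View each flat vector as 15 (x, y) pairs.
pts_true = y_true.reshape(15, 2)
pts_pred = y_pred.reshape(15, 2)

# Mean squared error over all 30 coordinates.
mse = float(np.mean((pts_pred - pts_true) ** 2))

print(pts_true.shape, round(mse, 4))  # (15, 2) 0.01
```

A classifier would output a class label for the image; here the network instead outputs continuous coordinates, which is what makes this a regression task.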
In order to train a regressor, we need a training set - a set of facial image / facial keypoint pairs to train on. For this we will be using this dataset from Kaggle. We've already downloaded this data and placed it in the data directory. Make sure that you have both the training and test data files. The training dataset contains several thousand $96 \times 96$ grayscale images of cropped human faces, along with each face's 15 corresponding facial keypoints (also called landmarks) that have been placed by hand, and recorded in $(x, y)$ coordinates. This wonderful resource also has a substantial testing set, which we will use in tinkering with our convolutional network.
To load in this data, run the Python cell below - notice we will load in both the training and testing sets.
The load_data function is in the included utils.py file.
from utils import *
# Load training set
X_train, y_train = load_data()
print("X_train.shape == {}".format(X_train.shape))
print("y_train.shape == {}; y_train.min == {:.3f}; y_train.max == {:.3f}".format(
y_train.shape, y_train.min(), y_train.max()))
# Load testing set
X_test, _ = load_data(test=True)
print("X_test.shape == {}".format(X_test.shape))
X_train.shape == (2140, 96, 96, 1)
y_train.shape == (2140, 30); y_train.min == -0.920; y_train.max == 0.996
X_test.shape == (1783, 96, 96, 1)
The load_data function in utils.py originates from this excellent blog post, which you are strongly encouraged to read. Please take the time now to review this function. Note how the output values - that is, the coordinates of each set of facial landmarks - have been normalized to take on values in the range $[-1, 1]$, while the pixel values of each input point (a facial image) have been normalized to the range $[0,1]$.
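The mapping between normalized keypoints and pixel coordinates can be sketched as follows. The helper names `to_pixels` and `to_normalized` are hypothetical; the scaling by 48 assumes $96 \times 96$ images and matches the `* 48 + 48` unnormalization step used elsewhere in this notebook.

```python
import numpy as np

IMAGE_SIZE = 96
HALF = IMAGE_SIZE / 2   # 48 for 96x96 images

def to_pixels(normalized):
    """Map keypoints from [-1, 1] to pixel coordinates in [0, 96]."""
    return np.asarray(normalized, dtype=np.float32) * HALF + HALF

def to_normalized(pixels):
    """Inverse mapping, from pixel coordinates back to [-1, 1]."""
    return (np.asarray(pixels, dtype=np.float32) - HALF) / HALF

print(to_pixels([-1.0, 0.0, 1.0]))  # [ 0. 48. 96.]
```

Keeping both directions handy is useful later: the network is trained on normalized targets, while plotting and drawing with OpenCV need pixel coordinates.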
Note: the original Kaggle dataset contains some images with several missing keypoints. For simplicity, the load_data function removes those images with missing labels from the dataset. As an optional extension, you are welcome to amend the load_data function to include the incomplete data points.
Execute the code cell below to visualize a subset of the training data.
# Make a labeled test set from the original training data.
# This replaces the unlabeled Kaggle test set loaded above.
from sklearn.utils import shuffle
X_train, y_train = shuffle(X_train, y_train)
X_test, y_test = X_train[:500], y_train[:500]
X_train, y_train = X_train[500:], y_train[500:]
import matplotlib.pyplot as plt
%matplotlib inline
fig = plt.figure(figsize=(20,20))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(9):
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    plot_data(X_train[i], y_train[i], ax)
print(X_train.shape, X_test.shape)
(1640, 96, 96, 1) (500, 96, 96, 1)
import cv2
import numpy as np
image_number = 7
scale_factor = .9
rotation = 15  # In degrees
# Unnormalize the keypoints
keypoints = y_train[image_number] * 48 + 48
# Use OpenCV to get a rotation matrix about the image center
M = cv2.getRotationMatrix2D((48, 48), rotation, scale_factor)
dst = cv2.warpAffine(np.squeeze(X_train[image_number]), M, (96, 96))
# Apply the same affine transform to each (x, y) keypoint
new_keypoints = np.zeros(30)
for i in range(15):
    coord_idx = 2*i
    old_coord = keypoints[coord_idx:coord_idx+2]
    new_coord = np.matmul(M, np.append(old_coord, 1))
    new_keypoints[coord_idx] += new_coord[0]
    new_keypoints[coord_idx+1] += new_coord[1]
# Plot the image and the augmented image
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(121)
ax.imshow(np.squeeze(X_train[image_number]), cmap='gray')
ax.scatter(keypoints[0::2],
keypoints[1::2],
marker='o',
c='c',
s=20)
ax2 = fig.add_subplot(122)
ax2.imshow(dst, cmap='gray')
ax2.scatter(new_keypoints[0::2],
new_keypoints[1::2],
marker='o',
c='c',
s=20)
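The per-keypoint loop above multiplies each homogeneous point $(x, y, 1)$ by the $2 \times 3$ matrix M. The sketch below shows the same idea for all keypoints at once using only NumPy. The helper names are hypothetical, and the matrix is built from the formula given in the OpenCV documentation for getRotationMatrix2D rather than by calling OpenCV, so treat it as an illustration rather than a drop-in replacement.

```python
import numpy as np


def rotation_matrix_2d(center, angle_deg, scale):
    """Build a 2x3 affine matrix in the form documented for cv2.getRotationMatrix2D."""
    cx, cy = center
    a = scale * np.cos(np.deg2rad(angle_deg))
    b = scale * np.sin(np.deg2rad(angle_deg))
    return np.array([[a, b, (1 - a) * cx - b * cy],
                     [-b, a, b * cx + (1 - a) * cy]])


def transform_keypoints(M, keypoints):
    """Apply a 2x3 affine matrix to a flat [x0, y0, x1, y1, ...] array of pixel coordinates."""
    pts = np.asarray(keypoints, dtype=np.float64).reshape(-1, 2)
    homogeneous = np.hstack([pts, np.ones((len(pts), 1))])  # shape (N, 3)
    return (homogeneous @ M.T).reshape(-1)                  # back to flat (2N,)
```

The rotation center is a fixed point of the transform, which is why rotating about (48, 48) keeps the face roughly centered in the $96 \times 96$ frame.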
The process for flipping the data horizontally is from this blog post.
image_number = 12
y_train[image_number]
array([ 0.3862355 , -0.11629344, -0.28210425, -0.20511582, 0.27967182, -0.13173746, 0.49897683, -0.09969112, -0.1257336 , -0.182722 , -0.4278713 , -0.21102703, 0.36343077, -0.28352025, 0.61673236, -0.27950174, -0.0539131 , -0.3459766 , -0.50139093, -0.40791002, 0.16773917, 0.37398112, 0.19199608, 0.7714715 , -0.36855176, 0.6645354 , -0.0447662 , 0.7309279 , -0.05993136, 0.77642334], dtype=float32)
keypoints = y_train[image_number]
new_keypoints = np.zeros(30)
new_keypoints += keypoints
# Mirror the x coordinates (labels are normalized to [-1, 1], so negating reflects about the image center)
new_keypoints[0::2] *= -1
# Flip the indices of the left right keypoints
flip_indices = [
    (0, 2), (1, 3),
    (4, 8), (5, 9), (6, 10), (7, 11),
    (12, 16), (13, 17), (14, 18), (15, 19),
    (22, 24), (23, 25),
]
for a,b in flip_indices:
    new_keypoints[a], new_keypoints[b] = new_keypoints[b], new_keypoints[a]
# Unnormalize keypoints for plotting
keypoints = keypoints * 48 + 48
new_keypoints = new_keypoints * 48 + 48
# Flip the image horizontally
flipped_img = X_train[image_number][:,::-1,:]
y_train[image_number]
array([ 0.3862355 , -0.11629344, -0.28210425, -0.20511582, 0.27967182, -0.13173746, 0.49897683, -0.09969112, -0.1257336 , -0.182722 , -0.4278713 , -0.21102703, 0.36343077, -0.28352025, 0.61673236, -0.27950174, -0.0539131 , -0.3459766 , -0.50139093, -0.40791002, 0.16773917, 0.37398112, 0.19199608, 0.7714715 , -0.36855176, 0.6645354 , -0.0447662 , 0.7309279 , -0.05993136, 0.77642334], dtype=float32)
# Plot the original and flipped images side by side
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(121)
ax.imshow(np.squeeze(X_train[image_number]), cmap='gray')
ax.set_title('Original Image')
ax.scatter(keypoints[0::2],
keypoints[1::2],
marker='o',
c='c',
s=20)
ax2 = fig.add_subplot(122)
ax2.imshow(np.squeeze(flipped_img), cmap='gray')
ax2.set_title('Flipped Image')
ax2.scatter(new_keypoints[0::2],
new_keypoints[1::2],
marker='o',
c='c',
s=20)
The plots show that the coordinate for the right eye (the person's left) accurately identifies that eye in both the original and the reflected image.
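The flip logic can also be written for a whole batch at once. The following is a sketch under the same assumptions as the cell above: labels are normalized to $[-1, 1]$, and the left/right index pairs are those in the flip_indices list. The function names are hypothetical.

```python
import numpy as np

# Index pairs to swap after mirroring, copied from the flip_indices list above.
FLIP_PAIRS = [(0, 2), (1, 3), (4, 8), (5, 9), (6, 10), (7, 11),
              (12, 16), (13, 17), (14, 18), (15, 19), (22, 24), (23, 25)]


def flip_keypoints(labels):
    """Mirror normalized keypoints of shape (N, 30) about the vertical axis."""
    flipped = np.array(labels, dtype=np.float32, copy=True)
    flipped[:, 0::2] *= -1                   # negate every x coordinate
    a, b = map(list, zip(*FLIP_PAIRS))
    flipped[:, a + b] = flipped[:, b + a]    # swap left/right columns
    return flipped


def flip_images(images):
    """Mirror a batch of images of shape (N, H, W, C) along the width axis."""
    return images[:, :, ::-1, :]
```

A quick sanity check for any flip routine is that applying it twice returns the original labels, since both the negation and the swaps are their own inverses.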
Here the augmentation process outlined above is organized into several functions to augment the data.
def scale_rotate_transform(data, labels, rotation_range=10, scale_range=.1):
    '''
    Scales and rotates each image and its keypoints by a random amount.
    '''
    aug_data = np.copy(data)
    aug_labels = np.copy(labels)
    # Apply rotation and scale transform
    for i in range(len(data)):
        # Unnormalize the keypoints
        aug_labels[i] = aug_labels[i]*48 + 48
        scale_factor = 1.0 + (np.random.uniform(-1,1)) * scale_range
        rotation_factor = (np.random.uniform(-1,1)) * rotation_range
        # Use OpenCV to get a rotation matrix
        M = cv2.getRotationMatrix2D((48,48), rotation_factor, scale_factor)
        aug_data[i] = np.expand_dims(cv2.warpAffine(np.squeeze(aug_data[i]),M,(96,96)), axis=2)
        for j in range(15):
            coord_idx = 2*j
            old_coord = aug_labels[i][coord_idx:coord_idx+2]
            new_coord = np.matmul(M,np.append(old_coord,1))
            aug_labels[i][coord_idx] = new_coord[0]
            aug_labels[i][coord_idx+1] = new_coord[1]
        # Normalize aug_labels
        aug_labels[i] = (aug_labels[i] - 48)/48
    return aug_data, aug_labels
def horizontal_flip(data, labels):
    '''
    Takes an image set and its keypoint labels and flips them horizontally.
    '''
    # Flip the images horizontally
    flipped_data = np.copy(data)[:,:,::-1,:]
    flipped_labels = np.zeros(labels.shape)
    for i in range(data.shape[0]):
        # Flip the x coordinates of the key points
        flipped_labels[i] += labels[i]
        flipped_labels[i, 0::2] *= -1
        # Flip the indices of the left right keypoints
        flip_indices = [
            (0, 2), (1, 3),
            (4, 8), (5, 9), (6, 10), (7, 11),
            (12, 16), (13, 17), (14, 18), (15, 19),
            (22, 24), (23, 25),
        ]
        for a,b in flip_indices:
            flipped_labels[i,a], flipped_labels[i,b] = flipped_labels[i,b], flipped_labels[i,a]
    return flipped_data, flipped_labels
def data_augmentation(data, labels, rotation_range=10, scale_range=.1, h_flip=True):
    '''
    Takes in the images and keypoints and applies a random rotation and scaling. Then flips the
    images and keypoints horizontally if specified.
    '''
    aug_data, aug_labels = scale_rotate_transform(data, labels, rotation_range, scale_range)
    if h_flip:
        aug_data, aug_labels = horizontal_flip(aug_data, aug_labels)
    return aug_data, aug_labels
To augment the dataset, the original data and its horizontal reflection are combined.
data_hflip, labels_hflip = data_augmentation(X_train, y_train, 0.0, 0.0, True)
keypoints = y_train[image_number]*48 +48
new_keypoints = labels_hflip[image_number]*48+48
fig = plt.figure(figsize=(12,12))
ax = fig.add_subplot(121)
ax.imshow(np.squeeze(X_train[image_number]), cmap='gray')
ax.scatter(keypoints[0::2],
keypoints[1::2],
marker='o',
c='c',
s=20)
ax2 = fig.add_subplot(122)
ax2.imshow(np.squeeze(data_hflip[image_number]), cmap='gray')
ax2.scatter(new_keypoints[0::2],
new_keypoints[1::2],
marker='o',
c='c',
s=20)
X_aug = np.concatenate((X_train, data_hflip), axis=0)
y_aug = np.concatenate((y_train, labels_hflip), axis=0)
Now that the original dataset is doubled, two randomly scaled and rotated versions of it are created and added on. At this point the dataset is 6 times its original size (9840 images, up from 1640).
X_train_transformed, y_train_transformed = data_augmentation(X_aug, y_aug, 15.0, .1, False)
X_train_transformed2, y_train_transformed2 = data_augmentation(X_aug, y_aug, 15.0, .1, False)
Below is a sample of the transformed images and their keypoints.
fig = plt.figure(figsize=(20,20))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(9):
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    plot_data(X_train_transformed[i], y_train_transformed[i], ax)
X_aug = np.concatenate((X_aug, X_train_transformed, X_train_transformed2), axis=0)
y_aug = np.concatenate((y_aug, y_train_transformed, y_train_transformed2), axis=0)
X_aug.shape
(9840, 96, 96, 1)
Let's repeat the above to really increase the amount of data our model is training on.
X_train_transformed, y_train_transformed = data_augmentation(X_aug, y_aug, 15.0, .1, False)
X_train_transformed2, y_train_transformed2 = data_augmentation(X_aug, y_aug, 15.0, .1, False)
X_aug = np.concatenate((X_aug, X_train_transformed, X_train_transformed2), axis=0)
y_aug = np.concatenate((y_aug, y_train_transformed, y_train_transformed2), axis=0)
X_aug.shape
(29520, 96, 96, 1)
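The growth in dataset size across the augmentation steps above can be tallied with a few lines of arithmetic. The starting count of 1640 and the resulting shapes come from the cell outputs in this notebook; the variable names are only for illustration.

```python
n_train = 1640                 # images left after holding out 500 of 2140

n_flipped = 2 * n_train        # originals + horizontal flips
n_round1 = 3 * n_flipped       # + two random scale/rotate copies of everything so far
n_round2 = 3 * n_round1        # the same step repeated once more

print(n_flipped, n_round1, n_round2)   # 3280 9840 29520
print(n_round2 // n_train)             # 18
```

Note that the second round transforms images that were already transformed in the first round, so some samples have been rotated and scaled twice.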
For each training image, there are two landmarks per eyebrow (four total), three per eye (six total), four for the mouth, and one for the tip of the nose.
Review the plot_data function in utils.py to understand how the 30-dimensional training labels in y_train are mapped to facial locations, as this function will prove useful for your pipeline.
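As a rough illustration of that mapping, the sketch below reshapes one 30-dimensional label into 15 named $(x, y)$ points in pixel units. The keypoint names and their order are an assumption based on the column order of the Kaggle CSV (which is consistent with the left/right index pairs used for flipping above); they are not taken from utils.py, so verify them against plot_data before relying on them. The function name is hypothetical.

```python
import numpy as np

# Assumed keypoint order (Kaggle CSV column order); check against utils.py.
KEYPOINT_NAMES = [
    "left_eye_center", "right_eye_center",
    "left_eye_inner_corner", "left_eye_outer_corner",
    "right_eye_inner_corner", "right_eye_outer_corner",
    "left_eyebrow_inner_end", "left_eyebrow_outer_end",
    "right_eyebrow_inner_end", "right_eyebrow_outer_end",
    "nose_tip",
    "mouth_left_corner", "mouth_right_corner",
    "mouth_center_top_lip", "mouth_center_bottom_lip",
]


def label_to_points(label, half_size=48.0):
    """Convert one normalized 30-vector into a {name: (x, y)} dict in pixel units."""
    pts = np.asarray(label, dtype=np.float32).reshape(15, 2) * half_size + half_size
    return {name: (float(x), float(y)) for name, (x, y) in zip(KEYPOINT_NAMES, pts)}
```

Even-indexed entries of the label are x coordinates and odd-indexed entries are y coordinates, which is why the plotting cells above slice with `[0::2]` and `[1::2]`.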
In this section, you will specify a neural network for predicting the locations of facial keypoints. Use the code cell below to specify the architecture of your neural network. We have imported some layers that you may find useful for this task, but if you need to use more Keras layers, feel free to import them in the cell.
Your network should accept a $96 \times 96$ grayscale image as input, and it should output a vector with 30 entries, corresponding to the predicted (horizontal and vertical) locations of 15 facial keypoints. If you are not sure where to start, you can find some useful starting architectures in this blog, but you are not permitted to copy any of the architectures that you find online.
# Import deep learning resources from Keras
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, GlobalAveragePooling2D, BatchNormalization
from keras.layers import Flatten, Dense
## TODO: Specify a CNN architecture
# Your model should accept 96x96 pixel grayscale images in
# It should have a fully-connected output layer with 30 values (2 for each facial keypoint)
model = Sequential()
# Conv layer1
model.add(Conv2D(32, 3, strides=(1,1), padding='same', activation='elu', input_shape=(96,96,1)))
model.add(BatchNormalization())
model.add(Dropout(.2))
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer2
model.add(Conv2D(64, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(Dropout(.2))
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer3
model.add(Conv2D(128, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(Dropout(.2))
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer4
model.add(Conv2D(256, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer5
model.add(Conv2D(256, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Global average pooling (used here in place of a Flatten layer)
model.add(GlobalAveragePooling2D())
model.add(BatchNormalization())
# Fully connected output layer (30 values)
model.add(Dense(30, activation='elu'))
# Summarize the model
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_61 (Conv2D)           (None, 96, 96, 32)        320
batch_normalization_73 (Batc (None, 96, 96, 32)        128
dropout_37 (Dropout)         (None, 96, 96, 32)        0
max_pooling2d_61 (MaxPooling (None, 48, 48, 32)        0
conv2d_62 (Conv2D)           (None, 48, 48, 64)        18496
batch_normalization_74 (Batc (None, 48, 48, 64)        256
dropout_38 (Dropout)         (None, 48, 48, 64)        0
max_pooling2d_62 (MaxPooling (None, 24, 24, 64)        0
conv2d_63 (Conv2D)           (None, 24, 24, 128)       73856
batch_normalization_75 (Batc (None, 24, 24, 128)       512
dropout_39 (Dropout)         (None, 24, 24, 128)       0
max_pooling2d_63 (MaxPooling (None, 12, 12, 128)       0
conv2d_64 (Conv2D)           (None, 12, 12, 256)       295168
batch_normalization_76 (Batc (None, 12, 12, 256)       1024
max_pooling2d_64 (MaxPooling (None, 6, 6, 256)         0
conv2d_65 (Conv2D)           (None, 6, 6, 256)         590080
batch_normalization_77 (Batc (None, 6, 6, 256)         1024
max_pooling2d_65 (MaxPooling (None, 3, 3, 256)         0
global_average_pooling2d_13  (None, 256)               0
batch_normalization_78 (Batc (None, 256)               1024
dense_13 (Dense)             (None, 30)                7710
=================================================================
Total params: 989,598
Trainable params: 987,614
Non-trainable params: 1,984
_________________________________________________________________
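The parameter counts in the summary can be reproduced by hand, which is a useful check that the architecture is what you intended. The sketch below uses the standard counting rules (a $k \times k$ convolution has $k^2 \cdot c_{in} + 1$ weights per output channel; batch normalization keeps four values per channel, two of them non-trainable). The helper names are hypothetical.

```python
def conv2d_params(kernel, in_ch, out_ch):
    """Weights plus one bias per output channel."""
    return (kernel * kernel * in_ch + 1) * out_ch


def batchnorm_params(channels):
    """gamma, beta (trainable) and moving mean, moving variance (non-trainable)."""
    return 4 * channels


def dense_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return (n_in + 1) * n_out


channels = [1, 32, 64, 128, 256, 256]   # input channel followed by the five conv widths
conv = [conv2d_params(3, c_in, c_out) for c_in, c_out in zip(channels, channels[1:])]
# One batch-norm layer after each conv layer, plus one after global average pooling.
bn = [batchnorm_params(c) for c in channels[1:]] + [batchnorm_params(256)]
dense = dense_params(256, 30)

total = sum(conv) + sum(bn) + dense
non_trainable = sum(bn) // 2

print(conv)                                         # [320, 18496, 73856, 295168, 590080]
print(total, total - non_trainable, non_trainable)  # 989598 987614 1984
```

Dropout, pooling, and global average pooling layers contribute no parameters, which matches the zeros in the summary.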
After specifying your architecture, you'll need to compile and train the model to detect facial keypoints.
Use the compile method to configure the learning process. Experiment with your choice of optimizer; you may have some ideas about which will work best (SGD vs. RMSprop, etc.), but take the time to empirically verify your theories.
Use the fit method to train the model. Break off a validation set by setting validation_split=0.2. Save the returned History object in the history variable.
Experiment with your model to minimize the validation loss (measured as mean squared error). A very good model will achieve about 0.0015 loss (though it's possible to do even better). When you have finished training, save your model as an HDF5 file with file path my_model.h5.
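To get a feel for what a loss of 0.0015 means, it helps to convert it back to pixels. The sketch below assumes the labels are normalized by a factor of 48, as in the augmentation cells above, and reports the root-mean-square error per coordinate. The function names are hypothetical.

```python
import numpy as np


def mse(y_true, y_pred):
    """Mean squared error over all coordinates, as used for the training loss."""
    return float(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2))


def rmse_in_pixels(mse_normalized, half_size=48.0):
    """Convert an MSE measured on [-1, 1] labels into an RMS error in pixels."""
    return float(np.sqrt(mse_normalized) * half_size)


print(round(rmse_in_pixels(0.0015), 2))   # 1.86
```

So the suggested target corresponds to a typical error of a little under two pixels per coordinate on a $96 \times 96$ image.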
from keras.optimizers import SGD, RMSprop, Adagrad, Adadelta, Adam, Adamax, Nadam
learning_rates = [.001, .0001, .00001]
batch_sizes = [32, 64, 128]
full_loss_hist = []
full_val_loss = []
# Note: the model is not re-initialized between settings, so each run continues from the previous weights
for lr in learning_rates:
    ## TODO: Compile the model
    model.compile(optimizer=Adam(lr=lr), loss='mean_squared_error')
    for batch_size in batch_sizes:
        print('learning rate: {}, batch size: {}'.format(lr,batch_size))
        ## TODO: Train the model
        hist = model.fit(X_aug, y_aug, batch_size=batch_size, epochs=15, validation_data=(X_test, y_test), verbose=2)
        full_loss_hist.append(hist.history['loss'])
        full_val_loss.append(hist.history['val_loss'])
## TODO: Save the model as my_model.h5
model.save('my_model.h5')
learning rate: 0.001, batch size: 32 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 55s - loss: 0.0012 - val_loss: 9.4695e-04 Epoch 2/15 - 39s - loss: 9.9520e-04 - val_loss: 8.4110e-04 Epoch 3/15 - 39s - loss: 9.2155e-04 - val_loss: 0.0013 Epoch 4/15 - 39s - loss: 8.7887e-04 - val_loss: 9.6937e-04 Epoch 5/15 - 39s - loss: 8.5313e-04 - val_loss: 7.6165e-04 Epoch 6/15 - 39s - loss: 8.3940e-04 - val_loss: 6.6805e-04 Epoch 7/15 - 39s - loss: 8.0954e-04 - val_loss: 8.4218e-04 Epoch 8/15 - 39s - loss: 7.8644e-04 - val_loss: 8.4273e-04 Epoch 9/15 - 41s - loss: 7.6781e-04 - val_loss: 8.2917e-04 Epoch 10/15 - 41s - loss: 7.5424e-04 - val_loss: 8.9748e-04 Epoch 11/15 - 41s - loss: 7.3416e-04 - val_loss: 7.3889e-04 Epoch 12/15 - 39s - loss: 7.2585e-04 - val_loss: 6.1351e-04 Epoch 13/15 - 41s - loss: 7.0580e-04 - val_loss: 7.3076e-04 Epoch 14/15 - 40s - loss: 7.0909e-04 - val_loss: 6.8884e-04 Epoch 15/15 - 39s - loss: 6.8641e-04 - val_loss: 6.7384e-04 learning rate: 0.001, batch size: 64 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 34s - loss: 4.0772e-04 - val_loss: 6.6049e-04 Epoch 2/15 - 35s - loss: 3.9936e-04 - val_loss: 6.7735e-04 Epoch 3/15 - 34s - loss: 4.1017e-04 - val_loss: 7.7198e-04 Epoch 4/15 - 34s - loss: 4.1900e-04 - val_loss: 6.7762e-04 Epoch 5/15 - 34s - loss: 4.2576e-04 - val_loss: 6.7930e-04 Epoch 6/15 - 34s - loss: 4.2833e-04 - val_loss: 7.2658e-04 Epoch 7/15 - 34s - loss: 4.0199e-04 - val_loss: 7.0186e-04 Epoch 8/15 - 34s - loss: 4.1565e-04 - val_loss: 7.9000e-04 Epoch 9/15 - 34s - loss: 3.9828e-04 - val_loss: 8.5054e-04 Epoch 10/15 - 34s - loss: 4.0350e-04 - val_loss: 7.3613e-04 Epoch 11/15 - 36s - loss: 3.9632e-04 - val_loss: 7.6800e-04 Epoch 12/15 - 38s - loss: 3.9048e-04 - val_loss: 7.4103e-04 Epoch 13/15 - 34s - loss: 3.9591e-04 - val_loss: 7.2472e-04 Epoch 14/15 - 40s - loss: 3.8969e-04 - val_loss: 7.7194e-04 Epoch 15/15 - 43s - loss: 4.0250e-04 - val_loss: 7.0122e-04 learning rate: 0.001, batch size: 128 Train on 29520 
samples, validate on 500 samples Epoch 1/15 - 38s - loss: 2.4788e-04 - val_loss: 6.1129e-04 Epoch 2/15 - 33s - loss: 2.5065e-04 - val_loss: 6.1066e-04 Epoch 3/15 - 32s - loss: 2.5093e-04 - val_loss: 6.3708e-04 Epoch 4/15 - 32s - loss: 2.5084e-04 - val_loss: 6.2916e-04 Epoch 5/15 - 32s - loss: 2.4979e-04 - val_loss: 7.0359e-04 Epoch 6/15 - 32s - loss: 2.5199e-04 - val_loss: 6.9004e-04 Epoch 7/15 - 32s - loss: 2.6056e-04 - val_loss: 7.0357e-04 Epoch 8/15 - 33s - loss: 2.5010e-04 - val_loss: 7.2863e-04 Epoch 9/15 - 32s - loss: 2.5850e-04 - val_loss: 6.8227e-04 Epoch 10/15 - 32s - loss: 2.5112e-04 - val_loss: 6.9764e-04 Epoch 11/15 - 32s - loss: 2.5146e-04 - val_loss: 6.7172e-04 Epoch 12/15 - 32s - loss: 2.4075e-04 - val_loss: 7.2732e-04 Epoch 13/15 - 33s - loss: 2.5095e-04 - val_loss: 7.0697e-04 Epoch 14/15 - 33s - loss: 2.4563e-04 - val_loss: 6.8766e-04 Epoch 15/15 - 33s - loss: 2.4843e-04 - val_loss: 6.8599e-04 learning rate: 0.0001, batch size: 32 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 56s - loss: 4.5347e-04 - val_loss: 6.5164e-04 Epoch 2/15 - 41s - loss: 4.5462e-04 - val_loss: 6.4698e-04 Epoch 3/15 - 40s - loss: 4.3913e-04 - val_loss: 6.4066e-04 Epoch 4/15 - 44s - loss: 4.5682e-04 - val_loss: 6.3655e-04 Epoch 5/15 - 46s - loss: 4.4823e-04 - val_loss: 6.5284e-04 Epoch 6/15 - 45s - loss: 4.3650e-04 - val_loss: 6.3997e-04 Epoch 7/15 - 45s - loss: 4.5203e-04 - val_loss: 6.6522e-04 Epoch 8/15 - 41s - loss: 4.5077e-04 - val_loss: 6.5708e-04 Epoch 9/15 - 40s - loss: 4.3236e-04 - val_loss: 6.5508e-04 Epoch 10/15 - 41s - loss: 4.6691e-04 - val_loss: 6.4720e-04 Epoch 11/15 - 40s - loss: 4.5470e-04 - val_loss: 6.4135e-04 Epoch 12/15 - 40s - loss: 4.4127e-04 - val_loss: 6.5544e-04 Epoch 13/15 - 42s - loss: 4.6218e-04 - val_loss: 6.6317e-04 Epoch 14/15 - 41s - loss: 4.3803e-04 - val_loss: 6.6840e-04 Epoch 15/15 - 40s - loss: 4.4640e-04 - val_loss: 6.7179e-04 learning rate: 0.0001, batch size: 64 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 
35s - loss: 2.7684e-04 - val_loss: 6.5697e-04 Epoch 2/15 - 35s - loss: 2.5288e-04 - val_loss: 6.6082e-04 Epoch 3/15 - 34s - loss: 2.7080e-04 - val_loss: 6.5084e-04 Epoch 4/15 - 34s - loss: 2.6905e-04 - val_loss: 6.5714e-04 Epoch 5/15 - 35s - loss: 2.6566e-04 - val_loss: 6.7243e-04 Epoch 6/15 - 35s - loss: 2.6842e-04 - val_loss: 6.5235e-04 Epoch 7/15 - 37s - loss: 2.6699e-04 - val_loss: 6.5785e-04 Epoch 8/15 - 39s - loss: 2.7429e-04 - val_loss: 6.3766e-04 Epoch 9/15 - 35s - loss: 2.6334e-04 - val_loss: 6.4898e-04 Epoch 10/15 - 34s - loss: 2.7130e-04 - val_loss: 6.5982e-04 Epoch 11/15 - 34s - loss: 2.6777e-04 - val_loss: 6.5592e-04 Epoch 12/15 - 34s - loss: 2.6872e-04 - val_loss: 6.5888e-04 Epoch 13/15 - 35s - loss: 2.7233e-04 - val_loss: 6.5422e-04 Epoch 14/15 - 34s - loss: 2.6082e-04 - val_loss: 6.5957e-04 Epoch 15/15 - 35s - loss: 2.6113e-04 - val_loss: 6.5979e-04 learning rate: 0.0001, batch size: 128 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 32s - loss: 1.8325e-04 - val_loss: 6.4724e-04 Epoch 2/15 - 32s - loss: 1.7849e-04 - val_loss: 6.4283e-04 Epoch 3/15 - 32s - loss: 1.7759e-04 - val_loss: 6.4645e-04 Epoch 4/15 - 32s - loss: 1.7928e-04 - val_loss: 6.5056e-04 Epoch 5/15 - 32s - loss: 1.8523e-04 - val_loss: 6.5334e-04 Epoch 6/15 - 33s - loss: 1.7490e-04 - val_loss: 6.4710e-04 Epoch 7/15 - 32s - loss: 1.7739e-04 - val_loss: 6.5795e-04 Epoch 8/15 - 32s - loss: 1.7457e-04 - val_loss: 6.5310e-04 Epoch 9/15 - 32s - loss: 1.7633e-04 - val_loss: 6.4512e-04 Epoch 10/15 - 32s - loss: 1.7846e-04 - val_loss: 6.5983e-04 Epoch 11/15 - 33s - loss: 1.7810e-04 - val_loss: 6.6653e-04 Epoch 12/15 - 32s - loss: 1.7981e-04 - val_loss: 6.6232e-04 Epoch 13/15 - 32s - loss: 1.8392e-04 - val_loss: 6.5875e-04 Epoch 14/15 - 32s - loss: 1.6956e-04 - val_loss: 6.6867e-04 Epoch 15/15 - 32s - loss: 1.7621e-04 - val_loss: 6.8489e-04 learning rate: 1e-05, batch size: 32 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 54s - loss: 4.2498e-04 - val_loss: 6.4977e-04 
Epoch 2/15 - 39s - loss: 4.0754e-04 - val_loss: 6.5056e-04 Epoch 3/15 - 40s - loss: 4.2315e-04 - val_loss: 6.4867e-04 Epoch 4/15 - 42s - loss: 4.3701e-04 - val_loss: 6.5582e-04 Epoch 5/15 - 40s - loss: 4.2303e-04 - val_loss: 6.5325e-04 Epoch 6/15 - 39s - loss: 4.3168e-04 - val_loss: 6.4946e-04 Epoch 7/15 - 39s - loss: 4.1893e-04 - val_loss: 6.5555e-04 Epoch 8/15 - 39s - loss: 4.3544e-04 - val_loss: 6.5463e-04 Epoch 9/15 - 41s - loss: 4.1544e-04 - val_loss: 6.4974e-04 Epoch 10/15 - 39s - loss: 4.3110e-04 - val_loss: 6.5465e-04 Epoch 11/15 - 39s - loss: 4.1290e-04 - val_loss: 6.6211e-04 Epoch 12/15 - 39s - loss: 4.1692e-04 - val_loss: 6.5537e-04 Epoch 13/15 - 39s - loss: 4.1378e-04 - val_loss: 6.5949e-04 Epoch 14/15 - 39s - loss: 4.1379e-04 - val_loss: 6.5438e-04 Epoch 15/15 - 40s - loss: 4.2136e-04 - val_loss: 6.5825e-04 learning rate: 1e-05, batch size: 64 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 35s - loss: 2.5332e-04 - val_loss: 6.5131e-04 Epoch 2/15 - 35s - loss: 2.5137e-04 - val_loss: 6.4962e-04 Epoch 3/15 - 35s - loss: 2.6588e-04 - val_loss: 6.5092e-04 Epoch 4/15 - 36s - loss: 2.4672e-04 - val_loss: 6.5612e-04 Epoch 5/15 - 35s - loss: 2.5727e-04 - val_loss: 6.5980e-04 Epoch 6/15 - 36s - loss: 2.5653e-04 - val_loss: 6.5360e-04 Epoch 7/15 - 36s - loss: 2.4632e-04 - val_loss: 6.4966e-04 Epoch 8/15 - 36s - loss: 2.4950e-04 - val_loss: 6.5665e-04 Epoch 9/15 - 34s - loss: 2.5892e-04 - val_loss: 6.5254e-04 Epoch 10/15 - 34s - loss: 2.5133e-04 - val_loss: 6.4890e-04 Epoch 11/15 - 35s - loss: 2.4580e-04 - val_loss: 6.5177e-04 Epoch 12/15 - 34s - loss: 2.5020e-04 - val_loss: 6.4863e-04 Epoch 13/15 - 36s - loss: 2.6089e-04 - val_loss: 6.5200e-04 Epoch 14/15 - 35s - loss: 2.5500e-04 - val_loss: 6.5295e-04 Epoch 15/15 - 34s - loss: 2.4635e-04 - val_loss: 6.5473e-04 learning rate: 1e-05, batch size: 128 Train on 29520 samples, validate on 500 samples Epoch 1/15 - 32s - loss: 1.6868e-04 - val_loss: 6.5674e-04 Epoch 2/15 - 33s - loss: 1.6813e-04 - 
val_loss: 6.5658e-04 Epoch 3/15 - 33s - loss: 1.6929e-04 - val_loss: 6.5185e-04 Epoch 4/15 - 33s - loss: 1.7108e-04 - val_loss: 6.5379e-04 Epoch 5/15 - 33s - loss: 1.7535e-04 - val_loss: 6.5546e-04 Epoch 6/15 - 32s - loss: 1.7131e-04 - val_loss: 6.5046e-04 Epoch 7/15 - 33s - loss: 1.6765e-04 - val_loss: 6.5145e-04 Epoch 8/15 - 33s - loss: 1.7264e-04 - val_loss: 6.5460e-04 Epoch 9/15 - 36s - loss: 1.7173e-04 - val_loss: 6.5425e-04 Epoch 10/15 - 33s - loss: 1.6673e-04 - val_loss: 6.5089e-04 Epoch 11/15 - 35s - loss: 1.6829e-04 - val_loss: 6.5357e-04 Epoch 12/15 - 33s - loss: 1.7022e-04 - val_loss: 6.5628e-04 Epoch 13/15 - 32s - loss: 1.7005e-04 - val_loss: 6.5107e-04 Epoch 14/15 - 32s - loss: 1.6780e-04 - val_loss: 6.5174e-04 Epoch 15/15 - 32s - loss: 1.7027e-04 - val_loss: 6.4991e-04
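The lists collected in full_val_loss can be searched for the lowest validation loss instead of scanning the log by eye. The helper below is a hypothetical sketch: it assumes the histories were appended in the same order as the nested loops (learning rate outer, batch size inner). Because the model is not re-initialized between settings, the result identifies the best point in one continuous training schedule rather than a comparison of independent configurations.

```python
from itertools import product


def best_run(val_histories, learning_rates, batch_sizes):
    """Return (lr, batch_size, best_epoch, best_val_loss) for the lowest validation loss."""
    settings = list(product(learning_rates, batch_sizes))  # same order as the nested loops
    best = None
    for (lr, bs), history in zip(settings, val_histories):
        # Index of the epoch with the smallest validation loss in this run
        epoch = min(range(len(history)), key=history.__getitem__)
        if best is None or history[epoch] < best[3]:
            best = (lr, bs, epoch + 1, history[epoch])     # epochs are reported 1-based
    return best
```

In this notebook it would be called as `best_run(full_val_loss, learning_rates, batch_sizes)` after the training loop finishes.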
With the model trained on the images that have complete keypoints, it can now be used to predict the keypoints that are missing in the original dataset. Training on these filled-in examples lets the model learn from the hand-labeled keypoints that are available on the incomplete images, which should help it generalize better because it sees new examples.
The process can be repeated to get better predictions once the model has learned from the new data. First, the model generates predictions on the full dataset. Then, wherever the full dataset has a hand-labeled keypoint, the predicted value is replaced by that label, so predictions are kept only for the missing keypoints.
Next, this dataset is augmented to increase the number of examples, and the model is trained from scratch on this new, more general dataset. The model can then be trained on the original dataset, which has only complete keypoints. This hopefully more accurate model can be used to generate another, more accurate set of predictions, and the cycle can be repeated until there is no further improvement.
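The label-filling step described above amounts to a single masked selection. The sketch below assumes that missing keypoints are stored as NaN and that labels and predictions have the same shape; the function name is hypothetical.

```python
import numpy as np


def fill_missing_keypoints(labels, predictions):
    """Keep every hand-labeled coordinate; use the prediction only where the label is NaN."""
    labels = np.asarray(labels, dtype=np.float32)
    predictions = np.asarray(predictions, dtype=np.float32)
    return np.where(np.isnan(labels), predictions, labels)
```

The result contains no NaN values as long as the predictions themselves are finite, so it can be passed directly to the augmentation functions and to training.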
# Get full dataset including incomplete labels
X_full, y_full = load_data(complete_points=False)
# Find incomplete label to see what it looks like
y_full[0]
array([ 0.4286238 , -0.1771081 , -0.38599524, -0.1921938 , nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -0.00885714, 0.52437097, nan, nan, nan, nan, nan, nan, -0.04657143, 0.8411681 ], dtype=float32)
# Use previously trained model to predict the label; replace nan values with predictions. Keep real labels.
predictions = model.predict(X_full)
print(predictions.shape, y_full.shape)
(7049, 30) (7049, 30)
# Difference between prediction and labels for incomplete keypoint data
y_full[0] - predictions[0]
array([ 0.05512971, 0.04558255, 0.02660319, 0.00583398, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 0.01904963, 0.23627204, nan, nan, nan, nan, nan, nan, -0.07105352, 0.05360168], dtype=float32)
# Loop through all labels
for i in range(len(y_full)):
    # Loop through all coordinates in label
    for j in range(len(y_full[i])):
        if not np.isnan(y_full[i][j]):
            # If there is a coordinate in the actual label, replace value in the prediction
            predictions[i][j] = y_full[i][j]
# Difference after replacing predictions with actual values
y_full[0] - predictions[0]
array([ 0., 0., 0., 0., nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, 0., 0., nan, nan, nan, nan, nan, nan, 0., 0.], dtype=float32)
y_full[0]
array([ 0.4286238 , -0.1771081 , -0.38599524, -0.1921938 , nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, nan, -0.00885714, 0.52437097, nan, nan, nan, nan, nan, nan, -0.04657143, 0.8411681 ], dtype=float32)
# Add horizontal flip of the new data
data_full_hflip, labels_full_hflip = data_augmentation(X_full, predictions, 0.0, 0.0, True)
X_full = np.concatenate((X_full, data_full_hflip), axis=0)
predictions = np.concatenate((predictions, labels_full_hflip), axis=0)
X_full_transformed, predictions_transformed = data_augmentation(X_full, predictions, 15.0, .1, False)
X_full = np.concatenate((X_full, X_full_transformed), axis=0)
predictions = np.concatenate((predictions, predictions_transformed), axis=0)
print(X_full.shape, predictions.shape)
(28196, 96, 96, 1) (28196, 30)
# Start with a new model; same architecture
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Dropout, GlobalAveragePooling2D, BatchNormalization
from keras.layers import Flatten, Dense
## TODO: Specify a CNN architecture
# Your model should accept 96x96 pixel grayscale images in
# It should have a fully-connected output layer with 30 values (2 for each facial keypoint)
model = Sequential()
# Conv layer1
model.add(Conv2D(32, 3, strides=(1,1), padding='same', activation='elu', input_shape=(96,96,1)))
model.add(BatchNormalization())
model.add(Dropout(.2))
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer2
model.add(Conv2D(64, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(Dropout(.2))
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer3
model.add(Conv2D(128, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(Dropout(.2))
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer4
model.add(Conv2D(256, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Conv layer5
model.add(Conv2D(256, 3, strides=(1,1), padding='same', activation='elu'))
model.add(BatchNormalization())
model.add(MaxPooling2D((2,2), strides= 2, padding='same'))
# Global average pooling (used here in place of a Flatten layer)
model.add(GlobalAveragePooling2D())
model.add(BatchNormalization())
# Fully connected output layer (30 values)
model.add(Dense(30, activation='elu'))
# Summarize the model
model.summary()
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_81 (Conv2D)           (None, 96, 96, 32)        320
batch_normalization_97 (Batc (None, 96, 96, 32)        128
dropout_49 (Dropout)         (None, 96, 96, 32)        0
max_pooling2d_81 (MaxPooling (None, 48, 48, 32)        0
conv2d_82 (Conv2D)           (None, 48, 48, 64)        18496
batch_normalization_98 (Batc (None, 48, 48, 64)        256
dropout_50 (Dropout)         (None, 48, 48, 64)        0
max_pooling2d_82 (MaxPooling (None, 24, 24, 64)        0
conv2d_83 (Conv2D)           (None, 24, 24, 128)       73856
batch_normalization_99 (Batc (None, 24, 24, 128)       512
dropout_51 (Dropout)         (None, 24, 24, 128)       0
max_pooling2d_83 (MaxPooling (None, 12, 12, 128)       0
conv2d_84 (Conv2D)           (None, 12, 12, 256)       295168
batch_normalization_100 (Bat (None, 12, 12, 256)       1024
max_pooling2d_84 (MaxPooling (None, 6, 6, 256)         0
conv2d_85 (Conv2D)           (None, 6, 6, 256)         590080
batch_normalization_101 (Bat (None, 6, 6, 256)         1024
max_pooling2d_85 (MaxPooling (None, 3, 3, 256)         0
global_average_pooling2d_17  (None, 256)               0
batch_normalization_102 (Bat (None, 256)               1024
dense_17 (Dense)             (None, 30)                7710
=================================================================
Total params: 989,598
Trainable params: 987,614
Non-trainable params: 1,984
_________________________________________________________________
# Train the model on the new dataset
learning_rates = [.001, .0001, .00001]
batch_sizes = [32, 64, 128]
full_loss_hist = []
full_val_loss = []
for lr in learning_rates:
    model.compile(optimizer=Adam(lr=lr), loss='mean_squared_error')
    for batch_size in batch_sizes:
        print('learning rate: {}, batch size: {}'.format(lr,batch_size))
        # Validate on the held-out images that have complete keypoints
        hist = model.fit(X_full, predictions, batch_size=batch_size, epochs=15, validation_data=(X_test, y_test), verbose=2)
        full_loss_hist.append(hist.history['loss'])
        full_val_loss.append(hist.history['val_loss'])
model.save('my_model_full.h5')
learning rate: 0.001, batch size: 32 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 54s - loss: 0.0195 - val_loss: 0.0104 Epoch 2/15 - 40s - loss: 0.0045 - val_loss: 0.0113 Epoch 3/15 - 41s - loss: 0.0039 - val_loss: 0.0082 Epoch 4/15 - 40s - loss: 0.0036 - val_loss: 0.0098 Epoch 5/15 - 40s - loss: 0.0033 - val_loss: 0.0062 Epoch 6/15 - 37s - loss: 0.0031 - val_loss: 0.0067 Epoch 7/15 - 37s - loss: 0.0028 - val_loss: 0.0041 Epoch 8/15 - 37s - loss: 0.0027 - val_loss: 0.0040 Epoch 9/15 - 37s - loss: 0.0026 - val_loss: 0.0224 Epoch 10/15 - 37s - loss: 0.0029 - val_loss: 0.0074 Epoch 11/15 - 37s - loss: 0.0026 - val_loss: 0.0097 Epoch 12/15 - 37s - loss: 0.0022 - val_loss: 0.0041 Epoch 13/15 - 37s - loss: 0.0020 - val_loss: 0.0047 Epoch 14/15 - 37s - loss: 0.0019 - val_loss: 0.0070 Epoch 15/15 - 37s - loss: 0.0018 - val_loss: 0.0058 learning rate: 0.001, batch size: 64 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 32s - loss: 0.0012 - val_loss: 0.0019 Epoch 2/15 - 32s - loss: 0.0012 - val_loss: 0.0032 Epoch 3/15 - 33s - loss: 0.0012 - val_loss: 0.0014 Epoch 4/15 - 32s - loss: 0.0011 - val_loss: 0.0019 Epoch 5/15 - 32s - loss: 0.0011 - val_loss: 0.0026 Epoch 6/15 - 32s - loss: 0.0011 - val_loss: 0.0018 Epoch 7/15 - 32s - loss: 0.0011 - val_loss: 0.0020 Epoch 8/15 - 33s - loss: 0.0011 - val_loss: 0.0019 Epoch 9/15 - 32s - loss: 9.7674e-04 - val_loss: 0.0016 Epoch 10/15 - 32s - loss: 9.6472e-04 - val_loss: 0.0018 Epoch 11/15 - 32s - loss: 9.7155e-04 - val_loss: 0.0013 Epoch 12/15 - 32s - loss: 9.2632e-04 - val_loss: 0.0014 Epoch 13/15 - 32s - loss: 9.1127e-04 - val_loss: 0.0020 Epoch 14/15 - 33s - loss: 9.1991e-04 - val_loss: 0.0020 Epoch 15/15 - 32s - loss: 8.5716e-04 - val_loss: 9.9964e-04 learning rate: 0.001, batch size: 128 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 31s - loss: 6.0849e-04 - val_loss: 7.9574e-04 Epoch 2/15 - 31s - loss: 5.6832e-04 - val_loss: 7.5494e-04 Epoch 3/15 - 31s - loss: 5.9605e-04 - val_loss: 
8.6793e-04 Epoch 4/15 - 31s - loss: 5.7918e-04 - val_loss: 0.0010 Epoch 5/15 - 31s - loss: 5.9109e-04 - val_loss: 8.6564e-04 Epoch 6/15 - 31s - loss: 5.9349e-04 - val_loss: 8.7029e-04 Epoch 7/15 - 31s - loss: 5.8171e-04 - val_loss: 9.1624e-04 Epoch 8/15 - 31s - loss: 5.7751e-04 - val_loss: 7.0348e-04 Epoch 9/15 - 31s - loss: 5.8156e-04 - val_loss: 8.9425e-04 Epoch 10/15 - 31s - loss: 5.8115e-04 - val_loss: 7.7898e-04 Epoch 11/15 - 31s - loss: 5.6498e-04 - val_loss: 0.0015 Epoch 12/15 - 31s - loss: 5.7757e-04 - val_loss: 9.0425e-04 Epoch 13/15 - 31s - loss: 5.7994e-04 - val_loss: 8.8603e-04 Epoch 14/15 - 31s - loss: 5.5362e-04 - val_loss: 8.6407e-04 Epoch 15/15 - 31s - loss: 5.4140e-04 - val_loss: 9.0152e-04 learning rate: 0.0001, batch size: 32 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 52s - loss: 6.3853e-04 - val_loss: 6.9813e-04 Epoch 2/15 - 37s - loss: 6.3048e-04 - val_loss: 6.7028e-04 Epoch 3/15 - 38s - loss: 6.1410e-04 - val_loss: 5.3642e-04 Epoch 4/15 - 38s - loss: 6.2978e-04 - val_loss: 7.6363e-04 Epoch 5/15 - 38s - loss: 6.6969e-04 - val_loss: 6.1286e-04 Epoch 6/15 - 37s - loss: 6.2520e-04 - val_loss: 6.3329e-04 Epoch 7/15 - 37s - loss: 6.0509e-04 - val_loss: 6.2585e-04 Epoch 8/15 - 37s - loss: 6.1380e-04 - val_loss: 5.8595e-04 Epoch 9/15 - 39s - loss: 5.9890e-04 - val_loss: 5.7783e-04 Epoch 10/15 - 40s - loss: 6.1248e-04 - val_loss: 5.4884e-04 Epoch 11/15 - 39s - loss: 6.0165e-04 - val_loss: 6.0731e-04 Epoch 12/15 - 38s - loss: 6.0259e-04 - val_loss: 5.8730e-04 Epoch 13/15 - 37s - loss: 6.0262e-04 - val_loss: 6.0494e-04 Epoch 14/15 - 37s - loss: 5.9823e-04 - val_loss: 8.4754e-04 Epoch 15/15 - 37s - loss: 5.9210e-04 - val_loss: 5.5085e-04 learning rate: 0.0001, batch size: 64 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 33s - loss: 4.2770e-04 - val_loss: 4.9212e-04 Epoch 2/15 - 33s - loss: 4.1640e-04 - val_loss: 5.5342e-04 Epoch 3/15 - 33s - loss: 4.2051e-04 - val_loss: 5.8223e-04 Epoch 4/15 - 33s - loss: 4.1736e-04 - 
val_loss: 5.3324e-04 Epoch 5/15 - 33s - loss: 4.1975e-04 - val_loss: 4.8764e-04 Epoch 6/15 - 32s - loss: 4.2039e-04 - val_loss: 5.6192e-04 Epoch 7/15 - 33s - loss: 4.2695e-04 - val_loss: 5.1749e-04 Epoch 8/15 - 33s - loss: 4.1850e-04 - val_loss: 4.9385e-04 Epoch 9/15 - 33s - loss: 4.1721e-04 - val_loss: 5.5393e-04 Epoch 10/15 - 33s - loss: 4.1301e-04 - val_loss: 5.8216e-04 Epoch 11/15 - 33s - loss: 4.2301e-04 - val_loss: 5.2006e-04 Epoch 12/15 - 33s - loss: 4.2036e-04 - val_loss: 5.4130e-04 Epoch 13/15 - 33s - loss: 4.1821e-04 - val_loss: 5.4795e-04 Epoch 14/15 - 33s - loss: 4.1876e-04 - val_loss: 5.3383e-04 Epoch 15/15 - 34s - loss: 4.1531e-04 - val_loss: 5.1962e-04 learning rate: 0.0001, batch size: 128 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 31s - loss: 3.2330e-04 - val_loss: 5.1198e-04 Epoch 2/15 - 31s - loss: 3.1889e-04 - val_loss: 4.9657e-04 Epoch 3/15 - 31s - loss: 3.2018e-04 - val_loss: 4.8976e-04 Epoch 4/15 - 31s - loss: 3.2034e-04 - val_loss: 5.0701e-04 Epoch 5/15 - 31s - loss: 3.2066e-04 - val_loss: 5.0004e-04 Epoch 6/15 - 31s - loss: 3.2966e-04 - val_loss: 5.0102e-04 Epoch 7/15 - 31s - loss: 3.1973e-04 - val_loss: 5.0350e-04 Epoch 8/15 - 31s - loss: 3.1381e-04 - val_loss: 5.4086e-04 Epoch 9/15 - 31s - loss: 3.2387e-04 - val_loss: 5.2903e-04 Epoch 10/15 - 31s - loss: 3.1850e-04 - val_loss: 5.5065e-04 Epoch 11/15 - 31s - loss: 3.1808e-04 - val_loss: 4.8096e-04 Epoch 12/15 - 31s - loss: 3.1863e-04 - val_loss: 5.2388e-04 Epoch 13/15 - 31s - loss: 3.1807e-04 - val_loss: 5.3931e-04 Epoch 14/15 - 31s - loss: 3.1616e-04 - val_loss: 4.9281e-04 Epoch 15/15 - 31s - loss: 3.1603e-04 - val_loss: 5.0976e-04 learning rate: 1e-05, batch size: 32 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 53s - loss: 5.2582e-04 - val_loss: 4.8883e-04 Epoch 2/15 - 37s - loss: 5.3775e-04 - val_loss: 4.8798e-04 Epoch 3/15 - 37s - loss: 5.2523e-04 - val_loss: 4.8734e-04 Epoch 4/15 - 38s - loss: 5.2449e-04 - val_loss: 4.9500e-04 Epoch 5/15 - 38s - loss: 
5.2029e-04 - val_loss: 5.0274e-04 Epoch 6/15 - 37s - loss: 5.2132e-04 - val_loss: 4.9915e-04 Epoch 7/15 - 38s - loss: 5.2519e-04 - val_loss: 4.9727e-04 Epoch 8/15 - 37s - loss: 5.1938e-04 - val_loss: 4.9617e-04 Epoch 9/15 - 37s - loss: 5.2184e-04 - val_loss: 4.9404e-04 Epoch 10/15 - 39s - loss: 5.2563e-04 - val_loss: 4.9367e-04 Epoch 11/15 - 39s - loss: 5.1756e-04 - val_loss: 5.0557e-04 Epoch 12/15 - 37s - loss: 5.3013e-04 - val_loss: 5.1196e-04 Epoch 13/15 - 37s - loss: 5.1983e-04 - val_loss: 4.9463e-04 Epoch 14/15 - 38s - loss: 5.2287e-04 - val_loss: 5.0045e-04 Epoch 15/15 - 39s - loss: 5.1602e-04 - val_loss: 4.9872e-04 learning rate: 1e-05, batch size: 64 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 33s - loss: 3.7353e-04 - val_loss: 4.9179e-04 Epoch 2/15 - 32s - loss: 3.6804e-04 - val_loss: 4.9222e-04 Epoch 3/15 - 33s - loss: 3.6788e-04 - val_loss: 4.9755e-04 Epoch 4/15 - 33s - loss: 3.6904e-04 - val_loss: 4.9759e-04 Epoch 5/15 - 33s - loss: 3.7299e-04 - val_loss: 4.9504e-04 Epoch 6/15 - 33s - loss: 3.6418e-04 - val_loss: 4.9044e-04 Epoch 7/15 - 33s - loss: 3.7106e-04 - val_loss: 4.7895e-04 Epoch 8/15 - 33s - loss: 3.7275e-04 - val_loss: 4.9623e-04 Epoch 9/15 - 32s - loss: 3.7050e-04 - val_loss: 4.8082e-04 Epoch 10/15 - 32s - loss: 3.6585e-04 - val_loss: 4.8842e-04 Epoch 11/15 - 33s - loss: 3.6892e-04 - val_loss: 4.9562e-04 Epoch 12/15 - 32s - loss: 3.6735e-04 - val_loss: 4.8865e-04 Epoch 13/15 - 33s - loss: 3.6610e-04 - val_loss: 4.9684e-04 Epoch 14/15 - 32s - loss: 3.7002e-04 - val_loss: 4.8802e-04 Epoch 15/15 - 33s - loss: 3.6745e-04 - val_loss: 4.8800e-04 learning rate: 1e-05, batch size: 128 Train on 28196 samples, validate on 500 samples Epoch 1/15 - 31s - loss: 2.9173e-04 - val_loss: 4.8797e-04 Epoch 2/15 - 31s - loss: 2.9230e-04 - val_loss: 4.9321e-04 Epoch 3/15 - 32s - loss: 2.9233e-04 - val_loss: 4.8913e-04 Epoch 4/15 - 31s - loss: 2.9298e-04 - val_loss: 4.9072e-04 Epoch 5/15 - 31s - loss: 2.9080e-04 - val_loss: 4.8714e-04 Epoch 6/15 - 
31s - loss: 2.8702e-04 - val_loss: 4.9128e-04 Epoch 7/15 - 34s - loss: 2.9089e-04 - val_loss: 4.8643e-04 Epoch 8/15 - 37s - loss: 2.9104e-04 - val_loss: 4.8413e-04 Epoch 9/15 - 37s - loss: 2.9592e-04 - val_loss: 4.7780e-04 Epoch 10/15 - 37s - loss: 2.9318e-04 - val_loss: 4.9228e-04 Epoch 11/15 - 37s - loss: 2.9143e-04 - val_loss: 4.8395e-04 Epoch 12/15 - 37s - loss: 2.8865e-04 - val_loss: 4.8587e-04 Epoch 13/15 - 37s - loss: 2.9087e-04 - val_loss: 4.8365e-04 Epoch 14/15 - 37s - loss: 2.9062e-04 - val_loss: 4.7758e-04 Epoch 15/15 - 37s - loss: 2.9208e-04 - val_loss: 4.9122e-04
Question 1: Outline the steps you took to get to your final neural network architecture and your reasoning at each step.
Answer:
I also used 'elu' activation functions rather than 'relu', including on the output layer. Since the keypoint values need to lie between -1 and 1, tanh was an option for the output, but I decided to stay with 'elu'. The reason is that 'elu' keeps a gradient of 1 for large positive inputs, whereas the gradient of tanh approaches zero as the magnitude of the input grows. This helps the backpropagation of the error. I tested tanh on the output layer and it performed the same if not worse, so I stayed with 'elu'.
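The gradient argument above can be checked numerically. The following is a minimal sketch, assuming the standard ELU definition with alpha = 1; the helper names `elu_grad` and `tanh_grad` are illustrative and are not part of the notebook.

```python
import numpy as np

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: 1 for x > 0, alpha * exp(x) otherwise."""
    x = np.asarray(x, dtype=float)
    return np.where(x > 0, 1.0, alpha * np.exp(x))

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2."""
    x = np.asarray(x, dtype=float)
    return 1.0 - np.tanh(x) ** 2

# Compare the two gradients at a few sample inputs
xs = np.array([-3.0, -1.0, 0.5, 1.0, 3.0])
for x, g_elu, g_tanh in zip(xs, elu_grad(xs), tanh_grad(xs)):
    print("x={:+.1f}  elu'={:.4f}  tanh'={:.4f}".format(x, g_elu, g_tanh))
```

For positive inputs the ELU gradient stays at 1, while the tanh gradient shrinks quickly as the input moves away from zero in either direction.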
After confirming that this architecture worked reasonably well, I added a small amount of dropout to the first 3 blocks. This gave a slight increase in performance. I then added batch normalization after each convolutional layer as another form of regularization. This also improved the accuracy of the model.
For the training process I progressively increased the batch size and decreased the learning rate. Starting with a batch size of 32 and a learning rate of 0.001, I gradually increased the batch size, then decreased the learning rate and repeated the batch size increase. This achieved a lower error than simply starting at a lower learning rate with a larger batch size. The Adam optimizer was used.
With the final model architecture figured out I decided to augment the data. For this data set I flipped the data horizontally using the process outlined in the blog post linked above. I also added scaling and rotation effects to the augmentation. This was somewhat challenging because not only did the augmentation require transforming the images, but also augmenting the keypoints and having the keypoints match the new transformed image.
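The horizontal flip can be sketched as follows. This is an illustration under stated assumptions rather than the exact code used for this project: keypoints are assumed to be stored as a flat vector (x0, y0, x1, y1, ...) normalized to [-1, 1], and `FLIP_PAIRS` is an assumed list of left/right index pairs that should be verified against the column order of your own dataset.

```python
import numpy as np

# Assumed left/right index pairs within the 30-value keypoint vector.
# Verify these against your dataset's column order before relying on them.
FLIP_PAIRS = [(0, 2), (1, 3), (4, 8), (5, 9), (6, 10), (7, 11),
              (12, 16), (13, 17), (14, 18), (15, 19), (22, 24), (23, 25)]

def flip_horizontal(images, keypoints, pairs=FLIP_PAIRS):
    """Mirror images left-right and adjust the keypoints to match.

    images:    array of shape (N, H, W, 1)
    keypoints: array of shape (N, 30), normalized to [-1, 1]
    """
    # Reverse the width axis of every image
    flipped_images = images[:, :, ::-1, :]
    flipped_kp = keypoints.copy()
    # In normalized coordinates, mirroring negates every x value
    flipped_kp[:, 0::2] *= -1
    # A mirrored "left eye" is now on the right, so swap paired columns
    for a, b in pairs:
        flipped_kp[:, [a, b]] = flipped_kp[:, [b, a]]
    return flipped_images, flipped_kp
```

Scaling and rotation follow the same idea: whatever transform is applied to the pixels must also be applied to the keypoint coordinates.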
With the data set now quadrupled, I was able to get the model's mean squared error down to around 0.0007. My next step was to make use of the incomplete data in the original data set. This data had keypoint labels, but not for all 15 facial landmarks. To use it, I took my already quite accurate model and generated keypoint predictions for the full data set of faces. Wherever a true label existed, I replaced the predicted coordinate with that label. I then augmented this data set and trained a model with the same architecture from scratch. In other words, this model was trained on the predictions for the full data set rather than on the incomplete labels, with each prediction overwritten by the true label where one existed. Note that the original complete data set was included in this full data set. This method brought the mean squared error down to around 0.00045.
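The label-merging step can be sketched in a few lines. This is a minimal illustration, assuming missing labels are encoded as NaN; the function name `merge_labels` and the toy arrays are hypothetical and not taken from the notebook.

```python
import numpy as np

def merge_labels(true_labels, predicted_labels):
    """Keep the true label where it exists, otherwise use the prediction.

    Missing true labels are assumed to be encoded as NaN.
    """
    true_labels = np.asarray(true_labels, dtype=float)
    predicted_labels = np.asarray(predicted_labels, dtype=float)
    return np.where(np.isnan(true_labels), predicted_labels, true_labels)

# Toy example: two faces, three coordinates each, some labels missing
y_true = np.array([[0.10, np.nan, -0.30],
                   [np.nan, 0.50, np.nan]])
y_pred = np.array([[0.12, 0.20, -0.28],
                   [0.40, 0.55, 0.05]])
merged = merge_labels(y_true, y_pred)
print(merged)
```

The merged array then serves as the training target for the full data set.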
Question 2: Defend your choice of optimizer. Which optimizers did you test, and how did you determine which worked best?
Answer: I chose the Adam optimizer. It works well and none of the other optimizers outperformed it. I tried all the optimizers on the first versions of the model and Adam was as good as, if not slightly better than, the others. I also tried RMSprop on the full training process and it performed slightly worse than Adam, so I stayed with Adam.
Use the code cell below to plot the training and validation loss of your neural network. You may find this resource useful.
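# Concatenate the per-stage loss histories into single curves
training_hist = full_loss_hist[0]
for item in full_loss_hist[1:]:
    training_hist = training_hist + item
validation_hist = full_val_loss[0]
for item in full_val_loss[1:]:
    validation_hist = validation_hist + item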
## TODO: Visualize the training and validation loss of your neural network
plt.plot(training_hist)
plt.plot(validation_hist)
plt.title('Model Error')
plt.ylabel('Mean Squared Error')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()
# Slice the last 100 entries so the plot matches its title
plt.plot(training_hist[-100:])
plt.plot(validation_hist[-100:])
plt.title('Error over final 100 epochs')
plt.ylabel('Mean Squared Error')
plt.xlabel('Epoch')
plt.legend(['Training', 'Validation'], loc='upper right')
plt.show()
Question 3: Do you notice any evidence of overfitting or underfitting in the above plot? If so, what steps have you taken to improve your model? Note that slight overfitting or underfitting will not hurt your chances of a successful submission, as long as you have attempted some solutions towards improving your model (such as regularization, dropout, increased/decreased number of layers, etc).
Answer:
When training the original model, there was overfitting. Adding batch normalization and dropout helped to reduce this, but did not completely eliminate it.
Augmenting the data and making use of the full dataset both had a significant effect on reducing overfitting. With all of these combined, the model was quite resistant to overfitting.
Execute the code cell below to visualize your model's predicted keypoints on a subset of the testing images.
y_predict = model.predict(X_test)
fig = plt.figure(figsize=(20,20))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(9):
    ax = fig.add_subplot(3, 3, i + 1, xticks=[], yticks=[])
    plot_data(X_test[i+10], y_predict[i+10], ax)
With the work you did in Sections 1 and 2 of this notebook, along with your freshly trained facial keypoint detector, you can now complete the full pipeline. That is given a color image containing a person or persons you can now
In this Subsection you will do just this!
Use the OpenCV face detection functionality you built in previous Sections to expand the functionality of your keypoints detector to color images with arbitrary size. Your function should perform the following steps
Note: step 4 can be the trickiest because remember your convolutional network is only trained to detect facial keypoints in $96 \times 96$ grayscale images where each pixel was normalized to lie in the interval $[0,1]$, and remember that each facial keypoint was normalized during training to the interval $[-1,1]$. This means - practically speaking - to paint detected keypoints onto a test face you need to perform this same pre-processing to your candidate face - that is after detecting it you should resize it to $96 \times 96$ and normalize its values before feeding it into your facial keypoint detector. To be shown correctly on the original image the output keypoints from your detector then need to be shifted and re-normalized from the interval $[-1,1]$ to the width and height of your detected face.
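The re-normalization described in the note can be written as a small helper. This is a sketch under the assumption that keypoints come out of the network as a flat vector (x0, y0, x1, y1, ...) in [-1, 1] and that (x, y, w, h) is the bounding box returned by the face detector; the name `to_image_coords` is illustrative.

```python
import numpy as np

def to_image_coords(keypoints, x, y, w, h):
    """Map keypoints from [-1, 1] to pixel coordinates in the original image.

    keypoints: flat sequence (x0, y0, x1, y1, ...)
    x, y, w, h: top-left corner, width and height of the detected face box
    """
    kp = np.asarray(keypoints, dtype=float).copy()
    # -1 maps to the left/top edge of the box, +1 to the right/bottom edge
    kp[0::2] = kp[0::2] * w / 2 + w / 2 + x
    kp[1::2] = kp[1::2] * h / 2 + h / 2 + y
    return kp

# Center, top-left corner and bottom-right corner of an 80x60 box at (100, 50)
example = to_image_coords([0.0, 0.0, -1.0, -1.0, 1.0, 1.0], x=100, y=50, w=80, h=60)
print(example)  # [140.  80. 100.  50. 180. 110.]
```

A keypoint at (0, 0) lands at the center of the face box, which is a quick sanity check for the mapping.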
When complete you should be able to produce example images like the one below
import cv2
import matplotlib.pyplot as plt
import numpy as np
# Load in color image for face detection
image = cv2.imread('images/obamas4.jpg')
# Convert the image to RGB colorspace
image_copy = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# plot our image
fig = plt.figure(figsize = (9,9))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('image copy')
ax1.imshow(image_copy)
### TODO: Use the face detection code we saw in Section 1 with your trained conv-net
def face_keypoint_detector(image):
    '''
    Takes in an image (BGR) and draws the face bounding boxes and keypoints on a copy of it.
    Returns the new image, the face bounding box coordinates and the keypoint coordinates.
    '''
    # Convert image to grayscale
    image_copy = np.copy(image)
    gray = cv2.cvtColor(image_copy, cv2.COLOR_BGR2GRAY)
    # Detect faces
    face_cascade = cv2.CascadeClassifier('detector_architectures/haarcascade_frontalface_default.xml')
    faces = face_cascade.detectMultiScale(gray, 2.75, 8)
    faces_keypoints = []
    # Loop through faces
    for (x, y, w, h) in faces:
        cv2.rectangle(image_copy, (x, y), (x+w, y+h), (255, 120, 150), 3)
        # Crop faces
        face = gray[y:y+h, x:x+w]
        # Scale faces to 96x96
        scaled_face = cv2.resize(face, (96, 96), interpolation=cv2.INTER_AREA)
        # Normalize images to be between 0 and 1
        input_image = scaled_face / 255
        # Format image to be the correct shape for the model: (1, 96, 96, 1)
        input_image = np.expand_dims(input_image, axis=0)
        input_image = np.expand_dims(input_image, axis=-1)
        # Use model to predict keypoints on image
        landmarks = model.predict(input_image)[0]
        # Adjust keypoints from [-1, 1] to coordinates of the original image
        landmarks[0::2] = landmarks[0::2] * w/2 + w/2 + x
        landmarks[1::2] = landmarks[1::2] * h/2 + h/2 + y
        faces_keypoints.append(landmarks)
        # Paint keypoints on image (cv2.circle expects integer pixel coordinates)
        for point in range(15):
            center = (int(landmarks[2*point]), int(landmarks[2*point + 1]))
            cv2.circle(image_copy, center, 2, (255, 255, 0), -1)
    return image_copy, faces, faces_keypoints
img, faces, keypoints = face_keypoint_detector(image)
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
fig = plt.figure(figsize=(20,20))
ax = fig.add_subplot(111)
ax.set_xticks([])
ax.set_yticks([])
ax.imshow(img)
Now you can add facial keypoint detection to your laptop camera - as illustrated in the gif below.
The next Python cell contains the basic laptop video camera function used in the previous optional video exercises. Combine it with the functionality you developed for keypoint detection and marking in the previous exercise and you should be good to go!
import cv2
import time
from keras.models import load_model
def laptop_camera_go():
    # Create instance of video capturer
    cv2.namedWindow("face detection activated")
    vc = cv2.VideoCapture(0)
    # Try to get the first frame
    if vc.isOpened():
        rval, frame = vc.read()
    else:
        rval = False
    # keep video stream open
    while rval:
        frame, _, _ = face_keypoint_detector(frame)
        # plot image from camera with detections marked
        cv2.imshow("face detection activated", frame)
        # exit functionality - press any key to exit laptop video
        key = cv2.waitKey(20)
        if key > 0: # exit by pressing any key
            # destroy windows
            cv2.destroyAllWindows()
            # hack from stack overflow for making sure window closes on osx --> https://stackoverflow.com/questions/6116564/destroywindow-does-not-close-window-on-mac-using-python-and-opencv
            for i in range (1,5):
                cv2.waitKey(1)
            return
        # read next frame
        time.sleep(0.05) # control framerate for computation - default 20 frames per sec
        rval, frame = vc.read()
# Run your keypoint face painter
laptop_camera_go()
Using your freshly minted facial keypoint detector pipeline you can now do things like add fun filters to a person's face automatically. In this optional exercise you can play around with adding sunglasses automatically to each individual's face in an image as shown in a demonstration image below.
To produce this effect, we use an image of a pair of sunglasses, shown in the Python cell below.
# Load in sunglasses image - note the usage of the special option
# cv2.IMREAD_UNCHANGED, this option is used because the sunglasses
# image has a 4th channel that allows us to control how transparent each pixel in the image is
sunglasses = cv2.imread("images/sunglasses_4.png", cv2.IMREAD_UNCHANGED)
# Plot the image
fig = plt.figure(figsize = (6,6))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.imshow(sunglasses)
ax1.axis('off');
This image is placed over each individual's face using the detected eye points to determine the location of the sunglasses, and eyebrow points to determine the size that the sunglasses should be for each person (one could also use the nose point to determine this).
Notice that this image actually has 4 channels, not just 3.
# Print out the shape of the sunglasses image
print ('The sunglasses image has shape: ' + str(np.shape(sunglasses)))
The sunglasses image has shape: (1123, 3064, 4)
It has the usual red, blue, and green channels any color image has, with the 4th channel representing the transparency level of each pixel in the image. Here's how the transparency channel works: the lower the value, the more transparent the pixel will become. The lower bound (completely transparent) is zero here, so any pixels set to 0 will not be seen.
This is how we can place this image of sunglasses on someone's face and still see the parts of their face around the sunglasses: those pixels in the sunglasses image have been made completely transparent.
Let's check out the alpha channel of our sunglasses image in the next Python cell. Note that because many of the pixels near the boundary are transparent, we'll need to explicitly print out non-zero values if we want to see them.
# Print out the sunglasses transparency (alpha) channel
alpha_channel = sunglasses[:,:,3]
print ('the alpha channel here looks like')
print (alpha_channel)
# Just to double check that there are indeed non-zero values
# Let's find and print out every value greater than zero
values = np.where(alpha_channel != 0)
print ('\n the non-zero values of the alpha channel look like')
print (values)
the alpha channel here looks like
[[0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 ...
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]
 [0 0 0 ... 0 0 0]]

 the non-zero values of the alpha channel look like
(array([  17,   17,   17, ..., 1109, 1109, 1109]), array([ 687,  688,  689, ..., 2376, 2377, 2378]))
This means that when we place this sunglasses image on top of another image, we can use the transparency channel as a filter to tell us which pixels to overlay on a new image (only the non-transparent ones with values greater than zero).
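As an illustration of using the transparency channel as a filter, here is a minimal NumPy sketch of alpha blending on a tiny synthetic image. The function name `alpha_blend` and the 2x2 arrays are made up for the example, and the overlay is assumed to have the same height and width as the background region it covers.

```python
import numpy as np

def alpha_blend(background, overlay_rgba):
    """Blend a 4-channel overlay onto a 3-channel background of the same size."""
    # Scale alpha to [0, 1]; keep a trailing axis so it broadcasts over channels
    alpha = overlay_rgba[:, :, 3:4].astype(float) / 255.0
    blended = alpha * overlay_rgba[:, :, :3] + (1.0 - alpha) * background
    return blended.astype(np.uint8)

# 2x2 light-gray background and a fully transparent overlay
background = np.full((2, 2, 3), 200, dtype=np.uint8)
overlay = np.zeros((2, 2, 4), dtype=np.uint8)
# Make one overlay pixel opaque and dark
overlay[0, 0] = [10, 10, 10, 255]

result = alpha_blend(background, overlay)
print(result[0, 0])  # [10 10 10]     -> overlay pixel shows
print(result[1, 1])  # [200 200 200]  -> background shows through
```

Pixels with alpha 0 leave the background untouched, pixels with alpha 255 are fully replaced, and values in between are mixed proportionally.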
One last thing: it's helpful to understand which keypoint belongs to the eyes, mouth, etc. So, in the image below, we also display the index of each facial keypoint directly on the image so that you can tell which keypoints are for the eyes, eyebrows, etc.
With this information, you're well on your way to completing this filtering task! See if you can place the sunglasses automatically on the individuals in the image loaded in / shown in the next Python cell.
# Load in color image for face detection
image = cv2.imread('images/obamas4.jpg')
# Convert the image to RGB colorspace
image_copy = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
# Plot the image
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Original Image')
ax1.imshow(image_copy)
_, _, keypoints = face_keypoint_detector(image)
keypoints
[array([302.23444, 146.67249, 231.72122, 147.27019, 286.45847, 143.37564, 315.04388, 143.09003, 245.29807, 142.91345, 218.50731, 144.12323, 282.35477, 129.80107, 325.20654, 137.46405, 252.20038, 129.1363 , 207.4618 , 135.86952, 268.99786, 194.23145, 302.07208, 210.15956, 230.3615 , 210.61688, 267.00812, 209.64203, 265.9977 , 223.39471], dtype=float32), array([484.26746, 209.16016, 419.3915 , 208.65985, 471.70468, 208.51385, 500.25055, 204.83105, 431.85425, 208.68027, 406.23657, 208.44183, 466.62384, 193.1633 , 510.0116 , 187.74367, 434.3417 , 195.7225 , 394.0462 , 190.4181 , 451.1512 , 252.45204, 488.61578, 263.24664, 421.51688, 266.55023, 453.2976 , 269.7915 , 452.75894, 281.74936], dtype=float32)]
## (Optional) TODO: Use the face detection code we saw in Section 1 with your trained conv-net to put
## sunglasses on the individuals in our test image
def overlay_sunglasses(keypoints, image):
    '''
    Adds sunglasses to each person's face, given the keypoints of every detected face.
    Modifies the image in place and returns it.
    '''
    sunglasses = cv2.imread("images/sunglasses_4.png", cv2.IMREAD_UNCHANGED)
    for i in range(len(keypoints)):
        # Resize sunglasses image to match the eyebrow keypoints (x values at indices 14 and 18)
        glasses_width = 1.1*(keypoints[i][14] - keypoints[i][18])
        scale_factor = glasses_width/sunglasses.shape[1]
        sg = cv2.resize(sunglasses, None, fx=scale_factor, fy=scale_factor, interpolation=cv2.INTER_AREA)
        width = sg.shape[1]
        height = sg.shape[0]
        # Top left corner of sunglasses: x coordinate = average x coordinate of eyes - width/2
        x1 = int((keypoints[i][2] + keypoints[i][0])/2 - width/2)
        x2 = x1 + width
        # Top left corner of sunglasses: y coordinate = average y coordinate of eyes - height/3
        y1 = int((keypoints[i][3] + keypoints[i][1])/2 - height/3)
        y2 = y1 + height
        # Create an alpha mask based on the transparency values
        alpha_sun = np.expand_dims(sg[:, :, 3]/255.0, axis=-1)
        alpha_face = 1.0 - alpha_sun
        # Take a weighted sum of the image and the sunglasses using the alpha values and (1 - alpha)
        image[y1:y2, x1:x2] = (alpha_sun * sg[:, :, :3] + alpha_face * image[y1:y2, x1:x2])
    return image
img = overlay_sunglasses(keypoints, image_copy)
# Plot the image
fig = plt.figure(figsize = (8,8))
ax1 = fig.add_subplot(111)
ax1.set_xticks([])
ax1.set_yticks([])
ax1.set_title('Image with Sunglasses')
ax1.imshow(image_copy)
Now you can add the sunglasses filter to your laptop camera - as illustrated in the gif below.
The next Python cell contains the basic laptop video camera function used in the previous optional video exercises. Combine it with the functionality you developed for adding sunglasses to someone's face in the previous optional exercise and you should be good to go!
import cv2
import time
from keras.models import load_model
import numpy as np
def laptop_camera_go():
    # Create instance of video capturer
    cv2.namedWindow("face detection activated")
    vc = cv2.VideoCapture(0)
    # try to get the first frame
    if vc.isOpened():
        rval, frame = vc.read()
    else:
        rval = False
    # Keep video stream open
    while rval:
        _, _, keypoints = face_keypoint_detector(frame)
        if len(keypoints) > 0:
            frame = overlay_sunglasses(keypoints, frame)
        # Plot image from camera with detections marked
        cv2.imshow("face detection activated", frame)
        # Exit functionality - press any key to exit laptop video
        key = cv2.waitKey(20)
        if key > 0: # exit by pressing any key
            # Destroy windows
            cv2.destroyAllWindows()
            for i in range (1,5):
                cv2.waitKey(1)
            return
        # Read next frame
        time.sleep(0.05) # control framerate for computation - default 20 frames per sec
        rval, frame = vc.read()
# Load facial landmark detector model
model = load_model('my_model.h5')
# Run sunglasses painter
laptop_camera_go()