Here we develop the parameterized face needed by the Active Appearance Model (AAM), where the variability of shape and texture is captured from a representative training set. PCA is applied to shape and texture to produce a face model that describes the trained faces with photorealistic quality.
An AAM is an object model containing statistical information about the object's shape and texture [1]. It is a powerful way of capturing shape and texture variation, and it comes with an efficient search algorithm that can tell exactly where and how a model is located in an image frame.
The two models needed to make an AAM are the shape model and the texture model, both described in the following sections.
# The following libraries will be used here
# import the OpenCV library
try:
    import cv2
except ImportError:
    print "You must have OpenCV installed"
import matplotlib.pyplot as plt
import numpy as np
from modshogun import *
%matplotlib inline
# Use the following function when reading an image through OpenCV and displaying it through plt.
def showfig(image, ucmap):
    # There is a difference in pixel ordering between OpenCV and Matplotlib:
    # OpenCV follows BGR order, while Matplotlib follows RGB order.
    if len(image.shape)==3:
        b,g,r = cv2.split(image)    # get the b, g, r channels
        image = cv2.merge([r,g,b])  # switch to RGB
    imgplot=plt.imshow(image, ucmap)
    imgplot.axes.get_xaxis().set_visible(False)
    imgplot.axes.get_yaxis().set_visible(False)
As mentioned previously, an AAM requires a shape model, and this role is played by the Active Shape Model (ASM) [2]. The following section describes how to build it.
The shape model is generated by combining shape variations, and for that a training set of annotated images is required. That is, we need several images marked with points at key positions of the face which outline its main features.
A few of the training images with their landmark points are shown below. There are 68 landmarks on each face. These are usually marked by hand and they outline several face features that are easy to track, such as the mouth contour, nose, eyes, eyebrows, and face shape:
plt.rcParams['figure.figsize'] = 17, 7
fig=plt.figure()
for i in xrange(1,4):
    image_path="../../../data/AAM/images/%d.jpg"%i
    tim_cootes=cv2.imread(image_path)
    axes=fig.add_subplot(1,3,i)
    data=np.loadtxt('../../../data/AAM/points/%d.pts'%i, usecols=range(2))
    x=data[:,0]
    y=data[:,1]
    axes.plot(x,y,'*')
    showfig(tim_cootes, None)
An ASM is built in two steps: the training shapes are first aligned into a common frame, and PCA is then applied to the aligned landmarks.
The alignment of two shapes consists of finding the similarity parameters (scale, rotation, and translation) that best match one shape to another by minimizing a given metric. The classical solution for aligning two shapes is Procrustes Analysis. It aligns shapes that have the same number of landmarks with one-to-one point correspondences, which is sufficient for the standard AAM formulation.
The landmark points we are given are not yet aligned in a way that helps us build an ASM. This can be seen in the following figure:
plt.rcParams['figure.figsize'] = 7, 7
figure, axis=plt.subplots(1,1)
plt.xlim(0, 480)
plt.ylim(0, 480)
plt.title("The landmark points for all the face images in the training set")
for i in xrange(1,26):
    Y=np.loadtxt('../../../data/AAM/points/%d.pts'%i, usecols=range(2))
    axis.plot(640-Y[:,0], 480-Y[:,1],'+', color='blue', markersize=5)
axis.set_xticks([])
axis.set_yticks([])
Generalized Procrustes Analysis (GPA) sequentially aligns pairs of shapes with Procrustes: a reference shape (the mean shape) is chosen and all the other shapes are aligned to it.
To facilitate this iterative process, we define a procrustes function below:
def procrustes(X, Y):
    """
    Inputs:
    ------------
    X, Y
        matrices of target and input coordinates. They must have an equal
        number of points (rows).
    Outputs:
    ------------
    Z
        the matrix of transformed Y-values
    """
    n,m = X.shape
    ny,my = Y.shape
    muX = X.mean(0)
    muY = Y.mean(0)
    X0 = X - muX
    Y0 = Y - muY
    ssX = (X0**2.).sum()
    ssY = (Y0**2.).sum()
    # centred Frobenius norm
    normX = np.sqrt(ssX)
    normY = np.sqrt(ssY)
    # scale to equal (unit) norm
    X0 /= normX
    Y0 /= normY
    # optimum rotation matrix of Y
    A = np.dot(X0.T, Y0)
    U,s,Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt.T
    T = np.dot(V, U.T)
    traceTA = s.sum()
    # transformed coordinates
    Z = normX*traceTA*np.dot(Y0, T) + muX
    return Z
At the beginning, any shape can be chosen as the initial mean. After the alignment, a new estimate of the mean is computed and the shapes are aligned to this new mean again. The procedure is repeated until the mean shape does not change significantly between iterations. Normally this procedure converges within two iterations.
# ****GPA*****
# choose any shape as the initial mean
X=np.loadtxt('../../../data/AAM/points/1.pts', usecols=range(2))
# two iterations are enough
for j in xrange(2):
    Y_new=[]
    # our training set has 25 images numbered 1,2,...,24,25
    for i in xrange(1,26):
        Y=np.loadtxt('../../../data/AAM/points/%d.pts'%i, usecols=range(2))
        z=procrustes(X, Y)
        Y_new.append(z)
    # recompute the mean
    X=sum(Y_new)/float(len(Y_new))
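If we want to make the stopping rule explicit rather than hard-coding two iterations, a minimal variant with a convergence check could look like the sketch below (the tolerance `1e-6` is an arbitrary illustrative choice, not a value from the original procedure):
# GPA with an explicit convergence test instead of a fixed iteration count;
# the tolerance below is an arbitrary choice for illustration
X=np.loadtxt('../../../data/AAM/points/1.pts', usecols=range(2))
while True:
    Y_new=[]
    for i in xrange(1,26):
        Y=np.loadtxt('../../../data/AAM/points/%d.pts'%i, usecols=range(2))
        Y_new.append(procrustes(X, Y))
    X_new=sum(Y_new)/float(len(Y_new))
    # stop once the mean shape stops moving between iterations
    change=np.abs(X_new-X).max()
    X=X_new
    if change<1e-6:
        break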
Let's check the output after applying GPA to the landmark points we had:
plt.rcParams['figure.figsize'] = 7, 7
figure, axis=plt.subplots(1,1)
plt.title("The GPA processed landmark points")
plt.xlim(0, 480)
plt.ylim(0, 480)
for i in xrange(25):
    axis.plot(640-Y_new[i][:,0], 480-Y_new[i][:,1],'+', color='blue', markersize=5)
axis.set_xticks([])
axis.set_yticks([])
After aligning these landmarks using GPA, we arrange them in the following format, where the $k$ aligned landmarks in two dimensions are given as:
$x = (x_1, y_1, \ldots, x_k, y_k)$
It is important to note that each landmark must be labelled consistently across all the training images. So, for instance, if the left part of the mouth is landmark number 3 in the first image, it needs to be number 3 in all the other images.
PCA is a statistical technique for dimensionality reduction. It searches for the directions in the data that have the largest variance and subsequently projects the data onto them. For more details, check out this notebook on PCA.
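As a quick, self-contained illustration of the idea (a minimal numpy sketch on random toy data, independent of the Shogun API used below):
# minimal PCA sketch: center the data, eigendecompose the covariance matrix,
# and project onto the directions of largest variance (toy data, 25 samples as columns)
toy=np.random.randn(136, 25)
mu=toy.mean(axis=1, keepdims=True)
centered=toy-mu
cov=np.dot(centered, centered.T)/(centered.shape[1]-1)
eigvals, eigvecs=np.linalg.eigh(cov)    # eigh returns eigenvalues in ascending order
E=eigvecs[:, ::-1][:, :3]               # keep the 3 leading eigenvectors
projected=np.dot(E.T, centered)         # 3x25 matrix of principal components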
Applying PCA to the previously aligned data, the statistical shape variation can be modeled as:
$x = \bar{x} + E_s y_s$
where new shapes $x$ are synthesised by deforming the mean shape $\bar{x}$ using a weighted linear combination of the eigenvectors of the covariance matrix, $E_s$. Here $y_s$ is a vector of shape parameters representing the weights, and $E_s$ holds the $t_s$ most important eigenvectors, which explain a user-defined portion of the variance. The shape parameters associated with each shape can be recovered by
$y_s = E_s^T (x - \bar{x})$
where the vector $y_s$ defines a set of deformable model parameters.
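To make the two formulas above concrete, here is a small round-trip sketch between a shape vector and its parameters. It assumes $E_s$ has orthonormal columns (which holds for PCA eigenvectors); the helper names are hypothetical, introduced only for illustration:
# round trip between a shape vector and its parameters, following the formulas above;
# assumes Es has orthonormal columns and x_bar is the mean shape vector;
# with only t_s eigenvectors kept, the reconstruction is exact up to the discarded variance
def shape_to_params(x, x_bar, Es):
    # ys = Es^T (x - x_bar)
    return np.dot(Es.T, x-x_bar)

def params_to_shape(ys, x_bar, Es):
    # x = x_bar + Es ys
    return x_bar+np.dot(Es, ys)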
obs_matrix=[]
# arrange the data in the format (x1,y1,...,xk,yk)
for i in xrange(25):
    obs_matrix.append(np.hstack(Y_new[i]))
# prepare the observation matrix for PCA
obs_matrix=np.array(obs_matrix).T
# convert it into Shogun RealFeatures format
train_features=RealFeatures(np.array(obs_matrix))
# set the PCA mode to AUTO
preprocessor=PCA(AUTO)
# set the number of eigenvectors to hold onto, here 24
preprocessor.set_target_dim(24)
# all set. Run it
preprocessor.init(train_features)
mean=preprocessor.get_mean()
# get the eigenvectors of the covariance matrix
Es=preprocessor.get_transformation_matrix()
# get the eigenvalues of the covariance matrix
eigenvalues_s=preprocessor.get_eigenvalues()
# project the data onto the principal components
ys=preprocessor.apply_to_feature_matrix(train_features)
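The prose above says that $E_s$ keeps the eigenvectors explaining a user-defined amount of variance. Assuming `eigenvalues_s` is sorted with the largest eigenvalue first (which the variation plots below also assume), we can inspect how much of the retained variance the leading modes account for:
# cumulative fraction of the retained variance explained by the leading modes;
# note the denominator covers only the 24 eigenvalues kept by set_target_dim
explained=np.cumsum(eigenvalues_s)/np.sum(eigenvalues_s)
print "variance explained by the first 3 modes: %.1f%%"%(100*explained[2])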
The variance of the parameter $y_{s_i}$, $i = 1, \ldots, t_s$, over the training set is given by the eigenvalue $\lambda_i$. Limiting $y_{s_i}$ to the range $\pm 3\sqrt{\lambda_i} = \pm 3\sigma_i$ ensures that the generated shape remains similar to those in the training set.
The following images show the first three shape parameters varied between $[-3\sigma_i, +3\sigma_i]$. As expected, the first variation mode carries the most information and causes the largest movement of the landmark positions. The less significant modes cause more local variation.
plt.rcParams['figure.figsize'] = 17,5
shape_container=[]
for parameter_no in xrange(3):
    std_dev=np.int(eigenvalues_s[parameter_no]**0.5)
    figure=plt.figure()
    deviation_data=np.array([-3*std_dev, -1.5*std_dev, 0, 1.5*std_dev, 3*std_dev])
    title_data=np.array(['-3','-1.5','0','+1.5','+3'])
    plot_no=0
    for deviation in deviation_data:
        plot_no=plot_no+1
        new_ys=np.copy(ys)
        # vary one of the parameters (here in the parameter vector of training sample 1)
        new_ys[parameter_no,1]=deviation
        # reconstruct the shape landmarks, but with the distorted ys
        reconstructed_image=np.hstack(np.dot(Es,new_ys[:,1]))+mean
        image=np.resize(reconstructed_image,[68,2])
        shape_container.append(image)
        axis=figure.add_subplot(1,5,plot_no)
        axis.plot(600-image[:,0], 600-image[:,1])
        plt.title((title_data[plot_no-1])+' $\sigma %d$'%parameter_no)
        axis.set_xticks([])
        axis.set_yticks([])
The texture is defined as the pixel intensities over the modeled entity. Here we describe the procedure to build a statistical texture model.
Similarly to the shape model, where all shapes are first aligned into a common frame, the texture model also requires the alignment of all texture samples to a reference texture frame.
The texture is mapped so that the control points of each sample match the control points of a suitable reference frame, the mean shape.
Delaunay triangulation is applied to the mean shape control points to establish the triangles that will be used to map pixel intensities by piecewise affine warping.
A statistical texture model is then built using Principal Component Analysis, describing the texture in a condensed model.
The texture mapping is performed by using a piece-wise affine warp, i.e. partitioning the convex hull of the mean shape by a set of triangles using the Delaunay triangulation.
In mathematics and computational geometry, a Delaunay triangulation for a set P of points in a plane is a triangulation such that no point in P is inside the circumcircle of any triangle. Delaunay triangulations maximize the minimum angle over all triangles in the triangulation and therefore tend to avoid skinny triangles.
Below we show the Delaunay triangulation on a set of training face images with their annotated landmarks. The concept is simple: we create triangles from the annotated points and then map from one triangle to another.
import matplotlib.delaunay as md
plt.rcParams['figure.figsize'] = 17,8
fig=plt.figure()
for i in reversed(xrange(1,7)):
    image_path="../../../data/AAM/images/%d.jpg"%i
    tim_cootes=cv2.imread(image_path)
    tim_cootes=cv2.cvtColor(tim_cootes, cv2.COLOR_BGR2GRAY)
    dest_data=np.loadtxt('../../../data/AAM/points/%d.pts'%i, usecols=range(2))
    dest_x=dest_data[:,0]
    dest_y=dest_data[:,1]
    _,_,dest_triangles,_=md.delaunay(dest_x, dest_y)
    axes=fig.add_subplot(2,3,i)
    for t in dest_triangles:
        t_ext=[t[0], t[1], t[2], t[0]]
        axes.plot(dest_x[t_ext], dest_y[t_ext],'r')
    axes.plot(dest_x,dest_y,'*')
    showfig(tim_cootes, plt.get_cmap('gray'))
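A caveat: the `matplotlib.delaunay` module used above is quite old and was removed from later matplotlib releases. If it is unavailable, an equivalent triangulation can be obtained from scipy, for example (a sketch assuming scipy is installed):
# alternative triangulation via scipy, in case matplotlib.delaunay is unavailable
from scipy.spatial import Delaunay
points=np.loadtxt('../../../data/AAM/points/1.pts', usecols=range(2))
tri=Delaunay(points)
triangles=tri.simplices    # index triples, playing the same role as dest_triangles above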
A piecewise affine warp is the texture mapping procedure where each pixel of the image to sample, belonging to a specific triangle, is mapped into the corresponding destination triangle in the mean shape frame.
The following shows the result of warping all the mapped triangles of each training image (left) to the mean reference frame (right):
mask_container=[]
plt.rcParams['figure.figsize'] = 15,17
fig=plt.figure()
plot_no=1
for n in xrange(1,26):
    image_path="../../../data/AAM/images/%d.jpg"%n
    tim_cootes=cv2.imread(image_path)
    tim_cootes=cv2.cvtColor(tim_cootes, cv2.COLOR_BGR2GRAY)
    axes=fig.add_subplot(9,6,plot_no)
    showfig(tim_cootes,plt.get_cmap('gray'))
    s_data=np.loadtxt('../../../data/AAM/points/%d.pts'%n, usecols=range(2))
    s_x=s_data[:,0]
    s_y=s_data[:,1]
    warp_final=np.uint8(np.zeros(tim_cootes.shape))
    for i in xrange(dest_triangles.shape[0]):
        # source is the coordinates of the triangle from the current training image
        source = s_data[dest_triangles[i]]
        # destination is the coordinates of the triangle from the mean reference shape X
        destination = X[dest_triangles[i]]
        # calculate an affine transform from three pairs of corresponding points;
        # a 2x3 affine transform matrix is returned
        warp_mat=cv2.getAffineTransform(np.float32(source), np.float32(destination))
        # apply the affine transformation to the training image
        output_warp=cv2.warpAffine(tim_cootes, warp_mat, (640,480))
        # we only need the current triangle from output_warp
        warp_mask=np.uint8(np.zeros(tim_cootes.shape))
        destination=np.int32(destination)
        # fill warp_mask with white in the region surrounded by the destination triangle
        cv2.fillConvexPoly(warp_mask, destination, 255)
        # copy that filled area from output_warp and paste it into warp_final
        bool_index=(warp_mask==255)
        warp_final[bool_index]=output_warp[bool_index]
    axes=fig.add_subplot(9,6, plot_no+1)
    showfig(warp_final,plt.get_cmap('gray'))
    plot_no=plot_no+2
    mask_container.append(warp_final)
We now apply PCA the same way we did previously for the shape model:
for i in xrange(25):
    a=mask_container[i]
    mask_container[i]=a.flatten()
mask_vstack=np.double(np.vstack(mask_container))
# form the observation matrix
obs_matrix=mask_vstack.T
# apply PCA just the way we did for the shape model
train_features = RealFeatures(obs_matrix)
preprocessor=PCA(AUTO)
preprocessor.set_target_dim(24)
preprocessor.init(train_features)
mean=preprocessor.get_mean()
Es=preprocessor.get_transformation_matrix()
eigenvalues_g=preprocessor.get_eigenvalues()
yg=preprocessor.apply_to_feature_matrix(train_features)
Here we again vary the PCA weights, just as we did with the shape model. We see subtle variations in texture appearing in the face images.
plt.rcParams['figure.figsize'] = 17,5
texture_container=[]
for parameter_no in xrange(3):
    std_dev=np.int(eigenvalues_g[parameter_no]**0.5)
    figure=plt.figure()
    deviation_data=np.array([-3*std_dev, -1.5*std_dev, 0, 1.5*std_dev, 3*std_dev])
    title_data=np.array(['-3','-1.5','0','+1.5','+3'])
    plot_no=0
    for deviation in deviation_data:
        plot_no=plot_no+1
        new_yg=np.copy(yg)
        # vary one of the parameters (here in the parameter vector of training sample 4)
        new_yg[parameter_no,4]=deviation
        reconstructed_image=np.hstack(np.dot(Es, new_yg[:,4]))+mean
        image=np.resize(reconstructed_image, [480,640])
        texture_container.append(image)
        axis=figure.add_subplot(1,5,plot_no)
        plt.title((title_data[plot_no-1])+' $\sigma %d$'%parameter_no)
        showfig(image, plt.get_cmap('gray'))
An AAM instance is built on the combined statistical model of shape and texture: the texture is generated in the mean reference frame and then warped to the control points given by the different shape models [3].
plt.rcParams['figure.figsize'] = 17,7
fig=plt.figure()
for plot_no in xrange(15):
    texture=texture_container[5*int(np.floor(plot_no/5))]
    dest_image_points=shape_container[plot_no%5]
    dest_image_x=dest_image_points[:,0]
    dest_image_y=dest_image_points[:,1]
    _,_,new_dest_triangles,_=md.delaunay(dest_image_x, dest_image_y)
    axes=fig.add_subplot(3,5,plot_no+1)
    warp_final=np.uint8(np.zeros(texture.shape))
    for i in xrange(new_dest_triangles.shape[0]):
        # source is the coordinates of the triangle in the mean reference frame,
        # since the texture is bounded by the mean reference shape X
        source = X[new_dest_triangles[i]]
        # destination is the coordinates of the triangle from the currently chosen shape model
        destination = dest_image_points[new_dest_triangles[i]]
        # calculate an affine transform from three pairs of corresponding points;
        # a 2x3 affine transform matrix is returned
        warp_mat=cv2.getAffineTransform(np.float32(source), np.float32(destination))
        # apply the affine transformation to the current texture image
        output_warp=cv2.warpAffine(texture, warp_mat, (640,480))
        # we only need the current triangle from output_warp
        warp_mask=np.uint8(np.zeros(texture.shape))
        destination=np.int32(destination)
        # fill warp_mask with white in the region surrounded by the destination triangle
        cv2.fillConvexPoly(warp_mask, destination, 255)
        # copy that filled area from output_warp and paste it into warp_final
        bool_index=(warp_mask==255)
        warp_final[bool_index]=output_warp[bool_index]
    showfig(warp_final,plt.get_cmap('gray'))
[1] T.F. Cootes, G.J. Edwards, and C.J. Taylor, "Active Appearance Models", ECCV, 2:484–498, 1998.
[2] T.F. Cootes, C.J. Taylor, D.H. Cooper, and J. Graham, "Active Shape Models – Their Training and Application", Computer Vision and Image Understanding, (61):38–59, 1995.
[3] P. Martins, "Active Appearance Models for Facial Expression Recognition and Monocular Head Pose Estimation", Master Thesis, 2008.