FACE RECOGNITION
IN HYPER FACIAL FEATURE SPACE
A Project
Presented to the faculty of the Department of Computer Science
California State University, Sacramento
Submitted in partial satisfaction of
the requirements for the degree of
MASTER OF SCIENCE
in
Computer Science
by
Amer Harb
FALL
2013
© 2013
Amer Harb
ALL RIGHTS RESERVED
FACE RECOGNITION
IN HYPER FACIAL FEATURE SPACE
A Project
by
Amer Harb
Approved by:
__________________________________, Committee Chair
Scott Gordon
__________________________________, Second Reader
Behnam Arad
____________________________
Date
Student: Amer Harb
I certify that this student has met the requirements for format contained in the
University format manual, and that this project is suitable for shelving in the Library
and credit is to be awarded for the project.
__________________________, Graduate Coordinator
___________________
Nikrouz Faroughi
Date
Department of Computer Science
Abstract
of
FACE RECOGNITION
IN HYPER FACIAL FEATURE SPACE
by
Amer Harb
This project achieved a proof of concept in which the dimensions of a hyperspace, denoted
"face hyperspace", were built out of facial features. Each dimension in that space
represents a facial feature or sub-feature. A vector (point) in that space can represent a
face. The vector is, to some degree, tolerant of lighting and contrast variations.
For every feature, classifiers were built that process feature images to produce a number
representing a feature class. To recognize a face image, it is passed through a facial
feature extraction routine that identifies and extracts each facial feature in the image
separately. Next, each feature image is passed into a classifier, which in turn generates
group numbers representing the facial features. Finally, a vector representing the image is
built by concatenating the classifier outputs of all features. The vector is compared,
using Euclidean distance, to the vectors in an image database to find a matching vector.
_______________________, Committee Chair
Scott Gordon
_______________________
Date
TABLE OF CONTENTS
Chapter
Page
1 INTRODUCTION ........................................................................................................... 1
1.1 Motivation ........................................................................................................ 2
1.2 Limitations ........................................................................................................ 2
2 BACKGROUND ............................................................................................................. 3
2.1 Active Shape Model (ASM) ............................................................................. 3
2.2 Artificial Neural Network (ANN) .................................................................... 4
3 ACTIVE SHAPE MODEL (ASM) ................................................................................. 7
3.1 Shape Model ..................................................................................................... 7
3.2 Shape ................................................................................................................ 8
3.3 Aligning Shapes................................................................................................ 9
3.4 Procrustes Analysis .......................................................................................... 9
3.5 Align Shapes ................................................................................................... 10
3.6 Scaling shapes: ............................................................................................... 12
3.7 Rotate Shapes ................................................................................................. 14
3.8 Principal Components Analysis ..................................................................... 16
3.9 Profile Model .................................................................................................. 18
3.10 New Image Profile ........................................................................................ 21
4 EXTRACT FEATURE IMAGES ................................................................................. 23
4.1 Feature Attributes ........................................................................................... 25
4.2 Build Neural Networks Training Data ........................................................... 27
4.2.1 Scale Images ................................................................................... 27
4.2.2 Extract Input and Output Training Data ......................................... 27
4.3 Neural Networks ............................................................................................. 29
4.4 Neural Networks Topology ............................................................................ 29
4.5 Classifiers Collection...................................................................................... 30
4.6 Recognizing a face ......................................................................................... 31
4.7 Face Detection ................................................................................................ 33
4.8 Initialize Face Landmarks .............................................................................. 33
4.9 Locate Facial Features .................................................................................... 34
5 EXTRACT FEATURE IMAGES ..................................................................... ………37
5.1 Build Face Vector ........................................................................................... 37
5.2 Face Database ................................................................................................. 37
5.3 Face Vector Search ......................................................................................... 38
6 RESULTS ...................................................................................................................... 40
6.1 Neural Network Training Results ................................................................... 40
6.2 Generalization ................................................................................................. 41
6.3 Lighting and contrast tolerance ...................................................................... 43
7 CONCLUSION ............................................................................................................. 46
8 FUTURE WORK .......................................................................................................... 47
References………………...………………………………………………………………48
LIST OF FIGURES
Figures
Page
1: Neural network example showing a possible structure of a neural network having a
fully connected input layer, one hidden layer and an output layer .................................... 5
2: Face landmarks, from the FEI database ................................................................. 8
3: Triangular shapes to demonstrate shape alignments .................................................... 10
4: Shapes shifted to the same location ..................................................................... 11
5: Shapes are scaled to the same size ............................................................................... 14
6: Shapes rotated to the same orientation ......................................................................... 16
7: PCA algorithm .............................................................................................................. 17
8: Norm from a point to the image boundary ................................................................... 20
9: Points on the norm ........................................................................................................ 20
10: Profile vectors of a point and 3 adjacent neighbors on each side ............................... 21
11: Results of applying the feature locator on a face image ............................................. 22
12: Extract facial features sequence ................................................................................. 23
13: Inclination of upper eye lid ........................................................................................ 24
14: Feature image file name format .................................................................................. 25
15: Extract input and desired output from feature images to train neural networks ........ 28
16: Face image input training data sample ....................................................................... 28
17: Methods of reading a face image in the developed system ........................................... 31
18: Loading the models in the developed system ............................................................. 32
19: Face look up ............................................................................................................... 32
20: Face detection to locate and measure the face ........................................................... 34
21: Profile vectors of a point and 3 adjacent neighbors on each side ............................... 35
22: Face recognition processes ......................................................................................... 39
23: Original ....................................................................................................................... 44
24: Darkened by adding 30 to all pixel intensities ........................................................... 44
25: Darkened by adding 60 to all pixel intensities ........................................................... 44
26: Darkened by adding 90 to all pixel intensities ........................................................... 44
27: Blurred ........................................................................................................................ 45
28: Sharpened by laplace filter ......................................................................................... 45
Chapter 1
INTRODUCTION
Humans easily perform face recognition to identify each other. As computers and
computer science have advanced, researchers have gradually developed an interest in
using computers for face recognition. Current technology lacks accurate performance,
especially if no constraints are imposed (e.g., pose, expression, lighting, occlusions, age).
Humans, on the other hand, perform badly at recognizing animal faces. Their good
performance recognizing human faces is likely due to extensive training on human faces.
This leads us to expect that computers, eventually, using the right technology and training
method, will learn how to recognize human faces.
Some industries showed early interest in face recognition, and many more developed
interest as the technology gradually advanced. Examples of these industries are law
enforcement, security, robotics, surveillance and auto identification.
1.1 Motivation
Existing systems impose constraints on the images, such as requiring a standard frontal pose
with no expressions. Even so, the performance of these systems is limited by variation in
lighting, shade and contrast. Most face recognition algorithms end up performing a kind of
hyperspace search. Some of these algorithms implicitly create the dimensions of these
spaces while others explicitly build them out of correlated or statistical data. Surprisingly,
there does not appear to be a system that uses the facial features themselves as the bases
(dimensions) of its hyper face space. It is expected that using the facial features as the bases
of the face space will improve performance and tolerate variation in lighting and
contrast in face images, because the dimensions of the space are built out of
information about the features rather than the intensity variation of the image pixels.
1.2 Limitations
This project is limited to gray level images ([0 – 255] level range). Moreover, images
should be in standard frontal pose with no expressions and no occlusions.
Chapter 2
BACKGROUND
Work on face recognition started as early as the 1960s, but the first fully automated
face recognition system was developed by Takeo Kanade in 1973 [1]. In the 1980s, there
were a number of prominent approaches that led to successive studies by many researchers.
Among these are the works by Sirovich and Kirby in 1986. Their methods were based on
Principal Component Analysis (PCA), which led to the work of Matthew Turk and
Alex Pentland in 1992 [2], where they used eigenfaces for recognition. Other relevant
methods used in many studies are deformable templates, Linear Discriminant
Analysis (LDA), Independent Component Analysis (ICA), statistical shape and
appearance models, local filters such as Gabor jets, and the design of the AdaBoost
cascade classifier architecture for face detection [1].
2.1 Active Shape Model (ASM)
ASM is a statistical shape model that is "iteratively deform[ed] to fit to an example of the
object in a new image"; it was developed by Tim Cootes and Chris Taylor in 1995 [Wikipedia].
ASM has two models that are applied successively in multiple iterations to best fit the
object shape. The first is a model of points, related to each other, that represents a shape.
The second is a separate intensity profile of the neighboring pixels around every point in the
first model. The models are constructed from training images of the object to be modeled.
Numbered points are placed on the landmarks and outline that represent the shape of the
object in all training images. The points are placed at the same corresponding location in
each image. For every marked point on every image, a profile of the neighboring pixels
around that point is gathered (this implementation uses the points along the tangent line to
the object shape). To search for the object in a new image, the points are first assigned
initial positions, and then the two models are applied iteratively as follows:
• For every point, search the neighboring pixel intensities and update the point
  position to the location that best matches the model point profile (applies to each
  point separately).
• Apply the shape model, the first model mentioned above, to the new point
  positions and modify them if required to conform to the whole shape (applies to
  all the points together).
For more information on ASM refer to [Cootes] and [Dotterer].
2.2 Artificial Neural Network (ANN)
Artificial Neural Networks attempt to mimic biological neurons. They are built out of
mathematical accumulator nodes arranged in columns of fully interconnected layers. Each
node processes multiple weighted inputs to produce one output. The middle layers are
stacked between two outer layers: the layer on the left is the input layer and the one on the
right is the output layer; the layers in between are called hidden layers. Most
ANNs have one or two hidden layers. ANNs are usually used as a nonlinear mapping
between input values and output values. They can be used to model complex
relationships between inputs and outputs or to find patterns in data.
Figure 1: Neural network example showing a possible structure of a neural network having a fully
connected input layer, one hidden layer and an output layer.
Each node in a layer is fully connected to the nodes in the next layer by weighted
connections. The output of a node is the summation of the previous layer's node values
multiplied by their corresponding weights.
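As an illustration of this weighted-sum computation, the following minimal Java sketch
(not taken from the project code; the layer sizes and weight values are made up) computes
the outputs of one layer from the previous layer's values:

// Minimal sketch of the weighted-sum output described above. Each output node
// sums the previous layer's node values multiplied by the corresponding
// connection weights. Real networks, including those built with Encog,
// typically pass this sum through an activation function as well.
public class LayerForward {

    // weights[j][i] is the weight of the connection from input node i to output node j
    static double[] forward(double[] inputs, double[][] weights) {
        double[] outputs = new double[weights.length];
        for (int j = 0; j < weights.length; j++) {
            double sum = 0.0;
            for (int i = 0; i < inputs.length; i++) {
                sum += inputs[i] * weights[j][i];
            }
            outputs[j] = sum; // an activation function would usually be applied here
        }
        return outputs;
    }

    public static void main(String[] args) {
        double[] inputs = {0.2, 0.7, 0.5};                        // example input values
        double[][] weights = {{0.1, 0.4, -0.3}, {0.8, -0.2, 0.5}}; // two output nodes
        double[] out = forward(inputs, weights);
        System.out.println(out[0] + ", " + out[1]);
    }
}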
For a neural network to learn a certain function or a pattern in data, it needs to be trained.
A number of training images and their corresponding desired outputs are presented to the
neural network to be trained. In machine learning, this type of learning is called
supervised learning. The number of training images is proportional to the number of
classes and the input size. It ranges, usually, from a few hundred to a hundred thousand.
The training is done by adjusting the weights of the node connections. This is
accomplished over multiple iterations by feeding the input to the network, evaluating the
gradient of the error between the network output and the desired output, and descending
along that gradient, in a process called backpropagation.
Chapter 3
ACTIVE SHAPE MODEL (ASM)
ASM is used to search for a certain object in images. The object is represented by a set of
points placed on the landmarks and edges of the object. The set of points portrays the
object and is referred to as a "shape". ASM uses two models iteratively to perform the
search. One model, called the Shape Model, governs the total shape of the points and their
relative positions to each other, while the other model, called the Profile Model, is a point-
specific model that captures the intensity profile of the neighboring pixels around every point
in the Shape Model. The models are constructed from training images of the object to be
modeled. For more information on ASM refer to [Cootes] and [Dotterer].
3.1 Shape Model
The Shape Model is composed of a group of points representing an object shape. The
objective of the model is to account for the normal and natural variations of shapes of an
object. Hence, the group of points should be located at landmarks or edges of the object
in such a way that they implicitly depict the object. To accommodate different shape
variations we use multiple shapes of the object to build the Model. In this case, our object
is the human face. Therefore, we use multiple images of different human faces to build
the model.
3.2 Shape
The group of points on one particular object (face) is called a shape. In general, a shape
can be two- or three-dimensional. In this project, two-dimensional shapes are used.
Therefore, each point in the shape has x and y coordinates. The shape can be
represented as a vector by concatenating all point coordinates. For instance, if we have
three points in a shape, A, B and C, the shape vector would be (xA, yA, xB, yB, xC, yC).
Other implementations of ASM concatenate all x values of all points followed by the y
values. This project uses the former.
The points are manually placed on the training face images. Each point has a number and
represents a particular landmark or information on a face; therefore, it should be placed
accurately on the same corresponding spot on each image.
Figure 2: Face landmarks, from the FEI database
3.3 Aligning Shapes
The locations of the shape points vary from image to image. To capture the range of
variation a point location may have on different faces, we align the shapes that were
manually placed on all training images, and then determine the range of variation for every
point in the shape.
Two shapes are considered aligned if a specific distance between them is less than a
small threshold. This distance is the Frobenius norm:

Frobenius Norm = √( Σ_{i=1..n} (shapeA_i − shapeB_i)² )

To align all shapes we need to perform a number of operations on them: translation,
scaling and rotation. One process that performs all three operations is called Procrustes
Analysis.
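A minimal Java sketch of this distance computation is shown below. The shape vectors are
stored as flat (x1, y1, x2, y2, ...) arrays, as in section 3.2; the class name and the values
used are illustrative and not part of the project code.

// Sketch of the Frobenius-norm distance between two shapes stored as flat
// vectors (x1, y1, x2, y2, ...).
public class ShapeDistance {

    static double frobenius(double[] shapeA, double[] shapeB) {
        double sum = 0.0;
        for (int i = 0; i < shapeA.length; i++) {
            double d = shapeA[i] - shapeB[i];
            sum += d * d;          // squared difference of corresponding elements
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] a = {-0.71, 0.53, 0.71, 0.27, 0.0, -0.8};   // illustrative aligned shapes
        double[] b = {-0.94, 0.34, 0.64, 0.30, 0.3, -0.64};
        System.out.println("Frobenius distance = " + frobenius(a, b));
    }
}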
3.4 Procrustes Analysis
The Procrustes process, in general, is a process of fitting things together by stretching or
cutting them. The name Procrustes refers to a rogue smith and bandit from Greek
mythology who had an iron bed: he invited travelers to sleep in this bed, and then made
them fit it by either stretching their limbs or cutting them off.
To simplify the explanation of the Procrustes process, simple shapes will be used to
demonstrate the process step by step. Consider the two triangles shown below in figure 3:
Figure 3: Triangular Shapes to demonstrate shape alignments
3.5 Align Shapes
The objective of the whole process is to align these two shapes so that they end up as
closely as possible with the same orientation, size and location. Thus, if exactly similar
shapes were aligned, they would end up on top of each other even if they started with
different sizes, orientations and locations. Let us start the process for the two triangles
shown above.
All objects are centered on their center of mass. To bring an object's center of mass to the
origin (0, 0), we subtract the coordinate means from each corresponding point coordinate.
This process is called translation. The coordinate means are computed by summing the
corresponding coordinates of the shape points and dividing the sum by the number of
points. The example below demonstrates the steps:
Compute the triangles' Xmean (µx) and Ymean (µy):

Green = [(12, 24), (28, 22), (20, 14)]
Red   = [(8, 6), (10, 10), (12, 8)]

Green: µx = (12 + 28 + 20) / 3 = 20,  µy = (24 + 22 + 14) / 3 = 20
Red:   µx = (8 + 10 + 12) / 3 = 10,   µy = (6 + 10 + 8) / 3 = 8

Shift each shape's center of mass to the origin:  x_i = x_i − µx  and  y_i = y_i − µy

Green: (12−20, 24−20) = (−8, 4),  (28−20, 22−20) = (8, 2),  (20−20, 14−20) = (0, −6)
Red:   (8−10, 6−8) = (−2, −2),    (10−10, 10−8) = (0, 2),    (12−10, 8−8) = (2, 0)
The two translated triangles look as shown in figure 4 below:
Figure 4: Shapes shifted to the same location
3.6 Scaling shapes:
Bringing all shapes to the same size can be done in multiple ways, the easiest being to
scale all shapes to unit size. This is achieved by dividing each point coordinate by the
norm of the corresponding coordinate vector of that shape. This process is called
normalization. It is computed as follows:
||X|| = √( Σ_{i=1..n} (x_i)² ),   x_i = x_i / ||X||
||Y|| = √( Σ_{i=1..n} (y_i)² ),   y_i = y_i / ||Y||

Translated green points = (−8, 4), (8, 2), (0, −6)
Translated red points   = (−2, −2), (0, 2), (2, 0)

||Xgreen|| = √((−8)² + 8² + 0²) = √128 = 11.314
||Ygreen|| = √(4² + 2² + (−6)²) = √56 = 7.5
||Xred||   = √((−2)² + 0² + 2²) = √8 = 2.83
||Yred||   = √((−2)² + 2² + 0²) = √8 = 2.83

Scaled Xgreen: −8/11.314 = −0.71,   8/11.314 = 0.71,   0/11.314 = 0
Scaled Ygreen:  4/7.5 = 0.53,       2/7.5 = 0.27,      −6/7.5 = −0.8
Scaled Xred:   −2/2.83 = −0.71,     0/2.83 = 0,         2/2.83 = 0.71
Scaled Yred:   −2/2.83 = −0.71,     2/2.83 = 0.71,      0/2.83 = 0

Scaled translated green points = (−0.71, 0.53), (0.71, 0.27), (0, −0.8)
Scaled translated red points   = (−0.71, −0.71), (0, 0.71), (0.71, 0)

The two scaled and translated triangles look as shown in figure 5 below:
Figure 5: shapes are scaled to the same size
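The translation (section 3.5) and scaling (section 3.6) steps can be sketched in a few lines
of Java. This is an illustrative sketch rather than the project's implementation; the method
operates in place on an array of {x, y} points, and the example uses the red triangle from
above.

// Sketch of the translation and scaling steps: shift a shape's center of mass
// to the origin, then normalize the x and y coordinate vectors to unit length.
public class AlignSteps {

    static void centerAndScale(double[][] points) {
        double meanX = 0, meanY = 0;
        for (double[] p : points) { meanX += p[0]; meanY += p[1]; }
        meanX /= points.length;
        meanY /= points.length;

        // translation: subtract the coordinate means
        for (double[] p : points) { p[0] -= meanX; p[1] -= meanY; }

        // scaling: divide each coordinate by the norm of its coordinate vector
        double normX = 0, normY = 0;
        for (double[] p : points) { normX += p[0] * p[0]; normY += p[1] * p[1]; }
        normX = Math.sqrt(normX);
        normY = Math.sqrt(normY);
        for (double[] p : points) { p[0] /= normX; p[1] /= normY; }
    }

    public static void main(String[] args) {
        double[][] red = {{8, 6}, {10, 10}, {12, 8}};   // red triangle from the example
        centerAndScale(red);
        for (double[] p : red) System.out.println(p[0] + ", " + p[1]);
    }
}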
3.7 Rotate Shapes
Next, choose one shape and rotate all other shapes toward it until each is within the
smallest possible Frobenius norm distance, or within an acceptable threshold. To perform
the rotation, we take the singular value decomposition (SVD) of the product of the target
shape and the transpose of the shape to be rotated:

Rot shape = shape1 · shapei^T
U Σ V^T = SVD(Rot shape)
Rot shape=(-0.71 0.53 0.71 0.27 0
first Rot shape =
0.5 -0.95
1
-0.19
0.8) * (−0.71 − 0.71 0 0.71 0.71 0)𝑇
15
15th Rot shape =
1.1 -0.46
-0.19
0.85
The rotation matrix rotates shapei around its center of mass (the origin) toward shape1is:
Rot matrix = V U t
The result of rotation is
Rot shapei = shapei * Rot matrix
Now we compute the Frobenius Norm distance to check if it is less than our
predetermined threshold; if yes then aligning is done otherwise iterate the process of
rotation using the last Rot shapei to rotate toward shape1 and continue this iteration till the
Frobenius Norm distance is less than threshold.
Frobenius = √∑n1(shape1 − Rot shapei )2
After 15 iterations using a threshold for Frobenius Norm = 0.5; its value was 0.314
Rotated Scaled Translated Red
As shown in figure 6 below:
points = (-0.94, 0.34) (0.64, 0.3) (0.3, -0.64)
Figure 6: Shapes rotated to the same orientation
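The rotation step can be sketched as follows using the Apache Commons Math library (an
assumed library choice; the project may use a different linear-algebra package). In this
sketch, shapes are stored with one point per row (an n x 2 matrix), so the SVD is taken of
moving^T * target; the point values come from the triangle example.

import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.SingularValueDecomposition;

// Sketch of the SVD-based rotation alignment described above. Shapes are
// n x 2 matrices (one row per point), already centered and scaled.
public class RotateShape {

    static RealMatrix rotateToward(RealMatrix moving, RealMatrix target) {
        RealMatrix m = moving.transpose().multiply(target);            // 2 x 2
        SingularValueDecomposition svd = new SingularValueDecomposition(m);
        // R = U * V^T; a strict rotation would additionally check that det(R) = +1
        RealMatrix r = svd.getU().multiply(svd.getVT());
        return moving.multiply(r);                                      // rotated shape
    }

    public static void main(String[] args) {
        RealMatrix green = MatrixUtils.createRealMatrix(new double[][] {
                {-0.71, 0.53}, {0.71, 0.27}, {0.0, -0.8}});
        RealMatrix red = MatrixUtils.createRealMatrix(new double[][] {
                {-0.71, -0.71}, {0.0, 0.71}, {0.71, 0.0}});
        System.out.println(rotateToward(red, green));
    }
}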
3.8 Principal Components Analysis
So far, we have aligned all the training shapes: they have almost the same orientation and
size, and their centers of mass are at the same location. The objective now is to capture all
the variation in their shapes. The variation can be seen in, and harvested from, the different
location values each point in a shape has relative to its corresponding points in the other
shapes. If we build a matrix of all shape vectors stacked on top of each other, each
column of the matrix holds the same point coordinate across all shapes.
Aligned green points = (−0.71, 0.53), (0.71, 0.27), (0, −0.8)
Aligned red points   = (−0.94, 0.34), (0.64, 0.3), (0.3, −0.64)

Shapes Matrix = [ −0.71   0.53   0.71   0.27   0.0   −0.8  ]
                [ −0.94   0.34   0.64   0.30   0.3   −0.64 ]
Notice that column 3 contains the x coordinates of the second point in each shape. To satisfy
our objective, we need to find the range of variation in each column and how much each
column affects the other columns. Applying Principal Component Analysis (PCA) to this
matrix gives us this information. To apply PCA to the matrix above, we do the following,
as shown in figure 7:
1. Take the mean of every column.
2. Subtract the corresponding mean from each value in the matrix:
   Sub Matrix = Matrix − Mean
3. Compute the covariance matrix:
   Cov Matrix = Sub Matrix · Sub Matrix transposed
4. Find and sort the eigenvalues of the covariance matrix.
5. Choose the eigenvectors corresponding to the significant eigenvalues.
6. Express the data in terms of the selected eigenvectors:
   Data in new dimensions = eigenvectors transposed · Sub Matrix
7. New shapes can now be generated using the selected eigenvectors:
   new Shape = Mean + eigenvectors · b
   b is a vector of coefficients, one per eigenvector; each scales the contribution of its
   corresponding eigenvector to the new shape:
   b = eigenvectors transposed · (new Shape − Mean)
Figure 7: PCA algorithm
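A compact sketch of these PCA steps is shown below, again using Apache Commons Math
as an assumed library choice. Here the covariance is computed over the coordinate columns
of the mean-subtracted shape matrix, and the two example rows come from the triangle
example; the coefficient value b0 is illustrative.

import org.apache.commons.math3.linear.EigenDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

// Sketch of the PCA steps listed in figure 7. Rows of 'shapes' are aligned
// shape vectors (x1, y1, x2, y2, ...); columns are individual coordinates.
public class ShapePca {

    public static void main(String[] args) {
        double[][] shapes = {
            {-0.71, 0.53, 0.71, 0.27, 0.0, -0.8},
            {-0.94, 0.34, 0.64, 0.30, 0.3, -0.64}
        };
        RealMatrix data = MatrixUtils.createRealMatrix(shapes);
        int rows = data.getRowDimension(), cols = data.getColumnDimension();

        // steps 1-2: column means, then subtract them from the data
        double[] mean = new double[cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) mean[c] += data.getEntry(r, c);
            mean[c] /= rows;
        }
        RealMatrix centered = data.copy();
        for (int r = 0; r < rows; r++)
            for (int c = 0; c < cols; c++)
                centered.addToEntry(r, c, -mean[c]);

        // step 3: covariance matrix of the centered columns
        RealMatrix cov = centered.transpose().multiply(centered)
                                 .scalarMultiply(1.0 / (rows - 1));

        // steps 4-5: eigenvalues and eigenvectors; keep the most significant one here
        EigenDecomposition eig = new EigenDecomposition(cov);
        double[] eigenvalues = eig.getRealEigenvalues();
        int best = 0;                                   // index of the largest eigenvalue
        for (int i = 1; i < eigenvalues.length; i++)
            if (eigenvalues[i] > eigenvalues[best]) best = i;
        RealVector axis = eig.getEigenvector(best);

        // steps 6-7: a new shape = mean + eigenvector * b (single coefficient here)
        double b0 = 0.1;
        RealVector newShape = MatrixUtils.createRealVector(mean)
                                         .add(axis.mapMultiply(b0));
        System.out.println("largest eigenvalue = " + eigenvalues[best]);
        System.out.println("new shape = " + newShape);
    }
}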
To make sure the new shape is similar to the training shapes and deforms within
acceptable limits (in other words, that the points on the new shape vary within the range
of variation of their corresponding points in the training images), we enforce a limit on the
values of the b vector. Cootes et al. (2004; 1995) suggest that each deformation value b_i
should be bounded by the interval −3√λ_i to +3√λ_i, where λ_i is the eigenvalue that
corresponds to the selected eigenvector i.
PCA accomplishes the job we required. Subtracting the corresponding mean from each
column reveals the variation in that point coordinate. The covariance matrix captures the
effect of the variation of one point on the others. Choosing the significant eigenvectors
lets us create new bases (dimensions) for the shape space that have fewer dimensions and
reduce the noise that originally existed in the data along the insignificant eigenvectors.
The new bases constitute the Shape Model of the training images.
The exact criterion for choosing the optimum number of eigenvectors is application
dependent. After sorting the eigenvalues, if there is a clear cut between two values where
the difference is relatively big, that is an obvious place to truncate; otherwise, empirical
testing determines the right number of eigenvectors to keep.
3.9 Profile Model
To decide the initial locations of the points on a new face, we use a face detector to locate
the face in the image. The face detector used is the Haar classifier cascade available in
OpenCV. It returns the smallest square box that fits the face. Using the width of the
returned box, we calculate the ratio of the new face width to the shape model width. We
use this ratio to scale the mean points of the model to fit the new face measurements. The
resulting points are the initial locations of the points on the new face.
Due to differences in face shapes, the spatial ratios and locations of the landmarks relative
to each other vary from one face to another. Therefore, the calculated initial points are not
at the exact locations of the face landmarks we seek, but they are within the vicinity of
those landmarks. We need certain information for each point to help locate it on any face,
and we gather this information from the training data. For each point, we build a profile of
the intensity variation of the neighbors around that point. This can be implemented in
multiple ways; in this project, a simple implementation was used that builds a normalized
intensity gradient along the line through the point orthogonal to the shape boundary. The
gradient here expresses the difference in intensity from one pixel to the next: the value at
position n equals the intensity at n minus the intensity at (n−1). After obtaining all the
vector values, we normalize the vector by dividing each element by the sum of the
absolute values of all elements. Using the normalized intensity gradient compensates for
variation in image lighting and contrast.
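The construction of one normalized gradient profile can be sketched as follows (plain Java;
the sampled intensity values are illustrative and not taken from the project data):

// Sketch of building one normalized gradient profile: take the raw pixel
// intensities sampled along the line through a landmark, difference adjacent
// samples, and divide by the sum of absolute values.
public class GradientProfile {

    // samples[0] is the extra pixel before the profile; samples[1..n] are the
    // profile pixels, so the result has samples.length - 1 elements (7 here).
    static double[] normalizedGradient(double[] samples) {
        double[] grad = new double[samples.length - 1];
        double sumAbs = 0.0;
        for (int i = 1; i < samples.length; i++) {
            grad[i - 1] = samples[i] - samples[i - 1];   // intensity at n minus intensity at n-1
            sumAbs += Math.abs(grad[i - 1]);
        }
        if (sumAbs > 0) {
            for (int i = 0; i < grad.length; i++) grad[i] /= sumAbs;
        }
        return grad;
    }

    public static void main(String[] args) {
        double[] samples = {120, 118, 125, 140, 180, 190, 185, 182}; // 1 extra + 7 profile pixels
        for (double g : normalizedGradient(samples)) System.out.print(g + " ");
    }
}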
Around every point along the orthogonal line to the image boundary, we choose some
number of points. In this implementation, we choose three on each side, plus the point
itself, which makes the number of elements in the vector equal seven, as shown in the
figure below.
To find the orthogonal line to the image boundary (or the line the landmark represents) at
a certain point (C), we use two other points (A and B) around point C. We find the
perpendicular line from point C to the line connecting A and B as described in figure 8.
Figure 8: Norm from a point to the image boundary
So far, for each point in all training images, we have built a vector of seven elements as
shown in figure 9. Next, we compute the mean µ of each element of the vector across all
images. As always, when we have multiple means related to one object (a point), we also
compute the covariance matrix cm to capture the interdependencies of these elements.
The covariance matrix for each point is 7 × 7. Since we have 46 points in a shape, we
have 46 vectors and 46 × 7 elements, thus 46 × 7 means and 46 covariance matrices.
Figure 9: points on the norm
3.10 New Image Profile
When searching a new image to find the right locations for all landmarks, we start with the
initial locations discussed above. For every point, we build multiple profiles, in the same
way we did for the training images, along the line through the point orthogonal to the
image boundary. The difference here is that we build multiple profiles: one along the point
itself and others along the neighbors on each side of the point; this implementation builds
three on each side. Along every profile, we use the pixel immediately before the profile's
first pixel to calculate the intensity gradient at the first element of the vector, as shown in
figure 10.
Figure 10: profile vectors of a point and 3 adjacent neighbors on each side
We compare these profiles P_i to the mean profile (the model profile) built for the
corresponding point at training time. We calculate the Mahalanobis distance between
these profiles and the model profile, where:

Mahalanobis distance = (P_i − µ)^T · cm^-1 · (P_i − µ)
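A sketch of this distance computation is shown below, using Apache Commons Math for
the matrix inverse (an assumed library choice). The report defines the distance without a
final square root, and the sketch follows that definition; the 2-element vectors and
covariance matrix are toy values, whereas the project uses 7-element profiles and 7 × 7
covariance matrices per landmark.

import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.MatrixUtils;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

// Sketch of the Mahalanobis distance (P - mu)^T * cm^-1 * (P - mu).
public class MahalanobisDistance {

    static double mahalanobis(RealVector p, RealVector mean, RealMatrix cm) {
        RealMatrix cmInverse = new LUDecomposition(cm).getSolver().getInverse();
        RealVector diff = p.subtract(mean);
        return diff.dotProduct(cmInverse.operate(diff));   // no square root, as in the report
    }

    public static void main(String[] args) {
        RealVector p    = MatrixUtils.createRealVector(new double[] {0.3, -0.1});
        RealVector mean = MatrixUtils.createRealVector(new double[] {0.25, 0.0});
        RealMatrix cm   = MatrixUtils.createRealMatrix(new double[][] {{0.04, 0.01},
                                                                       {0.01, 0.09}});
        System.out.println("Mahalanobis distance = " + mahalanobis(p, mean, cm));
    }
}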
We choose the profile with the smallest distance and set the middle point of that profile as
the new suggested point for the landmark under consideration.
We calculate new suggested locations for all landmarks, and then apply the Shape Model
to constrain the whole shape to a valid shape consistent with the model, which may
modify some of the suggested landmark locations. We continue applying these two
models in iterations until the maximum iteration count is reached or the number of points
that moved by more than a shift threshold falls below a point-count threshold.
Figure 11: results of applying the feature locator on a face image
Chapter 4
EXTRACT FEATURE IMAGES
After building the ASM models, we use them to extract facial features from the training
face images. We read the training images one at a time and place the extracted features
into their dedicated folders. For each feature, we use the coordinates of specific landmarks,
those on or near the boundary of the feature, to find the minimum and maximum x and y
values of the subarea of the face image from which the feature image is to be extracted;
figure 12 depicts this process.
Figure 12: Extract facial features sequence (read training images one at a time; for each
image, extract its facial features; save each feature image in its corresponding folder)
Feature images need to be compared and classified into numbers that order the features
according to their similarities. Using a single number to represent a feature will not work,
because every feature has multiple attributes. For a given feature, different faces may
resemble each other in some attributes while differing in others. To accurately represent a
feature, we should represent nearly all of its attributes by numbers that characterize them
and express their information in some mathematical representation. Once we calculate all
the numbers symbolizing the attributes of a feature, we concatenate them together. In
other words, we need to represent each feature by a vector rather than a single number.
The attributes are feature characteristics such as height, width, inclination, protrusion, etc.
A digit, or a group of concatenated digits, represents each attribute. All digits range from
1 (low) to 5 (high). The inclination measures the average skew angle of a feature's shape
and is represented as follows: 1 represents 60 degrees, 2 represents 30 degrees,
3 represents zero degrees, 4 represents −30 degrees and 5 represents −60 degrees.
Inclination is coded from 1 (/) through 5 (\); width and thickness are coded on the same
1-to-5 scale.
Figure 13: inclination of upper eye lid
Comparing and classifying the features is a manual process. A person goes to the folder
containing one feature from all training images, opens each image separately, and decides
on the numbers that represent the attributes chosen for that feature. Finally, this attribute
vector is appended to the end of the image file name. An example is shown in
figure 14:
$178a-22344_53332_543423.jpg
The original image name is shown in red while the appended class string is shown in green
Figure 14: feature image file name format
4.1 Feature Attributes
All feature attributes considered, and their digit descriptions, are listed below. Where an
attribute is broken into adjacent segments, the number of segments is given in parentheses;
each segment contributes one digit.

Eyes (16 digits): upper lid inclination, lower lid inclination, width, height, eye inclination,
protrudes, eye bags, smoothness (9 segments).

Eyebrows (13 digits): inclination, width, density, length, smoothness (9 segments).

Left half lips (32 digits): upper lip inclination, upper lip width, lower lip inclination,
lower lip width, lips meet inclination, length, smoothness (26 segments).

Nose (12 digits): width, length, nose tip size, nose tip rise, septum width, septum height,
septum protrudes, nostril size, nostril base, smoothness (3 segments).

Forehead (8 digits): protrudes, middle protrudes, width, length (5 segments).

Left chin (9 digits): chin edge, lip-to-chin protrudes (8 segments).

Cheeks (5 digits): width, protrudes, smoothness (3 segments).

Left face edge (10 digits): upper side inclination, lower side inclination (9 segments).
4.2 Build Neural Networks Training Data
By now, all feature images have the vector classes representing their features appended to
the ends of their file names. The next step is to read each feature folder and build the
neural network training data.
4.2.1 Scale Images
The images belonging to one feature do not all have the same size. By evaluating the
images' heights and widths, we choose one height and width and scale all images for that
feature to that size. In general, we scale image dimensions down by a factor of 2 to 5 in
order to reduce the training and processing time of the neural networks.
4.2.2 Extract Input and Output Training Data
After scaling the images down, we scale the intensity values to the range 0-1 by dividing
every value by 255. Next, we extract the scaled pixel intensities into an array of type
double (double[][]). Together with this input array, we extract the feature attribute class
vector from the image file name to serve as the desired output for training the neural
networks. A summary of the process is shown in figure 15, and a sample set of training
inputs is shown in figure 16.
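The extraction of one training input row can be sketched as follows; the file path is a
placeholder, and the sketch assumes the image has already been scaled and converted to
grayscale.

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

// Sketch of turning one scaled grayscale feature image into a training input
// row: pixel intensities read row by row and divided by 255. The desired-output
// digits would be parsed separately from the image file name.
public class TrainingInput {

    static double[] toInputRow(BufferedImage img) {
        int w = img.getWidth(), h = img.getHeight();
        double[] row = new double[w * h];
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                int gray = img.getRaster().getSample(x, y, 0); // band 0 of a grayscale image
                row[y * w + x] = gray / 255.0;                 // scale to the range 0-1
            }
        }
        return row;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage img = ImageIO.read(new File("left_eyes_Upper_Lid/sample.jpg")); // placeholder path
        double[] input = toInputRow(img);
        System.out.println("input length = " + input.length);
    }
}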
Figure 15: Extract input and desired output from feature images to train neural networks
(read each feature folder and scale all images to the same size; extract data from all scaled
images for one feature into a double[][]; extract the output classes from the image names
and pass each class, with the input, to a neural net)
Figure 16: face image input training data sample
4.3 Neural Networks
To implement the neural networks, the Encog neural network framework was used, a
library of classes provided for Java and other languages [Heaton]. Encog provides many
training techniques, including backpropagation. For this project, resilient propagation
training was used instead of backpropagation because, as described by [Heaton], it is more
efficient and requires fewer parameters that need calibration. Both training methods use
the gradient of the error between the network output and the desired output, but they use it
differently: backpropagation uses the value of the gradient to calculate the new weights,
while resilient propagation uses only the sign (direction) of the gradient, changing the
weights by magnitudes that adapt as training progresses.
4.4 Neural Networks Topology
The number of inputs and outputs of the neural net for every feature class is determined
by the size of the scaled feature images and the number of attribute class digits.
The number of hidden layers and their sizes, on the other hand, were determined by trial
and error, as is usual when using neural networks. Three hidden layers were used for all
neural network classifiers, with sizes related to the input size: the first hidden layer is one
thirtieth of the input size, the second is one fortieth, and the third is one fiftieth.
The code for implementing the neural networks is shown below:
// Required imports (at the top of the class file), assuming the Encog 2.x API used here:
import org.encog.neural.data.NeuralDataSet;
import org.encog.neural.data.basic.BasicNeuralDataSet;
import org.encog.neural.networks.BasicNetwork;
import org.encog.neural.networks.layers.BasicLayer;
import org.encog.neural.networks.training.Train;
import org.encog.neural.networks.training.propagation.resilient.ResilientPropagation;

// Build the network: input layer, three hidden layers sized relative to the
// input (1/30, 1/40 and 1/50 of the input size), and the output layer.
network = new BasicNetwork();
network.addLayer(new BasicLayer(input[0].length));
network.addLayer(new BasicLayer(input[0].length / 30));
network.addLayer(new BasicLayer(input[0].length / 40));
network.addLayer(new BasicLayer(input[0].length / 50));
network.addLayer(new BasicLayer(desiredOutput[0].length));
network.getStructure().finalizeStructure();
network.reset();

// Create the training data from the scaled pixel inputs and the desired
// attribute-class outputs, then train with resilient propagation until the
// error drops below 0.005 or 7000 epochs have been run.
NeuralDataSet trainingSet =
    new BasicNeuralDataSet(this.input, this.desiredOutput);
final Train train =
    new ResilientPropagation(network, trainingSet);

int epoch = 1;
do {
    train.iteration();
    epoch++;
} while (train.getError() > 0.005 && epoch < 7000);
4.5 Classifiers Collection
To compute the numbers that represent each attribute of a feature, we build a classifier for
each attribute; we pass the feature image to that classifier, which in turn produces a number
that characterizes the corresponding feature attribute. We loop through all the folders of
feature training images and use the image file names to determine the number of classes
each feature has. We build a neural net classifier for every attribute and store them in a
map of maps: Map<Integer, Map<Integer, EncogNeuralNet>>. The outer map maps the
feature number to its attributes, while the inner map maps the attribute to its neural net
classifier.
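The collection can be sketched as follows; BasicNetwork stands in for the project's own
neural-network wrapper class, whose exact type is not shown in this report.

import java.util.HashMap;
import java.util.Map;
import org.encog.neural.networks.BasicNetwork;

// Sketch of the classifier collection described above: the outer map is keyed
// by feature number, the inner map by attribute number.
public class ClassifierCollection {

    private final Map<Integer, Map<Integer, BasicNetwork>> classifiers = new HashMap<>();

    void put(int feature, int attribute, BasicNetwork net) {
        classifiers.computeIfAbsent(feature, k -> new HashMap<>()).put(attribute, net);
    }

    BasicNetwork get(int feature, int attribute) {
        Map<Integer, BasicNetwork> byAttribute = classifiers.get(feature);
        return byAttribute == null ? null : byAttribute.get(attribute);
    }
}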
4.6 Recognizing a face
The developed system supports two ways of entering a face image to be recognized. The
first is opening an image file stored on the file system, and the second is capturing an
image using the computer's camera; figure 17 shows the UI with the buttons that do this.
Figure 17: Methods of reading a face image in the developed system
The first time the system is run, the ASM models and the neural network classifier
objects have to be loaded by pressing the "load model from file" and "load classifier"
buttons. These models need to be loaded only once each time the system starts;
figure 18 shows the UI with the buttons that do this.
Figure 18: loading the models in the developed system
Once the image has been entered either way, the "lookup face" button is pressed, which
starts the recognition process. See figure 19 below.
Figure 19: face look up
4.7 Face Detection
The first step in the recognition process is to detect the face in the image at hand, using
the face detection functions provided by OpenCV. “OpenCV stands for Open Source
Computer Vision. It was originally started by Intel back in the late 90s and is currently
released under the open source BSD license. While it is mainly written in C, it has been
ported to Python, Java, and other languages. In Java, it is available through JavaCV,
which is a wrapper that calls the native functions" [TK Gospodinov]. OpenCV uses Paul
Viola's Haar-like feature face detector. The face detector returns the coordinates of a box
that fits the face; hence, the dimensions of the box match the detected face.
4.8 Initialize Face Landmarks
Using the dimensions attained by the face detector, we scale the ASM Shape Model mean
points to the size of the detected face and use the resultant points as the initial points on
the detected face. The points will not be on the exact landmarks where they should be, but
they will have the correct face shape size. Therefore, we iteratively apply the ASM
models to reach the correct point locations. Figure 20 shows the UI after pressing the
button “Detect Face” for a particular face image.
Figure 20: face detection to locate and measure the face
4.9 Locate Facial Features
For each initial point on the face image to be recognized, we build multiple profiles, in the
same way we did for the training images, along the line through the landmark orthogonal
to the image boundary. The difference here is that we build multiple profiles: one along
the point itself and others along the neighbors on each side of the point, as shown in
figure 21. In this implementation, three were built on each side. Along every profile, we
use the pixel immediately before the profile's first pixel to calculate the intensity gradient
at the first element of the vector.
Figure 21: Profile vectors of a point and 3 adjacent neighbors on each side (7 profile
vectors are created: one along the point itself and 3 on each side)
The distance between a search profile P_i and the model mean profile µ is calculated
using the Mahalanobis distance = (P_i − µ)^T · cm^-1 · (P_i − µ), where cm is the
covariance matrix of the corresponding landmark built from the training image profiles.
The profile that produces the smallest Mahalanobis distance is chosen, and consequently
the mid-point of that profile becomes the new suggested landmark point for the current
iteration. Once the new suggested points for all landmarks are calculated, we apply the
shape model to check whether the new shape is valid according to the training images'
shape model. This is accomplished by checking the values (coefficients) of the vector
representing the new suggested shape, computed using the Shape Model eigenvectors
and means as follows:

b = eigenvectors transposed · (new Shape − Mean)
Here b is a vector of coefficients, one per eigenvector; each scales the contribution of its
corresponding eigenvector to the new shape. Cootes et al. (2004; 1995) suggest that each
deformation value b_i should be bounded by the interval −3√λ_i to +3√λ_i, where λ_i is
the eigenvalue corresponding to the selected eigenvector i. To force the new suggested
shape to be consistent with the shape model, we hard-limit every value in the vector to
the range −3√λ_i to +3√λ_i. If a value in a column of vector b is greater than three times
the square root of the corresponding eigenvalue for that column, we set it to +3√λ_i.
Similarly, if a value is less than negative three times the square root of the corresponding
eigenvalue, we set it to −3√λ_i. Since vector b was modified by trimming the columns
that exceeded the specified range, we recompute the new suggested shape using the new
b vector as follows:

new Shape = Mean + eigenvectors · b

This may cause some landmarks to have newer suggested positions; hence, we reapply the
point profile model to search around the newly suggested points for a neighboring profile
that better matches the model profile for that point. We continue applying these two
models iteratively until the maximum iteration count is reached or the number of points
that moved by more than a shift threshold falls below a point-count threshold.
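The hard-limiting and shape reconstruction steps can be sketched as follows; plain Java
arrays are used here in place of the project's actual matrix representation.

// Sketch of hard-limiting the shape coefficients b to +/- 3 * sqrt(lambda_i),
// then rebuilding the suggested shape from the mean and eigenvectors
// (eigenvectors[i][j] is element j of eigenvector i).
public class ConstrainShape {

    static void clampB(double[] b, double[] eigenvalues) {
        for (int i = 0; i < b.length; i++) {
            double limit = 3.0 * Math.sqrt(eigenvalues[i]);
            if (b[i] > limit)  b[i] = limit;
            if (b[i] < -limit) b[i] = -limit;
        }
    }

    static double[] rebuildShape(double[] mean, double[][] eigenvectors, double[] b) {
        double[] shape = mean.clone();
        for (int i = 0; i < b.length; i++) {              // newShape = mean + sum_i b_i * eigenvector_i
            for (int j = 0; j < shape.length; j++) {
                shape[j] += b[i] * eigenvectors[i][j];
            }
        }
        return shape;
    }
}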
Chapter 5
EXTRACT FEATURE IMAGES
Once the loop applying the two ASM models finishes, we have the closest possible
positions for the landmarks on the face we are looking up. Next we extract the features
into images of their own. The objective is to find, for every feature, two points on the
face that mark the upper-left and lower-right corners of the smallest possible rectangle
that fits the feature. For every feature, we determine a group of landmarks that bound the
feature's location on a face; we always use these same landmarks for the corresponding
feature. We calculate four values from the coordinates of these points: the minimum x,
minimum y, maximum x and maximum y. The minimum x and y values are used as the
upper-left corner and the maximum x and y as the lower-right corner of the feature's
bounding rectangle.
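The bounding-rectangle computation can be sketched as follows; the landmark coordinates
are illustrative, and the resulting rectangle could then be used with
BufferedImage.getSubimage to crop the feature image.

import java.awt.Rectangle;

// Sketch of computing a feature's bounding rectangle from the landmark points
// assigned to it: the minimum x and y give the upper-left corner, the maximum
// x and y give the lower-right corner. Points are {x, y} pairs.
public class FeatureBox {

    static Rectangle boundingBox(int[][] landmarks) {
        int minX = Integer.MAX_VALUE, minY = Integer.MAX_VALUE;
        int maxX = Integer.MIN_VALUE, maxY = Integer.MIN_VALUE;
        for (int[] p : landmarks) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
            maxX = Math.max(maxX, p[0]);
            maxY = Math.max(maxY, p[1]);
        }
        return new Rectangle(minX, minY, maxX - minX, maxY - minY);
    }

    public static void main(String[] args) {
        int[][] eyeLandmarks = {{142, 210}, {188, 204}, {165, 223}, {150, 198}}; // illustrative
        System.out.println(boundingBox(eyeLandmarks));
    }
}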
5.1 Build Face Vector
After extracting the feature images, we loop over them and over the classifier collection,
passing every feature image to all the attribute classifiers that belong to it. We collect the
output numbers the classifiers generate and concatenate them into a vector that represents
the face. We use the generated vector to search a database of face images.
5.2 Face Database
A database was built that contains, among other tables, a face table. Each record in the
face table is for a person's face. The record contains information about the person and
includes the face image and the face vector, which is built from the output of the feature
classifiers. Initially, the records in the table came from the training face images.
Thereafter, whenever a miss on a face lookup happens, the application prompts the user to
ask whether he or she wishes to add the person to the database. If the user responds yes,
he or she is directed to a screen where additional information about the person to be
added is gathered.
5.3 Face Vector Search
The vector we built is a point in the hyper facial feature space. Each column of the
vector is a coefficient along one dimension of that space. The vector can be viewed as a
line from the origin of the space to that point. Two vectors match, or represent the same
face, if they are close to each other (have a small distance between them); they need not
have exactly the same column values. The distance between two vectors (two points in a
multi-dimensional space) is the Euclidean distance, which equals the length of the line
connecting the two points and is computed using the Pythagorean theorem:

Euclidean Distance = √( Σ_{i=1..n} (A_i − B_i)² )

The distance between two points in one dimension is simply the difference between their
values; for example, the distance between point A = 3 and point B = 8 is 8 − 3 = 5. The
Euclidean distance for the same points is √((8 − 3)²) = 8 − 3 = 5.
The distance between two points in two dimensions, for example between points A(2, 3)
and B(6, 9), is √((6 − 2)² + (9 − 3)²) = √52.
To search for a face in the database, we produce the vector representation of that face and
then compute its Euclidean distance to every face vector in the database. We choose the
face with the closest distance, within a threshold, to the face we are looking up.
Figure 22 shows a diagram of the process flow of face recognition.
Figure 22: Face recognition processes (read the face image; detect the face; initialize the
face landmark points; find the neighbor intensity profile around each point; choose a
neighbor point if its profile better matches the corresponding model point profile; apply
the model to adjust the whole point shape; extract feature images; pass each feature image
to a neural net classifier; use the classifier outputs to build the image vector, then look up
the database for the nearest Euclidean distance)
Chapter 6
RESULTS
We are trying to accomplish two objectives in this project:
1. Use the facial features as bases for the hyper face space, such that a vector in that
space can uniquely represent a face.
2. Make the face vector, to a degree, impervious to lighting and contrast variation.
We will start by evaluating the neural network training results, which affect the overall
result of the project.
6.1 Neural Network Training Results
The training iterations continue until the error is reduced to 0.005 or the maximum of
7000 training iterations is reached. Classifiers that did not reach the error threshold most
likely did not learn the class they were trained to learn. However, even when a
classifier's training terminated by achieving the error threshold, it does not necessarily
generalize or perform well on unseen images. The final error and the number of
iterations resulting from training all the classifiers are listed below:
featureNumber 0 C:\left_eyes_Upper_Lid
Epoch #391 Error:0.004994191524030874
Epoch #104 Error:0.004984493529422337
Epoch #265 Error:0.004985108468826185
Epoch #484 Error:0.0049929719580387765
Epoch #460 Error:0.004984689654502988
featureNumber 1 C:\left_eyes_Lower_Lid
Epoch #827 Error:0.004986937733546353
Epoch #971 Error:0.00499552864373195
Epoch #5210 Error:0.0049999223714012235
Epoch #1504 Error:0.004999113098132049
Epoch #849 Error:0.004998021528461399
Epoch #7000 Error:0.0191719140465112
Epoch #798 Error:0.004992879184250633
Epoch #530 Error:0.004989525345879893
Epoch #1262 Error:0.004995976382873747
Epoch #7000 Error:0.00677015938958029
Epoch #614 Error:0.0049977379423736265
featureNumber 2 C:\left_eyebrows
Epoch #360 Error:0.004980656237549085
Epoch #7000 Error:0.007036987719454902
Epoch #376 Error:0.004981622225126858
Epoch #275 Error:0.004965901982859265
Epoch #287 Error:0.004995343525756889
Epoch #595 Error:0.004990825343522399
Epoch #7000 Error:0.17772509850215729
Epoch #611 Error:0.004984436210732808
Epoch #819 Error:0.004995913313891747
Epoch #686 Error:0.004993121414198667
Epoch #860 Error:0.004990753706857306
Epoch #384 Error:0.00499366749923531
Epoch #601 Error:0.004987558523469151
featureNumber 3 C:\upper_lips
Epoch #5991 Error:0.004998927140866537
Epoch #2795 Error:0.004999661918578198
Epoch #881 Error:0.00499867228063733
Epoch #7000 Error:0.005961182177266967
Epoch #7000 Error:0.019647908671887468
Epoch #1247 Error:0.004997730465916958
Epoch #7000 Error:0.019667583174116997
Epoch #4596 Error:0.004999268018850392
Epoch #7000 Error:0.12192198322262622
Epoch #7000 Error:0.2747058146029274
Epoch #7000 Error:0.15185164589227265
Epoch #7000 Error:0.4668459633858304
featureNumber 4 C:\lower_lips
Epoch #7000 Error:0.012662371017759608
Epoch #548 Error:0.004988885991709977
Epoch #526 Error:0.0049933770846231395
Epoch #941 Error:0.004991239325289801
Epoch #1603 Error:0.004998223257053886
Epoch #11 Error:0.0018117435133911375
Epoch #7000 Error:0.13907564763869243
Epoch #7000 Error:0.020912792428856374
Epoch #7000 Error:0.4245340767894438
Epoch #7000 Error:0.007688870684335883
Epoch #7000 Error:0.012568393085617127
Epoch #7000 Error:0.011440157043761789
Epoch #1162 Error:0.004998001437164182
Epoch #4252 Error:0.004999434530689529
Epoch #6012 Error:0.004999532729013137
Epoch #7000 Error:0.04069135814238408
Epoch #770 Error:0.004998132157863931
Epoch #130 Error:0.0049819863543390085
Epoch #7000 Error:0.06943209890660564
Epoch #7000 Error:0.03361610332221863
featureNumber 5 C:\noses
Epoch #7000 Error:0.07157927286693923
Epoch #7000 Error:0.06733283516745575
Epoch #7000 Error:0.05136952769052274
Epoch #7000 Error:0.09643124024979384
featureNumber 6 C:\lower_noses
Epoch #7000 Error:0.45334147618528975
Epoch #7000 Error:0.33231235914052365
Epoch #7000 Error:0.07814296249175354
Epoch #7000 Error:0.13780606729853667
Epoch #7000 Error:0.6131983652534196
Epoch #7000 Error:0.20341779451847417
Epoch #7000 Error:0.3372756020290337
Epoch #7000 Error:0.09376213622449585
featureNumber 7 C:\foreheads
Epoch #7000 Error:0.007711066513603634
Epoch #2777 Error:0.004999122403146206
Epoch #7000 Error:0.00675278630918557
Epoch #7000 Error:0.2073878013587105
Epoch #7000 Error:0.11383902379764574
Epoch #7000 Error:0.008530828794424915
Epoch #1205 Error:0.004995181644837944
Epoch #5252 Error:0.004999958294297756
featureNumber 8 C:\chins
Epoch #1487 Error:0.004999785912154509
Epoch #709 Error:0.0049748791968673165
Epoch #341 Error:0.004988587575850116
Epoch #7000 Error:0.019838039501158037
Epoch #353 Error:0.00498672604896285
Epoch #8 Error:0.004309270470980495
Epoch #760 Error:0.004995590228877071
Epoch #2357 Error:0.0049988521785205
Epoch #534 Error:0.004982904642091496
featureNumber 9 C:\cheeks
Epoch #7000 Error:0.03003274451185928
Epoch #7000 Error:0.01547748415994407
Epoch #7000 Error:0.2984376373571743
Epoch #7000 Error:0.17093035708830454
Epoch #4028 Error:0.0049998166749983565
featureNumber 10 C:\upper_left_faces
Epoch #611 Error:0.004988326771541779
Epoch #445 Error:0.004991463334444358
Epoch #480 Error:0.0049893915371665735
Epoch #7000 Error:0.005209573790832722
Epoch #311 Error:0.004986990788078784
featureNumber 11 C:\lower_left_faces
Epoch #7000 Error:0.005510204084803682
Epoch #124 Error:0.004998436710232547
Epoch #476 Error:0.0049723089464967975
Epoch #7 Error:0.002414108153429113
Epoch #68 Error:0.004937368026882365
Traning is done
6.2 Generalization
Not all classifiers were successfully trained to the required output-error threshold, as is
clear from the error listing above. More than 95 percent did train successfully, but many
later did not generalize well. The training images for a given piece of feature information
should be constrained to include that information as isolated as possible. This means we
have to precisely determine the coordinates on the face image from which the feature
sub-image is extracted, in order to reduce the clutter of information in that sub-image and
simplify the neural network training. The feature locator tool (ASM) we implemented
calculates the coordinates of the features. The tool's accuracy is not great due to its
simplistic implementation; hence, the resulting sub-images contain more or less
information than they should.
In addition, human error in the manual process of assigning the desired classes to the
training images contributes to some of the training problems.
To evaluate the performance on multiple images, I used the training images with a small shift added to the feature sub-image coordinates, so that each input image differs from the one used in training. I passed the feature images to the neural network classifiers, compared each classifier's output to the desired output encoded in the image file name, and aggregated the matching ones, displayed as follows:
Feature Num 0
cls Num 0 val 62
cls Num 1 val 117
cls Num 2 val 92
cls Num 3 val 98
cls Num 4 val 84
Feature Num 1
cls Num 0 val 4
cls Num 1 val 5
cls Num 2 val 50
cls Num 3 val 3
cls Num 4 val 1
cls Num 5 val 11
cls Num 6 val 16
cls Num 7 val 11
cls Num 8 val 35
cls Num 9 val 32
cls Num 10 val 24
Feature Num 2
cls Num 0 val 34
cls Num 1 val 65
cls Num 2 val 45
cls Num 3 val 59
cls Num 4 val 38
cls Num 5 val 35
cls Num 6 val 33
cls Num 7 val 4
cls Num 8 val 46
cls Num 9 val 9
cls Num 10 val 25
cls Num 11 val 43
cls Num 12 val 40
Feature Num 3
cls Num 0 val 103
cls Num 1 val 63
cls Num 2 val 85
cls Num 3 val 144
cls Num 4 val 35
cls Num 5 val 35
cls Num 6 val 4
cls Num 7 val 44
cls Num 8 val 36
cls Num 9 val 24
cls Num 10 val 74
cls Num 11 val 10
Feature Num 4
cls Num 0 val 62
cls Num 1 val 2
cls Num 2 val 8
cls Num 3 val 82
cls Num 4 val 28
cls Num 5 val 34
cls Num 6 val 42
cls Num 7 val 3
cls Num 8 val 84
cls Num 9 val 40
cls Num 10 val 41
cls Num 11 val 8
cls Num 12 val 12
cls Num 13 val 3
cls Num 14 val 26
cls Num 15 val 163
cls Num 16 val 140
cls Num 17 val 33
cls Num 18 val 8
Feature Num 5
cls Num 0 val 103
cls Num 1 val 105
cls Num 2 val 106
cls Num 3 val 32
Feature Num 6
cls Num 0 val 66
cls Num 1 val 93
cls Num 2 val 101
cls Num 3 val 9
cls Num 4 val 3
cls Num 5 val 95
cls Num 6 val 58
Feature Num 7
cls Num 0 val 25
cls Num 1 val 53
cls Num 2 val 69
cls Num 3 val 67
cls Num 4 val 29
cls Num 5 val 38
cls Num 6 val 97
cls Num 7 val 81
Feature Num 8
cls Num 0 val 103
cls Num 1 val 136
cls Num 2 val 89
cls Num 3 val 30
cls Num 4 val 136
cls Num 5 val 127
cls Num 6 val 74
cls Num 7 val 82
Feature Num 9
cls Num 0 val 37
cls Num 1 val 58
cls Num 2 val 79
cls Num 3 val 41
cls Num 4 val 50
Feature Num 10
cls Num 0 val 91
cls Num 1 val 124
cls Num 2 val 110
cls Num 3 val 134
cls Num 4 val 152
Feature Num 11
cls Num 0 val 4
cls Num 1 val 3
cls Num 2 val 4
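The counts above can be produced by a loop of roughly the following shape. This is only a sketch under assumptions: it reads "cls Num" as the index of a classifier within a feature and "val" as the number of shifted images whose classifier output matched the desired output, and the helpers classifierCount, extractFeature, classify, and desiredOutput are hypothetical stand-ins for the project's ASM-based extraction, trained classifiers, and file-name convention.

import java.awt.image.BufferedImage;
import java.io.File;
import java.util.List;
import javax.imageio.ImageIO;

public class GeneralizationCheck {

    static final int SHIFT = 2; // hypothetical pixel shift added to the sub-image coordinates

    // For each feature and each of its classifiers, count how many shifted
    // feature images are still mapped to their desired output.
    static void evaluate(List<File> images, int featureCount) throws Exception {
        for (int feature = 0; feature < featureCount; feature++) {
            int classifiers = classifierCount(feature);
            int[] matches = new int[classifiers];
            for (File file : images) {
                BufferedImage face = ImageIO.read(file);
                BufferedImage sub = extractFeature(face, feature, SHIFT);
                for (int cls = 0; cls < classifiers; cls++) {
                    int predicted = classify(feature, cls, sub);
                    int desired = desiredOutput(file.getName(), feature, cls);
                    if (predicted == desired) {
                        matches[cls]++;
                    }
                }
            }
            System.out.println("Feature Num " + feature);
            for (int cls = 0; cls < classifiers; cls++) {
                System.out.println("cls Num " + cls + " val " + matches[cls]);
            }
        }
    }

    // Hypothetical helpers standing in for the project's feature locator,
    // trained neural-network classifiers, and file-name parsing; they are
    // not part of any library.
    static int classifierCount(int feature) { throw new UnsupportedOperationException(); }
    static BufferedImage extractFeature(BufferedImage face, int feature, int shift) { throw new UnsupportedOperationException(); }
    static int classify(int feature, int cls, BufferedImage sub) { throw new UnsupportedOperationException(); }
    static int desiredOutput(String fileName, int feature, int cls) { throw new UnsupportedOperationException(); }
}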
6.3 Lighting and contrast tolerance
To evaluate the tolerance to lighting and contrast, we generate a vector representation for multiple images of one face under different lighting and contrast conditions. To simulate different lighting, we add a constant value to all pixel intensities of an image to darken it. We did this for three images, adding 30 to one image, 60 to the second, and 90 to the third. To simulate contrast variation, we blurred one image using a Gaussian smoothing filter and sharpened another using a Laplace filter, as shown in Figures 23-28 below.
Figure 23: original
Figure 24: darkened by adding 30 to all pixel intensities
Figure 25: darkened by adding 60 to all pixel intensities
Figure 26: darkened by adding 90 to all pixel intensities
Figure 27: blurred
Figure 28: sharpened by Laplace filter
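These lighting and contrast variants can be generated with standard Java image operations. The following is a minimal sketch under assumptions: it uses java.awt.image.RescaleOp and ConvolveOp rather than the project's own code, and the 3x3 Gaussian and Laplacian-sharpening kernels are illustrative choices.

import java.awt.image.BufferedImage;
import java.awt.image.ConvolveOp;
import java.awt.image.Kernel;
import java.awt.image.RescaleOp;

public class ImageVariants {

    // Add a constant offset to every pixel intensity (the report uses 30, 60 and 90).
    static BufferedImage offsetIntensity(BufferedImage src, float offset) {
        return new RescaleOp(1.0f, offset, null).filter(src, null);
    }

    // Blur with a 3x3 Gaussian-style smoothing kernel (illustrative weights).
    static BufferedImage blur(BufferedImage src) {
        float[] gaussian = {
            1f / 16, 2f / 16, 1f / 16,
            2f / 16, 4f / 16, 2f / 16,
            1f / 16, 2f / 16, 1f / 16
        };
        return new ConvolveOp(new Kernel(3, 3, gaussian), ConvolveOp.EDGE_NO_OP, null).filter(src, null);
    }

    // Sharpen by subtracting a Laplacian estimate from the image (illustrative 3x3 kernel).
    static BufferedImage sharpen(BufferedImage src) {
        float[] laplacianSharpen = {
             0f, -1f,  0f,
            -1f,  5f, -1f,
             0f, -1f,  0f
        };
        return new ConvolveOp(new Kernel(3, 3, laplacianSharpen), ConvolveOp.EDGE_NO_OP, null).filter(src, null);
    }
}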
The vectors generated for each of the images are listed below; each row represents one vector. Within each group of six rows, the vectors are listed in the following order: original, darkened 30, darkened 60, darkened 90, blurred, and sharpened.
22344
22344
23344
13324
22344
34325
44322312322
43432351423
44423321224
52423422323
43422412433
42432411224
1244311323333
1354421354341
1254311454331
1354315554312
1244321314344
2324431112311
133514333453
222314153453
132524343453
243524113453
133514333453
222324153453
34443224331513543323
34443224324513543333
54433224354543523323
34433224355543423323
34433224331413543323
44532224315243533333
2233
2233
2233
2233
2233
2233
45322223
45322223
45322223
45322223
45322223
45322223
32344531
21344554
32444335
42544345
31344534
13344533
543432121
544352112
543522112
434552312
542432121
525222312
43333
42333
42333
32334
43333
24331
11111
11111
11111
11111
11111
12211
55534
55534
55434
55434
55534
54534
Tolerance to the lighting and contrast variation is demonstrated when a given piece of feature information keeps the same class number across the image variants, which appears above as identical digits along a column of the vector representations.
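This column-wise stability can also be checked automatically by comparing each variant's vector to the original position by position. The sketch below is illustrative only; it assumes the vectors are available as strings of class digits, as printed above.

public class ToleranceCheck {

    // Fraction of positions in which a variant keeps the same class number
    // as the original vector (both given as strings of class digits).
    static double stability(String original, String variant) {
        int n = Math.min(original.length(), variant.length());
        int matches = 0;
        for (int i = 0; i < n; i++) {
            if (original.charAt(i) == variant.charAt(i)) {
                matches++;
            }
        }
        return n == 0 ? 0.0 : (double) matches / n;
    }

    public static void main(String[] args) {
        // Example taken from the first block of vectors above:
        // the original versus the darkened-by-60 variant.
        System.out.println(stability("22344", "23344")); // prints 0.8
    }
}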
Chapter 7
CONCLUSION
Some feature classifiers completed their neural network training successfully, reaching an error below 0.005 before the maximum number of iterations was reached. Most of these classifiers achieved both objectives: uniquely identifying the feature, and producing the same or a very similar representation for the same face under different lighting and contrast conditions.
Other feature classifiers failed to complete their neural network training successfully for one or more of the following reasons:
• Manual errors in assigning the correct desired class to a training image.
• Feature images contain too much information. This can be mitigated by extracting the smallest sub-image that contains the feature to be classified.
• The classes assigned to the feature information do not form distinct, accurate groupings.
• Some feature information cannot be learned successfully without additional image-processing techniques to emphasize that information.
• The feature locator tool implemented in this project is a simple one and sometimes produces inaccurate feature locations.
Chapter 8
FUTURE WORK
• Use a variable weight multiplier for each feature, depending on how much information the feature contributes to identifying the face and on how stable the feature's shape remains across different facial expressions.
• Use an Active Shape Model (ASM) with more points, so that it clearly circumscribes all facial features.
• Measure how feature shapes change with different expressions.
• Use color images.
• Allow occlusions (e.g., eyeglasses, hats).
REFERENCES
1. Anthony Dotterer. “Using Procrustes Analysis and Principal Component Analysis
to Detect Schizophrenic Brains”. 2006
http://www.personal.psu.edu/asd136/CSE586/Procrustes/report/Using%20PA%20
and%20PCA%20to%20Detect%20Schizos.pdf
2. FEI Face Database. http://fei.edu.br/~cet/facedatabase.html. Face images taken
between June 2005 and March 2006 at the Artificial Intelligence Laboratory of
FEI in São Bernardo do Campo, São Paulo, Brazil.
3. Jeff Heaton. Introduction to Encog 2.5 for Java. 2010.
http://www.heatonresearch.com/dload/ebook/IntroductionToEncogJava.pdf.
4. Julio César Pastrana Pérez. Active shape models with focus on overlapping
problems applied to plant detection and soil pore analysis. 2012.
http://www.bgthannover.de/homepagethesis/pastrana_phd_2012.pdf
5. Lindsay I Smith. A tutorial on Principal Components Analysis. 2002
http://www.google.com/url?sa=t&rct=j&q=a%20tutorial%20on%20principal%20
components%20analysis&source=web&cd=1&cad=rja&sqi=2&ved=0CDQQFjA
A&url=http%3A%2F%2Fwww.ce.yildiz.edu.tr%2Fpersonal%2Fsongul%2Ffile%
2F1097%2Fprincipal_components.pdf&ei=dORYUayNHuf9iwKi74GYAQ&usg
=AFQjCNFAAD718BgyS8tVYTRLpcLjXaRfsA&bvm=bv.44442042,d.cGE.
6. Stan Li and Anil Jain. “Handbook of Face Recognition”. 2nd edition.
Springer, 2011. Print.
7. Tim Cootes. http://personalpages.manchester.ac.uk/staff/timothy.f.cootes