ECE 738 Course Project
Face Recognition: A Literature Survey
Lin, Wei-Yang
9022090964
May 16, 2003
1 Introduction
Face Recognition Technology (FRT) is a research area spanning several disciplines such as image processing, pattern recognition, computer vision, and neural networks. There are many applications of FRT, as shown in Table 1; they range from the matching of photographs to real-time matching of surveillance video. Depending on the specific application, FRT involves different levels of difficulty and requires a wide range of techniques. A 1995 review paper by Chellappa et al. [1] gives a thorough survey of FRT up to that time, but FRT has continued to evolve rapidly in the years since.
In this report, I provide a brief review of the most recent developments in FRT.
Table 1: Applications of face recognition technology [1].
2 The Challenges in FRT
Though many FRT algorithms have been proposed, robust face recognition is still difficult. The recent FERET test [5] has revealed that there are at least two major challenges:

• The illumination variation problem
• The pose variation problem
Either one or both of these problems can cause serious performance degradation in most existing systems. Unfortunately, these problems arise in many real-world applications, such as surveillance video. In the following, I discuss some existing solutions to these problems.
The general face recognition problem can be formulated as follows: given a single image or a sequence of images, recognize the person in the image using a database of known faces. Solving the problem consists of the following steps: 1) face detection, 2) face normalization, 3) database query.
2.1 The illumination problem
Images of the same face appear different under changes in lighting. If the change induced by illumination is larger than the difference between individuals, a system will not be able to recognize the input image. To handle the illumination problem, researchers have proposed various methods. It has been suggested that one can reduce the variation by discarding the most significant eigenfaces, and [18] verifies that discarding the first few eigenfaces works reasonably well. However, this degrades performance for input images taken under frontal illumination.
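As a concrete illustration of this idea, the following sketch builds an eigenface basis and simply drops the first few components, which tend to encode illumination rather than identity. It is a minimal NumPy example, not the exact procedure of [18]; the number of discarded components and the nearest-neighbour matching step are assumptions for illustration.

```python
import numpy as np

def eigenface_basis(train, n_components=20, n_discard=3):
    """Build an eigenface basis, discarding the first few eigenfaces,
    which tend to capture illumination rather than identity.

    train: (num_images, num_pixels) array of vectorized face images.
    """
    mean = train.mean(axis=0)
    # Rows of Vt are the eigenfaces, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(train - mean, full_matrices=False)
    return mean, Vt[n_discard:n_discard + n_components]

def project(image, mean, basis):
    """Coefficients of a vectorized face image in the retained eigenfaces."""
    return basis @ (image - mean)

# Nearest-neighbour recognition in the reduced space (gallery_images and
# probe_image are hypothetical (num_images, num_pixels) / (num_pixels,) arrays):
# mean, basis = eigenface_basis(gallery_images)
# gallery = np.array([project(g, mean, basis) for g in gallery_images])
# query = project(probe_image, mean, basis)
# best_match = np.argmin(np.linalg.norm(gallery - query, axis=1))
```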
In [19], different image representations and distance measures are evaluated. One important conclusion of that paper is that none of these methods is sufficient by itself to overcome illumination variation. More recently, a new image comparison method was proposed by Jacobs et al. [20]; however, this measure is not strictly illumination-invariant, because it changes for a pair of images of the same object when the illumination changes.
An illumination subspace for a person has been constructed in [21, 22] for a fixed viewpoint. Thus, under a fixed viewpoint, recognition can be made illumination-invariant. One drawback of this method is that many images per person are needed to construct the basis images of the illumination subspace.
In [23], the authors suggest using Principal Component Analysis (PCA) to solve a parametric shape-from-shading (SFS) problem. Their idea is simple: reconstruct the 3D face surface from a single image using computer vision techniques, then compute the frontal-view image under frontal illumination. Very good results are demonstrated; I explain their approach in more detail later. In practice, however, there are many issues in reconstructing a 3D surface from a single image.
I will discuss two important illumination-insensitive FRT approaches in the following sections.
2.2 The pose problem
System performance drops significantly when pose variations are present in the input images. Existing solutions can be divided into three types: 1) multiple images per person are required in both the training stage and the recognition stage, 2) multiple images per person are used in the training stage but only one database image per person is available in the recognition stage, 3) single-image-based methods. The second type is the most popular one.
Multiple-image approaches: An illumination-based image synthesis method [24] has been proposed for handling both the pose and illumination problems. This method is based on the illumination cone to deal with illumination variation. For variations due to rotation, it needs to completely resolve the generalized bas-relief (GBR) ambiguity when reconstructing the 3D surface.
Hybrid approaches: Many algorithms of this type have been proposed; it is probably the most practical solution up to now. Three representative methods are reviewed in this report: 1) the linear class based method [25], 2) the graph matching based method [26], 3) the view-based eigenface method [27]. The image synthesis method in [25] is based on the assumption of linear 3D object classes and the extension of linearity to images. In [26], a robust face recognition scheme based on elastic bunch graph matching (EBGM) is proposed; it demonstrates substantial improvement in face recognition under rotation, and the method is fully automatic, including face localization, landmark detection, and the graph matching scheme. The drawback of this method is the requirement of accurate landmark localization, which is not easy when illumination variations are present. The popular eigenface approach [28] has been modified to achieve pose invariance [27]; this method constructs eigenfaces for each pose. More recently, a general framework called the bilinear model has been proposed [41]. The methods in this category have some common drawbacks: 1) they need many images per person to cover the possible poses; 2) the illumination problem is treated separately from the pose problem.
Single-image-based approaches: Gabor wavelet based feature extraction has been proposed for face recognition [38] and is robust to small-angle rotation. There are many papers on invariant features in the computer vision literature, but little work applies this technology to face recognition; recent work in [39] sheds some light in this direction. For synthesizing face images under different lighting or expression, 3D facial models have been explored in [40], but due to their complexity and computational cost it is hard to apply this technology to face recognition.
3 The State of the Art
In the following sections, I will discuss some recent research works in face recognition.
3.1 Applying Shape-from-Shading (SFS) to Face Recognition
The basic idea of SFS is to infer the 3D surface of an object from the shading information in an image. In order to infer such information, we need to assume a reflectance model under which the given image is generated from the 3D object. There are many illumination models available; among them, the Lambertian model is the most popular and has been used extensively in the computer vision community for the SFS problem [42]. The nature of SFS makes it an ill-posed problem in general: the surface recovered from a single image is not unique, so the reconstruction may not synthesize correct images under a different lighting angle. Fortunately, theoretical advances make the SFS problem well-posed under certain conditions.
The key equation in the SFS problem is the following image irradiance equation [43]:

$$I[x, y] = R\big(p[x, y],\, q[x, y]\big)$$

where $I[x, y]$ is the image, $R$ is the reflectance map, and $p[x, y]$, $q[x, y]$ are the shape gradients (partial derivatives of the depth map). With the assumption of a Lambertian surface and a single, distant light source, the equation can be written as

$$I = \rho \cos\theta_i \quad\text{or}\quad I = \rho\,\frac{1 + p P_s + q Q_s}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + P_s^2 + Q_s^2}}$$

where $\rho$ is the albedo, $\theta_i$ is the angle between the surface normal and the light direction, and $(P_s, Q_s)$ are the gradient-space coordinates of the light source direction.
Since the SFS algorithm provides face shape information, the illumination and pose problems can be solved simultaneously. For example, we can solve the illumination problem by rendering a prototype image $I_p$ from a given input image $I$. This can be done in two steps: 1) apply an SFS algorithm to obtain $(p, q)$; 2) generate the prototype image $I_p$ under lighting angle 0 (frontal illumination).
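The rendering step (step 2) follows directly from the Lambertian equation above. Below is a minimal NumPy sketch of that step, assuming the gradient maps p, q and an albedo map have already been estimated by some SFS algorithm; the function and variable names are illustrative, not taken from [3] or [13].

```python
import numpy as np

def lambertian_image(p, q, albedo, light=(0.0, 0.0)):
    """Render a Lambertian image from surface gradients.

    p, q   : arrays of the partial derivatives of the depth map (from SFS)
    albedo : array of the same shape (rho(x, y))
    light  : (Ps, Qs), gradient-space direction of the distant light source;
             (0, 0) corresponds to frontal illumination.
    """
    Ps, Qs = light
    num = 1.0 + p * Ps + q * Qs
    den = np.sqrt(1.0 + p**2 + q**2) * np.sqrt(1.0 + Ps**2 + Qs**2)
    return np.clip(albedo * num / den, 0.0, None)  # clip attached shadows

# Step 2 of the procedure above: re-light the recovered surface frontally.
# prototype = lambertian_image(p, q, albedo)   # p, q, albedo from an SFS step
```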
To evaluate some existing SFS algorithms, Zhao [13] applies several of them to 1) synthetic face images generated with a Lambertian model and constant albedo, and 2) real face images. The experimental results show that these algorithms are not good enough on real face images to yield a significant improvement in face recognition. The reason is that the face is composed of materials with different reflectance properties (cheek skin, lip skin, eyes, etc.); hence, a Lambertian model with constant albedo cannot provide a good approximation. Zhao et al. [3] develop a symmetric SFS algorithm using a Lambertian model with varying albedo ρ(x, y) as a better alternative. With the aid of a generic 3D head model, they shorten the two-step procedure for obtaining the prototype image (1. input image to shape via SFS, 2. shape to prototype image) to one step: input image to prototype image directly.
Their algorithm is applied to more than 150 face images from the Yale and Weizmann databases. The results clearly indicate the superior quality of the prototype images rendered by their method. They also conduct three experiments to evaluate the influence on recognition performance when their algorithm is combined with existing FRT. The first experiment demonstrates the improvement in recognition performance obtained with the new illumination-invariant measure they define; the results are shown in Table 2 and Table 3. The second experiment shows that using the rendered prototype images instead of the original input images can significantly improve existing FRT such as PCA and LDA; the results are shown in Table 4, where P denotes the prototype image. Finally, in the third experiment they demonstrate that the recognition rate of subspace LDA can be improved.
Database    Image Measure    Gradient Measure    Illumination-Invariant Measure
Yale        68.3%            78.3%               83.3%
Weizmann    86.5%            97.9%               81.3%

Table 2: Recognition performance using three different measures (one image per person).
Database    Image Measure    Gradient Measure    Illumination-Invariant Measure
Yale        78.3%            88.3%               90.0%
Weizmann    72.9%            96.9%               87.9%

Table 3: Recognition performance using three different measures (two images per person).
Database    PCA      LDA      P-PCA    P-LDA
Yale        71.7%    88.3%    90.0%    95.0%
Weizmann    97.9%    100%     95.8%    98.9%

Table 4: Recognition performance with/without using prototype images (P denotes prototype image).
3.2 Applying Illumination Cone to Face Recognition
In earlier work, it has been shown that the images of a face under arbitrary combinations of light sources form a convex cone in image space. This cone, called the illumination cone, can be constructed from as few as three images. Figure 1 demonstrates the process of constructing the illumination cone: figure 1a shows the seven original images with different illumination used in estimating the cone; figure 1b shows the basis images of the illumination cone, which can be used to generate images under arbitrary illumination conditions; and figure 1c shows images synthesized from the illumination cone of one face.
The reconstructed 3D face surface and the illumination cones can be combined to synthesize images under different illumination and pose. In [14], Georghiades et al. use prior knowledge about the shape of the face to resolve the generalized bas-relief (GBR) ambiguity [15]. Once the GBR parameters are calculated, it is a simple matter to render synthetic images under different illumination and pose. Figure 2 shows the reconstructed face surfaces, and figure 3 shows synthetic images of a face under different pose and illumination. Note that these images are generated from the seven training images in figure 1a, where the pose is fixed and the illumination varies only slightly; in contrast, the synthetic images exhibit large variation in both pose and illumination.

Figure 1: The process of constructing the illumination cone: (a) original images, (b) basis images of the cone, (c) synthesized images. [14]
Figure 2: The reconstructed face surfaces of 10 persons. [14]

They performed two sets of recognition experiments. The first experiment, where only illumination varies while pose remains fixed, was designed to compare other recognition algorithms to the illumination cone method. There are a total of 450 images (45 illumination conditions × 10 faces), divided into four groups (12°, 25°, 50°, and 77°) according to the angle between the light source direction and the camera axis. Table 5 shows the results: "cones-attached" means that the illumination cone was constructed without cast shadows, and "cones-cast" means that the reconstructed face surface was used to determine cast shadows.
Notice that the cone subspace approximation has the same performance as the original
illumination cone.
Figure 3: Synthesized images under different pose and illumination.
Table 5: Error rate under different illumination while pose is fixed.
Figure 4: error rate under different pose and illumination.
In the second experiment, they evaluate the recognition performance under variation in both pose and illumination. There are a total of 4,050 images (9 poses × 45 illumination conditions × 10 faces). Figure 4 shows the results. Their algorithm has a very low error rate for all poses except under the most extreme lighting conditions.
We can draw the following conclusions from their experimental results: 1) pose/illumination-invariant recognition can be achieved using a small number of images taken with fixed pose and slightly different illumination; 2) the images of a face under variable illumination can be well approximated by a low-dimensional subspace.
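The second conclusion is easy to exercise directly. The sketch below fits a low-dimensional linear subspace to a handful of fixed-pose, variably-lit images of one person and scores a new image by its reconstruction error. It is only a simplified stand-in for the full illumination cone of [14] (no non-negativity constraint, no cast shadows), and the subspace dimension is an assumption.

```python
import numpy as np

def illumination_subspace(images, dim=9):
    """Fit a low-dimensional linear subspace to images of one face taken
    under a fixed pose and varying illumination.

    images: (num_images, num_pixels) array, one row per lighting condition.
    """
    _, _, Vt = np.linalg.svd(images, full_matrices=False)
    return Vt[:dim]                       # orthonormal basis of the subspace

def subspace_error(image, basis):
    """Reconstruction error of an image with respect to one face's subspace;
    the face whose subspace gives the smallest error is the match."""
    coeffs = basis @ image
    return float(np.linalg.norm(image - basis.T @ coeffs))
```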
3.3 Linear Object Classes Method
Consider the problem of recognizing a face under a different pose or expression when only one picture is given. The human visual system is certainly able to perform this task, possibly because we exploit prior information about how face images transform. Thus, the idea here is to learn the image transformation from examples and then apply it to a new face image in order to synthesize a virtual view that can be used in an existing face recognition system. Vetter and Poggio [25] introduce this technique for generating artificial new images of an object. Their work is based on the idea of linear object classes: 3D objects whose 3D shape can be represented as a linear combination of a small number of prototype objects. Thus, if the example set consists of frontal and rotated-view images, we can synthesize a rotated-view image from the given input image.
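In its simplest form the idea can be written down in a few lines: express the new frontal image as a linear combination of the frontal examples, then apply the same coefficients to the rotated examples. The sketch below works directly on pixel vectors, which is a simplification of [25] (the actual method decomposes images into shape and texture before applying linearity), so treat it as an illustration of the principle rather than the authors' algorithm.

```python
import numpy as np

def synthesize_rotated_view(input_frontal, frontal_examples, rotated_examples):
    """Linear-object-class style view synthesis (pixel-space simplification).

    input_frontal    : (num_pixels,) vectorized frontal image of the new face
    frontal_examples : (num_examples, num_pixels) frontal example images
    rotated_examples : (num_examples, num_pixels) the same faces, rotated

    Express the new face as a linear combination of the frontal examples,
    then apply the same coefficients to the rotated examples.
    """
    coeffs, *_ = np.linalg.lstsq(frontal_examples.T, input_frontal, rcond=None)
    return rotated_examples.T @ coeffs
```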
For human-made objects, which often consist of cuboids, cylinders, or other geometric
primitives, the assumption of linear object classes seems natural. However, in the case of faces, it is not clear how many examples are sufficient. They test their approach on a set of 50 faces, each given in two orientations (22.5° and 0°). In their experiment, one face is chosen as the test face, and the other 49 faces are used as examples. In figure 5, each test face is shown on the upper left and the synthesized image on the lower right; the true rotated test face is shown on the lower left. On the upper right, they also show the reconstruction of the test face from the 49 examples in the test orientation; this reconstruction should be understood as the projection of the test face onto the subspace spanned by the other 49 examples. The results are not perfect, but considering the small size of the example set, the reconstruction is quite good. In general, the similarity of the reconstruction to the input test face allows us to speculate that an example set of a few hundred faces may be sufficient to construct a huge variety of different faces. We can conclude that the linear object class approach may be a satisfactory approximation, even for objects as complex as faces.

Figure 5: Each test face is rotated using 49 faces as examples (not shown), and the result is marked as output. For comparison, the true rotated test face is shown on the lower left (this face was not used in the computation) [25].
Therefore, given only a single face image, we are able to generate additional synthetic face images under different view angles. For the face recognition task, these synthetic images can be used to handle pose variation. Moreover, this approach does not need any depth information, so the difficult step of generating 3D models can be avoided.
3.4 View-Based Eigenspace
The eigenface technique of Turk and Pentland [28] was generalized to the view-based eigenspace technique for handling pose variation [27]. This extension accounts for variation in pose and leads to a more robust recognition system.
They formulate the problem of face recognition under different pose as follows: given N
individuals under M different poses, one can build a “view-based” set of M separate
eigenspaces. Each eigenspace captures the variation of N individuals in a common pose.
In the view-based approach, the first step is to determine the pose of the input face image by selecting the eigenspace that best describes it. This can be accomplished by calculating the Euclidean distance between the input image and its projection onto each eigenspace; the eigenspace yielding the smallest distance is the one whose pose is most similar to the input image. Once the proper eigenspace is determined, the input image is coded using the eigenfaces of that space and then recognized.
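The two stages (pose selection by distance from each eigenspace, then recognition within the selected space) can be summarized in a short sketch. This is a schematic NumPy version under my own data layout, not the implementation of [27]; the structures `spaces` and `galleries` are assumptions for illustration.

```python
import numpy as np

def distance_from_eigenspace(image, mean, basis):
    """Distance between an image and its projection onto one pose-specific
    eigenspace, plus the projection coefficients."""
    centered = image - mean
    coeffs = basis @ centered
    return np.linalg.norm(centered - basis.T @ coeffs), coeffs

def view_based_recognition(image, spaces, galleries):
    """spaces   : list of (mean, basis) pairs, one per pose
       galleries: list of (coeff_matrix, labels) pairs in the same order.
    First select the eigenspace that best describes the input (pose
    estimation), then match by nearest neighbour within that space."""
    results = [distance_from_eigenspace(image, m, b) for m, b in spaces]
    pose = int(np.argmin([err for err, _ in results]))
    _, coeffs = results[pose]
    gallery_coeffs, labels = galleries[pose]
    best = int(np.argmin(np.linalg.norm(gallery_coeffs - coeffs, axis=1)))
    return labels[best], pose
```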
They evaluated the view-based approach with 189 images: 21 people with 9 poses each, evenly spaced from -90° to +90° along the horizontal plane. Two different test methodologies were used to judge the recognition performance.
In the first series of experiments, the interpolation performance was tested by training on the subset of available views {±90°, ±45°, 0°} and testing on the intermediate views {±68°, ±23°}; the average recognition rate was 90% for the view-based method. A second series of experiments tested the extrapolation performance by training on a range of available views (e.g., -90° to +45°) and testing on views outside the training range (e.g., +68°, +90°). For testing poses separated by ±23° from the training range, the average recognition rate was 83% for the view-based method.
3.5 Curvature-Based Face Recognition
In [7], Gordon uses surface curvature to perform face recognition. This is an appealing idea, since the curvature at a point on the surface is invariant to changes in viewpoint and illumination. In this approach, a rotating laser scanner produces data of high enough resolution that accurate curvature calculations can be made. Face segmentation can be performed based on the sign of the Gaussian curvature, which distinguishes convex/concave regions from saddle regions. The surface feature extraction uses not just the curvature sign but also the principal curvatures, principal directions, umbilic points, and extrema of both principal curvatures. The maximum and minimum normal curvatures at a point define the principal curvatures, and the directions associated with them are the principal directions; both are given by the eigenvalues and eigenvectors of the shape matrix. The product of the two principal curvatures is the Gaussian curvature, and the mean curvature is the mean of the two principal curvatures.
In practice, because these curvature calculations contain second-order partial derivatives, they are extremely sensitive to noise, and a smoothing filter is required before calculating curvature. The dilemma is how to choose an appropriate smoothing level: if the smoothing level is too low, the second derivatives amplify noise and the curvature measurements become useless; on the other hand, over-smoothing modifies the very surface features we are trying to measure. In their implementation, they precompute the curvature values using several different levels of smoothing. They use the curvature maps at a low smoothing level to establish the locations of features, then use prior knowledge of face structure to select the curvature values from the precomputed set; this step appears to be done manually. An example of principal curvature maps is given in figure 6.
Segmentation is fairly straightforward. Using the signs of the Gaussian curvature K and the mean curvature H, the face surface can be divided into four kinds of regions: K+, H+ is convex; K+, H- is concave; K-, H+ is saddle with |k_max| > |k_min|; and K-, H- is saddle with |k_max| < |k_min|. The boundary of these regions is called the parabolic curve, where the Gaussian curvature is zero. Figure 7 shows an example.
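To make the segmentation step concrete, the sketch below computes Gaussian and mean curvature from a range image (after Gaussian smoothing, as discussed above) and labels pixels by the signs of K and H, following the sign convention stated in the text. It is a minimal illustration, not Gordon's implementation; the smoothing level is an assumption and the curvature formulas are the standard ones for a graph surface z = f(x, y).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def hk_segmentation(depth, sigma=2.0):
    """Classify each pixel of a range image by the signs of the Gaussian (K)
    and mean (H) curvature, after Gaussian smoothing of the depth map."""
    z = gaussian_filter(depth, sigma)          # smooth before differentiating
    zy, zx = np.gradient(z)
    zyy, zyx = np.gradient(zy)
    zxy, zxx = np.gradient(zx)
    # Curvatures of the graph surface z = f(x, y)
    K = (zxx * zyy - zxy**2) / (1 + zx**2 + zy**2) ** 2
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zxy + (1 + zx**2) * zyy) \
        / (2 * (1 + zx**2 + zy**2) ** 1.5)
    labels = np.empty(z.shape, dtype=object)
    labels[(K > 0) & (H > 0)] = "convex"       # convention used in the text
    labels[(K > 0) & (H < 0)] = "concave"
    labels[(K < 0) & (H > 0)] = "saddle (|k_max| > |k_min|)"
    labels[(K < 0) & (H < 0)] = "saddle (|k_max| < |k_min|)"
    return K, H, labels
```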
The author also discusses the calculation of surface descriptors: she tries to extract as much information as possible from the range data, so that the description is as distinctive as the individual.
Figure 6: Principal curvature (a) and principal direction (c) of maximum curvature,
principal curvature (b) and principal direction (d) of minimum curvature.
Figure 7: Segmentation of three faces by the signs of the Gaussian and mean curvature: concave (black), convex (white), saddle with positive mean curvature (light gray), and saddle with negative mean curvature (dark gray).
With such a rich set of information available, there are many ways to construct a
comparison strategy. The author uses feature extraction and template matching to
perform face recognition. In the experiment, the test set consists of 8 faces with 3 views each; for each face there are two versions without expression and one with expression. The experimental results show that 97% of the comparisons are correct.
In my opinion, the advantages of the curvature-based technique are: 1) it addresses the pose and illumination variation problems at the same time; 2) there is a great deal of information in the curvature map that has not yet been exploited, and it may be possible to find an efficient way to use it.
However, there are some inherent problems in this approach: 1) A laser range finder is much more expensive than a camera, and this technique cannot be applied to existing image databases, which makes it unattractive when another choice is available. 2) Even if the range finder were not an issue, the computational cost is high and the curvature calculation is very sensitive to noise; if we used principal component analysis directly on the range data, the error rate would probably be similar while the computational complexity would be much lower. 3) One could construct the 3D face surface from a 2D image instead of using an expensive range finder, and many algorithms are available for this, but accurate curvature cannot be computed from such a reconstructed surface. As mentioned earlier, the curvature calculation involves second derivatives of the surface, and only high-resolution data such as that from a laser range finder makes accurate curvature calculation possible.
3.6 3D model-Based Face Recognition
To reduce the cost of the system, Beumier and Acheroy [8] choose a 3D acquisition system consisting of a standard CCD camera and structured light. It is based on the projection of a known light pattern; the pattern deformation encodes the depth information of the object. 3D surface reconstruction is done by stripe detection and labeling: from each point of a stripe and its label, triangulation allows X, Y, Z estimation. This process is very fast while offering sufficient resolution for recognition purposes.
There are 120 persons in their experiment; each one is captured in three shots, corresponding to a central view, limited left/right rotation, and up/down rotation. The automatic database uses an automatic program to obtain the 3D information of each individual, while in the manual database the 3D extraction process was initialized by manually clicking points in the deformed pattern.
With the 3D reconstruction, they looked for characteristics that would reduce the 3D data to a set of features that could be easily and quickly compared. However, they found that the nose is the only feature that can be extracted robustly with limited effort, so they abandoned feature extraction and considered global matching of the face surface.
Fifteen profiles are extracted by intersecting the face surface with parallel planes spaced 1 cm apart. A distance measure called the profile distance is defined to quantify the difference between 3D surfaces. This approach is slow: about 1 second to compare two face surfaces. In order to speed up the algorithm, they tried using only the central profile and two lateral profiles in the comparison. ROC curves are shown in figure 10 to illustrate the influence of the comparison strategy: in the central/lateral profile comparison, error rate is sacrificed (from 3.5% to 6.2%) to gain speed in the surface comparison. On the left of figure 10, the manual refinement gives better recognition performance, which indicates that there is room to improve the automatic 3D acquisition system.

Figure 8: (a) Projection of a known light pattern; (b) analysis of the pattern deformation.
Figure 9: Reconstructed face surfaces.
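A profile-based comparison of this kind is easy to sketch. The snippet below samples a few horizontal profiles from two registered range images and uses their mean pointwise difference as the dissimilarity; the exact profile distance of [8] is not specified here, so this is only an assumed, simplified variant.

```python
import numpy as np

def extract_profiles(depth, rows):
    """Horizontal slices of a registered range image, used as profiles.
    depth: (H, W) range image; rows: indices of the slicing planes."""
    return depth[np.asarray(rows), :]

def profile_distance(profiles_a, profiles_b):
    """Mean point-to-point difference between corresponding profiles of two
    registered face surfaces (smaller means more similar)."""
    return float(np.mean(np.abs(profiles_a - profiles_b)))

# Central/lateral variant: trade accuracy for speed by comparing only the
# central profile and two lateral ones, as described above.
# rows = [center - offset, center, center + offset]   # hypothetical indices
```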
Advantages: 1) the additional cost is only the projector and the pattern slide; 2) switching the slide on and off allows acquiring both the 2D image and the 3D information, and fusing the 2D and 3D information can increase recognition performance; 3) the projector illumination reduces the influence of ambient light; 4) 3D reconstruction and profile comparison can avoid the pose variation problem.
Problems: 1) the automatic 3D reconstruction is not good enough, since an obvious improvement is obtained by manual refinement; 2) profile matching is a computationally expensive task. In face authentication this is not an issue, but in face recognition with a large database the matching would be very slow.
Figure 10: ROC curves of 3D face surface recognition, with 15 profiles (left) and with central/lateral profiles (right).
3.7 Elastic Bunch Graph Matching
In [26], Wiskott et al. use the Gabor wavelet transform to extract face features so that recognition performance is robust to variation in pose. Here, I first introduce some of the terminology they use and then discuss how they build the face recognition system.
Each feature point on the face is transformed with a family of Gabor wavelets. The set of Gabor wavelets consists of 5 different spatial frequencies and 8 orientations; therefore, one feature point has 40 corresponding Gabor wavelet coefficients. A jet is defined as the set $\{J_i\}$ of Gabor wavelet coefficients for one feature point, and can be written as $J_i = a_i \exp(i\phi_i)$, where $a_i$ is the magnitude and $\phi_i$ the phase of the $i$-th coefficient.
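The sketch below shows how such a jet might be computed: a small bank of complex Gabor kernels (5 frequencies × 8 orientations) is correlated with the image patch around a feature point, and jets are compared by the normalized dot product of their magnitudes. The kernel parameterization (frequency spacing, bandwidth heuristic, fixed kernel size) is my own simplification, not the exact wavelet family of [26], and the point is assumed to lie far enough from the image border.

```python
import numpy as np

def gabor_kernel(freq, theta, size=31, sigma=None):
    """Complex Gabor kernel with the given spatial frequency and orientation."""
    sigma = sigma if sigma is not None else 0.56 / freq   # bandwidth heuristic
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    rot = x * np.cos(theta) + y * np.sin(theta)
    envelope = np.exp(-(x**2 + y**2) / (2 * sigma**2))
    return envelope * np.exp(2j * np.pi * freq * rot)

def jet_at(image, px, py, n_freq=5, n_orient=8, size=31):
    """40 complex Gabor coefficients (5 frequencies x 8 orientations) at (px, py)."""
    half = size // 2
    patch = image[py - half:py + half + 1, px - half:px + half + 1]
    coeffs = []
    for k in range(n_freq):
        freq = 0.25 / (np.sqrt(2) ** k)          # assumed frequency spacing
        for o in range(n_orient):
            kern = gabor_kernel(freq, o * np.pi / n_orient, size=size)
            coeffs.append(np.sum(patch * kern))
    return np.array(coeffs)

def jet_similarity(j1, j2):
    """Magnitude-based similarity between two jets."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))
```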
A labeled graph G representing a face consists of N nodes connected by E edges. The nodes are located at feature points called fiducial points; for example, the pupils, the corners of the mouth, and the tip of the nose are all fiducial points. The nodes are labeled with jets $J_n$. Graphs for different head poses differ in geometry and local features; to be able to compare graphs of different poses, they manually define pointers to associate corresponding nodes in the different graphs.
In order to extract graphs automatically for a new face, they need a general representation of faces. This representation should cover a wide range of possible variations in the appearance of faces. The representative set has a stack-like structure, called the face bunch graph (FBG) (see figure 11).
Figure 11: The face bunch graph serves as a general representation of faces. Each stack of discs represents a jet; one can choose the best-matching jet from the bunch of jets attached to a single node.
A set of jets referring to one fiducial point is called a bunch. An eye bunch, for instance, may include jets from closed, open, female, and male eyes, etc., to cover the possible variation. The face bunch graph is given the same structure as an individual graph. In searching for fiducial points in a new face image, the matching procedure selects the best-fitting jet from the bunch dedicated to each fiducial point.
The first set of graphs is generated manually. Initially, when the FBG contains only a few faces, it is necessary to check the matching results by hand; once the FBG is rich enough (approximately 70 graphs), the matching results become reliable.
Matching an FBG to a new image is done by maximizing the graph similarity between the image graph and the FBG of the same pose. For an image graph G with nodes n = 1, ..., N and edges e = 1, ..., E, and an FBG B with model graphs m = 1, ..., M, the similarity is defined as
$$S(G, B) = \frac{1}{N}\sum_{n} \max_{m} S\!\left(J_n^{I}, J_n^{B_m}\right) \;-\; \frac{\lambda}{E}\sum_{e} \frac{\left(\vec{x}_e^{\,I} - \vec{x}_e^{\,B}\right)^2}{\left(\vec{x}_e^{\,B}\right)^2}$$

where $J_n^{I}$ are the jets of the image graph, $J_n^{B_m}$ the jets of model graph $m$ in the bunch graph, $\vec{x}_e$ the edge (distance) vectors, and $\lambda$ the relative weight of the geometry term.
Since the FBG provides several jets for each fiducial point, the best one is selected and used for the comparison; these best-fitting jets serve as local experts for the new image.
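Written out in code, the similarity above takes only a few lines. The sketch below assumes a particular layout for the graphs (lists of jets plus arrays of edge vectors), which is my own choice rather than the data structures of [26].

```python
import numpy as np

def jet_similarity(j1, j2):
    """Normalized dot product of jet magnitudes (as in the earlier sketch)."""
    a1, a2 = np.abs(j1), np.abs(j2)
    return float(a1 @ a2 / (np.linalg.norm(a1) * np.linalg.norm(a2)))

def graph_similarity(image_jets, image_edges, fbg_bunches, fbg_edges, lam=0.5):
    """Similarity between an image graph and a face bunch graph.

    image_jets : list of N jets (complex arrays) of the image graph nodes
    image_edges: (E, 2) array of edge vectors between node positions
    fbg_bunches: list of N bunches; each bunch is a list of model jets
    fbg_edges  : (E, 2) array of average edge vectors of the FBG
    lam        : weight of the geometry (edge) term
    """
    jet_term = np.mean([
        max(jet_similarity(j_img, j_model) for j_model in bunch)
        for j_img, bunch in zip(image_jets, fbg_bunches)
    ])
    geom = np.sum((image_edges - fbg_edges) ** 2, axis=1)
    norm = np.sum(fbg_edges ** 2, axis=1)
    return jet_term - lam * np.mean(geom / norm)
```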
They use the FERET database to test their system. First, the size and location of the face are determined and the face image is normalized in size; in this step, several FBGs of different sizes are required, and the best-fitting one is used for size estimation. In the FERET database, each image has a label indicating the pose, so there is no need to estimate the pose, but the pose could be estimated automatically in a similar way as the size.
After extracting model graphs from the gallery images, recognition is performed by comparing an image graph to all model graphs and selecting the one with the highest similarity value. A comparison against a gallery of 250 individuals takes less than one second.
The poses used here are: neutral frontal view (fa), frontal view with different
expression (fb), half-profile right (hr) or left (hl), and profile right (pr) and left (pl).
Recognition results are shown in Table 6.
The recognition rate is high for frontal against frontal images (first row); this is due to the fact that two frontal views show only little variation. The recognition rate is still high for right profile against left profile (third row). When comparing left and right half-profiles, the recognition rate drops dramatically (second row); the probable reason is variation in rotation angle, since visual inspection shows that the rotation angle may differ by up to 30°. When comparing frontal views or profiles against half-profiles, a further reduction in recognition rate is observed.
From the experimental results, it is clear that Gabor wavelet coefficients are not invariant under rotation; one still needs to estimate the pose and find the corresponding FBG before performing recognition.
Table 6: Recognition results for cross-runs between different galleries. The combinations in the four bottom rows are due to the fact that not all poses were available for every person. The table also shows how often the correct face is identified as rank one and how often it is among the top ten.
3.8 Face recognition from 2D and 3D images
Very few research groups focus on face recognition from both 2D and 3D face images; almost all existing systems rely on a single type of face information, either 2D images or 3D range data. 3D range data can compensate for the lack of depth information in a 2D image, and 3D shape is also invariant under variation of illumination. Therefore, integrating 2D and 3D information is a promising way to improve recognition performance.
In [29], corresponding 3D range data and a 2D image are acquired at the same time and used to perform face recognition. Their approach consists of the following steps.
Find feature points: In order to extract both 2D and 3D features, two types of feature points are defined. One type is in 3D and is described by point signatures; the other type is in 2D and is represented by Gabor wavelet responses. Four 3D feature points and ten 2D feature points are selected, as shown in figure 12.
Figure 12: 2D feature points are shown by "." and 3D feature points are shown by "x".
The Gabor wavelet function has tunable orientation and frequency; therefore, it can be configured to extract a specific band of frequency components from an image, which makes it robust against translation, distortion, and scaling. Here, they use a widely used version of the Gabor wavelets [26]. For each feature point, there is a set of Gabor wavelets consisting of 5 different frequencies and 8 orientations. Localizing a 2D feature point is accomplished with the following steps:
1. For each pixel k in the search range, calculate the corresponding jet Jk.
2. Compute the similarity between Jk and all the jets in the corresponding bunch.
3. In the search area, the pixel with the highest similarity value is chosen as the location of the feature point.
In this paper, the point signature is chosen to describe feature points in 3D; its main idea is summarized here. For a given point, we place a sphere centered at this point. The intersection between the sphere and the face surface is a 3D curve C. A plane P is defined as the tangent plane at the given point. The projection of C onto P forms a new curve C'. The distance between C and C', as a function of the angle θ around the point, is called the point signature. The point signature is invariant under rotation and translation. The definition of point signatures is illustrated in figure 13.
Figure 13: The definition of the point signature.
Based on principal component analysis, four eigenspaces are constructed for the four 3D feature points, so there is a set of eigen point signatures for each 3D feature point. Localizing a 3D feature point follows steps similar to those for localizing a 2D feature point:
1. For each point in the search range, calculate the corresponding point signature.
2. Approximate the point signature by a linear combination of the eigen point signatures.
3. Compute the error between the real point signature and the approximated point signature.
4. In the search area, the point with the smallest error is chosen as the location of the feature point.
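Steps 2–4 amount to scoring each candidate by its reconstruction error in the eigen-point-signature subspace. A minimal sketch is given below; the data layout (candidate points, precomputed signatures, mean and eigenvectors per feature point) is assumed for illustration and is not taken from [29].

```python
import numpy as np

def reconstruction_error(signature, mean_sig, eigen_sigs):
    """Error between a point signature and its approximation in the
    eigen-point-signature subspace (steps 2-3 above)."""
    centered = signature - mean_sig
    coeffs = eigen_sigs @ centered
    return float(np.linalg.norm(centered - eigen_sigs.T @ coeffs))

def localize_3d_feature(candidates, signatures, mean_sig, eigen_sigs):
    """Step 4: pick the candidate point whose signature is best explained
    by the eigenspace of that feature point.

    candidates : (M, 3) candidate 3D points in the search range
    signatures : (M, 36) their point signatures (sampled every 10 degrees)
    """
    errors = [reconstruction_error(s, mean_sig, eigen_sigs) for s in signatures]
    return candidates[int(np.argmin(errors))]
```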
Face recognition: Each face is expressed in terms of a shape vector and a texture vector. The shape vector $X_s \in \mathbb{R}^{36\times 9\times 4}$ consists of the point signatures of the four 3D feature points and their eight neighboring points, each point signature being sampled at equal intervals of 10°. The texture vector $X_t \in \mathbb{R}^{40\times 10}$ consists of the Gabor wavelet coefficients of the ten 2D feature points. Different weights are assigned to different feature points based on their detection accuracy.
With the shape vector and texture vector of each training face, a texture subspace and a shape subspace can be constructed. For a new face image, the shape vector and texture vector are projected onto the shape subspace and the texture subspace respectively, giving the projection coefficient vectors $\omega_s$ and $\omega_t$. The feature vector $\omega_{st}$ for this face is calculated from $\omega_s$ and $\omega_t$ as

$$\omega_{st} = \left( \frac{\omega_s^{T}}{\lambda_s},\; \frac{\omega_t^{T}}{\lambda_t} \right)^{T}$$

where $\lambda_s$ and $\lambda_t$ are weighting factors for the shape and texture components.
With this feature vector, the remaining task is simply choosing a classifier with a low error rate. In this paper, two classifiers are used and compared: one is based on the maximal similarity value, and the other is based on a support vector machine (SVM).
Experiment
Their experiment was carried out using face images of 50 individuals. Each person provides six facial images with viewpoint and expression variation. Some of these images are chosen as training images and the rest are used as test images, so the training set and the testing set are disjoint.
For a new test image, after localizing the feature points, a 36×9×4-dimensional shape vector and a 40×10-dimensional texture vector are calculated. These two vectors are projected into their corresponding subspaces, and the projection coefficients are combined to form an integrated feature vector.
In order to evaluate recognition performance, two test cases are performed with different features, different numbers of principal components, and different classifiers.
Case 1: Comparison of recognition performance with different features: point signature (PS), Gabor coefficients (GC), and PS+GC. Figure 14(a) shows the recognition rate versus subspace dimension for the different chosen features. The results confirm their assumption that combining 2D and 3D information can improve recognition performance.
Case 2: Comparison of recognition performance with different classifiers: a similarity function and a support vector machine. Figure 14(b), (c), and (d) show the recognition rate versus subspace dimension for the different classifiers; the result in (b) is obtained using the point signature as the feature, (c) using the Gabor coefficients, and (d) using PS+GC. With the SVM as the classifier, a higher recognition rate is obtained in all three cases.
Figure 14: Recognition rate vs. subspace dimension: (a) different chosen features with the same classifier; (b) chosen feature: PS; (c) chosen feature: GC; (d) chosen feature: PS+GC.
3.9 Pose estimation from a single image
Generally, a face recognition problem can be divided into two major parts: normalization and recognition. In normalization, we need to estimate the size, illumination, expression, and pose of the face from the given image and then transform the input image into a normalized format that the recognition algorithm can handle. Therefore, estimating pose accurately and efficiently is an important problem in face recognition, and solving it is a key step in building a robust face recognition system.
Ji [10] proposes a new approach for estimating the 3D pose of a face from a single image. He assumes that the shape of the face can be approximated by an ellipse; the pose of the face can then be expressed in terms of the yaw, pitch, and roll angles of the ellipse (see figures 15 and 16). His system consists of three major parts: pupil detection, face detection, and pose estimation.
Figure 15: face shape can be approximated by an ellipse
Figure 16: pose of face can be expressed in terms of yaw, pitch and roll angle.
In the following sections, I will discuss how his system works.
Pupil detection: In order to find the pupils easily, an IR light source is used in his system. Under IR illumination, the pupils appear much brighter than the other parts of the face; therefore, pupil detection can be done easily by setting a threshold. This is a robust operation and can be done in real time.
Face detection: Face detection is somewhat more difficult because the face may have a significantly different shape under different poses. In his system, face detection consists of the following steps: 1) find the approximate location of the face based on the locations of the pupils; 2) perform a constrained sum-of-gradients maximization procedure to search for the exact face location.
In the second step, he applies a technique proposed by Birchfield [30]. Birchfield's technique is aimed at detecting the head rather than the face; to detect the face, he puts constraints on the gradient maximization procedure. The constraints he exploits are size, location, and symmetry. Specifically, he draws a line between the detected pupils, as shown in figure 17. The distance between the pupils and their locations are used to constrain the size and location of the ellipse, which should minimally and symmetrically include the two detected pupils. Results of face detection are shown in figure 18.
Figure 17: face ellipse parameterization.
Figure 18: results of face detection at different pose.
Pose estimation: With the pupil locations, the roll angle can be easily calculated from the following equation:

$$\phi = \arctan\frac{p_{1y} - p_{2y}}{p_{1x} - p_{2x}}$$
where $p_1$ and $p_2$ are the locations of the pupils. The basic idea for estimating the yaw and pitch angles is to determine the correspondence between the 2D ellipse in the image and the 3D ellipse modeling the face. The details are not discussed in this report, since the derivation would take too much space.
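The roll computation, together with the IR-based pupil detection described above, is simple enough to sketch directly. The pupil detector below is a rough stand-in (threshold the brightest pixels and split them into a left and a right blob), not Ji's actual detector, and `np.arctan2` is used instead of a plain arctangent for numerical robustness.

```python
import numpy as np

def detect_pupils(ir_image, threshold=0.9):
    """Rough pupil detection in an IR image: threshold the brightest pixels
    and take the centroids of the left and right groups.  Assumes exactly
    two bright blobs (the pupils) dominate the thresholded image."""
    bright = ir_image > threshold * ir_image.max()
    ys, xs = np.nonzero(bright)
    mid = np.median(xs)
    left = np.array([xs[xs < mid].mean(), ys[xs < mid].mean()])
    right = np.array([xs[xs >= mid].mean(), ys[xs >= mid].mean()])
    return left, right

def roll_angle(p1, p2):
    """Roll angle of the face (degrees) from the two pupil positions (x, y)."""
    return float(np.degrees(np.arctan2(p1[1] - p2[1], p1[0] - p2[0])))
```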
In order to evaluate his pose estimation algorithm, synthetic ellipses are first used to verify the accuracy of the estimation; the results are shown in figure 19. Images of actual ellipses with different pitches and yaws are then used to further investigate the performance of his algorithm. The computed rotation angle is shown below each image in figure 20; the estimation results are consistent with the perceived orientations.
Figure 20: the estimated orientation of actual ellipses. The number below each image is
the estimated pitch or yaw angle.
Figure 21: pose estimation of face image sequence.
The ultimate test is to study its performance on human faces. For this purpose, image sequences of faces with different pitches and yaws are captured; ellipse detection is performed on each image, and the detected ellipse is used to estimate the pose. Experimental results are shown in figure 21.
Using his algorithm, we can find the location of the face, track it, and estimate its pose. In other words, his algorithm provides almost everything we need to build a face recognition system. Better still, we can recognize faces not only from single images but also from video.
Once the pose has been estimated, we can simply apply hybrid-mode face recognition, in which multiple images are available for each person: the closest pose is found in the database using the estimated pose, and then the eigenface approach is used to perform recognition. As mentioned before, there are two major drawbacks: 1) many images are needed to cover the possible views, and 2) illumination variation is not handled. Nevertheless, this type of face recognition technique is probably the most practical method so far.
4 Discussion and Conclusion
In this report, a variety of face recognition technologies have been studied. Basically, they can be divided into two major groups. The first group: since the eigenface method is an efficient way to perform face recognition under fixed illumination and pose, most recent work focuses on how to compensate for the variation in pose and illumination; for example, SFS, the illumination cone, and the linear object class method go in this direction. The second group: instead of using eigenfaces, some other methods try to avoid the illumination and pose problems at the start of their algorithm; for example, the curvature-based method and face bunch graph matching belong to this category.
Below I give a concise summary and conclusions, in the same order as the methods appear in this report.

• SFS has great potential to be applied in real-world applications, since it requires only one input image and one registered image per person. However, the SFS problem is still hard to solve: the current method has high computational complexity and the re-rendered image has poor quality, so there are many issues to be resolved before it can be applied to commercial or law enforcement applications.

• The illumination cone method shows improved recognition performance under variations of pose and illumination. One problem is that it requires many images per person to construct the subspace, which is not feasible for commercial applications because of the cost. The other issue is the computational cost when the database contains a large number of images: if recognition takes too long on a large database, further development is discouraged.

• The linear object class method is inspired by the way humans look at the world and provides a new angle on the face recognition problem. In [25], they successfully transform an image from half-profile to frontal view given 49 examples. One possible direction for future work is to cover more variation in pose or illumination by extending their method.

• The view-based eigenspace method constructs a subspace for each pose. Their experiments show that performance degrades if the input image has a pose different from those in the database; in other words, many subspaces must be constructed in order to achieve a high recognition rate under arbitrary pose. This makes the method impractical for a fully pose-invariant face recognition system, but for recognizing faces from mug shots, in which only three poses exist, it is probably the most feasible solution so far.

• Curvature-based face recognition does not suffer from illumination variation, since it recognizes faces from range data. However, the cost of implementing such a system is higher than that of systems which use a camera to acquire data, and, as the author notes, the curvature calculation is sensitive to noise. In short, this method has high computational complexity and high cost; even though it seems a good idea for handling the pose and illumination problems, not many people have tried this method in the past few years.

• The 3D model-based method performs face recognition from the 3D face surface without using curvature calculations. The face surface is constructed by projecting a pattern onto the face, so the additional cost for this system is only the projector and the pattern slide; it provides an example of a low-cost face recognition system using 3D information. One issue is that the surface reconstruction algorithm needs manual adjustment to achieve a better recognition rate; a fully automatic reconstruction algorithm is possible future work in this direction.

• Face bunch graph matching uses the Gabor wavelet transform to extract face features and then collects the variations of the feature points to form the face bunch graph. The primary advantage of this method is that the features extracted by the Gabor wavelets are invariant to a certain degree, so it is an efficient way to design a robust face recognition system.

• Most existing face recognition algorithms are based on either 2D images or 3D range data; combining 2D and 3D information can further improve recognition performance. Here, the authors use point signatures to extract 3D features and the Gabor wavelet transform to extract 2D features; the feature vector is defined as a weighted combination of the point signatures and the Gabor coefficients. With this feature vector, recognition can be performed with any classifier that provides a sufficiently high recognition rate.

• Recognizing faces from video is probably the most difficult problem: one should be able to track the location of the face, estimate its pose, and then recognize the face. The method in [10] provides the infrastructure for recognizing faces from video, since face tracking and pose estimation can be done with it; it could be expanded into a face recognition system by simply adding the view-based eigenspace algorithm.
5 References
[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, vol. 83, pp. 705-740, 1995.
[3] W. Y. Zhao and R. Chellappa, "3D model enhanced face recognition," in Proc. International Conference on Image Processing (ICIP), vol. 3, pp. 50-53, 2000.
[4] W. Y. Zhao and R. Chellappa, "SFS based view synthesis for robust face recognition," in Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 285-292, 2000.
[5] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET evaluation methodology for face-recognition algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 10, pp. 1090-1104, Oct. 2000.
[6] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and machine recognition of faces: a survey," Proceedings of the IEEE, vol. 83, no. 5, pp. 705-741, May 1995.
[7] G. Gordon, "Face recognition from depth maps and surface curvature," in Proc. SPIE, Geometric Methods in Computer Vision, vol. 1570, San Diego, July 1991.
[8] C. Beumier and M. Acheroy, "Automatic 3D face authentication," Image and Vision Computing, vol. 18, no. 4, pp. 315-321, Feb. 1999.
[9] A. Md. Haider and T. Kaneko, "Automated 3D-2D projective registration of human facial images using edge features," accepted for publication in the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI).
[10] Q. Ji, "3D face pose estimation and tracking from a monocular camera," Image and Vision Computing, vol. 20, no. 7, pp. 499-511, 2002.
[11] S. Sakamoto, R. Ishiyama, and J. Tajima, "3D model-based face recognition system with robustness against illumination changes," NEC Research and Development, vol. 43, pp. 15-19, 2002.
[12] P. N. Belhumeur and D. Kriegman, "What is the set of images of an object under all possible illumination conditions?," International Journal of Computer Vision, vol. 28, no. 3, pp. 245-260, 1998.
[13] W. Zhao, "Robust Image Based 3D Face Recognition," PhD thesis, University of Maryland, 1999.
[14] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From few to many: illumination cone models for face recognition under variable lighting and pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 23, no. 6, pp. 643-660, June 2001.
[15] P. N. Belhumeur, D. J. Kriegman, and A. L. Yuille, "The bas-relief ambiguity," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1060-1066, 1997.
[18] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: recognition using class specific linear projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 711-720, July 1997.
[19] Y. Adini, Y. Moses, and S. Ullman, "Face recognition: the problem of compensating for changes in illumination direction," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 721-732, July 1997.
[20] D. W. Jacobs, P. N. Belhumeur, and R. Basri, "Comparing images under variable illumination," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 610-617, 1998.
[21] P. N. Belhumeur and D. J. Kriegman, "What is the set of images of an object under all possible lighting conditions?," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 270-277, 1996.
[22] P. W. Hallinan, "A low-dimensional representation of human faces for arbitrary lighting conditions," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 995-999, 1994.
[23] J. J. Atick, P. A. Griffin, and A. N. Redlich, "Statistical approach to shape from shading: reconstruction of 3D face surfaces from single 2D images," Neural Computation, vol. 8, pp. 1321-1340, 1996.
[24] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "Illumination-based image synthesis: creating novel images of human faces under differing pose and lighting," in Proc. IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes (MVIEW '99), pp. 47-54, 1999.
[25] T. Vetter and T. Poggio, "Linear object classes and image synthesis from a single example image," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 733-742, July 1997.
[26] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face recognition by elastic bunch graph matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 7, pp. 775-779, July 1997.
[27] A. Pentland, B. Moghaddam, and T. Starner, "View-based and modular eigenspaces for face recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 84-91, 1994.
[28] M. Turk and A. Pentland, "Eigenfaces for recognition," Journal of Cognitive Neuroscience, vol. 3, no. 1, pp. 71-86, Winter 1991.
[29] Y. Wang, C.-S. Chua, Y.-K. Ho, and Y. Ren, "Integrated 2D and 3D images for face recognition," in Proc. 11th International Conference on Image Analysis and Processing, pp. 48-53, 2001.
[30] S. Birchfield, "Elliptical head tracking using intensity gradients and color histograms," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 232-237, 1998.
[38] B. S. Manjunath, R. Chellappa, and C. von der Malsburg, "A feature based approach to face recognition," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 373-378, 1992.
[39] R. Alferez and Y.-F. Wang, "Geometric and illumination invariants for object recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 21, no. 6, pp. 505-536, June 1999.
[40] T. Akimoto, Y. Suenaga, and R. S. Wallace, "Automatic creation of 3D facial models," IEEE Computer Graphics and Applications, vol. 13, no. 5, pp. 16-22, Sept. 1993.
[41] W. T. Freeman and J. B. Tenenbaum, "Learning bilinear models for two-factor problems in vision," in Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 554-560, 1997.
[42] S. K. Nayar, K. Ikeuchi, and T. Kanade, "Surface reflection: physical and geometrical perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 13, no. 7, pp. 611-634, July 1991.
[43] B. K. P. Horn and M. J. Brooks, Shape from Shading. Cambridge, MA: MIT Press, 1989.