ECE 738 Course Project
Face Recognition: A Literature Survey
Lin, Wei-Yang
9022090964
May 16, 2003

1 Introduction

Face Recognition Technology (FRT) is a research area spanning several disciplines such as image processing, pattern recognition, computer vision and neural networks. There are many applications of FRT, as shown in Table 1, ranging from matching of photographs to real-time matching of surveillance video. Depending on the specific application, FRT faces different levels of difficulty and requires a wide range of techniques. In 1995, a review paper by Chellappa et al. [1] gave a thorough survey of FRT at that time. During the past few years, FRT has continued to evolve rapidly. In this report, I provide a brief review of the most recent developments in FRT.

Table 1: Applications of face recognition technology [1].

2 The Challenges in FRT

Though many face recognition techniques have been proposed, robust face recognition is still difficult. The recent FERET test [5] revealed that there are at least two major challenges: 1) the illumination variation problem, and 2) the pose variation problem. Either one or both can cause serious performance degradation in most existing systems. Unfortunately, these problems arise in many real-world applications, such as surveillance video. In the following, I discuss some existing solutions for these problems.

The general face recognition problem can be formulated as follows: given a single image or a sequence of images, recognize the person in the image using a database. Solving the problem consists of the following steps: 1) face detection, 2) face normalization, and 3) database query.

2.1 The illumination problem

Images of the same face appear different under changes in lighting. If the change induced by illumination is larger than the difference between individuals, a system will not be able to recognize the input image correctly. To handle the illumination problem, researchers have proposed various methods. It has been suggested that one can reduce illumination-induced variation by discarding the leading eigenfaces, and it is verified in [18] that discarding the first few eigenfaces works reasonably well; however, this degrades system performance for input images taken under frontal illumination. In [19], different image representations and distance measures are evaluated; one important conclusion of that paper is that none of these methods is by itself sufficient to overcome illumination variations. More recently, a new image comparison method was proposed by Jacobs et al. [20]. However, this measure is not strictly illumination-invariant, because it changes for a pair of images of the same object when the illumination changes. An illumination subspace for a person has been constructed in [21, 22] for a fixed viewpoint, so that under a fixed viewpoint the recognition result can be illumination-invariant. One drawback of this method is that many images per person are needed to construct the basis images of the illumination subspace. In [23], the authors suggest using Principal Component Analysis (PCA) to solve a parametric shape-from-shading (SFS) problem. Their idea is quite simple: reconstruct the 3D face surface from a single image using computer vision techniques, then compute the frontal-view image under frontal illumination. Very good results are demonstrated, and I explain their approach in detail later. In practice, however, there are many issues in how to reconstruct a 3D surface from a single image.
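To make the eigenface-discarding idea discussed above concrete, the following is a minimal sketch of eigenface recognition in which the leading components are dropped. It is not the exact procedure of [18]; the function names and the choice of 20 retained and 3 discarded components are illustrative assumptions.

```python
import numpy as np

def eigenface_model(train_images, n_components=20, n_discard=3):
    """Build an eigenface subspace, optionally discarding the leading
    eigenfaces, which tend to capture illumination variation.
    train_images: (n_images, n_pixels) array, one flattened face per row."""
    mean_face = train_images.mean(axis=0)
    centered = train_images - mean_face
    # PCA via SVD; the rows of vt are eigenfaces sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    eigenfaces = vt[n_discard:n_discard + n_components]   # drop the first few
    return mean_face, eigenfaces

def project(image, mean_face, eigenfaces):
    """Project a flattened face image onto the (truncated) eigenface basis."""
    return eigenfaces @ (image - mean_face)

def recognize(probe, gallery_codes, mean_face, eigenfaces):
    """Return the index of the gallery face whose projection is nearest to the probe."""
    code = project(probe, mean_face, eigenfaces)
    dists = np.linalg.norm(gallery_codes - code, axis=1)
    return int(np.argmin(dists))
```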
I will discuss two important illumination-invariant face recognition techniques in the following sections.

2.2 The pose problem

System performance drops significantly when pose variations are present in the input images. Existing solutions can be divided into three types: 1) methods that require multiple images per person in both the training stage and the recognition stage, 2) methods that use multiple images per person in the training stage but only one database image per person in the recognition stage, and 3) single-image-based methods. The second type is the most popular one.

Multiple-image approaches. An illumination-based image synthesis method [24] has been proposed for handling both the pose and illumination problems. This method uses the illumination cone to deal with illumination variation. For variations due to rotation, it needs to completely resolve the generalized bas-relief (GBR) ambiguity when reconstructing the 3D surface.

Hybrid approaches. Many algorithms of this type have been proposed; it is probably the most practical solution so far. Three representative methods are reviewed in this report: 1) the linear-class-based method [25], 2) the graph-matching-based method [26], and 3) the view-based eigenface method [27]. The image synthesis method in [25] is based on the assumption of linear 3D object classes and the extension of this linearity to images. In [26], a robust face recognition scheme based on elastic bunch graph matching (EBGM) is proposed. The authors demonstrate substantial improvement in face recognition under rotation, and their method is fully automatic, including face localization, landmark detection and the graph matching scheme. The drawback of this method is the requirement of accurate landmark localization, which is not easy when illumination variations are present. The popular eigenface approach [28] has been modified to achieve pose invariance [27]; this method constructs eigenfaces for each pose. More recently, a general framework called the bilinear model has been proposed [41]. The methods in this category have some common drawbacks: 1) they need many images per person to cover the possible poses, and 2) the illumination problem is treated separately from the pose problem.

Single-image-based approaches. Gabor-wavelet-based feature extraction has been proposed for face recognition [38] and is robust to small-angle rotation. There are many papers on invariant features in the computer vision literature, but little work on applying such features to face recognition; recent work in [39] sheds some light in this direction. For synthesizing face images under different lighting or expression, 3D facial models have been explored in [40], but their complexity and computational cost make them hard to apply to face recognition.

3 The State of the Art

In the following sections, I discuss some recent research works in face recognition.

3.1 Applying Shape-from-Shading (SFS) to Face Recognition

The basic idea of SFS is to infer the 3D surface of an object from the shading information in an image. In order to infer such information, we need to assume a reflectance model under which the given image is generated from the 3D object. Many illumination models are available; among them, the Lambertian model is the most popular and has been used extensively in the computer vision community for the SFS problem [42]. The SFS problem is ill-posed in general, so without further assumptions the reconstructed 3D surface cannot be used to synthesize images under different lighting angles.
Fortunately, theoretical advances make the SFS problem well-posed under certain conditions. The key equation in SFS is the image irradiance equation [43]:

$$I[x, y] = R(p[x, y], q[x, y])$$

where $I[x, y]$ is the image, $R$ is the reflectance map, and $p[x, y]$, $q[x, y]$ are the shape gradients (partial derivatives of the depth map). With the assumption of a Lambertian surface and a single, distant light source with gradients $(P_s, Q_s)$, the equation can be written as

$$I = \rho \cos\theta_i = \rho\,\frac{1 + p P_s + q Q_s}{\sqrt{1 + p^2 + q^2}\,\sqrt{1 + P_s^2 + Q_s^2}}$$

where $\rho$ is the albedo and $\theta_i$ is the angle between the surface normal and the light source direction.

Since an SFS algorithm provides face shape information, the illumination and pose problems can be solved simultaneously. For example, we can solve the illumination problem by rendering a prototype image $I_p$ from a given input image $I$. This can be done in two steps: 1) apply an SFS algorithm to obtain $(p, q)$, and 2) generate the prototype image $I_p$ under lighting angle $\theta = 0$.

To evaluate some existing SFS algorithms, Zhao [13] applies several SFS algorithms to 1) synthetic face images generated with a Lambertian model and constant albedo, and 2) real face images. The experimental results show that these algorithms are not good enough on real face images to yield a significant improvement in face recognition. The reason is that a face is composed of materials with different reflecting properties (cheek skin, lip skin, eyes, etc.), so a Lambertian model with constant albedo cannot provide a good approximation. Zhao et al. [3] develop a symmetric SFS algorithm using a Lambertian model with varying albedo $\rho(x, y)$ as a better alternative. With the aid of a generic 3D head model, they shorten the two-step procedure for obtaining the prototype image (1. input image to shape via SFS, 2. shape to prototype image) to one step: input image to prototype image directly. Their algorithm is applied to more than 150 face images from the Yale University and Weizmann databases. The results clearly indicate the superior quality of the prototype images rendered by their method. They also conduct three experiments to evaluate the influence on recognition performance when their algorithm is combined with existing FRT. The first experiment demonstrates the improvement in recognition performance obtained by using the new illumination-invariant measure they define; the results are shown in Tables 2 and 3. The second experiment shows that using the rendered prototype images instead of the original input images can significantly improve existing FRT such as PCA and LDA; the results are shown in Table 4, where P denotes the use of prototype images. Finally, in the third experiment they demonstrate that the recognition rate of subspace LDA can be improved.

Table 2: Recognition performance using three different measures (one image per person).

    Database   Image Measure   Gradient Measure   Illumination-Invariant Measure
    Yale       68.3%           78.3%              83.3%
    Weizmann   86.5%           97.9%              81.3%

Table 3: Recognition performance using three different measures (two images per person).

    Database   Image Measure   Gradient Measure   Illumination-Invariant Measure
    Yale       78.3%           88.3%              90.0%
    Weizmann   72.9%           96.9%              87.9%

Table 4: Recognition performance with/without using prototype images (P denotes prototype image).

    Database   PCA     LDA     P-PCA   P-LDA
    Yale       71.7%   88.3%   90.0%   95.0%
    Weizmann   97.9%   100%    95.8%   98.9%

3.2 Applying the Illumination Cone to Face Recognition

In earlier work, it is shown that the images of an object under arbitrary combinations of light sources form a convex cone in image space. This cone, called the illumination cone, can be constructed from as few as three images.
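As a concrete illustration of the Lambertian model underlying both the SFS rendering above and the illumination cone, here is a minimal numerical sketch; the array shapes and helper names are illustrative assumptions rather than code from the cited papers.

```python
import numpy as np

def lambertian_image(albedo, normals, light):
    """Render a Lambertian image I = max(0, rho * <n, s>) pixel by pixel.
    albedo:  (H, W) array of albedo values rho(x, y)
    normals: (H, W, 3) array of unit surface normals
    light:   (3,) unit vector pointing toward the (distant) light source"""
    shading = normals @ light                  # cosine of the incidence angle
    return albedo * np.maximum(shading, 0.0)   # attached shadows clipped to zero

def cone_sample(basis_images, weights):
    """Any non-negative combination of images taken under different light sources
    is again a valid image of the scene: this is the illumination cone idea.
    basis_images: (K, H, W) images under K light directions
    weights:      (K,) non-negative coefficients"""
    assert np.all(weights >= 0)
    return np.tensordot(weights, basis_images, axes=1)
```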
Figure 1 illustrates the process of constructing the illumination cone. Figure 1a shows seven original images with different illumination used to estimate the illumination cone. Figure 1b shows the basis images of the illumination cone; they can be used to generate images under arbitrary illumination conditions. Figure 1c shows images synthesized from the illumination cone of one face.

The reconstructed 3D face surface and the illumination cone can be combined to synthesize images under different illumination and pose. In [14], Georghiades et al. use prior knowledge about the shape of the face to resolve the generalized bas-relief (GBR) [15] ambiguity. Once the GBR parameters are calculated, it is a simple matter to render synthetic images under different illumination and pose. Figure 2 shows the reconstructed face surfaces, and Figure 3 shows synthetic images of a face under different pose and illumination. Note that these images are generated from the seven training images in Figure 1a, in which the pose is fixed and there is only small variation in illumination; in contrast, the synthetic images exhibit large variation in both pose and illumination.

They performed two sets of recognition experiments. The first experiment, where only illumination varies while pose remains fixed, was designed to compare other recognition algorithms to the illumination cone method. There are a total of 450 images (45 illumination conditions × 10 faces). These images are divided into four groups (12°, 25°, 50° and 77°) according to the angle between the light source and the camera axis. Table 5 shows the results. "Cones-attached" means that the illumination cone was constructed without cast shadows, and "cones-cast" means that the reconstructed face surface was used to determine cast shadows. Notice that the cone subspace approximation has the same performance as the original illumination cone.

Figure 1: The process of constructing the illumination cone: (a) original images, (b) basis images, (c) synthesized images [14].

Figure 2: The reconstructed face surfaces of 10 persons [14].

Figure 3: Synthesized images under different pose and illumination.

Table 5: Error rate under different illumination while pose is fixed.

Figure 4: Error rate under different pose and illumination.

In the second experiment, they evaluate the recognition performance under variation in both pose and illumination. There are a total of 4,050 images (9 poses × 45 illumination conditions × 10 faces). Figure 4 shows the results. Their algorithm has a very low error rate for all poses except under the most extreme lighting conditions. We can draw the following conclusions from their experimental results: 1) pose/illumination-invariant recognition can be achieved using a small number of images with fixed pose and slightly different illumination, and 2) the images of a face under variable illumination can be well approximated by a low-dimensional subspace.

3.3 Linear Object Classes Method

Consider the problem of recognizing a face under a different pose or expression when only one picture is given. The human visual system is certainly able to perform this task, possibly because we exploit prior information about how face images transform. Thus, the idea here is to learn the image transformation from examples and then apply it to a new face image in order to synthesize a virtual view that can be used in an existing face recognition system. Poggio and Vetter [25] introduce a technique for generating artificial new images of an object. Their work is based on the idea of linear object classes.
These are 3D objects whose 3D shape can be represented as a linear combination of a small number of prototype objects. Thus, if the example set consists of frontal and rotated view images, we can synthesize a rotated view from a given input image.

For human-made objects, which often consist of cuboids, cylinders, or other geometric primitives, the assumption of linear object classes seems natural. For faces, however, it is not clear how many examples are sufficient. They test their approach on a set of 50 faces, each given in two orientations (22.5° and 0°). In their experiment, one face is chosen as the test face and the other 49 faces are used as examples.

Figure 5: Each test face is rotated by using 49 faces as examples (not shown) and the result is marked as output. Only for comparison, the true rotated test face is shown on the lower left (this face was not used in the computation) [25].

In Figure 5, each test face is shown on the upper left and the synthesized image is shown on the lower right. The true rotated test face is shown on the lower left. In the upper right, they also show the reconstruction of the test face from the 49 examples in the test orientation; this reconstruction should be understood as the projection of the test face into the subspace spanned by the other 49 examples. The results are not perfect, but considering the small size of the example set, the reconstruction is quite good. In general, the similarity of the reconstruction to the input test face allows us to speculate that an example set of a few hundred faces may be sufficient to construct a huge variety of different faces. We can conclude that the linear object class approach may be a satisfactory approximation even for objects as complex as faces. Therefore, given only a single face image, we are able to generate additional synthetic face images under different view angles. For the face recognition task, these synthetic images can be used to handle pose variation. Moreover, this approach does not need any depth information, so the difficult step of generating 3D models can be avoided.

3.4 View-Based Eigenspace

The eigenface technique of Turk and Pentland [28] was generalized to the view-based eigenspace technique for handling pose variation [27]. This extension accounts for variation in pose and leads to a more robust recognition system.

They formulate the problem of face recognition under different poses as follows: given N individuals under M different poses, one can build a "view-based" set of M separate eigenspaces, each capturing the variation of the N individuals in a common pose. In the view-based approach, the first step is to determine the pose of the input face image by selecting the eigenspace which best describes it. This can be accomplished by calculating the Euclidean distance between the input image and its projection into each eigenspace; the eigenspace yielding the smallest distance is the one whose pose is most similar to that of the input image. Once the proper eigenspace is determined, the input image is coded using the eigenfaces of that space and then recognized.

They evaluated the view-based approach with 189 images: 21 people with 9 poses each, evenly spaced from -90° to +90° along the horizontal plane. Two different test methodologies were used to judge the recognition performance; a small sketch of the pose-selection step described above is given below, before the experimental results.
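The following is a minimal sketch of that pose-selection step, assuming one PCA eigenspace per pose has already been built; the helper names and shapes are illustrative assumptions, not code from [27].

```python
import numpy as np

def residual(image, mean_face, eigenfaces):
    """Distance from feature space: reconstruction error of the image
    after projecting it onto one pose-specific eigenspace."""
    x = image - mean_face
    coeffs = eigenfaces @ x              # project onto the pose's eigenfaces
    reconstruction = eigenfaces.T @ coeffs
    return np.linalg.norm(x - reconstruction)

def select_pose(image, pose_spaces):
    """pose_spaces: list of (mean_face, eigenfaces) pairs, one per pose.
    Returns the index of the eigenspace that best describes the image."""
    errors = [residual(image, m, e) for (m, e) in pose_spaces]
    return int(np.argmin(errors))
```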
In the first series of experiments, interpolation performance was tested by training on a subset of the available views {±90°, ±45°, 0°} and testing on the intermediate views {±68°, ±23°}. The average recognition rate was 90% for the view-based method. A second series of experiments tested extrapolation performance by training on a range of views (e.g., -90° to +45°) and testing on views outside the training range (e.g., +68°, +90°). For testing poses separated by ±23° from the training range, the average recognition rate was 83% for the view-based method.

3.5 Curvature-Based Face Recognition

In [7], Gordon uses surface curvature to perform face recognition. This is an appealing idea since the curvature at a point on the surface is invariant under changes in viewpoint and illumination. In this approach, a rotating laser scanner produces data of high enough resolution that accurate curvature calculation is possible. Face segmentation can be based on the sign of the Gaussian curvature, which distinguishes two surface types: convex/concave regions and saddle regions. The surface feature extraction uses not just the curvature sign but also the principal curvatures, the principal directions, umbilic points and extrema of both principal curvatures. The maximum and minimum curvatures at a point are the principal curvatures, and the directions associated with them are the principal directions; they are given by the eigenvalues and eigenvectors of the shape matrix. The product of the two principal curvatures is the Gaussian curvature, and the mean curvature is the mean of the two principal curvatures.

In practice, because these curvature calculations contain second-order partial derivatives, they are extremely sensitive to noise, and a smoothing filter is required before calculating curvature. The dilemma is how to choose an appropriate smoothing level: if the smoothing level is too low, the second derivatives amplify noise and the curvature measurements are useless; on the other hand, over-smoothing modifies the very surface features we are trying to measure. In their implementation, curvature values are precomputed using several different levels of smoothing. The curvature maps from a low smoothing level are used to establish the locations of features, and prior knowledge of face structure is then used to select curvature values from the precomputed set. This is done manually, I think. An example of principal curvature maps is given in Figure 6.

Segmentation is fairly straightforward. Using the signs of the Gaussian curvature K and the mean curvature H, the face surface can be divided into four kinds of regions: K+, H+ is convex; K+, H- is concave; K-, H+ is a saddle with k_max > |k_min|; and K-, H- is a saddle with k_max < |k_min|. The boundaries of these regions are parabolic curves, where the Gaussian curvature is zero. Figure 7 shows an example. The author also discusses the calculation of surface descriptors; she tries to extract as much information as possible from the range data, so that this information is as unique as the individual.

Figure 6: Principal curvature (a) and principal direction (c) of maximum curvature; principal curvature (b) and principal direction (d) of minimum curvature.

Figure 7: Segmentation of three faces by the sign of Gaussian and mean curvature: concave (black), convex (white), saddle with positive mean curvature (light gray) and saddle with negative mean curvature (dark gray).
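As a rough illustration of the K/H-sign segmentation described above, the following sketch estimates Gaussian and mean curvature from a smoothed range image using the standard Monge-patch formulas and labels each pixel; the smoothing level is an illustrative assumption, and the convex/concave labeling follows the sign convention used in the text.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def curvature_segmentation(depth, sigma=2.0):
    """Label each pixel of a range image z(x, y) by the signs of the
    Gaussian curvature K and the mean curvature H.
    Returns an integer map: 0 convex, 1 concave, 2 saddle (H>0), 3 saddle (H<0)."""
    z = gaussian_filter(depth, sigma)            # smooth before differentiating
    zy, zx = np.gradient(z)                      # first derivatives
    zxy, zxx = np.gradient(zx)                   # second derivatives
    zyy, _ = np.gradient(zy)
    denom = 1.0 + zx**2 + zy**2
    K = (zxx * zyy - zxy**2) / denom**2                       # Gaussian curvature
    H = ((1 + zy**2) * zxx - 2 * zx * zy * zxy
         + (1 + zx**2) * zyy) / (2.0 * denom**1.5)            # mean curvature
    labels = np.full(z.shape, 3)
    labels[(K > 0) & (H > 0)] = 0
    labels[(K > 0) & (H < 0)] = 1
    labels[(K < 0) & (H > 0)] = 2
    return labels
```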
With such a rich set of information available, there are many ways to construct a comparison strategy. The author uses feature extraction and template matching to perform face recognition. In the experiment, the test set consists of 8 faces with 3 views each; for each face there are two versions without expression and one with expression. The experimental results show that 97% of the comparisons are correct.

In my opinion, the advantages of the curvature-based technique are: 1) it solves the pose and illumination variation problems at the same time, and 2) there is a great deal of information in the curvature map that has not yet been exploited, and it may be possible to find an efficient way to use it. However, there are some inherent problems with this approach: 1) A laser range finder is much more expensive than a camera, and the technique cannot be applied to existing image databases, which makes it unattractive when other choices exist. 2) Even if the range finder were not an issue, the computational cost is high and the curvature calculation is very sensitive to noise; if principal component analysis were applied directly to the range data, the error rate would probably be similar while the computational complexity would be much lower. 3) One could construct the 3D face surface from a 2D image instead of using an expensive range finder, and there are many algorithms available to do this, but it would not be possible to calculate curvature reliably from such a reconstructed surface. As mentioned earlier, curvature calculation involves second derivatives of the surface, and only high-resolution data such as that from a laser range finder makes accurate curvature calculation possible.

3.6 3D Model-Based Face Recognition

To reduce the cost of the system, Beumier and Acheroy [8] choose a 3D acquisition system consisting of a standard CCD camera and structured light. It is based on the projection of a known light pattern; the pattern deformation encodes the depth information of the object. 3D surface reconstruction is done by stripe detection and labeling. From each point of a stripe and its label, triangulation allows X, Y, Z estimation. This process is very fast while offering sufficient resolution for recognition purposes.

There are 120 persons in their experiment. Three shots are taken of each person, corresponding to a central view, limited left/right rotation, and up/down rotation. The automatic database uses an automatic program to obtain the 3D information for each individual; in the manual database, the 3D extraction process is initialized by clicking initial points in the deformed pattern. With the 3D reconstruction, they looked for characteristics that would reduce the 3D data to a set of features that could be compared easily and quickly, but they found that the nose is the only feature that can be extracted robustly with limited effort. So they gave up on feature extraction and considered global matching of the face surface. Fifteen profiles are extracted by intersecting the face surface with parallel planes spaced 1 cm apart, and a distance measure called the profile distance is defined to quantify the difference between 3D surfaces. This approach is slow: about 1 second to compare two face surfaces. In order to speed it up, they tried using only the central profile and two lateral profiles in the comparison.

Figure 8: a) Projection of a known light pattern. b) Analysis of the pattern deformation.

Figure 9: Reconstructed face surface.

ROC curves are shown in Figure 10 to illustrate the influence of the comparison strategy.
In the central/lateral profile comparison, error rate is sacrificed (it rises from 3.5% to 6.2%) in exchange for faster surface comparison. In the left panel of Figure 10, manual refinement gives better recognition performance, which tells us that there is room to improve the automatic 3D acquisition system.

Advantages: 1) The additional cost is only the projector and pattern slide. 2) Switching the slide on and off allows acquiring both the 2D image and the 3D information, and the fusion of 2D and 3D information can increase recognition performance. 3) The projector illumination reduces the influence of ambient light. 4) 3D reconstruction and profile comparison can avoid pose variation.

Problems: 1) The automatic 3D reconstruction is not good enough; an obvious improvement can be obtained by manual refinement. 2) Profile matching is a computationally expensive task. In face authentication this is not an issue, but in face recognition with a large database the speed would be unacceptably slow.

Figure 10: ROC curves of 3D face surface recognition, with 15 profiles (left) and with central/lateral profiles (right).

3.7 Elastic Bunch Graph Matching

In [26], Wiskott et al. use the Gabor wavelet transform to extract face features so that recognition performance is robust to variation in pose. I first introduce some of their terminology and then discuss how they build the face recognition system.

Each feature point on the face is transformed with a family of Gabor wavelets. The set of Gabor wavelets consists of 5 different spatial frequencies and 8 orientations, so one feature point has 40 corresponding Gabor wavelet coefficients. A jet is defined as the set of Gabor wavelet coefficients for one feature point; its components can be written as $J_i = a_i \exp(i \phi_i)$, with magnitude $a_i$ and phase $\phi_i$.

A labeled graph G representing a face consists of N nodes connected by E edges. The nodes are located at feature points called fiducial points; for example, the pupils, the corners of the mouth and the tip of the nose are all fiducial points. The nodes are labeled with jets $J_n$. Graphs for different head poses differ in geometry and local features; to be able to compare graphs of different poses, they manually define pointers to associate corresponding nodes in different graphs.

In order to extract graphs automatically for a new face, they need a general representation of the face that covers a wide range of possible variations in appearance. This representative set has a stack-like structure, called the face bunch graph (FBG); see Figure 11.

Figure 11: The Face Bunch Graph serves as a general representation of the face. Each stack of discs represents a jet; one can choose the best matching jet from the bunch of jets attached to a single node.

A set of jets referring to one fiducial point is called a bunch. An eye bunch, for instance, may include jets from closed, open, female and male eyes, etc., to cover the possible variation. The Face Bunch Graph is given the same structure as an individual graph. When searching for fiducial points in a new face image, the procedure described below selects the best fitting jet from the bunch dedicated to each fiducial point. The first set of graphs is generated manually. Initially, when the FBG contains only a few faces, it is necessary to check the matching results; once the FBG is rich enough (approximately 70 graphs), the matching results are quite good. Matching an FBG to a new image is done by maximizing the graph similarity between the image graph and the FBG of the same pose.
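The graph similarity defined next relies on a similarity function S between two jets. This report does not spell that function out, so the sketch below assumes the common magnitude-only jet similarity (a normalized dot product of the Gabor magnitudes a_i), together with the per-node best-match selection over a bunch; treat both as illustrative assumptions rather than the exact functions of [26].

```python
import numpy as np

def jet_similarity(a1, a2):
    """Magnitude-only similarity between two jets, assumed here to be the
    normalized dot product of their Gabor magnitudes (40 values each)."""
    return float(np.dot(a1, a2) / (np.linalg.norm(a1) * np.linalg.norm(a2) + 1e-12))

def best_bunch_match(image_jet, bunch):
    """Pick the best fitting jet for one fiducial point from its bunch.
    bunch: list of jets (magnitude arrays) collected from different example faces."""
    sims = [jet_similarity(image_jet, model_jet) for model_jet in bunch]
    best = int(np.argmax(sims))
    return best, sims[best]
```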
For an image graph $G^I$ with nodes $n = 1,\ldots,N$ and edges $e = 1,\ldots,E$, and an FBG $B$ with model graphs $m = 1,\ldots,M$, the similarity is defined as

$$S(G^I, B) = \frac{1}{N}\sum_{n} \max_{m} S\!\left(J_n^I, J_n^{B_m}\right) \;-\; \frac{\lambda}{E}\sum_{e} \frac{\left(\Delta \vec{x}_e^{\,I} - \Delta \vec{x}_e^{\,B}\right)^2}{\left(\Delta \vec{x}_e^{\,B}\right)^2}$$

where $\Delta\vec{x}_e$ denotes an edge vector and $\lambda$ weights the geometry term against the jet term. Since the FBG provides several jets for each fiducial point, the best one is selected and used for comparison; the best fitting jets serve as local experts for the new image.

They use the FERET database to test their system. First, the size and location of the face are determined and the face image is normalized in size. In this step several FBGs of different sizes are required, and the best fitting one is used for size estimation. In the FERET database each image has a label indicating the pose, so there is no need to estimate the pose, but the pose could be estimated automatically in a similar way as the size. After extracting model graphs from the gallery images, recognition is performed by comparing an image graph to all model graphs and selecting the one with the highest similarity value. A comparison against a gallery of 250 individuals takes less than one second. The poses used here are: neutral frontal view (fa), frontal view with different expression (fb), half-profile right (hr) or left (hl), and profile right (pr) or left (pl). Recognition results are shown in Table 6.

The recognition rate is high for frontal against frontal images (first row); this is due to the fact that two frontal views show only little variation. The recognition rate is still high for right profile against left profile (third row). When comparing left and right half-profiles, the recognition rate drops dramatically (second row); a possible reason is variation in rotation angle, since visual inspection shows that the rotation angle may differ by up to 30°. When comparing frontal views or profiles against half-profiles, a further reduction in recognition rate is observed. From the experimental results it is clear that Gabor wavelet coefficients are not invariant under rotation; before performing recognition, one still needs to estimate the pose and find the corresponding FBG.

Table 6: Recognition results for cross-runs between different galleries. The combinations in the four bottom rows are due to the fact that not all poses were available for all persons. The table also shows how often the correct face is identified as rank one and how often it is among the top ten.

3.8 Face Recognition from 2D and 3D Images

Very few research groups focus on face recognition from both 2D and 3D face images; almost all existing systems rely on a single type of face information, either 2D images or 3D range data. It is clear that 3D range data can compensate for the lack of depth information in a 2D image, and 3D shape is also invariant under variation of illumination. Therefore, integrating 2D and 3D information is a promising way to improve recognition performance. In [29], corresponding 3D range data and 2D images are obtained at the same time and then face recognition is performed. Their approach consists of the following steps.

Finding feature points: In order to extract both 2D and 3D features, two types of feature points are defined. One type is in 3D and is described by a point signature; the other type is in 2D and is represented by Gabor wavelet responses. Four 3D feature points and ten 2D feature points are selected, as shown in Figure 12.

Figure 12: 2D feature points are shown by "." and 3D feature points are shown by "x".

The Gabor wavelet function has tunable orientation and frequency; therefore, it can be configured to extract a specific band of frequency components from an image.
This makes it robust against translation, distortion and scaling. Here, they use a widely used version of the Gabor wavelets [26]. For each feature point there is a set of Gabor wavelets consisting of 5 different frequencies and 8 orientations. Localizing a 2D feature point is accomplished with the following steps:
1. For each pixel k in the search range, calculate the corresponding jet J_k.
2. Compute the similarity between J_k and all the jets in the corresponding bunch.
3. Within the search area, the pixel with the highest similarity value is chosen as the location of the feature point.

In this paper, the point signature is chosen to describe feature points in 3D. Its main idea is summarized here. For a given point, we place a sphere centered at this point. The intersection between the sphere and the face surface is a 3D curve C. A plane P is defined as the tangent plane at the given point, and the projection of C onto P forms a new curve C'. The distance between C and C', as a function of the rotation angle θ around the point, is called the point signature. The point signature is invariant under rotation and translation. The definition of the point signature is illustrated in Figure 13.

Figure 13: The definition of the point signature.

Based on principal component analysis, four eigenspaces are constructed, one for each of the four 3D feature points, so there are eigen point signatures for each 3D feature point. Localizing a 3D feature point follows similar steps to localizing a 2D feature point:
1. For each point in the search range, calculate the corresponding point signature.
2. Approximate this point signature by a linear combination of the eigen point signatures.
3. Compute the error between the real point signature and the approximated point signature.
4. Within the search area, the point with the smallest error is chosen as the location of the feature point.

Face recognition: Each face is expressed in terms of a shape vector and a texture vector. The shape vector $X_s \in \mathbb{R}^{36\times 9\times 4}$ consists of the point signatures of the four 3D feature points and their eight neighboring points, each point signature being sampled at equal intervals of 10°. The texture vector $X_t \in \mathbb{R}^{40\times 10}$ consists of the Gabor wavelet coefficients of the ten 2D feature points. Different weights are assigned to different feature points based on their detection accuracy. From the shape and texture vectors of the training faces, a shape subspace and a texture subspace are constructed. For a new face image, its shape vector and texture vector are projected onto the shape subspace and texture subspace respectively, yielding projection coefficient vectors $\alpha_s$ and $\alpha_t$. The feature vector for the face is the concatenation of the two: $\alpha_{st} = (\alpha_s^T, \alpha_t^T)^T$.

With this feature vector, the remaining task is simply choosing a classifier with a low error rate. In this paper, two classifiers are used and compared: one is based on the maximal similarity value, and the other is based on a support vector machine.

Experiment: Their experiment was carried out using face images of 50 individuals. Each person provides six facial images with viewpoint and expression variation. Some of these images are chosen as training images and the rest are taken as test images, so the training set and testing set are disjoint. For a new test image, after localizing the feature points, a 36×9×4-dimensional shape vector and a 40×10-dimensional texture vector are calculated. These two vectors are projected onto the corresponding subspaces, and the projection coefficients are concatenated to form an integrated feature vector.
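A minimal sketch of this fusion step follows, assuming the PCA subspaces for the shape (point signature) and texture (Gabor) vectors have already been learned; the matrix shapes, helper names and the nearest-neighbor classifier are illustrative assumptions, not code from [29].

```python
import numpy as np

def project_to_subspace(x, mean, basis):
    """Project a feature vector onto a learned PCA subspace.
    basis: (n_components, dim) matrix whose rows are principal directions."""
    return basis @ (x - mean)

def fused_feature(shape_vec, texture_vec, shape_space, texture_space):
    """Concatenate shape (point signature) and texture (Gabor) projection
    coefficients into one integrated feature vector.
    shape_space / texture_space: (mean, basis) pairs for each subspace."""
    alpha_s = project_to_subspace(shape_vec, *shape_space)
    alpha_t = project_to_subspace(texture_vec, *texture_space)
    return np.concatenate([alpha_s, alpha_t])

def nearest_neighbor_id(probe_feature, gallery_features):
    """Simplest maximal-similarity classifier: nearest neighbor in feature space."""
    dists = np.linalg.norm(gallery_features - probe_feature, axis=1)
    return int(np.argmin(dists))
```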
In order to evaluate recognition performance, two test cases are considered, with different features, different numbers of principal components and different classifiers.

Case 1 compares recognition performance with different features: point signatures (PS), Gabor coefficients (GC) and PS+GC. Figure 14(a) shows the recognition rate versus subspace dimension for the different feature choices. The results confirm their assumption that combining 2D and 3D information can improve recognition performance.

Case 2 compares the recognition performance of different classifiers: the similarity function and the support vector machine. Figures 14(b), (c) and (d) show the recognition rate versus subspace dimension for the different classifiers; the result in (b) is obtained using point signatures as features, (c) using Gabor coefficients, and (d) using PS+GC. With the SVM as the classifier, a higher recognition rate is obtained in all three cases.

Figure 14: Recognition rate vs. subspace dimension: (a) different chosen features with the same classifier; (b) chosen feature: PS; (c) chosen feature: GC; (d) chosen feature: PS+GC.

3.9 Pose Estimation from a Single Image

Generally, a face recognition problem can be divided into two major parts: normalization and recognition. In normalization, we need to estimate the size, illumination, expression and pose of the face from the given image and then transform the input image into a normalized format that can be handled by the recognition algorithm. Therefore, estimating pose accurately and efficiently is an important problem in face recognition, and solving it is a key step in building a robust face recognition system.

Ji [10] proposes a new approach for estimating the 3D pose of a face from a single image. He assumes that the shape of the face can be approximated by an ellipse, so that the pose of the face can be expressed in terms of the yaw, pitch and roll angles of the ellipse (see Figures 15 and 16). His system consists of three major parts: pupil detection, face detection and pose estimation.

Figure 15: The face shape can be approximated by an ellipse.

Figure 16: The pose of the face can be expressed in terms of yaw, pitch and roll angles.

In the following, I discuss how his system works.

Pupil detection: In order to find the pupils easily, an IR light source is used in his system. Under IR illumination the pupils appear much brighter than the other parts of the face, so pupil detection can be done easily by setting a threshold. This is a robust operation and can be done in real time.

Face detection: Face detection is somewhat more difficult because the face may have significantly different shapes under different poses. In his system, face detection consists of the following steps: 1) find the approximate location of the face based on the locations of the pupils, and 2) perform a constrained sum-of-gradients maximization procedure to search for the exact face location. In the second step, he applies a technique proposed by Birchfield [30]. Birchfield's technique is aimed at detecting the head rather than the face; to detect the face, he adds constraints of size, location and symmetry to the gradient maximization procedure. Specifically, he draws a line between the detected pupils as shown in Figure 17. The distance between the pupils and their locations are used to constrain the size and location of the ellipse, which should minimally and symmetrically include the two detected pupils. Results of face detection are shown in Figure 18, and a small sketch of the pupil-detection and roll-estimation steps is given below.
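Here is a minimal sketch of the IR pupil-detection step described above, combined with the roll-angle computation given in the pose-estimation discussion that follows; the threshold value and helper names are illustrative assumptions, and arctan2 is used as a numerically safer variant of the arctan formula in the text.

```python
import numpy as np
from scipy.ndimage import label, center_of_mass

def detect_pupils(ir_image, threshold=200):
    """Under IR illumination the pupils are much brighter than the rest of the
    face, so thresholding plus connected components finds them.
    Returns the centers (row, col) of the two largest bright blobs."""
    mask = ir_image > threshold
    labeled, n = label(mask)
    if n < 2:
        raise ValueError("fewer than two bright blobs found")
    sizes = np.bincount(labeled.ravel())[1:]      # blob areas; label 0 is background
    two_largest = np.argsort(sizes)[-2:] + 1      # labels of the two largest blobs
    return [center_of_mass(mask, labeled, l) for l in two_largest]

def roll_angle(p1, p2):
    """Roll angle of the face (degrees) from the line joining the two pupils."""
    (y1, x1), (y2, x2) = p1, p2
    return np.degrees(np.arctan2(y1 - y2, x1 - x2))
```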
Figure 17: Face ellipse parameterization.

Figure 18: Results of face detection at different poses.

Pose estimation: With the pupil locations, the roll angle can be calculated directly as

$$\phi = \arctan\frac{p_{1y} - p_{2y}}{p_{1x} - p_{2x}}$$

where $p_1$ and $p_2$ are the locations of the pupils. The basic idea for estimating the yaw and pitch angles is to work out the correspondence between the 2D projected ellipse and the 3D ellipse; the details are not discussed in this report since the derivation takes too much space.

In order to evaluate his pose estimation algorithm, artificial ellipses are used to verify the accuracy of the estimation results; the experimental results are shown in Figure 19.

Figure 19: Pose estimation for synthetic ellipses.

Images of actual ellipses with different pitches and yaws are used to further investigate the performance of the algorithm. The computed rotation angle is shown below each image in Figure 20; the estimation results are consistent with the perceived orientations.

Figure 20: The estimated orientation of actual ellipses. The number below each image is the estimated pitch or yaw angle.

Figure 21: Pose estimation on a face image sequence.

The ultimate test is to study the performance with human faces. For this purpose, image sequences of faces with different pitches and yaws were captured, ellipse detection was performed on each image, and the detected ellipse was used to estimate the pose. Experimental results are shown in Figure 21.

By using his algorithm, we can find the location of the face, track it, and estimate its pose. In other words, his algorithm provides almost everything we need to build a face recognition system, and it lets us recognize faces not only from images but also from video. With the pose estimated, we can simply apply hybrid-mode face recognition in which multiple images are available for each person: the closest pose in the database is found using the estimated pose, and the eigenface approach is then used to perform face recognition. As mentioned before, there are two major drawbacks: 1) many images are needed to cover the possible views, and 2) illumination variation is not handled. Still, this type of face recognition technique is probably the most practical method so far.

4 Discussion and Conclusion

In this paper, a variety of face recognition technologies are studied. They can be divided into two major groups. The first: since the eigenface method is an efficient way to perform face recognition under fixed illumination and pose, most recent work focuses on how to compensate for variation in pose and illumination; SFS, the illumination cone and the linear object class method are in this direction. The second: instead of using eigenfaces, some other methods try to avoid the illumination and pose problems at the start of their algorithm; the curvature-based method and face bunch graph matching belong to this category. Below I give a concise summary followed by conclusions, in the same order as the methods appear in this report.

SFS has great potential for real-world applications since it requires only one input image and one registered image per person. However, the SFS problem is still hard to solve: current methods have high computational complexity and the re-rendered images have poor quality. So there are many issues to be solved before it can be applied to commercial or law-enforcement applications.

The illumination cone method shows improved recognition performance under variation in pose and illumination.
One problem is that it requires many images per person to construct the subspace; for commercial applications, collecting many images per person is not feasible due to cost. The other issue is the computational cost when the database contains a large number of images: if it takes too long to recognize a face with a large database, further development will be discouraged.

The linear object class method is inspired by the way humans look at the world, and it provides a new angle on the face recognition problem. In [25], the authors successfully transform an image from half-profile to frontal view given 49 examples. One possible direction for future work is to extend the method to cover more variation in pose or illumination.

The view-based eigenspace method constructs a subspace for each pose. Their experiments show that performance degrades if the input image has a pose different from those in the database; in other words, many subspaces must be constructed to achieve a high recognition rate under arbitrary pose. So this method is an impractical solution for designing a fully pose-invariant face recognition system. However, for recognizing faces from mug shots, in which only three poses exist, it is probably the most feasible solution so far.

Curvature-based face recognition does not suffer from illumination variation since it recognizes faces from range data. However, the cost of implementing such a system is higher than for face recognition systems that use a camera to acquire data, and, as the author notes, the curvature calculation is sensitive to noise. We can say that this method has high computational complexity and high cost; even though it seems a good idea for handling the pose and illumination problems, few people have tried this method in the past few years.

The 3D model-based method performs face recognition from the 3D face surface without curvature calculation. The face surface is constructed by projecting a pattern onto the face, so the additional cost of the system is only the projector and the pattern slide. Their system provides an example of a low-cost face recognition system using 3D information. One issue is that their surface reconstruction algorithm needs manual adjustment to achieve a better recognition rate; a fully automatic reconstruction algorithm is possible future work in this direction.

The face bunch graph method uses the Gabor wavelet transform to extract face features and then collects the variations of the feature points to form the face bunch graph. The primary advantage of this method is that the features extracted by the Gabor wavelets are invariant to a certain degree, so this is an efficient way to design a robust face recognition system.

Most existing face recognition algorithms are based on either 2D images or 3D range data, and combining 2D and 3D information can further improve recognition performance. In [29], the authors use point signatures to extract 3D features and the Gabor wavelet transform to extract 2D features; the feature vector is defined as a weighted combination of the point signatures and Gabor coefficients. With this feature vector, recognition can be performed with whichever classifier yields the higher recognition rate.

Recognizing faces from video is probably the most difficult problem: one should be able to track the location of the face, estimate its pose and then recognize it. The method in [10] provides the infrastructure for recognizing faces from video, since face tracking and pose estimation can both be done with it.
We can expand it into a face recognition system by simply adding the view-based eigenspace algorithm.

5 References

[1] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proceedings of the IEEE, Vol. 83, pp. 705-740, 1995.

[3] W. Y. Zhao and R. Chellappa, "3D Model Enhanced Face Recognition," Proc. International Conference on Image Processing (ICIP 2000), Vol. 3, pp. 50-53, 2000.

[4] W. Y. Zhao and R. Chellappa, "SFS Based View Synthesis for Robust Face Recognition," Proc. Fourth IEEE International Conference on Automatic Face and Gesture Recognition, pp. 285-292, 2000.

[5] P. J. Phillips, H. Moon, S. A. Rizvi, and P. J. Rauss, "The FERET Evaluation Methodology for Face-Recognition Algorithms," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 22, No. 10, pp. 1090-1104, Oct. 2000.

[6] R. Chellappa, C. L. Wilson, and S. Sirohey, "Human and Machine Recognition of Faces: A Survey," Proceedings of the IEEE, Vol. 83, No. 5, pp. 705-741, May 1995.

[7] G. Gordon, "Face Recognition from Depth Maps and Surface Curvature," Proc. SPIE, Geometric Methods in Computer Vision, Vol. 1570, San Diego, July 1991.

[8] C. Beumier and M. Acheroy, "Automatic 3D Face Authentication," Image and Vision Computing, Vol. 18, No. 4, pp. 315-321, Feb. 1999.

[9] A. Md. Haider and T. Kaneko, "Automated 3D-2D Projective Registration of Human Facial Images Using Edge Features," accepted for publication in the International Journal of Pattern Recognition and Artificial Intelligence (IJPRAI).

[10] Q. Ji, "3D Face Pose Estimation and Tracking from a Monocular Camera," Image and Vision Computing, Vol. 20, No. 7, pp. 499-511, 2002.

[11] S. Sakamoto, R. Ishiyama, and J. Tajima, "3D Model-Based Face Recognition System with Robustness Against Illumination Changes," NEC Research and Development, Vol. 43, pp. 15-19, 2002.

[12] P. N. Belhumeur and D. Kriegman, "What Is the Set of Images of an Object under All Possible Illumination Conditions?" International Journal of Computer Vision, Vol. 28, No. 3, pp. 245-260, 1998.

[13] W. Zhao, "Robust Image Based 3D Face Recognition," PhD Thesis, University of Maryland, 1999.

[14] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "From Few to Many: Illumination Cone Models for Face Recognition under Variable Lighting and Pose," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 23, No. 6, pp. 643-660, June 2001.

[15] P. N. Belhumeur, D. J. Kriegman, and A. L. Yuille, "The Bas-Relief Ambiguity," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1997), pp. 1060-1066, June 1997.

[18] P. N. Belhumeur, J. P. Hespanha, and D. J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 711-720, July 1997.

[19] Y. Adini, Y. Moses, and S. Ullman, "Face Recognition: The Problem of Compensating for Changes in Illumination Direction," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 721-732, July 1997.

[20] D. W. Jacobs, P. N. Belhumeur, and R. Basri, "Comparing Images under Variable Illumination," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1998), pp. 610-617, June 1998.
[21] P. N. Belhumeur and D. J. Kriegman, "What Is the Set of Images of an Object under All Possible Lighting Conditions?" Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1996), pp. 270-277, June 1996.

[22] P. W. Hallinan, "A Low-Dimensional Representation of Human Faces for Arbitrary Lighting Conditions," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1994), pp. 995-999, June 1994.

[23] J. J. Atick, P. A. Griffin, and A. N. Redlich, "Statistical Approach to Shape from Shading: Reconstruction of 3D Face Surfaces from Single 2D Images," Neural Computation, Vol. 8, pp. 1321-1340, 1996.

[24] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman, "Illumination-Based Image Synthesis: Creating Novel Images of Human Faces under Differing Pose and Lighting," Proc. IEEE Workshop on Multi-View Modeling and Analysis of Visual Scenes (MVIEW '99), pp. 47-54, 1999.

[25] T. Vetter and T. Poggio, "Linear Object Classes and Image Synthesis from a Single Example Image," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 733-742, July 1997.

[26] L. Wiskott, J.-M. Fellous, N. Krüger, and C. von der Malsburg, "Face Recognition by Elastic Bunch Graph Matching," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 19, No. 7, pp. 775-779, July 1997.

[27] A. Pentland, B. Moghaddam, and T. Starner, "View-Based and Modular Eigenspaces for Face Recognition," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1994), pp. 84-91, June 1994.

[28] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, Vol. 3, No. 1, pp. 71-86, Winter 1991.

[29] Y. Wang, C.-S. Chua, Y.-K. Ho, and Y. Ren, "Integrated 2D and 3D Images for Face Recognition," Proc. 11th International Conference on Image Analysis and Processing, pp. 48-53, Sept. 2001.

[30] S. Birchfield, "Elliptical Head Tracking Using Intensity Gradients and Color Histograms," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1998), pp. 232-237, June 1998.

[38] B. S. Manjunath, R. Chellappa, and C. von der Malsburg, "A Feature Based Approach to Face Recognition," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1992), pp. 373-378, June 1992.

[39] R. Alferez and Y.-F. Wang, "Geometric and Illumination Invariants for Object Recognition," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 21, No. 6, pp. 505-536, June 1999.

[40] T. Akimoto, Y. Suenaga, and R. S. Wallace, "Automatic Creation of 3D Facial Models," IEEE Computer Graphics and Applications, Vol. 13, No. 5, pp. 16-22, Sept. 1993.

[41] W. T. Freeman and J. B. Tenenbaum, "Learning Bilinear Models for Two-Factor Problems in Vision," Proc. IEEE Conference on Computer Vision and Pattern Recognition (CVPR 1997), pp. 554-560, June 1997.

[42] S. K. Nayar, K. Ikeuchi, and T. Kanade, "Surface Reflection: Physical and Geometrical Perspectives," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 13, No. 7, pp. 611-634, July 1991.

[43] B. K. P. Horn and M. J. Brooks, Shape from Shading, MIT Press, Cambridge, MA, 1989.