Local Manifold Matching for Face Recognition

Wei Liu¹, Wei Fan¹, Yunhong Wang¹,², and Tieniu Tan¹
¹ NLPR, Institute of Automation, Chinese Academy of Sciences, Beijing 100080, P.R. China
² School of Computer Science and Engineering, Beihang University, Beijing 100083, P.R. China
Email: {wliu, wfan, wangyh, tnt}@nlpr.ia.ac.cn

Abstract— In this paper, we propose a novel classification method, called local manifold matching (LMM), for face recognition. LMM greatly expands the representational capacity of the available prototypes and rests on the local linearity assumption that each data point and its k nearest neighbors from the same class lie on a linear manifold locally embedded in the image space. We present a supervised local manifold learning algorithm for learning all locally linear manifold structures, and then propose the nearest manifold criterion for classification, in which the query feature point is assigned to the best-matching face manifold. Experimental results show that kernel PCA combined with the LMM classifier achieves the best face recognition performance.

I. INTRODUCTION

In general, appropriate facial representation and effective classification rules are the two central issues in most face recognition systems. In this paper, we mainly explore classification rules in order to design a robust classifier.

Many pattern classification methods have been proposed to date. One of the most popular is the nearest neighbor (NN) classifier [2]. Although NN is simple and convenient, its representational capacity is limited to the available prototypes in each class, which restricts its performance. To cover more variations of a face class, Li et al. presented the nearest feature line (NFL) classifier [4]. NFL creates virtual prototype feature points that complement the limited prototypes, and thus improves on the NN method by expanding the representational capacity of the available prototypes.

In this paper, we combine the advantage of virtual samples with manifold learning techniques. First, we present a supervised local manifold learning algorithm that learns all local manifolds, which are in fact virtual ones. We then propose a local manifold matching (LMM) classifier that is numerically stable and achieves a good balance between recognition rate and computational cost.

II. OUR METHOD

The locally linear manifold is intended to create high-dimensional virtual prototype feature points, which benefit classification when training samples are scarce. It therefore greatly expands the representational capacity of the available prototypes and covers a broad range of facial variations.

Fig. 1. Local manifold examples. (a) x and its 5 nearest neighbors form a local manifold; (b) 7 local manifolds make up a global manifold; LM 2, 3 overlap with LM 6, 7.

A. Locally Linear Assumption

The key issue in nearest feature classifiers such as NFL is how and where to generate virtual prototype feature points. Since NFL uses a linear model to generate an infinite number of virtual prototypes, we argue that virtual prototypes should be created in a patch that is linear or close to linear.
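For reference, the small sketch below (not from the paper) illustrates the kind of virtual prototype NFL [4] builds: every point on the line through two same-class prototypes is a virtual prototype, and the query is compared with its projection onto that line. The function name is hypothetical.

```python
# Illustrative sketch of the NFL feature-line projection (not the paper's code).
import numpy as np

def feature_line_distance(q, x1, x2):
    """Distance from a query q to the feature line through prototypes x1, x2."""
    direction = x2 - x1
    mu = np.dot(q - x1, direction) / np.dot(direction, direction)  # line parameter
    p = x1 + mu * direction                                        # virtual prototype
    return np.linalg.norm(q - p)
```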
Motivated by LLE [5], we assume that each feature point x_{ci} (i = 1, ..., N_c; c = 1, ..., C) and its k (1 ≤ k ≤ K = min_c{N_c} − 1) nearest neighbors from the same class lie on a linear Euclidean space, called a locally linear manifold or local manifold. Virtual samples are created within this local manifold. Strictly speaking, local manifolds are virtual manifolds, which make effective manifold analysis possible with limited training samples and benefit classification. A local manifold generally occupies a k-dimensional subspace and degenerates to a feature line when k equals 1. Examples of local manifolds are shown in Fig. 1.

B. Definition of Local Manifold and Local Manifold Distance

We construct the local manifold as follows: select a neighborhood for each prototype, learn the Euclidean subspace spanned by that neighborhood, and generate virtual samples within this subspace. Each prototype together with these artificial samples then forms a local manifold.

Given the training set {x_i | x_i ∈ ℝ^d, 1 ≤ i ≤ N}, for any prototype feature point x_i denote its k nearest neighbors by x_{N(i,j)} (1 ≤ j ≤ k), where N(i,j) is the index of the jth nearest neighbor of x_i. The local manifold on which x_i and its neighbors reside is defined as

$$ M(x_i, k) = M\{x_i, x_{N(i,1)}, x_{N(i,2)}, \ldots, x_{N(i,k)}\} \qquad (1) $$

From the viewpoint of set theory, local manifolds can be united or split with the operators ⊕ and ⊖, for example

$$ M(A) \oplus M(B) = M(A \cup B), \qquad M(A) \ominus M(B) = M(A - B) \qquad (2) $$

where A and B are two neighborhood sets such as {x_i, x_{N(i,1)}, ..., x_{N(i,k)}}, and M(A), M(B) are the local manifolds constructed from them.

Because the local manifold is assumed to be linear, classical linear techniques such as Principal Component Analysis (PCA) [3] can be applied to obtain its linear embedded structure. For simplicity we denote each local manifold M(x_i, k) by M_i and define the covariance matrix of the samples on M_i as

$$ C_{M_i} = \sum_{j=1}^{k} (x_{N(i,j)} - x_i)(x_{N(i,j)} - x_i)^T \qquad (3) $$

Applying PCA or SVD to C_{M_i} yields the principal subspace U_{M_i}, which captures most of the information in the neighborhood of x_i. Note that the parameter k is not only the number of neighbors that, together with x_i, construct the local manifold M_i; it also bounds the intrinsic dimension of M_i from above.

For a new point x, the extent to which it matches the local manifold M_i, i.e. the likelihood that x lies in M_i, can be measured by the Euclidean distance between x and its projection onto M_i. The projection point, denoted p_i, is exactly the best-matching virtual sample of x in M_i. Projecting the difference vector x − x_i into M_i (equivalently, onto the principal subspace U_{M_i}) gives the difference vector p_i − x_i with coordinates U_{M_i}^T (x − x_i). By the Pythagorean theorem (which implies ||p_i − x_i|| = ||U_{M_i}^T (x − x_i)||), the distance from x to the local manifold M_i is

$$ d(x, M_i) = \|x - p_i\| = \sqrt{\|x - x_i\|^2 - \|p_i - x_i\|^2} = \sqrt{\|x - x_i\|^2 - \|U_{M_i}^T (x - x_i)\|^2} \qquad (4) $$

which we call the local manifold distance or LM distance. From Eq. (4), the LM distance can be computed readily once x_i and U_{M_i} are available.
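As a concrete reading of Eqs. (3)–(4), the following minimal NumPy sketch builds the pair (x_i, U_{M_i}) from a prototype and its within-class neighbors and then evaluates the LM distance. The function names and the rank-based choice of the subspace dimension are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch of the local manifold subspace and LM distance (Eqs. (3)-(4)).
import numpy as np

def local_manifold(x_i, neighbors):
    """Return (x_i, U) for M(x_i, k); neighbors has shape (k, d)."""
    D = neighbors - x_i                        # rows are x_{N(i,j)} - x_i
    # SVD of D diagonalizes the covariance C_{M_i} of Eq. (3)
    _, s, Vt = np.linalg.svd(D, full_matrices=False)
    U = Vt[s > 1e-10].T                        # d x r orthonormal principal subspace
    return x_i, U

def lm_distance(x, x_i, U):
    """LM distance of Eq. (4) from a query x to the local manifold (x_i, U)."""
    diff = x - x_i
    proj = U.T @ diff                          # coordinates of p_i - x_i
    return np.sqrt(max(diff @ diff - proj @ proj, 0.0))   # guard against round-off
```

In this sketch, k = 1 reduces to the point-to-line distance of NFL, and U = 0 reduces to the ordinary Euclidean distance, consistent with the k = 0 case given later in Eq. (9).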
A local manifold can therefore be described as a binary element tailored to classification:

$$ M_i = M(x_i, k) \simeq \langle x_i, U_{M_i} \rangle \qquad (5) $$

where U_{M_i} is also called the regularization matrix: it rectifies the original Euclidean distance between the query and a single prototype into the manifold distance between the query and the local manifold containing that prototype. As a binary element, only two matrices are needed to find the best-matching manifold during classification. Moreover, the class to which a manifold belongs can be read off from the class label of x_i.

C. Supervised Local Manifold Learning (SLML)

Our model has one key parameter, the number of nearest neighbors k, which should be set so that the local linearity assumption holds. For each local manifold M_i we define a decomposition error that measures the extent of its linearity:

$$ \varepsilon(M_i) = d\big(x_{N(i,k)},\ M\{x_i, x_{N(i,1)}, \ldots, x_{N(i,k-1)}\}\big)^2 = d\big(x_{N(i,k)},\ M_i \ominus M\{x_{N(i,k)}\}\big)^2 \qquad (6) $$

that is, we first decompose M_i and consider the degraded local manifold on which x_i and its k − 1 nearest neighbors reside, and then take the squared LM distance between the kth nearest neighbor of x_i and this degraded manifold as the decomposition error. Following the definition of the reconstruction error in the well-known LLE framework [5], one can show that ε(M_i) is exactly the reconstruction error obtained when the k points x_i, x_{N(i,1)}, ..., x_{N(i,k−1)} are used to linearly estimate the point x_{N(i,k)}. The decomposition error can thus be understood as the manifold reconstruction error of representing all k prototypes in a local manifold by k − 1 of them. The smaller ε(M_i) is, the better the local linearity of M_i holds. With x_i fixed, M_i expands or shrinks as k changes, so the decomposition error is a function of k.

To integrate the linearity of all local manifolds, we define the sum of decomposition errors, again a function of k:

$$ \varepsilon_k = \sum_{i=1}^{N} \varepsilon(M_i) \qquad (7) $$

which reflects how well local linearity holds globally, since good local linearity of one manifold does not guarantee that the other local manifolds are also close to linear. Letting k increase from 0, the ideal k* is the value at which the sum of errors starts to decrease and the local manifolds attain stationary structures; the parameter is therefore identified by

$$ D = \{\,k \mid \Delta\varepsilon_k = \varepsilon_k - \varepsilon_{k-1} \le 0,\ 1 \le k \le K\,\}, \qquad k^* = \min_{k \in D} k \qquad (8) $$

where K is the maximum permissible number of nearest neighbors. If many prototypes are available per subject in training, a smaller value can be assigned to K.

We implement supervised local manifold learning (SLML) as an iterative procedure, shown in Table I. In the first step, the K nearest neighbors of each prototype are identified in a supervised (within-class) manner. Second, the ideal parameter k* is found through a finite number of iterations based on the sum of decomposition errors; in each iteration, PCA extracts the principal subspace of every local manifold. Finally, all local manifolds M_i ≃ ⟨x_i, U_{M_i}⟩ with their regularization matrices are learned. For the special case k = 0 we define (with ε_0 = 0)

$$ M_i = M(x_i, 0) = M\{x_i\} \simeq \langle x_i, 0 \rangle, \qquad d(x, M_i) = \|x - x_i\| \qquad (9) $$
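To make the selection rule of Eqs. (6)–(8) concrete, here is a minimal sketch of the SLML loop that is formalized in Table I below. The within-class neighbor search and the rank-based subspace are illustrative assumptions, not the paper's exact implementation.

```python
# Minimal sketch of SLML: k* is the first k at which the summed decomposition
# error of Eq. (7) stops growing (Eq. (8)).
import numpy as np

def slml_select_k(X, y):
    """X: (N, d) prototypes, y: (N,) class labels; returns k* of Eq. (8)."""
    N = len(X)
    K = min(int(np.sum(y == c)) for c in np.unique(y)) - 1   # max permissible k

    # within-class neighbor indices of each prototype, sorted by distance
    nbrs = []
    for i in range(N):
        same = np.where(y == y[i])[0]
        same = same[same != i]
        order = same[np.argsort(np.linalg.norm(X[same] - X[i], axis=1))]
        nbrs.append(order[:K])

    def lm_dist(x, x_i, neighbors):
        if len(neighbors) == 0:                     # Eq. (9): k = 0 case
            return np.linalg.norm(x - x_i)
        _, s, Vt = np.linalg.svd(neighbors - x_i, full_matrices=False)
        U = Vt[s > 1e-10]                           # rows span the principal subspace
        diff = x - x_i
        return np.sqrt(max(diff @ diff - np.sum((U @ diff) ** 2), 0.0))

    prev = 0.0                                      # epsilon_0 = 0
    for k in range(1, K + 1):
        # Eq. (6)/(7): error of estimating the k-th neighbor from the first k-1
        eps_k = sum(lm_dist(X[nbrs[i][k - 1]], X[i], X[nbrs[i][:k - 1]]) ** 2
                    for i in range(N))
        if eps_k <= prev:                           # Eq. (8): error stops growing
            return k
        prev = eps_k
    return K
```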
TABLE I
SUPERVISED LOCAL MANIFOLD LEARNING (SLML) ALGORITHM

Input: a training set with C classes, Z = {(x_i, y_i) | x_i ∈ ℝ^d, y_i = L(x_i) ∈ Y}, with label set Y = {1, 2, ..., C}.
Initialize: the maximum permissible number of nearest neighbors K = min_{1≤c≤C}{N_c − 1}; the sums of decomposition errors ε_j = 0 (j = 0, ..., K). For each prototype x_i, find its K nearest neighbors within the class labeled L(x_i) in Z, denoted x_{N(i,j)} (j = 1, ..., K); set the local manifold M_i = M{x_i} and U_{M_i} = 0.
Loop: for k = 1, 2, ..., K
  Step 1. Accumulate the current sum of decomposition errors:
    for i = 1, 2, ..., N: ε_k ← ε_k + d(x_{N(i,k)}, M_i)².
  Step 2. Expand all local manifolds and update the principal subspaces:
    for i = 1, 2, ..., N: M_i ← M_i ⊕ M{x_{N(i,k)}}; learn a subspace U_{M_i} for the new manifold and update the binary element M_i ≃ ⟨x_i, U_{M_i}⟩.
  Step 3. If ε_k ≤ ε_{k−1}, abort the loop.
Output: the most suitable k* and the learned local manifolds M_i ≃ ⟨x_i, U_{M_i}⟩.

D. Local Manifold Matching (LMM)

For classification, the query is matched against the learned local manifolds using the LM distance as the dissimilarity measure. The class label of the nearest manifold is assigned to the query feature point; this nearest manifold criterion is formulated as (L(x) denotes the class to which sample x belongs)

$$ i^* = \arg\min_{1 \le i \le N} d(x, M_i), \qquad L(x) = L(x_{i^*}) \qquad (10) $$

In fact, the computational complexity of LMM can be brought down further. We have learned N local manifolds, a great many of which are likely to coincide, especially when the optimal k* approaches K; the redundant manifolds should therefore be discarded to save computational cost. In SLML, an extra check against repetitions is performed so that only distinct manifolds are stored. Denoting these independent manifolds by M_{i_t} (1 ≤ t ≤ N̄, where N̄ is the number of independent manifolds), Eq. (10) is rewritten as

$$ t^* = \arg\min_{1 \le t \le \bar{N}} d(x, M_{i_t}), \qquad L(x) = L(x_{i_{t^*}}) \qquad (11) $$

III. EXPERIMENTS

Our experiments are carried out on a mixed database of 125 persons and 985 images, collected from three databases: 1) the ORL database, with 40 persons and 10 different images per person; 2) the YALE database, with 15 persons and 11 facial images per person; and 3) a FERET subset, with 70 persons selected from the FERET database and 6 different images per person. All images are resized to 92 × 112 and exhibit variations in facial expression, illumination, and pose. To reduce the influence of extreme illumination, histogram equalization is applied to the images as pre-processing. Fig. 2 shows some samples.

Fig. 2. Samples from the mixture database.

A. Demonstration of SLML

To test the supervised local manifold learning (SLML) algorithm, we track the sum of decomposition errors ε_k (×10⁸) and the error rate (%) with respect to the number of nearest neighbors k (k ≤ 5). For simplicity, only the ORL database is used here. Fig. 3 shows the average sum of errors and the error rate as functions of k. In each round, 6 images per subject are randomly selected for training and the remaining images of the same subject are used for testing; 20 tests are performed with different configurations of the training and test sets, and the results are averaged. The standard eigenface method of Turk and Pentland [6] is first applied to the training images to reduce the dimension of the facial images; in this experiment, we use 50 eigenfaces for the facial features.
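Before turning to the results, the following minimal sketch shows the nearest-manifold decision rule of Eqs. (10)–(11) as it would be applied to such reduced feature vectors. Each learned (and de-duplicated) manifold is assumed to be stored as a triple (x_i, U_i, label_i); the names are illustrative.

```python
# Sketch of the nearest-manifold criterion of Eqs. (10)-(11).
import numpy as np

def lmm_classify(x, manifolds):
    """manifolds: iterable of (x_i, U_i, label_i); returns the predicted label."""
    best_label, best_dist = None, np.inf
    for x_i, U, label in manifolds:
        diff = x - x_i
        proj = U.T @ diff                                     # coordinates in U_Mi
        dist = np.sqrt(max(diff @ diff - proj @ proj, 0.0))   # LM distance, Eq. (4)
        if dist < best_dist:
            best_dist, best_label = dist, label
    return best_label
```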
As plotted in Fig. 3(a), the sum of decomposition errors ε_k starts to decrease at k = 4, at which point we consider the local manifolds to have attained stationary structures; beyond this point, however, SLML begins to overfit. Fig. 3(b) shows that the error rate increases after k = 4, which strongly supports our choice of the optimal k* given by Eq. (8). The LMM error rate (k* = 4 in Fig. 3(b)) with 50 eigenfaces is 3.0625%, whereas the NN error rate (corresponding to LMM at k = 0 in Fig. 3(b)) is 4.69%.

Fig. 3. Demonstration of the SLML algorithm. (a) Sum of decomposition errors ε_k vs. k on ORL; (b) error rate vs. k on ORL.

B. Performance of LMM

To demonstrate the efficiency of local manifold matching (LMM), extensive experiments are conducted on the mixed database, with all methods compared on the same training and test sets. The mixture database is divided into two non-overlapping sets for training and testing. The training set consists of 500 images: 5, 6, and 3 images per person are randomly selected from the ORL database, the YALE database, and the FERET subset, respectively; the remaining 485 images are used for testing. Twenty runs are performed with different random partitions between training and testing images, and the results are averaged. The eigenface method (PCA) [6], the Fisherface method (LDA) [1], and KPCA [7] are applied to reduce the dimension of the facial images and provide the features for recognition. Incorporated with PCA or KPCA, SLML is run on each subset (ORL, YALE, and FERET) of the mixed database, learning optimal values of k* equal to 4, 3, and 2, respectively.

The error rates and recognition times are listed in Table II; our method shows encouraging overall performance. In particular, incorporated with KPCA, the proposed LMM method yields the lowest error rate (8.72%) on the mixture database with acceptable recognition time. Because LDA breaks the manifold structure of face data, we do not combine the LMM classifier with Fisherface features. All experiments are implemented in MATLAB 6.1 on a Pentium IV personal computer with a clock speed of 2.4 GHz.

TABLE II
COMPARISON OF SIX RECOGNITION METHODS

Method     | Dims | Error Rate (%) | Run Time (ms)
PCA+NN     | 80   | 18.09          | 8.649
PCA+NFL    | 80   | 16.14          | 65.53
LDA+NN     | 124  | 13.23          | 9.340
KPCA+NN    | 100  | 15.06          | 8.981
PCA+LMM    | 80   | 10.89          | 18.57
KPCA+LMM   | 100  | 8.72           | 20.48

IV. CONCLUSION

Following the work on NFL, we have presented the local manifold matching (LMM) method for face classification. Our method improves on the NN and NFL methods by expanding the representational capacity of the available prototypes. In contrast to NFL, LMM creates many more virtual prototype feature points, a substantial part of which benefit classification when prototypes are limited. Moreover, we have discussed the strong connections between our manifold matching method and state-of-the-art manifold learning techniques such as Locally Linear Embedding (LLE). Experiments on a mixed database demonstrate the effectiveness of our method.

ACKNOWLEDGMENT

This work is sponsored by the Natural Science Foundation of China under Grants No. 60121302 and No. 60335010. The authors thank the Olivetti Research Laboratory (Cambridge, UK), the FERET program (USA), and Yale University for providing the face databases. Special thanks go to Yilin Dong for her encouragement.

REFERENCES
[1] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection," IEEE Trans. on PAMI, vol. 19, no. 7, pp. 711-720, July 1997.
[2] T.M. Cover and P.E. Hart, "Nearest Neighbor Pattern Classification," IEEE Trans. on Information Theory, vol. 13, pp. 57-67, January 1967.
[3] I.T. Jolliffe, Principal Component Analysis, Springer-Verlag, New York, 1986.
[4] S.Z. Li and J.-W. Lu, "Face Recognition Using the Nearest Feature Line Method," IEEE Trans. on Neural Networks, vol. 10, no. 2, pp. 439-443, March 1999.
[5] S.T. Roweis and L.K. Saul, "Nonlinear Dimensionality Reduction by Locally Linear Embedding," Science, vol. 290, pp. 2323-2326, December 2000.
[6] M.A. Turk and A.P. Pentland, "Face Recognition Using Eigenfaces," in Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pp. 586-591, June 1991.
[7] M.H. Yang, N. Ahuja, and D. Kriegman, "Face Recognition Using Kernel Eigenfaces," in Proc. of IEEE Int. Conf. on Image Processing, 2000.