Image and Vision Computing 21 (2003) 1037–1044

Combined Fisherfaces framework

Jian Yang^{a,b,*}, Jing-yu Yang^{a}, Alejandro F. Frangi^{b}

^{a} Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, China
^{b} Computer Vision Group, Aragon Institute of Engineering Research, Universidad de Zaragoza, E-50018 Zaragoza, Spain

Received 19 October 2001; received in revised form 30 June 2003; accepted 3 July 2003

Abstract

In this paper, a Complex LDA based combined Fisherfaces framework, coined Complex Fisherfaces, is developed for face feature extraction and recognition. In this framework, Principal Component Analysis (PCA) and Kernel PCA (KPCA) are first used for feature extraction. Then, the resulting PCA-based linear features and KPCA-based nonlinear features are integrated by complex vectors, and Complex LDA is further employed for feature fusion. The proposed method is tested on a subset of the FERET database. The experimental results demonstrate that Complex Fisherfaces outperforms Fisherfaces and Kernel Fisherfaces. Moreover, the complex-vector-based parallel feature fusion strategy is shown to be considerably more effective and robust than the super-vector-based serial feature fusion strategy for face recognition. © 2003 Elsevier B.V. All rights reserved.

Keywords: Fisher linear discriminant analysis; Principal component analysis; Kernel based methods; Fisherfaces; Eigenfaces; Feature extraction; Face recognition

1. Introduction

Linear feature extraction methods have been widely used in face recognition. Among them, the two most famous techniques are Eigenfaces and Fisherfaces. Eigenfaces is based on Principal Component Analysis (PCA). It was originally proposed by Kirby and Sirovich [1,2] and popularized by Turk and Pentland [3,4]. Fisherfaces is based on Fisher Linear Discriminant Analysis (LDA) and was put forward by Swets and Weng [5] and by Belhumeur et al. [6].
* Corresponding author. Address: Computer Vision Group, Aragon Institute of Engineering Research, Universidad de Zaragoza, E-50018 Zaragoza, Spain. Tel.: +34-976-762158; fax: +34-976-762111. E-mail addresses: jyang@unizar.es (J. Yang), yangjy@mail.njust.edu.cn (J.-Y. Yang), afrangi@unizar.es (A.F. Frangi).
0262-8856/$ - see front matter © 2003 Elsevier B.V. All rights reserved. doi:10.1016/j.imavis.2003.07.005

Fisherfaces has been shown to outperform Eigenfaces, particularly under variations in illumination and facial expression. Recently developed face recognition systems, such as those in Refs. [7–11], are all based on LDA. Since face recognition is typically a small sample size problem, LDA often becomes ill-posed due to the singularity of the within-class scatter matrix, which makes the traditional LDA algorithm inapplicable. To avoid this difficulty, a popular approach is the two-phase strategy [5,6]: PCA is first used to transform the original image vector space into an m-dimensional one (c ≤ m ≤ M − c, where M is the number of training samples and c is the number of pattern classes); then LDA is applied to reduce the dimension further to d (d ≤ c − 1).

Recently, kernel-based nonlinear feature extraction techniques have attracted wide attention. One is Kernel Principal Component Analysis (KPCA or Kernel PCA), and the other is Kernel Fisher Discriminant Analysis (KFD). KPCA was originally developed by Schölkopf et al. in 1998 [12]. Subsequently, KFD was proposed independently by Mika et al. [13,14] and by Baudat and Anouar [15]. Mika's work mainly focuses on two-class problems, while Baudat's algorithm is applicable to multiclass problems. Yang [16] applied KPCA and KFD to face recognition and proposed the techniques of Kernel Eigenfaces and Kernel Fisherfaces.

Now, given a face image, it is easy to derive its linear features using PCA and its nonlinear features using KPCA. Intuitively, these two kinds of features should be complementary.
In this paper, we try to integrate them into a unified framework. Our strategy is to combine the PCA-transformed feature vectors and the KPCA-transformed feature vectors via complex vectors, and then to input them into a feature fusion container called Complex Fisher Linear Discriminant Analysis (Complex LDA) for a second feature extraction. The developed method is called Complex Fisherfaces. We also try an alternative strategy to combine the PCA-transformed and KPCA-transformed features, i.e. to align them into a super-vector and then perform LDA. This approach is called Serial Fisherfaces. However, our experimental results show that Serial Fisherfaces is not as effective as Complex Fisherfaces.

The remainder of this paper is organized as follows. In Section 2, we outline the methods of PCA (Eigenfaces), KPCA and Complex LDA. The combined Fisherfaces framework is built in Section 3. In Section 4, the proposed Complex Fisherfaces method is tested on the FERET database and compared to other methods such as Fisherfaces, Kernel Fisherfaces and Serial Fisherfaces. Finally, a brief conclusion is offered in Section 5.

2. Methods

2.1. Principal component analysis

Given a set of M training samples x_1, x_2, ..., x_M in R^n, the total covariance matrix is

C = \frac{1}{M} \sum_{j=1}^{M} (x_j - \bar{x})(x_j - \bar{x})^T \qquad (1)

where \bar{x} denotes the mean vector of all training samples.

The orthonormal eigenvectors w_1, w_2, ..., w_m of C corresponding to the m largest eigenvalues are chosen as projection axes. Generally, they can be calculated directly. However, for a problem like face recognition, the dimension of the samples (image vectors) is usually so high that it is very time-consuming to calculate C's eigenvectors directly. Fortunately, we can use the Singular Value Decomposition (SVD) technique to determine the required eigenvectors. Let Q = [x_1 - \bar{x}, ..., x_M - \bar{x}]; then C can be represented by

C = \frac{1}{M} Q Q^T.

Let us form the matrix R = Q^T Q, which is an M × M semi-positive definite matrix. In face recognition, since the number of training image samples is usually much smaller than the dimension of the image vector, the size of R is much smaller than that of C, so it is easier to obtain its eigenvectors. Let us calculate the orthonormal eigenvectors u_1, u_2, ..., u_m of R corresponding to the m largest eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_m. Then, by the SVD theorem [18], the orthonormal eigenvectors w_1, w_2, ..., w_m of C corresponding to the m largest eigenvalues λ_1, λ_2, ..., λ_m are

w_j = \frac{1}{\sqrt{\lambda_j}} Q u_j, \quad j = 1, \dots, m. \qquad (2)

After the projection of sample x onto the eigenvector w_j, we obtain the j-th feature

y_j = w_j^T x = \frac{1}{\sqrt{\lambda_j}} u_j^T Q^T x, \quad j = 1, \dots, m. \qquad (3)

The resulting features y_1, y_2, ..., y_m form a PCA-transformed feature vector Y = (y_1, y_2, ..., y_m)^T for sample x.

2.2. Kernel principal component analysis

In this section, we try to give a more transparent description of KPCA by comparing its technique to that of PCA. First of all, a nonlinear mapping Φ is used to map the input data space R^n into the feature space F:

\Phi : R^n \to F, \quad x \mapsto \Phi(x). \qquad (4)

Correspondingly, a pattern in the original input space R^n is mapped into a potentially much higher dimensional feature vector in the feature space F. An initial motivation of KPCA is to perform PCA in the feature space F. Let us construct the covariance matrix in the feature space F:

C = \frac{1}{M} \sum_{j=1}^{M} (\Phi(x_j) - \bar{\Phi})(\Phi(x_j) - \bar{\Phi})^T \qquad (5)

where

\bar{\Phi} = \frac{1}{M} \sum_{j=1}^{M} \Phi(x_j).

However, it is not easy to centralize the data directly in the feature space F. To avoid this difficulty, let us first consider the following noncentralized covariance matrix:

\tilde{C} = \frac{1}{M} \sum_{j=1}^{M} \Phi(x_j) \Phi(x_j)^T \qquad (6)

It is very computationally intensive, or even impossible, to calculate \tilde{C}'s eigenvectors in a high-dimensional (even infinite-dimensional) feature space. KPCA can be viewed as utilizing two key techniques to solve this problem artfully.
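As a concrete illustration (not from the original paper), the small-matrix trick of Section 2.1 (Eqs. (1)-(3)) can be sketched in NumPy as follows; the data are random stand-ins for face image vectors, and `pca_svd_trick` is a hypothetical helper name.

```python
import numpy as np

def pca_svd_trick(X, m):
    """PCA axes via the small M x M matrix R = Q^T Q (Eqs. (1)-(3)).

    X: (n, M) matrix whose columns are training samples; m: number of
    components. Returns (W, Y): projection axes and projected features.
    """
    Q = X - X.mean(axis=1, keepdims=True)        # centered samples
    R = Q.T @ Q                                  # M x M, small when M << n
    lam, U = np.linalg.eigh(R)                   # eigenvalues in ascending order
    lam, U = lam[::-1][:m], U[:, ::-1][:, :m]    # keep the m largest eigenpairs
    W = (Q @ U) / np.sqrt(lam)                   # w_j = Q u_j / sqrt(lambda_j), Eq. (2)
    Y = W.T @ Q                                  # features y_j = w_j^T x, Eq. (3)
    return W, Y

# Toy check: 40 samples of dimension 1024; the axes come out orthonormal.
rng = np.random.default_rng(0)
X = rng.standard_normal((1024, 40))
W, Y = pca_svd_trick(X, m=10)
print(np.allclose(W.T @ W, np.eye(10)))          # True
```

The point of the trick is that `np.linalg.eigh` runs on the 40 × 40 matrix R rather than the 1024 × 1024 covariance matrix C.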
One is the SVD technique adopted in Eigenfaces, and the other is the so-called kernel trick [12]. The SVD technique can be used to turn the eigenvector calculation of a large matrix into the eigenvector calculation of a small matrix (see the discussion in Section 2.1), while the kernel trick avoids computing dot products in the feature space by virtue of the following formula:

k(x, y) = (\Phi(x) \cdot \Phi(y)) \qquad (7)

Specifically, let Q = [Φ(x_1), ..., Φ(x_M)]; then \tilde{C} can also be expressed as

\tilde{C} = \frac{1}{M} Q Q^T.

Let us form the matrix \tilde{R} = Q^T Q. By virtue of the kernel trick, we can determine the elements of the M × M matrix \tilde{R} by

\tilde{R}_{ij} = \Phi(x_i)^T \Phi(x_j) = (\Phi(x_i) \cdot \Phi(x_j)) = k(x_i, x_j) \qquad (8)

Let us calculate the orthonormal eigenvectors u_1, u_2, ..., u_m of \tilde{R} corresponding to the m largest eigenvalues λ_1 ≥ λ_2 ≥ ... ≥ λ_m. Then, by the SVD technique, the orthonormal eigenvectors w_1, w_2, ..., w_m of \tilde{C} corresponding to the m largest eigenvalues λ_1, λ_2, ..., λ_m are

w_j = \frac{1}{\sqrt{\lambda_j}} Q u_j, \quad j = 1, \dots, m. \qquad (9)

Now, a question arises: how do we obtain the eigenvectors of the covariance matrix C? Actually, if we centralize \tilde{R} by

R = \tilde{R} - 1_M \tilde{R} - \tilde{R} 1_M + 1_M \tilde{R} 1_M \qquad (10)

(where (1_M)_{ij} = 1/M) and compute R's eigenvectors u_1, u_2, ..., u_m, we can obtain C's eigenvectors w_1, w_2, ..., w_m from Eq. (9); for more details refer to Ref. [12].

After the projection of the mapped sample Φ(x) onto the eigenvector w_j, we obtain the j-th feature

y_j = w_j^T \Phi(x) = \frac{1}{\sqrt{\lambda_j}} u_j^T Q^T \Phi(x) = \frac{1}{\sqrt{\lambda_j}} u_j^T (k(x_1, x), k(x_2, x), \dots, k(x_M, x))^T, \quad j = 1, \dots, m. \qquad (11)

The resulting features y_1, y_2, ..., y_m form a KPCA-transformed feature vector Y = (y_1, y_2, ..., y_m)^T for sample x.

2.3. Complex Fisher linear discriminant analysis

Suppose A and B are two feature spaces defined on the pattern sample space Ω. Given an arbitrary sample ξ ∈ Ω, suppose the corresponding two feature vectors are α ∈ A and β ∈ B. Our idea is to combine them by a complex vector γ = α + iβ (i is the imaginary unit), which represents the combined feature of ξ. Note that if the dimensions of α and β are not equal, we pad the lower-dimensional one with zeros until its dimension equals the other's before combination.

We define the combined feature space on Ω by C = {α + iβ | α ∈ A, β ∈ B}. Obviously, it is an n-dimensional complex vector space, where n = max{dim A, dim B}. In this space, the inner product is defined by

(X, Y) = X^H Y \qquad (12)

where X, Y ∈ C and H denotes the conjugate transpose. The space equipped with the above inner product is usually called a unitary space. In this space, the between-class scatter matrix, within-class scatter matrix and total scatter matrix are, respectively, defined as follows:

S_b = \sum_{i=1}^{c} P(\omega_i)(m_i - m_0)(m_i - m_0)^H \qquad (13)

S_w = \sum_{i=1}^{c} P(\omega_i) E\{(X - m_i)(X - m_i)^H \mid \omega_i\} \qquad (14)

S_t = S_b + S_w = E\{(X - m_0)(X - m_0)^H\} \qquad (15)

where P(ω_i) is the prior probability of class i, m_i is the mean vector of the training samples in class i, and m_0 is the mean vector across all training samples. From Eqs. (13)-(15), we know that S_b, S_w and S_t are all semi-positive definite Hermitian matrices [17]. What is more, S_w and S_t are both positive definite when S_w is nonsingular. In the following discussion, S_w is assumed to be nonsingular.

The complex Fisher discriminant criterion function can be defined by

J(w) = \frac{w^H S_b w}{w^H S_w w} \qquad (16)

where w is an n-dimensional nonzero vector. Since S_w is positive definite and S_b is semi-positive definite, for any w we have w^H S_b w ≥ 0 and w^H S_w w > 0. This means that the values of J(w) are all nonnegative real numbers. So, the physical meaning of the complex Fisher criterion defined in unitary space is similar to that of the classical Fisher criterion defined in Euclidean space. The vector w* maximizing the criterion is called the complex Fisher optimal projection direction.
Its physical meaning is that, after the projection of samples onto w*, the ratio of the between-class scatter to the within-class scatter is maximized in unitary space. Actually, w* can be chosen as the generalized eigenvector of the generalized eigenequation S_b X = λ S_w X corresponding to the largest eigenvalue. However, a single optimal projection direction is generally not enough in real-world applications. So, we intend to find a set of projection axes w_1, ..., w_d that maximize the criterion function J(w) subject to the following S_t-orthogonality constraints:

w_j^H S_t w_i = \delta_{ij} = \begin{cases} 1 & i = j \\ 0 & i \neq j \end{cases}, \quad i, j = 1, \dots, d \qquad (17)

Note that here, in order to eliminate the statistical correlation between the LDA-transformed features, the projection axes are required to satisfy the S_t-orthogonality constraints rather than the usual orthogonality constraints [7,17,19]. Yang et al. [17] have proved that the optimal projection axes w_1, ..., w_d (d ≤ c − 1, where c is the number of classes) can be selected as the S_t-orthogonal generalized eigenvectors corresponding to the d largest eigenvalues of the generalized eigenequation S_b X = λ S_w X. The following algorithm can be used to determine the optimal projection axes w_1, ..., w_d.

Complex LDA algorithm.
Step 1: Get the pre-whitening transformation matrix W such that W^H S_w W = I. In fact, W = V Λ^{−1/2}, where V = (v_1, ..., v_n), Λ = diag(a_1, ..., a_n), v_1, ..., v_n are the eigenvectors of S_w, and a_1, ..., a_n are the associated eigenvalues.
Step 2: Let \tilde{S}_b = W^H S_b W, and calculate its orthonormal eigenvectors h_1, ..., h_d corresponding to the d largest eigenvalues. Then, the optimal projection axes are w_1 = W h_1, ..., w_d = W h_d.

The resulting optimal projection axes w_1, ..., w_d are used for feature extraction in the combined complex feature space.
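A minimal NumPy sketch of the two-step algorithm above (not from the original paper); the class layout, sizes, and the helper name `complex_lda` are illustrative assumptions, and the toy data are random complex stand-ins for combined feature vectors.

```python
import numpy as np

def complex_lda(X, labels, d):
    """Complex LDA (Steps 1-2 above) on combined complex features.

    X: (n, M) complex matrix whose columns are combined feature vectors.
    Returns (P, Sw, Sb): axes P = (w_1, ..., w_d) and the scatter
    matrices of Eqs. (13)-(14).
    """
    n, M = X.shape
    m0 = X.mean(axis=1, keepdims=True)
    Sb = np.zeros((n, n), dtype=complex)
    Sw = np.zeros((n, n), dtype=complex)
    for k in np.unique(labels):
        Xk = X[:, labels == k]
        mk = Xk.mean(axis=1, keepdims=True)
        prior = Xk.shape[1] / M                         # P(omega_k)
        Sb += prior * (mk - m0) @ (mk - m0).conj().T    # Eq. (13)
        Dk = Xk - mk
        Sw += prior * (Dk @ Dk.conj().T) / Xk.shape[1]  # Eq. (14)
    # Step 1: pre-whitening transform W with W^H Sw W = I.
    a, V = np.linalg.eigh(Sw)                           # Sw is Hermitian
    W = V / np.sqrt(a)
    # Step 2: eigenvectors of the whitened between-class scatter.
    mu, H = np.linalg.eigh(W.conj().T @ Sb @ W)
    P = W @ H[:, ::-1][:, :d]                           # d largest eigenpairs
    return P, Sw, Sb

# Toy data: 3 classes, 4 complex dimensions, 20 samples per class.
rng = np.random.default_rng(1)
labels = np.repeat([0, 1, 2], 20)
centers = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
X = centers[:, labels] + 0.3 * (rng.standard_normal((4, 60))
                                + 1j * rng.standard_normal((4, 60)))
P, Sw, Sb = complex_lda(X, labels, d=2)
Z = P.conj().T @ X                                      # discriminant features, cf. Eq. (18)
```

By construction the returned axes whiten the within-class scatter (P^H S_w P = I) and diagonalize S_t = S_b + S_w, i.e. they are S_t-orthogonal in the sense of Eq. (17) up to scale.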
Given a sample and the corresponding combined feature vector Y, we can obtain its discriminant feature vector Z by the following transformation:

Z = P^H Y, \quad \text{where } P = (w_1, \dots, w_d) \qquad (18)

The feature extraction method described above is called Complex LDA. Obviously, the traditional LDA is a special case of Complex LDA.

3. Combined Fisherfaces framework

In this section, a Complex LDA based combined Fisherfaces framework (coined Complex Fisherfaces) is developed for face image feature extraction and recognition. In this framework, PCA and KPCA are both used for feature extraction in the first phase. In the second phase, the PCA-based linear features and KPCA-based nonlinear features are integrated by complex vectors, which are input into a feature fusion container called Complex LDA for a second feature extraction. Finally, the resulting Complex LDA transformed features are used for classification. This framework is illustrated in Fig. 1.

Fig. 1. Illustration of the combined Fisherfaces framework (Complex Fisherfaces).

For the current PCA plus LDA two-phase algorithms, in order to avoid the difficulty of the within-class scatter matrix being singular in the LDA phase, a feasible way is to discard the subordinate components (those corresponding to small eigenvalues) in the PCA phase [5,6]. Generally, only m principal components of PCA (KPCA) are retained and used to form the feature vectors, where m is subject to c ≤ m ≤ M − c. Liu and Wechsler [9,20] pointed out that a constraint that merely makes the within-class scatter matrix nonsingular still cannot guarantee good generalization for LDA, because the trailing eigenvalues of the within-class scatter matrix tend to capture more noise when they are too small. Taking the generalization of LDA into account, we select m = c components of PCA (KPCA) in the above framework.
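The KPCA step of the first phase follows Section 2.2; a compact NumPy sketch (an illustration added here, not part of the original paper) with the polynomial kernel later used in Section 4, on random stand-in data:

```python
import numpy as np

def kpca_features(X, m, q=2):
    """KPCA features of the training samples via the kernel matrix.

    X: (n, M) data matrix (columns are samples); m: number of components;
    polynomial kernel k(x, y) = (x.y + 1)^q as in Section 4.
    """
    M = X.shape[1]
    K = (X.T @ X + 1.0) ** q                        # K_ij = k(x_i, x_j), Eq. (8)
    ones = np.full((M, M), 1.0 / M)                 # the matrix 1_M of Eq. (10)
    Kc = K - ones @ K - K @ ones + ones @ K @ ones  # centralized kernel matrix
    lam, U = np.linalg.eigh(Kc)
    lam, U = lam[::-1][:m], U[:, ::-1][:, :m]       # m largest eigenpairs
    # Columns of Kc hold the (centered) kernel evaluations of Eq. (11).
    Y = (U / np.sqrt(lam)).T @ Kc
    return Y

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 30))                   # 30 samples of dimension 50
Y = kpca_features(X, m=5)                           # (5, 30) feature matrix
```

For an unseen test point the kernel vector against the training set would need the same centering; the sketch above only transforms the training samples.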
As we know, differences between feature extraction techniques or measurement scales may lead to numerical unbalance between two sets of features of the same pattern [17]. In order to counteract this numerical unbalance and obtain satisfactory fusion performance, the original feature vectors should be normalized before the combination step. Given an original feature vector X, we use the following preprocessing technique to get the normalized feature vector Y:

Y = \frac{X - \mu}{\sigma} \qquad (19)

where μ denotes the mean vector of the training samples and

\sigma = \frac{1}{n} \sum_{j=1}^{n} \sigma_j,

where n is the dimension of X and σ_j is the standard deviation of the j-th feature component of the training samples.

Suppose the PCA-based feature vector is denoted by α and the KPCA-based feature vector by β; they are both c-dimensional vectors. After the normalization process, we get the normalized feature vectors ᾱ and β̄. Combining them by a complex vector, i.e. γ = ᾱ + iβ̄, we get a c-dimensional combined feature space. In this space, the developed Complex LDA is exploited for feature extraction and fusion. This method is named Complex Fisherfaces.

Besides the complex-vector-based method suggested above, there is an alternative feature fusion strategy, which was adopted in Ref. [9] and called serial feature fusion in Ref. [17]. Specifically, given two feature vectors α and β of the same pattern, after normalization they are combined into a super-vector

\gamma = \begin{pmatrix} \bar{\alpha} \\ \bar{\beta} \end{pmatrix}.

Then, the traditional LDA is employed for a second feature extraction. If the two sets of feature vectors α and β are chosen as the c-dimensional PCA-based and KPCA-based feature vectors, the resulting method based on this strategy is called Serial Fisherfaces in this paper. It should be pointed out that Serial Fisherfaces is different from the method proposed in Ref. [9], where the two sets of features involved in fusion are shape- and texture-based features.
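A small NumPy sketch (added here for illustration) of the normalization of Eq. (19) and the two combination strategies; the feature matrices are random stand-ins for the PCA- and KPCA-phase outputs, deliberately given different scales.

```python
import numpy as np

def normalize(F, F_train):
    """Eq. (19): subtract the training mean and divide by the average
    per-component standard deviation of the training features."""
    mu = F_train.mean(axis=1, keepdims=True)
    sigma = F_train.std(axis=1).mean()           # scalar: mean of the sigma_j
    return (F - mu) / sigma

rng = np.random.default_rng(0)
A = 10.0 * rng.standard_normal((200, 600))       # stand-in PCA features (c = 200)
B = 0.1 * rng.standard_normal((200, 600))        # stand-in KPCA features, other scale
An, Bn = normalize(A, A), normalize(B, B)

gamma_parallel = An + 1j * Bn                    # complex (parallel) fusion: still 200-dim
gamma_serial = np.vstack([An, Bn])               # serial fusion: 400-dim super-vector
print(gamma_parallel.shape, gamma_serial.shape)  # (200, 600) (400, 600)
```

The shapes make the dimensional argument of Section 4.2 concrete: parallel fusion keeps the dimension at 200, while the super-vector doubles it to 400.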
Since extraction of the shape- and texture-based features needs manual operations, Liu's combined Fisher classifier method [9] is semi-automatic, whereas the proposed Serial Fisherfaces and Complex Fisherfaces are both fully automatic methods.

4. Experiments and analysis

The FERET face image database is a result of the FERET program, which was sponsored by the Department of Defense through the DARPA Program [21,22]. It has become a standard database for testing and evaluating state-of-the-art face recognition algorithms. The proposed algorithm was applied to face recognition and tested on a subset of the FERET database. This subset includes 1400 images of 200 individuals (seven images per individual). It is composed of the images whose names are marked with the two-character strings 'ba', 'bj', 'bk', 'be', 'bf', 'bd', and 'bg'. These strings indicate the kind of imagery, as shown in Table 1. The subset involves variations in facial expression, illumination, and pose (±15° and ±25°).

Table 1
The two-letter strings in the image name indicate the kind of imagery [22]

Two-letter mark | Pose angle (degrees) | Description                                                 | Number of subjects
ba              | 0                    | Frontal 'b' series                                          | 200
bj              | 0                    | Alternative expression to 'ba'                              | 200
bk              | 0                    | Different illumination to 'ba'                              | 200
bd              | +25                  | Subject faces to his left, which is the photographer's right | 200
be              | +15                  | Subject faces to his left, which is the photographer's right | 200
bf              | −15                  | Subject faces to his right, which is the photographer's left | 200
bg              | −25                  | Subject faces to his right, which is the photographer's left | 200

In our experiment, the facial portion of each original image was cropped based on the location of the eyes, and the cropped image was resized to 80 × 80 pixels and pre-processed by histogram equalization. Some example images of one person are shown in Fig. 2.

4.1. Experiments

In our experiment, three images of each subject are randomly chosen for training, while the remaining images are used for testing. Thus, the total number of training samples is 600 and the total number of testing samples is 800.
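The random per-subject split described above can be sketched as simple index bookkeeping (an illustration added here; the index layout of 200 subjects × 7 images is an assumption):

```python
import numpy as np

# 200 subjects x 7 images, stored subject by subject; pick 3 random
# images per subject for training and keep the other 4 for testing.
rng = np.random.default_rng(0)
train_idx, test_idx = [], []
for s in range(200):
    imgs = np.arange(s * 7, (s + 1) * 7)   # indices of subject s's 7 images
    chosen = rng.permutation(7)[:3]        # 3 random training positions
    train_idx.extend(imgs[chosen])
    test_idx.extend(np.delete(imgs, chosen))
print(len(train_idx), len(test_idx))       # 600 800
```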
Fisherfaces [6], Kernel Fisherfaces [16], Complex Fisherfaces, and Serial Fisherfaces are each used for feature extraction. As in Complex Fisherfaces and Serial Fisherfaces, 200 principal components (m = c = 200) are chosen in the first phase of Fisherfaces and Kernel Fisherfaces. Yang [16] has demonstrated that a second- or third-order polynomial kernel suffices to achieve good results with less computation than other kernels. So, for consistency with Yang's studies, the polynomial kernel k(x, y) = (x·y + 1)^q with q = 2 is adopted here for all kernel-related methods. Finally, a minimum Euclidean distance classifier is employed. Note that in the complex fusion space (unitary space), the Euclidean distance between a sample Z and pattern class l is defined by

g_l(Z) = \sqrt{(Z - m_l)^H (Z - m_l)} \qquad (20)

where m_l denotes the mean vector of the training samples in class l. The decision rule of the minimum distance classifier is: if sample Z satisfies g_k(Z) = min_l g_l(Z), then Z belongs to class k.

Fig. 2. Images of one person in the FERET database. (a) Original images; (b) cropped images corresponding to the images in (a).

The classification results are illustrated in Fig. 3. Note that in Fig. 3, for each method, we only show the recognition rates within the interval where the dimension (the number of features) varies from 6 to 26. This is because the maximal recognition rates of Fisherfaces, Kernel Fisherfaces and Complex Fisherfaces all occur within this interval, and their recognition rates begin to decrease once the dimension exceeds 26. Here, we pay less attention to Serial Fisherfaces because its recognition rates are overall below 70%.

Fig. 3. The recognition rates of Fisherfaces, Kernel Fisherfaces, Complex Fisherfaces, and Serial Fisherfaces on test 1.

From Fig.
3, we can see that the performance of Complex Fisherfaces is consistently better than that of Fisherfaces and Kernel Fisherfaces. Fisherfaces can only utilize the linear discriminant information, while Kernel Fisherfaces can only utilize the nonlinear discriminant information. In contrast, Complex Fisherfaces is able to make use of both kinds of discriminant information, which turn out to be complementary in achieving a better result. In comparison, the performance of Serial Fisherfaces is not satisfying: its recognition rates are overall lower than those of Fisherfaces and Kernel Fisherfaces.

Now, a question arises: does the above result depend on the choice of training set? In other words, if another set of training samples were chosen at random, would we obtain a similar result? To answer this question, we repeated the above experiment ten times. Each time, the training sample set was selected at random, so that the training sample sets differ across the ten tests (correspondingly, the testing sets also differ). For each method and four different dimensions (dimension = 18, 20, 22, 24), the mean and standard deviation of the recognition rates across the ten tests are illustrated in Fig. 4. We chose these dimensions because, as can be seen from Fig. 3, the maximal recognition rates of Fisherfaces, Kernel Fisherfaces and Complex Fisherfaces all occur in the interval where the dimension varies from 18 to 24. Also, for each method mentioned above, the average recognition rates across the ten tests and four dimensions are listed in Table 2. Moreover, Table 2 also gives the results of Eigenfaces [3] and Kernel Eigenfaces [16].

Fig. 4. The mean and standard deviation of the recognition rates of Fisherfaces, Kernel Fisherfaces, Complex Fisherfaces, and Serial Fisherfaces across 10 tests when the dimension = 18, 20, 22, 24, respectively.

Table 2
The average recognition rates (%) of Eigenfaces, Kernel Eigenfaces, Fisherfaces, Kernel Fisherfaces, Complex Fisherfaces, and Serial Fisherfaces across ten tests and four dimensions (18, 20, 22, 24)

Eigenfaces | Kernel Eigenfaces | Fisherfaces | Kernel Fisherfaces | Complex Fisherfaces | Serial Fisherfaces
17.53      | 16.94             | 77.87       | 77.16              | 80.30               | 66.96

Fig. 4 shows that Complex Fisherfaces outperforms Fisherfaces and Kernel Fisherfaces irrespective of the training sample set and the chosen dimension. The mean recognition rate of Complex Fisherfaces is over 2% higher than those of Fisherfaces and Kernel Fisherfaces, and its standard deviation is always between or below those of Fisherfaces and Kernel Fisherfaces. However, Serial Fisherfaces does not perform well across these trials and dimensional variations: its mean recognition rate is lower, and its standard deviation larger, than those of the other methods. Table 2 shows that Fisherfaces, Kernel Fisherfaces and Complex Fisherfaces are all significantly superior to Eigenfaces and Kernel Eigenfaces. This indicates that LDA (or KFD) really helps improve the performance of PCA (or KPCA) for face recognition.

4.2. Comparison analysis: why does Complex Fisherfaces outperform Serial Fisherfaces?

In our opinion, the underlying reason is that the parallel feature fusion strategy based on complex vectors is more suitable for small sample size problems like face recognition than the serial strategy based on super-vectors. For small sample size problems, the higher the dimension of the feature vector, the more difficult it is to evaluate the scatter matrices accurately. If we combine two sets of features of a sample serially into a super-vector, the dimension of the feature vector doubles. For instance, in the above experiment the resulting super-vector γ = (ᾱ^T, β̄^T)^T is 400-dimensional after the two 200-dimensional vectors ᾱ and β̄ are combined. Thus, after serial combination, it becomes more difficult than before to evaluate the scatter matrices (based on a relatively small number of training samples). In contrast, the parallel feature fusion strategy based on complex vectors avoids this difficulty, since the dimension remains unchanged after combination. In our experiments, using the parallel strategy, the size of the scatter matrices is still 200 × 200, just as before fusion, whereas it becomes 400 × 400 when the serial strategy is used. Taking the characteristics of complex matrices into account, the amount of data needing evaluation using the complex (parallel) combination is only half of that using the serial combination. Note that for an l × l complex matrix, there are 2l² real numbers involved, since each complex element is formed by two real numbers. Consequently, it is much easier to evaluate the corresponding scatter matrices using Complex Fisherfaces than using Serial Fisherfaces.

Actually, a more specific explanation can be given from the magnitude criterion point of view [9,20]. Liu and Wechsler [9,20] argued that, for good generalization, the trailing eigenvalues of the within-class scatter matrix should not be too small. Let us calculate the ten smallest eigenvalues of the within-class scatter matrix in the two different fusion spaces and list them in Table 3.

Table 3
Ten smallest eigenvalues of the within-class scatter matrix in two different fusion spaces

Eigenvalue index               | 1     | 2     | 3     | 4    | 5    | 6    | 7    | 8    | 9    | 10
Serial fusion (unit: 1 × 10⁻⁵)  | 2.45  | 2.26  | 2.14  | 1.94 | 1.90 | 1.72 | 1.69 | 1.44 | 1.37 | 1.13
Complex fusion (unit: 1 × 10⁻³) | 15.1  | 13.6  | 11.7  | 9.5  | 7.9  | 7.0  | 5.8  | 4.7  | 3.9  | 2.6

Table 3 shows that the trailing eigenvalues of the within-class scatter matrix in the serial fusion space are much smaller than those in the complex (parallel) fusion space.
This fact indicates that Complex Fisherfaces should have better generalization than Serial Fisherfaces.

5. Conclusion

In this paper, an effective feature fusion framework, Complex Fisherfaces, was developed for face recognition. In this framework, PCA-based linear features and KPCA-based nonlinear features are integrated by complex vectors, and the developed Complex LDA is then employed for feature extraction and fusion. The resulting Complex LDA discriminant features contain two kinds of discriminant information, linear and nonlinear. This characteristic makes Complex Fisherfaces more powerful than Fisherfaces and Kernel Fisherfaces.

We also compared the performance of Complex Fisherfaces with that of Serial Fisherfaces and analyzed why Serial Fisherfaces cannot achieve satisfying performance. Serial Fisherfaces combines two sets of features via super-vectors, which doubles the dimension. The increased dimension makes it more difficult to evaluate the scatter matrices accurately and renders the trailing eigenvalues of the within-class scatter matrix too small. These small trailing eigenvalues capture more noise and degrade the generalization of Serial Fisherfaces. In comparison, Complex Fisherfaces avoids these disadvantages and thus generalizes well.

Acknowledgements

This work is partially supported by the National Science Foundation of China under Grant No. 60072034, and by grant TIC2002-04495-C02 from the Spanish Ministry of Science and Technology (MCyT). Jian Yang and Alejandro F. Frangi are also supported by a Ramón y Cajal Research Fellowship from MCyT. Finally, we would like to thank the anonymous reviewers for their constructive advice.

References

[1] L. Sirovich, M. Kirby, Low-dimensional procedure for the characterization of human faces, J. Opt. Soc. Am. A 4 (1987) 519–524.
[2] M. Kirby, L.
Sirovich, Application of the Karhunen–Loève procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 103–108.
[3] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3 (1) (1991) 71–86.
[4] A. Pentland, Looking at people: sensing for ubiquitous and wearable computing, IEEE Trans. Pattern Anal. Mach. Intell. 22 (1) (2000) 107–119.
[5] D.L. Swets, J. Weng, Using discriminant eigenfeatures for image retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996) 831–836.
[6] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[7] Z. Jin, J.-Y. Yang, Z.S. Hu, Z. Lou, Face recognition based on the uncorrelated discriminant transformation, Pattern Recogn. 34 (7) (2001) 1405–1416.
[8] L.F. Chen, H.Y.M. Liao, M.T. Ko, J.C. Lin, G.J. Yu, A new LDA-based face recognition system which can solve the small sample size problem, Pattern Recogn. 33 (10) (2000) 1713–1726.
[9] C.J. Liu, H. Wechsler, A shape- and texture-based enhanced Fisher classifier for face recognition, IEEE Trans. Image Process. 10 (4) (2001) 598–608.
[10] J. Yang, J.-Y. Yang, Why can LDA be performed in PCA transformed space?, Pattern Recogn. 36 (2) (2003) 563–566.
[11] J. Yang, J.-Y. Yang, Optimal FLD algorithm for facial feature extraction, in: SPIE Proc. Intelligent Robots and Computer Vision XX: Algorithms, Techniques, and Active Vision, vol. 4572, October 2001, pp. 438–444.
[12] B. Schölkopf, A. Smola, K.-R. Müller, Nonlinear component analysis as a kernel eigenvalue problem, Neural Comput. 10 (5) (1998) 1299–1319.
[13] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller, Fisher discriminant analysis with kernels, in: IEEE International Workshop on Neural Networks for Signal Processing IX, Madison, WI, USA, August 1999, pp. 41–48.
[14] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, K.-R.
Müller, Constructing descriptive and discriminative nonlinear features: Rayleigh coefficients in kernel feature spaces, IEEE Trans. Pattern Anal. Mach. Intell. 25 (5) (2003) 623–628.
[15] G. Baudat, F. Anouar, Generalized discriminant analysis using a kernel approach, Neural Comput. 12 (10) (2000) 2385–2404.
[16] M.H. Yang, Kernel Eigenfaces vs. Kernel Fisherfaces: face recognition using kernel methods, in: Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR'02), Washington, DC, May 2002, pp. 215–220.
[17] J. Yang, J.-Y. Yang, D. Zhang, J.F. Lu, Feature fusion: parallel strategy vs. serial strategy, Pattern Recogn. 36 (6) (2003) 1369–1381.
[18] G.H. Golub, C.F. Van Loan, Matrix Computations, third ed., The Johns Hopkins University Press, Baltimore, 1996.
[19] J. Yang, J.-Y. Yang, D. Zhang, What's wrong with the Fisher criterion?, Pattern Recogn. 35 (11) (2002) 2665–2668.
[20] C.J. Liu, H. Wechsler, Robust coding schemes for indexing and retrieval from large face databases, IEEE Trans. Image Process. 9 (1) (2000) 132–137.
[21] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation methodology for face-recognition algorithms, IEEE Trans. Pattern Anal. Mach. Intell. 22 (10) (2000) 1090–1104.
[22] P.J. Phillips, The Facial Recognition Technology (FERET) Database, http://www.itl.nist.gov/iad/humanid/feret/feret_master.html.

Jian Yang was born in Jiangsu, China, on 3 June 1973. He obtained his Bachelor of Science in Mathematics at Xuzhou Normal University, China, in 1995. He then completed a Master of Science degree in Applied Mathematics at Changsha Railway University in 1998, and his PhD at the Nanjing University of Science and Technology in the Department of Computer Science, on the subject of Pattern Recognition and Intelligent Systems, in 2002. From March to September 2002, he worked as a research assistant at the Department of Computing, Hong Kong Polytechnic University.
He is now a postdoctoral research fellow at the University of Zaragoza and is affiliated with the Division of Bioengineering of the Aragon Institute of Engineering Research (I3A). He is the author of more than 20 scientific papers in pattern recognition and computer vision. His current research interests include face recognition and detection, handwritten character recognition, and data fusion.

Jing-yu Yang received the BS degree in Computer Science from Nanjing University of Science and Technology (NUST), Nanjing, China. From 1982 to 1984, he was a visiting scientist at the Coordinated Science Laboratory, University of Illinois at Urbana-Champaign. From 1993 to 1994, he was a visiting professor at the Department of Computer Science, University of Missouri, and in 1998 he was a visiting professor at Concordia University in Canada. He is currently a professor and Chairman of the Department of Computer Science at NUST. He is the author of over 100 scientific papers in computer vision, pattern recognition, and artificial intelligence, and has won more than 20 provincial and national awards. His current research interests are in the areas of pattern recognition, robot vision, image processing, data fusion, and artificial intelligence.

Alejandro F. Frangi obtained his MSc degree in Telecommunication Engineering from the Universitat Politècnica de Catalunya, Barcelona, in 1996, where he subsequently did research on Electrical Impedance Tomography for image reconstruction and noise characterization. In 2001, he obtained his PhD at the Image Sciences Institute of the University Medical Center Utrecht on model-based cardiovascular image analysis. The same year, Dr Frangi moved to Zaragoza, Spain. He is now Assistant Professor at the University of Zaragoza and is affiliated with the Division of Biomedical Engineering of the Aragon Institute of Engineering Research (I3A), a multidisciplinary research institute of the University of Zaragoza.
He is a member of the Computer Vision Group, and his main research interests are in computer vision and medical image analysis, with particular emphasis on model- and registration-based techniques and statistical methods. Dr Frangi has recently been awarded the Ramón y Cajal Research Fellowship, a national program of the Spanish Ministry of Science and Technology.