Image and Vision Computing 21 (2003) 1037–1044
www.elsevier.com/locate/imavis
Combined Fisherfaces framework
Jian Yang a,b,*, Jing-yu Yang a, Alejandro F. Frangi b
a Department of Computer Science, Nanjing University of Science and Technology, Nanjing 210094, China
b Computer Vision Group, Aragon Institute of Engineering Research, Universidad de Zaragoza, E-50018 Zaragoza, Spain
Received 19 October 2001; received in revised form 30 June 2003; accepted 3 July 2003
Abstract
In this paper, a Complex LDA based combined Fisherfaces framework, coined Complex Fisherfaces, is developed for face feature extraction and recognition. In this framework, Principal Component Analysis (PCA) and Kernel PCA (KPCA) are first used for feature extraction. Then, the resulting PCA-based linear features and KPCA-based nonlinear features are integrated via complex vectors, and Complex LDA is further employed for feature fusion. The proposed method is tested on a subset of the FERET database. The experimental results demonstrate that Complex Fisherfaces outperforms Fisherfaces and Kernel Fisherfaces. Also, the complex vector based parallel feature fusion strategy is demonstrated to be much more effective and robust than the super-vector based serial feature fusion strategy for face recognition.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Fisher linear discriminant analysis; Principal component analysis; Kernel based methods; Fisherfaces; Eigenfaces; Feature extraction; Face
recognition
1. Introduction
Linear feature extraction methods have been widely used in face recognition. Among them, the two most famous techniques are Eigenfaces and Fisherfaces. Eigenfaces is based on the technique of Principal Component Analysis (PCA); it was originally proposed by Kirby and Sirovich [1,2] and popularized by Turk and Pentland [3,4]. Fisherfaces is based on Fisher Linear Discriminant Analysis (LDA) and was put forward by Swets and Weng [5] and Belhumeur et al. [6]. Fisherfaces has been shown to be better than Eigenfaces, particularly under conditions where variations in illumination and facial expression occur. Recently developed face recognition systems, such as those in Refs. [7-11], are all based on LDA. Since face recognition is typically a small sample size problem, LDA always becomes ill-posed due to the singularity of the within-class scatter matrix, which makes the traditional LDA algorithm inapplicable. To avoid this difficulty,
* Corresponding author. Address: Computer Vision Group, Aragon Institute of Engineering Research, Universidad de Zaragoza, E-50018, Zaragoza, Spain. Tel.: +34-976-762158; fax: +34-976-762111.
E-mail addresses: jyang@unizar.es (J. Yang), yangjy@mail.njust.edu.cn (J.-y. Yang), afrangi@unizar.es (A.F. Frangi).
0262-8856/$ - see front matter © 2003 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2003.07.005
a popular way is to adopt a two-phase strategy [5,6]: PCA is first used to transform the original image vector space into an m-dimensional one (c ≤ m ≤ M − c, where M is the number of training samples and c is the number of pattern classes), and then LDA is applied to further reduce the dimension to d (d ≤ c − 1).
Recently, kernel-based nonlinear feature extraction techniques have attracted wide attention. One is Kernel Principal Component Analysis (KPCA or Kernel PCA), and the other is Kernel Fisher Discriminant Analysis (KFD). KPCA was originally developed by Schölkopf et al. in 1998 [12]. Subsequently, KFD was proposed independently by Mika et al. [13,14] and Baudat and Anouar [15]. Mika's work is mainly focused on two-class problems, while Baudat's algorithm is applicable to multiclass problems. Yang [16] has applied KPCA and KFD to face recognition and proposed the techniques of Kernel Eigenfaces and Kernel Fisherfaces.
Now, given a face image, it is easy to derive its linear features using PCA and its nonlinear features using KPCA. Intuitively, these two kinds of features should be complementary. In this paper, we try to integrate them into a unified framework. Our strategy is to combine PCA-transformed feature vectors and KPCA-transformed feature vectors via complex vectors, and then to input them into a feature fusion container called Complex Fisher Linear Discriminant
Analysis (Complex LDA) for a second feature extraction. The developed method is called Complex Fisherfaces. We also try an alternative strategy to combine the PCA-transformed and KPCA-transformed features, i.e. aligning them into a super-vector and then performing LDA. This approach is called Serial Fisherfaces. However, our experimental results show that Serial Fisherfaces is not as effective as Complex Fisherfaces.
The remainder of this paper is organized as follows. In Section 2, we outline the methods of PCA (Eigenfaces), KPCA, and Complex LDA. The combined Fisherfaces framework is built in Section 3. In Section 4, the proposed Complex Fisherfaces method is tested on the FERET database and compared with other methods such as Fisherfaces, Kernel Fisherfaces, and Serial Fisherfaces. Finally, a brief conclusion is offered in Section 5.
2. Methods

2.1. Principal component analysis

Given a set of M training samples x_1, x_2, …, x_M in R^n, the total covariance matrix can be formed by

C = (1/M) Σ_{j=1}^{M} (x_j − x̄)(x_j − x̄)^T    (1)

where x̄ denotes the mean vector of all training samples.

The orthonormal eigenvectors w_1, w_2, …, w_m of C corresponding to the m largest eigenvalues are chosen as projection axes. Generally, they can be calculated directly. However, for problems like face recognition, the dimension of the samples (image vectors) is always very high, so it is considerably time-consuming to calculate C's eigenvectors directly. Fortunately, we can use the Singular Value Decomposition (SVD) technique to determine the required eigenvectors.

Let Q = [x_1 − x̄, …, x_M − x̄]; then C can be represented by C = (1/M) Q Q^T.

Let us form the matrix R = Q^T Q, which is an M × M positive semi-definite matrix. In face recognition, since the number of training image samples is usually much smaller than the dimension of the image vector, the size of R is much smaller than that of C, so it is easier to obtain its eigenvectors.

Let us calculate the orthonormal eigenvectors u_1, u_2, …, u_m of R corresponding to the m largest eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_m. Then, by the SVD theorem [18], the orthonormal eigenvectors w_1, w_2, …, w_m of C corresponding to the m largest eigenvalues λ_1, λ_2, …, λ_m are

w_j = (1/√λ_j) Q u_j,  j = 1, …, m    (2)

After the projection of sample x onto the eigenvector w_j, we obtain the j-th feature

y_j = w_j^T x = (1/√λ_j) u_j^T Q^T x,  j = 1, …, m    (3)

The resulting features y_1, y_2, …, y_m form the PCA-transformed feature vector Y = (y_1, y_2, …, y_m)^T of sample x.

2.2. Kernel principal component analysis

In this section, we try to give a more transparent description of KPCA by comparing its technique to that of PCA.

First of all, a nonlinear mapping Φ is used to map the input data space R^n into the feature space F:

Φ : R^n → F,  x ↦ Φ(x)    (4)

Correspondingly, a pattern in the original input space R^n is mapped into a potentially much higher dimensional feature vector in the feature space F.

The initial motivation of KPCA is to perform PCA in the feature space F. Let us construct the covariance matrix in the feature space F:

C = (1/M) Σ_{j=1}^{M} (Φ(x_j) − Φ̄)(Φ(x_j) − Φ̄)^T    (5)

where Φ̄ = (1/M) Σ_{j=1}^{M} Φ(x_j).

However, it is not easy to centralize the data directly in the feature space F. To avoid this difficulty, let us first consider the following non-centralized covariance matrix:

C̃ = (1/M) Σ_{j=1}^{M} Φ(x_j) Φ(x_j)^T    (6)

It is computationally very intensive, or even impossible, to calculate C̃'s eigenvectors directly in a high-dimensional (even infinite-dimensional) feature space.

KPCA can be viewed as utilizing two key techniques to solve this problem artfully. One is the SVD technique adopted in Eigenfaces, and the other is the so-called kernel trick [12]. The SVD technique can be used to transform the eigenvector calculation problem of a large matrix into that of a small matrix (see the discussion in Section 2.1), and the kernel trick can be used to avoid the computation of dot products in the feature space by virtue of the following formula:

k(x_i, x_j) = (Φ(x_i) · Φ(x_j))    (7)

Specifically, let Q = [Φ(x_1), …, Φ(x_M)]; then C̃ can also be expressed by C̃ = (1/M) Q Q^T.
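As a concrete illustration, the small-matrix trick shared by Sections 2.1 and 2.2 can be sketched in a few lines of NumPy for the PCA case (Eqs. (1)-(3)); the random vectors below are stand-ins for image samples, and the sizes are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
M, n, m = 10, 200, 5            # M samples of dimension n; keep m components
X = rng.standard_normal((M, n))

xbar = X.mean(axis=0)
Q = (X - xbar).T                # n x M matrix Q = [x_1 - xbar, ..., x_M - xbar]

# Eigen-decompose the small M x M matrix R = Q^T Q instead of the
# n x n covariance C = (1/M) Q Q^T
R = Q.T @ Q
lam, U = np.linalg.eigh(R)                  # eigh returns ascending order
lam, U = lam[::-1][:m], U[:, ::-1][:, :m]   # m largest eigenpairs

# Eq. (2): w_j = Q u_j / sqrt(lambda_j) are orthonormal eigenvectors of Q Q^T
W = Q @ U / np.sqrt(lam)

# Eq. (3): PCA feature vector of a sample x
x = rng.standard_normal(n)
Y = W.T @ x
```

The projection axes come out orthonormal, and each column of W is an eigenvector of the full covariance, without ever forming the n × n matrix.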
Let us form the matrix R̃ = Q^T Q. By virtue of the kernel trick, we can determine the elements of the M × M matrix R̃ by

R̃_ij = Φ(x_i)^T Φ(x_j) = (Φ(x_i) · Φ(x_j)) = k(x_i, x_j)    (8)

Let us calculate the orthonormal eigenvectors u_1, u_2, …, u_m of R̃ corresponding to the m largest eigenvalues λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_m. Then, by the SVD technique, the orthonormal eigenvectors w_1, w_2, …, w_m of C̃ corresponding to the m largest eigenvalues λ_1, λ_2, …, λ_m are

w_j = (1/√λ_j) Q u_j,  j = 1, …, m    (9)

Now, a question arises: how do we get the eigenvectors of the covariance matrix C? Actually, if we centralize R̃ by

R = R̃ − 1_M R̃ − R̃ 1_M + 1_M R̃ 1_M    (10)

(where (1_M)_ij = 1/M) and get R's eigenvectors u_1, u_2, …, u_m, we can obtain C's eigenvectors w_1, w_2, …, w_m from Eq. (9); for more details refer to Ref. [12].

After the projection of the mapped sample Φ(x) onto the eigenvector w_j, we obtain the j-th feature

y_j = w_j^T Φ(x) = (1/√λ_j) u_j^T Q^T Φ(x) = (1/√λ_j) u_j^T [k(x_1, x), k(x_2, x), …, k(x_M, x)]^T,  j = 1, …, m    (11)

The resulting features y_1, y_2, …, y_m form the KPCA-transformed feature vector Y = (y_1, y_2, …, y_m)^T of sample x.
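A minimal sketch of the kernel-trick computation (Eqs. (8), (9) and (11)), again with random stand-in data and an assumed second-order polynomial kernel; for brevity the centering step of Eq. (10) is noted but not applied here:

```python
import numpy as np

rng = np.random.default_rng(1)
M, n, m = 12, 30, 4
X = rng.standard_normal((M, n))

def k(a, b):
    # an assumed polynomial kernel; any Mercer kernel k(x, y) = (Phi(x) . Phi(y)) works
    return (a @ b + 1.0) ** 2

# Eq. (8): the M x M kernel matrix R~_ij = k(x_i, x_j), no explicit Phi needed
Rt = np.array([[k(xi, xj) for xj in X] for xi in X])
# (For the centered covariance, Rt would first be centered via Eq. (10).)

lam, U = np.linalg.eigh(Rt)                 # ascending order
lam, U = lam[::-1][:m], U[:, ::-1][:, :m]   # m largest eigenpairs

# Eq. (11): KPCA features of a new sample x from kernel evaluations only
x = rng.standard_normal(n)
kx = np.array([k(xi, x) for xi in X])
Y = (U / np.sqrt(lam)).T @ kx
```

The implicit eigenvectors w_j = Q u_j / √λ_j are never formed explicitly; their unit norm in the feature space can still be verified through the kernel matrix.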
2.3. Complex Fisher linear discriminant analysis

Suppose A and B are two feature spaces defined on the pattern sample space Ω. Given an arbitrary sample ξ ∈ Ω, suppose the corresponding two feature vectors are α ∈ A and β ∈ B. Our idea is to combine them into a complex vector γ = α + iβ (i is the imaginary unit), which represents the combined feature of ξ. Note that if the dimensions of α and β are not equal, the lower-dimensional vector is padded with zeros until its dimension equals the other's before combination.

We define the combined feature space on Ω by C = {α + iβ | α ∈ A, β ∈ B}. Obviously, it is an n-dimensional complex vector space, where n = max{dim A, dim B}. In this space, the inner product is defined by

(X, Y) = X^H Y    (12)

where X, Y ∈ C and the superscript H denotes the conjugate transpose.

A space endowed with the above inner product is usually called a unitary space. In this space, the between-class scatter matrix, within-class scatter matrix, and total scatter matrix are, respectively, defined as follows:

S_b = Σ_{i=1}^{c} P(ω_i)(m_i − m_0)(m_i − m_0)^H    (13)

S_w = Σ_{i=1}^{c} P(ω_i) E{(X − m_i)(X − m_i)^H | ω_i}    (14)

S_t = S_b + S_w = E{(X − m_0)(X − m_0)^H}    (15)

where P(ω_i) is the prior probability of class i, m_i is the mean vector of the training samples in class i, and m_0 is the mean vector across all training samples.
From Eqs. (13)-(15), we know that S_b, S_w, and S_t are all positive semi-definite Hermitian matrices [17]. What is more, S_w and S_t are both positive definite when S_w is nonsingular. In the following discussion, S_w is assumed to be nonsingular.

The complex Fisher discriminant criterion function can be defined by

J(w) = (w^H S_b w) / (w^H S_w w)    (16)

where w is an n-dimensional nonzero vector.

Since S_w is positive definite and S_b is positive semi-definite, for any arbitrary w we have w^H S_b w ≥ 0 and w^H S_w w > 0. This means that the values of J(w) are all nonnegative real numbers. So, the physical meaning of the complex Fisher criterion defined in unitary space is similar to that of the classical Fisher criterion defined in Euclidean space. The vector w* maximizing the criterion is called the complex Fisher optimal projection direction. Its physical meaning is that, after the projection of samples onto w*, the ratio of the between-class scatter to the within-class scatter is maximized in unitary space. Actually, w* can be chosen as the generalized eigenvector of the generalized eigen-equation S_b X = λ S_w X corresponding to the largest eigenvalue.
However, a single optimal projection direction is generally not enough in real-world applications. So, we intend to find a set of projection axes w_1, …, w_d which maximize the criterion function J(w) subject to the following S_t-orthogonality constraints:

w_j^H S_t w_i = δ_ij,  where δ_ij = 1 for i = j and δ_ij = 0 for i ≠ j,  i, j = 1, …, d    (17)

Note that here, in order to eliminate the statistical correlation between the LDA-transformed features, the projection axes are required to satisfy the S_t-orthogonality constraints rather than the usual orthogonality constraints [7,17,19].

Yang et al. [17] have proved that the optimal projection axes w_1, …, w_d (d ≤ c − 1, where c is the number of classes) can be selected as the S_t-orthogonal generalized eigenvectors corresponding to the d largest eigenvalues of the generalized eigen-equation S_b X = λ S_w X. The following algorithm can be used to determine the optimal projection axes w_1, …, w_d:
Complex LDA algorithm
Step 1: Compute the pre-whitening transformation matrix W such that W^H S_w W = I. In fact, W = V Λ^(−1/2), where V = (v_1, …, v_n), Λ = diag(a_1, …, a_n), v_1, …, v_n are the orthonormal eigenvectors of S_w, and a_1, …, a_n are the associated eigenvalues.
Step 2: Let S̃_b = W^H S_b W, and calculate its orthonormal eigenvectors η_1, …, η_d corresponding to the d largest eigenvalues. The optimal projection axes are then w_1 = W η_1, …, w_d = W η_d.
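The two steps can be sketched with NumPy's complex arrays; the scatter matrices below are estimated from synthetic complex class data (equal priors assumed), not from the paper's face features:

```python
import numpy as np

rng = np.random.default_rng(2)
n, c, per = 6, 3, 20                  # feature dimension, classes, samples per class
means = rng.standard_normal((c, n)) + 1j * rng.standard_normal((c, n))
Xs = [mu + 0.3 * (rng.standard_normal((per, n)) + 1j * rng.standard_normal((per, n)))
      for mu in means]

m0 = np.vstack(Xs).mean(axis=0)
# Eqs. (13)-(14) with equal priors P(w_i) = 1/c and class means from the data
Sw = sum((Xc - Xc.mean(0)).T @ (Xc - Xc.mean(0)).conj() for Xc in Xs) / (c * per)
Sb = sum(np.outer(Xc.mean(0) - m0, (Xc.mean(0) - m0).conj()) for Xc in Xs) / c

# Step 1: pre-whitening transformation W with W^H S_w W = I
aw, V = np.linalg.eigh(Sw)            # Sw is Hermitian positive definite here
W = V / np.sqrt(aw)                   # W = V Lambda^(-1/2)

# Step 2: top-d eigenvectors of the whitened between-class scatter
d = c - 1
Sb_t = W.conj().T @ Sb @ W
_, H = np.linalg.eigh(Sb_t)
P = W @ H[:, ::-1][:, :d]             # optimal projection axes w_1, ..., w_d

# Eq. (18): discriminant features of a combined complex feature vector Y
Y = Xs[0][0]
Z = P.conj().T @ Y
```

By construction, P^H S_w P = I and P^H S_b P is diagonal, so the axes are S_t-orthogonal as required by Eq. (17).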
The resulting optimal projection axes w_1, …, w_d are used for feature extraction in the combined complex feature space. Given a sample and the corresponding combined feature vector Y, we can obtain its discriminant feature vector Z by the following transformation:

Z = P^H Y,  where P = (w_1, …, w_d)    (18)

The feature extraction method described above is called Complex LDA. Obviously, the traditional LDA is a special case of Complex LDA.
3. Combined Fisherfaces framework
In this section, a Complex LDA based combined
Fisherfaces framework (coined Complex Fisherfaces) is
developed for face image feature extraction and recognition.
In this framework, PCA and KPCA are both used for feature
extraction in the first phase. In the second phase, PCA-based
linear features and KPCA-based nonlinear features are
integrated by complex vectors, which are input into a
feature fusion container called Complex LDA for a second
feature extraction. Finally, the resulting Complex LDA
transformed features are used for classification. This
framework is illustrated in Fig. 1.
Fig. 1. Illustration of the combined Fisherfaces framework (Complex Fisherfaces).

For the current PCA plus LDA two-phase algorithms, a feasible way to avoid the difficulty of the within-class scatter matrix being singular in the LDA phase is to discard the subordinate components (those corresponding to small eigenvalues) in the PCA phase [5,6]. Generally, only m principal components of PCA (KPCA) are retained and used to form the feature vectors, where m is generally subject to c ≤ m ≤ M − c. Liu and Wechsler [9,20] pointed out that a constraint that merely makes the within-class scatter matrix nonsingular still cannot guarantee good generalization for LDA. This is because the trailing eigenvalues of the within-class scatter matrix tend to capture more noise if they are too small. Taking the generalization of LDA into account, we select m = c components of PCA (KPCA) in the above framework.
As we know, differences in feature extraction techniques or measurements may lead to numerical unbalance between two sets of features of the same pattern [17]. In order to counteract this numerical unbalance and gain satisfactory fusion performance, the original feature vectors should be normalized before the combination step. Given an original feature vector X, we use the following preprocessing technique to get the normalized feature vector Y:

Y = (X − μ) / σ    (19)

where μ denotes the mean vector of the training samples, and

σ = (1/n) Σ_{j=1}^{n} σ_j

where n is the dimension of X and σ_j is the standard deviation of the j-th feature component of the training samples.
Suppose the PCA-based feature vector is denoted by α and the KPCA-based feature vector by β; they are both c-dimensional vectors. After the normalization process, we get the normalized feature vectors ᾱ and β̄. Combining them into a complex vector, i.e. γ = ᾱ + i β̄, we get a c-dimensional combined feature space. In this space, the developed Complex LDA is exploited for feature extraction and fusion. This method is named Complex Fisherfaces.
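The normalization of Eq. (19) and the parallel combination can be sketched as follows, with random vectors standing in for the PCA- and KPCA-based features:

```python
import numpy as np

rng = np.random.default_rng(3)
M, c = 50, 10                                 # training samples, feature dimension
A = rng.standard_normal((M, c)) * 5.0         # stand-in PCA-based features
B = rng.standard_normal((M, c)) * 0.01        # stand-in KPCA-based features, different scale

def normalize(F):
    # Eq. (19): subtract the training mean, divide by the mean per-feature std
    mu = F.mean(axis=0)
    sigma = F.std(axis=0, ddof=1).mean()
    return (F - mu) / sigma

A_bar, B_bar = normalize(A), normalize(B)

# Parallel fusion: gamma = alpha_bar + i * beta_bar, dimension stays c
Gamma = A_bar + 1j * B_bar

# Serial fusion for comparison: dimension doubles to 2c
Gamma_serial = np.hstack([A_bar, B_bar])
```

After normalization, the two feature sets are on comparable scales, so neither dominates the fused representation.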
Besides the complex vector based method suggested above, there is an alternative feature fusion strategy, which was adopted in Ref. [9] and called serial feature fusion in Ref. [17]. Specifically, given two feature vectors α and β of the same pattern, after normalization they are combined into a super-vector γ = (α^T, β^T)^T. Then, the traditional LDA is employed for a second feature extraction. If the two sets of feature vectors α and β are chosen as the c-dimensional PCA-based and KPCA-based feature vectors, the resulting method based on this strategy is called Serial Fisherfaces in this paper.
Here, it should be pointed out that Serial Fisherfaces is
different from the method proposed in Ref. [9], where the
two sets of features involved in fusion are shape- and
texture-based features. Since extraction of the shape- and
texture-based features needs manual operations, Liu’s
combined Fisher classifier method [9] is semi-automatic.
In contrast, the Serial Fisherfaces and Complex Fisherfaces proposed here are both automatic methods.

Table 1
The two-letter strings in the image names indicate the kind of imagery [22]

Two-letter mark   Pose angle (degrees)   Description                                                    Number of subjects
ba                0                      Frontal 'b' series                                             200
bj                0                      Alternative expression to 'ba'                                 200
bk                0                      Different illumination to 'ba'                                 200
bd                +25                    Subject faces to his left, which is the photographer's right   200
be                +15                    Subject faces to his left, which is the photographer's right   200
bf                −15                    Subject faces to his right, which is the photographer's left   200
bg                −25                    Subject faces to his right, which is the photographer's left   200

4. Experiments and analysis

The FERET face image database is a result of the FERET program, which was sponsored by the Department of Defense through the DARPA Program [21,22]. It has become a standard database for testing and evaluating state-of-the-art face recognition algorithms.

The proposed algorithm was applied to face recognition and tested on a subset of the FERET database. This subset includes 1400 images of 200 individuals (seven images per individual). It is composed of the images whose names are marked with the two-character strings 'ba', 'bj', 'bk', 'be', 'bf', 'bd', and 'bg'. These strings indicate the kind of imagery, as shown in Table 1. The subset involves variations in facial expression, illumination, and pose (±15° and ±25°). In our experiment, the facial portion of each original image was cropped based on the location of the eyes, and the cropped image was resized to 80 × 80 pixels and pre-processed by histogram equalization. Some example images of one person are shown in Fig. 2.

4.1. Experiments

In our experiment, three images of each subject are randomly chosen for training, while the remaining images are used for testing. Thus, the total number of training samples is 600 and the total number of testing samples is 800. Fisherfaces [6], Kernel Fisherfaces [16], Complex Fisherfaces, and Serial Fisherfaces are each used for feature extraction. As in Complex Fisherfaces and Serial Fisherfaces, 200 principal components (m = c = 200) are retained in the first phase of Fisherfaces and Kernel Fisherfaces. Yang [16] has demonstrated that a second- or third-order polynomial kernel suffices to achieve good results with less computation than other kernels. So, for consistency with Yang's studies, the polynomial kernel k(x, y) = (x·y + 1)^q with q = 2 is adopted here for all kernel-related methods. Finally, a minimum Euclidean distance classifier is employed. Note that in the complex fusion space (unitary space), the Euclidean distance between a sample Z and pattern class l is defined by

g_l(Z) = √((Z − m_l)^H (Z − m_l))    (20)

where m_l denotes the mean vector of the training samples in class l. The decision rule of the minimum distance classifier is: if sample Z satisfies g_k(Z) = min_l g_l(Z), then Z belongs to class k. The classification results are illustrated in Fig. 3.
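The minimum-distance rule of Eq. (20) in the complex fusion space can be sketched as follows; the class means here are synthetic stand-ins:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n_classes = 8, 5
class_means = rng.standard_normal((n_classes, d)) + 1j * rng.standard_normal((n_classes, d))

def classify(Z, means):
    # Eq. (20): g_l(Z) = sqrt((Z - m_l)^H (Z - m_l)); assign the minimizing class
    dists = [np.sqrt(((Z - m).conj() @ (Z - m)).real) for m in means]
    return int(np.argmin(dists))

# a sample lying very close to the mean of class 2 is assigned to class 2
Z = class_means[2] + 0.01 * (rng.standard_normal(d) + 1j * rng.standard_normal(d))
label = classify(Z, class_means)
```

The inner product X^H X is a nonnegative real number, so the distance is real-valued even though the features are complex.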
Note that in Fig. 3, for each method, we only show the recognition rates within the interval where the dimension (the number of features) varies from 6 to 26. This is because the maximal recognition rates of Fisherfaces, Kernel Fisherfaces, and Complex Fisherfaces all occur within this interval, and their recognition rates begin to decrease once the dimension exceeds 26. Here, we pay less attention to Serial Fisherfaces because its recognition rates are overall below 70%.

Fig. 2. Images of one person in the FERET database. (a) Original images; (b) cropped images corresponding to the images in (a).

Fig. 3. The recognition rates of Fisherfaces, Kernel Fisherfaces, Complex Fisherfaces, and Serial Fisherfaces on test 1.
From Fig. 3, we can see that the performance of Complex Fisherfaces is consistently better than that of Fisherfaces and Kernel Fisherfaces. Fisherfaces can only utilize the linear discriminant information, while Kernel Fisherfaces can only utilize the nonlinear discriminant information. Complex Fisherfaces, in contrast, is able to make use of both kinds of discriminant information, which turn out to be complementary and lead to a better result. In comparison, the performance of Serial Fisherfaces is not satisfying: its recognition rates are overall lower than those of Fisherfaces and Kernel Fisherfaces.
Now, a question arises: does the above result depend on the choice of training set? In other words, if another set of training samples were chosen at random, would we obtain a similar result? To answer this question, we repeated the above experiment ten times. Each time, the training sample set was selected at random, so the training sample sets differ across the ten tests (correspondingly, the testing sets also differ). For each method and four different dimensions (dimension = 18, 20, 22, 24), the mean and standard deviation of the recognition rates across the ten tests are illustrated in Fig. 4. Note that we chose dimension = 18, 20, 22, 24 because it can be seen from Fig. 3 that the maximal recognition rates of Fisherfaces, Kernel Fisherfaces, and Complex Fisherfaces all occur in the interval where the dimension varies from 18 to 24. Also, for each method mentioned above, the average recognition rates across the ten tests and four dimensions are listed in Table 2. Moreover, Table 2 also gives the results of Eigenfaces [3] and Kernel Eigenfaces [16].
Fig. 4. The mean and standard deviation of the recognition rates of Fisherfaces, Kernel Fisherfaces, Complex Fisherfaces, and Serial Fisherfaces across ten tests when the dimension = 18, 20, 22, and 24, respectively.
Table 2
The average recognition rates (%) of Eigenfaces, Kernel Eigenfaces, Fisherfaces, Kernel Fisherfaces, Complex Fisherfaces, and Serial Fisherfaces across ten tests and four dimensions (18, 20, 22, 24)

Eigenfaces   Kernel Eigenfaces   Fisherfaces   Kernel Fisherfaces   Complex Fisherfaces   Serial Fisherfaces
17.53        16.94               77.87         77.16                80.30                 66.96
Fig. 4 shows that Complex Fisherfaces outperforms Fisherfaces and Kernel Fisherfaces irrespective of the training sample set and the varying dimension. The mean recognition rate of Complex Fisherfaces is over 2% higher than those of Fisherfaces and Kernel Fisherfaces, and its standard deviation is always between or below theirs. However, Serial Fisherfaces does not perform well across these trials and dimensional variations: its mean recognition rate is lower, and its standard deviation larger, than those of the other methods. Table 2 shows that Fisherfaces, Kernel Fisherfaces, and Complex Fisherfaces are all significantly superior to Eigenfaces and Kernel Eigenfaces. This indicates that LDA (or KFD) is really helpful for improving the performance of PCA (or KPCA) for face recognition.
4.2. Comparative analysis: why does Complex Fisherfaces outperform Serial Fisherfaces?

Why does Complex Fisherfaces perform better than Serial Fisherfaces? In our opinion, the underlying reason is that the parallel feature fusion strategy based on complex vectors is more suitable for a small sample size problem like face recognition than the serial strategy based on super-vectors. For small sample size problems, the higher the dimension of the feature vector, the more difficult it is to evaluate the scatter matrices accurately. If we combine two sets of features of a sample serially into a super-vector, the dimension of the feature vector doubles. For instance, in the above experiment the resulting super-vector γ = (α^T, β^T)^T is 400-dimensional after the two 200-dimensional vectors α and β are combined. Thus, after serial combination, it becomes more difficult than before to evaluate the scatter matrices (based on a relatively small number of training samples). In contrast, the parallel feature fusion strategy based on complex vectors avoids this difficulty, since the dimension remains unchanged after combination. In our experiments, using the parallel strategy, the size of the scatter matrices is still 200 × 200, just as before fusion, whereas it becomes 400 × 400 when the serial strategy is used. Taking the characteristics of complex matrices into account, the amount of data needing evaluation using complex (parallel) combination is only half of that using serial combination. Note that for an l × l complex matrix, there are 2l² real numbers involved, since one complex element is actually formed by two real numbers. Consequently, it is much easier to evaluate the corresponding scatter matrices using Complex Fisherfaces than using Serial Fisherfaces.

Actually, a more specific explanation can be given from the magnitude criterion point of view [9,20]. Liu and Wechsler [9,20] argued that, for good generalization, the trailing eigenvalues of the within-class scatter matrix should not be too small. Let us calculate the ten smallest eigenvalues of the within-class scatter matrix in the two different fusion spaces and list them in Table 3. Table 3 shows that the trailing eigenvalues of the within-class scatter matrix in the serial fusion space are much smaller than those in the complex (parallel) fusion space. This fact indicates that Complex Fisherfaces should have better generalization than Serial Fisherfaces.

Table 3
The ten smallest eigenvalues of the within-class scatter matrix in the two different fusion spaces

Strategy of combination           1      2      3      4      5      6      7      8      9      10
Serial fusion (unit: 1 × 10⁻⁵)    2.45   2.26   2.14   1.94   1.90   1.72   1.69   1.44   1.37   1.13
Complex fusion (unit: 1 × 10⁻³)   15.1   13.6   11.7   9.5    7.9    7.0    5.8    4.7    3.9    2.6
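The "half the data" bookkeeping of Section 4.2 can be checked directly (the arithmetic below simply restates the paper's count, with l = 200):

```python
# Real numbers stored in the scatter matrices of the two fusion spaces (l = 200)
l = 200
complex_entries = 2 * l * l          # an l x l complex matrix holds 2*l^2 real numbers
serial_entries = (2 * l) ** 2        # a (2l) x (2l) real matrix holds 4*l^2 real numbers
ratio = complex_entries / serial_entries   # half the quantities to estimate
```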
5. Conclusion

In this paper, an effective feature fusion framework, Complex Fisherfaces, was developed for face recognition. In this framework, PCA-based linear features and KPCA-based nonlinear features are integrated by complex vectors, and the developed Complex LDA is further employed for feature extraction and fusion. The resulting Complex LDA discriminant features contain two kinds of discriminant information, linear and nonlinear. This characteristic makes Complex Fisherfaces more powerful than Fisherfaces and Kernel Fisherfaces.

We also compared the performance of Complex Fisherfaces with that of Serial Fisherfaces and analyzed why Serial Fisherfaces cannot achieve satisfying performance. Serial Fisherfaces combines two sets of features via super-vectors, which doubles the dimension. The increased dimension makes it more difficult to evaluate the scatter matrices accurately and renders the trailing eigenvalues of the within-class scatter matrix too small. These small trailing eigenvalues capture more noise and degrade the generalization of Serial Fisherfaces. In comparison, Complex Fisherfaces avoids these disadvantages and thus generalizes well.
Acknowledgements
This work is partially supported by the National Natural Science Foundation of China under Grant No. 60072034, and by grant TIC2002-04495-C02 from the Spanish Ministry of Science and Technology (MCyT). Jian Yang and Alejandro F. Frangi are also supported by a Ramón y Cajal Research Fellowship from MCyT. Finally, we would like to thank the anonymous reviewers for their constructive advice.
References
[1] L. Sirovich, M. Kirby, Low-dimensional procedure for characterization of human faces, J. Opt. Soc. Am. 4 (1987) 519–524.
[2] M. Kirby, L. Sirovich, Application of the KL procedure for the characterization of human faces, IEEE Trans. Pattern Anal. Mach. Intell. 12 (1) (1990) 103–108.
[3] M. Turk, A. Pentland, Eigenfaces for recognition, J. Cogn. Neurosci. 3
(1) (1991) 71–86.
[4] A. Pentland, Looking at people: sensing for ubiquitous and wearable
computing, IEEE Trans. Pattern Anal. Mach. Intell. 22 (1) (2000)
107–119.
[5] D.L. Swets, J. Weng, Using discriminant eigenfeatures for image
retrieval, IEEE Trans. Pattern Anal. Mach. Intell. 18 (8) (1996)
831–836.
[6] P.N. Belhumeur, J.P. Hespanha, D.J. Kriegman, Eigenfaces vs. Fisherfaces: recognition using class specific linear projection, IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720.
[7] Z. Jin, J.-Y. Yang, Z.S. Hu, Z. Lou, Face recognition based on
uncorrelated discriminant transformation, Pattern Recogn. 33 (7)
(2001) 1405–1416.
[8] L.F. Chen, M. Liao, M.T. Ko, J.C. Lin, G.J. Yu, A new LDA-based
face recognition system which can solve the small sample size
problem, Pattern Recogn. 33 (10) (2000) 1713–1726.
[9] C.J. Liu, H. Wechsler, A shape-and texture-based enhanced Fisher
classifier for face recognition, IEEE Trans. Image Process. 10 (4)
(2001) 598–608.
[10] J. Yang, J.-Y. Yang, Why can LDA be performed in PCA transformed
space?, Pattern Recogn. 36 (2) (2003) 563–566.
[11] J. Yang, J.-Y. Yang, Optimal FLD algorithm for facial feature extraction, Proc. SPIE Intelligent Robots and Computer Vision XX: Algorithms, Techniques, and Active Vision, vol. 4572, October 2001, pp. 438–444.
[12] B. Scholkopf, A. Smola, K.R. Muller, Nonlinear component analysis
as a kernel eigenvalue problem, Neural Comput. 10 (5) (1998)
1299–1319.
[13] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, K.-R. Müller, Fisher
Discriminant Analysis with Kernels, IEEE International Workshop on
Neural Networks for Signal Processing IX, Madison (USA), August,
1999, pp. 41–48.
[14] S. Mika, G. Rätsch, J. Weston, B. Schölkopf, A. Smola, K.-R. Müller, Constructing descriptive and discriminative non-linear features: Rayleigh coefficients in kernel feature spaces, IEEE Trans. Pattern Anal. Mach. Intell. 25 (5) (2003) 623–628.
[15] G. Baudat, F. Anouar, Generalized discriminant analysis using a
kernel approach, Neural Comput. 12 (10) (2000) 2385–2404.
[16] M.H. Yang, Kernel Eigenfaces vs. Kernel Fisherfaces: face recognition using kernel methods, Proceedings of the Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FGR'02), Washington DC, May 2002, pp. 215–220.
[17] J. Yang, J.-Y. Yang, D. Zhang, J.F. Lu, Feature fusion: parallel
strategy vs. serial strategy, Pattern Recogn. 36 (6) (2003) 1369–1381.
[18] G.H. Golub, C.F. Van Loan, Matrix Computations, Third ed., The
Johns Hopkins University Press, Baltimore, 1996.
[19] J. Yang, J.-Y. Yang, D. Zhang, What’s wrong with the Fisher
criterion?, Pattern Recogn. 35 (11) (2002) 2665–2668.
[20] C.J. Liu, H. Wechsler, Robust coding schemes for indexing and retrieval from large face databases, IEEE Trans. Image Process. 9 (1) (2000) 132–137.
[21] P.J. Phillips, H. Moon, S.A. Rizvi, P.J. Rauss, The FERET evaluation
methodology for face-recognition algorithms, IEEE Trans. Pattern
Anal. Mach. Intell. 22 (10) (2000) 1090–1104.
[22] P.J. Phillips, The Facial Recognition Technology (FERET) Database,
http://www.itl.nist.gov/iad/humanid/feret/feret_master.html.
Jian Yang was born in Jiangsu, China, on 3rd June 1973. He obtained
his Bachelor of Science in Mathematics at the Xuzhou Normal
University in China in 1995. He then completed a Master of Science degree in Applied Mathematics at Changsha Railway University in 1998, and his PhD in the Department of Computer Science at the Nanjing University of Science and Technology, on the subject of Pattern Recognition and Intelligent Systems, in 2002. From March to September 2002, he worked as a research assistant at the Department of Computing, Hong Kong Polytechnic University. Now,
he is a postdoctoral research fellow at the University of Zaragoza and is
affiliated with the Division of Bioengineering of the Aragon Institute of
Engineering Research (I3A). He is the author of more than 20 scientific
papers in pattern recognition and computer vision. His current research
interests include face recognition and detection, handwritten character
recognition and data fusion.
Jing-yu Yang received the BS Degree in Computer Science from
Nanjing University of Science and Technology (NUST), Nanjing,
China. From 1982 to 1984, he was a visiting scientist at the Coordinated
Science Laboratory, University of Illinois at Urbana-Champaign. From
1993 to 1994, he was a visiting professor at the Department of
Computer Science, University of Missouri. In 1998, he was a visiting professor at Concordia University in Canada. He is currently a
professor and Chairman in the department of Computer Science at
NUST. He is the author of over 100 scientific papers in computer
vision, pattern recognition, and artificial intelligence. He has won more
than 20 provincial awards and national awards. His current research
interests are in the areas of pattern recognition, robot vision, image
processing, data fusion, and artificial intelligence.
Alejandro F. Frangi obtained his MSc degree in Telecommunication
Engineering from the Universitat Politècnica de Catalunya, Barcelona
in 1996 where he subsequently did research on Electrical Impedance
Tomography for image reconstruction and noise characterization. In
2001, he obtained his PhD at the Image Sciences Institute of the
University Medical Center Utrecht on model-based cardiovascular
image analysis. The same year, Dr Frangi moved to Zaragoza, Spain.
He is now Assistant Professor at the University of Zaragoza and is
affiliated with the Division of Biomedical Engineering of the Aragon
Institute of Engineering Research (I3A), a multidisciplinary research
institute of the University of Zaragoza. He is a member of the Computer
Vision Group and his main research interests are in computer vision and
medical image analysis, with particular emphasis on model- and registration-based techniques and statistical methods. Dr Frangi has
recently been awarded the Ramón y Cajal Research Fellowship, a
national program of the Spanish Ministry of Science and Technology.