Kernel Discriminant Analysis Based on Canonical Difference for Face Recognition in Image Sets
Wen-Sheng Chu (朱文生), Ju-Chin Chen (陳洳瑾), Jenn-Jier James Lien (連震杰)
Robotics Lab, CSIE, NCKU
http://robotics.csie.ncku.edu.tw
CVGIP 2007

Motivation
• Challenges of face recognition
– Facial variations: illumination, pose, facial expression
• Face recognition using image sets
– Surveillance
– Video retrieval

Why Multi-view Image Sets?
• Multiple facial images contain more information than a single image.
• A single input pattern leads to single-to-many matching ("A or B?"), whereas multiple input patterns allow many-to-many matching between the image sets of person A and person B.

Training/Testing Data: Facial Expression
(Figure: for each subject, five image sets with facial-expression variation, split into training and testing sets.)

More Training/Testing Data: Illumination (Yale B)
(Figure: for each subject, five image sets with illumination variation from the Yale B database, again split into training and testing sets.)

System Overview
• Training process: the training image sets {X_1, …, X_m} of subjects 1 to N pass through Kernel Subspace Generation, producing m kernel subspaces P_i = {e_1^i, …, e_d^i}; the Kernel Discriminant Transformation (KDT) then learns T, and each reference subspace is Ref_i = T^T P_i.
• Testing process: the testing image set X_test passes through Kernel Subspace Generation to give P_test, from which the reference subspace Ref_test is formed and the identification result is output.

Training Process
• Each training image set X_i = {x_1^i, x_2^i, …, x_{n_i}^i} contains n_i ≈ 100 face images, each normalized to 32 × 32 pixels.
Kernel Subspace Generation (KSG)
• A nonlinear mapping function Φ maps each 32 × 32 image x_s^i of set X_i into a high-dimensional feature space.
• From the mapped images, the kernel matrix K_ij (of size n_i × n_j) and the kernel subspace P_i = {e_1^i, …, e_d^i} of X_i (d < n_i) are obtained.

KSG: Kernel PCA (KPCA)
• For each image set X_i = {x_1^i, …, x_{n_i}^i}, the n_i × n_i kernel matrix K_ii yields the kernel subspace P_i = {e_1^i, …, e_p^i, …, e_d^i} with d < n_i.
• From the theory of reproducing kernels (the feature-space dimensionality may be infinite!), each basis vector is expanded over the mapped images of the set:
e_p^i = Σ_{s=1}^{n_i} a_sp^i Φ(x_s^i), where the coefficients a come from the eigendecomposition (SVD) of K_ii.
• KPCA: B. Schölkopf, A. Smola, K.-R. Müller, Advances in Kernel Methods: Support Vector Learning, 1999.

Training Process: Kernel Discriminant Transformation (KDT)
(System diagram repeated, highlighting the KDT block: the m kernel subspaces P_i feed the KDT, which outputs T and the reference subspaces Ref_i = T^T P_i.)

KDT: Main Idea
• Based on the concept of LDA, KDT is derived to find a transformation matrix T.
• We propose an iterative process to optimize T.
• T is a d × w matrix (the KDT matrix), so the transformed subspaces are w-dimensional:
T = argmax_T [trace(T^T S_B T) / trace(T^T S_W T)],
i.e. maximize the scatter between subjects relative to the scatter within subjects. How should the similarity between subspaces be measured?

KDT: Canonical Difference (CD) – Similarity Measurement
• Each kernel subspace P_i is converted into a canonical subspace C_i; the difference vectors d_r = u_r − v_r between paired canonical vectors u_r ∈ C_1 and v_r ∈ C_2 measure the similarity of two subspaces.
• Canonical vectors capture more common views and illumination conditions than eigenvectors do.

KDT: CD – Canonical Vector vs. Eigenvector (cont.)
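The KPCA step above can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' code: the function name is mine, and the feature-space centering matrix J and the unit-norm scaling of the coefficients are standard KPCA practice rather than details given on the slides.

```python
import numpy as np

def kernel_subspace(K, d):
    """KPCA sketch: given the n_i x n_i kernel matrix K_ii of one image set,
    return coefficients a (n_i x d) so that e_p = sum_s a[s, p] * phi(x_s^i)
    are the d leading kernel-subspace basis vectors."""
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # center the mapped data (standard KPCA step)
    Kc = J @ K @ J
    evals, evecs = np.linalg.eigh(Kc)     # eigendecomposition of the centered kernel
    idx = np.argsort(evals)[::-1][:d]     # keep the d leading components (d < n_i)
    # scale so each e_p has unit norm in feature space: a = v / sqrt(lambda)
    return evecs[:, idx] / np.sqrt(np.maximum(evals[idx], 1e-12))
```

With this scaling the recovered basis satisfies A^T K_c A = I, i.e. the e_p are orthonormal in the feature space.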
• Differencing eigenvectors directly (B_1 − B_2) does not measure subspace similarity well; the difference of canonical vectors (C_1 − C_2) does.

KDT: CD – Canonical Subspace (cont.)
• Consider the SVD of the product of two d-dimensional orthonormal basis matrices B_1 and B_2:
B_1^T B_2 = Φ_12 Λ Φ_21^T,
which yields the canonical subspaces (also orthonormal)
C_1 = B_1 Φ_12, C_2 = B_2 Φ_21, with C_1^T C_2 = Λ.
• Each eigenvalue equals cos²θ_i, with 0 ≤ cos²θ_i ≤ 1.
• The similarity measurement is the canonical difference:
CanonicalDiff(i, j) = Σ_{r=1}^d d_r² = Σ_{r=1}^d ||u_r − v_r||² = trace((C_i − C_j)^T (C_i − C_j)).
• T.-K. Kim, J. Kittler, and R. Cipolla, "Discriminative Learning and Recognition of Image Set Classes Using Canonical Correlations", IEEE Trans. on PAMI, 2007.

KDT: KDT Matrix Optimization
• Pipeline: the kernel subspace P_i = {e_1^i, …, e_d^i} and the KDT matrix T give the reference subspace Ref_i = T^T P_i, from which the canonical subspace C_i and the canonical difference CanonicalDiff(i, j) follow; T is learned iteratively based on LDA.
• Orthonormal basis matrices are required to obtain the canonical subspaces C_i. Is Ref_i orthonormal? Usually not!

KDT: Kernel Subspace Normalization
• QR-decomposition is performed to obtain orthonormal basis matrices:
T^T P_i = Q_i R_i ⇒ Q_i = T^T P_i R_i^{-1},
where Q_i is a w × d orthonormal matrix and R_i is a d × d invertible upper-triangular matrix.
• The SVD Q_i^T Q_j = Φ_ij Λ Φ_ji^T then gives the canonical subspaces C_i = Q_i Φ_ij and C_j = Q_j Φ_ji.

KDT: Formulation
• With C_i = Q_i Φ_ij, C_j = Q_j Φ_ji and Q_i = T^T P_i R_i^{-1} (absorbing R_i^{-1} into P_i below):
CanonicalDiff(i, j) = trace((C_i − C_j)^T (C_i − C_j)) = trace((Q_i Φ_ij − Q_j Φ_ji)^T (Q_i Φ_ij − Q_j Φ_ji)) = trace(T^T (P_i Φ_ij − P_j Φ_ji)(P_i Φ_ij − P_j Φ_ji)^T T).
• With W_i = {k : X_k belongs to the same class as X_i} and B_i = {k : X_k belongs to a different class}, define
S_B = Σ_{i=1}^m Σ_{k∈B_i} (P_i Φ_ik − P_k Φ_ki)(P_i Φ_ik − P_k Φ_ki)^T,
S_W = Σ_{i=1}^m Σ_{k∈W_i} (P_i Φ_ik − P_k Φ_ki)(P_i Φ_ik − P_k Φ_ki)^T.
• Maximizing the between-class canonical differences while minimizing the within-class ones gives
T = argmax_T [trace(T^T S_B T) / trace(T^T S_W T)],
which has the form of LDA.

KDT: Solution
• S_B and S_W involve the mapped images Φ(·), whose dimensionality may be infinite, so T cannot be computed directly.
• Expand T = {t_1, …, t_q, …, t_w} over all M training images: t_q = Σ_{u=1}^M α_uq Φ(x_u).
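The QR-plus-SVD recipe above can be sketched in numpy. Function and variable names here are illustrative assumptions, not from the paper; the inputs stand for the (generally non-orthonormal) reference subspaces T^T P_i.

```python
import numpy as np

def canonical_difference(Ri, Rj):
    """Canonical difference between two w x d reference subspaces Ri = T^T P_i
    and Rj = T^T P_j, following the slides' QR normalization + SVD recipe."""
    Qi, _ = np.linalg.qr(Ri)            # T^T P_i = Q_i R_i  ->  orthonormal Q_i
    Qj, _ = np.linalg.qr(Rj)
    Phi_ij, lam, Phi_ji_T = np.linalg.svd(Qi.T @ Qj)  # Q_i^T Q_j = Phi_ij Lam Phi_ji^T
    Ci = Qi @ Phi_ij                    # canonical subspaces (orthonormal)
    Cj = Qj @ Phi_ji_T.T
    D = Ci - Cj
    return np.trace(D.T @ D)            # sum_r ||u_r - v_r||^2
```

Identical subspaces give a difference of 0; two fully orthogonal d-dimensional subspaces give 2d.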
• Using the theory of reproducing kernels again, substitute this expansion into T^T S_W T = Σ_{i=1}^m Σ_{k∈W_i} T^T (P_i Φ_ik − P_k Φ_ki)(P_i Φ_ik − P_k Φ_ki)^T T and apply the kernel trick. With
(Z_ij)_up = Σ_{r=1}^d Σ_{s=1}^{n_i} a_sr^i φ_rp^ij k(x_u, x_s^i),
this gives T^T S_W T = α^T U α, where U = Σ_{i=1}^m Σ_{k∈W_i} (Z_ik − Z_ki)(Z_ik − Z_ki)^T.
• Following similar steps, T^T S_B T = α^T V α, so the objective becomes
J(α) = (α^T V α) / (α^T U α).

KDT: Numerical Issues
• α is solved by simply computing the leading eigenvectors of U^{-1} V.
• To make sure that U is positive-definite, we regularize U by U_μ = U + μI (μ = 0.001).

Training Process
(System diagram repeated: the reference subspace Ref_i = T^T P_i is given element-wise by
(T^T P_i)_qp = Σ_{u=1}^M Σ_{s=1}^{n_i} α_uq a_sp^i k(x_u, x_s^i).)

Testing Process
(System diagram repeated, highlighting the testing path: X_test → Kernel Subspace Generation → P_test → Ref_test = T^T P_test → identification result.)

Training List
• #individuals (N): 32
• #image sets per individual: 3
• #images per set (n_i): ~100
• Size of normalized template: 32 × 32
• Subspace dimensionality: KMSM 30, KCMSM 30, DCC 20, KDT 30
• σ of the Gaussian kernel function: 0.05
• μ for regularization: 10^-3

Training: Convergence of Jacobian Value
• J(α) tends to converge to the same value under different initializations.

Testing: Comparison with Other Methods
• The proposed KDT is compared with three related methods over 10 randomly chosen experiments.
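The regularized eigenproblem above can be sketched as follows. This is a minimal sketch with hypothetical names: it follows the slide's prescription of taking the leading eigenvectors of U^{-1}V (with U regularized to U + μI), using numpy's general eigensolver.

```python
import numpy as np

def kdt_coefficients(V, U, w, mu=1e-3):
    """Maximize J(alpha) = (alpha^T V alpha) / (alpha^T U alpha) by taking the
    w leading eigenvectors of U_mu^{-1} V, with U_mu = U + mu * I."""
    M = U.shape[0]
    U_mu = U + mu * np.eye(M)                      # regularize so U is positive-definite
    evals, evecs = np.linalg.eig(np.linalg.solve(U_mu, V))
    order = np.argsort(evals.real)[::-1][:w]       # w leading eigenvectors
    return evecs[:, order].real                    # M x w coefficient matrix alpha
```

A symmetric generalized eigensolver (e.g. scipy.linalg.eigh(V, U_mu)) would give the same subspace more stably; the plain U^{-1}V form is kept here to match the slide.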
– KMSM (avg = 0.837)
– KCMSM (0.862)
– DCC (0.889)
– KDT (0.911)

Conclusions
• The canonical difference is proposed as a similarity measurement between two subspaces.
• Based on the canonical difference, we derive the KDT and apply it in the proposed face recognition system.
• Our system is capable of recognizing faces from image sets despite facial variations.

Thanks for your attention

Related Works
• Mutual subspace method (MSM)
• Constrained MSM (CMSM): subspaces U and V are projected onto a constrained subspace, giving U_c and V_c with angle θ_c instead of θ.
• Discriminative canonical correlation (DCC)
• Kernel MSM (KMSM) and kernel CMSM (KCMSM)

Mutual Subspace Method (MSM)
• MSM utilizes the canonical angles θ_1, …, θ_n between two subspaces B_1 (eigenvectors u_i) and B_2 (eigenvectors v_i) for similarity:
similarity(u, v) = cos²θ_1, similarity(U, V) = (1/n) Σ_{i=1}^n cos²θ_i.
• K. Fukui and O. Yamaguchi, "Face Recognition Using Multi-viewpoint Patterns for Robot Vision", ISRR 2003.

Perform KDT on Subspace?
• By KPCA we obtain P_i ∈ R^{h×d} such that X_i^Φ (X_i^Φ)^T = P_i Λ P_i^T, where X_i^Φ is the matrix of mapped images.
• Multiplying both sides of the equality by T:
T^T X_i^Φ (X_i^Φ)^T T = (T^T P_i) Λ (T^T P_i)^T.
• It can be observed that the kernel subspace of the transformed mapped image sets is equivalent to applying T to the original kernel subspace.

KDT Optimization
• S_B = Σ_{i=1}^m Σ_{k∈B_i} (P_i Φ_ik − P_k Φ_ki)(P_i Φ_ik − P_k Φ_ki)^T,
S_W = Σ_{i=1}^m Σ_{k∈W_i} (P_i Φ_ik − P_k Φ_ki)(P_i Φ_ik − P_k Φ_ki)^T.
• Using the theory of reproducing kernels again: T = {t_1, …, t_q, …, t_w}, where t_q = Σ_{u=1}^M α_uq Φ(x_u).
• P̃_ij = P_i Φ_ij = {ẽ_1^ij, …, ẽ_d^ij}, where ẽ_p^ij = Σ_{r=1}^d Σ_{s=1}^{n_i} a_sr^i φ_rp^ij Φ(x_s^i).
• T^T P̃_ij = α^T Z_ij, where (Z_ij)_up = Σ_{r=1}^d Σ_{s=1}^{n_i} a_sr^i φ_rp^ij k(x_u, x_s^i).
• T^T S_W T = α^T U α, where U = Σ_{i=1}^m Σ_{k∈W_i} (Z_ik − Z_ki)(Z_ik − Z_ki)^T.
• Following similar steps, T^T S_B T = α^T V α. That is,
J(α) = (α^T V α) / (α^T U α) corresponds to trace(T^T S_B T) / trace(T^T S_W T).

Training: Dimensionality w of KDT vs. Identification Rate
• The identification rate stays above 90% once w > 2,200.

Training: Similarity Matrix
• The similarity matrix (entries of C_1^T C_2, ranging from 0 to 1) behaves better after 10 iterations of the iterative learning than after the 1st iteration.
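The MSM similarity from the Related Works section above can be sketched in numpy as well; the function name is mine, and orthonormal basis matrices are assumed as input (the singular values of U^T V are the cosines of the canonical angles).

```python
import numpy as np

def msm_similarity(U, V):
    """MSM similarity between two orthonormal subspaces U, V (columns are
    basis vectors): the mean of cos^2(theta_i) over all canonical angles."""
    s = np.linalg.svd(U.T @ V, compute_uv=False)  # singular values = cos(theta_i)
    return np.mean(s ** 2)
```

Identical subspaces give similarity 1, fully orthogonal subspaces give 0, which matches the slide's definition (1/n) Σ cos²θ_i.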
KSG: Kernel Matrix
• Gaussian kernel function:
k(x_s^i, x_r^j) = exp(−||x_s^i − x_r^j||² / (2σ²)).
• The kernel matrix K_ij encodes the correlation between the i-th and the j-th image sets; by the kernel trick,
(K_ij)_sr = k(x_s^i, x_r^j), s = 1, …, n_i, r = 1, …, n_j.
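The Gaussian kernel matrix above can be sketched directly (the function name is an illustrative assumption; rows of each array are vectorized face images):

```python
import numpy as np

def kernel_matrix(Xi, Xj, sigma):
    """Kernel matrix K_ij between two image sets:
    (K_ij)_{sr} = k(x_s^i, x_r^j) = exp(-||x_s^i - x_r^j||^2 / (2 sigma^2))."""
    d2 = ((Xi[:, None, :] - Xj[None, :, :]) ** 2).sum(-1)  # pairwise squared distances
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

For a set compared with itself, the diagonal of K_ii is all ones and the matrix is symmetric and positive semi-definite.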