Semi-supervised Discriminant Analysis
Lishan Qiao
2009.03.13

Outline
• Motivation
• Locality preserving regularization based…
  – Laplacian Linear Discriminant Analysis (LapLDA) [1]
  – Semi-supervised Discriminant Analysis (SDA) [2]
  – Comments: does the locality preserving regularizer really work?
• Optimization based…
  – Semi-supervised Discriminant Analysis via CCCP (SSDACCCP) [3]
• Conclusion

[1] J. H. Chen, J. P. Ye, Q. Li. Integrating global and local structures: A least squares framework for dimensionality reduction. CVPR 2007.
[2] D. Cai, X. F. He, J. W. Han. Semi-supervised discriminant analysis. ICCV 2007.
[3] Y. Zhang, D. Y. Yeung. Semi-supervised discriminant analysis via CCCP. ECML PKDD 2008.

Motivation: why extend LDA?
Linear Discriminant Analysis (LDA) is a popular supervised dimensionality reduction (DR) method, with objective function

  \max_w \frac{w^T S_b w}{w^T S_t w}

However, LDA has several limitations, each of which has motivated its own family of extensions:
• Small Sample Size (SSS) problem: rank(S_w) \le n - c and rank(S_t) \le n - 1, so the scatter matrices become singular when the dimensionality exceeds the number of samples → PseudoLDA, PCA+LDA, NullLDA, RLDA, …; 2DLDA, TensorLDA, …
• It is a global DR method that ignores local geometric structure → LapLDA, SDA, SSLDA
• It is a completely supervised method that cannot exploit unlabeled data → SDA, SSLDA, SSDACCCP

Besides, semi-supervised learning offers several paradigms of its own:
• Co-training
• Transductive methods, e.g. label propagation
• Inductive methods, e.g. LapSVM
• …

LapLDA: Motivation & Objective Function
Motivation: LDA captures the global geometric structure of the data by simultaneously maximizing the between-class distance and minimizing the within-class distance. However, local geometric structure has recently been shown to be effective for dimensionality reduction.

Objective function (LapLDA = LDA + LPP):

  \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T X L X^T w},   L = D - W

where D is the diagonal degree matrix of the graph weights

  w_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)  if x_i is among the k nearest neighbors of x_j, or x_j is among the k nearest neighbors of x_i
  w_{ij} = 0  otherwise

LapLDA: Experiments & Discussion
[Figure: classification accuracy of LapLDA vs. RLDA as the neighborhood size K varies (K = 1, 2, 3, 5, 10, 15, 20) on the Letter (a-m) data set.]

  \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T X L X^T w}   (LapLDA)

  \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T I w}   (RLDA)

Does the locality preserving regularizer really work? It seems to only play the role of a Tikhonov regularizer!

SDA: Motivation & Objective Function
Motivation: the labeled data points are used to maximize the separability between different classes, and the unlabeled data points are used to estimate the intrinsic geometric structure of the data.

Objective function:

  \max_w \frac{w^T S_b w}{w^T S_t w + \alpha\, w^T X L X^T w + \beta\, \|w\|^2}

SDA = RLDA + LPP = LapLDA + Tikhonov reg. = LDA + LPP + Tikhonov reg.

Globality preserving DA:

  \max_w \frac{w^T S_b w}{w^T S_t w + \alpha\, w^T X X^T w + \beta\, \|w\|^2}

Only 1 labeled training sample per class:

  \max_w \frac{w^T X X^T w}{w^T S_t w}

SDA: Experiments & Discussion
MATLAB option settings used in the experiments:

  % graph construction options (cosine-weighted kNN graph)
  WOptions = [];
  WOptions.Metric = 'Cosine';
  WOptions.NeighborMode = 'KNN';
  WOptions.k = 2;
  WOptions.WeightMode = 'Cosine';
  WOptions.bSelfConnected = 0;
  WOptions.bNormalized = 1;
  % SDA regularization options
  options = [];
  options.ReguType = 'Ridge';
  options.ReguAlpha = 0.01;
  options.beta = 0.1;

Results (accuracy, mean ± std):
• 1 labeled + 29 unlabeled: 32.7 ± 2.3
• 1 labeled + 1 unlabeled: 37.5 ± 3.1
No parameters at all!
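To make the criterion above concrete, here is a minimal MATLAB sketch written for this summary (it is not the released SDA/LapLDA code; the function name sda_sketch, the NaN convention for unlabeled samples, and the exact scatter definitions are illustrative assumptions): it builds S_b and S_t from the labeled samples, adds the locality and Tikhonov regularizers, and solves the regularized ratio as a generalized eigenvalue problem, assuming a graph Laplacian L over all samples is already available.

  % Minimal sketch of the SDA/LapLDA criterion
  %   max_w  w'*Sb*w / ( w'*St*w + alpha*w'*X*L*X'*w + beta*||w||^2 )
  % X: d-by-n data matrix, y: labels with NaN for unlabeled samples,
  % L: n-by-n graph Laplacian, dim: number of projection directions.
  function W = sda_sketch(X, y, L, alpha, beta, dim)
      d   = size(X, 1);
      lab = ~isnan(y);                    % labeled samples
      Xl  = X(:, lab);  yl = y(lab);
      m   = mean(Xl, 2);                  % mean of the labeled data
      Sb  = zeros(d);
      for c = unique(yl(:))'
          Xc = Xl(:, yl == c);
          mc = mean(Xc, 2);
          Sb = Sb + size(Xc, 2) * (mc - m) * (mc - m)';  % between-class scatter
      end
      Xc0 = Xl - m;                       % centered labeled data
      St  = Xc0 * Xc0';                   % total scatter of the labeled data
      R   = St + alpha * (X * L * X') + beta * eye(d);   % regularized denominator
      [V, E]   = eig(Sb, R);              % generalized eigenproblem Sb*v = lambda*R*v
      [~, idx] = sort(real(diag(E)), 'descend');
      W = real(V(:, idx(1:dim)));         % top 'dim' projection directions
  end

The alpha and beta here play the same role as the ReguAlpha and beta options above (setting alpha = 0 recovers RLDA, and beta = 0 recovers LapLDA), although the exact weighting and normalization used in the released implementation may differ.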
Discussion: About the Locality Preserving Regularizer

1) Graph construction
Although the graph is at the heart of graph-based semi-supervised learning methods, its construction has not been studied extensively [X. Zhu, Semi-supervised learning literature survey, 2005-2008].

  w_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)  if x_i is among the k nearest neighbors of x_j, or x_j is among the k nearest neighbors of x_i
  w_{ij} = 0  otherwise

Issue 1: curse of dimensionality. Nearest neighbors become unreliable in high-dimensional spaces; for example, the face space is estimated to have at least 100 dimensions [4].

[4] M. Meytlis, L. Sirovich. On the dimensionality of face space. PAMI, 29(7): 1262-1267, 2007.

Issue 2: the performance of classification relies heavily on how well the nearest-neighbor criterion works in the original high-dimensional space [5].

[Figure: accuracy comparison of LDA and locality-based methods over varying dimensionality.]

[5] H. T. Chen, H. W. Chang, T. L. Liu. Local discriminant embedding and its variants. CVPR 2005.

Issue 3: difficulty of parameter selection (k, σ, and the regularization weights). Cross-validation? [6]

[6] D. Zhou, O. Bousquet, B. Scholkopf. Learning with local and global consistency. NIPS 2004.

2) Parametric model vs. non-parametric model

  LDA:    \max_w \frac{w^T S_b w}{w^T S_t w}
  RLDA:   \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T I w}
  gpDA:   \max_w \frac{w^T X X^T w}{w^T S_t w}
  LapLDA: \max_w \frac{w^T S_b w}{w^T S_t w + \lambda\, w^T X L X^T w}
  SDA:    \max_w \frac{w^T S_b w}{w^T S_t w + \alpha\, w^T X L X^T w + \beta\, \|w\|^2}

Related Works

  Algorithm                            | Semi-supervised DR | Discriminative term | Regularization term        | Parameters
  RLDA                                 |                    | Fisher              | Tikhonov (||w||^p)         | λ
  LapLDA [1]                           |                    | Fisher              | Locality (w^T X L X^T w)   | λ, K, σ
  SDA [2] / SSLDA                      | √                  | Fisher              | Tikhonov + Locality        | α, β, K, σ
  SSDR                                 | √                  | Pairwise            | Globality (w^T X X^T w)    | α, β
  SSDRL                                | √                  | Pairwise            | Locality                   | α, β, K, σ
  SSMMC                                | √                  | MMC                 |                            | λ
  Sparsity preserving "regularization" | √                  | Fisher              | Sparsity                   |

Outline (recap)
• Motivation
• Locality preserving regularization based… (done)
• Optimization based…
  – Semi-supervised Discriminant Analysis via CCCP (SSDACCCP) [3]
• Conclusion

SSDACCCP: Motivation
LDA can use only the few labeled points and ignores the abundant unlabeled ones.
[Figure: toy 2-D data with many unlabeled points (×) and one labeled point per class (■, ★).]
Idea: treat the unknown class memberships of the unlabeled points as variables. For the labeled samples x_1, ..., x_l the class indicator entries are fixed (0 or 1); for the unlabeled samples x_{l+1}, ..., x_n they are unknown ("?") and are estimated by optimization.

SSDACCCP: Formulation
Let A = [A_1, A_2, ..., A_C] be the n × C class indicator matrix, where the rows for the labeled samples are fixed 0/1 vectors and the rows for the unlabeled samples are unknowns, and let D = [X_1, X_2, ..., X_C] be the data matrix with columns grouped by class. Then

  n_k = A_k^T 1_n,   m = D 1_n / n,   m_k = D A_k / n_k

  S_b = \sum_{k=1}^{C} n_k (m_k - m)(m_k - m)^T,   S_t = D D^T

Useful trace identities for the derivation:
1) trace(A + B) = trace(A) + trace(B)
2) trace(AB) = trace(BA)
3) trace(ABC) = trace(BCA) = trace(CAB)

SSDACCCP: Formulation (D.C. programming)
The criterion is maximized over A (and the projection). Without loss of generality, the subproblem can be written as

  \max_{x, t} \frac{x^T S x}{t}   ⇔   \min_{x, t}\ \mathrm{const} - \frac{x^T S x}{t}

i.e. as minimizing g(x) - h(x, t), a difference of convex functions (D.C. programming), where h(x, t) = x^T S x / t is convex for t > 0.

SSDACCCP: CCCP
The constrained concave-convex procedure (CCCP) handles such D.C. programs iteratively: at the current iterate, the concave part -h is replaced by its first-order Taylor expansion, the resulting convex problem is solved to obtain the next iterate, and the process is repeated until convergence.
[Figure: illustration of one CCCP step, showing g(x), h(x) and the update x_p → x_{p+1}.]

SSDACCCP: Formulation (CCCP linearization)

  h(x, t) = \frac{x^T S x}{t}

Gradient at the current iterate (x_p, t_p):

  \nabla h(x_p, t_p) = \left[ \left(\frac{2 S x_p}{t_p}\right)^T,\ -\frac{x_p^T S x_p}{t_p^2} \right]^T

First-order Taylor expansion of h around (x_p, t_p), omitting the constant term:

  h(x, t) \approx \frac{2 x_p^T S}{t_p}\, x - \frac{x_p^T S x_p}{t_p^2}\, t

so each CCCP iteration solves a convex problem in which h is replaced by this linear surrogate.
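To see what this linearization buys, the following toy MATLAB check (illustrative only: the matrix S, the iterate (x_p, t_p) and the test points are random stand-ins, not quantities from the paper) verifies numerically that the surrogate touches h(x, t) = x^T S x / t at (x_p, t_p) and never exceeds it elsewhere, since h is convex for t > 0.

  % Toy check of the CCCP linearization of h(x,t) = x'*S*x / t
  rng(0);
  d  = 5;
  B  = randn(d);  S = B * B';            % random PSD matrix standing in for S
  xp = randn(d, 1);  tp = 1.5;           % current CCCP iterate (x_p, t_p)

  h     = @(x, t) (x' * S * x) / t;      % convex for t > 0 (quadratic-over-linear)
  gx    = 2 * S * xp / tp;               % gradient of h w.r.t. x at (x_p, t_p)
  gt    = -(xp' * S * xp) / tp^2;        % gradient of h w.r.t. t at (x_p, t_p)
  h_lin = @(x, t) h(xp, tp) + gx' * (x - xp) + gt * (t - tp);  % first-order surrogate

  fprintf('at (x_p, t_p): h = %.4f, h_lin = %.4f\n', h(xp, tp), h_lin(xp, tp));
  for k = 1:5
      x = randn(d, 1);  t = 0.5 + 2 * rand;   % random test point with t > 0
      fprintf('h = %8.4f  >=  h_lin = %8.4f\n', h(x, t), h_lin(x, t));
  end

Because the surrogate never exceeds h, replacing h by it turns g - h into an upper bound of the objective that is tight at the current iterate; each CCCP iteration minimizes this convex upper bound, so the objective value can only decrease or stay the same from one iterate to the next.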
SSDACCCP: Formulation (continued)
[The remaining derivation, i.e. the convex subproblem solved at each CCCP iteration, is given in [3].]

SSDACCCP: Experiments
[Figures: experimental results reported in [3].]

Conclusion
1. Data-dependent regularizers: the power of the locality preserving regularizer was somewhat overstated; in the experiments above it seems to play little more than the role of a Tikhonov regularizer.
2. Label estimation via optimization: the unknown labels can be estimated jointly with the projection, as in SSDACCCP, but the prior knowledge coming from the practical problem is of paramount importance.
[Figure: the toy example again, with many unlabeled points (×) and one labeled point per class (■, ★).]

Thanks!