Adaptive Graph Construction and Dimensionality Reduction
Songcan Chen, Lishan Qiao, Limei Zhang
http://parnec.nuaa.edu.cn/   {s.chen, qiaolishan, zhanglimei}@nuaa.edu.cn
2009-11-06

Outline
• Why construct a graph?
• Typical graph construction – Review & Challenges
• Our works
  – (I) Task-independent graph construction (Related work: Sparsity Preserving Projections)
  – (II) Task-dependent graph construction (Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

Why construct a graph?
A graph is used to characterize data geometry (e.g., a manifold) and thus plays an important role in data analysis and machine learning, for example in dimensionality reduction, semi-supervised learning, and spectral clustering.

Why construct a graph? Dimensionality reduction
• Nonlinear manifold learning, e.g., Laplacian Eigenmaps, LLE, ISOMAP.
  [Figure: data points (Swiss roll), their 4-NN graph, and the 2D embedding result.]
• Linearized variants, e.g., LPP, NPE, and so on.
• (Semi-)supervised and/or tensorized extensions: too numerous to mention one by one.
• Many classical DR algorithms, e.g., PCA (unsupervised) and LDA (supervised).
According to [1], most of the current dimensionality reduction algorithms can be unified under a graph embedding framework.
[1] S. Yan, D. Xu, B. Zhang, H. Zhang, Q. Yang, S. Lin, Graph embedding and extensions: a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intell., 29(1) (2007): 40–51.

Why construct a graph? Semi-supervised learning
Typical graph-based semi-supervised algorithms:
• Local and global consistency
• Label propagation
• Manifold regularization
• …
These may be transductive (e.g., label propagation) or inductive (e.g., manifold regularization).
"Graph is at the heart of the graph-based semi-supervised learning methods" [1].
[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.

Why construct a graph? Spectral clustering
Typical graph-based clustering algorithms:
• Graph cut
• Normalized cut
• …
[Figure: clustering structure vs. manifold structure.]
"Ncut on a kNN graph does something systematically different than Ncut on an ε-neighborhood graph! … shows that graph clustering criteria cannot be studied independently of the kind of graph they are applied to." [1]
[1] M. Maier, U. von Luxburg, Influence of graph construction on graph-based clustering measures. NIPS, 2008.

Why construct a graph? Summary
• Dimensionality reduction: linear/nonlinear, local/nonlocal, parametric/nonparametric
• Semi-supervised learning: transductive/inductive
• Spectral clustering: clustering structure/manifold structure
A well-designed graph tends to result in good performance [1]. But how do we construct a good graph? What is the right graph for a given data set?
[1] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009.

Generally speaking, despite its importance, "graph construction has not been studied extensively" [1].
"The way to establish high-quality graphs is still an open problem" [2].
[1] X. Zhu, Semi-supervised learning literature survey. Technical Report, 2008.
[2] W. Liu, S.-F. Chang, Robust multi-class transductive learning with graphs. CVPR, 2009.

Fortunately, the graph construction problem has attracted increasing attention, especially this year (2009). For example:
• Graph construction by sparse representation [1,2,3], i.e., the l1-graph;
• Minimizing the weighted sum of the squared distances from each vertex to the weighted average of its neighbors [4];
• The b-matching graph [5];
• A symmetry-favored criterion assuming the graph is doubly stochastic [6];
• Learning the projection transform and the graph weights simultaneously [7].
[1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009 (received 21 July 2008).
[2] S. Yan, H. Wang, Semi-supervised learning by sparse representation. SDM, 2009.
[3] E. Elhamifar, R. Vidal, Sparse subspace clustering. CVPR, 2009.
[4] S. I. Daitch, J. A. Kelner, D. A. Spielman, Fitting a graph to vector data. ICML, 2009.
[5] T. Jebara, J. Wang, S. Chang, Graph construction and b-matching for semi-supervised learning. ICML, 2009.
[6] W. Liu, S.-F. Chang, Robust multi-class transductive learning with graphs. CVPR, 2009.
[7] L. Qiao, S. Chen, L. Zhang, A simultaneous learning framework for dimensionality reduction and graph construction. Submitted, 2009.

Outline
• Why construct a graph?
• Typical graph construction – Review & Challenges
• Our works
  – (I) Task-independent graph construction (Related work: Sparsity Preserving Projections)
  – (II) Task-dependent graph construction (Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

Typical graph construction (task-independent): Review
Two basic steps:
1. Graph construction (adjacency)
2. Edge weight assignment
A basic flow for graph-based machine learning: graph construction → edge weight assignment → learning tasks (dimensionality reduction, spectral clustering, semi-supervised learning, spectral kernel learning, …).

Step 1: Graph construction. Two basic criteria:
• the k-nearest-neighbor criterion;
• the ε-ball neighborhood criterion.
[Figure: a k-NN graph (left) and an ε-neighborhood graph (right).]

Step 2: Edge weight assignment. Several basic ways:
• Gaussian function (heat kernel):
  $P_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$
• Inverse Euclidean distance:
  $P_{ij} = \|x_i - x_j\|^{-1}$
• Local reconstructive relationship (involved in LLE):
  $\min_{\{P_{ij}\}} \sum_i \| x_i - \sum_{x_j \in N_k(x_i)} P_{ij} x_j \|^2$
  s.t. $\sum_{x_j \in N_k(x_i)} P_{ij} = 1,\ i = 1, \dots, n$
• Non-negative local reconstruction [1]:
  $\min_{\{P_{ij}\}} \sum_i \| x_i - \sum_{x_j \in N_k(x_i)} P_{ij} x_j \|^2$
  s.t. $\sum_{x_j \in N_k(x_i)} P_{ij} = 1,\ i = 1, \dots, n;\quad P_{ij} \ge 0,\ i, j = 1, \dots, n$
[1] F. Wang, C. S. Zhang, Label propagation through linear neighborhoods.
NIPS, 2006.

Typical graph construction: Challenges
These constructions work well only when certain conditions are strictly satisfied:
• few degrees of freedom;
• little noise;
• sufficient sampling (abundant samples);
• a smoothness or clustering assumption.
In practice, however, these conditions rarely hold. Challenges ①–⑤:

① Tens or hundreds of degrees of freedom. Recent research [1] showed the face subspace is estimated to have at least 100 dimensions; more complex composite objects may need even more.
[1] M. Meytlis, L. Sirovich, On the dimensionality of face space. IEEE TPAMI, 2007, 29(7): 1262–1267.
② Noise and other corruptions. [Figure: face images with pairwise Euclidean distances 0.84×10³, 0.92×10³, and 1.90×10³.] The locality preserving criterion may not work well in this scenario, especially when only a few training samples are available.
③ Insufficient samples. [Figure: data points and their kNN graphs under dense vs. sparse sampling.]
Another example, on the Wine data set: [Figure: classification accuracy vs. number of neighbors for LPP and PCA, with 15 and 5 training samples per class respectively.] This also illustrates…
④ The sensitivity to neighborhood size. In fact, there are no reliable methods to assign appropriate values to the parameters k and ε in the unsupervised scenario, or when only a few labeled samples are available [1].
[1] D. Y. Zhou, O. Bousquet, T. N. Lal, J. Weston, B. Schölkopf, Learning with local and global consistency. NIPS, 2004.
⑤ Others.
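As a concrete illustration of the two-step recipe above (k-NN adjacency, then heat-kernel weights), here is a minimal NumPy sketch; the function name and the symmetrization-by-maximum choice are ours for illustration, not part of any cited method.

```python
import numpy as np

def knn_heat_kernel_graph(X, k=4, sigma=1.0):
    """Build a symmetric k-NN graph with Gaussian (heat-kernel) weights.

    X: (n, d) data matrix, one sample per row.
    Returns the (n, n) weight matrix P with
    P_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)) if i, j are neighbors, else 0.
    """
    n = X.shape[0]
    # Pairwise squared Euclidean distances.
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    P = np.zeros((n, n))
    for i in range(n):
        # Indices of the k nearest neighbors of x_i (excluding x_i itself).
        nbrs = np.argsort(sq[i])[1:k + 1]
        P[i, nbrs] = np.exp(-sq[i, nbrs] / (2.0 * sigma ** 2))
    # Symmetrize: connect i and j if either is a neighbor of the other.
    return np.maximum(P, P.T)
```

Note that both free parameters (k and σ) appear already in this tiny sketch, which is exactly the sensitivity issue raised in challenge ④.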
For example:
• The lingering "curse of dimensionality": dimensionality reduction aims mainly at overcoming the curse of dimensionality, yet locality preserving algorithms build their graph with the nearest-neighbor criterion, which itself suffers from that curse. This seems to be a paradox.
• Fixed neighborhood size.
• Independence from the subsequent learning tasks.
Let's try to address these problems…

Outline
• Why construct a graph?
• Typical graph construction – Review & Challenges
• Our works
  – (I) Task-independent graph construction (Related work: Sparsity Preserving Projections)
  – (II) Task-dependent graph construction (Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

Our work (I): Task-independent graph construction
(Work (I) addresses the construction and weighting steps themselves; work (II) couples them with the learning task.)

Motivation
• PCA reconstructs each sample from the mean of the others: $x_i \approx \frac{1}{n}\sum_{j \neq i} x_j$. Simple, but it ignores local structure.
• LLE reconstructs each sample from its k nearest neighbors: $x_i \approx \sum_{x_j \in N_k(x_i)} s_{ij} x_j$. It considers locality, but the neighborhood size is fixed and artificially defined, which is difficult to set well.
• Instead, seek the sparsest reconstruction, $\min \|s_i\|_0$ s.t. $x_i = X s_i$, letting the data choose their own "neighbors".

From L0 to L1
$\min \|s_i\|_0$ s.t. $x_i = X s_i$  →  $\min \|s_i\|_1$ s.t. $x_i = X s_i$
If the solution sought is sparse enough, the solution of the L0-minimization problem equals that of the L1-minimization problem [1].
[Figure: the solutions of the L2-minimization (left) and L1-minimization (right) problems.]
[1] D. Donoho, Compressed sensing. IEEE Trans. Inform.
Theory, 52(4) (2006): 1289–1306.

Modeling & algorithms
Three related formulations of the sparse coding step (with a noise-tolerant reconstruction term):
① $\min_{s_i} \|s_i\|_1$ s.t. $\|x_i - X s_i\|_p \le \varepsilon$
   Nonsmooth optimization; subgradient-based algorithms [1]. For p = 2 it can also be recast as an SOCP.
② $\min_{s_i} \|s_i\|_1 + \lambda \|x_i - X s_i\|_p$  (quasi-LASSO)
   For p = 2 this is the LASSO, with many algorithms such as LARS [2]; for p = 1 it reduces to linear programming (see next page).
③ $\min_{s_i} \|x_i - X s_i\|_p$ s.t. $\|s_i\|_1 \le z$
   L1-ball-constrained optimization [3] (e.g., SLEP: Sparse Learning with Efficient Projections, http://www.public.asu.edu/~jye02/Software/SLEP/index.htm).
[1] Y. Nesterov, Introductory Lectures on Convex Optimization: A Basic Course. Kluwer Academic Publishers, 2003.
[2] B. Efron, T. Hastie, I. Johnstone, R. Tibshirani, Least angle regression. Annals of Statistics, 2004, 32(2): 407–451.
[3] J. Liu, J. Ye, Efficient Euclidean projections in linear time. ICML, 2009.

Modeling (example, p = 1)
Introducing the residual $t_i = x_i - X s_i$, formulation ② becomes
$\min_{s_i, t_i} \|s_i\|_1 + \lambda \|t_i\|_1$ s.t. $x_i - X s_i = t_i$,
which is a linear program.
[Figure: (left) a sub-block of the weight matrix $\hat{s}_{ij}$ constructed by the above model; (right) the optimal t for 3 different samples (YaleB).]
This formulation lets us incorporate priors into the graph construction process!

Modeling (example, p = 2)
$\min_{s_i} \|x_i - X s_i\|_2$ s.t. $\|s_i\|_1 \le z$
[Figure: the L1-norm "neighborhood" and its weights (sparse, adaptive, discriminative, outlier-insensitive) vs. a conventional k-neighborhood and its weights, which may put samples from different classes into one patch [1].]
[1] X. Tan, L. Qiao, W. Gao, J. Liu, Robust faces manifold modeling: most expressive vs.
most sparse criterion. Subspace 2009 Workshop, in conjunction with ICCV 2009, Kyoto, Japan.

SPP: Sparsity Preserving Projections
The optimal $\hat{s}_{ij}$ describes the sparse reconstructive relationship $x_i \approx \sum_{j \neq i} \hat{s}_{ij} x_j$. We expect to preserve this relationship in the low-dimensional space, $w^T x_i \approx \sum_{j \neq i} \hat{s}_{ij}\, w^T x_j$; more specifically,
$\min_w \sum_{i=1}^{n} \|w^T x_i - w^T X \hat{s}_i\|^2$ s.t. $w^T X X^T w = 1$.

Experiments: Toy
[Figure: a toy data set (insufficient sampling, with an additional prior) and its 1D images under 4 different DR algorithms: PCA, LPP, NPE, and SPP.]

Experiments: Wine
Wine data set from UCI: 178 samples, 3 classes, 13 features.
[Figure: the 2D projections of the Wine data set by PCA, LPP, NPE, and SPP.]

Experiments: Face
Databases: YALE, AR, Extended YALE B.
[Figure: recognition rate vs. number of dimensions for Baseline, PCA, LPP, NPE, SPP1, and SPP2 on Yale, AR_Fixed, AR_Random, and Extended YaleB.]
Our work (I): Related works
Sparse-representation graphs have also been used for spectral clustering [2] and semi-supervised learning [3]; other extensions, from graphs to data-dependent regularization and beyond, remain open.
[1] L. Qiao, S. Chen, X. Tan, Sparsity preserving projections with applications to face recognition. Pattern Recognition, 2009 (received 21 July 2008).
[2] E. Elhamifar, R. Vidal, Sparse subspace clustering. CVPR, 2009.
[3] S. Yan, H. Wang, Semi-supervised learning by sparse representation. SDM, 2009.

Extensions
• Semi-supervised classification;
• Semi-supervised dimensionality reduction;
• Application to the single-labeled face recognition problem: SPDA (Sparsity Preserving Discriminant Analysis), compared with supervised LDA, unsupervised SPP, and semi-supervised SDA.
  [Figure: results for E1 (1 labeled and 2 unlabeled samples) and E2 (1 labeled and 30 unlabeled samples).]

Summary of work (I)
+ Adaptive "neighborhood" size;
+ Simpler parameter selection;
+ Fewer training samples needed;
+ Easier incorporation of prior knowledge;
+ Less sensitive to noise;
+ Stronger discriminating power;
− Higher computational cost.

Outline
• Why construct a graph?
• Typical graph construction – Review & Challenges
• Our works
  – (I) Task-independent graph construction (Related work: Sparsity Preserving Projections)
  – (II) Task-dependent graph construction (Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

Our work (II): Task-dependent graph construction
Task-independent graph construction (work (I)) is applicable to any graph-based learning task, but it does not necessarily help the subsequent learning task. Can we unify graph construction with the learning task? How?

Motivation (cont'd)
Take LPP as an example. It proceeds in three separate steps.
Step 1: Graph construction, by the k-nearest-neighbor criterion.
Step 2: Edge weight assignment:
$P_{ij} = \exp(-\|x_i - x_j\|^2 / 2\sigma^2)$ if $x_i \in N_k(x_j)$ or $x_j \in N_k(x_i)$; $P_{ij} = 0$ otherwise.
Step 3: Projection directions learning:
$\min_W \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 P_{ij}$ s.t. $\sum_{i=1}^{n} \|W^T x_i\|^2 d_{ii} = 1$.

In LPP, the "local geometry" is completely determined by the artificially pre-fixed neighborhood graph. As a result, its performance may drop seriously given a "bad" graph. Unfortunately, it is generally hard to judge in advance whether a graph is good or not, especially in the unsupervised scenario.
[Figure: classification accuracy vs. number of neighbors for LPP and PCA on Wine, with 15 and 5 training samples per class respectively.]
So we expect the graph to be adjustable. How to adjust?
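Step 3 above is a generalized eigenvalue problem: with the Laplacian L = D − P, minimizing the objective under the scale constraint amounts to solving X L Xᵀ w = λ X D Xᵀ w and keeping the eigenvectors with the smallest eigenvalues. A minimal NumPy sketch, where the small diagonal regularizer is our addition for numerical stability:

```python
import numpy as np

def lpp_projections(X, P, dim):
    """Step 3 of LPP: learn projection directions from a fixed weight matrix.

    X: (d, n) data matrix, one sample per column.
    P: (n, n) symmetric nonnegative weight matrix.
    dim: number of projection directions to return.
    """
    D = np.diag(P.sum(axis=1))
    L = D - P                                   # graph Laplacian
    A = X @ L @ X.T                             # numerator matrix (d x d)
    B = X @ D @ X.T + 1e-8 * np.eye(X.shape[0]) # regularized for stability
    # Whiten by B^{-1/2}, then solve a standard symmetric eigenproblem.
    Bvals, Bvecs = np.linalg.eigh(B)
    B_inv_sqrt = Bvecs @ np.diag(1.0 / np.sqrt(Bvals)) @ Bvecs.T
    vals, vecs = np.linalg.eigh(B_inv_sqrt @ A @ B_inv_sqrt)
    # Columns of W are the directions with the smallest eigenvalues,
    # normalized so that W^T (X D X^T) W = I.
    return B_inv_sqrt @ vecs[:, :dim]
```

The point of this sketch is that W depends entirely on the fixed P: whatever graph Steps 1–2 produce, Step 3 faithfully preserves it, good or bad.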
Motivation (cont'd)
LPP seeks a low-dimensional representation that preserves the local geometry of the original data, and locality preserving power is potentially related to discriminating power [1]. Locality preserving power is captured by the LPP objective, so a natural question arises: can we obtain more locality preserving (and hence discriminating) power by minimizing the objective further? The key is how to characterize such power formally. Our idea: optimize the graph and learn the projections simultaneously in a unified objective function.
[1] D. Cai, X. F. He, J. W. Han, H. J. Zhang, Orthogonal Laplacianfaces for face recognition. IEEE Transactions on Image Processing, 2006, 15(11): 3608–3614.

Modeling: SLPP
LPP:
$\min_W \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij}$ s.t. $\sum_{i=1}^{n} \|W^T x_i\|^2 d_{ii} = 1$
Soft LPP (SLPP or SLAPP):
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij}^m$
s.t. $\sum_{j=1}^{n} S_{ij} = 1,\ i = 1, \dots, n;\quad S_{ij} \ge 0,\ i, j = 1, \dots, n;\quad \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
Notes:
① The graph $S_{ij}$ is regarded as a new optimization variable, i.e., the graph is adjustable instead of pre-fixed. Note also that we do not constrain S to be symmetric.
② m (> 1) is a new parameter that controls the uncertainty of $S_{ij}$ and helps us obtain a closed-form solution. Without it, we would get a singular solution in which exactly one element of each row of S is 1 and all the others are 0.
③ The new constraints on the rows of S avoid degenerate solutions and provide a natural probabilistic interpretation of the graph.
④ We remove $d_{ii}$ from the scale constraint mainly to make the optimization tractable.

Algorithm
The joint problem is non-convex with respect to (W, S).
We solve it by the alternating (block-coordinate) optimization technique; fortunately, each step has a closed-form solution.
Step 1: With S fixed, calculate W by a generalized eigenproblem.
Step 2: With W fixed, update the graph:
$S_{ij} = \dfrac{\left(1 / \|W^T x_i - W^T x_j\|^2\right)^{1/(m-1)}}{\sum_{j=1}^{n} \left(1 / \|W^T x_i - W^T x_j\|^2\right)^{1/(m-1)}}$
A normalized inverse Euclidean distance!
[Figure: the SLPP algorithm.]

Modeling: ELPP
ELPP: Entropy-regularized LPP
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij} + \gamma \sum_{i,j=1}^{n} S_{ij} \ln S_{ij}$
s.t. $\sum_{j=1}^{n} S_{ij} = 1,\ i = 1, \dots, n;\quad S_{ij} \ge 0,\ i, j = 1, \dots, n;\quad \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
The graph update step becomes
$S_{ij} = \dfrac{\exp(-\|W^T x_i - W^T x_j\|^2 / \gamma)}{\sum_{j=1}^{n} \exp(-\|W^T x_i - W^T x_j\|^2 / \gamma)}$
A normalized heat-kernel distance!
[Figure: the ELPP algorithm.]

Convergence
The objective sequence $J(W_0, S_0) \ge J(W_0, S_1) \ge J(W_1, S_1) \ge \dots \ge J(W_k, S_k) \ge 0$ is non-increasing and bounded below, so it converges by Cauchy's convergence rule.
[Figure: objective function value vs. iteration number.]
Block-coordinate gradient descent!
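The two closed-form graph updates (Step 2 of SLPP and of ELPP) can be sketched as follows. This is an illustrative NumPy implementation in which excluding self-loops and the small eps term are our assumptions for numerical stability; the W-step (the generalized eigenproblem) is omitted.

```python
import numpy as np

def elpp_update_graph(Y, gamma=1.0):
    """ELPP graph update: given projected samples Y = W^T X (one per column),
    set S_ij proportional to exp(-||y_i - y_j||^2 / gamma), with each row
    normalized to sum to 1 (a row-stochastic, normalized heat-kernel graph)."""
    sq = ((Y.T[:, None, :] - Y.T[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq / gamma)
    np.fill_diagonal(S, 0.0)              # exclude self-loops (our assumption)
    return S / S.sum(axis=1, keepdims=True)

def slpp_update_graph(Y, m=2.0, eps=1e-12):
    """SLPP graph update: S_ij proportional to
    (1 / ||y_i - y_j||^2)^(1/(m-1)), row-normalized."""
    sq = ((Y.T[:, None, :] - Y.T[None, :, :]) ** 2).sum(-1)
    inv = (1.0 / (sq + eps)) ** (1.0 / (m - 1.0))
    np.fill_diagonal(inv, 0.0)            # exclude self-loops (our assumption)
    return inv / inv.sum(axis=1, keepdims=True)
```

Alternating these updates with the W-step realizes the scheme above: pairs that are close in the current projected space receive larger weights, and the re-weighted graph in turn reshapes the next projection.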
Experiments: Wine
[Figure: 2D projections of Wine by LPP and by SLPP at iterations 1, 3, 5, 7, and 9.]
[Tables: classification experiments with different initialized graphs, including random initialization of the weight matrix.]
These experiments illustrate that the graph may become better and better through the gradual updating process.

Experiments: Face
[Figure: recognition rate vs. number of dimensions for Baseline, LPP, and SALPP on the AR, Extended YaleB, and PIE databases.]
Summary of work (II)
• Inherits some good characteristics of LPP naturally;
• Provides a seamless link between dimensionality reduction and graph construction;
• Is quite insensitive to initial parameters such as the neighborhood size k and the heat-kernel width σ;
• The learned graph can potentially be used in other graph-based learning algorithms, even though the graph learning is task-dependent, since DR itself is also a preprocessing step.

Outline
• Why construct a graph?
• Typical graph construction – Review & Challenges
• Our works
  – (I) Task-independent graph construction (Related work: Sparsity Preserving Projections)
  – (II) Task-dependent graph construction (Related work: Soft LPP and Entropy-regularized LPP)
• Discussion and Next Work

Discussion
• Graph construction is a common and important problem involved in many machine learning algorithms.
• Graph construction rests on different motivations, priors, or assumptions; there is no general way to establish high-quality graphs.
• It may be an interesting research topic to construct graphs tailored to practical applications.

Ongoing and Next Works
For SPP:
• Fast sparse representation algorithms (by exploiting the structure of the data?);
• Verify the effectiveness of the constructed graph for other graph-based learning algorithms (e.g., manifold regularization).
For SLPP and ELPP:
• Semi-supervised extensions, e.g.,
  $S_{ij} = 1$ if $x_i$ and $x_j$ are from the same class;
  $S_{ij} = \exp(-\|W^T x_i - W^T x_j\|^2 / \gamma)$ if $x_i$ and $x_j$ are unlabeled;
  $S_{ij} = 0$ if $x_i$ and $x_j$ are from different classes.
• From Shannon entropy regularization to generalized entropy regularization: replace $\sum_{i,j} S_{ij} \ln S_{ij}$ with a generalized entropy $\sum_i H(S_i)$:
$\min_{W, S} \sum_{i,j=1}^{n} \|W^T x_i - W^T x_j\|^2 S_{ij} + \sum_{i=1}^{n} \gamma_i H(S_i)$
s.t. $\sum_{j=1}^{n} S_{ij} = 1,\ i = 1, \dots, n;\quad S_{ij} \ge 0,\ i, j = 1, \dots, n;\quad \sum_{i=1}^{n} \|W^T x_i\|^2 = 1$
With different choices of the entropy H, we can potentially obtain different graph-updating formulas, for example weight types that are neither inverse Euclidean nor heat kernels.
• Apply the learned graph to other graph-based learning algorithms (e.g., manifold regularization).

A general simultaneous learning framework for DR (and other learning tasks):
input training samples X → construct graph G{X, S} → learn projection W → update samples X = WᵀX → if the stopping condition is not met, repeat; otherwise output G, W, or X.

Thanks! Q&A