Exploring Intrinsic Structures from Samples: Supervised, Unsupervised, and Semisupervised Frameworks

Huan Wang
Multimedia Laboratory, Department of Information Engineering
The Chinese University of Hong Kong
Supervised by Prof. Xiaoou Tang & Prof. Jianzhuang Liu

Outline
• Notations & introduction
• Trace ratio optimization: preserve sample feature structures; dimensionality reduction
• Tensor subspace learning: explore the geometric structures and feature-domain relations concurrently
• Correspondence propagation

Concept: Tensor
• A tensor is a multi-dimensional (multi-way) array of components.
• Real-world data are affected by multifarious factors. For person identification, we may have facial images of different views and poses, lighting conditions, and expressions, in addition to the image rows and columns themselves.
• The observed data evolve differently along the variation of each factor.
• It is therefore desirable to dig into the intrinsic connections among the different factors affecting the data, and the tensor provides a concise and effective representation.
[Figure: a facial-image tensor whose modes are image rows, image columns, pose, expression, and illumination.]

Concept: Dimensionality Reduction
• Preserve sample feature structures.
• Enhance classification capability.
• Reduce computational complexity.

Trace Ratio Optimization: Definition

$$W^* = \arg\max_{W^T W = I} \frac{\operatorname{Tr}(W^T A W)}{\operatorname{Tr}(W^T B W)}$$

• $A$ and $B$ are positive semidefinite.
• Orthogonality constraint: $W^T W = I$.
• Homogeneity: $J(W) = J(WQ)$ for any orthogonal $Q$ ($Q^T Q = Q Q^T = I$), so the optimization is actually over the Grassmann manifold.
• Special case: when $W$ is a single vector $w$, the objective reduces to the generalized Rayleigh quotient

$$w^* = \arg\max_{w^T w = 1} \frac{w^T A w}{w^T B w},$$

which is solved by the generalized eigenvalue decomposition (GEVD) $A w = \lambda B w$.

Trace Ratio Formulation: Linear Discriminant Analysis

$$W^* = \arg\max_W \frac{\sum_{c=1}^{N_c} n_c \, \| W^T \bar{x}_c - W^T \bar{x} \|^2}{\sum_{i=1}^{N} \| W^T x_i - W^T \bar{x}_{c_i} \|^2} = \arg\max_W \frac{\operatorname{Tr}(W^T S_b W)}{\operatorname{Tr}(W^T S_w W)},$$

where

$$S_b = \sum_{c=1}^{N_c} n_c (\bar{x}_c - \bar{x})(\bar{x}_c - \bar{x})^T, \qquad S_w = \sum_{i=1}^{N} (x_i - \bar{x}_{c_i})(x_i - \bar{x}_{c_i})^T.$$

Trace Ratio Formulation: Kernel Discriminant Analysis

Map $x_i \to \phi(x_i)$ and express the projection as $W = \phi(X) A$:

$$A^* = \arg\max_A \frac{\operatorname{Tr}\!\big(A^T K (\sum_{c=1}^{N_c} \frac{1}{n_c} e^c e^{cT} - \frac{1}{N} e e^T) K^T A\big)}{\operatorname{Tr}\!\big(A^T K (I - \sum_{c=1}^{N_c} \frac{1}{n_c} e^c e^{cT}) K^T A\big)}.$$

The constraint $W^T W = I$ becomes $A^T K A = I$. Decompose $K = K_d^T K_d$; then

$$A^* = \arg\max_A \frac{\operatorname{Tr}(A^T K L^p K^T A)}{\operatorname{Tr}(A^T K L K^T A)} = \arg\max_A \frac{\operatorname{Tr}(A^T K_d^T K_d L^p K_d^T K_d A)}{\operatorname{Tr}(A^T K_d^T K_d L K_d^T K_d A)} \quad \text{w.r.t. } A^T K_d^T K_d A = I.$$

Let $\Gamma = K_d A$; the problem becomes a standard trace ratio:

$$\Gamma^* = \arg\max_{\Gamma} \frac{\operatorname{Tr}(\Gamma^T K_d L^p K_d^T \Gamma)}{\operatorname{Tr}(\Gamma^T K_d L K_d^T \Gamma)} \quad \text{w.r.t. } \Gamma^T \Gamma = I.$$

Trace Ratio Formulation: Marginal Fisher Analysis

With an inter-class (penalty) graph $W^c$ and an intra-class (intrinsic) graph $W^m$:

$$W^* = \arg\max_W \frac{\sum_{i,j} \| W^T x_i - W^T x_j \|^2 W^c_{ij}}{\sum_{i,j} \| W^T x_i - W^T x_j \|^2 W^m_{ij}} = \arg\max_W \frac{\operatorname{Tr}(W^T X (D^c - W^c) X^T W)}{\operatorname{Tr}(W^T X (D^m - W^m) X^T W)} = \arg\max_W \frac{\operatorname{Tr}(W^T X L^c X^T W)}{\operatorname{Tr}(W^T X L^m X^T W)}.$$

Trace Ratio Formulation: Kernel Marginal Fisher Analysis

The same kernel decomposition applies: with $K = K_d^T K_d$ and $\Gamma = K_d A$,

$$\Gamma^* = \arg\max_{\Gamma} \frac{\operatorname{Tr}(\Gamma^T K_d L^p K_d^T \Gamma)}{\operatorname{Tr}(\Gamma^T K_d L K_d^T \Gamma)} \quad \text{w.r.t. } \Gamma^T \Gamma = I.$$
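To make the trace-ratio objective concrete, here is a minimal numpy sketch (not from the slides; all names are illustrative) that builds the LDA scatter matrices $S_b$ and $S_w$ defined above and evaluates the objective for a column-orthogonal $W$:

```python
import numpy as np

def lda_scatter_matrices(X, labels):
    """Build the LDA between-class (Sb) and within-class (Sw) scatter
    matrices from samples X (n_samples x dim) and integer class labels."""
    mean_all = X.mean(axis=0)
    dim = X.shape[1]
    Sb = np.zeros((dim, dim))
    Sw = np.zeros((dim, dim))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mean_c = Xc.mean(axis=0)
        d = (mean_c - mean_all)[:, None]
        Sb += len(Xc) * d @ d.T          # n_c (mean_c - mean)(mean_c - mean)^T
        D = (Xc - mean_c).T              # columns are (x_i - mean_c)
        Sw += D @ D.T
    return Sb, Sw

def trace_ratio(W, A, B):
    """Trace-ratio objective Tr(W^T A W) / Tr(W^T B W)."""
    return np.trace(W.T @ A @ W) / np.trace(W.T @ B @ W)

# Example: random data and a random column-orthogonal W (W^T W = I)
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 10))
y = rng.integers(0, 3, size=60)
Sb, Sw = lda_scatter_matrices(X, y)
W, _ = np.linalg.qr(rng.normal(size=(10, 3)))
print(trace_ratio(W, Sb, Sw))
```

For a one-dimensional $W$ this reduces to the generalized Rayleigh quotient, which scipy.linalg.eigh(A, B) solves directly as a GEVD.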
Trace Ratio Formulation: 2-D Linear Discriminant Analysis

Left and right projections: $y_i = L^T x_i R$.

$$(L^*, R^*) = \arg\max_{L,R} \frac{\sum_{c=1}^{N_c} n_c \, \| L^T \bar{x}_c R - L^T \bar{x} R \|^2}{\sum_{i=1}^{N} \| L^T x_i R - L^T \bar{x}_{c_i} R \|^2}.$$

Fixing one projection matrix and optimizing the other gives two coupled trace ratios:

$$L^* = \arg\max_L \frac{\operatorname{Tr}\!\big(L^T \big(\sum_{c} n_c (\bar{x}_c - \bar{x}) R R^T (\bar{x}_c - \bar{x})^T\big) L\big)}{\operatorname{Tr}\!\big(L^T \big(\sum_{i} (x_i - \bar{x}_{c_i}) R R^T (x_i - \bar{x}_{c_i})^T\big) L\big)}, \qquad
R^* = \arg\max_R \frac{\operatorname{Tr}\!\big(R^T \big(\sum_{c} n_c (\bar{x}_c - \bar{x})^T L L^T (\bar{x}_c - \bar{x})\big) R\big)}{\operatorname{Tr}\!\big(R^T \big(\sum_{i} (x_i - \bar{x}_{c_i})^T L L^T (x_i - \bar{x}_{c_i})\big) R\big)}.$$

Trace Ratio Formulation: Discriminant Analysis with Tensor Representation

$$\{U_k\}^* = \arg\max_{U_k|_{k=1}^n} \frac{\sum_{c=1}^{N_c} n_c \, \| \bar{X}_c \times_1 U_1 \cdots \times_n U_n - \bar{X} \times_1 U_1 \cdots \times_n U_n \|^2}{\sum_{i=1}^{N} \| X_i \times_1 U_1 \cdots \times_n U_n - \bar{X}_{c_i} \times_1 U_1 \cdots \times_n U_n \|^2} = \arg\max_{U_k} \frac{\operatorname{Tr}(U_k^T S_b^k U_k)}{\operatorname{Tr}(U_k^T S_w^k U_k)}.$$

Trace Ratio Formulation: Tensor Subspace Analysis

$$(U^*, V^*) = \arg\min_{U,V} \frac{\frac{1}{2} \sum_{i,j} \| U^T x_i V - U^T x_j V \|^2 S_{ij}}{\sum_i \| y_i \|^2 D_{ii}}
= \arg\min_{U,V} \frac{\operatorname{Tr}\!\big(U^T (\sum_i D_{ii} x_i V V^T x_i^T - \sum_{i,j} S_{ij} x_i V V^T x_j^T) U\big)}{\operatorname{Tr}\!\big(U^T (\sum_i D_{ii} x_i V V^T x_i^T) U\big)}$$

$$= \arg\min_{U,V} \frac{\operatorname{Tr}\!\big(V^T (\sum_i D_{ii} x_i^T U U^T x_i - \sum_{i,j} S_{ij} x_i^T U U^T x_j) V\big)}{\operatorname{Tr}\!\big(V^T (\sum_i D_{ii} x_i^T U U^T x_i) V\big)}.$$

Again, fix one projection matrix and optimize the other.

Trace Ratio Formulation: Conventional Solution

$$W^* = \arg\max_W \frac{\operatorname{Tr}(W^T S_b W)}{\operatorname{Tr}(W^T S_w W)} \;\;\leadsto\;\; \arg\max_W \frac{| W^T S_b W |}{| W^T S_w W |} = \arg\max_W \operatorname{Tr}\!\big((W^T S_w W)^{-1} (W^T S_b W)\big),$$

i.e., the ratio-trace relaxation, solved by the GEVD $S_b w = \lambda S_w w$. The singularity problem of $S_w$ motivates Nullspace LDA and Dualspace LDA.

Preprocessing

$$\arg\max_W \frac{\operatorname{Tr}(W^T S^p W)}{\operatorname{Tr}(W^T S^l W)} \iff \arg\max_W \frac{\operatorname{Tr}(W^T S^p W)}{\operatorname{Tr}(W^T S^t W)}, \qquad S^t = S^l + S^p.$$

Remove the null space of $S^t$ with Principal Component Analysis; afterwards the objective is bounded,

$$0 \le \frac{\operatorname{Tr}(W^T S^p W)}{\operatorname{Tr}(W^T S^t W)} \le 1.$$

From Trace Ratio to Trace Difference

Objective: $\arg\max_U \operatorname{Tr}(U^T S^p U) / \operatorname{Tr}(U^T S U)$ with $U^T U = I$.

Define the trace difference $g(U) = \operatorname{Tr}(U^T (S^p - \lambda S) U)$, where

$$\lambda_t = \frac{\operatorname{Tr}(U_t^T S^p U_t)}{\operatorname{Tr}(U_t^T S U_t)}, \quad\text{so that}\quad g(U_t) = \operatorname{Tr}(U_t^T (S^p - \lambda_t S) U_t) = 0.$$

Find $U_{t+1}$ with $g(U_{t+1}) \ge g(U_t) = 0$: let $U_{t+1} = [u_1, u_2, \ldots, u_{m'}]$, where $u_1, \ldots, u_{m'}$ are the leading eigenvectors of $(S^p - \lambda_t S)$. Then $g(U_{t+1}) \ge 0$ implies $\lambda_{t+1} \ge \lambda_t$: the objective rises monotonically.

Main Algorithm
1. Initialization: initialize $U_0$ as an arbitrary column-orthogonal matrix.
2. Iterative optimization: for $t = 1, 2, \ldots, T_{\max}$, do
   1. Set $\lambda_t = \operatorname{Tr}(U_t^T S^p U_t) / \operatorname{Tr}(U_t^T S U_t)$.
   2. Conduct the eigenvalue decomposition $(S^p - \lambda_t S) v_j = \gamma_j v_j$.
   3. Sort the projection directions by eigenvalue.
   4. Set $U_{t+1} = [u_1, u_2, \ldots, u_{m'}]$.
3. Output the projection matrix.
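The main algorithm admits a short implementation. The following is a hedged numpy sketch of the iterative trace-ratio procedure (my paraphrase of the steps above, not the authors' released code):

```python
import numpy as np

def iterative_trace_ratio(Sp, S, m, T_max=50, tol=1e-8, seed=0):
    """Maximize Tr(W^T Sp W) / Tr(W^T S W) over column-orthogonal W (d x m)
    by the trace-difference iteration: at each step take the leading
    eigenvectors of Sp - lam * S, where lam is the current objective value."""
    d = Sp.shape[0]
    rng = np.random.default_rng(seed)
    W, _ = np.linalg.qr(rng.normal(size=(d, m)))   # arbitrary orthogonal init
    lam = np.trace(W.T @ Sp @ W) / np.trace(W.T @ S @ W)
    for _ in range(T_max):
        # Symmetric eigendecomposition of the trace-difference matrix
        vals, vecs = np.linalg.eigh(Sp - lam * S)  # ascending eigenvalues
        W = vecs[:, -m:]                           # top-m eigenvectors
        lam_new = np.trace(W.T @ Sp @ W) / np.trace(W.T @ S @ W)
        if abs(lam_new - lam) < tol:               # lam rises monotonically
            break
        lam = lam_new
    return W, lam
```

Each iteration needs only one symmetric eigendecomposition, which is the source of the efficiency claim made in the highlights later.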
Traditional Tensor Discriminant Algorithms
• Two-dimensional Linear Discriminant Analysis (Ye et al.)
• Discriminant Analysis with Tensor Representation (Yan et al.)
• Tensor Subspace Analysis (He et al.)
These algorithms project the tensor along its different modes, solve a trace ratio optimization problem, and derive the projection matrices of the different modes iteratively. They DO NOT CONVERGE!

Tensor Subspace Learning: Discriminant Analysis Objective

$$\{U^k\}^* = \arg\max_{U^k|_{k=1}^n} \frac{\sum_{i \ne j} \| (X_i - X_j) \times_1 U^1 \cdots \times_n U^n \|^2 \, W^p_{ij}}{\sum_{i \ne j} \| (X_i - X_j) \times_1 U^1 \cdots \times_n U^n \|^2 \, W_{ij}}.$$

There is no closed-form solution, so the projection matrices are solved iteratively: leave one projection matrix as the variable while keeping the others constant,

$$U^{k*} = \arg\max_{U^k} \frac{\sum_{i \ne j} \| U^{kT} Y_i^k - U^{kT} Y_j^k \|^2 \, W^p_{ij}}{\sum_{i \ne j} \| U^{kT} Y_i^k - U^{kT} Y_j^k \|^2 \, W_{ij}},$$

where $Y_i^k$ is the mode-$k$ unfolding of the partially projected tensor

$$\tilde{Y}_i = X_i \times_1 U^1 \cdots \times_{k-1} U^{k-1} \times_{k+1} U^{k+1} \cdots \times_n U^n.$$

This is again a trace ratio:

$$U^{k*} = \arg\max_{U^k} \frac{\operatorname{Tr}(U^{kT} S_k^p U^k)}{\operatorname{Tr}(U^{kT} S_k U^k)}, \qquad
S_k = \sum_{i \ne j} W_{ij} (Y_i^k - Y_j^k)(Y_i^k - Y_j^k)^T, \quad
S_k^p = \sum_{i \ne j} W^p_{ij} (Y_i^k - Y_j^k)(Y_i^k - Y_j^k)^T.$$

The trace ratio is thus a general formulation for the objectives of the discriminant-analysis-based algorithms: for DATER, $S_k$ and $S_k^p$ are the within-class and between-class scatters of the unfolded data; for TSA, the graph $W$ is constructed from the image manifold and $D$ is a diagonal weight matrix.

Why Do the Previous Algorithms Not Converge?

For each mode, the trace ratio is converted to a ratio trace and solved by GEVD:

$$\arg\max_{U^{k}} \frac{\operatorname{Tr}(U^{kT} S_{k}^p U^{k})}{\operatorname{Tr}(U^{kT} S_{k} U^{k})} \;\leadsto\; \arg\max_{U^{k}} \operatorname{Tr}\!\big((U^{kT} S_{k} U^{k})^{-1} U^{kT} S_{k}^p U^{k}\big), \qquad k = k_1, k_2, \ldots$$

Since $\operatorname{Tr}(A)/\operatorname{Tr}(B) \ne \operatorname{Tr}(B^{-1} A)$ in general, the conversion from trace ratio to ratio trace induces an inconsistency among the objectives of the different modes: a disagreement between the objective and the optimization process.

From Trace Ratio to Trace Difference (Tensor Case)

Objective: $\arg\max_{U^k} \operatorname{Tr}(U^{kT} S_k^p U^k) / \operatorname{Tr}(U^{kT} S_k U^k)$. Define $g(U^k) = \operatorname{Tr}(U^{kT} (S_k^p - \lambda S_k) U^k)$ with

$$\lambda_t = \frac{\operatorname{Tr}(U_t^{kT} S_k^p U_t^k)}{\operatorname{Tr}(U_t^{kT} S_k U_t^k)}, \quad\text{so that}\quad g(U_t^k) = 0.$$

Under the constraint $U^{kT} U^k = I$, let $U^k_{t+1} = [u_1, u_2, \ldots, u_{m'_k}]$, where $u_1, \ldots, u_{m'_k}$ are the leading eigenvectors of $(S_k^p - \lambda_t S_k)$. Then $g(U^k_{t+1}) \ge g(U^k_t) = 0$: the objective rises monotonically, and the projection matrices of all modes share the same objective.

Main Algorithm
1. Initialization: initialize $U^1_0, U^2_0, \ldots, U^n_0$ as arbitrary column-orthogonal matrices.
2. Iterative optimization: for $t = 1, 2, \ldots, T_{\max}$ and for $k = 1, 2, \ldots, n$, do
   1. Set $\lambda = \dfrac{\sum_{i \ne j} \| (X_i - X_j) \times_o U^o_t|_{o=1}^{k-1} \times_o U^o_{t-1}|_{o=k+1}^{n} \|^2 \, W^p_{ij}}{\sum_{i \ne j} \| (X_i - X_j) \times_o U^o_t|_{o=1}^{k-1} \times_o U^o_{t-1}|_{o=k+1}^{n} \|^2 \, W_{ij}}$.
   2. Compute $S_k$ and $S_k^p$.
   3. Conduct the eigenvalue decomposition $(S_k^p - \lambda S_k) v_j = \gamma_j v_j$.
   4. Sort the projection directions by eigenvalue.
   5. Set $U^k_t = [u_1, u_2, \ldots, u_{m'_k}]$.
3. Output the projection matrices.

Highlights of Our Algorithm
• The objective value is guaranteed to increase monotonically, and the multiple projection matrices are proved to converge.
• Only eigenvalue decomposition is needed in each iteration, which makes the algorithm extremely efficient.
• Enhanced potential classification capability of the low-dimensional representations derived by the subspace learning algorithms.
• The first work to give a convergent solution to general tensor-based subspace learning.
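The per-mode steps rely on mode-$k$ products and unfoldings. A small numpy sketch of how the unfolded matrices $Y_i^k$ can be formed (illustrative only; shapes and the convention that mode $o$ is projected by $U_o^T$ are my assumptions):

```python
import numpy as np

def unfold(T, k):
    """Mode-k unfolding: bring mode k to the front, flatten the rest."""
    return np.moveaxis(T, k, 0).reshape(T.shape[k], -1)

def mode_product(T, U, k):
    """Mode-k product T x_k U: project every mode-k fiber of T by U."""
    return np.moveaxis(np.tensordot(U, T, axes=(1, k)), 0, k)

# Form Y_i^k: project X_i along every mode except k, then unfold at mode k,
# as required when computing the per-mode scatters S_k and S_k^p.
rng = np.random.default_rng(0)
X_i = rng.normal(size=(8, 9, 10))                 # a 3rd-order sample tensor
U = [np.linalg.qr(rng.normal(size=(n, 3)))[0] for n in X_i.shape]
k = 1
Y = X_i
for o in range(X_i.ndim):
    if o != k:
        Y = mode_product(Y, U[o].T, o)            # U_o^T maps mode o to 3 dims
Y_k = unfold(Y, k)
print(Y_k.shape)                                  # (9, 9): I_k x (3 * 3)
```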
Experimental Results
• Visualization of the projection matrix W of PCA, ratio-trace LDA, and trace-ratio LDA (ITR) on the FERET database.
• Face recognition, linear: trace-ratio LDA vs. ratio-trace LDA (PCA+LDA); trace-ratio MFA vs. ratio-trace MFA (PCA+MFA).
• Face recognition, kernelized: trace-ratio KDA vs. ratio-trace KDA; trace-ratio KMFA vs. ratio-trace KMFA.
• UCI datasets: testing classification errors on three UCI databases for both linear and kernel-based algorithms; results are obtained from 100 realizations of randomly generated 70/30 splits of the data.
• Monotony of the objective & convergence of the projection matrices.

Observations:
1. TMFA-TR mostly outperforms all the other methods considered in this work, with only one exception, the case G5P5 on the CMU PIE database.
2. For vector-based algorithms, the trace-ratio formulation is consistently superior to the ratio-trace formulation for subspace learning.
3. Tensor representation has the potential to improve the classification performance for both trace-ratio and ratio-trace formulations of subspace learning.

Correspondence Propagation

Explore the geometric structures and feature-domain consistency for object registration.

Aim
• Objects are represented as sets of feature points.
• Seek a mapping between feature sets of different cardinalities.
• Exploit the geometric structures of the sample features.
• Introduce human interaction for correspondence guidance.

Graph Construction
• A spatial graph within each object and a similarity graph across objects.

From Spatial Graphs to the Categorical Product Graph

Definition (assignment neighborhood). Suppose $\{i^1_1, i^1_2, \ldots, i^1_{N_1}\}$ and $\{i^2_1, i^2_2, \ldots, i^2_{N_2}\}$ are the vertices of graphs $G^1$ and $G^2$, respectively. Two assignments $m_{i_1 i_2} = \{i^1_{i_1}, i^2_{i_2}\}$ and $m_{j_1 j_2} = \{i^1_{j_1}, i^2_{j_2}\}$ are neighbors iff both pairs $\{i^1_{i_1}, i^1_{j_1}\}$ and $\{i^2_{i_2}, i^2_{j_2}\}$ are neighbors in $G^1$ and $G^2$, respectively; namely,

$$m_{i_1 i_2} \sim m_{j_1 j_2} \iff i_1 \sim j_1 \ \text{and} \ i_2 \sim j_2,$$

where $a \sim b$ means $a$ and $b$ are neighbors.

Example: for $A = \{a_1, a_2, a_3, a_4, a_5, a_6\}$ and $B = \{b_1, b_2, b_3\}$, $A \times B = \{(a_1, b_1), (a_1, b_2), (a_1, b_3), (a_2, b_1), \ldots, (a_6, b_3)\}$.

The assignment graph is the categorical product $G^a = G^1 \otimes G^2$, whose adjacency matrix is the Kronecker product $W^a = W^2 \otimes W^1$. Smoothness of the assignment vector $M$ along the spatial distribution is measured by

$$M^T L^a M = \frac{1}{2} \sum_{i,j} w^a_{ij} (m_i - m_j)^2.$$

Feature-Domain Consistency & Soft Constraints
• Similarity measure: $\Sigma(S \circ M)$, where $\circ$ is the matrix Hadamard product and $\Sigma(\cdot)$ returns the sum of all elements of its argument.
• One-to-one correspondence penalty:

$$\operatorname{Tr}\!\big((A_1^T M - e_{N_1})^T (A_1^T M - e_{N_1})\big) + \operatorname{Tr}\!\big((A_2^T M - e_{N_2})^T (A_2^T M - e_{N_2})\big),$$

where $A_1 = e_{N_2} \otimes I_{N_1}$ and $A_2 = I_{N_2} \otimes e_{N_1}$.

Assignment Labeling
• Inhomogeneous pair labeling: assign zeros to pairs with extremely low similarity scores, $M_{i,j} = M_{i + (j-1) N_1} = 0$.
• Reliable pair labeling: assign ones to the reliable pairs, $M_{i,j} = M_{i + (j-1) N_1} = 1$.
• Labeled assignments: reliable correspondences & inhomogeneous pairs.

Reliable Correspondence Propagation

Arrangement: split the assignment variables into labeled and unlabeled parts, $M^* = [M^l; M^u]$, with correspondingly arranged coefficient matrices $A_1^* = [A_1^l; A_1^u]$, $A_2^* = [A_2^l; A_2^u]$, similarity vector $S^* = [S^l; S^u]$, and spatial adjacency and Laplacian matrices

$$W^{a*} = \begin{bmatrix} W^a_{ll} & W^a_{lu} \\ W^a_{ul} & W^a_{uu} \end{bmatrix}, \qquad L^{a*} = \begin{bmatrix} L^a_{ll} & L^a_{lu} \\ L^a_{ul} & L^a_{uu} \end{bmatrix}.$$

Objective:

$$\min_{M^*} \; -S^{*T} M^* + M^{*T} L^{a*} M^* + \operatorname{Tr}\!\big((A_1^{*T} M^* - e_{N_1})^T (A_1^{*T} M^* - e_{N_1})\big) + \operatorname{Tr}\!\big((A_2^{*T} M^* - e_{N_2})^T (A_2^{*T} M^* - e_{N_2})\big),$$

combining feature-domain agreement ($S^{*T} M^*$), geometric smoothness regularization ($M^{*T} L^{a*} M^*$), and the one-to-one correspondence penalty.

Relaxing to the real domain yields the closed-form solution

$$M^u = C_{uu}^{-1} (B^u - C_{ul} M^l),$$

where

$$C = \begin{bmatrix} C_{ll} & C_{lu} \\ C_{ul} & C_{uu} \end{bmatrix} = A_1^* A_1^{*T} + A_2^* A_2^{*T} + L^{a*}, \qquad B = \begin{bmatrix} B^l \\ B^u \end{bmatrix} = A_1^* e_{N_1} + A_2^* e_{N_2} + \frac{1}{2} S^*.$$

Rearrangement and Discretization
• Invert the element arrangement and reshape the assignment vector into the matrix $\mathbf{M}$.
• Thresholding: assignments larger than a threshold are regarded as correspondences.
• Eliciting: sequentially pick the assignments with the largest assignment scores.
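A minimal numpy sketch of the closed-form propagation step, assuming the assignment vector is stacked so that entry $(i, j)$ sits at index $i + (j-1)N_1$, and assembling $C$ and $B$ as defined above (function and variable names are mine, not from the thesis):

```python
import numpy as np

def build_system(L_a, S_vec, N1, N2):
    """Assemble C and B for the propagation objective above:
    C = A1 A1^T + A2 A2^T + L^a,  B = A1 e_N1 + A2 e_N2 + S/2,
    with A1 = e_N2 (x) I_N1 and A2 = I_N2 (x) e_N1 (Kronecker products)."""
    A1 = np.kron(np.ones((N2, 1)), np.eye(N1))     # (N1*N2, N1)
    A2 = np.kron(np.eye(N2), np.ones((N1, 1)))     # (N1*N2, N2)
    C = A1 @ A1.T + A2 @ A2.T + L_a
    B = A1 @ np.ones(N1) + A2 @ np.ones(N2) + 0.5 * S_vec
    return C, B

def propagate(C, B, labeled_idx, M_l):
    """Closed-form real-relaxed solution M_u = C_uu^{-1} (B_u - C_ul M_l)."""
    n = C.shape[0]
    u_idx = np.setdiff1d(np.arange(n), labeled_idx)
    C_uu = C[np.ix_(u_idx, u_idx)]
    C_ul = C[np.ix_(u_idx, labeled_idx)]
    # A linear solve is preferred over an explicit matrix inverse
    M_u = np.linalg.solve(C_uu, B[u_idx] - C_ul @ np.asarray(M_l))
    M = np.empty(n)
    M[labeled_idx] = M_l
    M[u_idx] = M_u
    return M          # reshape to (N1, N2) and threshold/elicit afterwards
```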
Semi-supervised & Unsupervised Frameworks
• Exact pairwise correspondence labeling: users give exact correspondence guidance.
• Obscure correspondence guidance: rough correspondences between image parts.
• Both semi-supervised and fully automatic systems are supported.

Experimental Results
• Demonstrations on the evaluation datasets.
• Automatic feature matching scores on the Oxford real-image transformation dataset. The transformations include viewpoint change ((a) Graffiti and (b) Wall sequences), image blur ((c) Bikes and (d) Trees sequences), zoom and rotation ((e) Bark and (f) Boat sequences), illumination variation ((g) Leuven), and JPEG compression ((h) UBC).

Future Works
• From point-to-point correspondence to set-to-set correspondence.
• Multi-scale correspondence searching.
• Combine object segmentation and registration.

Publications
[1] Huan Wang, Shuicheng Yan, Thomas Huang, and Xiaoou Tang, "A Convergent Solution to Tensor Subspace Learning," International Joint Conference on Artificial Intelligence (IJCAI 07, regular paper), Jan. 2007.
[2] Huan Wang, Shuicheng Yan, Thomas Huang, and Xiaoou Tang, "Trace Ratio vs. Ratio Trace for Dimensionality Reduction," IEEE Conference on Computer Vision and Pattern Recognition (CVPR 07), Jun. 2007.
[3] Huan Wang, Shuicheng Yan, Thomas Huang, Jianzhuang Liu, and Xiaoou Tang, "Transductive Regression Piloted by Inter-Manifold Relations," International Conference on Machine Learning (ICML 07), Jun. 2007.
[4] Huan Wang, Shuicheng Yan, Thomas Huang, and Xiaoou Tang, "Maximum Unfolded Embedding: Formulation, Solution, and Application for Image Clustering," ACM International Conference on Multimedia (ACM MM), Oct. 2006.
[5] Shuicheng Yan, Huan Wang, Thomas Huang, and Xiaoou Tang, "Ranking with Uncertain Labels," IEEE International Conference on Multimedia & Expo (ICME 07), May 2007.
[6] Shuicheng Yan, Huan Wang, Xiaoou Tang, and Thomas Huang, "Exploring Feature Descriptors for Face Recognition," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 07, oral), Apr. 2007.

Thank You!

Transductive Regression on Multi-Class Data

Explore the intrinsic feature structures w.r.t. different classes for regression.

Regression Algorithms: Review
• Tikhonov regularization on the Reproducing Kernel Hilbert Space (RKHS):

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \gamma \| f \|_{\mathcal{H}_K}^2.$$

• Belkin et al., "Regularization and Semi-supervised Learning on Large Graphs": classification can be regarded as a special version of regression, with the regression values constrained to 0 and 1 (binary); samples belonging to the corresponding class map to 1, all others to 0. The manifold structure is exploited to guide the regression through a graph Laplacian term:

$$f^* = \arg\min_{f \in \mathcal{H}_K} \frac{1}{n} \sum_{i=1}^{n} V(f(x_i), y_i) + \gamma_A \| f \|_{\mathcal{H}_K}^2 + \frac{\gamma_I}{(u+l)^2} f^T L f.$$

• Fei Wang et al., "Label Propagation Through Linear Neighborhoods": an iterative procedure propagates the class labels within local neighborhoods; it has been proved convergent, and the convergence point can be deduced from the regularization framework.
• Cortes et al., "On Transductive Regression": transduces the function values from the labeled data to the unlabeled data utilizing local neighborhood relations, with a global optimization for a robust prediction.
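For reference, a compact sketch of the reviewed Laplacian-regularized least-squares regressor (squared loss in the Belkin et al. framework; a generic textbook-style implementation, not code from the thesis):

```python
import numpy as np

def laprls_fit(K, L, y_labeled, n_labeled, gamma_A=1e-2, gamma_I=1e-2):
    """Laplacian-regularized least squares: minimize
    (1/l) sum_i (f(x_i) - y_i)^2 + gamma_A ||f||_K^2 + gamma_I/(l+u)^2 f^T L f
    with f = K alpha.  K: (n, n) Gram matrix over labeled + unlabeled points,
    the first l of which are labeled.  L: (n, n) graph Laplacian."""
    n = K.shape[0]
    l = n_labeled
    J = np.zeros((n, n))
    J[:l, :l] = np.eye(l)                   # selects the labeled points
    y = np.zeros(n)
    y[:l] = y_labeled
    M = J @ K + gamma_A * l * np.eye(n) + (gamma_I * l / n**2) * (L @ K)
    return np.linalg.solve(M, y)            # expansion coefficients alpha

def laprls_predict(K_new, alpha):
    """f(x) = sum_i alpha_i K(x_i, x); K_new: (m, n) kernel to training pts."""
    return K_new @ alpha
```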
The Problem We Are Facing
• Age estimation (FG-NET Aging Database): samples vary w.r.t. different genders and different persons.
• Pose estimation (CMU PIE dataset): samples vary w.r.t. different genders, persons, illuminations, and expressions.

Regression on Multi-Class Samples: Traditional Algorithms
• The class information is easy to obtain for the training data, yet all samples are considered as belonging to the same class.
• Samples close in the data space X are assumed to have similar function values (smoothness along the manifold).
• For the incoming sample, no class information is given.
• Our goal: utilize the class information in the training process to boost the performance.
[Figure: toy multi-class data distribution illustrating the problem.]

TRIM: Intra-Manifold Regularization
• It may not be proper to preserve smoothness between samples from different classes, so the intra-manifold regularization terms for the different classes are calculated separately, with a respective intrinsic graph built for each sample class.
• The regularization term is $f^T L^p f$. When $p = 1$:

$$f^T L f = \frac{1}{2} \sum_{i,j} (f_i - f_j)^2 W_{ij};$$

when $p = 2$:

$$f^T L^T L f = \sum_i \Big\| f_i - \sum_{j \sim i} w_{ij} f_j \Big\|^2 \quad \text{w.r.t. } \sum_j w_{ij} = 1, \; w_{ij} \ge 0.$$

TRIM: Inter-Manifold Regularization
• Assumption: samples with similar labels generally lie in similar relative positions on their corresponding sub-manifolds.
• Motivation: (1) align the sub-manifolds of the different classes according to the labeled points and the graph structures; (2) derive the correspondences in the aligned space using the nearest-neighbor technique.

TRIM: Manifold Alignment

Minimize the correspondence error on the landmark points while holding the intra-manifold structures:

$$\{f^{k}\}_{k=1}^{M} = \arg\min \Big( C(\{f^{k}\}_{k=1}^{M}) + \sum_{k} f^{kT} L_p^{k} f^{k} + f^T L^a f \Big),$$

where

$$C(\{f^{k}\}_{k=1}^{M}) = \sum_{k_i \ne k_j} \sum_{i,j} w^{k_i k_j}_{ij} \, \| f^{k_i}_{x_i} - f^{k_j}_{x_j} \|^2.$$

The term $f^T L^a f$ is a global compactness regularization, where $L^a$ is the Laplacian matrix of $W^a$ with $w^a_{ij} = 1$ if $x_i$ and $x_j$ are of different classes and $0$ otherwise.

[Figure: sub-manifolds of two classes before and after alignment.]

The derived inter-manifold graphs are concatenated to form

$$W^r = \begin{bmatrix} O & W^{12} & \cdots & W^{1M} \\ W^{21} & O & \cdots & W^{2M} \\ \vdots & \vdots & \ddots & \vdots \\ W^{M1} & W^{M2} & \cdots & O \end{bmatrix},$$

with the Laplacian regularization term $f^T L^r f$.

TRIM: Objective

$$f^* = \arg\min_{f \in \mathcal{H}_K} \sum_{k=1}^{M} \Big( \frac{1}{l_k} \sum_{x^k_i \in X_l} \| f_{x^k_i} - y^k_i \|^2 + \frac{\lambda_2}{(N^k)^2} f^{kT} L_p^k f^k \Big) + \lambda_1 \| f \|_K^2 + \frac{\lambda_3}{N^2} f^T L^r f,$$

where $\lambda_1, \lambda_2, \lambda_3$ are the regularization coefficients. Its components: the fitness term, the RKHS norm, the intra-manifold regularization, and the inter-manifold regularization.

TRIM: Solution

By the generalized representer theorem, the minimizer admits an expansion over all $N = \sum_k (l_k + u_k)$ labeled and unlabeled samples:

$$f^*(x) = \sum_{i=1}^{N} \alpha_i K(x_i, x),$$

so the minimization over the Hilbert space boils down to minimizing over the coefficient vector

$$\alpha = [\alpha^1_1, \ldots, \alpha^1_{l_1}, \ldots, \alpha^1_{l_1 + u_1}, \ldots, \alpha^M_1, \ldots, \alpha^M_{l_M}, \ldots, \alpha^M_{l_M + u_M}]^T \in \mathbb{R}^N.$$

The minimizer is given by

$$\alpha^* = J^{-1} \sum_k \frac{1}{l_k} (S_l^k S^k)^T Y^k, \qquad
J = \sum_k \Big( \frac{1}{l_k} (S_l^k S^k)^T (S_l^k S^k) K + \frac{\lambda_2}{(N^k)^2} (S^k)^T L_p^k S^k K \Big) + \lambda_1 I + \frac{\lambda_3}{N^2} L^r K,$$

where $S_l^k = (I_{l_k \times l_k}, O_{l_k \times u_k})$ and $S^k = (O_{N^k \times \sum_{i<k} N^i}, I_{N^k \times N^k}, O_{N^k \times \sum_{i>k} N^i})$ are selection matrices and $K$ is the $N \times N$ Gram matrix of the labeled and unlabeled points over all the sample classes.

TRIM: Generalization

For out-of-sample data, the labels can be estimated as

$$y_{\text{new}} = \sum_{i=1}^{N} \alpha_i K(x_i, x_{\text{new}}).$$

Note that in this framework the class information of the incoming sample is not required at the prediction stage.
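A sketch of the TRIM coefficient solve as reconstructed above (assumptions: squared loss, lam1..lam3 as placeholders for the slides' regularization weights, and dense 0/1 selection matrices for clarity):

```python
import numpy as np

def trim_fit(K, Lp_blocks, Lr, y, labeled_masks, lam1=1e-2, lam2=1e-2, lam3=1e-2):
    """Sketch of the TRIM coefficient solve reconstructed from the slides.
    K: (N, N) Gram matrix over all classes' labeled + unlabeled samples,
       stacked class by class.  Lp_blocks: list of (start, Lp_k) giving each
       class block's start index and intra-manifold Laplacian (N_k x N_k).
    Lr: (N, N) inter-manifold Laplacian.  labeled_masks: per-class boolean
    masks of length N.  y: length-N labels (zeros at unlabeled entries)."""
    N = K.shape[0]
    J = lam1 * np.eye(N) + (lam3 / N**2) * (Lr @ K)   # RKHS + inter-manifold
    rhs = np.zeros(N)
    for (start, Lp_k), mask in zip(Lp_blocks, labeled_masks):
        Nk = Lp_k.shape[0]
        lk = int(mask.sum())
        Sel = np.diag(mask.astype(float))             # (S_l^k S^k)^T (S_l^k S^k)
        Sk = np.zeros((Nk, N))
        Sk[:, start:start + Nk] = np.eye(Nk)          # selects class-k block
        J += (Sel @ K) / lk                           # fitness term
        J += (lam2 / Nk**2) * (Sk.T @ Lp_k @ Sk @ K)  # intra-manifold term
        rhs += (mask * y) / lk                        # (1/l_k)(S_l^k S^k)^T Y^k
    return np.linalg.solve(J, rhs)                    # alpha in f = sum a_i K(x_i, .)
```

Prediction then follows the generalization slide, $y_{\text{new}} = \sum_i \alpha_i K(x_i, x_{\text{new}})$, with no class label needed for the incoming sample.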
TRIM: Original Version Without Kernel

$$f^* = \arg\min_{f} \sum_{k=1}^{M} \Big( \frac{1}{l_k} \sum_{x^k_i \in X_l} \| f_{x^k_i} - y^k_i \|^2 + \frac{\lambda_2}{(N^k)^2} f^{kT} L_p^k f^k \Big) + \frac{\lambda_3}{N^2} f^T L^r f.$$

Experiments
• Two-moons toy data.
• Age dataset: open-set evaluation of the kernelized regression on the YAMAHA database (left: regression on the training set; right: regression on out-of-sample data).
• TRIM vs. traditional graph-Laplacian-regularized regression for the training-set evaluation on the YAMAHA database.