Ranking Projection
Zhi-Sheng Chen
2010/02/03
Multi-Media Information Lab, NTHU

Introduction

Ranking is everywhere:
- retrieval for music, image, video, sound, etc.
- scoring for speech, multimedia, etc.

Goal: find a projection that
- preserves the given ranking order of the data, and
- reduces the dimensionality of the data.

The Basic Criteria of Linear Ranking Projection

Given the ranking order (c1, c2, c3, c4), we require in the projection space that

$$d(c_1, c_2) \le d(c_1, c_3) \;\Longleftrightarrow\; d_{123} := d(c_1, c_2) - d(c_1, c_3) \le 0$$
$$d(c_1, c_2) \le d(c_1, c_4) \;\Longleftrightarrow\; d_{124} := d(c_1, c_2) - d(c_1, c_4) \le 0$$
$$d(c_1, c_3) \le d(c_1, c_4) \;\Longleftrightarrow\; d_{134} := d(c_1, c_3) - d(c_1, c_4) \le 0$$
$$d(c_2, c_3) \le d(c_2, c_4) \;\Longleftrightarrow\; d_{234} := d(c_2, c_3) - d(c_2, c_4) \le 0$$

and we minimize

$$\min J = d_{123} + d_{124} + d_{134} + d_{234},$$

where $d(\cdot,\cdot)$ is a distance measure between two classes. In our case we use the difference of the class means.

The Basic Criteria of Linear Ranking Projection (cont.)

Let $a$ be the projection vector. The previous criterion can be rewritten as

$$\min_a J(a) \quad \text{subject to} \quad a^\top a = 1,$$
$$J(a) = a^\top \big[ (M_{12} - M_{13}) + (M_{12} - M_{14}) + (M_{13} - M_{14}) + (M_{23} - M_{24}) \big]\, a,$$
$$M_{ij} = (m_i - m_j)(m_i - m_j)^\top.$$

The Ordinal Weights

Roughly speaking, these distance measures have different importance according to their position in the order. For example, $d_{123}$ is more important than $d_{124}$, which is more important than $d_{134}$, which is more important than $d_{234}$. But in a longer ranking, how should, say, $d_{239}$ and $d_{245}$ compare? Instead of finding precise rules for the ordinal weights, we use a rough ordinal weighting rule.

The Ordinal Weights (cont.)

Given a ranking order, we assign a score to each term; the largest and smallest scores indicate the top and last terms of the order. We simply define the ordinal weight function as

$$w(s_1, s_2, s_3) = s_1 s_2 s_3,$$

so the weighted criterion becomes

$$\min_a J(a) \quad \text{subject to} \quad a^\top a = 1,$$
$$J(a) = a^\top \big[ w(s_1,s_2,s_3)(M_{12}-M_{13}) + w(s_1,s_2,s_4)(M_{12}-M_{14}) + w(s_1,s_3,s_4)(M_{13}-M_{14}) + w(s_2,s_3,s_4)(M_{23}-M_{24}) \big]\, a,$$
$$M_{ij} = (m_i - m_j)(m_i - m_j)^\top.$$

Some Results for the Weighted Criteria

[Figure: 2-D data with classes C1-C4 and the resulting 1-D projection for the order (c1, c2, c3, c4).]

Some Results for the Weighted Criteria (cont.)

[Figure: 2-D data with classes C1-C4 and the resulting 1-D projection for the order (c3, c1, c4, c2).]

For a projection onto more than one dimension, the solution becomes selecting the $k$ eigenvectors associated with the $k$ smallest eigenvalues.
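To make the procedure concrete, here is a minimal NumPy sketch of the weighted linear criterion, assuming the product weight rule $w(s_1,s_2,s_3)=s_1 s_2 s_3$ reconstructed above; the toy blob data, score values, and function names are illustrative assumptions, not part of the slides.

```python
import numpy as np

def ranking_projection(classes, scores, n_dims=1):
    """Weighted linear ranking projection (sketch).

    classes -- list of (n_i, d) arrays, already in the desired ranking order
    scores  -- one ordinal score per class (larger score = earlier in the order)
    Returns a (d, n_dims) matrix whose columns are the eigenvectors of the
    criterion matrix with the smallest eigenvalues.
    """
    means = [c.mean(axis=0) for c in classes]
    d = means[0].shape[0]
    n = len(means)

    def M(i, j):
        # M_ij = (m_i - m_j)(m_i - m_j)^T
        v = (means[i] - means[j])[:, None]
        return v @ v.T

    S = np.zeros((d, d))
    # One term per ordered triple (i, j, k) with i < j < k: the ranking demands
    # d(c_i, c_j) <= d(c_i, c_k), contributing w(s_i, s_j, s_k) * (M_ij - M_ik).
    for i in range(n):
        for j in range(i + 1, n):
            for k in range(j + 1, n):
                w = scores[i] * scores[j] * scores[k]
                S += w * (M(i, j) - M(i, k))

    # Minimizing a^T S a under a^T a = 1 picks the eigenvectors of the smallest
    # eigenvalues (np.linalg.eigh returns eigenvalues in ascending order).
    _, eigvecs = np.linalg.eigh(S)
    return eigvecs[:, :n_dims]

# Toy usage: four 2-D blobs laid out along the x-axis; their projected means
# should come out monotonically (up to sign) for the order (c1, c2, c3, c4).
rng = np.random.default_rng(0)
classes = [rng.normal(loc=[2.0 * i, 0.0], scale=0.5, size=(50, 2)) for i in range(4)]
A = ranking_projection(classes, scores=[4, 3, 2, 1])
print([(c.mean(axis=0) @ A).item() for c in classes])
```

Passing `n_dims > 1` selects the $k$ eigenvectors of the $k$ smallest eigenvalues, matching the multi-dimensional solution described above.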
Class with Several Groups

We may not care about the order of some groups of the data points within a class.

[Figure: 2-D data in which each of the classes C1-C3 consists of several separated groups.]

Grouped Classes

For the above case, let the order be (c1, c2, c3). The criterion becomes

$$\min_a J(a) \quad \text{subject to} \quad a^\top a = 1,$$
$$J(a) = a^\top \Big[ \sum_{i,j,k,p,q,r,s} w(s_i, s_j, s_k) \big( M_{ij,pq} - M_{ik,rs} \big) \Big]\, a,$$
$$M_{ij,pq} = (m_{i,p} - m_{j,q})(m_{i,p} - m_{j,q})^\top,$$

where $m_{i,p}$ is the mean vector of group $p$ in class $i$.

Grouped Classes: Result

[Figure: grouped 2-D data for classes C1-C3 and the resulting 1-D projection, which respects the order (c1, c2, c3).]

Reweighting Function

Take a look at the following case: the grouped-class criterion above runs into a problem, and the proper projection is a different one.

[Figure: a grouped-class example where the criterion picks a poor projection direction, contrasted with the proper projection.]

Reweighting Function (cont.)

This is solved by reweighting: every group in a class is weighted by its distance from the mean of that class, so that farther groups get larger weights. The modified criterion becomes

$$\min_a J(a) \quad \text{subject to} \quad a^\top a = 1,$$
$$J(a) = a^\top \Big[ \sum_{i,j,k,p,q,r,s} w(s_i, s_j, s_k) \big( rw(i,j,p,q)\, M_{ij,pq} - rw(i,k,r,s)\, M_{ik,rs} \big) \Big]\, a,$$
$$M_{ij,pq} = (m_{i,p} - m_{j,q})(m_{i,p} - m_{j,q})^\top,$$
$$rw(i,j,p,q) = \| m_{i,p} - m_i \|^2 \, \| m_{j,q} - m_j \|^2.$$

Reweighting Function: Result

[Figure: the problematic case again, with the 1-D projection obtained after reweighting.]

Non-linear Ranking Projection

It is impossible to find a linear projection that yields the order (c3, c2, c1, c4).

[Figure: a 2-D configuration of classes C1-C4 for which no linear projection can produce the order (c3, c2, c1, c4).]

General Idea of Kernel

Transform the data into a high-dimensional space through $\phi$, and do the ranking projection in that space. The projection algorithm can be written entirely in terms of dot products $\phi(x)^\top \phi(y)$, hence we can define the kernel

$$k(x, y) = \phi(x)^\top \phi(y).$$

The matrix of values $k(x_i, x_j)$ is called the Gram matrix (the discussion of the validity of the kernel is skipped here). Several kernels are available:
- polynomial kernel
- Gaussian kernel
- radial basis kernel
- etc.

Non-linear Ranking Projection (cont.)

We use this "kernelized" approach to find a non-linear projection. Consider the criterion of the basic linear case:

$$\min_a J(a) \quad \text{subject to} \quad a^\top a = 1,$$
$$J(a) = a^\top \big[ (M_{12}-M_{13}) + (M_{12}-M_{14}) + (M_{13}-M_{14}) + (M_{23}-M_{24}) \big]\, a, \qquad M_{ij} = (m_i - m_j)(m_i - m_j)^\top.$$

Similar to kernelized LDA (KDA), we let the projection vector be

$$a = \sum_{i=1}^{N} \alpha_i \, \phi(x_i), \qquad m_i^{\phi} = \frac{1}{N_i} \sum_{j=1}^{N_i} \phi\big(x_j^{(i)}\big).$$

Non-linear Ranking Projection (cont.)

Then

$$a^\top m_i^{\phi} = \sum_{j=1}^{N} \alpha_j \, \phi(x_j)^\top \frac{1}{N_i} \sum_{k=1}^{N_i} \phi\big(x_k^{(i)}\big) = \frac{1}{N_i} \sum_{j=1}^{N} \sum_{k=1}^{N_i} \alpha_j \, k\big(x_j, x_k^{(i)}\big) = \alpha^\top \mu_i, \qquad (\mu_i)_j = \frac{1}{N_i} \sum_{k=1}^{N_i} k\big(x_j, x_k^{(i)}\big).$$

Thus

$$a^\top M_{12}\, a = a^\top (m_1^{\phi} - m_2^{\phi})(m_1^{\phi} - m_2^{\phi})^\top a = \alpha^\top (\mu_1 - \mu_2)(\mu_1 - \mu_2)^\top \alpha = \alpha^\top U_{12}\, \alpha.$$

Non-linear Ranking Projection (cont.)

The kernelized criterion becomes

$$\min_\alpha J(\alpha) \quad \text{subject to} \quad \alpha^\top \alpha = 1,$$
$$J(\alpha) = \alpha^\top \big[ (U_{12}-U_{13}) + (U_{12}-U_{14}) + (U_{13}-U_{14}) + (U_{23}-U_{24}) \big]\, \alpha, \qquad U_{ij} = (\mu_i - \mu_j)(\mu_i - \mu_j)^\top.$$

Extending this to ordinal weighting and grouped classes is straightforward; extending it to re-weighting is more delicate.
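As in the linear case, the kernelized criterion reduces to an eigenproblem, now over the expansion coefficients $\alpha$. A minimal NumPy sketch, assuming the basic (unweighted) criterion and illustrative kernel parameters; the +1 offset in the polynomial kernel and the Gaussian width are assumptions, and no centering of the Gram matrix is attempted here.

```python
import numpy as np

def poly_kernel(X, Y, degree=2):
    """Polynomial kernel k(x, y) = (x^T y + 1)^degree (offset is illustrative)."""
    return (X @ Y.T + 1.0) ** degree

def gauss_kernel(X, Y, sigma=1.0):
    """Gaussian kernel k(x, y) = exp(-||x - y||^2 / (2 sigma^2))."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def kernel_ranking_projection(classes, kernel):
    """Kernelized ranking projection (sketch, basic unweighted criterion).

    classes -- list of (n_i, d) arrays in ranking order.
    Returns (alpha, X): expansion coefficients and the stacked training data,
    so a new point x projects to sum_j alpha_j k(x_j, x).
    """
    X = np.vstack(classes)              # all N training points
    sizes = [len(c) for c in classes]
    K = kernel(X, X)                    # N x N Gram matrix

    # (mu_i)_j = (1/N_i) sum_k k(x_j, x_k^(i)): mean of the Gram columns of class i.
    bounds = np.cumsum([0] + sizes)
    mus = [K[:, bounds[i]:bounds[i + 1]].mean(axis=1) for i in range(len(classes))]

    def U(i, j):
        # U_ij = (mu_i - mu_j)(mu_i - mu_j)^T
        v = (mus[i] - mus[j])[:, None]
        return v @ v.T

    # Same enumeration as the linear case: one (U_ij - U_ik) term per triple i<j<k.
    n = len(classes)
    S = sum(U(i, j) - U(i, k)
            for i in range(n) for j in range(i + 1, n) for k in range(j + 1, n))

    eigvals, eigvecs = np.linalg.eigh(S)   # ascending eigenvalues
    return eigvecs[:, 0], X

def project(x_new, alpha, X, kernel):
    """Project new points: f(x) = sum_j alpha_j k(x_j, x)."""
    return kernel(np.atleast_2d(x_new), X) @ alpha
```

The returned $\alpha$ plays the role of the projection vector: a new point is projected through $f(x) = \sum_j \alpha_j\, k(x_j, x)$, which is what `project` computes.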
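By contrast, the re-weighting factor in the linear grouped-class case is cheap to compute. A sketch, assuming the product form of $rw$ reconstructed earlier and assuming the class mean is taken as the mean of its group means (both points are ambiguous in the slides):

```python
import numpy as np

def rw_matrix(group_means_i, group_means_j):
    """All rw(i, j, p, q) at once: rw = ||m_{i,p} - m_i||^2 * ||m_{j,q} - m_j||^2.

    group_means_i, group_means_j -- (P, d) and (Q, d) arrays of group means.
    The class means m_i, m_j are taken as the means of the group means
    (a hypothetical choice where the slides do not specify).
    """
    m_i = group_means_i.mean(axis=0)
    m_j = group_means_j.mean(axis=0)
    di = ((group_means_i - m_i) ** 2).sum(axis=1)   # one value per group p
    dj = ((group_means_j - m_j) ** 2).sum(axis=1)   # one value per group q
    return np.outer(di, dj)                          # farther groups weigh more
```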
Results: Experiment 1

Order: (c3, c1, c4, c2).

[Figure: the 2-D input data (classes C1-C4) and the 1-D projections obtained with polynomial kernels of degree 2 and degree 3; both recover the order (c3, c1, c4, c2).]

Results: Experiment 1 (cont.)

[Figure: the same data and the 1-D projection obtained with a Gaussian kernel; the order (c3, c1, c4, c2) is preserved.]

Results: Experiment 2

Order: (c3, c2, c1).

[Figure: grouped 2-D data (classes C1-C3) and the 1-D projections obtained with a polynomial kernel of degree 2 and with a Gaussian kernel.]

Results: Experiment 3

Order: (c3, c2, c1).

[Figure: another 2-D configuration of C1-C3, with the 1-D projections for a polynomial kernel of degree 2 and a Gaussian kernel.]

Results: Experiment 4

Order: (c3, c2, c1).

[Figure: a third 2-D configuration of C1-C3, with the 1-D projections for a polynomial kernel of degree 2 and a Gaussian kernel.]

Results: Airplane Dataset

- 214 data points
- Feature dimension: 13
- Scores: 1 to 7

[Figure: histogram of the number of data points per score.]

Results: Airplane Dataset (cont.)

[Figure: linear ranking projection of the airplane data; projected value versus score (1-7).]

Results: Airplane Dataset (cont.)

[Figure: projections of the airplane data with polynomial kernels of degree 2, 5, and 10; the projected values blow up to scales around 10^14, 10^35, and 10^71 respectively.]

Results: Airplane Dataset (cont.)

With high-degree polynomial kernels, many data points are projected onto the same value because of finite computer precision, although the ranking order is still preserved well.

[Figure: projection of the airplane data with a Gaussian kernel.]

Future Work

Some work still needs to be done:
- Grouped classes are time-consuming; we can use "kernelized" k-means clustering to reduce the number of data points.
- The re-weighting function in the high-dimensional space (the kernel approach) has not been worked out yet.
- The precision problem in the kernelized approach.

Potential work:
- Derive a probabilistic model?
- How to cope with "missing" data (i.e., some dimensions of the features are missing)?
- Which kernel is appropriate?

Questions?

Multi-Media Information Lab, NTHU