Semi-Supervised Distance Metric Learning

Wei Liu (wliu@ee.columbia.edu)
DVMM Lab, Columbia University

Outline
- Background
- Related Work
- Learning Framework
- Collaborative Image Retrieval
- Future Research

Background
- Euclidean distance: d^2(x_1, x_2) = (x_1 - x_2)^T (x_1 - x_2) = \|x_1 - x_2\|^2
- Mahalanobis distance: d_M^2(x_1, x_2) = (x_1 - x_2)^T P^{-1} (x_1 - x_2)
- We seek a distance metric parameterized by a square matrix A:
  d_A^2(x_1, x_2) = (x_1 - x_2)^T A (x_1 - x_2) = \|x_1 - x_2\|_A^2

Background (cont.)
- A \in \mathbb{R}^{d \times d} is positive semi-definite.
- A linear subspace U \in \mathbb{R}^{d \times m} can be learned as a low-rank approximation of the metric: A = U U^T.
- Under U, the metric becomes the in-subspace distance:
  d_A^2(x_1, x_2) = (x_1 - x_2)^T A (x_1 - x_2) = \|U^T (x_1 - x_2)\|^2
- Semi-supervised settings:
  1. Labeled and unlabeled data.
  2. Partial pairwise similarity and dissimilarity constraints.
- This work addresses setting 2 with a geometric intuition: under the target metric, similar points should lie near each other, while dissimilar points should lie far apart.

Related Work
- Supervised distance metric learning:
  A. Globerson et al., "Metric Learning by Collapsing Classes", in NIPS 18, 2006.
  K. Weinberger et al., "Distance Metric Learning for Large Margin Nearest Neighbor Classification", in NIPS 18, 2006.
- Semi-supervised distance metric learning:
  E. P. Xing et al., "Distance Metric Learning, with Application to Clustering with Side-Information", in NIPS 15, 2003.
  A. Bar-Hillel et al., "Learning a Mahalanobis Metric from Equivalence Constraints", JMLR, 6:937-965, 2005.
  S. C. Hoi, W. Liu, M. R. Lyu, and W.-Y. Ma, "Learning Distance Metrics with Contextual Constraints for Image Retrieval", in Proc. CVPR, 2006. (my paper)

Xing et al., NIPS'02
  min_A   \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2
  s.t.    \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A \geq 1
          A \succeq 0
- S: a set of positive pairs, i.e., similar pairs.
- D: a set of negative pairs, i.e., dissimilar pairs.

Xing et al., NIPS'02 (cont.)
- If the constraint is replaced with \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A^2 \geq 1, the optimal A is always rank 1, which implies that the data are always projected onto a line.
- Learning with only "labeled" points may result in overfitting.
- Applied to clustering, it yields low classification performance.

Graph Laplacian
- Construct a graph G(V, E, W) from X = [x_1, x_2, ..., x_n] \in \mathbb{R}^{d \times n}.
- Set the weight matrix by W_{ij} = 1 if x_i is among the k-NN of x_j or x_j is among the k-NN of x_i, and W_{ij} = 0 otherwise.
- The Laplacian matrix is L = D - W, where D is the diagonal degree matrix.
- Smoothness: g(y) = \frac{1}{2} \sum_{i=1}^n \sum_{j=1}^n (y_i - y_j)^2 W_{ij} = y^T L y
- Linear case: y = X^T u \Rightarrow g(u) = u^T X L X^T u
- y \in \mathbb{R}^n is a 1-D embedding, e.g., a linear embedding; g(y) measures its smoothness over the graph.

Graph Laplacian Regularization
- Assume the subspace related to the desired metric A is U = [u_1, u_2, ..., u_m] \in \mathbb{R}^{d \times m}, and then formulate a smoothness term that is linear in A:
  g(A) = \sum_{i=1}^m u_i^T X L X^T u_i = tr(U^T X L X^T U) = tr(X L X^T U U^T) = tr(X L X^T A)
- Note that tr(AB) = tr(BA).

Learning Framework
  min_{A, t}  t + c_S \sum_{(x_i, x_j) \in S} \|x_i - x_j\|_A^2 - c_D \sum_{(x_i, x_j) \in D} \|x_i - x_j\|_A^2
  s.t.        tr(X L X^T A) \leq t
              A \succeq 0
- The slack variable t upper-bounds the Laplacian regularizer.
- This optimization is a standard semidefinite program (SDP), which can be solved efficiently to a global optimum by existing convex optimization packages such as SeDuMi.

Collaborative Image Retrieval
- Collect the log data of user relevance feedback.
- Each log session can be converted into similar and dissimilar pairwise constraints.
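As a sketch of this conversion, one session's judgments can be turned into positive and negative pairs directly. The session format used here (a mapping from image index to a relevance flag) is a hypothetical assumption for illustration, not part of the slides:

```python
# Sketch: convert one relevance-feedback log session into pairwise constraints.
# `feedback` maps image index -> True (relevant) / False (irrelevant); this
# input format is an assumption made for the example.
def session_to_pairs(feedback):
    relevant = [i for i, rel in feedback.items() if rel]
    irrelevant = [i for i, rel in feedback.items() if not rel]
    # Two images both marked relevant form a positive (similar) pair.
    S_q = [(a, b) for k, a in enumerate(relevant) for b in relevant[k + 1:]]
    # A relevant image paired with an irrelevant one forms a negative pair.
    D_q = [(a, b) for a in relevant for b in irrelevant]
    return S_q, D_q
```

Collecting these per-session sets over all Q sessions yields the log data used by the learning framework.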
- Specifically, given a query q, for any two images x_i and x_j: if both are marked as relevant, we put them into the set of positive pairs S_q; if one is marked as relevant and the other as irrelevant, we put them into the set of negative pairs D_q.
- We denote the log data by \Omega = \{(S_q, D_q) \mid q = 1, ..., Q\}, where Q is the number of log sessions.

Laplacian Regularized Metric Learning
  min_{A, t}  t + \gamma_S \sum_{q=1}^Q \sum_{(x_i, x_j) \in S_q} tr(A (x_i - x_j)(x_i - x_j)^T)
                - \gamma_D \sum_{q=1}^Q \sum_{(x_i, x_j) \in D_q} tr(A (x_i - x_j)(x_i - x_j)^T)
  s.t.        tr(X L X^T A) \leq t
              A \succeq 0

Large Margin Version
  min_{A, t}  g(A) + c_S t + c_D \sum_{(x_i, x_j) \in D} [1 + t - \|x_i - x_j\|_A^2]_+
  s.t.        \|x_i - x_j\|_A^2 \leq t,  (x_i, x_j) \in S
              \|x_i - x_j\|_A^2 < \|x_i - x_l\|_A^2,  (x_i, x_j, x_l) \in R
              A \succeq 0
- [x]_+ = max{x, 0} is the linear hinge loss.
- R: a set of point triples known as relative comparisons.

Discussions
- Because the objective function and all constraints are linear in A, this learning problem can also be cast as an instance of semidefinite programming (SDP).
- The constraints characterizing relative comparisons are optional.
- If the original data dimension is very high (~10^3), PCA must be applied to reduce it to a lower dimension (~10^2) before metric learning.

Semi-Supervised Clustering
- Works in the similarity/dissimilarity setting.
- Spectral clustering
- SSML + constrained K-means
- Spectral embedding + SSML + constrained K-means

Thanks!
http://www.ee.columbia.edu/~wliu/