Semi‐Supervised Distance Metric Learning
Wei Liu
wliu@ee.columbia.edu
DVMM
Columbia University
Outline
- Background
- Related Work
- Learning Framework
- Collaborative Image Retrieval
- Future Research
Background
- Euclidean distance: d(x1, x2) = (x1 − x2)^T (x1 − x2) = ||x1 − x2||^2
- Mahalanobis distance: d_M(x1, x2) = (x1 − x2)^T P^{-1} (x1 − x2)
- We find a distance metric in terms of a square matrix A:
  d_A(x1, x2) = (x1 − x2)^T A (x1 − x2) = ||x1 − x2||_A^2
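A minimal NumPy sketch of the three distances above (illustrative only; the data, dimensions, and variable names are my own):

import numpy as np

x1, x2 = np.random.randn(5), np.random.randn(5)

# Euclidean (squared) distance
d_euc = (x1 - x2) @ (x1 - x2)

# Mahalanobis distance: P is the covariance of some toy data
X = np.random.randn(100, 5)                 # rows are samples
P = np.cov(X, rowvar=False)
d_mah = (x1 - x2) @ np.linalg.inv(P) @ (x1 - x2)

# Learned metric: any symmetric PSD matrix A defines d_A
B = np.random.randn(5, 5)
A = B @ B.T                                  # PSD by construction
d_A = (x1 - x2) @ A @ (x1 - x2)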
Background
- A ∈ ℝ^{d×d} is positive semi-definite.
- A linear subspace can be learned as a low-rank approximation of the metric, A = U U^T with U ∈ ℝ^{d×m}.
- Under U, the distance becomes the in-subspace distance (see the sketch below):
  d_A(x1, x2) = (x1 − x2)^T A (x1 − x2) = ||U^T (x1 − x2)||^2
- Semi-supervised settings: (1) labeled and unlabeled data; (2) partial pairwise similarity and dissimilarity constraints.
- This paper deals with setting (2) using a geometric intuition: under the target metric, similar points should lie near each other while dissimilar points should be far apart.
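A quick NumPy check of the low-rank factorization above (illustrative code; the dimensions are made up):

import numpy as np

d, m = 10, 3
U = np.random.randn(d, m)          # low-rank factor of the metric
A = U @ U.T                         # A = U U^T is PSD by construction

x1, x2 = np.random.randn(d), np.random.randn(d)
diff = x1 - x2

d_A = diff @ A @ diff               # (x1 - x2)^T A (x1 - x2)
d_sub = np.sum((U.T @ diff) ** 2)   # ||U^T (x1 - x2)||^2
assert np.isclose(d_A, d_sub)       # the two expressions agree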
Related Work
- Supervised distance metric learning
  - A. Globerson et al., "Metric Learning by Collapsing Classes", in NIPS 18, 2006.
  - K. Weinberger et al., "Distance Metric Learning for Large Margin Nearest Neighbor Classification", in NIPS 18, 2006.
- Semi-supervised distance metric learning
  - E. P. Xing et al., "Distance Metric Learning, with Application to Clustering with Side-Information", in NIPS 15, 2003.
  - A. Bar-Hillel et al., "Learning a Mahalanobis Metric from Equivalence Constraints", JMLR, 6:937–965, 2005.
  - S. C. Hoi, W. Liu, M. R. Lyu, and W.-Y. Ma, "Learning Distance Metrics with Contextual Constraints for Image Retrieval", in Proc. CVPR, 2006. (my paper)
Xing et al. NIPS’02
min_A   Σ_{(xi, xj) ∈ S} ||xi − xj||_A^2
s.t.    Σ_{(xi, xj) ∈ D} ||xi − xj||_A ≥ 1
        A ⪰ 0
S: a set of positive pairs, i.e., similar pairs.
D: a set of negative pairs, i.e., dissimilar pairs.
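As an illustration only (not the authors' implementation), this formulation can be set up directly in cvxpy; the function name and the assumption that S and D are lists of index pairs into the rows of X are mine:

import cvxpy as cp
import numpy as np

def xing_metric(X, S, D):
    """X: (n, d) data, rows are points; S, D: lists of (i, j) index pairs."""
    d = X.shape[1]
    A = cp.Variable((d, d), PSD=True)       # the metric, constrained PSD

    # sum of squared distances over similar pairs (affine in A)
    sim = sum((X[i] - X[j]) @ A @ (X[i] - X[j]) for i, j in S)

    # sum of (non-squared) distances over dissimilar pairs (concave in A)
    dis = sum(cp.sqrt((X[i] - X[j]) @ A @ (X[i] - X[j])) for i, j in D)

    prob = cp.Problem(cp.Minimize(sim), [dis >= 1])
    prob.solve(solver=cp.SCS)                # any SDP-capable solver works
    return A.value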
Xing et al. NIPS’02
- If the constraint is replaced with Σ_{(xi, xj) ∈ D} ||xi − xj||_A^2 ≥ 1, the learned A is always of rank 1, which implies that the data are always projected onto a single line.
- Learning with only the "labeled" points might result in overfitting.
- When applied to clustering, it yields low classification performance.
Graph Laplacian
- Construct a graph G(V, E, W) given X = [x1, x2, ..., xn] ∈ ℝ^{d×n}. Set the weight matrix by Wij = 1 if xi is among the k-NN of xj or xj is among the k-NN of xi, and Wij = 0 otherwise. The Laplacian matrix is L = D − W, where D is the diagonal degree matrix with Dii = Σ_j Wij.

  g(y) = (1/2) Σ_{i=1}^n Σ_{j=1}^n (yi − yj)^2 Wij = y^T L y

  Linear case: y = X^T u ⇒ g(u) = u^T X L X^T u

- y ∈ ℝ^n is a one-dimensional embedding of the data, e.g., the linear embedding y = X^T u.
- g(y) measures the extent of smoothness of the embedding over the graph.
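A small NumPy sketch of the symmetric k-NN graph and its Laplacian (illustrative; the function and variable names are my own):

import numpy as np

def knn_laplacian(X, k=5):
    """X: (d, n) data matrix, columns are points. Returns W and L = D - W."""
    n = X.shape[1]
    # pairwise squared Euclidean distances between columns
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)
    W = np.zeros((n, n))
    for i in range(n):
        nn = np.argsort(sq[i])[1:k + 1]     # k nearest neighbors, skipping the point itself
        W[i, nn] = 1.0
    W = np.maximum(W, W.T)                   # Wij = 1 if i is in kNN(j) or j is in kNN(i)
    L = np.diag(W.sum(axis=1)) - W           # L = D - W
    return W, L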
Graph Laplacian Regularization
- Assume the subspace related to the desired metric A is U = [u1, u2, ..., um] ∈ ℝ^{d×m}, and formulate a smoothness term that is linear in A:

  g(A) = Σ_{i=1}^m ui^T X L X^T ui = tr(U^T X L X^T U)
       = tr(X L X^T U U^T) = tr(X L X^T A)

  using the identity tr(AB) = tr(BA).
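A quick numerical check of this identity (illustrative code with made-up dimensions and an arbitrary symmetric weight matrix):

import numpy as np

d, n, m = 10, 50, 3
X = np.random.randn(d, n)
U = np.random.randn(d, m)
A = U @ U.T

W = np.random.rand(n, n); W = (W + W.T) / 2   # any symmetric weight matrix
L = np.diag(W.sum(axis=1)) - W                # L = D - W

M = X @ L @ X.T
lhs = np.trace(U.T @ M @ U)                   # tr(U^T X L X^T U)
rhs = np.trace(M @ A)                         # tr(X L X^T A)
assert np.isclose(lhs, rhs)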
Learning Framework
min_A   t + c_S Σ_{(xi, xj) ∈ S} ||xi − xj||_A^2 − c_D Σ_{(xi, xj) ∈ D} ||xi − xj||_A^2
s.t.    tr(X L X^T A) ≤ t
        A ⪰ 0
- Introduce a slack variable t that encourages Laplacian regularization.
- The above optimization is in the standard form of a semidefinite program (SDP), which can be solved efficiently, with the global optimum found, by existing convex optimization packages such as SeDuMi.
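The slide mentions SeDuMi (a MATLAB solver); purely as an illustration, here is how the same SDP could be set up with cvxpy in Python. The function name, the row-major data layout, and the parameter names c_S, c_D are my assumptions:

import cvxpy as cp
import numpy as np

def laplacian_regularized_metric(X, L, S, D, c_S=1.0, c_D=1.0):
    """X: (n, d) data, rows are points; L: (n, n) graph Laplacian;
    S, D: lists of (i, j) index pairs of similar / dissimilar points."""
    n, d = X.shape
    A = cp.Variable((d, d), PSD=True)
    t = cp.Variable()

    sim = sum((X[i] - X[j]) @ A @ (X[i] - X[j]) for i, j in S)
    dis = sum((X[i] - X[j]) @ A @ (X[i] - X[j]) for i, j in D)

    # smoothness term: with rows as points this is tr(X^T L X A), affine in A
    smooth = cp.trace(X.T @ L @ X @ A)

    # note: whether the problem stays bounded depends on c_S, c_D and the data;
    # in practice one may additionally bound tr(A) (my own caveat, not from the slides)
    prob = cp.Problem(cp.Minimize(t + c_S * sim - c_D * dis), [smooth <= t])
    prob.solve(solver=cp.SCS)
    return A.value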
Collaborative Image Retrieval
- Collect the log data of user relevance feedback.
- Each log session can be converted into similar and dissimilar pairwise constraints. Specifically, given a query q, for any two images xi and xj: if both are marked as relevant, we put them into the set of positive pairs Sq; if one is marked as relevant and the other as irrelevant, we put them into the set of negative pairs Dq.
- We denote the log data as Ω = {(Sq, Dq) | q = 1, ..., Q}, where Q is the number of log sessions.
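A small sketch of this conversion (my own illustrative code; the feedback format, a dict mapping image ids to a relevant/irrelevant flag, is an assumption):

from itertools import combinations

def session_to_pairs(feedback):
    """feedback: dict {image_id: True if marked relevant, False if irrelevant}."""
    relevant = [i for i, rel in feedback.items() if rel]
    irrelevant = [i for i, rel in feedback.items() if not rel]
    S_q = list(combinations(relevant, 2))                  # both relevant -> similar pair
    D_q = [(i, j) for i in relevant for j in irrelevant]   # mixed -> dissimilar pair
    return S_q, D_q

# log data: one (S_q, D_q) per session
log_sessions = [{"img1": True, "img2": True, "img3": False}]
omega = [session_to_pairs(fb) for fb in log_sessions]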
Laplacian Regularized Metric Learning
min_A   t + γ_S Σ_{q=1}^{Q} Σ_{(xi, xj) ∈ Sq} tr(A (xi − xj)(xi − xj)^T) − γ_D Σ_{q=1}^{Q} Σ_{(xi, xj) ∈ Dq} tr(A (xi − xj)(xi − xj)^T)
s.t.    tr(X L X^T A) ≤ t
        A ⪰ 0
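Since each double sum above is linear in A, the pairwise terms can be pre-aggregated into two scatter matrices so the objective reduces to t + γ_S·tr(A·C_S) − γ_D·tr(A·C_D). A self-contained sketch (the names scatter, C_S, C_D and the toy data are mine):

import numpy as np

def scatter(X, pairs):
    """Sum of (xi - xj)(xi - xj)^T over the given index pairs; X has rows as points."""
    C = np.zeros((X.shape[1], X.shape[1]))
    for i, j in pairs:
        diff = X[i] - X[j]
        C += np.outer(diff, diff)
    return C

X = np.random.randn(20, 5)                      # toy image features, rows are images
omega = [([(0, 1), (2, 3)], [(0, 4)]),          # (S_q, D_q) index pairs per log session
         ([(5, 6)], [(5, 7), (6, 8)])]

C_S = sum(scatter(X, S_q) for S_q, _ in omega)
C_D = sum(scatter(X, D_q) for _, D_q in omega)
# the LRML objective then uses gamma_S * tr(A @ C_S) - gamma_D * tr(A @ C_D)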
Large Margin Version
min_A   g(A) + c_S t + c_D Σ_{(xi, xj) ∈ D} [1 + t − ||xi − xj||_A^2]_+
s.t.    ||xi − xj||_A^2 ≤ t,   ∀ (xi, xj) ∈ S
        ||xi − xj||_A^2 < ||xi − xl||_A^2,   ∀ (xi, xj, xl) ∈ R
        A ⪰ 0

where [x]_+ = max{x, 0} is a linear hinge loss.
R: a set of triples of points known as relative comparisons.
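A hedged cvxpy sketch of this variant (illustrative only, not the paper's code); the strict inequality on the relative comparisons is approximated with a small margin eps, which is my own choice:

import cvxpy as cp
import numpy as np

def large_margin_metric(X, L, S, D, R, c_S=1.0, c_D=1.0, eps=1e-3):
    """X: (n, d) data; S, D: index pairs; R: (i, j, l) relative-comparison triples."""
    n, d = X.shape
    A = cp.Variable((d, d), PSD=True)
    t = cp.Variable()

    def dist(i, j):
        # squared distance ||xi - xj||_A^2, affine in the variable A
        v = X[i] - X[j]
        return v @ A @ v

    g = cp.trace(X.T @ L @ X @ A)                           # Laplacian smoothness term g(A)
    hinge = sum(cp.pos(1 + t - dist(i, j)) for i, j in D)   # [1 + t - ||xi - xj||_A^2]_+

    cons = [dist(i, j) <= t for i, j in S]
    cons += [dist(i, j) + eps <= dist(i, l) for i, j, l in R]  # strict "<" approximated

    prob = cp.Problem(cp.Minimize(g + c_S * t + c_D * hinge), cons)
    prob.solve(solver=cp.SCS)
    return A.value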
Discussions
- Because the objective function and all constraints are linear in A, this learning problem can also be cast as an instance of semidefinite programming (SDP).
- The constraints characterizing relative comparisons are optional.
- If the original dimension of the data is very high (~10^3), PCA should first be applied to reduce it to a lower dimension (~10^2) before metric learning.
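A one-line illustration of that preprocessing step using scikit-learn (the toy data shape and the target dimension 100 are just examples):

import numpy as np
from sklearn.decomposition import PCA

X = np.random.randn(500, 1000)                       # e.g., 500 images with ~10^3-dim features
X_reduced = PCA(n_components=100).fit_transform(X)   # metric learning then runs on X_reduced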
Semi‐Supervised Clustering
- Works on the similarity/dissimilarity setting.
- Spectral clustering
- SSML + constrained K-means
- Spectral embedding + SSML + constrained K-means
Thanks!
http://www.ee.columbia.edu/~wliu/