Hashing with Graphs  g p

advertisement
Hashing with Graphs g
p
Wei Liu (Columbia
Wei
Liu (Columbia
Columbia) Jun Wang (IBM
Columbia), Jun Wang (IBM
IBM)
IBM), Sanjiv Kumar (Google
Google), and Shih‐Fu Chang (Columbia
Columbia)
June, 2011 Outline
•
•
•
•
•
Overview Overview
Graph Hashing Anchor Graph Hashing E
Experiments
i
t
Conclusions
Hashing with Graphs, ICML 2011
2
Fast NN Search
Fast NN Search
• In
In many ML/CV/IR problems, one needs a reliable many ML/CV/IR problems one needs a reliable
content‐based similarity search engine.
• The simplest method is nearest neighbor (NN) search The simplest method is nearest neighbor (NN) search
using some similarity measure such as Lp, cosine, kernel. • The exhaustive linear search into n
The exhaustive linear search into n data items takes data items takes
time. KD‐tree is faster but seriously cursed by y
y
dimensionality (c<=dd, d<20 works). Hashing with Graphs, ICML 2011
3
Hashing
• Low
Low memory usage and high search efficiency. memory usage and high search efficiency
randomized hash function
• Locality‐Sensitive Hashing (LSH) long bits/multi‐tables
Indyk et al. (since 1998)
et al (since 1998)
Semantic Hashing (RBMs)
Salakhutdinov & Hinton (2006,2007)
& Hinton (2006 2007)
Spectral Hashing (SH)
learned hash function Weiss et al (2009)
Weiss et al. (2009)
short bits/single table
Binary Reconstruction Embedding (BRE)
K li & Darrell (2010)
Kulis
&D
ll (2010)
Semi‐Supervised Hashing (SSH)
W
Wang et al. (2010)
t l (2010)
Hashing with Graphs, ICML 2011
4
Hashing Using Compact Codes
Hashing Using Compact Codes
• Learning‐based
Learning based methods try to learn compact binary codes methods try to learn compact binary codes
whose Hamming distance approaches or preserves the given similarity, e.g., L2, cosine, kernel (local measure).
• Achieve constant time search (O(1)) by flipping several bits and looking up the hash table. learning
buckets
hash function
NN list
Hashing with Graphs, ICML 2011
5
Our Mission
Our Mission
• Can
Can load millions of data, e.g., images, into memory. load millions of data e g images into memory
Very compact codes!
• Beat the existing unsupervised hashing methods (LSH, PCAH, Beat the existing unsupervised hashing methods (LSH, PCAH,
USPLH, SH, KLSH, SIKH).
• Exploit global
p
g
measure on manifolds to learn compact codes. p
Use graphs!
• Graphs tend to capture the semantic neighborhoods. Beat L2 linear scan!
1.2
1.2
+1
1
0.8
-1
0.8
0.6
0.6
0.4
0.4
0.2
0.2
0
0
-0.2
-0.2
-0.4
-0.4
-1 with Graphs, ICML 2011
Hashing
-0.6
-0.8
-1.5
+1
1
6
-0.6
-1
-0.5
0
0.5
1
1.5
2
2.5
3
-0.8
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
3
Outline
•
•
•
•
•
Overview
Graph Hashing Anchor Graph Hashing E
Experiments
i
t
Conclusions
Hashing with Graphs, ICML 2011
7
Graph Hashing (GH)
Graph Hashing (GH)
• A standard graph
A standard graph‐based
based hashing framework [Weiss et al. 09]
hashing framework [Weiss et al 09]
drop it by spectral relaxation
drop it by spectral relaxation x2
balanced bits
balanced bits
uncorrelated bits
uncorrelated bits
x1 A12
A: the affinity matrix of the data graph.
L: the graph Laplacian diag(A1)‐A.
L: the graph Laplacian
diag(A1) A
x1
x2
……
xn
Hashing with Graphs, ICML 2011
data points
8
Steps
Training:
1) Building a sparse matrix A of an exact neighborhood graph
(e g k‐NN graph) on n d‐dim data points needs . (e.g., k‐NN graph) on n
d‐dim data points needs
2) Computing & binarizing r eigenvectors of the sparse graph
Laplacian L needs .
needs
Test (hashing):
1) Generalizing r
Generalizing r eigenvectors to any unseen point and eigenvectors to any unseen point and
binarizing needs (interpolations over n entries) .
Training 1) is intractable for offline training; Test 1) is infeasible for online hashing!
Hashing with Graphs, ICML 2011
9
Spectral Hashing [Weiss et al. 09]
Spectral Hashing
[Weiss et al 09]
• SH
SH directly obtains the analytical eigenfunctions
directly obtains the analytical eigenfunctions cos(akx+b) cos(akx+b)
of 1D graph Laplacians based on the strong assumptions: i) data dimensions are independent to each other;
i) data dimensions are independent to each other; ii) at each dimension data is uniformly distributed. • The 1D graph Laplacian
The 1D graph Laplacian is derived from a complete graph. is derived from a complete graph.
Manifold structure of raw data is almost ignored.
• Pseudo eigenfunctions
g
& pseudo‐linear hash functions.
p
Hashing with Graphs, ICML 2011
10
Toy Data
Toy Data
• 12k 2D points: blue color denotes the positive eigenvector entries and red color denotes the negative entries.
smooth yet balanced our approach
SH
GH
1
1
1
0.8
0.8
0.8
0.6
0.6
0.6
0.4
0.4
0.4
0.2
0.2
0.2
0
0
0
-0.2
-0.2
-0.2
-0.4
-0.4
-0.4
-0.6
-0.6
-0.6
-00.88
-00.88
-0
0.8
8
-1
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
-1
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
-1
-2.5
1
1
1
0.8
0.8
0.8
0.6
0.6
0.6
04
0.4
04
0.4
04
0.4
0.2
0.2
0.2
0
0
0
-0.2
-0.2
-0.2
-0.4
-0.4
-0.4
-0.6
-0.6
-0.6
-0.8
-0.8
-0.8
-1
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
-1
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
2.5
-1
1st
eigenvector
-2
-1.5
-2.5
-2
-1.5
Hashing with Graphs, ICML
2011
-1
-0.5
0
0.5
1
1.5
2
2.5
2nd
eigenvector
-1
-0.5
0
0.5
1
1.5
2
2.5
11
Outline
•
•
•
•
•
Overview
Graph Hashing Anchor Graph Hashing E
Experiments
i
t
Conclusions
Hashing with Graphs, ICML 2011
12
Idea
• In
In Training 1), we utilize our proposed large graph Training 1) we utilize our proposed large graph
Anchor Graph whose space/time complexities are both linear.
W. Liu,
W
Liu J. He, and S.‐F. Chang
J He and S F Chang
Large Graph Construction for Scalable Semi‐Supervised Learning
ICML 2010 (code released)
Hashing with Graphs, ICML 2011
13
Anchor Graph [Liu et al 10]
Anchor Graph [Liu et al. 10]
• Nonnegative
Nonnegative, sparse, and low
sparse and low‐rank
rank affinity matrices.
affinity matrices
• Approximate the exact neighborhood graph when the number of anchors is sufficiently large.
number of anchors is sufficiently large. 100 anchor points
10‐NN graph
Hashing with Graphs, ICML 2011
Anchor Graph
14
Anchors
• Introduce
Introduce a few anchor
a few anchor points (K
points (K‐means
means clustering centers) clustering centers)
and do the data‐to‐anchor similarity computation (nXm).
• The data‐to‐anchor similarity matrix • Obtain the data‐to‐data similarity matrix  using Z.
Â18>0
x1
anchor points
Z12
Z16
Z11
u1
x8
u6
Â14=0
0 u5
Inner product?
u2
u3
u4
x4
data points
Hashing with Graphs, ICML 2011
15
Low Rank Affinity Matrix
Low‐Rank Affinity Matrix
• Kernel‐based truncated Z design ( ):
g
Note Z1=1 such that Z acts as a mapping anchorsdata
pp g
• The Anchor Graph affinity matrix The Anchor Graph affinity matrix
Rank<= m, only computing and saving Z in memory;
a doubly stochastic matrix, so the graph Laplacian
is I‐Â
 and has the same eigenvectors as those of Â.
Â
Hashing with Graphs, ICML 2011
16
Anchor Graph Hashing (AGH)
Anchor Graph Hashing (AGH)
• Solve the relaxed GH problem using the Anchor Graph
Solve the relaxed GH problem using the Anchor Graph
• D
Due to low‐rank of Â, its r
l
k f i
eigenvectors Y = ZW (excluding i
Y ZW ( l di
the trivial 1) can be easily solved via a small eigenvalue
decomposition on the mXm matrix .
decomposition on the mXm
matrix
•
looks like r “eigenvectors” on m
anchors.
• The r‐dim Hamming Embeddings are sgn(ZW). Hashing with Graphs, ICML 2011
17
Eigenvector to Eigenfunction
Eigenvector to Eigenfunction
• Suppose one eigenvector y
Suppose one eigenvector y = Zw
= Zw
w
Hashing with Graphs, ICML 2011
18
Theorem
Given m anchor points U={u
anchor points U={uj} and any sample x, define a } and any sample x define a
Given m
feature map as follows
where =1 iff anchor uj is one of s nearest anchors of x in U
according to the distance function D(∙). Then the NystrÖm
g
()
y
eigenfunction extended from the Anchor Graph Laplacian
eigenvector yk = Zwk is
interpolation over m
entries
i
l i
i
Hashing with Graphs, ICML 2011
19
Steps
Training:
1) Building an Anchor Graph (Z) on n d‐dim data points needs (T is the iteration number of K‐means). is the iteration number of K‐means)
2) Computing & binarizing r eigenvectors of the Anchor Graph Laplacian needs .
needs
.
3) Build the hash table O(n).
Test (hashing):
Test (hashing):
1) Generalizing r eigenvectors to any unseen point via
NystrÖm extension and binarizing
extension and binarizing needs .
needs
2) Inverse lookup in the hash table O(1). Linear training time & constant hashing time!
Hashing with Graphs, ICML 2011
20
Hierarchical Hashing
Hierarchical Hashing • The
The higher eigenvec/eigenfunc
higher eigenvec/eigenfunc corresponding to the higher corresponding to the higher
graph Laplacian eigenvalue is of low quality for partitioning. [Shi et al. 00]
[Shi et al. 00]
• We propose two‐layer hashing to revisit the lower (smoother) eigenfuncs to generate multiple hash bits.
Eigenvalues of Anchor Graph Laplacian
1
0.9
reuse
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
Hashing with Graphs, ICML 2011
0
0
50
21
100
150
200
250
300
Hierarchical Hashing
Hierarchical Hashing
balanced partitioning
Hashing with Graphs, ICML 2011
22
1 AGH vs 2 AGH
1‐AGH vs. 2‐AGH
• One‐Layer
One Layer AGH = Binarizing
AGH = Binarizing r eigenfuncs into r
into r hash bits
hash bits
hash functions
r‐bit code
• Two‐Layer AGH = Hierarchically thresholding r/2 eigenfuncs
into r hash bits
r‐bit code
Hashing with Graphs, ICML 2011
23
Outline
•
•
•
•
•
Overview
Graph Hashing Anchor Graph Hashing E
Experiments
i
t
Conclusions
Hashing with Graphs, ICML 2011
24
MNIST
• 70k
70k digit images from 10 classes, 1000 queries uniformly digit images from 10 classes 1000 queries uniformly
sampled from 10 classes.
• Compare 8 unsupervised hashing methods, L
Compare 8 unsupervised hashing methods, L2 2 linear scan,
linear scan,
and Spectral Embedding (SE) L2 linear scan (real‐valued version of 1‐AGH).
)
• Measure Hamming radius 2 precision (truly hashing) and Hamming ranking accuracy (brute‐force search) in terms of true neighbors which take the same class label.
Hashing with Graphs, ICML 2011
25
MNIST
much higher
0.8
drop
0.6
0.4
0.2
0
12 16
24
32
48
64
(b) Precision vs. # anchors
LSH
PCAH
USPLH
SH
KLSH
SIKH
1-AGH
2 AGH
2-AGH
Precision off top-5000 n
P
neighbors
Pre
ecision with
hin Hammin
ng radius 2
(a) Precision vs. # bits
1
0.75
2‐AGH>SE L2 > L2
0.7
0.65
l2 Scan
0.6
SE l2 Scan (48 dims)
1-AGH (48 bits)
2-AGH (48 bits)
0.55
0.5
0.45
0.4
0
4
100
# bits
200
300
400
500
600
# anchors
(a) Precision
Precision within Hamming radius 2 using hash lookup within Hamming radius 2 using hash lookup
(fix m=300 to run AGH); (b) Hamming ranking precision of top‐5k ranked neighbors. Hashing with Graphs, ICML 2011
26
MNIST
(c) Precision curves (1-AGH vs. 2-AGH)
0 95
0.95
l2 Scan
1-AGH (24 bits)
1-AGH (48 bits)
2-AGH ((48 bits))
0.8
Recall 2‐AGH (r) > 1‐AGH (r)
0.7
0.6
0.85
Recall
Precision
0.9
(d) Recall curves (1-AGH vs. 2-AGH)
09
0.9
0.8
0.5
0.4
2 AGH ( ) 1 AGH ( /2)
2‐AGH (r) > 1‐AGH (r/2)
l2 Scan
0.3
0.75
1-AGH (24 bits)
1-AGH (48 bits)
2-AGH (48 bits)
0.2
07
0.7
50100 200 300 400 500 600 700 800 900 1000
0.1
0
1
0
# samples
0.5
1
# samples
1.5
2
x 10
4
(c) Hamming ranking precision curves (fix m=300); (c) Hamming ranking precision curves (fix m
300); (d) Hamming ranking recall curves (fix m=300).
Hashing with Graphs, ICML 2011
27
NUS WIDE
NUS‐WIDE
(a) Precision vs. # bits
0.7
(b) Precision vs. # anchors
LSH
PCAH
USPLH
SH
KLSH
SIKH
1-AGH
2-AGH
much higher
g
0.6
0.5
drop
04
0.4
0.3
0.2
0.1
0
12 16
24
32
48
64
Precision of top-50
000 neighbo
ors
Precision within Ham
mming radiu
us 2
• About
About 270k web images from 81 semantic tags, 2100 query
270k web images from 81 semantic tags 2100 query
images sampled from 21 frequent tags. True neighbors are those sharing at least one semantic tag
those sharing at least one semantic tag. 0.49
0.485
G S 2 2 > L2
2‐AGH>SE L
0.48
0.475
l2 Scan
0.47
SE l2 Scan (48 dims)
1-AGH (48 bits)
2-AGH (48 bits)
0.465
0 46
0.46
0.455
0.45
100
# bit
bits
200
300
400
500
600
# anchors
h
Hashing with Graphs, ICML 2011
28
NUS WIDE
NUS‐WIDE
• (a)(c)(d) fix m=300. (b) top
(a)(c)(d) fix m=300 (b) top‐5k
5k hamming ranking precision. hamming ranking precision
(c) Precision curves (1-AGH vs. 2-AGH)
(d) Recall curves (1-AGH vs. 2-AGH)
0.57
0.25
l2 Scan
0.56
1-AGH (24 bits)
1-AGH (48 bits)
2-AGH (48 bits)
02
0.2
0.54
Re
ecall
Precision
0.55
0.53
0.52
Recall 2‐AGH (r) > 1‐AGH (r)
0.15
0.1
2‐AGH (r) > 1‐AGH (r/2)
0.51
l2 Scan
1-AGH (24 bits)
1-AGH (48 bits)
2-AGH (48 bits)
0.05
05
0.5
0.49
50100 200 300 400 500 600 700 800 900 1000
0
0
# samples
1
2
# samples
Hashing with Graphs, ICML 2011
3
4
x 10
4
29
Hamming Ranking
Hamming Ranking
MAP on MNIST
Method
r=24
r=48
r=24
0 4125
0.4125
L2 Scan
S
r=48
0 4523
0.4523
0.5269
0.3909
0.4866
0.4775
BRE (NIPS’10)
BRE (NIPS
10)
0 2638
0.2638
0 3090
0.3090
0 4100
0.4100
0 4229
0.4229
SH (NIPS’09)
0.2699
0.2453
0.3609
0.3420
KLSH (ICCV’09)
KLSH (ICCV
09)
0.2555
0 3049
0.3049
0.4232
0 4157
0.4157
SIKH (NIPS’10)
0.1947
0.1972
0.3270
0.3094
USPLH (ICML’10)
0.4699
0.4930
0.4269
0.4322
1‐AGH
0.4997
0.3971
0.4762
0.4761
2‐AGH
0.6738
0.6410
0.4699
0.4779
supervised SE L2 Scan
linear
MP/5k on NUS‐WIDE
Hashing with Graphs, ICML 2011
30
Outline
•
•
•
•
•
Overview
Graph Hashing Anchor Graph Hashing E
Experiments
i
t
Conclusions
Hashing with Graphs, ICML 2011
31
Conclusions
• Under
Under the unsupervised scenario, almost all hashing the unsupervised scenario almost all hashing
methods are inferior to L2 linear scan in search accuracy. • O
Our AGH method outperforms
AGH
th d t f
L2 scan in terms of i t
f
searching semantic neighbors. • We have applied Anchor Graphs to achieve scalability in W h
li d A h G h t
hi
l bilit i
SSL, clustering, and web image search.
• AGH will have wide applications, even only the compact AGH ill h
id
li i
l h
code generation scheme. Hashing with Graphs, ICML 2011
32
Download