Optimal Lower Bounds for Locality-Sensitive Hashing (except when q is tiny)

Ryan O'Donnell (CMU), Yi Wu (CMU, IBM), Yuan Zhou (CMU)
Locality Sensitive Hashing [Indyk–Motwani '98]

A family H of hash functions h : objects → sketches s.t.

  “similar” objects collide w/ high prob.
  “dissimilar” objects collide w/ low prob.
Abbreviated history

Min-wise hash functions [Broder '98]

(figure: sets A, B drawn as bit vectors)

Jaccard similarity:   |A ∩ B| / |A ∪ B|

Invented simple H s.t.   Pr_{h~H}[h(A) = h(B)] = |A ∩ B| / |A ∪ B|.
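A minimal Python sketch (an illustration, not Broder's production scheme) of a min-wise hash: h takes the minimum of a set under a random permutation of the universe, so a random h collides on A and B with probability exactly the Jaccard similarity.

```python
import random

def minhash(universe_size, seed=None):
    """Sample one min-wise hash function: h(A) = min over a in A of pi(a),
    for a uniformly random permutation pi of the universe."""
    rng = random.Random(seed)
    pi = list(range(universe_size))
    rng.shuffle(pi)
    return lambda A: min(pi[a] for a in A)

def jaccard(A, B):
    return len(A & B) / len(A | B)

# Pr[h(A) = h(B)] over a random h equals |A ∩ B| / |A ∪ B| = 1/3 here.
A = {0, 1, 2, 3, 4, 5}
B = {3, 4, 5, 6, 7, 8}
trials = 20000
collisions = 0
for t in range(trials):
    h = minhash(10, seed=t)
    if h(A) == h(B):
        collisions += 1

print(jaccard(A, B))           # exactly 1/3
print(collisions / trials)     # close to 1/3
```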
Indyk–Motwani '98

Defined LSH.
Invented very simple H good for {0,1}^d under Hamming distance.
Showed good LSH implies good nearest-neighbor-search data structs.

Charikar '02, STOC

Proposed alternate H (“simhash”) for Jaccard similarity.
Patented by Google.
Many papers about LSH

Practice:
  Free code base [AI '04]
  Sequence comparison in bioinformatics [Broder '97]
  Association-rule finding in data mining
  Collaborative filtering
  Clustering nouns by meaning in NLP
  Pose estimation in vision
  •••

Theory:
  [Indyk–Motwani '98]
  [Gionis–Indyk–Motwani '98]
  [Charikar '02]
  [Datar–Immorlica–Indyk–Mirrokni '04]
  [Motwani–Naor–Panigrahi '06]
  [Andoni–Indyk '06]
  [Tenesawa–Tanaka '07]
  [Andoni–Indyk '08, CACM]
  [Neylon '10]
Given:   (X, dist)  distance space,   r > 0  “radius”,   c > 1  “approx factor”.

Goal:   Family H of functions X → S  (S can be any finite set)  s.t. ∀ x, y ∈ X,

  dist(x, y) ≤ r    ⟹   Pr_{h~H}[h(x) = h(y)] ≥ q^ρ

  dist(x, y) ≥ cr   ⟹   Pr_{h~H}[h(x) = h(y)] ≤ q
Theorem [IM'98, GIM'98]

Given an LSH family for (X, dist) with

  dist(x, y) ≤ r    ⟹   Pr_{h~H}[h(x) = h(y)] ≥ q^ρ
  dist(x, y) ≥ cr   ⟹   Pr_{h~H}[h(x) = h(y)] ≤ q,

can solve “(r,cr)-near-neighbor search” for n points with a data structure of
size O(n^{1+ρ}) and query time Õ(n^ρ) hash fcn evals.
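As a concrete illustration of the data structure behind this theorem, here is a toy Python sketch of the standard construction: concatenate k hash functions into one bucket key, and repeat over L independent tables. The parameters d, n, k, L below are illustrative choices, not the tuned values from the papers.

```python
import random

random.seed(0)
d, n = 64, 200
points = [tuple(random.randint(0, 1) for _ in range(d)) for _ in range(n)]

# Each table keys points by k randomly chosen coordinates (the bit-sampling
# family); L independent tables boost the chance a near pair shares a bucket.
k, L = 16, 30
tables = []
for _ in range(L):
    coords = [random.randrange(d) for _ in range(k)]
    buckets = {}
    for p in points:
        key = tuple(p[i] for i in coords)
        buckets.setdefault(key, []).append(p)
    tables.append((coords, buckets))

def query(x):
    """Return some stored point sharing a bucket with x, if any table has one."""
    for coords, buckets in tables:
        key = tuple(x[i] for i in coords)
        for p in buckets.get(key, []):
            return p
    return None

# A point at Hamming distance 2 from points[0] shares a bucket with it in
# at least one of the L tables with overwhelming probability.
x = list(points[0]); x[0] ^= 1; x[1] ^= 1
print(query(tuple(x)) is not None)   # True
```

In the actual theorem one takes k ≈ log n / log(1/q) and L ≈ n^ρ, which is where the O(n^{1+ρ}) size and Õ(n^ρ) query time come from.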
Example [IM'98]

X = {0,1}^d,  dist = Hamming,  r = εd,  c = 5.

  dist(x, y) ≤ εd    ⟹   Pr_{h~H}[h(x) = h(y)] ≥ q^ρ
  dist(x, y) ≥ 5εd   ⟹   Pr_{h~H}[h(x) = h(y)] ≤ q

(figure: x, y ∈ {0,1}^d as bit vectors, at dist ≤ εd or ≥ 5εd)

H = { h_1, h_2, …, h_d },  h_i(x) = x_i   (“output a random coord.”)
Analysis

  dist(x, y) ≤ εd    ⟹   Pr_{h~H}[h(x) = h(y)] ≥ 1 − ε  = q^ρ
  dist(x, y) ≥ 5εd   ⟹   Pr_{h~H}[h(x) = h(y)] ≤ 1 − 5ε = q

Since (1 − 5ε)^{1/5} ≤ 1 − ε,   ∴ ρ ≤ 1/5.

In general, achieves ρ ≤ 1/c,  ∀c (∀r).
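A quick numeric sanity check (an illustration, not part of the proof) that the bit-sampling family's exponent ρ = ln(1−ε)/ln(1−cε) is indeed at most 1/c:

```python
import math

# For bit sampling: near collision prob is q^rho = 1 - eps, far is q = 1 - c*eps,
# so the achieved exponent is rho = ln(1 - eps) / ln(1 - c*eps).
for c in [2, 5, 10]:
    for eps in [0.001, 0.01, 0.05]:
        rho = math.log(1 - eps) / math.log(1 - c * eps)
        assert rho <= 1 / c        # equivalent to (1 - eps)^c >= 1 - c*eps
        print(c, eps, round(rho, 4))
```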
Optimal upper bound

({0,1}^d, Ham),  r > 0,  c > 1.

S ≝ {0,1}^d ∪ {✔},   H ≝ { h_ab : dist(a,b) ≤ r },

  h_ab(x) = ✔  if x = a or x = b,
            x   otherwise.

  dist(x, y) ≤ r    ⟹   Pr_{h~H}[h(x) = h(y)] > 0   (positive, but tiny)
  dist(x, y) ≥ cr   ⟹   Pr_{h~H}[h(x) = h(y)] = 0

So any ρ > 0 is formally “achieved” (q^ρ can be .5, .1, .01, .0001, …), because q is tiny.
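This degenerate family is easy to realize in code. A small Python sketch (with illustrative d and r) confirming that near pairs collide with positive but tiny probability while far pairs never collide:

```python
from itertools import product

# The family H = {h_ab : dist(a, b) <= r} on {0,1}^d, with illustrative d, r.
d, r = 4, 1
CHECK = "check"                         # stands in for the extra symbol in S

def h_ab(a, b):
    """Collapse a and b to the extra symbol; identity everywhere else."""
    return lambda x: CHECK if x in (a, b) else x

cube = list(product((0, 1), repeat=d))
H = [h_ab(a, b) for a in cube for b in cube
     if sum(i != j for i, j in zip(a, b)) <= r]

def collision_prob(x, y):
    return sum(h(x) == h(y) for h in H) / len(H)

x = (0, 0, 0, 0)
near = (0, 0, 0, 1)                     # dist 1 <= r
far = (0, 0, 1, 1)                      # dist 2 >= c*r for c = 2
print(collision_prob(x, near) > 0)      # True (but the prob. is tiny)
print(collision_prob(x, far))           # 0.0
```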
The End.
Any questions?
Wait, what?
Theorem [IM'98, GIM'98]

Given LSH family for (X, dist), can solve “(r,cr)-near-neighbor search”
for n points with a data structure of size O(n^{1+ρ}) and
query time Õ(n^ρ) hash fcn evals.
… provided q ≥ n^{−o(1)}   (“not tiny”).
More results

For R^d with ℓ_p-distance:   ρ ≤ 1/c^p   when p = 1, 0 < p < 1, p = 2
  [IM'98] [DIIM'04] [AI'06]

For Jaccard similarity:   ρ ≤ 1/c   [Bro'98]

For {0,1}^d with Hamming distance:   ρ ≥ .462/c − o_d(1)   (assuming q ≥ 2^{−o(d)})   [MNP'06]
  (immediately: ρ ≥ .462/c^p for ℓ_p-distance)
Our Theorem

For {0,1}^d with Hamming distance:   (∃ r s.t.)   ρ ≥ 1/c − o_d(1)   (assuming q ≥ 2^{−o(d)})
  (immediately: ρ ≥ 1/c^p for ℓ_p-distance)

Proof also yields ρ ≥ 1/c for Jaccard.
Proof

Noise-stability is log-convex.

(A definition, and two lemmas.)
Definition: Noise stability at e^{−τ}.

Fix any function h : {0,1}^d → S.

Pick x ∈ {0,1}^d at random;  say h(x) = s.
Flip each bit w.p. (1 − e^{−2τ})/2 independently, getting y;  say h(y) = s′.

def:   K_h(τ) ≝ Pr_{x~y}[h(x) = h(y)]
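The definition is easy to check empirically. A Monte-Carlo sketch in Python, using the illustrative choice h(x) = x_1, for which K_h(τ) = (1 + e^{−2τ})/2 exactly:

```python
import math, random

# Monte-Carlo estimate of K_h(tau) = Pr[h(x) = h(y)] for correlated x ~ y.
def estimate_K(h, d, tau, trials=100000, seed=1):
    rng = random.Random(seed)
    flip_p = (1 - math.exp(-2 * tau)) / 2     # per-bit flip probability
    hits = 0
    for _ in range(trials):
        x = [rng.randint(0, 1) for _ in range(d)]
        y = [b ^ (rng.random() < flip_p) for b in x]
        hits += h(x) == h(y)
    return hits / trials

# Illustrative h: output the first coordinate, so K_h(tau) = (1 + e^(-2 tau))/2.
h = lambda x: x[0]
tau = 0.3
est = estimate_K(h, 8, tau)
exact = (1 + math.exp(-2 * tau)) / 2
print(round(est, 3), round(exact, 3))
```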
Lemma 1:   For x ~ y,   dist(x, y) = (1 − e^{−2τ})d/2 ± o(d)  w.v.h.p.;
this is ≈ τd when τ ≪ 1.

Proof:  Chernoff bound and Taylor expansion.
Lemma 2:   K_h(τ) is a log-convex function of τ  (for any h).

(figure: convex plot of log K_h(τ) against τ ≥ 0)

Proof:  Fourier analysis of Boolean functions.
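Lemma 2 can be checked numerically for small d by exact enumeration (an illustration of the statement, not its Fourier-analytic proof). The midpoint test below verifies that ln K_h is convex along a grid of τ values for an arbitrarily chosen h:

```python
import math
from itertools import product

# Exact K_h(tau) for small d by enumerating all (x, y) pairs.
def K(h, d, tau):
    p = (1 - math.exp(-2 * tau)) / 2          # per-bit flip probability
    total = 0.0
    for x in product((0, 1), repeat=d):
        for y in product((0, 1), repeat=d):
            dist = sum(a != b for a, b in zip(x, y))
            total += (0.5 ** d) * (p ** dist) * ((1 - p) ** (d - dist)) * (h(x) == h(y))
    return total

h = lambda x: (x[0] ^ x[1], x[2])             # an arbitrary h into a 4-element S
d = 3
taus = [0.1 * i for i in range(1, 8)]
logs = [math.log(K(h, d, t)) for t in taus]

# Midpoint test for convexity of log K_h on an equally spaced grid.
convex_ok = all(logs[i] <= (logs[i - 1] + logs[i + 1]) / 2 + 1e-12
                for i in range(1, len(logs) - 1))
print(convex_ok)   # True
```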
Theorem:   LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).

Proof:  Say H is an LSH family for {0,1}^d with params (εd + o(d), cεd − o(d), q^ρ, q),
i.e. radius r = εd + o(d) and far distance (c − o(1))·r.

def:   K_H(τ) ≝ E_{h~H}[K_h(τ)]
              = E_{h~H}[ Pr_{x~y}[h(x) = h(y)] ]
              = E_{x~y}[ Pr_{h~H}[h(x) = h(y)] ]

(A non-neg. lin. comb. of log-convex fcns, ∴ K_H(τ) is also log-convex.)

w.v.h.p.,  dist(x, y) ≈ (1 − e^{−2τ})d/2 ≈ τd,  so the LSH guarantees give

  K_H(ε) ≳ q^ρ,        K_H(cε) ≲ q.
K_H(τ) is log-convex, and K_H(0) = 1, so ln K_H(0) = 0.  We have

  ln K_H(ε) ≳ ρ ln q,        ln K_H(cε) ≲ ln q.

(figure: convex plot of ln K_H(τ) vs. τ, marked at τ = ε and τ = cε with values ρ ln q and (1/c) ln q)

Since ε = (1 − 1/c)·0 + (1/c)·cε, convexity of ln K_H(τ) gives

  ρ ln q ≲ ln K_H(ε) ≤ (1 − 1/c)·ln K_H(0) + (1/c)·ln K_H(cε) ≲ (1/c)·ln q.

∴ ρ ln q ≤ (1/c) ln q,  and since ln q < 0, this yields ρ ≥ 1/c.  ∎
The End.
Any questions?