
Advertisement
Ryan O'Donnell (CMU)
Yi Wu (CMU, IBM)
Yuan Zhou (CMU)
Locality Sensitive Hashing
[Indyk-Motwani '98]
h : objects → sketches
H : family of hash functions h s.t.
"similar" objects collide w/ high prob.
"dissimilar" objects collide w/ low prob.
Abbreviated history
Min-wise hash functions
[Broder '98]
[Figure: sets A and B drawn as 0/1 indicator vectors over a common universe]
Jaccard similarity:  |A ∩ B| / |A ∪ B|
Invented a simple H s.t.  Pr_{h~H}[h(A) = h(B)] = |A ∩ B| / |A ∪ B|.
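A minimal sketch of the min-wise hashing idea (Python; the explicit-permutation construction, universe size, and example sets are illustrative assumptions, not Broder's exact scheme): a random permutation π of the universe gives h_π(A) = min over a in A of π(a), and two sets collide exactly when the minimum of π over A ∪ B lands in A ∩ B, so the collision probability is the Jaccard similarity.

import random

def draw_minhash(universe_size):
    # h_pi(A) = min over a in A of pi(a), for a uniformly random permutation pi.
    pi = list(range(universe_size))
    random.shuffle(pi)
    return lambda A: min(pi[a] for a in A)

# The empirical collision rate should approach |A ∩ B| / |A ∪ B| = 2/6 here.
A, B = {0, 1, 2, 3}, {2, 3, 4, 5}
trials = 20_000
hits = sum(1 for _ in range(trials) if (h := draw_minhash(10))(A) == h(B))
print(hits / trials, len(A & B) / len(A | B))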
Indyk–Motwani '98
Defined LSH.
Invented a very simple H good for {0,1}^d under Hamming distance.
Showed good LSH implies good nearest-neighbor-search data structures.
Charikar '02, STOC
Proposed an alternate H ("simhash") for Jaccard similarity.
Patented by Google.
Many papers about LSH

Practice:
Free code base [AI '04]
Sequence comparison in bioinformatics
Association-rule finding in data mining
Collaborative filtering
Clustering nouns by meaning in NLP
Pose estimation in vision
•••

Theory:
[Broder '97]
[Indyk–Motwani '98]
[Gionis–Indyk–Motwani '98]
[Charikar '02]
[Datar–Immorlica–Indyk–Mirrokni '04]
[Motwani–Naor–Panigrahi '06]
[Andoni–Indyk '06]
[Tenesawa–Tanaka '07]
[Andoni–Indyk '08, CACM]
[Neylon '10]
Given: (X, dist) a distance space, r > 0 ("radius"), c > 1 ("approx. factor").
Goal: a family H of functions X → S (S can be any finite set) s.t. ∀ x, y ∈ X,
dist(x, y) ≤ r   ⟹   Pr_{h~H}[h(x) = h(y)] ≥ q^ρ
dist(x, y) ≥ cr  ⟹   Pr_{h~H}[h(x) = h(y)] ≤ q
dist ( x, y )  r
Theorem
 Pr [h( x)  h( y )]  q 
h~ H
dist ( x, y )  cr
[IM’98, GIM’98]
 Pr [h( x)  h( y )]  q
h~ H
Given LSH family for (X, dist),
can solve “(r,cr)-near-neighbor search”
for n points with data structure of
size:
query time:
O(n1+ρ)
Õ(nρ) hash fcn evals.
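A minimal sketch of how such a family is typically turned into a near-neighbor data structure (Python; the class name, the parameters k and num_tables, and the collision-scan policy are illustrative assumptions, not the exact [GIM'98] construction): concatenate k independent hashes to form a bucket key, and repeat over several independent tables.

from collections import defaultdict

class LSHIndex:
    # Illustrative (r, cr)-near-neighbor index built from a generic LSH family.
    # draw_hash() returns a fresh random h ~ H; k and num_tables play the roles
    # of the usual k ~ log_{1/q} n and num_tables ~ n^rho in the analysis.
    def __init__(self, draw_hash, k, num_tables, dist, cr):
        self.hash_fns = [[draw_hash() for _ in range(k)] for _ in range(num_tables)]
        self.tables = [defaultdict(list) for _ in range(num_tables)]
        self.dist, self.cr = dist, cr

    def _key(self, t, x):
        return tuple(h(x) for h in self.hash_fns[t])

    def insert(self, x):
        for t, table in enumerate(self.tables):
            table[self._key(t, x)].append(x)

    def query(self, x):
        # Scan colliding points; with suitable k and num_tables, some point at
        # distance <= cr is returned w.h.p. whenever an r-near neighbor exists.
        for t, table in enumerate(self.tables):
            for y in table.get(self._key(t, x), []):
                if self.dist(x, y) <= self.cr:
                    return y
        return None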
dist ( x, y )  r
Example
 Pr [h( x)  h( y )]  q 
h~ H
dist ( x, y )  cr
X = {0,1}d, dist = Hamming
 Pr [h( x)  h( y )]  q
h~ H
r = εd,
c=5
0
1
1
1
0
0
1
0
0
1
1
1
0
0
0
1
0
1
[IM’98]
H = { h , h , …, h }, h (x) = x
1
2
d
i
i
“output a random coord.”
dist ≤ εd
or ≥5εd
Analysis
dist(x, y) ≤ εd   ⟹   Pr_{h~H}[h(x) = h(y)] ≥ 1 − ε  = q^ρ
dist(x, y) ≥ 5εd  ⟹   Pr_{h~H}[h(x) = h(y)] ≤ 1 − 5ε = q
(1 − 5ε)^{1/5} ≈ 1 − ε,  ∴ ρ ≈ 1/5;  more precisely (1 − 5ε)^{1/5} ≤ 1 − ε,  ∴ ρ ≤ 1/5.
In general, achieves ρ ≤ 1/c, ∀c (∀r).
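A quick numerical check of this analysis (Python; the specific c and ε values are arbitrary illustrations): for the bit-sampling family, ρ = ln(1 − ε)/ln(1 − cε), which is at most 1/c.

import math

# Bit-sampling family: Pr[h(x) = h(y)] = 1 - dist(x, y)/d, so
# rho = ln(1 - eps) / ln(1 - c*eps); check it is at most 1/c.
for c in [2, 5, 10]:
    for eps in [0.001, 0.01, 0.05]:
        rho = math.log(1 - eps) / math.log(1 - c * eps)
        print(f"c={c:2d}  eps={eps:.3f}  rho={rho:.4f}  1/c={1/c:.4f}")
        assert rho <= 1 / c + 1e-12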
Optimal upper bound
({0,1}^d, Ham),  r > 0,  c > 1.
S ≝ {0,1}^d ∪ {✔}
h_ab(x) = ✔ if x = a or x = b;  h_ab(x) = x otherwise.
H ≝ { h_ab : dist(a, b) ≤ r }
dist(x, y) ≤ r   ⟹   Pr_{h~H}[h(x) = h(y)] > 0   (positive, though possibly tiny: .5, .1, .01, .0001, …)
dist(x, y) ≥ cr  ⟹   Pr_{h~H}[h(x) = h(y)] = 0
The End.
Any questions?
Wait, what?
Theorem [IM'98, GIM'98]
Given an LSH family for (X, dist) with q ≥ n^{−o(1)} ("not tiny"),
one can solve "(r, cr)-near-neighbor search" for n points with a data structure of
size:  O(n^{1+ρ})
query time:  Õ(n^ρ) hash fcn evals.
More results
For R^d with ℓ_p-distance:  ρ ≤ 1/c^p  when p = 1, 0 < p < 1, p = 2   [IM'98] [DIIM'04] [AI'06]
For Jaccard similarity:  ρ ≤ 1/c   [Bro'98]
For {0,1}^d with Hamming distance [MNP'06]:
ρ ≥ .462/c − o_d(1)   (assuming q ≥ 2^{−o(d)}),
which immediately gives ρ ≥ .462/c^p for ℓ_p-distance.
Our Theorem
For {0,1}^d with Hamming distance (∃ r s.t.):
ρ ≥ 1/c − o_d(1)   (assuming q ≥ 2^{−o(d)}),
which immediately gives ρ ≥ 1/c^p − o_d(1) for ℓ_p-distance.
Proof also yields ρ ≥ 1/c for Jaccard.
Proof
Proof: Noise stability is log-convex.
Proof: A definition, and two lemmas.
Definition: Noise stability at e^{−τ}
Fix any function h : {0,1}^d → S.
Pick x ∈ {0,1}^d uniformly at random; say h(x) = s.
Flip each bit of x independently w.p. (1 − e^{−2τ})/2 to get y (write x ~_τ y); say h(y) = s′.
def:  K_h(τ) ≝ Pr_{x ~_τ y}[h(x) = h(y)]
Lemma 1:
For x ~_τ y,  dist(x, y) = (1 − e^{−2τ})·d/2 ± o(d)  w.v.h.p.,
which is ≈ τd when τ ≪ 1.
Proof: Chernoff bound and Taylor expansion.
Lemma 2:
K_h(τ) is a log-convex function of τ (for any h).
[Plot: log K_h(τ) vs. τ ≥ 0, a convex curve]
Proof uses Fourier analysis of Boolean functions.
Fourier transformation
• Theorem. f : {0,1}^d → ℝ can be uniquely written as
  f(x) = Σ_{S ⊆ [d]} f̂(S) · χ_S(x)
  (Fourier coefficients f̂(S), basis functions χ_S), where
  χ_S(x) = Π_{i ∈ [d]} (−1)^{1[i∈S]·x_i} = Π_{i ∈ S} (−1)^{x_i}.
• Proof. { χ_S }_{S ⊆ [d]} is an orthonormal basis of { f : {0,1}^d → ℝ }.
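A brute-force sketch of this expansion (Python; the example function and the O(4^d) loops are for illustration only, not an efficient transform):

from itertools import product

def fourier_coeffs(f, d):
    # f_hat(S) = E_x[ f(x) * chi_S(x) ], where chi_S(x) = prod_{i in S} (-1)^{x_i};
    # S is represented as a 0/1 indicator vector of length d.
    cube = list(product([0, 1], repeat=d))
    coeffs = {}
    for S in product([0, 1], repeat=d):
        chi_S = lambda x: (-1) ** sum(xi for xi, si in zip(x, S) if si)
        coeffs[S] = sum(f(x) * chi_S(x) for x in cube) / len(cube)
    return coeffs

def eval_expansion(coeffs, x):
    # Reconstruct f(x) = sum_S f_hat(S) * chi_S(x).
    return sum(c * (-1) ** sum(xi for xi, si in zip(x, S) if si)
               for S, c in coeffs.items())

d = 3
f = lambda x: int(x[0] ^ (x[1] & x[2]))   # an arbitrary function {0,1}^3 -> {0,1}
coeffs = fourier_coeffs(f, d)
assert all(abs(eval_expansion(coeffs, x) - f(x)) < 1e-9 for x in product([0, 1], repeat=d))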
Lemma 2:
K_h(τ) is a log-convex function of τ.
Proof: Let h_i(x) = 1_{h(x)=i}.
K_h(τ) = Pr_{x~_τ y}[h(x) = h(y)]
       = Σ_i Pr_{x~_τ y}[h(x) = h(y) = i]
       = Σ_i E_{x~_τ y}[ h_i(x) h_i(y) ]
       = Σ_i E_{x~_τ y}[ ( Σ_{S⊆[d]} ĥ_i(S) χ_S(x) ) · ( Σ_{T⊆[d]} ĥ_i(T) χ_T(y) ) ]
       = Σ_i Σ_{S,T⊆[d]} ĥ_i(S) ĥ_i(T) · E_{x~_τ y}[ χ_S(x) χ_T(y) ]
E [  S ( x) T ( y)]
x~ y
 E   (1)1[iS ] xi  (1)1[iT ] yi 

x~ y 
i[ d ]
i[ d ]
 E   (1)1[iS ] xi 1[iT ] yi 

x~ y 
i[ d ]
1[ iS ] xi 1[ iT ] yi
  E [( 1)
i[ d ] xi ~ yi
0
  2 |S|
e
S T
S T
] =
1
i  S, i T
0
i  S, i T
0
i  S, i T
(1  e 2 ) (1  e 2 )

2
2
 2
i  S, i T
e
Lemma 2:
K_h(τ) is a log-convex function of τ.
Proof (cont.): With h_i(x) = 1_{h(x)=i},
K_h(τ) = Pr_{x~_τ y}[h(x) = h(y)]
       = Σ_i Σ_{S,T⊆[d]} ĥ_i(S) ĥ_i(T) · E_{x~_τ y}[ χ_S(x) χ_T(y) ]
       = Σ_i Σ_{S⊆[d]} ĥ_i(S)² · e^{−2τ|S|},
a non-negative combination of log-convex functions of τ, hence log-convex.  ∎
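A self-contained numerical sanity check of this formula and its log-convexity (Python; the example h, dimension, and τ values are illustrative assumptions):

import math
from itertools import product

def Kh_exact(h, d, tau):
    # K_h(tau) = sum_i sum_S  h_i_hat(S)^2 * e^{-2 tau |S|}  (the formula above),
    # with h_i(x) = 1[h(x) = i] and S ranging over 0/1 indicator vectors.
    cube = list(product([0, 1], repeat=d))
    total = 0.0
    for i in {h(x) for x in cube}:
        for S in product([0, 1], repeat=d):
            hat = sum((h(x) == i) * (-1) ** sum(xi for xi, si in zip(x, S) if si)
                      for x in cube) / len(cube)
            total += hat ** 2 * math.exp(-2 * tau * sum(S))
    return total

# Midpoint log-convexity check on an arbitrary small example h : {0,1}^4 -> S.
h = lambda x: (x[0] ^ x[1], x[2] & x[3])
for a, b in [(0.1, 0.5), (0.2, 1.0), (0.3, 2.0)]:
    mid = math.log(Kh_exact(h, 4, (a + b) / 2))
    avg = (math.log(Kh_exact(h, 4, a)) + math.log(Kh_exact(h, 4, b))) / 2
    assert mid <= avg + 1e-9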
Recap:
Lemma 1: For x ~_τ y, dist(x, y) = (1 − e^{−2τ})·d/2 ± o(d) w.v.h.p., ≈ τd when τ ≪ 1.
Lemma 2: K_h(τ) is a log-convex function of τ (for any h).
[Plot: log K_h(τ) vs. τ ≥ 0, a convex curve]
Theorem: LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).

Proof: Say H is an LSH family for {0,1}^d with params (εd + o(d), cεd − o(d), q^ρ, q),
i.e. radius r = εd + o(d) and approx. factor c − o(1).

def:  K_H(τ) ≝ E_{h~H}[K_h(τ)] = E_{h~H}[Pr_{x~_τ y}[h(x) = h(y)]] = E_{x~_τ y}[Pr_{h~H}[h(x) = h(y)]]

(Non-neg. lin. comb. of log-convex fcns., ∴ K_H(τ) is also log-convex.)

By Lemma 1, w.v.h.p. dist(x, y) ≈ (1 − e^{−2τ})·d/2 ≈ τd, so
∴ K_H(ε) ≳ q^ρ   and   K_H(cε) ≲ q.

K_H(0) = 1, ∴ ln K_H(0) = 0.
∴ ln K_H(ε) ≳ ρ ln q   and   ln K_H(cε) ≲ ln q.
[Plot: ln K_H(τ) vs. τ, with its values at τ = 0, ε, cε and the convexity chord]
Since K_H(τ) is log-convex (see the worked chord inequality below):
∴ ρ ln q ≤ (1/c) ln q  (up to o(1)),  and since ln q < 0 this gives ρ ≥ 1/c − o_d(1).  ∎
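Spelling out the convexity step referenced above as a short worked derivation (this is just the chord inequality implicit in the plot, not additional material from the talk):

\[
\ln K_H(0) = 0, \qquad \ln K_H(\varepsilon) \gtrsim \rho \ln q, \qquad \ln K_H(c\varepsilon) \lesssim \ln q.
\]
Since \(\ln K_H\) is convex and \(\varepsilon = (1 - \tfrac{1}{c}) \cdot 0 + \tfrac{1}{c} \cdot c\varepsilon\),
\[
\rho \ln q \;\lesssim\; \ln K_H(\varepsilon)
\;\le\; (1 - \tfrac{1}{c}) \ln K_H(0) + \tfrac{1}{c} \ln K_H(c\varepsilon)
\;\lesssim\; \tfrac{1}{c} \ln q,
\]
and dividing by \(\ln q < 0\) flips the inequality: \(\rho \ge \tfrac{1}{c} - o_d(1)\).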
The End.
Any questions?