Optimal Lower Bounds for Locality Sensitive Hashing (except when q is tiny)
Ryan O'Donnell (CMU), Yi Wu (CMU, IBM), Yuan Zhou (CMU)

Locality Sensitive Hashing [Indyk–Motwani '98]
h : objects → sketches.
H : a family of hash functions h such that
  "similar" objects collide with high probability,
  "dissimilar" objects collide with low probability.

Abbreviated history
• Min-wise hash functions [Broder '98]. For sets A, B, the Jaccard similarity is |A ∩ B| / |A ∪ B|. Broder invented a simple family H such that Pr_{h~H}[h(A) = h(B)] = |A ∩ B| / |A ∪ B|.
• Indyk–Motwani '98. Defined LSH. Invented a very simple H that works for {0,1}^d under Hamming distance. Showed that a good LSH family implies good nearest-neighbor-search data structures.
• Charikar '02 (STOC). Proposed an alternate family ("simhash") for Jaccard similarity. Patented by Google.

Many papers about LSH
Practice: sequence comparison in bioinformatics, association-rule finding in data mining, collaborative filtering, clustering nouns by meaning in NLP, pose estimation in vision, …; free code base [AI '04].
Theory: [Broder '97], [Indyk–Motwani '98], [Gionis–Indyk–Motwani '98], [Charikar '02], [Datar–Immorlica–Indyk–Mirrokni '04], [Motwani–Naor–Panigrahy '06], [Andoni–Indyk '06], [Terasawa–Tanaka '07], [Andoni–Indyk '08, CACM], [Neylon '10], …

Definition (LSH family)
Given: a distance space (X, dist), a "radius" r > 0, and an "approximation factor" c > 1.
Goal: a family H of functions X → S (S can be any finite set) such that for all x, y ∈ X,
  dist(x, y) ≤ r   ⟹  Pr_{h~H}[h(x) = h(y)] ≥ q^ρ,
  dist(x, y) ≥ cr  ⟹  Pr_{h~H}[h(x) = h(y)] ≤ q.
(Smaller ρ is better.)

Theorem [IM'98, GIM'98]
Given an LSH family for (X, dist), one can solve "(r, cr)-near-neighbor search" for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash-function evaluations.

Example
X = {0,1}^d, dist = Hamming distance, r = εd, c = 5, so the distances of interest are ≤ εd or ≥ 5εd.
[IM'98]  H = {h_1, h_2, …, h_d} with h_i(x) = x_i: "output a random coordinate."

Analysis
  dist(x, y) ≤ εd   ⟹  Pr_{h~H}[h(x) = h(y)] ≥ 1 − ε   = q^ρ,
  dist(x, y) ≥ 5εd  ⟹  Pr_{h~H}[h(x) = h(y)] ≤ 1 − 5ε  = q.
Since (1 − 5ε)^{1/5} ≤ 1 − ε, we get ρ ≤ 1/5.
In general, this family achieves ρ ≤ 1/c, for every c (and every r).
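As a quick illustration, here is a minimal Python sketch of the bit-sampling analysis above (the concrete values d = 1000, ε = 0.02, c = 5 and the trial count are illustrative assumptions, not from the slides). Drawing h ~ H is just drawing a random coordinate, so the two collision probabilities can be estimated by Monte Carlo and ρ read off as ln(p_close)/ln(p_far):

# Sketch (illustrative parameters) of the [IM'98] bit-sampling LSH family for
# {0,1}^d under Hamming distance: H = {h_1, ..., h_d} with h_i(x) = x_i, so
# drawing h ~ H means drawing a uniformly random coordinate, and
# Pr_{h~H}[h(x) = h(y)] = 1 - dist(x, y)/d.
import math
import random

d, eps, c = 1000, 0.02, 5        # assumed values: radius r = eps*d, "far" = c*eps*d

def flip(x, k):
    """Return a copy of x with k distinct, randomly chosen coordinates flipped."""
    y = list(x)
    for i in random.sample(range(d), k):
        y[i] ^= 1
    return y

def collision_prob(x, y, trials=200_000):
    """Monte Carlo estimate of Pr_{h~H}[h(x) = h(y)] for the bit-sampling family."""
    hits = 0
    for _ in range(trials):
        i = random.randrange(d)  # h ~ H  <=>  a uniformly random coordinate i
        hits += (x[i] == y[i])
    return hits / trials

x = [random.randint(0, 1) for _ in range(d)]
p_close = collision_prob(x, flip(x, int(eps * d)))      # dist = eps*d   -> about 1 - eps
p_far   = collision_prob(x, flip(x, int(c * eps * d)))  # dist = 5*eps*d -> about 1 - 5*eps

rho = math.log(p_close) / math.log(p_far)               # p_close = q^rho with q = p_far
print(p_close, p_far, rho)                              # rho comes out near 0.19 <= 1/c = 0.2

(Concatenating several independent coordinates per hash raises both collision probabilities to a power, which changes q but not ρ; that is why ρ is the exponent that governs the [IM'98, GIM'98] data structure.)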
An "optimal" upper bound?
Take ({0,1}^d, Hamming), any r > 0 and c > 1. Let S ≝ {0,1}^d ∪ {✔} and
  H ≝ { h_ab : dist(a, b) ≤ r },   where h_ab(x) = ✔ if x ∈ {a, b}, and h_ab(x) = x otherwise.
Then
  dist(x, y) ≤ r   ⟹  Pr_{h~H}[h(x) = h(y)] > 0 (positive),
  dist(x, y) ≥ cr  ⟹  Pr_{h~H}[h(x) = h(y)] = 0,
so formally ρ can be made arbitrarily small.

The End. Any questions?

Wait, what?
The catch: for a close pair the collision probability is only 1/|H| ≈ 2^{−Θ(d)}, which is useless. The reduction really needs q to be "not tiny":

Theorem [IM'98, GIM'98]
Given an LSH family for (X, dist) with q ≥ n^{−o(1)} ("not tiny"), one can solve "(r, cr)-near-neighbor search" for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash-function evaluations.

More results
• For R^d with ℓ_p distance: ρ ≤ 1/c^p, for p = 1 [IM'98], 0 < p < 1 [DIIM'04], and p = 2 [AI'06].
• For Jaccard similarity: ρ ≤ 1/c [Bro'98].
• Lower bound for {0,1}^d with Hamming distance [MNP'06]: ρ ≥ .462/c − o_d(1) (assuming q ≥ 2^{−o(d)}); this immediately gives ρ ≥ .462/c^p for ℓ_p distance.

Our Theorem
For {0,1}^d with Hamming distance (for some r): any LSH family has ρ ≥ 1/c − o_d(1) (assuming q ≥ 2^{−o(d)}); this immediately gives ρ ≥ 1/c^p for ℓ_p distance.
The proof also yields ρ ≥ 1/c for Jaccard similarity.

Proof
In one sentence: noise stability is log-convex. In more detail: a definition and two lemmas.

Definition (noise stability)
Fix any function h : {0,1}^d → S. Pick x ∈ {0,1}^d uniformly at random, and form y by flipping each bit of x independently with probability (1 − e^{−2τ})/2; write x ~_τ y. Define
  K_h(τ) ≝ Pr_{x ~_τ y}[h(x) = h(y)].

Lemma 1
For x ~_τ y, dist(x, y) = (1 − e^{−2τ})d/2 ± o(d) with very high probability; this is ≈ τd when τ ≪ 1.
Proof: Chernoff bound and Taylor expansion.

Lemma 2
K_h(τ) is a log-convex function of τ (for any h).
[Figure: plot of log K_h(τ) against τ ≥ 0.]
The proof uses Fourier analysis of Boolean functions.

Fourier transform
• Theorem. Every f : {0,1}^d → R can be written uniquely as
    f(x) = Σ_{S ⊆ [d]} f̂(S) χ_S(x)        (Fourier coefficients f̂(S), basis functions χ_S),
  where χ_S(x) = Π_{i∈S} (−1)^{x_i} = (−1)^{Σ_{i∈S} x_i}.
• Proof. {χ_S}_{S ⊆ [d]} is an orthonormal basis of {f : {0,1}^d → R}.

Proof of Lemma 2
Let h_s(x) = 1[h(x) = s]. Then
  K_h(τ) = Pr_{x~_τ y}[h(x) = h(y)]
         = Σ_s Pr_{x~_τ y}[h(x) = h(y) = s]
         = Σ_s E_{x~_τ y}[h_s(x) h_s(y)]
         = Σ_s E_{x~_τ y}[ (Σ_{S⊆[d]} ĥ_s(S) χ_S(x)) (Σ_{T⊆[d]} ĥ_s(T) χ_T(y)) ]
         = Σ_s Σ_{S,T⊆[d]} ĥ_s(S) ĥ_s(T) E_{x~_τ y}[χ_S(x) χ_T(y)].
For the inner expectation, independence across coordinates gives
  E_{x~_τ y}[χ_S(x) χ_T(y)] = Π_{i∈[d]} E_{x_i ~_τ y_i}[(−1)^{1[i∈S] x_i + 1[i∈T] y_i}],
and each factor equals 1 if i ∉ S, i ∉ T;  0 if i ∈ S, i ∉ T;  0 if i ∉ S, i ∈ T;  e^{−2τ} if i ∈ S, i ∈ T.
So the expectation is e^{−2τ|S|} if S = T and 0 if S ≠ T, hence
  K_h(τ) = Σ_s Σ_{S⊆[d]} ĥ_s(S)² e^{−2τ|S|},
a non-negative combination of log-convex functions of τ, and therefore log-convex.  ∎

Theorem
LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).
Proof: Say H is an LSH family for {0,1}^d with parameters (εd + o(d), cεd − o(d), q^ρ, q), i.e., radius r = εd + o(d) and far distance (c − o(1))·r. Define
  K_H(τ) ≝ E_{h~H}[K_h(τ)] = E_{h~H}[Pr_{x~_τ y}[h(x) = h(y)]] = E_{x~_τ y}[Pr_{h~H}[h(x) = h(y)]].
(A non-negative linear combination of log-convex functions, so K_H(τ) is also log-convex.)
By Lemma 1, w.v.h.p. dist(x, y) ≈ (1 − e^{−2τ})d/2 ≈ τd; so at τ = ε the pair is within the radius, and at τ = cε it is beyond the far distance. Since q ≥ 2^{−o(d)}, the failure probabilities are negligible, giving
  K_H(ε) ≳ q^ρ   and   K_H(cε) ≲ q.
Also K_H(0) = 1, so ln K_H(0) = 0. Since ln K_H(τ) is convex on [0, cε] and ε = (1 − 1/c)·0 + (1/c)·cε,
  ρ ln q ≤ ln K_H(ε) ≤ (1 − 1/c) ln K_H(0) + (1/c) ln K_H(cε) ≤ (1/c) ln q.
Dividing by ln q < 0 flips the inequality, so ρ ≥ 1/c (up to the o_d(1) terms).  ∎
[Figure: ln K_H(τ) against τ, with the points (0, 0), (ε, ρ ln q), (cε, ln q).]

The End. Any questions?
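As a backup check, here is a small Python sketch of the two facts behind Lemma 2 (the instance is an arbitrary assumption, not from the slides: d = 6, range {0, 1, 2, 3}, and the τ grid). It evaluates K_h(τ) through the Fourier formula Σ_s Σ_S ĥ_s(S)² e^{−2τ|S|}, compares it with a Monte Carlo estimate taken straight from the definition, and verifies discrete log-convexity of log K_h on a grid:

# Sanity check (arbitrary small instance) of Lemma 2:
# K_h(tau) = sum_s sum_{S subset of [d]} hhat_s(S)^2 * exp(-2*tau*|S|),
# where h_s(x) = 1[h(x) = s], and tau -> K_h(tau) is log-convex.
import itertools
import math
import random

d = 6
cube = list(itertools.product([0, 1], repeat=d))
h = {x: random.randrange(4) for x in cube}          # an arbitrary h : {0,1}^d -> {0,1,2,3}

def chi(S, x):
    """Fourier basis function chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def K_fourier(tau):
    """K_h(tau) via the Fourier expansion of the indicator functions h_s."""
    total = 0.0
    for s in set(h.values()):
        for r in range(d + 1):
            for S in itertools.combinations(range(d), r):
                coef = sum((h[x] == s) * chi(S, x) for x in cube) / len(cube)  # hhat_s(S)
                total += coef ** 2 * math.exp(-2 * tau * len(S))
    return total

def K_montecarlo(tau, trials=200_000):
    """K_h(tau) straight from the definition: flip each bit w.p. (1 - e^{-2 tau})/2."""
    p = (1 - math.exp(-2 * tau)) / 2
    hits = 0
    for _ in range(trials):
        x = random.choice(cube)
        y = tuple(b ^ (random.random() < p) for b in x)
        hits += (h[x] == h[y])
    return hits / trials

print(K_fourier(0.3), K_montecarlo(0.3))            # the two should agree to ~2 decimals

# Discrete log-convexity: on an equally spaced grid, log K at a midpoint is at
# most the average of its two neighbours.
taus = [0.1 * k for k in range(1, 11)]
logs = [math.log(K_fourier(t)) for t in taus]
assert all(logs[i] <= (logs[i - 1] + logs[i + 1]) / 2 + 1e-9
           for i in range(1, len(taus) - 1))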