Optimal Lower Bounds for Locality Sensitive Hashing (except when q is tiny)
Ryan O'Donnell (CMU), Yi Wu (CMU, IBM), Yuan Zhou (CMU)

Locality Sensitive Hashing [Indyk–Motwani '98]
h : objects → sketches.
H : a family of hash functions h such that
  "similar" objects collide with high probability,
  "dissimilar" objects collide with low probability.

Abbreviated history
• Min-wise hash functions [Broder '98]. For sets A, B, the Jaccard similarity is |A ∩ B| / |A ∪ B|. Broder invented a simple family H such that Pr_{h~H}[h(A) = h(B)] = |A ∩ B| / |A ∪ B|.
• Indyk–Motwani '98. Defined LSH. Invented a very simple H that works for {0,1}^d under Hamming distance. Showed that a good LSH family implies good nearest-neighbor-search data structures.
• Charikar '02 (STOC). Proposed an alternate family ("simhash") for Jaccard similarity. Patented by Google.

Many papers about LSH
Practice: sequence comparison in bioinformatics, association-rule finding in data mining, collaborative filtering, clustering nouns by meaning in NLP, pose estimation in vision, …; free code base [AI '04].
Theory: [Broder '97], [Indyk–Motwani '98], [Gionis–Indyk–Motwani '98], [Charikar '02], [Datar–Immorlica–Indyk–Mirrokni '04], [Motwani–Naor–Panigrahy '06], [Andoni–Indyk '06], [Terasawa–Tanaka '07], [Andoni–Indyk '08, CACM], [Neylon '10], …

Definition (LSH family)
Given: a distance space (X, dist), a "radius" r > 0, and an "approximation factor" c > 1.
Goal: a family H of functions X → S (S can be any finite set) such that for all x, y ∈ X,
  dist(x, y) ≤ r   ⟹  Pr_{h~H}[h(x) = h(y)] ≥ q^ρ,
  dist(x, y) ≥ cr  ⟹  Pr_{h~H}[h(x) = h(y)] ≤ q.
(Smaller ρ is better.)

Theorem [IM'98, GIM'98]
Given an LSH family for (X, dist), one can solve "(r, cr)-near-neighbor search" for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash-function evaluations.

Example
X = {0,1}^d, dist = Hamming distance, r = εd, c = 5, so the distances of interest are ≤ εd or ≥ 5εd.
[IM'98]  H = {h_1, h_2, …, h_d} with h_i(x) = x_i: "output a random coordinate."

Analysis
  dist(x, y) ≤ εd   ⟹  Pr_{h~H}[h(x) = h(y)] ≥ 1 − ε   = q^ρ,
  dist(x, y) ≥ 5εd  ⟹  Pr_{h~H}[h(x) = h(y)] ≤ 1 − 5ε  = q.
Since (1 − 5ε)^{1/5} ≤ 1 − ε, we get ρ ≤ 1/5.
In general, this family achieves ρ ≤ 1/c, for every c (and every r).
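As a quick illustration, here is a minimal Python sketch of the bit-sampling analysis above (the concrete values d = 1000, ε = 0.02, c = 5 and the trial count are illustrative assumptions, not from the slides). Drawing h ~ H is just drawing a random coordinate, so the two collision probabilities can be estimated by Monte Carlo and ρ read off as ln(p_close)/ln(p_far):

# Sketch (illustrative parameters) of the [IM'98] bit-sampling LSH family for
# {0,1}^d under Hamming distance: H = {h_1, ..., h_d} with h_i(x) = x_i, so
# drawing h ~ H means drawing a uniformly random coordinate, and
# Pr_{h~H}[h(x) = h(y)] = 1 - dist(x, y)/d.
import math
import random

d, eps, c = 1000, 0.02, 5        # assumed values: radius r = eps*d, "far" = c*eps*d

def flip(x, k):
    """Return a copy of x with k distinct, randomly chosen coordinates flipped."""
    y = list(x)
    for i in random.sample(range(d), k):
        y[i] ^= 1
    return y

def collision_prob(x, y, trials=200_000):
    """Monte Carlo estimate of Pr_{h~H}[h(x) = h(y)] for the bit-sampling family."""
    hits = 0
    for _ in range(trials):
        i = random.randrange(d)  # h ~ H  <=>  a uniformly random coordinate i
        hits += (x[i] == y[i])
    return hits / trials

x = [random.randint(0, 1) for _ in range(d)]
p_close = collision_prob(x, flip(x, int(eps * d)))      # dist = eps*d   -> about 1 - eps
p_far   = collision_prob(x, flip(x, int(c * eps * d)))  # dist = 5*eps*d -> about 1 - 5*eps

rho = math.log(p_close) / math.log(p_far)               # p_close = q^rho with q = p_far
print(p_close, p_far, rho)                              # rho comes out near 0.19 <= 1/c = 0.2

(Concatenating several independent coordinates per hash raises both collision probabilities to a power, which changes q but not ρ; that is why ρ is the exponent that governs the [IM'98, GIM'98] data structure.)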
An "optimal" upper bound?
Take ({0,1}^d, Hamming), any r > 0 and c > 1. Let S ≝ {0,1}^d ∪ {✔} and
  H ≝ { h_ab : dist(a, b) ≤ r },   where h_ab(x) = ✔ if x ∈ {a, b}, and h_ab(x) = x otherwise.
Then
  dist(x, y) ≤ r   ⟹  Pr_{h~H}[h(x) = h(y)] > 0 (positive),
  dist(x, y) ≥ cr  ⟹  Pr_{h~H}[h(x) = h(y)] = 0,
so formally ρ can be made arbitrarily small.

The End. Any questions?

Wait, what?
The catch: for a close pair the collision probability is only 1/|H| ≈ 2^{−Θ(d)}, which is useless. The reduction really needs q to be "not tiny":

Theorem [IM'98, GIM'98]
Given an LSH family for (X, dist) with q ≥ n^{−o(1)} ("not tiny"), one can solve "(r, cr)-near-neighbor search" for n points with a data structure of size O(n^{1+ρ}) and query time Õ(n^ρ) hash-function evaluations.

More results
• For R^d with ℓ_p distance: ρ ≤ 1/c^p, for p = 1 [IM'98], 0 < p < 1 [DIIM'04], and p = 2 [AI'06].
• For Jaccard similarity: ρ ≤ 1/c [Bro'98].
• Lower bound for {0,1}^d with Hamming distance [MNP'06]: ρ ≥ .462/c − o_d(1) (assuming q ≥ 2^{−o(d)}); this immediately gives ρ ≥ .462/c^p for ℓ_p distance.

Our Theorem
For {0,1}^d with Hamming distance (for some r): any LSH family has ρ ≥ 1/c − o_d(1) (assuming q ≥ 2^{−o(d)}); this immediately gives ρ ≥ 1/c^p for ℓ_p distance.
The proof also yields ρ ≥ 1/c for Jaccard similarity.

Proof
In one sentence: noise stability is log-convex. In more detail: a definition and two lemmas.

Definition (noise stability)
Fix any function h : {0,1}^d → S. Pick x ∈ {0,1}^d uniformly at random, and form y by flipping each bit of x independently with probability (1 − e^{−2τ})/2; write x ~_τ y. Define
  K_h(τ) ≝ Pr_{x ~_τ y}[h(x) = h(y)].

Lemma 1
For x ~_τ y, dist(x, y) = (1 − e^{−2τ})d/2 ± o(d) with very high probability; this is ≈ τd when τ ≪ 1.
Proof: Chernoff bound and Taylor expansion.

Lemma 2
K_h(τ) is a log-convex function of τ (for any h).
[Figure: plot of log K_h(τ) against τ ≥ 0.]
The proof uses Fourier analysis of Boolean functions.

Fourier transform
• Theorem. Every f : {0,1}^d → R can be written uniquely as
    f(x) = Σ_{S ⊆ [d]} f̂(S) χ_S(x)        (Fourier coefficients f̂(S), basis functions χ_S),
  where χ_S(x) = Π_{i∈S} (−1)^{x_i} = (−1)^{Σ_{i∈S} x_i}.
• Proof. {χ_S}_{S ⊆ [d]} is an orthonormal basis of {f : {0,1}^d → R}.

Proof of Lemma 2
Let h_s(x) = 1[h(x) = s]. Then
  K_h(τ) = Pr_{x~_τ y}[h(x) = h(y)]
         = Σ_s Pr_{x~_τ y}[h(x) = h(y) = s]
         = Σ_s E_{x~_τ y}[h_s(x) h_s(y)]
         = Σ_s E_{x~_τ y}[ (Σ_{S⊆[d]} ĥ_s(S) χ_S(x)) (Σ_{T⊆[d]} ĥ_s(T) χ_T(y)) ]
         = Σ_s Σ_{S,T⊆[d]} ĥ_s(S) ĥ_s(T) E_{x~_τ y}[χ_S(x) χ_T(y)].
For the inner expectation, independence across coordinates gives
  E_{x~_τ y}[χ_S(x) χ_T(y)] = Π_{i∈[d]} E_{x_i ~_τ y_i}[(−1)^{1[i∈S] x_i + 1[i∈T] y_i}],
and each factor equals 1 if i ∉ S, i ∉ T;  0 if i ∈ S, i ∉ T;  0 if i ∉ S, i ∈ T;  e^{−2τ} if i ∈ S, i ∈ T.
So the expectation is e^{−2τ|S|} if S = T and 0 if S ≠ T, hence
  K_h(τ) = Σ_s Σ_{S⊆[d]} ĥ_s(S)² e^{−2τ|S|},
a non-negative combination of log-convex functions of τ, and therefore log-convex.  ∎

Theorem
LSH for {0,1}^d requires ρ ≥ 1/c − o_d(1).
Proof: Say H is an LSH family for {0,1}^d with parameters (εd + o(d), cεd − o(d), q^ρ, q), i.e., radius r = εd + o(d) and far distance (c − o(1))·r. Define
  K_H(τ) ≝ E_{h~H}[K_h(τ)] = E_{h~H}[Pr_{x~_τ y}[h(x) = h(y)]] = E_{x~_τ y}[Pr_{h~H}[h(x) = h(y)]].
(A non-negative linear combination of log-convex functions, so K_H(τ) is also log-convex.)
By Lemma 1, w.v.h.p. dist(x, y) ≈ (1 − e^{−2τ})d/2 ≈ τd; so at τ = ε the pair is within the radius, and at τ = cε it is beyond the far distance. Since q ≥ 2^{−o(d)}, the failure probabilities are negligible, giving
  K_H(ε) ≳ q^ρ   and   K_H(cε) ≲ q.
Also K_H(0) = 1, so ln K_H(0) = 0. Since ln K_H(τ) is convex on [0, cε] and ε = (1 − 1/c)·0 + (1/c)·cε,
  ρ ln q ≤ ln K_H(ε) ≤ (1 − 1/c) ln K_H(0) + (1/c) ln K_H(cε) ≤ (1/c) ln q.
Dividing by ln q < 0 flips the inequality, so ρ ≥ 1/c (up to the o_d(1) terms).  ∎
[Figure: ln K_H(τ) against τ, with the points (0, 0), (ε, ρ ln q), (cε, ln q).]

The End. Any questions?
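As a backup check, here is a small Python sketch of the two facts behind Lemma 2 (the instance is an arbitrary assumption, not from the slides: d = 6, range {0, 1, 2, 3}, and the τ grid). It evaluates K_h(τ) through the Fourier formula Σ_s Σ_S ĥ_s(S)² e^{−2τ|S|}, compares it with a Monte Carlo estimate taken straight from the definition, and verifies discrete log-convexity of log K_h on a grid:

# Sanity check (arbitrary small instance) of Lemma 2:
# K_h(tau) = sum_s sum_{S subset of [d]} hhat_s(S)^2 * exp(-2*tau*|S|),
# where h_s(x) = 1[h(x) = s], and tau -> K_h(tau) is log-convex.
import itertools
import math
import random

d = 6
cube = list(itertools.product([0, 1], repeat=d))
h = {x: random.randrange(4) for x in cube}          # an arbitrary h : {0,1}^d -> {0,1,2,3}

def chi(S, x):
    """Fourier basis function chi_S(x) = (-1)^(sum of x_i over i in S)."""
    return -1 if sum(x[i] for i in S) % 2 else 1

def K_fourier(tau):
    """K_h(tau) via the Fourier expansion of the indicator functions h_s."""
    total = 0.0
    for s in set(h.values()):
        for r in range(d + 1):
            for S in itertools.combinations(range(d), r):
                coef = sum((h[x] == s) * chi(S, x) for x in cube) / len(cube)  # hhat_s(S)
                total += coef ** 2 * math.exp(-2 * tau * len(S))
    return total

def K_montecarlo(tau, trials=200_000):
    """K_h(tau) straight from the definition: flip each bit w.p. (1 - e^{-2 tau})/2."""
    p = (1 - math.exp(-2 * tau)) / 2
    hits = 0
    for _ in range(trials):
        x = random.choice(cube)
        y = tuple(b ^ (random.random() < p) for b in x)
        hits += (h[x] == h[y])
    return hits / trials

print(K_fourier(0.3), K_montecarlo(0.3))            # the two should agree to ~2 decimals

# Discrete log-convexity: on an equally spaced grid, log K at a midpoint is at
# most the average of its two neighbours.
taus = [0.1 * k for k in range(1, 11)]
logs = [math.log(K_fourier(t)) for t in taus]
assert all(logs[i] <= (logs[i - 1] + logs[i + 1]) / 2 + 1e-9
           for i in range(1, len(taus) - 1))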