CS 361A (Advanced Data Structures and Algorithms)
Lecture 19 (Dec 5, 2005)
Nearest Neighbors: Dimensionality Reduction and Locality-Sensitive Hashing
Rajeev Motwani

Metric Space
• Metric Space (M,D)
– For points p,q in M, D(p,q) is the distance from p to q
– The only reasonable model for high-dimensional geometric space
• Defining Properties
– Reflexive: D(p,q) = 0 if and only if p = q
– Symmetric: D(p,q) = D(q,p)
– Triangle Inequality: D(p,q) ≤ D(p,r) + D(r,q)
• Interesting Cases
– M – points in d-dimensional space
– D – Hamming or Euclidean (Lp) norms

High-Dimensional Near Neighbors
• Nearest Neighbors Data Structure
– Given – N points P = {p1, …, pN} in metric space (M,D)
– Queries – "Which point p ∈ P is closest to point q?"
– Complexity – trade off preprocessing space against query time
• Applications
– vector quantization
– multimedia databases
– data mining
– machine learning
– …

Known Results
  Query Time           Storage            Technique             Paper
  dN                   dN                 Brute Force           –
  2^d · log N          N^(2^(d+1))        Voronoi Diagram       Dobkin-Lipton 76
  d^(d/2) · log N      N^(d/2)            Random Sampling       Clarkson 88
  d^5 · log N          N^d                Combination           Meiser 93
  log^(d-1) N          N · log^(d-1) N    Parametric Search     Agarwal-Matousek 92
• Some expressions are approximate
• Bottom line – exponential dependence on d

Approximate Nearest Neighbor
• Exact Algorithms
– Benchmark – brute force needs space O(N), query time O(N)
– Known Results – exponential dependence on dimension
– Theory/Practice – no better than brute-force search
• Approximate Near-Neighbors
– Given – N points P = {p1, …, pN} in metric space (M,D)
– Given – error parameter ε > 0
– Goal – for query q and nearest neighbor p, return r such that D(q,r) ≤ (1+ε)·D(q,p)
• Justification
– Mapping objects to a metric space is heuristic anyway
– Get tremendous performance improvement

Results for Approximate NN
  Query Time             Storage             Technique                             Paper
  d^d · ε^(-d)           dN                  Balanced Trees                        Arya et al. 94
  d^2 · polylog(N,d)     N^(2d)              Random Projection                     Kleinberg 97
  N + dN·polylog(N,d)    dN·polylog(N,d)     Random Projection                     Kleinberg 97
  log^3 N                N^(1/ε^2)           Search Trees + Dimension Reduction    Indyk-Motwani 98
  dN^(1/(1+ε))           N^(1+1/(1+ε))       Locality-Sensitive Hashing            Indyk-Motwani 98
  External Memory        External Memory     Locality-Sensitive Hashing            Gionis-Indyk-Motwani 99
• Will show main ideas of the last 3 results
• Some expressions are approximate

Approximate r-Near Neighbors
• Given – N points P = {p1, …, pN} in metric space (M,D)
• Given – error parameter ε > 0, distance threshold r > 0
• Query
– If no point p with D(q,p) < r, return FAILURE
– Else, return any p' with D(q,p') < (1+ε)r
• Application – solving Approximate Nearest Neighbor
– Assume maximum distance is R
– Run in parallel for r = 1, (1+ε), (1+ε)^2, (1+ε)^3, …, R
– Time/space – O(log R) overhead
– [Indyk-Motwani] – reduce to O(polylog N) overhead

Hamming Metric
• Hamming Space
– Points in M: bit-vectors {0,1}^d (can generalize to {0,1,2,…,q}^d)
– Hamming Distance: D(p,q) = number of positions where p and q differ
• Remarks
– Simplest high-dimensional setting
– Still useful in practice
– In theory, as hard (or easy) as Euclidean space
– Trivial in low dimensions
• Example – hypercube in d = 3 dimensions
– {000, 001, 010, 011, 100, 101, 110, 111}
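As a concrete baseline for the Hamming metric, here is a minimal sketch in Python (the function names are illustrative, not from the slides) of the brute-force benchmark: store the points as-is and answer a query by a linear scan, i.e., O(dN) space and O(dN) query time.

```python
def hamming(p: str, q: str) -> int:
    """Hamming distance D(p, q): number of positions where the bit-vectors differ."""
    return sum(a != b for a, b in zip(p, q))

def brute_force_nn(points: list[str], q: str) -> str:
    """Exact nearest neighbor by linear scan - the benchmark the lecture improves on."""
    return min(points, key=lambda p: hamming(p, q))

# Example on the d = 3 hypercube from the slide.
P = ["000", "011", "101", "110"]
print(brute_force_nn(P, "111"))  # "011" (distance 1; ties broken by order in P)
```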
Dimensionality Reduction
• Overall Idea
– Map from high to low dimensions
– Preserve distances approximately
– Solve Nearest Neighbors in the new space
– Performance improvement at cost of approximation error
• Mapping?
– Hash function family H = {H1, …, Hm}
– Each Hi: {0,1}^d → {0,1}^t with t << d
– Pick HR from H uniformly at random
– Map each point in P using the same HR
– Solve the NN problem on HR(P) = {HR(p1), …, HR(pN)}

Reduction for Hamming Spaces
Theorem: For any r and small ε > 0, there is a hash family H such that for any p,q and random HR ∈ H
  D(p,q) ≤ r       ⟹  D(HR(p), HR(q)) ≤ (c + ε/20)·t
  D(p,q) ≥ (1+ε)r  ⟹  D(HR(p), HR(q)) ≥ (c + ε/10)·t
with probability > 1-δ, provided t ≥ C·log(2/δ)/ε^2 for some constant C.
[Figure: points within distance r map to images within distance (c + ε/20)·t, while points at distance at least (1+ε)r map to images at distance at least (c + ε/10)·t]

Remarks
• For fixed threshold r, can distinguish between
– Near: D(p,q) < r
– Far: D(p,q) > (1+ε)r
• For N points, need δ ≈ 1/N^2 (to union-bound over all pairs)
• Yet, can reduce to O(log N)-dimensional space while approximately preserving distances
• Works even if points are not known in advance

Hash Family
• Projection Function
– Let S be an ordered multiset of s indexes from {1,…,d}
– p|S: {0,1}^d → {0,1}^s projects p onto an s-dimensional subspace
– Example
• d = 5, p = 01100
• s = 3, S = {2,2,4} ⟹ p|S = 110
• Choosing hash function HR in H
– Repeat for i = 1,…,t
• Pick Si randomly (with replacement) from {1,…,d}
• Pick a random hash function fi: {0,1}^s → {0,1}
• hi(p) = fi(p|Si)
– HR(p) = (h1(p), h2(p), …, ht(p))
• Remark – note similarity to Bloom Filters

Illustration of Hashing
[Figure: a d-bit point p is projected to p|S1, …, p|St; each projection is fed through fi to produce the bits h1(p), …, ht(p) of HR(p)]

Analysis I
• Choose a random index-set S
• Claim: For any p,q
  Pr[p|S = q|S] = (1 - D(p,q)/d)^s
• Why?
– p,q differ in D(p,q) bit positions
– Need all s indexes of S to avoid these positions
– Sampling with replacement from {1,…,d}

Analysis II
• Choose s = d/r
• Since 1-x < e^(-x) for |x| < 1, we obtain
  Pr[p|S = q|S] = (1 - D(p,q)/d)^s ≈ e^(-D(p,q)/r)
• Thus
  D(p,q) ≤ r       ⟹  Pr[p|S = q|S] ≥ e^(-1)
  D(p,q) ≥ (1+ε)r  ⟹  Pr[p|S = q|S] ≤ e^(-1) - ε/3

Analysis III
• Recall hi(p) = fi(p|Si)
• Thus
  Pr[hi(p) ≠ hi(q)] = (1 - Pr[p|Si = q|Si])·(1/2) + Pr[p|Si = q|Si]·0 = (1 - Pr[p|Si = q|Si])/2
• Choosing c = (1/2)(1 - e^(-1))
  D(p,q) ≤ r       ⟹  Pr[hi(p) ≠ hi(q)] ≤ (1 - e^(-1))·(1/2) = c
  D(p,q) ≥ (1+ε)r  ⟹  Pr[hi(p) ≠ hi(q)] ≥ (1 - e^(-1) + ε/3)·(1/2) = c + ε/6

Analysis IV
• Recall HR(p) = (h1(p), h2(p), …, ht(p))
• D(HR(p), HR(q)) = number of i's where hi(p), hi(q) differ
• By linearity of expectation
  E[D(HR(p), HR(q))] = Σi Pr[hi(p) ≠ hi(q)] = t·Pr[hi(p) ≠ hi(q)]
• Theorem almost proved
  D(p,q) ≤ r       ⟹  E[D(HR(p), HR(q))] ≤ c·t
  D(p,q) ≥ (1+ε)r  ⟹  E[D(HR(p), HR(q))] ≥ (c + ε/6)·t
• For a high-probability bound, need the Chernoff Bound

Chernoff Bound
• Consider Bernoulli random variables X1, X2, …, Xn
– Values are 0-1
– Pr[Xi=1] = x and Pr[Xi=0] = 1-x
• Define X = X1+X2+…+Xn with E[X] = nx
• Theorem: For independent X1,…,Xn, for any 0 < β < 1,
  Pr[ |X - nx| ≥ β·nx ] ≤ 2e^(-β^2·nx/3)

Analysis V
• Define
– Xi = 0 if hi(p) = hi(q), and 1 otherwise
– n = t
– Then X = X1+X2+…+Xt = D(HR(p), HR(q))
• Case 1 [D(p,q) ≤ r ⟹ x ≤ c]
  Pr[X ≥ (c + ε/20)·t] ≤ Pr[ |X - tx| ≥ ε·t·c/20 ] ≤ 2e^(-(ε/20)^2·tc/3)
• Case 2 [D(p,q) ≥ (1+ε)r ⟹ x ≥ c + ε/6]
  Pr[X ≤ (c + ε/10)·t] ≤ Pr[ |X - tx| ≥ ε·t·c/20 ] ≤ 2e^(-(ε/20)^2·tc/3)
• Observe – sloppy bounding of constants in Case 2

Putting it all together
• Recall t ≥ C·log(2/δ)/ε^2
• Thus, error probability ≤ 2e^(-(ε/20)^2·tc/3) ≤ 2e^(-(cC/1200)·log(2/δ))
• Choosing C = 1200/c: 2e^(-(cC/1200)·log(2/δ)) = 2e^(-log(2/δ)) = δ
• Theorem is proved!!
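As a concrete illustration of the construction just analyzed, here is a minimal Python sketch of the hash family HR from the "Hash Family" slide (the function names and the parameters in the usage lines are illustrative). The one liberty taken is that each random function fi: {0,1}^s → {0,1} is realized lazily, assigning a fresh random bit the first time a projected pattern p|Si is seen; this behaves like a truly random fi on the patterns that actually occur.

```python
import random

def make_hr(d: int, r: float, t: int, seed: int = 0):
    """Sample one hash function H_R: {0,1}^d -> {0,1}^t.

    For each output coordinate i, pick S_i, a multiset of s = d/r indexes drawn
    with replacement from {0,...,d-1}, and a random function f_i; then
    h_i(p) = f_i(p|S_i) and H_R(p) = (h_1(p), ..., h_t(p)).
    """
    rng = random.Random(seed)
    s = max(1, round(d / r))
    index_sets = [[rng.randrange(d) for _ in range(s)] for _ in range(t)]
    tables = [{} for _ in range(t)]  # lazily realized random functions f_i

    def hr(p):  # p is a tuple of d bits
        out = []
        for i in range(t):
            pattern = tuple(p[j] for j in index_sets[i])   # p | S_i
            if pattern not in tables[i]:
                tables[i][pattern] = rng.randrange(2)       # f_i(p|S_i)
            out.append(tables[i][pattern])
        return tuple(out)

    return hr

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# Usage: a near pair should map to nearby t-bit images, a far pair to images
# differing in roughly t/2 coordinates.
d, r, t = 200, 10, 128
hr = make_hr(d, r, t)
p = tuple(random.Random(1).randrange(2) for _ in range(d))
q_near = tuple(b ^ 1 if j == 0 else b for j, b in enumerate(p))   # D(p, q_near) = 1
q_far = tuple(b ^ 1 for b in p)                                   # D(p, q_far) = d
print(hamming(hr(p), hr(q_near)), hamming(hr(p), hr(q_far)))
```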
Algorithm I
• Set error probability δ = 1/poly(N) ⟹ t = O(ε^(-2)·log N)
• Select hash HR and map points p → HR(p)
• Processing query q
– Compute HR(q)
– Find the nearest neighbor HR(p) of HR(q)
– If D(p,q) ≤ (1+ε)r then return p, else FAILURE
• Remarks
– Brute force for finding HR(p) implies query time O(ε^(-2)·N·log N)
– Need another approach for lower dimensions

Algorithm II
• Fact – exact nearest neighbors in {0,1}^t requires
– Space O(2^t)
– Query time O(t)
• How?
– Precompute/store answers to all queries
– Number of possible queries is 2^t
• Since t = O(ε^(-2)·log N) …
• Theorem – In Hamming space {0,1}^d, can solve approximate nearest neighbor with:
– Space N^(O(1/ε^2))
– Query time O(ε^(-2)·log N)

Different Metric
• Many applications have "sparse" points
– Many dimensions but few 1's
– Example – points ↔ documents, dimensions ↔ words
– Better to view as "sets"
• Previous approach would require large s
• For sets A,B, define sim(A,B) = |A ∩ B| / |A ∪ B|
• Observe
– A = B ⟹ sim(A,B) = 1
– A,B disjoint ⟹ sim(A,B) = 0
• Question – handling D(A,B) = 1 - sim(A,B)?

Min-Hash
• Random permutations π1,…,πt of the universe (dimensions)
• Define mapping hj(A) = min over a ∈ A of πj(a)
• Fact: Pr[hj(A) = hj(B)] = sim(A,B)
• Proof? – already seen!!
• Overall hash-function HR(A) = (h1(A), h2(A), …, ht(A))

Min-Hash Analysis
• Select t ≥ C·log(1/δ)/ε^2
• Hamming Distance – D(HR(A), HR(B)) = number of j's such that hj(A) ≠ hj(B)
• Theorem: For any A,B,
  Pr[ |D(HR(A), HR(B)) - (1 - sim(A,B))·t| ≥ ε·t ] ≤ δ
• Proof? – exercise (apply the Chernoff Bound)
• Obtain – ANN algorithm similar to the earlier result

Generalization
• Goal
– abstract the technique used for Hamming space
– enable application to other metric spaces
– handle Dynamic ANN
• Dynamic Approximate r-Near Neighbors
– Fix – threshold r
– Query – if any point is within distance r of q, return any point within distance (1+ε)r
– Allow insertions/deletions of points in P
• Recall – the earlier method required preprocessing all possible queries in the hash-range space…

Locality-Sensitive Hashing
• Fix – metric space (M,D), threshold r, error ε > 0
• Choose – probability parameters Q1 > Q2 > 0
• Definition – Hash family H = {h: M → S} for (M,D) is called (r, ε, Q1, Q2)-sensitive if, for random h and for any p,q in M
  D(p,q) ≤ r       ⟹  Pr[h(q) = h(p)] ≥ Q1
  D(p,q) ≥ (1+ε)r  ⟹  Pr[h(q) = h(p)] ≤ Q2
• Intuition
– p,q are near ⟹ likely to collide
– p,q are far ⟹ unlikely to collide

Examples
• Hamming Space M = {0,1}^d
– point p = b1…bd
– H = {hi(b1…bd) = bi, for i = 1…d} – sampling one bit at random
– Pr[hi(q) = hi(p)] = 1 - D(p,q)/d
• Set Similarity D(A,B) = 1 - sim(A,B)
– Recall sim(A,B) = |A ∩ B| / |A ∪ B|
– H = { hπ : hπ(A) = min over a ∈ A of π(a) }
– Pr[h(A) = h(B)] = 1 - D(A,B)

Multi-Index Hashing
• Overall Idea
– Fix LSH family H
– Boost the Q1, Q2 gap by defining G = H^k
– Using G, each point hashes into l buckets
• Intuition
– r-near neighbors likely to collide
– few non-near pairs in any bucket
• Define
– G = { g | g(p) = h1(p)h2(p)…hk(p) }
– Hamming metric ⟹ sample k random bits

Example (l=4)
[Figure: a point p and a query q within distance r are hashed by g1, g2, g3, g4, each the concatenation of h1,…,hk]

Overall Scheme
• Preprocessing
– Prepare hash tables for the range of G
– Select l hash functions g1, g2, …, gl
• Insert(p) – add p to buckets g1(p), g2(p), …, gl(p)
• Delete(p) – remove p from buckets g1(p), g2(p), …, gl(p)
• Query(q)
– Check buckets g1(q), g2(q), …, gl(q)
– Report the nearest of (say) the first 3l points
• Complexity
– Assume – computing D(p,q) needs O(d) time
– Assume – storing p needs O(d) space
– Insert/Delete/Query time – O(dlk)
– Preprocessing/Storage – O(dN + Nlk)
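Here is a minimal Python sketch of this scheme, instantiated with the bit-sampling LSH family for the Hamming metric described above (the class and method names are illustrative; points are tuples of d bits). Choosing k and l as in the following slides makes an r-near neighbor likely to land in at least one of the l buckets examined, while keeping the number of far points encountered small.

```python
import random

class HammingLSH:
    """Multi-index LSH over {0,1}^d with bit-sampling hash functions.

    Each g_j concatenates k randomly chosen coordinates (G = H^k); a point is
    stored in the l buckets g_1(p), ..., g_l(p), and a query inspects those
    same l buckets, reporting the nearest of the first 3l candidates seen.
    """

    def __init__(self, d: int, k: int, l: int, seed: int = 0):
        rng = random.Random(seed)
        self.coords = [[rng.randrange(d) for _ in range(k)] for _ in range(l)]
        self.tables = [{} for _ in range(l)]

    def _keys(self, p):
        return [tuple(p[i] for i in coords) for coords in self.coords]

    def insert(self, p):
        for table, key in zip(self.tables, self._keys(p)):
            table.setdefault(key, set()).add(p)

    def delete(self, p):
        for table, key in zip(self.tables, self._keys(p)):
            table.get(key, set()).discard(p)

    def query(self, q):
        budget = 3 * len(self.tables)          # examine at most 3l candidates
        best, best_dist = None, None
        for table, key in zip(self.tables, self._keys(q)):
            for p in table.get(key, ()):
                dist = sum(a != b for a, b in zip(p, q))
                if best_dist is None or dist < best_dist:
                    best, best_dist = p, dist
                budget -= 1
                if budget == 0:
                    return best
        return best

# Usage sketch: index = HammingLSH(d=128, k=16, l=20); index.insert(p); index.query(q)
```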
Collision Probability vs. Distance
[Figure: collision probability as a function of distance D(p,q) – for a single hash h, Pcoll = Q drops from Q1 at distance r to Q2 at distance (1+ε)r; for the (k,l) construction, Pcoll = 1 - (1 - Q^k)^l, which sharpens the transition around r]

Multi-Index versus Error
• Set l = N^z where z = log(1/Q1) / log(1/Q2)
• Theorem: For l = N^z, any query returns an r-near neighbor correctly with probability at least 1/6.
• Consequently (ignoring k = O(log N) factors)
– Time O(dN^z)
– Space O(N^(1+z))
– Hamming metric ⟹ z ≤ 1/(1+ε)
– Boost probability – use several parallel hash tables

Analysis
• Define (for fixed query q)
– p* – any point with D(q,p*) ≤ r
– FAR(q) – all p with D(q,p) > (1+ε)r
– BUCKET(q,j) – all p with gj(p) = gj(q)
– Event Esize: Σ_{j=1..l} |FAR(q) ∩ BUCKET(q,j)| ≤ 3l (query cost bounded by O(dl))
– Event ENN: gj(p*) = gj(q) for some j (nearest point in the l buckets is an r-near neighbor)
• Analysis
– Show: Pr[Esize] = x > 2/3 and Pr[ENN] = y > 1/2
– Thus: Pr[not(Esize & ENN)] < (1-x) + (1-y) < 5/6

Analysis – Bad Collisions
• Choose k = log_{1/Q2} N
• Fact: p ∈ FAR(q) ⟹ Pr[p ∈ BUCKET(q,j)] ≤ Q2^k = 1/N
• Clearly
  E[ |FAR(q) ∩ BUCKET(q,j)| ] ≤ N·(1/N) = 1
  E[ Σ_{j=1..l} |FAR(q) ∩ BUCKET(q,j)| ] ≤ l
• Markov Inequality – Pr[X > r·E[X]] < 1/r, for X > 0
• Lemma 1:
  Pr[not Esize] = Pr[ Σ_{j=1..l} |FAR(q) ∩ BUCKET(q,j)| > 3l ] ≤ 1/3

Analysis – Good Collisions
• Observe
  Pr[gj(p*) = gj(q)] ≥ Q1^k = Q1^(log_{1/Q2} N) = N^(-log(1/Q1)/log(1/Q2)) = N^(-z)
• Since l = N^z
  Pr[ENN] ≥ 1 - (1 - Pr[gj(p*) = gj(q)])^l ≥ 1 - (1 - N^(-z))^(N^z) ≥ 1 - 1/e
• Lemma 2: Pr[ENN] > 1/2

Euclidean Norms
• Recall
– x = (x1, x2, …, xd) and y = (y1, y2, …, yd) in R^d
– L1-norm: ||x - y||_1 = Σ_{i=1..d} |xi - yi|
– Lp-norm (for p > 1): ||x - y||_p = ( Σ_{i=1..d} |xi - yi|^p )^(1/p)

Extension to L1-Norm
• Round coordinates to {1,…,M}
• Embed L1-{1,…,M}^d into Hamming-{0,1}^(dM)
• Unary Mapping
  (x1, …, xd) → ( 1…1 0…0 | … | 1…1 0…0 ), where block i consists of xi ones followed by M - xi zeros
  Then the Hamming distance between the images of x and y is exactly ||x - y||_1
• Apply the algorithm for Hamming spaces
– Error due to rounding is 1/M ⟹ take M = Ω(1/ε)
– Space/time overhead due to the mapping d → dM

Extension to L2-Norm
• Observe
– Little difference between L1-norm and L2-norm for high d
– Additional error is small
• More generally – Lp, for 1 ≤ p ≤ 2
– [Figiel et al 1977, Johnson-Schechtman 1982]
– Can embed Lp into L1
– Dimensions d → O(d)
– Distances preserved within factor (1+α)
– Key idea – random rotation of the space

Improved Bounds
• [Indyk-Motwani 1998]
– For any Lp-norm
– Query time – O(log^3 N)
– Space – N^(O(1/ε^2))
• Problem – impractical
• Today – only a high-level sketch

Better Reduction
• Recall
– Reduced Approximate Nearest Neighbors to Approximate r-Near Neighbors
– Space/time overhead – O(log R)
– R = max distance in the metric space
• Ring-Cover Trees
– Removed dependence on R
– Reduced overhead to O(polylog N)

Approximate r-Near Neighbors
• Idea
– Impose a regular grid on R^d
– Decompose into cubes of side length s
– Label cubes with the points at distance < r
• Data Structure
– Query q – determine the cube containing q
– Cube labels – candidate r-near neighbors
• Goals
– Small s ⟹ lower error
– Fewer cubes ⟹ smaller storage
[Figure: grid cells within distance r of the points p1, p2, p3 are labeled with those points]

Grid Analysis
• Assume r = 1
• Choose s = ε/√d
• Cube diameter = s·√d = ε
• Number of cubes ≈ Vol_d(√d/ε) = ε^(-O(d))
• Theorem – For any Lp-norm, can solve Approximate r-Near Neighbor using
– Space – O(dN·ε^(-d))
– Time – O(d)

Dimensionality Reduction
• [Johnson-Lindenstrauss 84, Frankl-Maehara 88]
  For p ∈ [1,2], can map the points in P into a subspace of dimension O(ε^(-2)·log N) while preserving all inter-point distances to within a factor 1+ε
• Proof idea – project onto random lines
• Result for NN
– Space – O(dN^(1/ε^2))
– Time – O(polylog N)
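To make the projection step concrete, here is a minimal sketch in Python/NumPy of the random-projection idea behind the Johnson-Lindenstrauss lemma; the Gaussian projection matrix, the 1/√t scaling, and the constant 8 in the target dimension are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def jl_project(points: np.ndarray, eps: float, seed: int = 0) -> np.ndarray:
    """Project N points in R^d onto t = O(eps^-2 log N) random directions.

    Each column of R is a scaled random Gaussian direction; with high
    probability all pairwise distances are preserved within a 1 +/- eps factor.
    """
    rng = np.random.default_rng(seed)
    n, d = points.shape
    t = int(np.ceil(8 * np.log(n) / eps ** 2))   # target dimension (constant 8 is illustrative)
    R = rng.normal(size=(d, t)) / np.sqrt(t)     # random projection matrix
    return points @ R

# Usage: pairwise distances before and after projection agree up to ~(1 + eps).
pts = np.random.default_rng(1).normal(size=(100, 10000))
proj = jl_project(pts, eps=0.25)
print(np.linalg.norm(pts[0] - pts[1]), np.linalg.norm(proj[0] - proj[1]))
```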
References
• P. Indyk and R. Motwani. Approximate Nearest Neighbors: Towards Removing the Curse of Dimensionality. STOC 1998.
• A. Gionis, P. Indyk, and R. Motwani. Similarity Search in High Dimensions via Hashing. VLDB 1999.