Random Grids
Presented by Yonatan Glassner
A work of: Dror Aiger, Efi Kokiopoulou & Ehud Rivlin

What’s for today?
• Problem & Motivation
• Previous work
• Current algorithm
• Results
• Discussion

The NN problem
Given a set P of points in $\mathbb{R}^d$:
• Nearest neighbor
– For any query q, returns a point $p \in P$ minimizing $\|p - q\|$
[Figure: points p and a query q]

Motivation 1 – Image similarity
[Figure: images mapped by a description to points p and a query q]

Motivation 2 – Suggestion algorithms
• $x \in \mathbb{R}^d$
• Large number of dimensions d & examples N

KNN performance key domains

Short history lesson: KNN computation

Exact algorithms
• 1960’s: K nearest neighbors classification
• 1975, Bentley: K-D trees search
What is the complexity?
In practice:
• kd-trees work “well” in “low-medium” dimensions
• Near-linear query time for high dimensions
What can we do? (see next slide)

Problem formulation
[Figure: query q with radii r and cr]

Approximate algorithms
• 1998, Indyk & Motwani: Towards Removing the Curse of Dimensionality – LSH
• 1998, Arya et al.: ANN – BBD trees
• 2006, David Nister and Henrik Stewenius: Scalable Recognition with a Vocabulary Tree – K-means
• 2009, Marius Muja, David G. Lowe: FLANN

Complexity summary (partial)
• Exhaustive search
– Pre processing time: $O(d \cdot n)$
– Pre processing space: $O(d \cdot n)$
– Query time: $O(d \cdot n)$
• LSH
– Pre processing time: $O(d \cdot n + n^{1 + \frac{1}{\varepsilon}})$
– Pre processing space: $O(d \cdot n + n^{1 + \frac{1}{\varepsilon}})$
– Query time: $O(d \cdot n^{\frac{1}{\varepsilon}})$
• Vocabulary tree
– Pre processing time: $O(d \cdot K^L)$
– Pre processing space: $O(d \cdot K^L)$
– Query time: $O(K \cdot L \cdot d)$
• ANN
– Pre processing time: $O(n \cdot \log(n))$
– Pre processing space: $O(n)$
– Query time: $O(\log(n) \cdot d \cdot \frac{1}{\varepsilon})$
And back to present…

Motivation revisited
• We want to avoid exponential dependency on dimension
• On query, we want to avoid dependency on dataset size
• Our solution:
– Random Grids

Theorem (the only one in the PPT)
• If p and q are two points at distance at most 1 in d-dimensional Euclidean space, and we impose a randomly rotated and shifted grid of cell size w on this space, then the probability of capturing both p and q in the same cell is at least $e^{-d/w}$ for sufficiently large w.
Intuition – see next slide
[Figure: points p and q under a randomly rotated and shifted grid]

Basic algorithm structure
Pre processing
• Set w
• Create m copies of the points P, randomly rotated and shifted
• Index the points using hash tables
On query(q):
• Rotate q m times and search the hash tables. From all points found, check K points at random and return the nearest neighbor.

Performance
• Pre processing space: $O(d \cdot m \cdot n)$
• Pre processing time: $O(d^2 \cdot m \cdot n)$
• Query time: $O(d^2 \cdot m)$

Practical algorithm
• For specific dataset – set desired $\delta$
• Build data structure: learn w, m, k to build data structure
• Upon query: Map-Reduce method

Experimental settings
• Data: 1M SURF descriptors (dim=64), extracted from 4000 images.
• Fair comparison – auto tuning to get best results, set fixed target precision for all algorithms
• Metrics
– Runtime is computed over multiple queries
– Accuracy = See next slide

Accuracy metric
• RRS: Accuracy = $\frac{2}{5}$
• NN: Accuracy = $\frac{1}{1}$
[Figure: points p around a query q; the RRS panel shows radius R]

Results - runtime (RRS)
Index dataset = Query set. Precision = 0.98

Results - accuracy (RRS)
Index dataset = Query set. Precision = 0.98, Radius = 0.08

Results - runtime (NN)
r = 0.3 is fixed to approximate 1 NN. Probability of report success = 0.9.

Results - accuracy (NN)
r = 0.3 is fixed to approximate 1 NN. Probability of report success = 0.9.

Discussion
• Pros:
– Very good runtime results, without harming accuracy
– Easy to parallelize
– Fits the data well
• Cons
– Graph explanations are missing
– c dependency is missing on the
– Cons…

Questions?

Thank you for listening

Backup
• Locality-Sensitive Hashing Scheme Based on p-Stable Distributions
• Tree methods in high-dimensions
• Performance: pre processing space, pre processing time, query time
• People
– Andoni – Microsoft
– Indyk – MIT
– Nister – Microsoft
– Motwani – Google related
• Google images database size
• Table comparison
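The "Basic algorithm structure" slide can be illustrated with a minimal sketch. It is not the authors' implementation: the class name `RandomGrids`, the helper `random_rotation`, the QR-based way of drawing a random rotation, and the use of integer cell coordinates as hash keys are all assumptions made for illustration. The parameters `w`, `m` and `k` follow the slide (cell size, number of grids, number of candidates checked).

```python
import numpy as np
from collections import defaultdict


def random_rotation(d, rng):
    """Draw a random d x d orthogonal matrix (assumed construction: QR of a Gaussian matrix)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    return q * np.sign(np.diag(r))  # flip column signs so the draw is not biased by QR


class RandomGrids:
    """Hypothetical sketch: m randomly rotated and shifted grids, each indexed by a hash table."""

    def __init__(self, points, w, m, seed=0):
        rng = np.random.default_rng(seed)
        self.points = np.asarray(points, dtype=float)
        self.w = w
        d = self.points.shape[1]
        self.grids = []
        for _ in range(m):
            rot = random_rotation(d, rng)
            shift = rng.uniform(0.0, w, size=d)
            # Rotate and shift every point, then map it to its integer grid cell.
            cells = np.floor((self.points @ rot.T + shift) / w).astype(int)
            table = defaultdict(list)
            for idx, cell in enumerate(cells):
                table[tuple(cell)].append(idx)
            self.grids.append((rot, shift, table))

    def query(self, q, k, seed=0):
        """Return the index of the nearest of at most k random candidates, or None."""
        rng = np.random.default_rng(seed)
        q = np.asarray(q, dtype=float)
        candidates = set()
        for rot, shift, table in self.grids:
            # Apply the same rotation and shift to q and look up its cell.
            cell = tuple(np.floor((rot @ q + shift) / self.w).astype(int))
            candidates.update(table.get(cell, []))
        if not candidates:
            return None
        cand = np.array(sorted(candidates))
        if len(cand) > k:
            cand = rng.choice(cand, size=k, replace=False)
        dists = np.linalg.norm(self.points[cand] - q, axis=1)
        return int(cand[np.argmin(dists)])
```

A usage example under the same assumptions: `index = RandomGrids(data, w=2.0, m=4)` followed by `index.query(data[17], k=1000)`. A point that is already in the index always shares a cell with itself, so with `k` large enough to check every candidate the query returns that point's own index.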
