Random Grids
Presented by Yonatan Glassner
A work of: Dror Aiger, Efi Kokiopoulou & Ehud Rivlin
What’s for today?
• Problem & Motivation
• Previous work
• Current algorithm
• Results
• Discussion
The NN problem
Given a set P of points in R^d:
• Nearest neighbor
– For any query q, return a point p ∈ P minimizing ‖p − q‖
[Figure: points p and a query q]
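The definition above can be read directly as an exhaustive search. The following is a minimal sketch, assuming points are stored as tuples of floats; the function name `nearest_neighbor` and the sample data are illustrative only.

```python
import math


def nearest_neighbor(points, q):
    """Return the point in `points` closest to `q` in Euclidean distance."""
    if not points:
        raise ValueError("points must be non-empty")
    # Linear scan: one distance computation per indexed point, O(d * n) per query.
    return min(points, key=lambda p: math.dist(p, q))


# Illustrative data, not from the slides.
P = [(0.0, 0.0), (1.0, 1.0), (3.0, 0.5)]
q = (0.9, 0.7)
print(nearest_neighbor(P, q))
```

Every query touches all n points, which is the cost the rest of the talk tries to avoid.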
Motivation 1 – Image similarity
[Figure: images, their descriptions, and the resulting points p and query q]
Motivation 2 – Suggestion algorithms
[Figure: vectors x ∈ R^d]
• Large number of dimensions d and examples N
KNN performance key domains
Short history lesson
KNN computation
Exact algorithms
• 1960’s: K nearest neighbors classification
• 1975, Bentley: k-d tree search
What is the complexity?
In practice:
• k-d trees work “well” in “low-medium” dimensions
• Near-linear query time in high dimensions
What can we do? (see next slide)
Problem formulation
[Figure: query q with two radii, r and cr]
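The r and cr labels in the figure suggest the usual (c, r)-approximate near neighbor formulation: if some point lies within distance r of q, any point within distance cr is an acceptable answer. This reading is an assumption, since the slide carries only the picture. The sketch below checks whether an answer is acceptable under that reading; `is_valid_answer` is a hypothetical helper name.

```python
import math


def is_valid_answer(points, q, r, c, answer):
    """Check an answer against the assumed (c, r)-approximate near neighbor rule."""
    has_r_near = any(math.dist(p, q) <= r for p in points)
    if not has_r_near:
        # Nothing lies within r of q, so the formulation promises nothing.
        return True
    # Some point is within r, so the answer must lie within c * r of q.
    return answer is not None and math.dist(answer, q) <= c * r


# Illustrative data, not from the slides.
P = [(0.0, 0.0), (1.5, 0.0), (5.0, 0.0)]
q = (0.5, 0.0)
print(is_valid_answer(P, q, r=1.0, c=2.0, answer=(1.5, 0.0)))
print(is_valid_answer(P, q, r=1.0, c=2.0, answer=(5.0, 0.0)))
```

The relaxation from r to cr is what lets approximate methods return a nearby point without proving it is the nearest one.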
Approximate algorithms
• 1998, Indyk & Motwani: Towards Removing the Curse of Dimensionality (LSH)
• 1998, Arya et al.: ANN (BBD trees)
• 2006, David Nister and Henrik Stewenius: Scalable Recognition with a Vocabulary Tree (K-means)
• 2009, Marius Muja, David G. Lowe: FLANN
Complexity summary (partial)
                    Preprocessing time    Preprocessing space   Query time
Exhaustive search   O(d·n)                O(d·n)                O(d·n)
LSH                 O(d·n + n^(1+1/ε))    O(d·n + n^(1+1/ε))    O(d·n^(1/ε))
Vocabulary tree     O(d·K^L)              O(d·K^L)              O(K·L·d)
ANN                 O(n·log(n))           O(n)                  O(log(n)·d/ε)
And back to present…
Motivation revisited
• We want to avoid exponential dependence on the dimension
• On query, we want to avoid dependence on the dataset size
• Our solution:
– Random Grids
Theorem (the only one in the PPT)
• If p and q are two points at distance at most 1 in d-dimensional Euclidean space, and we impose a randomly rotated and shifted grid of cell size w on this space, then the probability of capturing both p and q in the same cell is at least e^(−d/w) for sufficiently large w.
Intuition: see next slide
[Figures: points p and q, shown over three slides]
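The theorem above can be checked numerically. The following is a rough Monte Carlo sketch, not the authors' code: it draws random orthonormal bases by Gram-Schmidt on Gaussian vectors (which may include a reflection; that does not affect cell membership), shifts each grid uniformly in [0, w), and counts how often two points at distance 1 share a cell. The comparison value reads the slide's bound as e^(−d/w); the helper names and the choices d = 8, w = 4 are assumptions for illustration.

```python
import math
import random


def random_rotation(d, rng):
    """Random orthonormal basis (as rows) via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # retry in the unlikely degenerate case
            basis.append([x / norm for x in v])
    return basis


def cell_of(point, rotation, shift, w):
    """Rotate and shift a point, then return its integer grid-cell coordinates."""
    rotated = [sum(r * x for r, x in zip(row, point)) for row in rotation]
    return tuple(math.floor((x + s) / w) for x, s in zip(rotated, shift))


def same_cell_rate(p, q, w, trials, seed=0):
    """Fraction of random grids in which p and q land in the same cell."""
    rng = random.Random(seed)
    d = len(p)
    hits = 0
    for _ in range(trials):
        rotation = random_rotation(d, rng)
        shift = [rng.uniform(0.0, w) for _ in range(d)]
        hits += cell_of(p, rotation, shift, w) == cell_of(q, rotation, shift, w)
    return hits / trials


d, w = 8, 4.0                     # illustrative values, not from the slides
p = [0.0] * d
q = [1.0] + [0.0] * (d - 1)       # exactly at distance 1 from p
rate = same_cell_rate(p, q, w, trials=2000)
print(rate, math.exp(-d / w))     # empirical rate vs. the bound as read from the slide
```

Increasing w raises the capture rate but also makes each cell hold more points, which is the trade-off the algorithm has to tune.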
Basic algorithm structure
Preprocessing
• Set the cell size w
• Create m copies of the points P, each randomly rotated and shifted
• Index the points of each copy in a hash table
On query(q):
• Rotate and shift q with each of the m transformations and look it up in the corresponding hash table. From all points found, check K randomly chosen points and return the nearest one.
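The structure above can be sketched in a few dozen lines. This is a minimal illustration under stated assumptions, not the authors' implementation: rotations come from Gram-Schmidt on Gaussian vectors, hash keys are integer cell coordinates, and each table stores point indices rather than rotated copies. The names `RandomGridsIndex`, `random_rotation` and `cell_of` are hypothetical.

```python
import math
import random


def random_rotation(d, rng):
    """Random orthonormal basis (as rows) via Gram-Schmidt on Gaussian vectors."""
    basis = []
    while len(basis) < d:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        for b in basis:
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # retry in the unlikely degenerate case
            basis.append([x / norm for x in v])
    return basis


def cell_of(point, rotation, shift, w):
    """Rotate and shift a point, then return its integer grid-cell coordinates."""
    rotated = [sum(r * x for r, x in zip(row, point)) for row in rotation]
    return tuple(math.floor((x + s) / w) for x, s in zip(rotated, shift))


class RandomGridsIndex:
    """m randomly rotated and shifted grids of cell size w, one hash table each."""

    def __init__(self, points, w, m, seed=0):
        self.points = [tuple(p) for p in points]
        self.w = w
        self.rng = random.Random(seed)
        d = len(self.points[0])
        self.grids = []  # one (rotation, shift, table) triple per grid
        for _ in range(m):
            rotation = random_rotation(d, self.rng)
            shift = [self.rng.uniform(0.0, w) for _ in range(d)]
            table = {}
            for i, p in enumerate(self.points):
                table.setdefault(cell_of(p, rotation, shift, w), []).append(i)
            self.grids.append((rotation, shift, table))

    def query(self, q, k):
        """Collect points sharing a cell with q, check up to k at random, return the nearest."""
        candidates = set()
        for rotation, shift, table in self.grids:
            candidates.update(table.get(cell_of(q, rotation, shift, self.w), ()))
        if not candidates:
            return None  # q shares no cell with any indexed point
        pool = sorted(candidates)
        sample = self.rng.sample(pool, min(k, len(pool)))
        best = min(sample, key=lambda i: math.dist(self.points[i], q))
        return self.points[best]


# Illustrative data: a 5 x 5 lattice in the plane.
points = [(float(x), float(y)) for x in range(5) for y in range(5)]
index = RandomGridsIndex(points, w=2.0, m=4)
print(index.query((2.0, 3.0), k=len(points)))
```

Query cost here depends on m, d and k, and on how many points share a cell with q, rather than on a scan of all n points.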
Performance
• Preprocessing space: O(d·m·n)
• Preprocessing time: O(d²·m·n)
• Query time: O(d²·m)
Practical algorithm
• For a specific dataset, set the desired δ
• Learn w, m, k to build the data structure
• Build the data structure
• Upon query: Map-Reduce method
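A back-of-the-envelope starting value for m can be derived before learning it from data. The sketch below assumes that δ is the allowed failure probability, that the m grids are independent, and that `p_capture` is a per-grid lower bound on capturing the true neighbor (for instance the theorem's bound). None of this is stated on the slide, and it is not the authors' tuning procedure; `grids_needed` is a hypothetical helper.

```python
import math


def grids_needed(p_capture, delta):
    """Smallest m such that (1 - p_capture) ** m <= delta, for independent grids."""
    if not (0.0 < p_capture < 1.0 and 0.0 < delta < 1.0):
        raise ValueError("p_capture and delta must lie strictly between 0 and 1")
    # All m grids miss with probability (1 - p_capture) ** m; solve for m.
    return math.ceil(math.log(delta) / math.log(1.0 - p_capture))


# Illustrative numbers only: per-grid capture probability 0.5, failure budget 0.2.
print(grids_needed(0.5, 0.2))
```

In practice the slide's approach of learning w, m and k on the dataset should give tighter values, since the theoretical bound is a worst case.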
Experimental settings
• Data: 1M SURF descriptors (dim=64), extracted from 4000 images.
• Fair comparison: auto-tuning to get the best results, with a fixed target precision for all algorithms
• Metrics
– Runtime is computed over multiple queries
– Accuracy: see next slide
Accuracy metric
• RRS: [Figure: query q with radius R and points p] Accuracy = 2/5
• NN: [Figure: query q and points p] Accuracy = 1/1
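The two figures appear to score a result as the fraction of the true answers that were reported. That reading is an assumption based on the 2/5 and 1/1 values. A minimal sketch with point ids standing in for points; `accuracy` is a hypothetical helper name.

```python
def accuracy(reported, truth):
    """Fraction of the true answers (ids) that appear among the reported ones."""
    truth = set(truth)
    if not truth:
        return 1.0  # nothing to find, so nothing was missed
    return len(truth & set(reported)) / len(truth)


# Range search example mirroring the slide: 5 points within R, 2 of them reported.
print(accuracy(reported=[1, 4], truth=[1, 2, 3, 4, 5]))
# Nearest-neighbor example: the single true neighbor was reported.
print(accuracy(reported=[7], truth=[7]))
```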
Results - runtime (RRS)
[Plot: Index dataset = Query set. Precision = 0.98]
Results - accuracy (RRS)
[Plot: Index dataset = Query set. Precision = 0.98]
[Plot: Index dataset = Query set. Precision = 0.98, Radius = 0.08]
Results - runtime (NN)
[Plot: r = 0.3 is fixed to approximate 1-NN. Probability of report success = 0.9.]
Results - accuracy (NN)
[Plot: r = 0.3 is fixed to approximate 1-NN. Probability of report success = 0.9.]
Discussion
• Pros:
– Very good runtime results, without harming accuracy
– Intuitive to parallelize
– Fits the data well
• Cons:
– Graph explanations are missing
– c dependency is missing on the
– Cons…
Questions?
Thank you for listening
Backup
Locality-Sensitive Hashing Scheme Based on p-Stable Distributions
Tree methods in high-dimensions
Performance
• Preprocessing space
• Preprocessing time
• Query time
People
• Andoni – Microsoft
• Indyk – MIT
• Nister – Microsoft
• Motwani – Google related
Google images database size
Table comparison