LUDWIG-MAXIMILIANS-UNIVERSITY MUNICH, DEPARTMENT INSTITUTE FOR INFORMATICS, DATABASE SYSTEMS GROUP
Efficient Probabilistic Reverse Nearest Neighbor
Query Processing on Uncertain Data
Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel,
Matthias Renz, Stefan Zankl and Andreas Zuefle
Ludwig-Maximilians-Universität München (LMU)
Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de
Outline
• Background
– Uncertain Data Model
– Reverse k-nearest neighbour queries
– Reverse k-nearest neighbour queries on uncertain objects
• Framework for Probabilistic RkNN Processing
– Approximation
– Spatial Filter
– Probabilistic Filter
– Verification
• Evaluation + Summary
Background: Data model
• Objects are described by a multi-dimensional probability distribution
• Object Independence Assumption
• Queries are answered according to possible worlds semantics
• Object PDFs can be spatially bounded
• Continuous or discrete representation
[Figure: user ratings for „Life of Brian“, shown as a PDF (labelled PDFX) over the axes Humor and Action]
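The possible worlds semantics for the discrete representation can be illustrated with a minimal sketch. The object names, instance coordinates and probabilities below are made-up illustration values, not data from the slides; the only assumption used is object independence, so the probability of a world is the product of the chosen instance probabilities.

```python
from itertools import product

# Hypothetical discrete uncertain objects: each object is a list of
# (instance, probability) pairs whose probabilities sum to 1.
objects = {
    "A": [((1.0, 2.0), 0.5), ((2.0, 2.5), 0.5)],
    "B": [((4.0, 1.0), 0.7), ((5.0, 1.5), 0.2), ((4.5, 0.5), 0.1)],
}

def possible_worlds(objs):
    """Yield (world, probability); a world picks one instance per object."""
    names = list(objs)
    for combo in product(*(objs[n] for n in names)):
        world = {n: inst for n, (inst, _) in zip(names, combo)}
        prob = 1.0
        for _, p in combo:
            prob *= p  # object independence: probabilities multiply
        yield world, prob

worlds = list(possible_worlds(objects))
total = sum(p for _, p in worlds)
print(len(worlds), round(total, 10))
```

The number of worlds is the product of the instance counts per object, which is why enumerating all of them does not scale beyond toy examples.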
Background: RkNN Queries
RkNN(q) = {o ∈ DB | q ∈ kNN(o)}
[Figure: query point q among the objects o1, o2, o3, o4, o5, o6, o7]
R1NN(q) = {o7}
R2NN(q) = {o7, o5, o4}
What is it good for?
– Market segmentation
– Outlier detection
– Incremental algorithms
– …
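The RkNN definition for certain (non-uncertain) points can be sketched as a brute-force scan. This is only an illustration of the definition, not the algorithm of the paper; the function name `rknn` and the 2D points are hypothetical and do not reproduce the configuration drawn on the slide.

```python
from math import dist

def rknn(q, db, k):
    """Return the names of all objects o in db with q in kNN(o).

    q is one of the k nearest neighbours of o iff fewer than k other
    database objects are strictly closer to o than q is.
    """
    result = set()
    for name, o in db.items():
        d_q = dist(o, q)
        closer = sum(1 for other, p in db.items()
                     if other != name and dist(o, p) < d_q)
        if closer < k:
            result.add(name)
    return result

# Hypothetical 2D points (not the configuration from the slide).
db = {"a": (1.0, 0.0), "b": (3.0, 0.0), "c": (7.0, 0.0), "d": (-4.0, 0.0)}
q = (0.0, 0.0)
print(sorted(rknn(q, db, 1)))  # ['a', 'd']
print(sorted(rknn(q, db, 2)))  # ['a', 'b', 'd']
```

Note that the result of an RkNN query can contain more or fewer than k objects, unlike a kNN query.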
Background: PRkNN Queries
„Is O‘ R1NN of Q?“
[Figure: uncertain objects O1, O2, O‘ and query Q]
Note: The query object may be uncertain as well!
=> In some worlds it is
=> In other worlds it is not
Definition of Probabilistic RkNN
PRkNN(Q, τ) = {O ∈ DB | P(O ∈ RkNN(Q)) ≥ τ}
            = {O ∈ DB | P(Q ∈ kNN(O)) ≥ τ}
[Figure: uncertain objects O1, O2, O‘ and query Q]
P(Q ∈ 1NN(O‘)) = 21/24
e.g. O‘ ∈ PR1NN(Q, 0.5)
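The probability P(Q ∈ kNN(O‘)) from the definition can be computed naively by summing over all possible worlds. The sketch below does exactly that for discrete instances; the function name `prob_q_in_knn` and all coordinates and probabilities are assumptions chosen for illustration (the configuration behind the 21/24 on the slide is not recoverable from the text).

```python
from itertools import product
from math import dist

def prob_q_in_knn(candidate, others, q, k):
    """P(Q in kNN(candidate)) by enumerating all possible worlds.

    candidate, q: lists of (point, probability) instances.
    others: list of such lists, one per remaining database object.
    """
    total = 0.0
    for combo in product(candidate, q, *others):
        c, qp = combo[0][0], combo[1][0]
        world_prob = 1.0
        for _, p in combo:
            world_prob *= p  # independent objects
        d_q = dist(c, qp)
        closer = sum(1 for o, _ in combo[2:] if dist(c, o) < d_q)
        if closer < k:  # Q is among the k nearest neighbours in this world
            total += world_prob
    return total

# Hypothetical instances: candidate O', a certain query Q and one other object O1.
o_prime = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
query = [((3.5, 0.0), 1.0)]
o1 = [((1.0, 0.0), 0.25), ((-3.0, 0.0), 0.25), ((10.0, 0.0), 0.5)]

p = prob_q_in_knn(o_prime, [o1], query, k=1)
tau = 0.5
print(p, p >= tau)  # 0.625 True
```

With a threshold τ = 0.5 the candidate of this toy example would qualify for the PR1NN result, analogous to the statement on the slide.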
Framework for PRkNN query processing
• Approximation (Indexing)
– Simplification of spatial-probabilistic keys
• Spatial Filter
– Filter objects according to simple spatial keys
• Probabilistic Filter
– Derive lower/upper bounds of qualification probability (by means of simple spatial-probabilistic keys)
– Filter objects according to lower/upper probability bounds
• Verification
– Computation of the exact probability (very expensive)
– Monte-Carlo Sampling (many samples required)
R*-Tree for indexing objects (global index)
AR*-Tree for indexing instances (local index)
[Figure: AR*-tree over the instances of one object; the nodes are annotated with probability values between 0.1 and 1.0]
Pruning based on rectangular approximations only [1].
[Figure: rectangular approximations labelled O, Q and B, with three annotated regions]
– For any O‘ in this region, O is closer than Q.
– For any O‘ in this region, O is not closer than Q.
– For any O‘ intersecting this region, Q may possibly be closer than O.
Task: Find k objects O ∈ DB \ {O‘} which are closer to O‘ than Q.
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal
Pruning of MBRs. SIGMOD Conference 2010: 39-50
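As a rough illustration of pruning on rectangles only, the sketch below uses the simple MinDist/MaxDist test. This is deliberately not the optimal criterion of [1]: it is a sufficient but weaker test, so it answers "undecided" in cases where [1] could still decide. The rectangle encoding and the function names `min_dist`, `max_dist` and `classify` are assumptions made for this example.

```python
from math import sqrt

# A rectangle is a pair (lo, hi) of coordinate tuples.

def min_dist(a, b):
    """Smallest possible distance between a point of a and a point of b."""
    return sqrt(sum(max(0.0, al - bh, bl - ah) ** 2
                    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))

def max_dist(a, b):
    """Largest possible distance between a point of a and a point of b."""
    return sqrt(sum(max(ah - bl, bh - al) ** 2
                    for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))

def classify(o, q, cand):
    """Decide whether O is closer to the candidate O' than Q, using rectangles only."""
    if max_dist(o, cand) < min_dist(q, cand):
        return "closer"       # holds for every possible position
    if min_dist(o, cand) >= max_dist(q, cand):
        return "not closer"   # holds for every possible position
    return "undecided"        # needs the probabilistic filter

o = ((0.0, 0.0), (1.0, 1.0))
cand = ((2.0, 0.0), (3.0, 1.0))
q_far = ((10.0, 0.0), (11.0, 1.0))
q_near = ((4.0, 0.0), (5.0, 1.0))

print(classify(o, q_far, cand))   # closer
print(classify(q_far, o, cand))   # not closer
print(classify(o, q_near, cand))  # undecided
```

Objects classified as "closer" count towards the k objects of the task; "undecided" objects are the ones for which probability bounds have to be derived.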
Probability of O to be closer to O‘ than Q?
„O is closer to O‘ than Q with at least x% probability“
„O is closer to O‘ than Q with at most x% probability“
How many objects O ∈ DB are closer to O‘ than Q?
• Exemplary statements
– O1 is closer to O‘ than Q with probability at least 20% and at most 50%
– O2 is closer to O‘ than Q with probability at least 60% and at most 80%
– Correctly deriving these bounds is not trivial (see paper)
• Consider the following uncertain generating function
– x-term: probability of the object to be closer to O‘ than Q (the lower bound)
– z-term: probability of the object to be further from O‘ than Q (1 minus the upper bound)
– y-term: uncertainty (upper bound minus lower bound)
=> (0.2x + 0.3y + 0.5z) * (0.6x + 0.2y + 0.2z)
• Expansion yields
0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²
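The expansion can be reproduced with a few lines of code. The sketch below multiplies one factor per object and keeps the coefficients in a dictionary keyed by the exponents of x and y; the function name `expand` and this representation are choices made for the illustration, not the paper's implementation.

```python
from collections import defaultdict

def expand(bounds):
    """Expand the product of (lower*x + (upper-lower)*y + (1-upper)*z) factors.

    bounds: list of (lower, upper) probability bounds, one pair per object.
    Returns {(i, j): coefficient} for the term x^i y^j z^(n-i-j).
    """
    poly = {(0, 0): 1.0}
    for lower, upper in bounds:
        nxt = defaultdict(float)
        for (i, j), c in poly.items():
            nxt[(i + 1, j)] += c * lower            # object is closer
            nxt[(i, j + 1)] += c * (upper - lower)  # undecided
            nxt[(i, j)] += c * (1.0 - upper)        # object is further
        poly = dict(nxt)
    return poly

# Bounds for O1 and O2 from the exemplary statements.
coeffs = expand([(0.2, 0.5), (0.6, 0.8)])
for (i, j), c in sorted(coeffs.items()):
    print(f"x^{i} y^{j} z^{2 - i - j}: {c:.2f}")
```

Processing one object at a time keeps the number of terms quadratic in the number of objects instead of enumerating all combinations.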
[Figure, built up over several slides: histogram derived from the terms of 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²; x-axis: # objects O ∈ DB that are closer to O‘ than Q (0, 1, 2); y-axis: probability (0 to 80 %)]
[Figure: two histograms with y-axis "probability"; left x-axis: exact # objects O ∈ DB that are closer to O‘ than Q (0, 1, 2); right x-axis: maximum # objects O ∈ DB that are closer to O‘ than Q (0, 1, 2)]
• Example PRkNN queries
– PR1NN(Q, 50%) => O‘ is not part of the result
– PR2NN(Q, 40%) => O‘ is part of the result
– PR2NN(Q, 80%) => O‘ has to be further investigated
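One reading of the expanded generating function that reproduces these three decisions is sketched below. It assumes that a term x^i y^j z^m means "i objects are certainly closer, j further objects may or may not be closer", so that i + j < k gives a lower bound and i < k an upper bound for the probability that fewer than k objects are closer to O‘ than Q. The function names `bounds` and `decide` are hypothetical.

```python
# Coefficients of the expanded generating function from the slides,
# keyed by (i, j) for the term x^i y^j z^(2-i-j).
coeffs = {(2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
          (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06}

def bounds(coeffs, k):
    """Lower/upper bound of P(fewer than k objects are closer to O' than Q)."""
    lower = sum(c for (i, j), c in coeffs.items() if i + j < k)
    upper = sum(c for (i, j), c in coeffs.items() if i < k)
    return lower, upper

def decide(coeffs, k, tau):
    """Filter decision for the candidate O' given threshold tau."""
    lower, upper = bounds(coeffs, k)
    if lower >= tau:
        return "result"   # qualifies without verification
    if upper < tau:
        return "pruned"   # cannot qualify
    return "verify"       # bounds are not tight enough

print(decide(coeffs, 1, 0.5))  # pruned
print(decide(coeffs, 2, 0.4))  # result
print(decide(coeffs, 2, 0.8))  # verify
```

Under this reading the bounds are [10%, 32%] for k = 1 and [60%, 88%] for k = 2, which matches the three example queries.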
Options for Verification
• Consideration of all possible worlds (exponential)
• Adapting probabilistic nearest neighbour ranking [2] on instance level of objects (polynomial)
• Monte-Carlo based (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases. PVLDB 2(1): 502-513 (2009)
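The Monte-Carlo option can be sketched as follows: draw one instance per object, test the RkNN condition in that sampled world, and average over many samples. The function names `sample` and `estimate`, the fixed seed and the toy instances are assumptions for this illustration; for this configuration the exact probability obtained by enumerating all worlds is 0.625.

```python
import random
from math import dist

def sample(obj, rng):
    """Draw one instance of a discrete uncertain object."""
    points, weights = zip(*obj)
    return rng.choices(points, weights=weights, k=1)[0]

def estimate(candidate, others, q, k, n_samples, seed=0):
    """Monte-Carlo estimate of P(Q in kNN(candidate))."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        c = sample(candidate, rng)
        d_q = dist(c, sample(q, rng))
        closer = sum(1 for o in others if dist(c, sample(o, rng)) < d_q)
        if closer < k:
            hits += 1
    return hits / n_samples

# Hypothetical instances: candidate O', a certain query Q and one other object O1.
o_prime = [((0.0, 0.0), 0.5), ((2.0, 0.0), 0.5)]
query = [((3.5, 0.0), 1.0)]
o1 = [((1.0, 0.0), 0.25), ((-3.0, 0.0), 0.25), ((10.0, 0.0), 0.5)]

est = estimate(o_prime, [o1], query, k=1, n_samples=20000)
print(est)  # close to 0.625, up to sampling error
```

The cost grows linearly with the number of samples, but the standard error only shrinks with the square root of that number, which is why many samples are required for a reliable decision near the threshold.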
Evaluation: Spatial Filter
[Figure: bar chart of # candidates (axis from 0 to 20); labels: Random, MinMax, LC, CLWZP, HP, Equal extent, Dominant]
Evaluation: Probabilistic Filter
Evaluation: Comparison to other algorithms
Conclusion
• Framework for PRkNN query processing
• Deriving probabilistic pruning bounds for single objects
• Accumulating these bounds using uncertain generating functions
• Cost model for choosing the optimal value for tree depth
• Comparison to existing algorithms for PRNN processing
Thanks!
Questions?
Dependency on k
[Figure: runtime (sec, axis from 0 to 45) for k = 1, 5, 10, 15, 20, broken down into verification, probabilistic pruning and spatial pruning]
Problem of dependency
[Figure: O’, Q and the objects O1, O2]