LUDWIG-MAXIMILIANS-UNIVERSITY MUNICH, DEPARTMENT "INSTITUTE FOR INFORMATICS", DATABASE SYSTEMS GROUP

Efficient Probabilistic Reverse Nearest Neighbor Query Processing on Uncertain Data

Thomas Bernecker, Tobias Emrich, Hans-Peter Kriegel, Matthias Renz, Stefan Zankl and Andreas Zuefle
Ludwig-Maximilians-Universität München (LMU), Munich, Germany
http://www.dbs.ifi.lmu.de
{bernecker, emrich, kriegel, renz, zuefle}@dbs.ifi.lmu.de

Outline
• Background
  – Uncertain data model
  – Reverse k-nearest neighbor queries
  – Reverse k-nearest neighbor queries on uncertain objects
• Framework for probabilistic RkNN processing
  – Approximation
  – Spatial filter
  – Probabilistic filter
  – Verification
• Evaluation and summary

Data Model
• Objects are described by a multi-dimensional probability distribution.
• Object independence assumption.
• Queries are answered according to possible-worlds semantics.
• Object PDFs can be spatially bounded.
• Continuous or discrete representation.
[Figure: PDF_X of user ratings for "Life of Brian" over the dimensions Action and Humor]

RkNN Queries
$RkNN(q) = \{o \in DB \mid q \in kNN(o)\}$
[Figure: query q and objects o1 to o7]
$R1NN(q) = \{o_7\}$, $R2NN(q) = \{o_7, o_5, o_4\}$
What is it good for?
• Market segmentation
• Outlier detection
• Incremental algorithms
• …

PRkNN Queries
"Is O' an R1NN of Q?"
[Figure: uncertain objects O1, O2, O' and the query Q]
Note: the query object may be uncertain as well!
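The RkNN definition and the question "Is O' an R1NN of Q?" can be made concrete with a small sketch. It assumes discrete uncertain objects given as lists of (instance, probability) pairs, mutual independence as in the data model above, Euclidean distance, and that ties count as "not closer"; all function names are hypothetical and this is not the authors' implementation. `prob_rknn_exact` enumerates every possible world, while `prob_rknn_mc` samples worlds, mirroring two of the verification options the deck names.

```python
import itertools
import math
import random
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, ...]
# Hypothetical representation: an uncertain object is a list of
# (instance, probability) pairs whose probabilities sum to 1.
UncertainObject = List[Tuple[Point, float]]


def is_rknn(o_prime: Point, q: Point, others: Sequence[Point], k: int) -> bool:
    """Certain case: q is among the k nearest neighbors of o_prime iff fewer
    than k other objects are strictly closer to o_prime than q (ties count
    as 'not closer'; Euclidean distance assumed)."""
    d_q = math.dist(o_prime, q)
    closer = sum(1 for o in others if math.dist(o, o_prime) < d_q)
    return closer < k


def rknn(q: Point, db: Dict[str, Point], k: int) -> Set[str]:
    """Brute-force RkNN(q) = {o in DB | q in kNN(o)} for certain points."""
    return {
        name
        for name, o in db.items()
        if is_rknn(o, q, [p for other, p in db.items() if other != name], k)
    }


def prob_rknn_exact(q: UncertainObject, o_prime: UncertainObject,
                    others: Sequence[UncertainObject], k: int) -> float:
    """P(O' in RkNN(Q)) by enumerating every possible world (exponential).
    Objects are assumed mutually independent."""
    total = 0.0
    for world in itertools.product(q, o_prime, *others):
        (q_pt, _), (op_pt, _), *rest = world
        if is_rknn(op_pt, q_pt, [pt for pt, _ in rest], k):
            total += math.prod(p for _, p in world)  # probability of this world
    return total


def prob_rknn_mc(q: UncertainObject, o_prime: UncertainObject,
                 others: Sequence[UncertainObject], k: int,
                 n_samples: int = 10_000, seed: int = 0) -> float:
    """Monte-Carlo estimate of the same probability: draw one instance per
    object independently and count the sampled worlds in which O' qualifies."""
    rng = random.Random(seed)

    def draw(obj: UncertainObject) -> Point:
        points, weights = zip(*obj)
        return rng.choices(points, weights=weights)[0]

    hits = sum(
        is_rknn(draw(o_prime), draw(q), [draw(o) for o in others], k)
        for _ in range(n_samples)
    )
    return hits / n_samples
```

For instance, with Q and O' certain at (0, 0) and (2, 0), O1 at (2.5, 0) or (10, 0) with probability 0.5 each, and O2 at (2, 1) with probability 0.25 or (20, 0) with 0.75, O' is an R1NN of Q only when neither O1 nor O2 takes its near instance, so `prob_rknn_exact` returns 0.5 * 0.75 = 0.375; `prob_rknn_mc` approaches that value as `n_samples` grows.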
In some possible worlds, O' is an R1NN of Q; in other worlds, it is not.
[Figure: possible worlds of O1, O2, O' and Q]

Definition of Probabilistic RkNN
$PRkNN(Q, \tau) = \{O \in DB \mid P(O \in RkNN(Q)) \geq \tau\} = \{O \in DB \mid P(Q \in kNN(O)) \geq \tau\}$
Example: $P(Q \in 1NN(O')) = 21/24$; hence $O' \in PR1NN(Q, 0.5)$.

Framework for PRkNN Query Processing
• Approximation (indexing)
  – Simplification of spatial-probabilistic keys
• Spatial filter
  – Filter objects according to simple spatial keys
• Probabilistic filter
  – Derive lower/upper bounds of the qualification probability (by means of simple spatial-probabilistic keys)
  – Filter objects according to the lower/upper probability bounds
• Verification
  – Computation of the exact probability (very expensive)
  – Monte-Carlo sampling (many samples required)

Approximation
R*-tree for indexing objects (global index)
[Figure: R*-tree over the uncertain objects, with query Q]
AR*-tree for indexing instances (local index)
[Figure: AR*-tree over the instances of one uncertain object; each entry is annotated with a probability, the root with 1.0]

Spatial Filter
Pruning based on rectangular approximations only [1]:
[Figure: object O, query Q and region B]
• For any O' intersecting this region, Q may possibly be closer than O.
• For any O' in this region, O is closer than Q.
• For any O' in this region, O is not closer than Q.
Task: find k objects $O \in DB \setminus \{O'\}$ that are closer to O' than Q is. If k such objects exist, O' cannot be an RkNN of Q and is pruned.
[1] Tobias Emrich, Hans-Peter Kriegel, Peer Kröger, Matthias Renz, Andreas Züfle: Boosting Spatial Pruning: On Optimal Pruning of MBRs. SIGMOD Conference 2010: 39-50

Probabilistic Filter
What is the probability that O is closer to O' than Q?
[Figure: O, Q, region B and O']
• "O is closer to O' than Q with at least x% probability"
• "O is closer to O' than Q with at most x% probability"

Exemplary statements
• O1 is closer to O' than Q with at least 20% and at most 50% probability.
• O2 is closer to O' than Q with at least 60% and at most 80% probability.
• Correctly deriving these bounds is not trivial (see paper).

How many objects $O \in DB$ are closer to O' than Q?
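The at-least/at-most statements for a single object can be illustrated with a simplified, conservative bounding scheme. This sketch assumes that O, O' and Q are each given as a partition of their instances into axis-aligned rectangles with probability mass (for instance, the entries of one AR*-tree level), that objects are independent, and that each combination of rectangles is decided with the plain MinDist/MaxDist test. It is not the optimal pruning criterion of [1] and not necessarily the derivation used in the paper; all names are hypothetical.

```python
import itertools
import math
from typing import List, Tuple

Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]  # (lower corner, upper corner)
Partition = List[Tuple[Rect, float]]  # (rectangle, probability mass); masses sum to 1


def min_dist(a: Rect, b: Rect) -> float:
    """Smallest Euclidean distance between any point of a and any point of b."""
    return math.sqrt(sum(max(0.0, bl - ah, al - bh) ** 2
                         for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))


def max_dist(a: Rect, b: Rect) -> float:
    """Largest Euclidean distance between any point of a and any point of b."""
    return math.sqrt(sum(max(abs(ah - bl), abs(bh - al)) ** 2
                         for al, ah, bl, bh in zip(a[0], a[1], b[0], b[1])))


def closer_bounds(o: Partition, o_prime: Partition,
                  q: Partition) -> Tuple[float, float]:
    """Lower and upper bound on P(dist(O, O') < dist(Q, O')), assuming independence."""
    lower = upper = 0.0
    for (ra, pa), (rb, pb), (rc, pc) in itertools.product(o, o_prime, q):
        p = pa * pb * pc
        if max_dist(ra, rb) < min_dist(rc, rb):
            # every instance combination in these rectangles has O closer than Q
            lower += p
            upper += p
        elif min_dist(ra, rb) < max_dist(rc, rb):
            # undecided: O may or may not be closer, so only the upper bound grows
            upper += p
        # otherwise O is certainly not closer and contributes to neither bound
    return lower, upper
```

With a 1-D toy input chosen for illustration (O' fixed at 0, Q fixed at 5, and O split into [1, 2] with mass 0.2, [4, 6] with mass 0.3 and [7, 9] with mass 0.5), `closer_bounds` yields (0.2, 0.5), i.e. a statement of the same shape as the one for O1 above. Finer partitions tighten the bounds at the price of more rectangle combinations.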
Consider the following uncertain generating function (UGF), with one factor per object:
• x-term: probability of the object to be closer to O' than Q
• z-term: probability of the object to be farther from O' than Q
• y-term: uncertainty (neither case can be decided)
The factor of O1 takes x = 0.2 (its lower bound), z = 1 - 0.5 = 0.5 (one minus its upper bound) and y = 0.3 (the remainder); O2 likewise gives 0.6, 0.2 and 0.2:
$(0.2x + 0.3y + 0.5z) \cdot (0.6x + 0.2y + 0.2z)$
Expansion yields
$0.12x^2 + 0.34xz + 0.1z^2 + 0.22xy + 0.16yz + 0.06y^2$
A term $c\,x^i y^j z^l$ states that, with probability c, i objects are certainly closer to O' than Q, l are certainly farther and j are undecided.
[Figure: probability (0-80%) of the number of objects $O \in DB$ closer to O' than Q (0, 1, 2) for the expansion above]
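The expansion can be reproduced mechanically. This sketch assumes each object is summarized by a (lower, upper) bound pair on its probability of being closer to O' than Q, maps it to the factor with x = lower, y = upper - lower and z = 1 - upper as in the example, and multiplies the factors; a term $c\,x^i y^j z^l$ is stored as `(i, j) -> c`, with the z-exponent implied by the number of objects. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def expand_ugf(bounds: Iterable[Tuple[float, float]]) -> Dict[Tuple[int, int], float]:
    """Expand the product over objects of (x_o*x + y_o*y + z_o*z).

    Each (lower, upper) pair bounds the probability that an object is closer
    to O' than Q: x_o = lower, y_o = upper - lower, z_o = 1 - upper.
    Key (i, j) holds the coefficient of x^i * y^j * z^(n - i - j).
    """
    poly: Dict[Tuple[int, int], float] = {(0, 0): 1.0}
    for lower, upper in bounds:
        x, y, z = lower, upper - lower, 1.0 - upper
        nxt: Dict[Tuple[int, int], float] = defaultdict(float)
        for (i, j), c in poly.items():
            nxt[(i + 1, j)] += c * x  # object certainly closer
            nxt[(i, j + 1)] += c * y  # object undecided
            nxt[(i, j)] += c * z      # object certainly farther
        poly = dict(nxt)
    return poly


coeffs = expand_ugf([(0.2, 0.5), (0.6, 0.8)])  # O1 and O2 from the example
for (i, j), c in sorted(coeffs.items(), reverse=True):
    print(f"x^{i} y^{j} z^{2 - i - j}: {c:.2f}")
```

Each new object multiplies the number of terms by at most three per existing term, but terms with equal exponents merge, so the dictionary holds only O(n^2) entries for n objects rather than 3^n possible combinations.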
[Figure: probability of the number of objects $O \in DB$ closer to O' than Q (0, 1, 2), in two panels labelled "Exact" (0-80%) and "Maximum" (0-100%)]
• Example PRkNN queries. O' is an RkNN of Q exactly when fewer than k objects are closer to O' than Q, so the terms $x^i y^j z^l$ with $i + j < k$ sum to a lower bound and those with $i < k$ to an upper bound of $P(O' \in RkNN(Q))$:
  – PR1NN(Q, 50%): O' is not part of the result (upper bound 0.1 + 0.16 + 0.06 = 0.32 < 0.5).
  – PR2NN(Q, 40%): O' is part of the result (lower bound 0.1 + 0.34 + 0.16 = 0.6 ≥ 0.4).
  – PR2NN(Q, 80%): O' has to be investigated further (lower bound 0.6 < 0.8 ≤ upper bound 0.88).
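The three example decisions follow from the expanded UGF alone. This minimal sketch assumes the expansion is given as a map from (x-exponent, y-exponent) to coefficient; terms whose maximum count i + j is below k contribute to the lower bound of $P(O' \in RkNN(Q))$, terms whose minimum count i is below k contribute to the upper bound, and the threshold τ is compared against both. The function names and the three labels are hypothetical.

```python
from typing import Dict, Tuple

# Expansion from the slides: 0.12x² + 0.34xz + 0.1z² + 0.22xy + 0.16yz + 0.06y²,
# keyed by (exponent of x, exponent of y); the z-exponent is implied.
UGF: Dict[Tuple[int, int], float] = {
    (2, 0): 0.12, (1, 0): 0.34, (0, 0): 0.10,
    (1, 1): 0.22, (0, 1): 0.16, (0, 2): 0.06,
}


def rknn_bounds(ugf: Dict[Tuple[int, int], float], k: int) -> Tuple[float, float]:
    """Bounds on P(O' in RkNN(Q)) = P(fewer than k objects closer to O' than Q).

    In a term x^i y^j z^l, between i and i + j objects are closer.
    """
    # O' certainly qualifies if even the maximum count i + j stays below k
    lower = sum(c for (i, j), c in ugf.items() if i + j < k)
    # O' can only qualify if the minimum count i stays below k
    upper = sum(c for (i, j), c in ugf.items() if i < k)
    return lower, upper


def filter_decision(ugf: Dict[Tuple[int, int], float], k: int, tau: float) -> str:
    lower, upper = rknn_bounds(ugf, k)
    if lower >= tau:
        return "result"     # qualifies without verification
    if upper < tau:
        return "pruned"     # cannot reach the threshold
    return "candidate"      # bounds straddle tau: needs verification


for k, tau in [(1, 0.5), (2, 0.4), (2, 0.8)]:
    print(f"PR{k}NN(Q, {tau:.0%}):", filter_decision(UGF, k, tau))
```

Only the "candidate" case reaches the expensive verification step, which is why tighter per-object bounds pay off directly in fewer verifications.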
Verification
Options for verification:
• Consideration of all possible worlds (exponential)
• Adapting probabilistic nearest neighbor ranking [2] on the instance level of objects (polynomial)
• Monte-Carlo based (linear in the number of samples)
[2] Jian Li, Barna Saha, Amol Deshpande: A Unified Approach to Ranking in Probabilistic Databases.
PVLDB 2(1): 502-513 (2009)

Evaluation
Spatial filter
[Figure: number of candidates (0-20) for Random, MinMax, LC, CLWZP and HP; settings labelled "Equal extent" and "Dominant"]

Probabilistic filter

Comparison to other algorithms

Conclusion
• Framework for PRkNN query processing
• Deriving probabilistic pruning bounds for single objects
• Accumulating these bounds using uncertain generating functions
• Cost model for choosing the optimal tree depth
• Comparison to existing algorithms for PRNN processing

Thanks! Questions?

Dependency on k
[Figure: runtime (sec, 0-45) for k = 1, 5, 10, 15, 20, split into verification, probabilistic pruning and spatial pruning]

Problem of dependency
[Figure: O', Q, O1, O2]