Probabilistic Skylines on Uncertain Data

Probabilistic Skylines on Uncertain Data VLDB2007 Outline  Motivation  Traditional and Probabilistic Skyline  Problem Definition  Computation Problem and Algorithms (Top down and Bottom up)  Experimental Results Motivation Skyline Analysis on NBA players performance (#Assists) Uncertainty Each Player has multiple records (#Rebounds) Motivation Skyline Analysis on NBA players with multiple records Motivation Skyline Analysis on NBA players with multiple records   Easy Approach – Averaging Arbor (x) is better in assist than Eddy, but Eddy (point b) dominates all games of Arbor (x). “not so fair to say Eddy is a worse in assist than Arbor”  Bob (point a) bias the aggregate value Motivation Motivating result using Probabilistic Skyline  Olajuwon and Kobe Bryant are missing from Aggregate Skyline but present in Probabilistic Skyline  Their performance vary a lot over games Details in experiment analysis  Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data  Uncertain Data Dominance   Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data  Uncertain Data Dominance   Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data  Uncertain Data Dominance   Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Traditional and Probabilistic Skyline Semantics difference of Dominance between objects Certain Data  Uncertain Data Dominance   Certain model: an object dominate another object with Probability 1. Uncertain model: an object dominate another object with Probability P. Probabilistic Skyline Calculation of Probability Object A dominating Object C Pr [A≺C] = 1/4*1/3 (4+..) Probabilistic Skyline Calculation of Probability Object A dominates Object C Pr [A≺C] =1/4*1/3 (4+4+..) Probabilistic Skyline Calculation of Probability Object A dominates Object C Pr [A≺C] = 1/4*1/3 (4+4+0) Probabilistic Skyline Probabilistic Skyline: From Dominance to Skyline  Intuition of finding Skyline, probability of an object not to be dominated by other objects 0 (1/3)(1/3) Computation Problem of p-skyline   First, each uncertain object may have many instances. We have to process a large number of instances. Second, we have to consider many probabilities in deriving the probabilistic skylines. Algorithms (Top down and Bottom up)  Data   Multiple records of objects in the hope of approximating the probability density function Techniques:    Bounding Pruning Refining Bottom-up Algorithm Technique – Minimum Bounding Box (MBB) Bottom-up Algorithm - Pruning Techniques (1/3) using Umin, Umax to decide membership of p-skyline  For an uncertain object U and probability threshold p, if Pr(Umin) < p, then U is not in the p-skyline. If Pr(Umax) ≥ p, then U is in the p-skyline Bottom-up Algorithm - Pruning Techniques (2/3) using Umax to prune instances of objects  Let U and V be uncertain objects such that U V . If u is an instance of U and Vmax ≺ u, then Pr(u) = 0. C2 is dominated by Umax, dominated by all instances in object D Bottom-up Algorithm - Pruning Techniques (3/3) using subset of instance to prune objects Estimate Pr(Vmin) upper bound by Pr(Umax’) Pr(Vmin) = (1 – |U’|/|U|)(..)(..) If |U’| is large, more instances dominate Vmin, then Pr(Vmin) is low Top-down Algorithm Idea of bounding  The skyline probability of each subset of uncertain object can be bounded using its MBB.  The skyline probability of the uncertain object can be bounded as the weighted mean of the bounds of subsets. Top-down Algorithm supporting data structure : partition tree D B A C D B A C B A D C Top-down Algorithm partition tree for bounding D B A C A’ B A D A’  D’ C’ B’ D C C’ B’ C B A D’ B’ A’ D’ C’ Compare the partition of U with other partition tree as follows: traverse the partition tree of other uncertain object V, in the depth-first manner. Top-down Algorithm all possible situations during partition trees traversal B D A C A B C D B A B’ D’ A’ C’ A’ D C B’ D’ C’ B’ A’ D’ C’ Top-down Algorithm Pruning partition tree B D A C A B C D B A D C Experiment Other Experiment results Scalability with respect to probability threshold Conclusion  Multiple records  MBB

Probabilistic Skylines on Uncertain Data

Related documents

Products

Support

Probabilistic Skylines on Uncertain Data

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib