Probabilistic Skylines on Uncertain Data

advertisement
Probabilistic Skylines on Uncertain Data
VLDB2007
Outline

Motivation

Traditional and Probabilistic Skyline

Problem Definition

Computation Problem and Algorithms
(Top down and Bottom up)

Experimental Results
Motivation
Skyline Analysis on NBA players performance
(#Assists)
Uncertainty
Each Player has multiple records
(#Rebounds)
Motivation
Skyline Analysis on NBA players with multiple records
Motivation
Skyline Analysis on NBA players with multiple records


Easy Approach – Averaging
Arbor (x) is better in assist than Eddy, but Eddy (point b)
dominates all games of Arbor (x). “not so fair to say Eddy is a worse
in assist than Arbor”

Bob (point a) bias the aggregate value
Motivation
Motivating result using Probabilistic Skyline

Olajuwon and Kobe Bryant are missing from Aggregate
Skyline but present in Probabilistic Skyline

Their performance vary a lot over games
Details in experiment analysis

Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data

Uncertain Data
Dominance


Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data

Uncertain Data
Dominance


Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data

Uncertain Data
Dominance


Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Traditional and Probabilistic Skyline
Semantics difference of Dominance between objects
Certain Data

Uncertain Data
Dominance


Certain model: an object dominate another object with Probability 1.
Uncertain model: an object dominate another object with Probability P.
Probabilistic Skyline
Calculation of Probability Object A dominating Object C
Pr [A≺C] = 1/4*1/3 (4+..)
Probabilistic Skyline
Calculation of Probability Object A dominates Object C
Pr [A≺C] =1/4*1/3 (4+4+..)
Probabilistic Skyline
Calculation of Probability Object A dominates Object C
Pr [A≺C] = 1/4*1/3 (4+4+0)
Probabilistic Skyline
Probabilistic Skyline: From Dominance to Skyline

Intuition of finding Skyline, probability of an object not to be
dominated by other objects
0
(1/3)(1/3)
Computation Problem of p-skyline


First, each uncertain object may have many
instances. We have to process a large number of
instances.
Second, we have to consider many probabilities in
deriving the probabilistic skylines.
Algorithms (Top down and Bottom up)

Data


Multiple records of objects in the hope of
approximating the probability density function
Techniques:



Bounding
Pruning
Refining
Bottom-up Algorithm
Technique – Minimum Bounding Box (MBB)
Bottom-up Algorithm - Pruning Techniques (1/3)
using Umin, Umax to decide membership of p-skyline

For an uncertain object U and probability threshold p, if
Pr(Umin) < p, then U is not in the p-skyline. If Pr(Umax) ≥ p,
then U is in the p-skyline
Bottom-up Algorithm - Pruning Techniques (2/3)
using Umax to prune instances of objects

Let U and V be uncertain objects such that
U V . If u is an instance of U and Vmax ≺ u, then Pr(u) = 0.
C2 is dominated by Umax,
dominated by all instances in object D
Bottom-up Algorithm - Pruning Techniques (3/3)
using subset of instance to prune objects
Estimate Pr(Vmin) upper bound by Pr(Umax’)
Pr(Vmin) = (1 – |U’|/|U|)(..)(..)
If |U’| is large, more instances dominate Vmin, then Pr(Vmin) is low
Top-down Algorithm
Idea of bounding

The skyline probability of each subset of uncertain object can
be bounded using its MBB.

The skyline probability of the uncertain object can be bounded
as the weighted mean of the bounds of subsets.
Top-down Algorithm
supporting data structure : partition tree
D
B
A
C
D
B
A
C
B
A
D
C
Top-down Algorithm
partition tree for bounding
D
B
A
C
A’
B
A
D
A’

D’
C’
B’
D
C
C’
B’
C
B
A
D’
B’
A’
D’
C’
Compare the partition of U with other partition tree as follows: traverse
the partition tree of other uncertain object V, in the depth-first manner.
Top-down Algorithm
all possible situations during partition trees traversal
B D
A C
A
B
C
D
B
A
B’ D’
A’ C’
A’
D
C
B’
D’
C’
B’
A’
D’
C’
Top-down Algorithm
Pruning partition tree
B D
A C
A
B
C
D
B
A
D
C
Experiment
Other Experiment results
Scalability with respect to probability threshold
Conclusion

Multiple records

MBB
Download