Efficient Threshold Monitoring for Distributed Probabilistic Data
Mingwang Tang, Feifei Li, Jeff M. Phillips, Jeffrey Jestes
School of Computing, University of Utah
Outline
1. Introduction and Motivation
2. Exact Methods
3. Approximate Methods
4. Experiments
5. Conclusion
Introduction
Distributed threshold monitoring (DTM) problem: does ∑_{i=1}^3 xi > 240?
[Figure: a coordinator H connected to clients c1, c2, c3; at each time instance t1, t2, ..., tT the clients observe local values (e.g., 70, 80, 75 at t1; 70, 90, 90 at t2; 70, 80, 70 at tT), and an alarm should be raised whenever their sum exceeds 240.]
Introduction
Distributed threshold monitoring (DTM) problem: at each time instance the global sum is checked against the threshold (e.g., 225 > 240? no; 250 > 240? yes, raise an alarm; 220 > 240? no).
Extensively studied: e.g., S. Jeyashanker et al. propose an adaptive-thresholds technique for the DTM problem (∑_{i=1}^g xi ≤ T) over deterministic data.
[ICDE08] S. Jeyashanker et al., Efficient Constraint Monitoring Using Adaptive Thresholds, ICDE 2008.
Introduction
The adaptive-thresholds idea: the coordinator assigns each client ci a local threshold Bi (e.g., B1 = B2 = B3 = 80). As long as every local value satisfies xi ≤ Bi, the clients stay silent and no communication is needed; the coordinator H sees only question marks. When a local threshold is violated, the clients report and the local thresholds are adjusted (e.g., to 75, 85, 80).
[Figure: coordinator H with local thresholds B1, B2, B3 at clients c1, c2, c3 over the same per-time-instance values; question marks indicate values unknown to H.]
Extensively studied: e.g., S. Jeyashanker et al. propose an adaptive-thresholds technique for the DTM problem (∑_{i=1}^g xi ≤ T) over deterministic data.
[ICDE08] S. Jeyashanker et al., Efficient Constraint Monitoring Using Adaptive Thresholds, ICDE 2008.
Introduction
New challenge in the DTM problem: uncertainty naturally exists in distributed data:
data integration produces fuzzy matches
noisy sensor readings
Example: the Shipboard Automated Meteorological and Oceanographic System (SAMOS).
[Figure: (a) ships and towers collect data and transmit it via satellite and radio frequency to (b) applications.]
Introduction
Attribute-level uncertainty model (with a single uncertain attribute, score): each tuple's score is a discrete pdf.
tuple d1: X1 = {(v_{1,1}, p_{1,1}), (v_{1,2}, p_{1,2}), ..., (v_{1,b1}, p_{1,b1})}
tuple d2: X2 = {(v_{2,1}, p_{2,1}), (v_{2,2}, p_{2,2}), ..., (v_{2,b2}, p_{2,b2})}
...
tuple dt: Xt = {(v_{t,1}, p_{t,1}), (v_{t,2}, p_{t,2}), ..., (v_{t,bt}, p_{t,bt})}
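As a small illustration (not from the paper), this model can be stored directly as lists of (value, probability) pairs, from which the per-tuple moments used by the later pruning methods are easy to compute; the pdfs below are hypothetical:

```python
# Attribute-level uncertain tuples as discrete pdfs: (value, prob) pairs
# whose probabilities sum to 1. Hypothetical example data.
X1 = [(70.0, 0.5), (80.0, 0.3), (90.0, 0.2)]   # pdf of d1's score
X2 = [(75.0, 0.6), (85.0, 0.4)]                # pdf of d2's score

def expectation(X):
    """E(X) of a discrete pdf given as (value, prob) pairs."""
    return sum(v * p for v, p in X)

def variance(X):
    """Var(X) = E(X^2) - E(X)^2."""
    ex = expectation(X)
    return sum(v * v * p for v, p in X) - ex * ex

print(expectation(X1), variance(X1))
```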
Introduction
Distributed probabilistic threshold monitoring (DPTM): at each time instance, is Pr[Y = ∑_{i=1}^g Xi > γ] > δ?
[Figure: coordinator H and clients c1, ..., cg; client ci holds an uncertain value X_{i,t} at each time instance t1, ..., tT.]
Naive method:
ci sends Xi to H at each time instance t;
H computes Pr[Y > γ] from the Xi's;
expensive in terms of both communication (O(gT)) and computation (O(n^g T)); a sketch of this computation follows.
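To make the cost concrete, here is a minimal sketch (an illustration, not the authors' code) of what H does in the naive method: convolve the clients' discrete pdfs into the pdf of Y and sum the tail mass above γ. With n values per pdf and g clients, the support can grow toward the O(n^g) combinations cited above:

```python
from collections import defaultdict

def convolve(X, Z):
    """Pdf of the sum of two independent discrete random variables."""
    out = defaultdict(float)
    for v1, p1 in X:
        for v2, p2 in Z:
            out[v1 + v2] += p1 * p2
    return list(out.items())

def pr_sum_exceeds(pdfs, gamma):
    """Exact Pr[Y = sum_i X_i > gamma] for independent X_i."""
    Y = [(0.0, 1.0)]                       # pdf of the empty sum
    for X in pdfs:
        Y = convolve(Y, X)
    return sum(p for v, p in Y if v > gamma)

pdfs = [[(70, 0.5), (80, 0.5)], [(75, 0.7), (90, 0.3)], [(70, 1.0)]]
print(pr_sum_exceeds(pdfs, 230))           # Pr[Y > 230] = 0.15
```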
Our Approach
Exact methods:
Computing Pr[Y > γ] exactly is expensive.
Incorporate pruning techniques: is Pr[Y > γ] ≤ upper bound < δ, or Pr[Y > γ] ≥ lower bound > δ? The bounds are computed from deterministic data derived from the Xi's.
Combine with the adaptive-thresholds algorithm for deterministic data when it is applicable.
Approximate methods:
Replace the exact computation of Pr[Y > γ] with sampling-based methods (but keep the same monitoring instances).
Outline
1. Introduction and Motivation
2. Exact Methods
3. Approximate Methods
4. Experiments
5. Conclusion
Baseline method (Madaptive)
Markov's inequality: Pr[Y > γ] ≤ E(Y)/γ.
If E(Y)/γ < δ, the answer is true (no alarm), so it suffices to monitor whether E(Y) = ∑_{i=1}^g E(Xi) < γδ.
Since γδ is a constant and each E(Xi) is a plain number, this is a deterministic DTM instance: leverage the adaptive-thresholds algorithm for deterministic data.
[Figure: each client ci sends E(Xi) instead of the full pdf Xi to the coordinator H.]
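A minimal sketch of the resulting coordinator-side test, assuming the per-client expectations have been collected; the function name and inputs are hypothetical:

```python
def madaptive_prune(expectations, gamma, delta):
    """By Markov, Pr[Y > gamma] <= E(Y)/gamma. If sum_i E(X_i) <
    gamma*delta, certify 'no alarm' without seeing the full pdfs;
    otherwise fall back to more expensive checks."""
    ey = sum(expectations)          # E(Y) = sum_i E(X_i)
    if ey < gamma * delta:          # equivalent to E(Y)/gamma < delta
        return "no alarm"
    return "undecided"

print(madaptive_prune([75.0, 82.5, 70.0], gamma=240, delta=0.7))
```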
Improved method
Combine Chebyshev-bound and Chernoff-bound pruning.
Chebyshev gives one-sided bounds using E(Xi) and Var(Xi):
upper pruning: Pr[Y > γ] ≤ Var(Y) / (Var(Y) + (γ − E(Y))²) ≤ δ ?
lower pruning: Pr[Y > γ] > 1 − Var(Y) / (Var(Y) + (E(Y) − γ)²) ≥ δ ?
Chernoff bounds use the moment generating function:
M(β) = E(e^{βY}), Mi(β) = E(e^{βXi}) for any β ∈ R, and M(β) = ∏_{i=1}^g Mi(β).
For β1 > 0, Chernoff gives an upper bound: Pr[Y > γ] ≤ e^{−β1 γ} M(β1) ≤ δ ?
For β2 < 0, Chernoff gives a lower bound: Pr[Y > γ] > 1 − e^{−β2 γ} M(β2) ≥ δ ?
[Figure: each client ci sends E(Xi), Var(Xi), and Mi(β1), Mi(β2) instead of the full pdf Xi.]
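A minimal sketch of the two pruning tests, assuming independent Xi so that E(Y), Var(Y), and M(β) decompose over the clients; all names and the example summaries below are hypothetical:

```python
import math

def chebyshev_upper(ey, vy, gamma):
    """One-sided Chebyshev: Pr[Y > gamma] <= Var(Y)/(Var(Y)+(gamma-E(Y))^2),
    valid for gamma > E(Y). Prune (no alarm) if this is <= delta."""
    return vy / (vy + (gamma - ey) ** 2)

def chernoff_upper(log_mis, beta1, gamma):
    """Chernoff with beta1 > 0: Pr[Y > gamma] <= e^{-beta1*gamma} M(beta1),
    where ln M(beta1) = sum_i ln M_i(beta1)."""
    return math.exp(sum(log_mis) - beta1 * gamma)

# Hypothetical per-client summaries aggregated at H:
ey, vy = 227.5, 120.0
print(chebyshev_upper(ey, vy, gamma=240))        # prune if <= delta
log_mis = [0.01 * 75, 0.01 * 83, 0.01 * 70]      # ln M_i(0.01), point masses
print(chernoff_upper(log_mis, beta1=0.01, gamma=240))
```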
Improved Adaptive Method (Iadaptive)
∑_{i=1}^g ln Mi(β1) ≤ ln δ + β1 γ (monitoring instance J1).
∑_{i=1}^g ln Mi(β2) ≤ ln(1 − δ) + β2 γ (monitoring instance J2).
Practical considerations:
use the adaptive-thresholds algorithm;
get a tight upper bound (lower bound);
running J1 and J2 together is expensive in communication.
Approaches (a sketch follows this list):
fix the values of β1 and β2 within each period of k time instances;
reset the optimal values of β1 and β2 periodically;
periodically decide which monitoring instance to run.
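As one simple way to realize "reset the optimal values of β1 and β2 periodically" (a grid-search assumption, not necessarily the paper's exact procedure), β1 can be re-tuned at the start of each period of k time instances so that the Chernoff upper bound is as tight as possible:

```python
import math

def best_beta1(mgf, gamma, betas):
    """Return the beta > 0 from a candidate grid that minimizes the
    Chernoff upper bound e^{-beta*gamma} * M(beta)."""
    return min(betas, key=lambda b: math.exp(-b * gamma) * mgf(b))

# Hypothetical global mgf: Y concentrated near 225 (a point mass),
# so M(beta) = e^{225*beta}.
mgf = lambda b: math.exp(b * 225.0)
grid = [0.001 * i for i in range(1, 200)]
b1 = best_beta1(mgf, gamma=240.0, betas=grid)
print(b1, math.exp(-b1 * 240.0) * mgf(b1))   # chosen beta1 and its bound
```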
Outline
1. Introduction and Motivation
2. Exact Methods
3. Approximate Methods
4. Experiments
5. Conclusion
Our Approach
Exact methods:
Computing Pr[Y > γ] exactly is expensive in terms of both communication (O(gT)) and computation (O(n^g T)).
Incorporate pruning techniques.
Combine with the adaptive-thresholds algorithm for deterministic data when it is applicable.
Approximate methods:
Use ε-sampling methods to estimate Pr[Y > γ] when the monitoring instances fail to make a decision.
Replace the exact computation with sampling-based methods (but keep the same monitoring instances): this yields MadaptiveS, ImprovedS, and IadaptiveS.
Random Distributed ε-Sample (RDεS)
H asks each client for a random sample xi drawn according to the distribution of Xi.
Pr[Ỹ = ∑_{i=1}^g xi > γ] is an unbiased estimate of Pr[Y > γ].
Repeat this sampling κ = O((1/ε²) ln(1/φ)) times.
Then Pr[| Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ε] ≥ 1 − φ, using O((g/ε²) ln(1/φ)) bytes.
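A minimal sketch of the RDεS estimator, assuming each client's pdf is available as (value, probability) pairs; names are hypothetical:

```python
import random

def draw(X):
    """One sample from a discrete pdf given as (value, prob) pairs."""
    r, acc = random.random(), 0.0
    for v, p in X:
        acc += p
        if r <= acc:
            return v
    return X[-1][0]   # guard against floating-point rounding

def rdes_estimate(pdfs, gamma, kappa):
    """Estimate Pr[Y = sum_i X_i > gamma]: in each of kappa rounds every
    client contributes one draw, and H counts how often the sums exceed
    gamma. Each round's indicator is an unbiased estimate."""
    hits = sum(1 for _ in range(kappa)
               if sum(draw(X) for X in pdfs) > gamma)
    return hits / kappa

pdfs = [[(70, 0.5), (80, 0.5)], [(75, 0.7), (90, 0.3)], [(70, 1.0)]]
print(rdes_estimate(pdfs, gamma=230, kappa=1000))   # close to 0.15
```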
Deterministic Distributed ε-Sample (DDεS)
Use κ = O(g/ε) evenly spaced sample points from each Xi, so that consecutive points enclose equal probability mass: ∫_{x_j}^{x_{j+1}} Pr[Xi = x] dx = ε/g.
[Figure: the pdf of Xi partitioned into slabs of mass ε/g, with one sample point x1, x2, ..., xκ per slab.]
The evaluation space for Pr[Ỹ > γ] is O(κ^g): one point from each of c1: S1 = {x_{1,1}, ..., x_{1,κ}}, c2: S2 = {x_{2,1}, ..., x_{2,κ}}, ..., cg: Sg = {x_{g,1}, ..., x_{g,κ}}.
In practice, evaluate only O(κ^m) (e.g., m = 2) randomly selected combinations.
DDεS gives | Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ε with probability 1 in O(g²/ε) bytes.
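A minimal sketch of the client-side sampling step, reading "evenly spaced" as equal-probability-mass quantile points (an assumption consistent with the slab picture above); names are hypothetical:

```python
def ddes_points(X, kappa):
    """kappa sample points of a discrete pdf (value, prob), placed at the
    (j + 1/2)/kappa quantiles so consecutive points enclose mass 1/kappa
    (= eps/g when kappa = g/eps)."""
    pts, acc, j = [], 0.0, 0
    targets = [(k + 0.5) / kappa for k in range(kappa)]
    for v, p in sorted(X):
        acc += p                       # running cdf
        while j < kappa and targets[j] <= acc:
            pts.append(v)
            j += 1
    return pts

X = [(70, 0.25), (75, 0.25), (80, 0.25), (90, 0.25)]
print(ddes_points(X, kappa=4))         # one point per mass-1/4 slab
```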
A randomized improvement of DDεS (αDDεS)
Consecutive sample points still enclose equal probability mass α: ∫_{x_{i,j}}^{x_{i,j+1}} Pr[Xi = x] dx = α.
Compute xα, where the integral of the pdf first reaches α.
Choose the smallest sample point x1 at random within [0, xα]; the remaining points x2, ..., xκ follow at mass spacing α.
[Figure: the pdf of Xi with the randomly shifted equal-mass sample points x1, x2, ..., xκ.]
Then Pr[| Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ε] > 1 − φ in O((g/ε)·√(2g ln(2/φ))) bytes.
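A minimal sketch of the randomized variant: the first quantile target is drawn uniformly from (0, α], and later targets follow at mass spacing α. The details below are an assumption for illustration, not the paper's exact construction:

```python
import random

def addes_points(X, alpha):
    """Randomly shifted equal-mass sample points of a discrete pdf:
    the first target quantile u is uniform in (0, alpha], the rest are
    u + alpha, u + 2*alpha, ... (assumed construction)."""
    u = random.uniform(0.0, alpha)
    targets, t = [], u
    while t <= 1.0:
        targets.append(t)
        t += alpha
    pts, acc, j = [], 0.0, 0
    for v, p in sorted(X):
        acc += p                       # running cdf
        while j < len(targets) and targets[j] <= acc:
            pts.append(v)
            j += 1
    return pts

X = [(70, 0.25), (75, 0.25), (80, 0.25), (90, 0.25)]
print(addes_points(X, alpha=0.25))
```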
Outline
1. Introduction and Motivation
2. Exact Methods
3. Approximate Methods
4. Experiments
5. Conclusion
Experiment setup
A Linux machine with an Intel Xeon CPU at 2.13GHz and 6GB of memory; the GMP library is used to calculate Mi(β).
Server-to-client communication uses broadcast; client-to-server communication uses unicast.
Data sets:
Real datasets (11.8 million records from the research vessel Wecoma) from the SAMOS project.
Each record contains four measurements: wind direction (WD), wind speed (WS), sound speed (SS), and temperature (TEM), which yields four single-probabilistic-attribute datasets.
Group the records in every τ consecutive seconds and represent each group with a pdf (see the sketch below).
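A minimal sketch of this grouping step, assuming raw readings are simply histogrammed within each window; names are hypothetical:

```python
from collections import Counter

def window_to_pdf(readings):
    """Collapse one tau-second window of raw readings into an empirical
    discrete pdf (value, prob); this plays the role of one X_{i,t}."""
    counts = Counter(readings)
    n = len(readings)
    return [(v, c / n) for v, c in sorted(counts.items())]

window = [70, 70, 75, 80, 80, 80, 90]    # hypothetical tau-second window
print(window_to_pdf(window))
```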
Experiment setup
The default experimental parameters:

Symbol   Definition                  Default value
τ        grouping interval           300
T        number of time instances    3932
g        number of clients           10
δ        probability threshold       0.7
γ        score threshold             30% alarms (230·g for WD)
κ        sample size per client      30
Response time (γ: score threshold)
[Figure: response time (secs, log scale, roughly 10⁻⁴ to 10⁻²) versus score threshold γ (1500 to 3000) for Madaptive, Improved, and Iadaptive.]
Communication (γ: score threshold)
[Figure: number of messages (0 to 16) and number of bytes (0 to 3500) versus score threshold γ (1500 to 3000) for Madaptive, Improved, and Iadaptive.]
Precision (κ: number of samples)
[Figure: precision (0.92 to 1) versus sample size κ (0 to 60) for RDεS, DDεS, and αDDεS.]
Recall (κ: number of samples)
[Figure: recall (0.95 to 1) versus sample size κ (0 to 60) for RDεS, DDεS, and αDDεS.]
Performance of all methods
[Figure: number of messages (0 to 20) on WD, WS, SS, and TEM for Madaptive/MadaptiveS, Improved/ImprovedS, and Iadaptive/IadaptiveS.]
Performance of all methods
[Figure: number of bytes (10¹ to 10⁴, log scale) on WD, WS, SS, and TEM for Madaptive/MadaptiveS, Improved/ImprovedS, and Iadaptive/IadaptiveS.]
Performance of all methods
[Figure: response time (secs, log scale, roughly 10⁻⁶ to 10²) on WD, WS, SS, and TEM for Madaptive/MadaptiveS, Improved/ImprovedS, and Iadaptive/IadaptiveS.]
Performance of all methods
[Figure: precision and recall (0.996 to 1) of MadaptiveS, ImprovedS, and IadaptiveS on WD, WS, SS, and TEM.]
Outline
1. Introduction and Motivation
2. Exact Methods
3. Approximate Methods
4. Experiments
5. Conclusion
Conclusion
Future work:
Other aggregation constraints (e.g., max) besides the sum constraint.
Extend our study to the hierarchical model that is often used in a sensor network.
Handle the case when data from different sites are correlated.
The end
Thank You
Q and A