Efficient Threshold Monitoring for Distributed Probabilistic Data

advertisement
Efficient Threshold Monitoring for Distributed Probabilistic Data
Mingwang Tang, Feifei Li, Jeff M. Phillips, Jeffrey Jestes
Distributed Probabilistic Threshold Monitoring (DPTM)
c2
Towers
Data
satellite,
radio frequency
• When derived deterministic monitoring instances fail to
make a decision, still expensive to compute Y even with all
Applications
A Randomized Improvement of DDǫS (αDDǫS)
R xi,j+1
• x=xi,j P r[Xi = x]dx = α
Xi ’s → use sampling methods!
• S. Jeyashanker et al., Efficient Constraint Monitoring Using Adaptive
Baseline Method Based on Markov bound (Madaptive)
attribute score
• H can check if
X1 = {(v1,1 , p1,1 ), (v1,2 , p1,2 )...(v1,b1 , p1,b1 )}
X2 = {(v2,1 , p2,1 ), (v2,2 , p2,2 )...(v2,b2 , p2,b2 )}
.
...
c1
E(X1)
Exact Methods
• Both communication and computation expensive. Naively, it could be in O(ng ) (n is the maximal |Xi |)
• When Xi ’s are represented by continuous pdfs,
we leverage on the characteristic function of Xi
to compute the pdf of Y .
Improved Adaptive Method (Iadaptive)
•
•
c2
t
X1
Datasets
• Choose the smallest sample point at random
(within xα ).
• Real datasets (11.8 million records) from the
SAMOS project.
• Pr[| Pr[
Ỹ
>
γ]
−
Pr[Y
>
γ]|
≤
ǫ]
>
1
−
φ
in
q
O( gǫ 2g ln φ2 ) bytes.
• Each record contain four measurements: WD,
WS, SS, TEM, which leads to four single probabilistic attribute datasets.
−2
Pg
i=1 E(Xi )
< γδ
E(Xg )
−2
10
−3
10
Madaptive
−4
10
1500
Improved
2000
γ
2500
Iadaptive
3000
1
1
−3
10
0.98
0.99
0.96
RDεS
Madaptive
−4
10
0.5
0.6
Improved
0.7
0.8
δ
Xg
Iadaptive
0.92
0
20
κ
0.9
I One-sided Chebyshev’s inequality:
Var(Y )
Pr[Y > γ] < Var(Y )+(γ−E(Y ))2 .(γ > E(Y )
Pr[Y > γ] > 1 −
Var(Y )
Var(Y )+(E(Y )−γ)2 .(E(Y
) > γ)
16
16
12
8
4
Improved
Madaptive
0
II The Chernoff bound using the momentgenerating function.
Q
g
βY
M (β) = E(e ) = i=1 Mi (β) for any β ∈ R.
for any β1 > 0 and β2 < 0 :
i=1 ln Mi (β1) ≤ ln δ + β1 γ
Pr[Y > γ] ≤ e −β1γM(β1) ≤ δ
Pr[Y > γ] > 1 − e −β2γM(β2) ≥ δ
H
Pg
i=1 ln Mi (β2) ≤ ln(1 − δ) + β2 γ
c1
• A counter e of alarm instances is maintained in
each period of k time instances.
t
c2
cg
M1(β1)
M1(β2)
M2(β1)
M2(β2)
X1
X2
Mg (β1)
Mg (β2)
Xg
1500
2000
γ
2500
Iadaptive
DDεS
40
αDDεS
12
8
60
RDεS
0.95
0
Madaptive
Improved
Iadaptive
3000
3000
2500
2500
2000
1500
1000
4
500
3000
Madaptive
0
0.5
0.96
20
κ
DDεS
40
αDDεS
60
Precision and recall
3500
Improved Method
0.98
0.97
0.94
Response time
Pg
• Periodically decide which monitoring instance
to run and set the optimal value of β1 and β2
X2
Default Value
300
10
0.7
30% alarms (230g for WD)
30
10
cg
E(X2)
Definition
grouping interval
number of clients
probability threshold
score threshold
sample size per client
Experiments
< δ.
E(Y ) =
Xt = {(vt,1 , pt,1 ), (vt,2 , pt,2 )...(vt,bt , pt,bt )}
• Compute P r[Y ≥ γ] exactly: each client ci simply sends Xi to H and H sum Xi and check
against the (γ, δ) threshold.
E(Y )
γ
number of messages
d2
dt
• Markov’s inequality: Pr[Y > γ] ≤
Improved
0.6
0.7
Iadaptive
0.8
δ
0
Madaptive
Improved
Iadaptive
2000
1500
1000
500
1500
2000
0.9
γ
2500
3000
0
Messages
0.5
0.6
0.7
δ
0.8
0.9
Bytes
2
10
response time (secs)
d1
E(Y )
γ .
response time (secs)
• Attribute uncertain model and flat model
tuples
Symbol
τ
g
δ
γ
κ
• DDǫS gives | Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ǫ with
probability 1 in O(g 2 /ǫ) bytes.
• The lowerbound (upperbound) is a function of some deterministic values derived based on Xi ’s → use DTM methods
Applications
Thresholds, [ICDE2008]
xκ X i
give two deterministic monitoring instances.
(b)
Ships
x1 x2 x3 . . .
Xg,T
• Pr[Y > γ] < upperbound < δ
Pr[Y > γ] > lowerbound > δ
70
• The Shipboard Automated Meteorological and
Oceanographic System(SAMOS)
(a)
X2,T
Default Experimental Parameters
number of bytes
80
• Pr[| Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ǫ] ≥ 1 − φ using
g
1
O( ǫ2 ln φ ) bytes.
Xg,1
Xg,2
Madaptive, MadaptiveS
Improved, ImprovedS
Iadaptive, IadaptiveS
0
10
−2
10
−4
10
Madaptive, MadaptiveS
Improved, ImprovedS
Iadaptive, IadaptiveS
4
10
3
10
2
10
−6
10
WD
WS
SS
TEM
20
10
WD
WS
SS
MadaptiveS
15
10
5
0
1
Madaptive, MadaptiveS
Improved, ImprovedS
Iadaptive, IadaptiveS
precision and recall
70
• Repeating this sampling κ = O( ǫ12 ln φ1 ) times.
precision
75
90
X2,1
X2,2
• H asks for a random sample xi from each client
w.r.t. the distribution of Xi
number of bytes
80
90
t1 X1,1
t2 X1,2
ǫ
g
ǫ
g
c3
tT X1,T
tT
cg
P r[Xi = x]dx =
x=xj
number of messages
c2
70
70
t1
t2
pdf
> 240
c1
c1
> γ] > δ
response time (secs)
i=1 xi
H
i=1 Xi
R xj+1
number of bytes
P3
Pr[Y =
H
Pg
recall
• Distributed Threshold Monitoring (DTM):
Random Distributed ǫ-Sample (RDǫS)
Deterministic Distributed ǫ-Sample (DDǫS)
number of messages
Introduction
WD
WS
TEM
Performance of all methods
SS
1
ImprovedS
IadaptiveS
precision
recall
WD WS SS
TEM WD WS SS
0.999
0.998
0.997
TEM
0.996
TEM
Download