Efficient Threshold Monitoring for Distributed Probabilistic Data Mingwang Tang, Feifei Li, Jeff M. Phillips, Jeffrey Jestes Distributed Probabilistic Threshold Monitoring (DPTM) c2 Towers Data satellite, radio frequency • When derived deterministic monitoring instances fail to make a decision, still expensive to compute Y even with all Applications A Randomized Improvement of DDǫS (αDDǫS) R xi,j+1 • x=xi,j P r[Xi = x]dx = α Xi ’s → use sampling methods! • S. Jeyashanker et al., Efficient Constraint Monitoring Using Adaptive Baseline Method Based on Markov bound (Madaptive) attribute score • H can check if X1 = {(v1,1 , p1,1 ), (v1,2 , p1,2 )...(v1,b1 , p1,b1 )} X2 = {(v2,1 , p2,1 ), (v2,2 , p2,2 )...(v2,b2 , p2,b2 )} . ... c1 E(X1) Exact Methods • Both communication and computation expensive. Naively, it could be in O(ng ) (n is the maximal |Xi |) • When Xi ’s are represented by continuous pdfs, we leverage on the characteristic function of Xi to compute the pdf of Y . Improved Adaptive Method (Iadaptive) • • c2 t X1 Datasets • Choose the smallest sample point at random (within xα ). • Real datasets (11.8 million records) from the SAMOS project. • Pr[| Pr[ Ỹ > γ] − Pr[Y > γ]| ≤ ǫ] > 1 − φ in q O( gǫ 2g ln φ2 ) bytes. • Each record contain four measurements: WD, WS, SS, TEM, which leads to four single probabilistic attribute datasets. −2 Pg i=1 E(Xi ) < γδ E(Xg ) −2 10 −3 10 Madaptive −4 10 1500 Improved 2000 γ 2500 Iadaptive 3000 1 1 −3 10 0.98 0.99 0.96 RDεS Madaptive −4 10 0.5 0.6 Improved 0.7 0.8 δ Xg Iadaptive 0.92 0 20 κ 0.9 I One-sided Chebyshev’s inequality: Var(Y ) Pr[Y > γ] < Var(Y )+(γ−E(Y ))2 .(γ > E(Y ) Pr[Y > γ] > 1 − Var(Y ) Var(Y )+(E(Y )−γ)2 .(E(Y ) > γ) 16 16 12 8 4 Improved Madaptive 0 II The Chernoff bound using the momentgenerating function. Q g βY M (β) = E(e ) = i=1 Mi (β) for any β ∈ R. for any β1 > 0 and β2 < 0 : i=1 ln Mi (β1) ≤ ln δ + β1 γ Pr[Y > γ] ≤ e −β1γM(β1) ≤ δ Pr[Y > γ] > 1 − e −β2γM(β2) ≥ δ H Pg i=1 ln Mi (β2) ≤ ln(1 − δ) + β2 γ c1 • A counter e of alarm instances is maintained in each period of k time instances. t c2 cg M1(β1) M1(β2) M2(β1) M2(β2) X1 X2 Mg (β1) Mg (β2) Xg 1500 2000 γ 2500 Iadaptive DDεS 40 αDDεS 12 8 60 RDεS 0.95 0 Madaptive Improved Iadaptive 3000 3000 2500 2500 2000 1500 1000 4 500 3000 Madaptive 0 0.5 0.96 20 κ DDεS 40 αDDεS 60 Precision and recall 3500 Improved Method 0.98 0.97 0.94 Response time Pg • Periodically decide which monitoring instance to run and set the optimal value of β1 and β2 X2 Default Value 300 10 0.7 30% alarms (230g for WD) 30 10 cg E(X2) Definition grouping interval number of clients probability threshold score threshold sample size per client Experiments < δ. E(Y ) = Xt = {(vt,1 , pt,1 ), (vt,2 , pt,2 )...(vt,bt , pt,bt )} • Compute P r[Y ≥ γ] exactly: each client ci simply sends Xi to H and H sum Xi and check against the (γ, δ) threshold. E(Y ) γ number of messages d2 dt • Markov’s inequality: Pr[Y > γ] ≤ Improved 0.6 0.7 Iadaptive 0.8 δ 0 Madaptive Improved Iadaptive 2000 1500 1000 500 1500 2000 0.9 γ 2500 3000 0 Messages 0.5 0.6 0.7 δ 0.8 0.9 Bytes 2 10 response time (secs) d1 E(Y ) γ . response time (secs) • Attribute uncertain model and flat model tuples Symbol τ g δ γ κ • DDǫS gives | Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ǫ with probability 1 in O(g 2 /ǫ) bytes. • The lowerbound (upperbound) is a function of some deterministic values derived based on Xi ’s → use DTM methods Applications Thresholds, [ICDE2008] xκ X i give two deterministic monitoring instances. (b) Ships x1 x2 x3 . . . Xg,T • Pr[Y > γ] < upperbound < δ Pr[Y > γ] > lowerbound > δ 70 • The Shipboard Automated Meteorological and Oceanographic System(SAMOS) (a) X2,T Default Experimental Parameters number of bytes 80 • Pr[| Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ǫ] ≥ 1 − φ using g 1 O( ǫ2 ln φ ) bytes. Xg,1 Xg,2 Madaptive, MadaptiveS Improved, ImprovedS Iadaptive, IadaptiveS 0 10 −2 10 −4 10 Madaptive, MadaptiveS Improved, ImprovedS Iadaptive, IadaptiveS 4 10 3 10 2 10 −6 10 WD WS SS TEM 20 10 WD WS SS MadaptiveS 15 10 5 0 1 Madaptive, MadaptiveS Improved, ImprovedS Iadaptive, IadaptiveS precision and recall 70 • Repeating this sampling κ = O( ǫ12 ln φ1 ) times. precision 75 90 X2,1 X2,2 • H asks for a random sample xi from each client w.r.t. the distribution of Xi number of bytes 80 90 t1 X1,1 t2 X1,2 ǫ g ǫ g c3 tT X1,T tT cg P r[Xi = x]dx = x=xj number of messages c2 70 70 t1 t2 pdf > 240 c1 c1 > γ] > δ response time (secs) i=1 xi H i=1 Xi R xj+1 number of bytes P3 Pr[Y = H Pg recall • Distributed Threshold Monitoring (DTM): Random Distributed ǫ-Sample (RDǫS) Deterministic Distributed ǫ-Sample (DDǫS) number of messages Introduction WD WS TEM Performance of all methods SS 1 ImprovedS IadaptiveS precision recall WD WS SS TEM WD WS SS 0.999 0.998 0.997 TEM 0.996 TEM