Efficient Threshold Monitoring for Distributed Probabilistic Data
Mingwang Tang, Feifei Li, Jeff M. Phillips, Jeffrey Jestes
School of Computing, University of Utah

Outline
1 Introduction and Motivation
2 Exact Methods
3 Approximate Methods
4 Experiments
5 Conclusion

Introduction

Distributed threshold monitoring (DTM) problem: a central server H observes g distributed clients c1, ..., cg. At each time instance t, client ci holds a value xi, and H must decide whether Σ_{i=1}^g xi > T for a global threshold T.

[Figure: server H with three clients c1, c2, c3 and threshold 240. The clients' streams over time instances t1, t2, ..., tT are x1 = 70, 70, ..., 70; x2 = 80, 90, ..., 80; x3 = 75, 90, ..., 70. At t1 the sum is 225 ≤ 240 (no alarm); at t2 it is 250 > 240 (alarm); at tT it is 220 ≤ 240 (no alarm).]

The DTM problem has been studied extensively: e.g., S. Jeyashanker et al. propose an adaptive-thresholds technique for the DTM problem (monitoring Σ_{i=1}^g xi ≤ T) over deterministic data [ICDE08]. The server assigns each client ci a local threshold Bi with Σ_{i=1}^g Bi = T (e.g., Bi = 80 for each of the three clients above); a client contacts the server only when its local value violates Bi, and the Bi are adapted over time.

[ICDE08] S. Jeyashanker et al., Efficient Constraint Monitoring Using Adaptive Thresholds, ICDE 2008.
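To make the local-threshold idea concrete, here is a minimal sketch in Python. It uses a static uniform split of T across clients; the actual ICDE08 algorithm adapts the Bi over time, and all function names here are hypothetical.

```python
# A minimal sketch of local-threshold DTM monitoring over deterministic data.
# This illustrates the idea behind adaptive-thresholds schemes; the real
# ICDE08 algorithm adapts the Bi over time, whereas this sketch uses a
# static uniform split.

def assign_local_thresholds(T, g):
    """Split the global threshold T uniformly across g clients."""
    return [T / g] * g

def monitor_instance(values, local_thresholds, T):
    """One time instance: clients report only on local violations."""
    violated = any(x > b for x, b in zip(values, local_thresholds))
    if not violated:
        return False          # every xi <= Bi, so sum(xi) <= sum(Bi) = T: no alarm
    return sum(values) > T    # server polls all clients to resolve the violation

# Example from the slides: three clients, T = 240, Bi = 80 each.
B = assign_local_thresholds(240, 3)
print(monitor_instance([70, 80, 75], B, 240))  # False: 225 <= 240, no message sent
print(monitor_instance([70, 90, 90], B, 240))  # True:  250 > 240, alarm
```

The payoff is communication: in the common case where no local threshold is violated, no messages are exchanged at all.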
Introduction: a new challenge

A new challenge in the DTM problem: uncertainty naturally exists in distributed data, e.g.,
- data integration produces fuzzy matches;
- sensor readings are noisy.

An example application is the Shipboard Automated Meteorological and Oceanographic System (SAMOS), in which research vessels and towers report meteorological readings over satellite and radio-frequency links.

[Figure: (a) ships and towers collecting data; (b) transmission via satellite and radio frequency.]

Attribute-level uncertainty model (with a single uncertain attribute, score): each tuple di carries a discrete pdf for its score,
  Xi = {(v_{i,1}, p_{i,1}), (v_{i,2}, p_{i,2}), ..., (v_{i,b_i}, p_{i,b_i})}.

Distributed probabilistic threshold monitoring (DPTM): client ci observes an uncertain value X_{i,t} at each time instance t, and the server H must raise an alarm at t whenever
  Pr[Y = Σ_{i=1}^g Xi > γ] > δ.

[Figure: server H and clients c1, c2, ..., cg; client ci holds the stream X_{i,1}, X_{i,2}, ..., X_{i,T}.]

Naive method: each ci sends Xi to H at every time instance t, and H computes Pr[Y > γ] from the Xi's. This is expensive in terms of both communication (O(gT)) and computation (O(n^g · T), where n bounds the number of values per pdf).
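To see where the O(n^g) cost comes from, here is a minimal sketch of the naive exact computation over discrete pdfs. It assumes independence across clients; the pdfs are made-up illustrative inputs, not the paper's data.

```python
# A minimal sketch of the naive exact computation of Pr[Y > gamma] over
# discrete pdfs Xi = [(value, prob), ...], assuming independence across
# clients; enumerating all value combinations is what makes the naive
# method O(n^g) per time instance.
from itertools import product

def pr_sum_exceeds(pdfs, gamma):
    total = 0.0
    for combo in product(*pdfs):          # n^g outcome combinations
        if sum(v for v, _ in combo) > gamma:
            p = 1.0
            for _, pi in combo:
                p *= pi                   # joint probability by independence
            total += p
    return total

X1 = [(70, 0.5), (90, 0.5)]
X2 = [(80, 0.8), (100, 0.2)]
X3 = [(75, 1.0)]
print(pr_sum_exceeds([X1, X2, X3], 240))  # 0.6
```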
Our Approach

Exact methods: computing Pr[Y > γ] exactly is expensive, so we
- incorporate pruning techniques: from deterministic summaries derived from the Xi's, H maintains bounds on Pr[Y > γ] and tests "Pr[Y > γ] ≤ upper bound < δ?" (no alarm) and "Pr[Y > γ] ≥ lower bound > δ?" (alarm);
- combine with the adaptive-thresholds algorithm for deterministic data when it is applicable.

Approximate methods: replace the exact computation of Pr[Y > γ] with a sampling-based method (but keep the same monitoring instance).

Outline
2 Exact Methods

Baseline method (Madaptive)

Markov's inequality: Pr[Y > γ] ≤ E(Y)/γ. If E(Y)/γ < δ, then no alarm is needed. Equivalently, test E(Y) = Σ_{i=1}^g E(Xi) < γδ. Since γδ is a constant and the E(Xi) are deterministic values, this is exactly a DTM instance, so we can leverage the adaptive-thresholds algorithm for deterministic data.

Improved method

Combine Chebyshev-bound and Chernoff-bound pruning.

The one-sided Chebyshev bound uses E(Xi) and Var(Xi):
- upper-bound test: Pr[Y > γ] ≤ Var(Y) / (Var(Y) + (γ − E(Y))²) ≤ δ?
- lower-bound test: Pr[Y > γ] ≥ 1 − Var(Y) / (Var(Y) + (E(Y) − γ)²) ≥ δ?

The Chernoff bound uses the moment-generating function M(β) = E(e^{βY}), with Mi(β) = E(e^{βXi}) for any β ∈ R; by independence, M(β) = Π_{i=1}^g Mi(β):
- for β1 > 0, Chernoff gives an upper bound: Pr[Y > γ] ≤ e^{−β1·γ} M(β1) ≤ δ?
- for β2 < 0, Chernoff gives a lower bound: Pr[Y > γ] ≥ 1 − e^{−β2·γ} M(β2) ≥ δ?

Improved Adaptive Method (Iadaptive)

Taking logarithms turns the two Chernoff tests into two monitoring instances:
- J1: Σ_{i=1}^g ln Mi(β1) ≤ ln δ + β1·γ
- J2: Σ_{i=1}^g ln Mi(β2) ≤ ln(1 − δ) + β2·γ

Practical considerations:
- use the adaptive-thresholds algorithm on J1 and J2;
- choose β1 (β2) so that the upper (lower) bound is tight;
- running J1 and J2 together is expensive in communication.

Approaches: fix the values of β1 and β2 within each period of k time instances, and reset them to their optimal values periodically; periodically decide which of the two monitoring instances to run. (A sketch of the bound computations follows.)
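The following sketch shows how the pruning tests combine per-client summaries at the server. It is a hypothetical illustration of the bounds above, not the full adaptive monitoring protocol; it assumes nonnegative scores (needed for Markov) and independence across clients.

```python
# Minimal sketch of per-client summaries and server-side pruning tests
# (Markov, one-sided Chebyshev, Chernoff). Falls through to None when
# all bounds are inconclusive.
import math

def summaries(pdf, beta1, beta2):
    """Each client ships E(Xi), Var(Xi), and ln Mi(beta) for both betas."""
    e = sum(v * p for v, p in pdf)
    var = sum((v - e) ** 2 * p for v, p in pdf)
    ln_m1 = math.log(sum(p * math.exp(beta1 * v) for v, p in pdf))
    ln_m2 = math.log(sum(p * math.exp(beta2 * v) for v, p in pdf))
    return e, var, ln_m1, ln_m2

def prune(all_summaries, gamma, delta, beta1, beta2):
    """Return 'no alarm', 'alarm', or None (bounds inconclusive)."""
    E = sum(s[0] for s in all_summaries)
    V = sum(s[1] for s in all_summaries)           # independence across clients
    L1 = sum(s[2] for s in all_summaries)
    L2 = sum(s[3] for s in all_summaries)
    if E / gamma < delta:                          # Markov: Pr <= E/gamma
        return "no alarm"
    if E < gamma and V / (V + (gamma - E) ** 2) <= delta:    # Chebyshev upper
        return "no alarm"
    if E > gamma and 1 - V / (V + (E - gamma) ** 2) > delta:  # Chebyshev lower
        return "alarm"
    if L1 <= math.log(delta) + beta1 * gamma:      # Chernoff upper (J1)
        return "no alarm"
    if L2 <= math.log(1 - delta) + beta2 * gamma:  # Chernoff lower (J2)
        return "alarm"
    return None                                    # fall back to exact/sampling

X1 = [(70, 0.5), (90, 0.5)]
X2 = [(80, 0.8), (100, 0.2)]
S = [summaries(x, beta1=0.05, beta2=-0.05) for x in (X1, X2)]
print(prune(S, gamma=195, delta=0.7, beta1=0.05, beta2=-0.05))  # "no alarm"
```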
Outline
3 Approximate Methods

When the monitoring instances fail to make a decision, we use ε-sampling methods to estimate Pr[Y > γ] instead of computing it exactly, keeping the same monitoring instance. Plugging the sampling-based estimate into Madaptive, Improved, and Iadaptive yields MadaptiveS, ImprovedS, and IadaptiveS.

Random Distributed ε-Sample (rdεs)

H asks each client for a random sample xi drawn according to the distribution of Xi; the indicator of Ỹ = Σ_{i=1}^g xi > γ is an unbiased estimate of Pr[Y > γ]. Repeating this sampling κ = O((1/ε²) ln(1/φ)) times gives
  Pr[ |Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ε ] ≥ 1 − φ
using O((g/ε²) ln(1/φ)) bytes. (A sampling sketch follows.)
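Here is a minimal sketch of the rdεs estimator. It is a hypothetical illustration of the Monte Carlo estimate, not the paper's communication protocol; the example pdfs are made up.

```python
# Minimal sketch of rdεs: Monte Carlo estimation of Pr[Y > gamma] from
# independent per-client samples.
import math
import random

def sample_once(pdf):
    """Draw one value from a discrete pdf [(value, prob), ...]."""
    values, probs = zip(*pdf)
    return random.choices(values, weights=probs, k=1)[0]

def rdes_estimate(pdfs, gamma, eps, phi):
    """Estimate Pr[sum Xi > gamma] within eps, with probability >= 1 - phi."""
    kappa = math.ceil(math.log(1 / phi) / eps ** 2)  # O((1/eps^2) ln(1/phi)) rounds
    hits = sum(1 for _ in range(kappa)
               if sum(sample_once(pdf) for pdf in pdfs) > gamma)
    return hits / kappa                              # unbiased estimate

X1 = [(70, 0.5), (90, 0.5)]
X2 = [(80, 0.8), (100, 0.2)]
print(rdes_estimate([X1, X2], 165, eps=0.05, phi=0.01))  # close to the true 0.6
```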
Deterministic Distributed ε-Sample (ddεs)

Each client sends κ = O(g/ε) evenly spaced sample points from Xi, where consecutive sample points enclose probability mass ε/g: ∫_{xj}^{xj+1} Pr[Xi = x] dx = ε/g.

[Figure: a pdf of Xi with sample points x1, x2, x3, ..., xκ, each consecutive pair enclosing mass ε/g.]

The server then holds Si = {xi,1, xi,2, ..., xi,κ} for each client ci. The evaluation space for Pr[Ỹ > γ] is O(κ^g); in practice we use O(κ^m) randomly selected evaluations (e.g., m = 2).

ddεs guarantees |Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ε with probability 1, in O(g²/ε) bytes.

A randomized improvement of ddεs (αddεs)

As in ddεs, consecutive sample points enclose probability mass α: ∫_{x_{i,j}}^{x_{i,j+1}} Pr[Xi = x] dx = α. But αddεs first computes x_α, the point where the integral of the pdf first reaches α, and chooses the smallest sample point at random within [the start of the support, x_α]; the remaining points follow at mass steps of α.

[Figure: a pdf of Xi with the randomized first point x1 chosen before x_α, then x2, ..., xκ at mass-α steps.]

αddεs gives Pr[ |Pr[Ỹ > γ] − Pr[Y > γ]| ≤ ε ] > 1 − φ in O((g/ε)·√(2g ln(2/φ))) bytes. (A sketch of the sample-point construction follows.)
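A minimal sketch of the sample-point construction for a discrete pdf is below. Note one simplification: the slides randomize the first sample point within the value x_α where the pdf's integral first reaches α, whereas this sketch randomizes the first probability-mass offset, a similar-in-spirit but not identical choice; names are hypothetical.

```python
# Minimal sketch of ddεs / αddεs quantile-spaced sample points over a
# discrete pdf [(value, prob), ...] sorted by value.
import random

def quantile_points(pdf, alpha, randomized=False):
    """Sample points at probability-mass steps of alpha.

    ddεs:  targets at mass alpha, 2*alpha, 3*alpha, ...
    αddεs: first target at a uniformly random mass offset in [0, alpha],
           then steps of alpha from there (our simplification, see above).
    """
    start = random.uniform(0.0, alpha) if randomized else alpha
    targets, t = [], start
    while t <= 1.0 + 1e-9:
        targets.append(min(t, 1.0))
        t += alpha
    points, acc = [], 0.0
    items = iter(sorted(pdf))
    v, p = next(items)
    for target in targets:
        while acc + p < target - 1e-9:   # advance the CDF toward the target mass
            acc += p
            v, p = next(items)
        points.append(v)
    return points

X = [(70, 0.2), (75, 0.3), (80, 0.3), (90, 0.2)]
print(quantile_points(X, 0.25))                   # ddεs-style: [75, 75, 80, 90]
print(quantile_points(X, 0.25, randomized=True))  # αddεs-style, randomized start
```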
Outline
4 Experiments

Experiment setup

- A Linux machine with an Intel Xeon CPU at 2.13 GHz and 6 GB of memory.
- The GMP library is used to calculate Mi(β).
- Server-to-client communication uses broadcast; client-to-server uses unicast.
- Data sets: real data from the SAMOS project (11.8 million records from the research vessel Wecoma). Each record contains four measurements: wind direction (WD), wind speed (WS), sound speed (SS), and temperature (TEM), yielding four single-attribute probabilistic datasets. Records are grouped every τ consecutive seconds and each group is represented by a pdf.

Default experimental parameters:

  Symbol   Definition                 Default value
  τ        grouping interval          300
  T        number of time instances   3932
  g        number of clients          10
  δ        probability threshold      0.7
  γ        score threshold            30% alarms (230·g for WD)
  κ        sample size per client     30

Results

[Figure: response time (secs, log scale) vs. score threshold γ for Madaptive, Improved, and Iadaptive.]
[Figure: communication cost (number of messages and number of bytes) vs. γ for Madaptive, Improved, and Iadaptive.]
[Figure: precision (axis 0.92–1) and recall (axis 0.95–1) vs. sample size κ for rdεs, ddεs, and αddεs.]
[Figure: performance of all methods (Madaptive/MadaptiveS, Improved/ImprovedS, Iadaptive/IadaptiveS) on WD, WS, SS, and TEM: number of messages, number of bytes, response time, and precision/recall of the sampling-based variants (all above 0.996).]

Outline
5 Conclusion

Conclusion

Future work:
- other aggregation constraints (e.g., max) besides the sum constraint;
- extend our study to the hierarchical model that is often used in sensor networks;
- handle the case when data from different sites are correlated.

The End. Thank You. Q and A.