Analyzing wireless sensor network data under suppression and failure in transmission
Alan E. Gelfand
Institute of Statistics and Decision Sciences, Duke University
(with G. Puggioni, J. Yang, A. Silberstein, K. Munagala)

Before we begin…
• From the perspective of a stochastic modeler
• In fact, a hierarchical modeler working in the Bayesian inference paradigm
• Neophyte with regard to sensor networks
• However, the statistics community has paid little attention to the issues involved in studying these networks
• No optimization!

Outline
• Our niche in the sensor networks world
• Global, local, modeling, computation, and analysis
• Local data collection
• Suppression or transmission; failure; redundancy
• Stochastic implications suggest a focus on probabilistic modeling rather than on algorithms
• A fully model-based approach implies full and exact inference, with uncertainty
• Computational challenges (model fitting)
• Measuring information loss
• An example, some “experiments”
• Future work: getting closer to what we really want

Data Collection
• At the node: multiple sensors per node; local calibration using field data collection
• Collection at high temporal resolution (at what scales?)
• Cost of collection; periods of no collection
• Collection is cheap; transmission is expensive in terms of battery life
• Multivariate data collection

Network Communication
• Here, a very simple version: “spokes on a wheel”; single-hop; nodes to gateway (and back); no node-to-node communication

Model Building Plan
• Single node: suppression only, failure only, both
• Two nodes: suppression only, failure only, both
• Network of nodes with spatial modeling: suppression only, failure only, both

Suppression
• Temporal suppression only here
• The basic idea: at high temporal resolution, data at a given node are expected to change little from one time point to the next
• Transmission is expensive relative to collection, so transmit only when there is a “consequential” change
• Suppression schemes?
  - Based on comparison with the previous observation? With the previously transmitted observation? With a “predicted” value at that time and location?

Suppression cont.
• For location s at time t, Y(s,t) is the collected value
• For continuous data and a specified ε, transmit when, say, |Y(s,t) − Y(s,t_prevtrans)| > ε or |Y(s,t) − Y_est(s,t)| > ε. NOT when |Y(s,t) − Y(s,t−1)| > ε: under that rule a slow drift of less than ε per step would never be transmitted, so the last transmitted value could end up arbitrarily far from the current one
• Choice of ε? Anticipate a high rate of suppression: much more “missing data” than in usual statistical analysis settings
• Again, there is no cross-node communication, so suppression here cannot be based on neighboring values (spatial suppression)

Transmission Failure
• Practical issue: what is a failure? Bit errors, corrupted transmission
• The failure rate varies spatially and seasonally
• It will not be known, so we need models for failure
• Disentangling failure from suppression?
• Redundancy or error-correcting schemes: when transmitting, send both a value and its time, or do this for several previous transmission times (how many?)
• Another idea is to include acknowledgement from the gateway; no acknowledgement implies retransmission
• Suppression or observation after a failure results from comparison with Y(s,t_failure)

Modelling
• Envision an overall process model that is a spatially dependent time series
• Observed data are a noisy version of this process
• In fact, we envision the familiar specification [data | process, parameters] × [process | parameters] × [parameters], with dynamics at the second stage
• Dynamics can be driven by local autoregressive models (with drift), by local discretized continuous-time models, or by local differential equations
• The local models are connected in space by spatially colored noise at the second stage and, more generally, by spatially associated model parameters

Inference
• Global and local parameters
• Which model parameters vary spatially?
• Temporally evolving parameters reflecting seasonality
• Interest in reconstructing the local time series (not through piecewise interpolation schemes: we want a full model and full inference under that model)
• Again, full inference in terms of posterior distributions
• Global model fitting: an offline activity at the server; at what temporal scale?
• With regard to local computation: communicate parameter estimates to the nodes for local suppression?

Model fitting
• Offline computation
• Bayesian hierarchical spatio-temporal model
• Fitted using Gibbs sampling
• Currently no local modeling; just comparison with the previous transmission (failed or not)

[Slides “Some details”, “Details cont.”, and four further “Cont.” slides, followed by “Dynamic model version”: equation content not recoverable from this text]

An Example
• An AR(1) model
• Known drift (as, say, precipitation input for a soil moisture model)
• Drift measured at the gateway but assumed applicable to all nodes
• The only parameters are the autoregressive coefficient and the process variance
• Suppression threshold ε known; failure rate not modeled
• Experiments: using or not using (i) suppression/failure information; (ii) redundancy

[Figures not recoverable from this text; panel titles:]
- Single missing value, known endpoints, no other information
- Single missing value, known endpoints, missing value is a known failure
- Single missing value, known endpoints, missing value is a known suppression
- String of five missing values, known endpoints, no other information on the missing values
- String of five missing values, known endpoints, all missing values known to be suppressions
- Joint density, adjacent missing values, no other information
- Joint density, adjacent missing values, missing values known to be suppressions

Comments
• Anticipate a high rate of suppression
• Failure should not “dominate” suppression, or else we should not suppress
• Failure rate model reflecting space and time
• We have not viewed lowering the failure rate as an option

Information loss
• For process parameters:
  - Kullback-Leibler distance between the full-data posterior and the “partial”-data posterior
  - Kullback-Leibler distance comparing different ε’s
  - Length of a fixed-coverage credible interval
  - Coverage probability of a symmetric (about the point estimate) fixed-length interval
• For sequence reconstruction: a predictive mean square error criterion

Information loss cont.
• Priority on process parameter inference or on sequence reconstruction?
• Cost vs. information loss trade-off
• Utility function with cost linear in transmission
• No “offline” cost associated with computation, e.g., using or ignoring suppression/failure information

Future Work
• Parameters changing over time
• Node-to-node communication
• Multi-hop transmission
• Multivariate local data collection
• Local, non-network data collection for calibration and fusion
• Good approximations for settings with high suppression rates and high failure rates
• All moving toward modeling for an environmental observation network
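The suppression and failure mechanism described in the Suppression and Transmission Failure slides can be made concrete with a small single-node simulation. What follows is a minimal Python sketch under stated assumptions, not the authors' implementation. The assumptions are: the AR(1)-with-known-drift form Y_t = φ·Y_{t−1} + d_t + η_t with η_t ~ N(0, σ²); independent failures with a fixed probability; and the hypothetical names `Node`, `simulate`, `redundancy`, and `p_fail`. Following the slides, the node compares each reading with its last *attempted* transmission. Because there is no acknowledgement, the reference value after an undetected failure is Y(s,t_failure). With redundancy, each packet also carries the most recent earlier (time, value) pairs.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Pair = Tuple[int, float]  # (time index, value)


@dataclass
class Node:
    """Hypothetical node: suppresses unless the reading differs by more than eps
    from its last *attempted* transmission (no acknowledgement, so a failed
    send still becomes the reference value)."""
    eps: float                      # suppression threshold (epsilon)
    redundancy: int = 1             # recent (time, value) pairs per packet; assumed >= 1
    last_sent: Optional[float] = None
    recent: List[Pair] = field(default_factory=list)

    def step(self, t: int, y: float) -> Optional[List[Pair]]:
        """Return the packet to send at time t, or None if the value is suppressed."""
        if self.last_sent is not None and abs(y - self.last_sent) <= self.eps:
            return None             # change is not "consequential": suppress
        self.last_sent = y          # reference updates whether or not the send succeeds
        self.recent = (self.recent + [(t, y)])[-self.redundancy:]
        return list(self.recent)


def simulate(n: int, phi: float, sigma: float, drift: List[float], eps: float,
             p_fail: float, redundancy: int = 1, seed: int = 0):
    """Simulate one node under an assumed AR(1)-with-known-drift process
    Y_t = phi * Y_{t-1} + d_t + eta_t, eta_t ~ N(0, sigma^2), with
    independent transmission failures of probability p_fail (an assumption)."""
    rng = random.Random(seed)
    node = Node(eps=eps, redundancy=redundancy)
    y = 0.0
    truth: List[float] = []
    status: List[str] = []            # "T" received, "S" suppressed, "F" failed
    received: Dict[int, float] = {}   # what the gateway actually holds
    for t in range(n):
        y = phi * y + drift[t] + rng.gauss(0.0, sigma)
        truth.append(y)
        packet = node.step(t, y)
        if packet is None:
            status.append("S")
        elif rng.random() < p_fail:
            status.append("F")        # packet lost; the node is never told
        else:
            status.append("T")
            received.update(packet)   # redundant pairs can recover earlier failures
    return truth, status, received


if __name__ == "__main__":
    n = 1000
    drift = [0.05] * n                # illustrative constant drift
    for k in (1, 3):
        truth, status, received = simulate(n, 0.9, 0.3, drift, eps=0.5,
                                           p_fail=0.2, redundancy=k, seed=1)
        failed = [t for t, s in enumerate(status) if s == "F"]
        recovered = sum(t in received for t in failed)
        print(f"redundancy={k}: suppressed={status.count('S')}, "
              f"failed={len(failed)}, failures recovered={recovered}")
```

For each redundancy level, the script prints how many values were suppressed, how many transmissions failed, and how many failed values the gateway still recovered from later packets. Without the status labels, the gateway sees a suppression and an unrecovered failure identically, as a gap. This is the disentangling problem raised in the Transmission Failure slide.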