Analyzing wireless sensor network data under suppression and failure in transmission

Alan E. Gelfand
Institute of Statistics and Decision Sciences
Duke University
(with G. Puggioni, J. Yang, A. Silberstein, K. Munagala)
Before we begin…
• From the perspective of a stochastic
modeler
• In fact, a hierarchical modeler working in
the Bayesian inference paradigm
• Neophyte with regard to sensor networks
• However, not much attention in the
statistics community to issues involved in
studying these networks
• No optimization!
Outline
• Our niche in the sensor networks world
• Global, local, modeling, computation, and analysis
• Local data collection
• Suppression or transmission; failure; redundancy
• Stochastic implications suggest focus on probabilistic
modeling rather than on algorithms
• Fully model-based approach implies full and exact
inference with uncertainty
• Computational challenges (model fitting)
• Measuring information loss
• An example, some “experiments”
• Future work; getting closer to what we really want
Data Collection
• At the node; multiple sensors per node;
local calibration using field data collection
• Collection at high temporal resolution
(scales?)
• Cost of collection; periods of no collection
• Collection is cheap, transmission is
expensive in terms of battery life
• Multivariate data collection
Network Communication
• Here a very simple version – “spokes on a
wheel”; single-hop; nodes to gateway (and
back); no node to node communication
Model Building Plan
• Single node, suppression only, failure only,
both
• Two nodes, suppression only, failure only,
both
• Network of nodes, spatial modeling,
suppression only, failure only, both
Suppression
• Temporal suppression only here
• The basic idea: at high temporal resolution, at a
given node, data expected to change little from
time point to time point
• Transmission is expensive relative to collection
so only transmit given a “consequential” change
• Suppression schemes? based upon comparison
with previous observation? with previously
transmitted observation? with a “predicted” value
at that time and location?
Suppression cont.
• For location s at time t, Y(s,t) is collected value
• For continuous data and a specified ε, consider,
say, |Y(s,t) – Y(s,t_prevtrans)| > ε or |Y(s,t) – Y_est(s,t)|
> ε. NOT |Y(s,t) – Y(s,t–1)| > ε.
• Choice of ε? Anticipate a high rate of
suppression. Much more “missing data” than in
usual statistical analysis settings
• Again, no cross-node communication, so here
suppression cannot be based on neighboring
values (spatial suppression)
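The previous-transmission rule above can be sketched in a few lines. This is a minimal illustration, not the node logic used in the talk: the function name `suppress`, the toy readings, and the threshold ε = 0.5 are assumptions made for the example, and transmission failure is ignored.

```python
from typing import List, Optional


def suppress(readings: List[float], eps: float) -> List[Optional[float]]:
    """Apply temporal suppression against the last transmitted value.

    Returns the stream seen by the gateway: the reading where the node
    transmits, None where the node suppresses.
    """
    stream: List[Optional[float]] = []
    last_sent: Optional[float] = None
    for y in readings:
        # Transmit the first reading, and any reading that differs from
        # the last transmitted value by more than eps.
        if last_sent is None or abs(y - last_sent) > eps:
            stream.append(y)
            last_sent = y
        else:
            stream.append(None)
    return stream


# Toy readings with a slow upward drift (steps of 0.2); hypothetical data.
readings = [10.0, 10.2, 10.4, 10.6, 10.8, 11.0]
stream = suppress(readings, eps=0.5)
```

Under this rule, and in the absence of failures, the gateway knows that every suppressed reading lies within ε of the last transmitted value. Comparing with the immediately preceding reading instead would never trigger a transmission for these data, since each step is smaller than ε even though the total change is not.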
Transmission Failure
• Practical issue, what is a failure - bit errors, corrupted
transmission
• Rate varies spatially, varies seasonally
• Will not be known - so models for failure
• Disentangling failure from suppression?
• Redundancy or error-correcting schemes - when
transmitting, transmit both a value and the time or do this
for several previous transmission times (how many?)
• Another idea is to include acknowledgement from
gateway; no acknowledgement implies retransmission.
• After a failure, suppression or transmission of later observations
results from comparison with Y(s,t_failure), the value whose transmission failed.
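The redundancy idea (send the current value together with the times and values of several previous transmissions) can be sketched as follows. This is an illustration under stated assumptions: the names `build_packets` and `gateway_view`, the packet layout, and the redundancy depth `k` are hypothetical, and a failure is treated as the loss of a whole packet.

```python
from typing import Dict, List, Set, Tuple

# A packet lists (time, value) pairs: the current transmission first,
# followed by up to k earlier transmissions.
Packet = List[Tuple[int, float]]


def build_packets(sent: Dict[int, float], k: int) -> Dict[int, Packet]:
    """Attach the k previous transmissions to each transmitted value."""
    times = sorted(sent)
    packets: Dict[int, Packet] = {}
    for i, t in enumerate(times):
        history = times[max(0, i - k): i + 1]
        packets[t] = [(u, sent[u]) for u in reversed(history)]
    return packets


def gateway_view(packets: Dict[int, Packet], failed: Set[int]) -> Dict[int, float]:
    """Collect every (time, value) pair the gateway can recover."""
    known: Dict[int, float] = {}
    for t, packet in packets.items():
        if t in failed:
            continue  # this packet never arrived
        for u, value in packet:
            known[u] = value
    return known


# Hypothetical transmissions at times 0, 3, 7 and 9.
sent = {0: 10.0, 3: 10.6, 7: 11.2, 9: 11.9}
packets = build_packets(sent, k=1)
one_failure = gateway_view(packets, failed={3})
two_failures = gateway_view(packets, failed={3, 7})
```

With `k=1`, a single lost packet is recovered from the next successful one, and the recovered transmission times also tell the gateway that the gaps in between were suppressions rather than failures. Two consecutive losses exceed the redundancy depth, so the value sent at time 3 stays missing.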
Modelling
• Envision an overall process model that is a spatially
dependent time series.
• Observed data are a noisy version of this process
• In fact, we envision the familiar specification,
[data|process, parameters] × [process|parameters] ×
[parameters] with dynamics at the second stage
• Dynamics can be driven by local autoregressive models
(with drift), by local discretized continuous time models,
by local differential equations
• They are connected up in space by spatially colored
noise at the second stage and, more generally, by
spatially associated model parameters
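One possible instance of the three-stage specification, written out for concreteness. The symbols below (X for the latent process, μ for the drift, φ for the autoregressive coefficient, τ² and σ² for the variances, ρ for the spatial correlation function) are assumptions introduced here, not notation taken from the slides.

```latex
\begin{align*}
\text{data:}\quad
  & Y(s,t) = X(s,t) + e(s,t),
  & e(s,t) &\sim N(0,\tau^2) \\
\text{process:}\quad
  & X(s,t) = \mu(s,t) + \phi(s)\,X(s,t-1) + \eta(s,t),
  & \eta(\cdot,t) &\sim \mathrm{GP}\bigl(0,\ \sigma^2\rho(s-s')\bigr) \\
\text{parameters:}\quad
  & \theta = \bigl(\phi(\cdot),\ \sigma^2,\ \tau^2,\ \text{parameters of } \rho\bigr) \sim [\theta]
\end{align*}
```

Here the autoregression with drift supplies the local dynamics, the spatially correlated innovations η connect the nodes, and letting φ vary with s is one way to obtain spatially associated model parameters.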
Inference
• Global and local parameters
• Which model parameters vary spatially?
• Temporally evolving parameters reflecting seasonality
• Interest in reconstruction of the local time series (but not
interested in piecewise interpolation schemes – want full
model and full inference under the model)
• Again, full inference in terms of posterior distributions
• Global model fitting – offline activity at server, what
temporal scale?
• With regard to local computation, communication of
parameter estimates to nodes for local suppression?
Model fitting
• Offline computation
• Bayesian hierarchical spatio-temporal
model
• Fitted using Gibbs sampling
• Currently, no local modeling; just
comparison with previous transmission
(failure or not)
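To make the Gibbs sampling step concrete, here is a minimal sketch for a single node with complete data, assuming an AR(1) process with known drift, a flat prior on the autoregressive coefficient, and an inverse-gamma prior on the process variance. These priors, the function names, and the simulated data are assumptions for illustration; the model in the talk is spatio-temporal and also treats suppressed and failed values as unknowns, which this sketch does not.

```python
import math
import random
from typing import List, Tuple


def simulate_ar1(n: int, phi: float, sigma: float, drift: List[float],
                 rng: random.Random) -> List[float]:
    """Simulate x[t] = drift[t] + phi * x[t-1] + N(0, sigma^2) noise."""
    x = [0.0]
    for t in range(1, n):
        x.append(drift[t] + phi * x[-1] + rng.gauss(0.0, sigma))
    return x


def gibbs_ar1(x: List[float], drift: List[float], n_iter: int,
              rng: random.Random, a: float = 2.0, b: float = 1.0
              ) -> List[Tuple[float, float]]:
    """Alternate draws of phi and sigma^2 from their full conditionals."""
    lag = x[:-1]
    z = [x[t] - drift[t] for t in range(1, len(x))]  # drift-adjusted values
    sxx = sum(u * u for u in lag)
    sxz = sum(u * v for u, v in zip(lag, z))
    n = len(z)
    sigma2 = 1.0
    draws = []
    for _ in range(n_iter):
        # phi | sigma^2, data is normal under the flat prior.
        phi = rng.gauss(sxz / sxx, math.sqrt(sigma2 / sxx))
        # sigma^2 | phi, data is inverse-gamma: draw the precision from a
        # gamma with shape a + n/2 and rate b + ssr/2, then invert.
        ssr = sum((v - phi * u) ** 2 for u, v in zip(lag, z))
        sigma2 = 1.0 / rng.gammavariate(a + n / 2, 1.0 / (b + ssr / 2))
        draws.append((phi, sigma2))
    return draws


rng = random.Random(0)
drift = [0.1] * 500
x = simulate_ar1(500, phi=0.8, sigma=0.5, drift=drift, rng=rng)
kept = gibbs_ar1(x, drift, n_iter=1200, rng=rng)[200:]  # drop burn-in
phi_mean = sum(p for p, _ in kept) / len(kept)
sigma2_mean = sum(s for _, s in kept) / len(kept)
```

Handling suppression would add one more step to the loop: each unobserved value is drawn from its conditional distribution given its neighbours, restricted to the region that the suppression rule allows.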
Some details
Details cont.
Cont.
Cont.
Cont.
Cont.
Dynamic model version
An Example
• An AR(1) model
• Known drift (as in say precipitation input for a
soil moisture model)
• Drift measured at the gateway but assumed
applicable to all nodes
• Only parameters are autoregressive coefficient
and process variance
• Suppression threshold ε known, failure rate not
modeled
• Experiments - using, not using (i) suppression/failure information; (ii) redundancy
Single missing value, known endpoints, no other
information
Single missing value, known endpoints, missing
value is a known failure
Single missing value, known endpoints, missing
value is a known suppression
String of five missing values, known endpoints, no
other information on missing values
String of five missing values, known endpoints, all
missing values known to be suppressions
Joint density, adjacent missing values, no other
information
Joint density, adjacent missing values, missing
values known to be suppressions
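The plots for these cases are not reproduced here. As a rough sketch of the kind of calculation behind the single-missing-value cases, the code below computes the conditional distribution of one missing value between known endpoints under the AR(1) model, first with no other information and then when the value is known to be a suppression. It assumes no measurement error, that the left endpoint was the last transmitted value, and the previous-transmission rule, so a suppression restricts the missing value to within ε of the left endpoint. The function names and numbers are hypothetical.

```python
import math
from typing import Tuple


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def bridge(x_prev: float, x_next: float, phi: float, sigma: float,
           d_t: float = 0.0, d_next: float = 0.0) -> Tuple[float, float]:
    """Mean and sd of X_t given both neighbours, no other information."""
    mean = ((d_t + phi * x_prev) + phi * (x_next - d_next)) / (1.0 + phi ** 2)
    sd = sigma / math.sqrt(1.0 + phi ** 2)
    return mean, sd


def truncated(mean: float, sd: float, lo: float, hi: float) -> Tuple[float, float]:
    """Mean and sd of a normal(mean, sd) restricted to [lo, hi]."""
    a, b = (lo - mean) / sd, (hi - mean) / sd
    mass = norm_cdf(b) - norm_cdf(a)
    shift = (norm_pdf(a) - norm_pdf(b)) / mass
    var = sd ** 2 * (1.0 + (a * norm_pdf(a) - b * norm_pdf(b)) / mass - shift ** 2)
    return mean + sd * shift, math.sqrt(var)


# Hypothetical values: endpoints 0.0 and 2.0, phi = 0.9, sigma = 1, eps = 0.5.
x_prev, x_next, eps = 0.0, 2.0, 0.5
mean, sd = bridge(x_prev, x_next, phi=0.9, sigma=1.0)
supp_mean, supp_sd = truncated(mean, sd, x_prev - eps, x_prev + eps)
```

Knowing that the value was suppressed pulls the conditional mean into the interval around the left endpoint and shrinks the spread, which is the gain from using suppression information rather than treating the value as simply missing.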
Comments
• Anticipate high rate of suppression
• Failure should not “dominate” suppression
or else we should not suppress
• Failure rate model – reflecting space and
time
• We have not viewed lowering failure rate
as an option
Information loss
• For process parameters:
- Kullback-Leibler distance between full data
posterior and “partial” data posterior
- Kullback-Leibler distance comparing different
values of ε
- Length of fixed coverage credible interval
- Coverage probability of a symmetric (about the
point estimate) fixed length interval
• For sequence reconstruction:
A predictive mean square error criterion
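The criteria above can be sketched as small functions. This is an illustration under assumptions: the Kullback-Leibler distance is computed for normal approximations to the two posteriors, taken in the direction full-data relative to partial-data, and the credible interval is an equal-tailed interval read off posterior draws. The slides do not specify these choices, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def kl_normal(m_full: float, s_full: float, m_part: float, s_part: float) -> float:
    """KL( N(m_full, s_full^2) || N(m_part, s_part^2) )."""
    return (math.log(s_part / s_full)
            + (s_full ** 2 + (m_full - m_part) ** 2) / (2.0 * s_part ** 2)
            - 0.5)


def credible_interval(draws: Sequence[float], coverage: float = 0.95) -> Tuple[float, float]:
    """Equal-tailed credible interval from posterior draws."""
    ordered = sorted(draws)
    tail = (1.0 - coverage) / 2.0
    last = len(ordered) - 1
    return ordered[round(tail * last)], ordered[round((1.0 - tail) * last)]


def interval_length(draws: Sequence[float], coverage: float = 0.95) -> float:
    lo, hi = credible_interval(draws, coverage)
    return hi - lo


def predictive_mse(truth: List[float], predicted: List[float]) -> float:
    """Mean squared error of a reconstructed sequence against the truth."""
    return sum((a - b) ** 2 for a, b in zip(truth, predicted)) / len(truth)
```

For example, `kl_normal(0.0, 1.0, 0.0, 2.0)` measures the loss when the partial-data posterior has the same centre as the full-data posterior but twice the standard deviation.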
Cont.
• Priority on process parameter inference or
on sequence reconstruction
• Cost vs. information loss trade-off
• Utility function with cost linear in
transmission
• No “off-line” cost associated with
computation, e.g., using or ignoring
suppression/failure information
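A utility with cost linear in the number of transmissions can be sketched as below. The specific form (negative reconstruction error minus a per-transmission cost), the cost value, the grid of ε values, and the hold-last-value reconstruction are all assumptions made for illustration. In particular, hold-last-value is only a stand-in for the model-based reconstruction the talk argues for.

```python
import random
from typing import List, Tuple


def transmit_and_hold(x: List[float], eps: float) -> Tuple[int, List[float]]:
    """Return (number of transmissions, hold-last-value reconstruction)."""
    recon: List[float] = []
    last = None
    count = 0
    for v in x:
        if last is None or abs(v - last) > eps:
            last = v
            count += 1
        recon.append(last)
    return count, recon


def utility(x: List[float], eps: float, cost_per_transmission: float) -> float:
    """Negative information loss minus a cost linear in transmissions."""
    count, recon = transmit_and_hold(x, eps)
    mse = sum((a - b) ** 2 for a, b in zip(x, recon)) / len(x)
    return -mse - cost_per_transmission * count


# Simulated AR(1) series; coefficient 0.9 and noise sd 0.3 are hypothetical.
rng = random.Random(1)
x = [0.0]
for _ in range(199):
    x.append(0.9 * x[-1] + rng.gauss(0.0, 0.3))

grid = [0.0, 0.1, 0.2, 0.4, 0.8]
best_eps = max(grid, key=lambda e: utility(x, e, cost_per_transmission=0.001))
```

At ε = 0 every reading is transmitted and nothing is lost, while a very large ε sends only the first reading. The cost per transmission determines where between these extremes the preferred ε falls.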
Future Work
• Parameters changing over time
• Node-to-node communication
• Multi-hop transmission
• Multivariate local data collection
• Local, non-network data collection for
calibration, fusion
• Good approximations for handling high
suppression rate and high failure rate settings
• All moving toward modeling for an
environmental observation network