An Agenda for Information Theory Research in Sensor Networks
Greg Pottie
UCLA EE Department, Center for Embedded Networked Sensing
Pottie@icsl.ucla.edu

Outline
• Introduction
• The Conventional Paradigm
• The Emerging Paradigm
• New Theory Challenges

Introduction
• Much research has focused on sensor networks under one of two alternative assumption sets:
– Memory, processing, and sensing will be cheap, but communications will be dear; thus, in deploying large numbers of sensors, concentrate on algorithms that limit communications but allow large numbers of nodes
– For the sensors to be cheap, even the processing should be limited; thus, in deploying even larger numbers of sensors, concentrate on algorithms that limit both processing and communications
• In either case, compelling theory can be constructed for random deployments with large numbers of nodes and flat architectures

Theory for Dense Flat Networks of Simple Nodes
• Redundant communication pathways given unreliable radios
• Data aggregation and distributed fusion
– Combinations with routing; connections with network rate-distortion coding
• Scalability
• Density/reliability/accuracy trade-offs
• Cooperative communication
• Adaptive fidelity/network lifetime trade-offs

What Applications?
• Early research concentrated on short-term military deployments
– One can imagine that leaving batteries everywhere is at least as acceptable as leaving depleted-uranium bullets, while careful placement/removal might expose personnel to danger
– Detection of vehicles (and even identification of type) and detection of personnel can be accomplished with relatively inexpensive sensors that do not need re-calibration or programming in the field
• The story was plausible…

But Was This Ever Done?
• Military surveillance
– The largest deployment (roughly 1000 nodes) was in fact hierarchical and required careful placement, and there were major issues with radio propagation even on flat terrain
– Vehicles are really easy to detect with aerial assets, and the major problem with personnel is establishing intent, which requires a sequence of images
– Our major problems are not battles but insurgencies, which demand much longer-term monitoring as well as concealment
• Science applications diverge even more in basic requirements
– Scientists want to know precisely where things are; one cannot leave heavy metals behind; many other issues
• Dense networks of simple nodes will still be wanted in some locations, but as one system component

Sampling and Sensor Networks
• The basic goal is to enable new science
– Discover things we do not know now
– Do this at unprecedented scales in remote locations
• This is a data-driven process: measure phenomena, build models, make more measurements, validate or reject models, … and continue
• Spatiotemporal sampling is a fundamental problem in the design of any ENS system
– Spatial: where to measure
– Temporal: how often to measure
• (Nearly) all problems in ENS system design are related to sampling: coverage, deployment, time synchronization, data dissemination, sufficiency to test hypotheses, reliability, …

Adaptive Sampling Strategies
• Over-deploy: focus on scheduling which nodes are on at a given time
• Actuate: work with smaller node densities, but allow nodes to move in response to environmental dynamics
• Our applications are at large scales and highly dynamic, so over-deployment is not an option
– Always undersampled with respect to some phenomenon
– Focus on infrastructure-supported mobility
– Passive supports (tethers, buoyancy)
– Small number of moving nodes
• Will need to extend the limited sets of measurements with models (a toy sketch of one actuated-sampling strategy follows)
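As an illustration of the actuated-sampling idea, the sketch below (a toy example, not any particular CENS algorithm) has a single mobile node fit a simple Gaussian-process model of a scalar field and repeatedly move to the candidate location where the posterior variance is largest. The kernel, noise level, and synthetic field are assumptions chosen for clarity.

```python
# Toy sketch of variance-driven adaptive sampling (illustrative assumptions only).
import numpy as np

def rbf_kernel(a, b, length_scale=2.0, sigma_f=1.0):
    """Squared-exponential covariance between two sets of 1-D locations."""
    d = a[:, None] - b[None, :]
    return sigma_f**2 * np.exp(-0.5 * (d / length_scale) ** 2)

def posterior_variance(x_sampled, candidates, noise_var=0.01):
    """GP posterior variance at candidate locations, given already-sampled locations."""
    K = rbf_kernel(x_sampled, x_sampled) + noise_var * np.eye(len(x_sampled))
    Ks = rbf_kernel(x_sampled, candidates)
    Kss = rbf_kernel(candidates, candidates)
    return np.diag(Kss - Ks.T @ np.linalg.solve(K, Ks))

def field(x):
    """Stand-in for the unknown phenomenon being monitored."""
    return np.sin(0.8 * x) + 0.3 * np.cos(2.1 * x)

rng = np.random.default_rng(0)
candidates = np.linspace(0.0, 20.0, 201)   # locations the mobile node could visit
x_sampled = np.array([0.0, 20.0])          # initial samples at the tether endpoints
y_sampled = field(x_sampled)               # readings drive the posterior mean;
                                           # only locations matter for the variance
for step in range(8):
    var = posterior_variance(x_sampled, candidates)
    x_next = candidates[np.argmax(var)]    # go where the model is most uncertain
    x_sampled = np.append(x_sampled, x_next)
    y_sampled = np.append(y_sampled, field(x_next) + 0.05 * rng.standard_normal())
    print(f"step {step}: sample at x = {x_next:5.2f}, peak variance = {var.max():.3f}")
```

The same greedy rule extends to two dimensions and to multiple mobile nodes; the point is only that a model, rather than a fixed grid, decides where the scarce samples go.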
Evolution to More Intelligent Design
• Early sensor network research focused on resource-constrained nodes and a flat architecture
– High-density deployments with a limited application set
• Many problems with this flat architecture
– Software is nightmarish
– Always undersamples the physical world in some respect
– Logistics are very difficult; usually nodes must be carefully placed, serviced, and removed
• The major constraint in sustained science observations is the sensor
– Biofouling/calibration: the nodes must be serviced
• This drives us towards a tiered architecture that includes mobile nodes
– Many new and exciting theory problems

Some Theory Problems
• Data integrity
– Sufficiency of network components/measurements to trust results
• Model uncertainty
– Effects on deployment density and the number of measurements needed, given uncertainty at different levels
• Multi-scale sensing
– Information flows between levels; appropriate populations at the different levels given the sensing tasks
– Local interactions assume increased importance
• Logistics management
– Energy mules
– Mobile/fixed node trade-offs

Many Models
• Source phenomena
– Discrete sets vs. continuous, coupling to the medium, propagation medium, noise and interference processes
• Sensor transduction
– Coupling to the medium, conversion to an electrical signal, drift and error sources
• Processing abstractions
– Transformation to reduced representations, fusion among diverse sensor types
• System performance
– Reliability of components, time to store/transport data at different levels of abstraction

Much Uncertainty
• Observations (data)
– Noisy, subject to imperfections of signal conversion, interference, etc.
• Model parameters
– Weighting of statistical and deterministic components; selection of model order
• Models
– Particular probability density function family, differential equation set, or, in general, combination of components
• Goals and system interactions
– Goals can shift with time; interactions with the larger system are not always well defined

Model and Data Uncertainty in Sensor Networks
• How much information is required to trust either the data or a model?
• Approach: a multi-level network and corresponding models; evaluation of a sequence of observations/experiments

Data Uncertainty
• Multiple nodes observe a source, exchange reputation information, and then interact with a mobile audit node

Model Uncertainty
• How many nodes must sample a field to determine whether it is caused by one (or more) point sources? (A toy model-selection sketch follows.)
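To make that question concrete, one toy framing is model selection: fit competing one-, two-, and three-source field models to noisy readings from nodes at known positions, and compare the fits with the Bayesian information criterion. The inverse-square decay law, noise level, and use of BIC below are illustrative assumptions, not the analysis used in any particular deployment.

```python
# Toy sketch: how many point sources best explain the sampled field? (illustrative only)
import numpy as np
from scipy.optimize import least_squares

def field(params, xy, n_sources):
    """Superposition of point sources with an assumed 1/(d^2 + eps) intensity decay."""
    total = np.zeros(len(xy))
    for k in range(n_sources):
        sx, sy, amp = params[3 * k: 3 * k + 3]
        d2 = (xy[:, 0] - sx) ** 2 + (xy[:, 1] - sy) ** 2
        total += amp / (d2 + 0.1)
    return total

def fit_bic(xy, z, n_sources, noise_var=0.05**2):
    """Least-squares fit of an n-source model and its BIC score (lower is better)."""
    # Start each candidate source at one of the strongest readings (crude initializer).
    top = xy[np.argsort(z)[-n_sources:]]
    p0 = np.concatenate([np.r_[pt, z.max()] for pt in top])
    res = least_squares(lambda p: field(p, xy, n_sources) - z, p0)
    log_lik = -0.5 * np.sum(res.fun ** 2) / noise_var   # Gaussian noise, up to a constant
    return -2.0 * log_lik + 3 * n_sources * np.log(len(z))

rng = np.random.default_rng(0)
nodes = rng.uniform(0.0, 10.0, size=(25, 2))         # known sensor locations
truth = np.array([3.0, 3.0, 4.0, 7.0, 6.0, 2.5])     # two true sources: (x, y, amplitude)
z = field(truth, nodes, 2) + 0.05 * rng.standard_normal(len(nodes))

for m in (1, 2, 3):
    print(f"{m}-source model: BIC = {fit_bic(nodes, z, m):.1f}")
```

With too few nodes or too much noise the criterion cannot separate the hypotheses, which is precisely the sense in which the required node count is a model-uncertainty question.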
A Few Problems
• Validation (= debugging) is usually very painful
– One part design, 1000 parts testing
– Even the most reliable method never tests exhaustively
• So how can we trust the result, given all the uncertainties?
– Not completely; the design process therefore deliberately minimizes the uncertainties through re-use of trusted components
• But is the resulting modular model/design efficient?
– Fortunately for academics, no; one can always propose a more efficient but untestable design
• Our goal: quantifying this efficiency vs. validation-effort trade-off in model creation for environmental applications

Universal Design Procedure
• Innovate as little as possible to achieve the goals
– Applies to a surprisingly large number of domains of human activity
• Begin with what we know
– E.g., a trusted reference experiment, prior model(s)
• Validate a more efficient procedure
– Exploit prior knowledge to test selected cases
• Bake off the rival designs or hypotheses
– Use your favorite measure of fitness
• Iterate
– The result is usually a composite model with many components

Example: Radio Propagation
• Model from first principles: Maxwell's equations
– Complete description (until we get to the scale of quantum dynamics)
– Economy of principles
– Computationally intractable for large volumes
– Many parameters that must be empirically determined
• Practical approach: hybrid models
– Start with geometric optics (rays + Huygens' principle)
– Add statistical models for unobserved or dynamic factors in the environment
– Choice of statistical models is determined by geometric factors
– Deeper investigation as required, using either extensive observations or occasional solution of Maxwell's equations for sample volumes
– The level of detail in the model depends on the goals

Two-Level Models
• Each level in the hierarchy contains reference experiments
– Trusted, but resource intensive and/or limited to particular scales
• The higher level establishes context
– Selects among a set of models at the lower level corresponding to each context
– Each of these sets contains a reference model/experimental procedure
• This system allows re-use of components
– Limits validation requirements
– Extensible to new environments and scales by adding new modules

Example: Fiat Lux
• Top level: a camera/laser mapper providing context and wider-area coverage
– Directs the placement of PAR sensors to resolve ambiguities due to ground cover
• Modular model construction
– Begin with simple situations: pure geometric factors, calibration of instruments
– Progress to add statistical components: swaying of branches, distributions of leaves/branches at different levels of the canopy, ground cover
• The resulting model is a hybrid combination of:
– Deterministic causal effects
– Partially characterized causes (statistical descriptions)
• The level of detail depends on the goals
– Reconstruction, statistics, or other functions of the observations

Early Experiments
• Sensors with different modes and spatial resolutions, e.g., a PAR sensor and a camera
– The PAR sensor measures local incident intensity
– The camera measures relative reflected intensity, providing better spatial and temporal resolution at the cost of requiring careful calibration
– Analogous to remote sensing on local scales
• A homogeneous screen is placed to create a reflection Er proportional to the incident light Ec; the camera captures the reflection on its CCD
• The image pixel intensity is transformed to Er using the camera's characteristic curve (a sketch of this conversion follows)
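A minimal sketch of that pixel-to-irradiance conversion appears below. The gamma-law form of the camera's characteristic curve and the single calibration constant relating the reflected intensity Er to the incident light Ec are stand-in assumptions; in practice both would be fit from calibration frames against a co-located PAR sensor.

```python
# Minimal sketch of the camera-to-PAR calibration step (illustrative assumptions only).
import numpy as np

def pixel_to_er(pixel, gamma=2.2, full_scale=255.0):
    """Invert an assumed gamma-law characteristic curve: pixel value -> relative Er."""
    return (pixel / full_scale) ** gamma

def er_to_ec(er, k_cal):
    """Scale reflected intensity Er to incident light Ec using a calibration constant
    obtained from a frame where a PAR sensor reads the true incident light."""
    return k_cal * er

# Calibration frame: PAR reading and mean pixel value over the homogeneous screen.
par_reading = 850.0                      # e.g., umol m^-2 s^-1 from the PAR sensor
cal_pixel = 180.0                        # mean screen pixel value in the same frame
k_cal = par_reading / pixel_to_er(cal_pixel)

# Apply to a later image of the screen (synthetic pixel values here).
pixels = np.array([60.0, 120.0, 180.0, 240.0])
for p, ec in zip(pixels, er_to_ec(pixel_to_er(pixels), k_cal)):
    print(f"pixel {p:5.1f} -> estimated incident light {ec:7.1f}")
```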
If 2 Levels Are Good, n Levels Are Even Better!
• Extend the model to include remote sensing: additional levels of "side information" and/or sources for data fusion
• [Figures: daily average temperature (Geostatistical Analyst), aspect and slope (Spatial Analyst), elevation calculated from a contour map, aerial photograph (10.16 cm/pixel), a graph of hourly temperature for June 5, 2004, and 3D images]

Layers and Modules vs. Tabula Rasa Design
• A fresh approach (e.g., "cross-layer design") allows optimization according to particular goals
– Yields efficiency of operation
– But may lack robustness, and requires a much larger validation effort each time new goals or conditions are considered
– The size of the model parameter set can be daunting
• A sequential set of experiments allows management of uncertainty at each step
– Minimizes marginal effort; if each experiment or design in the chain was of interest, the overall effort is (likely) also minimized
– Naturally lends itself to a Bayesian approach; many information theory opportunities
– But carries an overhead in terms of components not required for a given instantiation
• The research goal is quantification of the efficiency/validation trade-off

Conclusion
• Development of multi-layered systems
– Physical phenomena modeled at multiple abstraction layers
– Hardware has many levels, from tags to mobile infrastructure
– Software abstractions and tools in support of this development
– Theoretical study of information flows among these levels
• New and interesting problems arise from real deployments
– Even seemingly simple phenomena, such as light patterns in forests, are amazingly complicated to model
– Approach them through a sequence of related experiments and models