HiPC 2003 Tutorial
System Support for Sensor Networks
Speakers: Sharad Mehrotra, Univ. of California, Irvine; Nalini Venkatasubramanian, Univ. of California, Irvine; Rajesh Gupta, Univ. of California, San Diego
Quasar Group

Acknowledgements (for Slides)
• Nesime Tatbul, Kevin Hoeschele, Anurag Shakti Maskey (AURORA team)
• Jennifer Widom, Rajeev Motwani (STREAM)
• Sam Madden (TinyDB)
• Anantha Chandrakasan (MIT uAMPS team)
• Qi Han, Iosif Lazaridis, Xingbo Yu (QUASAR team)
• Srini Seshan (IrisNet)
• Slides for tutorial available at
  – http://www.ics.uci.edu/~quasar/tutorial/hipc.ppt

Various Sensor Applications
• Habitat monitoring
• Battlefield monitoring
• Earthquake monitoring
• Medical condition monitoring
• Video surveillance
• Oceanographic current monitoring
• Intrusion detection
• Target tracking & detection
• Traffic congestion detection

Taxonomy of Applications (1)
• Data access needs of applications
  – Historical data
    • Analysis to better understand the physical world
  – Current data
    • Monitoring and control to optimize the processes that drive the physical world
  – Future data
    • Forecasting trends in data for decision making

Taxonomy of Applications (2)
• Predictability of data access
  – Fixed
    • Data access needs of applications are known a priori
  – Unpredictable (ad hoc)
    • Data access needs of applications are not known at any instant of time
  – Predictable (continuous)
    • Data access needs of applications can be predicted for some time into the future with high probability

Application Landscape
(Temporal property of data accessed vs. predictability of data access; within each row, the examples run from no knowledge through some knowledge to full knowledge of data access.)
• Future
  – "Predict noise levels around the airport if runway 2 becomes operational."
  – "I'm going surfing on Sep. 30! Will it be windy?"
  – "Each evening at 8pm, predict the temperature for the next 5 days."
• Present
  – "Visualize current humidity with Mrs. Doe's new interpolation scheme."
  – "How much snow is there in Aspen?"
  – "Notify me immediately when there is a forest fire."
• Past
  – "Is Mr. Doe's newly proposed weather model accurate for 1996-2000?"
  – "Did the temperature rise above 40°C in the last year?"
  – "Every month, calculate the average humidity in California for the last 30 days."

Basic Architecture of Sensor Nodes [figure]

Sensor Properties – Different Capabilities
• Storage
  – Built-in memory
• Sensing
• Computing
  – Microprocessor or microcontroller
• Communication
  – Short-range radio for wireless communication

Sensor Properties – Resource Constraints
• Shorter transmission distances (< 10 m)
• Lower bit rates (typically < kbps)
• Limited battery capacity

Radio mode | Power consumption (mW)
Transmit | 14.88
Receive | 12.50
Idle | 12.36
Sleep | 0.016

Sensor Devices Today
• A series of sensor nodes has been developed
• MIT uAMPS
  – 59 MHz to 206 MHz processor
  – 2 radios, capable of transmitting at 1 Mbps
  – 4 KB RAM
• Berkeley Mica motes
  – 8-bit, 4 MHz processor
  – 40 kbps CSMA radio
  – 4 KB RAM
  – TinyOS based

Sensor OS Concepts
• Very lean multithreading
• Efficient layering
• Constrained storage model
  – Frame per component, shared stack, no heap
• Constrained scheduling
  – Event-based(?)
[Figure: TinyOS messaging component with internal state and an internal thread. Commands: init, power(mode), send_msg(addr, type, data), TX_packet(buf). Events: msg_rec(type, data), msg_send_done, TX_packet_done(success), RX_packet_done(buffer).]

Sensor Network Properties
• Restricted resources
• Frequent topology changes and network partitions
• Prone to failure
• Small-scale sensor nodes
• Fixed vs. mobile sensor grids
• Infrastructure-based vs. ad hoc communication
• Environmental influence
• Unattended operation
• Node mobility
• Depleted battery
• Concurrency issues
• Dense deployment in large numbers
• Heterogeneity issues
• Scalability issues

Controversies with Sensor Networks
• How is this different from mobile ubiquitous computing?
• Network-centric vs. edge-centric architecture?
  – Passive sensors vs. smart sensors
• A new class of algorithms?
  – Traditional deterministic vs. probabilistic vs. epidemic

Wireless Networked Embedded Systems Characteristics
• Wireless
  – Limited bandwidth, high latency (3 ms-100 ms)
  – Variable link quality and link asymmetry due to noise, interference, disconnections
  – Easier snooping
  --> need for more signal and protocol processing
• Mobility
  – Causes variability in system design parameters: connectivity, bandwidth, security domains, location awareness
  --> need for more protocol processing
• Portability
  – Limited capacities (battery, CPU, I/O, storage, dimensions)
  --> need for energy-efficient signal and protocol processing

Capacity of Wireless Sensor Networks
• Sensor networks
  – Nodes can sense (actuate), compute, communicate
    • At the next level, these nodes and networks can infer, track, correlate and correspond
  – When such nodes can be composed, the application possibilities can be wildly imaginative
    • Highly intelligent real-time distributed systems
• However, there are fundamental limits to scaling that have to do with the ad hoc nature of such networks
  – Nodes build links and communicate (including relaying, setup and discovery) without central control

Communication in Sensor Networks
• Questions we seek to answer
  – How much information can wireless sensor networks transport?
    • What can be done to maximize this transport?
  – What is the right power level for transport?
    • Where is this control (best) exercised?
  – What is the appropriate network configuration?
    • Direct communication (single-hop)
    • Multi-hop communication
      – Directed diffusion, LAR, GF
    • Cluster-based communication
      – LEACH

Challenges for Sensor Networks
• Services for localization, discovery, storage, agreement
• Integration of communication and application-specific data processing
• Automatic configuration & error handling
• Injection of application knowledge into sensor network infrastructure
• Quality of data/service guarantees under resource constraints
• Time & location management

Projects on Sensor Networks
• Areas: sensor OS, network-related, SensIT, NEST, stabilization, QoS in surveillance and control
• Groups: ISI, UCLA, USC, Ohio State, Univ. of Iowa, Michigan State Univ., UT-Arlington, Kent State Univ., UC-Berkeley, MIT, Duke Univ., Univ. of Hawaii, Univ. of Wisconsin, Northwestern Univ., Penn State Univ., Auburn Univ., UIUC, Univ. of Virginia, CMU, Xerox
• Named projects: muOS (MIT), WebDust (Rutgers), Quasar (UC-Irvine), SmartDust (UC-Berkeley), TinyDB (UC-Berkeley), Aurora (Brown, MIT, Brandeis Univ.), Cougar (Cornell)

What are the Choices?
• Sensor networks vs. wireless networks
• Specialized infrastructure vs. COTS infrastructure
• Smart sensors vs. passive sensors
• Probabilistic guarantees vs. deterministic solutions

This Tutorial – Systems Perspective
• Layered approach
  – Device level
    • Challenges in design of sensor devices and OSs
  – Distributed sensor networks
    • Challenges in managing large networks of sensors to meet application requirements
  – Sensor database management
    • Challenges in query processing over sensor networks

Design of Sensor Nodes
• Sensor node components
  – Computation/communication tradeoff
• Energy management within a sensor
  – Computation/communication tradeoff
• Power-aware OS design for sensors

Distributed Computing Infrastructure for Sensors
• Designing distributed sensor architectures
  – Server-oriented: data migrates to the server from sensors
    • Store or not store (stream)?
    • When should data migrate?
    • How should data migrate: in its original raw form or in some aggregated form?
  – Distributed approach
    • Data does not migrate; requests/queries migrate
    • TinyDB approach, Dimensions approach
• Designing middleware support for sensor networks
  – Energy efficiency
  – Real-time
  – Fault tolerance

Query Processing in Sensor Networks
• Query processing over sensor databases
  – Taxonomy of queries
    • Lifetime queries, aggregation queries, approximate queries, set-based queries
  – Where do queries arise?
    • At the server, or fully distributed at any node
  – Query semantics
    • What does a query mean? Exact semantics are not very clear.
  – Query processing techniques
    • Answering approximate queries over approximate representations
    • Answering queries in the network
    • Distributed query answering
    • Data stream processing & dynamic data

Design Issues in Sensor Devices
HiPC 2003, Hyderabad, India

Energy Availability
[Figure: improvement relative to year 0 (1x-16x) vs. time (0-6 years). Source: J. Rabaey, BWRC]
• Energy availability growth is limited to 2-3% per year
• Need to be energy efficient at all levels

Computational Efficiency
• Speed-power efficiency has indeed gone up
  – 10x / 2.5 years for µPs and DSPs in the 1990s
    • From 100 mW/MIP to 1 mW/MIP since 1990
  – IC processes have provided 10x / 8 years since 1965
  – The rest comes from power-conscious IC design in recent years
• Lower power for a given function & performance

Processor | MHz | Year | SPECint-95 | Watts
P54VRT (Mobile) | 150 | 1996 | 4.6 | 3.8
P55VRT (Mobile MMX) | 233 | 1997 | 7.1 | 3.9
PowerPC 603e | 300 | 1997 | 7.4 | 3.5
PowerPC 604e | 350 | 1997 | 14.6 | 8
PowerPC 740 (G3) | 300 | 1998 | 12.2 | 3.4
PowerPC 750 (G3) | 300 | 1998 | 14 | 3.4
Mobile Celeron | 333 | 1999 | 13.1 | 8.6

• However, circuit gains are nearing a plateau
  – Circuit tricks & voltage scaling provided a large part of the gains
    • while energy needs

Efficiency in Communications
• Power efficiency (or energy efficiency): η_P = E_b/N_0
  – Ratio of signal energy per bit to noise power spectral density required at the receiver for a certain BER
  – High power efficiency requires a low E_b/N_0 for a given BER
• Bandwidth efficiency: η_B = bit rate / bandwidth = R_b/W (bps/Hz)
  – Ratio of throughput data rate to bandwidth occupied by the modulated signal (typically ranges from 0.33 to 5)
• Often a trade-off between the two

Communication vs. Computation
• Computation cost (2004 projected): 60 pJ/op
• Minimum thermal energy for communications:
  – 20 nJ/bit @ 1.5 GHz for 100 m
    • Equivalent of 300 ops
  – 2 pJ/bit @ 1.5 GHz for 10 m
    • Equivalent of 0.03 ops
--> Significant processing versus communication tradeoff
Source: J. Rabaey, BWRC

The Need
• Power consumption and energy efficiency are system-level design concerns
  – Efficiency in computation, communication and networking subsystems
• The energy/power tradeoffs cut across
  – All system layers: circuit, architecture, software, algorithms
  – Need to choose the right metric
• Power awareness goes beyond low-power concerns

Where Does the Power Go?
[Figure: power consumers in a wireless node.
 – Processing: programmable µPs & DSPs, ASICs (apps, protocols etc.), memory
 – Peripherals: disk, display
 – Power supply: battery, DC-DC converter
 – Communication: radio modem, baseband DSP, RF transceiver; signaling protocols, choice of modulation, TX/RX architecture, RF/IF circuits]

Example 1: Power Measurements on Rockwell WINS Node
Capabilities: vibration, acoustic, accelerometer, magnetometer, temperature sensing

Processor | Seismic sensor | Radio | Power (mW)
Active | On | Rx | 751.6
Active | On | Idle | 727.5
Active | On | Sleep | 416.3
Active | On | Removed | 383.3
Active | Removed | Removed | 360.0
Active | On | Tx (36.3 mW) | 1080.5
Active | On | Tx (27.5 mW) | 1033.3
Active | On | Tx (19.1 mW) | 986.0
Active | On | Tx (13.8 mW) | 942.6
Active | On | Tx (10.0 mW) | 910.9
Active | On | Tx (3.47 mW) | 815.5
Active | On | Tx (2.51 mW) | 807.5
Active | On | Tx (1.78 mW) | 799.5
Active | On | Tx (1.32 mW) | 791.5
Active | On | Tx (0.955 mW) | 787.5
Active | On | Tx (0.437 mW) | 775.5
Active | On | Tx (0.302 mW) | 773.9
Active | On | Tx (0.229 mW) | 772.7
Active | On | Tx (0.158 mW) | 771.5
Active | On | Tx (0.117 mW) | 771.1

Summary
• Processor = 360 mW
  – Doing repeated transmit/receive
• Sensor = 23 mW
• Processor : Tx
[Figure: breakdown of node power among the communication subsystem (radio modem), microcontroller/CPU, sensor, GPS, and rest of the node]

Power Consumption Notables
• Differences between radio "sleep" and "shutdown" can be significant
  – Need power management strategies at the module/subsystem level
• Generally, RX power is less than TX power
• However, as TX gets to lower power modes, under some circumstances it may be less than RX power
  – Particularly true in "sensor"-type nodes
  – Need protocols that minimize the listening needed
  – Need very low power "paging" channels for wakeup
• Processing can be a significant fraction of total power (30-50%)

Metrics for Power
• Absolute power (mW)
  – Sets battery life in hours
  – Problem: power ∝ frequency, so simply slowing the system makes it look better
• µW/MHz
  – Average energy per clock cycle
• Energy per operation
  – Fixes the obvious problem with the power metric
  – But one can cheat by doing things that slow the chip
  – Energy/op = Power × Delay/op
• The metric should capture both energy and performance, e.g., Energy/op × Delay/op
  – Energy × Delay = Power × (Delay/op)^2
• Therefore:
  – µW/MIPS: average energy per instruction
  – µW/MIPS^2: normalizes µW/MIPS with the architectural performance
    • Useful for comparing architectures for power efficiency

Node-Level Power Management
• Choices: hardware, firmware, OS, application, users
• Hardware & firmware
  – Don't know the global state or application-specific knowledge
• Users
  – Don't know component characteristics, and can't make frequent decisions
• Applications
  – Operate independently, and the OS hides machine information from them
• The OS is a reasonable place, but...
  – The OS should incorporate application information in power management
  – The OS should expose power states and events to applications so they can adapt

Operating-System-Directed Power Management
• Significant opportunities in power management lie with application-specific "knobs"
  – Quality of service, timing criticality of various functions
• The OS plays an important role in allocating and sharing critical resources
  – It is a logical place for dynamic power management
  – Application-specific constraints and opportunities for saving energy can be known only at that level
• Needs of applications are the driving force for OS power management functions & a power-based API
  – Collaboration between applications and the OS in setting "energy use policy"
• The OS helps resolve conflicts and promote cooperation

Slowdown by Reducing Supply Voltage – Dynamic Voltage Scaling
• Reduction in supply voltage reduces speed
• Reduce supply voltage when
  – Slower speed can be tolerated
  – Or use architectural techniques to combat slow operation
    • e.g., concurrency, pipelining via compiler techniques

Shutdown for Energy Saving
• Shutdown is attractive for many wireless applications due to the low duty cycle of many subsystems
  – [Timeline: a subsystem alternates between Blocked ("Off", duration Tblock) and Active ("On", duration Tactive)]
  – Ideal improvement = 1 + Tblock/Tactive
• Issues:
  – Cost of restarting: latency vs. power trade-off
    • Increase in latency (response time)
    • Increase in power consumption due to startup
  – When to shut down:
    • Optimal vs. idle-time threshold vs. predictive
  – When to wake up:
    • Optimal vs. on-demand vs. predictive
• Two main approaches (reactive versus predictive)
  – "Go to reduced power mode after the user has been idle for a few seconds/minutes, and restart on demand"
  – "Use computation history to predict whether Tblock[i] is large enough (Tblock[i] > Tcost)"

To Shutdown or Reduce Voltage?
• Observation:
  – For digital logic, it is better to lower the voltage than to shut down
• Example: a task with a 100 ms deadline that requires 50 ms of CPU time at full speed
  – A normal system gives 50 ms computation, 50 ms idle/stopped time
  – A half-speed/half-voltage system gives 100 ms computation, 0 ms idle
  – Same number of CPU cycles, but energy drops to 1/4 (energy per cycle scales with V^2)
• Voltage is dictated by the tightest (critical) timing constraint on both throughput and latency --> dynamically change voltage
  – Use voltage to control the operating point on the power vs. speed curve
    • i.e., power and clock frequency are functions of voltage
  – The main challenge here is algorithmic:
    • One has to schedule the voltage variation as well!
    • Via compiler, OS, or hardware

Current OSPM: ACPI
• Advanced Configuration and Power Interface (ACPI)
  – OS-visible (SCI-based), as opposed to OS-invisible (SMI-based)
  – OS/drivers/BIOS are in sync regarding power states
• Standard way for the system to describe its device configuration & power control hardware interface to the OS
  – Register interface for common functions
    • System control events, processor power and clock control, thermal management, and resume handling
• Info on devices, resources, & control mechanisms
  – Description tables, linked in a "table of tables"
  – Description data for each device:
    • Power management capabilities and requirements
    • Methods for setting and getting the power state
    • Hardware resource settings
    • Methods for setting hardware resources

New Power-Aware Interfaces Required
• Provide ways by which application, operating system and hardware can exchange energy/power and performance-related information efficiently
• Facilitate the continuous dialogue / adaptation between OS and applications
• Facilitate the implementation of power-aware OS services by providing a software interface to low-power devices
  – A power-aware API to the end user that enables one to implement energy-efficient RTOS services and applications

Power-Aware API
The applications interface provides the following services:
• The application is able to
  – Tell RT information to the OS (period, deadlines, WCET, hardness)
  – Create new threads
  – Tell the OS the time predicted to finish a given task instance
    • Depending on the conditions of the environment (application dependent and not yet implemented)
• The OS must be able to predict and tell applications the time estimated to finish the task
  – Depends on the scheduling scheme used
• A hard task must be killed if its deadline is missed

Power Management in Communication Subsystems
[Figure: power management at the OS/middleware/application level spans both subsystems.
 – Computation subsystem: e.g., dynamic voltage/frequency scaling, power-aware task scheduling
 – Communication subsystem: e.g., modulation, coding, power-aware packet scheduling]

TinyOS Concepts
• Scheduler + graph of components
  – Constrained two-level scheduling model: threads + events
• Component:
  – Commands
  – Event handlers
  – Frame (storage)
  – Tasks (concurrency)
• Constrained storage model
  – Frame per component, shared stack, no heap
• Very lean multithreading
• Efficient layering
[Figure: messaging component with internal state and an internal thread. Commands: init, power(mode), send_msg(addr, type, data), TX_packet(buf). Events: msg_rec(type, data), msg_send_done, TX_packet_done(success), RX_packet_done(buffer).]

Application = Graph of Components
[Figure: component graph from application to hardware: application (route map, router, sensor application) --> Active Messages --> packet layer (Radio Packet, Serial Packet) --> byte layer (Radio byte, UART) --> bit layer (RFM); sensors (Temp, photo) via ADC; clocks; software (SW) and hardware (HW) layers.]
• Example: ad hoc, multi-hop routing of photo sensor readings
  – 3450 B code, 226 B data
• Graph of cooperating state machines on a shared stack

Part 2: Distributed Computing Infrastructure for Sensor Applications
**Supported in part by a collaborative NSF ITR grant entitled "Real-time data capture, analysis, and querying of dynamic spatio-temporal events" in collaboration with UCLA, U. Maryland, U. Chicago

Managing Distributed Sensor Infrastructures
• A data collection and management middleware infrastructure that
  – Provides seamless access to data dispersed across a hierarchy of sensors, servers, and archives
  – Supports multiple concurrent applications of diverse types
  – Adapts to changing application needs
• Fundamental issues:
  – Where to store data?
    • Do not store; at the producers; at the servers
  – Where to compute?
    • At the client, server, data producers

Outline of This Section
• Sensor network architectures
• Sensor application needs
  – Accuracy, timeliness, cost, reliability
• Tasks of a middleware framework
  – Services that can be customized to address needs
• Case studies
  – Accuracy/cost tradeoffs in collection
  – Accuracy/cost/timeliness tradeoffs in collection
  – Storage/accuracy tradeoffs in archival

Architectural Configurations
• Server-centric
• Streams
• Hierarchical
• Distributed

Sensor Network Architectures – 1: Server-Centric
[Figure: data producers --> server --> client; data/query requests flow toward the producers, data/query results flow back.]
• Traditional data management
  – Client-server architecture
  – Efficient approaches to data storage & querying
  – Query shipping versus data shipping
  – Data changes with explicit updates
• Limitations
  – Sensors generate continuously changing data
    • Producers must be considered "first-class" entities
  – Does not exploit the storage, processing, and communication capabilities of sensors

Sensor Network Architectures – 2: Streams
[Figure: data streams --> stream processing engine (with a synopsis in memory) evaluating continuous queries --> (approximate) answer to the client]
• Stream model
  – Data streams through the server but is not stored
  – Continuous queries are evaluated against streaming data
  – Deals with problems due to dynamic data on the server side
• Limitations
  – Does not conserve sensor resources (e.g., power)
  – Does not exploit the storage and processing capabilities of sensors
  – Geared towards continuous monitoring, not archival applications

Sensor Network Architectures – 3: Hierarchical
• Hierarchical architecture (e.g., Quasar)
  [Figure: producers (each with its cache) --> servers (cache and archive) --> clients (with cache); data flows up, queries flow down]
  – Data flows from producers to servers to clients periodically
  – Queries flow the other way:
    • If the client cache does not suffice, the query is routed to the appropriate server
    • If the server cache does not suffice, current data is accessed at the producer
  – This is a logical architecture
    • Producers could also be clients
    • A server may be a base station or a (more) powerful sensor node
    • Servers might themselves be hierarchically organized
    • The hierarchy might evolve over time

Sensor Network Architectures – 4: Fully Distributed P2P
• Distributed architecture (e.g., Dimensions)
  [Figure: summary hierarchy with levels 0, 1, 2, ...; summaries are progressively lossy and progressively aged]
  – Store data at sensor nodes
  – Construct a distributed, load-balanced quad-tree hierarchy of lossy wavelet-compressed summaries corresponding to different resolutions and spatio-temporal scales
  – Queries drill down from the root of the hierarchy to focus search on small portions of the network
  – Progressively age summaries for long-term storage and graceful degradation of query quality over time

Balancing Tradeoffs in Application Requirements
• Accuracy
  – More accurate context results in better application performance
  – Very high accuracy may not be needed
• Cost
  – Minimize resources consumed
    • Network (messaging)
    • Energy
    • Storage
• Timeliness
  – Late data may be useless
• Reliability
  – Wrong/missing data may cause problems

Data Representation
• Instantaneous value
• Range-based
  – Static interval
  – Dynamic range-based
• Probabilistic distribution
  – (mean, stdev) with decay
• Compressed formats
  – Wavelets
  – Histograms
  – Sketches

What Is Accuracy?
• Resolution
  – Temporal (Aurora)
    • 1 value for a sliding window of size 5
    • Load shedding, subsetting
  – Spatial
    • 1 value for a given region of dimensions x × y
• Value laxity (Quasar)
  – Value represented as an interval
    • e.g., 9 represented as [6, 12]
  – Value represented as a probability distribution

Tasks of a Sensor Management Framework
• Translation: mapping application quality requirements to data quality requirements
  – Examples:
    • Target tracking: quality of track --> accuracy of data
    • Aggregation queries: accuracy of results --> accuracy of data
  – Strategy should adapt to expected application load
• Collection
  – Minimize sensor resource consumption while guaranteeing the required data quality
• Storage
• Dissemination/delivery

Middleware Components
[Figure:
 – Applications: mobile target tracking, activity monitoring, location-based services, ...
 – Adaptive middleware, server-side components: sensor selection, adaptive precision setting, fault tolerance, prediction module, AQ --> DQ translation (application quality to data quality), sensor database / sensor data management
 – Adaptive middleware, sensor-side components: sensor state management, prediction module, precision-driven adaptation
 – Distributed sensor environment]

Adaptive Tracking of Mobile Objects
[Figure: a wireless sensor grid observes a moving object; base stations 1-3 relay readings over wireless links to the server, which provides track visualization for the request "Show me the approximate track of the object with precision δ_track".]
• Tracking architecture
  – A network of wireless acoustic sensors arranged as a grid, transmitting via a base station to the server
• Objective
  – Track a mobile object at the server such that the track deviates from the real trajectory by no more than a user-defined error threshold δ_track, with minimum communication overhead
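The value-laxity idea above (a reading of 9 kept at the server as [6, 12]) can be made concrete with a small sketch. This is an illustration under stated assumptions, not the Quasar implementation: the `Interval` class and the helpers `with_laxity`, `sum_bound`, `max_bound`, and `precise_enough` are hypothetical names, the aggregate bounds use plain interval arithmetic, and the acceptance rule (answer width no larger than the query tolerance) is an assumed convention.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Interval:
    """Hypothetical server-side approximation [lo, hi] of an exact sensor value."""
    lo: float
    hi: float

    @property
    def width(self) -> float:
        return self.hi - self.lo


def with_laxity(value: float, delta: float) -> Interval:
    """Represent an exact reading with laxity delta, e.g. 9 with delta 3 -> [6, 12]."""
    return Interval(value - delta, value + delta)


def sum_bound(intervals: Sequence[Interval]) -> Interval:
    """Tightest bound on the SUM of the exact values: add the endpoints."""
    return Interval(sum(i.lo for i in intervals), sum(i.hi for i in intervals))


def max_bound(intervals: Sequence[Interval]) -> Interval:
    """Tightest bound on the MAX of the exact values."""
    return Interval(max(i.lo for i in intervals), max(i.hi for i in intervals))


def precise_enough(answer: Interval, tolerance: float) -> bool:
    """Assumed acceptance rule: the answer interval is no wider than the tolerance."""
    return answer.width <= tolerance


if __name__ == "__main__":
    cached = [with_laxity(9, 3), Interval(1, 2), Interval(10, 11)]
    total = sum_bound(cached)
    print(total, precise_enough(total, tolerance=5.0))
```

Here the SUM bound is [17, 25], wider than a tolerance of 5, so a server following this rule could not answer from its cache alone and would need fresher values from the sensors.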
Basic Triangulation Algorithm
• P: source object power; I_i: intensity reading at the i-th sensor, located at (x_i, y_i)
• Each reading constrains the object position (x, y) to a circle:
  $(x - x_1)^2 + (y - y_1)^2 = \frac{P}{4\pi I_1}$
  $(x - x_2)^2 + (y - y_2)^2 = \frac{P}{4\pi I_2}$
  $(x - x_3)^2 + (y - y_3)^2 = \frac{P}{4\pi I_3}$
• Solving, we get (x, y) = f(x_1, x_2, x_3, y_1, y_2, y_3, P, I_1, I_2, I_3)
• More complex approaches that amalgamate more than three sensor readings are possible
  – Those are based on numerical methods and do not provide a closed-form equation between sensor readings and tracking location
• The server can use simple triangulation to convert track quality into sensor intensity quality tolerances, and use a more complex approach to track

Track Quality --> Data Quality
[Figure: intensity readings I_1, I_2, I_3 over the interval [t_i, t_(i+1)], and the resulting track in the X-Y plane (m)]
• δ_track: the required track quality (maximum track error)
• Case 1 (power constant): let I_i be the intensity value of sensor i. If
  $|\Delta I_i| \le \frac{I_i^2 \xi}{1 + I_i \xi}$
  then track quality is guaranteed to be within δ_track, where $\xi = \delta_{track}^2 / C$ and C is a constant derived from the known locations of the sensors and the power of the object.
• Case 2 (power varies within [P_min, P_max]): a corresponding bound on |ΔI_i|, expressed in terms of I_i, P_min, P_max and δ_track, guarantees track quality within δ_track, where C' = C/P^2 is a constant.
  – The above constraint is a conservative estimate.
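The three-circle system in the triangulation slide has a closed-form solution: subtracting the first circle equation from the other two cancels the quadratic terms and leaves a 2×2 linear system in (x, y). The sketch below assumes the inverse-square relation r_i² = P/(4π I_i) between intensity and distance; the function names `squared_range` and `triangulate` are hypothetical, and it makes no attempt to handle noisy readings (the more complex numerical methods mentioned above address that).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def squared_range(power: float, intensity: float) -> float:
    """Squared distance to the source, assuming I = P / (4*pi*r^2)."""
    return power / (4.0 * math.pi * intensity)


def triangulate(sensors: Sequence[Point], intensities: Sequence[float], power: float) -> Point:
    """Locate the source from three sensors by linearizing the circle equations."""
    (x1, y1), (x2, y2), (x3, y3) = sensors
    r1, r2, r3 = (squared_range(power, i) for i in intensities)

    # (circle 2) - (circle 1) and (circle 3) - (circle 1) give a*x + b*y = c
    a1, b1 = 2.0 * (x2 - x1), 2.0 * (y2 - y1)
    c1 = r1 - r2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2.0 * (x3 - x1), 2.0 * (y3 - y1)
    c2 = r1 - r3 + x3**2 - x1**2 + y3**2 - y1**2

    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("sensors are collinear; position is not uniquely determined")
    # Cramer's rule for the 2x2 system
    return (c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det
```

Subtracting circles is the standard linearization for three-anchor trilateration; with more than three sensors, the resulting over-determined linear system would typically be solved by least squares instead.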
  – Better bounds are possible

Components of an Information Collection Framework
[Figure: information consumers send requests to an information mediator (backed by data stores, DS); the mediator receives updates from information sources and can send requests to them.]

Sensor Model
• Wireless sensors: battery operated, energy constrained
• States:
  – S0 (monitor): processor on, sensor on, radio off
  – S1 (active): processor on, sensor on, radio on
  – S2 (quasi-active): processor on, sensor on, radio intermittent
• Transitions: S0 --> S1 when the intensity rises above a threshold; a sensor returns to S0 when it is removed from the "active list"

Data Collection Protocols
• Sensor-side protocol:
  – When not in use:
    • Tell the server to remove it from the "active list"; switch to monitor mode S0
  – Upon an external event:
    • If in S0, change to active mode S1, and update every time instant
    • If in S2, update only when the error bound is violated
• Server-side protocol:
  – If the sensor state changes to S1:
    • Add it to the "active list"
    • Compute an error bound for it, and send the bound to the sensor
  – Else, when a value is received, update the server cache if the sensor is in the "active list"

Data Collection Problem
[Sensor time series: ..., p[n], p[n-1], ..., p[1]]
• Let P = <p[1], p[2], ..., p[n]> be the sequence of environmental measurements (time series) generated by the producer, where n = now
• Let S = <s[1], s[2], ..., s[n]> be the server-side representation of the sequence
• A within-δ quality data collection protocol guarantees that for all i, error(p[i], s[i]) ≤ δ
  – δ is derived from the application quality tolerance

Answering Queries
[Figure: queries Q1 (A1), ..., Qm (Am) arrive at the server, which holds an imprecise representation [l_i, u_i] for each sensor s_i, kept current by sensor-initiated updates; the server can probe sensor s_i and receive the probe result.]
• If the query's quality tolerance is satisfied at the server (the tolerance exceeds δ)
  – Answer the query at the server
• Else
  – Probe the sensor
  – The sensor is guaranteed to respond within a bounded time
• This approach guarantees the quality tolerance of queries

Simple Data Collection Protocol
[Sensor time series: ..., p[n], p[n-1], ..., p[1]]
• Sensor logic (at time step n):
    let p' = last value sent to server
    if error(p[n], p') > δ or on timeout:
        switch radio on, if need be
        send p[n] to server
• Server logic (at time step n):
    if new update p[n] received at step n:
        s[n] = p[n]
    else:
        s[n] = last update sent by sensor
  – Guarantees that the maximum error at the server is at most δ, the agreed error bound

Exploiting Prediction Models
• The producer and server agree upon a prediction model (M, θ), i.e., a model M with parameters θ
• Let s_pred[i] be the predicted value at time i based on (M, θ)
• Sensor logic (at time step n):
    if error(p[n], s_pred[n]) > δ:
        send p[n] to server
• Server logic (at time step n):
    if new update p[n] received at step n:
        s[n] = p[n]
    else:
        s[n] = s_pred[n] based on model (M, θ)

Challenges in Prediction
• Simple versus complex models?
  – Complex and more accurate models require more parameters (which need to be transmitted)
  – The goal is to minimize cost, not necessarily to get the best prediction
• How is a model M generated?
  – Static: one out of a fixed set of models
  – Dynamic: dynamically learn a model from data
• When should a model M or its parameters be changed?
  – Immediately on model violation:
    • Too aggressive: the violation may be a temporary phenomenon
  – Never changed:
    • Too conservative: data rarely follows a single model

Challenges in Prediction (cont.)
• Who updates the model?
  – Server
    • Long-haul prediction models are possible, since the server maintains history
    • Might not predict recent behavior well, since the server does not know the exact sequence P; it has only samples
    • Extra communication to inform the producer
  – Producer
    • Better knowledge of recent history
    • Long-haul models are not feasible, since the producer does not have history
    • Producers share the computation load
  – Both
    • The server looks for new models; the sensor performs parameter fitting given existing models

Experiment (error tolerance 20 m)
• A restricted random motion: the object starts at (0, d) and moves from one node to another randomly chosen node until it walks out of the grid
• Models used: static and linear

Energy Savings
[Figure: total energy consumption over all sensor nodes for the random mobility model, with varying track error δ_track]
• Significant energy savings using the adaptive precision protocol over non-adaptive tracking (constant line in graph)
• For a random model, prediction does not work well!

Energy Savings
[Figure: total energy consumption over all sensor nodes for the random mobility model, with varying base station distance from the sensor grid]
• As the base station moves away, energy consumption can be expected to increase, since transmission cost varies as d^n (n = 2)
• Better results with increasing base station distance

Accuracy/Cost Tradeoff
• Applications can tolerate errors in sensor data
  – Applications may not require exact answers:
    • Small errors in location during tracking, or in the answer to a query, may be OK
  – Data cannot be precise anyway, due to measurement errors, transmission delays, etc.
• Cost
  – Communication bandwidth
  – Energy drain
• Quasar approach
  – Exploit application error tolerance to reduce communication between producer and server and/or to conserve energy
  – Two approaches
    • Minimize resource usage given quality constraints
    • Maximize quality given resource constraints

Modeling Cost as Communication Bandwidth (e.g., TRAPP)
• Caches store approximations of exact source values
• Queries have precision constraints
• Goal: minimize network usage while meeting application-specific precision requirements
• Our solution: [Figure: precision/performance spectrum between an exact cache and a stale cache; "you decide" where to operate]

Modeling Energy Costs in Sensors
• How should sensor state be managed to minimize the energy consumed in maintaining data at the required quality?
  – Sensor state: error precision δ, power states
• Power consumption of sensors:

Sensor state | Radio mode | Power consumption (mW)
active | Tx | 14.88
listening | Rx | 12.50
listening | idle | 12.36
sleeping | off | 0.016

Energy-Efficient Sensor State Management
• Active-Listening-Sleeping (ALS) model:
  – active --> listening: Ta after processing the last sensor-initiated update or probe
  – listening --> active: upon the first sensor-initiated update or probe
  – listening --> sleeping: after Tl without traffic
  – sleeping --> active: upon the first sensor-initiated update, or after Ts
• Other models:
  – Always-Active (AA) [Ta is infinite]
  – Active-Listening (AL) [Tl is infinite]
  – Active-Sleeping (AS) [Tl is 0]

Issues in Energy-Efficient Data Collection
• Issues
  – How to maintain the precision range δ for each sensor
    • A larger δ increases the possibility of expensive probes
    • A smaller δ wastes communication due to sensor-initiated updates
  – When to transition between sensor states (i.e., how to set Ta, Tl, Ts)
    • Powering down might not be optimal if we have to power up immediately
    • Powering down may increase query response time
• Objective
  – Set values for Ta, Tl, Ts, and δ that minimize the energy cost:
    normalized energy cost = energy consumed in each state + state transition energy

Addressing Accuracy/Energy Tradeoffs
• We solve the energy optimization problem by solving two sub-problems
  – Optimize energy consumption by adjusting the range size, assuming the state transitions are fixed
    • i.e., Ta, Tl, and Ts have been optimally set
  – Optimize energy consumption by adapting sensor states, assuming the precision range δ for the sensor is fixed

Range Size Adjustment for the AA/AL Model
• The optimal precision range that minimizes E occurs at a particular ratio between the probabilities of sensor-initiated updates and probes
  – The optimal range can be realized by maintaining this probability ratio
  – This can be done at the sensor
• Assuming θ is the ratio of sensor-initiated update probability to probe probability, and α > 0 is an adjustment factor:
  – On a sensor-initiated update: with probability min{θ, 1}, set δ' = δ(1 + α)
  – On a probe: with probability min{1/θ, 1}, set δ' = δ/(1 + α)

Range Size Adjustment for the AS/ALS Model
• Sensor side
  – Keep track of the number of state transitions over the last k updates
  – Piggyback the probability of state transitions on the k-th update
• Server side
  – Keep track of the number of sensor-initiated updates and probes over the last k updates
  – Upon receiving the k-th update from the sensor:
    • Compute the optimal precision range
    • Inform the sensor about the new δ

Adaptive State Management
• Consider the AS model to derive the optimal Ta that minimizes energy consumption
  – Assuming f(t) is the probability of receiving a request at time instant t, the expected energy consumption E for a single silent period can be written as a function of Ta
  – E is minimized at Ta = 0 if requests are uniformly distributed in the interval [0, Ta + Ts]
• In practice, learn f(t) at runtime and select Ta adaptively
  – Choose a window size w in advance
  – Keep track of the last w silent-period lengths and summarize this information in a histogram
  – Periodically use the histogram to generate a new Ta

Adaptive State Management (Cont.)
• c_i: the number of silent periods that fall into bin i among the last w silent periods
• Estimate p(t) by the distribution that generates a silent period of length t_i with probability c_i/w
• Ta is chosen as the value t_m that minimizes the estimated energy consumption
[Figure: histogram of silent-period lengths with bins 0 … n-1 (counts c_0 … c_{n-1}) and bin boundaries t_0, t_1, …, t_n = Ta + Ts.]

System Performance Comparison
[Figure: panels "Sensor Energy Consumption Comparison" (normalized sensor energy consumption, µJ) and "Query Response Time Comparison" (average query response time, µs) for the AA, AL, AS and ALS models.]

Impact of Ta Adaptation on System Performance
[Figure: panels "Impact of Ta Selection on Sensor Energy Consumption" (normalized sensor energy consumption, µJ) and "Impact of Ta Selection on Query Response Time" (average query response time, µs), comparing static Ta (0) with adaptive Ta.]

Impact of Range Size Adaptation on System Performance
[Figure: panels "Impact of Range Size Adjustment on Sensor Energy Consumption" (normalized sensor energy consumption, µJ) and "Impact of Range Size Adjustment on Query Response Time" (average query response time, ms) versus the average accuracy constraint, comparing fixed (0), adaptive adjustment and fixed (large) ranges.]

Outline of this section
• Sensor network architectures
• Sensor application needs
  – Accuracy, timeliness, cost, reliability
• Tasks of a middleware framework
  – Services that can be customized to address needs
• Case studies
  – Accuracy/cost tradeoffs in collection
  – Accuracy/cost/timeliness tradeoffs in collection
  – Storage/accuracy tradeoffs in archival

Accuracy/Cost/Timeliness Tradeoffs
• Continuous stream of fast-changing source data
• Diverse user requirements in terms of data accuracy and service timeliness
• Effective utilization of the underlying computation, communication and storage resources
[Figure: competing goals of timeliness, accuracy and cost-effectiveness.]

Real-time Communication for Sensors
• John A. Stankovic, Tarek Abdelzaher, Chenyang Lu, Lui Sha, Jennifer Hou, "Real-Time Communication and Coordination in Embedded Sensor Networks," Proceedings of the IEEE, 91(7): 1002-1022, July 2003 (invited paper)
• SPEED: a stateless protocol (ICDCS'03)
• RAP (RTAS'02)

Real-time Data Processing
• Supporting transaction timeliness and data freshness in databases
  – STRIP (STanford Real-time Information Processor)
  – ARCS (databases for Active Rapidly Changing data Systems)
  – QMF (QoS-sensitive approach for Miss ratio and Freshness guarantees)

Modeling Application Timeliness Needs
• Source update request: the current source value
• Consumer request: timeliness requirements (source ID, request issue time, periodicity, urgency, relative deadline) + accuracy requirements (accuracy requirement, bias)
• Bias: 0 = no preference, 1 = favoring timeliness, 2 = favoring accuracy
[Figure: a source value tracked by a precision range PREC(U, L) with lower bound L and upper bound U.]

QoS as a Metric of User Satisfaction
• Timeliness satisfaction: the deadline is met, i.e., $TT_{cr} \le RDL_{cr}$
• Accuracy satisfaction: the answer precision meets the requirement, $PREC_{answer} \le PREC_{req}$, and the answer fidelity is 1, where
  $Fidelity(A, cr) = 1$ if $L \le V(cr.s, cr.t) \le U$, and 0 otherwise
• QoS of a consumer request cr_j:
  – Requests favoring timeliness ($Bias_{cr_j} = 1$): $QoS_{cr_j} = w_1$ if $TT_{cr_j} \le RDL_{cr_j}$, else 0
  – Requests favoring accuracy ($Bias_{cr_j} = 2$): $QoS_{cr_j} = w_2$ if $Fidelity(A_{cr_j}) = 1$ and $PREC_{A_{cr_j}} \le PREC_{cr_j}$, else 0
  – Requests without bias ($Bias_{cr_j} = 0$; timeliness and accuracy satisfaction): $QoS_{cr_j} = w_3$ if $TT_{cr_j} \le RDL_{cr_j}$, $PREC_{A_{cr_j}} \le PREC_{cr_j}$ and $Fidelity(A_{cr_j}) = 1$, else 0

Quality of Data Characterization
• DS fidelity (DS vs. source value):
  – Fidelity of s at time instant t: $FI_{ds}(s, t) = 1$ if $L \le v \le U$, and 0 otherwise
  – Over an interval: $FI_{ds}(s, [t_i, t_j]) = \frac{1}{T}\int_{t_i}^{t_j} FI_{ds}(s, t)\,dt$, with $T = t_j - t_i$
  – Probability of accessing a faithful value of s during T: $p_{fi}(s, T) = FI_{ds}(s, T)$
  – Aggregate DS fidelity: $AFI_{ds}(S, T) = \frac{\sum_{s_i \in S} p_{access}(s_i)\, FI_{ds}(s_i, T)}{\sum_{s_i \in S} p_{access}(s_i)}$
• DS validity (DS vs. consumer needs):
  – For a consumer request: $VA_{ds}(cr_i(s, t)) = 1$ if $PREC(L, U) \le PREC_{cr_i}$, and 0 otherwise
  – At time t, over the k requests on s: $VA_{ds}(s, t) = \frac{1}{k}\sum_{i=1}^{k} VA_{ds}(cr_i(s, t))$
  – Over an interval, for the $k_s$ requests on s: $VA_{ds}(s, [t_i, t_j]) = \frac{1}{k_s}\sum_{u=1}^{k_s}\sum_{v=t_i}^{t_j} VA_{ds}(cr_u(s, t_v))$
  – $p_{va}(s, T) = VA_{ds}(s, T)$
  – Aggregate DS validity: $AVA_{ds}(S, T) = \frac{\sum_{s_i \in S} p_{access}(s_i)\, VA_{ds}(s_i, T)}{\sum_{s_i \in S} p_{access}(s_i)}$
• Overall QoD combines $AFI_{ds}(S, T)$ and $AVA_{ds}(S, T)$

Objectives of Real-time Data Collection
• Given a set of sources S = {s_1, …, s_l} and an input instance I, a collection of m source update requests and n consumer requests, I = SR ∪ CR = {sr_1, …, sr_m; cr_1, …, cr_n}, our goal is to:
  – Maximize QoS
  – Maximize QoD
  – Minimize cost

Joint Optimization of QoS, QoD and Cost
• Dynamicity
  – Highly dynamic system and network conditions
  – Unpredictable application workload
  – Frequently changing information sources
• The interrelationship between QoS and QoD is not straightforward:
  – Prioritizing source update requests raises QoD, but increases the deadline miss ratio and lowers QoS (missed opportunities)
  – Prioritizing consumer requests raises QoS, but answers may use stale data, lowering QoD (wrong decisions)

One Approach
• Frame the tradeoffs as two sub-problems
  – Manipulate QoS via a scheduling algorithm, assuming the DS is well maintained (QoD)
  – Adjust QoD via a DS maintenance algorithm, assuming an efficient scheduling algorithm is applied (QoS)

Design of the Information Mediator
[Figure: the Information Mediator. Components: consumer request queue, consumer scheduler, source update request queue, request servicer, directory service (DS) storing a value range per source, and DS maintainer. Flows: consumer requests and answers between information consumers and the mediator; source update requests, consumer-initiated probes and feedback between the mediator and information sources.]

Design of the Scheduling Algorithm
• Issues
  – Decide on an ordering of the incoming source update requests
    • The most recent update is processed first
  – Decide on a relative ordering of source update requests and consumer requests

Scheduling Strategies
• CF (Consumer request First)
• SF (Source update request First)
• SU (Split Update)
  – Updates of popular data are assigned higher priority than consumer requests
• OD (On-Demand Update)
  – Source update requests are applied only when consumer requests encounter stale data

Timeliness-Accuracy Balanced Scheduling (TABS)
• Absolute deadline (ADL) assignment for a request issued at time t
  – Periodic requests (period PER, relative deadline RDL): $ADL = t + \min\{RDL, PER\}$, i.e., t + PER, or t + RDL when the relative deadline is shorter; processor utilization $U_P = \sum_{i=1}^{n_p} E_i / \min\{RDL_i, PER_i\}$, where $E_i$ is the execution time of request i
  – Aperiodic requests, served by a total-bandwidth (TB) server: $U_{AP} = 1 - U_P$ and $ADL_i = \max(t, ADL_{i-1}) + E_i / U_{AP}$
• Apply Earliest-Deadline-First (EDF) scheduling
• TABS schedulability: given a set of n_p periodic requests with processor utilization U_P and a TB server with processor utilization U_AP, the whole task set is schedulable if U_P + U_AP ≤ 1.
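To make the TABS deadline rules concrete, here is a minimal Python sketch that assigns absolute deadlines and then orders requests Earliest-Deadline-First. It assumes that a periodic request's deadline is t + min{RDL, PER} (matching the min{RDL_i, PER_i} in U_P) and that the TB server receives all bandwidth left over by periodic requests (U_AP = 1 − U_P). The `Request` class, its field names and the example values are hypothetical; this is an illustration, not the mediator's actual implementation.

```python
import heapq
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """Hypothetical request record; field names are illustrative only."""
    name: str
    arrival: float                  # t: time the request is issued
    exec_time: float                # E_i: processing time the request needs
    rdl: float                      # RDL: relative deadline
    period: Optional[float] = None  # PER for periodic requests, None if aperiodic
    adl: float = 0.0                # ADL: absolute deadline, set by assign_deadlines


def periodic_utilization(periodic: List[Request]) -> float:
    """U_P = sum_i E_i / min{RDL_i, PER_i} over the periodic requests."""
    return sum(r.exec_time / min(r.rdl, r.period) for r in periodic)


def assign_deadlines(requests: List[Request]) -> float:
    """Set each request's ADL in place and return U_P."""
    periodic = [r for r in requests if r.period is not None]
    u_p = periodic_utilization(periodic)
    # Assumption: the TB server gets all remaining capacity, so
    # U_P + U_AP = 1 and the condition U_P + U_AP <= 1 holds.
    u_ap = 1.0 - u_p
    if u_ap <= 0.0:
        raise ValueError("periodic requests leave no bandwidth for aperiodic ones")
    for r in periodic:
        r.adl = r.arrival + min(r.rdl, r.period)
    prev_adl = 0.0
    aperiodic = sorted((r for r in requests if r.period is None),
                       key=lambda r: r.arrival)
    for r in aperiodic:
        # ADL_i = max(t, ADL_{i-1}) + E_i / U_AP
        r.adl = max(r.arrival, prev_adl) + r.exec_time / u_ap
        prev_adl = r.adl
    return u_p


def edf_order(requests: List[Request]) -> List[str]:
    """Return request names in Earliest-Deadline-First order."""
    heap = [(r.adl, i, r.name) for i, r in enumerate(requests)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2] for _ in range(len(heap))]


if __name__ == "__main__":
    reqs = [
        Request("refresh-s1", arrival=0.0, exec_time=1.0, rdl=4.0, period=5.0),
        Request("query-a", arrival=0.0, exec_time=1.5, rdl=10.0),
        Request("query-b", arrival=1.0, exec_time=0.75, rdl=10.0),
    ]
    print(assign_deadlines(reqs))  # 0.25
    print([r.adl for r in reqs])   # [4.0, 2.0, 3.0]
    print(edf_order(reqs))         # ['query-a', 'query-b', 'refresh-s1']
```

In this example the periodic refresh uses a quarter of the processor, so each aperiodic request's execution is stretched over the remaining 0.75; as U_P grows, aperiodic deadlines move later, which is how the TB server keeps the combined set schedulable under EDF.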
Minimized Cost Directory Service Maintenance (MC)
• Analyze the cost involved in the collection process
• Range adjustment
  – Consumer-initiated update: shrink the range
  – Source-initiated update: curve fitting over successive monitoring windows
    • if the fitted-curve slope m_w > m_{w-1}: increase the range size
    • if m_w < m_{w-1}: decrease the range size
[Figure: source value over time, with fitted-curve slopes m_{w-1} and m_w over monitoring windows w-1 and w.]

Experiments
• Performance metrics
  – QoS, QoD, and cost (the number of messages exchanged)
  – Efficiency of the system, EoS (QoS and QoD relative to cost)
• Experiments
  – Evaluation of all possible policy combinations in terms of overall EoS
  – Evaluation of system heterogeneity in terms of source capabilities and deadline variations
  – Evaluation of the benefits of adding intelligence to each subcomponent of the mediator

Benefits of Intelligent Policies
[Figure: EoS vs. the number of sources (25, 50, 100, 150) for TABS+MC, TABS+SS and FCFS+SS.]
• EoS improves as more intelligence is added to each component
  – TABS ensures fairness among requests
  – MC decreases the DS maintenance overhead

Fusing Energy Efficient Data Collection and In-network Aggregation
[Figure: sensors organized into clusters that report through access points.]
• Issues
  – Hierarchical precision range adjustment
  – Cluster formation and dynamic maintenance

[Figure: two value-update examples, panels (a)–(d). An access point (AP) holds the aggregate; cluster head C1 holds a range of ±20 around the sum of its members (e.g., {200-20, 200+20}, later {212-20, 212+20} and {224-20, 224+20}); member nodes n1 and n2 hold ranges of ±10 (e.g., {100-10, 100+10}, later {112-10, 112+10}). Panel (d) shows re-centered node ranges {113.7-10, 113.7+10} and {86.3-10, 86.3+10}.]

Error Adjustment
• When?
  – When $(f_{max} - f_{min})/f_{max} \ge r_{th}$
• How?
  – $df'_{max} = a \cdot df_{max} + (1-a)(df_{max} + df_{min})\frac{f_{max}}{f_{max} + f_{min}}$
  – $df'_{min} = a \cdot df_{min} + (1-a)(df_{max} + df_{min})\frac{f_{min}}{f_{max} + f_{min}}$

Fault Tolerance Issues
• Communication
  – Routing
    • SPIN: disseminate data to all the sensors
    • Braided Diffusion: maintain multiple braided paths as backup
    • GRAB (GRAdient Broadcast): controlled mesh forwarding
  – Transport protocols
    • PSFQ (Pump Slowly, Fetch Quickly): store-and-forward, multihop forwarding
    • ESRT (Event-to-Sink Reliable Transport): adjust the source reporting frequency to avoid congestion while maintaining enough reliability
    • RMST (Reliable Multi-Segment Transport): MAC layer
• Storage
  – R-DCS (Resilient Data-Centric Storage): store event data at the closest R replica nodes

Outline of this section
• Sensor network architectures
• Sensor application needs
  – Accuracy, timeliness, cost, reliability
• Tasks of a middleware framework
  – Services that can be customized to address needs
• Case studies
  – Accuracy/cost tradeoffs in collection
  – Timeliness/accuracy/cost tradeoffs in collection
  – Storage/accuracy tradeoffs in archival

Archiving Sensor Data
• Sensor-based applications are often built only around the real-time utility of time-series data
  – Values at time instants ≪ n are discarded
• Archiving such data consists of maintaining the entire sequence S, or an approximation of it.
• Importance of archiving:
  – Discovering large-scale patterns
  – Once-only phenomena, e.g., earthquakes
  – Discovering "events" detected post facto by "rewinding" the time series
  – Future uses of the data that may not be known while it is being collected

Quality Sensitive Archival
• Let P = ⟨p[1], p[2], …, p[n]⟩ be the sensor time series
• Let S = ⟨s[1], s[2], …, s[n]⟩ be the server-side representation
• A within-ε_archive quality data archival protocol guarantees that error(p[i], s[i]) < ε_archive
• Trivial solution: modify the collection protocol to collect data at a quality guarantee of min(ε_archive, ε_collect)
  – The data collection protocol described earlier then provides an ε_archive-quality data stream that can be archived
• Better solutions are possible, since
  – archived data is not needed for immediate access by real-time or forecasting applications (such as monitoring and tracking)
  – compression can be used to reduce data transfer

Addressing Cost/Quality Tradeoffs in Data Archival: Sample Protocol
[Figure: the sensor keeps recent samples …, p[n-1], p[n] in a memory buffer; it sends sensor updates for data collection and compresses the buffered series into a compressed representation for archiving.]
• Processing at the sensor is exploited to reduce communication cost and hence battery drain
• Sensors compress the observed time series p[1:n] and send a lossy compression to the server
• At time n:
  – p[1:n-n_lag] is at the server in compressed form s'[1:n-n_lag], within ε_archive
  – s[n-n_lag+1:n] is estimated via a predictive model M and its parameters
    • the collection protocol guarantees that this estimate remains within ε_collect
  – s[n+1:] can be predicted, but its quality is not guaranteed
    • these values lie in the future, so the sensor has not yet observed them

Piecewise Constant Approximation (PCA)
• Given a time series S_n = s[1:n], a piecewise constant approximation of it is a sequence PCA(S_n) = ⟨(c_i, e_i)⟩ that allows us to estimate s[j] as:
  $\hat{s}[j] = c_i$ if $j \in [e_{i-1}+1, e_i]$, and $\hat{s}[j] = c_1$ if $j \le e_1$
[Figure: a step function over time with values c_1 … c_4 ending at segment boundaries e_1 … e_4.]

Online Compression using PCA
• Goal: given a stream of sensor values, generate a within-ε_archive PCA representation of the time series
• Approach (PMC-midrange, PMC-MR)
  – Maintain m and M, the minimum and maximum of the samples observed since the last segment
  – On processing p[n], update m and M if needed
    • if including p[n] would make M − m > 2ε_archive, output the segment ((m+M)/2, n−1) using m and M from before p[n], and start a new segment with m = M = p[n]
[Figure: example with ε_archive = 1.5, sample values plotted over times 1–5.]

Online Compression using PCA
• PMC-MR
  – guarantees that each segment approximates the corresponding part of the time series within ε_archive
  – requires O(1) storage
  – is instance optimal
    • no other PCA representation with fewer segments can meet the within-ε_archive constraint
• Variant of PMC-MR
  – PMC-MEAN, which uses the mean of the samples seen so far instead of the midrange

Improving PMC using Prediction
• Observation
  – Prediction models guarantee a within-ε_collect version of the time series at the server even before the compressed time series arrives from the producer
• Can the prediction model be exploited to reduce the overhead of compression?
  – If ε_archive > ε_collect, no additional effort is required for archival: simply archive the prediction model
• Approach:
  – Define an error time series E[i] = p[i] − s_pred[i]
  – Compress E[1:n] to within ε_archive instead of compressing p[1:n]
  – The archive contains the prediction parameters and the compressed error time series
  – A within-ε_archive version of E[i], together with the model M and its parameters, can be used to reconstruct a within-ε_archive version of p

Combining Compression and Prediction (Example)
[Figure: left, the actual time series with its compressed version (7 segments) and its predicted version; right, the error (actual − predicted) and the compressed error (2 segments), labeled "Error = 1".]

Estimating Time Series Values
• Historical samples (before n − n_lag) are maintained at the server within ε_archive
• Recent samples (between n − n_lag + 1 and n) are maintained by the sensor and predicted at the server
• If an application requires precision ε_q, then:
  – if ε_q ≥ ε_collect, it may need to wait in case a parameter refresh is en route
  – if ε_q ≥ ε_archive but ε_q < ε_collect, it may probe the sensor or wait for a compressed segment
  – otherwise, only probing meets the precision requirement
• For future samples (after n), immediate probing is not available as an option

Distributed Computing Infrastructure for Sensors
• Designing distributed architectures for sensor networks
  – Server-oriented: data migrates from the sensors to a server
    • Store or not store (stream)
    • Useful for all types of applications: archival, analysis, monitoring
    • When should data migrate: periodically, or in an application-quality-based way (Quasar approach)
    • Should data migrate in its original raw form or in some aggregated form?
  – Distributed approach
    • Data does not migrate to any single server but remains in the sensor network.
    • Queries migrate from the server to the network
    • TinyDB approach, Dimensions approach
• Real-time
• Fault tolerance

Part 3: Query Processing in Sensor Applications

Outline
• Need for a declarative query language for sensor applications
• Query taxonomy
• Issues impacting sensor query processing
  – Sensor database research landscape
• Sample query processing techniques

Programming Sensor Nets Is Hard
• Applications must be "energy aware"
  – Naive implementations may drain the battery in days, while careful programming may conserve power for months
    • interleave sleep with processing and transmission
  – Recharging batteries frequently is not feasible
• Lossy, multi-hop, low-bandwidth, short-range communication
  – 20% loss @ 5 m
  – Often desirable to trade computation for communication
    • 200–800 instructions per bit transmitted!
  – Applications must be "network aware"
• Highly distributed environments
• Once deployed, applications cannot be easily administered
• Limited development and debugging tools
• High-level abstraction is needed!

Declarative Queries
• Users specify the data they want
  – Simple, SQL-like queries
  – Using predicates, not specific addresses
• Challenge is to provide:
  – An expressive and easy-to-use interface
  – High-level operators
    • Well-defined interactions
    • "Transparent optimizations" that many programmers would miss
      – Sensor-net specific techniques
  – A power-efficient execution framework

Database View of Sensor Data
• Sensors viewed as a single table
  – Columns are sensor data
  – Rows are individual sensors
• The sensors table is an unbounded, continuous data stream
  – Operations such as sort and symmetric join are not allowed on streams
  – They are allowed on bounded subsets of the stream (windows)
• SQL (with minor extensions) can be used as a declarative query language

time | nodeid | location | value
0 | 1 | 17 | 455
0 | 2 | 25 | 389
1 | 1 | 17 | 422
1 | 2 | 25 | 405

"Find the sensors in bright nests."
    SELECT nodeid, nestNo, light
    FROM sensors
    WHERE light > 400

Taxonomy of Queries
• Query generality
  – Simple selection, aggregation, full-blown SQL
• Continuous queries
  – Query evaluated continuously on sensor data streams
  – Issues:
    • How long: for a specified period, or for the lifetime of the sensor
    • How often: adaptive rate (based on load/utility/value) or fixed rate
• Event-based queries

Aggregation Queries
Query 2:
    SELECT AVG(sound)
    FROM sensors
    EPOCH DURATION 10s
Query 3: "Count the number of occupied nests in each loud region of the island."
    SELECT region, CNT(occupied), AVG(sound)
    FROM sensors
    GROUP BY region
    HAVING AVG(sound) > 200
    EPOCH DURATION 10s
Result of Query 3 (regions with AVG(sound) > 200):

Epoch | region | CNT(…) | AVG(…)
0 | North | 3 | 360
0 | South | 3 | 520
1 | North | 3 | 370
1 | South | 3 | 520

General SQL Query
• General: Is there anyone in the building?
    SELECT L.roomid
    FROM lightsensors AS L, soundsensors AS S
    WHERE L.roomid = S.roomid
[Figure: query plan: select sound readings with value > 10 dB and light readings with value > 10 lm, then join on RoomID = RoomID.]

Event-Based Queries
• An alternative to continuous polling for data
• Example:
    ON EVENT bird-detector(loc):
      SELECT AVG(light), AVG(temp), event.loc
      FROM sensors AS s
      WHERE dist(s.loc, event.loc) < 10m
      SAMPLE INTERVAL 2s FOR 30s

Lifetime Queries
• Lifetime query:
    SELECT …
    LIFETIME 30 days
  – Estimate the sampling rate that achieves this lifetime
    SELECT …
    LIFETIME 10 days
    MIN SAMPLE INTERVAL 1s
  – May not be able to transmit all the data
Adapted from slides ©Sam Madden

Processing Lifetimes: Issues
• Provide formulas for estimating power consumption; set maximum per-node sampling rates
• What makes this difficult?
  – Multiple sensing types (temp, accel) with different drain
  – Estimating the selectivity of predicates
    • the amount transmitted by a node varies widely
  – The root is a bottleneck: all nodes' rates must correspond to it
  – Aggregation vs. sending individual values
  – Conditions change: multiple queries, burstiness, message losses
• What to do when not all the data can be transmitted?
Adapted from slides ©Sam Madden

Issues Impacting Query Processing
• Where does the data reside?
  – Sensor/server
• Where does the query originate?
  – Sensor/server
• Where should the results be delivered?
  – Sensor/server
• How is data represented?
  – Continuous data streams require unbounded storage
    • Represent data as synopses (spatial/temporal aggregation)
      – Sliding windows, samples, sketches, histograms, wavelet representations
  – Precise / approximate representation
    • with or without error guarantees
    • guarantees can be deterministic or probabilistic

Sensor Database Research Landscape
• Type of query: aggregation, selection, general SQL, continuous, event-based
• Query evaluation: at server, in network, at both server and network
• Data representation: precise representation, approximate value, specified spatial/temporal resolution
• Data & query location: server, sensor network

Classification of Query Processing Techniques (1)
• Data and query @ server
  – Data stream model
    • Data streams from data sources to servers
    • Server maintains synopses
    • Continuous queries at server

Stream Data Management
[Figure: data streams enter a stream processing engine that keeps a synopsis in memory and evaluates continuous queries, producing (approximate) answers.]
• Data streams through the server
  – Continuous queries evaluated against the streaming sensor data
  – Data represented as synopses
    • sliding windows, sketches, histograms, wavelets, sampling
  – Load shedding
    • at input: sampling
    • at server: if load exceeds capacity
• Deals with problems due to dynamic data on the server side
• But
  – Does not conserve sensor resources (e.g., power)
  – Does not exploit the storage and processing capabilities of sensors
  – Geared towards continuous monitoring, not archival applications
• Examples: Aurora (Brown/MIT), STREAM (Stanford), Hancock (AT&T), OpenCQ (Georgia), Tapestry (Xerox), Telegraph (Berkeley), ...

Classification of Query Processing Techniques (1)
• Data and query @ server
  – Data stream model
    • Data streams from data sources to servers
    • Server maintains synopses
    • Continuous queries at server
    • Examples: Aurora (Brown/MIT), STREAM (Stanford), Hancock (AT&T), OpenCQ (Georgia), Tapestry (Xerox), Telegraph (Berkeley), ...
  – Quality-aware query answering
    • Quality-aware data collection at the server
      – attempts to minimize communication/energy consumption in the network during data collection
    • Applications/queries have quality tolerance
      – query tolerance converted to a data quality requirement
    • If the query's error tolerance is met by the data at the server, the query is computed @ server
    • Else, either more accurate data is brought to the server, or servers and sensors collaborate to answer the query
    • Error tolerance of applications exploited to minimize resource utilization
    • Examples: Quasar (UCI), TRAPP (Stanford)
      – Quasar exploits in-network processing when a query cannot be answered at the server

Classification of Query Processing Techniques (2)
• In-network query processing
  – Query originates, and results are needed, at the base station
    • Two steps:
      – Push the query to the sensor network
      – Gather results
    • Trades computation to reduce communication among sensors
    • Examples: TinyDB (Berkeley), Cougar (Cornell)
  – Query originates, and results are required, anywhere in the network
    • Distributed query processing within the sensor network
    • Examples: SURGE (UCI), research @ UCLA

Quality Aware Queries (QaQ)
[Figure: queries Q1 (A1) … Qm (Am) at the server; sensor s_i with range [l_i, u_i] sends sensor-initiated updates from its time series …, p[n], p[n-1], …, p[1]; the server probes the sensor and receives probe results.]
• Data is represented at the server at a given error tolerance
  – Actual sensor values: P_i = p_i[1], p_i[2], …, p_i[n], … for sensor i
  – Server representation: S_i = s_i[1], s_i[2], …, s_i[n], … for sensor i
  – Error guarantee: for all i, j, error(p_i[j], s_i[j]) < δ_i for a given value of δ_i
• Queries have an associated level of error tolerance
• If the query's quality tolerance is satisfied at the server (the query tolerates more error than δ_i)
  – Answer the query at the server
• Else
  – Probe the sensor
  – The sensor is guaranteed to respond within a bounded time
• The approach guarantees the quality tolerance of queries
[Figure label: imprecise data representation.]

Overview of QaQ Processing Research
• Mapping application quality requirements to data quality requirements
  – Target tracking using acoustic sensors [MW '03]
  – Spatial range queries [DEXA '03]
• Quality-based data collection
  – General framework [DS Online '03]
  – To support monitoring queries over current data [Qi+03]
  – For sensor data archival [ICDE '03]
  – With real-time constraints [RTSS '03]
  – With support for in-network aggregation [Yu+03]
• Quality-cognizant query processing
  – Aggregation queries [Quasar-1, Trap-1, Trap-2]
  – Continuous aggregation queries [Trap-3]
  – Selection queries [ICDE '04]
  – General SQL queries (open problem)

QaQ Selection: Problem Definition
• There is a collection T of imprecise objects
  – e.g., { [1,3], [2,5], [4,9] } may represent the exact values {2, 3, 5}
• The query is: "Retrieve the objects from T that satisfy a selection predicate"
  – The query specifies quality requirements
  – The system must return an approximate result that meets the quality requirements at minimum overall cost

Impact of Data Imprecision on Selection
[Figure: imprecise objects a–f drawn as intervals relative to the selection range.]
• Objects are classified as:
  – a is a NO object
  – b and f are MAYBE objects
  – c, d and e are YES objects
• The exact set is E = {b, c, d, e}
• The precise value of an imprecise object o can be retrieved with a probe

Defining Quality for Selection
• Measures the accuracy of an approximate answer A
• Set-based quality
  – Precision: p = |A ∩ E| / |A|
    • e.g., p = 4/5 (if b, c, d, e, f are returned as answers)
  – Recall: r = |A ∩ E| / |E|
    • e.g., r = 4/4 = 1 (if b, c, d, e, f are returned as answers)
• Value-based quality
  – The laxity of an object o is l(o), e.g., l([2,3]) = 3 − 2 = 1
  – The laxity of A is $l_{max} = \max_{x \in A} l(x)$
• The query specifies lower bounds p_q and r_q on precision and recall, and an upper bound l_max^q on laxity

Evaluating the QaQ Selection Operator
• Read an object; depending on whether it is a MAYBE, YES or NO object, the operator can:
  – Probe
  – Forward
  – Ignore
• Another possibility is to store the object and deal with it later
  – This might be good in certain situations, depending on the memory available at the server

The Decision Problem
• How should the QaQ selection operator decide
  – when to probe
  – when to forward
  – when to ignore
• Objective:
  – Meet the query quality requirement
  – Minimize cost

Constraints on the Decision
• Some decisions are fixed: we have no choice!
• No object with l(o) greater than the query tolerance l_max^q may be forwarded
• The precision guarantee p_G must never fall below the query tolerance p_q
  – Otherwise, if no new YES objects are seen, p_q may be violated
• If |A ∩ Y| / (|Y| + |M_s − A|) is less than the query tolerance r_q, an object cannot be ignored
  – Ignoring it might lead to an r_q violation if no new YES objects are seen

Two Naïve Approaches
• Two simple heuristics:
  – STINGY avoids probes: it ignores MAYBE objects and objects exceeding the l_max^q threshold
    • STINGY is conservative, but sometimes it is forced to probe to meet the quality guarantees
  – GREEDY forwards all MAYBE objects and probes all objects that exceed the l_max^q threshold
    • GREEDY tries to produce the result quickly by not ignoring objects, but sometimes it uses too many probes and forwards too many objects

Impact of Probe, Forward, Ignore Actions on Quality
• Legend: + increase, − decrease, = remains the same

The "Decision" Plane (ICDE 2004)
[Figure: the decision plane. The horizontal axis is s(o), the probability that an object satisfies the selection (s(o) = 0 for NO objects, 0 < s(o) < 1 for MAYBE objects, s(o) = 1 for YES objects); the vertical axis is the laxity l(o), split at l_max^q. Seven numbered regions are labeled with actions: Ignore, Probe, Forward, "Forward or ignore", "Probe or ignore", "Probe with probability p_py" and "Forward with probability p_fm", with thresholds s3 and s5 on s(o).]

The Optimization Problem
• Free parameters: p_py, s3, s5, p_fm
• Estimate:
  – the number of YES/MAYBE/NO objects
  – the number of YES/MAYBE objects exceeding the l_max^q threshold
  – the distribution of s(o)
• Minimize the cost W over the parameter space (p_py, s3, s5, p_fm), subject to the precision, recall and laxity guarantees

Quality Aware Query Processing (Review)
• Quality-aware data collection
• Queries have error tolerance
• QaQ query processing optimizes resource consumption while ensuring the query quality requirement
• A dual problem:
  – optimize quality given resource constraints
• The Aurora stream processing system explores such an approach

AURORA in the Sensor Database Landscape
• Data representation: time-sampled
• Type of query: continuous
• Query evaluation: at server
• Data & query location: server

Aurora System Model
• Input streams are unpredictable
  – If system processing capacity is reached, load must be dropped by invoking the Load Shedder
• The output streams must be useful to applications
  – Specified by Quality of Service (QoS)
• The goal: shed load intelligently so that
  – the system operates within its processing capacity
  – the QoS of the output streams is maximized

Quality of Service
• Types of QoS
  – Latency: shows the utility drop as answers take longer to produce (handled by the Scheduler)
  – Value-based: shows which output values are most important (handled by the Load Shedder)
  – Loss-tolerance: shows how approximate answers affect a query (handled by the Load Shedder)
[Figure: example QoS graphs. Value-based: utility (0.4 to 1.0) vs. output values (0, 80, 120, 200). Loss-tolerance: utility (0.7 to 1.0) vs. % delivery (0, 50, 100).]

Key Questions
• How is load measured?
  – Via static load coefficients and dynamic monitoring of stream rates
• When should load be shed?
  – When processing capacity does not suffice for handling the system load
• Where should load be shed?
  – In which segments of the query processing graph?
• How much load should be shed?
  – What fraction of tuples will be discarded?
• Which tuples should be dropped?
  – Do tuple values affect the decision of whether to drop them or not?
How to Measure Load: Load Coefficients
[Figure: a chain of n operators from input I to output O; operator i has cost c_i and selectivity s_i.]
• Load coefficient L: the number of processor cycles required to push a single tuple through the network to the outputs
  – For a chain of n operators with cost c_i and selectivity s_i: $L = \sum_{i=1}^{n} \Big(\prod_{j=1}^{i-1} s_j\Big) c_i$
• Total load depends on the load coefficients L_i and the input stream rates r_i
  – For m input streams: $Load = \sum_{i=1}^{m} L_i \cdot r_i$

Load Coefficient (Example)
[Figure: input I feeds operator 1 (c_1 = 10, s_1 = 0.5), whose output feeds operator 2 (c_2 = 10, s_2 = 0.8) and operator 4 (c_4 = 10, s_4 = 0.9); operator 2 feeds operator 3 (c_3 = 5, s_3 = 1.0), which produces output O1; operator 4 produces output O2.]
• L_1 = 10 + (0.5 × 10) + (0.5 × 0.8 × 5) + (0.5 × 10) = 22
• L_2 = 10 + (0.8 × 5) = 14
• L_3 = 5, L_4 = 10, and L(I) = L_1 = 22

When to Shed Load
• N: network, I: input streams, C: processing capacity
• Shed load when Load(N(I)) > C

How to Shed Load: Drop Tuples
• Modify N into N' by inserting "drop" operators, such that Load(N'(I)) < H · C (H: headroom factor)
• Random drop ("Drop k%"): drop tuples randomly
• Semantic drop ("Filter P(value)"): drop tuples based on the utility of their values
[Figure: a query network of union, select (σ) and project (π) operators with drop operators inserted upstream of the QoS-monitored outputs.]

Where to Shed Load
[Figure: three candidate drop locations, 1, 2 and 3, in a shared query network.]
• Usually at the inputs, but:
  – placing a drop at location 1 relieves all three operators
  – the QoS of both output streams is affected

Random Drops
• Greedy approach:
  – Order drop locations in ascending order of their Loss/Gain ratios
  – Insert drops at the location with the minimum Loss/Gain ratio first; repeat until enough capacity has been recovered
  – The amount dropped grows in increments of STEP_SIZE
  – The drop operator has a cost of its own: inserting a drop of less than STEP_SIZE does not recover any processing capacity!
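The load-coefficient example can be checked with a short Python sketch. It computes each operator's coefficient recursively as L_i = c_i + s_i · (sum of the coefficients of the operators downstream of i), which expands to the same terms as the slide's L_1 = 22 and L_2 = 14, and then applies the "shed load when Load(N(I)) > C" test. The dictionary encoding of the network, the topology (read off the L_1 and L_2 computations: operator 1 feeds operators 2 and 4, operator 2 feeds operator 3), and the stream rate and capacity in the usage lines are assumptions made for illustration only.

```python
from functools import lru_cache
from typing import Dict, List, Tuple

# Hypothetical encoding of the example network: op -> (cost c_i, selectivity s_i)
OPERATORS: Dict[int, Tuple[float, float]] = {
    1: (10.0, 0.5),
    2: (10.0, 0.8),
    3: (5.0, 1.0),
    4: (10.0, 0.9),
}
# op -> downstream ops consuming its output (ops 3 and 4 feed outputs O1 and O2)
SUCCESSORS: Dict[int, List[int]] = {1: [2, 4], 2: [3], 3: [], 4: []}


def load_coefficients(ops: Dict[int, Tuple[float, float]],
                      succ: Dict[int, List[int]]) -> Dict[int, float]:
    """L_i = c_i + s_i * sum(L_k for k downstream of i): cycles per tuple."""

    @lru_cache(maxsize=None)
    def coeff(op: int) -> float:
        cost, sel = ops[op]
        return cost + sel * sum(coeff(k) for k in succ[op])

    return {op: coeff(op) for op in ops}


def total_load(input_coeffs: List[float], rates: List[float]) -> float:
    """Load = sum_i L_i * r_i over the input streams."""
    return sum(l * r for l, r in zip(input_coeffs, rates))


def must_shed(load: float, capacity: float) -> bool:
    """Shed load when Load(N(I)) > C."""
    return load > capacity


if __name__ == "__main__":
    coeffs = load_coefficients(OPERATORS, SUCCESSORS)
    print(coeffs)  # {1: 22.0, 2: 14.0, 3: 5.0, 4: 10.0}
    # Hypothetical: input I (entering operator 1) arrives at 50 tuples/s,
    # and the processor offers 1000 cycles/s.
    load = total_load([coeffs[1]], [50.0])
    print(load, must_shed(load, 1000.0))  # 1100.0 True
```

The recursive form handles branching networks directly: each tuple leaving an operator is charged once for every downstream branch, scaled by the fraction that survives the operator's selectivity.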
Semantic Drops
• Greedy approach:
  – Each value interval has a relative frequency fi and a utility ui
  – Start dropping from the interval with the minimum ui
  – Example: first drop from the interval with utility 0.2 and relative frequency 0.4; at most 40% of the tuples can be dropped using this interval
    • If this suffices, drop as many as needed
    • Else, choose the interval with the next-smallest ui

In-Network Query Processing
• Two steps:
  – Query dissemination
    • Exploit broadcast-based routing to disseminate the query to the sensors
  – Query execution and result accumulation
    • Gather and compute results in the network, en route to the root (base station)
• Pluses
  – In-network computation reduces the periodic communication of raw results
  – Trades computation for communication, a very worthwhile goal for sensor networks
    • Transmitting 1 bit costs roughly as much energy as executing 800 instructions!
• Minuses
  – Query dissemination and execution synchronization overheads
    • The benefit must exceed the cost!
  – Applicable only when sensor data does not need to be archived
  – Scalability to very large networks has not been studied
• Examples
  – TinyDB (Berkeley)
    • TAG: in-network aggregation
    • AQP: in-network SQL
  – SURCH (UCI)
    • Distributed in-network aggregation

Query Propagation in TAG
• Broadcast-based communication
[Figure: the query SELECT COUNT(*)… broadcast through the network; each epoch is divided into communication slots]

Basic Aggregation
• In each epoch:
  – Each node samples its local sensors once
  – Generates a partial state record (PSR) from
    • its local readings
    • the readings received from its children
  – Outputs its PSR during its communication slot
• At the end of the epoch, the PSR for the whole network is output at the root
• Many optimizations are possible
  – grouping, pipelining

Illustration: Aggregation
SELECT COUNT(*) FROM sensors
[Figure sequence: sensors 1 to 5 in a routing tree, shown slot by slot through an epoch (slots 1 to 4) and into the next; in each slot the scheduled sensors send their partial counts to their parents, and the counts accumulate toward the root]

Aggregation Framework
• As in extensible databases, TAG supports any aggregation function conforming to
  Agg_n = {f_init, f_merge, f_evaluate}
  – f_init(a0) → <a0>, a partial state record (PSR)
  – f_merge(<a1>, <a2>) → <a12> (merge must be associative and commutative!)
  – f_evaluate(<a1>) → aggregate value
• Example: average
  – AVG_init(v) → <v, 1>
  – AVG_merge(<S1, C1>, <S2, C2>) → <S1 + S2, C1 + C2>
  – AVG_evaluate(<S, C>) → S / C

Types of Aggregates
• SQL supports MIN, MAX, SUM, COUNT, AVERAGE
• Any function that fits the framework above can be computed via TAG
• In-network processing benefits many operations
  – e.g. standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
  – The benefit depends on the compactness of the PSR

Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can be applied automatically
• Property (examples): what it affects
  – Partial state (MEDIAN: unbounded, MAX: 1 record): effectiveness of TAG
  – Duplicate sensitivity (MIN: duplicate insensitive, AVG: duplicate sensitive): routing redundancy
  – Exemplary vs. summary (MAX: exemplary, COUNT: summary): applicability of sampling, effect of loss
  – Monotonic (COUNT: monotonic, AVG: non-monotonic): hypothesis testing, snooping

TAG Advantages
• Communication reduction
  – Important for power and contention
• Continuous stream of results
  – Smooths transient faults across epochs
• Lots of optimizations
  – Via operator semantics

Simulation Environment
• Evaluated via simulation
• Coarse-grained, event-based simulator
  – Sensors arranged on a grid
  – Two communication models
    • Lossless: all neighbors hear all messages
    • Lossy: messages are lost with a probability that increases with distance

Benefit of In-Network Processing
Simulation results: 2500 nodes, 50x50 grid, about 20 neighbors per node, depth about 10
[Figure: total bytes transmitted (0 to 100,000) for each aggregation function: EXTERNAL, MAX, AVERAGE, COUNT, MEDIAN]
• Some aggregates require dramatically more state!

In-Network SQL Processing (Berkeley)
• The query is disseminated to the sensors
• Results are gathered en route to the root (base station)
• Issues:
  – How should the query be processed?
    • Sampling as an operator; power-optimal ordering
    • Frequent events as joins
  – Which nodes have relevant data?
    • Semantic Routing Tree for effective pruning
      – Nodes that are queried together route together
  – Which samples should be transmitted?
    • Pick the most "valuable"?
    • Adaptive transmission and sampling rates

Power-Optimal Operator Ordering: Interleave Sampling + Selection
SELECT light, mag
FROM sensors
WHERE pred1(mag) AND pred2(light)
SAMPLE INTERVAL 1s
• The energy cost of sampling mag is far higher than that of sampling light: 1500 uJ vs. 90 uJ
• Candidate orderings:
  1. Sample light, sample mag, apply pred1, apply pred2
  2. Sample light, apply pred2, sample mag, apply pred1
  3. Sample mag, apply pred1, sample light, apply pred2
• Correct ordering (unless pred1 is very selective): 2, because the expensive mag sample is taken only for readings whose light value passes pred2
Adapted from slides ©Sam Madden

Attribute-Driven Topology Selection
• Observation: internal queries are often over a local area
  – or some other subset of the network
    • e.g. regions with light value in [10,20]
• Idea: build the topology for those queries based on the values of range-selected attributes
  – For range queries
  – Relatively static trees
    • Maintenance cost

Attribute-Driven Query Propagation
SELECT … WHERE a > 5 AND a < 12
[Figure: node 4 forwards the query to nodes 1, 2 and 3, whose subtrees cover the precomputed intervals [1,10], [7,15] and [20,40]]
• Precomputed intervals = Semantic Routing Tree (SRT)
• Early pruning: the query is not forwarded to subtrees whose intervals do not overlap the query range

Attribute-Driven Parent Selection
[Figure: node 4, with values in [3,6], chooses a parent among nodes 1, 2 and 3, whose intervals are [1,10], [7,15] and [20,40]]
• [3,6] ∩ [1,10] = [3,6]
• [3,6] ∩ [7,15] = ø
• [3,6] ∩ [20,40] = ø
• Even without intervals, expect that sending to the parent with the closest value will help

Simulation Result: Nodes Visited vs. Query Range
[Figure: number of nodes visited (max 400) vs. query size as % of value range (0.001 to 1) for Best Case (Expected), Closest Parent, Random Parent, Nearest Value and Snooping; random value distribution, 20x20 grid, ideal connectivity to 8 neighbors]

Acquisitional Query Processing
• How should the query be processed?
  – Sampling as an operator; power-optimal ordering
  – Frequent events as joins
• Which nodes have relevant data?
  – Semantic Routing Tree for effective pruning
    • Nodes that are queried together route together
• Which samples should be transmitted?
  – Pick the most "valuable"?
  – Adaptive transmission and sampling rates

Adaptive Transmission Rates
[Figure: Sample Rate vs. Delivery Rate: aggregate delivery rate (packets/second) and % successful transmissions vs. samples per second per mote (0 to 16), for 1 mote, 4 motes, and 4 motes with adaptive rates; adaptive delivery is about 2x]
• TinyDB monitors channel contention and backs off as needed

Prioritizing Data Delivery
• Score each item
• Send the item with the largest score
  – Out of order → priority queue
• Discard or aggregate when the buffer is full

Choosing Data to Send: Delta Encoding
[Figure sequence: time vs. value plot of the (time, value) samples [1,2], [2,6], [3,15], [4,1]]
• After [1,2] is sent, score each remaining sample by its delta from the preceding transmitted value: |2-6| = 4, |2-15| = 13, |2-4| = 2
  – Select which of the 3 to send: [3,15] has the largest score
• Keep selecting until the maximum delivery rate is hit: after [3,15] is sent, the scores are |2-6| = 4 and |15-4| = 11, so [4,1] is sent next
• If all samples can be sent: [1,2], [2,6], [3,15], [4,1]

Delta + Adaptivity
• 8-element queue
• 4 motes transmitting different signals
• 8 samples/sec/mote

SURCH in the Sensor Database Landscape
http://www.ics.uci.edu/~quasar
• Data representation: precise
• Type of query: ad hoc aggregation
• Query evaluation: in network, distributed
• Data and query location: at sensors

SURCH Query Processing
• SURCH query:
  ON EVENT e
  SELECT Attributes or Aggregates
  FROM Sensors S
  WHERE S.loc ∈ Region
  DESTINATION nodeID
• Event-based query (UPON Predicate): may be initiated at any node in the network
• Results are accumulated at a specified destination
• Region specifies a selection on sensors
• In-network (fully distributed) query processing

SURCH Query Processing
• Three phases
  – Neighborhood discovery
    • Broadcast-based communication
  – Query propagation
    • A sensor propagates the query if its neighborhood contains sensors to which the query has not yet been propagated
  – Capture partial results and route them to the destination
    • A node holds partial results if it contains aggregate values that are not broadcast further
[Figure: a query generator issues Q at initiators 1 and 2; their partial results r1 and r2 reach the destination as result1 and result2]

Neighborhood Discovery
[Figure: ns broadcasts to its neighbors nn1, nn2, …, nnk, which send responses; ns then re-broadcasts]
  – A node ns broadcasts the query (e.g. MAX) and the current result to all its neighbors
  – Neighbor nni responds with its value vni after waiting for a time period (TTR) based on the fitness of its value
    • The node whose data has the highest "fitness" responds first
  – If the partial result changes, ns immediately rebroadcasts it to its neighbors
    • High likelihood that all neighbors learn the new MAX even without responding

Query Propagation
• 1-dimensional illustration for a MAX query
• ni initiates the query
• nr1 and nr2 hold partial results
[Figure sequence: sensor values along a line of sensors; the query spreads outward from ni one radio range at a time]

Capture Partial Results
• Who has the partial results?
  – Nodes whose results are not propagated further
    • at the boundary of the query region
    • at an irregular propagation frontier, detected by remembering whether any neighbor propagates the query at the next level
• The partial results are sent to a destination node for final processing

Issue in Query Propagation
• Which nodes should broadcast the query in the network?
• Choose the broadcasting nodes based on optimization goals:
  – Minimal overall cost
    • minimum number of broadcasting nodes
    • minimum-size connected dominating set
  – Maximum network lifetime (uniform workload)
    • take into account the energy level of individual nodes
• Heuristics to achieve the optimization goals
  – Minimal overall cost
    • choose based on the number of undiscovered neighbors
  – Maximum lifetime
    • battery threshold

Simulation Results
• SURCH is very efficient at processing queries that do not need a response from every node

Summary of Query Processing
• Queries provide an expressive and easy-to-use interface for programming sensors
  – Rapid application development
  – Transparent optimization
    • Application writers can focus on the application logic rather than on how to optimize it for sensor networks
• Query processing in sensor networks is a difficult challenge
  – Highly dynamic data; energy/power constraints; lossy, low-bandwidth, broadcast-based communication
  – The standard approach of layering and isolating functionality into relatively independent software components will not work: the OS, middleware, network and queries will need to be co-optimized
• Issues in query processing
  – Where data resides, how data is represented, where queries are initiated, where results need to be delivered, where queries are processed

Future Work in Query Processing in Sensor Databases
• A rich sensor database research landscape
  – No clear winners yet
• Many important open issues
  – A formal semantics for the query language
  – A scalable architecture for sensor data gathering and query processing
  – Fault tolerance and real-time constraints in query processing
  – Integrating sensor data (and queries) with
    • other sensor data (sensor data fusion)
    • other relational information
  – XML and its role in sensor data

Summary
• Sensor networks present a very wide range of system optimization opportunities for power, application quality and performance
• Energy efficiency is a system-level concern that cuts across subsystem components, functionality layers and their implementations
• Key components
  – Low-power sensor microarchitectures
  – Careful partitioning of functionality in a distributed sensor network architecture
  – Energy-aware operating systems
  – Query-driven sensor data management
  – Dynamic power management that coordinates capabilities against application needs
    • Real-time, fault-tolerance and application quality needs
  – Energy-efficient communications and networking
    • Energy-aware MAC, routing, transport

Questions?