Database Middleware for Sensor Networks
Sam Madden, Assistant Professor, MIT (madden@csail.mit.edu)
Slides prepared with Wei Hong

Motivation
• Sensor networks (aka sensor webs, emnets) are here
  – Several widely deployed HW/SW platforms
    • Low-power radio, small processor, RAM/Flash
  – Variety of (novel) applications: scientific, industrial, commercial
  – Great platform for mobile + ubicomp experimentation
• Real, hard research problems to be solved
  – Networking, systems, languages, databases
  – Central problem: ease of access, appropriate programming abstractions
• I will summarize:
  – Low-level sensornet issues
  – A particular middleware architecture: TinyDB + TASK on the Berkeley Mote
  – Current and future middleware research ideas

Some Sensornet Apps
• Redwood forest microclimate monitoring
• Smart cooling in data centers (http://www.hpl.hp.com/research/dca/smart_cooling/)
• Condition-based maintenance
• Structural integrity
• And more: homeland security, container monitoring, mobile environmental apps, bird tracking, ZebraNet, home automation, etc.!

Architectural Overview
[Diagram: client tools/GUIs and external tools connect over the Internet to middleware backed by a stable store (DBMS); local servers run Directed Diffusion, COUGAR, or TinyDB over the sensor network; field tools attach directly.]
• Middleware issues: APIs for current + historical access? Which data when? How to act on data? Network and node status?

Declarative Queries
• Programming apps is hard
  – Limited power budget
  – Lossy, low-bandwidth communication
  – Long-lived, zero-admin deployments required
  – Distributed algorithms
  – Limited tools, debugging interfaces
• Queries abstract away much of the complexity
  – Burden is on the database developers
  – Users get:
    • Safe, optimizable programs
    • Freedom to think about apps instead of details

TinyDB: Declarative Query Interface to Sensornets
• Platform: Berkeley Motes + TinyOS
• Continuous variant of SQL: TinySQL
• Power- and data-acquisition-based in-network optimization framework
• Extensible interface for aggregates, new types of sensors

Agenda
• Part 1: Sensor Networks (40 mins)
  – TinyOS
  – NesC
• Part 2: TinyDB + TASK (50 mins)
  – Data model and query language
  – Software architecture
• 30-minute break
• Part 3: Alternative Middleware Architectures + Research Directions (1:30)
• Finish around 12

Part 1
• Sensornet background
• Motes + mote hardware
  – TinyOS
  – Programming model + NesC
• TinyOS architecture
  – Major software subsystems
  – Networking services

Sensor Networks: A Hot Topic
• New university courses
• New conferences: ACM SenSys, IEEE IPSN, etc.
• New industrial research lab projects: Intel, PARC, MSR, HP, Accenture, etc.
• Startup companies: Crossbow, Dust, Ember, Sensicast, Moteiv, etc.
• Media buzz
  – Over 30 news articles since July 2002 covering Intel Berkeley/UC Berkeley sensor network activities
  – One of "10 emerging technologies that will change the world" (MIT Technology Review)

Why Now?
• Commoditization of radio hardware: cellular and cordless phones, wireless communication
• Low cost -> many/tiny -> new applications!
• Real application for ad-hoc network research from the late 90's
• Coming together of EE + CS communities

Motes
• Mica / Mica2Dot
  – uProc: 4 MHz, 8-bit Atmel RISC
  – Radio: 40 kbit at 900/450/300 MHz, or 250 kbit at 2.4 GHz (MicaZ, 802.15.4)
  – Memory: 4 K RAM / 128 K program flash / 512 K data flash
  – Power: 2 x AA or coin cell
• Telos mote
  – uProc: 8 MHz, 16-bit TI RISC
  – Radio: 250 kbit at 2.4 GHz (802.15.4)
  – Memory: 2 K RAM / 60 K program flash / 512 K data flash
  – Power: 2 x AA
• iMote
  – uProc: 12 MHz, 16-bit ARM
  – Radio: Bluetooth
  – Memory: 64 K SRAM / 512 K data flash
  – Power: 2 x AA

History of Motes
• Initial research goal wasn't hardware
  – Has since become more of a priority with emerging hardware needs, e.g.:
    • Power consumption
    • (Ultrasonic) ranging + localization (MIT Cricket, NEST Project)
    • Connectivity with diverse sensors (UCLA sensor board)
  – Even so, now on the 5th generation of devices
    • Costs down to ~$50/node (Moteiv, Dust)
    • Greatly improved radio quality
    • Multitude of interfaces: USB, Ethernet, CF, etc.
    • Variety of form factors, packages

Motes vs. Traditional Computing
• Embedded OS
• Lossy, ad-hoc radio communication
• Sensing hardware
• Severe power constraints

NesC/TinyOS
• NesC: a C dialect for embedded programming
  – Components, "wired together"
  – Quick commands and asynch events
• TinyOS: a set of NesC components
  – Hardware components
  – Ad-hoc network formation & maintenance
  – Time synchronization
• Think of the pair as a programming environment

Radio Communication
• Low-bandwidth shared radio channel
  – ~40 kbits on motes; much less in practice
  – Encoding, contention for media access (MAC)
• Very lossy: 30% base loss rate
  – Argues against TCP-like end-to-end retransmission, and for link-layer retries
• Generally, not well behaved
(From Ganesan et al., "Complex Behavior at Scale," UCLA/CSD-TR 02-0013.)

Types of Sensors
• Sensors attach via daughtercard
• Weather
  – Temperature
  – Light x 2 (high-intensity PAR; low-intensity, full spectrum)
  – Air pressure
  – Humidity
• Vibration
  – 2- or 3-axis accelerometers
• Tracking
  – Microphone (for ranging and acoustic signatures)
  – Magnetometer
• GPS
• RFID reader

Non-Volatile Storage
• EEPROM
  – 512 K off chip, 32 K on chip
  – Writes at disk speeds, reads at RAM speeds
  – Interface: random access, read/write 256-byte pages
  – Maximum throughput ~10 Kbytes/second
• Matchbox filing system
  – Provides a Unix-like file I/O interface
  – Single, flat directory
  – Only one file being read/written at a time

Power Consumption and Lifetime
• Power typically supplied by a small battery
  – 1000–2000 mAh; 1 mAh = 1 milliamp of current for 1 hour
• Typically quoted at optimum voltage and current drain rates
  – Power (W) = amps (A) * volts (V)
  – Energy (J) = W * time
• Lifetime and power consumption vary by application
  – Processor: 5 mA active, 1 mA idle, 5 uA sleeping
  – Radio: 5 mA listen, 10 mA xmit/receive, ~20 ms/packet
  – Sensors: 1 uA to 100s of mA, 1 us to 1 s per sample
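To make the arithmetic concrete, here is a minimal back-of-the-envelope lifetime calculator in Java. The current draws and the simple duty-cycle model come from the figures quoted above; the class and helper names are my own illustration, not part of TinyOS or TinyDB.

    /** Back-of-the-envelope mote lifetime estimate (hypothetical helper, not TinyOS code). */
    public class LifetimeEstimate {
        // Current draws quoted on the "Power Consumption and Lifetime" slide.
        static final double ACTIVE_MA = 5.0;    // processor active ~5 mA; radio listen ~5 mA
        static final double SLEEP_MA  = 0.005;  // 5 uA sleeping

        /** Hours of life from a battery, given the fraction of time spent awake. */
        static double lifetimeHours(double batteryMah, double dutyCycle) {
            double avgMa = dutyCycle * (ACTIVE_MA + ACTIVE_MA)   // CPU + radio while awake
                         + (1 - dutyCycle) * SLEEP_MA;           // everything asleep otherwise
            return batteryMah / avgMa;
        }

        public static void main(String[] args) {
            // Always on: ~2000 mAh / ~10 mA is about 8 days; the slides' 2-3 day figure
            // also charges for sensing and transmission, which this sketch omits.
            System.out.printf("100%% duty cycle: %.0f days%n", lifetimeHours(2000, 1.0) / 24);
            // 2 s awake every 30 s (~6.7% duty cycle) stretches this to months.
            System.out.printf("6.7%% duty cycle: %.0f days%n", lifetimeHours(2000, 2.0 / 30) / 24);
        }
    }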
Energy Usage in a Typical Data Collection Scenario
• Each mote collects 1 sample of (light, humidity) data every 10 seconds and forwards it
• Each mote can "hear" 10 other motes
• Process:
  – Wake up, collect samples (~1 second)
  – Listen to radio for messages to forward (~1 second)
  – Forward data
[Charts: power consumption and energy breakdown, by hardware element (processor, radio, sensors) and by phase (idle, waiting for radio, waiting for sensors, sending, processing).]

Sensors: Slow, Power Hungry, Noisy
[Chart: light (lux) vs. time of day, 20:09 to 1:26, comparing a calibrated chamber sensor against sensor 69's raw readings and a median-of-last-10 smoothing of the same readings.]
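The "median of last 10" series in that chart is a standard way to tame noisy readings. A minimal sketch of such a sliding-window median filter, my own illustration rather than a TinyDB operator:

    import java.util.ArrayDeque;
    import java.util.Arrays;
    import java.util.Deque;

    /** Sliding-window median filter, as used to smooth the noisy light readings above. */
    public class MedianFilter {
        private final Deque<Integer> window = new ArrayDeque<>();
        private final int size;

        public MedianFilter(int size) { this.size = size; }

        /** Add a raw reading; return the median of the most recent `size` readings. */
        public int smooth(int reading) {
            window.addLast(reading);
            if (window.size() > size) window.removeFirst();
            int[] sorted = window.stream().mapToInt(Integer::intValue).toArray();
            Arrays.sort(sorted);
            return sorted[sorted.length / 2];  // middle element; robust to spikes
        }

        public static void main(String[] args) {
            MedianFilter f = new MedianFilter(10);
            for (int lux : new int[]{120, 118, 119, 450 /* spike */, 121, 117}) {
                System.out.println(lux + " -> " + f.smooth(lux));
            }
        }
    }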
TinyOS: Getting Started
• The TinyOS home page: http://webs.cs.berkeley.edu/tinyos
  – Start with the tutorials!
• The CVS repository: http://sf.net/projects/tinyos
• The NesC project page: http://sf.net/projects/nescc
• Crossbow motes (hardware): http://www.xbow.com
• Intel iMote: www.intel.com/research/exploratory/motes.htm

Part 2: The Design and Implementation of TinyDB

Part 2 Outline
• TinyDB overview
• Data model and query language
• TinyDB Java API and scripting
• Demo with TinyDB GUI
• TinyDB internals
• Extending TinyDB
• TinyDB status and roadmap

TinyDB Revisited
• High-level abstraction:
  – Data-centric programming
  – Interact with the sensor network as a whole
  – Extensible framework
• Under the hood:
  – Intelligent query processing: query optimization, power-efficient execution
  – Fault mitigation: automatically introduce redundancy, avoid problem areas
• Example:
  SELECT MAX(mag) FROM sensors WHERE mag > thresh SAMPLE PERIOD 64ms
[Diagram: the app sends a query or trigger to TinyDB, which distributes it into the sensor network and streams data back.]

Feature Overview
• Declarative SQL-like query interface
• Metadata catalog management
• Multiple concurrent queries
• Network monitoring (via queries)
• In-network, distributed query processing
• Extensible framework for attributes, commands and aggregates
• In-network, persistent storage

Architecture
[Diagram: on the PC side, the TinyDB GUI and JDBC sit atop the TinyDB client API; on the mote side, the TinyDB query processor runs on every node, making the sensor network itself a distributed DBMS.]

Data Model
• The entire sensor network is one single, infinitely-long logical table: sensors
• Columns consist of all the attributes defined in the network
• Typical attributes:
  – Sensor readings
  – Meta-data: node id, location, etc.
  – Internal states: routing tree parent, timestamp, queue length, etc.
• Nodes return NULL for unknown attributes
• On the server, all attributes are defined in catalog.xml
• Discussion: other alternative data models?

Query Language (TinySQL)
  SELECT <aggregates>, <attributes>
  [FROM {sensors | <buffer>}]
  [WHERE <predicates>]
  [GROUP BY <exprs>]
  [SAMPLE PERIOD <const> | ONCE]
  [INTO <buffer>]
  [TRIGGER ACTION <command>]

Comparison with SQL
• Single table in the FROM clause
• Only conjunctive comparison predicates in WHERE and HAVING
• No subqueries
• No column aliases in the SELECT clause
• Arithmetic expressions limited to column op constant
• Only fundamental difference: the SAMPLE PERIOD clause

TinySQL Examples
• Query 1: "Find the sensors in bright nests."
  SELECT nodeid, nestNo, light
  FROM sensors
  WHERE light > 400
  EPOCH DURATION 1s

  Epoch  Nodeid  nestNo  Light
  0      1       17      455
  0      2       25      389
  1      1       17      422
  1      2       25      405

TinySQL Examples (cont.)
• Query 2:
  SELECT AVG(sound)
  FROM sensors
  EPOCH DURATION 10s
• Query 3: "Count the number of occupied nests in each loud region of the island" (regions with AVG(sound) > 200):
  SELECT region, CNT(occupied), AVG(sound)
  FROM sensors
  GROUP BY region
  HAVING AVG(sound) > 200
  EPOCH DURATION 10s

  Epoch  region  CNT(…)  AVG(…)
  0      North   3       360
  0      South   3       520
  1      North   3       370
  1      South   3       520

Event-based Queries
• ON event SELECT …
• Run the query only when an interesting event happens
• Event examples:
  – Button pushed
  – Message arrival
  – Bird enters nest
• Analogous to triggers, but events are user-defined

Query over Stored Data
• Named buffers in Flash memory
• Store query results in buffers
• Query over named buffers
• Analogous to materialized views
• Example:
  – CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
  – SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
  – SELECT field1, field2, … FROM name SAMPLE PERIOD d

Using the Java API
• SensorQueryer
  – translateQuery() converts a TinySQL string into a TinyDBQuery object
  – Static query optimization
• TinyDBNetwork
  – sendQuery() injects a query into the network
  – abortQuery() stops a running query
  – addResultListener() adds a ResultListener that is invoked for every QueryResult received
  – removeResultListener()
• QueryResult
  – A complete result tuple, or
  – A partial aggregate result; call mergeQueryResult() to combine partial results
• Key difference from JDBC: push vs. pull
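Putting those pieces together, client code would look roughly like the following sketch. It uses only the classes and methods named on the slide; the constructor arguments and the listener's exact signature are assumptions, so treat this as pseudocode for the push-style flow rather than a verbatim TinyDB program.

    // Hedged sketch of the push-style TinyDB client flow; only the method
    // names above are from the slides, the signatures are assumptions.
    import net.tinyos.tinydb.*;

    public class LightMonitor {
        public static void main(String[] args) throws Exception {
            TinyDBNetwork network = new TinyDBNetwork();

            // Static optimization happens here, before injection.
            TinyDBQuery query = SensorQueryer.translateQuery(
                "SELECT nodeid, light FROM sensors WHERE light > 400 SAMPLE PERIOD 1024");

            // Push model: results arrive via callback, unlike JDBC's pull-style cursors.
            network.addResultListener(result -> {
                // A QueryResult may be a whole tuple or a partial aggregate;
                // partials would be combined with mergeQueryResult().
                System.out.println(result);
            });

            network.sendQuery(query);
            Thread.sleep(60_000);      // collect results for a minute
            network.abortQuery(query); // stop the running query
        }
    }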
Writing Scripts with TinyDB
• TinyDB's text interface:
  java net.tinyos.tinydb.TinyDBMain -run "select …"
• Query results are printed to the console
• All motes get reset each time a new query is posed
• Handy for writing scripts with shell, perl, etc.

Using the GUI Tools
• Demo time

Inside TinyDB
[Diagram: queries enter the query processor, which wires together aggregation operators (e.g., Agg avg(temp)), filters (e.g., light > 400), and sampling, consulting the schema/catalog tables; results such as T:1, AVG: 225 / T:2, AVG: 250 flow back over the multihop network.]
• Catalog metadata per attribute, e.g. for temp: time to sample 50 us, cost to sample 90 uJ, calibration table, units (deg. F), error (± 5 deg F), sampling function getTempFunc(…)
• Code size:
  – ~10,000 lines of embedded C code
  – ~5,000 lines of (PC-side) Java
  – ~3200 bytes RAM (w/ 768-byte heap)
  – ~58 kB compiled TinyOS code (3x larger than the 2nd largest TinyOS program)

TinyDB Tree-based Routing
• Tree-based routing is used in:
  – Query delivery
  – Data collection
  – In-network aggregation
• Relationship to indexing?
[Diagram: queries Q flood down a routing tree rooted at node A; results R:{…} flow back up from nodes B through F.]

Power Consumption and Lifetime
• Power typically supplied by a small battery
  – At full power, a device will last 2–3 days -> critical constraint
• Lifetime and power consumption vary by application
  – Scales with "duty cycle": the amount of time on
  – Low data rates (< 1 sample / 30 secs): > 6 months possible from AA batteries
• Fundamental challenge: distributed coordination with low power!
[Chart: current draw over time for sensors A and B, cycling through sleeping, radio on/processing, and transmitting; their waking periods must be synchronized.]

Time Synchronization
• All messages include a 5-byte timestamp indicating system time in ms
  – Synchronize (i.e., set system time to the timestamp) with:
    • Any message from the parent
    • Any new query message (even if not from the parent)
  – Punt on multiple queries
  – Timestamps are written just after the preamble is xmitted
• All nodes agree that the waking period begins when (system time % epoch duration == 0)
  – And lasts for WAKING_PERIOD ms
• Clock adjustment happens by changing the duration of the sleep cycle, not the wake cycle
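The sleep/wake rule on this slide reduces to a few lines of arithmetic. A sketch (mine, not TinyDB source) of how a node would pick its next wake-up time so that all nodes wake together, absorbing clock adjustments into the sleep cycle:

    /** Sketch of epoch-aligned duty cycling: wake when (system time % epoch) == 0. */
    public class WakeScheduler {
        static final long EPOCH_MS = 10_000;        // epoch duration from the query
        static final long WAKING_PERIOD_MS = 2_000; // time the node stays awake

        /** Given (possibly just-resynchronized) system time, ms to sleep until next epoch. */
        static long sleepUntilNextEpoch(long systemTimeMs) {
            long intoEpoch = systemTimeMs % EPOCH_MS;
            // Changing this sleep duration, not the waking period, is how the
            // node absorbs a timestamp-based clock adjustment from its parent.
            return EPOCH_MS - intoEpoch;
        }

        public static void main(String[] args) {
            // Node believes it is 3.7 s into the epoch: sleep 6.3 s, then wake for 2 s.
            System.out.println(sleepUntilNextEpoch(123_700) + " ms until next waking period");
        }
    }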
Extending TinyDB
• Why extend TinyDB?
  – New sensors -> attributes
  – New control/actuation -> commands
  – New data processing logic -> aggregates
  – New events
• Analogous to concepts in object-relational databases

Adding Attributes
• Types of attributes:
  – Sensor attributes: raw or cooked sensor readings
  – Introspective attributes: parent, voltage, RAM usage, etc.
  – Constant attributes: constant values that can be statically or dynamically assigned to a mote, e.g., nodeid, location, etc.

Adding Attributes (cont.)
• Interfaces provided by the Attr component:
  – StdControl: init, start, stop
  – AttrRegister
    • command registerAttr(name, type, len)
    • event getAttr(name, resultBuf, errorPtr)
    • event setAttr(name, val)
    • command getAttrDone(name, resultBuf, error)
  – AttrUse
    • command startAttr(attr)
    • event startAttrDone(attr)
    • command getAttrValue(name, resultBuf, errorPtr)
    • event getAttrDone(name, resultBuf, error)
    • command setAttrValue(name, val)

Adding Attributes (cont.)
• Steps to adding attributes to TinyDB:
  1) Create attribute nesC components
  2) Wire the new attribute components to the TinyDBAttr configuration
  3) Reprogram TinyDB motes
  4) Add new attribute entries to catalog.xml
• Constant attributes can be added on the fly through the TinyDB GUI

Adding Aggregates
• Step 1: wire new nesC components into the TinyDB aggregation framework
  – Aggregate implementations (e.g., SumM.nc, CountM.nc, AvgM.nc) provide the Aggregate interface: init(ID, …), update(ID, …), merge(ID, …), hasData(ID, …), finalize(ID, …), stateSize(ID, …), getProperties(ID)
  – AggOperator.nc uses them through the AggregateUse interface; the wiring lives in AggOperatorConf.nc

Adding Aggregates (cont.)
• Step 2: add an entry to catalog.xml
  <aggregate>
    <name>AVG</name>
    <id>5</id>
    <temporal>false</temporal>
    <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
  </aggregate>
• Step 3 (optional): implement the reader class in Java
  – A reader class interprets and finalizes aggregate state received from the mote network, and returns the final result as a string for display
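For step 3, the slides say only that a reader class turns raw aggregate state from the network into a display string. Below is a guess at what an AVG reader might look like; the interface it would implement and the byte layout of the state are assumptions, not the actual net.tinyos.tinydb API.

    // Hypothetical sketch of an AVG reader class; the real
    // net.tinyos.tinydb.AverageClass interface and state layout may differ.
    public class AverageReader {
        /**
         * Interpret the aggregate's partial state, here assumed to be a
         * <sum, count> pair of 32-bit integers, and finalize it for display.
         */
        public String finalizeAggregate(byte[] state) {
            int sum   = readInt32(state, 0);
            int count = readInt32(state, 4);
            return count == 0 ? "NULL" : String.valueOf((double) sum / count);
        }

        private static int readInt32(byte[] b, int off) {
            // Motes are little-endian (Atmel/MSP430), hence this byte order.
            return (b[off] & 0xFF) | (b[off + 1] & 0xFF) << 8
                 | (b[off + 2] & 0xFF) << 16 | (b[off + 3] & 0xFF) << 24;
        }
    }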
TinyDB Status
• Latest release ships with TinyOS 1.1 (9/03)
  – Install the task-tinydb package in the TinyOS 1.1 distribution
  – First release was in TinyOS 1.0 (9/02)
  – Widely used by research groups as well as industry pilot projects
• Successful deployments in the Intel Berkeley Lab and in redwood trees at the UC Botanical Garden
  – Largest deployment: ~80 weather-station nodes
  – Network longevity: 4–5 months

The Redwood Tree Deployment
• Redwood Grove in the UC Botanical Garden, Berkeley
• Collect dense sensor readings to monitor climatic variations across altitudes, angles, time, forest locations, etc.
• Versus sporadic monitoring points with 30-lb loggers!
• Current focus: study how dense sensor data affect predictions of conventional tree-growth models

Data from Redwoods
[Charts: relative humidity (35–95%) and temperature (8–33 °C) vs. time, 7/7/03 through 7/9/03, for sensors at heights of 10 m (nodes 101–103), 20 m (104–106), 30 m (107–109), 32 m (110), and 33 m (111), in a 36 m tree.]

TASK

A SensorNet Dilemma
• Sensors are still packaged like Heathkits
  – Pretty hard to cope with out of the box
• Bare metal encourages one-off applications
  – Inhibits reuse
• Deployment is not intuitive
  – No configuration/monitoring tools
• SensorNet PhD factor
  – Today, ~2.5 PhDs are needed to deploy a SensorNet
  – Needs to be zero

TASK Design Requirements
• Ease of s/w installation
• Deployment tools
• Reconfigurability
• Health/mgmt monitoring
• Network reliability guarantees
• Interpretable sensor results
• Tool integration
• Audit trails
• Lifetime estimates
• For developers: familiar API, extensibility of s/w, modular services

Tiny Application Sensor Kit
[Diagram: TASK client tools (TaskView) and external tools connect over the Internet to a SensorNet appliance (TASK server + stable store DBMS) fronting a TinyDB sensor network; TASK field tools attach in situ.]
• Simplicity vs. functionality
• Modularity
• Remote control

Fault-Tolerant SensorNet Appliance (SNA)
• Intelligent gateway
  – Proxy for the sensornet
  – Distributes queries
  – Stages results
  – Manages configuration
• Components
  – TASK server (exposed via http, ODBC, and other interfaces)
  – TinyDB client (Java)
  – DBMS (PostgreSQL)
  – Web server (Apache)

Tools
• Field tool: in-situ diagnostics
• TaskView: integrated tool for management and monitoring

For More Information
• http://triplerock.cs.berkeley.edu/tinydb

Part 3: Middleware Architecture and Research Topics

Architectural Overview
[Diagram, repeated from Part 1: client tools/GUIs and external tools connect over the Internet to middleware with a stable store (DBMS); local servers and field tools sit in front of the TinyDB sensor network.]

What's Left?
• TinyDB and TinyOS provide a reasonable low-level substrate
• TASK is sufficient for many data-collection apps
• But… there are other architectural issues
  – Efficiency concerns
    • Currently, readings from all sensors are transmitted on each epoch
    • Variable, context-sensitive rates…
  – Data quality issues
    • Missing and faulty sensors?
  – Architectural issues
    • Actuation / closed-loop issues
    • Disconnection, etc.

Sensor Network Research
• Very active research area
  – Can't summarize it all
• Focus: database-relevant research topics
  – Some outside of Berkeley
  – Other topics that are itching to be scratched
  – But, some bias towards work that we find compelling

Topics
• Improving TinyDB efficiency
  – In-network aggregation
  – Acquisitional query processing
• Alternative architectures
  – Statistical techniques
  – Heterogeneity
  – Intermittent connectivity
• New features
  – In-network storage
  – Closing the loop
  – Integration with traditional databases

Tiny Aggregation (TAG)
• In-network processing of aggregates
  – Common data analysis operation
    • Aka the gather operation or reduction in || programming
  – Communication reducing
    • Operator-dependent benefit
  – Across nodes during the same epoch
• Exploit query semantics to improve efficiency!
(Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG). OSDI 2002.)
Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates a partial state record (PSR) combining local readings with readings from children
  – Outputs its PSR during an assigned communication interval
    • Interval assigned based on depth in the routing tree
• At the end of the epoch, the PSR for the whole network is output at the root
• A new result is produced on each successive epoch

Illustration: In-Network Aggregation
• Example: SELECT COUNT(*) FROM sensors over a five-node routing tree
[Animation, condensed from six slides: within each sample period, communication intervals are numbered 4 down to 1. The deepest node (4) transmits its count of 1 in interval 4; node 3 merges it and reports 2 in interval 3; nodes 2 and 5 report in interval 2; the root (1) outputs the network-wide count of 5 in interval 1. Outside its own interval, each node sleeps (zzz).]

Aggregation Framework
• As in extensible databases, TinyDB supports any aggregation function conforming to:
  Agg_n = {f_init, f_merge, f_evaluate}
  – f_init{a0} -> <a0>   (produces a partial state record, PSR)
  – f_merge{<a1>, <a2>} -> <a12>
  – f_evaluate{<a1>} -> aggregate value
• Example: AVERAGE
  – AVG_init{v} -> <v, 1>
  – AVG_merge{<S1, C1>, <S2, C2>} -> <S1 + S2, C1 + C2>
  – AVG_evaluate{<S, C>} -> S/C
• Restriction: merge must be associative and commutative
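The {f_init, f_merge, f_evaluate} triple maps directly onto code. A minimal sketch of the AVERAGE example in Java; the <sum, count> PSR layout is straight from the slide, while the class itself is illustrative rather than TinyDB's nesC implementation:

    /** The AVG aggregate as an {init, merge, evaluate} triple, per the TAG framework. */
    public class AvgAggregate {
        /** Partial state record (PSR): <sum, count>. */
        record Psr(double sum, long count) {}

        static Psr init(double v)      { return new Psr(v, 1); }            // f_init
        static Psr merge(Psr a, Psr b) { return new Psr(a.sum() + b.sum(),  // f_merge:
                                             a.count() + b.count()); }      // assoc. + comm.
        static double evaluate(Psr p)  { return p.sum() / p.count(); }      // f_evaluate

        public static void main(String[] args) {
            // Readings merged up the tree in any order give the same answer.
            Psr left  = merge(init(10), init(20));
            Psr right = init(30);
            System.out.println(evaluate(merge(left, right))); // 20.0
        }
    }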
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
  – Drives an API!

  Property               Examples                                     Affects
  Partial state          MEDIAN: unbounded; MAX: 1 record             Effectiveness of TAG
  Monotonicity           COUNT: monotonic; AVG: non-monotonic         Hypothesis testing, snooping
  Exemplary vs. summary  MAX: exemplary; COUNT: summary               Applicability of sampling, effect of loss
  Duplicate sensitivity  MIN: dup. insensitive; AVG: dup. sensitive   Routing redundancy

Use Multiple Parents
• Use the graph structure to increase delivery probability with no communication overhead
  – For duplicate-insensitive aggregates, or
  – Aggregates expressible as a sum of parts
    • Send (part of) the aggregate to all parents, in just one message, via multicast
• Assuming independent losses, splitting decreases variance:
  – P(link xmit successful) = p; P(success from A -> R) = p^2
  – Without splitting, a count c gives E(cnt) = c * p^2 and Var(cnt) = c^2 * p^2 * (1 - p^2) = V
  – With n parents, sending c/n to each: E(cnt) = n * (c/n) * p^2 (unchanged), but Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n

Multiple Parents Results
• Better than the previous analysis expected!
• Losses aren't independent!
• Insight: splitting spreads data over many links, so no single critical link carries the whole count
[Chart: benefit of result splitting on a COUNT query, average COUNT with vs. without splitting (2500 nodes, lossy radio model, 6 parents per node).]

Acquisitional Query Processing (ACQP)
• TinyDB acquires AND processes data
  – Could generate an infinite number of samples
• An acquisitional query processor controls:
  – when,
  – where,
  – and with what frequency data is collected!
• Versus traditional systems, where data is provided a priori
(Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor. SIGMOD 2003.)

ACQP: What's Different?
• How should the query be processed?
  – Sampling as a first-class operation
• How does the user control acquisition?
  – Rates or lifetimes
  – Event-based triggers
• Which nodes have relevant data?
  – Index-like data structures
• Which samples should be transmitted?
  – Prioritization, summary, and rate control

Operator Ordering: Interleave Sampling + Selection
• Example:
  SELECT light, mag
  FROM sensors
  WHERE pred1(mag) AND pred2(light)
  EPOCH DURATION 1s
• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• A traditional DBMS samples both attributes, then applies pred1 and pred2; ACQP samples light first, applies pred2, and samples mag only for tuples that survive
  – This is the correct ordering unless pred1 is very selective and pred2 is not
• At 1 sample/sec, total power savings could be as much as 3.5 mW, comparable to the processor!
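The ordering decision above is the classic expected-cost calculation. A small sketch of how an acquisitional optimizer might compare the two plans; the 90 uJ and 1500 uJ sampling costs are from the slide, while the selectivities are made up for illustration:

    /** Expected per-tuple cost of the two sampling orders in the ACQP example. */
    public class PredicateOrdering {
        public static void main(String[] args) {
            double costLight = 90, costMag = 1500;   // uJ per sample (from the slide)
            double selPred2 = 0.1, selPred1 = 0.9;   // assumed selectivities, for illustration

            // Plan A: sample light, apply pred2, sample mag only if pred2 passed.
            double planA = costLight + selPred2 * costMag;
            // Plan B: sample mag, apply pred1, sample light only if pred1 passed.
            double planB = costMag + selPred1 * costLight;

            System.out.printf("light first: %.0f uJ, mag first: %.0f uJ%n", planA, planB);
            // -> 240 uJ vs. 1581 uJ: sample the cheap attribute first, unless its
            //    predicate is far less selective than the expensive one's.
        }
    }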
Exemplary Aggregate Pushdown
• Example:
  SELECT WINMAX(light, 8s, 8s)
  FROM sensors
  WHERE mag > x
  EPOCH DURATION 1s
• Mag sampling is the most expensive operation!
• A traditional DBMS evaluates mag > x and then WINMAX; ACQP pushes the exemplary aggregate down, sampling light first and checking light > MAX-so-far before paying to sample mag
• A novel, general pushdown technique

Topics (next: statistical techniques)

Statistical Techniques
• Approximations, summaries, and sampling based on statistics and statistical models
• Applications:
  – Limited bandwidth and large numbers of nodes -> data reduction
  – Lossiness -> predictive modeling
  – Uncertainty -> tracking correlations and changes over time
  – Physical models -> improved query answering

TinyDB Retrospective
• Declarative query interface: sensor nets are not just for PhDs; decreases deployment time
• An SQL-style query is distributed to the network, and data is collected back every time step
• Data aggregation can reduce communication

Limitations of the TinyDB Approach
• Data collection:
  – Every node must wake up at every time step
  – Data loss is ignored: no quality guarantees
  – Wastes resources by ignoring correlations
• Query distribution:
  – Every node must receive the query
  – The whole process is redone every time the query changes

Sensor Net Data Is Correlated
• Data is not i.i.d. -> shouldn't ignore missing data
• Observing one sensor gives information about other sensors (and future values): spatial-temporal correlation
• Observing one type of reading gives information about other local readings

BBQ: Model-Driven Data Acquisition
• A probabilistic model (e.g., a multidimensional Gaussian) sits in the middleware layer between the user and the network
• Users pose SQL-style queries with a desired confidence
• The system builds a data-gathering plan, conditions its posterior belief on the new observations, and uses a transition model Dt to carry belief across time steps
• Strengths of model-based data acquisition:
  – Observe fewer attributes
  – Exploit correlations
  – Reuse information between queries
  – Directly deal with missing data
  – Answer more complex (probabilistic) queries

Probabilistic Models and Queries
• User's perspective:
  SELECT nodeId, temp ± 0.5°C, conf(.95)
  FROM sensors
  WHERE nodeId in {1..8}
• The system selects and observes a subset of nodes, e.g. {3, 6, 8}
• Query result:
  Node   1     2     3     4     5     6     7     8
  Temp   17.3  18.1  17.4  16.1  19.2  21.3  17.5  16.3
  Conf   98%   95%   100%  99%   95%   100%  98%   100%

Supported Queries
• Value query
  – Xi ± ε with prob. at least 1 - δ
• SELECT and range query
  – Xi in [a, b] with prob. at least 1 - δ
  – E.g., which sensors have temperature greater than 25°C?
• Aggregation
  – Average ± ε of a subset of attributes with prob. > 1 - δ
  – Combine aggregation and selection
  – E.g., probability that > 10 sensors have temperature greater than 25°C?
• Queries require solving integrals
  – Many queries are computed in closed form
  – Some require numerical integration/sampling
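For intuition, a value query over a Gaussian marginal asks how much probability mass lies within ±ε of the mean. A self-contained sketch (standard normal math, not BBQ code) that checks whether a cached model can answer temp ± 0.5 °C at 95% confidence; the posterior standard deviation is an assumed input, and an erf approximation is included since java.lang.Math lacks one:

    /** Can the model answer "value ± eps with confidence 1-delta" without new observations? */
    public class ValueQuery {
        // Abramowitz & Stegun 7.1.26 approximation to erf(x); max error ~1.5e-7.
        static double erf(double x) {
            double t = 1 / (1 + 0.3275911 * Math.abs(x));
            double y = 1 - (((((1.061405429 * t - 1.453152027) * t) + 1.421413741) * t
                    - 0.284496736) * t + 0.254829592) * t * Math.exp(-x * x);
            return Math.copySign(y, x);
        }

        /** P(|X - mu| <= eps) for X ~ N(mu, sigma^2). */
        static double confidence(double sigma, double eps) {
            return erf(eps / (sigma * Math.sqrt(2)));
        }

        public static void main(String[] args) {
            double sigmaPosterior = 0.22;  // assumed posterior std. dev. of one node's temp
            double conf = confidence(sigmaPosterior, 0.5);
            System.out.printf("P(within ±0.5°C) = %.3f%n", conf);
            // If conf >= 0.95, answer from the model; otherwise observe more sensors,
            // shrinking the posterior: the essence of model-driven acquisition.
            System.out.println(conf >= 0.95 ? "answer from model" : "acquire more data");
        }
    }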
Experimental Results
• Datasets: redwood trees and the Intel Berkeley Lab
[Figure: floor plan of the Intel Lab deployment, ~54 nodes across offices, conference room, lab, copy room, server room, kitchen, and storage.]
• Learned models from data
  – Static model
  – Dynamic model: Kalman filter with time-indexed transition probabilities
• Evaluated on a wide range of queries

Cost versus Confidence Level
[Chart: acquisition cost falls as the required confidence level is relaxed.]

Obtaining Approximate Values
• Query: true temperature value ± epsilon with confidence 95%
[Chart: acquisition cost vs. epsilon.]

Next Step: Outliers and Unusual Events
• Once we have a model of the expected behavior, we can:
  – Detect unusual (low-probability) events
  – Predict missing values
• Often there are several "expected" behavior modes which we want to differentiate between (e.g., A/C on vs. A/C off)
  – E.g., if we can characterize failure modes, we can discard them
• Applying well-known probabilistic techniques to allow TinyDB to deal with such issues

IDSQ
• Similar idea: suppose you want to, e.g., localize a vehicle in a field of sensors
• Idea: task sensors in order of best improvement to the estimate of some value:
  – Choose leader(s)
    • Suppress subordinates
    • Task subordinates, one at a time
  – Until some measure of goodness (error bound) is met
(See "Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous Sensor Networks." Chu, Haussecker and Zhao. Xerox TR P2001-10113. May 2001.)

Graphical Representation
• Model the location estimate as a point with 2-dimensional Gaussian uncertainty
[Figure: sensors S1 and S2 each leave a residual of equal area, but S1's cuts across the principal axis of the uncertainty ellipse; it is preferred because it reduces error along the principal axis.]

Lots of Other Work with This Flavor
• Precision/energy tradeoff: want nodes to sleep except when their data is needed
  – Olston et al. Approximate Caching. SIGMOD 2003.
  – Cheng et al. Kalman Filters. SIGMOD 2004.
  – Lazaridis and Mehrotra. Approximate Selection Queries over Imprecise Data. ICDE 2004.
  – UCI Quasar Project
• Timeliness + real-time constraints
  – John A. Stankovic et al. Real Time Communication and Coordination in Sensor Networks. Proceedings of the IEEE, 91(7), July 2003.
  – Tian He et al. SPEED: a stateless protocol. ICDCS 2003.

In-Net Regression
• Linear regression: a simple way to predict future values and identify outliers
• Regression can be across local or remote values, multiple dimensions, or with high-degree polynomials
  – E.g., node A's readings vs. node B's
  – Or location (X, Y) versus temperature, over many nodes
[Chart: X vs. Y curve fit; y = 0.9703x - 0.0067, R² = 0.947.]
(Guestrin, Thibaux, Bodik, Paskin, Madden. "Distributed Regression: an Efficient Framework for Modeling Sensor Network Data." Under submission.)
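A fit like the y = 0.9703x - 0.0067 line in that chart is ordinary least squares. A minimal sketch of the node-A-vs-node-B case using the closed-form 1-D formulas, on made-up readings; the distributed, kernel-based version discussed next is much more involved:

    /** Ordinary least squares fit y = a*x + b, e.g. node B's readings vs. node A's. */
    public class LinearFit {
        public static void main(String[] args) {
            double[] x = {1, 2, 3, 4, 5};            // node A's readings (illustrative)
            double[] y = {0.9, 2.1, 2.9, 3.9, 5.2};  // node B's readings

            int n = x.length;
            double sx = 0, sy = 0, sxx = 0, sxy = 0;
            for (int i = 0; i < n; i++) {
                sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
            }
            double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // slope
            double b = (sy - a * sx) / n;                          // intercept

            System.out.printf("y = %.4fx + %.4f%n", a, b);
            // Large residuals |y - (a*x + b)| flag outliers; extrapolating the
            // line predicts future or missing values.
        }
    }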
In-Net Regression (Continued)
• Problem: may require data from all sensors to build the model
• Solution: partition sensors into overlapping "kernels" that influence each other
  – Run regression in each kernel
    • Requiring just local communication
  – Blend data between kernels
  – Requires some clever matrix manipulation
• End result: a regressed model at every node
  – Useful in failure detection and missing-value estimation

Topics (next: heterogeneity)

Heterogeneous Sensor Networks
• Leverage small numbers of high-end nodes to benefit large numbers of inexpensive nodes
• Still must be transparent and ad-hoc
• Key to scalability of sensor networks
• Interesting heterogeneities:
  – Energy: battery vs. outlet power
  – Link bandwidth: Chipcon vs. 802.11x
  – Computing and storage: ATMega128 vs. XScale
  – Pre-computed results
  – Sensing nodes vs. QP nodes

Computing Heterogeneity with TinyDB
• Separate query processing from sensing
  – Provide query processing on a small number of nodes
  – Attract packets to query processors based on "service value"
• Compare the total energy consumption of the network under:
  – No aggregation
  – All aggregation
  – Opportunistic aggregation
  – HSN proactive aggregation
(Mark Yarvis and York Liu, Intel's Heterogeneous Sensor Network Project, ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.)

5x7 TinyDB/HSN Mica2 Testbed

Data Packet Saving
• How many aggregators are desired?
  – 11% aggregators achieve 72% of the maximum data reduction
• Does placement matter?
  – Optimal placement is 2/3 of the distance from the sink
[Charts: % change in data packet count vs. number of aggregators (1–6, all 35) and vs. aggregator location (nodes 25–31, all 35).]

Topics (next: intermittent connectivity)

Occasionally Connected Sensornets
[Diagram: several TinyDB QP sensor patches, each behind a gateway; mobile gateways ferry data between the patches and a TinyDB server on the internet.]

Occasionally Connected Sensornets: Challenges
• Networking support
  – Tradeoff between reliability, power consumption and delay
  – Data custody transfer: duplicates?
  – Load shedding
  – Routing of mobile gateways
• Query processing
  – Operator placement: in-network vs. on mobile gateways
  – Proactive pre-computation and data movement
• Tight interaction between networking and QP
(Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf.)

Other Occasionally Connected Work
• Kevin Fall. Delay Tolerant Networks. SIGCOMM 2003.
• Juang et al. Energy-efficient computing for wildlife tracking. ASPLOS 2002.
• Li et al. Sending messages to mobile users in disconnected ad-hoc wireless networks. MOBICOM 2000.
• Shah et al. Data Mules. SNPA 2003.

Topics (next: in-network storage)

Distributed In-Network Storage
• Collectively, sensornets have large amounts of in-network storage
• Good for in-network consumption or caching
• Challenges:
  – Distributed indexing for fast query dissemination
  – Resilience to node or link failures
  – Graceful adaptation to data skews
  – Minimizing index insertion/maintenance cost

Example: DIM
• Functionality
  – Efficient range queries for multidimensional data, e.g. Q1 = <.5–.7, .5–1>
• Approach
  – Divide the sensor field into bins
  – Locality-preserving mapping from m-d space to geographic locations, e.g. events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> map to nearby bins
  – Use geographic routing such as GPSR
• Assumptions
  – Nodes know their locations and the network boundary
  – No node mobility
(Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong, Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks, SenSys 2003.)
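To make "locality-preserving mapping" concrete: one classic way to map multidimensional tuples to 1-D bin addresses so that nearby values land in nearby bins is bit-interleaving (a Z-order curve). DIM's actual zone-splitting scheme differs in its details; this sketch only illustrates the general idea:

    /** Z-order (bit-interleaving) sketch of a locality-preserving 2-D -> 1-D mapping. */
    public class ZOrder {
        /** Interleave the top `bits` bits of x and y in [0,1) into one bin address. */
        static int binFor(double x, double y, int bits) {
            int xi = (int) (x * (1 << bits)), yi = (int) (y * (1 << bits));
            int z = 0;
            for (int b = bits - 1; b >= 0; b--) {
                z = (z << 2) | ((xi >> b & 1) << 1) | (yi >> b & 1);
            }
            return z;
        }

        public static void main(String[] args) {
            // E1 = <0.7, 0.8> and E2 = <0.6, 0.7> from the DIM slide land at
            // nearby addresses, so a range query visits a short run of bins.
            System.out.println(binFor(0.7, 0.8, 4)); // 218
            System.out.println(binFor(0.6, 0.7, 4)); // 199
        }
    }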
Topics (next: closing the loop)

Closing the Loop
• Challenge: we want more than data collection
  – Condition-based sensing, rate adjustment
  – Condition-based actuation
• E.g.:
  – Kansal et al. Sensor Uncertainty Reduction Using Low Complexity Actuation. IPSN 2004.
  – Work from Qiong Luo et al. (HKUST) in CIDR
  – Various process control systems: ladder logic, SCADA, etc.
• Questions:
  – Appropriate languages
  – Resource contention on actuators
  – Closed-loop safety concerns

Topics (next: integration with traditional databases)

Alternative Middleware: Integration into an Existing DBMS

Concluding Remarks
• Sensor networks are an exciting emerging technology, with a wide variety of applications
• Many research challenges in all areas of computer science
  – Database community included
  – Some agreement that a declarative interface is right
• TinyDB and other early work are an important first step
• But there's lots more to be done!
  – The real challenge is building appropriate middleware abstractions

Questions?
http://db.lcs.mit.edu/madden/middleware_tutorial.ppt

In-Network Join Strategies
• Types of joins:
  – non-sensor -> sensor
  – sensor -> sensor
• Optimization questions:
  – Should the join be pushed down?
  – If so, where should it be placed?
  – What if a join table exceeds the memory available on one node?

Choosing Where to Place Operators
• Idea: choose a "join node" to run the operator
• Over time, explore other candidate placements (see the cost sketch at the end of these notes):
  – Nodes advertise data rates to their neighbors
  – Neighbors compute the expected cost of running the join based on these rates
  – Neighbors advertise their costs
  – The current join node selects a new, lower-cost node
(Bonfils + Bonnet, Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003.)

Topics
• In-network aggregation
• Acquisitional query processing
• Heterogeneity
• Intermittent connectivity
• In-network storage
• Statistics-based summarization and sampling
• In-network joins
• Adaptivity and sensor networks
• Multiple queries

Adaptivity in Sensor Networks
• Queries are long-running
• Selectivities change
  – E.g., night vs. day
• Network load and available energy vary
• All suggest that some adaptivity is needed:
  – Of data rates or granularity of aggregation, when optimizing for lifetimes
  – Of operator orderings or placements, when selectivities change (cf. conditional plans for correlations)
• As far as we know, this is an open problem!

Multiple Queries and Work Sharing
• As sensornets evolve, users will run many queries simultaneously
  – E.g., traffic monitoring
• Queries are likely to be similar
  – But with different end points, parameters, etc.
• We would like to share processing and routing as much as possible
• But how? Again, an open problem.
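Returning to Bonfils and Bonnet's placement scheme above: the per-neighbor decision is an expected-cost comparison over advertised data rates. A toy sketch under an assumed cost model (transmissions are proportional to input rates times hop distances, plus the join output's path to the sink); the rates and hop counts are invented for illustration:

    /** Toy cost model for adaptive join placement, after Bonfils + Bonnet. */
    public class JoinPlacement {
        /**
         * Expected transmission cost if a candidate node hosts the join: each
         * input stream pays its rate times its hop distance to the candidate,
         * and the join output pays its way to the sink. (Assumed model.)
         */
        static double cost(double rateA, int hopsAtoC, double rateB, int hopsBtoC,
                           double outRate, int hopsCtoSink) {
            return rateA * hopsAtoC + rateB * hopsBtoC + outRate * hopsCtoSink;
        }

        public static void main(String[] args) {
            // Two candidates for joining streams A (10 tuples/s) and B (2 tuples/s).
            double current  = cost(10, 3, 2, 1, 1, 4);  // join near B
            double neighbor = cost(10, 2, 2, 2, 1, 3);  // one hop toward A and the sink
            System.out.println("current=" + current + " neighbor=" + neighbor);
            if (neighbor < current) System.out.println("migrate join to neighbor");
            // Nodes advertise rates, neighbors evaluate this locally, and the join
            // migrates downhill in cost: the exploration loop described above.
        }
    }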