In-Network Query Processing
Sam Madden
CS294-1
9/30/03

Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …

Programming Sensor Nets Is Hard
• Months of lifetime required from small batteries
  – 3-5 days naively; can't recharge often
  – Must interleave sleep with processing
• Lossy, low-bandwidth, short-range communication
  » Nodes coming and going
  » Multi-hop
• High-level abstraction is needed!
  – Remote, zero-administration deployments
  – Highly distributed environment
  – Limited development tools
    » Embedded; LEDs for debugging!

A Solution: Declarative Queries
• Users specify the data they want
  – Simple, SQL-like queries
  – Using predicates, not specific addresses
  – Our system: TinyDB
• Challenge is to provide:
  – Expressive & easy-to-use interface
  – High-level operators
    • "Transparent" optimizations that many programmers would miss
    • Sensor-net-specific techniques
  – Power-efficient execution framework

TinyDB Demo

TinyDB Architecture
[Diagram: queries (e.g., SELECT AVG(temp) WHERE light > 400) enter over the multihop network; the on-mote query processor chains get('temp'), the filter light > 400, and Agg avg(temp) over samples and tables; results (e.g., T:1, AVG: 225; T:2, AVG: 250) flow back out. A PC-side component manages queries and results.]
• Schema: "catalog" of commands & attributes, e.g.:
  – Name: temp
  – Time to sample: 50 uS
  – Cost to sample: 90 uJ
  – Calibration table: 3
  – Units: Deg. F
  – Error: ± 5 Deg. F
  – f: getTempFunc(…)
• Footprint:
  – ~10,000 lines embedded C code
  – ~5,000 lines Java (PC-side)
  – ~3,200 bytes RAM (w/ 768-byte heap)
  – ~58 kB compiled TinyOS code (3x larger than the 2nd-largest TinyOS program)

Declarative Queries for Sensor Networks
• Example 1: "Find the sensors in bright nests."

  SELECT nodeid, nestNo, light
  FROM sensors
  WHERE light > 400
  EPOCH DURATION 1s

  Epoch  nodeid  nestNo  light
  0      1       17      455
  0      2       25      389
  1      1       17      422
  1      2       25      405

Aggregation Queries
• Example 2:

  SELECT AVG(sound)
  FROM sensors
  EPOCH DURATION 10s

• Example 3: "Count the number of occupied nests in each loud region of the island."

  SELECT region, CNT(occupied), AVG(sound)
  FROM sensors
  GROUP BY region
  HAVING AVG(sound) > 200
  EPOCH DURATION 10s

  Regions w/ AVG(sound) > 200:
  Epoch  region  CNT(…)  AVG(…)
  0      North   3       360
  0      South   3       520
  1      North   3       370
  1      South   3       520

Benefits of Declarative Queries
• Specification of "whole-network" behavior
• Simple, safe
• Complex behavior via multiple queries and app logic
• Optimizable
  – Exploit (non-obvious) interactions
  – E.g.: ACQP operator ordering, adaptive join operator placement, lifetime selection, topology selection
• Versus other approaches, e.g., Diffusion
  – Black-box 'filter' operators
  – Intanagonwiwat et al., "Directed Diffusion", MobiCom 2000

Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …

Tiny Aggregation (TAG)
• Not in today's reading
• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in parallel programming
  – Communication-reducing
    • Operator-dependent benefit
  – Exploits query semantics to improve efficiency!
• Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.

Query Propagation Via Tree-Based Routing
• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
  – Topology selection is important; e.g.:
    • Krishnamachari, DEBS 2002; Intanagonwiwat, ICDCS 2002; Heidemann, SOSP 2001
    • LEACH/SPIN, Heinzelman et al., MOBICOM 99
    • SIGMOD 2003
• Continuous process
• Mitigates failures
[Diagram: the query Q floods down the routing tree from root A through nodes B-F; results R:{…} flow back up along tree edges.]

Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates a partial state record (PSR)
    • local readings
    • readings from children
  – Outputs its PSR during its assigned communication interval
    • Communication scheduling for power reduction
• At the end of the epoch, the PSR for the whole network is output at the root
• New result on each successive epoch
• Extras:
  – Predicate-based partitioning via GROUP BY
[Diagram: a five-node routing tree with communication intervals 1-5 assigned to nodes.]

Illustration: Aggregation
SELECT COUNT(*) FROM sensors
[Animation, five frames: a table of sensor # vs. time slot shows partial counts propagating up a five-node tree; each node transmits its count during its assigned slot, parents merge their children's counts with their own, and the root outputs the network-wide total (5) at the end of the epoch.]

Aggregation Framework
• As in extensible databases, TAG supports any aggregation function conforming to:
  Agg_n = {f_init, f_merge, f_evaluate}
  – f_init{a0} → <a0>  (a partial state record, PSR)
  – f_merge{<a1>, <a2>} → <a12>
  – f_evaluate{<a1>} → aggregate value
• Example: AVERAGE
  – AVG_init{v} → <v, 1>
  – AVG_merge{<S1, C1>, <S2, C2>} → <S1 + S2, C1 + C2>
  – AVG_evaluate{<S, C>} → S/C
• Restriction: merge must be associative and commutative
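Below is a minimal sketch of this three-function interface in C (the deck's embedded implementation language), using the AVERAGE example from the slide. The type and function names are illustrative, not TinyDB's actual API.

```c
#include <stdint.h>

/* Partial state record (PSR) for AVERAGE: a running sum and count.
   (Illustrative names; TinyDB's real aggregate API differs.) */
typedef struct {
    int32_t  sum;
    uint16_t count;
} avg_psr_t;

/* f_init: turn one local sensor reading into a PSR. */
static avg_psr_t avg_init(int16_t reading) {
    avg_psr_t p = { reading, 1 };
    return p;
}

/* f_merge: combine two PSRs. Must be associative and commutative,
   since children's PSRs arrive in arbitrary order up the tree. */
static avg_psr_t avg_merge(avg_psr_t a, avg_psr_t b) {
    avg_psr_t p = { a.sum + b.sum, (uint16_t)(a.count + b.count) };
    return p;
}

/* f_evaluate: produce the final value (called only at the root). */
static int32_t avg_evaluate(avg_psr_t p) {
    return p.sum / p.count;   /* integer average, for simplicity */
}
```

Each epoch, a node would call avg_init on its own reading, fold in each child's PSR with avg_merge, and forward the merged PSR to its parent; only the root calls avg_evaluate.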
Types of Aggregates
• SQL supports MIN, MAX, SUM, COUNT, AVERAGE
• Any function over a set can be computed via TAG
• In-network benefit for many operations
  – E.g., standard deviation, top/bottom N, spatial union/intersection, histograms, etc.
  – Depends on the compactness of the PSR

Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can be applied automatically
• Properties:
  – Partial state
  – Monotonicity
  – Exemplary vs. summary
  – Duplicate sensitivity
• Drives an API!

Partial State
• Growth of |PSR| vs. number of aggregated values (n):
  – Distributive: |PSR| = 1 (e.g., MIN)
  – Algebraic: |PSR| = c (e.g., AVG)
  – Holistic: |PSR| = n (e.g., MEDIAN)
  – Unique: |PSR| = d, where d = # of distinct values (e.g., COUNT DISTINCT)
  – Content-sensitive: |PSR| < n (e.g., HISTOGRAM)
• (Categories from "Data Cube", Gray et al.)
• Property: partial state. Examples: MEDIAN: unbounded; MAX: 1 record. Affects: effectiveness of TAG.

Benefit of In-Network Processing
• Simulation results: 2500 nodes, 50x50 grid, depth ~10, ~20 neighbors, uniform value distribution over [0,100]
[Chart: total bytes transmitted vs. aggregation function (EXTERNAL, MAX, AVERAGE, DISTINCT, MEDIAN), up to ~100,000 bytes; distributive (MAX) and algebraic (AVERAGE) aggregates transmit far less than unique (DISTINCT) and holistic (MEDIAN) ones, with external collection the most expensive.]
• Aggregate- and depth-dependent benefit!

Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …

Acquisitional Query Processing (ACQP)
• Traditional DBMS: processes data already in the system
• Acquisitional DBMS: generates the data in the system!
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected
• Versus traditional systems, where data is provided a priori

ACQP: What's Different?
• Basic acquisitional processing
  – Continuous queries, with rates or lifetimes
  – Events for asynchronous triggering
• Avoiding acquisition through optimization
  – Sampling as a query operator
• Choosing where to sample via co-acquisition
  – Index-like data structures
• Acquiring data from the network
  – Prioritization, summary, and rate control

Lifetime Queries
• Lifetime vs. sample rate:

  SELECT … EPOCH DURATION 10 s
  SELECT … LIFETIME 30 days

• Extra: allow a MAX SAMPLE PERIOD
  – Discard some samples
  – Sampling is cheaper than transmitting

(Single Node) Lifetime Prediction
SELECT nodeid, light LIFETIME 24 weeks
[Chart: voltage (raw units) vs. time (hours), measured vs. expected, with a linear fit (R² = 0.8455); lifetime goal = 24 weeks (4032 hours at 15 s/sample); measured voltage tracks the expected curve down toward the insufficient-voltage threshold (V = 350).]

Operator Ordering: Interleave Sampling + Selection

  SELECT light, mag
  FROM sensors
  WHERE pred1(mag) AND pred2(light)
  EPOCH DURATION 1s

• Sampling costs differ greatly: E(sampling mag) >> E(sampling light), 1500 uJ vs. 90 uJ
• Traditional DBMS: sample mag and light, then apply pred1(mag) and pred2(light)
• ACQP: sample light first, apply the cheap pred2(light), and sample mag (and apply pred1) only if pred2 passes
  – This is the correct ordering unless pred1 is very selective and pred2 is not
• At 1 sample/sec, total power savings could be as much as 3.5 mW
  – Comparable to the processor!
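A sketch of this interleaved ordering in C. The per-sample energies come from the slide; the sampling routines and predicate thresholds below are hypothetical stubs for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sampling routines (stubs for illustration).
   Per-sample energy, from the slide: light ~90 uJ, magnetometer ~1500 uJ. */
static int16_t sample_light(void) { return 410; /* stub reading */ }
static int16_t sample_mag(void)   { return 120; /* stub reading */ }

/* Stub predicates; thresholds are made up for the example. */
static bool pred2(int16_t light) { return light > 400; } /* cheap attribute */
static bool pred1(int16_t mag)   { return mag > 100;   } /* gates costly sample */

/* ACQP ordering: acquire the cheap attribute first, and only pay for the
   expensive magnetometer sample if the cheap predicate passes. */
static bool evaluate_tuple(int16_t *light_out, int16_t *mag_out) {
    int16_t light = sample_light();       /* ~90 uJ */
    if (!pred2(light))
        return false;                     /* mag is never sampled */
    int16_t mag = sample_mag();           /* ~1500 uJ, paid only when needed */
    if (!pred1(mag))
        return false;
    *light_out = light;
    *mag_out = mag;
    return true;
}
```

As the slide notes, this ordering wins except when pred1 is very selective and pred2 is not; a cost-based optimizer would weigh selectivity against acquisition cost.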
Exemplary Aggregate Pushdown

  SELECT WINMAX(light, 8s, 8s)
  FROM sensors
  WHERE mag > x
  EPOCH DURATION 1s

• Traditional DBMS: sample mag and light, filter mag > x, then compute WINMAX over light
• ACQP: sample light first; only if light exceeds the current window MAX, sample mag and check mag > x
• Novel, general pushdown technique
• Mag sampling is the most expensive operation!

Event-Based Processing
• Epochs are synchronous
• Might want to issue queries in response to asynchronous events
  – Avoid unnecessary "polling"

  CREATE TABLE birds (uint16 cnt) SIZE 1 CIRCULAR
  ON EVENT bird-enter(…)
  SELECT b.cnt + 1
  FROM birds AS b
  OUTPUT INTO b
  ONCE

• In-network storage
• Placement subject to optimization

Attribute-Driven Network Construction
• Goal: co-acquisition -- sensors that sample together route together
• Observation: queries are often over a limited area
  – Or some other subset of the network
  – E.g., regions with light value in [10,20]
• Idea: build the network topology so that like-valued nodes route through each other
  – For range queries
  – Over relatively static attributes (e.g., location)
• Maintenance issues

Tracking Co-Acquisition Via Semantic Routing Trees (SRTs)
• Idea: send range queries only to participating nodes
  – Parents maintain the value ranges of their descendants
    • Precomputed intervals
    • Reported by children as they join
• Example: SELECT … WHERE a > 5 AND a < 12
[Diagram: a parent with three children whose reported intervals are a:[1,10], a:[7,15], and a:[20,40]; the a:[20,40] child does not overlap the query range, so it is excluded from query broadcast and result collection!]

Parent Selection for SRTs
• Idea: a node picks the parent whose descendants' interval most overlaps its own descendants' interval
[Diagram: joining node 4 with interval [3,6] considers parents 1 ([1,10]), 2 ([7,15]), and 3 ([20,40]): [3,6] ∩ [1,10] = [3,6]; [3,6] ∩ [7,15] = ∅; [3,6] ∩ [20,40] = ∅; so node 4 joins under node 1.]
• Other selection policies in the paper!
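A sketch of the two SRT decisions above in C: forwarding a range query only to overlapping children, and picking the parent with the largest interval overlap. The structures and scoring are illustrative, not TinyDB's actual implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Illustrative SRT structures; TinyDB's actual ones differ. */
typedef struct {
    int16_t lo, hi;               /* closed interval [lo, hi] */
} interval_t;

typedef struct {
    uint16_t   node_id;
    interval_t range;             /* union of the subtree's attribute values */
} srt_entry_t;

static bool overlaps(interval_t a, interval_t b) {
    return a.lo <= b.hi && b.lo <= a.hi;
}

/* Query forwarding: send a range query (e.g., WHERE a > 5 AND a < 12)
   only to children whose reported intervals intersect it; the rest are
   excluded from both query broadcast and result collection. */
void forward_query(interval_t query, const srt_entry_t *children, int n,
                   void (*send)(uint16_t node_id)) {
    for (int i = 0; i < n; i++)
        if (overlaps(query, children[i].range))
            send(children[i].node_id);
}

/* Parent selection: a joining node picks the candidate parent whose
   subtree interval most overlaps its own (overlap length as the score;
   a negative score means no overlap). */
int pick_parent(interval_t mine, const srt_entry_t *candidates, int n) {
    int best = -1;
    int best_score = INT16_MIN;
    for (int i = 0; i < n; i++) {
        interval_t c = candidates[i].range;
        int lo = mine.lo > c.lo ? mine.lo : c.lo;
        int hi = mine.hi < c.hi ? mine.hi : c.hi;
        int score = hi - lo;
        if (score > best_score) { best_score = score; best = i; }
    }
    return best;                  /* index into candidates, or -1 if n == 0 */
}
```

With the slide's example ([3,6] against [1,10], [7,15], [20,40]), pick_parent selects the [1,10] candidate, matching the diagram.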
Simulation Result
[Chart: # active nodes (400 = max) vs. query size as % of value range (0 to 1), comparing best case (expected), closest parent, and all nodes; SRT parent selection activates far fewer nodes than flooding, especially for small query ranges. Uniform value distribution, 20x20 grid, ideal connectivity to 8 neighbors.]

Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …

Adaptive & Decentralized Operator Placement
• IPSN 2003 paper
• Main idea
  – Place operators near data sources
  – Greater operator rate → closer placement
• For each operator
  – Explore candidate neighbors
  – Migrate to lower-cost placements
  – Via extra messages
[Diagram: an operator placed between two sources with rates A and B.]
• Proper placement depends on path lengths and relative rates!

"Adaptivity" in Databases
• Adaptivity: changing query plans on the fly
  – Typically at the physical level
    • Where the plan runs
    • Ordering of operators
    • Instantiations of operators, e.g., hash join vs. merge join
  – Non-traditional
    • Conventionally, complete plans are built prior to execution
    • Using cost estimates (collected from history)
  – Important in volatile or long-running environments
    • Where a priori estimates are unlikely to be good
    • E.g., sensor networks

Adaptivity for Operator Placement
• Adaptivity comes at a cost
  – Extra work on each operator, for each tuple
  – In a DBMS, processing per tuple is small
    • 100s of instructions per operator
    • Unless you have to hit the disk!
• Costs in this case?
  – Extra communication hurts
    • Finding candidate placements (exploration)
      – Cost advertisements from the local node
      – New costs from candidates
    • Moving state (migration)
      – Joins, windowed aggregates

Do Benefits Justify Costs?
• Not evaluated!
  – Claimed 3x reduction in messages vs. external processing
  – But excluding exploration & migration costs
• Seems somewhat implausible, especially given the added complexity
  – Hard to make the migration protocol work
    • Depends on the ability to reliably quiesce child operators
• What else could you do?
  – Static placement

Summary
• Declarative QP
  – Simplifies data collection in sensornets
  – In-network processing, query optimization for performance
• Acquisitional QP
  – Focuses on the costs associated with sampling data
  – A new challenge of sensornets and other streaming systems?
• Adaptive join placement
  – In-network optimization
  – Some benefit, but practicality unclear
  – Operator pushdown still a good idea

Open Problems
• Many; a few:
  – In-network storage and operator placement
  – Dealing with heterogeneity
  – Dealing with loss
• Need real implementations of many of these ideas
• See me! (madden@cs.berkeley.edu)

Questions / Discussion

Making TinyDB REALLY Work
• Berkeley Botanical Garden: first "real deployment"
• Requirements:
  – At least 1 month of unattended operation
  – Support for calibrated environmental sensors
  – Multi-hop routing
• What we started with:
  – Limited power management, no time synchronization
  – Motes crashed hard occasionally
  – Limited, relatively untested multihop routing

Power Consumption in Sensornets
• Waking current ~12 mA
  – Fairly evenly spread between sensors, processor, radio
• Sleeping current 20-100 uA
• Power consumption dominated by sensing and reception:
  – 1 s power-up on the Mica2Dot sensor board
  – Most mote apps use an "always on" radio
    • Completely unstructured communication
    • Bad for battery life

Why Not Use TDMA?
• CSMA is very flexible: easy for new nodes to join
  – Reasonably scalable (relative to Bluetooth)
• CSMA implemented, available
  – We wanted to build something that worked

Power Management Approach
• Coarse-grained communication scheduling
[Diagram: motes 1-5 sleep ("zzz") through most of each epoch (10s-100s of seconds) and wake together for a 2-4 s waking period.]

Benefits / Drawbacks
• Benefits
  – Can still use CSMA within the waking period
    • No reservation required: new nodes can join easily!
  – Waking period duration is easily tunable
    • Depending on network size
• Drawbacks
  – Longer waking time vs. TDMA?
    • Could stagger slots based on tree depth
  – No "guaranteed" slot reservation
    • Nothing is guaranteed anyway

Challenges
• Time synchronization
• Fault recovery
• Joining, starting, and stopping
• Parameter tuning
  – Waking period: hardcode at 4 s?

Time Synchronization
• All messages include a 5-byte timestamp indicating system time in ms
  – Synchronize (i.e., set local system time to the timestamp) on:
    • Any message from the parent
    • Any new query message (even if not from the parent)
      – Punt on multiple queries
  – Timestamps are written just after the preamble is transmitted
• All nodes agree that the waking period begins when (system time % epoch duration == 0)
  – And lasts for WAKING_PERIOD ms (see the sketch below)
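A sketch of the resulting schedule in C: once clocks agree, each node derives the shared waking window purely from its local system time. The constants are example values from the deployment, not fixed TinyDB parameters.

```c
#include <stdbool.h>
#include <stdint.h>

#define EPOCH_DUR_MS     30000UL  /* e.g., 30 s sample period */
#define WAKING_PERIOD_MS  3000UL  /* e.g., 3 s waking window  */

/* Local clock, in ms; in TinyDB the timestamp on the wire is 5 bytes. */
static uint64_t system_time_ms;

/* On hearing a parent message (or any new query message), adopt the
   sender's timestamp as local system time. */
void on_timestamped_message(uint64_t sender_time_ms) {
    system_time_ms = sender_time_ms;
}

/* All nodes agree the waking period starts when
   (system time % epoch duration) == 0 and lasts WAKING_PERIOD_MS. */
bool radio_should_be_on(void) {
    return (system_time_ms % EPOCH_DUR_MS) < WAKING_PERIOD_MS;
}

/* How long to sleep until the next waking period begins. */
uint64_t ms_until_next_wakeup(void) {
    return EPOCH_DUR_MS - (system_time_ms % EPOCH_DUR_MS);
}
```

With a 30 s epoch and a 3 s window, this is exactly the 10% duty cycle used in the garden deployment described below.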
Wrinkles
• If a node hasn't heard from its parent for k epochs
  – Switch to "always on" mode for 1 epoch
• If the waking period ends
  – Punt on this epoch
• Query dissemination / deletion via an always-on basestation and viral propagation
  – Data messages act as advertisements for running queries
  – Explicit requests for missing queries
  – Explicit "dead query" messages for stopped queries
    • Don't re-request the last dead query

Results
• 30 nodes in the Garden
  – Lasted 21 days
  – 10% duty cycle
    • Sample period = 30 s, waking period = 3 s
  – Next deployment:
    • 1-2% duty cycle
    • Fixes to the clock to reduce baseline power
    • Should last at least 60 days
• Time sync test: 2,500 readings
  – 5 readings "out of sync"
  – 15 s epoch duration, 3 s waking period

Garden Data
[Charts: relative humidity (%) and temperature (C) vs. time, 7/7/03 9:40 through 7/9/03 10:03, for sensors mounted at heights up a 36 m tree: 33 m (111), 32 m (110), 30 m (109, 108, 107), 20 m (106, 105, 104), and 10 m (103, 102, 101); readings show clear diurnal cycles.]

Data Quality
[Chart: sensor ID (1-15) vs. number of results received, summarized by parent sensor; per-sensor delivery rates range from 3% to 75% of expected results.]
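A back-of-the-envelope check on these duty-cycle numbers, using the currents from the "Power Consumption in Sensornets" slide. The battery capacity is an assumed nominal figure (2000 mAh), not from the talk, so the outputs are optimistic upper bounds that ignore the baseline-power losses the clock fixes were meant to address.

```c
#include <stdio.h>

/* Currents from the "Power Consumption in Sensornets" slide. */
#define I_WAKE_MA   12.0    /* ~12 mA while awake */
#define I_SLEEP_MA  0.05    /* 50 uA, middle of the quoted 20-100 uA */

/* Assumed nominal battery capacity (not from the talk). */
#define CAPACITY_MAH 2000.0

/* Ideal lifetime: capacity divided by duty-cycle-weighted average current. */
static double lifetime_days(double duty_cycle) {
    double avg_ma = duty_cycle * I_WAKE_MA + (1.0 - duty_cycle) * I_SLEEP_MA;
    return CAPACITY_MAH / avg_ma / 24.0;
}

int main(void) {
    /* Garden deployment: 3 s waking / 30 s epoch = 10% duty cycle. */
    printf("10%% duty cycle: ~%.0f days (ideal)\n", lifetime_days(0.10));
    /* Planned next deployment: 1-2% duty cycle. */
    printf(" 2%% duty cycle: ~%.0f days (ideal)\n", lifetime_days(0.02));
    return 0;
}
```

Under these assumptions the 10% duty cycle yields roughly 67 ideal days, against the 21 days observed, which is consistent with the slide's point that unaccounted baseline power, not the schedule, limited the first deployment.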