In-Network Query Processing
Sam Madden
CS294-1
9/30/03
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Programming Sensor Nets Is Hard
– Months of lifetime required from small batteries
– 3-5 days naively; can't recharge often
– Interleave sleep with processing
– Lossy, low-bandwidth, short-range communication
  » Nodes coming and going
  » Multi-hop

High-Level Abstraction Is Needed!
– Remote, zero-administration deployments
– Highly distributed environment
– Limited development tools
  » Embedded; LEDs for debugging!
A Solution: Declarative Queries
• Users specify the data they want
  – Simple, SQL-like queries
  – Using predicates, not specific addresses
  – Our system: TinyDB
• Challenge is to provide:
  – Expressive & easy-to-use interface
  – High-level operators
    • "Transparent optimizations" that many programmers would miss
    • Sensor-net-specific techniques
  – Power-efficient execution framework
TinyDB Demo
TinyDB Architecture
[Architecture diagram: SQL queries (e.g., SELECT AVG(temp) WHERE light > 400)
flow from the PC-side front end through the multihop network to the motes;
results (e.g., T:1, AVG: 225; T:2, AVG: 250) flow back up. On each mote, a
query processor runs operators such as Agg(avg(temp)) and Filter(light > 400)
over sampled data, e.g., get('temp') -> samples.]

Schema: "catalog" of commands & attributes, e.g. for temp:
  Name: temp
  Time to sample: 50 uS
  Cost to sample: 90 uJ
  Calibration table: 3
  Units: Deg. F
  Error: ± 5 Deg. F
  Get function: getTempFunc(…)

Footprint:
  ~10,000 lines embedded C code (TinyOS side; ~58 kB compiled,
   3x larger than the 2nd largest TinyOS program)
  ~5,000 lines Java (PC side)
  ~3,200 bytes RAM (w/ 768-byte heap)
Declarative Queries for Sensor Networks
• Example 1: "Find the sensors in bright nests."

SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Sensors:
  Epoch  Nodeid  nestNo  Light
  0      1       17      455
  0      2       25      389
  1      1       17      422
  1      2       25      405
Aggregation Queries
• Example 2:

SELECT AVG(sound)
FROM sensors
EPOCH DURATION 10s

• Example 3: "Count the number of occupied nests in each loud region of the island."

SELECT region, CNT(occupied), AVG(sound)
FROM sensors
GROUP BY region
HAVING AVG(sound) > 200
EPOCH DURATION 10s

Regions w/ AVG(sound) > 200:
  Epoch  region  CNT(…)  AVG(…)
  0      North   3       360
  0      South   3       520
  1      North   3       370
  1      South   3       520
Benefits of Declarative Queries
• Specification of "whole-network" behavior
• Simple, safe
• Complex behavior via multiple queries, app logic
• Optimizable
  – Exploit (non-obvious) interactions
  – E.g.: ACQP operator ordering, adaptive join operator placement,
    lifetime selection, topology selection
• Versus other approaches, e.g., Diffusion
  – Black-box "filter" operators
  – Intanagonwiwat et al., "Directed Diffusion", MOBICOM 2000
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Tiny Aggregation (TAG)
• Not in today's reading
• In-network processing of aggregates
  – Common data analysis operation
    • Aka gather operation or reduction in parallel programming
  – Communication-reducing
    • Operator-dependent benefit
• Exploit query semantics to improve efficiency!

Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
Query Propagation Via Tree-Based Routing
• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
  – Topology selection is important; e.g.:
    • Krishnamachari, DEBS 2002; Intanagonwiwat, ICDCS 2002;
      Heidemann, SOSP 2001
    • LEACH/SPIN, Heinzelman et al., MOBICOM 99
    • SIGMOD 2003
  – Continuous process
  – Mitigates failures
[Diagram: a query Q (SELECT …) is broadcast from the root A down a routing
tree through nodes B-F; results R:{…} flow back up the tree edges.]
Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    • local readings
    • readings from children
  – Outputs PSR during assigned comm. interval
• Communication scheduling for power reduction
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
• Extras:
  – Predicate-based partitioning via GROUP BY
[Diagram: five-node routing tree rooted at node 1.]
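A compact sketch of the per-epoch behavior above for COUNT(*), in Python;
run_epoch and transmit_to_parent are illustrative names, not TinyDB's actual API:

  # One TAG epoch at a single node, for COUNT(*): apply f_init to the
  # local reading, f_merge the PSRs received from children, and emit the
  # combined PSR during the node's assigned communication interval.

  def transmit_to_parent(psr):
      print("PSR ->", psr)          # stand-in for the radio send

  def run_epoch(child_psrs):
      psr = 1                       # f_init: this node contributes a count of 1
      for child in child_psrs:      # PSRs heard from children this epoch
          psr += child              # f_merge for COUNT is addition
      transmit_to_parent(psr)

  # A five-node tree: two leaves each report 1, their parent merges them
  # with its own reading, and the root outputs the network COUNT of 5.
  run_epoch([])                     # leaf: PSR -> 1
  run_epoch([1, 1])                 # interior node: PSR -> 3
  run_epoch([3, 1])                 # root: PSR -> 5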
Illustration: Aggregation

SELECT COUNT(*)
FROM sensors

[Animation frames: a grid of sensor # (1-5) vs. time, alongside the five-node
routing tree. In each communication interval one level of the tree reports:
the leaves emit partial counts, interior nodes merge their children's counts
with their own, and by the end of the epoch the root outputs the network-wide
COUNT of 5.]
Aggregation Framework
• As in extensible databases, TAG supports any aggregation function
  conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}

  f_init{a_0} -> <a_0>                 (partial state record, PSR)
  f_merge{<a_1>, <a_2>} -> <a_12>
  f_evaluate{<a_1>} -> aggregate value

Example: AVERAGE

  AVG_init{v} -> <v, 1>
  AVG_merge{<S_1, C_1>, <S_2, C_2>} -> <S_1 + S_2, C_1 + C_2>
  AVG_evaluate{<S, C>} -> S/C

Restriction: merge must be associative and commutative.
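The same triple, written out as a runnable Python sketch (tuple PSRs and
these function names are illustrative, not TinyDB's actual interface):

  from functools import reduce

  # TAG's {f_init, f_merge, f_evaluate} for AVERAGE, with PSR <sum, count>.

  def avg_init(v):
      """Turn one local reading into the PSR <v, 1>."""
      return (v, 1)

  def avg_merge(psr1, psr2):
      """Combine two PSRs; associative and commutative, so children's
      records can be merged in any order as they flow up the tree."""
      s1, c1 = psr1
      s2, c2 = psr2
      return (s1 + s2, c1 + c2)

  def avg_evaluate(psr):
      """At the root, convert the final PSR into the aggregate value."""
      s, c = psr
      return s / c

  # Three nodes report readings 10, 20, 30; the root evaluates to 20.0.
  psrs = [avg_init(v) for v in (10, 20, 30)]
  print(avg_evaluate(reduce(avg_merge, psrs)))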
Types of Aggregates
• SQL supports MIN, MAX, SUM, COUNT, AVERAGE
• Any function over a set can be computed via TAG
• In-network benefit for many operations
  – E.g., standard deviation, top/bottom N, spatial union/intersection,
    histograms, etc.
  – Benefit depends on compactness of the PSR
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
• Properties:
  – Partial state
  – Monotonicity
  – Exemplary vs. summary
  – Duplicate sensitivity
• Drives an API!
Partial State
• Growth of PSR vs. number of aggregated values (n):

  Class               |PSR|                        Example
  Distributive        = 1                          MIN
  Algebraic           = c (constant)               AVG
  Holistic            = n                          MEDIAN
  Unique              = d (# of distinct values)   COUNT DISTINCT
  Content-Sensitive   < n                          HISTOGRAM

  (Cf. "Data Cube", Gray et al.)

  Property        Examples                            Affects
  Partial State   MEDIAN: unbounded; MAX: 1 record    Effectiveness of TAG
Benefit of In-Network Processing
• Simulation results: 2500 nodes, 50x50 grid, depth ~10, neighbors ~20,
  values uniformly distributed over [0,100]
• Aggregate & depth dependent benefit!
[Chart: total bytes transmitted vs. aggregation function (EXTERNAL, MAX,
AVERAGE, DISTINCT, MEDIAN), on a scale of 0-100,000 bytes. Holistic
aggregates (MEDIAN) cost nearly as much as external collection; unique
aggregates (DISTINCT) somewhat less; distributive and algebraic aggregates
(MAX, AVERAGE) far less.]
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Acquisitional Query Processing (ACQP)
• Traditional DBMS: processes data already in the system
• Acquisitional DBMS: generates the data in the system!
• An acquisitional query processor controls when, where, and with what
  frequency data is collected
  – Versus traditional systems, where data is provided a priori
ACQP: What's Different?
• Basic Acquisitional Processing
  – Continuous queries, with rates or lifetimes
  – Events for asynchronous triggering
• Avoiding Acquisition Through Optimization
  – Sampling as a query operator
• Choosing Where to Sample via Co-acquisition
  – Index-like data structures
• Acquiring Data From the Network
  – Prioritization, summary, and rate control
Lifetime Queries
• Lifetime vs. sample rate:

SELECT …
EPOCH DURATION 10 s

SELECT …
LIFETIME 30 days

• Extra: allow a MAX SAMPLE PERIOD
  – Discard some samples
  – Sampling cheaper than transmitting
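A back-of-the-envelope sketch of how a LIFETIME clause might be turned into
a sample period, assuming a known battery budget and fixed per-epoch energy
cost (the constants and the simple linear model are illustrative; TinyDB's
actual estimator is more involved):

  # Derive a sample period meeting a LIFETIME goal from an energy budget,
  # assuming each epoch costs a fixed amount and sleep draws constant power.

  def sample_period_s(lifetime_days, battery_j, epoch_cost_j, sleep_w):
      lifetime_s = lifetime_days * 24 * 3600
      usable_j = battery_j - sleep_w * lifetime_s   # subtract sleep drain
      if usable_j <= 0:
          raise ValueError("lifetime unreachable at this sleep current")
      epochs = usable_j / epoch_cost_j              # epochs we can afford
      return lifetime_s / epochs                    # seconds per sample

  # E.g., ~10 kJ of battery, ~1 mJ per epoch, ~60 uW sleep draw, 30 days:
  print(sample_period_s(30, 10_000, 0.001, 60e-6))  # ~0.26 s between samples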
(Single-Node) Lifetime Prediction

SELECT nodeid, light
LIFETIME 24 weeks

[Chart: voltage (raw units) vs. time (hours), measured vs. expected, for a
lifetime goal of 24 weeks (4032 hours, 15 s/sample). The measured voltage
tracks the expected curve closely (linear fit, R^2 = 0.8455) from ~1000 raw
units down toward the insufficient-voltage-to-operate threshold (V = 350)
near 4000 hours.]
Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ
• Traditional DBMS: sample both mag (costly) and light (cheap), then apply
  σ(pred1) and σ(pred2)
• ACQP, correct ordering (unless pred1 is very selective and pred2 is not):
  sample light, apply σ(pred2); only if it passes, sample mag and apply σ(pred1)
• At 1 sample/sec, total power savings could be as much as 3.5 mW
  – Comparable to the processor!
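A Python sketch of this interleaving, ordering acquisitional predicates by
per-attribute sampling cost (costs from the slide; the predicates,
thresholds, and acquire stub are illustrative, not TinyDB's planner):

  import random

  SAMPLE_COST_UJ = {"light": 90, "mag": 1500}   # per-sample energy costs

  def acquire(attr):
      """Stand-in for an ADC read; returns a fake reading."""
      return random.uniform(0, 1000)

  def pred2(light):   # predicate on the cheap attribute
      return light > 400

  def pred1(mag):     # predicate on the expensive attribute
      return mag > 100

  def epoch_tuple():
      # Order predicates so the cheap attribute is sampled (and can reject
      # the tuple) before paying for the expensive one.
      plan = sorted([("light", pred2), ("mag", pred1)],
                    key=lambda p: SAMPLE_COST_UJ[p[0]])
      values = {}
      for attr, pred in plan:
          values[attr] = acquire(attr)   # sampling cost paid lazily
          if not pred(values[attr]):
              return None                # rejected before sampling mag
      return values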
Exemplary Aggregate Pushdown

SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Traditional DBMS: sample mag and light each epoch, apply σ(mag > x),
  then compute WINMAX
• ACQP: sample light first and check σ(light > current window MAX); sample
  mag (and apply σ(mag > x)) only when the reading could change the answer
• Mag sampling is the most expensive operation!
• Novel, general pushdown technique
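A sketch of the pushdown for one epoch, under the same assumed costs and a
stubbed acquire; window bookkeeping is elided and the names are illustrative:

  import random

  def acquire(attr):
      """Stand-in for an ADC read (light: 90 uJ, mag: 1500 uJ)."""
      return random.uniform(0, 1000)

  def winmax_step(current_max, x):
      """One epoch of WINMAX(light) WHERE mag > x, with the exemplary
      aggregate pushed below the expensive mag sample."""
      light = acquire("light")      # cheap sample first
      if light <= current_max:
          return current_max        # can't raise the window MAX; skip mag
      if acquire("mag") > x:        # expensive sample, only when needed
          return light              # new window maximum
      return current_max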
Event-Based Processing
• Epochs are synchronous
• Might want to issue queries in response to asynchronous events
  – Avoid unnecessary "polling"

CREATE TABLE birds(uint16 cnt)
SIZE 1 CIRCULAR

ON EVENT bird-enter(…)
SELECT b.cnt+1
FROM birds AS b
OUTPUT INTO b
ONCE

• In-network storage
• Placement subject to optimization
Attribute-Driven Network Construction
• Goal: co-acquisition -- sensors that sample together route together
• Observation: queries are often over a limited area
  – E.g., regions with light value in [10,20]
  – Or some other subset of the network
• Idea: build the network topology such that like-valued nodes route
  through each other
  – For range queries
  – Relatively static attributes (e.g., location)
  – Maintenance issues
Tracking Co-Acquisition Via Semantic Routing Trees
• Idea: send range queries only to participating nodes
  – Parents maintain ranges of descendants
  – Intervals precomputed; reported by children as they join

SELECT …
WHERE a > 5 AND a < 12

[Diagram: a root whose children report descendant intervals a:[1,10],
a:[7,15], and a:[20,40]; the a:[20,40] subtree cannot satisfy the predicate,
so it is excluded from query broadcast and result collection!]
Parent Selection for SRTs
• Idea: a node picks the parent whose ancestors' interval most overlaps
  its descendants' interval
• Example: node 4, with interval [3,6], chooses among parents with
  intervals [1,10], [7,15], and [20,40]:

  [3,6] ∩ [1,10]  = [3,6]
  [3,6] ∩ [7,15]  = ø
  [3,6] ∩ [20,40] = ø

• Other selection policies in paper!
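A sketch of this overlap rule in Python, using closed integer intervals; a
simplification of the policies in the paper, with illustrative names:

  def overlap(a, b):
      """Length of intersection of two closed intervals, 0 if disjoint."""
      lo, hi = max(a[0], b[0]), min(a[1], b[1])
      return max(0, hi - lo)

  def choose_parent(my_interval, candidates):
      """Pick the candidate parent whose interval overlaps ours the most."""
      return max(candidates, key=lambda c: overlap(my_interval, c["interval"]))

  # Node 4 ([3,6]) choosing among the parents from the slide:
  parents = [{"id": 1, "interval": (1, 10)},
             {"id": 2, "interval": (7, 15)},
             {"id": 3, "interval": (20, 40)}]
  print(choose_parent((3, 6), parents)["id"])  # -> 1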
Simulation Result
[Chart: # active nodes (400 = max) vs. query size as a fraction of the value
range (ticks at 0, 0.05, 0.1, 0.2, 0.5, 1). Three curves: Best Case
(Expected), Closest Parent, and All Nodes; the Closest Parent policy falls
between the expected best case and naively involving all nodes, with the
curves converging as the query covers the full value range. Uniform value
distribution, 20x20 grid, ideal connectivity to 8 neighbors.]
Outline
• TinyDB
  – And demo!
• Aggregate Queries
• ACQP
• Break
• Adaptive Operator Placement
• …
Adaptive & Decentralized Operator Placement
• IPSN 2003 paper
• Main idea:
  – Place operators near data sources
  – Greater operator rate → closer placement
• For each operator:
  – Explore candidate neighbors
  – Migrate to lower-cost placements
  – Via extra messages
[Diagram: an operator fed by two streams with rates A and B; proper
placement depends on path lengths and relative rates!]
"Adaptivity" in Databases
• Adaptivity: changing query plans on the fly
  – Typically at the physical level
    • Where the plan runs
    • Ordering of operators
    • Instantiations of operators, e.g., hash join vs. merge join
  – Non-traditional
    • Conventionally, complete plans are built prior to execution
    • Using cost estimates (collected from history)
  – Important in volatile or long-running environments
    • Where a priori estimates are unlikely to be good
    • E.g., sensor networks
Adaptivity for Operator Placement
• Adaptivity comes at a cost
  – Extra work on each operator, each tuple
  – In a DBMS, processing per tuple is small
    • 100s of instructions per operator
    • Unless you have to hit the disk!
• Costs in this case?
  – Extra communication hurts
  – Finding candidate placements (exploration)
    • Cost advertisements from the local node
    • New costs from candidates
  – Moving state (migration)
    • Joins, windowed aggregates
Do Benefits Justify Costs?
• Not evaluated!
  – Claimed 3x reduction in messages vs. external processing
  – But excluding exploration & migration costs
• Seems somewhat implausible, especially given added complexity
  – Hard to make the migration protocol work
    • Depends on the ability to reliably quiesce child operators
• What else could you do?
  – Static placement
Summary
• Declarative QP
– Simplify data collection in sensornets
– In-network processing, query optimization for
performance
• Acquisitional QP
– Focus on costs associated with sampling data
• New challenge of sensornets, other streaming
systems?
• Adaptive Join Placement
– In-network optimization
– Some benefit, but practicality unclear
– Operator pushdown still a good idea
Open Problems
• Many; a few:
– In-network storage and operator placement
– Dealing with heterogeneity
– Dealing with loss
• Need real implementations of many of
these ideas
• See me! (madden@cs.berkeley.edu)
Questions / Discussion
Making TinyDB REALLY Work
• Berkeley Botanical Garden
• First “real deployment”
• Requirements:
– At least 1 month unattended operation
– Support for calibrated environmental sensors
– Multi-hop routing
• What we started with:
– Limited power management, no time synchronization
– Motes crashed hard occasionally
– Limited, relatively untested multihop routing
Power Consumption in Sensornets
• Waking current ~12 mA
  – Fairly evenly spread between sensors, processor, radio
• Sleeping current 20-100 uA
• Power consumption dominated by sensing, reception:
  – 1 s power-up on Mica2Dot sensor board
  – Most mote apps use an "always on" radio
    • Completely unstructured communication
    • Bad for battery life
Why Not Use TDMA?
• CSMA is very flexible: easy for new
nodes to join
– Reasonably scalable (relative to Bluetooth)
• CSMA implemented, available
– We wanted to build something that worked
Power Management Approach
Coarse-grained communication scheduling:
[Diagram: motes 1-5 sleep ("zzz") for most of each epoch (10s-100s of
seconds), all waking together for a 2-4 s waking period in which
communication happens.]
Benefits / Drawbacks
• Benefits
– Can still use CSMA within waking period
• No reservation required: new nodes can join easily!
– Waking period duration is easily tunable
• Depending on network size
• Drawbacks
– Longer waking time vs. TDMA?
• Could stagger slots based on tree-depth
– No “guaranteed” slot reservation
• Nothing is guaranteed anyway
Challenges
• Time synchronization
• Fault recovery
• Joining, starting, and stopping
• Parameter tuning
  – Waking period: hardcode at 4 s?
Time Synchronization
• All messages include a 5-byte timestamp indicating system time in ms
  – Synchronize (i.e., set local system time to the timestamp) with:
    • Any message from parent
    • Any new query message (even if not from parent)
      – Punt on multiple queries
  – Timestamps written just after the preamble is xmitted
• All nodes agree that the waking period begins when
  (system time % epoch duration == 0)
  – And lasts for WAKING_PERIOD ms
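The waking-period rule, as a minimal Python sketch (the constants and names
are illustrative, not TinyDB's implementation):

  WAKING_PERIOD_MS = 3_000     # e.g., 3 s waking period
  EPOCH_DUR_MS = 15_000        # e.g., 15 s epoch

  system_time_ms = 0           # node's notion of global time, in ms

  def on_message(timestamp_ms):
      """Adopt the sender's clock: any parent message or new query."""
      global system_time_ms
      system_time_ms = timestamp_ms

  def is_awake():
      """The waking period starts whenever system time is a multiple of
      the epoch duration, and lasts WAKING_PERIOD_MS."""
      return (system_time_ms % EPOCH_DUR_MS) < WAKING_PERIOD_MS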
Wrinkles
• If a node hasn't heard from its parent for k epochs
  – Switch to "always on" mode for 1 epoch
• If the waking period ends
  – Punt on this epoch
• Query dissemination / deletion via an always-on basestation and viral
  propagation
  – Data messages act as advertisements for running queries
  – Explicit requests for missing queries
  – Explicit "dead query" messages for stopped queries
    • Don't request the last dead query
Results
• 30 Nodes in the Garden
– Lasted 21 days
– 10% duty cycle
• Sample period = 30s, waking period = 3s
– Next deployment:
• 1-2% duty cycle
• Fixes to clock to reduce baseline power
• Should last at least 60 days
• Time sync test: 2,500 readings
– 5 readings “out of synch”
– 15s epoch duration, 3s waking period
Garden Data
[Charts: relative humidity (%) and temperature (C) vs. time, from 7/7/03
9:40 to 7/9/03 10:03, for sensors 101, 104, 109, 110, and 111. Sensors were
placed in a 36 m tree at 10 m (101-103), 20 m (104-106), 30 m (107-109),
32 m (110), and 33 m (111). Humidity spans roughly 35-95%; temperature
roughly 8-33 C.]
Data Quality
[Chart: number of messages received per sensor id (1-15), broken down by
parent sensor (0, 1, 2, 4, 5, 6, 8, 9, 10, 11, 12, 13, 16). Per-sensor
delivery ranged from 100% of expected results (~10,000 messages) down to
3%, with most sensors between 35% and 75%.]