
Database Middleware for
Sensor Networks
Sam Madden
Assistant Professor, MIT
madden@csail.mit.edu
Slides prepared with Wei Hong
1
Motivation
• Sensor networks (aka sensor webs, emnets) are here
– Several widely deployed HW/SW platforms
• Low power radio, small processor, RAM/Flash
– Variety of (novel) applications: scientific, industrial, commercial
– Great platform for mobile + ubicomp experimentation
• Real, hard research problems to be solved
– Networking, systems, languages, databases
– Central problem: ease of access, appropriate programming
abstractions
I will summarize:
– Low-level sensornet issues
– A particular middleware architecture: TinyDB + TASK (on the Berkeley Mote platform)
– Current and future research middleware ideas
2
Some Sensornet Apps
• redwood forest microclimate monitoring
• smart cooling in data centers (http://www.hpl.hp.com/research/dca/smart_cooling/)
• condition-based maintenance
• structural integrity
And More…
• Homeland security
• Container monitoring
• Mobile environmental apps
• Bird tracking
• Zebranet
• Home automation
• Etc!
Architectural Overview
(diagram: client tools — GUIs, etc. — and external tools connect over the Internet to the middleware layer: local servers with stable store (a DBMS); beneath them sits the sensor network, running TinyDB, COUGAR, or Directed Diffusion; field tools attach directly in the field)
Middleware Issues:
• APIs for current + historical access?
• Which data when?
• How to act on data?
• Network and node status?
4
Declarative Queries
• Programming Apps is Hard
– Limited power budget
– Lossy, low-bandwidth communication
– Require long-lived, zero-admin deployments
– Distributed algorithms
– Limited tools, debugging interfaces
• Queries abstract away much of the complexity
– Burden on the database developers
– Users get:
• Safe, optimizable programs
• Freedom to think about apps instead of details
5
TinyDB: Declarative Query Interface to
Sensornets
• Platform: Berkeley Motes + TinyOS
• Continuous variant of SQL : TinySQL
• Power- and data-acquisition-based in-network optimization framework
• Extensible interface for aggregates, new types of sensors
6
Agenda
• Part 1 : Sensor Networks (40 mins)
– TinyOS
– NesC
• Part 2: TinyDB + TASK (50 mins)
– Data Model and Query Language
– Software Architecture
• 30 minute break
• Part 3: Alternative Middleware Architectures + Research Directions (90 mins)
• Finish around 12
7
Part 1
• Sensornet Background
• Motes + Mote Hardware
– TinyOS
– Programming Model + NesC
• TinyOS Architecture
– Major Software Subsystems
– Networking Services
8
Sensor Networks: a hot topic
• New university courses
• New conferences
– ACM SenSys, IEEE IPSN, etc.
• New industrial research lab projects
– Intel, PARC, MSR, HP, Accenture, etc.
• Startup companies
– Crossbow, Dust, Ember, Sensicast, Moteiv, etc.
• Media Buzz
– Over 30 news articles since July 2002 covering Intel Berkeley/UC Berkeley sensor network activities
– One of 10 emerging technologies that will change the
world – MIT Technology Review
9
Why Now?
• Commoditization of radio hardware
– Cellular and cordless phones, wireless communication
• Low cost -> many/tiny -> new applications!
• Real application for ad-hoc network research from
the late 90’s
• Coming together of EE + CS communities
11
Motes
Mica Mote / Mica2Dot
  uProc: 4 MHz, 8-bit Atmel RISC
  Radio: 40 kbit 900/450/300 MHz, or 250 kbit 2.4 GHz (MicaZ, 802.15.4)
  Memory: 4 K RAM / 128 K program flash / 512 K data flash
  Power: 2 x AA or coin cell

Telos Mote
  uProc: 8 MHz, 16-bit TI RISC
  Radio: 250 kbit 2.4 GHz (802.15.4)
  Memory: 2 K RAM / 60 K program flash / 512 K data flash
  Power: 2 x AA

iMote
  uProc: 12 MHz, 16-bit ARM
  Radio: Bluetooth
  Memory: 64 K SRAM / 512 K data flash
  Power: 2 x AA
12
History of Motes
• Initial research goal wasn’t hardware
– Has since become more of a priority with emerging hardware
needs, e.g.:
• Power consumption
• (Ultrasonic) ranging + localization
– MIT Cricket, NEST Project
• Connectivity with diverse sensors
– UCLA sensor board
– Even so, now on the 5th generation of devices
• Costs down to ~$50/node (Moteiv, Dust)
• Greatly improved radio quality
• Multitude of interfaces: USB, Ethernet, CF, etc.
• Variety of form factors, packages
13
Motes vs. Traditional
Computing
• Embedded OS
• Lossy, ad-hoc radio communication
• Sensing hardware
• Severe power constraints
14
NesC/TinyOS
• NesC: a C dialect for embedded programming
  – Components, “wired together”
  – Quick commands and asynch events
• TinyOS: a set of NesC components
  – hardware components
  – ad-hoc network formation & maintenance
  – time synchronization
Think of the pair as a programming environment
Radio Communication
• Low Bandwidth Shared Radio Channel
– ~40 kbits/sec on motes
– Much less in practice
• Encoding, Contention for Media Access (MAC)
• Very lossy: 30% base loss rate
– Argues against TCP-like end-to-end retransmission
• And for link-layer retries
• Generally, not well behaved
16
From Ganesan, et al. “Complex Behavior at Scale.” UCLA/CSD-TR 02-0013
Types of Sensors
• Sensors attach via daughtercard
• Weather
  – Temperature
  – Light x 2 (high-intensity PAR; low-intensity, full-spectrum)
  – Air pressure
  – Humidity
• Vibration
  – 2- or 3-axis accelerometers
• Tracking
  – Microphone (for ranging and acoustic signatures)
  – Magnetometer
• GPS
• RFID reader
17
Non-Volatile Storage
• EEPROM
– 512 K off chip, 32 K on chip
– Writes at disk speeds, reads at RAM speeds
– Interface: random access, read/write 256-byte pages
– Maximum throughput ~10 Kbytes / second
• MatchBox Filing System
– Provides a Unix-like file I/O interface
– Single, flat directory
– Only one file being read/written at a time
18
Power Consumption and
Lifetime
• Power typically supplied by a small battery
– 1000-2000 mAh
– 1 mAh = 1 milliamp of current for 1 hour
• Typically at optimum voltage, current drain rates
  – Power = Watts (W) = Amps (A) * Volts (V)
  – Energy = Joules (J) = W * time
• Lifetime, power consumption varies by application
  – Processor: 5 mA active, 1 mA idle, 5 µA sleeping
  – Radio: 5 mA listen, 10 mA xmit/receive, ~20 ms / packet
  – Sensors: 1 µA to 100s of mA, 1 µs to 1 s per sample
19
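To make the lifetime arithmetic concrete, here is a back-of-the-envelope sketch; the duty cycle is an assumed figure, and the currents reuse the drain rates quoted above:

  // Hypothetical lifetime estimate: battery capacity divided by average draw.
  public class LifetimeEstimate {
      public static void main(String[] args) {
          double capacitymAh = 2000.0;   // 2 x AA, ~2000 mAh
          double dutyCycle   = 0.01;     // awake 1% of the time (assumed)
          double activeMA    = 5 + 10;   // processor active + radio xmit/receive (mA)
          double sleepMA     = 0.005;    // ~5 uA sleep current
          double avgMA = dutyCycle * activeMA + (1 - dutyCycle) * sleepMA;
          double hours = capacitymAh / avgMA;
          System.out.printf("avg draw %.3f mA -> lifetime %.0f hours (~%.0f days)%n",
                            avgMA, hours, hours / 24);
      }
  }

At a 1% duty cycle this works out to roughly a year and a half, consistent with the multi-month lifetimes claimed later for low data rates.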
Energy Usage in a Typical Data Collection Scenario
• Each mote collects 1 sample of (light, humidity) data every 10 seconds, and forwards it
• Each mote can “hear” 10 other motes
• Process:
  – Wake up, collect samples (~1 second)
  – Listen to radio for messages to forward (~1 second)
  – Forward data
(charts: power consumption breakdown by hardware element — radio, sensors, processor, idle — and energy breakdown by processing phase — waiting for radio, waiting for sensors, sending, processing)
20
Sensors: Slow, Power Hungry, Noisy
(chart: light (Lux) vs. time of day from 20:09 to 1:26, comparing a calibrated chamber sensor against Sensor 69’s raw readings and the median of its last 10 readings)
21
TinyOS: Getting Started
• The TinyOS home page:
– http://webs.cs.berkeley.edu/tinyos
– Start with the tutorials!
• The CVS repository
– http://sf.net/projects/tinyos
• The NesC Project Page
– http://sf.net/projects/nescc
• Crossbow motes (hardware):
– http://www.xbow.com
• Intel Imote
– www.intel.com/research/exploratory/motes.htm.
22
Part 2
The Design and Implementation
of TinyDB
23
Part 2 Outline
• TinyDB Overview
• Data Model and Query Language
• TinyDB Java API and Scripting
• Demo with TinyDB GUI
• TinyDB Internals
• Extending TinyDB
• TinyDB Status and Roadmap
24
TinyDB Revisited
• High-level abstraction:
  – Data-centric programming
  – Interact with sensor network as a whole
  – Extensible framework
• Under the hood:
  – Intelligent query processing: query optimization, power-efficient execution
  – Fault mitigation: automatically introduce redundancy, avoid problem areas

Example query:
SELECT MAX(mag)
FROM sensors
WHERE mag > thresh
SAMPLE PERIOD 64ms

(diagram: the app poses queries/triggers to TinyDB, which returns data from the sensor network)
25
Feature Overview
• Declarative SQL-like query interface
• Metadata catalog management
• Multiple concurrent queries
• Network monitoring (via queries)
• In-network, distributed query processing
• Extensible framework for attributes, commands and aggregates
• In-network, persistent storage
26
Architecture
(diagram: on the PC side, the TinyDB GUI and other clients sit atop the TinyDB Client API and JDBC, backed by a DBMS; on the mote side, the TinyDB query processor runs on each numbered node of the sensor network)
27
Data Model
• Entire sensor network as one single, infinitely-long logical
table: sensors
• Columns consist of all the attributes defined in the network
• Typical attributes:
– Sensor readings
– Meta-data: node id, location, etc.
– Internal states: routing tree parent, timestamp, queue length,
etc.
• Nodes return NULL for unknown attributes
• On server, all attributes are defined in catalog.xml
• Discussion: other alternative data models?
28
Query Language (TinySQL)
SELECT <aggregates>, <attributes>
[FROM {sensors | <buffer>}]
[WHERE <predicates>]
[GROUP BY <exprs>]
[SAMPLE PERIOD <const> | ONCE]
[INTO <buffer>]
[TRIGGER ACTION <command>]
29
Comparison with SQL
• Single table in FROM clause
• Only conjunctive comparison predicates in
WHERE and HAVING
• No subqueries
• No column alias in SELECT clause
• Arithmetic expressions limited to column
op constant
• Only fundamental difference: SAMPLE
PERIOD clause
30
TinySQL Examples
1. “Find the sensors in bright nests.”

SELECT nodeid, nestNo, light
FROM sensors
WHERE light > 400
EPOCH DURATION 1s

Epoch  Nodeid  nestNo  Light
0      1       17      455
0      2       25      389
1      1       17      422
1      2       25      405
31
TinySQL Examples (cont.)
2. SELECT AVG(sound)
   FROM sensors
   EPOCH DURATION 10s

3. “Count the number of occupied nests in each loud region of the island.”

   SELECT region, CNT(occupied), AVG(sound)
   FROM sensors
   GROUP BY region
   HAVING AVG(sound) > 200
   EPOCH DURATION 10s

Epoch  region  CNT(…)  AVG(…)
0      North   3       360
0      South   3       520
1      North   3       370
1      South   3       520

(regions w/ AVG(sound) > 200)
32
Event-based Queries
• ON event SELECT …
• Run query only when interesting events happen
• Event examples
  – Button pushed
  – Message arrival
  – Bird enters nest
• Analogous to triggers, but events are user-defined (see the sketch below)
33
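As a concrete illustration, here is roughly what an event-scoped query looks like, wrapped in Java for use with the client API described later. The syntax follows the ACQP paper; the event name and the dist() predicate are illustrative assumptions, not built-ins shown on this slide.

  public class EventQueryExample {
      public static void main(String[] args) {
          // When a bird-detect event fires, sample nearby sensors
          // every 2 s for 30 s (assumed ACQP-style syntax).
          String q =
              "ON EVENT bird-detect(loc): " +
              "SELECT AVG(light), AVG(temp) FROM sensors AS s " +
              "WHERE dist(s.loc, event.loc) < 10m " +
              "SAMPLE PERIOD 2s FOR 30s";
          System.out.println(q);
      }
  }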
Query over Stored Data
• Named buffers in Flash memory
• Store query results in buffers
• Query over named buffers
• Analogous to materialized views
• Example:
  – CREATE BUFFER name SIZE x (field1 type1, field2 type2, …)
  – SELECT a1, a2 FROM sensors SAMPLE PERIOD d INTO name
  – SELECT field1, field2, … FROM name SAMPLE PERIOD d
34
Using the Java API
• SensorQueryer
– translateQuery() converts TinySQL string into
TinyDBQuery object
– Static query optimization
• TinyDBNetwork
– sendQuery() injects query into network
– abortQuery() stops a running query
– addResultListener() adds a ResultListener that is invoked
for every QueryResult received
– removeResultListener()
• QueryResult
– A complete result tuple, or
– A partial aggregate result, call mergeQueryResult() to
combine partial results
• Key difference from JDBC: push vs. pull
35
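For concreteness, a minimal sketch of driving this API from Java. The class and method names are those listed above, but the exact packages, signatures, and the ResultListener callback name are assumptions, not the actual TinyDB source.

  import net.tinyos.tinydb.*;

  public class LightLogger {
      public static void main(String[] args) throws Exception {
          // Assumed constructor; the real API may need a mote interface source.
          TinyDBNetwork nw = new TinyDBNetwork();

          // Static query optimization happens inside translateQuery().
          TinyDBQuery q = SensorQueryer.translateQuery(
              "SELECT nodeid, light FROM sensors SAMPLE PERIOD 1024");

          // Push-based: the listener is invoked for every QueryResult received.
          nw.addResultListener(new ResultListener() {
              public void addResult(QueryResult qr) {
                  System.out.println(qr);  // complete tuple or partial aggregate
              }
          });

          nw.sendQuery(q);   // inject the query into the network
          // ... later: nw.abortQuery(q);
      }
  }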
Writing Scripts with TinyDB
• TinyDB’s text interface
– java net.tinyos.tinydb.TinyDBMain -run "select …"
– Query results printed out to the console
– All motes get reset each time new query
is posed
• Handy for writing scripts with shell,
perl, etc.
36
Using the GUI Tools
• Demo time
37
Inside TinyDB
(diagram: queries such as SELECT AVG(temp) WHERE light > 400 flow in, and results such as T:1, AVG: 225 / T:2, AVG: 250 flow out, over the multihop network)
• Query processor: ~10,000 lines of embedded C code, running operators such as Agg (avg(temp)) and Filter (light > 400) over sample streams
• PC-side Java: ~5,000 lines
• Schema/catalog: ~3,200 bytes RAM (w/ 768-byte heap); per-attribute metadata, e.g. temp — time to sample: 50 µs, cost to sample: 90 µJ, calibration table, accessor getTempFunc(), units: Deg. F, error: ±5 Deg. F
• TinyOS code: ~58 kB compiled (3x larger than the 2nd-largest TinyOS program)
38
Tree-based Routing
• Tree-based routing
  – Used in:
    • Query delivery
    • Data collection
    • In-network aggregation
  – Relationship to indexing?
(diagram: a query Q floods down a routing tree rooted at node A through nodes B–F; results R:{…} flow back up the tree edges)
39
Power Consumption and
Lifetime
• Power typically supplied by a small battery
  – At full power, device will last 2-3 days -> critical constraint
• Lifetime, power consumption varies by application
  – Scales with “duty cycle”: amount of time on
  – Low data rate (< 1 sample / 30 secs): > 6 months possible from AA batteries
(chart: current vs. time for sensors A and B, alternating between sleeping, radio on/processing, and transmitting; their wake periods must synchronize!)
Fundamental challenge: distributed coordination with low power!
40
Time Synchronization
• All messages include a 5 byte time stamp indicating system
time in ms
– Synchronize (e.g. set system time to timestamp) with
• Any message from parent
• Any new query message (even if not from parent)
– Punt on multiple queries
– Timestamps written just after preamble is xmitted
• All nodes agree that the waking period begins when (system
time % epoch dur = 0)
– And lasts for WAKING_PERIOD ms
• Adjustment of clock happens by changing duration of sleep
cycle, not wake cycle.
41
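A minimal sketch (assumed, not from the slides) of the scheduling rule above: the waking period begins when system time modulo the epoch duration hits zero, and clock adjustments are absorbed by stretching or shrinking the sleep period, never the waking period.

  public class EpochScheduler {
      static final long EPOCH_MS  = 10_000;  // epoch duration (assumed)
      static final long WAKING_MS = 2_000;   // WAKING_PERIOD (assumed)

      // Called with the (possibly just-resynchronized) system time in ms.
      static long msUntilNextWake(long systemTimeMs) {
          long intoEpoch = systemTimeMs % EPOCH_MS;
          // Sleep until the next multiple of EPOCH_MS; a clock adjustment
          // simply changes this sleep duration, leaving WAKING_MS intact.
          return (EPOCH_MS - intoEpoch) % EPOCH_MS;
      }

      public static void main(String[] args) {
          System.out.println(msUntilNextWake(23_456));  // -> 6544
      }
  }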
Extending TinyDB
• Why extend TinyDB?
  – New sensors -> new attributes
  – New control/actuation -> new commands
  – New data processing logic -> new aggregates
  – New events
• Analogous to concepts in object-relational databases
42
Adding Attributes
• Types of attributes
– Sensor attributes: raw or cooked sensor
readings
– Introspective attributes: parent,
voltage, ram usage, etc.
– Constant attributes: constant values
that can be statically or dynamically
assigned to a mote, e.g., nodeid, location,
etc.
43
Adding Attributes (cont)
• Interfaces provided by Attr component
  – StdControl: init, start, stop
  – AttrRegister
    • command registerAttr(name, type, len)
    • event getAttr(name, resultBuf, errorPtr)
    • event setAttr(name, val)
    • command getAttrDone(name, resultBuf, error)
  – AttrUse
    • command startAttr(attr)
    • event startAttrDone(attr)
    • command getAttrValue(name, resultBuf, errorPtr)
    • event getAttrDone(name, resultBuf, error)
    • command setAttrValue(name, val)
44
Adding Attributes (cont)
• Steps to adding attributes to TinyDB
  1) Create attribute nesC components
  2) Wire new attribute components to TinyDBAttr configuration
  3) Reprogram TinyDB motes
  4) Add new attribute entries to catalog.xml
• Constant attributes can be added on the fly through TinyDB GUI
45
Adding Aggregates
• Step 1: wire new nesC components
(diagram: TinyDB Aggregation Framework — aggregate operator components SumM.nc, CountM.nc, AvgM.nc are wired into AggOperator.nc via the AggregateUse interface and AggOperatorConf.nc; each aggregate implements init(ID, ...), update(ID, ...), merge(ID, ...), hasData(ID, ...), finalize(ID, ...), stateSize(ID, ...), and getProperties(ID))
46
Adding Aggregates (cont)
• Step 2: add entry to catalog.xml

<aggregate>
  <name>AVG</name>
  <id>5</id>
  <temporal>false</temporal>
  <readerClass>net.tinyos.tinydb.AverageClass</readerClass>
</aggregate>

• Step 3 (optional): implement reader class in Java
  – a reader class interprets and finalizes aggregate state received from the mote network, returning the final result as a string for display.
47
TinyDB Status
• Latest release shipped with TinyOS 1.1 (9/03)
– Install the task-tinydb package in TinyOS 1.1 distribution
– First release in TinyOS 1.0 (9/02)
– Widely used by research groups as well as industry pilot
projects
• Successful deployments in Intel Berkeley Lab and
redwood trees at UC Botanical Garden
– Largest deployment: ~80 weather station nodes
– Network longevity: 4-5 months
48
The Redwood Tree Deployment
• Redwood Grove in UC Botanical
Garden, Berkeley
• Collect dense sensor readings to monitor climatic variations across
  – altitudes,
  – angles,
  – time,
  – forest locations, etc.
• Versus sporadic monitoring
points with 30lb loggers!
• Current focus: study how dense
sensor data affect predictions
of conventional tree-growth
models
49
Data from Redwoods
(charts: relative humidity (%) vs. time and temperature (C) vs. time, from 7/7/03 9:40 through 7/9/03 11:03, for nodes at heights 10m (101, 102, 103), 20m (104, 105, 106), 30m (107, 108, 109), 32m (110), 33m (111), and 36m)
50
TASK
51
A SensorNet Dilemma
• Sensors still packaged like HeathKits
– Pretty hard to cope with out of the box
• Bare metal encourages one-off applications
– Inhibits reuse
• Deployment not intuitive
– No configuration/monitoring tools
• SensorNet PhD Factor
– Today ~2.5 PhDs needed to deploy a SensorNet
– Needs to be Zero
52
TASK Design Requirements
~ For Users ~
• Ease of S/W installation
• Deployment tools
• Reconfigurability
• Health/mgmt monitoring
• Network reliability guarantee
• Interpretable sensor results
• Tool integration
• Audit trails
• Lifetime estimates

~ For Developers ~
• Familiar API
• Extensibility of S/W
• Modular services
53
Tiny Application Sensor Kit
(diagram: external tools and TASK client tools (TaskView) connect over the Internet to stable store (DBMS) and to the SensorNet Appliance running the TASK server; TASK field tools work directly with the TinyDB sensor network)
• Simplicity vs. functionality
• Modularity
• Remote control
• Fault tolerance
54
SensorNet Appliance
• Intelligent gateway
  – Proxy for the sensornet
  – Distributes query
  – Stages results
  – Manages configuration
• Components
  – TASK Server
  – TinyDB Client (Java)
  – DBMS (PostgreSQL)
  – WebServer (Apache)
(diagram: the SNA sits between clients speaking http, ODBC, and other protocols and the SensorNet, stacking the TASK Server on the TinyDB Client on the DBMS)
55
Tools
• Field Tool
– In-situ diagnostics
• TaskView
– Integrated tool for
management and
monitoring
56
For more information
• http://triplerock.cs.berkeley.edu/tinydb
57
Part 3
Middleware Architecture and Research
Topics
58
Architectural Overview
(diagram, repeated from Part 1: client tools — GUIs, etc. — and external tools connect over the Internet to the middleware layer: local servers with stable store (DBMS) over the TinyDB sensor network; field tools attach in the field)
59
What’s Left?
• TinyDB and TinyOS provide a reasonable low-level substrate
• TASK sufficient for many data collection apps
• But… there are other architecture issues
– Efficiency concerns
• Currently transmit readings from all sensors on each epoch
• Variable, context sensitive rates…
– Data quality issues
• Missing and faulty sensors?
– Architectural issues
• Actuation / closed-loop issues
• Disconnection, etc.
60
Sensor Network Research
• Very active research area
– Can’t summarize it all
• Focus: database-relevant research topics
– Some outside of Berkeley
– Other topics that are itching to be scratched
– But, some bias towards work that we find
compelling
61
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
62
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
63
Tiny Aggregation (TAG)
• In-network processing of aggregates
– Common data analysis operation
• Aka gather operation or reduction in || programming
– Communication reducing
• Operator dependent benefit
– Across nodes during same epoch
• Exploit query semantics to improve
efficiency!
Madden, Franklin, Hellerstein, Hong. Tiny AGgregation (TAG), OSDI 2002.
64
Basic Aggregation
• In each epoch:
  – Each node samples local sensors once
  – Generates partial state record (PSR)
    • local readings
    • readings from children
  – Outputs PSR during assigned comm. interval
    • Interval assigned based on depth in tree
• At end of epoch, PSR for whole network output at root
• New result on each successive epoch
(diagram: a routing tree of nodes 1–5 with per-node PSRs and numbered communication intervals)
65
Illustration: In-Network Aggregation

SELECT COUNT(*)
FROM sensors

(animation, slides 66–71: a table of sensor # vs. interval # fills in as a five-node routing tree — root 1, with children 2 and 5, and a chain 2-3-4 below — computes COUNT(*) across one sample period; in interval 4 the deepest node, 4, transmits its count of 1; in interval 3 node 3 merges that with its own reading and transmits 2; in interval 2 nodes 2 and 5 transmit their partial counts of 3 and 1; in interval 1 the root outputs the network-wide count of 5; in later epochs the pattern repeats, and nodes sleep — zzz — during intervals in which they neither transmit nor listen)
66–71
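A compact sketch of the scheduling and merging logic the animation depicts. The interval-numbering convention here (intervals count down to 1 at the root; a node at depth d transmits in interval d and listens in interval d+1) is an assumption consistent with the figure, not TinyDB's actual code.

  public class TagCount {
      // A node at depth d transmits its PSR during interval d, listens for
      // its children during interval d+1, and can sleep otherwise.
      static boolean shouldTransmit(int interval, int depth) { return interval == depth; }
      static boolean shouldListen(int interval, int depth)   { return interval == depth + 1; }

      // COUNT's PSR merge: sum child counts into the local count.
      static int mergeCount(int localCount, int[] childCounts) {
          int total = localCount;
          for (int c : childCounts) total += c;
          return total;
      }

      public static void main(String[] args) {
          // Node 3 (depth 2) heard child node 4 report 1; plus itself = 2.
          System.out.println(mergeCount(1, new int[]{1}));  // -> 2
      }
  }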
Aggregation Framework
• As in extensible databases, TinyDB supports any aggregation function conforming to:

  Agg_n = {f_init, f_merge, f_evaluate}

  f_init{a0} -> <a0>            (<a0> is a partial state record, PSR)
  f_merge{<a1>, <a2>} -> <a12>
  f_evaluate{<a1>} -> aggregate value

• Example: AVERAGE

  AVG_init{v} -> <v, 1>
  AVG_merge{<S1, C1>, <S2, C2>} -> <S1 + S2, C1 + C2>
  AVG_evaluate{<S, C>} -> S/C

• Restriction: merge must be associative and commutative
72
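The same contract, sketched in Java. TinyDB's real implementation is embedded C; this just makes the AVERAGE algebra above executable.

  public class AvgAggregate {
      // Partial state record for AVG: a (sum, count) pair.
      static long[] init(int v)               { return new long[]{v, 1}; }
      static long[] merge(long[] a, long[] b) { return new long[]{a[0] + b[0], a[1] + b[1]}; }
      static double evaluate(long[] psr)      { return (double) psr[0] / psr[1]; }

      public static void main(String[] args) {
          long[] psr = merge(merge(init(20), init(24)), init(31));
          System.out.println(evaluate(psr));  // -> 25.0
      }
  }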
Taxonomy of Aggregates
• TAG insight: classify aggregates according to various functional properties
  – Yields a general set of optimizations that can automatically be applied
  – Drives an API!

Property               | Examples                                    | Affects
Partial State          | MEDIAN: unbounded, MAX: 1 record            | Effectiveness of TAG
Monotonicity           | COUNT: monotonic, AVG: non-monotonic        | Hypothesis testing, snooping
Exemplary vs. Summary  | MAX: exemplary, COUNT: summary              | Applicability of sampling, effect of loss
Duplicate Sensitivity  | MIN: dup. insensitive, AVG: dup. sensitive  | Routing redundancy
73
Use Multiple Parents
• Use graph structure
  – Increase delivery probability with no communication overhead
    • For duplicate-insensitive aggregates, or
    • Aggs expressible as sum of parts
  – Send (part of) aggregate to all parents
    • In just one message, via multicast
  – Assuming independence, decreases variance

Example (SELECT COUNT(*)): node A forwards count c toward root R through n = 2 parents B and C, sending c/n to each.

  P(link xmit successful) = p
  P(success from A -> R) = p^2

  One parent:  E(cnt) = c * p^2            Var(cnt) = c^2 * p^2 * (1 - p^2) = V
  n parents:   E(cnt) = n * (c/n * p^2)    Var(cnt) = n * (c/n)^2 * p^2 * (1 - p^2) = V/n
74
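A quick Monte Carlo check of the claim (an illustration, not from the tutorial): splitting the count across n parents leaves the mean at c * p^2 while cutting the variance by a factor of n.

  import java.util.Random;

  public class SplitVariance {
      public static void main(String[] args) {
          Random rng = new Random(42);
          double p = 0.8, p2 = p * p;         // per-link and end-to-end success
          int c = 100, n = 2, trials = 1_000_000;
          double sum = 0, sumSq = 0;
          for (int t = 0; t < trials; t++) {
              double received = 0;
              for (int parent = 0; parent < n; parent++)
                  if (rng.nextDouble() < p2) received += (double) c / n;
              sum += received; sumSq += received * received;
          }
          double mean = sum / trials;
          double var = sumSq / trials - mean * mean;
          double vOneParent = (double) c * c * p2 * (1 - p2);
          System.out.printf("mean %.2f (expect %.2f), var %.1f (expect %.1f)%n",
                            mean, c * p2, var, vOneParent / n);
      }
  }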
Multiple Parents Results
• Better than previous analysis expected!
• Losses aren’t independent!
• Insight: splitting spreads data over many links, so no single critical link carries the whole count
(chart: benefit of result splitting on a COUNT query — avg. COUNT with splitting vs. no splitting; 2500 nodes, lossy radio model, 6 parents per node)
75
Acquisitional Query
Processing (ACQP)
• TinyDB acquires AND processes data
  – Could generate an infinite number of samples
• An acquisitional query processor controls
  – when,
  – where,
  – and with what frequency data is collected!
• Versus traditional systems where data is provided a priori

Madden, Franklin, Hellerstein, and Hong. The Design of an Acquisitional Query Processor. SIGMOD 2003.
76
ACQP: What’s Different?
• How should the query be processed?
– Sampling as a first class operation
• How does the user control acquisition?
– Rates or lifetimes
– Event-based triggers
• Which nodes have relevant data?
– Index-like data structures
• Which samples should be transmitted?
– Prioritization, summary, and rate control
77
Operator Ordering: Interleave Sampling + Selection

SELECT light, mag
FROM sensors
WHERE pred1(mag)
AND pred2(light)
EPOCH DURATION 1s

• E(sampling mag) >> E(sampling light): 1500 uJ vs. 90 uJ — comparable to the processor!
• Traditional DBMS: sample mag and light, then apply σ(pred1) and σ(pred2)
• ACQP — correct ordering (unless pred1 is very selective and pred2 is not): sample the cheap light first, apply σ(pred2), and sample the costly mag (then apply σ(pred1)) only for tuples that pass
• At 1 sample / sec, total power savings could be as much as 3.5 mW
78
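The ordering decision reduces to an expected-cost comparison. A sketch using the energy figures from the slide; the selectivity value and the cost formula are assumptions for illustration.

  public class OrderingCost {
      static final double E_MAG = 1500, E_LIGHT = 90;  // uJ per sample (from slide)

      // Traditional: always sample both attributes.
      static double naiveCost() { return E_MAG + E_LIGHT; }

      // ACQP: sample light first; sample mag only when pred2(light) passes.
      static double acqpCost(double pred2Selectivity) {
          return E_LIGHT + pred2Selectivity * E_MAG;
      }

      public static void main(String[] args) {
          double sel = 0.1;  // pred2 passes 10% of the time (assumed)
          System.out.printf("naive %.0f uJ vs. interleaved %.0f uJ per epoch%n",
                            naiveCost(), acqpCost(sel));
      }
  }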
Exemplary Aggregate Pushdown

SELECT WINMAX(light,8s,8s)
FROM sensors
WHERE mag > x
EPOCH DURATION 1s

• Novel, general pushdown technique
• Mag sampling is the most expensive operation!
• Traditional DBMS: sample mag and light, apply σ(mag > x), then compute WINMAX over light
• ACQP: sample light first and test light > MAX (the running window maximum); only when that passes, sample mag and apply σ(mag > x) before updating WINMAX
79
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
80
Statistical Techniques
• Approximations, summaries, and sampling
based on statistics and statistical models
• Applications:
– Limited bandwidth and large number of nodes ->
data reduction
– Lossiness -> predictive modeling
– Uncertainty -> tracking correlations and changes
over time
– Physical models -> improved query answering
81
TinyDB Retrospective
(diagram: TinyDB distributes an SQL-style query into the network, then collects the query answer/data at every time step)

• Declarative query interface:
  – Sensor nets are not just for PhDs
  – Decrease deployment time
• Data aggregation:
  – Can reduce communication
82
Limitations of TinyDB approach
(diagram: TinyDB must redo the distribute/collect process every time the query changes)

• Data collection:
  – Every node must wake up at every time step
  – Data loss ignored
  – No quality guarantees
  – Wastes resources by ignoring correlations
• Query distribution:
  – Every node must receive the query
  – Redo the process every time the query changes
83
Sensor net data is correlated
• Data is not i.i.d. -> shouldn’t ignore missing data
• Observing one sensor -> information about other sensors (and future values)
  – Spatial-temporal correlation
• Observing one type of reading -> information about other local readings
84
BBQ: Model-driven data acquisition

(diagram: a probabilistic model — e.g., a multidimensional Gaussian — sits in the middleware layer; a new SQL-style query with a desired confidence arrives, the system plans which attributes to observe, gathers data, conditions the posterior belief on the new observations and on the transition model over the interval Dt, and answers from the posterior)

Strengths of model-based data acquisition:
– Observe fewer attributes
– Exploit correlations
– Reuse information between queries
– Directly deal with missing data
– Answer more complex (probabilistic) queries
85
Probabilistic models and queries

User’s perspective:

SELECT nodeId, temp ± 0.5°C, conf(.95)
FROM sensors
WHERE nodeId in {1..8}

(i.e., a 1.0°C window with 95% confidence)

System selects and observes a subset of nodes (here {3, 6, 8}) and infers the rest from the posterior.

Query result:
Node   1      2      3      4      5      6      7      8
Temp.  17.3   18.1   17.4   16.1   19.2   21.3   17.5   16.3
Conf.  98%    95%    100%   99%    95%    100%   98%    100%
86
Supported queries
• Value query
  – Xi ± ε with prob. at least 1 − δ
• SELECT and Range query
  – Xi ∈ [a,b] with prob. at least 1 − δ
  – which sensors have temperature greater than 25°C?
• Aggregation
  – average ± ε of a subset of attribs. with prob. > 1 − δ
  – combine aggregation and selection
  – probability > 10 sensors have temperature greater than 25°C?
• Queries require solution to integrals
  – Many queries computed in closed form
  – Some require numerical integration/sampling
87
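A sketch of the closed-form case for a value query under a Gaussian posterior (illustrative; BBQ's actual machinery is more general than this single-attribute check).

  public class ValueQuery {
      // Standard normal CDF via the error function (Abramowitz-Stegun approx.).
      static double phi(double z) { return 0.5 * (1 + erf(z / Math.sqrt(2))); }
      static double erf(double x) {
          double t = 1.0 / (1.0 + 0.3275911 * Math.abs(x));
          double poly = ((((1.061405429 * t - 1.453152027) * t + 1.421413741) * t
                        - 0.284496736) * t + 0.254829592) * t;
          double y = 1.0 - poly * Math.exp(-x * x);
          return x >= 0 ? y : -y;
      }

      // P(|X - mu| <= eps) for X ~ N(mu, sigma^2).
      static double confidence(double sigma, double eps) {
          return phi(eps / sigma) - phi(-eps / sigma);
      }

      public static void main(String[] args) {
          double sigma = 0.3, eps = 0.5;  // posterior stddev (assumed), query eps
          System.out.printf("conf = %.3f%n", confidence(sigma, eps));
          // If conf falls below the requested 0.95, the system must observe
          // more sensors to sharpen the posterior before answering.
      }
  }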
Experimental results
(floor plan: the Intel Berkeley Lab deployment — roughly 50 numbered nodes spread across offices, a conference room, the lab, kitchen, and server and storage rooms)
• Redwood trees and Intel Lab datasets
• Learned models from data
– Static model
– Dynamic model – Kalman filter, time-indexed transition
probabilities
• Evaluated on a wide range of queries
88
Cost versus Confidence level
(chart: observation cost as a function of the requested confidence level)
89
Obtaining approximate values
Query: True temperature value ± epsilon with confidence 95%
90
Next Step : Outliers and
Unusual Events
• Once we have a model of the expected behavior, we can:
  – Detect unusual (low-probability) events
  – Predict missing values
• Often, there are several “expected” behavior modes, which we want to differentiate between (e.g., AC ON vs. AC OFF)
  – E.g., if we can characterize failure modes, we can discard them
• Applying well-known probabilistic techniques to allow TinyDB to deal with such issues.
91
IDSQ
• Similar idea: suppose you want to, e.g., localize a vehicle in a field of sensors
• Idea: task sensors in order of best
improvement to estimate of some value:
– Choose leader(s)
• Suppress subordinates
• Task subordinates, one at a time
– Until some measure of goodness (error bound) is met
See “Scalable Information-Driven Sensor Querying and Routing for ad hoc Heterogeneous
Sensor Networks.” Chu, Haussecker and Zhao. Xerox TR P2001-10113. May, 2001. 92
Graphical Representation
(diagram: model the location estimate as a point with 2-dimensional Gaussian uncertainty; candidate sensors S1 and S2 leave residuals of equal area, but the sensor that reduces error along the principal axis is preferred)
93
Lots of Other Work of This Flavor
• Precision / Energy Tradeoff -- Want nodes to
sleep except when their data is needed
– Olston et al. Approximate Caching. SIGMOD ‘03.
– Cheng et al. Kalman Filters. SIGMOD ‘04.
- Lazaridis and Mehrotra. Approximate Selection Queries
over Imprecise Data. ICDE 2004.
- UCI Quasar Project
- Timeliness + Real Time Constraints
• John A. Stankovic et al. Real Time Communication and Coordination in Sensor Networks. Proceedings of the IEEE, 91(7), July 2003.
• Tian He et al. SPEED: a stateless protocol (ICDCS’03)
94
In-Net Regression
• Linear regression: a simple way to predict future values, identify outliers
(chart: X vs. Y with fitted curve y = 0.9703x − 0.0067, R² = 0.947)
• Regression can be across local or remote values, multiple dimensions, or with high-degree polynomials
  – E.g., node A’s readings vs. node B’s
  – Or, location (X,Y) versus temperature, e.g., over many nodes

Guestrin, Thibaux, Bodik, Paskin, Madden. “Distributed Regression: an Efficient Framework for Modeling Sensor Network Data.” Under submission.
95
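For reference, a plain least-squares fit like the one in the chart. This is an illustrative centralized sketch with made-up data; the paper's contribution is the distributed version.

  public class SimpleRegression {
      public static void main(String[] args) {
          double[] x = {1, 3, 5, 7, 9};            // e.g., node A's readings
          double[] y = {0.9, 2.8, 5.1, 6.7, 8.6};  // node B's readings (made up)
          int n = x.length;
          double sx = 0, sy = 0, sxx = 0, sxy = 0;
          for (int i = 0; i < n; i++) {
              sx += x[i]; sy += y[i]; sxx += x[i] * x[i]; sxy += x[i] * y[i];
          }
          double a = (n * sxy - sx * sy) / (n * sxx - sx * sx);  // slope
          double b = (sy - a * sx) / n;                          // intercept
          System.out.printf("y = %.4fx + %.4f%n", a, b);
          // A point far from the fitted line is a candidate outlier or a
          // target for missing-value estimation, per the slide.
      }
  }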
In-Net Regression (Continued)
• Problem: may require data from all sensors to
build model
• Solution: partition sensors into overlapping
“kernels” that influence each other
– Run regression in each kernel
• Requiring just local communication
– Blend data between kernels
– Requires some clever matrix manipulation
• End result: regressed model at every node
– Useful in failure detection, missing value estimation
96
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
97
Heterogeneous Sensor
Networks
• Leverage small numbers of high-end nodes
to benefit large numbers of inexpensive
nodes
• Still must be transparent and ad-hoc
• Key to scalability of sensor networks
• Interesting heterogeneities
– Energy: battery vs. outlet power
– Link bandwidth: Chipcon vs. 802.11x
– Computing and storage: ATMega128 vs. Xscale
– Pre-computed results
– Sensing nodes vs. QP nodes
98
Computing Heterogeneity with
TinyDB
• Separate query processing from sensing
– Provide query processing on a small number of nodes
– Attract packets to query processors based on “service
value”
• Compare the total energy consumption of the network under:
  – No aggregation
  – All aggregation
  – Opportunistic aggregation
  – HSN proactive aggregation
Mark Yarvis and York Liu, Intel’s Heterogeneous Sensor
Network Project,
ftp://download.intel.com/research/people/HSN_IR_Day_Poster_03.pdf.
99
5x7 TinyDB/HSN Mica2 Testbed
100
Data Packet Saving
• How many aggregators are desired?
  – 11% aggregators achieve 72% of max data reduction
• Does placement matter?
  – Optimal placement: 2/3 distance from sink
(charts: % change in data packet count vs. number of aggregators (1–6, all 35) and vs. aggregator location (nodes 25–35))
101
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
102
Occasionally Connected
Sensornets
(diagram: a TinyDB server on the internet reaches TinyDB query processors in the field through fixed gateways (GTWY) and mobile gateways that ferry data)
103
Occasionally Connected
Sensornets Challenges
• Networking support
– Tradeoff between reliability, power consumption
and delay
– Data custody transfer: duplicates?
– Load shedding
– Routing of mobile gateways
• Query processing
– Operation placement: in-network vs. on mobile
gateways
– Proactive pre-computation and data movement
• Tight interaction between networking and QP
Fall, Hong and Madden, Custody Transfer for Reliable Delivery in Delay Tolerant
Networks, http://www.intel-research.net/Publications/Berkeley/081220030852_157.pdf .
104
Other Occasionally Connected
Work
• Kevin Fall. Delay Tolerant Networks.
SIGCOMM 2003.
• Juang et al. Energy-efficient computing for wildlife tracking. ASPLOS 2002.
• Li et al. Sending messages to mobile users
in disconnected ad-hoc wireless networks.
MOBICOM 2000.
• Shah et al. Data Mules. SNPA 2003.
105
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
106
Distributed In-network Storage
• Collectively, sensornets have large amounts
of in-network storage
• Good for in-network consumption or
caching
• Challenges
– Distributed indexing for fast query
dissemination
– Resilience to node or link failures
– Graceful adaptation to data skews
– Minimizing index insertion/maintenance cost
107
Example: DIM
• Functionality
  – Efficient range queries for multidimensional data.
• Approaches
  – Divide sensor field into bins.
  – Locality-preserving mapping from m-d space to geographic locations.
  – Use geographic routing such as GPSR.
• Assumptions
  – Nodes know their locations and network boundary
  – No node mobility
(example: events E1 = <0.7, 0.8> and E2 = <0.6, 0.7> map to nearby bins; range query Q1 = <.5-.7, .5-1> is routed to the matching region)

Xin Li, Young Jin Kim, Ramesh Govindan and Wei Hong. Distributed Index for Multi-dimensional Data (DIM) in Sensor Networks. SenSys 2003.
108
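One way to realize a locality-preserving mapping is bit-interleaving of normalized attribute values, so nearby points in attribute space get nearby bin codes. This is an illustrative sketch in the DIM spirit; DIM's actual zone-coding scheme may differ.

  public class BinCode {
      // values in [0,1); bitsPerDim bits per dimension, interleaved -> bin code
      static int zoneCode(double x, double y, int bitsPerDim) {
          int xi = (int) (x * (1 << bitsPerDim));
          int yi = (int) (y * (1 << bitsPerDim));
          int code = 0;
          for (int b = bitsPerDim - 1; b >= 0; b--) {
              code = (code << 1) | ((xi >> b) & 1);  // x bit
              code = (code << 1) | ((yi >> b) & 1);  // y bit
          }
          return code;
      }

      public static void main(String[] args) {
          System.out.println(zoneCode(0.7, 0.8, 3));  // event E1 = <0.7, 0.8>
          System.out.println(zoneCode(0.6, 0.7, 3));  // event E2 = <0.6, 0.7>
      }
  }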
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
109
Closing the Loop
• Challenge: want more than data collection
– Condition-based sensing, rate adjustment
– Condition-based actuation
• E.g.,
– Kansal et al. Sensor Uncertainty Reduction Using Low
Complexity Actuation. IPSN 2004.
– Work from Qiong Luo et al. (HKUST) in CIDR.
– Various process control systems: ladder logic, SCADA,
etc.
• Questions:
– Appropriate languages
– Resource contention on actuators
– Closed-loop safety concerns
110
Topics
• Improving TinyDB Efficiency
– In-network aggregation
– Acquisitional Query Processing
• Alternative Architectures
– Statistical Techniques
– Heterogeneity
– Intermittent Connectivity
• New features
– In-network storage
– Closing the loop
– Integration with traditional databases
111
Alternative Middleware:
Integration into an Existing
DBMS
112
Concluding Remarks
• Sensor networks are an exciting emerging technology,
with a wide variety of applications
• Many research challenges in all areas of computer science
– Database community included
– Some agreement that a declarative interface is right
• TinyDB and other early work are an important first step
• But there’s lots more to be done!
– Real challenge is building appropriate middleware abstractions
113
Questions?
http://db.lcs.mit.edu/madden/middleware_tutorial.ppt
114
In-Network Join Strategies
• Types of joins:
– non-sensor -> sensor
– sensor -> sensor
• Optimization questions:
– Should the join be pushed down?
– If so, where should it be placed?
– What if a join table exceeds the memory
available on one node?
115
Choosing Where to Place
Operators
• Idea : choose a “join node” to run the operator
• Over time, explore other candidate placements
– Nodes advertise data rates to their neighbors
– Neighbors compute expected cost of running the
join based on these rates
– Neighbors advertise costs
– Current join node selects a new, lower cost node
Bonfils + Bonnet. Adaptive and Decentralized Operator Placement for In-Network Query Processing. IPSN 2003.
116
Topics
• In-network aggregation
• Acquisitional Query Processing
• Heterogeneity
• Intermittent Connectivity
• In-network Storage
• Statistics-based summarization and sampling
• In-network Joins
• Adaptivity and Sensor Networks
• Multiple Queries
117
Adaptivity In Sensor Networks
• Queries are long running
• Selectivities change
– E.g. night vs day
• Network load and available energy vary
• All suggest that some adaptivity is needed
– Of data rates or granularity of aggregation
when optimizing for lifetimes
– Of operator orderings or placements when
selectivities change (c.f., conditional plans for
correlations)
• As far as we know, this is an open problem!
118
Multiple Queries and Work
Sharing
• As sensornets evolve, users will run many
queries simultaneously
– E.g., traffic monitoring
• Likely that queries will be similar
– But have different end points, parameters, etc
• Would like to share processing, routing as
much as possible
• But how? Again, an open problem.
119