A Database Perspective on Sensor Networks Philippe Bonnet Cornell University

1
A Database Perspective on
Sensor Networks
Philippe Bonnet
Cornell University
bonnet@cs.cornell.edu
2
Outline
• Introduction
– Applications
– Sensor Networks & Database Technology
• Part I: Sensor Networks
– What are the capabilities of sensor nodes and of sensor
networks? What is the nature of sensor data?
• Part II: Database Technology
– What are the relevant aspects of DB technology? Can
they be applied in the context of sensor networks? What
are the new problems?
3
Sensor-based Application #1
http://www.spyplanes.com/
http://www.millennium.berkeley.edu/tinyos/uav.html
4
Sensor-based Application #2
Internet
http://www.media.mit.edu/resenv/vehicles.html
http://www.media.mit.edu/resenv/ (Ara Knaian’s thesis)
5
Sensor-based Application #3
Long-range
Radio
http://birds.cornell.edu/
6
Area Monitoring Applications
Declarative Access
Signal Processing
(Sensor Tasking)
• Energy Efficient
• Scalable
• Accurate
• Reliable
• Low Latency
7
Area Monitoring Applications
On-demand
Sensor Tasking
Application #3
(fixed point for data collection)
Application #1
(mobile point for data collection)
Predefined
access to
sensor data
Application #2
One-Time
Sensor Tasking
On-demand
access
to sensor data
8
Declarative Access
to Sensor Data
Example #1: Every minute return the measurement obtained from
Region X.
Example #2: Whenever two sensors within 5 yards of each
other detect a bird then return their location.
Example #3: Every five minutes return the number of birds
detected in Region X.
• SQL Queries over a Sensor Network [T00][BS00]
– Access to large collection of sensors
– Associative access independent of the physical
organization of the sensor network
9
Database Analogy
Declarative
SQL Query
Data
Extraction
SQL Engine
Sensor
Network
Storage
Manager
Sensors
Data on Disk
10
Sensor Database System
Declarative
SQL Query
Sensor
Network
Sensors
SQL Engine
Storage
Manager
Data on Disk
Adapting database technology to support
declarative access to sensor data in the context
of area monitoring applications
11
Other Sensor-based Applications
• Condition-based maintenance
– Product Quality Monitoring
• Device management
– Smart office spaces
– Home automation
– Networked cars
• …
The opportunities for database technology
might exist but are less obvious
12
Part I: Sensor Networks
13
Issues from a
Database Perspective
• What is sensor data?
• How is sensor data accessed?
• What about data storage and processing
capabilities on sensor nodes?
• What is the cost of accessing sensor data?
• What kind of abstraction to use in order to
represent a sensor network?
• Ideas to reuse?
14
WINS NG Sensor Nodes
Analog
I/O
RF
Modem
DSP
Digital Control
I/O
Proc.
Real-Time
Interface
Processor
PowerPC
32 bits
Processor
Power
supply
http://www.sensoria.com/
GPS
Ethernet
15
Smart Dust Motes
Laser diode
III-V process
Passive CCR comm.
MEMS/ polysilicon
Active beam steering laser comm.
MEMS/optical quality
polysilicon
Analog I/O, DSP, Control
COTS CMOS
Sensor
MEMS/bulk, surface, ...
Power capacitor
Multi-layer ceramic
Solar cell
CMOS or III-V
Thick film battery
Sol/gel V2O5
1-2 mm
http://robotics.eecs.berkeley.edu/~pister/SmartDust/
16
COTS Macro Dust Motes
http://www-bsac.eecs.berkeley.edu/~shollar/macro_motes/macromotes.html
17
Processing Capabilities
• WINS NG :
– General Purpose Processor - PowerPC
• 66 MHz– 87 MIPS – 16 MB RAM
– DSP – TI5402
• 100 MHz, 25 ksps input, 5ksps output to processor
• Macro Motes:
– Micro-controller - Atmel MCU
• 4 MHz, 8 KB of program memory, 512 bytes of data memory.
• Idle, power down, power save modes.
18
Communication Capabilities
• Radio Frequency
– WINS NG
• WINS2.0 modem – 2.4 GHz - Frequency Hopping - 56 kbps –
30 m range
– Macro Motes
• RFM TR1000 – 900 MHz - On/Off Key Encoding – 10 kbps –
20 m range
• Optical Communication
– Smart Dust
• Passive Corner Cube Reflector – On/Off Key Encoding
(downlink) - 1 kbps link over 500 m range
19
Optical Networking
Top View of the Interrogator
Filter
CCD Camera
Polarizing
Beamsplitter
Quarter-wave
Plate
Lens
0.25% reflectance
on each surface
Frequency-Doubled
YAG Green Laser
Beam
Expander
45° mirror
J. M. Kahn, R. H. Katz and K. S. J. Pister, "Mobile Networking for Smart Dust",
ACM/IEEE Intl. Conf. on Mobile Computing and Networking (MobiCom 99).
20
Piconet
[Figure: clusters of slave nodes (S), each grouped around a master node (M)]
• Cluster: 1 Master / N Slaves
• Master synchronizes
communications in a cluster
(TDMA)
• Dual radio used in WINS NG
to allow for multi-hop
communication across
clusters
ftp://ftp.uk.research.att.com/pub/docs/att/tr.97.9.pdf
The Bluetooth Radio System: Jaap C. Haartsen. IEEE PC Feb 2000
22
Batteries
• Energy densities (Wh/L)
– Li-ion: 500 (~1.8 J/mm3)
– Li/SO2: 176
– Alkaline: 80
– Nickel Cadmium: 40
[Figure: 18650 Li-ion cell energy density (Wh/L, axis 0 to 600) by year, 1995 to 2009. Courtesy of Marc Doyle, DuPont]
• Moore’s law does not
apply to batteries
Joe Paradiso’s survey of “renewable energy sources for the future
of mobile and embedded computing”
http://www.media.mit.edu/resenv/
23
Energy Consumption
• Smart Dust
– Objective: each mote should consume less than 1 J / day
(amount of energy produced by solar cells)
– Towards 10 pJ / instruction for dedicated microcontrollers
– 1 nJ to transmit a bit with CCR passive transmitter
• Macro Motes
– 1 µJ to transmit a bit; 0.5 µJ to receive a bit (10 kbps & 10 mW)
– 10 nJ / instruction
• WINS
– 10 µJ to transmit a bit (i.e., 100 mW transmit power
and 100 ms to send a 32-byte packet; very
conservative estimate)
– 1 nJ / instruction
Executing an instruction costs orders of magnitude less than
sending a bit of data
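The per-bit figure for the Macro Motes follows from dividing radio power by bit rate: 10 mW at 10 kbps is 1 µJ per bit. The sketch below redoes that arithmetic and compares it with the 10 nJ per instruction figure. The function names are illustrative only and do not come from any of the cited projects.

```python
def energy_per_bit_joules(power_watts: float, bitrate_bps: float) -> float:
    """Energy spent per bit when the radio draws power_watts at bitrate_bps."""
    return power_watts / bitrate_bps


def instructions_per_bit(joules_per_bit: float, joules_per_instruction: float) -> float:
    """How many instructions cost as much energy as moving one bit."""
    return joules_per_bit / joules_per_instruction


# Macro Motes figures from the slide: 10 mW radio, 10 kbps, 10 nJ per instruction.
mote_bit = energy_per_bit_joules(10e-3, 10e3)        # about 1e-6 J, i.e. 1 µJ
mote_ratio = instructions_per_bit(mote_bit, 10e-9)   # about 100 instructions per bit
```

Under these numbers a Macro Mote can execute on the order of a hundred instructions for the energy of one transmitted bit, which is the motivation for pushing processing onto the nodes.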
24
Signal Processing:
Basics
• Measurement
• Detection
• Classification
• Localization
• Tracking
[Figure: detection pipeline with Timer, Time Series, FFT, Adaptive Normalizer, Energy Detect, and a Threshold Decision yielding Event or No Event]
A time stamp is associated to
each signal processing output
Fundamentals of Statistical Signal Processing, Vol I&II by Steven M. Kay
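A minimal sketch of the detection step, assuming a time-domain energy detector (the FFT stage is omitted to keep the example dependency-free). The window index stands in for the time stamp, and the normalizer is a simple running average of the noise floor. All names and parameters here are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Detection:
    timestamp: int  # window index, standing in for a real time stamp
    event: bool


def detect_events(samples: Sequence[float], window: int,
                  threshold: float, alpha: float = 0.1) -> List[Detection]:
    """Flag windows whose energy exceeds `threshold` times the noise floor."""
    noise_floor = None
    detections = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        energy = sum(x * x for x in chunk) / window
        if noise_floor is None:
            # Assumption: the first window contains noise only.
            noise_floor = max(energy, 1e-12)
        event = energy / noise_floor > threshold
        detections.append(Detection(start // window, event))
        if not event:
            # Adaptive normalizer: track the noise floor on quiet windows only.
            noise_floor = max((1 - alpha) * noise_floor + alpha * energy, 1e-12)
    return detections


signal = [0.1] * 8 + [1.0] * 4 + [0.1] * 4
flags = [d.event for d in detect_events(signal, window=4, threshold=4.0)]
```

Each output carries a time stamp, so downstream consumers see a sequence of (time stamp, event) items rather than raw samples.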
25
Signal Processing:
Data Fusion
• Data Fusion
– In: Observations from
different sensors
– Out: Weight associated
to hypothesis
• Approach
– Inferences (Bayesian,
genetic algorithm, …)
– Peer-tasking
R.Brooks and S.Iyengar. Multi-sensor Fusion: Fundamentals and
Applications with Software. Prentice Hall.
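As one illustration of Bayesian inference for data fusion, the sketch below combines binary reports from several sensors into a weight for the hypothesis "target present". It assumes the sensors are independent and share the same detection and false-alarm probabilities. These assumptions and all names are illustrative and are not taken from the cited book.

```python
from typing import Iterable


def fuse(prior: float, reports: Iterable[bool],
         p_detect: float, p_false_alarm: float) -> float:
    """Posterior probability of the hypothesis given independent binary reports."""
    odds = prior / (1.0 - prior)
    for detected in reports:
        if detected:
            odds *= p_detect / p_false_alarm
        else:
            odds *= (1.0 - p_detect) / (1.0 - p_false_alarm)
    return odds / (1.0 + odds)


agree = fuse(0.5, [True, True], p_detect=0.9, p_false_alarm=0.1)
conflict = fuse(0.5, [True, False], p_detect=0.9, p_false_alarm=0.1)
```

Two agreeing reports push the weight close to 1, while two conflicting reports from equally reliable sensors leave it at the prior.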
26
RF Networking:
Directed Diffusion
• Publish-Subscribe interface
• Gradient based routing
– Data is sent on multiple routes
• Reinforcement learning
– Chooses good route
– Adapts to node failures
• In-network aggregation
SCADDS Project - http://www.isi.edu/scadds
DataSpaces - http://www.cs.rutgers.edu/dataman
DSN Project - http://www.east.isi.edu/projects/DSN/
27
Operating System: Requirements
• Compact scale
– Small footprint, efficient use of instruction set
• Efficient Multithreading
– Concurrency-intensive operations
• Sensor data + network data (+ GPS data)
• Efficient drivers
– Limited levels of abstractions
– Migration across hardware/software boundaries
• Modularity
– Composition of modules for each type of sensor node
– Support for mobile code
• Robust operations
– Memory management
28
Operating System:
tinyOS
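[Figure: tinyOS component stack, top to bottom: application (route map, router, sensor application); Active Messages; packet (Radio Packet, Serial Packet); byte (Radio byte, UART, i2c, Temp, photo); bit (RFM, clocks); with the SW/HW boundary marked]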
J.Hill, R.Szewczyk, A.Woo, S.Hollar, D.Culler, K.Pister
System Architecture Directions for Networked Sensors. ASPLOS 2000.
http://www.cs.berkeley.edu/~jhill/tos/
29
Design Space
Sensor Pack
WINS NG
Macro Motes
Star
topology
Smart Dust
Multi-hop
topology
Front-end
Front-end
“System
on a chip”
30
What is Sensor Data?
• Sensor data is generated by signal processing
functions
– Measurements
– Detections
– Classification
• Time stamp associated to each sensor data item
• Sensor data produced by individual sensors or
groups of sensors
– If no “peer tasking” is used then the group of sensors
that produce data is the group of sensors on which the
signal processing functions are invoked.
31
How is Sensor Data Accessed?
• Multi-hop RF network
– Front-end connected to gateway nodes
– Sensor nodes that produce data are sources, gateway
nodes are sinks.
– Processing can be pushed into the multi-hop network in order
to trade increased local processing for reduced traffic.
• Optical network
– Front-end obtains data from all the nodes in its line of
sight.
– Star Topology.
32
What About Data Storage and
Processing Capabilities on the Nodes?
• Sensor pack
– Large processing capabilities and buffer space
• System on a chip
– Restricted processing capabilities and buffer space
• Data items should be processed as they are generated
• No elaborate processing on the sensor nodes
• No historical data is maintained
• Possible hierarchy of sensor nodes
– A few sensor packs arranged in a multi-hop network
– Many miniature sensors (system on a chip) are attached
to each sensor pack.
33
What is the Cost of Accessing
Sensor Data?
• Energy is the scarce resource
– Processing
– Storage
– Transmission
• Local processing is orders of magnitude
cheaper than transmission
– Propagation with nodes on the ground
accentuates this characteristic
34
What kind of abstraction to
represent a sensor network?
• G = (V,E)
– Vertices represent sensor nodes
– Edges represent connected sensor nodes
• Model#1: The graph of connected nodes is fully connected.
Each edge is annotated with the cost of the transmission
between any two nodes.
– Relies on routing layer
– How to estimate cost of transmission?
• Model#2: The graph of connected nodes is not fully connected.
An edge represents a single hop
– Relies on physical layer
– Stable for limited periods of time
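The two models can be related: given the single-hop graph of Model #2, the edge costs of Model #1 can be estimated as cheapest multi-hop path costs. The sketch below assumes additive per-hop costs and uses Dijkstra's algorithm. The node names and cost values are made up for illustration.

```python
import heapq
from typing import Dict


def path_costs(hops: Dict[str, Dict[str, float]], source: str) -> Dict[str, float]:
    """Cheapest transmission cost from `source` to every reachable node.

    `hops` is the Model #2 graph: hops[u][v] is the cost of the single hop u -> v.
    The result gives the Model #1 annotations for edges leaving `source`.
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in hops.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


single_hop = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0},
}
costs_from_a = path_costs(single_hop, "a")
```

Because the single-hop graph is only stable for limited periods of time, such derived costs would have to be recomputed or refreshed by the routing layer.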
35
Ideas to Reuse?
• Energy efficient, small footprint solutions
• Easy to reconfigure, “0 administration” systems
• Reinforcement learning
– Finding an optimal solution in a dynamic environment
• Event-based processing
– Streams of sensor data items need to be processed as they
are produced
36
Break
37
Part II: Sensor Networks &
Databases
38
Declarative Access to
Sensor Data
• Sensors are data sources
• Queries to access sensor data regardless of
physical organization
Example #1: Every minute return the measurement obtained from
Region X.
Example #2: Whenever two sensors within 5 yards of each
other detect a bird then return their location.
Example #3: Every five minutes return the number of birds
detected in Region X.
39
Queries over a Sensor Network
• Do data fusion, directed diffusion, and query
processing share the same notion of query?
– Yes
• Collect, filter, correlate, aggregate sensor data
– … and No
• Data Fusion: hypothesis testing in a neighborhood
• Directed Diffusion: efficient, scalable cross-layer routing
• Query Processing: SQL queries over sensor data
• From a query processing viewpoint
– Support for data fusion?
– Integration with network routing?
40
Warehousing Approach
• Data is extracted from sensors and stored on a front-end
server
• Query processing takes place on the front-end.
Warehouse
Front-end
Sensor Nodes
41
Sensor Database System
• Sensor Database System supports
distributed query processing over a sensor
network
[Figure: a front-end connected to sensor nodes, each running a Sensor DB]
42
Sensor Database System
• Characteristics of a Sensor Network: Streams of
data, uncertain data, large number of nodes,
multi-hop network, no global knowledge about
the network, failure is the rule, energy is the
scarce resource, limited memory, no
administration, …
1. Can existing database techniques be reused in
this new context? What are their limitations?
2. What are the new problems? What are the new
solutions?
43
Issues
• Representing sensor data
• Representing sensor queries
• Processing query fragments on sensor nodes
• Distributing query fragments
• Adapting to changing network conditions
• Dealing with site and communication failures
• Deploying and Managing a sensor database system
44
Performance Metrics
• High accuracy
– Distance between ideal answer and actual answer?
– Ratio of sensors participating in answer?
• Low latency
– Time between when data is generated on the sensors and when
the answer is returned
• Limited resource usage
– Energy consumption:
E (J) = Wcpu (J/inst) * CPU (inst) + Wram (J/b) * RAM (b) +
Wmsg (J/msg sent) * nb msg sent + Wbdw (J/b) * bytes sent (b)
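The energy formula above can be evaluated directly. The sketch below compares shipping 60 raw readings against aggregating them locally and sending a single message. The weights and workload numbers are hypothetical placeholders, not measurements.

```python
from dataclasses import dataclass


@dataclass
class EnergyWeights:
    w_cpu: float  # J per instruction
    w_ram: float  # J per byte of RAM used
    w_msg: float  # J per message sent (fixed overhead)
    w_bdw: float  # J per byte sent


def query_energy(w: EnergyWeights, instructions: int, ram_bytes: int,
                 messages_sent: int, bytes_sent: int) -> float:
    """E = Wcpu*CPU + Wram*RAM + Wmsg*(nb msg sent) + Wbdw*(bytes sent)."""
    return (w.w_cpu * instructions + w.w_ram * ram_bytes
            + w.w_msg * messages_sent + w.w_bdw * bytes_sent)


# Hypothetical weights for illustration only.
weights = EnergyWeights(w_cpu=1e-9, w_ram=0.0, w_msg=1e-5, w_bdw=8e-6)

# Plan A: forward 60 raw readings of 32 bytes each.
raw = query_energy(weights, instructions=0, ram_bytes=0,
                   messages_sent=60, bytes_sent=60 * 32)
# Plan B: spend 60,000 instructions aggregating, then send one 32-byte result.
aggregated = query_energy(weights, instructions=60_000, ram_bytes=0,
                          messages_sent=1, bytes_sent=32)
```

With these placeholder weights the aggregating plan uses far less energy, which is the trade-off the limited-resource metric is meant to expose.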
45
Representing
Sensor Data and Sensor Queries
• Sensor Data:
– Output of signal processing functions
• Time Stamped values produced over a given duration
– Inherently distributed
• Sensor Queries
– Conditions on time and space
• Location dependent queries
• Constraints on time stamps or aggregates over time windows
– Event notification
46
The COUGAR Model
• Schema-Level
– Each type of sensor is represented as an ADT
– To each signal-processing function is associated an
ADT function that returns a sequence
– A sequence associates sets of records with positions
(elements in an ordered domain).
detect
TimeStamp  SensorId  In  Out
T1         #1        90  True
T1         #2        90  True
T2         #1        90  True
T4         #3        90  True
47
The COUGAR Model
detect
TimeStamp  SensorId  In  Out
T1         #1        90  True
T1         #2        90  True
T2         #1        90  True
T4         #3        90  True
Select R.s.detect(90).project(s1.sensorId)
From R
Where $every(60);
• Long-running SQL
queries
– Sequence functions over
sensor ADT functions
(returning sequences)
– New sensor data items
appended to sequence as
they are produced
– Materialized view updated
as sensor data items are
appended
P.Bonnet, J.Gehrke, P.Seshadri. Towards Sensor Database Systems. MDM’01
http://www.cs.cornell.edu/database/cougar
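To make the long-running query idea concrete, here is a hypothetical sketch (not the COUGAR implementation) of a materialized view that is updated as detection records are appended to a sequence. It assumes time stamps in seconds and a 60-second period, mirroring the $every(60) clause. The class and method names are invented for this example.

```python
from collections import defaultdict
from typing import Dict, List, Set


class DetectView:
    """Per period, the ids of sensors whose detect() output was True."""

    def __init__(self, period_s: int = 60) -> None:
        self.period_s = period_s
        self.rows: Dict[int, Set[int]] = defaultdict(set)

    def append(self, timestamp_s: int, sensor_id: int, detected: bool) -> None:
        """Called whenever a new sensor data item is appended to the sequence."""
        if detected:
            self.rows[timestamp_s // self.period_s].add(sensor_id)

    def result(self, period_index: int) -> List[int]:
        """Current answer for one period; empty if nothing was detected."""
        return sorted(self.rows.get(period_index, set()))


view = DetectView(period_s=60)
view.append(5, 1, True)
view.append(20, 2, True)
view.append(65, 1, True)
view.append(70, 3, False)
```

The view never rescans old data: each appended item touches only the row of its own period.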
48
A Measure Theoretic
Probabilistic Data Model
Detection
TimeStamp  SensorId  In  Out
T1         #1        90  [distribution plot]
T1         #2        90  [distribution plot]
T2         #1        90  [distribution plot]
T4         #3        90  [distribution plot]
• Outputs of a signal processing
function might be continuous
probability distributions
• Extension of data model for
discrete probability distributions
using measure theory
• Specific model for
multidimensional parametric
distributions (e.g., Gaussians)
– Event probabilities
– Comparisons
T.Faradjian, J.Gehrke, P.Bonnet. A Model Theoretic Probabilistic Data Model.
Cornell Technical Report . December 2000.
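For the Gaussian case, event probabilities and comparisons have closed forms. The sketch below assumes one-dimensional, independent Gaussian outputs. It illustrates the kind of operation such a model needs and is not the model from the technical report.

```python
import math


def normal_cdf(x: float, mean: float, std: float) -> float:
    """P(X <= x) for X ~ N(mean, std^2)."""
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))


def prob_greater_than(mean: float, std: float, threshold: float) -> float:
    """Event probability P(X > threshold)."""
    return 1.0 - normal_cdf(threshold, mean, std)


def prob_x_greater_than_y(mean_x: float, std_x: float,
                          mean_y: float, std_y: float) -> float:
    """Comparison P(X > Y) for independent Gaussians.

    Y - X is Gaussian with mean (mean_y - mean_x) and variance
    std_x^2 + std_y^2, and X > Y exactly when Y - X < 0.
    """
    std_diff = math.sqrt(std_x ** 2 + std_y ** 2)
    return normal_cdf(0.0, mean_y - mean_x, std_diff)


p_hot = prob_greater_than(mean=20.0, std=2.0, threshold=22.0)
p_cmp = prob_x_greater_than_y(20.0, 1.0, 20.0, 1.0)
```

A query predicate such as "temperature > 22" then returns a probability per record instead of a Boolean.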
49
WebDust
• Data Model
– DataSpaces: spatial
decomposition of physical
space
– Each sensor is an abstract data
type
• InfoDispensers
– Data aggregation devices
• Spatial Web
– For organizing and
representing information
aggregated by InfoDispensers
T.Imielinski, S.Goel. DataSpace – Querying
and Monitoring Deeply Networked Collections
in Physical Space. MobiDE 1999.
http://www.cs.rutgers.edu/dataman/webdust
50
Control Language in Sagres
• Data model
– Ontology that contains class
information
– World State that contains
device data
– XML encoding
• DevL language
– Rules are defined for each
device
– ECA model for querying
and updating the World
State
http://data.cs.washington.edu/ubiquitous/sagres/
51
Subscription Language in
LeSubscribe
• Event Model
– Similar to LDAP data model
– An event type is associated to a set of attributes
– An event instance includes a set of values
• Subscription Language
– A subscription is a conjunction of conditions on
attributes
• An event instance e matches a subscription s if e provides a
binding for every attribute occurring in s and all predicates
in s are true with respect to this binding
J.Pereira et al. Publish/Subscribe on the Web at Extreme Speed.
VLDB 2000.
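The matching rule can be sketched in a few lines. This is an illustration of the definition only, not the LeSubscribe matching algorithm. Events are modelled as attribute-to-value dictionaries and subscriptions as lists of (attribute, operator, value) conditions, both of which are assumptions made for this example.

```python
import operator
from typing import Any, Dict, List, Tuple

Condition = Tuple[str, str, Any]

OPS = {
    "=": operator.eq,
    "<": operator.lt,
    "<=": operator.le,
    ">": operator.gt,
    ">=": operator.ge,
}


def matches(event: Dict[str, Any], subscription: List[Condition]) -> bool:
    """True if the event binds every attribute in the subscription
    and every condition holds under that binding."""
    for attribute, op, value in subscription:
        if attribute not in event:
            return False  # no binding for this attribute
        if not OPS[op](event[attribute], value):
            return False  # predicate is false
    return True


subscription = [("type", "=", "bird"), ("region", "=", "X"), ("confidence", ">=", 0.8)]
hit = {"type": "bird", "region": "X", "confidence": 0.9, "sensor": 7}
miss = {"type": "bird", "region": "X"}
```

Extra attributes in the event (here "sensor") do not affect matching, while a missing attribute always prevents a match.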
52
Discussion
• Data Model
– Representing sensors and
signal-processing functions
• Abstract Data Types vs.
attribute-value pairs
– Capturing the temporal
aspect of sensor data
• Sequences vs. event model
• New operators on data
streams
– Representing uncertain data
• Probabilistic Data Model
– Data Format
• XML vs. byte array
• Query Language
– Manipulating sensor
data
• Long-running SQL
queries vs. active rules
– Need for a propagation
mechanism for sensor
data (as events)
53
Processing query fragments on
sensor nodes
• Processing query
fragments on sensor nodes
allows trading increased
processing on sensor
nodes for reduced network
traffic
– Valid trade-off in multi-hop
networks
• Need for a light-weight
query engine on sensor
nodes
• Limited Resources:
– How to scale down the
footprint of the query engine?
– How to manage the resource
consumption of the query
engine (including CPU, RAM
and energy)
• Event-based processing
– Query processing takes place
as data items are produced by
signal processing functions (or
obtained from other sensor
nodes). How does this impact
the architecture of the query
engine?
54
Light-weight query engines
• Commercial DBMS for palm-sized PCs including
query processing and replication capabilities
– Footprint limited to several hundred kbytes.
• PicoDBMS for the SmartCard
– Focus on query processing without RAM.
C.Bobineau, L.Bouganim, P.Pucheral, P.Valduriez. PicoDBMS:
Scaling down Database Techniques for the Smartcard. VLDB 2000.
• RISC-style Database System
S.Chaudhuri, G.Weikum. Rethinking Database System Architecture:
Towards a Self-Tuning RISC-style Database System
55
Discussion
• Need for scaled down database systems
– PicoDBMS focuses on RAM
– Need for energy-aware query processing: managing
CPU mode to reduce energy usage
M.Weiser et al. Scheduling for Reduced CPU Energy. OSDI 1994.
• Need for composition of database components
– Building systems adapted to sensor capabilities (RAM,
CPU, energy) – tinyOS argument - similar to wrapper
generators objective.
– Predictable performance for capacity planning and
admission control
56
Distributing query fragments
• Because producing and transmitting data is energy
expensive, only the sensors involved in a query
should be tasked to produce and transmit data.
• When placing query fragments, the system should
consider the performance trade-off between
increased processing on the nodes and reduced
network traffic
– Accuracy
– Response Time
– Resource Usage
Cost model or Admission Control?
57
Distributing query fragments
• Distributed Database Systems assume
– A centralized optimizer has global knowledge about all
the nodes
– Meta-data is static
• These assumptions are challenged in the context of
large-scale multi-hop sensor networks:
– No global knowledge
– Mobile sensors
– Meta-data is dynamic
Decentralized Meta-data Management
58
Decentralized
Meta-data Management
• No global knowledge
– Resource Discovery on the Internet
• Index structure imposed on the network
Astrolabe - http://www.cs.cornell.edu/Info/People/rvr/astrolabe/
Tapestry (OceanStore) - http://oceanstore.cs.berkeley.edu/
• Dynamic Meta-data
– Indexing Moving Objects
S.Saltenis et al. Indexing the Positions of Continuously Moving Objects. SIGMOD 2000
O.Wolfson et al. Location Prediction and Queries for Tracking Moving Objects.
ICDE 2000.
– Decisions taken at one point in time might be
challenged later on!
59
Cost Model or
Admission Control?
• Mariposa
– Each autonomous site bids for queries in order to
increase the value of a reward function
http://s2k-ftp.cs.berkeley.edu:8000/mariposa
• Quality of Service and Query Processing
– Budget associated to each query
• Accuracy, Latency, Resource Usage
– The system guarantees that each query is evaluated
within the given budget
• Admission Control
• Monitoring and Adaptation
http://www.db.fmi.uni-passau.fr:8000/projects/OG
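A minimal sketch of budget-based admission control, assuming the system can produce an estimate of accuracy, latency and energy for a query before running it. The record layout and field names are hypothetical and are not drawn from either cited project.

```python
from dataclasses import dataclass


@dataclass
class Budget:
    min_accuracy: float   # e.g. required fraction of relevant sensors answering
    max_latency_s: float
    max_energy_j: float


@dataclass
class Estimate:
    accuracy: float
    latency_s: float
    energy_j: float


def admit(estimate: Estimate, budget: Budget) -> bool:
    """Admit a query only if every estimated dimension fits its budget."""
    return (estimate.accuracy >= budget.min_accuracy
            and estimate.latency_s <= budget.max_latency_s
            and estimate.energy_j <= budget.max_energy_j)


budget = Budget(min_accuracy=0.9, max_latency_s=5.0, max_energy_j=0.02)
cheap_plan = Estimate(accuracy=0.95, latency_s=2.0, energy_j=0.001)
costly_plan = Estimate(accuracy=0.99, latency_s=2.0, energy_j=0.05)
```

Admission is only the first half of the guarantee: monitoring and adaptation are still needed once estimates drift from what is observed at run time.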
60
Discussion
• Decentralized Meta-data management
– Adapting data structures defined for resource discovery
on the Internet seems promising
– Dealing with continuously changing meta-data
– Similar problem for large-scale mediator systems
• Decentralized Query Planning
– Query Decomposition
• Bottom-up? Top Down?
– Negotiation between sites to reach agreement on which
site processes which query fragments
• Need for adaptation and renegotiations when meta-data change
61
Adapting to changing network
conditions
• During query execution, streams of data flow
from a large number of sensors to front-ends or
between sensors
– Dataflow engine
• Because of the nature of sensor data and because
of congestion or failures it is impossible to predict
how data will be obtained at a query processing
site.
– Adaptive query processing at each site
62
Dataflow Engines
[Figure: Split → parallel Op instances → Merge]
• Same set of operations (query
fragment) performed in parallel
on multiple sites
• Mechanisms for load balancing
– River: over a cluster
– Mayr et al.: over heterogeneous
resources
Telegraph: http://telegraph.cs.berkeley.edu/
River: http://now.cs.berkeley.edu/River/
http://www.research.microsoft.com/~gray/river
Heterogeneous Resources: http://www.cs.cornell.edu/mayr
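The Split / Op / Merge pattern can be sketched sequentially. Here the "sites" are simulated by list partitions and the split is a plain round-robin, so this shows the structure only. It does not model the load-balancing mechanisms of River or of Mayr et al.

```python
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def split(records: Sequence[T], n_sites: int) -> List[List[T]]:
    """Round-robin partitioning of the input across n_sites."""
    return [list(records[i::n_sites]) for i in range(n_sites)]


def run_dataflow(records: Sequence[T], op: Callable[[List[T]], R],
                 merge: Callable[[List[R]], R], n_sites: int) -> R:
    """Apply the same query fragment `op` on every partition, then merge."""
    partials = [op(partition) for partition in split(records, n_sites)]
    return merge(partials)


# Count detections across 4 simulated sites.
readings = [True, False, True, True, False, True, False, False]
bird_count = run_dataflow(
    readings,
    op=lambda part: sum(1 for detected in part if detected),
    merge=sum,
    n_sites=4,
)
```

Counting works here because the per-site counts can be merged by addition; operators without such a decomposition need a different merge step.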
63
Adaptive Query Processing
Eddy
• Given a query
fragment: for each
record, which operator
should be executed
next?
• Decision based on
“back pressure” at the
queue associated to
each operator
– Reinforcement learning
Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously
Adaptive Query Processing. SIGMOD 2000
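The per-record routing decision can be illustrated with a much simplified stand-in. Instead of observing back pressure on operator queues, this sketch tracks each filter's observed pass rate and tries the most selective filter first. It is not the algorithm from the Eddies paper, and the class and attribute names are invented.

```python
from typing import Any, Callable, Dict, List


class Eddy:
    """Routes each record through filter operators in an adaptive order."""

    def __init__(self, predicates: Dict[str, Callable[[Any], bool]]) -> None:
        self.predicates = predicates
        # Counters start at 1 so that pass rates are defined from the start.
        self.seen = {name: 1 for name in predicates}
        self.passed = {name: 1 for name in predicates}

    def order(self) -> List[str]:
        """Operators sorted by observed pass rate, most selective first."""
        return sorted(self.predicates, key=lambda n: self.passed[n] / self.seen[n])

    def process(self, record: Any) -> bool:
        """Return True if the record satisfies every operator."""
        for name in self.order():
            self.seen[name] += 1
            if not self.predicates[name](record):
                return False  # dropped early, remaining operators skipped
            self.passed[name] += 1
        return True


eddy = Eddy({
    "always_true": lambda r: True,
    "multiple_of_ten": lambda r: r % 10 == 0,
})
kept = [r for r in range(100) if eddy.process(r)]
```

After a few records the selective filter moves to the front, so most records are dropped after a single operator call. The order keeps adapting if the data changes.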
64
Discussion
• Integration of adaptive query processing with
dataflow engines over a sensor network
– How to take site or communication failure into
account?
• Using reinforcement learning to take decisions over multiple
dataflows?
– How to establish dataflow?
• No centralized site that establishes a dataflow. Need to take
mobile sites into account.
• Need for distributed scheduling. Data driven control might not
be sufficient. Using admission control to establish dataflow
schedules?
65
Dealing with Site or
Communication Failures
• Because sensors run out of energy, site and
communication failures are the rule and not the
exception in a sensor network
• Taking site or communication failure into
account in dataflow processing:
– Fault-tolerance mechanisms for intermediate query
processing sites
– Trading resource usage and delay for increased
accuracy in case of communication failure
• Assessing the quality of each answer
– Approximate Query Processing
– Sensor data is uncertain in the first place. Combining
uncertainty and unavailability?
– Quality of Service
• Accuracy requirement
• The system guarantees that requirements are met
66
Deploying and Managing a
Sensor Database System
• Sensor networks should be
deployed and left
unattended.
• It should be easy to add or
remove sensor nodes.
• A sensor database system
should
– Take advantage of all
sensors in the system
– Be as easy to deploy and
manage as all other
components
• Need for mechanisms to
acquire and distribute metadata
• Need for mechanisms to
adjust dataflow depending
on the status of the sensor
network
• It should be easy to
configure, install and reboot
sensor database components
– RISC-style architecture?
67
Summary
• What database techniques
can be reused?
– Data model and query
languages
• Sequences
• Subscription languages
– Adaptive query processing
– Small footprint and modular
architecture for query
engine
• What is new?
– Uncertain data and
unavailable data
– Decentralized meta-data
management and query
planning
– Combining dataflow engine
and adaptive query
processing
– Failure handling in dataflow
engines
– Quality of service and
query processing
68
Other Issues
• Historical analysis over data cached in the
sensor network
Example: What was the average temperature
in Region X between 10 am and 1 pm yesterday?
• Asynchronous query processing
– User submits a query at a given location and
obtains the answer later on at a different
location
69
Queries over a Sensor Network
• Support for data
fusion
– Peer-tasking: extending
dataflow dynamically
– Fully decentralized
system: each sensor
node can submit a
query
• Integration with
network routing
– Sharing meta-data
– Dataflow engine as
application in a cross
layer routing mechanism
– Quality of service or
cost information
provided by routing
layer
70
Acknowledgements
DARPA Sensit Program
http://www.darpa.mil/ito/research/sensit/
Many thanks to Steve Beck, Richard
Brooks, Jason Hill, Bill Kaiser, Donald
Kossmann, Sri Kumar, Tobias Mayr, Kris
Pister, Joe Paradiso