1 A Database Perspective on Sensor Networks
Philippe Bonnet
Cornell University
bonnet@cs.cornell.edu

2 Outline
• Introduction
  – Applications
  – Sensor Networks & Database Technology
• Part I: Sensor Networks
  – What are the capabilities of sensor nodes and of sensor networks? What is the nature of sensor data?
• Part II: Database Technology
  – What are the relevant aspects of DB technology? Can they be applied in the context of sensor networks? What are the new problems?

3 Sensor-based Application #1
http://www.spyplanes.com/
http://www.millennium.berkeley.edu/tinyos/uav.html

4 Sensor-based Application #2
Internet
http://www.media.mit.edu/resenv/vehicles.html
http://www.media.mit.edu/resenv/ (Ara Knaian’s thesis)

5 Sensor-based Application #3
Long-range Radio
http://birds.cornell.edu/

6 Area Monitoring Applications
Declarative Access
Signal Processing (Sensor Tasking)
• Energy Efficient
• Scalable
• Accurate
• Reliable
• Low Latency

7 Area Monitoring Applications
On-demand Sensor Tasking
Application #3 (fixed point for data collection)
Application #1 (mobile point for data collection)
Predefined access to sensor data
Application #2
One-Time Sensor Tasking
On-demand access to sensor data

8 Declarative Access to Sensor Data
Example #1: Every minute return the measurement obtained from Region X.
Example #2: Whenever two sensors within 5 yards of each other detect a bird then return their location.
Example #3: Every five minutes return the number of birds detected in Region X.
• SQL Queries over a Sensor Network [T00][BS00]
  – Access to large collection of sensors
  – Associative access independent of the physical organization of the sensor network

9 Database Analogy
[Diagram labels: Declarative SQL Query, Data Extraction, SQL Engine, Sensor Network, Storage Manager, Sensors, Data on Disk]

10 Sensor Database System
[Diagram labels: Declarative SQL Query, Sensor Network, Sensors, SQL Engine, Storage Manager, Data on Disk]
Adapting database technology to support declarative access to sensor data in the context of area monitoring applications

11 Other Sensor-based Applications
• Condition-based maintenance
  – Product Quality Monitoring
• Device management
  – Smart office spaces
  – Home automation
  – Networked cars
• …
The opportunities for database technology might exist but are less obvious

12 Part I: Sensor Networks

13 Issues from a Database Perspective
• What is sensor data?
• How is sensor data accessed?
• What about data storage and processing capabilities on sensor nodes?
• What is the cost of accessing sensor data?
• What kind of abstraction to use in order to represent a sensor network?
• Ideas to reuse?

14 WINS NG Sensor Nodes
[Block diagram labels: Analog I/O, RF Modem, DSP, Digital Control I/O Proc., Real-Time Interface Processor, PowerPC 32 bits Processor, Power supply, GPS, Ethernet]
http://www.sensoria.com/

15 Smart Dust Motes
[Diagram labels, component: technology]
Laser diode: III-V process
Passive CCR comm.: MEMS/polysilicon
Active beam steering laser comm.: MEMS/optical quality polysilicon
Analog I/O, DSP, Control: COTS CMOS
Sensor: MEMS/bulk, surface, ...
Power capacitor: Multi-layer ceramic
Solar cell: CMOS or III-V
Thick film battery: Sol/gel V2O5
1-2 mm
http://robotics.eecs.berkeley.edu/~pister/SmartDust/

16 COTS Macro Dust Motes
http://www-bsac.eecs.berkeley.edu/~shollar/macro_motes/macromotes.html

17 Processing Capabilities
• WINS NG:
  – General Purpose Processor: PowerPC
    • 66 MHz, 87 MIPS, 16 MB RAM
  – DSP: TI5402
    • 100 MHz, 25 ksps input, 5 ksps output to processor
• Macro Motes:
  – Micro-controller: Atmel MCU
    • 4 MHz, 8 KB of program memory, 512 bytes of data memory.
    • Idle, power down, power save modes.

18 Communication Capabilities
• Radio Frequency
  – WINS NG
    • WINS2.0 modem
      – 2.4 GHz, Frequency Hopping, 56 kbps
      – 30 m range
  – Macro Motes
    • RFM TR1000
      – 900 MHz, On/Off Key Encoding
      – 10 kbps
      – 20 m range
• Optical Communication
  – Smart Dust
    • Passive Corner Cube Reflector
      – On/Off Key Encoding (downlink), 1 kbps link over 500 m range

19 Optical Networking
[Figure: Top View of the Interrogator. Labels: Filter, CCD Camera, Polarizing Beamsplitter, Quarter-wave Plate, Lens, 0.25% reflectance on each surface, Frequency-Doubled YAG Green Laser, Beam Expander, 45° mirror]
J. M. Kahn, R. H. Katz and K. S. J. Pister, "Mobile Networking for Smart Dust", ACM/IEEE Intl. Conf. on Mobile Computing and Networking (MobiCom 99).

20 Piconet
[Figure: clusters of nodes, each cluster with one master (M) and several slaves (S)]
• Cluster: 1 Master / N Slaves
• Master synchronizes communications in a cluster (TDMA)
• Dual radio used in WINS NG to allow for multi-hop communication across clusters
ftp://ftp.uk.research.att.com/pub/docs/att/tr.97.9.pdf
The Bluetooth Radio System: Jaap C. Haartsen.
IEEE PC Feb 2000

22 Batteries
• Energy densities (Wh/L)
  – Li-ion: 500 (~1.8 J/mm3)
  – Li/SO2: 176
  – Alkaline: 80
  – Nickel Cadmium: 40
[Figure: 18650 Li-ion Cell Energy Density; Energy Density (Wh/L, 0 to 600) versus Year (1995 to 2009). Courtesy of Marc Doyle, DuPont]
• Moore’s law does not apply to batteries
Joe Paradiso’s survey of “renewable energy sources for the future of mobile and embedded computing”
http://www.media.mit.edu/resenv/

23 Energy Consumption
• Smart Dust
  – Objective: each mote should consume less than 1 J / day (amount of energy produced by solar cells)
  – Towards 10 pJ / instruction for dedicated microcontrollers
  – 1 nJ to transmit a bit with CCR passive transmitter
• Macro Motes
  – 1 µJ to transmit a bit; 0.5 µJ to receive a bit (10 kbps & 10 mW)
  – 10 nJ / instruction
• WINS
  – 10 µJ to transmit a bit (i.e., 100 mW transmit power and 100 ms to send a 32-byte packet; very conservative estimate)
  – 1 nJ / instruction
Executing an instruction costs orders of magnitude less than sending a bit of data

24 Signal Processing: Basics
• Measurement
• Detection
• Classification
• Localization
• Tracking
[Figure: detection pipeline. Labels: Timer, Time Series, FFT, Adaptive Normalizer, Energy Detect, Threshold, Decision, Event, No Event]
A time stamp is associated with each signal processing output
Fundamentals of Statistical Signal Processing, Vol I&II by Steven M. Kay

25 Signal Processing: Data Fusion
• Data Fusion
  – In: Observations from different sensors
  – Out: Weight associated with each hypothesis
• Approach
  – Inferences (Bayesian, genetic algorithm, …)
  – Peer-tasking
R.Brooks and S.Iyengar. Multi-sensor Fusion: Fundamentals and Applications with Software. Prentice Hall.
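The data fusion slide describes the input (observations from different sensors) and the output (a weight per hypothesis). A minimal sketch of the Bayesian variant follows. It assumes the sensors are conditionally independent given the hypothesis (a naive Bayes simplification that the slides do not state); the function name `fuse`, the hypotheses, and the likelihood values are all hypothetical.

```python
def fuse(prior, observations):
    """Combine per-sensor likelihoods into posterior weights per hypothesis.

    prior:        dict mapping hypothesis -> prior probability
    observations: list of dicts, one per sensor, mapping
                  hypothesis -> P(sensor reading | hypothesis)

    Assumption (not from the slides): sensor readings are conditionally
    independent given the hypothesis, so likelihoods simply multiply.
    """
    weights = dict(prior)
    for likelihood in observations:
        for hypothesis in weights:
            weights[hypothesis] *= likelihood[hypothesis]
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the observations rule out every hypothesis")
    # Normalize so that the weights sum to one.
    return {h: w / total for h, w in weights.items()}


# Hypothetical example: two acoustic sensors report on a possible bird.
prior = {"bird": 0.5, "no_bird": 0.5}
observations = [
    {"bird": 0.8, "no_bird": 0.3},  # sensor 1
    {"bird": 0.7, "no_bird": 0.4},  # sensor 2
]
posterior = fuse(prior, observations)
```

Each extra observation only multiplies and renormalizes, so the same function can run at a front-end or at a node that collects readings from its neighbors.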
26 RF Networking: Directed Diffusion
• Publish-Subscribe interface
• Gradient based routing
  – Data is sent on multiple routes
• Reinforcement learning
  – Chooses good route
  – Adapts to node failures
• In-network aggregation
SCADDS Project - http://www.isi.edu/scadds
DataSpaces - http://www.cs.rutgers.edu/dataman
DSN Project - http://www.east.isi.edu/projects/DSN/

27 Operating System: Requirements
• Compact scale
  – Small footprint, efficient use of instruction set
• Efficient Multithreading
  – Concurrency-intensive operations
    • Sensor data + network data (+ GPS data)
• Efficient drivers
  – Limited levels of abstractions
  – Migration across hardware/software boundaries
• Modularity
  – Composition of modules for each type of sensor node
  – Support for mobile code
• Robust operations
  – Memory management

28 Operating System: tinyOS
[Figure: component graph. Labels: application, Route map, router, sensor appln, Active Messages, packet, Radio Packet, Serial Packet, Temp, Radio byte, UART, i2c, SW, byte, HW, photo, bit, RFM, clocks]
J.Hill, R.Szewczyk, A.Woo, S.Hollar, D.Culler, K.Pister. System Architecture Directions for Networked Sensors. ASPLOS 2000.
http://www.cs.berkeley.edu/~jhill/tos/

29 Design Space
[Diagram labels: Sensor Pack, WINS NG, Macro Motes, Star topology, Smart Dust, Multi-hop topology, Front-end, Front-end, “System on a chip”]

30 What is Sensor Data?
• Sensor data is generated by signal processing functions
  – Measurements
  – Detections
  – Classification
• A time stamp is associated with each sensor data item
• Sensor data is produced by individual sensors or groups of sensors
  – If no “peer tasking” is used, then the group of sensors that produces data is the group of sensors on which the signal processing functions are invoked.

31 How is Sensor Data Accessed?
• Multi-hop RF network
  – Front-end connected to gateway nodes
  – Sensor nodes that produce data are sources, gateway nodes are sinks.
  – Processing can be pushed into the multi-hop network in order to trade increased local processing for reduced traffic.
• Optical network
  – Front-end obtains data from all the nodes in its line of sight.
  – Star Topology.

32 What About Data Storage and Processing Capabilities on the Nodes?
• Sensor pack
  – Large processing capabilities and buffer space
• System on a chip
  – Restricted processing capabilities and buffer space
    • Data items should be processed as they are generated
    • No elaborate processing on the sensor nodes
    • No historical data is maintained
• Possible hierarchy of sensor nodes
  – A few sensor packs arranged in a multi-hop network
  – To each sensor pack is attached lots of miniature sensors (system on a chip).

33 What is the Cost of Accessing Sensor Data?
• Energy is the scarce resource
  – Processing
  – Storage
  – Transmission
• Local processing is orders of magnitude cheaper than transmission
  – Propagation with nodes on the ground accentuates this characteristic

34 What kind of abstraction to represent a sensor network?
• G = (V,E)
  – Vertices represent sensor nodes
  – Edges represent connected sensor nodes
• Model#1: The graph of connected nodes is fully connected. Each edge is annotated with the cost of the transmission between any two nodes.
  – Relies on routing layer
  – How to estimate cost of transmission?
• Model#2: The graph of connected nodes is not fully connected. An edge represents a single hop
  – Relies on physical layer
  – Stable for limited periods of time

35 Ideas to Reuse?
• Energy efficient, small footprint solutions
• Easy to reconfigure, “0 administration” systems
• Reinforcement learning
  – Finding an optimal solution in a dynamic environment
• Event-based processing
  – Streams of sensor data items need be processed as they are produced

36 Break

37 Part II: Sensor Networks & Databases

38 Declarative Access to Sensor Data
• Sensors are data sources
• Queries to access sensor data regardless of physical organization
Example #1: Every minute return the measurement obtained from Region X.
Example #2: Whenever two sensors within 5 yards of each other detect a bird then return their location.
Example #3: Every five minutes return the number of birds detected in Region X.

39 Queries over a Sensor Network
• Do data fusion, directed diffusion, and query processing share the same notion of query?
  – Yes
    • Collect, filter, correlate, aggregate sensor data
  – … and No
    • Data Fusion: hypothesis testing in a neighborhood
    • Directed Diffusion: efficient, scalable cross-layer routing
    • Query Processing: SQL queries over sensor data
• From a query processing viewpoint
  – Support for data fusion?
  – Integration with network routing?

40 Warehousing Approach
• Data is extracted from sensors and stored on a front-end server
• Query processing takes place on the front-end.
[Diagram labels: Warehouse, Front-end, Sensor Nodes]

41 Sensor Database System
• Sensor Database System supports distributed query processing over a sensor network
[Diagram labels: a Sensor DB on the front-end and on each sensor node; Front-end; Sensor Nodes]

42 Sensor Database System
• Characteristics of a Sensor Network:
  Streams of data, uncertain data, large number of nodes, multi-hop network, no global knowledge about the network, failure is the rule, energy is the scarce resource, limited memory, no administration, …
1. Can existing database techniques be reused in this new context? What are their limitations?
2. What are the new problems? What are the new solutions?

43 Issues
• Representing sensor data
• Representing sensor queries
• Processing query fragments on sensor nodes
• Distributing query fragments
• Adapting to changing network conditions
• Dealing with site and communication failures
• Deploying and Managing a sensor database system

44 Performance Metrics
• High accuracy
  – Distance between ideal answer and actual answer?
  – Ratio of sensors participating in answer?
• Low latency
  – Time between the generation of data on the sensors and the return of the answer
• Limited resource usage
  – Energy consumption:
    E (J) = W_cpu (J/inst) * CPU (inst)
          + W_ram (J/b) * RAM (b)
          + W_msg (J/msg sent) * nb msg sent
          + W_bdw (J/b) * bytes sent (b)

45 Representing Sensor Data and Sensor Queries
• Sensor Data:
  – Output of signal processing functions
    • Time-stamped values produced over a given duration
  – Inherently distributed
• Sensor Queries
  – Conditions on time and space
    • Location dependent queries
    • Constraints on time stamps or aggregates over time windows
  – Event notification

46 The COUGAR Model
detect
TimeStamp  SensorId  In  Out
T1         #1        90  True
T1         #2        90  True
T2         #1        90  True
T4         #3        90  True
• Schema-Level
  – Each type of sensor is represented as an ADT
  – With each signal-processing function is associated an ADT function that returns a sequence
  – A sequence associates sets of records with positions (elements in an ordered domain).

47 The COUGAR Model
detect
TimeStamp  SensorId  In  Out
T1         #1        90  True
T1         #2        90  True
T2         #1        90  True
T4         #3        90  True
Select R.s.detect(90).project(s1.sensorId)
From R
Where $every(60);
• Long-running SQL queries
  – Sequence functions over sensor ADT functions (returning sequences)
  – New sensor data items appended to sequence as they are produced
  – Materialized view updated as sensor data items are appended
P.Bonnet, J.Gehrke, P.Seshadri. Towards Sensor Database Systems. MDM’01
http://www.cs.cornell.edu/database/cougar

48 A Measure Theoretic Probabilistic Data Model
Detection
TimeStamp  SensorId  In  Out
T1         #1        90
T1         #2        90
T2         #1        90
T4         #3        90
• Outputs of a signal processing function might be continuous probability distributions
• Extension of data model for discrete probability distributions using measure theory
• Specific model for multidimensional parametric distributions (e.g., Gaussians)
  – Event probabilities
  – Comparisons
T.Faradjian, J.Gehrke, P.Bonnet. A Model Theoretic Probabilistic Data Model. Cornell Technical Report. December 2000.
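The two operations listed for parametric distributions, event probabilities and comparisons, can be illustrated for the one-dimensional Gaussian case. This is a sketch only, not the model from the technical report: the class name `Gaussian`, its methods, and the temperature values are hypothetical, and the comparison assumes the two readings are independent.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Gaussian:
    """A sensor output represented as N(mean, std^2), with std > 0."""
    mean: float
    std: float

    def cdf(self, x: float) -> float:
        # P(X <= x), computed with the error function from the standard library.
        return 0.5 * (1.0 + math.erf((x - self.mean) / (self.std * math.sqrt(2.0))))

    def prob_between(self, lo: float, hi: float) -> float:
        # Event probability: P(lo <= X <= hi).
        return self.cdf(hi) - self.cdf(lo)

    def prob_greater_than(self, other: "Gaussian") -> float:
        # Comparison: P(X > Y). Assuming X and Y are independent,
        # X - Y is Gaussian with the means subtracted and the variances added.
        diff = Gaussian(self.mean - other.mean, math.hypot(self.std, other.std))
        return 1.0 - diff.cdf(0.0)


# Hypothetical readings from two temperature sensors.
reading_a = Gaussian(mean=21.0, std=0.5)
reading_b = Gaussian(mean=20.0, std=0.5)
p_in_range = reading_a.prob_between(20.0, 22.0)    # event probability
p_a_hotter = reading_a.prob_greater_than(reading_b)  # comparison
```

A query predicate such as "temperature above 20" then returns a probability instead of True or False, which is where this differs from the Boolean Out column of the detect table.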
49 WebDust
• Data Model
  – DataSpaces: spatial decomposition of physical space
  – Each sensor is an abstract data type
• InfoDispensers
  – Data aggregation devices
• Spatial Web
  – For organizing and representing information aggregated by InfoDispensers
T.Imielinski, S.Goel. DataSpace – Querying and Monitoring Deeply Networked Collections in Physical Space. MobiDE 1999.
http://www.cs.rutgers.edu/dataman/webdust

50 Control Language in Sagres
• Data model
  – Ontology that contains class information
  – World State that contains device data
  – XML encoding
• DevL language
  – Rules are defined for each device
  – ECA model for querying and updating the World State
http://data.cs.washington.edu/ubiquitous/sagres/

51 Subscription Language in LeSubscribe
• Event Model
  – Similar to LDAP data model
  – An event type is associated with a set of attributes
  – An event instance includes a set of values
• Subscription Language
  – A subscription is a conjunction of conditions on attributes
• An event instance e matches a subscription s if e provides a binding for every attribute occurring in s and all predicates in s are true with respect to this binding
J.Pereira et al. Publish/Subscribe on the Web at Extreme Speed. VLDB 2000.

52 Discussion
• Data Model
  – Representing sensors and signal-processing functions
    • Abstract Data Types vs. attribute-value pairs
  – Capturing the temporal aspect of sensor data
    • Sequences vs. event model
    • New operators on data streams
  – Representing uncertain data
    • Probabilistic Data Model
  – Data Format
    • XML vs. byte array
• Query Language
  – Manipulating sensor data
    • Long-running SQL queries vs.
active rules
  – Need for a propagation mechanism for sensor data (as events)

53 Processing query fragments on sensor nodes
• Processing query fragments on sensor nodes allows trading increased processing on sensor nodes for reduced network traffic
  – Valid trade-off in multi-hop networks
• Need for a light-weight query engine on sensor nodes
• Limited Resources:
  – How to scale down the footprint of the query engine?
  – How to manage the resource consumption of the query engine (including CPU, RAM and energy)?
• Event-based processing
  – Query processing takes place as data items are produced by signal processing functions (or obtained from other sensor nodes). How does this impact the architecture of the query engine?

54 Light-weight query engines
• Commercial DBMS for palm-sized PCs including query processing and replication capabilities
  – Footprint limited to several hundred kbytes.
• PicoDBMS for the SmartCard
  – Focus on query processing without RAM.
  C.Bobineau, L.Bouganim, P.Pucheral, P.Valduriez. PicoDBMS: Scaling down Database Techniques for the Smartcard. VLDB 2000.
• RISC-style Database System
  S.Chaudhuri, G.Weikum. Rethinking Database System Architecture: Towards a Self-Tuning RISC-style Database System

55 Discussion
• Need for scaled down database systems
  – PicoDBMS focuses on RAM
  – Need for energy-aware query processing: managing CPU mode to reduce energy usage
  M.Weiser et al. Scheduling for Reduced CPU Energy. OSDI 1994.
• Need for composition of database components
  – Building systems adapted to sensor capabilities (RAM, CPU, energy)
  – tinyOS argument: similar to the objective of wrapper generators.
  – Predictable performance for capacity planning and admission control

56 Distributing query fragments
• Because producing and transmitting data is energy expensive, only the sensors involved in a query should be tasked to produce and transmit data.
• When placing query fragments, the system should consider the performance trade-off between increased processing on the nodes and reduced network traffic
  – Accuracy
  – Response Time
  – Resource Usage
Cost model or Admission Control?

57 Distributing query fragments
• Distributed Database Systems assume
  – A centralized optimizer has global knowledge about all the nodes
  – Meta-data is static
• These assumptions are challenged in the context of large-scale multi-hop sensor networks:
  – No global knowledge
  – Mobile sensors
  – Meta-data is dynamic
Decentralized Meta-data Management

58 Decentralized Meta-data Management
• No global knowledge
  – Resource Discovery on the Internet
    • Index structure imposed on the network
  Astrolabe - http://www.cs.cornell.edu/Info/People/rvr/astrolabe/
  Tapestry (OceanStore) - http://oceanstore.cs.berkeley.edu/
• Dynamic Meta-data
  – Indexing Moving Objects
  S.Saltenis et al. Indexing the Positions of Continuously Moving Objects. SIGMOD 2000
  O.Wolfson et al. Location Prediction and Queries for Tracking Moving Objects. ICDE 2000.
  – Decisions taken at one point in time might be challenged later on!

59 Cost Model or Admission Control?
• Mariposa
  – Each autonomous site bids for queries in order to increase the value of a reward function
  http://s2k-ftp.cs.berkeley.edu:8000/mariposa
• Quality of Service and Query Processing
  – Budget associated with each query
    • Accuracy, Latency, Resource Usage
  – The system guarantees that each query is evaluated within the given budget
    • Admission Control
    • Monitoring and Adaptation
  http://www.db.fmi.uni-passau.fr:8000/projects/OG

60 Discussion
• Decentralized Meta-data management
  – Adapting data structures defined for resource discovery on the Internet seems promising
  – Dealing with continuously changing meta-data
  – Similar problem for large-scale mediator systems
• Decentralized Query Planning
  – Query Decomposition
    • Bottom-up? Top Down?
  – Negotiation between sites to reach agreement on which site processes which query fragments
    • Need for adaptation and renegotiations when meta-data change

61 Adapting to changing network conditions
• During query execution, streams of data flow from a large number of sensors to front-ends or between sensors
  – Dataflow engine
• Because of the nature of sensor data and because of congestion or failures, it is impossible to predict how data will be obtained at a query processing site.
  – Adaptive query processing at each site

62 Dataflow Engines
[Figure: parallel pipelines, each made of Split, Op and Merge stages]
• Same set of operations (query fragment) performed in parallel on multiple sites
• Mechanisms for load balancing
  – River: over a cluster
  – Mayr et al.: over heterogeneous resources
Telegraph: http://telegraph.cs.berkeley.edu/
River: http://now.cs.berkeley.edu/River/ http://www.research.microsoft.com/~gray/river
Heterogeneous Resources: http://www.cs.cornell.edu/mayr

63 Adaptive Query Processing
[Figure: Eddy]
• Given a query fragment: for each record, which operator should be executed next?
• Decision based on “back pressure” at the queue associated with each operator
  – Reinforcement learning
Ron Avnur and Joseph M. Hellerstein. Eddies: Continuously Adaptive Query Processing. SIGMOD 2000

64 Discussion
• Integration of adaptive query processing with dataflow engines over a sensor network
  – How to take site or communication failure into account?
    • Using reinforcement learning to take decisions over multiple dataflows?
  – How to establish dataflow?
    • No centralized site that establishes a dataflow. Need to take mobile sites into account.
    • Need for distributed scheduling. Data driven control might not be sufficient. Using admission control to establish dataflow schedules?
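The per-record routing question from the Adaptive Query Processing slide ("which operator should be executed next?") can be sketched with a deliberately simplified policy: send the record to the not-yet-applied operator whose input queue is shortest, so a slow operator with a full queue is avoided. This is an assumption made for illustration and not the routing policy from the Eddies paper; the function name and the operator names are hypothetical.

```python
def choose_next_operator(done, queue_lengths):
    """Pick the next operator for one record.

    done:          set of operator names already applied to this record
    queue_lengths: dict mapping operator name -> current input queue length

    Simplified back-pressure rule (an assumption, not the published policy):
    among the operators still pending for this record, prefer the one with
    the shortest queue. Ties are broken by operator name so that the choice
    is deterministic.
    """
    pending = [op for op in queue_lengths if op not in done]
    if not pending:
        return None  # every operator has been applied; the record is output
    return min(pending, key=lambda op: (queue_lengths[op], op))


# Hypothetical query fragment with three operators.
queues = {"filter": 0, "join": 5, "aggregate": 2}
next_op = choose_next_operator({"filter"}, queues)
```

Because the decision is taken per record from local queue state only, it needs no global plan, which is the property the discussion slide wants to carry over to dataflows spanning several sensor nodes.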
65 Dealing with Site or Communication Failures
• Because sensors run out of energy, site and communication failures are the rule and not the exception in a sensor network
• Taking site or communication failure into account in dataflow processing:
  – Fault-tolerance mechanisms for intermediate query processing sites
  – Trading resource usage and delay for increased accuracy in case of communication failure
• Assessing the quality of each answer
  – Approximate Query Processing
  – Sensor data is uncertain in the first place. Combining uncertainty and unavailability?
  – Quality of Service
    • Accuracy requirement
    • The system guarantees that requirements are met

66 Deploying and Managing a Sensor Database System
• Sensor networks should be deployed and left unattended.
• It should be easy to add or remove sensor nodes.
• A sensor database system should
  – Take advantage of all sensors in the system
  – Be as easy to deploy and manage as all other components
• Need for mechanisms to acquire and distribute metadata
• Need for mechanisms to adjust dataflow depending on the status of the sensor network
• It should be easy to configure, install and reboot sensor database components
  – RISC-style architecture?

67 Summary
• What database techniques can be reused?
  – Data model and query languages
    • Sequences
    • Subscription languages
  – Adaptive query processing
  – Small footprint and modular architecture for query engine
• What is new?
  – Uncertain data and unavailable data
  – Decentralized meta-data management and query planning
  – Combining dataflow engine and adaptive query processing
  – Failure handling in dataflow engines
  – Quality of service and query processing

68 Other Issues
• Historical analysis over data cached in the sensor network
  Example: What was the average temperature in Region X between 10 am and 1 pm yesterday?
• Asynchronous query processing
  – User submits a query at a given location and obtains the answer later on at a different location

69 Queries over a Sensor Network
• Support for data fusion
  – Peer-tasking: extending dataflow dynamically
  – Fully decentralized system: each sensor node can submit a query
• Integration with network routing
  – Sharing meta-data
  – Dataflow engine as application in a cross layer routing mechanism
  – Quality of service or cost information provided by routing layer

70 Acknowledgements
DARPA Sensit Program
http://www.darpa.mil/ito/research/sensit/
Many thanks to Steve Beck, Richard Brooks, Jason Hill, Bill Kaiser, Donald Kossmann, Sri Kumar, Tobias Mayr, Kris Pister, Joe Paradiso