Data Dissemination and Fusion in Sensor Networks

The Need for Data Dissemination and Fusion
– Energy efficiency is an essential factor; therefore, short-range hop-by-hop communication is preferred over direct long-range communication to the destination
– Since a sensor network produces a large amount of data for the end user, methods of combining or aggregating data into a small set of information are necessary and contribute to energy savings
– Data aggregation (also known as data fusion) can combine unreliable data readings into a more accurate signal by enhancing the common signal and reducing the noise

Energy-Efficient Communication Protocol Architecture for Wireless Microsensor Networks (LEACH Protocol) [Heinzelman+ 2000, 2002]
– LEACH (Low-Energy Adaptive Clustering Hierarchy) is a clustering-based protocol that uses randomized rotation of local cluster base stations to distribute the energy load evenly among the sensors in the network
– It is distributed: it does not require any control information from the base station (BS), and nodes do not need knowledge of the global network for LEACH to function
– The energy saving of LEACH is achieved by combining compression with data routing
– Key features of LEACH include:
  – Localized coordination and control of cluster set-up and operation
  – Randomized rotation of the cluster base stations, or clusterheads, and their clusters
  – Local compression of information to reduce global communication

LEACH
– The microsensor network considered has the following characteristics:
  – The base station is fixed and located far from the sensors
  – All the sensor nodes are homogeneous and energy constrained
– Communication between sensor nodes and the base station is expensive, and no high-energy nodes exist to carry out this communication
– By using clusters to transmit data to the BS, only a few nodes need to transmit over large distances to the BS, while the other nodes in each cluster use small transmit distances
– LEACH achieves superior performance compared to classical
clustering algorithms by using adaptive clustering and rotating clusterheads, which allows the total energy load of the system to be distributed among all the nodes
– By performing local computation in each cluster, the amount of data to be transmitted to the BS is reduced. Since communication is more expensive than computation, this yields a large reduction in energy dissipation

LEACH Algorithm Overview
– The nodes are grouped into local clusters, with one node acting as the local base station (BS) or clusterhead (CH)
– The CHs are rotated in random fashion among the various sensors
– Local data fusion compresses the data being sent from the clusters to the BS, reducing energy dissipation and increasing network lifetime
– Sensors elect themselves to be local BSs at any given time with a certain probability, and these CHs broadcast their status to the other sensor nodes
– Each node decides which CH to join based on the minimum communication energy
– Once the clusters are formed, each CH creates a schedule for the nodes in its cluster, so that the radio components of each non-clusterhead node can be turned OFF at all times except during its transmit time
– The CH aggregates all the data received from the nodes in its cluster before transmitting the compressed data to the BS
– The transmission between CH and BS requires high-energy transmission
– In order to distribute energy usage evenly among the sensor nodes, clusterheads are self-elected at different time intervals
– A node decides whether to become a CH depending on the amount of energy it has left
– The decision to become a CH is made independently of the other nodes
– The system can determine the optimal number of CHs prior to the election procedure based on parameters such as network topology and the relative costs of computation vs.
communication (the optimal number of CHs considered is 5% of the nodes)
– It has been observed that nodes die in a random fashion
– No communication exists between CHs
– Each node has the same probability of becoming a CH

LEACH Algorithm Details
– LEACH operates in rounds
– Each round begins with a set-up phase (clusters are organized) followed by a steady-state phase (data transmission to the BS occurs)

1. Advertisement Phase:
– Initially, each node decides whether to become a CH for the current round, based on the suggested percentage of CHs for the network (set prior to this phase) and the number of times the node has acted as a CH so far
– The node n decides by choosing a random number between 0 and 1
– If this random number is less than a threshold T(n), the node becomes a CH for this round
– The threshold is set as follows:

  \[ T(n) = \begin{cases} \dfrac{P}{1 - P \left( r \bmod \frac{1}{P} \right)} & \text{if } n \in G \\ 0 & \text{otherwise} \end{cases} \]

  P = desired percentage of CHs
  r = current round
  G = set of nodes that have not been CHs in the last 1/P rounds
– Assumptions are (i) each node starts with the same amount of energy and (ii) being a CH consumes approximately the same amount of energy for each node
– Each node elected as CH broadcasts an advertisement message to the rest of the nodes
– During this “clusterhead-advertisement” phase, the non-clusterhead nodes hear the advertisements of all CHs and decide which CH to join
– A node joins the CH whose advertisement it hears with the highest signal strength

2. Cluster Set-Up Phase:
– Each node informs its chosen clusterhead that it will be a member of the cluster

3. Schedule Creation:
– Upon receiving all the join messages from its members, the CH creates a TDMA schedule that assigns each member its allowed transmission time, based on the total number of members in the cluster

4.
Data Transmission:
– Each node transmits data to its CH according to the TDMA schedule
– The radio of each cluster member can be turned OFF until its allocated transmission time, minimizing energy dissipation
– The CH must keep its receiver ON to receive all the data
– Once all the data has been received, the CH compresses it and sends it to the BS

Multiple Clusters
– To minimize radio interference between nearby clusters, each CH chooses randomly from a list of CDMA spreading codes and informs its cluster members to transmit using this code
– The radio signals of neighboring clusters are then filtered out, avoiding corruption of the transmission

LEACH Advantages:
– Localized coordination enables scalability and robustness for dynamic networks
– Incorporates data fusion into the routing protocol in order to reduce the amount of information transmitted to the BS
– Distributes energy dissipation evenly throughout the sensors, thus increasing the system lifetime of the network

LEACH Disadvantages:
– It is unclear how to decide the percentage of clusterheads for a network: the topology, density and number of nodes differ from one network to another
– No guidance is given on when re-election should be invoked
– The clusterheads farther away from the base station will use higher power and die more quickly than the nearby ones

LEACH Suggestions/Improvements/Future Work:
– The protocol can be extended to hierarchical clustering, where each CH communicates with a “super-clusterhead”, and so on up to the top layer of the hierarchy, from which the data is sent to the BS
– The degree and remaining energy of a node may be considered as parameters when choosing a clusterhead for a round. If a clusterhead with limited power exhausts it during a round, the data to be transmitted may be lost
– Since a TDMA schedule is used, a large delay may be introduced between event detection and notification at the base station.
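To make the self-election rule from the Advertisement Phase concrete, the following Python sketch evaluates the threshold T(n) = P / (1 - P (r mod 1/P)) and simulates one rotation cycle of 1/P rounds. It is only an illustration under stated assumptions: the function names leach_threshold and simulate_cycle are hypothetical, 1/P is treated as a whole number of rounds, and energy levels, advertisements and cluster formation are not modeled.

```python
import random


def leach_threshold(p: float, r: int, in_g: bool) -> float:
    """Threshold T(n) for round r, where p is the desired fraction of CHs.

    in_g is True when the node has not been a CH in the last 1/p rounds.
    """
    if not in_g:
        return 0.0
    period = round(1 / p)  # rounds per rotation cycle (assumes 1/p is whole)
    return p / (1 - p * (r % period))


def simulate_cycle(num_nodes: int, p: float, seed: int = 0) -> list:
    """Return the number of self-elected CHs in each round of one cycle."""
    rng = random.Random(seed)
    period = round(1 / p)
    eligible = set(range(num_nodes))  # the set G at the start of the cycle
    heads_per_round = []
    for r in range(period):
        # Each eligible node draws a random number and compares it with T(n).
        heads = {n for n in eligible if rng.random() < leach_threshold(p, r, True)}
        eligible -= heads  # a node that served as CH leaves G for this cycle
        heads_per_round.append(len(heads))
    return heads_per_round


if __name__ == "__main__":
    print(simulate_cycle(100, 0.05))
```

Because the threshold grows to 1 in the last round of a cycle, every node that has not yet served is forced to elect itself then, so each node acts as CH exactly once per 1/P rounds in this simplified model.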
– Because of the delay its TDMA schedule can introduce, LEACH is not suitable for real-time applications

Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks (SPIN Protocols) [Kulik+ 2002]
– SPIN (Sensor Protocols for Information via Negotiation) is a family of negotiation-based information dissemination protocols designed to address the deficiencies of classic flooding through negotiation and resource adaptation
– SPIN disseminates each sensor's readings to all sensors in the network, treating all sensors as potential sink nodes
– Nodes using SPIN protocols name their data using high-level data descriptors called meta-data, and use meta-data negotiations to eliminate the transmission of redundant data in the network
– Communication decisions can be based on both application-specific knowledge of the data and knowledge of the resources available to the nodes

SPIN
– SPIN has two basic ideas:
  – Operate efficiently and conserve energy: nodes communicate with each other about the sensor data they already have and the data they still need
  – Monitor and adapt to changes in their own energy resources: this extends the lifetime of the system
– Four different SPIN protocols:
  – SPIN-PP
  – SPIN-EC
  – SPIN-BC
  – SPIN-RL

Meta-Data
– Used to uniquely and completely describe the data being collected by sensors
– If two pieces of actual data are distinguishable, then their meta-data should also be distinguishable
– Since the format of meta-data is application-specific, each application needs to interpret and synthesize its own meta-data
– SPIN applications must define a meta-data format for representing data that takes into account the costs of storing, retrieving and managing the meta-data
– SPIN nodes use three types of communication messages:
  – ADV (new data advertisement)
  – REQ (request for data)
  – DATA (data message)
– ADV and REQ messages contain only meta-data, so they are smaller than the corresponding DATA message

SPIN Resource Management
– SPIN applications are resource-aware and resource-adaptive
– By knowing the
resources at hand, nodes make informed decisions about using their resources effectively
– SPIN specifies an interface that applications can use to find out their available resources, rather than specifying a particular energy management protocol

The Problem
– In classic flooding, a source node sends data to all its neighbors; each neighbor checks its record of already-forwarded data to see whether it has forwarded this data to its own neighbors. If not, it forwards the data and updates the record
– This requires only a small amount of protocol state at each node and disseminates data quickly in a network where bandwidth is not scarce and links are not error-prone
– The problems include implosion, overlap and resource blindness:
  – Implosion: A node always sends data to its neighbors, regardless of whether they have already received the same data from other nodes
  – Overlap: Nodes waste energy and bandwidth by sending overlapping data
  – Resource Blindness: Nodes do not make decisions based on the energy available to them

The Solution
– SPIN addresses implosion and overlap by having nodes negotiate with each other before transmitting data, which eliminates the transmission of redundant data
– Nodes poll their resources before transmitting or processing data by probing a resource manager that keeps track of resource consumption
– Nodes can make efficient decisions based on the available energy level
– The use of meta-data descriptors eliminates the possibility of overlap, since nodes can name the portion of the data they are interested in receiving
– Awareness of local resources allows sensors to make meaningful decisions that extend longevity

SPIN Protocols
1.
SPIN-PP: A three-stage handshake protocol for point-to-point media
– This protocol works in three stages (ADV-REQ-DATA), each corresponding to one of the messages
– A node with new data sends an ADV message to its neighbors
– Each neighbor checks whether it has already received or requested this data
– If not, the neighbor responds by sending a REQ message to the sender
– The sender responds to the REQ by sending the actual DATA to the requesting neighbor
– If the neighbor already has the advertised data, it does not send any message
– Simplicity is the main strength: nodes make simple decisions, so little energy is spent on computation
– Each node only needs to know about its one-hop neighbors

2. SPIN-EC: SPIN-PP with a low-energy threshold
– Adds a simple energy-conservation heuristic to the SPIN-PP protocol
– When energy is abundant, SPIN-EC behaves like SPIN-PP
– When a node's energy approaches the low-energy threshold, it adapts by reducing its participation in the protocol
– A node participates in the full protocol only if it believes it has enough energy to complete the protocol without dropping below the threshold
– It does not prevent nodes from receiving messages such as ADV or REQ when below the low-energy threshold, but it does prevent them from handling a DATA message below the threshold

3.
SPIN-BC: A three-stage handshake protocol for broadcast media
– Improves upon SPIN-PP for broadcast networks by using cheap one-to-many communication: all messages are sent to the broadcast address and processed by all nodes within transmission range of the sender
– This approach is often called broadcast message suppression
– SPIN-BC differs from SPIN-PP in three main ways:
  – All SPIN-BC nodes send their messages to the broadcast address, so that all nodes within transmission range of the sender receive them
  – Upon receiving an ADV message, each node checks whether it already has the data. If not, the node sets a random timer, with an expiry chosen uniformly from a predetermined interval. When the timer expires, the node sends a REQ message to the broadcast address, naming the original advertiser in the message header. When nodes other than the original advertiser receive the REQ, they cancel their own request timers, which prevents them from sending redundant copies of the same REQ
  – A node sends the requested data to the broadcast address only once, which is enough to deliver it to all its neighbors. It does not respond to multiple requests for the same data

4.
SPIN-RL: SPIN-BC for lossy networks
– A reliable version of SPIN-BC that disseminates data through a broadcast network even when the network loses packets or communication is asymmetric
– Adds two adjustments to SPIN-BC to achieve reliability:
  – Each node keeps a record of which advertisements it has heard from which nodes; if it does not receive the data within a set time after sending a request, it re-requests the data
  – Nodes limit the frequency with which they resend data: after sending a piece of data, a node waits for a set time before responding to any further requests for the same data

SPIN Advantages:
– Meta-data negotiation and resource adaptation
– Each node maintains only local information about its nearest neighbors
– Suitable for mobile sensors, since nodes base their forwarding decisions on local neighborhood information

SPIN Disadvantages:
– It cannot isolate nodes that do not want to receive information, so power may be consumed unnecessarily

SPIN Suggestions/Improvements/Future Work:
– Study SPIN protocols in mobile wireless network models
– Develop more sophisticated resource-adaptation protocols that make good use of the available energy
– Design protocols that make adaptive decisions based not only on the cost of communicating data, but also on the cost of synthesizing it

Directed Diffusion [Intanagonwiwat+ 2000]
– Motivated by scaling, robustness and energy-efficiency requirements
– Directed diffusion is data-centric, in that all communication is for named data
– Data generated by sensor nodes is named using attribute-value pairs
– All nodes in the network are application-aware
– A node requests data by sending interests for named data
– A sensing task is disseminated throughout the sensor network via a sequence of local interactions, as an interest for named data
– Nodes diffusing the interest set up their own caches and gradients within the network, which channel the delivery of data
– During data transmission, reinforcement and negative reinforcement are used to converge to an efficient
distribution
– Intermediate nodes fuse interests and aggregate, correlate or cache data
– Assumes that sensor networks are task-specific: the task types are known at the time the sensor network is deployed
– An essential feature of directed diffusion is that interest propagation, data propagation and data aggregation are determined by local interactions
– Focuses on the design of dissemination protocols for tasks and events

Naming
– Task descriptions are named by a list of attribute-value pairs; a task description specifies an interest for data matching those attributes and is therefore also called an interest
– Example task: “Every I ms, for the next T seconds, send me a location of any four-legged animal in subregion R of the sensor field.”

  type = four-legged animal      // detect animal location
  interval = 20 ms               // send back events every 20 ms
  duration = 10 seconds          // … for the next 10 seconds
  rect = [-100, 100, 200, 400]   // from sensors within rectangle

– A sensor detecting an animal may generate the following data:

  type = four-legged animal      // type of animal seen
  instance = horse               // instance of this type
  location = [150, 200]          // node location
  intensity = 0.5                // signal amplitude measure
  confidence = 0.85              // confidence in the match
  timestamp = 01:30:45           // event generation time

Interests and Gradients
– An interest is generally issued by the sink node
– For each active task, the sink periodically broadcasts an interest message to each of its neighbors (including the rect and duration attributes)
– The sink periodically refreshes each interest by re-sending the same interest with a monotonically increasing timestamp attribute, for reliability purposes
– Every node maintains an interest cache in which each item corresponds to a distinct interest (interests are distinct if their type or interval attributes differ, or if their rect attributes are disjoint)
– Interest entries in the cache do not contain information about the sink
– In some cases, the definition of distinct interests allows interest
aggregation
– The interest entry contains several gradient fields, up to one per neighbor
– When a node receives an interest, it checks whether the interest exists in its cache:
  1. If no matching entry exists, the node creates an interest entry
     – This entry has a single gradient towards the neighbor from which the interest was received, with the specified data rate
     – Individual neighbors can be distinguished by locally unique identifiers
  2. If the interest entry exists but has no gradient for the sender of the interest
     – The node adds a gradient with the specified value
     – The node updates the entry’s timestamp and duration fields
  3. If both the entry and a gradient exist
     – The node updates the entry’s timestamp and duration fields
– When a gradient expires, it is removed from its interest entry
– When all gradients for an interest entry have expired, the interest entry is removed from the cache
– After receiving an interest, a node may re-send the interest to a subset of its neighbors
– To those neighbors, the interest appears to originate from the sending node, even though it may have been generated by a distant sink. This is an example of a local interaction
– In this way, interests diffuse throughout the network. A node need not forward a received interest to its neighbors if it has recently sent a matching interest
– In directed diffusion, a gradient specifies a data rate (value) and a direction; in other sensor networks, gradient values could instead be used to forward data probabilistically along different paths

Data Propagation
– A data message is unicast individually to the relevant neighbors
– A node receiving a data message from a neighbor checks whether a matching interest entry exists in its cache, according to the matching rules described
1. If no match exists, the data message is dropped
2.
If a match exists, the node checks the data cache associated with the matching interest entry
   – If the received data message has a matching data cache entry, the data message is dropped
   – Otherwise, the received message is added to the data cache and re-sent to the node's neighbors
– The data cache keeps track of recently seen data items, which prevents loops
– By checking the data cache, a node can determine the data rate of the received events

Reinforcement
– After the sink starts receiving low-data-rate events, it reinforces one neighbor in order to “draw down” higher-quality (higher-data-rate) events
– This is achieved by data-driven local rules
– To reinforce a neighbor, the sink may re-send the original interest with a higher data rate
– When a node receives an interest whose data rate is higher than before, it must in turn reinforce at least one of its neighbors
– Reinforcement thus propagates from neighbor to neighbor along a particular path (e.g., when one path delivers an event faster than the others, the sink attempts to use this path to draw down high-quality data)
– In summary: reinforce one path, or part of it, based on observed losses, delay variances, and so on
– Negatively reinforce certain paths, for example because their resource levels are low

[Figure adapted from Intanagonwiwat+ 2000]

Directed Diffusion Advantages:
– Data-centric dissemination
– Robust multi-path delivery
– Reinforcement-based adaptation to the empirically best network path
– Energy savings through in-network data aggregation and caching
– Gives designers the freedom to attach different semantics to gradient values
– Reinforcement can be triggered not only by sinks but also by intermediate nodes

Directed Diffusion Disadvantages:
– It may consume memory, since the whole attribute list is sent

Directed Diffusion Suggestions/Improvements/Future Work:
– Exploration of possible naming schemes

References
[Heinzelman+ 2002] W. Heinzelman, A.P. Chandrakasan and H.
Balakrishnan, An Application-Specific Protocol Architecture for Wireless Microsensor Networks, IEEE Transactions on Wireless Communications, Vol. 1, No. 4, October 2002, pp. 660-670.
[Heinzelman+ 2000] W. Heinzelman, A.P. Chandrakasan and H. Balakrishnan, Energy-Efficient Communication Protocol for Wireless Microsensor Networks, IEEE Proceedings of the Hawaii International Conference on System Sciences, January 4-7, 2000, Maui, Hawaii.
[Intanagonwiwat+ 2000] C. Intanagonwiwat, R. Govindan and D. Estrin, Directed Diffusion: A Scalable and Robust Communication Paradigm for Sensor Networks, Proceedings of the Sixth Annual International Conference on Mobile Computing and Networking (MobiCom 2000), August 2000, Boston, Massachusetts.
[Kulik+ 2002] J. Kulik, W. Heinzelman and H. Balakrishnan, Negotiation-Based Protocols for Disseminating Information in Wireless Sensor Networks, Wireless Networks 8, 2002, pp. 169-185.
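
Supplementary sketch: Directed Diffusion interest cache
As a supplement to the Interests and Gradients rules of directed diffusion described earlier, the following Python sketch shows one way the three interest-handling cases and gradient expiry could be organized. It is an illustration under stated assumptions, not the authors' implementation: the class and method names are hypothetical, interests are matched by exact equality of a (type, interval, rect) key rather than by the full matching rules, each gradient is assumed to expire at timestamp + duration, and refreshing that expiry in case 3 is likewise an assumption.

```python
from dataclasses import dataclass, field
from typing import Dict, Hashable


@dataclass
class Gradient:
    data_rate: float   # requested event rate toward this neighbor
    expires_at: float  # assumed expiry time: timestamp + duration


@dataclass
class InterestEntry:
    timestamp: float   # timestamp of the most recent matching interest
    duration: float    # duration attribute of that interest
    gradients: Dict[Hashable, Gradient] = field(default_factory=dict)


class InterestCache:
    """Per-node interest cache; entries hold no information about the sink."""

    def __init__(self) -> None:
        self.entries: Dict[Hashable, InterestEntry] = {}

    def on_interest(self, key, neighbor, data_rate, timestamp, duration) -> str:
        expires_at = timestamp + duration
        entry = self.entries.get(key)
        if entry is None:
            # Case 1: no matching entry, so create one with a single gradient
            # toward the neighbor the interest came from.
            entry = InterestEntry(timestamp, duration)
            entry.gradients[neighbor] = Gradient(data_rate, expires_at)
            self.entries[key] = entry
            return "created entry"
        # Cases 2 and 3 both update the entry's timestamp and duration.
        entry.timestamp = timestamp
        entry.duration = duration
        if neighbor not in entry.gradients:
            # Case 2: entry exists but has no gradient for this sender.
            entry.gradients[neighbor] = Gradient(data_rate, expires_at)
            return "added gradient"
        # Case 3: entry and gradient exist (expiry refresh is an assumption).
        entry.gradients[neighbor].expires_at = expires_at
        return "refreshed entry"

    def expire(self, now: float) -> None:
        """Drop expired gradients, then entries left with no gradients."""
        for key in list(self.entries):
            entry = self.entries[key]
            entry.gradients = {
                n: g for n, g in entry.gradients.items() if g.expires_at > now
            }
            if not entry.gradients:
                del self.entries[key]


if __name__ == "__main__":
    cache = InterestCache()
    key = ("four-legged animal", 20, (-100, 100, 200, 400))
    print(cache.on_interest(key, "A", 50.0, 0.0, 10.0))
    print(cache.on_interest(key, "B", 50.0, 1.0, 10.0))
    print(cache.on_interest(key, "A", 50.0, 2.0, 10.0))
```

Neighbors are identified here by plain strings standing in for the locally unique identifiers mentioned in the text; the entry never records which sink issued the interest, only which neighbor it arrived from.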