Distributed Systems

A Statistical Vector-based Routing Protocol for Wireless Sensor Networks

Rheinisch-Westfälische Technische Hochschule Aachen
LuFG Informatik 4 Verteilte Systeme

Diploma Thesis
Tobias Vaegs

Advisors:
M. Comp. Sc. Muhammad Hamad Alizai
M. Comp. Sc. Olaf Landsiedel
Prof. Dr.-Ing. Klaus Wehrle

Registration Date: 03. September 2009
Submission Date: 18. March 2010

I hereby affirm that I composed this work independently and used no other than the specified sources and tools and that I marked all quotes as such.

Aachen, 18 March 2010

Abstract

Establishing stable point-to-point multi-hop routing in Wireless Sensor Networks (WSNs) is challenging because of the dynamics inherent in wireless links. Routing protocols based on virtual coordinates often restrict their routing decisions to connections with a very high and stable quality to avoid the overhead induced by frequent retransmissions and coordinate changes. A link estimator is used to identify stable links based on their quality calculated over a longer time period. However, this approach makes traditional routing protocols miss out on potential opportunities, provided by long-range intermediate links, to reduce the hop count and the number of transmissions in the network. In this thesis we present Statistical Vector Routing (SVR), a protocol that efficiently deals with communication link dynamics in wireless sensor networks. It assigns coordinates based on the statistical distribution of a node's hop distance to a set of beacon nodes. The routing metric predicts the current location of a node in its address distribution. Our evaluation of a prototype implementation over real testbeds compares SVR with a state-of-the-art virtual coordinates-based routing protocol, i.e. Beacon Vector Routing (BVR).
The results indicate that SVR reduces the hop distance towards the beacons by 15% and achieves three times more stable addressing when compared to BVR.

Thanks

During my thesis I came to be grateful for all the people that supported me during this time. First of all, I would like to thank Prof. Dr.-Ing.
Klaus Wehrle for giving me the opportunity to write a thesis about this subject with his group, which fascinated me from the beginning, and also for his help with realizing my future plans. I am really glad that he brought such a great research group to our university and does his part for great working conditions and a lot of fun during the working hours. I also thank my second examiner, Priv.-Doz. Dr. Thomas Noll, for taking the time to read and grade my thesis, too. The second big pile of thanks goes to my supervisors Muhammad Hamad Alizai and Olaf Landsiedel, who always found the right words to motivate me and to help me through rough times during the thesis. Their input paired with their constant calmness were a huge help. I also want to thank them for the insight they gave me into the life of a PhD student, which further encouraged me to become one myself. A special thanks for that goes to Olaf, as he recruited me as a HiWi and is significantly responsible for my interest in research. I find it hard to thank both of them enough for motivating me again and again, inspiring me to a good performance, and providing me with the drive I needed to produce something I can be proud of. I also have to thank my friends a lot for their support during this chapter of my life, be it by giving feedback, answering questions, or providing moral support and understanding that from time to time and especially towards the end, the thesis demanded a lot from me, and for understanding that my priorities shifted temporarily. Especially my girlfriend Stephie deserves my measureless gratitude for helping me in every way possible to her. The support and solicitousness I experienced from her made coping with difficulties easier as well as enjoying achievements more intensive. I am so glad that she has been by my side during the second half of this project. Furthermore, I would like to mention my colleagues, everybody that ran and runs around in the rooms of the DS Group. 
I am glad that I will be around a bit longer. They all create an atmosphere that is really pleasant and fun to work in. I never used the coffee maker praised everywhere, but the convenience store and the soccer table also helped a lot with having a nice day again and again. Finally, I express my gratitude to my parents, who not only made my studies possible in the first place but were always so full of support and pride as long as I can remember enabling me to first find and then pursue my goals freely thereby evolving to someone I am proud of myself. Thank you all so much!

Contents

1 Introduction
  1.1 Problem Statement
  1.2 Our Approach
  1.3 Outline
2 Background
  2.1 Sensor Nodes
  2.2 Wireless Sensor Networks (WSNs)
  2.3 Wireless Connections
    2.3.1 Link Estimation
  2.4 Routing
    2.4.1 Routing in Wireless Sensor Networks
    2.4.2 Routing on Virtual Coordinates
  2.5 Pearson's χ²-Test
3 Related Work
  3.1 Routing on Virtual Coordinates
    3.1.1 Beacon Vector Routing (BVR)
  3.2 Link Estimation
    3.2.1 BVR's Link Estimation
    3.2.2 Four-Bit Link Estimation
    3.2.3 Bursty Routing Extension
  3.3 Link Dynamics
    3.3.1 The β-factor
    3.3.2 Opportunistic Routing
  3.4 Our Approach
4 Analysis
  4.1 Experimental Settings
  4.2 Link Dynamics
  4.3 Coordinate Dynamics
5 Design
  5.1 Design Goals
  5.2 Assumptions
  5.3 Address Derivation
  5.4 Statistical Addresses
  5.5 Routing
6 Implementation
  6.1 Prerequisites
  6.2 Data Structures
  6.3 Neighbor Maintenance
  6.4 Coordinate Distributions
  6.5 Routing Decisions
7 Evaluation
  7.1 Memory Requirements
  7.2 Finding Parameter Values
  7.3 Coordinate Stability
  7.4 Routing Performance
8 Conclusion
  8.1 Future Work
Bibliography
List of Figures

1 Introduction

The world comes to discover more and more fields of application for Wireless Sensor Networks (WSNs).
These networks normally consist of a large number of tiny devices called sensor nodes, which are equipped with sensors to measure certain conditions of the environment, a low-power microprocessor, limited memory, and a radio module to communicate with each other. Because of the steady decrease in cost and size of these devices, more and more solutions using this technology become available. Their very limited capacity regarding computational power, memory, and energy makes it necessary for sensor nodes to achieve their goals cooperatively. Furthermore, because of each node's limited radio range, a data packet normally has to be relayed over several nodes to reach its destination, i.e. multi-hop routing has to take place.

Routing in WSNs entails a set of challenges that are not common in other network scenarios. In traditional (wired) networks a good routing algorithm delivers packets quickly, which usually means using only a few hops on the way. Because the connections between wired network nodes are stable, problems (i.e. packet loss) mostly occur because of congestion at one of the participants on the route, which then leads to retransmissions or reroutes. However, in wireless networks the connections between the devices are usually unreliable. Packets can simply get lost in the air on the way to a node that was considered a direct neighbor of the sender. In this case the sender has to try again or select another next hop for its packet; either way a retransmission takes place. These retransmissions are not only more common than in wired networks but also more costly, since they waste energy, which is very limited on a sensor node and therefore far more precious than on a device with a stable energy supply. Thus, the total number of transmissions (i.e. hops plus retransmissions) has to be minimized to achieve good routing performance.

1.1 Problem Statement

As we have to assume unreliable links between participants in a wireless network, we face a tradeoff when trying to minimize the overall number of transmissions in a WSN. When selecting the next node to forward a packet to, a node has to choose among its neighbors. If it selects a node which is farther away from itself and closer to the destination, fewer relay nodes, i.e. hops, might be needed on the way. However, the greater distance lowers the probability of a successful transmission, meaning a retransmission is more likely than with a neighbor closer to the current node. There are basically two ways of dealing with this problem:

• A routing protocol could try to use only those links in the network which are very stable over a long period of time. In this case the routing protocol would require a link estimator, which constantly monitors and rates the connections to all neighbors. The protocol would then only use those links for routing which show a high quality over time, minimizing the probability of retransmissions. In doing so, it might miss out on opportunities for shorter routes to a destination which are only available at times. Furthermore, this approach assumes that the network contains enough long-term stable connections to perform all the necessary routing using only them. Often enough we cannot make that assumption.

• A routing protocol could use not only links which are considered stable over a longer period of time, but also those whose quality fluctuates and which therefore provide a quality good enough to send packets only from time to time. It would then always route via the node which is closest to the destination and reachable right at the time a packet has to be sent.
In this case the protocol would have to deal with the dynamics this approach introduces, meaning constant topology changes because nodes frequently become neighbors and non-neighbors of each other.

1.2 Our Approach

For this thesis we chose the latter approach. We used a virtual coordinate system, where the addresses of the nodes in the network are vectors of shortest hop distances to some designated beacon nodes. This means that when neighbor relations between the nodes change, their addresses are likely to change, too. A new neighbor might provide a shorter route to a beacon than the ones before, or the shortest route to a beacon so far was provided by a neighbor which just had to be dropped. This causes a lot of address changes over time, which would lead to a huge overhead for keeping the address information in the network up to date. To avoid this, we do not route on the actual current coordinates of the nodes, which change far too often. Instead we let every node monitor its coordinate changes for a while, derive certain statistical values from them (e.g. the mean coordinates or the probability distribution of the coordinates over a specified period) and then publish only those statistical values in the network. Routing then takes place on these statistical values provided by the nodes and not on their actual current coordinates. Our idea is based on the following assumptions:

• The statistical values derived from the coordinates change rarely enough over time to not overwhelm the network with the overhead which comes with keeping this address information up to date. The ideal case would be that the values stabilize eventually and never have to be changed again.

• If a node is still far away from the destination, it does not need the destination's exact current coordinates to send the packet on the right way. The statistical values derived from these coordinates suffice to at least get it in the right direction towards the desired node.
• Nodes which are close enough to the destination get to know its precise address information and can deliver the packet directly.

1.3 Outline

The rest of the thesis is organized as follows. Chapter 2 provides basic background knowledge about sensor networks; TinyOS, the operating system for sensor nodes for which we developed our solution; link dynamics and link estimation; routing, especially routing on virtual coordinates; and Pearson's χ²-Test, a statistical test we used in our experiments to compare distributions. The related work done in the area of routing in WSNs is summarized in chapter 3. We present some link estimation mechanisms in use, distinguishing between long-term approaches, which leave unstable connections unused, and short-term approaches, which react to the very current state of the network. We also introduce Beacon Vector Routing (BVR), which uses virtual coordinates for routing and serves as our case study. Chapter 4 presents our analysis of link dynamics. We explored how link qualities in a network change over time and how this affects the virtual coordinates of the participants. In chapter 5 we describe our design goals, what we want to achieve with our protocol, and how we do it based on the results presented in chapter 4. The implementation details of our solution are presented in chapter 6. Chapter 7 shows the results of our evaluation. We evaluate factors such as the memory needed on the nodes and the data that has to be sent for the protocol to work. Our experiments show how well the protocol performs regarding the number of transmissions and how it handles dynamics in the network. Finally, chapter 8 concludes the thesis and gives a brief outlook on future work.

2 Background

This chapter contains background knowledge enabling a better understanding of the rest of the thesis. First, we will have a close look at sensor nodes and their architecture.
Then, we will point out what separates WSNs from other networks to become aware of their special needs. After that, we will discuss the implications of these needs and of the dynamics of wireless links for the process of designing a routing protocol for WSNs and introduce routing on virtual coordinates, which we use in our approach. Finally, we will describe Pearson's χ²-Test, a statistical method we used for our analysis and evaluation.

2.1 Sensor Nodes

To grasp what separates wireless sensor networks from other networks and makes them special, one first has to take a good look at the components of such WSNs, namely the sensor nodes, also called motes. Sensor nodes are tiny autonomous devices able to form a network and participate in it. Their general architecture is depicted in Figure 2.1, and Figure 2.2 shows some examples of sensor nodes. They usually consist of one or more sensors to measure certain environmental conditions and a microprocessor to process the gathered data. Furthermore, they are equipped with a ROM to store a program image which can be executed by the processor, and some RAM to save data. For exchanging information with other nodes they also have some kind of communication module, usually a radio transceiver. Finally, they need a power source, usually in the form of batteries, as they are supposed to be independent of any stationary infrastructure.

The sensors on these nodes can gather rather simple data, e.g. about temperature, humidity, brightness, or vibration, but also more sophisticated data such as audio or video input, which of course needs more energy. A node may also have actuators to manipulate its environment. These can again be simple ones such as LEDs, but also speakers or the control of a valve or an engine. Furthermore, most sensor nodes provide a way to be programmed easily, for example via a USB connector, often referred to as the serial interface.

Figure 2.1: Schematic architecture of a sensor node (components: memory, radio, microprocessor, sensors, battery).

Due to its small size and the consequently small amount of energy at its disposal, a single sensor node is a very limited device. The space and the energy it provides are usually not enough to support a fast microprocessor, large memory, or a radio transceiver with a large communication range. This is why programs developed for sensor nodes have rather few resources at their disposal compared to those on other mobile devices or even desktop PCs. Therefore, writing programs for them requires minimalist thinking, meaning a focus on essential functionality.

TinyOS [14] is a standard operating system for sensor nodes providing just the very basic services an operating system must have. This makes it well suited to support applications developed for sensor nodes. It is written in nesC [9], a dialect of C. A TinyOS program consists of several components and requires an event-based programming approach. A component can call an operation provided by another component and is informed when that operation has completed, meaning a designated event handler function, to be implemented by the programmer, is called. There are also certain hardware interrupts, such as when a timer fires or the radio module receives a data packet, which likewise lead to the execution of the corresponding event handler. TinyOS does not support multiple processes, but longer computations can be implemented in a task, which can be put in an execution queue. Whenever there are no events to handle, TinyOS executes the tasks inside the queue one by one in a non-preemptive first-in-first-out manner, meaning there is no scheduling for a parallel execution of any kind.
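The execution model described above can be illustrated with a small toy model. The following Python sketch is not TinyOS or nesC code; the class `ToyScheduler` and its methods `signal` and `post` are hypothetical names chosen for illustration. It only mirrors the two rules stated in the text: pending events are handled first, and tasks run one by one in first-in-first-out order without preemption.

```python
from collections import deque
from typing import Callable, Deque, List


class ToyScheduler:
    """Toy model of an event-driven system with a FIFO task queue.

    Hypothetical illustration only; this is not the TinyOS scheduler.
    """

    def __init__(self) -> None:
        self.events: Deque[Callable[[], None]] = deque()
        self.tasks: Deque[Callable[[], None]] = deque()

    def signal(self, handler: Callable[[], None]) -> None:
        """Queue an event handler (e.g. for a timer or a received packet)."""
        self.events.append(handler)

    def post(self, task: Callable[[], None]) -> None:
        """Queue a longer-running task for later execution."""
        self.tasks.append(task)

    def run(self) -> None:
        """Handle pending events first; otherwise run the oldest task to completion."""
        while self.events or self.tasks:
            if self.events:
                self.events.popleft()()
            else:
                self.tasks.popleft()()


log: List[str] = []
scheduler = ToyScheduler()
scheduler.post(lambda: log.append("task A"))
scheduler.post(lambda: log.append("task B"))
scheduler.signal(lambda: log.append("timer event"))
scheduler.run()
print(log)
```

In this model the event handler runs before both tasks although it was queued last, and the two tasks keep their posting order.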
TinyOS comes with a very powerful simulator named TinyOS SIMulator (TOSSIM) [13], which can simulate a program written for TinyOS running on up to 1000 nodes simultaneously, with the possibility of outputting debug messages and testing programs with different network topologies and noise conditions.

2.2 Wireless Sensor Networks (WSNs)

A wireless sensor network comprises many (tens to thousands of) sensor nodes working together. Every sensor node has to make a small contribution to the goal the whole network is supposed to achieve. In contrast to a traditional (general purpose) network, which can provide many applications, even simultaneously, a WSN is usually designed for one specific, comparatively small application. The origin of the sensor node technology lies in military usage, but nowadays many civil applications for WSNs exist as well. Among others, there are:

• Disaster recovery: In such situations there might be neither the possibility nor the time to set up any fixed communication infrastructure.
• Animal behavior analysis: When nodes are small enough to be attached to animals, as is being done in the RatPack project [4], they can collect data to analyse the animals' behavior.
• Intelligent buildings: With information about the location of people, temperature, or air flow it is possible to control things such as heating or air conditioning more efficiently.
• Facility management: Nodes can detect leakages in tanks or provide intrusion detection.
• Machine surveillance and preventive maintenance: Sensor nodes can observe whether machines work inside given parameter boundaries and react directly to deviations or at least report them.
• Precision agriculture: Nodes on a field can give insight into the condition of the ground, enabling farmers to apply the optimal substances at exactly the right place.
• Medicine and health care: Patients who have to be monitored constantly would not have to lie in bed or carry around cables and cumbersome devices.
• Logistics: Nodes attached to containers or boxes help to identify and track them automatically and can report when their storage requirements are not met, dangerous goods are too close to each other, etc.
• Traffic: Intelligent roads could create statistics about traffic conditions to administer traffic lights or allocate free parking spaces.

Looking at these applications it becomes evident that the requirements for a WSN differ quite significantly from those of a traditional network, leading to a variety of differences between these network types. First of all, traditional networks are usually designed to provide a service for some users, which means the participating devices interact with humans. In a WSN the interaction among the devices themselves as well as the interaction between the devices and the environment play a far more decisive role. Of course, people want to use the service provided by a WSN, but this means having access to its gathered information, so the users do not interact with single nodes but with the whole network. This is why WSNs usually perform data-centric networking. The user is not interested in where any data is stored or where exactly it comes from, but in the data itself. The user does not care about the identities of the nodes, but of course they are important for the network and its participants, for example to perform routing.

Traditional networks, regardless of whether they are wireless or wired, need some kind of given infrastructure, a backbone comprising routers, gateways, access points, radio posts, or cables. WSNs do not need this, and because every node is considered an equal part of the network, there normally does not exist any fixed infrastructure, as the nodes may be deployed in remote and hostile environments. No central entity manages their interaction, assigns roles, or coordinates traffic. In WSNs most of the functions are performed in a distributed way.

Figure 2.2: Examples for sensor nodes. (a) The SenseNode from Genetlab is used for intrusion detection and classifies vehicles, armoured vehicles, animals, individuals, and groups with or without guns. (b) To study their behavior, animals are equipped with sensor nodes. This is done e.g. in the RatPack project [4]. (c) The CASE Abyss from Abyssus Marine Services is used to gather seismic data on the ocean floor.

Furthermore, the participants of a WSN are far more limited in their capacities than those of other networks, especially regarding energy, yet they have to be operational over a very long period of time without any maintenance. Thus, they not only have to be auto-configuring, fault tolerant, and self-organizing, but also have to work energy efficiently to maximize their lifetime. Besides using low-power hardware, sensor nodes have different power saving modes. They can power down almost completely for a certain period of time if they assume they will not have anything to do, but they can also power down single components of their hardware temporarily.

Another way to save energy is the practice of in-network processing. Because sending even a few bytes over the radio consumes much more energy than performing a lot of computations on the processor, one always tries to process gathered data as much as possible on the node itself and only forward accumulated or summarized data. A simple example would be a query for the minimum temperature in a certain region. To answer it, each node only has to forward a single value, the minimum of its own data and all the data received from its neighbors, rather than every individual reading. This works because the actual information provided by single nodes, i.e.
the temperature in certain places, is not important but only the combined data provided by all the participants of the network, i.e. the minimum temperature.

2.3 Wireless Connections

Connections between participants of a wireless network are of a very different nature than those of wired networks, since the air's reliability as a communication medium and the stability of signals propagating through it are not comparable with the respective properties of cables. A signal sent through a wire propagates directly towards the destination. It may experience interference or disturbances from the environment caused by other signals nearby, but as the wire is shielded and isolated, these influences are usually very small. The signal travels through a medium like copper, which has a very low electrical resistance to minimize the weakening of the signal over distance.

Wireless signals, however, travel through air, a medium whose composition cannot be controlled and which does not transport electromagnetic signals very efficiently. In vacuum the strength of a received transmission is proportional to $1/d^2$, $d$ being the distance it covered from the sender to the receiver. Air may contain all kinds of gases, pollution, or rain, and there are obstacles such as vegetation, people, or buildings on the way, which means air weakens a radio signal sent through it, and this weakening effect varies with weather, time of day, and time of year. Although theoretically traveling along the direct line of sight, electromagnetic waves are severely influenced by almost everything located on this line. Signals can be reflected on large planes, scattered at small obstacles, or diffracted on edges. This leads to signals being received where no direct line of sight exists, but it also makes signal propagation less predictable.
If sender or receiver are moving during the transmission, the issue becomes even more complex, as phenomena like the Doppler effect may have an impact on the signal, too.

Furthermore, if several devices using the same communication technology are in proximity and transmit at the same time, their transmissions interfere. Thus, every signal a device produces is a source of interference that all devices in the vicinity have to deal with when receiving something. The area in which a transmitting device interferes with other signals is much bigger than its communication range, in which its signal can actually be received correctly.

Another important problem arises from the inherent asymmetry of wireless connections. If two nodes A and B are in proximity to each other and node A can receive packets sent by node B, this does not necessarily imply that node B can receive packets from A. One reason can be that the two nodes have different transceiver hardware, so that one of them simply has a bigger communication range than the other. However, two identical nodes can also have an asymmetric connection, since a radio signal is mostly disturbed by interference at the receiver's location. Node A may be located in an area with a lot of noise produced by other participants constantly sending. This noise may not reach as far as node B, yet node A may not receive anything addressed to it at all. Nevertheless, packets sent from node A to node B may be received without problems. Since communication between two participants of a network can only be established if both partners can receive each other's data, a communication protocol may only assume a connection between two nodes if this connection has proved to be symmetric.

All this leads to links as well as connectivity between nodes being very unstable in a wireless network, especially in a low-power network such as a WSN.
Even if the nodes themselves do not move¹ and even if there are no drastic changes in the environment, link qualities may change a lot over time. Because of these dynamics, link estimation on sensor nodes is a challenging task.

¹ In this thesis we do not explicitly cover node mobility. The dynamics we address are caused by the instability of wireless links or node failures.

2.3.1 Link Estimation

Link estimation is the process of predicting the chance of being able to use a connection between two devices successfully for data packet transfer. A passive link estimator listens to the data packets the node sends and receives and thereby discovers with which of its neighbors the node can communicate at the moment. Active link estimators (additionally) probe the connections to the neighbors of the node periodically with designated packets to derive the quality of the links to these neighbors. Usually the Packet Reception Rate (PRR) of a link is used as a metric for its quality, i.e. the percentage of successfully received packets during a given time interval. To account for the inherent asymmetry of wireless links, the product of the qualities of both link directions is considered the quality of the whole communication link, because this corresponds to the probability of both participants being able to communicate with each other. Hence, test packets have to be sent by both communication partners to determine the PRR for each direction of the connection.

There are basically two possible approaches: long-term link estimation and short-term link estimation. A long-term link estimator identifies the neighbors of a node with a very stable connection, i.e. a constantly very high packet reception rate over a long period of time (good links, defined by us as having a PRR of >90%). By restricting communication to those good links one can ensure a lower number of necessary retransmissions.
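The bidirectional quality metric described above can be sketched in a few lines. The following Python example is a minimal illustration under stated assumptions: the names `DirectionCounter`, `link_quality`, and `is_good_link` are hypothetical and do not come from any link estimator discussed in this thesis, and applying the 90% threshold to the product of both directions is an assumption made for this sketch.

```python
from dataclasses import dataclass

GOOD_LINK_THRESHOLD = 0.9  # good links have a PRR above 90% (see text)


@dataclass
class DirectionCounter:
    """Probe packet counts for one direction of a link (hypothetical helper)."""
    sent: int = 0
    received: int = 0

    def prr(self) -> float:
        """Packet reception rate: fraction of sent packets that were received."""
        return self.received / self.sent if self.sent else 0.0


def link_quality(forward: DirectionCounter, backward: DirectionCounter) -> float:
    """Quality of the whole link as the product of both directional PRRs."""
    return forward.prr() * backward.prr()


def is_good_link(quality: float) -> bool:
    """Assumption: the threshold is applied to the bidirectional quality."""
    return quality > GOOD_LINK_THRESHOLD


a_to_b = DirectionCounter(sent=100, received=95)
b_to_a = DirectionCounter(sent=100, received=92)
quality = link_quality(a_to_b, b_to_a)
print(round(quality, 3), is_good_link(quality))
```

In this example both directions individually exceed 90%, but their product does not, which shows why both directions have to be probed before a link is rated.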
However, neighbors which provide such a good connection are not likely to be very far away. Thus, the routing progress of each hop will not be very high in a protocol only using these links, which usually make up about 35% of all the links in the network, as illustrated in Figure 2.3. All the other possible links are left unused. Of course, the majority of these links are not worth using, because they provide a really bad quality (bad links, defined by us as having a PRR of <10%), but a few links can be considered to be of intermediate link quality (intermediate links, defined by us as having a PRR between 10% and 90%). These intermediate links usually do not have a constant intermediate quality but switch frequently between good and bad link quality because of short-lived interference such as people passing by. This then leads to an intermediate average long-term quality measured for those links.

In contrast to this, short-term link estimators detect and use the short time periods in which a link has a good quality, as they measure link qualities over a far narrower time window. They do not derive how many packets could be sent over a connection in the last several minutes or seconds in order to have a list of neighbors ready in case data has to be sent. Instead they try to determine on demand, very quickly, what quality a link provides at the very moment a packet has to be sent, by using the success or failure of one packet to render a decision for the next. After very few successfully sent packets over a certain link the link estimator declares it a promising link, making the routing protocol use it for packet transfer, whereas very few unsuccessful attempts to send a packet (usually only one) directly label a link as unreliable and stop the protocol from using it for a while.
In this way, a protocol can use a link of intermediate (fluctuating) quality in the short time periods in which it provides a good quality and avoid using it during times with bad quality.

Figure 2.3 Cumulative Distribution Function (CDF) of link qualities in a usual network based on the packet reception rate (PRR). [Plot: CDF over the long-term link quality (PRR); bad links (quality < 0.1) ~45%, intermediate links (0.1 < quality < 0.9) ~20%, good links (quality > 0.9) ~35%.]

This also means short-term link estimators can only operate if there are bursts of data to be sent, since they exploit the correlations between consecutive packets. A successfully sent packet correlates with a high success probability for the next packet, just as an unsuccessful attempt to send correlates with a low probability. However, this correlation becomes weaker as the time between the packets becomes larger. Hence, to take advantage of it, a protocol has to use a link immediately once it is considered promising because of some successfully sent packets. Furthermore, it has to wait an appropriate amount of time after declaring a link unavailable before trying to use it again. If packets have to be sent only sporadically, short-term link estimation in this form is not possible.

However, if applicable, short-term link estimators can adapt more quickly to changes in the network topology caused by node failures, node mobility or short-term interference. They do not provide a big picture of the average qualities of the node’s connections to its neighbors, but can make use of links with intermediate quality, which may provide shortcuts through the network, as the corresponding neighbors are usually farther away and thus may bring the packet closer to the destination when used as a next hop.
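The short-term behavior described above (a few consecutive successes mark a link as promising, a single failure blocks it for a while) can be sketched as follows. This is only an illustration: the class name, the threshold of three successes and the back-off of 0.5 seconds are hypothetical values, not parameters taken from any of the estimators discussed in this thesis.

```python
class ShortTermLinkEstimator:
    """Toy short-term estimator driven by per-packet success reports."""

    def __init__(self, successes_needed: int = 3, backoff: float = 0.5):
        self.successes_needed = successes_needed  # hypothetical threshold
        self.backoff = backoff                    # hypothetical back-off in seconds
        self._streak = {}         # neighbor -> consecutive successful packets
        self._blocked_until = {}  # neighbor -> time until which the link is avoided

    def report(self, neighbor, success: bool, now: float) -> None:
        """Record the outcome of one transmission attempt at time `now`."""
        if success:
            self._streak[neighbor] = self._streak.get(neighbor, 0) + 1
        else:
            # One failure is enough to label the link unreliable for a while.
            self._streak[neighbor] = 0
            self._blocked_until[neighbor] = now + self.backoff

    def usable(self, neighbor, now: float) -> bool:
        """True if the link is currently considered promising."""
        if now < self._blocked_until.get(neighbor, float("-inf")):
            return False
        return self._streak.get(neighbor, 0) >= self.successes_needed


if __name__ == "__main__":
    est = ShortTermLinkEstimator()
    for t in (0.00, 0.01, 0.02):
        est.report("B", True, t)
    print(est.usable("B", 0.03))   # promising after three successes
    est.report("B", False, 0.04)
    print(est.usable("B", 0.05))   # blocked after a single failure
```

Passing the current time explicitly keeps the sketch deterministic. Note that after the back-off has expired the link still has to prove itself again, which mirrors the requirement that packets arrive in bursts for this kind of estimation to work.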
Long-term link estimators generally rule out these links, as they concentrate on good links, and frequent influences which disturb the communication only for a short time constantly decrease the average long-term quality of a link, thereby demoting it to one of intermediate quality. Concrete examples of how link qualities are estimated can be found in chapter 3, where certain long-term and short-term link estimators as well as routing protocols using virtual coordinates are presented in detail.

2.4 Routing

Routing is the process of finding the way for a data packet through the network from one specific participant to another. This route should be as short as possible for the packet to be delivered quickly.

Figure 2.4 Example for Distance Vector Routing: (a) Example topology of the network. (b) Distance vectors of the nodes in the network (table rows). Each row of the table is a distance vector stored on the node denoted in the first column. A distance of one implies a direct connection to the destination named in the top row. The letter inside the brackets states the next hop to send a packet to on the way to the destination. Sometimes there are several possibilities, for example from node H to node E there exist three routes of equal length. Hence, the corresponding table entry could also be 4(K). In such a case another metric can be consulted to decide on the entry, e.g. the nodes’ workload.
Traditional (wired) networks such as the Internet or parts of it usually have a stable infrastructure, a backbone with a given topology and hierarchy, over which packets can be routed. Because of the stable connections inside the backbone, packets can travel fast, since every node on the way always chooses as the next hop the neighbor which is closest to the destination. Because of this stability, routes in a traditional network are also quite stable, i.e. packets from a specific sender to a specific destination take the same route most of the time. Every node in the network stores a routing table, which provides information on where to forward a packet on the way to a certain destination.

A common class of traditional routing protocols are the distance-vector routing protocols. Using these protocols, routers keep a vector of minimum hop distances to all the other participants in the network. They inform their neighbors about those distances and update them according to the vectors which they receive from their neighbors. Over time every node derives a distance and next hop for every other node in the network. With this information they can route packets along the shortest path from their position to the destination by always selecting as the next hop the neighbor which has the smallest distance to the destination.

Retransmissions can occur, on the one hand, in case connections break or topologies change, which is very rare, since backbones of big networks are not supposed to be altered a lot and are usually very reliable. On the other hand, a packet can be dropped by a router if that router is currently under too much load to process further packets, i.e. if it is congested. In this case, the node which wanted to use this congested router as a next hop has to choose another neighbor to forward the packet to, or wait some time and try the same node again. Either way it has to retransmit the message.
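The distance-vector exchange described above can be sketched as a small simulation. It is a simplified, synchronous illustration with hop count as the only metric; the function name and the round-based exchange are assumptions of this sketch and do not reproduce any particular protocol implementation.

```python
def distance_vectors(neighbors):
    """Compute (distance, next hop) entries for every node.

    neighbors: dict mapping each node to an iterable of adjacent nodes
               (links are assumed to be symmetric and to cost one hop).
    Returns a dict: node -> {destination: (hop distance, next hop)}.
    """
    inf = float("inf")
    # Initially every node only knows itself.
    table = {u: {u: (0, None)} for u in neighbors}
    changed = True
    while changed:
        changed = False
        # Snapshot of the vectors the nodes advertise in this round.
        advertised = {u: dict(vec) for u, vec in table.items()}
        for u in neighbors:
            for v in neighbors[u]:
                for dest, (dist, _) in advertised[v].items():
                    candidate = dist + 1  # one more hop to reach v first
                    if candidate < table[u].get(dest, (inf, None))[0]:
                        table[u][dest] = (candidate, v)
                        changed = True
    return table


if __name__ == "__main__":
    line = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
    for node, vector in distance_vectors(line).items():
        print(node, vector)
```

Each round corresponds to every node receiving the current vectors of its neighbors once. In the four-node line used in the usage example, node A needs three rounds before it learns a route to D, which hints at why propagation time becomes a problem in large or unstable networks.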
2.4.1 Routing in Wireless Sensor Networks

Because of the limitations of sensor nodes, traditional routing protocols are not suitable for WSNs, since they do not scale. A sensor node does not have the memory necessary to store a big routing table such as a distance vector with information about the whole network. It usually knows about its direct neighbors but has only very limited knowledge about the rest of the nodes. Searching inside a big table would also take considerable time on a processor as limited as those used on sensor nodes. Additionally, in distance-vector routing protocols far too much time is needed to propagate the distance information throughout the whole network until every participant has stable knowledge about the network topology at its disposal. In wireless networks, especially in highly dynamic ones such as WSNs, topologies change far too often to ever reach a stable state that could be published to all the participants.

Furthermore, the hop distance between two nodes is not a meaningful metric in wireless networks, where connections are rather unreliable because the radio signal gets weaker over distance very quickly. This means that with increasing distance between sender and receiver the probability of a packet arriving successfully decreases rapidly. Hence, choosing the node closest to the destination as a next hop may help to reduce the number of hops needed for the packet to arrive, but may simultaneously increase the probability of packet losses and thereby of retransmissions. In wired networks retransmissions can mostly be neglected, since they are quite rare. However, in WSNs they are not only far more common (as WSNs are wireless) but also far more expensive (as energy is very limited and radio communication needs a lot of energy). Thus, the neighbor which brings the packet the farthest towards the destination may not necessarily be the best choice as the next hop of a route.
Therefore, routing protocols for wireless networks often use the Expected Transmission Count (ETX) as a distance metric between nodes. It is the expected (mean) number of times the sender has to transmit a packet to a receiver before it is received successfully, i.e. the reciprocal of the probability of a packet sent by the sender arriving at the receiver in the first attempt. On a certain path between two nodes, the sum of the ETXs assigned to the connections on the way measures the number of transmissions needed to deliver a packet over this path, meaning the number of hops plus all retransmissions expected due to packet loss. Hence, in wired networks the ETX of a path would almost equal the hop distance, since there are very few retransmissions to be expected. In our experiments we always used the total number of transmissions needed to deliver a message from a sender node to a destination node as a metric for the performance of our routing protocol.

2.4.2 Routing on Virtual Coordinates

A lot of applications for WSNs are designed in a way that all the nodes in the network gather some data, which is supposed to arrive eventually at one specific node (a so-called sink). This sink may be a gateway node to another network, e.g. the Internet, or a node to which the user has direct access to query the data. The Collection Tree Protocol (CTP) [10] for example uses this design, which is illustrated in Figure 2.5.

A designated sink node sends periodic beacon packets into the network advertising its existence. These beacon packets are forwarded by the other nodes throughout the whole network. Inside the packets the respective senders always state their distance to the sink node.
To derive this distance a node has to calculate the costs for a packet to travel over any of its neighbors by adding the distance it received from a neighbor to the costs of its connection to this neighbor (the costs can be defined as the ETX of the connection or any other suitable metric). The node then publishes the minimum of those calculated costs, i.e. the expected number of transmissions when routing a packet over it. Because in a collection tree there is basically only one direction in which packets have to be sent, routing is quite simple: A node always routes a packet to the neighbor with the minimum costs for a packet to travel to the sink. This neighbor is called its parent. Because with this topology a node basically only has to remember which node is its parent, the required memory is rather small compared to a full-fledged routing table filled with information about a lot of nodes.

This many-to-one or one-to-many schema (depending on whether the sink collects or distributes data) is widely used in WSNs, because it fits the nature of a lot of applications in this area. However, in this thesis we want to design a point-to-point (or any-to-any) routing protocol which can deliver a packet from any node inside a network to any other node. For this we need an addressing schema which reflects the location of the nodes relative to each other, such that a packet can be routed according to these addresses.

For the addressing of the nodes we build up a virtual coordinate system. The collection tree of CTP can be considered the one-dimensional special case of this idea, where every node derives the path costs from itself to the sink node and publishes this value as its current address. If there are several sink nodes in a network, every node can derive a cost vector for the paths to all these nodes. If those nodes (the so-called beacons) are numerous enough and distributed evenly throughout the whole network, they provide the intended coordinate system.
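The cost vectors described above can be illustrated with a small sketch. It computes the minimum path costs from every node to each beacon centrally with Dijkstra's algorithm, whereas in the protocols discussed here the same values emerge in a distributed way from forwarded beacon packets. The function names, the example topology and the assumption of symmetric link costs are choices made only for this illustration.

```python
import heapq


def etx(delivery_probability: float) -> float:
    """ETX of a link: reciprocal of the first-attempt delivery probability."""
    return 1.0 / delivery_probability


def path_costs_to(beacon, links):
    """Minimum path cost from every reachable node to `beacon`.

    links: dict node -> dict neighbor -> link cost (assumed symmetric).
    """
    cost = {beacon: 0.0}
    queue = [(0.0, beacon)]
    while queue:
        c, u = heapq.heappop(queue)
        if c > cost.get(u, float("inf")):
            continue  # outdated queue entry
        for v, w in links[u].items():
            if c + w < cost.get(v, float("inf")):
                cost[v] = c + w
                heapq.heappush(queue, (c + w, v))
    return cost


def coordinates(beacons, links):
    """Virtual coordinates: one cost component per beacon for every node."""
    per_beacon = [path_costs_to(b, links) for b in beacons]
    return {n: tuple(costs.get(n, float("inf")) for costs in per_beacon)
            for n in links}


if __name__ == "__main__":
    # Hypothetical line topology B1 - X - Y - B2 with perfect links (cost 1).
    links = {
        "B1": {"X": 1.0},
        "X": {"B1": 1.0, "Y": 1.0},
        "Y": {"X": 1.0, "B2": 1.0},
        "B2": {"Y": 1.0},
    }
    print(coordinates(["B1", "B2"], links))
    print(etx(0.5))  # a link delivering every second packet costs two transmissions
```

With a single beacon the result degenerates to the one-dimensional address of a collection tree; replacing the unit costs by ETX values yields coordinates in expected transmissions instead of hops.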
A higher value in one component of the distance vector does not necessarily imply, but correlates with, a greater distance to the respective beacon. This means we can assume nodes with similar coordinates to be located in the vicinity of each other. Hence, forwarding a packet to the neighbor closest to the destination translates to forwarding it to the neighbor whose coordinates are the most similar to those of the destination. An example for routing on virtual coordinates is given in Figure 2.6.

Figure 2.5 Creation of a collection tree: Bold lines illustrate parent relations, the number inside a node its distance to the sink. In this example stable links are assumed, resulting in costs of one for every edge in the graph. (a) First step: A beacon packet is sent by the sink and all its neighbors acknowledge it as their parent. (b) Second step: The direct child nodes of the sink forward the beacon packet with an increased distance inside to become parents on their part. (c) Third step: Every node which has found a parent starts participating in forwarding the beacon message. (d) Fourth step: This procedure goes on until the beacon packet has reached all network participants.

2.5 Pearson’s χ²-Test

Pearson’s χ²-Test serves to find out whether or not a given frequency distribution of events is consistent with another distribution. An example for this would be to test the distribution of the results after rolling a die several times against the theoretical distribution of results expected after rolling a perfectly fair die (where each number between 1 and 6 is expected to be shown a sixth of the times the die is rolled). It is important that all possible events (in the example the numbers 1 to 6) together have a probability of 1 and are mutually exclusive, i.e. there cannot be a 1 and a 3 on top of a die at the same time.
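To make the die example concrete, the following sketch computes the test statistic that is defined formally in the remainder of this section and compares it with tabulated critical values for a significance level of 5%. The observed counts are invented for illustration, and using a critical-value table instead of computing the p-value is a simplification of this sketch.

```python
# Critical values of the chi-square distribution at the 5% significance
# level, indexed by the degree of freedom (standard table values).
CRITICAL_5_PERCENT = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070}


def chi_square_statistic(observed, expected):
    """Sum of (O_i - E_i)^2 / E_i over all categories."""
    if len(observed) != len(expected):
        raise ValueError("both distributions need the same categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def differs_at_5_percent(observed, expected):
    """True if the hypothesis of equal distributions is rejected at 5%."""
    degrees_of_freedom = len(observed) - 1  # assumption: n - 1, as in the text
    statistic = chi_square_statistic(observed, expected)
    return statistic > CRITICAL_5_PERCENT[degrees_of_freedom]


if __name__ == "__main__":
    observed = [5, 8, 9, 8, 10, 20]  # invented counts for 60 rolls
    expected = [10] * 6              # a fair die shows each face 10 times
    print(chi_square_statistic(observed, expected))
    print(differs_at_5_percent(observed, expected))
```

For these invented counts the statistic is 13.4, which exceeds the tabulated value of 11.070 for five degrees of freedom, so the hypothesis of a fair die would be rejected at the 5% level.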
The test has two applications: It is a test of goodness of fit, which shows how much two distributions (usually an observed one and a theoretical one) differ from one another. It is also a test for independence of paired observations, for example to test whether the age of people correlates with their affinity for a certain political party. As we use only the first application in our experiments, we limit our explanation here to the test of goodness of fit.

First of all, given two distributions to compare, one has to calculate the χ² statistic for them, called $X^2$. It is defined as

\[ X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

where $n$ is the number of categories of the distributions (in the example with the die $n$ would be 6, with the categories 1, 2, 3, 4, 5, and 6), $O_i$ is the number of times an event of the $i$-th category occurred in the observed distribution, and $E_i$ is the number of events of the corresponding category in the theoretical (expected) distribution.

Figure 2.6 Routing on virtual coordinates: B1, B2, and B3 are the beacons defining the coordinate system, S is the sender and R the receiver of the example packet. For simplicity all edges have costs of one. (a) Using B1, B2, and B3 as beacons, the nodes will derive these address vectors. (b) S calculates the difference D of its neighbors’ address vectors to the vector of R and sends the packet to the node with the most similar address (4,2,2). (c) In the next step 3,1,3 is chosen, after this the options become fewer. The arrows depict the complete route taken by the packet. (d) If the protocol does not concentrate on long-term stable links, the temporary availability and use of unstable connections (dashed lines) can lead to a lot of changes in addresses and routing decisions.

Secondly, the degree of freedom has to be determined. It is the number of independent frequencies in the distribution. In many cases this is $n - 1$, since the frequency of one event is always determined by the total number of event occurrences and the frequencies of the other events. The degree of freedom, however, is independent of the total number of observations. For example, the number of times a 6 was rolled is always the total number of events minus the sum of the frequencies for the results 1 to 5, and this holds for every total number of events, i.e. regardless of whether the die was rolled 100 or 10,000 times in total. If the different categories of the distribution are further dependent, the degree of freedom might be even lower. In our experiments we always find the degree of freedom to be $n - 1$.

The next step is to derive the p-value corresponding to the $X^2$-value and the degree of freedom. The calculation is out of the scope of this thesis; there are many calculators and tables to derive it. A p-value always has a meaning regarding hypotheses. The statistical hypothesis in the case of this test is that the observed distribution is equal to the expected one. The $X^2$-value already states how different they are. However, the value also becomes bigger as the distributions contain more data. Also, these differences between the two distributions may be due to random variation. Thus, we need the p-value to decide on the statistical significance of the $X^2$-value. A statistic is defined as being statistically significant if it is unlikely to be obtained only by chance, i.e. if it is likely to really mean something. The p-value for a given $X^2$-value and degree of freedom is the probability of $X^2$ being at least as high as it is only due to chance, i.e.
the probability of observing deviations at least this large between the two distributions purely through random variation, assuming the hypothesis holds. Thus, p is the chance of making the mistake of denying the hypothesis of equal distributions while they in fact are equal. Conversely, if the p-value is low, there is a good chance the two distributions appear different because they really are different and the deviations really are substantial. This is the reason why the p-value is an indicator for the significance of the result. Usually statistical tests have a significance level of 5% or 1%, meaning the p-value has to be at most 0.05 or 0.01 respectively for the hypothesis to be denied, i.e. to consider the distributions different.

3 Related Work

There are countless routing protocols proposed for all kinds of networks. As mentioned before, among those designed for WSNs many protocols only support many-to-one data traffic, as this is a common scenario in this domain. Point-to-point traffic (as we intend to provide) is much more challenging, especially in WSNs, because more information is needed and routes are more complex to find. A lot of approaches aiming at tackling this can be found in [18]. One can divide them into four main categories [8]: shortest path, hierarchical addressing, geographic coordinates, and virtual coordinates.

• Shortest Path: This basically covers distance vector algorithms, adapted to the wireless domain, which we already referred to in section 2.4. Aside from the fact that there is usually not enough space on sensor nodes to store the information needed for this, the instability of topologies in WSNs does not leave enough time to publish a state throughout the entire network before it changes significantly.

• Hierarchical Addressing: This works fine in fixed networks such as with IP-addressing in the Internet, where it is easier to make the addressing schema reflect the connectivity relation between the nodes.
However, in WSNs connectivity depends on the environment, which is hard to predict and too unstable to use for addressing. Even if it were usable, it would produce unacceptable overhead.

• Geographic Coordinates: Mapping the actual location of the nodes into the addressing schema assumes the availability of geographic information (as provided by GPS), a certain minimum precision of this information, and that geographic distances of the nodes correlate with their connectivity. Furthermore, this connectivity is assumed to be symmetric.

• Virtual Coordinates: Since our approach belongs in this group, we discuss it in more detail.

3.1 Routing on Virtual Coordinates

Protocols belonging to this class use a designated subset of the nodes in the network to function as beacons, i.e. artificial landmarks. All other nodes define their location relative to each other by their distance to these beacon nodes. Their real distance, of course, would be very hard to derive for devices as limited as sensor nodes, even if the beacon nodes were aware of their actual geographic location. Approaches without the need for real geographic coordinates are for example NoGeo [17] and GEM [15], which lack practical applicability because of too much node state and message overhead for coordinate construction and maintenance. To our knowledge the only practically usable and published implementation of a point-to-point routing protocol is Beacon Vector Routing (BVR), which is why we used it as a case study for our approach and compared our evaluation results to BVR.

In the rest of this chapter we explain how BVR works in detail and present some examples of link estimators (the one BVR uses and the Four-bit link estimator) as well as the Bursty Routing Extension to convey an impression of the different ways in which link estimation can be done. Furthermore, two ideas of how to use certain link dynamics to increase the network throughput are discussed.
At the end we point out how our protocol differs from the approaches presented here.

3.1.1 Beacon Vector Routing (BVR)

Beacon Vector Routing (BVR) [8] manages a set of (randomly chosen) beacon nodes. Every node determines its distance to all the beacons and declares the vector of these distances its address in the network. To accomplish this, every node (not only the beacons) periodically sends out beacon messages to its neighbors containing its current address vector and constantly updates this vector according to the messages it receives.

Because of the unreliable nature of wireless links, BVR nodes do not use all of their potential neighbors. Only those to which their connection has a stable long-term quality can be elected as parents or next hops when forwarding packets. This is why not always the neighbor with the smallest hop distance becomes the parent of a node. The details of the parent selection are described in section 3.2.1, where BVR’s link estimation process is presented.

With established coordinates a greedy distance-minimizing routing algorithm is used to deliver data packets. For this, BVR defines a distance metric δ(p, d) on the address vectors, which determines how well a node p would be suited as a next hop on the route to the destination d. The intuition behind this metric is that the packet should be routed to the node whose address vector is most similar to the one of the destination. For this, in every routing decision the absolute vector distance between the destination’s address and the next hop’s address has to be minimized. In the published formulas this is done for the k beacons closest to the destination d making the adjustment of k a possibility of saving packet size. Yet the available implementation uses all beacons for the calculations.
Transmitting a packet towards a beacon is considered better than getting farther away from it, since moving away may increase the distance to the destination in case it is located on the opposite side of the beacon (cf. Figure 3.1). This intuition is reflected in the following formulas:

\[ \delta^{+}(p, d) = \sum_{i \in C(d)} \max(p_i - d_i, 0) \]

\[ \delta^{-}(p, d) = \sum_{i \in C(d)} \max(d_i - p_i, 0) \]

Here $C(d)$ denotes the set of beacons taken into account for the destination $d$, i.e. the $k$ beacons closest to $d$ in the published formulas.

Figure 3.1 Illustration for routing steps towards beacons assuring progress whereas routing further away does not: With given coordinates, the destination (D) has to be located somewhere on the black circle around the beacon (B). The sender (S) has to elect the next hop for a packet route. (a) S is farther away from B than D. Therefore, it has to send towards B, which means into the grey circle and thereby definitely also towards D, wherever it may be. (b) S is closer to B than D. Therefore, it has to send away from B, meaning in some direction outside the grey circle, which may or may not be towards D.

$\delta^{+}(p, d)$ is the vector distance between the possible next hop’s address and the destination’s address, restricted to those components which correspond to the beacons which are closer to d than to p. $\delta^{-}(p, d)$ represents this distance for the other components, i.e. those corresponding to beacons farther away from d than from p. Because of the phenomenon explained above, $\delta^{+}$ has to be minimized with priority in order to favor moving towards beacons. Only in case of a tie is $\delta^{-}$ minimized as well to break it. In the actual implementation this is accomplished by always minimizing $\delta = A\delta^{+} + \delta^{-}$ for a factor $A$ large enough to realize the higher priority of the first summand (10 in BVR’s implementation).
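The distance metric described above can be sketched in a few lines. The function names are hypothetical and the sketch is not taken from BVR's implementation; it only follows the formulas given here, with the weighting factor A = 10 mentioned in the text. The example vectors are chosen for illustration.

```python
def closest_beacons(d, k=None):
    """Indices of the k beacons closest to destination d (all if k is None)."""
    order = sorted(range(len(d)), key=lambda i: d[i])
    return order if k is None else order[:k]


def delta_plus(p, d, beacons):
    """Distance over components whose beacons are closer to d than to p."""
    return sum(max(p[i] - d[i], 0) for i in beacons)


def delta_minus(p, d, beacons):
    """Distance over components whose beacons are farther from d than from p."""
    return sum(max(d[i] - p[i], 0) for i in beacons)


def delta(p, d, k=None, a=10):
    """Combined metric: a * delta_plus + delta_minus."""
    beacons = closest_beacons(d, k)
    return a * delta_plus(p, d, beacons) + delta_minus(p, d, beacons)


def best_next_hop(neighbors, d, k=None, a=10):
    """Greedy choice: the neighbor whose address minimizes delta."""
    return min(neighbors, key=lambda name: delta(neighbors[name], d, k, a))


if __name__ == "__main__":
    destination = (1, 2, 3)
    # Hypothetical neighbor addresses of the forwarding node.
    neighbors = {"n1": (4, 2, 2), "n2": (3, 4, 1), "n3": (3, 4, 0)}
    for name, address in neighbors.items():
        print(name, delta(address, destination))
    print(best_next_hop(neighbors, destination))
```

For the neighbor (4, 2, 2) the first summand is 3 and the second is 1, giving a combined value of 31, which is the smallest of the three candidates. A weighted sum only approximates a strict priority of the first summand, which is why the factor has to be chosen large enough for the coordinate values that occur.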
Besides the destination’s coordinates and its unique identifier, which is made necessary by the possibility of duplicate coordinates and to preserve consistency of identities, every packet carries the minimum distance δmin reached so far on the route. To avoid routing loops, forwarding a packet always has to decrease this distance. Having found a list of suitable neighbors, a node uses up to five retransmissions managed by lower layers to the neighbor at the top of the list. If all those fail, the next neighbor in the list is tried.

For the case in which the end of the list is reached or it was empty in the first place, because of low connectivity or the inability to find a next hop which decreases δmin, BVR defines a fallback mode in which the packet is routed towards the beacon closest to the destination. If the smallest component of the destination’s address vector is at position i, then the packet is sent to the forwarding node’s parent number i, which then proceeds with the normal greedy routing algorithm. Every time a node has to resort to the fallback mode the packet approaches the beacon closest to the destination a bit more, which can ultimately lead to the packet arriving at this beacon. After having received the packet, the beacon will also attempt the normal greedy distance-minimizing routing algorithm. If this fails again, it starts a scoped flooding, sending the packet n hops into the network with n being the i-th component of the destination’s address vector, as this is an upper bound for the distance between the beacon and the destination. The authors claim their design to be a guarantee for every packet to reach its destination in a fully connected network.

BVR offers a number of advantages over other protocols. The algorithm is comparatively simple, meeting the requirements of devices with limited capacities such as sensor nodes.
It requires only little state on the nodes, as they only have to store information about their direct neighbors. This neighbor information is based on connectivity, which the nodes are aware of anyway, meaning no additional mechanisms (e.g. deriving real geographic locations) are needed to collect it. For the greedy forwarding no assumptions have to be made about geographic information, beacon structure or the topology of the nodes. During the equally simple coordinate construction algorithm several trees are constructed, but packets are not routed along them. Instead BVR provides point-to-point packet delivery, which together with its other advantages makes us consider it the state-of-the-art implementation for this kind of protocol and therefore the ideal comparison for our prototype.

Another proposed routing protocol very similar to BVR is Logical Coordinate Routing (LCR) [5], which has four main differences:

• Nodes correct invalid coordinates and hop count differences to a neighbor greater than one with information from their vicinity.

• Routing loops are avoided by using source routing. The last hops of the route a packet has taken are sent along in the packet as well as a time to live (TTL).

• The fallback mode differs from the one in BVR. The procedure called void avoidance by LCR’s authors is executed when a node does not have a neighbor which is closer to the destination than itself, i.e. a neighbor underbidding the current δmin. In this case the packet is forwarded to the neighbor closest to the destination. Thereby flooding is avoided, but at the expense of a lower probability of packets reaching their destination. A counter to log how many times the void avoidance had to be done is included in the packets and they are dropped if it exceeds a certain threshold.

• LCR uses a different distance metric to compare two address vectors $V$ and $W$ with $n$ components, which is simpler than the one used in BVR:

\[ D = \sqrt{\sum_{i=1}^{n} (V_i - W_i)^2} \]
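As a small illustration, the Euclidean metric given for LCR above can be written as follows. The function name and the example vectors are hypothetical and not taken from LCR's implementation.

```python
import math


def lcr_distance(v, w):
    """Euclidean distance between two address vectors of equal length."""
    if len(v) != len(w):
        raise ValueError("address vectors must have the same number of components")
    return math.sqrt(sum((vi - wi) ** 2 for vi, wi in zip(v, w)))


if __name__ == "__main__":
    print(lcr_distance((3, 0, 4), (0, 0, 0)))  # 5.0
    print(lcr_distance((4, 2, 2), (1, 2, 3)))  # square root of 10
```

Unlike BVR's metric, this distance is symmetric: it does not distinguish between moving towards a beacon and moving away from it.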
Figure 3.2 Impression of the inability of long-term link estimators to capture short-term fluctuations in link quality: Depicted is the actual real time link quality (dotted line) compared to the quality measured by a long-term link estimator using the WMEWMA approach presented in [22] (t = 20 sec, alpha = 0.6), plotted as link quality over time in seconds. [2]

3.2 Link Estimation

Long-term link estimators monitor the connections of a node to its neighbors over a rather long period of time and derive the packet reception rate (PRR) or some other metric for the average long-term quality of the link. In this way, routing protocols can restrict their routing choices to links with good long-term quality or at least always select the link with the best quality to forward a packet over.

Since only long-term link estimators monitor connections over time, only they can detect intermediate quality links. During a short time a link has a good chance of appearing perfect or non-existent, since it may be operational or not operational the whole time. When looking at a larger time scale, the link is more likely to alternate between these behaviors, leading to the estimator deriving an intermediate long-term quality for it. Thus, the probability of a link belonging to this intermediate class (i.e. links with a PRR between 10% and 90%) increases as the time period over which it is monitored becomes larger. We present the long-term link estimator used by the Beacon Vector Routing protocol (BVR) in section 3.2.1 and the Four-bit (long-term) link estimator, currently for example used by CTP, in section 3.2.2.

Since changes in the dynamics of links in a network often happen on a sub-second level, long-term link estimators are unable to capture them. They cannot adapt to short-term link quality changes and therefore generally adapt more slowly, which can be observed in Figure 3.2.
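The slow adaptation visible in Figure 3.2 can be reproduced with a minimal sketch of a window-mean estimator smoothed by an exponentially weighted moving average. The update rule below is the commonly described form of such an estimator and is an assumption of this sketch; the exact formulation in [22] may differ, and the class and parameter names are hypothetical.

```python
class WindowMeanEwma:
    """Long-term estimator: EWMA over per-window packet reception ratios."""

    def __init__(self, alpha: float = 0.6, initial: float = 0.0):
        self.alpha = alpha        # weight of the previous estimate
        self.estimate = initial   # current long-term link quality estimate

    def update(self, received: int, expected: int) -> float:
        """Fold the reception ratio of one finished window into the estimate."""
        window_mean = received / expected
        self.estimate = (self.alpha * self.estimate
                         + (1.0 - self.alpha) * window_mean)
        return self.estimate


if __name__ == "__main__":
    estimator = WindowMeanEwma(alpha=0.6, initial=1.0)
    # The link breaks down completely for two consecutive windows.
    print(estimator.update(received=0, expected=10))
    print(estimator.update(received=0, expected=10))
```

Starting from a perfect estimate, two windows without a single received packet only lower the estimate to 0.6 and then 0.36, so the estimator still reports a usable link although it is currently dead.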
Another aspect long-term link estimators do not take into account (because it does not concern them) is the correlation of packet losses and packet successes at a short time scale, which short-term link estimators try to detect and use. Because quality changes appear very frequently on intermediate links, a successful packet increases the probability of further packets being sent successfully only for a very short time. Likewise, a packet loss does not necessarily mean the link will be constantly unavailable, but only at the current instant and shortly afterwards [1]. Because they can detect them, short-term link estimators take advantage of the opportunities intermediate links offer, such as temporary shortcuts in the network. Such shortcuts are likely, as with increasing distance the chance of finding good quality links diminishes while the probability of intermediate links rises. We present the Bursty Routing Extension, which is a short-term link estimation concept to be integrated into traditional routing protocols using long-term link estimation, in section 3.2.3.

3.2.1 BVR’s Link Estimation

Beacon Vector Routing (BVR) [8] is a routing protocol for point-to-point data traffic. Its functioning is described in section 3.1.1. Here we want to concentrate on the link estimator used in BVR. Since every node’s coordinates are supposed to represent hop distances to the beacons, the nodes monitor the connections to all their neighbors constantly. Only those neighbors to which the connection is of very high and stable quality are used by the nodes for deriving their coordinates as well as for routing packets. To select the usable subset of all neighbor connections a passive link estimator is used, which snoops all packets in the node’s reception range, enabled by the broadcast nature of wireless connections.
Additional link estimation packets are not needed, but the nodes are required to receive every packet even when no data addressed to them is to be expected, which costs more energy. Furthermore, passive link estimation always bears the problem that a packet loss is not detected before the next successful packet is received and a gap is thereby found in the sequence numbers with which every outgoing packet is tagged before its transmission. By means of this numbering nodes can determine how many packets they received from a neighbor and how many they missed. Another way to handle this problem and to avoid the delay in packet loss detection is to assume a minimum data rate in the network, which is often realistic in WSNs. BVR uses both mechanisms, as the “hello” messages periodically sent by the nodes to communicate their coordinates to their neighbors imply a lower bound for the frequency of packets sent by every node.

Because communication partners in wireless networks may always have asymmetric connectivity, these link quality estimations can only provide information for incoming links from neighbors of the node. This lack of information is tackled by periodically broadcasting a neighbor list. This information lets nodes use good quality incoming links for routing and for deriving the node’s coordinates, if the corresponding neighbors attest good outgoing link quality as well. For the latter every node has to decide which of its neighbors is best connected to a certain beacon and choose this one as its parent in this beacon’s tree. Because of the instability of wireless connections, the best connected neighbor does not have to be the one with the smallest hop count towards the beacon, although the coordinates themselves comprise hop distances. Therefore, a metric called expected progress, which combines link quality and progress, is used to determine the best neighbor to become a parent.
It is defined as hops times link quality to a neighbor, the latter being represented by the expected number of transmissions (ETX), i.e. the reciprocal of the packet reception rate (PRR).

The logic of the link quality estimation itself (i.e. calculating the PRR) is based on the approach presented by Woo et al. [22]. No complex calculations to derive link quality are possible on sensor nodes. Therefore, a relatively simple algorithm was chosen in BVR: a window mean with an exponentially weighted moving average (WMEWMA). It calculates the success rate of the packet delivery (i.e. the PRR) for the recent past. How far this goes back in time is defined by t, the number of message opportunities to consider. The total number of packets inside the time frame is at least the number of expected packets (regarding the assumed minimum data rate). If more packets have been received than were expected, the number of received packets forms the basis for the calculation. This is expressed by the following formula:

\[ \frac{\text{Packets received during } t}{\max(\text{Packets expected during } t,\ \text{Packets received during } t)} \]

The single data points inside this window are either packet receptions or timer interrupts, which signify packet losses. The average over this data is weighted exponentially, meaning the significance of older data points decreases exponentially with their age. In this way, the older the link’s history is, the less importance it gets compared to the current state of the link, yet it is still considered when estimating the link’s quality.

3.2.2 Four-Bit Link Estimation

Fonseca et al. designed the four-bit link estimator [7], a hybrid link estimator for many-to-one scenarios in wireless mesh networks, which combines regular beacon packets including route information on the one hand with knowledge (represented by four bits) gathered from the physical layer (one bit), the link layer (one bit) and the network layer (two bits) on the other hand.
They designed narrow platform-independent interfaces over which the link estimator can communicate with the different layers of the communication stack to exchange necessary information. For evaluation, the link estimator of the standard WSN routing protocol Collection Tree Protocol (CTP) was exchanged with a prototype implementing the idea. Today the Four-bit link estimator is used by the CTP implementation shipped with TinyOS. Despite the approach’s platform independence, the prototype developed by the authors was tested on wireless sensor networks, because they consider them the most challenging network type with their manifold limitations. The restricted capacities of sensor nodes make it necessary for a link estimator to choose which links to estimate, as usually not all available connections fit into the neighbor table in memory.

Every layer contributes the information that is most efficiently gathered on that layer. Figure 3.3 depicts the structure of this approach.

Figure 3.3 Structure of the Four-bit link estimator, which is represented by the triangle in the center: Information represented by arrows leading towards a layer is requested by the link estimator on packets the node receives or connections to its neighbors. Information actively provided by the layers is depicted as arrows pointing towards the triangle. [7]

The physical layer provides information about the channel quality during the reception of a packet. The so-called white bit is set when the transmitted symbols have a low chance of containing errors. It is a cheap and quick way to filter out unpromising links before even considering them further. However, this layer can only evaluate single packets independently and also only packets which actually have been received by the node. If a channel alternates between good and bad quality, these isolated pieces of information have to be combined to prove useful for the overall estimation of the link.
Depending on the hardware of the transceiver, maybe also different information can be consulted to derive the status of the white bit. If there is no useful information to be gathered at all, it can even be left unset for every packet, as it then will be ignored by the link estimator. One requirement of this link estimation approach is that the link layer uses acknowledgement packets, which is the case in most common wireless communication technologies such as IEEE 802.11b or 802.15.4. The ack bit is set whenever the acknowledgement for a transmitted packet has been received. This couples link estimation and data transfer, as all packets are used for the acknowledgement statistics leading to more accurate and realistic results than active probing alone. Moreover, with this technique the loss of a packet is noticed already before the next successfully received packet arrives. Without considering the acknowledgements, only a gap in the sequence numbers of the received packets would indicate failed transmissions. The problem with relying on data packets for link estimation is the circular dependency induced by data traffic presuming link estimation to enable the routing of the data packets. Finally, the decision which links may be helpful for the routing process and which may increase the danger of producing circles in the topology or pruning nodes from the network is incumbent on the network layer and expressed via the pin bit and the compare bit. The former can be set on an entry in the neighbor table to indicate that the link to this neighbor is currently being used and therefore not to be evicted. The latter can be requested by the link estimator resulting in a network layer evaluation if the corresponding link is to be considered more promising than any link currently cached in the neighbor table of the node. Of course, to answer this, access to some routing information is needed, which is provided by periodic beacon packets. 
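The interplay of the white, pin and compare bits during neighbor table management can be sketched as follows. This is a simplified illustration under assumptions rather than the implementation from [7]: the names NeighborTable, on_beacon and pin are hypothetical, the compare bit is modeled as a callback into the network layer, the eviction of a random unpinned entry is an assumed policy, and the ack bit (which feeds the quality estimate itself) is left out.

```python
import random


class NeighborTable:
    """Sketch of table admission with white, pin and compare bits (assumed names)."""

    def __init__(self, capacity, compare, rng=None):
        self.capacity = capacity
        self.compare = compare            # network layer callback: the compare bit
        self.pinned = {}                  # neighbor id -> pin bit
        self.rng = rng or random.Random()

    def pin(self, neighbor):
        """Network layer marks a link as in use, so it must not be evicted."""
        self.pinned[neighbor] = True

    def on_beacon(self, neighbor, white_bit):
        """Return True if the neighbor is in the table after this beacon."""
        if neighbor in self.pinned:
            return True
        if len(self.pinned) < self.capacity:
            self.pinned[neighbor] = False
            return True
        # Table full: only consider links the physical layer rates as promising.
        if not white_bit:
            return False
        # Ask the network layer whether this link beats a cached one.
        if not self.compare(neighbor):
            return False
        victims = [n for n, is_pinned in self.pinned.items() if not is_pinned]
        if not victims:
            return False
        del self.pinned[self.rng.choice(victims)]   # assumed eviction policy
        self.pinned[neighbor] = False
        return True


table = NeighborTable(capacity=2, compare=lambda n: n == 9)
table.on_beacon(1, white_bit=True)
table.on_beacon(2, white_bit=True)
table.pin(1)
print(table.on_beacon(9, white_bit=True), sorted(table.pinned))
```

In this example the table is full, neighbor 1 is pinned, and the hypothetical network layer only prefers neighbor 9, so neighbor 2 is the only candidate for eviction.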
Because of the usually high number of potential neighbors of a node in a WSN, it is imperative for saving precious memory that only a subset of the connections to the node’s neighbors is maintained in the routing table. The final decision about which connections to store in this table has to be made using information from all the layers to avoid incompatible decisions, which would impair the routing performance.

3.2.3 Bursty Routing Extension

The Bursty Routing Extension (BRE) does not aim to substitute an existing long-term link estimation mechanism, which provides a stable routing topology. Its purpose is to be seamlessly integrated into existing many-to-one routing protocols to cooperate with the long-term mechanism by disclosing temporarily available shortcut opportunities and to use the long-term solution as backup in case the short-term approach of BRE does not yield any promising routing opportunities. It provides a way to detect bursty links, i.e. links which right at that time allow limited packet transfer, using a short-term link estimation mechanism (STLE). Based on this it defines an adaptive routing strategy, which allows the sensor node to use these bursty links temporarily. BRE was tested accompanying the link estimator of CTP (the Four-bit link estimator).

Figure 3.4 Collection tree established by CTP and additional bursty links (dotted lines), which provide routing shortcuts from time to time

BRE works passively, overhearing packets sent by the node’s neighbors and creating a history for each connection, which is used to find out how many consecutive packets were received and not received over it. The sequence number carried in each packet makes it possible to fill the history also with packet loss events as soon as the next successful packet arrives.
Thresholds are defined to indicate on the one hand after how many consecutive successful packets a link is to be considered useful for routing and on the other hand after how many consecutive unsuccessful packets a link is to be considered unavailable again. If either one is the case, the routing protocol is informed about this; otherwise the history of the link is collected further. In experiments BRE’s authors found three to be a suitable value for the first threshold and one for the second.

To explain the algorithm behind BRE, three types of nodes have to be defined: The sender-node sends packets to its parent in the routing tree, which is the node on the next level towards the sink. A node overhearing packets between those two is the overhearing-node. Every node in the network can assume any number of these roles at any time. If a bursty link was found by the overhearing-node because the threshold for consecutive successful packets has been reached, the routing protocol is queried for the ETX of the sender-node’s parent. It is likely that the overhearing-node has information about this, since it can overhear packets sent between the sender-node and its current parent; therefore the parent can be assumed to be in the overhearing-node’s neighborhood as well. Its ETX is compared with the ETX of the overhearing-node to decide if the latter would be a better parent for the sender-node. If this is the case, the overhearing-node informs the sender-node that it volunteers as its new parent, because this would lead to a routing advantage, i.e. a reduced hop count between the sender-node and the sink. This announcement automatically functions as a symmetry check for the connection between the sender-node and the overhearing-node, which is necessary, because until then the overhearing-node only derived the link to be usable in one direction.
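The threshold logic described above can be sketched in a few lines of Python. The sketch is only meant to illustrate the mechanism and is not BRE's actual code: the class ShortTermLinkEstimator, the event names "up" and "down" and the helper volunteers_as_parent are hypothetical, and sequence number wrap-around is ignored for brevity.

```python
class ShortTermLinkEstimator:
    """Per-neighbor history of overheard packets (sketch, assumed names)."""

    def __init__(self, success_threshold=3, failure_threshold=1):
        self.success_threshold = success_threshold
        self.failure_threshold = failure_threshold
        self.last_seq = None
        self.successes = 0      # consecutive packets overheard
        self.usable = False     # link currently reported as usable

    def on_overheard(self, seq):
        """Process an overheard packet and return the events to report."""
        events = []
        if self.last_seq is not None:
            # A gap in the sequence numbers reveals losses in hindsight.
            missed = seq - self.last_seq - 1
            if missed > 0:
                self.successes = 0
                if self.usable and missed >= self.failure_threshold:
                    self.usable = False
                    events.append("down")
        self.last_seq = seq
        self.successes += 1
        if not self.usable and self.successes >= self.success_threshold:
            self.usable = True
            events.append("up")
        return events


def volunteers_as_parent(own_etx, parent_etx):
    """The overhearing-node offers itself only if it is closer to the sink."""
    return own_etx < parent_etx


stle = ShortTermLinkEstimator()
for seq in (1, 2, 3, 4, 6, 7, 8):
    print(seq, stle.on_overheard(seq))
```

With the thresholds of three and one reported by BRE's authors, the link in this example is announced after packet 3, withdrawn when the gap before packet 6 is noticed, and announced again after packet 8.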
In case of a symmetric link the sender-node makes the overhearing-node its temporary parent for the time the bursty link is available. This change, however, is not propagated through the network. The link is used until the sender-node notices that the threshold for unsuccessful packets to the temporary parent has been reached, which makes it use another bursty link at its disposal or resort to traditional routing.

BRE is a very reliable approach. It was designed to influence the routing process very carefully, which is shown by two facts: really bad links are not considered for routing, because they would not reach the success threshold, and an aggressive back-off is in place, which triggers a fallback to traditional routing after only one failed packet. This ensures that intermediate links in the network are only used if the routing process can really benefit from using them.

While many traditional routing protocols use periodic beacon messages to measure the PRR of all the links in the network, BRE does not induce additional overhead but works by only passively overhearing data packets, which also is a much more accurate way to measure link qualities because of the usually much higher frequency of data traffic compared to beacon traffic. Overhead is also limited since changes in the topology induced by BRE stay local. The locality of the changes additionally decreases the chance of instabilities, e.g. loops, happening in the network. Those often occur when, in case of sudden connectivity loss to its parent, a node chooses a neighbor with a very high ETX to replace it. A node can detect a loop by receiving a packet from a neighbor which has a lower ETX than itself. A parent change because of BRE’s algorithm is very unlikely to cause this, since only nodes with lower ETX than the former parent can be chosen to replace it temporarily. The locality of the changes decreases the probability of loops further.
In experiments BRE’s authors did not observe any additional loops compared to the already low number in traditional routing.

3.3 Link Dynamics

The following concepts also do not work like traditional long-term link estimators. They merely try to exploit certain characteristics of connections between nodes in a wireless network to make use of good phases of intermediate links. As we will see in chapter 4, network links exhibit a bursty behavior. When the link is available, bursts of packets can be sent, and in between there are times with no connectivity and therefore packet loss. β (section 3.3.1) is supposed to be a metric to measure the burstiness of links to configure routing protocols accordingly, while opportunistic routing (section 3.3.2) tries to make use of several neighbors overhearing a transmission to the next hop of a route in case this transmission fails to reach this hop.

Figure 3.5 CPDFs and corresponding KW and β values: (a) KW = 0, β = 1. (b) KW = 0.06, β = 0.8. (c) KW = 0.20, β = 0.5. (d) KW = 0.14, β = 0.3. The Y-axis shows the conditional probability. The positive part of the X-axis (consecutive failures/successes, from -100 to 100) represents the consecutive successes, the negative part the failures. The horizontal line denotes the average packet reception rate (PRR) of the link. [20]

3.3.1 The β-factor

Srinivasan et al. proposed a metric for the burstiness of links, which they call β [20]. In their contribution they explain causes for burstiness of links and a way to measure it. Furthermore, they designed an algorithm which exploits this knowledge.
They show how to improve the performance of an existing routing protocol (CTP) significantly by adjusting only one parameter based on their findings, namely the time to wait after a packet transmission has failed. Hence, what this link estimation method would contribute to a routing protocol is to monitor the links of the network (vicinity) for a while, derive their β-values and configure the timeout accordingly.

β is a value between 0 and 1 and states how bursty a link is, meaning how the times of good and bad quality correlate. To define β one first has to understand conditional packet delivery functions (CPDFs) [12]. These functions illustrate the probability of a packet being transmitted successfully after a certain number of consecutive successful transmissions or failures. Some examples can be seen in Figure 3.5.

If a link has a β-value of 1 (as in Figure 3.5(a)), this signifies that it exhibits an ideal bursty behavior, i.e. the time periods which allow sending correlate very much. If the link has already transmitted an arbitrary number of packets successfully so far (even only one), it has a 100% chance of successfully transmitting more packets after that. However, if any number of packets fail to be transmitted, there is definitely no way of successfully sending another packet. This means the link is constantly on or off; either way only one packet suffices to find out its status, and more test packets do not provide any additional information. The PRR of the link is either 0 or 1 depending on whether it is on or off.

Figure 3.6 Calculation of KW and β of an example link: The KW between this CPDF and the ideal bursty one is the mean of all the ek. [20]

On the other side of the spectrum there are links with a β-value of 0, signifying that they are not bursty at all, i.e. the time periods which allow successful sending over these links are independent from each other.
The CPDF of such a link would proceed along the line denoting the PRR of the link, because any number of successful or failed packet transmissions would not mean anything for the chance of transmitting the next packet successfully. This chance would constantly be equal to the long-term PRR, independent of the number of successful or failed packets sent in a row.

Looking at the CPDF of a link as a visualization of its burstiness, one can define a scalar value for this property, the Kantorovich-Wasserstein distance (KW) [19], which quantifies how close the CPDF of a link is to the CPDF of the ideal bursty link with the same PRR. The KW is the average difference of the components of two vectors. To illustrate this, there is an example CPDF shown in Figure 3.6.

Let KW(E) denote the KW between a given (empirical) CPDF named E and the CPDF of the ideal bursty link with the same PRR as the link belonging to E. With I being the CPDF of the completely independent link we can now define β:

\[ \beta = \frac{KW(I) - KW(E)}{KW(I)} \]

Thus, β indicates how much closer to the perfectly bursty link’s CPDF that of an empirical link is compared to that of the completely independent link with the same PRR as the empirical link. All this is set in relation to the distance between the perfectly bursty and the perfectly independent link. In terms of Figure 3.6 this translates to:

\[ \beta = \frac{\mathrm{mean}(i_1, \ldots, i_n) - \mathrm{mean}(e_1, \ldots, e_n)}{\mathrm{mean}(i_1, \ldots, i_n)} \]

A completely bursty link would have a CPDF in which all e_k were 0, hence β would evaluate to 1. The CPDF of a completely independent link would cause all e_k to be equal to the corresponding i_k, which would make β equal to 0.

Being aware of β for all the links to a node’s neighbors, the correlation between individual successful and failed packet transmissions can be exploited if it exists, i.e. if β is high. In this case a packet loss implies a low chance for further packets to be transmitted without failure.
Thus, the routing protocol should interrupt the packet flow to avoid wasting energy on retransmissions and wait an appropriate amount of time before it starts transmitting again. There exists a tradeoff between latency (longer waiting) and possible energy waste (shorter waiting), but since usually in WSNs energy is much more valuable than time, a certain latency is acceptable to break the correlation between packet failures. In experiments performed by the authors a reasonable waiting time was determined to be around 500ms, which can easily be spent. If the node has more packets to send to a different neighbor, the waiting time can be filled with this as well. The authors implemented an algorithm, which uses opportune transmissions to exploit the knowledge provided through β. It sends packets back-to-back as quickly as possible until a single failure occurs. In this event the packet flow is paused for the next packet to have an independent chance of getting transmitted. Results show that in networks with many high-beta links a high improvement can be achieved by using opportune transmissions, thereby individually adjusting the inter packet intervals (IPIs), i.e. the time between the packets sent. (The proposed algorithm implies an IPI of almost 0 between successful packets and a significant amount of time (e.g. 500ms) after unsuccessful ones.) This suggests that β indeed measures the link’s burstiness, since opportune transmissions are known to perform well in networks with a lot of bursty links [20]. However, sending packets at a lower rate decreases the improvement, since higher distance between packets weakens or even breaks the correlation of their successful transmissions also leading to lower β’s for the links. Further evaluation shows opportune transmissions only to improve high-beta links and improve links with a low PRR more than good quality links. 
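The definition of β given earlier in this section can be made concrete with a small Python sketch that derives an empirical CPDF from a packet trace and compares it with the ideal bursty and the independent link. The sketch rests on several assumptions that are not taken from [20]: the function names cpdf and beta are hypothetical, runs longer than max_run are simply ignored, every run length is weighted equally regardless of how often it was observed, and the independent CPDF is evaluated on the same run lengths as the empirical one.

```python
from statistics import mean


def cpdf(trace, max_run):
    """Empirical CPDF of a trace (list of booleans, True = delivered).

    Key k > 0 holds P(success | k consecutive successes),
    key k < 0 holds P(success | |k| consecutive failures).
    """
    counts = {}   # k -> [successes seen next, observations]
    run = 0       # signed length of the current run
    for delivered in trace:
        if run != 0 and abs(run) <= max_run:
            stats = counts.setdefault(run, [0, 0])
            stats[0] += int(delivered)
            stats[1] += 1
        if delivered:
            run = run + 1 if run > 0 else 1
        else:
            run = run - 1 if run < 0 else -1
    return {k: s / n for k, (s, n) in counts.items()}


def beta(trace, max_run=100):
    """beta = (KW(I) - KW(E)) / KW(I), with KW as mean absolute difference."""
    prr = sum(trace) / len(trace)
    empirical = cpdf(trace, max_run)
    if not empirical:
        raise ValueError("trace too short to build a CPDF")
    # Ideal bursty link: certain success after successes, certain failure otherwise.
    ideal = {k: 1.0 if k > 0 else 0.0 for k in empirical}
    kw_e = mean(abs(empirical[k] - ideal[k]) for k in empirical)
    kw_i = mean(abs(prr - ideal[k]) for k in empirical)
    if kw_i == 0:
        raise ValueError("beta is undefined for this trace")
    return (kw_i - kw_e) / kw_i


# A link that is on for 50 packets, then off for 50 packets, twice in a row.
bursty = ([True] * 50 + [False] * 50) * 2
print(round(beta(bursty), 2))
```

For this strongly bursty example trace the sketch yields a β of 0.96, since the empirical CPDF deviates from the ideal bursty one only at the two run lengths where the link switches state.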
Moreover, the authors claim their approach to be applicable for a wider range of wireless technologies than just WSNs.

3.3.2 Opportunistic Routing

Another approach to exploit short-term link dynamics is Extremely Opportunistic Routing (ExOR) [3], an example of an opportunistic multi-hop routing protocol for wireless networks. With ExOR, batches of packets destined for the same host are collected and sent in a burst. The next hop on a route is selected after the packet has been transmitted, so the decision is based on which nodes actually received something. In the general idea the sender node broadcasts the packets, then the receiving nodes agree on which of them is closest to the destination, and then this node broadcasts them further, repeating the procedure. The published implementation was tested on an 802.11b mesh testbed.

Among the challenges involved in this approach is the agreement algorithm, which has to ensure that packets are transmitted with minimal overhead for the process as well as exactly once, i.e. neither duplicate packets nor packet loss should occur. Duplicate packets not only account for wasted transmissions but also for an increased probability of collisions and higher interference. The overhead can be kept reasonable by batching together a significant number of packets, leading to fewer agreement phases, and by restricting the number of possible forwarders which have to agree in each phase.

The protocol works as follows: The sender node collects a batch of packets, broadcasts them and includes in every packet header a list of potential forwarders ordered by their closeness to the destination. Every node which receives a packet from this batch and finds itself in the list buffers all incoming packets and waits until the burst is over. The node in the list with the highest priority (i.e.
the one at the top of the list) that has actually received and buffered packets then starts forwarding them including its batch map in each header indicating for every packet the highest priority node that has already received it. The destination (which is always the node with highest priority in the list) only sends batch maps and no data to inform its vicinity of successful packet deliveries. Every node sets a timer to derive when to start broadcasting its buffered packets. This should be after all the nodes with higher priority had a chance to send their buffered packets. When a node receives packets from a higher priority node, the waiting time is extended according to the expected number of packets still to be sent by this node. If no packets are received at all, a minimum waiting time sufficing for five packets of each node with higher priority is assured before the sending begins. In this way, all the forwarders send in order of their priority all the packets they did not already receive an acknowledgement for from a higher priority node (in form of a batch map). After the node with the lowest priority finished transmitting, the cycle starts anew. Nodes refrain from sending any packets during their turn in case their batch map indicates at least 90% of the packets having reached nodes with higher priority. The remaining packets are requested by the destination directly from the source via traditional routing mechanisms, since the overhead of the agreement algorithm would be too high for such few packets and the likelihood for nodes setting wrong timers would increase. The authors of ExOR claim their approach to perform significantly better than traditional routing algorithms, mainly because of two effects: First, traditional routing algorithms always route over one forwarding station, in case of packet loss this leads to retransmissions. 
Opportunistic routing saves transmissions, because several next hops are tried, which leads to each packet very likely being received by at least one of them, making some progress towards the destination. Second, ExOR takes advantage of unexpectedly long or short connections. If a packet is received by a node that is unusually close to the destination, hops and thereby time and transmissions can be saved. In case of a packet not reaching as far as expected in the destination’s direction but at least being received by some node on the way, it has still made some progress and can be forwarded from its new position. In traditional routing the transmission would be lost and would have to be repeated, maybe even in the first case, because the intended next hop was not reached.

To compose the forwarder list included in the packets sent by the source, priorities for every node in the network have to be derived to include the possible forwarders in the right order into the list. To calculate the priorities, the expected costs to deliver the packet over the corresponding node have to be derived, i.e. the path ETX from itself to the destination. This means the nodes have to obtain and to store global knowledge about packet reception rates. For this, periodic link-state information is flooded through the network. In addition to this, nodes have to keep state of each batch they are participating in and buffer all received packets of the batches and their forwarder lists.

Figure 3.7 Example of an ExOR run: 100 packets were transferred from node N5 to N24. The grey bars indicate packets sent (the longer the more, different shades of grey for different packet batches). [3] (Timeline: nodes N24, N20, N18, N11, N8, N17, N13, N5 from top to bottom; time axis from 0 to 6.0 sec.)

Figure 3.7 illustrates an example of an ExOR run. A batch of 100 packets is sent by node N5 including a forwarder list comprising the nodes on the left with decreasing priority from top to bottom.
Nodes N24 (the destination) and N20 did not receive any of these packets, otherwise they would have sent them immediately after N5 finished transmitting its packet batch. The highest priority receiver is N18, which starts forwarding every packet stored in its buffer after an appropriate waiting time. After this the other nodes follow in order of their priority. The overlapping in the timeline happens due to nodes not being able to overhear packets sent by higher priority nodes and consequently not waiting long enough before sending (i.e. only the default time for five packets). Once all forwarders are done sending, the original sender (N5) retransmits all packets it did not overhear being sent. Nodes eventually stop sending one by one when they have overheard at least 90% of the packets sent by higher priority nodes.

3.4 Our Approach

From studies such as the β-factor we learn about link dynamics and how we could exploit them. β measures the burstiness of a link. This helps to see how usable the connections might be despite their probably low long-term qualities and to support an existing routing protocol. However, our approach does not include introducing metrics for network characteristics to decide which links to use for packet forwarding. Our protocol takes advantage of bursty intermediate links along with all the others just by itself (as long as they appear symmetric), i.e. without specifically identifying them before.

The Bursty Routing Extension is, as the name suggests, an extension to an existing long-term link estimator enabling a protocol to use intermediate links additionally if they provide shortcuts. The idea of using links with unstable quality during their promising times also inspired us while designing SVR. However, we wanted to avoid using link estimation altogether to save the overhead and the complexity that comes with it. Additionally, BRE was designed for protocols supporting many-to-one traffic.
It should be adaptable for point-to-point routing on virtual coordinates as well, since this would only be an increase in dimension. However, we think that for this purpose SVR is a more direct approach. Moreover and maybe more important, it is a stand-alone solution. ExOR was tested on 802.11b testbeds, and it seems to be more suitable for this technology, since these environments offer more resources to use, i.e. are less limited than WSNs. Maybe on balance the transmissions saved because of scattering the risk for retransmissions compensate the communication necessary for the agreement process. However, participants of a wireless sensor network could never bear the overhead implied by the calculation of the forwarder list and the necessity for global information on connectivity making periodic link state flooding necessary. Nodes would have to store information about all the packet batches, buffer the messages themselves and calculate priorities for all the nodes in the network for each batch. Moreover, ExOR only works for large bursts of packets, as the overhead would be unacceptable for too small batches. This becomes clear by the fact that ExOR does revert to traditional routing for the last 10% of a batch’s packets, which failed to be transmitted. All this renders opportunistic routing not scalable, hence inapplicable for WSNs, since the costs are proportional to the size of the network. Furthermore, in networks with low connectivity – which is likely to be the case for WSNs – packets would get lost despite the existence of several forwarders, because during the agreement process messages could not be exchanged properly. Since simplicity is one of our goals, we keep the node state small and use unicast messages with a predetermined receiver. Our trust lies within the fact that SVR’s information about the state of a connection is very up to date and therefore reliable. 
The Four-bit link estimator, although designed to be platform independent, profits from certain radio transceiver features. Furthermore, for the information provided by the network layer (i.e. which links to monitor and which neighbors to elect) additional routing information is required, for which beacon messages have to be exchanged. The routing table in SVR is filled naturally with the neighbors most recently heard of, although more sophisticated methods to determine which neighbors to evict in case of a full table would be interesting to explore. The Four-bit link estimator is a typical long-term approach which cannot detect and act upon short-term dynamics. SVR’s beacon messages combine link estimation (in our case just via “hello” messages) and routing information exchange (coordinates). Thereby we avoid the latency when detecting packet loss because of only passive link estimation as well as the circular dependency from just overhearing data packets.

SVR shares the most similarities with BVR. It also derives virtual coordinates for all the nodes based on their connectivity, which is the most meaningful correlation in WSNs. However, BVR uses a long-term link estimator, which (although not exceptionally complex) induces significant overhead for overhearing all the packets sent by neighbors, sending packets to determine link symmetry, calculating link qualities and keeping state of them. In contrast to this, SVR refrains from using link estimation as such, thereby ensuring a much simpler functioning. We claim that all this complexity is not necessary to provide reliable, adaptive and scalable point-to-point data traffic in wireless sensor networks. Furthermore, SVR’s routing metric is much simpler than the one BVR uses, although we think that with a more sophisticated metric SVR will outperform BVR even more.

4 Analysis

In this chapter we present our preliminary experiments which made us aware of the burstiness of wireless network links and led to the idea of exploiting this burstiness for routing. First we explain our experimental settings, which testbeds we used and how they are configured, how we gathered our data and what aspects we explored, before we present our results. We examined the length of packet bursts in a network as well as some metrics for coordinate dynamics. At the end we summarize the implications these observations have for our design.

4.1 Experimental Settings

After testing our code with the TOSSIM simulator and on our local testbed at the university we conducted our experiments (for this chapter and the evaluation chapter) on three different testbeds: MoteLab [21], Indriya [6] and TWIST [11]. Our own testbed comprises almost 20 nodes distributed over several rooms of the floor where our research group is located. All the testbeds utilized by us comprise TMote Sky (also called TelosB) sensor nodes. The TelosB nodes each consist of a Texas Instruments MSP430 processor with 8 MHz, 10 KB of RAM, 48 KB of program flash memory and a Chipcon CC2420 radio transmitting up to 250 kbit/s at 2.4 GHz (IEEE 802.15.4) as well as sensors for humidity, temperature and light intensity, both visible light and infrared.

MoteLab (footnote 1) is a testbed in the Maxwell Dworkin Laboratory, the Electrical Engineering and Computer Science building at Harvard University. It consists of 170 nodes, although at the time we executed our experiments only 99 of them were available. Since on MoteLab the distances between the nodes can be large, we let all the nodes always send with full transmission power.

Footnote 1: http://motelab.eecs.harvard.edu

Indriya (footnote 2) is located at the School of Computing at the National University of Singapore.
It comprises 127 nodes, 71 of which are equipped with an additional sensor board (WiEye, SBT80, or SBT30, all from EasySen) providing them with the ability to sense acoustic signals, as well as with a magnetometer, an accelerometer, and a more powerful long-range infrared sensor.

TWIST (footnote 3) is deployed at the building of the Telecommunication Networks Group at the TU Berlin. We used 94 of the 102 available nodes. In contrast to the other testbeds, for TWIST almost no topology information is available due to privacy reasons. Still we experienced a very good connectivity between the nodes. Therefore, we used a power level of -15 dBm to simulate a lower node density. Sending with full transmission power would not result in meaningful data, since almost no multi-hop communication would take place.

In all the testbeds the nodes are attached to walls or ceilings on three floors inside their buildings and are connected to a stable power source (no batteries) and an Ethernet backbone (via their USB connection). This allows them to be programmed easily over a central server. For every testbed there exists a web interface to upload program images to nodes and download data produced during their operation. The format of the stored data and the way it is stored vary, but in all testbeds every message sent over the nodes’ Universal Asynchronous Receiver Transmitter (UART), i.e. the USB interface connecting them with the backbone, is stored. The way an experiment works is that the nodes are programmed with a TinyOS application, some time passes in which this application is executed by all the nodes, data is produced and stored, and in the end the nodes’ memory is erased, and the next experiment can be initiated. During the time of the experiment one can interact directly with the nodes over their serial interface, meaning messages can be sent and received over the Ethernet backbone (and thereby externally over the Internet).
In order to gather the right data a programmer has to make the TinyOS application produce the corresponding messages and send them over the backbone. Furthermore, to control nodes during the experiments the application has to react appropriately to messages sent by the experimenter and received over the backbone.

We first performed experiments to find out about the length of packet bursts on wireless network links. After becoming confident that we can exploit these bursts, we explored how the link dynamics would influence the coordinates in a virtual coordinate system if we refrain from using explicit link estimation at all. There are some parameters that one can vary in our experiments such as the size of the nodes’ neighbor table. The choice for these parameters and the values we used for them is explained in detail in section 7.2.

Footnote 2: http://indriya.comp.nus.edu.sg
Footnote 3: http://www.twist.tu-berlin.de

4.2 Link Dynamics

    typedef nx_struct bltMsg {
        nx_uint16_t seqNumber;
        nx_uint16_t beaconCounter;
        nx_uint16_t sender;
    } bltMsg_t;

    typedef nx_struct bltSerialMsg {
        nx_uint16_t origin;
        nx_uint16_t seqNumber;
        nx_uint16_t beaconCounter;
        nx_uint16_t sender;
        nx_uint8_t rssi;
        nx_uint8_t lqi;
    } bltSerialMsg_t;

Listing 4.1: bltSerialMsg: This message is logged whenever a message (which is of the type bltMsg) is received over the radio during a link dynamics experiment.

Previous studies ([1], [2], [20]) have shown the bursty nature of wireless network links. If a link does not have a very stable long-term quality, meaning it is not constantly available for transmission, then there are times in which bursts of packets can be sent successfully over this link and unavailable times in between. The length of these bursts may correlate with the long-term quality of the links but does not have to.
If a connection is available for many very short times, this will also lead to a rather high packet reception rate (PRR) and thereby to a high long-term quality.

We wanted to measure the length of packet bursts in wireless sensor networks (WSNs). In our first experiments we chose 10 nodes more or less evenly distributed over the network (footnote 4) to sequentially send out very long bursts of packets (3000, basically containing only a sequence number) at a very high rate (every 50 ms), which means every packet burst had a length of about 150 seconds. During this time period, which is rather long for wireless links, the connectivity between the nodes changed a lot, and therefore usually only parts of the 3000 packets were received by other nodes. We let all the nodes in the whole network log which packets they received from which sending node. The radio module of the TelosB nodes also allowed us to gather additional information about the reception quality such as the Received Signal Strength Indication (RSSI) and Link Quality Indication (LQI) values. However, in the end we did not use these values for our statistics but only to get an impression of the links in the network. Overall we logged the following information (cf. Listing 4.1):

• origin: Here we store the nodeID of the node that received the original radio message and therefore created this logging message.
• seqNumber: This is the overall sequence number of the corresponding sender node. Usually every node sends only one packet burst in every experiment, but in case this does not hold, the bursts can be ordered according to this field’s values.
• beaconCounter: The sequence number inside one packet burst is stored here. If a node sends more than one burst, it is reset each time a new burst starts.
• sender: Here is the nodeID of the node that sent the original radio message, which is recorded by this logging message.
• rssi: We logged the Received Signal Strength Indication (RSSI) value for the received packet.
• lqi: We logged the Link Quality Indication (LQI) value as well.

Footnote 4: Because we lack topology information for the TWIST testbed, we always have to choose random nodes from it.

[Figure 4.1: CCDF, P(X>x), over the burst length in number of packets (logarithmic x-axis) for MoteLab, Indriya and Twist.]
Figure 4.1 Distribution of the lengths of packet bursts in a network: We see a lot of long packet bursts, which can be used for successful routing. (Notice the logarithmic x-axis. Bursts longer than 100 packets have been omitted, since their number was negligible.)

From this data we derived the length of the packet bursts that came through from a sending to a receiving node, i.e. the lengths of all runs of consecutive sequence numbers received. The outcome is depicted in Figure 4.1. It shows the Complementary Cumulative Distribution Function (CCDF) of the burst lengths measured during the whole experiment. These functions are the complementary versions of the Cumulative Distribution Functions (CDF). CDFs (similar to the CPDFs in section 3.3.1) state the probability for a value out of a distribution to be at most as high as the value on the x-axis. Consequently, CCDFs comprise the complementary probabilities, i.e. the chance of a value from a distribution to exceed the x-value. This leads to the relation CCDF(x) = 1 - CDF(x). In Figure 4.1 we see that the majority of the packet bursts is rather short, but there also exists a number of longer bursts that can be useful for routing. For example depending on the testbed between 10% and 20% of the bursts are at least 10 packets long, and between 5% and 10% of the bursts even exceed the length of 20.
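The derivation of burst lengths and of the CCDF described above can be illustrated with a minimal offline sketch. It is not the analysis code used for the thesis. The function names `burst_lengths` and `ccdf` are hypothetical, the input is assumed to be the beaconCounter values one receiver logged for one sender, and counting an isolated packet as a burst of length 1 is an assumption as well.

```python
from collections import Counter


def burst_lengths(seq_numbers):
    """Return the lengths of all runs of consecutive sequence numbers.

    Assumption: seq_numbers holds the beaconCounter values logged by one
    receiver for one sender; duplicates are ignored and an isolated
    packet counts as a burst of length 1.
    """
    bursts = []
    run = 0
    prev = None
    for seq in sorted(set(seq_numbers)):
        if prev is not None and seq == prev + 1:
            run += 1              # the current burst continues
        else:
            if run:
                bursts.append(run)  # a gap closes the previous burst
            run = 1
        prev = seq
    if run:
        bursts.append(run)
    return bursts


def ccdf(values):
    """Map every observed value x to P(X > x), i.e. 1 - CDF(x)."""
    total = len(values)
    counts = Counter(values)
    remaining = total
    result = {}
    for x in sorted(counts):
        remaining -= counts[x]    # values strictly greater than x
        result[x] = remaining / total
    return result
```

For a receiver that logged the counters 0, 1, 2, 5, 6 and 9, `burst_lengths` yields the bursts 3, 2 and 1, and `ccdf` of these lengths states that two thirds of the bursts are longer than one packet.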
4.3 Coordinate Dynamics

These results mean even far away neighbors of nodes may provide a useful link enabling them to receive several packets once in a while. Thus, we implemented a basic algorithm realizing our idea of a virtual coordinate system (cf. section 2.4.2) with several beacon nodes and observed the development of all the nodes’ coordinates over time pursuing a very greedy coordinate establishing mechanism, which section 5.3 explains in detail. We wanted to see how dynamically the coordinates (i.e. hop distance to the beacons) would evolve if we used every opportunity of a shortcut in the network that presents itself.

We let every node send out periodic one-hop beacon messages stating its current distance to all beacon nodes in the network. The nodes derived the current distance by adding 1 to the minimum coordinate they received from their neighbors. The frequency of the beacon messages was very high (every second for the first 10 minutes and every 10 seconds for the last 20 minutes of the experiments) to capture even short-term dynamics in the nodes’ connectivity and to compensate for the short experiment duration. Simultaneously to the beacon sending every beacon message sent was logged by all the nodes. The logging message’s format is basically identical to the one of the beacon messages themselves (SVRBeaconMsg), which is given in Listing 6.2 in section 6.2, and the fields all have the same meaning as in the SVRBeaconMsg message type.

    typedef nx_struct SVRBeaconLoggingMsg {
        nx_uint8_t sender;
        nx_uint16_t seq_no;
        Coordinates current_coords;
        Coordinates mean_coords;
        nx_uint8_t number_of_neighbors;
        nx_uint8_t neighbors_ids[MAX_NEIGHBORS];
        Trace traces[N_ROOT_BEACONS];
    } SVRBeaconLoggingMsg;

Listing 4.2: SVRBeaconLoggingMsg: This message is logged every time a node sends out a beacon message. Its content equals that of a SVRBeaconMsg.
For these experiments we only used the values of the current_coords field, as they contain the node’s current hop distance to the beacon nodes (the current coordinates). The results of these experiments are shown in Figure 4.2. We measured the following values each for the whole duration of the experiments (30 minutes):

• coordinate range: the difference between the maximum and minimum coordinate
• average coordinate: the average distance to any beacon
• coordinate change rate: the share of beacon intervals in which a coordinate changed
• nodes changed: the percentage of nodes that changed their coordinates in any beacon interval

[Figure 4.2: four panels, each showing a CDF, P(X≤x), for MoteLab, Indriya and Twist: (a) CDF of the coordinate range (x-axis: Node Coordinate Range), (b) CDF of the coordinate average (x-axis: Node Average Coordinate), (c) CDF of the coordinate change rate (x-axis: Node Change Rate [Share of Intervals with Change, %]), (d) CDF of the number of changed nodes (x-axis: Share of Nodes Changed per Interval [%]).]
Figure 4.2 Results from the Coordinate Dynamics experiments: The numbers show a very dynamic environment with frequent and drastic coordinate changes, which makes it unsuitable for routing unless we find a way to reduce the overhead induced by the dynamics.

We see an environment which looks very dynamic, although we do not have a point of reference here. A comparison with other environments is given in section 7.3. The testbeds present rather different environments themselves, for example the average coordinates (as a metric for the connectivity in the network) vary considerably between Twist and the other testbeds, as can be seen in Figure 4.2(b).
Twist is a very dense and well connected testbed, hence the short paths even for our low power level of -15 dBm. Our aim was to evaluate our prototype in several different environments, which our testbeds with different node densities provide.

To explain the results we took a closer look at the nodes’ coordinates, their development and distribution. Figure 4.3(a) shows an example of how a node’s distance to a single beacon develops during an experiment. We see a lot of changes, a high coordinate range, but we also see a concentration of the numbers, which is illustrated in Figure 4.3(b). It shows the distribution of the same coordinate, which is almost a normal distribution with a mean of 10.

[Figure 4.3: two panels: (a) Development of coordinates (Current Hops to Beacon over Beacon Intervals of 10 sec), (b) Distribution of coordinates (Probability over Hops to Beacon).]
Figure 4.3 The distribution of coordinates (i.e. hop distances to a beacon node) shows a lot less dynamics than their actual development.

This is what we want to deal with. The idea of Statistical Vector Routing (SVR) implies that this distribution stays more or less the same over a very long time. The actual current distance of a node is always located somewhere on this distribution. There it may move around a lot, but the probability with which it is at a certain position on the distribution should only change if the network topology is altering significantly. Otherwise a node may just as well be identified with its address distribution rather than with its current address. This is what we want to achieve to reduce the overhead for the updates in the address knowledge base of the network.
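To make the difference between the development of a coordinate and its distribution concrete, the following minimal sketch computes the per-coordinate metrics and the empirical address distribution from a logged series of hop distances. It is an illustration under assumptions, not the evaluation code of the thesis: the names `coordinate_stats` and `address_distribution` are hypothetical, and the input is assumed to be the hop distance of one node to one beacon, sampled once per beacon interval.

```python
from collections import Counter


def coordinate_stats(hops):
    """Summarize one coordinate component over an experiment.

    Assumption: hops is a non-empty list with the node's hop distance
    to a single beacon, one entry per beacon interval.
    """
    changes = sum(1 for a, b in zip(hops, hops[1:]) if a != b)
    intervals = len(hops) - 1
    return {
        "range": max(hops) - min(hops),
        "average": sum(hops) / len(hops),
        # share of beacon intervals in which the coordinate changed
        "change_rate": changes / intervals if intervals > 0 else 0.0,
    }


def address_distribution(hops):
    """Return the empirical probability of every observed hop distance."""
    counts = Counter(hops)
    return {h: c / len(hops) for h, c in sorted(counts.items())}
```

For the series 10, 10, 11, 9, 10, 10, 12, 10 the coordinate changes in 5 of 7 intervals, while the distribution stays concentrated at 10 hops with a probability of 0.625. This is the kind of contrast Figure 4.3 illustrates.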
Our analysis creates confidence regarding this goal, that we can exploit the bursty nature of wireless connections to take advantage of opportunities provided by shortcut links inside the network, which present themselves every now and then. To realize this, however, it is important to find ways to represent and publish the coordinate distributions that keep the induced overhead on a reasonable level. In chapter 7 we will see how much our approach reduces the coordinate dynamics and that routing indeed is possible using our newly designed representation of the coordinate distribution.

5 Design

This chapter comprises the design of the Statistical Vector Routing (SVR) protocol. It starts by listing the design goals we wanted to achieve, as well as the assumptions we made for our design and the aspects we concentrated on as opposed to the aspects we considered out of the scope. A detailed description of SVR’s addressing mechanism is given, followed by an explanation of the statistical values we use to route on and the way routing decisions are made.

5.1 Design Goals

Our goal is to develop a point-to-point routing protocol for wireless sensor networks (WSNs) in contrast to the many-to-one or one-to-many routing schemes widely used (e.g. by the Collection Tree Protocol, CTP, cf. section 2.4.2). In CTP the whole network is organized in a routing tree that roots in a sink node designated to collect all the data produced by the network (or dispense data in it). This is practical if all the data actually has to be sent to one node, while arbitrary point-to-point traffic would have to be sent up the whole tree from the sender to the sink and from there on back down to the destination. This would be the only way in the network the nodes know of and hence the only possibility of transmitting data from one arbitrary node to another.
With our approach, however, every node in a network shall be able to send data to any other node without the packets taking absurdly long paths to reach their destination. Thus, we need an addressing schema that supports variable sender and receiver pairs and navigating in arbitrary directions through the network. We found that the best way to achieve this is to use a virtual coordinate system into which the nodes are mapped. True absolute location information is very hard to obtain on devices as limited as sensor nodes, but the absolute location of the nodes does not concern us anyway. The nodes’ location in relation to each other suffices to aid routing decisions. Hence, we derive locations in relation to members of a certain small subset of designated nodes which function as landmarks (i.e. reference points). These nodes are referred to as beacons or beacon nodes.

[Figure 5.1: Probability over Distance (metres) for neighbors with quality > 0.9 and quality > 0.1.]
Figure 5.1 The probability of finding a good (>0.9) or intermediate (>0.1) quality neighbor decreases with growing distance between the node and its neighbors, but there are possibilities to cover substantial distances from time to time via long distance neighbors [2].

Still, true relative location information needs a lot of effort to be gathered. However, what we are really interested in is not the location of the nodes but their connectivity. Of course these two properties correlate. Nodes which are closer to each other usually have a higher connectivity, i.e. a higher probability of a connection between them and a higher quality of that connection. It is however possible that nodes in direct proximity of each other share a very low quality link, if any at all, or that a node can (at least for a short while) receive packets from a neighbor located very far away.
We want to base addressing and routing on short-term connectivity to adapt quickly to changes in the network topology and to use shortcuts provided by temporarily available connections to nodes relatively far away. Instead of estimating long-term link qualities to identify and restrict usage to constantly reliable links in order to achieve a stable routing topology we want to use every opportunity of sending a packet as far towards the destination as possible. Figure 5.1 illustrates that the probability of finding a neighbor with a good (>90% packet reception rate, PRR) or intermediate (>10% PRR) link quality correlates with the nodes’ distance, but also that a significant number of long range neighbors can exist in a network. For example in this experiment at a distance of 14 metres the chances of finding a good quality neighbor are still almost 10% and for an intermediate quality neighbor even around 20%.

If this very greedy idea is also followed when establishing the nodes’ addresses (i.e. when deriving the distances between them and the beacons), the mechanism is bound to result in frequent changes in the nodes’ addresses. In order to let other nodes know where to send a packet to in case they want to reach a certain participant of the network, addresses have to be published somehow. However, publishing these addresses every time they change would induce unacceptable overhead eliminating any advantage in routing performance there might be. Our goal is to condense the necessary routing information by learning from the dynamics in the network and to arrive at information that can guide the routing process but is more stable than the raw information we gather during the address derivation process.
Therefore, we see the nodes’ addresses as probability distributions and use certain relatively stable statistical values derived from the addresses over time to publish them and to route on, rather than the ever-changing actual coordinates themselves. These statistical values have to be designed in a way that, on the one hand, they are stable enough to keep the overhead for updates on a reasonable level, and on the other hand, are precise and adaptive enough to ensure a good performance when used as basis for routing. Last but not least, the calculation of these values also should be simple enough and the transmitted and stored data has to be small enough for a device as limited as a sensor node to be able to handle them. The whole protocol aims at simplicity to minimize requirements regarding memory, computational power and transmitted data.

To summarize, our design goals are the following:

• SVR has to support point-to-point traffic enabling arbitrary nodes to send packets to any other node in the network.
• It has to use an addressing schema based on virtual coordinates eliminating the necessity of true location awareness for the nodes.
• The nodes’ addresses used for routing have to be stable enough to keep the maintenance overhead on a reasonable level.
• Yet the addresses have to be precise and adaptive enough in capturing node connectivity such that routing success can be sustained.
• The protocol has to impose minimal requirements on the hardware of the nodes and their capacities.

5.2 Assumptions

We make certain assumptions about the conditions we find in a network in which our protocol would offer its functionality. The addressing schema we introduce needs a number of nodes that function as reference points for the virtual coordinates of all the nodes (the so-called beacons), and we assume those to be in place and to stay operational.
This is not to say that SVR will stop working in case of beacon failures (in fact we experienced a certain resilience against those events). However, we assume a mechanism to be in place to specify which nodes function as beacons as well as to make sure that a beacon is being replaced eventually in case it fails. This problem is not covered in this thesis, as there exist well established solutions for it, such as the one used in the Logical Coordinates Routing [5]. Our prototype of SVR also does not support changing the number of beacons at run time, a feature that can easily be added. However, all the concerns regarding beacon maintenance we consider out of the scope of our project, since it opens up a whole new problem space on its own. Questions can be asked about the right number of beacons for a certain network topology as well as the right distribution over the deployment area. A related issue is the one of duplicate coordinates, which can always be resolved by increasing the number of beacons (i.e. of the coordinate space’s dimensionality). One could also think of the fact that it might have advantages to rotate the role of a beacon among the nodes in some manner, although SVR under normal circumstances does not induce an increased burden on the beacon nodes (footnote 1). In our implementation we define a fixed-sized set of nodes at compile time to function as beacons.

Another mechanism we consider already available is a lookup service for addresses in the network. In case data has to be sent from one node to another, the address of the destination has to be found out first. In our experiments we provided this information to the sending nodes over their serial port (USB connector) after first obtaining it from the destination over the same interface (cf. section 7.4).
In general we assume a knowledge base to exist in the network, which every node can query with a node identifier to obtain more or less up to date information about the corresponding node’s address. Again the design of this knowledge base opens up a whole set of questions of its own to be explored. Wireless sensor networks are most likely to lack any kind of central management entity that could be used to store the location information and handle requests to it, making a distributed solution seem appropriate. This leads to questions such as how the information should be distributed throughout the network and how it can be accessed efficiently. We do not address these problems in this thesis. However, our evaluation gives hints about the rate at which such a knowledge base would have to be updated with new addressing information to always contain information recent enough to enable an efficient routing procedure (cf. section 7.2). For location services in wireless sensor networks well known solutions exist [16].

Since SVR is a routing protocol, it only tackles the problem of finding paths for packets from a sender node to a destination and leaves related problems to other specialized protocols. For example we assume the data to be sent by a node to fit into one packet and do not consider packet fragmentation. In very big and/or dense networks with a lot of beacon nodes and a high number of neighbors for each node the beacon message format we designed would likely exceed reasonable packet sizes for a wireless sensor network, but for the testbeds we evaluated SVR on this was not an issue. For really vast networks one has to consider certain optimizations anyway, such as a logical partitioning of the network and locally limited dissemination of information.

5.3 Address Derivation

The first aspect we explain in detail is the mechanism of finding addresses for all the nodes in the network.
This can be considered the basis every routing protocol requires to work, since routing decisions are made according to node addresses. This is why a routing protocol always needs some initialization time when it is set up for the nodes to establish their addresses, except if those addresses are not based on certain network conditions but are determined a priori, which however is very rarely practical in wireless sensor networks. In reality WSNs are often deployed for a very long time, rendering the length of the initialization period negligible.

Footnote 1: Beacons will experience more traffic in case of frequent switching to the fallback and flooding routing modes (cf. section 5.5), usually a result of unfavorable node addresses, which might be due to very low connectivity, and which is likely to be resolved by changing the beacon topology.

In Statistical Vector Routing (SVR) every participating node starts composing its address as soon as it has finished booting. This address is a vector with one component for every beacon in the network. Each component of the vector states the node’s distance to the corresponding beacon. As a metric to measure this distance we use the hop count. This is the minimum number of nodes over which a packet would have to be forwarded to reach the beacon, i.e. the length of the shortest path between the node and the beacon. On this path every hop is a neighbor of the hop in front and behind it on the path. Because we also want to capture short-term changes in the network topology (i.e. also use connections that are only available for a short time), we consider two nodes neighbors if they share a link to communicate with each other for the time being regardless of their history and future prognosis. The correlation between node distance and link quality leads to nodes having neighbor relations with relatively many nodes in their real geographic vicinity and only a few neighbors that are far away.
Our goal is to utilize these farther neighbors as often and as long as possible, since they might provide significant shortcuts towards the destination.

To establish the address vectors of the nodes in the network, the nodes’ coordinate vector has to be communicated to all direct neighbors such that distance trees can form rooting in the beacon nodes. Figure 2.5 in section 2.4.2 illustrates the procedure CTP uses to establish its routing tree which can be considered the one-dimensional special case of the virtual coordinate system we use. Although the picture conveys the impression the beacon node would start sending out its coordinate (which is always zero) and whichever node receives this message starts forwarding it, in SVR all nodes start sending out beacon messages containing their coordinates from the moment they start operating. In the beginning all the vector components are invalid, because no topology information has been gathered so far, but eventually the data sent out will become more and more useful for the node’s neighbors.

The missing distinction between beacon nodes and the rest of the network in their behavior agrees with SVR’s philosophy of considering all nodes equal. A beacon does not behave in a special manner during the address derivation procedure compared to other nodes besides the fact that it always automatically sets one of its coordinate components to zero, namely the one representing the distance to itself. A possible optimization would be for a node not to start participating in the procedure before at least a certain number of its coordinate components has become valid. This would save some transmissions in the initialization phase of the protocol at the beginning while slowing down the whole process.
Furthermore, the initialization time span can be considered insignificant compared to the whole lifetime of the network particularly because if there are sufficient beacon nodes and if they are distributed evenly across the network, it should be the case that every node is reached by messages from its nearest beacon(s) rather soon.

Of course the nodes have to use the information they receive from their neighbors to adjust their own coordinate vector. Every node sends a beacon message (footnote 2) once every beacon interval, which is fixed in our prototype but could also be adjusted according to the degree of change observed in the network. By means of these messages nodes let their neighbors know about their current coordinates (and automatically also about the fact that they are still operational). The coordinates sent along with the beacon packets were derived based on all the beacon messages the node has received during the last n beacon intervals, n being a predefined (or also adaptive) integer value. Because of these messages the node is aware of its neighbors’ distances to all the beacons and according to this information for every beacon determines its neighbor with the shortest distance to this beacon. This node then becomes its parent in the distance tree of the corresponding beacon node.

An important fact to keep in mind is that wireless links are not necessarily symmetric. It might happen that a node receives a beacon message from a neighbor but cannot successfully send packets to this node on its part. That is why the symmetry of the link to a neighbor has to be checked by a node before using the coordinates of this neighbor for the derivation of its own coordinates. This could be done reactively on demand with a probing message in the instant the information is needed, or (as we do) it can be a feature integrated into the beacon messages as well.
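The parent selection described above can be sketched as follows. This is a simplified illustration and not the nesC implementation of the prototype. The names `derive_coordinates` and `INVALID` are hypothetical, the window of the last n beacon intervals is assumed to be already collapsed into one coordinate vector per neighbor, and only neighbors whose link has passed the symmetry check are assumed to be passed in.

```python
INVALID = None  # hypothetical marker for a coordinate that is not yet known


def derive_coordinates(node_id, beacon_ids, neighbor_coords):
    """Derive a node's coordinate vector and its parents.

    neighbor_coords maps a neighbor id to that neighbor's coordinate
    vector (one entry per beacon, INVALID if unknown). Assumption: it
    only contains neighbors with a symmetric link.
    Returns (coords, parents), both with one entry per beacon.
    """
    coords, parents = [], []
    for i, beacon in enumerate(beacon_ids):
        if node_id == beacon:
            # a beacon sets the distance to itself to zero
            coords.append(0)
            parents.append(None)
            continue
        best = None  # (coordinate, neighbor id) of the closest neighbor
        for neighbor, vector in sorted(neighbor_coords.items()):
            coord = vector[i]
            if coord is INVALID:
                continue
            if best is None or coord < best[0]:
                best = (coord, neighbor)
        if best is None:
            coords.append(INVALID)
            parents.append(None)
        else:
            coords.append(best[0] + 1)  # one hop more than the parent
            parents.append(best[1])
    return coords, parents
```

With the beacons 1 and 2, a node 5 that hears neighbor 3 with the vector (1, invalid) and neighbor 4 with the vector (2, 3) obtains the coordinates (2, 4) and the parents 3 and 4. Ties are broken in favor of the lower neighbor id here, which is a choice of this sketch only.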
In SVR, nodes include in their own beacon messages the information from which neighbors they recently received a beacon message. In this way, a node directly detects whether its messages have been received by a neighbor. If so, the beacon message just received completes the symmetry check, since packets evidently could be sent in both directions. The sender of the message is then declared a valid neighbor and can henceforth be used to adjust the node's coordinates. It also means that packets can be routed over this neighbor (cf. section 5.5). Of course, this mechanism also enables a node to detect when its messages are no longer received by a certain neighbor. This manifests itself in the neighbor's beacon messages no longer listing this node among the nodes it received beacon messages from (or in the neighbor's beacon messages no longer arriving at all). In this case the node stops using this neighbor for deriving coordinates and routing packets, even though it might have valid information about the neighbor's coordinates. Another piece of information a node has to include in its beacon messages, to avoid loops in the distance trees of the beacons, is the traces of its parents, i.e. the nodes directly above it in the beacons' distance trees. Every node keeps a trace for every beacon node in the network, which contains the path from the corresponding beacon to the node itself. To save memory on the node and to keep the beacon message packets small, the length of the traces can be bounded, which leads to traces containing only the last part of the path from the beacon to the node, i.e. the t hops nearest to the node itself. With unbounded traces, the mechanism of providing this information (so-called source beaconing) can prevent loops in the distance trees; with bounded traces it at least makes them very unlikely.
If a node receives a beacon message and finds itself in the trace corresponding to a certain vector component, it must not use this neighbor's information to adjust its own coordinates in this component. Otherwise this would produce a loop, since the node would then appear twice in the trace and thereby twice in the path to the beacon node. If a node is already in a trace of one of its neighbors, one would expect this neighbor to have a higher distance to the corresponding beacon node anyway, since the path of the neighbor to the beacon leads over the node itself. However, sudden node failures on the way to a beacon in a sparse network can lead to a count to infinity phenomenon, which source beaconing can help to avoid.
Figure 5.2 Illustration of the count to infinity problem: The node B is a beacon, the others (1, 2, 3) are part of its distance tree. Every row of the picture shows the network topology at one point in time.
Footnote 2: The exact format of our beacon messages can be seen in section 6.2.
This is illustrated in Figure 5.2: As long as the beacon node is connected and operational (first row), the coordinates of the other nodes are 1 for node 1 and node 2, and 2 for node 3, as is to be expected. When the beacon fails (second row), node 1 correctly declares its coordinate invalid, since it no longer has any connection to the beacon. However, the other two nodes do not notice this, as each always finds in the other a neighbor with a valid coordinate for this beacon. They use this neighbor (since they have no other neighbors) to calculate their own coordinate. Node 2 publishes the coordinate 1 while receiving a 2 from node 3. Hence, it assumes that there exists a route of length two from node 3 to B, which is the best route known to node 2. Therefore, it declares node 3 its new parent and sets its own new coordinate to that of node 3 plus one, i.e. 3.
Node 3 on the other hand does the same. Node 2's coordinate (i.e. 1 from the time of the first row) is the minimum coordinate it receives. Thus, node 2 becomes node 3's parent and node 3's coordinate becomes that of node 2 plus one, i.e. 2. This procedure continues: the nodes always pick each other as parents towards beacon B and thereby both increase their coordinates indefinitely. The faulty parent selection has led to a loop in the distance tree comprising the nodes 2 and 3. If source beaconing had been performed, node 2 would have found itself in the trace sent by node 3 and would have refrained from electing node 3 as its new parent. This would have led to invalid coordinates for both nodes, which would have been consistent with the topology.
Notice that we do not use any explicit link estimation algorithm whatsoever. Every time a node receives a beacon message from one of its neighbors, this neighbor is considered alive and available. If the symmetry check described above has succeeded as well, the connection to the respective neighbor is recognized, and the neighbor is immediately considered a potential parent in the beacon nodes' distance trees and a potential next hop for a packet the node has to forward. The idea is that, because of the relatively frequent exchange of beacon messages, a node gets a fairly precise knowledge of which of its neighbors it can communicate with reliably, not by analysing the history of the connection as a long-term link estimator would do, but by implicitly checking every now and then whether the neighbor is still there.
5.4 Statistical Addresses
The problem caused by the frequent refresh of the neighbors' state is that, because of the potential instability of wireless links, the neighbor relations between the nodes tend to change quite often, so that the nodes' addresses do not remain the same over a longer time either.
A node could discover that one of its neighbors has changed its coordinates (or that a new neighbor has come up) and now provides a shorter path to a beacon (i.e. has a smaller corresponding vector component) than the node's former parent for this beacon; this neighbor would then become the new parent. At another time one of the node's parents could become silent (for a while), forcing the node to elect another neighbor as parent, which might have a higher distance to the beacon than the former parent used to have. These events lead to nodes changing their coordinates, and they occur quite frequently, as can be seen in section 4.3. If the nodes issued an update to the address database every time even one component of their coordinates changes, the resulting overhead would be immense, negating all the positive influence this very adaptive algorithm has on routing performance. We therefore need a mechanism that reduces this overhead without giving up the positive influence. We are confident that, although nodes in a certain vicinity have to be aware of these frequent coordinate updates (to calculate their own current coordinates), it is not necessary for good routing performance to communicate them to the whole network. Therefore, in the following we introduce a way to produce condensed addressing information for the nodes, which is more stable and still precise enough to ensure the successful routing of packets, namely a representation of the coordinate distribution over time. To smooth the development of the coordinates and thereby avoid the rapid changes illustrated in section 4.3, the nodes in SVR calculate certain statistical values based on the coordinates and publish those in the network instead of their actual current coordinates. The idea is that the nodes shall learn from the dynamics inside the network. What matters is not the network topology at one instant but a less detailed view of the network developed over time.
Nodes have to notice coordinate changes in their vicinity (as they will because of the periodic beacon messages), adjust their own coordinates accordingly and let their neighbors know about them. Additionally, they have to keep track of their past coordinates and calculate a distribution of their coordinates over a defined time interval. Figure 5.3 illustrates an example of this concept:
Figure 5.3 Distributions as addresses in SVR: The function graph (probability over hop count, 0 to 4) depicts the coordinate of node N, i.e. the probability distribution for its possible hop distances to the beacon node B. a, b, c, and d are nodes on possible routes from B to N. Straight lines signify connections with a high stable quality, while dotted lines represent unstable links that are available only from time to time.
The distribution of the coordinate of node N can be seen in the function graph inside the figure. The node never has the coordinate 0, since it is not a beacon. Coordinate 1 is rare; it occurs only if the unstable direct connection to B can be used. A hop count of 2 between B and N is a bit more common. There are two possible paths for that case, B, a, N and B, b, N, depending on which unstable connection is temporarily available. Most of the time N's coordinate will be 3, as there is a very stable path of this length from B to N (i.e. B, a, b, N), which is supposed to be used every time no shortcut is available. A distance of 4 to B is also very unlikely. It can only occur if the connection between b and N, which is considered stable, breaks. Even then, if the unstable connection between B and b is available at the same time, N's coordinate would be 3 once more. Finally, the unstable connection between a and d might add some probability to the hop counts of 3 and 4. Hence, one can imagine how the depicted coordinate distribution might emerge over time.
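As a minimal sketch of how such a distribution could be derived from a node's history of hop counts to one beacon, the following C function builds a normalized histogram. The names hop_distribution and MAX_HOPS, as well as the convention that values of MAX_HOPS or more (e.g. an invalid-coordinate marker) are skipped, are assumptions made for illustration and are not taken from the prototype.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_HOPS 16 /* assumed upper bound on meaningful hop counts */

/* Fills dist[h] with the fraction of valid history entries equal to h.
 * Entries >= MAX_HOPS (e.g. an assumed "invalid" marker) are ignored.
 * Returns the number of valid samples that were counted. */
size_t hop_distribution(const uint8_t *history, size_t n, double dist[MAX_HOPS])
{
    size_t counts[MAX_HOPS] = {0};
    size_t valid = 0;

    for (size_t j = 0; j < n; j++) {
        if (history[j] < MAX_HOPS) {
            counts[history[j]]++;
            valid++;
        }
    }
    for (size_t h = 0; h < MAX_HOPS; h++) {
        dist[h] = valid ? (double)counts[h] / (double)valid : 0.0;
    }
    return valid;
}
```

For a history such as 3, 3, 2, 3, 1, 3, 4, 3 this yields a distribution that peaks at hop count 3, similar in shape to the one sketched for node N in Figure 5.3.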
We assume that the current coordinates of a node will always change relatively often depending on the dynamics in the network topology, since we chose a very greedy approach for the coordinate establishment, but that the distribution of these coordinates will stabilize eventually. In the ideal case the statistical values calculated from a node's current coordinates stop changing at some point in time, because the changes in the network topology are more or less recurrent and the topology itself essentially stays the same. Only if substantial changes in node connectivity occur, such as the breakdown of big parts of the network, would the condensed statistical values change as well (and they should). They are, however, supposed to provide an abstraction from all the small, possibly temporary, changes that frequently happen in a network.
The coordinate distributions, which every node calculates over time and adjusts when its coordinates change, have to be published by the nodes in the network instead of their transient current coordinates. We do not specifically cover the publication itself in this thesis but assume that every node updates this information in the knowledge base from time to time and has access to the address information about every other node. The key question is, however, how to represent this distribution efficiently with regard to the costs for its creation, storage and distribution, keeping in mind the inherent tradeoff between those costs and the level of precision the information provides. The most precise way would be to always store and communicate a node's whole history of coordinates, which of course is impractical. A way to tackle this tradeoff comes from the fact that not every node in the network needs the same information (i.e. the same level of precision) to route a packet towards its destination. Section 5.5 covers how the routing itself is done in SVR.
In our prototype implementation all the nodes use their mean coordinates to represent their coordinate distribution, i.e. an (unweighted) moving average of the current coordinates over a certain time interval. We found that this is a very simple solution, which still suffices to realize acceptable routing performance. However, we are confident that more sophisticated representations will lead to an even more efficient routing behavior. Consider a node p with current coordinate vector $\langle p_1, \dots, p_r \rangle$, where r is the number of beacons that form the virtual coordinate system and $p_i$ is the distance of node p to beacon number i. Its mean coordinates are then $\langle \bar{p}_1, \dots, \bar{p}_r \rangle$, $\bar{p}_i$ being the mean distance between p and beacon i, defined as
$$\bar{p}_i = \frac{1}{n} \sum_{j=1}^{n} p_{i,j} \qquad (5.1)$$
where $p_{i,j}$ is the jth value in the history kept on the node for the component $p_i$ and n is the size of this history. Every node recalculates this statistical information periodically to update the published values in the knowledge base. In our prototype this is done every time a beacon message is sent out, because the information is included in these messages to make it directly available to the node's neighbors, which need it for routing (cf. section 5.5). The calculation of the distribution can be realized efficiently and can be prepared on the fly during the beacon interval. For details of the calculations see section 6.4.
5.5 Routing
After the participants of the network have successfully derived their coordinates, data packets can be sent through the network guided by these addresses. In routing protocols based on virtual coordinates this usually means that nodes select the neighbor with coordinates most similar to those of the destination as next hop on a route to it, because nodes with similar coordinates are likely to be located in the vicinity of each other.
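The mean coordinates of Equation 5.1, on which the routing described in this section operates, could be maintained along the following lines. This is only a sketch: the names CoordHistory, history_push and mean_coords, the constants N_BEACONS (r) and HISTORY_SIZE (n), the use of floating point, and the assumption that all stored values are valid hop counts are illustrative choices; the calculation actually used in the prototype is described in section 6.4.

```c
#include <stdint.h>

#define N_BEACONS    3 /* r: number of beacons (assumed value) */
#define HISTORY_SIZE 4 /* n: size of the coordinate history (assumed value) */

/* Ring buffer holding the last HISTORY_SIZE current coordinate vectors. */
typedef struct {
    uint8_t samples[HISTORY_SIZE][N_BEACONS];
    uint8_t next;  /* slot that is overwritten by the next push */
    uint8_t count; /* number of slots filled so far */
} CoordHistory;

/* Records the current coordinates, e.g. once per beacon interval. */
void history_push(CoordHistory *h, const uint8_t current[N_BEACONS])
{
    for (int i = 0; i < N_BEACONS; i++) {
        h->samples[h->next][i] = current[i];
    }
    h->next = (uint8_t)((h->next + 1) % HISTORY_SIZE);
    if (h->count < HISTORY_SIZE) {
        h->count++;
    }
}

/* Unweighted moving average per component, as in Equation 5.1. */
void mean_coords(const CoordHistory *h, double mean[N_BEACONS])
{
    for (int i = 0; i < N_BEACONS; i++) {
        unsigned sum = 0;
        for (int j = 0; j < h->count; j++) {
            sum += h->samples[j][i];
        }
        mean[i] = h->count ? (double)sum / h->count : 0.0;
    }
}
```

Since the Coordinates structure of the prototype is a one-byte integer array, an implementation on a sensor node would presumably store the means in an integer or fixed-point form rather than as double values.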
In SVR nodes do not route on the actual current addresses of the nodes in the network but on the statistical values they can query from the knowledge base, i.e. a representation of the destination's coordinate distribution, which in our implementation is its mean coordinates. Again this leads to a tradeoff, namely between the quality of the routing decision and the cost of the calculations it requires. An increased effort spent to derive the best neighbor to forward a packet to can lead to better routing performance, since nodes could consider additional properties of their neighbors besides the similarity of address vectors, such as the neighbor's coordinate stability. However, even the similarity of the vectors can be replaced by another metric, as it is only supposed to measure the distance between the nodes the vectors belong to. Another example of a metric to measure this would be the sum distance illustrated in Figure 5.4.
Figure 5.4 Illustration of the sum distance: b0, b1, and b2 are beacon nodes, S is a sender, D a destination, and n denotes other nodes. Every node is annotated with its current address vector, perfect links assumed. The sum distance between S and D is calculated by deriving the number of hops from S to a beacon and from there to D and averaging this over all beacons. In this example it would be: [(3 + 1) + (2 + 3) + (1 + 3)]/3 = (4 + 5 + 4)/3 = 13/3 ≈ 4.3
A way to combine different routing metrics is for nodes to use them depending on their distance to the destination. If it is still far away, simple mean coordinates can already guide a packet in the right direction towards the destination. As the packet gets closer, nodes having to make a routing decision can try to derive the actual current coordinates of the destination, i.e. its current location on its coordinate distribution.
Our assumption is that nodes located in the vicinity of each other experience a similar shift in their coordinates whenever changes in the network topology occur, since these changes affect all those nodes in a similar way. Hence, a node might be able to infer the destination's current coordinates by assuming that the destination's location on its coordinate distribution has shifted in the same way as the node's own location on its own distribution. Finally, since nodes are aware of the mean coordinates of their direct neighbors, a neighbor can be identified as the packet's destination. If that is the case, the packet should clearly be forwarded directly to this neighbor. As our prototype strives for simplicity regarding the routing metric and because optimizing the routing performance was not our primary concern, we refrained from implementing the second mechanism (deriving the destination's current coordinates) for the time being. A node always calculates the difference (i.e. absolute distance) between each neighbor's mean address vector and that of the destination and forwards a packet to the neighbor minimizing this difference. If $\bar{p}_i$ represents the mean distance of neighbor p to beacon i, calculated as shown at the end of section 5.4, and $C_k(d)$ is the set of the k closest beacons to d, the vector difference $\delta_k$ between the mean address vector of a possible next hop p and that of the destination d considering the k closest beacons is defined as (footnote 3):
$$\delta_k(\bar{p}, \bar{d}) = \sum_{i \in C_k(d)} |\bar{p}_i - \bar{d}_i| \qquad (5.2)$$
Of course, if one of the direct neighbors of a node happens to be the packet's destination, the packet is delivered directly. We included the mean coordinates in the beacon messages such that every node is always aware of its neighbors' coordinates. In addition to this normal routing mode SVR has a fallback mode like the one used by the Beacon Vector Routing (BVR) protocol, which is described in detail in section 3.1.1.
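The normal routing mode, and the point at which the fallback mode would take over, can be sketched as follows. The function names vector_difference and best_next_hop and the constant N_BEACONS are hypothetical. Two further assumptions are made for illustration: the k closest beacons to d are taken to be those with the smallest components of the destination's mean vector, and a neighbor counts as suitable only if its difference is strictly smaller than that of the forwarding node itself. The direct delivery to a neighbor that is the destination is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define N_BEACONS 3 /* r: number of beacons (assumed value) */

/* delta_k(p, d) from Equation 5.2: sum of |p_i - d_i| over the k beacons
 * with the smallest destination components d_i (assumed meaning of C_k(d)). */
double vector_difference(const double p[N_BEACONS], const double d[N_BEACONS],
                         size_t k)
{
    bool used[N_BEACONS] = {false};
    double sum = 0.0;

    if (k > N_BEACONS) {
        k = N_BEACONS;
    }
    for (size_t picked = 0; picked < k; picked++) {
        size_t best = N_BEACONS; /* sentinel: nothing selected yet */
        for (size_t i = 0; i < N_BEACONS; i++) {
            if (!used[i] && (best == N_BEACONS || d[i] < d[best])) {
                best = i;
            }
        }
        used[best] = true;
        double diff = p[best] - d[best];
        sum += diff < 0 ? -diff : diff;
    }
    return sum;
}

/* Returns the index of the neighbor minimizing delta_k, or -1 if no
 * neighbor is closer to the destination than the node itself, in which
 * case the caller would switch to fallback mode. */
int best_next_hop(const double own[N_BEACONS],
                  const double neighbors[][N_BEACONS], size_t count,
                  const double d[N_BEACONS], size_t k)
{
    int best = -1;
    double best_delta = vector_difference(own, d, k);

    for (size_t i = 0; i < count; i++) {
        double delta = vector_difference(neighbors[i], d, k);
        if (delta < best_delta) {
            best_delta = delta;
            best = (int)i;
        }
    }
    return best;
}
```

Calling best_next_hop with k equal to N_BEACONS corresponds to using all available beacons.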
If no suitable neighbor is found to bring the packet closer to the destination, or if the suitable neighbors are not reachable (anymore), the packet is forwarded towards the beacon closest to the destination. If it reaches this beacon, a scoped flooding is initiated. The scope is determined by the component of the destination's address vector that corresponds to the flooding beacon node, since this component represents the distance between destination and beacon and can therefore serve as an estimate of the number of hops necessary to reach the destination.
Footnote 3: k is another parameter, which can be adjusted to save space and computing time in case there are many beacon nodes in the network. However, our implementation uses all available beacons.
6 Implementation
This chapter provides an overview of important implementation details of our prototype for the Statistical Vector Routing (SVR) protocol. First, we explain for which platform it has been developed and give an insight into the implementation of the Beacon Vector Routing (BVR) protocol we used as a basis. Then, we present the data structures we introduced to store the necessary information on the node and to communicate this information to other nodes in the network. Last, we point out how we use these data structures to realize the address derivation procedure we designed (i.e. the way a node keeps track of its neighbors), how we calculate the statistical values to be published in the network (i.e. the coordinate distributions), and how routing decisions are made.
6.1 Prerequisites
We developed our prototype for wireless sensor networks (WSNs), although we believe that it can be implemented for other wireless technologies, such as 802.11-based wireless mesh networks, as well. The network is expected to exhibit the common size and density properties found in most available testbeds.
Our testing and evaluation took place on the MoteLab [21], Indriya [6], and TWIST [11] testbeds (for details about the hardware used in these testbeds see section 4.1). Our prototype was implemented for TinyOS 2.x [14], a widely used operating system for sensor nodes. It supports component-based applications and an event-driven programming approach. One can write software in a C dialect named nesC [9] and use system components provided by TinyOS, such as timer and radio abstractions of the real node hardware, to write one's own components. Every component is supposed to handle a specific aspect of the whole application and can provide methods to be used by other components. To avoid having to implement everything from scratch we used a prototype of BVR [8] and its infrastructure and exchanged the modules responsible for keeping the protocol-relevant state on the node. A more detailed view of the general design of BVR can be found in sections 3.1.1 and 3.2.1. Here we give a brief explanation of the structure of the implementation we had at our disposal.
Figure 6.1 Structure of the relevant part of BVR: The test application component sends data over the router component, which queries the state component for a route before it hands over the data to the communication stack component. The state component delegates the task of finding suitable neighbors for routing to the coordinate table component and uses the communication stack itself to send beacon messages.
Figure 6.1 shows an illustration: A test application component is used to produce data packets which are supposed to be sent to some node in the network. It thereby simulates the activity of a real application being executed on the sensor node.
This application component uses a router component to forward its data packets. It just calls a certain method provided by the router, hands over the packet, and the router component takes care of the delivery. The router also handles data packets that are received by the node. If this node is the packet's destination, the packet is handed over to the application; otherwise it is forwarded to the appropriate neighbor of the current node. Another very important component is the state component, which handles the exchange of beacon messages between the nodes and the resulting address and neighbor maintenance. This component can be queried for the node's current address and for a subset of its neighbors which are suited as possible next hops on the way to a packet's destination. To achieve this the state component uses a coordinate table component which stores the necessary neighbor information. The router and the state component both use the communication stack component to send data or beacon packets, respectively. This component has several subcomponents organized as a stack, which realize a message queue enabling several components to send packets without hindering each other. Furthermore, it prepares the packets to be used by BVR's passive link estimator, which needs an additional packet header with another sequence number to measure packet loss.
6.2 Data Structures
One of the most significant differences between BVR and SVR is that the latter defines two kinds of coordinates for every node: its current coordinates and its mean coordinates. Both are represented in a Coordinates data structure, a one-byte integer array with a cell for every beacon in the network. On the one hand, the current coordinates are the ones derived by the exchange of beacon messages, i.e. they represent the node's distance from all the beacons at one point in time. These values are potentially updated with every beacon message that the node receives (footnote 1).
A detailed description of the address derivation and neighbor maintenance mechanisms can be found in section 6.3. The mean coordinates on the other hand are our chosen representation of the coordinate distribution a node publishes in the network. They are calculated by averaging a history of the current coordinates over the last beacon intervals. Details about this process are given in section 6.4. There is also a list to keep track of the current parents of the node. Only the parents' nodeIDs are stored here, with which every node identifies itself and which are usually assigned to the nodes a priori. Detailed information about the nodes corresponding to the IDs is stored in the coordinate table, which is maintained by a separate component. This coordinate table component provides methods to store, find, update and remove entries, which contain all the information about the node's neighbors that is necessary for the functioning of the protocol. Listing 6.1 shows the structure of a coordinate table entry containing the following data fields:
• valid: This field states if the entry is considered valid. It is used to mark free entries. If an entry is being evicted, only its valid field is set to zero, and in case a new neighbor has to be stored an invalid entry is searched for.
• symmetric: This field contains a 1 if and only if the corresponding node is a symmetric neighbor. Symmetric neighbors are known to receive beacon messages from the node on which the coordinate table is stored and also to successfully send beacon messages back in return.
• first_hop: The value stored here is the same as in addr. It was kept to ensure compatibility with BVR, which apparently supports 2-hop neighbors or plans to support them in the future.
• last_seqno: This is the sequence number of the last beacon packet successfully received from this neighbor. It is used to filter out duplicate messages.
• addr: Here the nodeID of the node corresponding to this entry can be found.
• current_coords: This field contains the current coordinates of the neighbor. They are used to determine the current coordinates of the node to which the coordinate table belongs.
• mean_coords: These are the mean coordinates of the neighbor. They are used to estimate the neighbor's distance to the destination of a packet.
• traces: The path a beacon packet took from the beacon node itself to this neighbor (or at least the last part of this path) has to be kept here to avoid loops in the topology. The Trace data structure is a one-byte integer array of a fixed length to be configured at compile time.
• quality: This field was used by BVR to store the long-term quality of the link to this neighbor. In SVR it is unused, i.e. all neighbor nodes are assigned the highest quality value possible, but it is still kept in the data structure to ensure compatibility.
• age: Here the age of the stored information is kept, i.e. the number of beacon intervals during which no beacon message has been received from this neighbor.
• pos: This is supposed to represent an ordering on the table entries. It is another field that SVR does not use but BVR might.
    typedef nx_struct {
      nx_uint8_t  valid;                     /* 0 marks a free entry */
      nx_uint8_t  symmetric;                 /* 1 iff the link is symmetric */
      nx_uint16_t first_hop;                 /* same as addr (BVR compatibility) */
      nx_uint16_t last_seqno;                /* last beacon sequence number seen */
      nx_uint16_t addr;                      /* nodeID of the neighbor */
      Coordinates current_coords;
      Coordinates mean_coords;
      Trace       traces[N_ROOT_BEACONS];    /* one trace per beacon */
      nx_uint8_t  quality;                   /* unused in SVR */
      nx_uint8_t  age;                       /* silent beacon intervals */
      nx_uint8_t  pos;                       /* unused in SVR */
    } CoordinateTableEntry;
Listing 6.1: CoordinateTableEntry: An entry in the coordinate table contains all the necessary information about a node's neighbor.
Footnote 1: Although the current coordinates assigned to the node at the end of the last beacon interval are stored as well (the so-called last coordinates).
For the address derivation mechanism, nodes interact with their direct neighbors by exchanging beacon message packets. The concrete format of these messages is shown in Listing 6.2.
    typedef nx_struct SVRBeaconMsgData {
      nx_uint8_t  sender;
      nx_uint16_t seq_no;
      Coordinates current_coords;
      Coordinates mean_coords;
      nx_uint8_t  number_of_neighbors;
      nx_uint8_t  neighbors_ids[MAX_NEIGHBORS];
      Trace       traces[N_ROOT_BEACONS];
    } SVRBeaconMsgData;

    typedef nx_struct {
      nx_uint16_t last_hop;
      nx_uint16_t seqno;
    } LEHeader;

    typedef nx_struct SVRBeaconMsg {
      LEHeader         header;
      SVRBeaconMsgData type_data;
    } SVRBeaconMsg;
Listing 6.2: SVRBeaconMsg: This is the beacon message format used by SVR; LEHeader: BVR's link estimator uses this header to calculate PRRs; SVRBeaconMsgData: The actual payload of SVR's beacon message has this structure.
SVRBeaconMsg is the type of the message being transmitted, which contains an additional header (in the header field) used by BVR's link estimator. Here every packet is tagged with a sequence number that enables the neighbors' link estimators to detect packet loss easily and thereby calculate the link's packet reception rate (PRR). Although this header is not used anymore, since SVR does not perform link estimation, we kept the structure for compatibility with the rest of BVR's infrastructure. The real information the node's neighbors need for the functioning of SVR is included in the type_data field, which has the type SVRBeaconMsgData with the following fields:
• sender: This field contains the nodeID of the node the beacon message came from.
• seq_no: This is the sequence number of the beacon message. It is incremented on every node in every beacon interval to enable nodes to detect duplicate messages and missed beacons.
• current_coords: Here the current coordinates of the node are transmitted to its neighbors to participate in the address derivation process.
• mean_coords: The node's mean coordinates are transmitted as well for its neighbors to determine the node's distance to packet destinations.
• number_of_neighbors: In this field the number of the node's neighbors is stored, only to help receiving nodes decode the message, as it determines the length of the neighbors_ids field and thereby the starting position of the traces field inside the packet, which otherwise would be cumbersome to calculate.
• neighbors_ids: This is a list of all the node's neighbors, meaning not only the ones with symmetric links. It is used for the link symmetry check, since it contains the nodeIDs of all the neighbors from which the node received beacons during a defined number of the last beacon intervals.
• traces: To avoid loops in the beacon nodes' distance trees the path from the beacon node to the node sending this beacon has to be transmitted. The packet contains one trace for every beacon node in the network.
    typedef nx_struct {
      nx_uint16_t next_hops[MAX_NEXT_HOPS + MAX_FALLBACK_HOPS];
      nx_uint16_t distances[MAX_NEXT_HOPS];
      nx_uint8_t  n;
      nx_uint8_t  f;
      nx_uint8_t  index;
    } nextHopInfo;
Listing 6.3: nextHopInfo: This represents a collection of neighbors that have been determined as possible next hops for a packet to be forwarded.
When the router component has to forward a packet and therefore has to determine which of the node's neighbors qualify as a next hop on the route, this task is delegated to the coordinate table component. This component determines the best choices according to the neighbor information in the coordinate table. This process is explained in more detail in section 6.5. At the end a list of nodes is returned, collected in a nextHopInfo structure, which is presented in Listing 6.3 and comprises the following data fields:
• next_hops: This is a list of possible next hops for the packet ordered by distance, i.e. the MAX_NEXT_HOPS neighbors closest to the packet's destination.
At the end there are also the next hops for a possible routing in fallback mode, which in our current implementation comprise no more than one entry.
• distances: The list of distances to the destination of the nodes collected in next_hops is given here.
• n: This field stores the number of possible next hops in the first field, which are to be used for the normal (i.e. not fallback) routing mode.
• f: Here is the number of fallback routing options at the end of next_hops.
• index: When routing a packet the entries of next_hops are tried one by one. This field keeps track of the number of the entry which is currently processed.
6.3 Neighbor Maintenance
The main component responsible for the maintenance of neighbor information is the coordinate table component. It basically manages an array of coordinate table entry instances (cf. section 6.2). Since applications developed for TinyOS always follow an event-driven approach, every action has to be triggered by an event. In the case of SVR the important events are the arrival of a beacon message packet and the firing (footnote 2) of the timer that represents the beacon interval, the BeaconTimer. The effects of these events are detailed here:
Footnote 2: In TinyOS an instance t of the timer component can be started as a one-shot or as a periodic timer with a certain time period. When this period is over (in the periodic case: every time it is over), the t.fired() event is triggered and the corresponding event handler function is executed.
Figure 6.2 Effects of the two main events in SVR: If a beacon message is received over the communication stack component, the effects m1 (receive beacon), m2 (store or update neighbor entry) and m3 (maybe update current coordinates) happen, and in case the BeaconTimer fires t1 (restart timer), t2 (clean table, age entries), t3 (maybe update current coordinates), t4 (always update mean coordinates), t5 (provide beacon info) and t6 (send beacon) are performed.
When a message is received during a beacon interval, the information from the node's neighbor contained in the message has to be processed immediately, as there might not be enough space on the node to buffer several messages. The coordinate table is searched for an entry for the sender node, which is then updated with the message's content or newly stored. In addition, the age field of the entry (regardless of whether it is new or old) is (re)set to zero. The details of the aging process are explained further below, as it is used when the BeaconTimer fires. After updating the coordinate table, it has to be determined whether the new information provided by the message has to lead to a change in the node's coordinates. If the newly discovered neighbor (in case it did not exist in the coordinate table) or the known neighbor with new coordinates (in case it already existed in the table) presents a better parent than one of the current ones, it becomes the new parent in the distance tree of the corresponding beacon node, and the current coordinates of the node are adjusted accordingly. However, the neighbor is only considered for the coordinate update if it does not create a loop in the topology and has a symmetric connection to the receiving node. The former is achieved by making sure that the receiving node's nodeID is not part of the trace included in the message that is associated with the beacon in whose tree the neighbor would become the receiving node's parent. For the latter the symmetric field in the coordinate table entry has to be checked, which contains a one if the receiving node's ID was found in the neighbor list of some beacon message received from the neighbor in the last beacon intervals (including the message that was received just now). Asymmetric neighbors are still kept in the coordinate table but are not considered as potential parents. The only case in which the neighbor information is not included in the table is when a new entry would have to be created and the table is full.
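The two conditions for adopting a neighbor as a parent (no loop, symmetric link) can be sketched in plain C as follows. This is only an illustration under stated assumptions, not the prototype's nesC code: the struct NeighborView, the markers INVALID_HOPS and NO_NODE, and the rule that a "better parent" is one offering a strictly smaller hop count to the beacon are hypothetical; only the trace length of 5 is taken from the prototype.

```c
#include <stdbool.h>
#include <stdint.h>

#define TRACE_LEN    5       /* trace length t used in the prototype */
#define INVALID_HOPS 0xFF    /* hypothetical marker: no valid coordinate */
#define NO_NODE      0xFFFF  /* hypothetical marker: unused trace slot */

/* Hypothetical view of one neighbor with respect to ONE beacon. */
typedef struct {
    uint16_t node_id;
    bool     symmetric;          /* neighbor lists us in its beacons */
    uint8_t  hops;               /* neighbor's hop distance to the beacon */
    uint16_t trace[TRACE_LEN];   /* (tail of the) path beacon -> neighbor */
} NeighborView;

/* Returns true if `id` occurs in the trace. */
static bool trace_contains(const uint16_t trace[TRACE_LEN], uint16_t id)
{
    for (int i = 0; i < TRACE_LEN; i++) {
        if (trace[i] == id) {
            return true;
        }
    }
    return false;
}

/* Decide whether the neighbor may replace the current parent in the
 * distance tree of this beacon. `own_hops` is the node's current hop
 * distance to the beacon (INVALID_HOPS if it has none yet). */
bool is_better_parent(const NeighborView *nb, uint16_t self_id, uint8_t own_hops)
{
    if (!nb->symmetric) {
        return false;                 /* asymmetric links are never parents */
    }
    if (nb->hops == INVALID_HOPS) {
        return false;                 /* neighbor itself has no path */
    }
    if (trace_contains(nb->trace, self_id)) {
        return false;                 /* adopting it would close a loop */
    }
    if (own_hops == INVALID_HOPS) {
        return true;                  /* any valid path beats none */
    }
    return nb->hops + 1 < own_hops;   /* assumed criterion: fewer hops */
}
```

In such a sketch the check would run once per beacon for every received beacon message, after the table entry of the sender has been stored or updated.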
Our implementation does not have any neighbor eviction strategy yet except the aging process described below. In our preliminary tests, however, the table very rarely reached its capacity.

When the BeaconTimer fires and thereby marks the end of a beacon interval, the following actions are performed: First, the timer is restarted to begin the next beacon interval³. Next, the coordinate table is cleaned, which means that all entries that have reached a certain age are evicted. This ensures that information about nodes which stay silent for a long time (e.g. due to node failure) does not affect the node's current coordinates and does not block precious space in the coordinate table for long. Our prototype evicts neighbor entries after not having received beacon messages from them for three consecutive beacon intervals. After the table has been cleaned, the age of all the entries that remain in the coordinate table is increased. The value of the age field is to be understood as the number of beacon intervals in which the corresponding neighbor was not heard from. Therefore it is incremented by one every time a beacon interval ends and reset to zero whenever a beacon message from the corresponding neighbor is received.

After making sure that the coordinate table does not contain outdated information, it is checked whether the current coordinates have to be adjusted according to the table's new (clean) content. Although the coordinates are potentially updated with every message received during a beacon interval, it may be the case that one of the node's parents has just been evicted from the table due to its age, which would then call for coordinate reconciliation. Still, the online update of the coordinates with every beacon message received makes sense, since it keeps the coordinates on the node very up to date, which becomes even more important when longer beacon intervals are configured.
However, the state of the current coordinates at the end of a beacon interval is always kept as well (in the last coords) in case it is needed. With the updated current coordinates the mean coordinates can now also be updated, which is explained in detail in section 6.4. At the end the sequence number for the beacons is incremented, and the beacon message packet is filled with the necessary information and handed over to the communication stack component to be sent. Notice that the neighbors ids field in the beacon message is filled with information from all coordinate table entries, not only those of symmetric neighbors, since it is supposed to list all neighbors from which the node sending the beacon message has received something, regardless of whether those neighbors have ever received anything from it.

³ Even though the interval length stays constant (ten seconds in our prototype), a random jitter is always added to it to help avoid beacon message collisions in the network.

6.4 Coordinate Distributions

Every time a beacon interval ends (i.e. the BeaconTimer fires) the current coordinates of the node are updated (cf. section 6.3). This is also the time at which the node's mean coordinates are potentially updated according to the new current coordinates. Every node stores a history of its current coordinates during the last beacon intervals. Our prototype keeps the last 30 coordinates which, assuming a beacon interval length of ten seconds, leads to a history size of five minutes⁴. The history is a ring buffer of Coordinates (cf. section 6.2), to which the current values of the node's current coordinates are appended at the end of each beacon interval, regardless of whether the coordinates have changed during this interval or not.

⁴ We justify all these parameters in section 7.2.

In addition to the history we also keep a representation of the coordinates' distribution on the node.
This is not strictly needed to calculate the node's mean coordinates, as they can be derived directly from the history as well. However, as we will see in section 7.1, it needs only a little memory space (240 bytes in our prototype) and simplifies the mean coordinate calculation. For every beacon in the network it is an array of two-byte integers indexed from zero to some maximum number of hops (20 in our implementation). Each cell holds the number of times this hop count appears in the history of the node's current coordinates. Every beacon interval the values in the cells representing the old coordinates from the last interval are decremented and the cells for the new coordinates are incremented. This procedure ensures a very low overhead for keeping this data structure up to date. Furthermore, there is an array of two-byte integers, one cell for every beacon, counting the number of invalid coordinates. These do not belong in the coordinate distribution but are still important to keep track of, as a very high number of invalid coordinates indicates very bad connectivity of the node.

To calculate the node's mean coordinates one could use the history or (as we do) the distribution table. This is possible because we use a simple unweighted moving average for our mean coordinates. The hop distance to the beacons in each of the past beacon intervals within the history length is considered with an equal impact on the result. Later versions of our protocol might use a more sophisticated calculation, maybe even of totally different statistical values to be published in the network.

6.5 Routing Decisions

Whenever SVR's router component has to send a data packet with a certain destination, it needs a next hop, i.e. a neighbor of the node to forward the packet to⁵. In case the packet's destination can be found among the node's direct neighbors, the packet is of course forwarded there.
Otherwise the neighbor closest to the destination has to be determined, meaning the one whose address vector is the most similar to that of the destination. The state component is queried to elect this next hop, and it first delegates the task to the coordinate table component. Here a size-limited list of symmetric neighbors is generated, in ascending order of their distance to the destination. This distance can be defined in several ways, and in fact there are different distance metrics implemented in our prototype to choose from. However, in our tests and experiments we used the simple absolute vector difference between the destination's and the neighbor's address vector⁶. For the distance calculations only coordinate components that are valid in both the destination's and the neighbor's mean coordinates are used; the others are ignored. The absolute differences of all components are calculated and summed up, leading to a distance value for every neighbor in the coordinate table.

⁵ The beacon packets do not need a destination, as they are always broadcast and not forwarded by other nodes.
⁶ For all these calculations the mean coordinates are used, as those are the coordinates we assume to be available in the network to route on. Their derivation formula can be found in section 5.4.

Only neighbors that undercut the minimal distance to the destination the packet has reached so far are added to the list of possible next hops, i.e. all neighbors that can provide routing progress. To use the packet's real minimal distance, it would have to be carried in the packet itself. To avoid this we use an upper bound for it, namely the distance of the current forwarding node. Theoretically this would guarantee distance progress in every routing step, assuming every node on the way finds a suitable next hop that is closer to the destination than itself. This might fail in case a node does not find such neighbors in its table or the packet does not reach any of them.
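The metric and the candidate selection described above can be sketched in plain C. This is a hedged illustration, not the prototype's code: the Neighbor struct, the INVALID marker, the use of integer mean coordinates and the value of MAX_NEXT_HOPS are assumptions made for this example; only the number of beacons (6) is taken from the prototype.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_BEACONS   6     /* number of beacons used in the prototype */
#define MAX_NEXT_HOPS 3     /* hypothetical list limit */
#define INVALID       0xFF  /* hypothetical marker: component not valid */

typedef struct {
    uint16_t id;
    bool     symmetric;
    uint8_t  mean[NUM_BEACONS];  /* mean coordinates, assumed integer here */
} Neighbor;

/* Sum of absolute component differences; components that are invalid
 * in either vector are ignored. */
uint16_t svr_distance(const uint8_t a[NUM_BEACONS], const uint8_t b[NUM_BEACONS])
{
    uint16_t sum = 0;
    for (int i = 0; i < NUM_BEACONS; i++) {
        if (a[i] == INVALID || b[i] == INVALID) {
            continue;
        }
        sum = (uint16_t)(sum + (a[i] > b[i] ? a[i] - b[i] : b[i] - a[i]));
    }
    return sum;
}

/* Fills out_ids/out_dist with at most MAX_NEXT_HOPS symmetric neighbors
 * that are strictly closer to `dest` than the forwarding node itself,
 * in ascending order of distance. Returns the number of entries. */
uint8_t select_next_hops(const Neighbor *nbs, uint8_t count,
                         const uint8_t own[NUM_BEACONS],
                         const uint8_t dest[NUM_BEACONS],
                         uint16_t out_ids[MAX_NEXT_HOPS],
                         uint16_t out_dist[MAX_NEXT_HOPS])
{
    uint16_t bound = svr_distance(own, dest);  /* upper bound for progress */
    uint8_t n = 0;

    for (uint8_t i = 0; i < count; i++) {
        if (!nbs[i].symmetric) {
            continue;
        }
        uint16_t d = svr_distance(nbs[i].mean, dest);
        if (d >= bound) {
            continue;                          /* no routing progress */
        }
        uint8_t pos = n;                       /* sorted insertion position */
        while (pos > 0 && out_dist[pos - 1] > d) {
            pos--;
        }
        if (pos >= MAX_NEXT_HOPS) {
            continue;                          /* list full, candidate worse */
        }
        uint8_t last = (n < MAX_NEXT_HOPS) ? n : (uint8_t)(MAX_NEXT_HOPS - 1);
        for (uint8_t j = last; j > pos; j--) { /* shift worse entries down */
            out_ids[j] = out_ids[j - 1];
            out_dist[j] = out_dist[j - 1];
        }
        out_ids[pos] = nbs[i].id;
        out_dist[pos] = d;
        if (n < MAX_NEXT_HOPS) {
            n++;
        }
    }
    return n;
}
```

A return value of 0 from select_next_hops corresponds to the situation in which no neighbor in the table offers progress towards the destination.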
If this happens, SVR switches to the fallback routing mode. The fallback mode is the same as in the Beacon Vector Routing (BVR) protocol, which is described in detail in section 3.1.1. The next hop neighbors for the normal routing mode and for the fallback mode are arranged in the same list, in the nextHopInfo structure described in section 6.2, which is returned by the coordinate table via the state component to the router. The router tries to forward the packet to the nodes in this list starting from the top. Every time a certain number of (re)transmissions to a node in the list fail, the next entry is tried. At the end of the list there is at most one fallback entry, namely the parent in the distance tree of the beacon closest to the destination. In case this parent does not exist (meaning the forwarding node does not have a valid component for this beacon) or in case this parent does not receive the packet either, the sending of the packet fails, as both routing modes did not find a next hop. A possible improvement would be to alternatively forward the packet towards the beacon second closest to the destination, the third closest and so on, and to finally resort to flooding the packet in the hope that one of the node's neighbors has better neighbors in its coordinate table. However, aside from the fact that single data packets in WSNs are hardly important enough to justify the resulting overhead in additional transmissions, the event that both routing modes fail to forward the packet is very unlikely. The forwarding node would have to be already very close to the destination, yet not a direct neighbor, and must have no nodes in its coordinate table which are closer to the destination than itself. Even then it would also need to have such bad connectivity that it lacks the one parent it needs to forward the packet to in fallback mode.
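The way the router steps through the combined list can be sketched in plain C on a struct that mirrors nextHopInfo. This is an illustration under assumptions, not the prototype's code: NO_HOP and the value of MAX_NEXT_HOPS are hypothetical, and the sketch assumes that the f fallback entries are stored directly after the n normal entries.

```c
#include <stdint.h>

#define MAX_NEXT_HOPS     3       /* hypothetical list limit */
#define MAX_FALLBACK_HOPS 1       /* at most one fallback entry */
#define NO_HOP            0xFFFF  /* hypothetical marker: list exhausted */

/* Plain C mirror of the nextHopInfo structure. */
typedef struct {
    uint16_t next_hops[MAX_NEXT_HOPS + MAX_FALLBACK_HOPS];
    uint16_t distances[MAX_NEXT_HOPS];
    uint8_t  n;       /* entries for normal routing mode */
    uint8_t  f;       /* fallback entries (assumed to follow directly) */
    uint8_t  index;   /* entry currently being tried */
} NextHopInfo;

/* Candidate currently being tried, or NO_HOP if none is left. */
uint16_t current_hop(const NextHopInfo *info)
{
    if (info->index < info->n + info->f) {
        return info->next_hops[info->index];
    }
    return NO_HOP;
}

/* Nonzero if the current candidate belongs to the fallback part. */
int in_fallback(const NextHopInfo *info)
{
    return info->index >= info->n && info->index < info->n + info->f;
}

/* Called once the (re)transmission budget for the current candidate
 * is used up; moves on to the next entry and returns it. */
uint16_t advance_hop(NextHopInfo *info)
{
    if (info->index < info->n + info->f) {
        info->index++;
    }
    return current_hop(info);
}
```

Once advance_hop returns NO_HOP in such a scheme, both routing modes have failed to provide a next hop and the sending of the packet fails.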
In our experiments this practically never happened, as we always gave the protocol some time in the beginning to establish the coordinates of all the nodes before we started sending data packets. Normally the fallback routine leads to the packet being forwarded towards the beacon closest to the destination. If a node on this way finds a neighbor that can deliver the packet in normal mode, the protocol switches back to this mode. However, if the packet eventually reaches the beacon it aimed for, this beacon initiates a scoped flooding, which is supposed to deliver the packet assuming that the nodes are connected.

7 Evaluation

In this chapter we present the results of our evaluation. First we give an overview of the memory that is needed for the data structures we introduced (cf. section 6.2) and explain which parameters exist to influence the functioning of our Statistical Vector Routing (SVR) prototype and the values we chose for them. The next part refers to section 4.3 of our analysis and compares the dynamics induced by the ephemeral current coordinates shown there with those of our newly introduced mean coordinates (cf. section 5.4). We also compare our results to the ones we obtained with the original Beacon Vector Routing (BVR) protocol implementation to see if we benefit from abandoning traditional link estimation. Furthermore, we compare SVR's and BVR's routing performance. Section 4.1 explains the testbeds we used for this and the experiments we conducted in detail.

7.1 Memory Requirements

Certain parameters influence the behavior of our Statistical Vector Routing protocol. Since they all represent some limit of a data structure size, increasing them always costs memory. Since most of the costs depend on more than one parameter at the same time, we show how much memory it costs (on the nodes and in the beacon messages) to increase each value, assuming the others stay the same.
We identify the following parameters:
• history size (h): This influences the length of the time period from which the current coordinates are taken to derive the node's address distribution, i.e. its mean coordinates.
• distribution size (d): This denotes the maximum hop distance that can be recorded in the structure which stores the coordinate distribution.
• neighbor table size (n): This is the highest number of neighbors a node can be aware of and have information about at the same time.
• trace length (t): This is how much of the path from the node to the beacon is stored for the source beaconing during the address derivation process.
• number of beacon nodes (b): This states how many beacons there are in the network, i.e. the dimensionality of the virtual coordinate system.

The history size (h) parameter has the least influence on the memory footprint. It is used to calculate the coordinate distribution and the mean coordinates. Since the history stores coordinates, the size of its entries depends on b. Aside from the history array itself there is an array of pointers to access its entries. On our sensor nodes pointers need two bytes of memory space, hence the overall impact on the memory consumption of a transition from h to h + 1 is b + 2. Since the history is only transmitted to other nodes in the form of the mean coordinates, the history size does not affect the beacon messages' packet size.

The distribution structure, whose size is determined by the distribution size (d) parameter, keeps track of the hop count frequencies of the nodes. It only takes up two bytes for each possible hop count. However, there is one such distribution structure for each beacon node in the network. Thus, a change from d to d + 1 leads to an increase in memory of 2b bytes, yet it does not have any impact on the size of the beacon packets. This structure contains redundant information, which is also stored in the history itself.
If the explicit coordinates are not needed but their frequencies suffice, the distribution structure could provide the necessary information on its own. Omitting the history would then save a significant amount of memory.

A change in the neighbor table size (n) has a major effect on the memory footprint, since it increases the number of Coordinate Table Entries (CTEs) on the nodes. A CTE takes up 11 + 2b + bt bytes of memory: two coordinates (2b), a trace for every beacon (bt) and 11 bytes of static information. The details about the format of the CTEs can be found in Listing 6.1 in section 6.2. Furthermore, the beacon packet size increases with more neighbors as well (by only 1 byte per additional neighbor), because the nodeIDs of all entries in the coordinate table have to be included there.

Increasing the trace length (t) leads to more memory consumption on the node as well as in the beacon packets. On the nodes the size of the entries in the coordinate table (CTEs) changes, since they all contain a trace for each of the corresponding neighbor's parents. Additionally, if a neighbor becomes a node's new parent, one of those traces has to be extended with the node's nodeID and is henceforth sent around in the beacon messages from that node. Hence, increasing the trace length from t to t + 1 increases the memory footprint by nb bytes, as there are n CTEs and each CTE stores b traces, which get larger. The size of the beacon packets grows by b bytes, one byte for each trace in the packet.

The largest memory growth is caused by an increase in the number of beacon nodes (b), since it affects the size of every data structure that stores coordinates. In total we have a memory increase of 8 + 2d + h + tn + 2n bytes when changing b to b + 1, because we have to store four coordinates in the router and the state component (4), the distribution structure becomes larger (another array and a bigger array for invalid coordinates, hence 2d + 2), we need to store an additional parent nodeID (2), the components in the history become longer (h), and in every CTE the current and the mean coordinates both become longer (2n) and every entry gets an additional trace (tn). In the beacon packets the traces and both coordinates are extended, resulting in a size increase of t + 2. Table 7.1 summarizes the formulas to calculate the memory usage increase and the actual values a parameter change would cause in our prototype.

    parameter  significance          formula (node)           formula (packet)  increase node [bytes]  increase packet [bytes]
    h          history size          b + 2                    –                 8                      0
    d          distribution size     2b                       –                 12                     0
    n          neighbor table size   11 + 2b + bt             1                 53                     1
    t          trace length          nb                       b                 180                    6
    b          number of beacons     8 + 2d + h + tn + 2n     t + 2             288                    7

Table 7.1 Formulas for the memory usage increase for the different parameters

7.2 Finding Parameter Values

The parameters mentioned in the section above not only influence SVR's memory footprint but also have an impact on its functioning. We had to decide on appropriate values for them to conduct our experiments. As explained in section 5.2, we do not consider beacon placement and election to be in our scope. We had good experience with 6 beacon nodes distributed evenly along the edges of the deployment area, so we kept this setting. For the values of the trace length, the distribution size and the maximum number of neighbors we ran some experiments with rather high values to see how close the nodes would get to these upper bounds or if they would even exceed them frequently. In the end we settled on the values of 5 for the trace length, 20 for the distribution size and 30 for the maximum number of neighbors.

The distribution structure is only a way to simplify the calculation of the statistical values to publish. It contains redundant information which is also available in the coordinate history. If memory becomes an issue, we can always refrain from using it.
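The redundancy between the coordinate history and the distribution structure can be made concrete with a small sketch in plain C. It is an illustration under assumptions, not the prototype's nesC code: the struct CoordStats and all function names are hypothetical, hop counts outside the distribution range are simply counted as invalid, the cell that is decremented is assumed to be the one of the history entry being overwritten, invalid entries are assumed to be excluded from the mean, and the mean is returned scaled by 100 to avoid floating point. The sizes (history of 30, distribution of 20 cells, 6 beacons) are those of the prototype.

```c
#include <stdint.h>
#include <string.h>

#define NUM_BEACONS  6     /* b */
#define HISTORY_SIZE 30    /* h */
#define DIST_SIZE    20    /* d */
#define INVALID      0xFF  /* hypothetical marker for an invalid coordinate */

typedef struct {
    uint8_t  history[HISTORY_SIZE][NUM_BEACONS]; /* ring buffer */
    uint8_t  next;                               /* slot overwritten next */
    uint8_t  filled;                             /* number of valid slots */
    uint16_t dist[NUM_BEACONS][DIST_SIZE];       /* hop count frequencies */
    uint16_t invalid[NUM_BEACONS];               /* invalid coordinates */
} CoordStats;

void stats_init(CoordStats *s)
{
    memset(s, 0, sizeof *s);
}

/* Add `delta` (+1 or -1) to the counter that belongs to `hops`. */
static void bump(CoordStats *s, uint8_t beacon, uint8_t hops, int delta)
{
    if (hops == INVALID || hops >= DIST_SIZE) {
        s->invalid[beacon] = (uint16_t)(s->invalid[beacon] + delta);
    } else {
        s->dist[beacon][hops] = (uint16_t)(s->dist[beacon][hops] + delta);
    }
}

/* Called at the end of each beacon interval with the current coordinates. */
void stats_push(CoordStats *s, const uint8_t current[NUM_BEACONS])
{
    for (uint8_t b = 0; b < NUM_BEACONS; b++) {
        if (s->filled == HISTORY_SIZE) {
            bump(s, b, s->history[s->next][b], -1); /* entry leaving window */
        }
        s->history[s->next][b] = current[b];
        bump(s, b, current[b], +1);
    }
    s->next = (uint8_t)((s->next + 1) % HISTORY_SIZE);
    if (s->filled < HISTORY_SIZE) {
        s->filled++;
    }
}

/* Unweighted mean hop count (times 100) from the distribution; -1 if
 * the window holds no valid coordinate for this beacon. */
int32_t mean_from_dist(const CoordStats *s, uint8_t beacon)
{
    uint32_t sum = 0, cnt = 0;
    for (uint8_t h = 0; h < DIST_SIZE; h++) {
        sum += (uint32_t)h * s->dist[beacon][h];
        cnt += s->dist[beacon][h];
    }
    return cnt ? (int32_t)(sum * 100 / cnt) : -1;
}

/* The same mean, computed directly from the history. */
int32_t mean_from_history(const CoordStats *s, uint8_t beacon)
{
    uint32_t sum = 0, cnt = 0;
    for (uint8_t i = 0; i < s->filled; i++) {
        uint8_t h = s->history[i][beacon];
        if (h != INVALID && h < DIST_SIZE) {
            sum += h;
            cnt++;
        }
    }
    return cnt ? (int32_t)(sum * 100 / cnt) : -1;
}
```

Both mean functions return the same value for any sequence of pushes, which is why either structure alone would suffice for an unweighted moving average; with these sizes the dist array occupies 6 x 20 x 2 = 240 bytes.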
The size of the neighbor table was very rarely reached in our experiments. We consider the few times this happens acceptable, as the neighbor table takes a lot of memory space, meaning we cannot make it too large. Furthermore, the table's fill level strongly depends on the transmission power level used, meaning that if the tables on the nodes become too full, we can always reduce the power level to avoid discarding random neighbors. Later, optimizations such as an appropriate eviction strategy will lead to a more efficient use of the table space.

Figure 7.1 Error levels for different history sizes (x-axis: History Size [seconds], logarithmic; y-axis: Error Probability [%]; curves: Beacons, Mean; marker: used by SVR (300 sec; 6.5% error)): The straight line represents the mean of all six beacons. The value chosen for SVR is denoted by the vertical line. It keeps the average error at 6.5%. To get below 5% is only feasible in longer experiments.

The traces are the only data structure that frequently reaches its maximum size, since paths from a beacon to any node in the network may very well exceed the length of 5, at least temporarily. However, to avoid loops in the beacons' distance trees, knowledge about the end of the paths is more important than knowledge about the beginning, since a node would always extend the path at its end. This fact, paired with the rather high memory consumption of the traces, especially inside the beacon packets, made us stick to this value.

The history size parameter strongly affects our statistical values, i.e. the mean coordinates published by the nodes, which are supposed to approximate the real coordinate development and distribution. To explore the effects of different history sizes we used the data from the experiments in which we derived the development of the nodes' current coordinates and which are described in section 4.3.
We partitioned the whole experiment duration such that one part represents the time for which the coordinate information can be stored in the history. A history size of six beacon intervals, for example, would lead to parts of one minute in length, considering a beacon interval length of ten seconds. We derived the coordinate distribution over the whole experiment duration and for the single parts of the duration and compared them using the χ² test described in section 2.5. The test calculates how similar two sets of values are and, based on this, how likely it is that these sets can be represented by the same distribution. We considered its outcome, i.e. the probability that the two distributions are equal, as a metric for the similarity of the distributions and explored the dependency between this similarity and the history size. Of course, the bigger the history size is, the more similar the distributions get, as with a history size as large as the whole experiment duration the probability of both distributions being equal will be one. However, we have to consider the limited capacity of the sensor nodes, which can only keep track of rather few of their last coordinates. We wanted to find out how much of the coordinates' history is needed to produce a distribution that is similar enough to the real distribution such that it can represent it and be used instead.

The results of these calculations are shown in Figure 7.1. We see that the error rates drop rather quickly. To achieve an error rate¹ of less than 5%, which would be a very good approximation, we would need a history size of at least 600 seconds, i.e. 10 minutes. Since our experiments only last 30 minutes because of the limitations dictated by MoteLab and Indriya, this was not feasible for us. We chose a history size of 300 seconds (5 minutes), which still leads to a good error probability of 6.5%.
Furthermore, it does not take up too much space on the nodes and does not make up too large a share of the whole experiment duration. The numbers shown in Figure 7.1 were derived on Twist in an experiment running 24 hours, as this makes the outcome more reliable. Twist does not have a direct upper bound for the experiment length, as is the case on MoteLab and Indriya. When calculated from the data of a 30-minute experiment, however, the error rates are much higher, as the distribution for the whole experiment duration is too small to be useful as a reference.

¹ The error is the complementary probability of the χ² analysis and thereby the chance of the distributions being different.

There are a lot of other parameters in the implementation which can influence the functioning of the protocol, such as the length of the list derived by the coordinate table component containing a packet's possible next hops. However, we consider the impact of these other parameters to be rather small and did not conduct special experiments to determine their optimal values. There might still be room for optimization by adjusting these values, but for our experiments we left them at their default values as given in BVR's implementation. The values for all the parameters mentioned in this section may as well be optimized in dedicated experiments. We consider this future work.

7.3 Coordinate Stability

With the parameters fixed as explained in section 7.2 we ran experiments to evaluate whether our efforts really led to the coordinates being more stable than we had observed in section 4.3. The results presented in Figures 7.2 and 7.3 confirmed this hypothesis. For our three testbeds we compared:
• coordinate range: the difference between the maximum and minimum coordinate
• average coordinate: the average distance to any beacon
• coordinate change rate: the share of beacon intervals in which a coordinate changed

Figure 7.2 Results from the Coordinate Stability experiments: In most cases SVR (with its mean coordinates) produces similar results as BVR or even outperforms it. The unfavorable dynamics observed with SVR's current coordinates are definitely under control.
Panels (each a CDF, P(X ≤ x), with curves for SVR (current coords), SVR (mean coords) and BVR): (a) Coordinate range in MoteLab, (b) Coordinate range in Indriya, (c) Coordinate range in Twist, (d) Average coordinate in MoteLab, (e) Average coordinate in Indriya, (f) Average coordinate in Twist, (g) Coordinate change rate in MoteLab, (h) Coordinate change rate in Indriya, (i) Coordinate change rate in Twist. X-axes: Node Coordinate Range (a-c), Node Average Coordinate (d-f), Node Change Rate [Share of Intervals with Change, %] (g-i).

We always ran separate experiments for SVR and BVR. The metrics were always derived for SVR's current coordinates, i.e. the actual ephemeral node distances to the beacons, for BVR's coordinates, and for the mean coordinates SVR uses for routing.

Figure 7.2 makes clear that the current coordinates (as expected and seen in section 4.2) show a very dynamic behavior, which is much less pronounced with the mean coordinates. In all the cases SVR's mean coordinates produce better results than the ones used in BVR. We also see this very clearly in Figure 7.3, where the average values are presented. SVR always shows a smaller coordinate range, average coordinate, and change rate despite BVR's very cautious approach, where only very reliable links towards the beacon nodes are used in their distance trees. In section 7.4 we will see that our greedier approach leads to fewer hops not only on the paths between the nodes and the beacons (leading to smaller average coordinates), but also on the paths the data packets take through the network.

Figure 7.3 Comparison of the average values from the Coordinate Stability experiments (panels: Coordinate Range, Average Coordinate, Change Rate [%]; bars for SVR and BVR in MoteLab, Indriya and Twist): The values for SVR's current coordinates were constantly off the charts. "SVR" here means the mean coordinates. We omitted the current ones, as they only impaired the graph's readability.

7.4 Routing Performance

The other important question we wanted to answer was whether our approach to stabilizing the virtual coordinates still allows us to achieve a good routing performance on this new addressing scheme. For this we included data transfer in our tests. We ran the experiments for both BVR and SVR to compare their performance. Every experiment lasted 30 minutes, and we let both protocols establish their coordinate system in the first 15 minutes of every experiment.
Since really long experiments were not possible on every testbed, we let the nodes send beacons at a faster rate (every second) in the first 10 minutes to speed up the coordinate system establishment. The last 20 minutes of the experiment beacons were sent with a frequency of 10 seconds as usual. After the initialization phase the data transfer started. We chose ten node pairs, where for every pair a packet from the sender to the receiver node had to take several hops. First we queried the destination node over its serial interface for its mean coordinates2 . The received coordinates were sent to the sender node together with the command to send a burst of 500 packets at a rate of 100ms to the destination. Each time a sender node was done sending the next burst was initiated on the next sender after querying the new destination for its mean coordinates. With ten sender and receiver pairs and this amount of data almost the whole rest of the experiment was filled with data packet transfers. Every time a node received a packet, regardless if it was designated for itself or to be forwarded, a message of the type SVRRouteLoggingMsg was logged for us to reconstruct the routes the packets took through the network. All the information contained in this logging message is taken from the data packet arriving and from 2 Normally the sender node would send a query to the knowledge base in the network. 7. 
Evaluation 1.0 1.0 0.8 0.8 0.8 0.6 0.4 0.2 0.01 0.6 0.4 0.2 SVR BVR 2 3 4 Number of Hops 5 0.00 7 6 (a) Hops needed in MoteLab 0.6 0.4 0.2 SVR BVR 2 4 6 8 10 Number of Hops 12 14 1.0 0.8 0.8 0.8 0.4 0.2 0.00 CDF [P(X x)] 1.0 0.6 0.4 0.2 SVR BVR 5 10 15 Number of Transmissions 20 0.00 4 2 6 Number of Hops 8 10 12 (c) Hops needed in Twist 1.0 0.6 SVR BVR 0.00 16 (b) Hops needed in Indriya CDF [P(X x)] CDF [P(X x)] CDF [P(X x)] 1.0 CDF [P(X x)] CDF [P(X x)] 74 0.6 0.4 0.2 SVR BVR 5 10 15 Number of Transmissions 20 0.00 SVR BVR 2 4 6 8 10 12 Number of Transmissions 14 16 (d) Transmissions needed rate (e) Transmissions needed rate (f) Transmissions needed rate in MoteLab in Indriya in Twist Figure 7.4 Number of hops and transmissions needed in SVR and BVR: The dashed line represents BVR’s performance, the straight line the performance of SVR. In most of the cases SVR outperforms BVR. a data structure the router component maintains for each routing process, which basically contains nodeIDs of the neighbor that forwarded the packet to the node and the one the packet is supposed to be forwarded to in the next routing step. The format of the logging message can be found in Listing 7.1: • origin: This is the nodeID of the node that originally sent the packet triggering this logging message, i.e. the packet’s original sender. • msg id: Here the messageID of the packet is stored, i.e. its sequence number. • mode: This field contains information about the mode the packet was sent in. Details about the normal, fallback and flooding mode are given in section 3.1.1. • last hop: This is the nodeID of the packet’s last hop, i.e. the neighbor that sent the packet to the node issuing this logging message. • this hop: This is the node currently handling the packet, which also includes logging this message. • next hop: Here the node that is supposed to be the next to receive the packet can be found, i.e. 
the first in the list of possible next hops determined via the information in the coordinate table.
• hopcount: This field keeps track of the number of hops the packet has taken so far from its original sender to the node logging this message.
• number_trans: This field, in contrast, records all transmissions needed so far to get the packet to this node, i.e. the hops as well as the retransmissions.
• dest_id: Here the nodeID of the packet’s destination is stored. If a neighbor with this ID is found in the coordinate table, the packet can be delivered directly.
• dest_coords: These are the mean coordinates of the destination, i.e. the address the packet is routed to.

    typedef nx_struct SVRRouteLoggingMsg {
        nx_uint16_t origin;
        nx_uint16_t msg_id;
        nx_uint8_t mode;
        nx_uint16_t last_hop;
        nx_uint16_t this_hop;
        nx_uint16_t next_hop;
        nx_uint8_t hopcount;
        nx_uint8_t number_trans;
        nx_uint16_t dest_id;
        Coordinates dest_coords;
    } SVRRouteLoggingMsg;

Listing 7.1: SVRRouteLoggingMsg: This message is logged every time a node receives a packet, regardless of whether the packet still has to be forwarded or not.

[Figure 7.5: Comparison of the average values for routing. Panels: average number of hops, average number of transmissions, and packet loss [%] for SVR and BVR in MoteLab, Indriya and Twist. SVR reduces the packet loss significantly in every testbed. The number of hops and transmissions is reduced in most cases.]

Figure 7.4 summarizes the results of the routing performance analysis. Regarding the number of hops and transmissions, SVR’s routing performance is mostly better than BVR’s. For MoteLab the CDF lines cross occasionally, but the averages (cf. Figure 7.5) show a clear advantage for SVR, whose magnitude seems to depend on the connectivity of the testbed. Our prototype almost consistently produces better results than BVR.
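To illustrate how averages and CDF values of this kind can be derived from logged SVRRouteLoggingMsg records, the following Python sketch computes the packet loss, the mean number of hops and transmissions, and an empirical CDF value. It is a minimal sketch under stated assumptions, not the evaluation script used for this thesis: the names RouteLog, summarize and empirical_cdf are hypothetical, the records are assumed to be already decoded on the host, and a packet is assumed to count as delivered as soon as its destination (this_hop equal to dest_id) has logged it.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class RouteLog:
    """Hypothetical host-side view of the SVRRouteLoggingMsg fields used here."""
    origin: int        # nodeID of the original sender
    msg_id: int        # sequence number of the packet
    this_hop: int      # node that logged the message
    hopcount: int      # hops taken so far
    number_trans: int  # transmissions so far, including retransmissions
    dest_id: int       # nodeID of the destination


def summarize(logs, sent):
    """Return packet loss [%] and mean hops/transmissions of delivered packets.

    `sent` is the set of (origin, msg_id) pairs that were injected.
    Assumption: a packet is delivered once its destination has logged it.
    """
    if not sent:
        raise ValueError("no packets were sent")
    delivered = {}
    for rec in logs:
        key = (rec.origin, rec.msg_id)
        if key not in sent or rec.this_hop != rec.dest_id:
            continue
        # If duplicates reach the destination, keep the cheapest arrival.
        best = delivered.get(key)
        if best is None or rec.number_trans < best.number_trans:
            delivered[key] = rec
    loss = 100.0 * (len(sent) - len(delivered)) / len(sent)
    hops = [r.hopcount for r in delivered.values()]
    trans = [r.number_trans for r in delivered.values()]
    return {
        "loss_percent": loss,
        "mean_hops": mean(hops) if hops else None,
        "mean_transmissions": mean(trans) if trans else None,
    }


def empirical_cdf(values, x):
    """Fraction of observations that are <= x, an estimate of P(X <= x)."""
    if not values:
        raise ValueError("no observations")
    return sum(1 for v in values if v <= x) / len(values)
```

Identifying a packet by the pair (origin, msg_id) is a design choice of this sketch; it relies on sequence numbers being unique per sender within one experiment.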
The packet loss shows the most significant difference, particularly in the less connected testbeds. Poorer connectivity among the nodes seems to challenge BVR’s rather slow adaptability. Our approach, in contrast, produces slightly more overhead caused by coordinate changes, but it copes well with the network dynamics and still delivers packets at acceptable cost.

8 Conclusion

In this thesis we presented Statistical Vector Routing (SVR), a multi-hop point-to-point routing protocol for wireless sensor networks (WSNs) that is based on virtual coordinates. The challenges of the routing task are caused by the dynamics present in wireless networks. Connections between nodes are not stable and reliable; their quality fluctuates over time. In section 4.3 we saw the extent of these dynamics. The common way for routing protocols to deal with this problem involves a long-term link estimator, which measures the quality of the connections to a node’s neighbors. Routing steps are then restricted to those neighbors to which a connection of very high and stable long-term quality exists. In this way, mostly very close neighbors are used as next hops, because the link quality between nodes correlates with their distance. However, nodes farther away can present opportunities to cover a lot of distance towards the packet’s destination with a single routing hop. These shortcuts can seldom be used by conservative routing protocols because of their cautious link selection. Our approach aims at taking advantage of these opportunities by utilizing connections to all neighbors for routing as well as for the establishment of the virtual coordinate system, regardless of their long-term link quality. Consequently, our design does not need any explicit link estimation and adapts very quickly to changes in the nodes’ connectivity. We use a very simple routing metric, yet our evaluation shows that SVR achieves a routing performance similar to that of Beacon Vector Routing (BVR).
Regarding packet loss, SVR even clearly outperforms BVR. However, this very greedy approach makes it necessary to deal with the overhead resulting from the constant shifts in the nodes’ coordinates due to changing neighbor relations. With our idea of regarding the nodes’ addresses as coordinate distributions, we derive statistical values, which are published by the nodes instead of their ephemeral current coordinates. Our evaluation shows these statistical values to be stable enough to reduce the overhead for keeping them up to date in the knowledge base of the network to an acceptable level. Hence, our chosen representation for these distributions, i.e. the mean coordinates, is precise and stable enough to be used for routing.

8.1 Future Work

In section 7.2 we explored which parameters can influence the functionality of SVR and explained why we use certain values for them. This analysis could be extended to other parameters to further optimize SVR. A candidate for this would be the aging threshold: entries in the coordinate table are deleted if a node does not receive any beacon messages from the corresponding neighbor for a certain time. Together with a more complex eviction strategy, which might take more properties of a neighbor into account, tuning this threshold could lead to a more efficient use of the coordinate table space. We are confident that a more sophisticated distance metric will improve SVR’s routing performance even further, since we consider our small average coordinates a sign of the success of our greedy neighbor selection mechanism. We would like to benefit from this for the routing decisions as well. A related idea is to develop several routing metrics and always use the one that promises the most gain. Right now all nodes in the network have the same addressing information at their disposal.
A possible improvement would be to employ several levels of precision for the information that a node distributes, such that nodes farther away only get a rough idea of it while closer nodes get more precise information for routing. We would also like to perform a stress test of our protocol to see how quickly packets can be delivered and how many nodes can send packets simultaneously. Related to that, SVR’s ability to adapt to changes in the network topology would be another interesting issue to explore. We expect SVR to recover quite fast from node failures. Connected to this is the question of the ideal beacon interval. Right now we are using a rather high frequency of beacon messages, which surely can be reduced, at least after some initialization time. Unfortunately, we were not able to test SVR in really long experiments, which would have yielded further insight into its ability to stabilize. We would expect that after an initialization phase the nodes’ coordinate distributions do not change significantly anymore unless a severe change in the network topology occurs. During normal operation, however, only very few beacon messages would be needed. Of course, this again raises the question of the appropriate history size, i.e. the time period which best represents the nodes’ coordinate distribution, which we already tried to answer for shorter time frames in section 7.2. Our implementation of SVR is a first prototype, which demonstrates that the concept of statistical addresses we developed is feasible and can achieve good performance while being much simpler and consuming fewer resources than existing solutions such as BVR. Although it produces very promising results, there is still room for optimization and improvement.

Bibliography

[1] Alizai, M. H., Landsiedel, O., Bitsch Link, J. A., Goetz, S., and Wehrle, K. Bursty traffic over bursty links.
In Proceedings of the 7th ACM Conference on Embedded Networked Sensor Systems (SenSys’09) (Berkeley, California, November 4-6, 2009).
[2] Becher, A., Landsiedel, O., Kunz, G., and Wehrle, K. Towards short-term wireless link quality estimation. In 7. GI/ITG KuVS Fachgespräch Drahtlose Sensornetze (September 2008), pp. 27–30.
[3] Biswas, S., and Morris, R. Exor: opportunistic multi-hop routing for wireless networks. SIGCOMM Comput. Commun. Rev. 35, 4 (2005), 133–144.
[4] Bitsch Link, J. A., Wehrle, K., Osechas, O., and Thiele, J. Ratpack: Communication in a sparse dynamic network. In ACM SIGCOMM 2008 Poster Proceedings (Seattle, USA, 2008).
[5] Cao, Q., and Abdelzaher, T. A scalable logical coordinates framework for routing in wireless sensor networks. In RTSS ’04: Proceedings of the 25th IEEE International Real-Time Systems Symposium (Washington, DC, USA, 2004), IEEE Computer Society, pp. 349–358.
[6] Doddavenkatappa, M., Chan, M., and Ananda, A. Indriya: A Low Cost, 3D Wireless Sensor Network Testbed. Tech. rep., School of Computing, National University of Singapore (NUS), 2009.
[7] Fonseca, R., Gnawali, O., Jamieson, K., and Levis, P. Four-bit wireless link estimation. In Sixth Workshop on Hot Topics in Networks (HotNets) (November 2007).
[8] Fonseca, R., Ratnasamy, S., Zhao, J., Ee, C. T., Culler, D., Shenker, S., and Stoica, I. Beacon vector routing: Scalable point-to-point routing in wireless sensornets. In 2nd Symposium on Networked Systems Design and Implementation (May 2005).
[9] Gay, D., Levis, P., von Behren, R., Welsh, M., Brewer, E., and Culler, D. The nesc language: A holistic approach to networked embedded systems. In PLDI ’03: Proceedings of the ACM SIGPLAN 2003 conference on Programming language design and implementation (New York, NY, USA, 2003), ACM, pp. 1–11.
[10] Gnawali, O., Fonseca, R., Jamieson, K., Moss, D., and Levis, P. The collection tree protocol (ctp). Tech. rep., TinyOS, 2009.
[11] Handziski, V., Köpke, A., Willig, A., and Wolisz, A. Twist: a scalable and reconfigurable testbed for wireless indoor experiments with sensor networks. In REALMAN ’06: Proceedings of the 2nd international workshop on Multi-hop ad hoc networks: from theory to reality (New York, NY, USA, 2006), ACM, pp. 63–70.
[12] Lee, H., Cerpa, A., and Levis, P. Improving wireless simulation through noise modeling. In IPSN ’07: Proceedings of the 6th international conference on Information processing in sensor networks (New York, NY, USA, 2007), ACM, pp. 21–30.
[13] Levis, P., Lee, N., Welsh, M., and Culler, D. Tossim: accurate and scalable simulation of entire tinyos applications. In SenSys ’03: Proceedings of the 1st international conference on Embedded networked sensor systems (New York, NY, USA, 2003), ACM, pp. 126–137.
[14] Levis, P., Madden, S., Polastre, J., Szewczyk, R., Whitehouse, K., Woo, A., Gay, D., Hill, J., Welsh, M., Brewer, E., and Culler, D. Tinyos: An operating system for sensor networks. In Ambient Intelligence (2004).
[15] Newsome, J., and Song, D. Gem: Graph embedding for routing and data-centric storage in sensor networks without geographic information. In SenSys ’03: Proceedings of the 1st international conference on Embedded networked sensor systems (New York, NY, USA, 2003), ACM, pp. 76–88.
[16] Ortiz, J., Baker, C. R., Moon, D., Fonseca, R., and Stoica, I. Beacon location service: a location service for point-to-point routing in wireless sensor networks. In IPSN ’07: Proceedings of the 6th international conference on Information processing in sensor networks (New York, NY, USA, 2007), ACM, pp. 166–175.
[17] Rao, A., Ratnasamy, S., Papadimitriou, C., Shenker, S., and Stoica, I. Geographic routing without location information. In MobiCom ’03: Proceedings of the 9th annual international conference on Mobile computing and networking (New York, NY, USA, 2003), ACM, pp. 96–108.
[18] Royer, E., and Toh, C.
A review of current routing protocols for ad-hoc mobile wireless networks. IEEE personal communications (1999).
[19] Rubner, Y., Tomasi, C., and Guibas, L. J. A metric for distributions with applications to image databases. In Computer Vision, 1998. Sixth International Conference on (1998), pp. 59–66.
[20] Srinivasan, K., Kazandjieva, M. A., Agarwal, S., and Levis, P. The beta-factor: measuring wireless link burstiness. In SenSys ’08: Proceedings of the 6th ACM conference on Embedded network sensor systems (New York, NY, USA, 2008), ACM, pp. 29–42.
[21] Werner-Allen, G., Swieskowski, P., and Welsh, M. Motelab: a wireless sensor network testbed. In Information Processing in Sensor Networks, 2005. IPSN 2005. Fourth International Symposium on (2005), pp. 483–488.
[22] Woo, A., Tong, T., and Culler, D. Taming the underlying challenges of reliable multihop routing in sensor networks. In SenSys ’03: Proceedings of the 1st international conference on Embedded networked sensor systems (New York, NY, USA, 2003), ACM, pp. 14–27.

List of Figures

2.1 Schematic architecture of a sensor node
2.2 Examples for Sensor Nodes
2.3 Cumulative Distribution Function (CDF)
2.4 Example for Distance Vector Routing
2.5 Creation of a collection tree
2.6 Routing on virtual coordinates
3.1 Illustration for routing steps towards beacons assuring progress while routing further away does not
3.2 Impression of the inability of long-term link estimators to capture short-term fluctuations in link quality
3.3 Structure of the Four-bit link estimator
3.4 Collection tree established by CTP and additional bursty links
3.5 CPDFs and corresponding KW and β values
3.6 Calculation of KW and β of an example link
3.7 Example of an ExOR run
4.1 Distribution of the lengths of packet bursts in a network
4.2 Results from the Coordinate Dynamics experiments
4.3 Development and distribution of coordinates
5.1 Probability of finding a good or intermediate quality neighbor depending on distance
5.2 Illustration of the count to infinity problem
5.3 Distributions as addresses in SVR
5.4 Illustration of the sum distance
6.1 Structure of the relevant part of BVR
6.2 Effects of the two main events in SVR
7.1 Error levels for different history sizes
7.2 Results from the Coordinate Stability experiments
7.3 Comparison of the average values from the Coordinate Stability experiments
7.4 Hops and transmissions needed in SVR and BVR
7.5 Comparison of the average values for routing