The reliability of wireless backhaul mesh networks Geir Egeland 1 , Paal E. Engelstad 2 1 Department of Electrical and Computer Engineering, University of Stavanger, 4036 Stavanger Norway 1 2 2 geir.egeland@gmail.com Telenor Research and Innovation 1331 Fornebu, Norway paal.engelstad@telenor.com Abstract—When planning the structure of a backhaul mesh network for public access, it is common to introduce redundant nodes in the mesh network. These are called redundant, because they do not increase the network capacity under normal operation, due to the shortest-path metric of the routing protocol. Instead, their sole purpose is to increase the network reliability by providing failover links when a link in the shortest-path fails. This paper shows how to estimate the additional network reliability that results from introducing a redundant node. The best location of a redundant node, i.e. where to locate the node in order to maximise the additional network reliability, can also be found. I. I NTRODUCTION The popular IEEE 802.11 standard [1] is widely used for wireless local area networks (WLAN) and is deployed in a wide range of different devices. At the moment, IEEE is developing a new extension to the standard, IEEE 802.11s [2], in order to use the 802.11 technology also for wireless mesh networking. A wireless mesh network (hereafter referred to as a mesh network or a mesh) is an ad hoc network composed of a group of self-configured and self-organised nodes interconnected via wireless links. A node is also referred to as a mesh point (MP) or a router. An MP is referred to as a mesh access point (MAP) if it includes the functionality of 802.11 access points allowing regular 802.11 stations (STAs) to connect to the mesh infrastructure as clients. Furthermore, an MP is referred to as a Mesh Portal (MPP) if it has additional functionality for connecting the mesh network to other network infrastructures. Mesh networking is related to mobile ad hoc network (MANET) technology. While the MANET effort is mainly focusing on the routing aspects of ad hoc networks, 802.11s standardises all necessary functionality for a fully operational network, including network discovery and negotiation of networking parameters, authentication and security, inter-working with other networks and routing. For routing, Radio-Metric AODV or Radio-Aware OLSR can be used. These are the radio-aware versions of the routing protocols AODV [3] and OLSR [4] that were originally developed for MANETs. There are many promising applications of mesh technology. A large group of applications takes advantage of the fact that a mesh network can be set up in an ad hoc fashion. Mesh technology can therefore provide a self-formed temporary communication infrastructure for disaster recovery operations, for communication at conferences or conventions and in tactical military operations. Another group of applications appears as a consequence of the high costs associated with interconnecting the MPs (i.e. routers) with wired links. With mesh technology it is possible to extend the reach of the wired backbone through a wireless backhaul mesh of MPs in a cost-efficient manner. Currently, there are a large number of commercial deployments of such solutions for public access in urban areas, where the nodes are often placed on roof-tops. Such backhaul mesh networks are normally not formed in an ad hoc manner. Instead, the location and placement of each MP, MPP and MAP is carefully planned (network planning). Since the reliability of such mesh networks often is poor and a considerable hurdle for wide deployment, it is common to introduce redundant MPs in the network in order to improve network reliability. STA STA MAP STA Redundant MP MP STA STA MAP MP MP: Mesh Point MPP: Mesh Portal MAP: Mesh Access Point STA: Station MPP Wired infrastructure Figure 1. A backhaul mesh network with a redundant node. Figure 1 illustrates the introduction of a redundant node in a backhaul mesh network. In normal operation, each STA is connected to a MAP, and the STA’s traffic is forwarded along the shortest path between the MAP and the MPP. However, if a link in the shortest path becomes unavailable, the routing protocol ensures that a new path between the MAP and the MPP is formed via the redundant node. Note that due to the shortest-path feature of the routing protocol, there is no load-sharing on the network. This means that the introduction of the redundant node does not increase 1 the overall throughput of the network. Its main function is to improve the network reliability. Thus, in our study, we are solely concerned with the connectivity measures of a wireless mesh network, while other network performance metrics, such as throughput and delay, are not relevant here. For many mesh networks, including commercial backhaul mesh networks in urban areas (e.g. roof-top mesh networks), there is a high site-acquisition cost associated with each node, and it is not economically feasible to introduce too many redundant nodes. Instead, one needs careful network planning in order to estimate the optimal number of redundant nodes. The network planning should include a cost-benefit analysis, where the additional network reliability of adding a redundant MP is weighted against the additional costs (including equipment cost, installation cost, site-acquisition cost and operational cost) of adding the node. While the cost of adding a redundant node is normally easy to forecast, it is more difficult to forecast the additional reliability of adding the node. The main reason is that little have been published about network reliability for mesh networks. Thus, the benefit in terms of improved reliability of adding a redundant node is unknown for the network planner. One component of network planning is to also find a good location for nodes, including the redundant nodes. Applying common sense, it is clear that there are certain mesh topologies that are less favourable than others with regards to reliability. The main purpose of this paper is to analyse the reliability of mesh networks, providing results that can be useful for network planning. By using the methodology presented here, a similar analysis can easily be applied to other mesh topologies. The rest of the paper is organised as follows: In Section II we describe some related work. Section III provides some background on mesh network and the reliability of these. Reliability and availability metrics for a mesh network are presented in Section IV. In Section V we evaluate the effect of redundant MPs in a backhaul mesh network. In Section VI we describe how to obtain the mean time to failure for a backhaul mesh and finally, Section VII concludes the paper. II. R ELATED WORK There has been several studies on two-terminal reliability for wired networks [5][6][7]. The results from most work on network reliability of fixed networks are not generally applicable to wireless networks. The main reason is that in fixed networks the probability of a link failure is so low compared to the probability of a node failure, that in the analysis it is common to consider the link as invulnerable to failure. In wireless networks, on the other hand, link failures occur frequently due to the inherent characteristics of the radio channel, such as radio fading, signal attenuation, radio interference and background noise. The link failure frequency is normally so much higher than the node failure frequency that it is natural to model the nodes as invulnerable to failure, and only focus only on the link failures in the analysis. In [7], however, link failures in fixed ring or double ring network structures are analysed. The work focuses on the design of a physical network topology that meets a high level of reliability using unreliable network elements. It is shown that for independent link failures, network design should be optimised with respect to reliability under high stress, as reliability under low stress is less sensitive to network topology. There are several other reasons why fixed network analyses are not applicable to wireless networks. For example, in fixed networks a link monitoring mechanism, or a Link Integrity Test operation is often used to identify if a link has failed, while there is no equivalent to this in wireless networks. There is, however, only a limited number of studies on network reliability of wireless networks. The early work in [8], analyses radio broadcast networks showing that computing the two-terminal problem for these networks are computational difficult. The work in [9] deals with the problems of computing a measure for the reliability of distributed sensor network and for the expected and the maximum message delay between data sources. The two-terminal reliability of ad hoc networks is computed in [10]. This work focuses on the reliability of nodes and on the effects of node mobility, while the effects of link reliability in static topologies - which we investigate in this paper - are not considered. III. BACKGROUND A. The reliability of the links in the mesh Different layers of the networking stack introduce repair mechanisms that try to remedy and hide the effects of loss of signal or loss of packets due to various radio effects. Such repair mechanisms include modulation and coding techniques at the physical layer and the use of two-way (DATA-ACK) and four-way (RTS-CTS-DATA-ACK) handshakes at the MAC layer, as well as retransmission of lost MAC frames (until the retry counter expires). Furhermore, often the routing module exchanges periodically HELLO-packets with the immediate neighbours to detect link failures (e.g. OLSR), but hides the effects of the unreliable delivery of HELLO-packets by not considering the link as failed before a number of consecutive HELLO-packets (typically three packets) are lost. The routing module also deals with failures of other links in the network (e.g. by the TC-messages of OLSR). The routing protocol hides the effects of a link failure by trying to find an alternative route. However, if the link failure disconnects the mesh networks, it might not be possible to repair the failure. In other words, the mesh network does not fail to provide connectivity between two nodes, before the mesh network is considered as disconnected. In the next subsection, we will investigate reliability metrics for mesh networks, with this routing perspective in mind. B. The reliability of a mesh network The wireless mesh network is composed of the nodes vj ∈ S, and is modelled as an undirected graph G where the nodes vj serve as vertices. Any of two distinct nodes vj and vi create an edge i,j if there is a link between them. The visibility graph is defined as a graph where there is an edge between two nodes, if the two nodes are within radio transmission range of each other. The connectivity graph is defined as a graph where there is an edge between two nodes, if the two nodes are within radio transmission range of each other and if the communication link between the two nodes has not failed. A minimal set of edges in the graph whose removal disconnects the graph is an edge cutset. The minimum cardinality of an edge cutset is the edge connectivity or cohesion β(G). A (minimal) set of nodes that has the same property is a node cutset, and the minimum cardinality of this is the node connectivity χ(G). To provide an adequate measure of network reliability, one has to use probabilistic reliability metrics and a probabilistic graph. This is an undirectional graph where each node has an associated probability of being in an operational state, and likewise for each edge. A starting point for the analysis of network reliability in this paper is the all-terminal and two-terminal reliability of a network. These measures are each a specialised version of the k-terminal reliability, which is defined as the probability that a path exists and connects k nodes in a network (k = n or k = 2). IV. T HE RELIABILITY AND AVAILABILITY OF BACKHAUL MESH NETWORKS A. All-terminal and two-terminal reliabilities As discussed earlier, for our analysis of mesh networks we assume that the nodes vj ∈ S in the topology are invulnerable to failure. Furthermore, we assume that a link s,d connecting two nodes vs and vd fail independent from i,j ∈ S\{vs , vd }. The all-terminal reliability, Pc (G, p), is the probability that all the nodes vj ∈ S in the network have operating paths between them. It is given by: Pc (G, p) = = X Ai (1 − p)i p−i i=n−1 X 1− Ci pi (1 − p)−i (1) (2) i=β where Ai denotes the number of connected subgraph with i edges, and Ci denotes the number of edge cutsets of cardinality i. The two-terminal reliability, on the other hand, is the probability that a given pair of nodes, vs and vd , have an operating path between them: Pcsd (G, p) = X i −i Asd i (1 − p) p (3) i=wsd = 1− X Cisd pi (1 − p)−i (4) nodes vs and vd , αsd is the minimum number of edge failures required to disconnect nodes vs and vd . Cisd is the number of cutsets with respect to nodes vs and vd of cardinality i. B. The network availability The network reliability [11] is defined as the probability that a network G is disconnected at a time t = ta , given that it was not disconnected at the time t = 0, and incorporates the transient behaviour of the network. The network availability, on the other hand, is the steady-state probability that the network is not disconnected as t → ∞. For a network planner, the network availability is an important reliability measure because it says how big share of the time the network is operational. Let us assume that link failures on an operational link are Poisson distributed with an failure rate parameter λ. We also assume that a failed link, , can be repaired, and that the repairs are Poisson distributed with a repair rate parameter µ. Then, the link can be modelled by a two-state Markov diagram, where one state represents the link being up, while the other represents the link being down (Figure 2) λ qu Figure 2. qd A simple two state Markov model of the reliability of a link At steady-state, the state probabilities are: λ µ , qd = (5) qu = µ+λ µ+λ C. The reliability and availability of a backhaul mesh network Consider a mesh network, G, that works as a backhaul mesh network and includes k − 1 different distribution nodes di in D, D = (d1 , d2 , ..., dk−1 ), and one root node r. According to our previous terminology of IEEE 802.11s, a distribution node corresponds to a MAP in an IEEE 802.11s network, while the root node corresponds to an MPP (Figure 1). Under normal network operation, transit traffic in the network is directed along the shortest path between the root node r and each distribution node, di in D. If any distribution node is disconnected from the root node, the network has failed, as it is not operating as intended. Thus, the network planner should consider the network as fully operational only if there is an operational path between the root node and each of the distribution nodes. This is true if, and only if, k nodes, consisting of the root node r and the k − 1 distribution nodes are all connected. Thus, the network planner should analyse the reliability of the network using the k-terminal reliability. When considering the reliability, the network availability is of highest interest. Using the expressions above, we can now find the k-terminal availability as: i=αsd where wsd is the shortest path length between nodes vs and vd , Asd i is the number of subgraphs with i edges that connect µ Pcr,d1 ,··,dk−1 (G) = 1 − X i=β r,d1 ,··,dk−1 Ci i (qd ) (1 − qd ) −i (6) X i −i Ci (qd ) (1 − qd ) 1.0 0.6 0.4 p = 1 × 10−1 p = 3 × 10−1 p = 5 × 10−1 0.6 0.5 0.3 0.0 0.0 0.2 0.4 0.6 Probability of link failure (p) 0.8 0.2 1.0 (a) Availability {r, d5 , d6 } 0 1 2 3 4 Number of redundant MPs 5 6 (b) Effect of redundant MPs Service availability when adding redundant MPs We can also deduce from Figure 4 that the effect of adding redundant MPs is greatest for a probability of link failures in the approximate range p ∈ {0.1, 0.5}. In Figure 4 b) we show the increase in availability as we add redundant MPs, providing a clearer picture of how they improve the availability. (8) MAP d6 MAP d6 MP d4 MP d4 MP d2 MP d2 d10 where Wk is a weighting factor. V. E VALUATION In this section we evaluate the availability for a topology (Figure 3) where two MAPs are distributed and connected to a MPP. With respect to network planning, we are interested in finding the availability for the service provided by the MAPs, thus one has to determine the steady-state condition of the three-terminal reliability Pcr,d5 ,d6 . MAP d6 MAP d6 d10 MP d4 MP d4 d8 d12 MP d2 MP d2 d7 d9 d11 r d1 d3 d5 r d1 d3 d5 MPP MP MP MAP MPP MP MP MAP Figure 3. 0.7 0.4 Figure 4. Pc (G, D, r) = W1 Pcr,d1 + W2 Pcr,d2 , · · · , Wk−1 Pcr,dk−1 0.8 0.2 (7) Finally, since some MAPs might be of higher importance than other MAPs, the network planner might instead prefer to use a weighted average of the different two-terminal availabilities: The availability of a backhaul mesh network, Pc (G, D, r), can then be defined as a combination of different two-terminal availabilities: 0.9 Availability (Pcr,d5 ,d6 ) 0.8 i=β (a) Initial topology 1.0 No redundancy With d7 With d7 , d8 With d7 , · · ·, d9 With d7 , · · ·, d10 With d7 , · · ·, d11 With d7 , · · ·, d12 (b) With redundant MPs d8 d7 r d1 d3 d5 MPP MP MP MAP (a) Initial topology Figure 5. d11 d1 d3 d5 MPP MP MP MAP An alternate deployment of redundant MPs An alternate configuration for the redundant MPs, is to deploy them geographically closer to the backhaul mesh as illustrated in Figure 5. Assuming the redundant MPs have equivalent transmission range as the ones in Figure 3, positioning the MPs closer to the backhaul mesh nodes, will increase the number of links from the redundant MPs to the backhaul. Figure 6 shows the improved reliability by adding redundant MPs according to the topology in Figure 5. Topology with and without redundant MPs 1.0 1.0 No redundancy With d7 With d7 , d8 With d7 , · · ·, d9 With d7 , · · ·, d10 With d7 , · · ·, d11 Availability (Pcr,d5 ,d6 ) 0.8 In the network planning phase, a cost-benefit analysis needs to be performed in order to determine cost versus the added reliability introduced by a redundant MP. Ignoring the cost factor of adding redundant MPs, we plot the availability and the effect when an extra MP is added to the network (Figure 4). The redundant MPs are added in a particular order, d7 , · · · , d12 (Figure 3). From Figure 4 a), we see that the availability is increasing as we add the redundant nodes. However, adding MP d12 only d9 r (b) With redundant MPs 0.9 0.8 Availability (Pcr,d5 ,d6 ) Pc (G) = 1 − add negligible improvement to the availability, since at this point, all the nodes in the backhaul has at least two links on which their service can be reached. Adding d12 , does not change the number of links connecting the backhaul, and can therefore be removed from the network without any noticeable loss in availability. Availability (Pcr,d5 ,d6 ) With current IEEE 802.11s equipment, it is possible to let the STAs connect to the MAPs at one frequency band (e.g. using 802.11b or 802.11g) and use another frequency band for the communications between the MPs in the backhaul mesh network. Since the extra equipment cost of such a configuration is normally minimal compared to other costs, such as site-acquisition costs, it is anticipated that many commercial mesh networks will implement a MAP at each MP in the network. With such a configuration, one has to consider the all-terminal availability of the network. The (all-terminal) network availability is the probability of a network G being connected at steady state. It can now be expressed using Eq. 25 as: 0.6 0.4 0.7 0.6 0.5 0.4 0.2 p = 1 × 10−1 p = 3 × 10−1 p = 5 × 10−1 0.3 0.0 0.0 0.2 0.4 0.6 Probability of link failure (p) 0.8 1.0 (a) Availability {r, d5 , d6 } Figure 6. 0.2 0 1 2 3 Number of redundant MPs 4 5 (b) Effect of redundant MPs Service availability when adding redundant MPs VI. T RANSIENT NETWORK RELIABILITY 35 80 70 A. The transient all-terminal reliability without link repair In this section we investigate the all-terminal reliability for arbitrary mesh topologies. We assume that networks starts out with all links between two nodes within radio range of each other is fully operational, so that the connectivity graph is equal to the visibility graph. We further assume that link failures are Poisson distributed with parameter λ and that when a link fail, it can not be repaired. The probability of a topology G being disconnected is then given by Pd (G, t < T ) = X Ci (1 − e−λt )i (e−λt )−i (9) i=β It is easy to show that the mean time for when the topology will fail, is equal to Eq. 10, where B(x, y) is the complete Beta function. −1 1 X (−1)n+1 X − Ci B(i + 1, − i) (10) E(Td ) = λ n=1 n n i=β B. The transient k-terminal reliability without link repair Similarly, we can show that the expected time that k-nodes will be disconnected, is given by: 1 E(Tdvi ,··,vk ) = × λ −1 n+1 X X (−1) − Civi ,··,vk B(i + 1, − i) (11) n n n=1 i=β Using Eq. 11, we can find the MTFF for the topologies in Section V. This results are shown in Figure 7. Percentage increase in MTFF [%] While the network availability is a useful reliability measure, it does not provide a full picture of the network reliability, since it is a steady-state measure. For example, it gives the share of time the network is up, but it does not provide insight into the mean time to failure (MTTF) given that the network is connected at time t = 0 or the mean time between failures (MTBF). To estimate such time-dependent measures, one needs to study the transient behaviour of the k-terminal reliability. Analysis of transient behaviour normally becomes very complex when considering link failures with repair. Thus, when studying transient reliability, it is not uncommon to simplify the analysis by considering link failures without repair. We will do the same in this section. However, assuming that links that fail have infinite repair time is not very realistic. We therefore intend to continue our analysis to consider link failures with repair, and plan to address this topic in follow-on research. Percentage increase in MTFF [%] 30 25 20 15 10 5 60 50 40 30 20 1 2 3 4 Number of redundant MPs 5 6 (a) Topology in Figure 3 10 1.0 1.5 2.0 2.5 3.0 3.5 Number of redundant MPs 4.0 4.5 5.0 (b) Topology in Figure 5 Figure 7. Relatively variation in MTFF as the number of redundant MPs is increased (λ = 1) VII. C ONCLUSION AND FUTURE WORK This paper investigates the reliability and availability of backhaul mesh networking, more specifically finding the benefit of adding redundant nodes in such networks. To the best of our knowledge, this has not been studied in previous works. Our results show how redundant nodes improve network reliability. Although analyses and results depend on the actual topology, the same analyses as presented here can be applied to any specific backhaul mesh topology of interest. For future work we want to analyze the OLSR ad hoc routing protocol and use a Markov model to describe the onehop neighbour discovery mechanism as a probability function for whether the link is interpreted as up or down. Also, we want to analyse the MAC- and physical- layer behaviour and derive a more adequate probability distribution for link failures in mesh networks. In a wireless environment, a link failure is often correlated with link errors in other parts of the network, thus we also want to include link failure dependencies in the analysis. R EFERENCES [1] Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specification, IEEE Std. 802.11, 1997. [2] LAN/MAN Specific Requirements - Part 11: Wireless Medium Access Control (MAC) and physical layer (PHY) specifications: Amendment: ESS Mesh Networking, IEEE Std. 802.11s Draft 2.0, 2008. [3] C. Perkins, E. Belding-Royer, and S. Das, “Ad hoc on-demand distance vector (aodv) routing,” IETF, July 2003. [4] T. Clausen and P. Jacquet, “Optimized link state routing protocol (olsr),” IETF, October 2003. [5] C. J. Colbourn, “Reliability issues in telecommunications network planning,” in Telecommunications network planning, B. S. P. Soriano, Ed. Kluwer Academic Publishers, 1999, ch. 9, pp. 135–146. [6] ——, The Combinatorics of Network Reliability. Oxford Univ. Press, 1987. [7] G. Weichenberg, V. Chan, and M. Medard, “High-reliability topological architectures for networks under stress,” Selected Areas in Communications, IEEE Journal on, vol. 22, no. 9, pp. 1830–1845, Nov. 2004. [8] H. AboElFotoh and C. J. Colbourn, “Computing 2-terminal reliability for radio-broadcast networks,” Reliability, IEEE Transactions on, vol. 38, no. 5, pp. 538–555, Dec 1989. [9] H. AboElFotoh, S. Lyengar, and K. Chakrabarty, “Computing reliability and message delay for cooperative wireless distributed sensor networks subject to random failures,” Reliability, IEEE Transactions on, vol. 54, no. 1, pp. 145–155, March 2005. [10] S. Kharbash and W. Wang, “Computing two-terminal reliability in mobile ad hoc networks,” Wireless Communications and Networking Conference, 2007.WCNC 2007. IEEE, 11-15 March 2007. [11] M. L. Shooman, Reliability of Computer Systems and Networks: Fault Tolerance, Analysis, and Design. John Wiley and Sons, Inc, 2002.