The reliability of wireless backhaul mesh networks
Geir Egeland (1), Paal E. Engelstad (2)
(1) Department of Electrical and Computer Engineering, University of Stavanger, 4036 Stavanger, Norway, geir.egeland@gmail.com
(2) Telenor Research and Innovation, 1331 Fornebu, Norway, paal.engelstad@telenor.com
Abstract—When planning the structure of a backhaul mesh
network for public access, it is common to introduce redundant
nodes in the mesh network. These are called redundant, because
they do not increase the network capacity under normal operation, due to the shortest-path metric of the routing protocol.
Instead, their sole purpose is to increase the network reliability by
providing failover links when a link in the shortest path fails. This
paper shows how to estimate the additional network reliability
that results from introducing a redundant node. The best location
of a redundant node, i.e. where to locate the node in order to
maximise the additional network reliability, can also be found.
I. INTRODUCTION
The popular IEEE 802.11 standard [1] is widely used for
wireless local area networks (WLAN) and is deployed in a
wide range of different devices. At the moment, IEEE is
developing a new extension to the standard, IEEE 802.11s
[2], in order to use the 802.11 technology also for wireless
mesh networking.
A wireless mesh network (hereafter referred to as a mesh
network or a mesh) is an ad hoc network composed of a group
of self-configured and self-organised nodes interconnected via
wireless links. A node is also referred to as a mesh point
(MP) or a router. An MP is referred to as a mesh access point
(MAP) if it includes the functionality of 802.11 access points
allowing regular 802.11 stations (STAs) to connect to the mesh
infrastructure as clients. Furthermore, an MP is referred to
as a Mesh Portal (MPP) if it has additional functionality for
connecting the mesh network to other network infrastructures.
Mesh networking is related to mobile ad hoc network
(MANET) technology. While the MANET effort is mainly
focusing on the routing aspects of ad hoc networks, 802.11s
standardises all necessary functionality for a fully operational
network, including network discovery and negotiation of networking parameters, authentication and security, inter-working
with other networks and routing. For routing, Radio-Metric
AODV or Radio-Aware OLSR can be used. These are the
radio-aware versions of the routing protocols AODV [3] and
OLSR [4] that were originally developed for MANETs.
There are many promising applications of mesh technology.
A large group of applications takes advantage of the fact that
a mesh network can be set up in an ad hoc fashion. Mesh
technology can therefore provide a self-formed temporary
communication infrastructure for disaster recovery operations,
for communication at conferences or conventions and in
tactical military operations.
Another group of applications appears as a consequence of
the high costs associated with interconnecting the MPs (i.e.
routers) with wired links. With mesh technology it is possible
to extend the reach of the wired backbone through a wireless
backhaul mesh of MPs in a cost-efficient manner. Currently,
there are a large number of commercial deployments of such
solutions for public access in urban areas, where the nodes are
often placed on roof-tops.
Such backhaul mesh networks are normally not formed in
an ad hoc manner. Instead, the location and placement of each
MP, MPP and MAP is carefully planned (network planning).
Since the reliability of such mesh networks often is poor and
a considerable hurdle for wide deployment, it is common to
introduce redundant MPs in the network in order to improve
network reliability.
Figure 1. A backhaul mesh network with a redundant node. (MP: Mesh Point, MPP: Mesh Portal, MAP: Mesh Access Point, STA: Station.)
Figure 1 illustrates the introduction of a redundant node in
a backhaul mesh network. In normal operation, each STA is
connected to a MAP, and the STA’s traffic is forwarded along
the shortest path between the MAP and the MPP. However,
if a link in the shortest path becomes unavailable, the routing
protocol ensures that a new path between the MAP and the
MPP is formed via the redundant node.
Note that due to the shortest-path feature of the routing
protocol, there is no load-sharing on the network. This means
that the introduction of the redundant node does not increase
the overall throughput of the network. Its main function is to
improve the network reliability. Thus, in our study, we are
solely concerned with the connectivity measures of a wireless
mesh network, while other network performance metrics, such
as throughput and delay, are not relevant here.
For many mesh networks, including commercial backhaul
mesh networks in urban areas (e.g. roof-top mesh networks),
there is a high site-acquisition cost associated with each node,
and it is not economically feasible to introduce too many
redundant nodes. Instead, one needs careful network planning
in order to estimate the optimal number of redundant nodes.
The network planning should include a cost-benefit analysis,
where the additional network reliability of adding a redundant MP is weighed against the additional costs (including
equipment cost, installation cost, site-acquisition cost and
operational cost) of adding the node. While the cost of adding
a redundant node is normally easy to forecast, it is more
difficult to forecast the additional reliability of adding the
node. The main reason is that little has been published about
network reliability for mesh networks. Thus, the benefit in
terms of improved reliability of adding a redundant node is
unknown for the network planner.
One component of network planning is to also find a good
location for nodes, including the redundant nodes. Applying
common sense, it is clear that there are certain mesh topologies
that are less favourable than others with regards to reliability.
The main purpose of this paper is to analyse the reliability
of mesh networks, providing results that can be useful for
network planning. By using the methodology presented here, a
similar analysis can easily be applied to other mesh topologies.
The rest of the paper is organised as follows: In Section
II we describe some related work. Section III provides some
background on mesh networks and their reliability.
Reliability and availability metrics for a mesh network are
presented in Section IV. In Section V we evaluate the effect
of redundant MPs in a backhaul mesh network. In Section
VI we describe how to obtain the mean time to failure for a
backhaul mesh and finally, Section VII concludes the paper.
II. RELATED WORK
There have been several studies on two-terminal reliability
for wired networks [5][6][7]. The results from most work
on network reliability of fixed networks are not generally
applicable to wireless networks. The main reason is that in
fixed networks the probability of a link failure is so low
compared to the probability of a node failure, that in the
analysis it is common to consider the link as invulnerable to
failure. In wireless networks, on the other hand, link failures
occur frequently due to the inherent characteristics of the
radio channel, such as radio fading, signal attenuation, radio
interference and background noise. The link failure frequency
is normally so much higher than the node failure frequency
that it is natural to model the nodes as invulnerable to failure,
and to focus only on the link failures in the analysis.
In [7], however, link failures in fixed ring or double ring
network structures are analysed. The work focuses on the
design of a physical network topology that meets a high level
of reliability using unreliable network elements. It is shown
that for independent link failures, network design should
be optimised with respect to reliability under high stress,
as reliability under low stress is less sensitive to network
topology.
There are several other reasons why fixed network analyses
are not applicable to wireless networks. For example, in fixed
networks a link monitoring mechanism, or a Link Integrity Test
operation is often used to identify if a link has failed, while
there is no equivalent to this in wireless networks.
There is, however, only a limited number of studies on
network reliability of wireless networks. The early work in
[8] analyses radio broadcast networks, showing that solving
the two-terminal reliability problem for these networks is computationally
difficult. The work in [9] deals with the problems of computing
a measure for the reliability of distributed sensor networks and
for the expected and the maximum message delay between
data sources. The two-terminal reliability of ad hoc networks
is computed in [10]. This work focuses on the reliability of
nodes and on the effects of node mobility, while the effects
of link reliability in static topologies - which we investigate
in this paper - are not considered.
III. BACKGROUND
A. The reliability of the links in the mesh
Different layers of the networking stack introduce repair
mechanisms that try to remedy and hide the effects of loss
of signal or loss of packets due to various radio effects. Such
repair mechanisms include modulation and coding techniques
at the physical layer and the use of two-way (DATA-ACK)
and four-way (RTS-CTS-DATA-ACK) handshakes at the MAC
layer, as well as retransmission of lost MAC frames (until the
retry counter expires). Furthermore, the routing module often
periodically exchanges HELLO-packets with its immediate
neighbours to detect link failures (e.g. OLSR), but hides the
effects of the unreliable delivery of HELLO-packets by not
considering the link as failed before a number of consecutive
HELLO-packets (typically three packets) are lost.
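As a rough illustration of how this rule masks packet loss, the sketch below computes the expected number of HELLO intervals before a healthy link is wrongly declared failed. It rests on a simplifying assumption that is not made in this paper, namely that each HELLO-packet is lost independently with a fixed probability; the function names and the loss probabilities are hypothetical.

```python
def expected_intervals_until_declared_failed(loss_prob, threshold=3):
    """Expected number of HELLO intervals until `threshold` consecutive losses.

    Assumes independent losses with probability loss_prob (0 < loss_prob < 1).
    This is the closed form for the waiting time of a run of identical outcomes.
    """
    run = loss_prob ** threshold
    return (1 - run) / ((1 - loss_prob) * run)


def expected_intervals_markov(loss_prob, threshold=3):
    """Same quantity from the absorbing Markov chain on the loss-run length.

    State j is the current number of consecutive losses. Writing the expected
    remaining time as E_j = a_j + b_j * E_0, with E_threshold = 0, and using
    E_j = 1 + loss_prob * E_{j+1} + (1 - loss_prob) * E_0, backward
    substitution gives a_0 and b_0.
    """
    a, b = 0.0, 0.0
    for _ in range(threshold):
        a, b = 1 + loss_prob * a, loss_prob * b + (1 - loss_prob)
    return a / (1 - b)


# Hypothetical loss probabilities per HELLO-packet.
for loss in (0.1, 0.5):
    print(loss, expected_intervals_until_declared_failed(loss))
```

Under this independence assumption, a lower per-packet loss probability sharply increases the time between false link-failure declarations, which is the hiding effect described above.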
The routing module also deals with failures of other links in
the network (e.g. by the TC-messages of OLSR). The routing
protocol hides the effects of a link failure by trying to find an
alternative route. However, if the link failure disconnects the
mesh network, it might not be possible to repair the failure.
In other words, the mesh network fails to provide
connectivity between two nodes only when the mesh network is
disconnected. In the next subsection, we will
investigate reliability metrics for mesh networks, with this
routing perspective in mind.
B. The reliability of a mesh network
The wireless mesh network is composed of the nodes vj ∈
S, and is modelled as an undirected graph G where the nodes
vj serve as vertices. Any two distinct nodes vi and vj create
an edge i,j if there is a link between them.
The visibility graph is defined as a graph where there is an
edge between two nodes, if the two nodes are within radio
transmission range of each other. The connectivity graph is
defined as a graph where there is an edge between two nodes,
if the two nodes are within radio transmission range of each
other and if the communication link between the two nodes
has not failed.
A minimal set of edges in the graph whose removal disconnects the graph is an edge cutset. The minimum cardinality
of an edge cutset is the edge connectivity or cohesion β(G).
A (minimal) set of nodes that has the same property is a
node cutset, and the minimum cardinality of this is the node
connectivity χ(G).
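The definitions above can be sketched in a few lines of code. The example below builds a visibility graph from node positions and a radio range, derives a connectivity graph by removing failed links, and finds the edge connectivity by exhaustive search. The node layout, the radio range and all function names are hypothetical, and the brute-force search is only practical for small graphs.

```python
from itertools import combinations
from math import dist


def visibility_graph(positions, radio_range):
    """Edges between all node pairs within radio range of each other."""
    return {frozenset(pair) for pair in combinations(positions, 2)
            if dist(positions[pair[0]], positions[pair[1]]) <= radio_range}


def is_connected(nodes, edges):
    """Depth-first search: True if every node is reachable from the first."""
    nodes = list(nodes)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        u = stack.pop()
        for edge in edges:
            if u in edge:
                (v,) = edge - {u}
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
    return len(seen) == len(nodes)


def edge_connectivity(nodes, edges):
    """Minimum cardinality of an edge cutset, found by exhaustive search."""
    if not is_connected(nodes, edges):
        return 0
    for size in range(1, len(edges) + 1):
        for cut in combinations(edges, size):
            if not is_connected(nodes, edges - set(cut)):
                return size
    return 0  # single-node graph: nothing to disconnect


# Hypothetical layout: four nodes on the corners of a unit square.
positions = {"r": (0, 0), "a": (1, 0), "b": (1, 1), "c": (0, 1)}
visibility = visibility_graph(positions, radio_range=1.1)

# The connectivity graph is the visibility graph minus the failed links.
failed_links = {frozenset(("r", "a"))}
connectivity = visibility - failed_links

print(edge_connectivity(positions, visibility))
print(edge_connectivity(positions, connectivity))
```

In this layout the visibility graph is a ring, so a single link failure leaves the network connected but reduces the edge connectivity.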
To provide an adequate measure of network reliability, one
has to use probabilistic reliability metrics and a probabilistic
graph. This is an undirected graph where each node has
an associated probability of being in an operational state, and
likewise for each edge.
A starting point for the analysis of network reliability in
this paper is the all-terminal and two-terminal reliability of a
network. These measures are each a specialised version of the
k-terminal reliability, which is defined as the probability that
a path exists and connects k nodes in a network (k = n or
k = 2).
IV. THE RELIABILITY AND AVAILABILITY OF BACKHAUL
MESH NETWORKS
A. All-terminal and two-terminal reliabilities
As discussed earlier, for our analysis of mesh networks we
assume that the nodes vj ∈ S in the topology are invulnerable
to failure. Furthermore, we assume that a link s,d connecting
two nodes vs and vd fails independently of any other link i,j in the network.
The all-terminal reliability, $P_c(G, p)$, is the probability that
all the nodes $v_j \in S$ in the network have operating paths
between them. It is given by:

$$P_c(G,p) = \sum_{i=n-1}^{m} A_i (1-p)^i p^{m-i} \tag{1}$$

$$= 1 - \sum_{i=\beta}^{m} C_i\, p^i (1-p)^{m-i} \tag{2}$$

where $p$ is the probability of link failure, $n$ is the number of nodes and $m$ the number of edges in G, $A_i$ denotes the number of connected subgraphs with
i edges, and $C_i$ denotes the number of edge cutsets of
cardinality i.
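For small graphs, Eq. 1 and Eq. 2 can be checked against each other by exhaustive enumeration. The sketch below is illustrative only: the triangle topology and the function names are hypothetical, and it assumes that Ci counts every set of i edges whose removal disconnects the graph, which is the reading under which the two expressions agree.

```python
from itertools import combinations


def is_connected(nodes, edges):
    """True if the given edges connect all nodes (depth-first search)."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for v in adjacency[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def count_subgraphs_and_cutsets(nodes, edges):
    """Return (A, C): A[i] counts connected subgraphs with i edges, and
    C[i] counts sets of i edges whose removal disconnects the graph."""
    m = len(edges)
    A, C = [0] * (m + 1), [0] * (m + 1)
    for i in range(m + 1):
        for operational in combinations(edges, i):
            if is_connected(nodes, operational):
                A[i] += 1
            else:
                C[m - i] += 1  # the m - i missing edges are the failed ones
    return A, C


def reliability_from_subgraphs(A, p):
    """Eq. 1; terms below i = n - 1 vanish because A[i] is zero there."""
    m = len(A) - 1
    return sum(A[i] * (1 - p) ** i * p ** (m - i) for i in range(m + 1))


def reliability_from_cutsets(C, p):
    """Eq. 2; terms below the edge connectivity vanish because C[i] is zero."""
    m = len(C) - 1
    return 1 - sum(C[i] * p ** i * (1 - p) ** (m - i) for i in range(m + 1))


# Hypothetical example: three nodes in a triangle.
nodes = ["r", "a", "b"]
edges = [("r", "a"), ("a", "b"), ("b", "r")]
A, C = count_subgraphs_and_cutsets(nodes, edges)
print(A, C)
print(reliability_from_subgraphs(A, 0.1), reliability_from_cutsets(C, 0.1))
```

The enumeration visits all 2^m edge subsets, so it is a check for toy topologies rather than a method for large networks.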
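The two-terminal reliability, on the other hand, is the
probability that a given pair of nodes, $v_s$ and $v_d$, have an
operating path between them:

$$P_c^{sd}(G,p) = \sum_{i=w_{sd}}^{m} A_i^{sd} (1-p)^i p^{m-i} \tag{3}$$

$$= 1 - \sum_{i=\alpha_{sd}}^{m} C_i^{sd}\, p^i (1-p)^{m-i} \tag{4}$$

where $m$ is the number of edges in G, $w_{sd}$ is the shortest path length between nodes $v_s$ and
$v_d$, $A_i^{sd}$ is the number of subgraphs with i edges that connect
nodes $v_s$ and $v_d$, $\alpha_{sd}$ is the minimum number of edge failures
required to disconnect nodes $v_s$ and $v_d$, and $C_i^{sd}$ is the number of
cutsets with respect to nodes $v_s$ and $v_d$ of cardinality i.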
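The two-terminal quantities can be computed by brute force for a small graph. The sketch below is a minimal illustration, assuming a hypothetical four-node ring in which s and d are joined by two disjoint two-hop paths; the function names are not from the paper.

```python
from itertools import combinations


def terminals_connected(terminals, edges):
    """True if all terminals lie in the same connected component."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, stack = {terminals[0]}, [terminals[0]]
    while stack:
        for v in adjacency.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return all(t in seen for t in terminals)


def connecting_subgraph_counts(edges, s, d):
    """A[i]: number of subgraphs with i edges that connect s and d."""
    m = len(edges)
    return [sum(terminals_connected((s, d), kept)
                for kept in combinations(edges, i))
            for i in range(m + 1)]


def two_terminal_reliability(edges, s, d, p):
    """Eq. 3 evaluated by enumeration, with p the link failure probability."""
    A = connecting_subgraph_counts(edges, s, d)
    m = len(edges)
    return sum(A[i] * (1 - p) ** i * p ** (m - i) for i in range(m + 1))


# Hypothetical four-node ring: s - a - d - b - s.
ring = [("s", "a"), ("a", "d"), ("d", "b"), ("b", "s")]
A_sd = connecting_subgraph_counts(ring, "s", "d")
# The first non-zero entry of A_sd sits at the shortest path length.
w_sd = next(i for i, count in enumerate(A_sd) if count)
print(A_sd, w_sd, two_terminal_reliability(ring, "s", "d", 0.1))
```

For this ring the result can be cross-checked by hand: each two-hop path works with probability (1 − p)², and the pair is connected unless both paths fail.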
B. The network availability
The network reliability [11] is defined as the probability that
a network G remains connected until a time t = ta, given that it
was not disconnected at the time t = 0, and incorporates the
transient behaviour of the network. The network availability,
on the other hand, is the steady-state probability that the
network is not disconnected as t → ∞. For a network planner,
the network availability is an important reliability measure
because it gives the share of time during which the network is
operational.
Let us assume that link failures on an operational link are Poisson distributed with a failure rate parameter λ. We
also assume that a failed link can be repaired, and that
the repairs are Poisson distributed with a repair rate parameter
µ. Then, the link can be modelled by a two-state Markov
diagram, where one state represents the link being up, while
the other represents the link being down (Figure 2).
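Figure 2. A simple two-state Markov model of the reliability of a link, with an up state (probability qu) and a down state (probability qd); the link goes down at rate λ and is repaired at rate µ.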
At steady state, the state probabilities are:

$$q_u = \frac{\mu}{\mu+\lambda}, \qquad q_d = \frac{\lambda}{\mu+\lambda} \tag{5}$$
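The steady-state probabilities of the two-state link model are straightforward to evaluate. The sketch below also includes the standard transient solution of a two-state Markov chain, which is not derived in this paper, to show how the up-probability settles at qu. The rates and function names are hypothetical.

```python
from math import exp


def link_steady_state(failure_rate, repair_rate):
    """Steady-state probabilities (q_u, q_d) of the two-state link model."""
    total = failure_rate + repair_rate
    return repair_rate / total, failure_rate / total


def link_up_probability(failure_rate, repair_rate, t):
    """Probability that the link is up at time t, given it is up at t = 0.

    Standard transient solution of the two-state chain; it tends to q_u
    as t grows.
    """
    q_u, q_d = link_steady_state(failure_rate, repair_rate)
    return q_u + q_d * exp(-(failure_rate + repair_rate) * t)


# Hypothetical rates: repairs complete nine times faster than failures occur.
q_u, q_d = link_steady_state(failure_rate=1.0, repair_rate=9.0)
print(q_u, q_d)
print(link_up_probability(1.0, 9.0, 0.0), link_up_probability(1.0, 9.0, 10.0))
```

Only the ratio of the two rates matters for the steady state; the sum of the rates determines how quickly that steady state is reached.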
C. The reliability and availability of a backhaul mesh network
Consider a mesh network, G, that works as a backhaul mesh
network and includes k − 1 different distribution nodes di in
D, D = (d1 , d2 , ..., dk−1 ), and one root node r. According to
our previous terminology of IEEE 802.11s, a distribution node
corresponds to a MAP in an IEEE 802.11s network, while the
root node corresponds to an MPP (Figure 1).
Under normal network operation, transit traffic in the network is directed along the shortest path between the root node
r and each distribution node, di in D. If any distribution node
is disconnected from the root node, the network has failed,
as it is not operating as intended. Thus, the network planner
should consider the network as fully operational only if there
is an operational path between the root node and each of
the distribution nodes. This is true if, and only if, k nodes,
consisting of the root node r and the k − 1 distribution nodes
are all connected. Thus, the network planner should analyse
the reliability of the network using the k-terminal reliability.
When considering the reliability, the network availability is
of highest interest. Using the expressions above, we can now
find the k-terminal availability as:

$$P_c^{r,d_1,\dots,d_{k-1}}(G) = 1 - \sum_{i=\beta}^{m} C_i^{r,d_1,\dots,d_{k-1}} (q_d)^i (1-q_d)^{m-i} \tag{6}$$

where $m$ denotes the number of edges in G and $C_i^{r,d_1,\dots,d_{k-1}}$ is the number of cutsets of cardinality i with respect to the root node and the distribution nodes.
With current IEEE 802.11s equipment, it is possible to let the STAs connect to the MAPs in one frequency band (e.g. using 802.11b or 802.11g) and use another frequency band for the communication between the MPs in the backhaul mesh network. Since the extra equipment cost of such a configuration is normally minimal compared to other costs, such as site-acquisition costs, it is anticipated that many commercial mesh networks will implement a MAP at each MP in the network. With such a configuration, one has to consider the all-terminal availability of the network.
The (all-terminal) network availability is the probability of a network G being connected at steady state. It can now be expressed using Eqs. 2 and 5 as:

$$P_c(G) = 1 - \sum_{i=\beta}^{m} C_i (q_d)^i (1-q_d)^{m-i} \tag{7}$$

Finally, since some MAPs might be of higher importance than others, the network planner might instead prefer to use a weighted average of the different two-terminal availabilities. The availability of a backhaul mesh network, $P_c(G, D, r)$, can then be defined as a combination of the different two-terminal availabilities:

$$P_c(G,D,r) = W_1 P_c^{r,d_1} + W_2 P_c^{r,d_2} + \dots + W_{k-1} P_c^{r,d_{k-1}} \tag{8}$$

where $W_j$ is the weighting factor of distribution node $d_j$.
V. EVALUATION
In this section we evaluate the availability for a topology (Figure 3) where two MAPs are distributed and connected to an MPP. With respect to network planning, we are interested in finding the availability of the service provided by the MAPs; thus one has to determine the steady-state condition of the three-terminal reliability $P_c^{r,d_5,d_6}$.

Figure 3. Topology with and without redundant MPs. (a) Initial topology: MPP r, MPs d1 to d4 and MAPs d5 and d6. (b) With redundant MPs d7 to d12.

In the network planning phase, a cost-benefit analysis needs to be performed in order to weigh the cost against the added reliability introduced by a redundant MP. Ignoring the cost factor of adding redundant MPs, we plot the availability and the effect when an extra MP is added to the network (Figure 4). The redundant MPs are added in a particular order, d7, ..., d12 (Figure 3).

Figure 4. Service availability when adding redundant MPs. (a) Availability {r, d5, d6}: availability ($P_c^{r,d_5,d_6}$) versus probability of link failure (p), with no redundancy and with redundant MPs from d7 up to d7, ..., d12. (b) Effect of redundant MPs: availability versus number of redundant MPs, for p = 1 × 10^-1, 3 × 10^-1 and 5 × 10^-1.

From Figure 4 a), we see that the availability increases as we add the redundant nodes. However, adding MP d12 adds only a negligible improvement to the availability since, at this point, all the nodes in the backhaul have at least two links on which their service can be reached. Adding d12 does not change the number of links connecting the backhaul, and d12 can therefore be removed from the network without any noticeable loss in availability.
We can also deduce from Figure 4 that the effect of adding redundant MPs is greatest for a probability of link failure in the approximate range p ∈ [0.1, 0.5]. In Figure 4 b) we show the increase in availability as we add redundant MPs, providing a clearer picture of how they improve the availability.
An alternate configuration for the redundant MPs is to deploy them geographically closer to the backhaul mesh, as illustrated in Figure 5. Assuming the redundant MPs have the same transmission range as the ones in Figure 3, positioning the MPs closer to the backhaul mesh nodes will increase the number of links from the redundant MPs to the backhaul. Figure 6 shows the improved reliability obtained by adding redundant MPs according to the topology in Figure 5.

Figure 5. An alternate deployment of redundant MPs. (a) Initial topology. (b) With redundant MPs.

Figure 6. Service availability when adding redundant MPs. (a) Availability {r, d5, d6}: availability ($P_c^{r,d_5,d_6}$) versus probability of link failure (p), with no redundancy and with redundant MPs from d7 up to d7, ..., d11. (b) Effect of redundant MPs: availability versus number of redundant MPs, for p = 1 × 10^-1, 3 × 10^-1 and 5 × 10^-1.
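The kind of comparison made in this section can be reproduced on a toy example. The sketch below evaluates the k-terminal availability of Eq. 6 and the weighted availability of Eq. 8 by enumerating failed-link sets, before and after a redundant MP is added. The topology (an MPP r, two MAPs d1 and d2 in a line, and a redundant MP x), the weights and the link-down probability are all hypothetical and are not the topologies of Figure 3 or Figure 5.

```python
from itertools import combinations


def terminals_connected(terminals, edges):
    """True if all terminals lie in the same connected component."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    seen, stack = {terminals[0]}, [terminals[0]]
    while stack:
        for v in adjacency.get(stack.pop(), ()):
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return all(t in seen for t in terminals)


def k_terminal_availability(edges, terminals, q_d):
    """Eq. 6: one minus the probability that failed links split the terminals."""
    m = len(edges)
    unavailability = 0.0
    for i in range(m + 1):
        # Number of sets of i failed links that disconnect the terminals.
        cutsets = sum(
            not terminals_connected(terminals,
                                    [e for e in edges if e not in failed])
            for failed in combinations(edges, i))
        unavailability += cutsets * q_d ** i * (1 - q_d) ** (m - i)
    return 1 - unavailability


def weighted_availability(edges, root, weights, q_d):
    """Eq. 8: weighted sum of two-terminal availabilities (weights: MAP -> W)."""
    return sum(w * k_terminal_availability(edges, (root, node), q_d)
               for node, w in weights.items())


# Hypothetical backhaul: r - d1 - d2, plus a redundant MP x linked to r and d2.
backhaul = [("r", "d1"), ("d1", "d2")]
with_redundant = backhaul + [("r", "x"), ("x", "d2")]
q_d = 0.1  # assumed steady-state probability that a link is down

before = k_terminal_availability(backhaul, ("r", "d1", "d2"), q_d)
after = k_terminal_availability(with_redundant, ("r", "d1", "d2"), q_d)
weighted = weighted_availability(with_redundant, "r", {"d1": 0.5, "d2": 0.5}, q_d)
print(before, after, weighted)
```

Repeating the calculation for each candidate position of the redundant MP gives the planner a direct way to compare locations, at the cost of an enumeration that grows exponentially with the number of links.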
VI. TRANSIENT NETWORK RELIABILITY
While the network availability is a useful reliability measure, it does not provide a full picture of the network reliability, since it is a steady-state measure. For example, it gives the share of time the network is up, but it does not provide insight into the mean time to failure (MTTF) given that the network is connected at time t = 0, or the mean time between failures (MTBF). To estimate such time-dependent measures, one needs to study the transient behaviour of the k-terminal reliability.
Analysis of transient behaviour normally becomes very complex when considering link failures with repair. Thus, when studying transient reliability, it is not uncommon to simplify the analysis by considering link failures without repair. We will do the same in this section. However, assuming that links that fail have infinite repair time is not very realistic. We therefore intend to continue our analysis to consider link failures with repair, and plan to address this topic in follow-on research.
A. The transient all-terminal reliability without link repair
In this section we investigate the all-terminal reliability for arbitrary mesh topologies. We assume that the network starts out with all links between nodes within radio range of each other fully operational, so that the connectivity graph is equal to the visibility graph. We further assume that link failures are Poisson distributed with parameter λ and that when a link fails, it cannot be repaired.
The probability of a topology G being disconnected is then given by

$$P_d(G, t < T) = \sum_{i=\beta}^{m} C_i \left(1 - e^{-\lambda t}\right)^i \left(e^{-\lambda t}\right)^{m-i} \tag{9}$$

where $m$ denotes the number of edges in G. It is easy to show that the mean time until the topology fails is given by Eq. 10, where B(x, y) is the complete Beta function.

$$E(T_d) = \frac{1}{\lambda} \left[ \sum_{n=1}^{m} \frac{(-1)^{n+1}}{n} \binom{m}{n} - \sum_{i=\beta}^{m-1} C_i\, B(i+1, m-i) \right] \tag{10}$$

B. The transient k-terminal reliability without link repair
Similarly, we can show that the expected time until the k nodes become disconnected is given by:

$$E(T_d^{v_i,\dots,v_k}) = \frac{1}{\lambda} \times \left[ \sum_{n=1}^{m} \frac{(-1)^{n+1}}{n} \binom{m}{n} - \sum_{i=\beta}^{m-1} C_i^{v_i,\dots,v_k}\, B(i+1, m-i) \right] \tag{11}$$

Using Eq. 11, we can find the MTFF for the topologies in Section V. The results are shown in Figure 7.

Figure 7. Relative variation in MTFF as the number of redundant MPs is increased (λ = 1). (a) Topology in Figure 3. (b) Topology in Figure 5. Both panels plot the percentage increase in MTFF [%] against the number of redundant MPs.
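The closed form of Eq. 10 can be checked numerically for a small graph by integrating one minus the disconnection probability of Eq. 9 over time. The sketch below is a minimal illustration under stated assumptions: it reads Eq. 9 and Eq. 10 with m as the number of edges and Ci as the number of sets of i edges whose removal disconnects the graph, and it uses a hypothetical triangle topology with λ = 1. The function names are not from the paper.

```python
from itertools import combinations
from math import comb, exp, gamma


def is_connected(nodes, edges):
    """True if the given edges connect all nodes (depth-first search)."""
    adjacency = {v: set() for v in nodes}
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen, stack = {nodes[0]}, [nodes[0]]
    while stack:
        for v in adjacency[stack.pop()]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return len(seen) == len(nodes)


def cutset_counts(nodes, edges):
    """C[i]: number of sets of i edges whose removal disconnects the graph."""
    m = len(edges)
    return [sum(not is_connected(nodes, [e for e in edges if e not in failed])
                for failed in combinations(edges, i))
            for i in range(m + 1)]


def beta(x, y):
    """Complete Beta function expressed through the Gamma function."""
    return gamma(x) * gamma(y) / gamma(x + y)


def mean_time_to_disconnection(C, failure_rate):
    """Closed form of Eq. 10 (C[i] is zero below the edge connectivity)."""
    m = len(C) - 1
    first = sum((-1) ** (n + 1) * comb(m, n) / n for n in range(1, m + 1))
    second = sum(C[i] * beta(i + 1, m - i) for i in range(m))
    return (first - second) / failure_rate


def disconnection_probability(C, failure_rate, t):
    """Eq. 9: probability that the topology is disconnected at time t."""
    m = len(C) - 1
    q = 1 - exp(-failure_rate * t)
    return sum(C[i] * q ** i * (1 - q) ** (m - i) for i in range(m + 1))


def mean_time_numeric(C, failure_rate, horizon=40.0, steps=40000):
    """Trapezoidal integration of 1 - P_d(t) over [0, horizon]."""
    h = horizon / steps
    values = [1 - disconnection_probability(C, failure_rate, k * h)
              for k in range(steps + 1)]
    return h * (sum(values) - 0.5 * (values[0] + values[-1]))


# Hypothetical example: three nodes in a triangle, failure rate 1 per link.
nodes = ["r", "a", "b"]
edges = [("r", "a"), ("a", "b"), ("b", "r")]
C = cutset_counts(nodes, edges)
print(mean_time_to_disconnection(C, 1.0), mean_time_numeric(C, 1.0))
```

For the triangle, the result also matches a direct argument: the network disconnects at the second of three independent link failures, whose expected time is 1/3 + 1/2 = 5/6 for λ = 1.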
VII. CONCLUSION AND FUTURE WORK
This paper investigates the reliability and availability of
backhaul mesh networking, more specifically finding the benefit of adding redundant nodes in such networks. To the best
of our knowledge, this has not been studied in previous works.
Our results show how redundant nodes improve network
reliability. Although analyses and results depend on the actual
topology, the same analyses as presented here can be applied
to any specific backhaul mesh topology of interest.
For future work we want to analyse the OLSR ad hoc
routing protocol and use a Markov model to describe the one-hop neighbour discovery mechanism as a probability function
for whether the link is interpreted as up or down. Also, we
want to analyse the MAC- and physical-layer behaviour and
derive a more adequate probability distribution for link failures
in mesh networks. In a wireless environment, a link failure is
often correlated with link errors in other parts of the network,
thus we also want to include link failure dependencies in the
analysis.
REFERENCES
[1] Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY)
Specification, IEEE Std. 802.11, 1997.
[2] LAN/MAN Specific Requirements - Part 11: Wireless Medium Access
Control (MAC) and physical layer (PHY) specifications: Amendment:
ESS Mesh Networking, IEEE Std. 802.11s Draft 2.0, 2008.
[3] C. Perkins, E. Belding-Royer, and S. Das, “Ad hoc on-demand distance
vector (AODV) routing,” IETF, July 2003.
[4] T. Clausen and P. Jacquet, “Optimized link state routing protocol (OLSR),”
IETF, October 2003.
[5] C. J. Colbourn, “Reliability issues in telecommunications network
planning,” in Telecommunications network planning, B. S. P. Soriano,
Ed. Kluwer Academic Publishers, 1999, ch. 9, pp. 135–146.
[6] ——, The Combinatorics of Network Reliability. Oxford Univ. Press,
1987.
[7] G. Weichenberg, V. Chan, and M. Medard, “High-reliability topological
architectures for networks under stress,” Selected Areas in Communications, IEEE Journal on, vol. 22, no. 9, pp. 1830–1845, Nov. 2004.
[8] H. AboElFotoh and C. J. Colbourn, “Computing 2-terminal reliability
for radio-broadcast networks,” Reliability, IEEE Transactions on, vol. 38,
no. 5, pp. 538–555, Dec 1989.
[9] H. AboElFotoh, S. Iyengar, and K. Chakrabarty, “Computing reliability
and message delay for cooperative wireless distributed sensor networks
subject to random failures,” Reliability, IEEE Transactions on, vol. 54,
no. 1, pp. 145–155, March 2005.
[10] S. Kharbash and W. Wang, “Computing two-terminal reliability in
mobile ad hoc networks,” Wireless Communications and Networking
Conference, 2007. WCNC 2007. IEEE, 11-15 March 2007.
[11] M. L. Shooman, Reliability of Computer Systems and Networks: Fault
Tolerance, Analysis, and Design. John Wiley and Sons, Inc, 2002.