A Distributed Scheme for Responsive Network Engineering
J. Göbel
A.E. Krzesinski and D. Stapelberg
Department of Informatics
University of Hamburg
22527 Hamburg, Germany
Department of Mathematical Sciences
University of Stellenbosch
7600 Stellenbosch, South Africa
Abstract—Optimal bandwidth utilisation together with resilience and recovery from failure are two key drivers for Traffic
Engineering (TE) which have been widely addressed by the
IP community. Most current IGP routing protocols deployed
in the Internet and the extensions proposed to adapt these
protocols for TE are concerned either with optimality or with
resilience. This leads to a duplication of routing protocols and
algorithms where each of these objectives (optimality, resilience)
is addressed by its own protocol or algorithm. The interactions
among these protocols introduce additional complexities which
do not necessarily translate into equivalent performance gains.
This paper aims at the integration of these two objectives
into a unified Network Resource Controller. At random time
instants the NRC computes bandwidth prices which are used
in an automated scheme to dynamically adjust the bandwidths
of the network paths in response to the traffic and network
equipment conditions. The distinguishing features of the NRC
are that it works without centralised control and thus scales to
large networks, and that rather than using TE to move network
flows to where the network bandwidth is located, the NRC uses
Network Engineering (NE) to move network bandwidth to where
the network flows are located.
We next present an efficient heuristic to find diversely routed
backup paths and to provision the network links with the least
amount of backup (spare) bandwidth in order to be able to deploy
equivalent recovery paths for any failure scenario which leaves
the network connected.
Simulation results are presented which show that the reallocation scheme provides prompt bandwidth provisioning both for
random traffic fluctuations during normal operating conditions,
and when provisioning recovery routes in the event of network
failure.
Index Terms—bandwidth prices; bandwidth reconfiguration;
network planning and optimisation; network survivability; network recovery.
I. INTRODUCTION
In this paper we present a model of network bandwidth
management that is based on a distributed scheme for automatic bandwidth reallocation [1]. The reallocation scheme
can be used in the context of a path-oriented network where
relatively long-lived paths (the terms path and route will be
used interchangeably) are used to provision resources for
connections or flows, whose average holding times are much
less than the path lifetimes. Multi-Protocol Label Switching
(MPLS) networks, in which Label Switched Paths (LSPs) act
as the long-lived paths, provide a possible environment in
which such a bandwidth reallocation scheme could be useful.
This work was supported by grant numbers 2054027 and 2677 from the
South African National Research Foundation, Siemens Telecommunications
and Telkom SA Limited.
We restrict ourselves to connection-oriented traffic with
admission controls. Hence we use the terms connections and
lost connections or equivalently calls and lost calls. The
number of connections that can be simultaneously carried on
a route depends on the amount of bandwidth allocated to
the route. However, at any point in time, it is possible that
due to traffic fluctuations or network equipment changes, the
connections in service on one route are using only a small part
of the bandwidth allocated to that route, while the bandwidth
on another route is heavily utilised. Two methods have been
developed to deal with this situation. First, the arriving calls
can be offered to the least utilised route of a multi-path route
connecting the origin-destination pair; second, bandwidth can
be transferred from underutilised routes to overutilised routes.
The first method forms the basis of traffic engineering
(TE) where network flows are moved to where the network
bandwidth is located. Many TE methods have been developed
to manage connectivity and to deliver resilience and Quality of
Service (QoS) across the network. See [2] and the references
therein for a description of TE methods and applications. The
second method forms the basis of network engineering (NE)
where bandwidth is moved to where the network flows are
located. This paper investigates the use of NE rather than TE
to manage the network QoS.
A systematic way of applying NE is to assign a broker
referred to as a bandwidth manager to each route. At random
time instants the manager calculates the expected value, over a
short period of time known as the planning horizon, of an extra
unit of bandwidth (the “buying price”) and also the expected
value, over the same short period of time, that the route would
lose should it give up a unit of bandwidth (the “selling price”).
Bandwidth can then be transferred from routes that place a
low value on bandwidth to routes that place a high value
on bandwidth. An essential component of the reallocation
scheme is that the bandwidth transfer process is based on local
information only.
The bandwidth managers are thus aware of local resource
demands and bandwidth prices, and reallocate bandwidth
among themselves in order to maintain the performance of
their routes. In this way the managers are autonomous, act
without centralised control from a system coordinator and
behave entirely according to local rules. Such a scheme is
distributed and scalable.
The purpose of this paper is to evaluate the efficacy of such
a bandwidth reallocation scheme both during normal operating conditions to provide prompt bandwidth provisioning for
random traffic fluctuations, and as a recovery mechanism [3],
[4] in rapidly reallocating bandwidth after a network failure.
The remainder of the paper is organised as follows. Section II presents a method to compute buying and selling prices
for bandwidth in connection-oriented networks and discusses
how the prices are used in the bandwidth reallocation scheme.
Section III presents some results from simulation experiments
to test the efficacy of the reallocation scheme during normal
operating conditions. Section IV presents an efficient heuristic
for the off-line calculation of the minimal amount of additional
bandwidth necessary to construct equivalent recovery paths
after a network failure. Section V presents some results from
simulation experiments to test the efficacy of the reallocation
scheme when recovering from the failure of a single link. Our
conclusions are presented in Section VI.
II. BANDWIDTH REALLOCATION
We use a bandwidth pricing method [5] which models a
route as an Erlang loss system and computes the expected
lost revenue due to connections being blocked, conditional on
the system starting in a given state. The expected lost revenue
is used to compute both a buying price and selling price for
bandwidth, relying on knowledge of the state at time zero.
A. The price of bandwidth
A manager is assigned to each route and we assume that
a route’s manager is making decisions for a planning horizon
of τ time units. We can regard the value of U extra units of
bandwidth, or the buying price, as the difference in the total
expected lost revenue over time [0, τ ] if the route were to
increase its bandwidth by U units at time zero. Likewise, we
can calculate the selling price of U units of bandwidth as the
difference in the total expected lost revenue over time [0, τ ] if
the route were to decrease its bandwidth by U units.
For a route with bandwidth C, let R_{c,C}(\tau) denote the expected revenue lost in the interval [0, \tau], given that there are c connections at time 0. The buying and selling prices B_{c,C}(\tau, U) and S_{c,C}(\tau, U) of U units of bandwidth are then given by
B_{c,C}(\tau, U) = R_{c,C}(\tau) - R_{c,C+U}(\tau)

and

S_{c,C}(\tau, U) =
\begin{cases}
R_{c,C-U}(\tau) - R_{c,C}(\tau), & 0 < c \le C - U \\
R_{C-U,C-U}(\tau) - R_{c,C}(\tau), & C - U < c \le C.
\end{cases}
R_{c,C}(\tau) is given by [5] as the inverse of the Laplace transform

\tilde{R}_{c,C}(s) = \frac{\theta (\lambda/s) P_c(s/\lambda)}{(s + C\mu) P_C(s/\lambda) - C\mu P_{C-1}(s/\lambda)}

where \lambda and \mu are the parameters of the exponential connection arrival and connection service processes respectively, \theta is the expected revenue earned per connection,

P_c(s/\lambda) = (-\mu/\lambda)^c \, \Gamma_c^{(\lambda/\mu)}(-s/\mu)

and

\Gamma_c^{(\lambda/\mu)}(-s/\mu) = \sum_{k=0}^{c} \binom{c}{k} \binom{-s/\mu}{k} k! \left( -\frac{\lambda}{\mu} \right)^{c-k}

is a Charlier polynomial. A computationally stable method for efficiently calculating the functions \tilde{R}_{c,C}(s) is presented in [6]. These are numerically inverted to yield the R_{c,C}(\tau).
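To illustrate how a route manager might evaluate these prices, the following sketch estimates R_{c,C}(τ) by direct Monte Carlo simulation of the Erlang loss system, rather than by the Laplace transform inversion of [5], [6]; the function names and parameter values are illustrative, not taken from the paper.

```python
import random

def lost_revenue(c, C, tau, lam, mu, theta, runs=2000, seed=1):
    """Monte Carlo estimate of R_{c,C}(tau): expected revenue lost to
    blocking in [0, tau] when the route starts with c connections, has
    capacity C, arrival rate lam, service rate mu, and earns revenue
    theta per connection."""
    rng = random.Random(seed)
    total_lost = 0.0
    for _ in range(runs):
        t, n, lost = 0.0, c, 0
        while True:
            rate = lam + n * mu              # total event rate in state n
            t += rng.expovariate(rate)
            if t >= tau:
                break
            if rng.random() < lam / rate:    # next event is an arrival
                if n < C:
                    n += 1
                else:
                    lost += 1                # arrival is blocked
            else:                            # next event is a departure
                n -= 1
        total_lost += theta * lost
    return total_lost / runs

def buying_price(c, C, tau, lam, mu, theta, U=1):
    # B_{c,C}(tau, U) = R_{c,C}(tau) - R_{c,C+U}(tau)
    return (lost_revenue(c, C, tau, lam, mu, theta)
            - lost_revenue(c, C + U, tau, lam, mu, theta))

def selling_price(c, C, tau, lam, mu, theta, U=1):
    # S_{c,C}(tau, U): the starting occupancy is capped at C - U when
    # the route currently holds more than C - U connections.
    start = min(c, C - U)
    return (lost_revenue(start, C - U, tau, lam, mu, theta)
            - lost_revenue(c, C, tau, lam, mu, theta))
```

A heavily loaded route (large c relative to C) yields a high buying price, so bandwidth flows towards it under the reallocation scheme.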
B. A distributed bandwidth reallocation scheme
Under our proposed scheme, an automated bandwidth manager is assigned to each route. We shall view routes as being of
two types: direct routes, which traverse just a single physical
link, and transit routes, which traverse more than one physical
link. We assume that each physical link supports a direct route.
The direct routes on the links of a transit route are referred
to as its constituent direct routes. Bandwidth reallocation is
driven by the managers of transit routes, and bandwidth is
reallocated between the transit routes and their constituent
direct routes.
The manager of a transit route obtains the buying and selling
prices by means of a signalling mechanism. Signals or control
packets are sent at random intervals of time along each transit
route, recording the buying and selling prices of the constituent
direct routes. When a control packet reaches the egress router,
a calculation is performed: if the sum of the direct route buying
prices is greater than the transit route selling price, then the
transit route will give up U units of bandwidth, which are taken
up by each of the direct routes. Alternatively, if the sum of the
direct route selling prices is less than the transit route buying
price, then each of the direct routes will give up U units of
bandwidth, which are taken up by the transit route. The control
packet is returned along the reverse route from the egress
router to the ingress router making the necessary reallocations
to the capacities of the constituent direct routes. The capacity
(the terms bandwidth and capacity are used interchangeably)
of the transit route is adjusted when the control packet reaches
the ingress router.
Bandwidth reallocation takes place between transit routes
and their constituent direct routes: since the transit route managers work independently and do not communicate directly
with each other, the reallocation algorithm is distributed and
scalable. However, the transit route managers can communicate indirectly with each other via any constituent direct routes
that they have in common along their transit routes. See [1]
for a discussion on how often the managers of transit routes
send out control packets, the determination of the planning
horizons τ used in the calculation of buying and selling
prices, the determination of the amount U of bandwidth to
be transferred in each reallocation, issues concerning the loss
or delay of control packets and the overhead costs incurred in
the signalling scheme.
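The comparison performed at the egress router can be stated compactly; the sketch below uses illustrative price values and a sign convention (positive means the transit route releases bandwidth) that are ours, not the paper's.

```python
def reallocation_decision(transit_buy, transit_sell, direct_buys, direct_sells, U):
    """Decision taken when a control packet reaches the egress router.

    Returns U if the transit route should give up U units of bandwidth
    (taken up by each constituent direct route), -U if each direct
    route should give up U units (taken up by the transit route), and
    0 if no reallocation is worthwhile."""
    if sum(direct_buys) > transit_sell:
        return U       # direct routes value the bandwidth more
    if sum(direct_sells) < transit_buy:
        return -U      # transit route values the bandwidth more
    return 0
```

The control packet then applies the chosen reallocation to each constituent direct route on its way back from the egress to the ingress router.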
III. THE PERFORMANCE OF THE REALLOCATION SCHEME: NORMAL OPERATION
In this section we present simulation results taken from [1]
to evaluate the performance of the bandwidth reallocation
scheme when applied to a 29-node 45-link network model where each of the 406 origin-destination pairs is connected by a single least cost (shortest hop count) route. The network model and its traffic matrix are described in [1]. A simulator was developed to model the connection arrival, admission and service processes, the signalling process, bandwidth price calculation and bandwidth reallocation.
The network links are initially capacitated so that 2% of the offered connections are lost (a QoS of 2%) when no reallocations are attempted. Our simulations [1] show that for the network model under investigation, bandwidth reallocation reduces the average connection loss probability by a factor of 3 from 0.02 to 0.007. The efficacy of bandwidth reallocation is relatively insensitive to the connection holding time distribution and to reasonable signalling delays. Reallocation attempts take place relatively infrequently so that effective bandwidth reallocation can be implemented while keeping the signalling cost incurred in reallocation low.
As an example we investigate how the bandwidth reallocation scheme deals with traffic patterns which differ substantially from the traffic pattern that was originally used to configure the link capacities.
Let \lambda_r(0) denote the original average arrival rate of connections to route r. The traffics are perturbed at regular time intervals. The ith traffic perturbation yields a new set of connection arrival rates

\lambda_r(i) = \lambda_r(0)(1 + a_r(i))

where i > 0 and a_r(i) is a random variable sampled from a uniform distribution in the range [-0.2, 0.2], so that the connection arrival rates to a randomly selected half (approximately) of the routes are increased while the arrival rates to the remaining routes are decreased.
Fig. 1 shows that bandwidth reallocation substantially reduces the network loss probability when the network traffic is subject to repeated random perturbations of ±20%. The network cannot meet its target QoS with these traffic perturbations if bandwidth reallocation is not applied.

Fig. 1. The average network loss probability when the connection arrival rates are repeatedly perturbed.

IV. NETWORK RESTORATION

Network restoration schemes address the reconfiguration of the network and the reallocation of traffic flows after a node or link failure, a node failure being regarded as the failure of the links incident upon the failed node. Link restoration schemes require the nodes adjacent to the failed link(s) to be responsible for rerouting the traffic demands by patching around the failed link(s). Path restoration requires the end nodes of the failed path to be responsible for rerouting the traffic demands onto a recovery path.

A. The Spare Capacity Assignment (SCA) problem

The SCA problem describes how to route backup paths and configure the network links with the least amount of spare (backup) capacity in order to be able to deploy equivalent failure-disjoint recovery paths. An equivalent recovery path provides the same QoS as its working counterpart. Two routes are failure-disjoint if in any failure scenario at most one of the routes fails: the recovery routes thus offer 1-to-1 protection [3].
The SCA problem for a directed network is given by

\min_{Q,s} \phi(s)   (1)

such that

s = \max G   (2)
G = Q^T M U   (3)
T + Q \le 1   (4)
Q B^T = D.   (5)
The objective (1) is to minimise the total cost φ(s) of spare
capacity by selection of the backup paths Q and the spare
capacity allocation s. Constraints (2) and (3) compute the spare
capacity s and the spare provision matrix G where M is a
diagonal matrix of bandwidth allocations and U = P ⊙ F^T
is the path failure incidence matrix, where P is the working
path link incidence matrix and F is the link failure incidence
matrix. The operators max and ⊙ and the various matrices are
defined in Table I.
Constraint (4) guarantees that the backup paths Q do not
use any links which might fail simultaneously with their
working paths where T = U ⊙ F is the flow tabu matrix.
Constraint (5) expresses flow conservation where B is the
node link incidence matrix and D is the route node incidence
matrix. The SCA problem is known to be NP-complete.
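A toy instance helps fix the matrix dimensions in constraints (2) and (3). The triangle network, paths and bandwidths below are invented for illustration; plain list-of-rows matrices are used to keep the sketch self-contained.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Triangle network: links l1=(1,2), l2=(2,3), l3=(1,3); K = L = 3
# single-link failure scenarios.  Working path p1 (bandwidth 5) uses l1
# and p2 (bandwidth 3) uses l2; each backup path detours via the third node.
P = [[1, 0, 0], [0, 1, 0]]             # working path link incidence (R x L)
Q = [[0, 1, 1], [1, 0, 1]]             # backup path link incidence (R x L)
M = [[5, 0], [0, 3]]                   # Diag of path bandwidths (R x R)
F = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]  # link failure incidence (K x L)

# U = P binary-multiplied with F^T: which working path fails in which scenario
U = [[min(1, x) for x in row] for row in matmul(P, transpose(F))]
G = matmul(transpose(Q), matmul(M, U))  # spare provision matrix (L x K)
s = [max(row) for row in G]             # spare capacity per link
```

Here s = [3, 5, 5]: when l1 fails its 5 units are rerouted over l2 and l3, and when l2 fails its 3 units are rerouted over l1 and l3, so l1 needs 3 spare units while l2 and l3 need 5 each.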
B. Successive Survivable Routing (SSR)
Liu et al. [7] developed the SSR heuristic which provides an
approximate solution to the SCA problem in networks subject
to arbitrary failure scenarios. The SSR heuristic addresses
failure independent (state independent) path restoration where
each working path has one failure-disjoint recovery counterpart. The essential elements of the SSR heuristic are presented
in lines 8 through 18 of the algorithm presented in Fig. 2 with
the conditions on Gk and Z in line 8 replaced by a suitable
stopping rule, and lines 11 and 13 modified to exclude tabu
links.
TABLE I
MATRIX OPERATIONS AND NOTATION

C = A^+ : the matrix A made non-negative, thus c_ij = max(a_ij, 0).
C = A ⊙ B : binary matrix multiplication.
c = max A : a vector containing the maximum value of each row of A.
c = Σ A : the sum of all elements of A. If A is a V × W matrix then c = Σ_v Σ_w a_vw.
N, L, R, K : the number of nodes, links, paths and failure scenarios.
n, ℓ, r, k : the indices of the nodes, links, paths and failure scenarios.
B = {b_nℓ}_{N×L} : the node link incidence matrix where b_nℓ = ±1 if node n is the origin or destination of link ℓ and zero otherwise.
D = {d_rn}_{R×N} : the route node matrix where d_rn = ±1 if node n is the origin or destination of route r and is zero otherwise.
E_k = {e_rk}_{R×K} : a matrix where e_rk = 1 and e_rj = 0 for j ≠ k.
F = {f_kℓ}_{K×L} : the link failure incidence matrix: f_kℓ = 1 if link ℓ fails in scenario k and 0 otherwise.
G_k = {g_kℓr}_{L×R} : the spare provision matrix per failure scenario: g_kℓr is the backup capacity required on link ℓ for path r in failure scenario k.
G = {g_ℓk}_{L×K} : the spare provision matrix: g_ℓk is the backup capacity required on link ℓ for failure scenario k.
H = {h_ℓk}_{L×K} : the returned capacity matrix: h_ℓk is the capacity returned to the non-failed link ℓ for failure scenario k.
M = Diag({m_r})_{R×R} : a diagonal matrix of the bandwidth m_r of path r.
P = {p_rℓ}_{R×L} : the working path link incidence matrix: p_rℓ = 1 if the working path p_r traverses link ℓ and 0 otherwise.
Q = {q_rℓ}_{R×L} : the backup path link incidence matrix: q_rℓ = 1 if the backup path q_r traverses link ℓ and 0 otherwise.
Q_k = {q_krℓ}_{R×L} : the backup path link incidence matrix per scenario: q_krℓ = 1 if the backup path q_r in scenario k traverses link ℓ and 0 otherwise.
T = {t_rℓ}_{R×L} : the flow tabu matrix: t_rℓ = 1 if the recovery path q_r cannot use link ℓ and is zero otherwise.
U = {u_rk}_{R×K} : the path failure incidence matrix: u_rk = 1 if the working path p_r fails in scenario k and 0 otherwise.
φ = {φ_ℓ}_{L×1} : the backup capacity cost function: φ_ℓ is the cost of a single unit of backup capacity on link ℓ.
s = {s_ℓ}_{L×1} : the spare provision vector: s_ℓ is the backup capacity required on link ℓ for any failure scenario.
v_r = {v_rℓ}_{L×1} : the link cost vector: v_rℓ is the cost of including link ℓ into the currently rerouted backup path q_r.
Z = {z_rk}_{R×K} : z_rk counts the successive unsuccessful attempts at changing the backup path q_r for scenario k.
C. State dependent SSR (SD-SSR)
In this section the SSR heuristic is extended to state
dependent (failure dependent) restoration which uses different
recovery paths to protect against different failures.
The SD-SSR heuristic computes a set of K backup path link incidence matrices where Q_k describes the backup paths to be used in failure scenario k. The spare provision matrices G_k that hold each link's backup capacity requirement for each backup path q_r in failure scenario k are given by

G_k = Q_k^T M U.

The spare provision matrix G is given by G = \sum_{k=1}^{K} G_k E_k. The vector s of link backup capacity is given by s = \max(G - H)^+ where the give-back matrix H = P^T M U describes the bandwidth returned (stub-release) by the failed working paths to the non-failed links [8]. The overall backup capacity cost is \sum_\ell \phi_\ell(s_\ell), and the total amount of backup capacity is \sum_\ell s_\ell.
Not all network topologies can benefit from stub-release
and SD-SSR. For example, consider a network consisting of
two source/sink nodes connected by two or more parallel,
link-disjoint routes. Such networks benefit from stub-release,
but SD-SSR without stub-release will yield only a marginal
improvement over SSR. Rings and fully meshed networks do
not benefit from stub-release, nor will SD-SSR yield a lower
spare capacity requirement than SSR for these topologies.
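The give-back computation s = max(G − H)^+ can be checked on a small self-contained example; the triangle network and bandwidths below are invented, and because its failure-dependent backup paths coincide the per-scenario computation collapses to a single Q. Consistent with the remark above, this fully meshed triangle gains nothing from stub-release.

```python
def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

# Triangle network, single-link failures (K = L = 3): working path p1
# (bandwidth 5) uses link l1, p2 (bandwidth 3) uses l2; backups detour
# via the third node.  The failure-dependent backups coincide here.
P = [[1, 0, 0], [0, 1, 0]]   # working path link incidence
Q = [[0, 1, 1], [1, 0, 1]]   # backup path link incidence
M = [[5, 0], [0, 3]]         # Diag of path bandwidths
U = [[1, 0, 0], [0, 1, 0]]   # path failure incidence

G = matmul(transpose(Q), matmul(M, U))  # spare provision, no stub-release
H = matmul(transpose(P), matmul(M, U))  # give-back matrix H = P^T M U

s_no_stub = [max(row) for row in G]
s_stub = [max(max(g - h, 0) for g, h in zip(grow, hrow))  # max(G - H)^+
          for grow, hrow in zip(G, H)]
```

Both computations yield s = [3, 5, 5]: each failed path's only working link is the failed link itself, so there is no surviving stub capacity to release.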
D. The SD-SSR algorithm
Fig. 2 presents the essential elements of the SD-SSR heuristic. The notation q_r = (Q_k)_r refers to row r of matrix Q_k, where q_r is the recovery counterpart for working route p_r under failure scenario k.
The (outer) optimisation loop of the SD-SSR heuristic (see line 6) iterates link by link, trying to reduce the amount of backup capacity placed on each link. For each link ℓ with a non-zero backup capacity requirement s_ℓ (s_ℓ > 0, see line 6), we examine every scenario k in which s_ℓ units of backup capacity are required (g_ℓk − h_ℓk = s_ℓ, see line 7).
For each such scenario k the algorithm tries to reroute the backup paths q_r that contribute to the backup capacity requirement on link ℓ in scenario k (g_kℓr > 0, see line 8). The rerouting of a backup path q_r in a failure scenario k is not tried again after two successive unsuccessful attempts during a single optimisation cycle (that is, once z_rk < 2 no longer holds). Increasing this number of attempts may lead to better results, but substantially increases the computation time.
The rerouting itself (lines 9 to 18) is nearly the same as in SSR [7]. The best recovery route discovered thus far is saved in line 9. Line 10 computes the spare capacity required if no backup path for path r existed. Line 11 computes the spare capacity required if the new backup path for path r can use every non-failed link in scenario k. The element v_rℓ′ of the vector v_r is interpreted as the cost of including link ℓ′ into the backup path. Dijkstra's algorithm is used in line 13 to determine the least cost path q_r^new, which is stored in Q_k if its cost is lower than the cost of the current recovery path q_r.
Note that in line 12 the cost v_rℓ of the currently selected link ℓ is incremented by 1. This is necessary in order to move capacity from a link ℓ on which two (or more) scenarios require the same amount of capacity. Without this modification, such scenarios' capacity requirements would be able to permanently "hide" behind each other: processing the first scenario, the backup capacity would be available at zero cost and would likely be kept, because removing any backup path from this link in the first scenario leads to no decrease in the backup capacity provision requirement due to the unchanged amount of backup capacity needed by the second scenario, and vice versa. Due to the lack of a cost incentive, backup capacity on this link would possibly never be decreased.

Input: Values for L, R, K, the matrices P, M, F, H and the cost function φ.
Result: Matrices Q_k and s containing respectively the backup paths and the backup capacity for failure scenario k.
 1: begin
 2:   Determine the initial backup path matrices Q_k for each scenario k ∈ (1, ..., K) using Dijkstra's algorithm.
 3:   Calculate the spare provision matrices G_k and G and the spare provision vector s.
 4:   repeat
 5:     Set oldCost = Σ φ(s) and set Z = (0)_{R×K}.
 6:     for each link ℓ ∈ (1, ..., L) with s_ℓ > 0 in random order begin
 7:       for each scenario k ∈ (1, ..., K) with g_ℓk − h_ℓk = s_ℓ in random order begin
 8:         for each path r ∈ (1, ..., R) with g_kℓr > 0 and z_rk < 2 in random order begin
 9:           Set q_r = (Q_k)_r.
10:           Determine s^0 = s by setting (Q_k)_r = (0)_{1×L} and recalculating G_k, G and s.
11:           Determine s^* = s by setting (Q_k)_r = (1)_{1×L} − (F^T)_k and recalculating G_k, G and s.
12:           Determine v_r = φ(s^*) − φ(s^0) and add 1 to v_rℓ.
13:           Determine the least cost backup path q_r^new for path r in scenario k using Dijkstra's algorithm with every link ℓ′ that is not failed enabled at a cost v_rℓ′.
14:           if q_r^new v_r < q_r v_r then
15:             set (Q_k)_r = q_r^new and set z_rk = 0.
16:           else
17:             set (Q_k)_r = q_r and increment z_rk by 1.
18:           Recalculate G_k, G and s.
19:   until oldCost = Σ φ(s), thus no further reduction in the total cost.
20: end

Fig. 2. The SD-SSR heuristic.
The optimisation loop is repeated until, over a full cycle of trying to decrease the backup capacity requirement on every link, no backup capacity reduction was made, so that the backup capacity cost is the same as when the current cycle started (compare lines 5 and 19).
The SSR heuristic can readily be expressed in a form
suitable for distributed computation [7]. We have also included
simulated annealing mechanisms into the SD-SSR heuristic
which can, at an increased computational expense, test the
near optimality of the SSR solutions.
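The rerouting step in line 13 of Fig. 2 is an ordinary shortest-path computation restricted to the surviving links. A minimal sketch, with an invented adjacency-list graph representation, is:

```python
import heapq

def least_cost_backup(adj, link_cost, failed, src, dst):
    """Dijkstra over the non-failed links only, as in line 13 of the
    SD-SSR heuristic.  adj maps a node to a list of (neighbour, link)
    pairs, link_cost maps a link to its cost v_{r,l'}, and failed is
    the set of links down in the current scenario.  Returns the list
    of links on the least cost src -> dst path (dst assumed reachable)."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                      # stale heap entry
        for v, link in adj[u]:
            if link in failed:
                continue                  # link fails in this scenario
            nd = d + link_cost[link]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, (u, link)
                heapq.heappush(heap, (nd, v))
    path, node = [], dst
    while node != src:                    # walk back from dst to src
        node, link = prev[node]
        path.append(link)
    return list(reversed(path))
```

On a triangle with unit link costs, failing the direct link between two nodes makes the two-hop detour the least cost backup path.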
E. SSR versus SD-SSR numerical results
Table II presents the spare capacity required to protect various networks against the failure of any single link (K = L).
The spare capacity requirement is expressed in terms of the
redundancy η which is the ratio of the backup capacity s
to the original capacity of all the links in the network. The
simple two step method [7] computes the shortest recovery
routes which, apart from the failed links, need not be link
disjoint with their working counterparts.

TABLE II
NETWORK REDUNDANCY

Network          N/L            node    two-step   SSR    SD-SSR    Δ
                 (nodes/links)  degree  η%         η%     η%        %
Test network     10/12          2.4     99.1       86.3   83.4      2.9
Test network     10/15          3.0     81.4       51.8   49.4      2.4
Network [1]      29/45          3.1     99.6       68.4   65.2      3.2
Network 4 [7]    17/31          3.7     78.1       45.5   32.1     13.4
Scale-free [9]   50/97          3.9     58.2       42.8   38.3      4.5
Random [9]       50/100         4.0     39.4       27.6   23.3      4.3
Test network     10/20          4.0     79.2       48.1   40.3      7.8
Network 1 [7]    10/22          4.4     74.6       43.7   32.5     11.2
Test network     10/25          5.0     69.3       46.2   30.8     15.4

The SSR and SD-SSR
heuristics were each replicated 20 times using different random
number seeds for sample networks with randomly generated
traffic loads. The networks are all 2-edge connected so that
failure-disjoint recovery routes can be found. The two scale-free and random networks were generated by the synthetic
network generator BRITE [9].
Table II, which is sorted according to the average network
node degree, shows that for the networks modelled, SD-SSR
yields an improvement ∆ of between 2% and 15% over SSR when
protecting against the failure of any single link; in general the
redundancy improves (decreases) as the average network node
degree increases.
V. THE PERFORMANCE OF THE REALLOCATION SCHEME: FAILURE RECOVERY
In this section we present simulation experiments to investigate the efficacy of the bandwidth reallocation scheme in
recovering from the failure of the largest link (the tagged link)
in the 29-node 45-link network model.
The resources required in the recovery process are bandwidth, buffers and router processing capacity. Our recovery
model only takes the bandwidth resource into account. Note too that we do not compute backup paths and spare capacity to recover from an arbitrary single link failure: the network would then be provisioned with more than enough spare capacity to recover from the failure of the tagged link. Instead we protect the network against the failure of the tagged link only, so that the efficacy of the recovery process can be investigated when it has to work with the least amount of spare capacity necessary.
The tagged link carries 68 of the network’s 406 routes.
The first step is to compute the recovery routes and the spare
capacity requirements that are necessary in order to recover
from the failure of the tagged link. The SSR heuristic assigns spare capacity on average to 28 of the 45 links, increasing the network capacity by 38.5%; the average length of the recovery routes is 6.6 ± 1.4 hops. The SD-SSR heuristic assigns spare capacity to 9 links, increasing the network capacity by 21.7%; the average length of the recovery routes is 7.6 ± 2.2 hops.

Fig. 3. Loss probabilities before, during and after link failure.
The recovery paths and the spare capacities are computed
prior to the network failure. In principle, the recovery paths can
offer 1-for-1 protection switching: the backup capacity can be
used by the working routes before the failure occurs either to
transport best effort traffic or to capacitate the recovery paths
before the failure so that multi-path routing can be used to
balance the traffic loads across the two paths prior to failure.
However, in this study the spare capacity will remain unused
until the failure occurs.
The network failure and subsequent recovery are modelled
as follows. Before the link failure the reallocation scheme
transfers bandwidth in units of 4 at a time. The link failure is
detected by the managers of the routes (the failed routes) that
use the failed link. The calls in progress on the failed routes are
submitted for reconnection on their recovery counterparts. The
bandwidth from the failed routes is returned to the non-failed
links. The first reallocation signal after the link failure (the
recovery signal) on each recovery route attempts to provision
the recovery route to its pre-failure capacity. Apart from
these recovery reallocations, all other reallocations transfer
bandwidth in units of 4.
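The failure and recovery steps above can be summarised in a schematic sketch; the data structure and field names are invented for illustration and omit the signalling delays and race conditions discussed below.

```python
def fail_link(routes, recovery_of, failed_link):
    """Schematic recovery steps after a link failure: calls in progress
    on each failed route are resubmitted on its recovery counterpart,
    and the failed route's bandwidth is released back to the surviving
    links.  Returns the total bandwidth released."""
    released = 0
    for name, route in routes.items():
        if failed_link in route["links"]:
            routes[recovery_of[name]]["calls"] += route["calls"]  # resubmit
            released += route["bandwidth"]   # returned to non-failed links
            route["calls"] = 0
            route["bandwidth"] = 0
    return released
```

In the simulation the first reallocation signal on each recovery route (the recovery signal) then attempts to provision that route to the failed route's pre-failure capacity.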
Because of the distributed nature of the reallocation scheme,
race conditions may arise between the recovery signals and the
other reallocation signals so that the recovery signals may not
be able to provision their recovery routes to their pre-failure
QoS. A degradation of service may therefore be experienced
for a short period of time while the bandwidth reallocation
scheme is allocating bandwidth to the recovery routes.
Each simulation processed a total of 10^8 call completions.
The tagged link carrying 68 routes is failed after 50,000,000
calls have completed. Some 86,000 calls are in progress when
the link fails: some 24,000 of these calls are in progress on
the failed routes and are restarted on the recovery routes.
Fig. 3 presents the average loss probability for the recovery
routes (before the failure the recovery routes denote the set of
routes that use the tagged link) and the average loss probability
for the routes that are unaffected by the failure (these routes do
not use the tagged link). Note that the simulation measurement
interval is reduced while the network recovers from the link
failure (the reduced measurement interval may contribute to
the volatility of the loss probability plot during the recovery
cycle), and that the plot for the loss probability on the
surviving routes has been shifted to the right by 5 × 10^6 calls in
order to make the two plots more readable. Minor oscillations
in the loss probabilities are evident after the link fails, but
these are rapidly damped. The network is able to meet its
target QoS of a 2% connection loss probability before, during
and after the link failure.
VI. CONCLUSIONS
This paper presents a scheme for bandwidth reallocation in
a path-oriented transport network. A bandwidth manager is
assigned to each route. The managers are autonomous, acting
without centralised control from a system coordinator and
behave entirely according to local rules. The managers are
aware of local resource demands and bandwidth prices. The
managers reallocate bandwidth among themselves in order to
maintain the QoS of their routes.
We present a simulation model of the bandwidth reallocation scheme. Initial studies of a 29-node 45-link network
model reveal that bandwidth reallocation can provide efficient
bandwidth provisioning both for random traffic fluctuations,
and also during failure conditions to move bandwidth rapidly
from failed routes to recovery routes.
REFERENCES
[1] Å. Arvidsson, B.A. Chiera, A.E. Krzesinski and P.G. Taylor, “A
Distributed Scheme for Value-Based Bandwidth Re-Configuration”,
submitted, 2006. http://www.cs.sun.ac.za/∼aek1/COE/dowloads/
four authors.pdf
[2] S. Kandula, D. Katabi, B. Davie and A. Charny, “Walking the
Tightrope: Responsive yet Stable Traffic Engineering”, ACM SIGCOMM
Computer Communication Review, vol. 35, issue 4, 2005, pp. 253–264.
[3] V. Sharma and F. Hellstrand (Eds), “RFC 3469: Framework for
Multi-Protocol Label Switching (MPLS)-based Recovery”, Feb 2003.
[4] J.-P. Vasseur, M. Pickavet and P. Demeester, Network Recovery:
Protection and Restoration of Optical, SONET-SDH, IP, and MPLS,
Morgan Kaufmann, 2004.
[5] B.A. Chiera and P.G. Taylor, “What is a Unit of Capacity Worth?”,
Probability in the Engineering and Informational Sciences, vol. 16
no. 4, pp. 513–522, 2002.
[6] B.A. Chiera, A.E. Krzesinski and P.G. Taylor, “Some Properties of the
Capacity Value Function”, SIAM Journal on Applied Mathematics,
vol. 65 no. 4, pp 1407–1419, 2005.
[7] Y. Liu, D. Tipper and P. Siripongwutikorn, “Approximating Optimal
Spare Capacity Allocation by Successive Survivable Routing”, IEEE
Transactions on Networking, vol. 13 no. 1, pp 198–211, Feb 2005.
[8] R. Irashko, M. MacGregor and W. Grover, “Optimal Capacity Placement
for Path Restoration in STM or ATM Mesh-Survivable Networks”,
IEEE Transactions on Networking, vol. 6 no. 3, pp 325–336, June 1998.
[9] A. Medina, A. Lakhina, I. Matta and J. Byers, BRITE: An Approach to
Universal Topology Generation, in Proc 9th International Workshop on
Modeling, Analysis and Simulation of Computer and Telecommunication
Systems (MASCOTS’01), Cincinnati, USA, Aug 2001, pp 346–353.