A Distributed Scheme for Responsive Network Engineering

J. Göbel, A.E. Krzesinski and D. Stapelberg

Department of Informatics, University of Hamburg, 22527 Hamburg, Germany
Department of Mathematical Sciences, University of Stellenbosch, 7600 Stellenbosch, South Africa

Abstract—Optimal bandwidth utilisation together with resilience and recovery from failure are two key drivers for Traffic Engineering (TE) which have been widely addressed by the IP community. Most current IGP routing protocols deployed in the Internet, and the extensions proposed to adapt these protocols for TE, are concerned either with optimality or with resilience. This leads to a duplication of routing protocols and algorithms where each of these objectives (optimality, resilience) is addressed by its own protocol or algorithm. The interactions among these protocols introduce additional complexities which do not necessarily translate into equivalent performance gains. This paper aims at the integration of these two objectives into a unified Network Resource Controller (NRC). At random time instants the NRC computes bandwidth prices which are used in an automated scheme to dynamically adjust the bandwidths of the network paths in response to the traffic and network equipment conditions. The distinguishing features of the NRC are that it works without centralised control and thus scales to large networks, and that rather than using TE to move network flows to where the network bandwidth is located, the NRC uses Network Engineering (NE) to move network bandwidth to where the network flows are located. We next present an efficient heuristic to find diversely routed backup paths and to provision the network links with the least amount of backup (spare) bandwidth in order to be able to deploy equivalent recovery paths for any failure scenario which leaves the network connected.
Simulation results are presented which show that the reallocation scheme provides prompt bandwidth provisioning both for random traffic fluctuations during normal operating conditions, and when provisioning recovery routes in the event of network failure.

Index Terms—bandwidth prices; bandwidth reconfiguration; network planning and optimisation; network survivability; network recovery.

I. INTRODUCTION

In this paper we present a model of network bandwidth management that is based on a distributed scheme for automatic bandwidth reallocation [1]. The reallocation scheme can be used in the context of a path-oriented network where relatively long-lived paths (the terms path and route will be used interchangeably) are used to provision resources for connections or flows, whose average holding times are much less than the path lifetimes. Multi-Protocol Label Switching (MPLS) networks, in which Label Switched Paths (LSPs) act as the long-lived paths, provide a possible environment in which such a bandwidth reallocation scheme could be useful.

This work was supported by grant numbers 2054027 and 2677 from the South African National Research Foundation, Siemens Telecommunications and Telkom SA Limited.

We restrict ourselves to connection-oriented traffic with admission controls. Hence we use the terms connections and lost connections, or equivalently calls and lost calls. The number of connections that can be simultaneously carried on a route depends on the amount of bandwidth allocated to the route. However, at any point in time, it is possible that due to traffic fluctuations or network equipment changes, the connections in service on one route are using only a small part of the bandwidth allocated to that route, while the bandwidth on another route is heavily utilised. Two methods have been developed to deal with this situation.
First, the arriving calls can be offered to the least utilised route of a multi-path route connecting the origin-destination pair; second, bandwidth can be transferred from underutilised routes to overutilised routes. The first method forms the basis of traffic engineering (TE) where network flows are moved to where the network bandwidth is located. Many TE methods have been developed to manage connectivity and to deliver resilience and Quality of Service (QoS) across the network. See [2] and the references therein for a description of TE methods and applications. The second method forms the basis of network engineering (NE) where bandwidth is moved to where the network flows are located. This paper investigates the use of NE rather than TE to manage the network QoS. A systematic way of applying NE is to assign a broker referred to as a bandwidth manager to each route. At random time instants the manager calculates the expected value, over a short period of time known as the planning horizon, of an extra unit of bandwidth (the “buying price”) and also the expected value, over the same short period of time, that the route would lose should it give up a unit of bandwidth (the “selling price”). Bandwidth can then be transferred from routes that place a low value on bandwidth to routes that place a high value on bandwidth. An essential component of the reallocation scheme is that the bandwidth transfer process is based on local information only. The bandwidth managers are thus aware of local resource demands and bandwidth prices, and reallocate bandwidth among themselves in order to maintain the performance of their routes. In this way the managers are autonomous, act without centralised control from a system coordinator and behave entirely according to local rules. Such a scheme is distributed and scalable. 
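The premise of NE, that moving bandwidth to where the flows are can reduce loss, can be illustrated with the classical Erlang B formula. The sketch below is our own illustration, not taken from the paper; the loads and capacities are invented. It shows that shifting capacity from an underutilised route to an overloaded one lowers the total rate of lost calls.

```python
def erlang_b(a, C):
    # Erlang B blocking probability for offered load a (Erlang) and C
    # circuits, computed with the standard stable recursion
    # B(0) = 1, B(n) = a*B(n-1) / (n + a*B(n-1)).
    b = 1.0
    for n in range(1, C + 1):
        b = a * b / (n + a * b)
    return b

# Two routes sharing 20 units of bandwidth.  Route 1 is overloaded
# (12 Erlang offered), route 2 is underutilised (4 Erlang offered).
load1, load2 = 12.0, 4.0
for c1 in (10, 12, 14):                # bandwidth allocated to route 1
    c2 = 20 - c1
    lost = load1 * erlang_b(load1, c1) + load2 * erlang_b(load2, c2)
    print(c1, c2, round(lost, 3))      # total lost-call rate falls as c1 grows
```

Moving bandwidth toward the overloaded route strictly reduces the total lost-call rate in this example, which is exactly the effect the bandwidth managers exploit.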
The purpose of this paper is to evaluate the efficacy of such a bandwidth reallocation scheme both during normal operating conditions to provide prompt bandwidth provisioning for random traffic fluctuations, and as a recovery mechanism [3], [4] in rapidly reallocating bandwidth after a network failure. The remainder of the paper is organised as follows. Section II presents a method to compute buying and selling prices for bandwidth in connection-oriented networks and discusses how the prices are used in the bandwidth reallocation scheme. Section III presents some results from simulation experiments to test the efficacy of the reallocation scheme during normal operating conditions. Section IV presents an efficient heuristic for the off-line calculation of the minimal amount of additional bandwidth necessary to construct equivalent recovery paths after a network failure. Section V presents some results from simulation experiments to test the efficacy of the reallocation scheme when recovering from the failure of a single link. Our conclusions are presented in Section VI. II. BANDWIDTH REALLOCATION We use a bandwidth pricing method [5] which models a route as an Erlang loss system and computes the expected lost revenue due to connections being blocked, conditional on the system starting in a given state. The expected lost revenue is used to compute both a buying price and selling price for bandwidth, relying on knowledge of the state at time zero. A. The price of bandwidth A manager is assigned to each route and we assume that a route’s manager is making decisions for a planning horizon of τ time units. We can regard the value of U extra units of bandwidth, or the buying price, as the difference in the total expected lost revenue over time [0, τ ] if the route were to increase its bandwidth by U units at time zero. 
Likewise, we can calculate the selling price of U units of bandwidth as the difference in the total expected lost revenue over time [0, τ] if the route were to decrease its bandwidth by U units. For a route with bandwidth C, let R_{c,C}(τ) denote the expected revenue lost in the interval [0, τ], given that there are c connections at time 0. The buying and selling prices B_{c,C}(τ, U) and S_{c,C}(τ, U) of U units of bandwidth are then given by

  B_{c,C}(τ, U) = R_{c,C}(τ) − R_{c,C+U}(τ)

and

  S_{c,C}(τ, U) = R_{c,C−U}(τ) − R_{c,C}(τ),       0 < c ≤ C − U
  S_{c,C}(τ, U) = R_{C−U,C−U}(τ) − R_{c,C}(τ),     C − U < c ≤ C.

R_{c,C}(τ) is given by [5] as the inverse of the Laplace transform

  R̃_{c,C}(s) = θ(λ/s) P_c(s/λ) / [(s + Cµ) P_C(s/λ) − Cµ P_{C−1}(s/λ)]

where λ and µ are the parameters of the exponential connection arrival and connection service processes respectively, θ is the expected revenue earned per connection,

  P_c(s/λ) = (−µ/λ)^c Γ_c^{(λ/µ)}(−s/µ)

and

  Γ_c^{(λ/µ)}(−s/µ) = Σ_{k=0}^{c} (c choose k) ((−s/µ) choose k) k! (−λ/µ)^{c−k}

is a Charlier polynomial. A computationally stable method for efficiently calculating the functions R̃_{c,C}(s) is presented in [6]. These are numerically inverted to yield the R_{c,C}(τ).

B. A distributed bandwidth reallocation scheme

Under our proposed scheme, an automated bandwidth manager is assigned to each route. We shall view routes as being of two types: direct routes, which traverse just a single physical link, and transit routes, which traverse more than one physical link. We assume that each physical link supports a direct route. The direct routes on the links of a transit route are referred to as its constituent direct routes. Bandwidth reallocation is driven by the managers of transit routes, and bandwidth is reallocated between the transit routes and their constituent direct routes. The manager of a transit route obtains the buying and selling prices by means of a signalling mechanism.
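Before turning to the signalling mechanism, the pricing definitions above can be illustrated numerically. The sketch below is our own illustration, not the Laplace-inversion method of [5], [6]: it approximates R_{c,C}(τ) by direct Euler integration of the birth-death master equation of the Erlang loss system, and the parameter values are invented for illustration.

```python
def lost_revenue(c, C, lam, mu, theta, tau, steps=20000):
    # Expected revenue lost in [0, tau] for an Erlang loss system with
    # capacity C that starts with c connections in progress, obtained by
    # Euler integration of the birth-death master equation.  Revenue is
    # lost at rate theta*lam whenever all C circuits are busy.
    p = [0.0] * (C + 1)          # state distribution: p[n] = P(n busy)
    p[c] = 1.0
    dt = tau / steps
    lost = 0.0
    for _ in range(steps):
        lost += theta * lam * p[C] * dt
        q = list(p)
        for n in range(C + 1):
            rate_out = (lam if n < C else 0.0) + n * mu
            q[n] -= rate_out * p[n] * dt
            if n < C:
                q[n + 1] += lam * p[n] * dt
            if n > 0:
                q[n - 1] += n * mu * p[n] * dt
        p = q
    return lost

def buying_price(c, C, U, lam, mu, theta, tau):
    # B_{c,C}(tau, U) = R_{c,C}(tau) - R_{c,C+U}(tau)
    return (lost_revenue(c, C, lam, mu, theta, tau)
            - lost_revenue(c, C + U, lam, mu, theta, tau))

def selling_price(c, C, U, lam, mu, theta, tau):
    # S_{c,C}(tau, U): a start state above C-U is capped at C-U, which
    # covers both branches of the definition of S_{c,C}(tau, U).
    return (lost_revenue(min(c, C - U), C - U, lam, mu, theta, tau)
            - lost_revenue(c, C, lam, mu, theta, tau))

# Illustrative parameters: load 4 Erlang, unit revenue, horizon tau = 2.
B = buying_price(4, 5, 1, 4.0, 1.0, 1.0, 2.0)
S = selling_price(4, 5, 1, 4.0, 1.0, 1.0, 2.0)
print(B, S)
```

Both prices are positive: extra bandwidth always reduces, and surrendered bandwidth always increases, the expected lost revenue over the planning horizon.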
Signals or control packets are sent at random intervals of time along each transit route, recording the buying and selling prices of the constituent direct routes. When a control packet reaches the egress router, a calculation is performed: if the sum of the direct route buying prices is greater than the transit route selling price, then the transit route will give up U units of bandwidth, which are taken up by each of the direct routes. Alternatively, if the sum of the direct route selling prices is less than the transit route buying price, then each of the direct routes will give up U units of bandwidth, which are taken up by the transit route. The control packet is returned along the reverse route from the egress router to the ingress router, making the necessary reallocations to the capacities of the constituent direct routes. The capacity (the terms bandwidth and capacity are used interchangeably) of the transit route is adjusted when the control packet reaches the ingress router. Bandwidth reallocation takes place between transit routes and their constituent direct routes: since the transit route managers work independently and do not communicate directly with each other, the reallocation algorithm is distributed and scalable. However, the transit route managers can communicate indirectly with each other via any constituent direct routes that they have in common along their transit routes. See [1] for a discussion on how often the managers of transit routes send out control packets, the determination of the planning horizons τ used in the calculation of buying and selling prices, the determination of the amount U of bandwidth to be transferred in each reallocation, issues concerning the loss or delay of control packets, and the overhead costs incurred in the signalling scheme.
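The calculation performed at the egress router can be summarised as follows. This is a minimal sketch of the decision rule described above; the function and variable names are ours, and the prices are invented for illustration.

```python
def egress_decision(transit_buy, transit_sell, direct_buys, direct_sells, U):
    """Decide the direction of a reallocation of U bandwidth units when a
    control packet reaches the egress router of a transit route.

    direct_buys / direct_sells hold the buying and selling prices recorded
    by the control packet for each constituent direct route."""
    if sum(direct_buys) > transit_sell:
        # The direct routes jointly value extra bandwidth more highly than
        # the transit route values what it holds: the transit route gives
        # up U units, taken up by each constituent direct route.
        return ("transit-to-direct", U)
    if sum(direct_sells) < transit_buy:
        # The transit route values extra bandwidth more highly than the
        # direct routes jointly value what they hold: each direct route
        # gives up U units, taken up by the transit route.
        return ("direct-to-transit", U)
    return ("no-reallocation", 0)

# Example: the constituent direct routes jointly outbid the transit route.
print(egress_decision(transit_buy=1.0, transit_sell=2.0,
                      direct_buys=[1.5, 1.2], direct_sells=[2.5, 2.8], U=4))
```

The decision uses only prices recorded along the route itself, which is what makes the scheme local and hence scalable.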
III. THE PERFORMANCE OF THE REALLOCATION SCHEME: NORMAL OPERATION

In this section we present simulation results taken from [1] to evaluate the performance of the bandwidth reallocation scheme when applied to a 29-node 45-link network model where each of the 406 origin-destination pairs is connected by a single least cost (shortest hop count) route. The network model and its traffic matrix are described in [1]. A simulator was developed to model the connection arrival, admission and service processes, the signalling process, bandwidth price calculation and bandwidth reallocation. The network links are initially capacitated so that 2% of the offered connections are lost (a QoS of 2%) when no reallocations are attempted. Our simulations [1] show that for the network model under investigation, bandwidth reallocation reduces the average connection loss probability by a factor of 3, from 0.02 to 0.007. The efficacy of bandwidth reallocation is relatively insensitive to the connection holding time distribution and to reasonable signalling delays. Reallocation attempts take place relatively infrequently so that effective bandwidth reallocation can be implemented while keeping the signalling cost incurred in reallocation low.

As an example we investigate how the bandwidth reallocation scheme deals with traffic patterns which differ substantially from the traffic pattern that was originally used to configure the link capacities. Let λ_r(0) denote the original average arrival rate of connections to route r. The traffics are perturbed at regular time intervals. The ith traffic perturbation yields a new set of connection arrival rates λ_r(i) = λ_r(0)(1 + a_r(i)) where i > 0 and a_r(i) is a random variable sampled from a uniform distribution in the range [−0.2, 0.2], so that the connection arrival rates to a randomly selected half (approximately) of the routes are increased while the arrival rates to the remaining routes are decreased.

Fig. 1. The average network loss probability when the connection arrival rates are repeatedly perturbed.

Fig. 1 shows that bandwidth reallocation substantially reduces the network loss probability when the network traffic is subject to repeated random perturbations of ±20%. The network cannot meet its target QoS with these traffic perturbations if bandwidth reallocation is not applied.

IV. NETWORK RESTORATION

Network restoration schemes address the reconfiguration of the network and the reallocation of traffic flows after a node or link failure, a node failure being regarded as the failure of the links incident upon the failed node. Link restoration schemes require the nodes adjacent to the failed link(s) to be responsible for rerouting the traffic demands by patching around the failed link(s). Path restoration requires the end nodes of the failed path to be responsible for rerouting the traffic demands onto a recovery path.

A. The Spare Capacity Assignment (SCA) problem

The SCA problem describes how to route backup paths and configure the network links with the least amount of spare (backup) capacity in order to be able to deploy equivalent failure-disjoint recovery paths. An equivalent recovery path provides the same QoS as its working counterpart. Two routes are failure-disjoint if in any failure scenario at most one of the routes fails: the recovery routes thus offer 1-to-1 protection [3]. The SCA problem for a directed network is given by

  min_{Q,s} φ(s)            (1)

such that

  s = max G                 (2)
  G = Q^T M U               (3)
  T + Q ≤ 1                 (4)
  Q B^T = D.                (5)

The objective (1) is to minimise the total cost φ(s) of spare capacity by selection of the backup paths Q and the spare capacity allocation s. Constraints (2) and (3) compute the spare capacity s and the spare provision matrix G, where M is a diagonal matrix of bandwidth allocations and U = P ⊙ F^T is the path failure incidence matrix, where P is the working path link incidence matrix and F is the link failure incidence matrix.
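As a toy illustration of constraints (2) and (3), the following sketch builds the matrices U, G and s for an invented instance with two working paths, four links and two single-link failure scenarios. The data are ours, chosen to show how two backup paths protecting against different failures can share the same spare capacity.

```python
# Toy instance: R = 2 working paths, L = 4 links, K = 2 single-link
# failure scenarios (scenario k fails link k).  All data are invented;
# the matrices follow the notation of the SCA formulation.

R, L, K = 2, 4, 2
P = [[1, 0, 0, 0],        # working path 0 uses link 0
     [0, 1, 0, 0]]        # working path 1 uses link 1
F = [[1, 0, 0, 0],        # scenario 0 fails link 0
     [0, 1, 0, 0]]        # scenario 1 fails link 1
Q = [[0, 0, 1, 1],        # backup path 0 uses links 2 and 3
     [0, 0, 1, 1]]        # backup path 1 shares links 2 and 3
m = [3, 5]                # bandwidth of each working path (diagonal of M)

# U = P (binary product) F^T: u_rk = 1 iff working path r fails in scenario k.
U = [[1 if any(P[r][l] and F[k][l] for l in range(L)) else 0
      for k in range(K)] for r in range(R)]

# G = Q^T M U: g_lk is the backup capacity needed on link l in scenario k.
G = [[sum(Q[r][l] * m[r] * U[r][k] for r in range(R))
      for k in range(K)] for l in range(L)]

# s = max G: per-link spare capacity covering the worst single scenario.
s = [max(G[l]) for l in range(L)]
print(s)   # -> [0, 0, 5, 5]
```

Links 2 and 3 each need only 5 units of spare capacity, not 3 + 5 = 8: because the two working paths never fail in the same scenario, their backup paths share the spare capacity, which is the saving the SCA problem seeks to maximise.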
The operators max and ⊙ and the various matrices are defined in Table I. Constraint (4) guarantees that the backup paths Q do not use any links which might fail simultaneously with their working paths, where T = U ⊙ F is the flow tabu matrix. Constraint (5) expresses flow conservation, where B is the node link incidence matrix and D is the route node incidence matrix. The SCA problem is known to be NP-complete.

B. Successive Survivable Routing (SSR)

Liu et al. [7] developed the SSR heuristic which provides an approximate solution to the SCA problem in networks subject to arbitrary failure scenarios. The SSR heuristic addresses failure independent (state independent) path restoration where each working path has one failure-disjoint recovery counterpart. The essential elements of the SSR heuristic are presented in lines 8 through 18 of the algorithm presented in Fig. 2, with the conditions on G_k and Z in line 8 replaced by a suitable stopping rule, and lines 11 and 13 modified to exclude tabu links.

TABLE I
MATRIX OPERATIONS AND NOTATION

C = A^+ : the matrix A made non-negative, thus c_ij = max(a_ij, 0).
C = A ⊙ B : binary matrix multiplication.
c = max A : a vector containing the maximum value of each row of A.
c = Σ A : the sum of all elements of A; if A is a V×W matrix then c = Σ_v Σ_w a_vw.
N, L, R, K : the number of nodes, links, paths and failure scenarios.
n, ℓ, r, k : the indices of the nodes, links, paths and failure scenarios.
B = {b_nℓ}_{N×L} : the node link incidence matrix: b_nℓ = ±1 if node n is the origin or destination of link ℓ and zero otherwise.
D = {d_rn}_{R×N} : the route node matrix: d_rn = ±1 if node n is the origin or destination of route r and is zero otherwise.
E_k = {e_rk}_{R×K} : a matrix where e_rk = 1 and e_rj = 0 for j ≠ k.
F = {f_kℓ}_{K×L} : the link failure incidence matrix: f_kℓ = 1 if link ℓ fails in scenario k and 0 otherwise.
G_k = {g_kℓr}_{L×R} : the spare provision matrix per failure scenario: g_kℓr is the backup capacity required on link ℓ for path r in failure scenario k.
G = {g_ℓk}_{L×K} : the spare provision matrix: g_ℓk is the backup capacity required on link ℓ for failure scenario k.
H = {h_ℓk}_{L×K} : the returned capacity matrix: h_ℓk is the capacity returned to the non-failed link ℓ for failure scenario k.
M = Diag({m_r})_{R×R} : a diagonal matrix of the bandwidth m_r of path r.
P = {p_rℓ}_{R×L} : the working path link incidence matrix: p_rℓ = 1 if the working path p_r traverses link ℓ and 0 otherwise.
Q = {q_rℓ}_{R×L} : the backup path link incidence matrix: q_rℓ = 1 if the backup path q_r traverses link ℓ and 0 otherwise.
Q_k = {q_krℓ}_{R×L} : the backup path link incidence matrix per scenario: q_krℓ = 1 if the backup path q_r in scenario k traverses link ℓ and 0 otherwise.
T = {t_rℓ}_{R×L} : the flow tabu matrix: t_rℓ = 1 if the recovery path q_r cannot use link ℓ and is zero otherwise.
U = {u_rk}_{R×K} : the path failure incidence matrix: u_rk = 1 if the working path p_r fails in scenario k and 0 otherwise.
φ = {φ_ℓ}_{L×1} : the backup capacity cost function: φ_ℓ is the cost of a single unit of backup capacity on link ℓ.
s = {s_ℓ}_{L×1} : the spare provision vector: s_ℓ is the backup capacity required on link ℓ for any failure scenario.
v_r = {v_rℓ}_{L×1} : the link cost vector: v_rℓ is the cost of including link ℓ in the currently rerouted backup path q_r.
Z = {z_rk}_{R×K} : z_rk counts the successive unsuccessful attempts at changing the backup path q_r for scenario k.

C. State dependent SSR (SD-SSR)

In this section the SSR heuristic is extended to state dependent (failure dependent) restoration which uses different recovery paths to protect against different failures. The SD-SSR heuristic computes a set of K backup path link incidence matrices where Q_k describes the backup paths to be used in failure scenario k. The spare provision matrices G_k that hold each link's backup capacity requirement for each backup path q_r in failure scenario k are given by G_k = Q_k^T M U. The spare provision matrix G is given by G = Σ_{k=1}^{K} G_k E_k.
The vector s of link backup capacity is given by s = max (G − H)^+, where the give-back matrix H = P^T M U describes the bandwidth returned (stub-release) by the failed working paths to the non-failed links [8]. The overall backup capacity cost is Σ_ℓ φ_ℓ(s_ℓ), and the total amount of backup capacity is Σ_ℓ s_ℓ.

Not all network topologies can benefit from stub-release and SD-SSR. For example, consider a network consisting of two source/sink nodes connected by two or more parallel, link-disjoint routes. Such networks benefit from stub-release, but SD-SSR without stub-release will yield only a marginal improvement over SSR. Rings and fully meshed networks do not benefit from stub-release, nor will SD-SSR yield a lower spare capacity requirement than SSR for these topologies.

D. The SD-SSR algorithm

Fig. 2 presents the essential elements of the SD-SSR heuristic. The notation q_r = (Q_k)_r refers to row r of matrix Q_k, where q_r is the recovery counterpart for working route p_r under failure scenario k. The (outer) optimisation loop of the SD-SSR heuristic (see line 6) iterates link by link, trying to reduce the amount of backup capacity placed on each link. For each link ℓ with a non-zero backup capacity requirement s_ℓ (s_ℓ > 0, see line 6), we examine every scenario k in which s_ℓ units of backup capacity are required (g_ℓk − h_ℓk = s_ℓ, see line 7). For each such scenario k the algorithm tries to reroute the backup paths q_r that contribute to the backup capacity requirement on link ℓ in scenario k (g_kℓr > 0, see line 8). The rerouting of each backup path q_r in each failure scenario k is not tried again after two successive unsuccessful attempts during a single optimisation cycle (z_rk < 2 no longer holds). Increasing this number of attempts may lead to better results, but substantially increases the computation time. The rerouting itself (lines 9 to 18) is nearly the same as in SSR [7]. The best recovery route discovered thus far is saved in line 9.
Line 10 computes the spare capacity required if no backup path for path r existed. Line 11 computes the spare capacity required if the new backup path for path r can use every non-failed link in scenario k. The element v_rℓ′ of the vector v_r is interpreted as the cost of including link ℓ′ in the backup path. Dijkstra's algorithm is used in line 13 to determine the least cost path q_r^new, which is stored in Q_k if its cost is lower than the cost of the current recovery path q_r. Note that in line 12 the cost v_rℓ of the currently selected link ℓ is incremented by 1. This is necessary in order to move capacity from a link ℓ on which two (or more) scenarios require the same amount of capacity. Without this modification, such scenarios' capacity requirements could permanently "hide" behind each other: processing the first scenario, the backup capacity would appear to be available at zero cost and would likely be kept, because removing any backup path from this link in the first scenario leads to no decrease in the backup capacity requirement, due to the unchanged amount of backup capacity needed by the second scenario, and vice versa. Due to the lack of a cost incentive, the backup capacity on this link might never be decreased.

Input: Values for L, R, K, the matrices P, M, F, H and the cost function φ.
Result: Matrices Q_k and s containing respectively the backup paths and the backup capacity for failure scenario k.
 1: begin
 2:   Determine the initial backup path matrices Q_k for each scenario k ∈ (1, ..., K) using Dijkstra's algorithm.
 3:   Calculate the spare provision matrices G_k and G and the spare provision vector s.
 4:   repeat
 5:     Set oldCost = Σ_ℓ φ_ℓ(s_ℓ) and set Z = (0)_{R×K}.
 6:     for each link ℓ ∈ (1, ..., L) with s_ℓ > 0 in random order begin
 7:       for each scenario k ∈ (1, ..., K) with g_ℓk − h_ℓk = s_ℓ in random order begin
 8:         for each path r ∈ (1, ..., R) with g_kℓr > 0 and z_rk < 2 in random order begin
 9:           Set q_r = (Q_k)_r.
10:           Determine s0 = s by setting (Q_k)_r = (0)_{1×L} and recalculating G_k, G and s.
11:           Determine s* = s by setting (Q_k)_r = (1)_{1×L} − (F^T)_k and recalculating G_k, G and s.
12:           Determine v_r = φ(s*) − φ(s0) and add 1 to v_rℓ.
13:           Determine the least cost backup path q_r^new for path r in scenario k using Dijkstra's algorithm with any link ℓ′ that is not failed (f_kℓ′ = 0) enabled at a cost v_rℓ′.
14:           if q_r^new · v_r < q_r · v_r then
15:             set (Q_k)_r = q_r^new and set z_rk = 0
16:           else
17:             set (Q_k)_r = q_r and increment z_rk by 1.
18:         Recalculate G_k, G and s.
19:   until oldCost = Σ_ℓ φ_ℓ(s_ℓ), thus no further reduction in the total cost.
20: end

Fig. 2. The SD-SSR heuristic.

The optimisation loop is repeated until a full cycle (trying to decrease the backup capacity requirement on every link) makes no backup capacity reduction, so that the backup capacity cost at the end of the cycle (see line 19) is the same as at its start (see line 5). The SSR heuristic can readily be expressed in a form suitable for distributed computation [7]. We have also included simulated annealing mechanisms in the SD-SSR heuristic which can, at an increased computational expense, test the near optimality of the SSR solutions.

E. SSR versus SD-SSR numerical results

Table II presents the spare capacity required to protect various networks against the failure of any single link (K = L). The spare capacity requirement is expressed in terms of the redundancy η, which is the ratio of the backup capacity s to the original capacity of all the links in the network.
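The least cost path search of line 13 of Fig. 2 is a standard Dijkstra computation over the non-failed links. The sketch below is our own minimal illustration, with a link-indexed graph representation and invented names and costs; failed (or tabu) links are simply excluded from the search.

```python
import heapq

def least_cost_backup_path(n_nodes, links, cost, failed, src, dst):
    """Dijkstra over links, skipping failed links.  `links` is a list of
    undirected (u, v) node pairs indexed by link number, `cost[l]` is the
    cost v_rl of including link l in the backup path, and `failed` is the
    set of link indices that are down in the current scenario.  Returns
    the link indices of a least cost src -> dst path, or None."""
    adj = [[] for _ in range(n_nodes)]
    for l, (u, v) in enumerate(links):
        if l in failed:
            continue                       # failed links are disabled
        adj[u].append((v, l))
        adj[v].append((u, l))
    dist = {src: 0.0}
    prev = {}                              # node -> (previous node, link used)
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue                       # stale heap entry
        for v, l in adj[u]:
            nd = d + cost[l]
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = (u, l)
                heapq.heappush(heap, (nd, v))
    if dst not in prev and dst != src:
        return None                        # dst unreachable without failed links
    path, node = [], dst
    while node != src:
        node, l = prev[node]
        path.append(l)
    return path[::-1]

# 4-node ring 0-1-2-3-0 with a chord 1-3; link 0 (edge 0-1) has failed.
links = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]
cost = [1.0, 1.0, 1.0, 1.0, 0.5]
print(least_cost_backup_path(4, links, cost, failed={0}, src=0, dst=1))
```

With link 0 failed, the cheapest recovery path from node 0 to node 1 runs over links 3 and 4 (via node 3), which is what the search returns.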
The simple two step method [7] computes the shortest recovery routes which, apart from the failed links, need not be link disjoint with their working counterparts. The SSR and SD-SSR heuristics were each replicated 20 times using different random number seeds for sample networks with randomly generated traffic loads. The networks are all 2-edge connected so that failure disjoint recovery routes can be found. The two scale-free and random networks were generated by the synthetic network generator BRITE [9].

TABLE II
NETWORK REDUNDANCY

  Network          N/L (nodes/links)   node degree   two step η%   SSR η%   SD-SSR η%    ∆%
  Test network     10/12               2.4           99.1          86.3     83.4         2.9
  Test network     10/15               3.0           81.4          51.8     49.4         2.4
  Network [1]      29/45               3.1           99.6          68.4     65.2         3.2
  Network 4 [7]    17/31               3.7           78.1          45.5     32.1        13.4
  Scale-free [9]   50/97               3.9           58.2          42.8     38.3         4.5
  Random [9]       50/100              4.0           39.4          27.6     23.3         4.3
  Test network     10/20               4.0           79.2          48.1     40.3         7.8
  Network 1 [7]    10/22               4.4           74.6          43.7     32.5        11.2
  Test network     10/25               5.0           69.3          46.2     30.8        15.4

Table II, which is sorted according to the average network node degree, shows that for the networks modelled, SD-SSR yields an improvement ∆ of between 2 and 15% over SSR when protecting against the failure of any single link; in general the redundancy improves (decreases) as the average network node degree increases.

V. THE PERFORMANCE OF THE REALLOCATION SCHEME: FAILURE RECOVERY

In this section we present simulation experiments to investigate the efficacy of the bandwidth reallocation scheme in recovering from the failure of the largest link (the tagged link) in the 29-node 45-link network model. The resources required in the recovery process are bandwidth, buffers and router processing capacity. Our recovery model takes only the bandwidth resource into account. Note too that we do not compute backup paths and spare capacity to recover from any single link failure: the network would then be provisioned with more than enough spare capacity to recover from the failure of the tagged link.
Instead we will protect the network against the failure of the tagged link only, so that the efficacy of the recovery process can be investigated when it has to work with the least amount of spare capacity necessary. The tagged link carries 68 of the network's 406 routes. The first step is to compute the recovery routes and the spare capacity requirements that are necessary in order to recover from the failure of the tagged link. The SSR heuristic assigns spare capacity on average to 28 of the 45 links, increasing the network capacity by 38.5%; the average length of the recovery routes is 6.6 ± 1.4 hops. The SD-SSR heuristic assigns spare capacity to 9 links, increasing the network capacity by 21.7%; the average length of the recovery routes is 7.6 ± 2.2 hops.

Fig. 3. Loss probabilities before, during and after link failure.

The recovery paths and the spare capacities are computed prior to the network failure. In principle, the recovery paths can offer 1-for-1 protection switching: the backup capacity can be used by the working routes before the failure occurs, either to transport best effort traffic or to capacitate the recovery paths before the failure so that multi-path routing can be used to balance the traffic loads across the two paths prior to failure. However, in this study the spare capacity will remain unused until the failure occurs.

The network failure and subsequent recovery are modelled as follows. Before the link failure the reallocation scheme transfers bandwidth in units of 4 at a time. The link failure is detected by the managers of the routes (the failed routes) that use the failed link. The calls in progress on the failed routes are submitted for reconnection on their recovery counterparts. The bandwidth from the failed routes is returned to the non-failed links. The first reallocation signal after the link failure (the recovery signal) on each recovery route attempts to provision the recovery route to its pre-failure capacity.
Apart from these recovery reallocations, all other reallocations transfer bandwidth in units of 4. Because of the distributed nature of the reallocation scheme, race conditions may arise between the recovery signals and the other reallocation signals, so that the recovery signals may not be able to provision their recovery routes to their pre-failure QoS. A degradation of service may therefore be experienced for a short period of time while the bandwidth reallocation scheme is allocating bandwidth to the recovery routes.

Each simulation processed a total of 10^8 call completions. The tagged link carrying 68 routes is failed after 50,000,000 calls have completed. Some 86,000 calls are in progress when the link fails: some 24,000 of these calls are in progress on the failed routes and are restarted on the recovery routes. Fig. 3 presents the average loss probability for the recovery routes (before the failure the recovery routes denote the set of routes that use the tagged link) and the average loss probability for the routes that are unaffected by the failure (these routes do not use the tagged link). Note that the simulation measurement interval is reduced while the network recovers from the link failure (the reduced measurement interval may contribute to the volatility of the loss probability plot during the recovery cycle), and that the plot for the loss probability on the surviving routes has been shifted to the right by 5×10^6 calls in order to make the two plots more readable. Minor oscillations in the loss probabilities are evident after the link fails, but these are rapidly damped. The network is able to meet its target QoS of a 2% connection loss probability before, during and after the link failure.

VI. CONCLUSIONS

This paper presents a scheme for bandwidth reallocation in a path-oriented transport network. A bandwidth manager is assigned to each route.
The managers are autonomous, acting without centralised control from a system coordinator, and behave entirely according to local rules. The managers are aware of local resource demands and bandwidth prices, and reallocate bandwidth among themselves in order to maintain the QoS of their routes. We present a simulation model of the bandwidth reallocation scheme. Initial studies of a 29-node 45-link network model reveal that bandwidth reallocation can provide efficient bandwidth provisioning both for random traffic fluctuations, and also during failure conditions to move bandwidth rapidly from failed routes to recovery routes.

REFERENCES

[1] Å. Arvidsson, B.A. Chiera, A.E. Krzesinski and P.G. Taylor, "A Distributed Scheme for Value-Based Bandwidth Re-Configuration", submitted, 2006. http://www.cs.sun.ac.za/~aek1/COE/dowloads/four authors.pdf
[2] S. Kandula, D. Katabi, B. Davie and A. Charny, "Walking the Tightrope: Responsive yet Stable Traffic Engineering", ACM SIGCOMM Computer Communication Review, vol. 35, no. 4, pp. 253–264, 2005.
[3] V. Sharma and F. Hellstrand (Eds), "RFC 3469: Framework for Multi-Protocol Label Switching (MPLS)-based Recovery", Feb 2003.
[4] J.-P. Vasseur, M. Pickavet and P. Demeester, Network Recovery: Protection and Restoration of Optical, SONET-SDH, IP, and MPLS, Morgan Kaufmann, 2004.
[5] B.A. Chiera and P.G. Taylor, "What is a Unit of Capacity Worth?", Probability in the Engineering and Informational Sciences, vol. 16, no. 4, pp. 513–522, 2002.
[6] B.A. Chiera, A.E. Krzesinski and P.G. Taylor, "Some Properties of the Capacity Value Function", SIAM Journal on Applied Mathematics, vol. 65, no. 4, pp. 1407–1419, 2005.
[7] Y. Liu, D. Tipper and P. Siripongwutikorn, "Approximating Optimal Spare Capacity Allocation by Successive Survivable Routing", IEEE/ACM Transactions on Networking, vol. 13, no. 1, pp. 198–211, Feb 2005.
[8] R. Iraschko, M. MacGregor and W. Grover, "Optimal Capacity Placement for Path Restoration in STM or ATM Mesh-Survivable Networks", IEEE/ACM Transactions on Networking, vol. 6, no. 3, pp. 325–336, June 1998.
[9] A. Medina, A. Lakhina, I. Matta and J. Byers, "BRITE: An Approach to Universal Topology Generation", in Proc. 9th International Workshop on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS'01), Cincinnati, USA, Aug 2001, pp. 346–353.