Efficient Restoration Capacity Design in MPLS Networks

Guangzhi Li, Dongmei Wang, Jennifer Yates, Chuck Kalmanek, Robert Doverspike
AT&T Labs (Research), NJ

Extended Abstract:

With the development of Multi-Protocol Label Switching (MPLS) technology, increasing numbers of Internet Service Providers (ISPs) are evolving their networks into a common MPLS core. In MPLS core network design, rapid service restoration after a network failure is critical to meet the strict requirements of applications that are sensitive to packet delay and loss, such as voice over IP. There has been a great deal of recent work on restoration schemes in MPLS networks, including those based on routing protocol reconvergence and those based on path-based restoration. In routing protocol reconvergence schemes, new label switched paths (LSPs) are automatically established along the shortest paths selected as the routing protocol reconverges after a network failure. In path-based restoration, two LSPs are typically pre-established: a primary LSP and a restoration LSP. Traffic is switched to the restoration LSP on failure of the primary LSP. Since restoration LSPs do not consume bandwidth before their primary LSPs fail, restoration LSPs can share restoration capacity on common links as long as their primary LSPs do not fail together. Thus, optimization algorithms can maximize restoration capacity sharing to reduce bandwidth capacity requirements.

Efficient restoration capacity sharing relies strongly on effective selection of LSP routes, especially restoration LSP routes. There has been little work to date investigating efficient restoration path selection in MPLS (packet) networks. However, there has been significant work on a related problem, namely restoration path selection in reconfigurable optical (circuit) networks. In optical networks, the LSPs are bi-directional and LSP bandwidth is granular (e.g., taking on only fixed, defined SONET/SDH rates).
In contrast, in MPLS networks, the LSPs are uni-directional and can have any bandwidth. Furthermore, once the application traffic is switched to the restoration LSP, the bandwidth along the primary LSP is released automatically in packet networks, whereas in optical networks this bandwidth is either not released or requires explicit signaling to be released.

In this paper, we extend our proposed restoration path selection scheme for optical networks [1] to provide an efficient restoration LSP route selection algorithm for MPLS networks. We compare the proposed scheme’s bandwidth efficiency and restoration time with those of other well-known algorithms.

To design a restorable MPLS network, we assume that the network operator has full knowledge of the network topology and the traffic demands, including the physical fiberspan information. This information is used to design and dimension the network under particular failure scenarios. To do this, we specify the set of failure scenarios to be considered (e.g., all single router and fiberspan failures) and then enumerate each failure scenario in turn, routing the traffic in response to the failure. The capacity dimensioned for each link is the maximum required across all failure scenarios.

Routing protocols (such as OSPF and IS-IS) have been used for many years in packet networks and are thus widely deployed and understood. Routing protocol convergence methods in MPLS networks establish new routes after the routing tables are updated in response to network topology changes, including network failures and repairs. The disadvantage of using them for restoration after network failures, however, is that some implementations can take tens of seconds or more to converge. This restoration time includes the failure detection time, the routing protocol convergence time (i.e., the time to update the next-hop tables), and the time to create the new routes.
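The dimensioning procedure described above (enumerate failure scenarios, reroute, take the per-link maximum) can be illustrated for the reconvergence case with a minimal Python sketch. It is not the authors' tool: the function names `shortest_path` and `dimension`, the adjacency-dictionary topology, and the representation of a scenario as a pair (failed links, failed routers) are all assumptions made for illustration. Demands that terminate on a failed router are treated as unrestorable and skipped, which is also an assumption.

```python
import heapq
from collections import defaultdict

def shortest_path(adj, src, dst, down_links=frozenset(), down_nodes=frozenset()):
    """Dijkstra on the surviving topology; returns directed links or None."""
    dist, prev, heap = {src: 0.0}, {}, [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            # Skip failed routers and failed (undirected) links.
            if v in down_nodes or frozenset((u, v)) in down_links:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path, node = [], dst
    while node != src:
        path.append((prev[node], node))
        node = prev[node]
    return path[::-1]

def dimension(adj, demands, scenarios):
    """Per-link capacity = max directional load over all scenarios.

    adj:       {node: {neighbor: routing weight}} (hypothetical format)
    demands:   iterable of (src, dst, bandwidth), uni-directional
    scenarios: iterable of (failed_links, failed_nodes), where failed_links
               is a set of frozenset({u, v}) and failed_nodes a set of nodes
    """
    capacity = defaultdict(float)
    no_failure = (frozenset(), frozenset())
    for down_links, down_nodes in [no_failure, *scenarios]:
        load = defaultdict(float)
        for src, dst, bw in demands:
            if src in down_nodes or dst in down_nodes:
                continue  # assumption: such demands cannot be restored
            path = shortest_path(adj, src, dst, down_links, down_nodes)
            for u, v in path or []:
                load[(u, v)] += bw
        # Links have equal capacity in both directions, so keep the
        # larger directional load seen in any scenario.
        for (u, v), used in load.items():
            link = frozenset((u, v))
            capacity[link] = max(capacity[link], used)
    return dict(capacity)
```

Summing the returned values gives the total capacity for the failure set under these assumptions, which is the kind of figure the bandwidth comparison relies on.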
Networks that utilize hop-by-hop packet forwarding make routing decisions at each network node before forwarding a packet. These decisions must be consistent at each node to avoid routing loops, and thus the route selection algorithm must be identical at each network node. Route selection algorithms in routing convergence protocols are therefore standardized. In path-based restoration schemes, however, LSPs are signaled along a path selected solely by the source node of each LSP. Thus, the route selection algorithm is not standardized and, hence, is open to innovation. Disjoint shortest path selection is a commonly used algorithm: for each demand, the shortest route is selected as the service LSP (since service LSP capacity cannot be shared and the shortest path is locally optimal), and the restoration LSP is then selected as the shortest path disjoint from it.

In path-based restoration, the restoration time includes the failure detection and notification times, along with the time required to switch the traffic to the restoration LSP. Overall, this is typically faster than the routing convergence schemes, taking from less than a second to a few seconds depending on the failure detection mechanism. However, additional complexity is introduced: enhanced MPLS signaling protocols (MPLS-TE) must be deployed within networks implementing path-based restoration schemes, and additional intelligence is required in the source LSR to compute the LSP routes.

The disjoint shortest path selection algorithm is simple but inefficient, because its LSP route selection does not consider restoration bandwidth sharing. We instead propose a more efficient algorithm for restoration LSP route selection in MPLS networks. For simplicity, we discuss the proposed algorithm here in the context of a centralized server implementation, although the algorithm can also be implemented in a distributed way. We start with a set of demands, order them in some fashion, and then route each one in sequence.
For each demand d, the service LSP is always routed along the shortest route Ps. To select the restoration route Pr, we define the matrix failneed(s,kj) to be the amount of restoration capacity required on link k in direction j to restore all failed service LSPs when failure s occurs, where j=1,2 stands for the two directions, k1 and k2. The bandwidth required for service LSPs on a given uni-directional link kj is denoted as S(kj). Since MPLS links usually have the same bandwidth in both directions, if we denote total(s,k) as the total bandwidth required on link k when failure s occurs, then total(s,k) = max(failneed(s,k1)+S(k1), failneed(s,k2)+S(k2)). Similarly, we define failrelease(s,kj) and failreroute(s,kj) to be the capacity released and the capacity rerouted, respectively, on link k in direction j after service traffic reroutes upon failure s. Then, failneed(s,kj) = max(0, failreroute(s,kj) - failrelease(s,kj)). The total unused capacity (including both spare and restoration capacity) on a link in direction j is then calculated from the maximum capacity required across all failure scenarios, i.e., U(kj) = max_s (total(s,k) - S(kj)). We also define M(kj) as the maximum required capacity on uni-directional link kj upon Ps failure, which is calculated as max_s failneed(s,kj) over the set of possible failure scenarios along Ps. The routing weight of link k is denoted as w(k). We then select the restoration route as the shortest path defined by the following link weights:

\[
v(k_j) =
\begin{cases}
\dfrac{b - C(k_j)}{b}\, w(k) & \text{if } b - C(k_j) > 0 \text{ and } k \notin P_s \\[1ex]
\varepsilon & \text{if } b - C(k_j) \le 0 \text{ and } k \notin P_s \\[1ex]
\infty & \text{if } k \text{ shares at least one failure with links in } P_s
\end{cases}
\]

where C(kj) = U(kj) - M(kj) represents the existing restoration capacity on uni-directional link kj and b is the capacity requirement of demand d. ε is a very small positive value that is much less than the weight of a link.
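The bookkeeping quantities and the link weight defined above can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation: the function names and the value chosen for `EPSILON` are hypothetical, and the assignment of ε to links with enough sharable capacity and of an infinite weight to links that share a failure with Ps is an interpretation of the weight definition.

```python
import math

# Assumed value; the text only requires it to be much smaller
# than any link's routing weight.
EPSILON = 1e-6

def failneed(failreroute, failrelease):
    """Restoration capacity needed on one directed link for one failure."""
    return max(0.0, failreroute - failrelease)

def total(need_k1, s_k1, need_k2, s_k2):
    """Bandwidth required on link k for one failure, given that both
    directions are provisioned with the same capacity."""
    return max(need_k1 + s_k1, need_k2 + s_k2)

def unused_capacity(totals_over_failures, s_kj):
    """U(kj): largest total(s,k) over all failures, minus service load S(kj)."""
    return max(t - s_kj for t in totals_over_failures)

def restoration_weight(b, w_k, u_kj, m_kj, shares_failure_with_ps):
    """Link weight v(kj) used when searching for the restoration route.

    b:      bandwidth of the demand being routed
    w_k:    ordinary routing weight of link k
    u_kj:   total unused capacity U(kj)
    m_kj:   M(kj), the maximum failneed(s,kj) over failures along Ps
    """
    if shares_failure_with_ps:
        return math.inf          # would fail together with the service LSP
    c_kj = u_kj - m_kj           # capacity that can be shared by this demand
    if b - c_kj > 0:
        # Only the shortfall needs new capacity; scale the weight by it.
        return (b - c_kj) / b * w_k
    return EPSILON               # demand fits entirely in sharable capacity
```

With these weights, running an ordinary shortest-path search from the source to the destination of demand d yields a restoration route that favors links where little or no new capacity has to be added.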
The link weights are selected so as to route restoration paths along links that require the smallest increase in capacity. The idea is that if the restoration LSP for demand d is routed over uni-directional link kj and the bandwidth requirement of demand d does not exceed C(kj), then the total unused capacity required on uni-directional link kj does not increase.

To compare the bandwidth efficiency of the three restoration methods, we simulated them on a simplified US backbone MPLS network with 18 nodes, 32 OC192 links, and full-mesh uni-directional demands. The failure set included all of the fiberspans and backbone LSRs. Our results demonstrate a 6.8% bandwidth reduction using disjoint shortest path routing and a 12.6% bandwidth reduction using our proposed restoration path selection algorithm, both compared to the shortest-path based (reconvergence) schemes. These results demonstrate that path-based restoration schemes outperform the shortest-path based (reconvergence) schemes, and that our proposed route selection algorithm significantly outperforms the simpler disjoint shortest-path algorithm for path-based restoration.

[1] Guangzhi Li, Dongmei Wang, Charles Kalmanek and Robert Doverspike, "Efficient Distributed Path Selection for Shared Restoration Connections," IEEE Infocom, New York, Vol. 1, pp. 140-149, 2002.