On Shared Risk Link Group Optimization

Guangzhi Li, Dongmei Wang, Timothy Gallivan, and Robert Doverspike
AT&T Labs, New Jersey, USA, {guangzhi.li,dongmei.wang,timothy.gallivan,rdoverspike}@att.com

Abstract: This paper takes a closer look at the shared risk link group (SRLG) optimization issue and proposes an algorithm, with correctness proofs, for reducing the number of SRLGs for different applications.

OCIS codes: (060.0060) Fiber optics and optical communications; (060.4256) Networks, network optimization

I. INTRODUCTION

Most service provider overlay networks are built on top of optical networks, and all optical networks are built over some combination of DWDM equipment and/or fibers. If a single DWDM system or fiber fails, all overlay links routed over it fail at the same time. This set of links is called a shared risk link group (SRLG). SRLGs are often represented by IDs in layer planning systems at various network layers (e.g., “bundle” ID). For example, in the IP layer (overlay network) a link between routers may belong to multiple SRLGs. Thus the SRLG information for each IP-layer link describes a list of SRLGs to which the link belongs. An SRLG can also represent potential node outages at a given layer, such as a total or partial switch outage, switch maintenance, or a software upgrade procedure in the IP layer.

To manage the total number of SRLG IDs, the lower layer topology information is often consolidated. For example, for the purpose of restoration planning, a long path of fiber cables that do not bifurcate at intermediate locations can be aggregated into a single SRLG ID (i.e., the path does not encounter a splice location where some fibers are spliced into a different cable or end at a fiber patch panel in a central office). Conversely, for a given SRLG ID, we can list all the IP-layer links that route over that SRLG. Each SRLG ID is unique within a network routing domain.
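The two views described above, the SRLG IDs of each link and the links of each SRLG ID, are inverses of each other. As a minimal sketch, assuming the per-link SRLG lists are held in a Python dictionary, the second view can be derived from the first. The link names, the SRLG IDs, and the function name links_by_srlg are hypothetical and serve only as an illustration.

```python
from collections import defaultdict


def links_by_srlg(srlgs_by_link):
    """Invert a {link: [SRLG IDs]} mapping into {SRLG ID: {links}}."""
    inverse = defaultdict(set)
    for link, srlg_ids in srlgs_by_link.items():
        for srlg_id in srlg_ids:
            inverse[srlg_id].add(link)
    return dict(inverse)


# Hypothetical IP-layer links and SRLG IDs, for illustration only.
srlgs_by_link = {
    "r1-r2": [101, 102],
    "r1-r3": [101, 205],
}
inverse = links_by_srlg(srlgs_by_link)
# SRLG 101 is shared by both links, so one outage of SRLG 101 fails both.
```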
For diverse routing and protection purposes, IGP (interior gateway protocol) routing protocols, in both standardized specifications and commercial products, support SRLG information. IETF RFC 4203 [1] defines SRLG as a sub-TLV (type, length, value) of the link TLV. The value is an unordered list of the integer SRLG IDs to which the link belongs. In the AT&T Intelligent Optical Switch (IOS) layer, the equipment node is the Ciena Core Director. The nodes are connected by “lines” (a SONET OC-48 or OC-192), and multiple lines between the same switch pair are aggregated into “links”. The Ciena OSRP routing protocol defines a list of SRLG bundle IDs for each OSRP link [2]. From the point of view of the Core Director and its Element Management System, these bundle IDs are simply data and constraints associated with each link, i.e., they have no actual topological graph model for lower layer networks. Each bundle ID represents a portion of the underlying fiber path.

IETF RFC 4203 does not specify a maximum length for the list of SRLGs per link, but some commercial implementations impose one. For example, the Ciena OSRP routing protocol enforces a maximum number of bundle IDs per link [2]. In reality, however, there are links that exceed the maximum. For example, as mentioned above, if each SRLG represents the smallest unit of an individual fiber span (i.e., the cable between two cable splice locations¹), the number of SRLG IDs could easily exceed 50,000 in a large carrier. One solution is to combine multiple fiber spans (without bifurcation) into one super fiber span to reduce the number of SRLGs. This paper presents a simple algorithm for such a process.

II. SRLG FUNCTIONS

In a network with an IGP supporting SRLG information, each node has a view of the entire network, including the list of SRLG IDs on each link.
Each node or the element management system is then able to provide the following functions:

(1) Fast reroute: In link-based MPLS FRR (multi-protocol label switching fast reroute), each backup LSP (label switched path) is a list of bundled links that are diversely routed from a given bundled link. In node-based FRR, a backup LSP is also a list of bundled links, but it further depends on the next hop of the primary LSP at each node along its path (the backup LSP skips the next node). Each node on the LSP, except the last two, is required to create an SRLG-diverse backup LSP to its next-next-hop node. The second-to-last node is required to create an SRLG-diverse backup LSP to the last node. During any outage, the node immediately upstream of the outage detects it and switches the LSP traffic to the SRLG-diverse backup LSP immediately. In any single SRLG outage, such a scheme provides the fastest recovery for the failed LSPs.

(2) Diverse routing: In some cases, a customer may want to create their own overlay network by provisioning several mutually diverse LSPs. Since finding two SRLG-diverse paths is an NP-complete problem, the network will use either a heuristic algorithm or integer linear programming to find the diverse routing paths. Either way, we do not want any two paths to share a common SRLG.

(3) Maximal restoration capacity calculation: To enable rapid restoration, the IOS network pre-calculates the restoration path for each service path and stores it in the source node of the service path [4]. Once an outage occurs, the source node detects (or is notified of) the outage and starts the restoration process using the pre-calculated SRLG-diverse restoration path. To reduce the required capacity, the network is usually designed to consider only single SRLG outages, so restoration paths can share restoration capacity over non-simultaneous SRLG outages. This is called shared mesh restoration [3]. Network management needs to calculate the maximal restoration capacity reserved at each link [3].

¹ This fiber span definition is simplified for clarity. A fiber span can encompass multiple fiber cables and may not have splice points at its ends. Two cables travelling diversely could converge at a manhole (no splice) and travel together for some distance before separating again. In such a case, a single fiber span would contain both cables for the distance that they travel together. That is, fiber spans are really defined in terms of the physical proximity of multiple cables.

III. SRLG OPTIMIZATION

One may define any potential outage as one SRLG, such as a city, a building, a switch or cross-connect component, a conduit, a fiber span, etc. Although there are SRLGs in upper layers, such as router common equipment outages, the most common SRLG represents the potential outage of some sort of fiber span, or span for simplicity. As mentioned previously, in this case we need to explore methods to reduce the size of the SRLG set for large carriers without affecting the network restoration metrics or network availability objectives.

Figure 1: Grouping spans into SRLGs

The diagram in Figure 1 illustrates the relationship among links of an overlay network (such as the IP layer or IOS layer), SRLGs, and spans. Each circle represents an overlay link and contains the spans over which it routes. By examining the areas of overlap, for the purposes of restoration calculation of this overlay network, we could group the 15 fiber spans into 7 SRLGs. For example, the link associated with the red bubble (the top circle) routes over SRLGs 1, 2, 3, and 4, and the link associated with the blue bubble (the right circle) routes over SRLGs 3, 4, 6, and 7. Thus, for network capacity design, we independently consider the failure of each SRLG and how rerouting is accomplished. For example, the outage of SRLG-2 represents an outage of either fiber-span 6 or 10.
Next, we formally describe how to combine spans² into SRLGs for a specific network G(V,E), where V is the set of overlay network nodes and E is the set of overlay network links (which we will refer to simply as links). Assume we know the specific fiber routes of each link. Then each link l has a list of fiber spans that it routes over, denoted F_l, and each fiber span f has a list of links that route over it (called the dependent set of links), denoted L_f. We now take a closer look at the following cases:

(1) The dependent set of fiber span x equals the dependent set of another fiber span y, L_x = L_y: in this case, we can combine these two fiber spans into one single SRLG, because if two upper links are diverse on one of the fiber spans, they must be diverse on the other. In fact, we can combine all fiber spans with the same dependent set into one single SRLG. This is exactly what we showed in Figure 1.

(2) The dependent set of fiber span x is a subset of the dependent set of fiber span y, L_x ⊂ L_y: if two upper links do not share span y, they cannot share span x. For all three SRLG-related functions in Section II, we can remove fiber span x and list only span y as an SRLG.

The above observations can be cast into an algorithm that quickly computes, for each network link, a reduced set of SRLGs that is adequate to accurately perform the three functions described in Section II.
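As an informal illustration of the two cases, the following is a minimal Python sketch, not the authors' implementation. It assumes the fiber route of each link is given as a dictionary from link to span IDs (F_l); the function name reduce_srlgs, the variable names, and the sample data are hypothetical. Instead of a pairwise loop over all spans, the sketch first groups spans by identical dependent set with a dictionary (case 1) and then compares only the distinct dependent sets pairwise (case 2).

```python
from collections import defaultdict


def reduce_srlgs(link_spans):
    """Return {link: sorted list of kept span IDs} (the reduced set S_l).

    link_spans maps each link l to the span IDs it routes over (F_l).
    Span IDs are assumed to be hashable and sortable.
    """
    # Build the dependent set L_f of every span f.
    dependents = defaultdict(set)
    for link, spans in link_spans.items():
        for span in spans:
            dependents[span].add(link)

    # Case 1: spans with identical dependent sets collapse into one SRLG,
    # represented here by the smallest span ID in the group.
    representative = {}
    for span in sorted(dependents):
        representative.setdefault(frozenset(dependents[span]), span)

    # Case 2: drop any group whose dependent set is a proper subset
    # of another group's dependent set.
    distinct = list(representative)
    kept = {
        representative[s]
        for s in distinct
        if not any(s < t for t in distinct)
    }

    return {link: sorted(set(spans) & kept) for link, spans in link_spans.items()}


# Hypothetical example: spans 2 and 3 both carry links A and B (case 1);
# spans 1 and 4 each carry only one of those links (case 2).
example = {"A": [1, 2, 3], "B": [2, 3, 4], "C": [5]}
print(reduce_srlgs(example))  # {'A': [2], 'B': [2], 'C': [5]}
```

In this example links A and B still share SRLG 2 after the reduction, so they are still correctly reported as non-diverse, while link C remains diverse from both.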
Input: network G(V,E) and the fiber span route of each link
Output: SRLG info for each link

    read network
    find the list of fiber spans F_l for each link l
    for each fiber span f, find the list of links over it, L_f, and set its removed mark to 0
    for i = 0 to n-2, where n is the number of fiber spans
        for j = i+1 to n-1
            if L_i = L_j, mark span[j].removed = 1
            else if L_i ⊂ L_j, mark span[i].removed = 1
            else if L_j ⊂ L_i, mark span[j].removed = 1
        end j
    end i
    for each span x, associate SRLG[x] with span x if span x is not removed
    for each link l, define reduced SRLG set S_l = ∅
        for each span x in F_l
            if span x is not removed, add SRLG[x] to S_l
        end for each span x
    end for each link l

² We use span consolidation as one example. Other failure modes can be consolidated similarly.

IV. CORRECTNESS OF THE ALGORITHM

In the above SRLG optimization algorithm, we mark the following two types of fiber spans as removed. (1) If two fiber spans have exactly the same set of upper links, we combine them and leave only one fiber span to represent them. In effect, the algorithm separates the set of spans into different groups based on their supporting upper links, and each group is assigned one SRLG ID. This idea is illustrated in Figure 1. (2) If the supporting links of one group are a subset of the supporting links of another group, we can drop the first group and keep only the second group. The reason is that when two upper links are diverse, they must be diverse on all SRLG groups, i.e., they do not share any common SRLG. If two upper links are diverse on the second group, they must be diverse on the first group. So it is safe to drop the first group for diverse routing and fast reroute. In Figure 1, SRLG1 only supports link1, SRLG2 supports both link1 and link2, while SRLG3 supports link1, link2, and link3. In this case, we can drop SRLG1 and SRLG2, and only keep SRLG3.
Similarly, we can also drop SRLG4, SRLG5, SRLG6, and SRLG7 without losing essential information about link diversity. Thus it is easy to verify the correctness of the algorithm for functions 1 and 2 listed in Section II.

Next we look at the third function: maximal restoration capacity calculation. In a shared mesh restoration scheme [4], we define a matrix failneed[f][l], where f is the SRLG index and l is the link index. The entry failneed[f][l] is the restoration capacity needed on link l if SRLG f fails. The maximal restoration capacity is defined as $R[l] = \max_f \mathrm{failneed}[f][l]$, taken over all SRLGs f. For any outage f and a set of paths P, we define $P_f = \{p \in P : p \cap L_f \neq \emptyset\}$. For any path $p \in P$, we define $K_p$ as the set of links of p, $C_p$ as the capacity of p, and $p^*$ as the pre-calculated fully diverse restoration path of p. We further define $V_f = \{l : l \in p^*,\ p \in P_f\}$. Then we have $\mathrm{failneed}[f][l] = \sum_{p \in P_f,\ l \in p^*} C_p$, i.e., failneed[f][l] is the sum of the bandwidth of all circuits that are failed by f and whose restoration paths use link l.

Assume $L_{f_1} \subseteq L_{f_2}$. Then for any $p \in P_{f_1}$, $p \cap L_{f_1} \neq \emptyset$ implies $p \cap L_{f_2} \neq \emptyset$, so $P_{f_1} \subseteq P_{f_2}$. Thus for any $l \in V_{f_1}$, we have $\mathrm{failneed}[f_1][l] \le \mathrm{failneed}[f_2][l]$. For any $l \notin V_{f_1}$, $\mathrm{failneed}[f_1][l] = 0$. So for any link l, we have $\mathrm{failneed}[f_1][l] \le \mathrm{failneed}[f_2][l]$ when $L_{f_1} \subseteq L_{f_2}$. According to the definition of the maximal restoration capacity, we can drop SRLG $f_1$ without affecting the maximal restoration capacity calculation for any link.

The optimization procedures described above may have impacts outside of the three functions considered in Section II, for example on maximally diverse routing.

V. CASE STUDY

We have used the SRLG optimization process in many AT&T internal management tools. In this case study, we describe a variation of the SRLG optimization process that accommodates the maximally diverse routing scenario. AT&T has a large IOS layer [4]. The Ciena Core Director defines a list of bundle IDs for each link, with a limit on the maximal number of bundle IDs.
This maximum is typically less than the number of fiber spans needed to describe the link's diversity. If the number of real bundle IDs is larger than the maximum, the list of bundle IDs is truncated. In this case, which bundle IDs should be dropped becomes a critical question, and bundle ID optimization is required. Here a bundle ID is the same as an SRLG ID. Bundles are re-computed periodically to keep current with ongoing changes in the network link and fiber span data. When changes are required, an attempt is made to minimize changes to existing bundle IDs.

In the IOS layer, we are required to provide both diverse routing and maximally diverse routing, as well as maximal restoration capacity calculation. Thus we cannot completely drop SRLGs with subset link groups; instead we use a numeric score to measure the importance of each SRLG, based on the nature of the link overlaps and its total mileage. We consider three factors for each SRLG: (1) the mileage associated with the SRLG; (2) the number of simultaneous outages that the SRLG represents; (3) whether the links failed by the SRLG are a subset of those failed by another SRLG. When the number of SRLGs exceeds the maximum allowed, the SRLGs of least importance are dropped until the desired number is reached.

VI. SUMMARY

In this paper, we studied the SRLG optimization issue in detail. After considering the relationship and importance of individual SRLGs, we proposed an algorithm to reduce the number of SRLGs. In a real network operation environment, network management systems may choose to order SRLGs by importance and drop the least important ones.

REFERENCES

[1] K. Kompella, ed., “OSPF extensions in support of Generalized Multi-Protocol Label Switching (GMPLS)”, RFC 4203.
[2] Byrav Ramamurthy et al., “CoreDirector CI system description”, http://groups.geni.net/geni/wiki/Ciena%20Core%20Director%20switch%20component%20manager%20interface.
[3] Guangzhi Li et al., “Efficient distributed restoration path selection for shared mesh restoration”, IEEE/ACM ToN, 11(5), October 2003, pages 761-771.
[4] Bruce Cortez, “The emerging intelligent optical network: now a reality”, OFC 2002, WH1.