On Shared Risk Link Group Optimization
Guangzhi Li, Dongmei Wang, Timothy Gallivan, and Robert Doverspike
AT&T Labs, New Jersey, USA, {guangzhi.li,dongmei.wang,timothy.gallivan,rdoverspike}@att.com
Abstract: This paper takes a closer look at the shared risk link group (SRLG) optimization problem and proposes an algorithm,
with correctness proofs, that reduces the number of SRLGs for different applications.
OCIS codes: (060.0060) Fiber optics and optical communications; (060.4256) Networks, network optimization
I. INTRODUCTION
Most service provider overlay networks are built on top of optical networks, and all optical networks are built over some
combination of DWDM equipment and/or fibers. If a single DWDM system or fiber fails, the set of overlay links routed over
it fails at the same time. Such a set of links is called a shared risk link group (SRLG). SRLGs are often represented by IDs
in planning systems at various network layers (e.g., a “bundle” ID). For example, in the IP layer (an overlay network) a link
between routers may belong to multiple SRLGs. Thus the SRLG information for each IP-layer link is a list of the SRLGs to
which the link belongs. An SRLG can also represent potential node outages at a given layer, such as a total or partial switch
outage, switch maintenance, or a software upgrade procedure in the IP layer. To manage the total number of SRLG IDs, the
lower-layer topology information is often consolidated. For example, for the purpose of restoration planning, a long path of
fiber cables that does not bifurcate at intermediate locations can be aggregated into a single SRLG ID (i.e., the path does
not encounter a splice location where some fibers are spliced into a different cable or end at a fiber patch panel in a central
office). Conversely, for a given SRLG ID, we can list all the IP-layer links that route over that SRLG. Each SRLG ID is
unique within a network routing domain. For diverse routing and protection purposes, IGP (interior gateway protocol) routing
protocols support SRLG information in both standardized specifications and commercial products. IETF RFC 4203 [1]
defines SRLG as a sub-TLV (type, length, value) of the link TLV. The value is an unordered list of the integer SRLG IDs to
which the link belongs.
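The sub-TLV layout described above can be sketched in a few lines of Python. This is a minimal illustration, not a protocol implementation: it assumes the RFC 4203 encoding of sub-TLV type 16 with a two-byte type, a two-byte length in octets, and one 32-bit value per SRLG ID. The function names are hypothetical.

```python
import struct

SRLG_SUBTLV_TYPE = 16  # Shared Risk Link Group sub-TLV type (RFC 4203)

def encode_srlg_subtlv(srlg_ids):
    """Pack SRLG IDs as: 2-byte type, 2-byte length in octets, 32-bit values."""
    value = b"".join(struct.pack("!I", srlg_id) for srlg_id in srlg_ids)
    return struct.pack("!HH", SRLG_SUBTLV_TYPE, len(value)) + value

def decode_srlg_subtlv(data):
    """Return the SRLG IDs as a set, since the list is unordered."""
    tlv_type, length = struct.unpack("!HH", data[:4])
    if tlv_type != SRLG_SUBTLV_TYPE or length % 4 or len(data) < 4 + length:
        raise ValueError("not a well-formed SRLG sub-TLV")
    return set(struct.unpack("!%dI" % (length // 4), data[4:4 + length]))
```

Decoding into a set reflects the fact that the order of the IDs carries no meaning.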
In the AT&T Intelligent Optical Switch (IOS) layer, the equipment node is the Ciena Core Director. The nodes are
connected by “lines” (SONET OC-48 or OC-192), and multiple lines between the same switch pair are aggregated into “links”.
The Ciena OSRP routing protocol defines a list of SRLG bundle IDs for each OSRP link [2]. From the point of view of the Core
Director and its Element Management System, these bundle IDs are simply data and constraints associated with each link, i.e.,
these systems have no actual topological graph model of the lower-layer networks. Each bundle ID represents a portion of the
underlying fiber path. IETF RFC 4203 does not specify a maximum length for the list of SRLGs per link, but some commercial
implementations impose one. For example, the Ciena OSRP routing protocol enforces a maximum number of bundle IDs per
link [2]. In reality, however, some links exceed the maximum. For example, as mentioned above, if each SRLG represents the
smallest unit of an individual fiber span (i.e., the cable between two cable splice locations; see footnote 1), the number of
SRLG IDs can easily exceed 50,000 in a large carrier. One solution is to combine multiple fiber spans (without bifurcation)
into one super fiber span to reduce the number of SRLGs. This paper presents a simple algorithm for such a process.
II. SRLG FUNCTIONS
In a network whose IGP supports SRLG information, each node has a view of the entire network, including the list of SRLG
IDs of each link. Each node, or the element management system, is then able to provide the following functions:
(1) Fast reroute: In link-based MPLS FRR (multi-protocol label switching fast reroute), each backup LSP (label switched
path) is a list of bundled links that are diversely routed from a given bundled link. In node-based FRR, a backup LSP is
also a list of bundled links, but it further depends on the next hop of the primary LSP at each node along its path (the
backup LSP skips the next node). Each node in the LSP, except the last two, is required to create an SRLG-diverse backup
LSP to its next-next-hop node. The second-to-last node is required to create an SRLG-diverse backup LSP to the last node.
During any outage, the node immediately upstream of the outage detects it and switches the LSP traffic to the
SRLG-diverse backup LSP immediately. For any single SRLG outage, such a scheme provides the fastest recovery of the
failed LSPs.
(2) Diverse routing: In some cases, a customer may want to create its own overlay network by provisioning several mutually
diverse LSPs. Since finding two SRLG-diverse paths is an NP-complete problem, the network uses either a heuristic
algorithm or integer linear programming to find the diverse paths. Either way, we do not want any two paths to share a
common SRLG.
(3) Maximal restoration capacity calculation: To enable rapid restoration, the IOS network pre-calculates the restoration
path for each service path and stores it in the source node of the service path [4].
Once an outage occurs, the source node detects (or is notified of) the outage and starts the restoration process using the
pre-calculated SRLG-diverse restoration path. To reduce the required capacity, the network is usually designed to consider only
single SRLG outages, and thus restoration paths can share restoration capacity over non-simultaneous SRLG outages. This is
called shared mesh restoration [3]. Network management needs to calculate the maximal restoration capacity reserved at each
link [3].

Footnote 1: This fiber span definition is simplified for clarity. A fiber span can encompass multiple fiber cables and may not
have splice points at its ends. Two cables travelling diversely could converge in a manhole (no splice) and travel together for
some distance before separating again. In such a case, a single fiber span would contain both cables for the distance that they
travel together. That is, fiber spans are really defined in terms of the physical proximity of multiple cables.
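All three functions rest on the same basic test: two paths are SRLG-diverse when no SRLG ID appears on both. A minimal sketch of that test follows, assuming each link's SRLG list is available as a Python set. The mapping `link_srlgs` and the function names are hypothetical, chosen only for illustration.

```python
def path_srlgs(path, link_srlgs):
    """Union of the SRLG IDs of every link on a path."""
    ids = set()
    for link in path:
        ids |= link_srlgs[link]
    return ids

def srlg_diverse(path_a, path_b, link_srlgs):
    """True if the two paths have no SRLG ID in common."""
    return path_srlgs(path_a, link_srlgs).isdisjoint(path_srlgs(path_b, link_srlgs))

# Hypothetical SRLG lists: links A-B and D-C both ride SRLG 2.
link_srlgs = {
    "A-B": {1, 2}, "B-C": {3},
    "A-D": {4}, "D-C": {2, 5},
    "A-E": {6}, "E-C": {7},
}
service = ["A-B", "B-C"]
print(srlg_diverse(service, ["A-D", "D-C"], link_srlgs))
print(srlg_diverse(service, ["A-E", "E-C"], link_srlgs))
```

Here the path through D is node-disjoint from the service path but still not SRLG-diverse, which is the situation the SRLG lists are meant to expose.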
III. SRLG OPTIMIZATION
Figure 1: Grouping spans into SRLGs
One may define any potential outage as an SRLG, such as a city, a building, a switch or cross-connect component, a conduit,
a fiber span, etc. Although there are SRLGs in upper layers, such as router common equipment outages, the most common
SRLG represents the potential outage of some sort of fiber span, or span for simplicity. As mentioned previously, in this case
we need methods to reduce the size of the SRLG set for large carriers without affecting the network restoration metrics or
network availability objectives. The diagram in Figure 1 illustrates the relationship among links of an overlay network (such
as the IP layer or IOS layer), SRLGs, and spans. Each circle represents an overlay link and contains the spans over which it
routes. By examining the areas of overlap, for the purposes of restoration calculation of this overlay network, we could group
the 15 fiber spans into 7 SRLGs. For example, the link associated with the red bubble (the top circle) routes over SRLGs 1,
2, 3, and 4, and the link associated with the blue bubble (the right circle) routes over SRLGs 3, 4, 6, and 7.
Thus, for network capacity design, we independently consider the failure of each SRLG and how rerouting is accomplished.
For example, the outage of SRLG-2 represents an outage of either fiber span 6 or fiber span 10. Next, we formally describe
how to combine spans (see footnote 2) into SRLGs for a specific network G(V,E), where V is the set of overlay network nodes
and E is the set of overlay network links (which we will refer to simply as links). Assume we know the specific fiber route of
each link. Then each link l has a list of fiber spans that it routes over, denoted F_l, and each fiber span f has a list of links
that route over it (called the dependent set of links), denoted L_f. We now take a closer look at the following cases:
(1) The dependent set of fiber span x equals the dependent set of another fiber span y, L_x = L_y: in this case, we can
combine the two fiber spans into one single SRLG, because if two upper-layer links are diverse with respect to one of the
spans, they must also be diverse with respect to the other. In fact, we can combine all fiber spans with the same
dependent set into one single SRLG. This is exactly what Figure 1 shows.
(2) The dependent set of fiber span x is a subset of the dependent set of fiber span y: if two upper-layer links do not share
span y, they cannot share span x. For all three SRLG-related functions in Section II, we can remove fiber span x and list
only span y as an SRLG.
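To make the two cases concrete, the sketch below inverts the per-link span lists F_l into the dependent sets L_f and then groups spans whose dependent sets are identical (case 1). It is an illustrative sketch only: the function names and the example data are assumptions, not taken from the paper.

```python
from collections import defaultdict

def dependent_sets(link_spans):
    """Invert F_l (link -> spans) into L_f (span -> frozenset of links)."""
    dep = defaultdict(set)
    for link, spans in link_spans.items():
        for span in spans:
            dep[span].add(link)
    return {span: frozenset(links) for span, links in dep.items()}

def group_equal_spans(dep):
    """Case (1): spans with the same dependent set fall into one group."""
    groups = defaultdict(list)
    for span, links in dep.items():
        groups[links].append(span)
    return {links: sorted(spans) for links, spans in groups.items()}

# Hypothetical routes: both links ride spans "b" and "c".
link_spans = {"l1": ["a", "b", "c"], "l2": ["b", "c", "d"]}
dep = dependent_sets(link_spans)
groups = group_equal_spans(dep)
# Case (2): the dependent set of "a" is a proper subset of that of "b".
print(groups, dep["a"] < dep["b"])
```

Using frozensets as dictionary keys lets equality of dependent sets be tested by a plain lookup instead of pairwise comparison.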
The above observations can be cast into an algorithm that quickly computes, for each network link, a reduced set of SRLGs
that is adequate to accurately perform the three functions described in Section II.
Input: network G(V,E) and fiber span route of each link
Output: SRLG info for each link
read network
find the list of fiber spans F_l for each link l
for each fiber span f, find the list of links over it, L_f, and set span[f].removed = 0
for i = 0 to n-2, where n is the number of fiber spans
    for j = i+1 to n-1
        if L_i = L_j, mark span[j].removed = 1
        else if L_i ⊂ L_j, mark span[i].removed = 1
        else if L_j ⊂ L_i, mark span[j].removed = 1
    end j
end i
for each span x, associate SRLG[x] with span x if span x is not removed
for each link l, define the reduced SRLG set S_l = ∅
    for each span x in F_l
        if span x is not removed, add SRLG[x] to S_l
    end for each span x
end for each link l

Footnote 2: We use span consolidation as one example. Other failure modes can be consolidated similarly.
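The pseudocode above translates almost line for line into Python. The following is a sketch under stated assumptions: span routes are given as a dictionary from link to span IDs, span IDs are sortable, and the ID of each surviving span doubles as its SRLG ID. The name `reduce_srlgs` is hypothetical.

```python
def reduce_srlgs(link_spans):
    """Return link -> reduced SRLG set, using the pairwise marking scheme."""
    # Build L_f, the dependent set of links of every fiber span.
    dep = {}
    for link, spans in link_spans.items():
        for span in spans:
            dep.setdefault(span, set()).add(link)

    spans = sorted(dep)
    removed = set()
    for i, x in enumerate(spans):
        for y in spans[i + 1:]:
            if dep[x] == dep[y]:
                removed.add(y)   # same dependent set: keep the earlier span
            elif dep[x] < dep[y]:
                removed.add(x)   # x is covered by y
            elif dep[y] < dep[x]:
                removed.add(y)   # y is covered by x

    # S_l: the spans of each link that survived, used as SRLG IDs.
    return {link: {s for s in link_spans[link] if s not in removed}
            for link in link_spans}

# Hypothetical routes for three links.
routes = {"l1": ["a", "b", "c"], "l2": ["b", "c", "d"], "l3": ["e"]}
print(reduce_srlgs(routes))
```

The double loop is quadratic in the number of spans. Grouping equal dependent sets first, as in case (1), would shrink the input to the subset comparison, but the sketch keeps the structure of the pseudocode.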
IV. CORRECTNESS OF THE ALGORITHM
In the SRLG optimization algorithm above, we mark the following two types of fiber spans as removed. (1) If two fiber spans
have exactly the same set of upper-layer links, we combine them and leave only one fiber span to represent them. In effect, the
algorithm partitions the set of spans into groups based on the upper-layer links they support, and each group is assigned one
SRLG ID. This idea is illustrated in Figure 1. (2) If the links supported by one group are a subset of the links supported by
another group, we can drop the first group and keep only the second. The reason is that when two upper-layer links are
diverse, they must be diverse on all SRLG groups, i.e., they do not share any common SRLG. If two upper-layer links are
diverse on the second group, they must also be diverse on the first group, because any link that routes over the first group
also routes over the second. So it is safe to drop the first group for diverse routing and fast reroute. In Figure 1, SRLG1
supports only link1, SRLG2 supports both link1 and link2, while SRLG3 supports link1, link2, and link3. In this case, we can
drop SRLG1 and SRLG2 and keep only SRLG3. Similarly, we can also drop SRLG4, SRLG5, SRLG6, and SRLG7 without
losing essential information about link diversity. Thus it is easy to verify the correctness of the algorithm for functions (1)
and (2) listed in Section II.
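The diversity argument can also be checked by brute force on a small example. The sketch below keeps one representative span per maximal dependent set and then confirms that every pair of links shares a span exactly when it shares a kept span. The layout of `figure1` is an assumption: the routes of link1 and of the blue link follow the text, while the route of the remaining link is inferred from the three-circle picture. All function names are hypothetical.

```python
from itertools import combinations

def kept_spans(link_spans):
    """One representative span per maximal dependent set."""
    dep = {}
    for link, spans in link_spans.items():
        for s in spans:
            dep.setdefault(s, set()).add(link)
    kept = []
    for s in sorted(dep):
        if any(dep[s] <= dep[k] for k in kept):
            continue  # s equals, or is covered by, a span already kept
        # s is new: evict kept spans that s strictly covers, then keep s
        kept = [k for k in kept if not dep[k] < dep[s]] + [s]
    return set(kept)

def diversity_preserved(link_spans):
    """True if each link pair shares a span exactly when it shares a kept span."""
    kept = kept_spans(link_spans)
    for a, b in combinations(link_spans, 2):
        common = set(link_spans[a]) & set(link_spans[b])
        if bool(common) != bool(common & kept):
            return False
    return True

# Assumed layout of Figure 1, with SRLG IDs standing in for span groups.
figure1 = {"link1": [1, 2, 3, 4], "link2": [2, 3, 5, 6], "link3": [3, 4, 6, 7]}
print(kept_spans(figure1), diversity_preserved(figure1))
```

In this layout only the group shared by all three links survives, which matches the observation that SRLG3 alone is kept.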
Next we look at the third function: maximal restoration capacity calculation. In a shared mesh restoration scheme [4], we
defined a matrix called failneed[f][l], where f is the SRLG index and l is the link index. Matrix failneed[f][l] maintains the
restoration capacity needed in link l if SRLG f fails. The maximal restoration capacity is defined as
$R[l] = \max_f \mathrm{failneed}[f][l]$ over all SRLGs. For any outage f and a set of paths P, we define
$P_f = \{p \in P : p \cap L_f \neq \emptyset\}$. For any path $p \in P$, we define $K_p$ as the set of links of p, $C_p$ as the
capacity of p, and $p^*$ as the pre-calculated fully diverse restoration path of p. We further define
$V_f = \{l : l \in p^*, p \in P_f\}$. Then we have
$\mathrm{failneed}[f][l] = \sum_{p \in P_f,\ l \in p^*} C_p$, i.e., failneed[f][l] is the sum of the bandwidth of all circuits
that are failed by f and whose restoration paths use link l. Assume $L_{f_1} \subseteq L_{f_2}$. Then for any $p \in P_{f_1}$,
we have $p \cap L_{f_1} \neq \emptyset$ and therefore $p \cap L_{f_2} \neq \emptyset$. So we have $P_{f_1} \subseteq P_{f_2}$.
Thus for any $l \in V_{f_1}$, we have $\mathrm{failneed}[f_1][l] \le \mathrm{failneed}[f_2][l]$. For any $l \notin V_{f_1}$,
$\mathrm{failneed}[f_1][l] = 0$. So for any link l, we have $\mathrm{failneed}[f_1][l] \le \mathrm{failneed}[f_2][l]$ when
$L_{f_1} \subseteq L_{f_2}$. According to the definition of the maximal restoration capacity, we can drop SRLG $f_1$
without impacting the maximal restoration capacity calculation for any link.
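The following sketch computes failneed and R[l] directly from the definitions and shows, on a small made-up example, that dropping an SRLG whose dependent set is contained in another leaves R[l] unchanged. The data layout (dictionaries with the keys `links`, `restoration`, and `capacity`) and the function names are assumptions made for illustration.

```python
from collections import defaultdict

def failneed(paths, srlg_links):
    """failneed[f][l]: capacity needed on link l when SRLG f fails."""
    need = {f: defaultdict(int) for f in srlg_links}
    for f, failed_links in srlg_links.items():
        for p in paths:
            if failed_links & set(p["links"]):   # p belongs to P_f
                for l in p["restoration"]:       # l lies on p*
                    need[f][l] += p["capacity"]
    return need

def max_restoration_capacity(need):
    """R[l] = max over all SRLGs f of failneed[f][l]."""
    r = defaultdict(int)
    for per_link in need.values():
        for l, c in per_link.items():
            r[l] = max(r[l], c)
    return dict(r)

# Hypothetical data: L_f1 = {A-B} is a subset of L_f2 = {A-B, C-D}.
paths = [
    {"links": ["A-B"], "restoration": ["A-E", "E-B"], "capacity": 10},
    {"links": ["C-D"], "restoration": ["C-E", "E-B", "B-D"], "capacity": 4},
]
full = {"f1": {"A-B"}, "f2": {"A-B", "C-D"}}
reduced = {"f2": {"A-B", "C-D"}}
print(max_restoration_capacity(failneed(paths, full)))
print(max_restoration_capacity(failneed(paths, reduced)))
```

Both calls report the same reserved capacity per link, because every path failed by f1 is also failed by f2.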
The optimization procedures described above may affect functions other than the three considered in Section II, such as
maximally diverse routing.
V. CASE STUDY
We have used the SRLG optimization process in many AT&T internal management tools. In this case study, we describe a
variation of the SRLG optimization process that also supports maximally diverse routing.
AT&T has a large IOS layer [4]. The Ciena Core Director defines a list of bundle IDs for each link, with a limited maximum
number of bundle IDs. This maximum is typically less than the number of fiber spans needed to describe the link's diversity. If
the number of real bundle IDs is larger than the maximum, the list of bundle IDs is truncated. In this case, which bundle IDs
should be dropped becomes a critical question, and bundle ID optimization is required. Here a bundle ID is the same as an
SRLG ID. Bundles are re-computed periodically to keep current with ongoing changes in the network link and fiber span data.
When changes are required, an attempt is made to minimize changes to existing bundle IDs.
In the IOS layer, we are required to provide both diverse routing and maximally diverse routing, as well as maximal
restoration capacity calculation. Thus we cannot completely drop SRLGs whose link groups are subsets of others. Instead, we
use a numeric score to measure the importance of each SRLG, based on the nature of the link overlaps and its total mileage.
We consider three factors for each SRLG: (1) the mileage associated with the SRLG; (2) the number of simultaneous outages
that the SRLG represents; (3) whether the links failed by the SRLG are a subset of those failed by another SRLG. When the
number of SRLGs exceeds the maximum allowed, SRLGs of lesser importance are dropped until the desired number is reached.
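The paper does not give the scoring formula, so the sketch below is only one possible way to combine the three factors: it ranks SRLGs lexicographically, first by whether they are covered by another SRLG, then by the number of links they fail, then by mileage. The ordering, the field names (`id`, `links`, `miles`), and the function name are all assumptions.

```python
def select_bundle_ids(srlgs, max_ids):
    """Keep at most max_ids SRLGs for one link, dropping the least important."""
    def importance(s):
        # Factor (3): is this SRLG's link set a proper subset of another's?
        covered = any(s["links"] < other["links"] for other in srlgs)
        # Factors (2) and (1): links failed together, then mileage.
        return (not covered, len(s["links"]), s["miles"])

    ranked = sorted(srlgs, key=importance, reverse=True)
    return [s["id"] for s in ranked[:max_ids]]

# Hypothetical SRLG list of link l1.
srlgs_of_l1 = [
    {"id": 1, "links": {"l1"}, "miles": 120},
    {"id": 2, "links": {"l1", "l2"}, "miles": 40},
    {"id": 3, "links": {"l1", "l2", "l3"}, "miles": 5},
    {"id": 4, "links": {"l1", "l4"}, "miles": 60},
]
print(select_bundle_ids(srlgs_of_l1, 2))
```

Checking for coverage only within one link's own list is enough here: any SRLG that covers an SRLG on the list must also fail this link, so it appears on the same list.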
VI. SUMMARY
In this paper, we studied the SRLG optimization problem in detail. After considering the relationships among individual
SRLGs and their importance, we proposed an algorithm to reduce the number of SRLGs. In a real network operation
environment, network management systems may choose to rank SRLGs by importance and drop the less important ones.
REFERENCES
[1] K. Kompella, ed., “OSPF extensions in support of Generalized Multi-Protocol Label Switching (GMPLS)”, RFC 4203.
[2] Byrav Ramamurthy et al., “CoreDirector CI system description”,
http://groups.geni.net/geni/wiki/Ciena%20Core%20Director%20switch%20component%20manager%20interface.
[3] Guangzhi Li et al., “Efficient distributed restoration path selection for shared mesh restoration”, IEEE/ACM ToN, 11(5), October 2003,
pages 761-771.
[4] Bruce Cortez, “The emerging intelligent optical network: now a reality”, OFC 2002, WH1.