Increasing IP Network Survivability: An Introduction to Protection Mechanisms

20
October 22, 2000
Jonathan Sadler
Lead Engineer - ONG SE

Motivation

- There is increasing demand to carry mission-critical traffic, real-time traffic, and other high-priority traffic over the public Internet
- Any network that carries critical, high-priority traffic needs to be resilient to faults
- As network technologies continue to improve and converge, protection and restoration schemes have become available at multiple layers

Protection

- What is it?
  - Automated mechanism for recovering a traffic path
  - Invoked when the current working path fails
- Requirements
  - Fast restoration time
    - Voice / video / data can tolerate small outages (about 50 ms or less)
  - Predictable
    - Protection path is pre-determined
    - Can be dedicated (1+1) or shared (M:N)
    - Can be preemptive

Protection

- How is protection different from dynamic rerouting?
  - Dynamic rerouting develops a new path utilizing current network state information
    - Delay is incurred as state updates are flooded through the network
    - Time to re-converge on a new end-to-end path is long
    - Therefore the time until destinations become reachable again is long
  - Side effect: state information will be received by nodes that are not involved in restoration, causing unnecessary CPU usage
  - While best-effort services may tolerate this behavior, new services will not
    - VoIP
    - Virtual leased line

Protection Domains

- Method of dividing a network into separate sub-networks in which a protection mechanism will operate
- Cross-domain coordination is required

Protection Topologies

- Within a protection domain, a number of protection topologies may be used
  - Linear
  - Ring
  - Mesh
- For any topology the following terminology applies:
  - Working: The path or span being used to carry live traffic
  - Protect: The path or span that will be used to recover live traffic

Protection Topologies - Linear

- Two nodes connected to each other with two or more sets of links

[Figure: linear topology, Working and Protect links (1+1)]
[Figure: linear topology, Working and Protect links (1:n)]

Protection Topologies - Ring

- Two or more nodes connected to each other with a ring of links
- Line vs. Drop interfaces
- East vs. West interfaces

[Figure: ring of nodes with Working and Protect paths; interfaces labeled E (East), W (West), L (Line), D (Drop)]

Protection Topologies - Mesh

- Three or more nodes connected to each other
- Can be sparse or complete meshes
- Spans may be individually protected with linear protection
- Overall edge-to-edge connectivity is protected through multiple paths

[Figure: mesh with Working and Protect paths]

Protection Mechanisms

- Protection mechanisms are the algorithms which restore services carried by a specific network topology
  - Typically take advantage of topology characteristics
- Two different approaches exist
  - Link oriented: multiple links that support end-to-end connectivity can be individually switched to restore service
  - Path oriented: two paths exist which can be “globally” switched to restore service

Protection Mechanisms - Linear APS

- Two nodes connected to each other with two or more sets of transmission facilities
- Receiving node will signal source node to change from working to protect facility via out-of-band communication

[Figure: two nodes joined by Working and Protect facilities, with “Switchover” and “OK” messages]

Protection Mechanisms - BLSR

- Bi-directional = both directions are handled as one unit
- Line Switched = multiple nodes reconfigure line behavior
- Ring
- Node that determines need for change will signal out-of-band to the other node. All intermediate nodes on the protect path then “reconfigure”.
- Pros: Efficient
- Cons: Not as fast as other protection mechanisms

[Figure: ring between nodes A and Z with A-Z and Z-A Working and Protect paths, and “Switchover” and “OK” messages]

Protection Mechanisms - BLSR cont’d

- How is this efficient?
  - Each node is involved in reconfiguring when a protection switch is necessary.
  - Consequently, each node knows if the bandwidth reserved for a service is actually in use.
  - If a specific route is declared the “primary route” for the service, then the protect path will only be used when restoring a failure on the primary route.
  - As a result, it is possible to insert a second signal on the protect path.
  - When a protection switch is necessary to handle the higher-priority traffic, the “Extra Traffic” will be removed by the nodes as part of the switchover activity.

Protection Mechanisms - BLSR cont’d

- Why is more time needed for a protection switch?
  - Signaling latency
  - Traffic cross-connect activation / deactivation in intermediate nodes
    - Definitely needed when Extra Traffic is in use

Protection Mechanisms - UPSR

- Unidirectional = each traffic direction is independent
- Path Switched = not handled “node-by-node”
- Ring
- Source generates two copies of signal
- Destination evaluates both copies and chooses “best path” signal
- Pros: Low switch time
- Cons: Not efficient

[Figure: ring between nodes A and Z with A-Z and Z-A Working and Protect paths]

Protection Mechanisms - Mesh

- End-to-End Path Oriented
- Requires:
  - Topology Discovery
  - Constrained Route Selection (x2)
    - Primary route
    - Protection route
    - Resource affinity (diversity)
  - Signaling Protocol
    - Service setup
    - Protection switchover
- No standard solutions (yet)

Protection Mech. - Revertive Switching

- Once the failed path has been restored, should the traffic be moved back?
- Non-revertive Switching
  - Done when the failed path is no longer going to be used with the service (i.e. service rolls)
- Revertive Switching
  - Automatic
    - System determines primary path is acceptable
    - Wait to Restore Time
  - Manual
    - Technician determines primary path is acceptable
    - Good in cases where the fault is experienced only under load

Protection Domain Consideration

- What should be the scope of repair?
  - Global Repair
    - Traffic is restored using facilities within the global network
  - Local Repair
    - Traffic is restored using the minimum amount of facilities
    - Lacks network view, leading to potentially inefficient resource utilization

Protection Hierarchy

- Protection functionality is defined for:
  - Optical Layer
  - SONET
  - ATM / Frame Relay
  - MPLS / IP
- How should all these layers interact?
  - They shouldn’t

Two Layer Recovery Model

- Most providers are adopting a two-layer model, where:
  - Very fast bulk restoration is done as close to the transport media as possible
    - Optical Switching
    - SONET where Optical Switching is not available
  - Service-level restoration is done at the specific service layer
    - SONET: VT1.5, STS-1, STS-3c, STS-12c, STS-48c services
    - ATM / FR: Switched Data Services
    - MPLS: IP Services
- Layers in between are not used for restoration
- Service-level restoration timers are set so that transport restoration can be attempted first

Two Layer Recovery Model - Why?

- Why have two layers instead of one?
  - Optical switching allows for the greatest number of services to be restored with the least amount of overhead
  - Optical switching will find out about physical failures first
    - Loss of light
    - Optical AIS
  - Optical protection domains are typically smaller than service-level protection domains, reducing signaling time
  - Service layers understand service-specific performance requirements best, but may have a large number of services to restore

Protection in SONET/SDH

- Topologies / Mechanisms Available
  - 1+1
  - Linear APS
  - UPSR
  - BLSR
    - 2-fiber
      - Restoration channels must be reserved, reducing protected capacity
    - 4-fiber: two sets of Tx/Rx fibers for each line interface
      - Span Switch: can restore by utilizing alternate Tx/Rx fibers
      - Ring Switch: utilizes restoration channels located on a separate ring
    - Extra Traffic possible
- APS, BLSR signaling done in K1 / K2 bytes of overhead

Protection in SONET/SDH (cont’d)

- Failure Criteria
  - Loss of Signal (LOS)
  - Loss of Frame (LOF)
  - Threshold Crossing
    - Bit Error Rate (BER)
    - Coding Violations (CV)
  - Excessive SONET Pointer Justifications
  - Alarm Indication Signal (AIS)

Applying Protection to MPLS

- What does this do for me?
  - Provides fast restoration of MPLS services
  - Can be done on a service-by-service basis. For example:
    - Best effort could be biased to use Extra Traffic links
    - Bronze could be put on unprotected links, but avoid Extra Traffic
    - Silver could be protected 1:n
    - Gold could be protected 1+1

Applying Protection to MPLS - How?

- Perform constraint-based route selection for primary path
- Signal creation of working path LSP
- Perform constraint-based route selection for secondary path, adding a constraint which removes links that do not meet diversity requirements
- Signal “reservation” of protect path LSP

[Figure: Working and Protect paths]

Applying Protection to MPLS - How?
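The route-selection steps on the previous slide (primary path first, then a secondary path over links that meet the diversity requirement) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a standardized algorithm: the link table, the shared-risk labels such as "duct1", and both function names are hypothetical, and a fewest-hop breadth-first search stands in for real constraint-based route selection.

```python
from collections import deque


def shortest_path(links, src, dst, banned_links=frozenset()):
    """Return a fewest-hop path as a list of link ids, or None if unreachable.

    links maps a link id to (node_a, node_b, set_of_risk_labels).
    Links listed in banned_links are ignored.
    """
    adjacency = {}
    for link_id, (a, b, _groups) in links.items():
        if link_id in banned_links:
            continue
        adjacency.setdefault(a, []).append((b, link_id))
        adjacency.setdefault(b, []).append((a, link_id))

    previous = {src: None}  # node -> (previous node, link used to reach it)
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            break
        for neighbor, link_id in adjacency.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = (node, link_id)
                queue.append(neighbor)

    if dst not in previous:
        return None
    path = []
    node = dst
    while previous[node] is not None:
        node, link_id = previous[node]
        path.append(link_id)
    return list(reversed(path))


def select_working_and_protect(links, src, dst):
    """Pick a working path, then a protect path that shares no link or risk label."""
    working = shortest_path(links, src, dst)
    if working is None:
        return None, None
    used_groups = set()
    for link_id in working:
        used_groups |= links[link_id][2]
    # Diversity constraint: drop the working links and any link sharing a risk label.
    banned = {
        link_id
        for link_id, (_a, _b, groups) in links.items()
        if link_id in working or groups & used_groups
    }
    protect = shortest_path(links, src, dst, banned)
    return working, protect


# Hypothetical topology: A-E runs in the same duct as A-B, so it is not diverse.
links = {
    "A-B": ("A", "B", {"duct1"}),
    "B-D": ("B", "D", {"duct2"}),
    "A-C": ("A", "C", {"duct3"}),
    "C-D": ("C", "D", {"duct4"}),
    "A-E": ("A", "E", {"duct1"}),
    "E-D": ("E", "D", {"duct5"}),
}
working, protect = select_working_and_protect(links, "A", "D")
print("working:", working)
print("protect:", protect)
```

In this sketch the protect path avoids A-E even though it is a different link from A-B, because both carry the same hypothetical risk label. The signaling steps (creating the working LSP and reserving the protect LSP) are not modeled.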
- Extensions to IS-IS / OSPF
  - Utilizes the same Constraint Routing extensions as TE
  - New constraint: Shared Risk Link Group (SRLG)
    - Used for diversity determination
- Extensions to CR-LDP / RSVP-TE
  - Add Protection LSP declaration to ERO
  - Add Reverse Notification Tree & Fault Notification Messages

MPLS Protection - General Mesh Mech?

- End-to-End Path Oriented
- Requires:
  - Topology Discovery (OSPF w/ TE, IS-IS w/ TE)
  - Constrained Route Selection (x2)
    - Primary route
    - Protection route
    - Resource affinity (diversity)
  - Signaling Protocol (RSVP-TE, CR-LDP)
    - Service setup
    - Protection switchover

Benefits of a Generalized Control Plane

- Extension of MPLS to non-IP technologies allows for:
  - Rapid provisioning of lower-layer connections
    - Optical trails
    - SONET / SDH trails
  - Cut-through connections
    - Reduces traffic load on core routers
  - Extension of IP semantics (e.g. diff-serv)
    - Validates that services that paid for protection are protected

Cut-through connection (simplified example)

- Four IP routers operating over an optical network
- Initial overlay network connects routers in a hub / spoke topology
- High traffic load exists between Router A and D
- Router A realizes need for direct path (based on link load threshold crossing), and signals request for path into network
- New direct path is now used for A-D traffic

[Figure: Routers A, B, C, and D over an optical network]

Summary

- New services require mechanisms to recover working traffic as fast as possible
- Optical Layer protection tools provide restoration with the least amount of overhead
- Service Layer protection is also necessary
- MPLS-TE with extensions can provide protection support for IP networks
  - Can be extended to support any mesh network
- Use of MPLS to integrate Optical and IP control planes allows IP service semantics to control protection mechanisms used at lower layers

Sample Deployment
[Figure: sample deployment. LATA networks (LATA routers, distribution routers, SONET rings, DACS, ADMs, T1 and OC-3 w/ APS links) attach at interconnection points to a long-haul network of core routers and OXCs]

Sample Deployment - LATA

- SONET Protection in Local Loop Network
- IP Mesh Protection in Distribution Network for IP services
- SONET Protection in Distribution Network for Private Line services

[Figure: LATA detail with routers, distribution routers, LATA routers, DACS, SONET rings, T1 and OC-3 w/ APS links]

Sample Deployment - Long-Haul

- Private Line and IP services are clients of Optical Core Network
- Optical Core Network is a sparse mesh protected by MPLS mechanisms

[Figure: Long-Haul Network, a sparse mesh of OXCs]

References

- GR-253-CORE, “Synchronous Optical NETwork (SONET) Transport Systems: Common Generic Criteria,” Issue 2 rev 2 (Bellcore, January 1999)
- GR-1230-CORE, “SONET Bi-directional Line Switched Ring (BLSR) Equipment Generic Criteria,” Issue 4 (Bellcore, December 1998)
- GR-1400-CORE, “SONET Dual-Fed Unidirectional Path Switched Ring (UPSR) Equipment Generic Criteria,” Issue 2 (Bellcore, January 1999)
- draft-owens-te-network-survivability-00.txt, “Network Survivability Considerations for Traffic Engineered IP Networks” (IETF, March 2000)
- draft-ietf-mpls-recovery-frmwrk-00.txt, “Framework for MPLS-based Recovery” (IETF, September 2000)
- draft-chang-mpls-path-protection-01.txt, “A Path Protection / Restoration Mechanism for MPLS Networks” (IETF, July 2000)
- draft-chang-mpls-rsvpte-path-protection-ext-00.txt, “Extensions to RSVP-TE for MPLS Path Protection” (IETF, June 2000)
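
As a closing sketch of the two-layer recovery model described earlier: the service layer waits out a timer so that transport restoration can be attempted first, and only switches itself if the transport layer has not recovered by then. The function name, the "hold-off" terminology, and all timer values below are hypothetical and chosen only for illustration.

```python
def recover(transport_restore_ms, hold_off_ms, service_switch_ms):
    """Return (layer that restored the service, outage seen by the service in ms).

    transport_restore_ms is the time the transport layer needs to restore
    traffic, or None if the transport layer cannot restore this failure.
    """
    if transport_restore_ms is not None and transport_restore_ms <= hold_off_ms:
        # Transport restoration finished before the service-layer timer expired.
        return "transport", transport_restore_ms
    # Timer expired with no transport recovery, so the service layer switches.
    return "service", hold_off_ms + service_switch_ms


# Hypothetical values: 100 ms hold-off, 30 ms service-layer switch time.
print(recover(50, 100, 30))    # transport layer recovers first
print(recover(None, 100, 30))  # transport cannot recover; service layer acts
```

The second case shows the cost of the arrangement: when only the service layer can repair the fault, the outage includes the full wait before the service-layer switch starts.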