MPLS Traffic Engineering
NANOG18
Robert Raszuk - IOS Engineering
raszuk@cisco.com
© 1999, Cisco Systems, Inc.

Location of files
This presentation, handouts & demo are located at: ftp://ftpeng.cisco.com/rraszuk/nanog18
• RR_MPLS_TE_Nanog.pdf - this presentation
• TE_Monitor.pdf - show & debug commands
• TE_Config.pdf - full configuration syntax
• TE_SampleCfg.pdf - configuration sample
• TE_DEMO.tar - tarred TE offline demo (HTML)
• TEisistdp_1.pdf - demo’s lab topology
NANOG18 - Robert Raszuk © 2000, Cisco Systems, Inc.

Traffic Engineering: Motivations
• Reduce the overall cost of operations through more efficient use of bandwidth resources
  – by preventing a situation where some parts of a service provider network are over-utilized (congested) while other parts are under-utilized
The ultimate goal is cost saving!

Traffic Engineering: Motivations
• MPLS and Traffic Eng allows for one to spread the traffic and distribute it across the entire network infrastructure like magnetic fields between poles while also providing the redundancy required for high availability service. (Eric Dean)

Without Traffic Engineering
Cars: SFO-LAX, SAN-SMF, LAX-SFO, SMF-SAN
No Traffic Engineering: analogy to human drivers

With Traffic Engineering
Cars: SFO-LAX, SAN-SMF, LAX-SFO, SMF-SAN
Traffic Engineering: analogy to auto pilot

Routing solution to Traffic Engineering
[Figure: routers R1, R2, R3]
• Construct routes for traffic streams within a service provider in such a way as to avoid causing some parts of the provider’s network to be over-utilized while other parts remain under-utilized

The “Overlay” Solution
[Figure: physical topology (L2 and L3 devices) vs. logical topology (L3 devices only)]
• Routing at layer 2 (ATM or FR) is used for traffic engineering
• Analogy to direct highways between SFO-LAX & SAN-SMF. Nobody enters the highway in between.

Traffic engineering with overlay
[Figure: routers R1, R2, R3; PVC for R2 to R3 traffic; PVC for R1 to R3 traffic]

“Overlay” solution: drawbacks
• Extra network devices (cost)
• More complex network management (cost)
  – two-level network without integrated network management
  – additional training, technical support, field engineering
• IGP routing scalability issue for meshes
• Additional bandwidth overhead (“cell tax”)

Traffic engineering with Layer 3
[Figure: routers R1, R2, R3; path for R2 to R3 traffic; path for R1 to R3 traffic; under-utilized alternate path]
IP routing: destination-based least-cost routing

Traffic engineering with Layer 3: what is missing?
• Path computation based just on the IGP metric is not enough
• Support for “explicit” routing (aka “source routing”) is not available
• Analogy: [Figure: San Jose]

MPLS Traffic Engineering

TE - key mechanisms
• “Explicit” routing (aka “source routing”)
  – Constraint-based path selection algorithm (Example: choose a path with no congestion, avoid highways, select scenic roads, etc.)
  – Extensions to OSPF/IS-IS for flooding of resource / policy information (Live collection of traffic statistics - pilot tests in Europe)
  – MPLS as the forwarding mechanism (Auto pilot programmed in each car when entering the city)

TE - key mechanisms
• “Explicit” routing (aka “source routing”)
  – RSVP as the mechanism for establishing Label Switched Paths (LSPs)
  – use of the explicitly routed LSPs in the forwarding table

What is a “traffic trunk”?
[Figure: nodes A, B, C, D]
• Aggregation of (micro) flows that:
  – are forwarded along a common path (within a service provider)
  – often run from one POP to another POP
  – share a common QoS requirement (if L-LSPs are used)
• Essential for scalability

TE basics
• Traffic within a Service Provider is treated as a collection of “POP to POP traffic trunks” with known bandwidth and policy requirements
• TE provides traffic trunk routing that meets the goal of Traffic Engineering
  – via a combination of on-line and off-line procedures

Requirements:
• Differentiating traffic trunks:
  – large, ‘critical’ traffic trunks must be well routed in preference to other trunks
• Handling failures:
  – automated re-routing in the presence of failures
• Pre-configured paths:
  – for use in conjunction with the off-line route computation procedures
• Support of multiple Classes of Service

Requirements (cont.)
• Constraining sub-optimality:
  – should re-optimize on new/restored bandwidth
    • in a non-disruptive fashion - maintain the existing route until the new route is established, without any double counting
• Ability to “spread” a traffic trunk across multiple Label Switched Paths (LSPs)
  – could provide more efficient use of networking resources
• Ability to include / exclude certain links for certain traffic trunks

Design Constraints
• Constrained to a single routing domain
  – initially constrained to a single area
• Requires OSPF or IS-IS
• Unicast traffic
• Focus on supporting routing based on a combination of administrative + bandwidth constraints

Trunk Attributes

Trunk Attributes
• Configured at the head-end of the trunk
• Bandwidth
• Priorities
  – setup priority: priority for taking a resource
  – holding priority: priority for holding a resource

Trunk Attributes
• Ordered list of Path Options
  – possibly administratively specified paths (via an off-line central server) - {explicit list}
  – constraint-based, dynamically computed paths based on a combination of bandwidth and policies
• Re-optimization
  – each path option is enabled or not for reoptimization; the interval is given in seconds
  – max 1 week (7*24*3600), 0 disables, default 1h

Trunk Attributes
• Resource class affinity (Policy)
  – supports the ability to include/exclude certain links for certain traffic trunks based on a user-defined policy
  – Tunnel is characterized by a
    • 32-bit resource-class affinity bit string
    • 32-bit resource-class mask (0 = don’t care, 1 = care)
  – Link is characterized by a 32-bit resource-class attribute string
  – Default value of tunnel/link bits is 0
  – Default value of the tunnel mask = 0x0000FFFF

Example0: 4-bit string, default
[Figure: nodes A, B, C, D, E; all links have attribute 0000]
• Trunk A to B:
  – tunnel = 0000, t-mask = 0011
• ADEB and ADCEB are possible

Example1a: 4-bit string
[Figure: nodes A, B, C, D, E; all links have attribute 0000 except one link set to 0010]
• Setting a link bit in the lower half drives all tunnels off the link, except those specially configured
• Trunk A to B:
  – tunnel = 0000, t-mask = 0011
• Only ADCEB is possible

Example1b: 4-bit string
[Figure: same topology, one link set to 0010]
• A specific tunnel can then be configured to allow such links by clearing the bit in its affinity attribute mask
• Trunk A to B:
  – tunnel = 0000, t-mask = 0001
• Again, ADEB and ADCEB are possible

Example1c: 4-bit string
[Figure: same topology, one link set to 0010]
• A specific tunnel can be restricted to only such links by instead turning on the bit in its affinity attribute bits
• Trunk A to B:
  – tunnel = 0010, t-mask = 0011
• No path is possible

Example2a: 4-bit string
[Figure: nodes A, B, C, D, E; all links have attribute 0000 except one link set to 0100]
• Setting a link bit in the upper half has no immediate effect
• Trunk A to B:
  – tunnel = 0000, t-mask = 0011
• ADEB and ADCEB are both possible

Example2b: 4-bit string
[Figure: same topology, one link set to 0100]
• A specific tunnel can be driven off the link by setting the bit in its mask
• Trunk A to B:
  – tunnel = 0000, t-mask = 0111
• Only ADCEB is possible

Example2c: 4-bit string
[Figure: same topology, one link set to 0100]
• A specific tunnel can be restricted to only such links
• Trunk A to B:
  – tunnel = 0100, t-mask = 0111
• No path is possible
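
The affinity examples above are all consistent with one matching rule: a link is usable when its attribute string equals the tunnel's affinity bits on every position the mask cares about. A minimal Python sketch of that rule follows. The link layout (A-D, D-E, D-C, C-E, E-B, with D-E as the flagged link) is an assumption inferred from the two candidate paths, and the function names are illustrative only.

```python
def link_allowed(link_attr, affinity, mask):
    """A link passes when it matches the affinity on every bit the mask cares about."""
    return (link_attr & mask) == (affinity & mask)


def usable_paths(paths, link_attrs, affinity, mask):
    """Return the candidate paths whose links all pass the affinity check."""
    usable = []
    for path in paths:
        links = [frozenset(pair) for pair in zip(path, path[1:])]
        if all(link_allowed(link_attrs[link], affinity, mask) for link in links):
            usable.append(path)
    return usable


# The two candidate paths named in the examples.
PATHS = ["ADEB", "ADCEB"]


def topology(flagged_value):
    """Assumed layout: every link is 0000 except D-E, which carries flagged_value."""
    attrs = {frozenset(name): 0b0000 for name in ("AD", "DC", "CE", "EB")}
    attrs[frozenset("DE")] = flagged_value
    return attrs


if __name__ == "__main__":
    print(usable_paths(PATHS, topology(0b0010), 0b0000, 0b0011))  # Example1a
    print(usable_paths(PATHS, topology(0b0010), 0b0000, 0b0001))  # Example1b
    print(usable_paths(PATHS, topology(0b0010), 0b0010, 0b0011))  # Example1c
```

With these assumptions the sketch returns only ADCEB for Example1a, both paths for Example1b, and no path for Example1c, matching the slides.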

Trunk Attribute: Resource Class Affinity (Policy)
• The user defines the semantics:
  – this bit/mask says “low-delay path excluded”
• Flexible (maybe too flexible :)
  – 1c vs 2c? In 1c, the default tunnels will not be willing to flow via the special links

Link Attributes and their flooding

Link Resource Attributes
• Resource attributes are configured on every link in a network
  – bandwidth
  – link attributes
  – TE-specific link metric

Link Resource Attributes
• Resource attributes are flooded throughout the network
  – bandwidth per priority (0-7)
  – link attributes (policy)
  – TE-specific link metric
  – draft-li-mpls-igp-te-00.txt

Per-Priority Available BW
AB(i) = “available bandwidth at priority i”
Sequence of events (T=0 through T=4) for link L, BW=100, at node D:
• D advertises: AB(0)=…=AB(7)=100
• Setup of a tunnel over L at priority=3 for 30 units
• D advertises: AB(0)=AB(1)=AB(2)=100, AB(3)=AB(4)=…=AB(7)=70
• Setup of an additional tunnel over L at priority=5 for 30 units
• D advertises: AB(0)=AB(1)=AB(2)=100, AB(3)=AB(4)=70, AB(5)=AB(6)=AB(7)=40

Information Distribution
• Re-use the flooding service from the link-state IGP
  – opaque LSA for OSPF
    • draft-katz-yeung-ospf-traffic-00.txt
  – new wide TLV for IS-IS
    • draft-ietf-isis-traffic-00.txt

Information Distribution
• Periodic (timer-based)
• On significant changes of available bandwidth (threshold scheme)
• On link configuration changes
• On LSP setup failure

Periodic Timer
• Periodically, a node checks whether the current TE status is the same as the one it last broadcast.
• If different, it floods its updated TE link status

Significant Change
[Figure: utilization thresholds at 50%, 70%, 85%, 92%, 100%, with an update at each crossing]
• Each time a threshold is crossed, an update is sent
• Thresholds are denser as utilization increases
• Different thresholds for up and down (more stable)

LSP Setup Failure
• Due to the threshold scheme, it is possible that a node thinks it can signal an LSP tunnel via node Z while, in fact, Z does not have the required resources
• When Z receives the Resv message and refuses the LSP tunnel, it broadcasts an update of its status

Constraint-Based Computation

Constraint-Based Routing
• “In general, path computation for an LSP may seek to satisfy a set of requirements associated with the LSP, taking into account a set of constraints imposed by administrative policies and the prevailing state of the network -- which usually relates to topology data and resource availability. Computation of an engineered path that satisfies an arbitrary set of constraints is referred to as "constraint based routing".” (draft-li-mpls-igp-te-00.txt)

Path Computation
“On demand” by the trunk’s head-end:
  – for a new trunk
  – for an existing trunk whose (current) LSP failed
  – for an existing trunk when doing reoptimization

Path Computation
Input:
  – configured attributes of traffic trunks originated at this router
  – attributes associated with resources
    • available from IS-IS or OSPF
  – topology state information
    • available from IS-IS or OSPF
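
The resource attributes used as path computation input include the per-priority available bandwidth AB(0) to AB(7) from the “Per-Priority Available BW” slide. The bookkeeping behind those advertisements can be sketched in Python as below. This is a simplified model, not the router's implementation: it assumes setup and holding priority are equal and it does not model pre-emption of lower-priority tunnels. The class and method names are illustrative.

```python
class LinkBandwidth:
    """Tracks AB(i), the bandwidth still available at each priority 0..7."""

    def __init__(self, capacity, priorities=8):
        self.available = [capacity] * priorities

    def reserve(self, priority, bw):
        """Admit a tunnel at the given priority if AB(priority) can hold it.

        A reservation at priority p consumes bandwidth as seen from p and from
        every numerically higher (less important) priority, so AB(p)..AB(7)
        all shrink while AB(0)..AB(p-1) are unchanged.
        """
        if bw > self.available[priority]:
            return False
        for p in range(priority, len(self.available)):
            self.available[p] -= bw
        return True


if __name__ == "__main__":
    link = LinkBandwidth(100)
    link.reserve(3, 30)
    print(link.available)  # [100, 100, 100, 70, 70, 70, 70, 70]
    link.reserve(5, 30)
    print(link.available)  # [100, 100, 100, 70, 70, 40, 40, 40]
```

The two printed lists reproduce the values D advertises after each tunnel setup in the slide.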

Path Computation
• Prune links if:
  – insufficient resources (e.g., bandwidth)
  – violates policy constraints
• Compute shortest-distance path
  – TE uses its own metric
  – Tie-break: select the path with the highest minimum bandwidth so far, then the one with the smallest hop count

Path Computation
Output:
  – explicit route, expressed as a sequence of router IP addresses
    • interface addresses for numbered links
    • loopback address for unnumbered links
  – used as an input to the path setup component

Example
[Figure: topology with nodes A, B, C, D, E, G; links carry attribute strings 0100, 1000, 0000, 0000, 0000, 0010, 1000 and BW(3) values 80, 60, 50, 20, 80, 70, 50; the mapping of values to links is not recoverable from the text]
• Tunnel’s request:
  – Priority 3, BW = 30 units
  – Policy string: 0000, mask: 0011

MPLS as the forwarding mechanism

MPLS Labels
Two types of MPLS labels: prefix labels & tunnel labels
Distributed by: LDP, RSVP, MP-BGP, CR-LDP, PIM

MPLS as forwarding engine
• Traffic engineering requires explicit routing capability
• IP supports only destination-based routing
  – not adequate for traffic engineering
• MPLS provides simple and efficient support for explicit routing
  – label swapping
  – separation of routing and forwarding

LSP tunnel Setup
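
The prune-then-shortest-path procedure from the Path Computation slides can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the IOS algorithm: links are treated as bidirectional with the same available bandwidth in both directions, the tie-break is the greedy “so far” comparison inside Dijkstra, and the topology in the usage example is hypothetical (the slide's own topology cannot be recovered from the text).

```python
import heapq


def cspf(links, src, dst, bw, priority, affinity, mask):
    """links: iterable of (u, v, te_metric, attr, avail_bw_per_priority)."""
    # Step 1: prune links that lack bandwidth or violate the affinity policy.
    graph = {}
    for u, v, metric, attr, avail in links:
        if avail[priority] < bw or (attr & mask) != (affinity & mask):
            continue
        graph.setdefault(u, []).append((v, metric, avail[priority]))
        graph.setdefault(v, []).append((u, metric, avail[priority]))

    # Step 2: Dijkstra keyed on (cost, -bottleneck bandwidth, hops), so equal
    # cost paths prefer the highest minimum bandwidth, then the fewest hops.
    heap = [(0, -float("inf"), 0, src, [src])]
    done = set()
    while heap:
        cost, neg_bw, hops, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nxt, metric, avail_bw in graph.get(node, []):
            if nxt not in done:
                bottleneck = min(-neg_bw, avail_bw)
                heapq.heappush(
                    heap, (cost + metric, -bottleneck, hops + 1, nxt, path + [nxt])
                )
    return None  # no path satisfies the constraints


def flat(bw):
    """Helper for the example: same available bandwidth at every priority."""
    return [bw] * 8


# Hypothetical topology, for illustration only.
LINKS = [
    ("A", "B", 10, 0b0000, flat(20)),   # pruned: not enough bandwidth for 30
    ("A", "C", 10, 0b0000, flat(80)),
    ("C", "B", 10, 0b0000, flat(60)),
    ("A", "D", 10, 0b0000, flat(50)),
    ("D", "B", 10, 0b0000, flat(50)),
    ("A", "E", 10, 0b0010, flat(100)),  # pruned: attribute fails mask 0011
    ("E", "B", 10, 0b0010, flat(100)),
]

if __name__ == "__main__":
    print(cspf(LINKS, "A", "B", bw=30, priority=3, affinity=0b0000, mask=0b0011))
```

In this hypothetical topology A-C-B and A-D-B have equal cost, and the sketch picks A-C-B because its minimum bandwidth (60) is higher than that of A-D-B (50).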

RSVP Extensions to RFC2205 for LSP Tunnels
• downstream-on-demand label distribution
• instantiation of explicit label switched paths
• allocation of network resources (e.g., bandwidth) to explicit LSPs
• rerouting of established LSP-tunnels in a smooth fashion using the concept of make-before-break
• tracking of the actual route traversed by an LSP-tunnel
• diagnostics on LSP-tunnels
• the concept of nodal abstraction
• preemption options that are administratively controllable
draft-ietf-mpls-rsvp-lsp-tunnel-0X.txt

RSVP Extensions: new objects
• LABEL_REQUEST found in Path
• LABEL found in Resv
• EXPLICIT_ROUTE found in Path
• RECORD_ROUTE found in Path, Resv
• SESSION_ATTRIBUTE found in Path
  – flags: 0x01 Fast Reroute Capable, 0x02 Permit Merging, 0x04 May Reoptimize => SE
• New C-Types are also assigned for the SESSION, SENDER_TEMPLATE, FILTER_SPEC, FLOWSPEC objects.
• All new objects are optional with respect to RSVP (RFC2205).
• The LABEL_REQUEST and LABEL objects are mandatory with respect to the MPLS LSP signalling specification.

LSP Setup
• Initiated at the head-end of a trunk
• Uses RSVP (with extensions) to establish Label Switched Paths (LSPs) for traffic trunks

Path Setup - Example
[Figure: routers R1-R9; LSP along R1->R2->R6->R7->R4->R9 with labels 32, 49, 17, 22 and Pop]
Setup: Path (ERO = R1->R2->R6->R7->R4->R9)
Reply: Resv communicates labels and reserves bandwidth on each link

Path Setup - more details
[Figure: R1 (2)---(1) R2 (2)---(1) R3]
Path:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R1-2)
  Label_Request(IP)
  ERO (R2-1, R3-1)
  Session_Attribute (S(3), H(3), 0x04)
  Sender_Template(R1-lo0, 00)
  Sender_Tspec(2Mbps)
  Record_Route(R1-2)

Path Setup - more details
[Figure: R1 (2)---(1) R2 (2)---(1) R3]
Path State:
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R1-2)
  Label_Request(IP)
  ERO (R2-1, R3-1)
  Session_Attribute (S(3), H(3), 0x04)
  Sender_Template(R1-lo0, 00)
  Sender_Tspec(2Mbps)
  Record_Route (R1-2)

Path Setup - more details
Path:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R2-2)
  Label_Request(IP)
  ERO (R3-1)
  Session_Attribute (S(3), H(3), 0x04)
  Sender_Template(R1-lo0, 00)
  Sender_Tspec(2Mbps)
  Record_Route (R1-2, R2-2)

Path Setup - more details
Path State:
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R2-2)
  Label_Request(IP)
  ERO ()
  Session_Attribute (S(3), H(3), 0x04)
  Sender_Template(R1-lo0, 00)
  Sender_Tspec(2Mbps)
  Record_Route (R1-2, R2-2, R3-1)

Path Setup - more details
Resv:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R3-1)
  Style=SE
  FlowSpec(2Mbps)
  Sender_Template(R1-lo0, 00)
  Label=POP
  Record_Route(R3-1)

Path Setup - more details
Resv State:
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R3-1)
  Style=SE
  FlowSpec (2Mbps)
  Sender_Template(R1-lo0, 00)
  OutLabel=POP
  InLabel=5
  Record_Route(R3-1)

Path Setup - more details
Resv:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R2-1)
  Style=SE
  FlowSpec (2Mbps)
  Sender_Template(R1-lo0, 00)
  Label=5
  Record_Route(R2-1, R3-1)

Path Setup - more details
Resv State:
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R2-1)
  Style=SE
  FlowSpec (2Mbps)
  Sender_Template(R1-lo0, 00)
  Label=5
  Record_Route(R1-2, R2-1, R3-1)
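
The way the ERO shrinks and the Record_Route grows as the Path message travels from R1 to R3 can be sketched in Python. This is only a model of the object contents shown on the slides; the dictionary layout and the function names are assumptions made for the illustration, and no real RSVP encoding is involved.

```python
def originate(ero, out_intf):
    """Head-end: build the first Path message, recording its outgoing interface."""
    return {"ero": list(ero), "rro": [out_intf], "phop": out_intf}


def transit(msg, in_intf, out_intf):
    """Transit router: strip its own hop from the ERO, append itself to the RRO."""
    if not msg["ero"] or msg["ero"][0] != in_intf:
        raise ValueError("ERO does not start at this hop")
    return {
        "ero": msg["ero"][1:],
        "rro": msg["rro"] + [out_intf],
        "phop": out_intf,
    }


def tail(msg, in_intf):
    """Tail-end: consume the last ERO hop and store the resulting path state."""
    if msg["ero"] != [in_intf]:
        raise ValueError("ERO does not end at this hop")
    return {"ero": [], "rro": msg["rro"] + [in_intf], "phop": msg["phop"]}


if __name__ == "__main__":
    at_r1 = originate(["R2-1", "R3-1"], "R1-2")
    at_r2 = transit(at_r1, "R2-1", "R2-2")
    at_r3 = tail(at_r2, "R3-1")
    print(at_r2)  # ERO (R3-1), PHOP R2-2, Record_Route (R1-2, R2-2)
    print(at_r3)  # ERO (), PHOP R2-2, Record_Route (R1-2, R2-2, R3-1)
```

The values printed for the message leaving R2 and the state stored at R3 match the ERO, PHOP, and Record_Route fields on the corresponding slides.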

Trunk Admission Control
• Performed by routers along a Label Switched Path (LSP)
• Determines if resources are available
• May tear down (existing) LSPs with a lower priority
• Does the local accounting
• Triggers IGP information distribution when resource thresholds are crossed

Link Admission Control
• Already invoked by the Path message
  – if BW is available, this BW is put aside in a waiting pool (waiting for the Resv message)
  – if this process requires the pre-emption of resources, LCAC notifies RSVP of the pre-emption, and RSVP then sends PathErr and/or ResvErr for the preempted tunnel
  – if BW is not available, LCAC says “No” to RSVP and a Path error is sent. A flooding of the node’s resource info is triggered, if needed
  – draft-ietf-mpls-rsvp-lsp-tunnel-02.txt

Path Monitoring
• Use of the new Record Route Object
  – keeps track of the exact tunnel path
  – detects loops
  – copy of RRO to ERO allows for route pinning

Path Re-Optimization
• Looks for opportunities to re-optimize
  – make before break
  – no double counting of reservations
  – via RSVP “shared explicit” style!

Non-disruptive rerouting - new path setup
[Figure: routers R1-R9; current LSP with labels 32, 49, 17, 22 and Pop]
Current Path (ERO = R1->R2->R6->R7->R4->R9)
New Path (ERO = R1->R2->R3->R4->R9) - shared with Current Path
Until R9 gets the new Path message, the current Resv is refreshed

Non-disruptive rerouting - switching paths
[Figure: routers R1-R9; current LSP labels 32, 49, 17, 22, Pop; new LSP labels 38, 26, 89, Pop]
Resv: allocates labels for both paths
Reserves bandwidth once per link
PathTear can then be sent to remove the old path (and release resources)

Reroute - More Details
[Figure: R1, R2, R3 with interface numbers; the old LSP (LSP ID 00) reaches R3 on interface 1, the new LSP (LSP ID 01) reaches R3 on interface 3]
Session(R3-lo0, 0, R1-lo0)
• Old path: ERO (R2-1, R3-1), Sender_Template(R1-lo0, 00)
• New path: ERO (R2-1, …, R3-3), Sender_Template(R1-lo0, 01)
• Resource sharing

Reroute - More Details
Path:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R1-2)
  Label_Request(IP)
  ERO (R2-1, …,R3-3)
  Session_Attribute (S(3), H(3), 0x04)
  Sender_Template(R1-lo0, 01)
  Sender_Tspec(3Mbps)
  Record_Route(R1-2)

Reroute - More Details
Path State:
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R1-2)
  Label_Request(IP)
  ERO (R2-1, …,R3-3)
  Session_Attribute (S(3), H(3), 0x04)
  Sender_Template(R1-lo0, 01)
  Sender_Tspec(3Mbps)
  Record_Route (R1-2)

Reroute - More Details
Resv:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R3-3)
  Style=SE
  FlowSpec(3Mbps)
  Sender_Template(R1-lo0, 01)
  Label=POP
  Record_Route(R3-3)

Reroute - More Details
Resv:
  Common_Header
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R2-1)
  Style=SE
  FlowSpec (3Mbps)
  Sender_Template(R1-lo0, 01)
  Label=6
  Record_Route(R2-1, …, R3-3)
  Sender_Template(R1-lo0, 00)
  Label=5
  Record_Route(R2-1, R3-1)

Reroute - More Details
Resv State:
  Session(R3-lo0, 0, R1-lo0)
  PHOP(R2-1)
  Style=SE
  FlowSpec
  Sender_Template(R1-lo0, 01)
  Label=6
  Record_Route(R2-1, …, R3-3)
  Sender_Template(R1-lo0, 00)
  Label=5
  Record_Route(R2-1, R3-1)
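
The “no double counting” property of the shared-explicit (SE) style during make-before-break can be illustrated with a small accounting model. The Python sketch below assumes that, on a given link, all LSPs belonging to the same session share one reservation sized to the largest of them, while different sessions are summed. The class name, method names, and the 4 Mbps link capacity are hypothetical.

```python
from collections import defaultdict


class Link:
    """Per-link bandwidth accounting with SE-style sharing inside a session."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.sessions = defaultdict(dict)  # session -> {lsp_id: bandwidth}

    def reserved(self):
        """LSPs of one session share a reservation; sessions are summed."""
        return sum(max(lsps.values()) for lsps in self.sessions.values() if lsps)

    def admit(self, session, lsp_id, bw):
        """Admit the LSP if the link still fits after SE sharing is applied."""
        trial = dict(self.sessions.get(session, {}))
        trial[lsp_id] = bw
        others = sum(
            max(lsps.values())
            for sess, lsps in self.sessions.items()
            if sess != session and lsps
        )
        if others + max(trial.values()) > self.capacity:
            return False
        self.sessions[session][lsp_id] = bw
        return True

    def tear(self, session, lsp_id):
        """Remove one LSP of a session, as a PathTear for the old path would."""
        self.sessions[session].pop(lsp_id, None)


if __name__ == "__main__":
    link = Link(capacity=4)                # Mbps, hypothetical
    session = ("R3-lo0", 0, "R1-lo0")
    link.admit(session, 0, 2)              # old LSP, LSP ID 00, 2 Mbps
    link.admit(session, 1, 3)              # new LSP, LSP ID 01, 3 Mbps
    print(link.reserved())                 # 3, not 5
    link.tear(session, 0)
    print(link.reserved())                 # 3
```

Because both LSPs carry the same Session object and differ only in the Sender_Template LSP ID, the new 3 Mbps LSP is admitted on a 4 Mbps link that already carries the old 2 Mbps LSP.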

Fast Restoration
Handling link failures - two complementary mechanisms:
• Path protection
• Link/Node protection

Path Protection

Path Protection
• Step1: link failure detection
  – O(depends on L2/L1)
• Step2a: IGP reaction (IS-IS case)
  – Either via Step1 or via IGP hello expiration (30s by default for IS-IS)
  – 5s (default) must elapse before the generation of a new LSP
  – 5.5s (default) must elapse between a change of the LSPDB and the subsequent SPF run. The next SPF run can only occur 10s later (default)
  – Flooding time (LSPs are paced: 16ms for the first LSP, 33ms between LSPs; also depends on link speed)
  – Once the RIB is updated, this change must be incorporated into CEF. The head-end finally computes the new topology and finds out that some established LSPs are affected. It schedules a reoptimization for them

Path Protection
• Step2b: RSVP signalling
  – RSVP path states with the failed interface as outgoing interface (oif) are detected
  – check if another oif is available (if loose ERO)
  – if not, clear the path state and send a tear to the head-end
• Step2: either Step2a or Step2b alerts the head-end
• Step3: Re-optimization
  – Dijkstra computation: O(0.5)ms per node (rule of thumb)
  – RSVP signalling time to install the rerouted tunnel
• Convergence is on the order of several seconds (at least).

Path Protection: Speed it Up
• Fine-tune the IGP convergence
  – Through adequate tuning, IS-IS can be made to converge in 2-3s, thus ensuring that the convergence time bottleneck is the signalling time for the new tunnel.
• Several tunnels in parallel with load-balancing
  – if combined with the IGP convergence, the path resilience could be brought to around 2-3s
• One end-to-end tunnel in parallel but in backup mode
  – feature under development (Fast Path Protection)

Fast ReRoute (aka Link Protection): An Overview

Objective
• FRR allows for temporarily routing around a failed link or node while the head-end may reoptimize the entire LSP
  – rerouting under 50ms
  – scalable (must support lots of LSPs)

Fast reroute Overview
• Controlled by the routers at the ends of a failed link
  – link protection is configured on a per-link basis
  – Session_Attribute’s flag 0x01 allows the use of Link Protection for the signalled LSP
• Uses nested LSPs (stack of labels)
  – original LSP nested within link protection LSP

Static backup Tunnel
[Figure: routers R1-R9; backup tunnel R2->R6->R7->R4 with labels 17, 22 and Pop]
Setup: Path (R2->R6->R7->R4)
Labels established on Resv message

Routing prior to R2-R4 link failure
[Figure: routers R1-R9; LSP R1->R2->R4->R9 with labels 37, 14 and Pop]
Setup: Path (R1->R2->R4->R9)
Labels established on Resv message

Link Protection Active
[Figure: routers R1-R9]
On failure of the link from R2 -> R4, R2 simply changes the outgoing label stack from 14 to <17, 14>

Link Protection Active
Label operations: R1 Push 37; R2 Swap 37->14, Push 17; R6 Swap 17->22; R7 Pop 22; R4 Pop 14
Label stack per hop:
  R1 -> R2: 37
  R2 -> R6: 17 14
  R6 -> R7: 22 14
  R7 -> R4: 14
  R4 -> R9: None

Fast ReRoute: More details on Link Protection (FRR v1)

V1 Constraints
• We protect the facility (link), not individual LSPs
  – scalability vs granularity
• No node resilience
• Static backup tunnel
• The protected link must use the global label space
• A backup tunnel can back up at most one link, but n LSPs travelling via this link
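
The label operations on the “Link Protection Active” slide can be replayed with a small table-driven simulation. In the Python sketch below the table layout and function names are assumptions made for the illustration; the label values and router names come from the slides. The packet is followed from the point where R1 has already pushed label 37.

```python
# Per-router label tables: incoming top label -> (operations, next hop).
# An empty operation list means the top label is simply popped.
NORMAL = {
    "R2": {37: ([("swap", 14)], "R4")},   # primary LSP R1->R2->R4->R9
    "R4": {14: ([], "R9")},
    "R6": {17: ([("swap", 22)], "R7")},   # backup tunnel R2->R6->R7->R4
    "R7": {22: ([], "R4")},
}


def protect(tables, router, in_label, backup_label, backup_next_hop):
    """Return new tables where the entry also pushes the backup tunnel label."""
    new = {name: dict(entries) for name, entries in tables.items()}
    ops, _ = new[router][in_label]
    new[router][in_label] = (ops + [("push", backup_label)], backup_next_hop)
    return new


def forward(tables, router, stack):
    """Follow a packet; return [(router, label stack on arrival)], top label first."""
    trace = []
    while True:
        trace.append((router, tuple(stack)))
        if not stack:
            return trace
        ops, next_hop = tables[router][stack[0]]
        stack = list(stack[1:])            # remove the incoming top label
        for _op, label in ops:             # swap and push both add a label on top
            stack = [label] + stack
        router = next_hop


if __name__ == "__main__":
    print(forward(NORMAL, "R2", [37]))
    failed = protect(NORMAL, "R2", 37, backup_label=17, backup_next_hop="R6")
    print(forward(failed, "R2", [37]))
```

After `protect`, the packet arrives at R6 with stack 17 14, at R7 with 22 14, at R4 with 14, and at R9 unlabelled, which is the per-hop label stack listed on the slide. Because R4 sees the same label 14 it would have received directly from R2, the sketch also shows why the protected link must use the global label space.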

Terminology
[Figure: routers R1-R9]
• LSP: end-to-end tunnel onto which data normally flows (e.g. R1 to R9)
• Backup tunnel: temporary route to take in the event of a failure

Terminology
• Link Protection
  – In the event of a link failure, an LSP is rerouted to the next hop using a preconfigured backup tunnel

How to indicate that a link is protected and which tunnel is the backup?
• On R2 (for LSPs flowing from R2 to R4):
    interface pos <r2tor4>
     mpls traffic-eng backup tunnel 1000 link
• LSPs are unidirectional, so the same protection should be enabled for the opposite direction if a reverse LSP is configured.

How to set up the backup tunnel?
• Just as a normal tunnel whose head-end is R2 and tail-end is R4
  – v1 requires a manually configured ERO
    interface Tunnel1000
     ip unnumbered Loopback0
     tunnel destination R4
     tunnel mode mpls traffic-eng
     tunnel mpls traffic-eng priority 7 7
     tunnel mpls traffic-eng bandwidth 800
     tunnel mpls traffic-eng path-option 1 explicit name backuppath1

    ip explicit-path name backuppath1 enable
     next-address R6
     next-address R7
     next-address R4

Which LSPs can be rerouted on R2 in the event of an R2-R4 failure?
• The LSPs flowing through R2 that
  – have R2-R4 as outgoing interface
  – have been signalled by their respective head-ends with session attribute flag 0x01=ON (may use fast-reroute tunnels)
•   int tunnel 1    ## config on the head-end
     tunnel mpls traffic-eng fast-reroute

Global Label Allocation
[Figure: routers R1-R9; label 14 into R4, POP towards R9]
• For the blue LSP, R4 bound a global label of 14
• Any MPLS frame received by R4 with label 14 will be switched onto the link to R9 with a POP, whatever the incoming interface

How fast is fast?
• Link Failure Notification
  – Usual PoS alarm detection
  – PoS driver optimisation to interrupt the RP in < 1ms
  – Expected call to net_cstate(idb, UP/DOWN) identifying the DOWN state of the protected interface to start our protection action
• RP updates the master TFIB (replaces a swap by a swap-push)
  – < 1ms
• Master TFIB change notified to the linecards
  – < 1ms

Path state while Rerouting
[Figure: routers R1-R9 with backup tunnel; Path (…, PHOP=R2, …); Path state; PathError (Reservation in Place)]

Path & Resv Msgs [Error & Tear]
[Figure: R1, R2, R3, R4]
• When no link protection: Resv Tear Conf. Path Tear Conf. Resv Tear
• When link protection: Path Error (Resv in place); R4 waits for refresh

LSP reoptimization
• Head-end notified by PathError
  – special flag (reservation in place) indicates that the path states must not be destroyed. It is just a hint to the head-end that the path should be reoptimized
• Head-end notified by IGP

Why the PathError?
• The PathError might be faster
• In case of a multi-area IGP, the IGP will not provide the information
• In case of a very fast up-down-up, the LSP will be put on the backup tunnel and will stay there, as the IGP will not have originated a new LSP/LSA
  – a router waits a certain time before originating a new LSP/LSA after a topological change
• Reliable PathErr optimization

Resv state while Rerouting
The loss of the interface does not affect the Path and Resv states for the LSPs received on that interface that are marked fast reroutable!
[Figure: routers R1-R9; backup tunnel; Resv state; Resv]
• Resv message is unicast to the PHOP (R2)
• R2’s Path state has been informed that the Resv might arrive over a different interface than the one used by the Path message

DiffServ and LSP Reoptimization
• In order to optimize the bandwidth usage, backup tunnels might be configured with 0kbps
  – no ‘non-working’ bandwidth as in SDH!
• Although the backbone is usually thought of as being congestion-free, some local congestion might occur during rerouting
  – Use DiffServ to handle this short-term congestion
  – Use LSP reoptimization to handle the long-term congestion

Layer1/2 and Layer3
• Backup tunnel should not use
  – the protected L3 link
  – the protected L1/L2 links!!!
• Use WANDL (loaded with both L3 and L1/2 topologies) to compute the best paths for backup tunnels
  – Download this as static backup tunnels to the routers

Fast ReRoute: Node Protection

Overview
[Figure: routers R1-R9]
Backup tunnel to the next hop of the LSP’s next hop

A few more details
• Assume
  – R2 is configured with resilience for R3
  – R2 receives a Path message for a new LSP whose ERO is {R3, R4, …}, whose Session is (R9, 1, R1), whose sender is (R1, 1) and whose session attribute is (0x01 ON, 0x02 OFF)
    • 0x01: may use local fast-reroute if available
    • 0x02: merge capable

A few more details
• Then
  – R2 checks if it already has a tunnel to R4
  – If not, R2 builds a backup tunnel to R4 (currently just like in link protection - manual explicit setup).
  – R2 sends a Path onto the tunnel with Session (R9, 1, R1), Sender (R2, 1), Session Attribute (0x01 OFF, 0x02 ON) and PHOP R2

A few more details
• When R4 receives this Path message,
  – it matches the session with the LSP’s one
  – merges (and thus stops) this Path message
  – sends a Resv back to R2 (unicast) and allocates the appropriate label L

A few more details
• When R2 detects R3’s failure,
  – for the TFIB entry for the LSP, R2 replaces the existing ‘swap’ with a ‘swap to L’ and a ‘push of the backup tunnel label’
• R4’s states are refreshed by the secondary Path messages (over the backup tunnels)
• ERO of the original path is adjusted at R2
• NHOP is modified in R2 (from R3 to R4)
• PHOP is modified in R4 (from R3 to R2)

A few more details
• Resv is sent back from R4 to R2 directly
• If R3 is still active and just the R2-R3 link failed, R4 needs to ignore and drop any tear-down message that R3 sends once it stops receiving Path refreshes from R2.

How to detect R3’s failure?
• A node may fail while the link is still up
• A node’s linecard processes might survive a main process failure (freeze of the RP process)

A possible solution
[Figure: RPs and linecards (LC)]
• Keepalives between LCs
• Keepalives between an LC and its master RP

Assigning traffic to Paths (aka autoroute)

Enhancement to SPF
• During SPF, each new node found is moved from the TENTative list to the PATHS list. The first hop is now determined as follows:
  A. Check if there is any TE tunnel terminating at this node from the current router and, if so, do the metric check
  B. If there is no TE tunnel and the node is directly connected, use the first hop from the adjacency database
  C. If none of the above applies, the first hop is copied from the parent of this new node.
Enhancement to SPF - metric check
Tunnel metric:
  A. Relative +/- X
  B. Absolute Y
The default is a relative metric of 0.
Example: metric of the native IP path to the found node = 50
  1. Tunnel with relative metric of -10 => 40
  2. Tunnel with relative metric of +10 => 60
  3. Tunnel with absolute metric of 10 => 10

Enhancement to SPF - metric check
• If the metric of the found TE tunnel at this node is higher than the metric of other tunnels or of the native IGP path, this tunnel is not installed as a next-hop
• If the metric of the found TE tunnel is equal to that of other TE tunnels, the tunnel is added to the existing next-hops
• If the metric of the found TE tunnel is lower than the metric of other TE tunnels or of the native IGP path, the tunnel replaces them as the only next-hop

Other TE New Features

Auto-Bandwidth
Global command:
• Monitor marked tunnels' 5-min average counters every X seconds
  – default: X = 300 (seconds)
  – (config)# mpls traffic-eng auto-bw timers frequency <seconds>

Auto-Bandwidth
Per tunnel command:
• Every Y seconds, update the BW constraint of the tunnel with the maximum of:
  – the largest 5-min value sampled during the last Y seconds (default Y = 24 * 3600 sec = 24 h)
  – a configured maximum value
  – (config-if)# tunnel mpls traffic-eng auto-bw {frequency <seconds>} {max-bw <kbs>}
  – if the new BW is not available, the old one is maintained (the new BW is signalled via a 2nd tunnel, following the make-before-break model)

Example
[Figure not preserved in the extracted text]
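The auto-bandwidth cycle described above can be illustrated with a short Python sketch. It is not the IOS implementation: the class and method names are invented for the example, `can_signal` stands in for the make-before-break attempt, and treating the configured `max-bw` as an upper cap on the largest sample is an assumption, since the slide wording leaves the role of that value open.

```python
from collections import deque
from typing import Callable, Deque, Optional, Tuple


class AutoBandwidth:
    """Hypothetical model of one auto-bw tunnel (all rates in kbps, times in seconds)."""

    def __init__(self, current_bw: int, adjust_interval: int = 24 * 3600,
                 max_bw: Optional[int] = None) -> None:
        self.current_bw = current_bw
        self.adjust_interval = adjust_interval  # Y on the slide
        self.max_bw = max_bw
        self._samples: Deque[Tuple[int, int]] = deque()  # (timestamp, 5-min average rate)

    def record_sample(self, now: int, avg_rate: int) -> None:
        """Called every X seconds (default 300) with the tunnel's 5-min average rate."""
        self._samples.append((now, avg_rate))

    def adjust(self, now: int, can_signal: Callable[[int], bool]) -> int:
        """Called every Y seconds; returns the bandwidth in effect afterwards."""
        window_start = now - self.adjust_interval
        while self._samples and self._samples[0][0] < window_start:
            self._samples.popleft()  # drop samples older than the last Y seconds
        if not self._samples:
            return self.current_bw
        wanted = max(rate for _, rate in self._samples)
        if self.max_bw is not None:
            wanted = min(wanted, self.max_bw)  # assumption: max-bw acts as a cap
        # If the new bandwidth cannot be signalled, the old reservation is kept.
        if can_signal(wanted):
            self.current_bw = wanted
        return self.current_bw


tunnel = AutoBandwidth(current_bw=1000, adjust_interval=3600, max_bw=5000)
for timestamp, rate in [(0, 2000), (300, 7000), (600, 3000)]:
    tunnel.record_sample(timestamp, rate)
new_bw = tunnel.adjust(now=3600, can_signal=lambda bw: True)
```

Here the largest sample in the window is 7000 kbps, so under the cap assumption the tunnel is resized to 5000 kbps; if `can_signal` returned `False`, it would stay at 1000 kbps.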
Verbatim
• Applies to explicitly routed LSPs
• Disables any check against the TE/IGP database of the head end
• RSVP still checks BW (and policy, once this is carried in Path) hop by hop
• Application: manual TE through multi-area IGP
CLI: tunnel mpls traffic-eng path-option verbatim

In-Progress
• Allows a head-end to account for BW consumed by tunnels that it has just signalled and for which the IGP LSA/LSP update has not yet reflected the available bandwidth

Example
[Figure: link counters] In-Prog Bw: 55 10, Avail Bw: 100. All tunnels require 45 units of BW
• In-progress counters are reset upon new LSA/LSP reception
• The in-progress counter is decremented upon receipt of a path-error

Benefits
• Speeds up the installation of tunnels, as it avoids spending time trying solutions that do not work
• Allows for better load-balancing
  – IGP metric, then max(min(path-bw))!

Under/Overbook
• ML: Maximum link bandwidth:
  – This sub-TLV contains the maximum bandwidth that can be used on this link in this direction (from the system originating the LSP to its neighbors). This is useful for traffic engineering.
• MR: Maximum reservable link bandwidth:
  – This sub-TLV contains the maximum amount of bandwidth that can be reserved in this direction on this link. Note that for oversubscription purposes, this can be greater than the bandwidth of the link.
• UR(i): Unreserved bandwidth at priority i:
  – This sub-TLV contains the amount of bandwidth reservable in this direction on this link, at a certain priority. Note that for oversubscription purposes, this can be greater than the bandwidth of the link.

Under/Overbook
A's config:
  int s0
   bandwidth <B1>          (eg 1500 kbps)
   ip rsvp bandwidth <B2>  (eg 4000 kbps)
[Figure: router A, interface s0, physical T1 link to router B]
• ML is set to B1 (eg 1500)
• MR is set to B2 (eg 4000)
• At t=0, for all i from 0 to 7, UR(i) = MR (eg 4000)
• Router A's LCAC will not accept an LSP tunnel asking for more than ML, even if there is available bandwidth at the requested priority
• However, LCAC would allow, for example, 5 trunks each asking for 700 kbps (thus each asking for less than ML) as long as the aggregate is smaller than MR: because { 700 < ML=1500 } and { 3500 < MR=4000 }

Standby
Current solution
[Figure: A to B with Tu1: bw1, Tu2: bw2, Tu3: bw3, Tu4: bw4]
• Solution:
  – 4 tunnels from A to B:
  – Tu1's relative metric: -3
  – Tu2 and Tu3's relative metric: -2
  – Tu4's relative metric: -1

Last hop label
• IETF draft-ietf-mpls-label-encaps-07.txt
  – A value of 0 represents the "IPv4 Explicit NULL Label"
  – A value of 1 represents the "Router Alert Label"
  – A value of 2 represents the "IPv6 Explicit NULL Label"
  – A value of 3 represents the "Implicit NULL Label"
A new CLI command forces the tail end to send implicit-null (3) instead of explicit-null (0), which is the default.
# [no] mpls traffic-eng signalling advertise implicit-null [<acl>]
On receipt, the (n-1) node must map 0, 1 or 3 to the internal Implicit Null [1 only for historical reasons]

QoS and RRR

QoS and RRR
• MPLS TE can operate simultaneously (and orthogonally) with MPLS Diff-Serv
• All Precedence/DSCP packets follow the same TE tunnels
• Diff-Serv provides selective discard (via WRED) and selective scheduling (via WFQ)

QoS and RRR
• Future:
  – Scalable per-tunnel scheduling and policing
    • Guaranteed PIPE in MPLS-VPN CoS
  – per-DSCP/per-FEC traffic engineering
    • diffserv backbone capacity management
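Returning to the Under/Overbook example earlier in this section, the LCAC admission rule can be written as a short Python check. This is a sketch under stated assumptions: it uses a single bandwidth pool and ignores the eight priority levels, the function name `lcac_admit` is invented here, and allowing a reservation that exactly fills MR is an assumption the slide does not settle.

```python
def lcac_admit(request_kbps: int, reserved_kbps: int, ml_kbps: int, mr_kbps: int) -> bool:
    """Return True if a new LSP request passes the two checks from the slide."""
    # A single tunnel may never ask for more than the maximum link bandwidth (ML).
    if request_kbps > ml_kbps:
        return False
    # The aggregate of all reservations must stay within the reservable bandwidth (MR).
    return reserved_kbps + request_kbps <= mr_kbps


ML, MR = 1500, 4000  # values from the slide: bandwidth 1500, ip rsvp bandwidth 4000

reserved = 0
admitted = []
for request in [700] * 6:  # six trunks asking for 700 kbps each
    ok = lcac_admit(request, reserved, ML, MR)
    admitted.append(ok)
    if ok:
        reserved += request
```

The first five 700 kbps trunks are admitted (aggregate 3500 kbps, below MR = 4000), the sixth is refused, and a single 1600 kbps request would be refused even on an empty link because it exceeds ML = 1500.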
DiffServ and fast-reroute/TE
• To optimize bandwidth usage, backup tunnels might be configured with 0 kbps
  – no 'non-working' bandwidth as in SDH!
• Although the backbone is usually thought of as congestion-free, some local congestion might occur during rerouting
  – Use DiffServ to handle this short-term congestion
  – Use LSP reoptimization to handle the long-term congestion

RSVP LSP Signalling Protocol for Traffic Engineering

MPLS-TE Signalling Protocol
• Two proposed signalling mechanisms for MPLS traffic engineering are being considered by the IETF's MPLS working group
  – RSVP (Cisco and a number of gigabit router startups: Avici, Argon, Ironbridge, Juniper, and Torrent)
  – CR-LDP (Ericsson, Ennovate, GDC, Nortel)

Why RSVP ?
• What is needed: an IP signalling protocol!
  – ability to establish and maintain a Label Switched Path along an explicit route
  – ability to reserve resources when establishing a path
• Interdependent, not independent tasks
  – benefit from consolidation

Do I need RSVP only for TE ? NO !
Other uses of RSVP in today's networks:
• Voice over IP call setup, Video (IPTV)
• Hybrid deployments (only where needed)
• QoS DiffServ Engineering (COPS)
• Qualitative Service for DiffServ with RSVP (as opposed to the Quantitative RSVP IntServ model)

RSVP is a natural choice
• RFC2205: “provides a general facility for creating and maintaining distributed reservation state across a mesh of multicast and unicast delivery paths”
• TE: use as a general facility for creating and maintaining distributed forwarding & reservation state across a mesh of delivery paths
RSVP is a natural choice
• RFC2205: “transfers and manipulates QoS control parameters as opaque data, passing them to the appropriate traffic control module for interpretation”
• TE: transfer and manipulate explicit route and label control parameters as opaque data; pass the explicit route parameter to the appropriate routing module, and the label parameter to the MPLS module

RSVP is a natural choice
• Leverage Standardized Protocols
  – PIM for Multicast MPLS
  – BGP for MPLS VPNs
  – RSVP for MPLS Traffic Engineering
  – LDP (TDP) was designed because it was easier than fixing all IGPs (RIP, EIGRP, OSPF, ISIS)
  – fast deployments and engineering consistency
• Leverage Deployed Experience
  – RSVP deployed since 1996 (IOS 11.2)
  – ww.isi.edu/rsvp/DOCUMENTS/ietf_rsvp_qos_survey for a list of RSVP implementations

RSVP is a natural choice
• RSVP easily supports
  – Dynamic resizing of tunnels or paths through refresh messages
  – Strict as well as loose source routes
  – No double counting of bandwidth when rerouting sub-optimal routes
• Extensible via definition of new objects

RSVP/TE and Scalability
Very different from the IntServ context
• State applies to a collection of flows (i.e. a traffic trunk), rather than to a single (micro) flow
• RSVP sessions are used between routers, not hosts
• Sessions are long-lived (up to a few weeks)
• Paths are not bound by destination-based routing
• Reference: ‘Applicability Statement for Extensions to RSVP for LSP-Tunnels’ (draft-awduche-mpls-rsvp-tunnelapplicability-01.txt)
RSVP/TE and Scalability
Very different from the IntServ context
• RFC2208: “the resource requirements for running RSVP on a router increases proportionally with the number of separate sessions”
• TE: that is why using traffic trunks to aggregate flows is essential
• RFC2208: “supporting numerous small reservations on a high-bandwidth link may easily overtax the routers and is inadvisable”
• TE: not applicable in the context of TE, since traffic trunks aggregate multiple flows

TE/RSVP Scalability
• With basic RSVP (RFC2205), 10000 RRR LSP tunnels flowing through a 75x0 or 12000 is not a problem
• Already deployed on a number of Tier-1 ISP backbones
  – http://www.nanog.org/mtg-9905/hanna.html
  – Shipping with 12.0(5)S
• Refresh Aggregation work will further enhance this scalability

Conclusion
• Using RSVP as the MPLS/TE signalling protocol is the natural and consistent choice
• It is, however, only one part of a whole solution:
  – MPLS as forwarding engine
  – IGP (OSPF/ISIS) extensions
  – Constraint-Based Routing (RRR)
  – RSVP as MPLS/TE Signalling Protocol
  – Installation of Tunnels in the FIB

Summary

Traffic Eng
• Provides traffic engineering capabilities at Layer 3
  – above and beyond what is provided with ATM
• Could be used for other applications as well
• Shipping and deployed in production