Week 9 MPLS: Multiprotocol Label Switching

2. MPLS… What is it?
- MPLS (Multiprotocol Label Switching) can be applied over any layer-2 network protocol.
- MPLS is a natural evolution of the Internet: it is based on IP and on routing protocols such as BGP-4, OSPF, and IS-IS.
- MPLS typically resides in service-provider networks, not in private networks.

3. MPLS Protocol Stack
- Application
- IP or Multi-Service
- MPLS
- Layer 2 (PPP, ATM, FR, …)
- Physical

4. What is MPLS?
- MPLS provides connection-oriented switching based on a label applied at the edge of an MPLS domain.
- IP is used to signal MPLS connections.
- Major applications: network scalability, traffic engineering, VPNs.

5. MPLS - Best of Both Worlds
- Packet forwarding (IP): scalable, flexible, easily deployable, inexpensive, dynamic routing.
- Packet switching (ATM/FR): performance, connection-oriented, traffic engineering, security, QoS.
- MPLS is the hybrid: route at the edge, switch at the core.

6. Why MPLS?
- MPLS converts connectionless IP to a connection-oriented mode.
- MPLS allows IP to be switched through the Internet instead of routed.
- With MPLS, the first IP packet of a stream establishes a switched path for all subsequent packets to follow.
- The packets are then only switched at each hop, not routed.

7. Towards a Connection-Oriented IP
- MPLS is the evolution of current IP and connection-oriented protocols:
- Strength and scalability of IP routing
- PVC-like connectivity
- ATM-like QoS
- Explicit routing
- Plus: path protection and path optimization

8. MPLS Routers
- LER (Label Edge Router): the ingress LER examines inbound packets, classifies them, adds the MPLS header, and assigns the initial label; the egress LER removes the MPLS header and routes packets as pure IP.
- LSR (Label Switch Router): a transit switch that forwards packets based on MPLS labels.

9. IP and MPLS Forwarding
- In conventional routing, IP packets are classified into FECs (Forwarding Equivalence Classes) at each hop, based on the destination address: forwarding works by assigning a packet to a FEC and determining the next hop of that FEC.
- MPLS assigns packets to a FEC and a label once, possibly also based on Class of Service (CoS), and then forwards them quickly along a defined path with the same forwarding treatment.

10. Traditional IP Forwarding
[Figure: three routers in a row, each holding the same table Dest -> Out (47.1 -> 1, 47.2 -> 2, 47.3 -> 3); a packet to 47.1.1.1 is looked up against the destination table at every hop.]

11. Label Switched Path (LSP)
- An LSP is a simplex layer-2 tunnel, the equivalent of a virtual circuit.
- A 4-byte label is inserted into the IP packet at the ingress MPLS node to switch the packet along the established LSP. Such MPLS nodes are called Label Switch Routers (LSRs).
- With labels, the IP packet header is analyzed only at the ingress and egress LSRs, where labels are inserted and removed.
- Labels have only local significance and change from hop to hop across the MPLS network (like the DLCI in Frame Relay and the VPI/VCI in ATM).

12. Label Swapping
- Label push: when an IP packet enters the MPLS network, the label is inserted by the ingress LSR.
- Label pop: when the IP packet exits the MPLS network, the label is removed by the egress LSR.
- The label is swapped at each intermediate hop based on a label mapping table, called the Label Information Base (LIB).

13. MPLS Header
- Label: label value, 20 bits.
- Exp: experimental (CoS), 3 bits; the IP ToS/DSCP is mapped to Exp.
- S: bottom of stack, 1 bit.
- TTL: time to live, 8 bits. The ingress LER sets the MPLS TTL from the IP TTL; the egress LER may or may not copy the MPLS TTL back into the IP TTL.

14. MPLS Operation
- IP forwarding uses standard routing protocols; for label switching, labels are exchanged in addition.
- The ingress LSR receives IP packets, performs packet classification (into FECs), assigns a label, and forwards the labeled packet.
- Core LSRs forward packets based on the label alone (no packet classification in the core).
- The egress LSR removes the label before forwarding IP packets outside the MPLS network.
- Labeled packet format: Label | IP Hdr | Payload.

15. MPLS Packet Flow
- Step 1: the ingress LER classifies the IP packet, adds the MPLS header, and assigns a label.
- Step 2: each transit LSR forwards the labeled packet using label swapping.
- Step 3: the egress LER removes the MPLS header and performs IP processing.

16. MPLS IP Forwarding via LSP
[Figure: the LSP toward 47.1. The ingress LSR maps Dest 47.1 to out-interface 1 with out-label 0.50; the core LSR swaps in-label 0.50 to out-label 0.40; the egress LSR pops in-label 0.40 and forwards the packet to 47.1 as plain IP.]

17. MPLS Terminology
- LDP: Label Distribution Protocol
- LSP: Label Switched Path
- FEC: Forwarding Equivalence Class
- LSR: Label Switching Router
- LER: Label Edge Router (a useful term, though not in the standards)

18. Forwarding Equivalence Classes
[Figure: packets IP1 and IP2, destined for different address prefixes, carry the same labels #L1, #L2, #L3 along a common LSP between two LERs.]
- FEC = "a subset of packets that are all treated the same way by a router".
- The concept of FECs provides a great deal of flexibility and scalability.
- In conventional routing a packet is assigned to a FEC at each hop (i.e., an L3 lookup); in MPLS it is done only once, at the network ingress.

19. Label Switched Path (Vanilla)
[Figure: labels such as #216, #14, #311, #99, #963, #612, #5, #462 on the links of a tree converging on a single destination.]
- A vanilla LSP is actually part of a tree from every source to that destination (unidirectional).
- Vanilla LDP builds that tree using the existing IP forwarding tables to route the control messages.
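The push/swap/pop forwarding described on slides 12-16 can be sketched as a chain of table lookups. The label values 0.50 and 0.40 mirror the slides' example; everything else (function names, table shapes) is illustrative, not a real implementation.

```python
# Minimal sketch of LIB-based label forwarding along one LSP.

def make_lsr(lib):
    """An LSR reduced to its Label Information Base: in-label -> (out-label, out-intf)."""
    def forward(label):
        return lib[label]           # swap: look up in-label, emit out-label
    return forward

# Ingress LER: classify once into a FEC, push the initial label.
fec_table = {"47.1": ("0.50", 1)}   # dest prefix -> (label, out-intf)

def ingress(dest_prefix):
    return fec_table[dest_prefix]

core = make_lsr({"0.50": ("0.40", 1)})   # core LSR: swap 0.50 -> 0.40
egress_lib = {"0.40": (None, 1)}         # egress: pop (no outgoing label), then plain IP

label, _ = ingress("47.1")               # push at ingress
label, _ = core(label)                   # swap in the core
out_label, _ = egress_lib[label]         # pop at egress
print(label, out_label)                  # 0.40 None
```

Note that only the ingress ever looks at the IP destination; the core and egress act purely on labels, which is the point of slides 14-16.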
20. MPLS Built on Standard IP
[Figure: the same three-router topology; each router holds a destination-based forwarding table (47.1 -> 1, 47.2 -> 2, 47.3 -> 3).]
- Destination-based forwarding tables are built by OSPF, IS-IS, RIP, etc.

21. IP Forwarding Used by Hop-by-Hop Control
[Figure: as on slide 10, the packet to 47.1.1.1 is forwarded hop by hop against the destination tables; the same tables route the MPLS control messages.]

22. MPLS Label Distribution
[Figure: a label request for FEC 47.1 ("Request: 47.1") travels downstream along the IGP route; the downstream LSR answers with a label mapping ("Mapping: 0.40"), which the upstream LSR installs as its out-label.]

23. Label Switched Path (LSP)
[Figure: the completed LSP. Ingress: Dest 47.1 -> out-intf 1, out-label 0.50; core: in-label 0.50 -> out-intf 1, out-label 0.40; egress: in-label 0.40 -> out-intf 1. The packet to 47.1.1.1 now follows the labels.]

24. Label Distribution Protocols
- The network automatically builds its routing tables via IGP protocols such as OSPF and IS-IS.
- The Label Distribution Protocol (LDP) uses the established routing topology to route the LSP request path between adjacent LSRs.
- LDP is a set of procedures by which one LSR informs another of the label-to-FEC bindings it has made.
- LDP also encompasses any negotiations in which two label-distribution peers (the ingress and egress LSRs) need to engage in order to offer CoS to a particular IP stream.

25. Hop-by-Hop Routed LSP
[Figure: LSRs A-E with numbered interfaces 0-2; destination prefix 192.6/16 behind E.]

Node | Incoming label | Outgoing label | Next hop | Outgoing interface
A    | 100            | ?              | B        | 1
B    | 6              | ?              | E        | 1
C    | 17             | ?              | D        | 2
D    | 5              | ?              | E        | 0
E    | 6              | ?              | E        | 0

26. Example (cont'd)
After label distribution, the outgoing labels are filled in:

Node | Incoming label | Outgoing label | Next hop | Outgoing interface
A    | 100            | 6              | B        | 1
B    | 6              | 6              | E        | 1
C    | 17             | 5              | D        | 2
D    | 5              | 6              | E        | 0
E    | 6              | ?              | E        | 0

27. Explicitly Routed LSP (ER-LSP)
[Figure: Route = {A, B, C}, with label #14 toward A, #972 toward B, and #216/#462 on the alternative links.]
- An ER-LSP follows the route that the source chooses. In other words, the control message that establishes the LSP (the label request) is source-routed.
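The downstream-on-demand distribution on slides 22-26 (request travels toward the egress, mappings return upstream) can be sketched as a single walk over the IGP path. The label allocator and node names are made up for illustration; real LDP is of course message-based between peers.

```python
# Sketch: build per-LSR label bindings for one FEC by walking the IGP path
# from the egress back to the ingress, advertising each hop's in-label upstream.

def distribute_labels(path, fec):
    """path: list of LSR names, ingress first, egress last.
    Returns {lsr: {"fec", "in", "out"}} for the new LSP."""
    lib = {}
    next_free = iter(range(100, 200))    # hypothetical per-network label pool
    downstream_label = None              # egress has no out-label (it pops)
    for lsr in reversed(path):
        # The ingress only pushes, so it needs no incoming label.
        in_label = next(next_free) if lsr != path[0] else None
        lib[lsr] = {"fec": fec, "in": in_label, "out": downstream_label}
        downstream_label = in_label      # advertised to the upstream peer
    return lib

lib = distribute_labels(["A", "B", "E"], "47.1")
# A pushes B's label; B swaps to E's label; E pops (out-label None).
print(lib["A"]["out"], lib["B"]["in"], lib["B"]["out"], lib["E"]["out"])
```

This mirrors the slide-26 table: each row's outgoing label is exactly the incoming label of the next hop.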
28. Explicitly Routed LSP (ER-LSP)
[Figure: as on slide 23, but the ingress table now also holds an explicit entry Dest 47.1.1 -> out-intf 2, out-label 1.33, steering that traffic onto a different path than the hop-by-hop route for 47.1.]

29. ER-LSP - Advantages
- The operator gains routing flexibility (policy-based, QoS-based).
- Routes other than the shortest path can be used.
- Routes can be computed based on constraints, in exactly the same manner as ATM, using a distributed topology database (traffic engineering).

30. Routing Scalability via MPLS
[Figure: routing domains A, B, and C, with border routers T, V, W, X, Y, Z; an LSP crosses domain B between T and W.]

31. Routing Domain A: Routes to W

Node | Incoming label | Outgoing label | Next hop
T    | N/A            | 10             | X
X    | 10             | 12             | Y
Y    | 12             | 17             | W
W    | 17             | N/A            | W

32. Example (cont'd) - Label Stacking
[Figure: the transit domain pushes an outer label (10, then 12, then 17 along T-X-Y-W) on top of the inner label 2, which is carried unchanged from T to W; the stack lets domain B switch the traffic without knowing domain A's routes.]

33. Routing Transients
Routing transients happen due to:
- failure detection (on the order of milliseconds),
- LSP (link-state packet) dissemination (on the order of propagation delays),
- SPF tree calculation (on the order of several hundred milliseconds).

34. MPLS Example
[Figure: LSRs 1-9. Setup: PATH (ERO = LSR1, LSR2, LSR4, LSR9); the labels (37 at LSR1, 14 at LSR2, pop at LSR4) are established on the RESV message.]

35. Fast Reroute - Protection Path
[Figure: a protection LSP around the LSR2-LSR4 link. Setup: PATH (LSR2, LSR6, LSR7, LSR4); the labels (17 at LSR2, 22 at LSR6, pop at LSR7) are established on the RESV message.]

36. Fast Reroute Example
- LSR1 pushes 37; LSR2 swaps 37 -> 14 and then pushes the protection label 17; LSR6 swaps 17 -> 22; LSR7 pops 22; LSR4 pops 14.

Label stack at: | LSR1 | LSR2   | LSR6   | LSR7 | LSR4
top            | 37   | 17     | 22     | 14   | (none)
bottom         |      | 14     | 14     |      |

37. MPLS Protection
- Fast reroute may result in suboptimal forwarding, but the service interruption is negligible.
- A single protection LSP can be used to fast-reroute not just one but multiple LSPs.
- Protection on a per-LSP basis (end-to-end), rather than per-link, is also possible:
  - better forwarding properties in case of failures,
  - but a single protection LSP may not protect as many LSPs,
  - handles both node and link failures,
  - detection time may be larger,
  - and it requires the computation of link- and node-disjoint paths.

38. Label Encapsulation
- MPLS encapsulation is specified over various media types. The top label may reuse an existing field (the VPI/VCI in ATM, the DLCI in Frame Relay), while over Ethernet and PPP, and for any lower labels in a stack, a new "shim" label format is used, placed between the layer-2 header and the IP packet: L2 | shim label(s) | IP | payload.

39. Traffic Engineering - Objectives
- Performance optimization of operational networks: reduce congestion hot spots, improve resource utilization.
- Why is current IP routing not sufficient from a TE perspective? The fish problem; destination-based forwarding; local optimization.

40. IP Routing and "the Fish"
[Figure: the classic fish topology R1-R8, with a short upper path and a longer lower path between R2 and R4.]
- IP (mostly) uses destination-based least-cost routing.
- Flows from R8 and R1 merge at R2 and become indistinguishable.
- From R2, traffic to R3, R4, and R5 all uses the upper route; the alternate path is under-utilized.

41. Deficiencies in IP Routing
- Chronic local congestion.
- Load balancing across long-haul links.
- Link sizes: it is difficult to get IP to make good use of unequal-size links without overloading the lower-speed link.

42. Peer Model
- OSPF routing plus link weights; the key technique is weight setting.
- The network operates as it does today.
- Much more scalable than the overlay model.

43. Load Balancing
- Making good use of expensive links simply by adjusting IGP metrics can be a frustrating exercise!
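The 4-byte shim entry from slides 13 and 38 packs Label (20 bits), Exp (3), S (1), and TTL (8) into one 32-bit word. A quick encode/decode sketch using that standard layout:

```python
import struct

def encode_shim(label, exp, s, ttl):
    """Pack one MPLS shim entry: Label(20) | Exp(3) | S(1) | TTL(8)."""
    word = (label << 12) | (exp << 9) | (s << 8) | ttl
    return struct.pack("!I", word)        # network byte order, 4 bytes

def decode_shim(data):
    (word,) = struct.unpack("!I", data)
    return {"label": word >> 12,
            "exp": (word >> 9) & 0x7,
            "s": (word >> 8) & 0x1,
            "ttl": word & 0xFF}

# Label 37 from the fast-reroute example, bottom-of-stack, TTL copied from IP.
shim = encode_shim(label=37, exp=0, s=1, ttl=64)
print(decode_shim(shim))    # {'label': 37, 'exp': 0, 's': 1, 'ttl': 64}
```

A label stack is simply a sequence of these 4-byte entries, with S set to 1 only on the last (bottom) entry.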
44. Overlay Motivation
- A separate layer-2 network (Frame Relay or ATM): "The use of the explicit Layer 2 transit layer gives us very exacting control of how traffic uses the available bandwidth, in ways not currently possible by tinkering with Layer 3-only metrics."

45. The Overlay Solution
[Figure: physical view - L3 routers attached to a layer-2 (for example ATM) network that manages the bandwidth; logical view - layer 3 sees a complete mesh.]

46. Overlay Drawbacks
- Extra network devices (cost).
- More complex network management: a two-level network without integrated network management; additional training, technical support, and field engineering.
- IGP routing doesn't scale for meshes: the number of LSPs generated for a failed router is O(n^3), where n is the number of routers.

47. Overlay Drawbacks (cont'd)
- Every router is permanently connected to every other router (full mesh); PVCs are provisioned with given bandwidths; delays are short.
- Problem: scalability. For N routers, N x (N-1)/2 ATM VCs are needed.
- Also, the IP link-state routing protocol (e.g., OSPF) has to handle a huge number of links, and link-state advertisements are flooded on every link.
- Worse: when an ATM link fails, all VCs using that link fail, and many IP routers have to update their routing tables at the same time. The amount of routing information can be as much as O(N^4).
- In practice, this solution does not scale beyond about 100 routers (roughly 5000 PVCs).

48. Traffic Engineering & MPLS
- Router + ATM switch = MPLS router: MPLS fuses layer 2 and layer 3.
- The layer-2 capabilities of MPLS can be exploited for IP traffic engineering.
- A single-box, single-network solution.

49. An LSP Tunnel
[Figure: the fish topology R1-R8 again.]
- Labels, like VCIs, can be used to establish virtual circuits.
- Normal route: R1 -> R2 -> R3 -> R4 -> R5. Tunnel: R1 -> R2 -> R6 -> R7 -> R4.

50. Comprehensive Traffic Engineering
- Network design: engineer the topology to fit the traffic.
- Traffic engineering: engineer the traffic to fit the topology.
- Given a fixed topology and a traffic matrix, what set of explicit routes offers the best overall network performance?

51. Constraint-Based Routing
Two basic elements:
- Route optimization: select routes for traffic demands subject to a given set of constraints.
- Route placement: implement the selected routes in the network so that the traffic flows follow them.
Mathematical formulation - assumptions:
- The network is represented as a directed graph G(V, E); links and capacities are directional.
- The average traffic demand is known; the demand between two edge nodes is directional.
Objectives: all traffic demands are fulfilled; minimize the maximum link utilization.

52. Off-line Formulation - Notations
- G = (V, E); c_ij is the capacity of link (i, j), for all (i, j) in E.
- K: the set of traffic demands between pairs of edge nodes.
- (d_k, s_k, t_k): (bandwidth demand, source node, destination node), for all k in K.
- X_ij^k: the percentage of demand k's bandwidth satisfied by link (i, j).
- alpha: the maximum link utilization over all links.

53. Off-line Formulation
[Equation-only slide; the formulation did not survive extraction.]

54. On-line TE
- Shortest path (SP): the metric of link (i, j) is inversely proportional to its bandwidth; minimizes the total resource consumption per route.
- Minimum hop (MH): the link metric is uniformly 1 per hop; still runs a shortest-path algorithm.

55. On-line TE (cont'd)
- Shortest-widest path (SWP): the link metric is the bandwidth itself; always selects the path with the largest bottleneck bandwidth, and when multiple such paths are available, chooses the one with minimum hops or shortest distance.
- Hybrid algorithm: motivated by the shortcomings of the above; uses an appropriate weight assignment based on link utilization (instead of link residual bandwidth), with metric = path cost + link utilization.
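The shortest-widest path selection just described (maximize the bottleneck bandwidth, break ties by hop count) can be sketched with a priority queue. The graph format `{u: {v: bandwidth}}` and the node names are illustrative.

```python
import heapq

def shortest_widest_path(graph, src, dst):
    """Return (bottleneck_bw, path) with maximum bottleneck; fewest hops on ties."""
    # Heap ordered by (-width, hops): widest first, then shortest.
    best = {}                                   # node -> best (width, -hops) settled
    heap = [(-float("inf"), 0, src, [src])]
    while heap:
        neg_width, hops, node, path = heapq.heappop(heap)
        width = -neg_width
        if node == dst:
            return width, path                  # first pop of dst is optimal
        if node in best and best[node] >= (width, -hops):
            continue                            # dominated candidate, skip
        best[node] = (width, -hops)
        for nbr, bw in graph.get(node, {}).items():
            if nbr not in path:                 # avoid loops
                heapq.heappush(heap, (-min(width, bw), hops + 1, nbr, path + [nbr]))
    return None

# A-B-D has bottleneck 10; A-C-D only 5, so the wider path wins.
g = {"A": {"B": 10, "C": 5}, "B": {"D": 10}, "C": {"D": 20}, "D": {}}
print(shortest_widest_path(g, "A", "D"))        # (10, ['A', 'B', 'D'])
```

Swapping the key to plain hop count gives the MH variant from slide 54; using path cost + link utilization gives the hybrid metric of slide 55.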
56. Mathematical Formulation - Notations
- f_ij: current load (used capacity) of link (i, j); initial value 0.
- c_ij: total capacity of link (i, j).
- alpha: current maximum link utilization; initial value 0.
- alpha_ij: cost metric of link (i, j).

57. Multi-path Algorithms
[Figure: incoming traffic enters a traffic splitter, which distributes the outgoing traffic over links L1 and L2.]

58. Traffic Splitting
Basic requirements:
- Traffic splitting sits in the packet-forwarding path and is executed for every packet; to reduce implementation complexity, the system should preferably keep little or no state.
- Traffic-splitting schemes must produce a stable traffic distribution across the multiple outgoing links, with minimum fluctuation.
- Traffic-splitting algorithms must maintain per-flow packet ordering.

59. Hashing
- Direct hashing of the destination address: H(.) = DestIP mod N, where N is the number of outgoing links.
- Hashing using XOR folding of the source and destination addresses: H(.) = (S1 xor S2 xor S3 xor S4 xor D1 xor D2 xor D3 xor D4) mod N, where Si and Di are the i-th octets of the source and destination addresses.
- CRC16 (16-bit cyclic redundancy checksum): H(.) = CRC16(5-tuple) mod N - distributes the traffic load evenly.

60. Table-Based Hashing
- Split a traffic stream into M bins; the M bins are mapped to the N outgoing links via an allocation table (the hash value selects a bin, the table selects the link).
- By changing the allocation of bins to outgoing links, traffic can be distributed in a predefined ratio.
[Figure: an allocation table, e.g. bin 1 -> link N, bin 2 -> link 1, bin 3 -> link 1, bin 4 -> link N, ..., bin M-1 -> link 3, bin M -> link 1.]

61. Bilkent's Traffic Engineering
Onur Alparslan's MS thesis work. Outline:
- Introduction
- Traffic Engineering (TE)
- Problem Statement
- Our Proposed TE Architecture
- Path Establishment
- Queuing Models
- Feedback Mechanism and Rate Control
- Traffic Splitting Algorithm
- Simulations
- Conclusion

62. Traffic Engineering (TE)
Definition: the process of controlling how traffic flows through a network so as to optimize resource utilization and network performance, and of reconfiguring that mapping under changing network conditions.
Advantages:
- Gives ISPs precise control over the placement of traffic flows.
- Balances the traffic load on the various links, routers, and switches so that none of these components is over- or under-utilized.
- Provides more efficient use of the available aggregate bandwidth.
- Avoids hot spots in the network.

63. Problem Statement - Our Goal
- Increase the total amount of carried traffic and balance the link loads in the network by using two disjoint paths (multipath).
- No need for prior information on the traffic matrix.
- Eliminate the knock-on effect.
- Consider the load-balancing performance of elastic TCP flows, and apply methods that are TCP-friendly.
- Be able to simulate a mesh network with thousands of TCP flows using ns-2.

64. Problem Statement - Our Approach
- A primary path and a disjoint secondary path are established from each ingress node to each egress node.
- TCP traffic is split between the primary and secondary paths using a distributed mechanism based on ECN marking and AIMD-based rate control.
- Primary paths have strict priority over secondary paths with respect to packet forwarding.
- The TCP splitting mechanism operates on a per-flow basis in order to prevent packet reordering, which can substantially reduce TCP performance.

65. Path Establishment Without Traffic Information
- Two disjoint paths are established between each source-destination pair.
- The first is the Primary Path (PP): the shortest path found using Dijkstra's algorithm.
- The second is the Secondary Path (SP): computed by pruning the links used by the PP and running Dijkstra's algorithm on the remaining network graph.
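The PP/SP establishment just described (shortest path, then shortest path after pruning the primary's links) can be sketched directly. Hop count stands in for the link metric, and the five-node topology is made up for illustration.

```python
import heapq

def dijkstra(graph, src, dst, banned=frozenset()):
    """Shortest path by hop count, ignoring links in `banned` (either direction)."""
    heap = [(0, src, [src])]
    seen = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in seen:
            continue
        seen.add(node)
        for nbr in graph.get(node, ()):
            if (node, nbr) not in banned and (nbr, node) not in banned:
                heapq.heappush(heap, (cost + 1, nbr, path + [nbr]))
    return None

def disjoint_paths(graph, src, dst):
    pp = dijkstra(graph, src, dst)                      # primary path
    pp_links = set(zip(pp, pp[1:]))                     # links used by the PP
    sp = dijkstra(graph, src, dst, banned=pp_links)     # prune and rerun
    return pp, sp

g = {1: [2, 4], 2: [1, 3, 5], 3: [2, 5], 4: [1, 5], 5: [2, 3, 4]}
pp, sp = disjoint_paths(g, 1, 5)
print(pp, sp)    # [1, 2, 5] [1, 4, 5]
```

Note that pruning only guarantees link-disjointness; if the secondary must avoid the primary's intermediate nodes as well, the pruning step would remove those nodes instead.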
[Figure (slide 65): a five-node example; the PP runs from the source (Node 1) to the destination (Node 5), and the disjoint SP runs through the remaining nodes.]

66. Queuing Model
[Figure: at the edge, per-egress queuing (a PP queue and an SP queue per egress node); in the backbone, per-class queuing with a Gold queue (RM packets and TCP ACKs), a Silver queue, and a Bronze queue.]

67. Knock-on Effect
- Giving equal priority to PPs and SPs may degrade PP performance, since an SP may share links with the PPs of other node pairs.
- A traffic increase on an SP may force the sources of PPs sharing links with that SP to move traffic to their own SPs.
- This further decreases performance, because SPs typically use longer routes and can in turn force yet other PPs to move traffic to their SPs.

68. Bistability in a Single Overlay: the Phone Network
- The phone network is an overlay: there is a logical link between each pair of switches.
- A phone call is put on a one-hop path when possible, and on a two-hop alternate path otherwise.
- Problem: inefficient path assignment. A two-hop path for one call stops another call from using the direct path, forcing that call onto a two-hop alternate path in turn.

69. Preventing Inefficient Routes: Trunk Reservation
- The system has two stable states: mostly one-hop calls with a low blocking rate, or mostly two-hop calls with a high blocking rate.
- Making the system stable: reserve a portion of each link for direct calls. When the link load exceeds a threshold, disallow two-hop paths from using the link - rejecting some two-hop calls in order to keep spare capacity for future one-hop calls.
- With the right threshold, trunk reservation yields a single efficient, stable state.

70. FIFO (First In, First Out) Queuing
- TCP data packets of both PPs and SPs join the same Silver queue; the Bronze queue is not used at all.
- ACKs and probe packets (RM) join the Gold queue.
- The Gold queue has strict priority over the Silver queue.

71. SP (Strict Priority) Queuing
- Data packets of PPs enter the Silver queue; data packets of SPs enter the Bronze queue.
- ACKs and RM packets join the Gold queue.
- The Gold queue has strict priority over the other queues, and the Silver queue has strict priority over the Bronze queue.

72. Hybrid SP - Deficit Round Robin Scheduler
- Give priority to TCP ACKs and RM packets.
- Of the remaining capacity, 90% is given to PP flows and 10% to SP flows.
- Very similar to strict-priority queuing, except that SP flows are not starved.

73. Feedback Mechanism
[Figure: the ingress sends primary RM (P-RM) and secondary RM (S-RM) probe packets along the PP and SP; on an uncongested path the congestion bits stay 0 all the way to the egress and back.]

74. Feedback Mechanism (cont'd)
[Figure: on a congested path, core routers set the congestion bit of passing RM packets to 1; the egress returns the marked RM packets to the ingress.]

75. Rate Control
- When the ingress node receives the congestion information about a path, it invokes an AIMD (Additive Increase, Multiplicative Decrease) algorithm to compute the ATR (Allowed Transmission Rate) of the corresponding path.
- ATR: Allowed Transmission Rate; RDF: Rate Decrease Factor; RIF: Rate Increase Factor; MTR: Minimum Transmission Rate; PTR: Peak Transmission Rate.

76. Traffic Splitting
[Figure: per-destination traffic-splitting units at the ingress feed the PP and SP queues; AIMD controls the queue drain rates.]
- When a new flow arrives at an ingress router, a decision on how to forward the packets of this new flow must be made.
- We compute the delay estimates DPP and DSP for the PP and SP queues at the edge node, respectively.
- We then calculate and update d_n, the averaged (smoothed) difference DPP - DSP, at the epoch of the n-th packet arrival.

77. Random Early Reroute
- Using the updated d_n value, we decide whether to assign the new flow to the PP or the SP: the flow goes to the PP with probability 1 - p(d_n) and to the SP with probability p(d_n).
- We call this policy Random Early Reroute (RER). It is used to control the delay difference between the PP and SP queues.
- This policy gives the PP priority over the SP at the edge nodes.

78. Simulation Setting - Three-Node Topology
[Figure: edge nodes Edge 1-3 attached to core nodes Core 1-3.]
- Flow size distribution: bounded Pareto; flow interarrival distribution: Poisson.
- Total traffic from each node: 70 Mb/s; speed of the core links: 50 Mb/s.

79. Simulation Parameters - Mesh Topology
[Figure: a mesh with nodes ch, cl, de, ny, sl, sf, dc, sj, at, da, la, hs.]
- The topology and the traffic demand matrix T[i, j] are taken from the data given at www.fictitious.org/omp.
- Each link is bidirectional with 45 Mb/s capacity in both directions, except the de-ch and ch-cl links, which have 90 Mb/s in both directions.

80. Simulation Setting
- TRM = 0.1 s for the 3-node topology, 0.02 s otherwise; p0 = 1.
- RER: minth = 1 ms, maxth = 15 ms. Shortest Delay: minth = 0 ms, maxth = 0 ms.
- The proposed architecture is implemented as an extension to the ns-2 network simulator.

81. Results - 3 Nodes
83. Results - Large Topology
84. Triangle Effect - 3 Nodes
85. Large Topology
[Result slides: simulation plots not reproduced in the extracted text.]
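The ingress-side control loop of slides 75-77 can be sketched in a few lines: AIMD adjusts a path's ATR from the returning RM feedback, and RER maps the smoothed delay difference d_n to the probability of offloading a new flow to the SP. The linear RED-like ramp between minth and maxth is an assumption about the shape of p(d_n); the RIF/RDF constants are illustrative, while minth = 1 ms and maxth = 15 ms follow slide 80.

```python
def aimd_update(atr, congested, ptr, mtr, rif=0.05, rdf=0.5):
    """One ATR update per returning RM packet (illustrative RIF/RDF values)."""
    if congested:
        return max(mtr, atr * (1 - rdf))   # multiplicative decrease, floor at MTR
    return min(ptr, atr + rif * ptr)       # additive increase, capped at PTR

def rer_probability(d_n, min_th=0.001, max_th=0.015, p0=1.0):
    """Assumed RED-like ramp: stay on the PP below min_th, always offload
    to the SP above max_th, linear in between."""
    if d_n <= min_th:
        return 0.0
    if d_n >= max_th:
        return p0
    return p0 * (d_n - min_th) / (max_th - min_th)

atr = aimd_update(40.0, congested=True, ptr=50.0, mtr=1.0)
print(atr)                        # 20.0  (halved on congestion)
print(rer_probability(0.008))     # 0.5   (midway between the thresholds)
```

With p0 = 1 and a small min_th, the policy behaves as the slides describe: the PP absorbs traffic until its queueing delay pulls measurably ahead of the SP's, at which point new flows are increasingly rerouted.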