Document

Chapter 4 Internetworking 4.1 Simple Internetworking (IP) 4.2 Routing 4.3 Global Internet 4.4 Multicast 4.5 Multiprotocol Label Switching (MPLS) 1 4.1 Simple Internetworking (IP)  Best Effort Service Model  Global Addressing Scheme  ARP (Address Resolution Protocol  ICMP (Internet Message Control Protocol) 2 IP Internet Network 1 (Ethernet)  Concatenation of Networks H1 H2 H7 H3 R3 H8 Network 4 (point-to-point) Network 2 (Ethernet) R1 R2 H4  Protocol Stack Network 3 (FDDI) H6 H5 H1 H8 T CP R1 IP ETH R2 IP ETH R3 IP FDDI FDDI IP PPP PPP T CP IP ETH ETH 3 Service Model  Connectionless (datagram-based)  Best-effort delivery (unreliable service)  packets are lost  packets are delivered out of order  duplicate copies of a packet are delivered  packets can be delayed for a long time  Datagram format 0 4 Version 8 HLen 16 T OS 31 Length Ident T TL 19 Flags Protocol Offset Checksum SourceAddr DestinationAddr Options (variable) Pad (variable) Data 4 Fragmentation and Reassembly Each network has some MTU Design decisions fragment when necessary (MTU < Datagram) try to avoid fragmentation at source host re-fragmentation is possible fragments are self-contained datagrams use CS-PDU (not cells) for ATM delay reassembly until destination host do not recover from lost fragments 5 Start of header Ident = x Example 0 Offset = 0 Rest of header (a) 1400 data bytes Start of header Ident = x 1 Offset = 0 Rest of header H1 R1 R1 R2 R2 R3 R3 H8 512 data bytes (b) Start of header ETH IP (1400) FDDI IP (1400) Ident = x 1 Offset = 64 PPP IP (512) ETH IP (512) PPP IP (512) ETH IP (512) Rest of header PPP IP (376) ETH IP (376) 512 data bytes Start of header Ident = x 0 Offset = 128 Rest of header 376 data bytes 6 Global Addresses  Properties  globally unique  hierarchical: network + host  Dot Notation  10.3.2.4  128.96.33.81  192.12.69.77 7 (a) 0 24 Network Host 14 (b) 1 0 16 Network Host 21 (c) 1 1 0 Network 8 Host 7 Datagram Forwarding  Strategy  every datagram contains destination’s address  if connected to destination network, then forward to host  if not directly connected, then forward to some router  forwarding table maps network number into next hop  each host has a default router  each router maintains a forwarding table  Example (R2) Network Number 1 2 3 4 Next Hop R3 R1 interface 1 interface 0 8 Address Translation  Map IP addresses into physical addresses  destination host  next hop router  Techniques  encode physical address in host part of IP address  table-based  ARP  table of IP to physical address bindings  broadcast request if IP address not in table  target machine responds with its physical address  table entries are discarded if not refreshed 9 ARP Details  Request Format  HardwareType: type of physical network (e.g., Ethernet)  ProtocolType: type of higher layer protocol (e.g., IP)  HLEN & PLEN: length of physical and protocol addresses  Operation: request or response  Source/Target-Physical/Protocol addresses  Notes  table entries timeout in about 10 minutes  update table with source when you are the target  update table if already have an entry  do not refresh table entries upon reference 10 ARP Packet Format 0 8 16 Hardware type = 1 HLen = 48 PLen = 32 31 ProtocolType = 0x0800 Operation SourceHardwareAddr (bytes 0― 3) SourceHardwareAddr (bytes 4- 5) SourceProtocolAddr (bytes 0 - 1) SourceProtocolAddr (bytes 2 - 3) TargetHardwareAddr (bytes 0 – 1) TargetHardwareAddr (bytes 2 - 5) TargetProtocolAddr (bytes 0 - 3) 11 Internet Control Message Protocol (ICMP)  Echo (ping)  Redirect (from router to source host)  Destination unreachable (protocol, port, or host)  TTL exceeded (so datagrams don’t cycle forever)  Checksum failed  Reassembly failed  Cannot fragment 12 Redirect G1 Network (1) H1 Network (2) G2 Network H2  G2 finds that H1 is directly connected and will inform H1 to redirect the IP datagrams to G2. 4.2 Routing  Forwarding vs Routing  forwarding: to select an output port based on destination address and routing table  routing: process by which routing table is built  Network as a Graph A 3 4 C 6 1 2 1 B 9 E F 1 D  Problem: Find lowest cost path between two nodes  Factors  static: topology  dynamic: load 14 Distance Vector  Each node maintains a set of triples  (Destination, Cost, NextHop)  Directly connected neighbors exchange updates  periodically (on the order of several seconds)  whenever table changes (called triggered update)  Each update is a list of pairs:  (Destination, Cost)  Update local table if receive a “better” route  smaller cost  came from next-hop  Refresh existing routes; delete if they time out 15 Routing Table Example (Node B) B C A D E F G Destination Cost NextHop A 1 A C 1 C D 2 C E 2 A F 2 A G 3 A 16 Routing Loops  Example 1  F detects that link to G has failed  F sets distance to G to infinity and sends update to A  A sets distance to G to infinity since it uses F to reach G  A receives periodic update from C with 2-hop path to G  A sets distance to G to 3 and sends update to F  F decides it can reach G in 4 hops via A B C A D E F G 17 Routing Loops  Example 2  link from A to E fails  A advertises distance of infinity to E  B and C advertise a distance of 2 to E  B decides it can reach E in 3 hops; advertises this to A  A decides it can read E in 4 hops; advertises this to C  C decides that it can reach E in 5 hops… B C A D E F G 18 Distance Vector: link cost changes Link cost changes:  node detects local link cost change 1  updates routing info, recalculates distance vector  if DV changes, notify neighbors “good news travels fast” x 4 y 50 1 z At time t0, y detects the link-cost change, updates its DV, and informs its neighbors. At time t1, z receives the update from y and updates its table. It computes a new least cost to x and sends its neighbors its DV. At time t2, y receives z’s update and updates its distance table. y’s least costs do not change and hence y does not send any message to z. 19 Distance Vector: link cost changes “good news Travels fast” Dy 1 x 4 y 50 1 z algorithm terminates Dz 20 Distance Vector: link cost changes Link cost changes:  bad news travels slow - “count to infinity” problem!  44 iterations before algorithm stabilizes  z (y) does not know that the least distance from y (z) 60 X 4 Y 1 Z 50 to x that y (z) tells z (y) is the distance of the path yz-y-x (z-y-x) algorithm continues on! 21 Distance Vector: poisoned reverse If Z routes through Y to get to X :  Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)  will this completely solve count to infinity problem?  Loops involving three or more nodes cannot be solved using the technique 60 X 4 Y 50 1 Z algorithm terminates 22 RIP ( Routing Information Protocol)  Distance vector algorithm  Included in BSD-UNIX Distribution in 1982  Distance metric: # of hops (max = 15 hops) Source node: A u v A z C B D w x y destination hops u 1 v 2 w 2 x 3 y 3 z 2 23 RIP advertisements  Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement)  Each advertisement: a list of up to 25 destination subnets within AS 0 8 Command 16 Version Family of net 1 31 Must be zero Address of net 1 Address of net 1 Distance to net 1 Family of net 2 Address of net 2 Address of net 2 Distance to net 2 24 RIP: Example z w A x D B y C Destination Network w y z x …. Next Router Num. of hops to dest. …. .... A B B -- 2 2 7 1 Routing table in D 25 RIP: Example Dest w x z …. Next C … w hops 4 ... A Advertisement from A to D z x Destination Network w y z x …. D B C y Next Router Num. of hops to dest. …. .... A B B A -- Routing table in D 2 2 7 5 1 26 RIP: Link Failure and Recovery If no advertisement heard after 180 sec --> neighbor or link declared dead  routes via neighbor invalidated  new advertisements sent to neighbors  neighbors in turn send out new advertisements (if tables changed)  link failure info quickly propagates to entire net  poison reverse used to prevent ping-pong loops (infinite distance = 16 hops) 27 RIP Table processing  RIP routing tables managed by application-level process called route-d (daemon)  advertisements sent in UDP packets, periodically repeated routed routed Transprt (UDP) network (IP) link physical Transprt (UDP) forwarding table forwarding table network (IP) link physical 28 Link State  Strategy  send to all nodes (not just neighbors) information about directly connected links (not entire routing table)  Link State Packet (LSP)  id of the node that created the LSP  cost of link to each directly connected neighbor  sequence number (SEQNO)  time-to-live (TTL) for this packet 29 Link State (cont) Reliable flooding store most recent LSP from each node forward LSP to all nodes but one that sent it generate new LSP periodically increment SEQNO start SEQNO at 0 when reboot decrement TTL of each stored LSP discard when TTL=0 30 Reliable Flooding X A C B D X A C B (a) X A C B (c) D (b) D X A C B D (d) 31 Route Calculation  Dijkstra’s shortest path algorithm  Let      N denotes set of nodes in the graph l (i, j) denotes non-negative cost (weight) for edge (i, j) s denotes this node M denotes the set of nodes incorporated so far C(n) denotes cost of the path from s to node n M = {s} for each n in N - {s} C(n) = l(s, n) while (N != M) M = M union {w} such that C(w) is the minimum for all w in (N - M) for each n in (N - M) C(n) = MIN(C(n), C (w) + l(w, n )) 32 A Link-State Routing Algorithm Dijkstra’s algorithm  net topology, link costs known to all nodes  accomplished via “link state broadcast”  all nodes have same info  computes least cost paths from one node (‘source”) to all other nodes  gives forwarding table for that node  iterative: after k iterations, know least cost path to k destinations Notation:  c(x,y): link cost from node x to y; = ∞ if not direct neighbors  D(v): current value of cost of path from source to destination v  p(v): predecessor node along path from source to v  N': set of nodes whose least cost path definitively known 33 Dijsktra’s Algorithm 1 Initialization: u: source node 2 N' = {u} 3 for all nodes v 4 if v adjacent to u 5 then D(v) = c(u,v) 6 else D(v) = ∞ 7 8 Loop 9 find w not in N' such that D(w) is a minimum 10 add w to N' 11 update D(v) for all v adjacent to w and not in N' : 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N' 34 Dijkstra’s algorithm: example Step 0 1 2 3 4 5 N' u ux uxy uxyv uxyvw uxyvwz D(v),p(v) D(w),p(w) 2,u 5,u 2,u 4,x 2,u 3,y 3,y D(x),p(x) 1,u D(y),p(y) ∞ 2,x D(z),p(z) ∞ ∞ 4,y 4,y 4,y 5 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 35 Dijkstra’s algorithm: example 5 5 2 u 3 v 2 1 x w 3 5 2 y 1 z 1 2 u 2 1 2 u 2 1 x w 5 z 1 3 x 2 y 1 5 5 v 3 v 3 w 3 1 5 z 1 y 2 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 36 Dijkstra’s algorithm: example 5 5 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 2 u v 2 1 x 3 w 3 1 5 z 1 y 2 37 Dijkstra’s algorithm, discussion Algorithm complexity: n nodes  each iteration: need to check all nodes, w, not in N  n(n+1)/2 comparisons: O(n2)  more efficient implementations possible: O(nlogn) Oscillations possible:  e.g., link cost = amount of carried traffic D 1 1 0 A 0 0 C e 1+e e initially B 1 2+e A 0 D 1+e 1 B 0 0 C … recompute routing 0 D 1 A 0 0 C 2+e B 1+e … recompute 2+e A 0 D 1+e 1 B e 0 C … recompute 38 OSPF (Open Shortest Path First)  “open”: publicly available – defined in RFC 2328  Uses Link State algorithm  Link-State packet dissemination  Topology map at each node  Route computation using Dijkstra’s algorithm  OSPF advertisement carries one entry per neighbor router  Advertisements disseminated to entire AS (via flooding)  Carried in OSPF messages directly over IP (rather than TCP or UDP) 39 OSPF “advanced” features (not in RIP)  Security: all OSPF messages authenticated (to prevent malicious intrusion)  Load Balancing: Multiple same-cost paths allowed (only one path in RIP)  For each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort; high for real time)  Integrated uni- and multicast support:  Multicast OSPF (MOSPF) uses same topology data base as OSPF  Hierarchical OSPF in large domains. 40 Hierarchical OSPF  An OSPF autonomous system (AS) can be configured into areas  Exactly one OSPF area in the AS is configured to be the backbone area  Each area runs its own OSPF link-state routing algorithm  Two-level hierarchy: local area, backbone.  Link-state advertisements only in area  each nodes has detailed area topology; only know direction (shortest path) to nets in other areas. 41 Hierarchical OSPF 42 Hierarchical OSPF Four types of routers  Internal routers: perform only intra AS routing  Area border routers: belong to both an area and the backbone  Backbone routers: run OSPF routing limited to backbone.  Boundary routers: connect to other AS’s. 43 OSPF Advertisement Format 0 8 16 31 LS Age Version Type Message length Checksum Authentication type T ype=1 Link-state ID Advertising router SourceAddr AreaId Options LS sequence number LS checksum Length 0 Flags 0 Number of links Link ID Authentication Link data Link type Num_T OS Metric Optional T OS information More links Header Format Link-State Advertisement 44 Comparison of LS and DV algorithms Message complexity  LS: with n nodes, E links, O(nE) messages sent  DV: exchange between neighbors only  convergence time varies Speed of Convergence  LS: O(n2) algorithm requires O(nE) messages  may have oscillations  DV: convergence time varies  may be routing loops  count-to-infinity problem Robustness: what happens if router malfunctions? LS:  node can advertise incorrect link cost  each node computes only its own table DV:  DV node can advertise incorrect path cost  each node’s table used by others  error propagate thru network 45 Metrics  Original ARPANET metric  measures number of packets queued on each link  took neither latency or bandwidth into consideration  New ARPANET metric  stamp each incoming packet with its arrival time (AT)  record departure time (DT)  when link-level ACK arrives, compute Delay = (DT - AT) + Transmit + Latency  if timeout, reset DT to departure time for retransmission  link cost = average delay over some time period 46 Metrics  Still has problems  Under light load, it works well since the two static factors of delay dominated the cost.  Under heavy load, a congested link would start to advertise a very high cost. This caused all the traffic to move off that link, leaving it idle, so then it advertise a low cost,…  The range of link values was much too large.  Fine Tuning  compressed dynamic range  replaced Delay with link utilization 47 Revised ARPANET routing metric versus link utilization 225 9.6-Kbps satellite link 140 9.6-Kbps terrestrial link 56-Kbps satellite link 56-Kbps terrestrial link 90 75 60 30 25% 50% Utilization 75% 100% 48 Revised ARPANET routing metric versus link utilization  A highly loaded link never shows a cost of more than three times its cost when idle  The most expensive link is only seven times the cost of least expensive  A high-speed satellite link is more attractive than a low-speed terrestrial link  Cost is a function of link utilization only at moderate to high loads. 49 4.3 Global Internet Structure Tree Structure of the Internet in 1990 NSFNET backbone Stanford BARRNET regional Berkeley Westnet regional PARC MidNet regional ■■■ UNM NCAR ISU UNL KU UA 50 Global Internet  One of the salient features of this topology is that it consists of “end user” sites (e.g, Stanford university) that connect to “service provider” networks (e.g, BARRNET)  Each provider and end user is likely to be an administratively independent entity – Autonomous System (AS).  Scalability problems  Scalability of routing  Address utilization  Subnetting – deals with address space utilization  Classless routing or supernetting – tackles both address utilization and routing scalability 51 Subnetting  Inefficient use of Hierarchical Address Space  class C with 2 hosts (2/255 = 0.78% efficient)  class B with 256 hosts (256/65535 = 0.39% efficient)  Still Too Many Networks  routing tables do not scale  route propagation protocols do not scale  Subnetting provides an elegantly simple way to reduce the total number of networks that are assigned  The idea is to take a single IP network number and allocate the IP addresses with that network number to several physical networks – subnets. 52 Subnetting  Add another level to address/routing hierarchy: subnet  Subnet masks define variable partition of host part  A single network number can be shared among multiple networks involves configuring all the nodes on each subnet with a subnet mask.  Subnets visible only within site Network number Host number Class B address 111111111111111111111111 00000000 Subnet mask (255.255.255.0) Network number Subnet ID Host ID 53 Subnetted address Subnet Example H1  H2 255.255.255.128 128.96.34.139 128.96.34.128 Subnet mask: 255.255.255.128 Subnet number: 128.96.34.0 128.96.34.15 128.96.34.1 R1 H1 Subnet mask: 255.255.255.128 Subnet number: 128.96.34.128 128.96.34.130 128.96.34.139 128.96.34.129 H3 R2 H2 R1 255.255.255.128 128.96.34.139 128.96.34.128 128.96.33.1 128.96.33.14 Forwarding table at router R1 Subnet mask: 255.255.255.0 Subnet number: 128.96.33.0 Subnet Number 128.96.34.0 128.96.34.128 128.96.33.0 Subnet Mask 255.255.255.128 255.255.255.128 255.255.255.0 Next Hop interface 0 interface 1 R2 54 Forwarding Algorithm D = destination IP address for each entry (SubnetNum, SubnetMask, NextHop) D1 = SubnetMask & D if D1 = SubnetNum if NextHop is an interface deliver datagram directly to D else deliver datagram to NextHop  Use a default router if nothing matches  Not necessary for all 1s in subnet mask to be contiguous  Can put multiple subnets on one physical network  Subnets not visible from the rest of the Internet 55 Classless Routing (CIDR) Supernetting  CIDR: Classless Inter-Domain Routing  A technique that addresses two scaling concerns:  the growth of backbone routing tables, and  the potential for the 32-bit IP address space to be exhausted well before the 4 billionth host is attached to the Internet.  Even though subnetting can help to assign addresses carefully, it does not get around the fact that any AS with more than 255 hosts wants a class B address – exhaustion of IP address space. 56 Classless Routing (CIDR) Supernetting  CIDR tries to balance the desire to minimize the number of routes that a router needs to know against the need to hand out addresses efficiently  Assign block of contiguous network numbers to nearby networks  Represent blocks with a single pair (first_network_address, count)  Restrict block sizes to powers of 2  Use a bit mask (CIDR mask) to identify block size  All routers must understand CIDR addressing 57 Route aggregation with CIDR Customers 128.112.128/24 Advertise 128.112.128/21 ISP .. . 128.112.135/24  Since all of the customers are reachable through the same Provider network, it can advertise a single route to all of Them by just advertising the common 21-bit prefix they share 58 IP Forwarding Revisited  Find the network number in a packet and then lookup that number in a forwarding table.  Reexamine this assumption with CIDR  Prefixes length 2-32 bits  Prefixes may “overlap”  Some addresses may match more than one prefix.  Longest Prefix Matching (LPM)  For example  171.69 (16-bit prefix)  171.69.10 (24-bit prefix)  171.69.10.5 matches both  171.69.20.5 only matches 171.69 59 Interdomain Routing (BGP) • AS = routing domain • Routing Policies • Two major Interdomain routing protocols -- Exterior gateway Protocol (EGP) -- Border gateway Protocol (BGP-4) R1 R3 R2 Autonomous system 1 R4 Autonomous system 2 R5 R6 60 BGP-4: Border Gateway Protocol  AS Types  stub AS: has a single connection to one other AS carries local traffic only  multihomed AS: has connections to more than one AS refuses to carry transit traffic  transit AS: has connections to more than one AS carries both transit and local traffic  Each AS has:  one or more border routers  one BGP speaker that advertises: local networks other reachable networks (transit AS only) gives path information 61 Today’s multibackbone Internet Large corporation “Consumer” ISP Peering point Backbone service provider “Consumer” ISP Large corporation Peering point “Consumer” ISP Small corporation 62 BGP Example  Speaker for AS2 advertises reachability to P and Q  network 128.96, 192.4.153, 192.4.32, and 192.4.3, can be reached directly from AS2 Customer P (AS 4) 128.96 192.4.153 Customer Q (AS 5) 192.4.32 192.4.3 Customer R (AS 6) 192.12.69 Customer S (AS 7) 192.4.54 192.4.23 Regional provider A (AS 2) Backbone network (AS 1) Regional provider B (AS 3)  Speaker for backbone advertises  networks 128.96, 192.4.153, 192.4.32, and 192.4.3 can be reached along the path (AS1, AS2).  Speaker can cancel previously advertised paths 63 Internet inter-AS routing: BGP  BGP (Border Gateway Protocol): the de facto standard  BGP provides each AS a means to: 1. Obtain subnet reachability information from neighboring ASs. 2. Propagate the reachability information to all routers internal to the AS. 3. Determine “good” routes to subnets based on reachability information and policy.  Allows a subnet to advertise its existence to rest of the Internet: “I am here” 64 BGP basics  Pairs of routers (BGP peers) exchange routing information over semi-permanent TCP connections: BGP sessions  Note that BGP sessions do not correspond to physical links.  When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix.  AS2 can aggregate prefixes in its advertisement 3c 3a 3b AS3 1a AS1 2a 1c 1d 1b 2c AS2 2b External BGP (eBGP) session Internal BGP (iBGP) session 65 Aggregation of prefixes 138.16.64/24 138.16.65/24 138.16.66/24 ＝＞ 138.16.64/22 138.16.67/24 66 Distributing reachability info  With eBGP session between 3a and 1c, AS3 sends prefix reachability information to AS1.  1c can then use iBGP to distribute this new prefix reachability information to all routers in AS1  1b can then re-advertise the new reachability information to AS2 over the 1b-to-2a eBGP session  When router learns about a new prefix, it creates an entry for the prefix in its forwarding table. 3c 3a 3b AS3 1a AS1 2a 1c 1d 1b 2c AS2 2b eBGP session iBGP session 67 Path attributes & BGP routes  When advertising a prefix, advertisement includes BGP attributes.  prefix + attributes = “route”  Two important attributes:  AS-PATH: contains the ASs through which the advertisement for the prefix passed: AS 67 AS 17  used to detect and prevent looping advertisement  also use in choosing among multiple path to the same prefix  NEXT-HOP: Indicates the specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hopAS.)  When gateway router receives route advertisement, uses import policy to accept/decline. 68 BGP route selection  Router may learn about more than 1 route to any one prefix. Router must select route.  Elimination rules invoked sequentially until one route remains: 1. Local preference value attribute: policy decision – AS’s network administrator 2. Shortest AS-PATH 3. Closest NEXT-HOP router: hot potato routing 4. Additional criteria 69 BGP messages  BGP messages exchanged using TCP.  BGP messages:  OPEN: opens TCP connection to peer and authenticates sender  UPDATE: advertises new path (or withdraws old)  KEEPALIVE keeps connection alive in absence of UPDATES; also ACKs OPEN request  NOTIFICATION: reports errors in previous message; also used to close connection 70 BGP routing policy legend: B W provider network X A customer network: C Y Figure 4.5-BGPnew: a simple BGP scenario  A,B,C are provider networks  X,W,Y are customer (of provider networks)  X is dual-homed: attached to two networks  X does not want to route from B via X to C  .. so X will not advertise to B a route to C 71 BGP routing policy (2) legend: B W provider network X A customer network: C Y Figure 4.5-BGPnew: a simple BGP scenario  A advertises to B the path AW  B advertises to X the path BAW  Should B advertise to C the path BAW?  No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers  B wants to force C to route to w via A  B wants to route only to/from its customers! 72 Why different Intra- and Inter-AS routing ? Policy:  Inter-AS: administrator wants control over how its traffic routed, who routes through its net.  Intra-AS: single admin, so no policy decisions needed Scale:  hierarchical routing saves table size, reduced update traffic Performance:  Intra-AS: can focus on performance  Inter-AS: policy may dominate over performance 73 IP Version 6  Features  128-bit addresses (classless)  multicast  real-time service  authentication and security  autoconfiguration  end-to-end fragmentation  protocol extensions  Header  40-byte “base” header  extension headers (fixed order, mostly fixed length)  fragmentation  source routing  authentication and security  other options 74 4.4 Broadcast/Multicast routing  Broadcast routing –- deliver a packet from a source node to all other nodes  Multicast routing – deliver a packet from a source node to a subset of other nodes 75 Source-duplication versus in-network duplication duplicate R1 duplicate creation/transmission R1 duplicate R2 R2 R3 R4 (a) R3 R4 (b) (a) source duplication, (b) in-network duplication 76 Broadcast routing algorithms Uncontrolled flooding Controlled flooding Spanning-tree broadcast 77 Uncontrolled flooding  The source node sends a copy of the packet to all of its neighbors  When a node receives a broadcast packet, it duplicates the packet and forwards it to all of its neighbors (except the neighbor from which it receives the packet) Problems:  If the graph has cycles, then one or more copies of each broadcast packet will cycle indefinitely  Broadcast storm 78 Controlled flooding Sequence-number-controlled flooding Source node puts its address and a broadcast sequence number into a broadcast packet Each node maintains a list of the source address and sequence number of each packet it has received When a node receives a broadcast packet If the packet is in the list, the packet is dropped Otherwise, the packet is duplicated and forwarded 79 Controlled flooding  Reverse path forwarding  When a router receives a broadcast packet, it duplicates and forwards the packet only if the packet arrives on the link that is on its own shortest unicast path back to the source A B c F E Packet will be forwarded D G Packet not forwarded beyond receiving router 80 Controlled flooding Drawback  Some of the nodes receive redundant packets A B c F E D G Redundant packets Ideally, every node should receive only one copy of the broadcast packet. 81 Spanning-tree broadcast Spanning tree – a tree that contains all nodes in a graph Minimum spanning tree – a spanning tree whose cost is the minimum among all the spanning trees of a graph  Broadcast along a spanning tree A B c F A E B c D F G (a) Broadcast initiated at A E D G (b) Broadcast initiated at D 82 Construction of Spanning-tree  Many algorithms have been developed  Center-based approach  Select a center node (rendezvous or core)  Each node unicasts tree-join message to the center node A A 3 B c 4 F 2 E 1 Center node B c D F 5 E D G G (a) Stepwise construction of spanning tree (b) Constructed spanning tree 83 Multicast Routing: Problem Statement  Goal: find a tree (or trees) connecting routers having local multicast group members  tree: not all paths between routers used  source-based: different tree from each sender to receivers  shared-tree: same tree used by all group members Source-based trees Shared tree 84 Approaches for building multicast trees source-based tree: one tree per source shortest path trees reverse path forwarding group-shared tree: group uses one tree minimal spanning (Steiner) center-based trees …we first look at basic approaches, then specific protocols adopting these approaches 85 Shortest Path Tree  multicast forwarding tree: tree of shortest path routes from source to all receivers  Dijkstra’s algorithm S: source LEGEND R1 1 2 R4 R2 3 R3 router with attached group member 5 4 R6 router with no attached group member R5 6 R7 i link used for forwarding, i indicates order link added by algorithm 86 Reverse Path Forwarding  rely on router’s knowledge of unicast shortest path from it to sender  each router has simple forwarding behavior: if (multicast datagram received on incoming link on shortest path back to sender) then flood datagram onto all outgoing links else ignore datagram 87 Reverse Path Forwarding: example S: source LEGEND R1 R4 router with attached group member R2 R5 R3 R6 R7 router with no attached group member datagram will be forwarded datagram will not be forwarded • result is a source-specific reverse SPT – may be a bad choice with asymmetric links 88 Reverse Path Forwarding: pruning  forwarding tree contains subtrees with no multicast group members  no need to forward datagrams down subtree  “prune” messages sent upstream by router with no downstream group members LEGEND S: source R1 router with attached group member R4 R2 P R5 R3 R6 P R7 P router with no attached group member prune message links with multicast forwarding 89 Shared-Tree: Steiner Tree  Steiner Tree: minimum cost tree connecting all routers with attached group members  problem is NP-complete  excellent heuristics exists  not used in practice:  computational complexity  information about entire network needed  monolithic: rerun whenever a router needs to join/leave 90 Center-based trees  single delivery tree shared by all  one router identified as “center” of tree  to join:  edge router sends unicast join-message addressed to center router  join-message “processed” by intermediate routers and forwarded towards center  join-message either hits existing tree branch for this center, or arrives at center  path taken by join-message becomes new branch of tree for this router 91 Center-based trees: an example Suppose R6 chosen as center: LEGEND R1 3 R2 router with attached group member R4 2 R5 R3 1 R6 1 router with no attached group member path order in which join messages generated R7 92 Internet Multicasting Routing: DVMRP  DVMRP: distance vector multicast routing protocol, RFC1075  flood and prune: source-based tree, reverse path forwarding,  RPF tree based on DVMRP’s own routing tables constructed by communicating DVMRP routers  no assumptions about underlying unicast  initial datagram to multicast group flooded everywhere via RPF  routers not wanting group: send upstream prune messages 93 DVMRP: continued…  soft state: DVMRP router periodically (1 min.) “forgets” branches are pruned:  multicast data again flows down unpruned branch  downstream router: reprune or else continue to receive data  routers can quickly regraft to tree  following IGMP join at leaf  odds and ends  commonly implemented in commercial routers  Mbone routing done using DVMRP 94 Tunneling Q: How to connect “islands” of multicast routers in a “sea” of unicast routers? physical topology logical topology  multicast datagram encapsulated inside “normal” (non-multicast- addressed) datagram  normal IP datagram sent thru “tunnel” via regular IP unicast to receiving multicast router  receiving multicast router decapsulates to get multicast datagram 95 PIM: Protocol Independent Multicast  Not dependent on any specific underlying unicast routing algorithm (like RIP, OSPF, works with all)  Two different multicast distribution scenarios : Dense: Sparse:  group members  # of routers with group densely packed, in “close” proximity. members is small wrt total # of routers  group members “widely dispersed” 96 Consequences of Sparse-Dense Dichotomy: Dense Sparse:  group membership by  no membership until routers explicitly join routers assumed until routers explicitly prune  receiver-driven  data-driven construction construction of multicast tree (e.g., center-based) of multicast tree (e.g., RPF)  bandwidth and nongroup-router processing  bandwidth and nonconservative group-router processing profligate 97 PIM- Dense Mode Flood-and-prune RPF, similar to DVMRP but  underlying unicast protocol provides RPF information for incoming datagram  less complicated (less efficient) downstream flood than DVMRP  reduces reliance on underlying routing algorithm  has protocol mechanism for router to detect if it is a leaf-node router 98 PIM - Sparse Mode  Center-based approach R1  router sends join message to rendezvous point (RP)  intermediate routers update state and forward join  after joining via RP, router can switch to source-specific tree R4 join R2 R3 join R5 join R6 all data multicast from rendezvous point R7 rendezvous point 99 PIM - Sparse Mode Sender(s):  unicast data to RP, which distributes down RP-rooted tree  RP can extend multicast tree upstream to source  RP can send stop message to the source if no attached receivers R1 R4 join R2 R3 join R5 join R6 all data multicast from rendezvous point R7 rendezvous point  “no one is listening!” 100 4.5 MultiProtocol Label Switching (MPLS) Prior Work MPLS Overview MPLS Architecture Prior Work Tag Switching (Cisco) Aggregate Route-Based IP Switching (ARIS, IBM) IP Navigator IFMP-IP Switching (Ipsilon) Cell Switching Router (CSR, Toshiba) 102 Prior Work Tag switching is based on the control-driven approach. The set up of LSPs (Label Switched Paths) closely follows control messages such as routing updates and RSVP messages. Aggregate route-based IP switching (ARIS) is based on the control-driven approach. Very similar to tag switching. ARIS introduces the concept of an “egress identifier” (FECs) to express the granularity of LSPs. IP Navigator is again a control-driven protocol. Use OSPF as the internal routing protocol used within a routing domain. Explicit routing is used to setting up the VCs. 103 Prior Work  Ipsilon Flow Management Protocol (IFMP) is a traffic driven protocol. When the number of packets from a flow exceeds a predetermined threshold, the controller uses IFMP to set up an LSP for the particular flow.  Cell switch router (CSR) proposal is similar to IP switching. CSR is primarily designed as a device for interconnecting ATM clouds. Within an LIS (logical IP subnet), ATM forum standards are used to connection hosts and switched together.  Multiple LISs are then interconnected with CSRs that are capable of running both IP forwarding and cell forwarding. The setup of LSPs is data-driven for best effort traffic and RSVPdriven for flows that require resource reservation. 104 MPLS Overview RFC 3812 The IETF MPLS working group is to standardize a base technology that integrates the label swapping forwarding paradigm with network layer routing. Cisco is the major contributor to the MPLS working group. substitute “Label” for “Tag” in Tag Switching @ MPLS Core mechanisms of MPLS Semantics assigned to a stream label Labels are associated with specific streams of data. Forwarding Methods Forwarding is simplified by the use of the short fixed length labels to identify streams. Forwarding may require simple functions such as looking up a label in a table, swapping labels, and possibly decrementing and checking a TTL. Label Distribution Methods Allow nodes to determine which labels to use for specific streams. Native IP Forwarding IP routing: both the packet forwarding and route determination process in an IP network. Native IP forwarding (NIF): hop-by-hop, destination-based packet forwarding. Each packet’s next hop and output port are determined by a longest-prefix-match forwarding table lookup. Additional packet classification may also be performed to derive output port queuing and scheduling rules. 107 A Simplified NIF forwarding engine Longest Prefix Match lookup Forwarding Table Next hop + port Packet Classification Input Ports IP Header Queuing and Scheduling rules Output Ports IP payload Packet Classification keys: IP source and destination addresses, IP protocol type, DiffServ (DS) or TOS byte, and TCP/UDP port numbers. 108 Per-Hop classification, queuing, and scheduling Classify Queue Port 1 Port N S Port M 109 A Simplified LSR forwarding engine Switching Table Next hop + port Queuing and Scheduling rules Output Ports Input Ports MPLS label MPLS payload 110 Traffic Engineering Conventional IP routing attempts to find the shortest path between a packet’s current location and its intended destination. “Hot spots” and packet loss rates, latency, and jitter increase as the average load on a router rises. Solutions: (1) Faster routers, (2) Alternate routes. Routing policy may also require traffic engineering. For example, the external link between R6 and A3 may have been funded solely by A2 and A3. Therefore, A1’s traffic must not be allowed to traverse it. 111 Traffic Engineering -- Override the shortest path route IP Backbone Access 1 R1 R6 Access 3 R5 Access 2 R2 R3 R4 Route from A2 to D Desired route from A1 to D Actual route from A1 to D Destination D 112 Signaling and Provisioning Signaling: when network (re)configuration can be requested by users at any time and achieved within milliseconds or seconds. Provisioning: When the reaction time for (re)configuration becomes measured in minutes or hours. In either case, the (re)configuring action involves establishing (or modifying) information used by routers or switches to control their forwarding actions, including  forwarding (routing) information,  classification rules, and/or  queuing and scheduling parameters. 113 Core MPLS Components The basic routing approach Routing is accomplished through the use of standard L3 routing protocols (e.g. OSPF and BGP). The information maintained by the L3 routing protocols is then used to distribute labels to neighboring nodes that are used in the forwarding of packets. Labels Label semantics, Label granularity, Label assignment, Label stack and forwarding operations. Label Semantics  The label is nothing more than a shorthand for an aggregate stream of user data.  The meaning of the label is a strictly local issue between two neighboring nodes.  MPLS could be employed between any two neighboring nodes, even if no other nodes in the network participate in MPLS.  When MPLS is used between more than two nodes, then the operation between any two neighboring nodes could be interpreted as independent of the operation between any other pair of nodes. Label Granularity  The device uses the label to forward packets will forward all packets with the same label in the same way.  A Forwarding Equivalence Class (FEC) is a set of L3 packets which are all forwarded in the same manner by a particular Label Switching Router (LSR).  For unicast IP traffic, the granularity of a label allows various levels of aggregation in a Label Information Base (LIB).  For IP multicast, the natural binding of a label would be to a multicast tree. Label assignment  Label assignment involves allocating a label, and then binding a label to a route.  Label assignment can be driven by control traffic or data traffic. (discussed later.)  Label withdrawal is primarily a matter of garbage collection, that is collecting up unused labels so that they may be reassigned. 117 Routing Aggregation R6 Access 1 4 1 R1 R5 Access 3 2 Access 2 R2 R3 3 5 R4 Destination D 118 Forwarding Component Label Stack and Forwarding Operations label swap : looking up the incoming label to determine the outgoing label, encapsulation, port, and any additional information which may pertain to the stream such as a particular queue or other QoS related treatment. label push : When a packet first enters an MPLS domain, the packet is associated with a label. label pop : When a packet leaves an MPLS domain, the label is removed. The label stack is useful within hierarchical routing domain. Encapsulation Label-based forwarding makes use of various pieces of information, including a label or stack of labels, and possibly additional information such as a TTL field. “MPLS encapsulation” : encapsulate the label information and information used for label based forwarding. An encapsulation scheme may make use of the following fields: label, TTL, class of service, stack indicator, next header type indicator, and checksum MPLS label stack encoding Stack bottom Stack top COS Label (20 bits) Label (20 bits) Label (20 bits) Exp (3 bits) Exp (3 bits) Exp (3 bits) S (1 bit) S (1 bit) S (1 bit) TTL (8 bits) TTL (8 bits) TTL (8 bits) ... Original Packet MPLS frame delivered to link layer 121 Label Assignment Topology driven (Tag)  In response to normal processing of routing protocol control traffic  Labels are pre-assigned; no label setup latency at forwarding time. Request driven (RSVP)  In response to normal processing of request based control traffic  May require a large number of labels to be assigned. Traffic driven (Ipsilon)  The arrival of data at an LSR triggers label assignment and distribution.  Label setup latency; potential for packet reordering. 122 Label Distribution Explicit Label Distribution Downstream label allocation label allocation is done by the downstream LSR most natural mechanism for unicast traffic Upstream label allocation label allocation is done by the upstream LSR may be used for optimality for some multicast traffic A unique label for an egress LSR within the MPLS domain Any stream to a particular MPLS egress node could use the label of that node. 123 Label Distribution Explicit Label Distribution Protocol (LDP) Reliability : by transport protocol or as part of LDP. Separate routing computation and label distribution. Piggybacking on Other Control Messages Use existing routing/control protocol for distributing routing/control and label information.  OSPF, BGP, RSVP, PIM Combine routing and label distribution. Label purge mechanisms By time out Exchange of MPLS control packets 124 Label Distribution Protocol  LDP Peer:  Two LSRs that exchange label/stream mapping information via LDP  LDP messages  Discovery messages (via UDP)  announce and maintain the presence of LSR  Session messages  maintain session between LDP peers  Advertisement message  label operation (Label distribution)  Notification message  advisory information and signal error information  Error notification: signal fatal errors  Advisory notification: status of the LDP session or some previous message received from the peer. 125 Label Swapping  Labeled Packet Example : Forwarding a Labeled Packet  Map the incoming label to a next hop label, determines where to forward the packet.  Encodes the new label stack into the packet, and then forwards it. Incoming Label Map (ILM) Input Port Label 1 4  Unlabeled Packet  LSR analyzes the L3 header, to determine the packet’s stream.  Map the stream to a next hop, determines where to forward the packet.  Encodes the new label stack into the packet, and then forwards it. Output Port Label 2 6 Label Switching Router (LSR) L3 Header L2 Header IP Router Module Label Dat H3 4 H2 Dat 1 H3 6 2 126 H2 Use of MPLS in a Hierarchy Swap L1 L4 OSPF R2 R1 L2 Push IN OUT IN OUT L2 L3 L3 L1 Swap OUT L1 R3 IN OUT L1 L4 R5 R6 R4 Pop L2 L3 L3 L1 L1 L1 L1 BGP L2 L1 Domain 1 Domain 2 127 Conclusion MPLS improves the scalability of hop-by-hop routing and forwarding, and provides traffic engineering capabilities for better network provisioning. It decouples forwarding from routing and allows multi-protocol support without requiring changes to the basic forwarding paradigm. Generalized MPLS (GMPLS) λMPLS (Optical wavelength-based) 128

Document

Related documents

Products

Support

Document

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib