Routing Routing protocol Goal: determine “good” path (sequence of routers) thru network from source to dest. 5 2 A Graph abstraction for routing algorithms: • graph nodes are routers • graph edges are physical links – link cost: delay, $ cost, or congestion level B 2 1 D 3 C 3 1 5 F 1 E 2 • “good” path: – typically means minimum cost path – other def’s possible Routing Algorithm classification Global or decentralized information? Global: • all routers have complete topology, link cost info • “link state” algorithms Decentralized: • router knows physicallyconnected neighbors, link costs to neighbors • iterative process of computation, exchange of info with neighbors • “distance vector” algorithms Static or dynamic? Static: • routes change slowly over time Dynamic: • routes change more quickly – periodic update – in response to link cost changes A Link-State Routing Algorithm Dijkstra’s algorithm • net topology, link costs known to all nodes – accomplished via “link state broadcast” – all nodes have same info • computes least cost paths from one node (‘source”) to all other nodes – gives routing table for that node • iterative: after k iterations, know least cost path to k dest.’s Notation: • c(i,j): link cost from node i to j. cost infinite if not direct neighbors • D(v): current value of cost of path from source to dest. V • p(v): predecessor node along path from source to v • N: set of nodes whose least cost path definitively known Dijsktra’s Algorithm 1 Initialization: 2 N = {A} 3 for all nodes v 4 if v adjacent to A 5 then D(v) = c(A,v) 6 else D(v) = infinity 7 8 Loop 9 find w not in N such that D(w) is a minimum 10 add w to N 11 update D(v) for all v adjacent to w and not in N: 12 D(v) = min( D(v), D(w) + c(w,v) ) 13 /* new cost to v is either old cost to v or known 14 shortest path cost to w plus cost from w to v */ 15 until all nodes in N Dijkstra’s algorithm: example Step 0 1 2 3 4 5 start N A AD ADE ADEB ADEBC ADEBCF D(B),p(B) D(C),p(C) D(D),p(D) D(E),p(E) D(F),p(F) 2,A 1,A 5,A infinity infinity 2,A 4,D 2,D infinity 2,A 3,E 4,E 3,E 4,E 4,E 5 2 A B 2 1 D 3 C 3 1 5 F 1 E 2 Dijkstra’s algorithm, discussion Algorithm complexity: n nodes • each iteration: need to check all nodes, w, not in N • n*(n+1)/2 comparisons: O(n**2) • more efficient implementations possible: O(nlogn) Oscillations possible: • e.g., link cost = amount of carried traffic D 1 1 0 A 0 0 C e 1+e B e initially 2+e D 0 1 A 1+e 1 C 0 B 0 … recompute routing 0 D 1 A 0 0 2+e B C 1+e … recompute 2+e D 0 A 1+e 1 C 0 B e … recompute Distance Vector Routing Algorithm iterative: • continues until no nodes exchange info. • self-terminating: no “signal” to stop asynchronous: • nodes need not exchange info/iterate in lock step! distributed: • each node communicates only with directly-attached neighbors Distance Table data structure • each node has its own • row for each possible destination • column for each directly-attached neighbor to node • example: in node X, for dest. Y via neighbor Z: X D (Y,Z) distance from X to = Y, via Z as next hop = c(X,Z) + min {DZ(Y,w)} w Distance Table: example 7 A B 1 C E cost to destination via D () A B D A 1 14 5 B 7 8 5 C 6 9 4 D 4 11 2 2 8 1 E 2 D E D (C,D) = c(E,D) + min {DD(C,w)} = 2+2 = 4 w E D (A,D) = c(E,D) + min {DD(A,w)} E w = 2+3 = 5 loop! D (A,B) = c(E,B) + min {D B(A,w)} = 8+6 = 14 w loop! Distance table gives routing table E cost to destination via Outgoing link D () A B D A 1 14 5 A A,1 B 7 8 5 B D,5 C 6 9 4 C D,4 D 4 11 2 D D,4 Distance table to use, cost Routing table Distance Vector Routing: overview Iterative, asynchronous: each local iteration caused by: • local link cost change • message from neighbor: its least cost path change from neighbor Distributed: • each node notifies neighbors only when its least cost path to any destination changes – neighbors then notify their neighbors if necessary Each node: wait for (change in local link cost of msg from neighbor) recompute distance table if least cost path to any dest has changed, notify neighbors Distance Vector Algorithm: At all nodes, X: 1 Initialization: 2 for all adjacent nodes v: 3 D X(*,v) = infinity /* the * operator means "for all rows" */ 4 D X(v,v) = c(X,v) 5 for all destinations, y 6 send min D X(y,w) to each neighbor /* w over all X's neighbors */ w Distance Vector Algorithm (cont.): 8 loop 9 wait (until I see a link cost change to neighbor V 10 or until I receive update from neighbor V) 11 12 if (c(X,V) changes by d) 13 /* change cost to all dest's via neighbor v by d */ 14 /* note: d could be positive or negative */ 15 for all destinations y: D X(y,V) = D X(y,V) + d 16 17 else if (update received from V wrt destination Y) 18 /* shortest path from V to some Y has changed */ 19 /* V has sent a new value for its min DV(Y,w) */ w 20 /* call this received new value is "newval" */ 21 for the single destination y: D X(Y,V) = c(X,V) + newval 22 23 if we have a new min DX(Y,w)for any destination Y w 24 send new value of min D X(Y,w) to all neighbors w 25 26 forever Distance Vector Algorithm: example X 2 Y 7 1 Z Distance Vector Algorithm: example X 2 Y 7 1 Z X Z X Y D (Y,Z) = c(X,Z) + minw{D (Y,w)} = 7+1 = 8 D (Z,Y) = c(X,Y) + minw {D (Z,w)} = 2+1 = 3 Distance Vector: link cost changes Link cost changes: • node detects local link cost change • updates distance table (line 15) • if cost change in least cost path, notify neighbors (lines 23,24) “good news travels fast” 1 X 4 Y 50 1 Z algorithm terminates Distance Vector: link cost changes Link cost changes: • good news travels fast • bad news travels slow “count to infinity” problem! 60 X 4 Y 50 1 Z algorithm continues on! Distance Vector: poisoned reverse If Z routes through Y to get to X : • Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z) • will this completely solve count to infinity problem? 60 X 4 Y 50 1 Z algorithm terminates Comparison of LS and DV algorithms Message complexity • LS: with n nodes, E links, O(nE) msgs sent each • DV: exchange between neighbors only – convergence time varies Speed of Convergence • LS: O(n2) algorithm requires O(nE) msgs – may have oscillations • DV: convergence time varies – may be routing loops – count-to-infinity problem Robustness: what happens if router malfunctions? LS: – node can advertise incorrect link cost – each node computes only its own table DV: – DV node can advertise incorrect path cost – each node’s table used by others • error propagate thru network Roadmap • Routing in the Internet – Routing algorithms – Routing Protocols • Intra-AS routing: RIP and OSPF • Inter-AS routing: BGP Routing in the Internet • The Global Internet consists of Autonomous Systems (AS) interconnected with each other: – Stub AS: small corporation: one connection to other AS’s – Multihomed AS: large corporation (no transit): multiple connections to other AS’s – Transit AS: provider, hooking many AS’s together • Two-level routing: – Intra-AS: administrator responsible for choice of routing algorithm within network – Inter-AS: unique standard for inter-AS routing: BGP Internet AS Hierarchy Inter-AS border (exterior gateway) routers Intra-AS interior routers Intra-AS Routing • Also known as Interior Gateway Protocols (IGP) • Most common Intra-AS routing protocols: – RIP: Routing Information Protocol – OSPF: Open Shortest Path First – IGRP: Interior Gateway Routing Protocol (Cisco proprietary) RIP ( Routing Information Protocol) • • • • Distance vector algorithm Included in BSD-UNIX Distribution in 1982 Distance metric: # of hops (max = 15 hops) Distance vectors: exchanged among neighbors every 30 sec via Response Message (also called advertisement) • Each advertisement: list of up to 25 destination nets within AS Dest w x z …. w Next C … hops 4 ... A RIP: Example Advertisement from A to D x D z B y C Destination Network w y z x …. Next Router Num. of hops to dest. …. .... A B B -- Routing table in D 2 2 7 1 RIP: Example Dest w x z …. Next C … w hops 4 ... A Advertisement from A to D z x Destination Network w y z x …. D B C y Next Router Num. of hops to dest. …. .... A B B A -- Routing table in D 2 2 7 5 1 RIP: Link Failure and Recovery If no advertisement heard after 180 sec --> neighbor/link declared dead – routes via neighbor invalidated – new advertisements sent to neighbors – neighbors in turn send out new advertisements (if tables changed) – link failure info quickly propagates to entire net – poison reverse used to prevent ping-pong loops (infinite distance = 16 hops) RIP Table processing • RIP routing tables managed by applicationlevel process called route-d (daemon) • advertisements sent in UDP packets, periodically repeated routed routed Transprt (UDP) network (IP) link physical Transprt (UDP) forwarding table forwarding table network (IP) link physical RIP Table example (continued) Router: giroflee.eurocom.fr Destination -------------------127.0.0.1 192.168.2. 193.55.114. 192.168.3. 224.0.0.0 default Gateway Flags Ref Use Interface -------------------- ----- ----- ------ --------127.0.0.1 UH 0 26492 lo0 192.168.2.5 U 2 13 fa0 193.55.114.6 U 3 58503 le0 192.168.3.5 U 2 25 qaa0 193.55.114.6 U 3 0 le0 193.55.114.129 UG 0 143454 • Three attached networks (LANs) • • • • Router only knows routes to attached LANs Default router used to “go up” Route multicast address: 224.0.0.0 Loopback interface (for debugging) OSPF (Open Shortest Path First) • “open”: publicly available • Uses Link State algorithm – LS packet dissemination – Topology map at each node – Route computation using Dijkstra’s algorithm • OSPF advertisement carries one entry per neighbor router • Advertisements disseminated to entire AS (via flooding) – Carried in OSPF messages directly over IP (rather than TCP or UDP OSPF “advanced” features (not in RIP) • Security: all OSPF messages authenticated (to prevent malicious intrusion) • Multiple same-cost paths allowed (only one path in RIP) • For each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort; high for real time) • Integrated uni- and multicast support: – Multicast OSPF (MOSPF) uses same topology data base as OSPF • Hierarchical OSPF in large domains. Hierarchical OSPF Hierarchical OSPF • Two-level hierarchy: local area, backbone. – Link-state advertisements only in area – each nodes has detailed area topology; only know direction (shortest path) to nets in other areas. • Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers. • Backbone routers: run OSPF routing limited to backbone. • Boundary routers: connect to other AS’s. Chapter 4: Network Layer • 4. 1 Introduction • 4.2 Virtual circuit and datagram networks • 4.3 What’s inside a router • 4.4 IP: Internet Protocol – – – – Datagram format IPv4 addressing ICMP IPv6 • 4.5 Routing algorithms – Link state – Distance Vector – Hierarchical routing • 4.6 Routing in the Internet – RIP – OSPF – BGP • 4.7 Broadcast and multicast routing Internet inter-AS routing: BGP • BGP (Border Gateway Protocol): the de facto standard • BGP provides each AS a means to: 1. Obtain subnet reachability information from neighboring ASs. 2. Propagate the reachability information to all routers internal to the AS. 3. Determine “good” routes to subnets based on reachability information and policy. • Allows a subnet to advertise its existence to rest of the Internet: “I am here” BGP basics • Pairs of routers (BGP peers) exchange routing info over semipermanent TCP connections: BGP sessions • Note that BGP sessions do not correspond to physical links. • When AS2 advertises a prefix to AS1, AS2 is promising it will forward any datagrams destined to that prefix towards the prefix. – AS2 can aggregate prefixes in its advertisement 3c 3a 3b AS3 1a AS1 2a 1c 1d 1b 2c AS2 2b eBGP session iBGP session Distributing reachability info • With eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1. • 1c can then use iBGP do distribute this new prefix reach info to all routers in AS1 • 1b can then re-advertise the new reach info to AS2 over the 1bto-2a eBGP session • When router learns about a new prefix, it creates an entry for the prefix in its forwarding table. 3c 3a 3b AS3 1a AS1 2a 1c 1d 1b 2c AS2 2b eBGP session iBGP session external internal Path attributes & BGP routes • When advertising a prefix, advert includes BGP attributes. – prefix + attributes = “route” • Two important attributes: – AS-PATH: contains the ASs through which the advert for the prefix passed: AS 67 AS 17 – NEXT-HOP: Indicates the specific internal-AS router to next-hop AS. (There may be multiple links from current AS to next-hop-AS.) • When gateway router receives route advert, uses import policy to accept/decline. BGP route selection • Router may learn about more than 1 route to some prefix. Router must select route. • Elimination rules: 1. Local preference value attribute: policy decision 2. Shortest AS-PATH 3. Closest NEXT-HOP router: hot potato routing 4. Additional criteria BGP messages • BGP messages exchanged using TCP. • BGP messages: – OPEN: opens TCP connection to peer and authenticates sender – UPDATE: advertises new path (or withdraws old) – KEEPALIVE keeps connection alive in absence of UPDATES; also ACKs OPEN request – NOTIFICATION: reports errors in previous msg; also used to close connection BGP routing policy legend: B W provider network X A customer network: C Y Figure 4.5-BGPnew: a simple BGP scenario • A,B,C are provider networks • X,W,Y are customer (of provider networks) • X is dual-homed: attached to two networks – X does not want to route from B via X to C – .. so X will not advertise to B a route to C BGP routing policy (2) legend: B W provider network X A customer network: C Y Figure 4.5-BGPnew: a simple BGP scenario • A advertises to B the path AW • B advertises to X the path BAW • Should B advertise to C the path BAW? – No way! B gets no “revenue” for routing CBAW since neither W nor C are B’s customers – B wants to force C to route to w via A – B wants to route only to/from its customers! Why different Intra- and Inter-AS routing ? Policy: • Inter-AS: admin wants control over how its traffic routed, who routes through its net. • Intra-AS: single admin, so no policy decisions needed Scale: • hierarchical routing saves table size, reduced update traffic Performance: • Intra-AS: can focus on performance • Inter-AS: policy may dominate over performance Chapter 4: Network Layer • 4. 1 Introduction • 4.2 Virtual circuit and datagram networks • 4.3 What’s inside a router • 4.4 IP: Internet Protocol – – – – Datagram format IPv4 addressing ICMP IPv6 • 4.5 Routing algorithms – Link state – Distance Vector – Hierarchical routing • 4.6 Routing in the Internet – RIP – OSPF – BGP • 4.7 Broadcast and multicast routing Broadcast Routing • Deliver packets from source to all other nodes • Source duplication is inefficient: duplicate duplicate creation/transmission R1 R1 duplicate R2 R2 R3 R4 source duplication R3 R4 in-network duplication • Source duplication: how does source determine recipient addresses? In-network duplication • Flooding: when node receives brdcst pckt, sends copy to all neighbors – Problems: cycles & broadcast storm • Controlled flooding: node only brdcsts pkt if it hasn’t brdcst same packet before – Node keeps track of pckt ids already brdcsted – Or reverse path forwarding (RPF): only forward pckt if it arrived on shortest path between node and source • Spanning tree – No redundant packets received by any node Spanning Tree • First construct a spanning tree • Nodes forward copies only along spanning tree A B c F A E B c D F G (a) Broadcast initiated at A E D G (b) Broadcast initiated at D Spanning Tree: Creation • Center node • Each node sends unicast join message to center node – Message forwarded until it arrives at a node already belonging to spanning tree A A 3 B c 4 E F 1 2 B c D F 5 E D G G (a) Stepwise construction of spanning tree (b) Constructed spanning tree Multicast Routing: Problem Statement • Goal: find a tree (or trees) connecting routers having local mcast group members – tree: not all paths between routers used – source-based: different tree from each sender to rcvrs – shared-tree: same tree used by all group members Shared tree Source-based trees Multicast: one sender to many receivers • Multicast: act of sending datagram to multiple receivers with single “transmit” operation – analogy: one teacher to many students • Question: how to achieve multicast Multicast via unicast • source sends N unicast datagrams, one addressed to each of N receivers routers forward unicast datagrams multicast receiver (red) not a multicast receiver (red) Multicast: one sender to many receivers • Multicast: act of sending datagram to multiple receivers with single “transmit” operation – analogy: one teacher to many students • Question: how to achieve multicast Network multicast Multicast routers (red) duplicate and forward multicast datagrams • Router actively participate in multicast, making copies of packets as needed and forwarding towards multicast receivers Multicast: one sender to many receivers • Multicast: act of sending datagram to multiple receivers with single “transmit” operation – analogy: one teacher to many students • Question: how to achieve multicast Application-layer multicast • end systems involved in multicast copy and forward unicast datagrams among themselves Internet Multicast Service Model 128.59.16.12 128.119.40.186 multicast group 226.17.30.197 128.34.108.63 128.34.108.60 multicast group concept: use of indirection – hosts addresses IP datagram to multicast group – routers forward multicast datagrams to hosts that have “joined” that multicast group Multicast groups class D Internet addresses reserved for multicast: host group semantics: o anyone can “join” (receive) multicast group o anyone can send to multicast group o no network-layer identification to hosts of members needed: infrastructure to deliver mcast-addressed datagrams to all hosts that have joined that multicast group Joining a mcast group: two-step process • local: host informs local mcast router of desire to join group: IGMP (Internet Group Management Protocol) • wide area: local router interacts with other routers to receive mcast datagram flow – many protocols (e.g., DVMRP, MOSPF, PIM) IGMP IGMP wide-area multicast routing IGMP IGMP: Internet Group Management Protocol • host: sends IGMP report when application joins mcast group – IP_ADD_MEMBERSHIP socket option – host need not explicitly “unjoin” group when leaving • router: sends IGMP query at regular intervals – host belonging to a mcast group must reply to query query report IGMP IGMP version 1 • router: Host Membership Query msg broadcast on LAN to all hosts • host: Host Membership Report msg to indicate group membership – randomized delay before responding – implicit leave via no reply to Query • RFC 1112 IGMP v2: additions include • group-specific Query • Leave Group msg – last host replying to Query can send explicit Leave Group msg – router performs groupspecific query to see if any hosts left in group – RFC 2236 IGMP v3: under development as Internet draft Approaches for building mcast trees Approaches: • source-based tree: one tree per source – shortest path trees – reverse path forwarding • group-shared tree: group uses one tree – minimal spanning (Steiner) – center-based trees …we first look at basic approaches, then specific protocols adopting these approaches Shortest Path Tree • mcast forwarding tree: tree of shortest path routes from source to all receivers – Dijkstra’s algorithm S: source LEGEND R1 1 2 R4 R2 3 R3 router with attached group member 5 4 R6 router with no attached group member R5 6 R7 i link used for forwarding, i indicates order link added by algorithm Reverse Path Forwarding rely on router’s knowledge of unicast shortest path from it to sender each router has simple forwarding behavior: if (mcast datagram received on incoming link on shortest path back to center) then flood datagram onto all outgoing links else ignore datagram Reverse Path Forwarding: example S: source LEGEND R1 R4 router with attached group member R2 R5 R3 R6 R7 router with no attached group member datagram will be forwarded datagram will not be forwarded • result is a source-specific reverse SPT – may be a bad choice with asymmetric links Reverse Path Forwarding: pruning • forwarding tree contains subtrees with no mcast group members – no need to forward datagrams down subtree – “prune” msgs sent upstream by router with no downstream group members LEGEND S: source R1 router with attached group member R4 R2 P R5 R3 R6 P R7 P router with no attached group member prune message links with multicast forwarding Shared-Tree: Steiner Tree • Steiner Tree: minimum cost tree connecting all routers with attached group members • problem is NP-complete • excellent heuristics exists • not used in practice: – computational complexity – information about entire network needed – monolithic: rerun whenever a router needs to join/leave Center-based trees • single delivery tree shared by all • one router identified as “center” of tree • to join: – edge router sends unicast join-msg addressed to center router – join-msg “processed” by intermediate routers and forwarded towards center – join-msg either hits existing tree branch for this center, or arrives at center – path taken by join-msg becomes new branch of tree for this router Center-based trees: an example Suppose R6 chosen as center: LEGEND R1 3 R2 router with attached group member R4 2 R5 R3 1 R6 R7 1 router with no attached group member path order in which join messages generated Internet Multicasting Routing: DVMRP • DVMRP: distance vector multicast routing protocol, RFC1075 • flood and prune: reverse path forwarding, source-based tree – RPF tree based on DVMRP’s own routing tables constructed by communicating DVMRP routers – no assumptions about underlying unicast – initial datagram to mcast group flooded everywhere via RPF – routers not wanting group: send upstream prune msgs DVMRP: continued… • soft state: DVMRP router periodically (2 hours) “forgets” branches are pruned: – mcast data again flows down unpruned branch – downstream router: reprune or else continue to receive data • routers can quickly regraft to tree – following IGMP join at leaf • odds and ends – commonly implemented in commercial routers – Mbone routing done using DVMRP Tunneling Q: How to connect “islands” of multicast routers in a “sea” of unicast routers? physical topology logical topology mcast datagram encapsulated inside “normal” (non-multicast- addressed) datagram normal IP datagram sent thru “tunnel” via regular IP unicast to receiving mcast router receiving mcast router unencapsulates to get mcast datagram PIM: Protocol Independent Multicast • not dependent on any specific underlying unicast routing algorithm (works with all) • two different multicast distribution scenarios : Dense: Sparse: group members densely packed, in “close” proximity. bandwidth more plentiful # networks with group members small wrt # interconnected networks group members “widely dispersed” bandwidth not plentiful Consequences of Sparse-Dense Dichotomy: Dense Sparse: • group membership by • no membership until routers assumed until routers explicitly join routers explicitly prune • receiver- driven • data-driven construction of mcast construction on mcast tree (e.g., centertree (e.g., RPF) based) • bandwidth and non• bandwidth and nongroup-router group-router processing profligate processing conservative PIM- Dense Mode flood-and-prune RPF, similar to DVMRP but underlying unicast protocol provides RPF info for incoming datagram less complicated (less efficient) downstream flood than DVMRP reduces reliance on underlying routing algorithm has protocol mechanism for router to detect it is a leaf-node router PIM - Sparse Mode • center-based approach • router sends join msg to rendezvous point (RP) R1 – intermediate routers update state and forward join • after joining via RP, router can switch to sourcespecific tree – increased performance: less concentration, shorter paths R4 join R2 R3 join R5 join R6 all data multicast from rendezvous point R7 rendezvous point PIM - Sparse Mode sender(s): • unicast data to RP, which distributes down RP-rooted tree • RP can extend mcast tree upstream to source • RP can send stop msg if no attached receivers – “no one is listening!” R1 R4 join R2 R3 join R5 join R6 all data multicast from rendezvous point R7 rendezvous point