COMS W4995-1 Lecture 6 Dynamic routing protocols II 1. Dynamic Routing Protocols: Link State Routing 2. Intra-Domain Routing Protocols: OSPF & BGP Dynamic Routing Protocols Link State Routing The Gang of Four Link State IGP EGP OSPF IS-IS Vectoring RIP BGP Link State Routing Based on Dijkstra’ s Shortest-Path-First algorithm. Each router starts by knowing: Each router advertises to the entire network (flooding): Prefixes of its attached networks. Links to its neighbors. Prefixes of its directly connected networks. Active links to its neighbors. Each router learns: A complete topology of the network (routers, links). Each router computes shortest path to each destination. In a stable situation, all routers have the same graph, and compute the same paths. Dijkstra’s Shortest Path Algorithm for a Graph Input: Graph (N,E) with N the set of nodes and E the set of edges cvw link cost (cvw = 1 if (v,w) E, cvv = 0) s source node. Output: Dn cost of the least-cost path from node s to node n M = {s}; for each n M Dn = csn; while (M all nodes) do Find w M for which Dw = min{Dj ; j M}; Add w to M; for each neighbor n of w and n M Dn = min[ Dn, Dw + cwn ]; Update route; end for end while end for Link state routing: graphical illustration Global view: b 3 a a’s view: 3 a d b 6 b d’s view: c c 1 a c’s view: 2 c 6 a b’s view: 3 1 2 d c b 1 c 2 d 6 Collecting all views yield a global & complete view of the network! Operation of a Link State Routing protocol Received LSAs Link State Database Dijkstra’s Algorithm LSAs are flooded to other interfaces IP Routing Table Link State Routing: Properties Each node requires complete topology information Link state information must be flooded to all nodes Guaranteed to converge Distance Vector vs. Link State Routing With distance vector routing, each node has information only about the next hop: Node A: to reach F go to B Node B: to reach F go to D Node D: to reach F go to E Node E: go directly to F Distance vector routing makes poor routing decisions if directions are not completely correct (e.g., because a node is down). A B C D E If parts of the directions incorrect, the routing may be incorrect until the routing algorithms has re-converged. F Distance Vector vs. Link State Routing In link state routing, each node has a complete map of the topology A If a node fails, each node can calculate the new route B C D E A F A Difficulty: All nodes need to have a consistent view of the network A B C D E B C D E A F B C D E B C D E A F B C D E F F A F B C D E F Distance Vector vs. Link State Routing Link State • • • • • • Vectoring Topology information is flooded within the routing domain Best end-to-end paths are computed locally at each router. Best end-to-end paths determine next-hops. • Based on minimizing some notion of distance Works only if policy is shared and uniform Examples: OSPF, IS-IS • • • • • Each router knows little about network topology Only best next-hops are chosen by each router for each destination network. Best end-to-end paths result from composition of all nexthop choices Does not require any notion of distance Does not require uniform policies at all routers Examples: RIP, BGP Dynamic Routing Protocols Open Shortest Path First OSPF OSPF = Open Shortest Path First The OSPF routing protocol is the most important link state routing protocol on the Internet (another link state routing protocol is IS-IS (intermediate system to intermediate system) The complexity of OSPF is significant RIP (RFC 2453 ~ 40 pages) OSPF (RFC 2328 ~ 250 pages) History: 1989: RFC 1131 OSPF Version 1 1991: RFC1247 OSPF Version 2 1994: RFC 1583 OSPF Version 2 (revised) 1997: RFC 2178 OSPF Version 2 (revised) 1998: RFC 2328 OSPF Version 2 (current version) Features of OSPF Provides authentication of routing messages Enables load balancing by allowing traffic to be split evenly across routes with equal cost Type-of-Service routing allows to setup different routes dependent on the TOS field Supports subnetting Supports multicasting Allows hierarchical routing Hierarchical OSPF Hierarchical OSPF Two-level hierarchy: local area, backbone. Link-state advertisements only in area each nodes has detailed area topology; only know direction (shortest path) to nets in other areas. Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers. Backbone routers: run OSPF routing limited to backbone. Example Network 10.1.1.2 .1 4 .2 .2 3 2 • Metric is in the range [0 , 216] • Metric can be asymmetric 3 .6 1 .5 .3 5 .6 10.1.7.0 / 24 .4 .3 .3 1 .4 .2 .5 .5 10.1.5.0/24 10.1.2.3 • Link costs are called Metric .4 10.1.4.0 / 24 10.1.1.0 / 24 Router IDs can be selected independent of interface addresses, but usually chosen to be the smallest interface address 2 10.1.3.0 / 24 .1 10.1.7.6 10.1.4.4 10.1.6.0 / 24 10.1.1.1 10.1.5.5 Link State Advertisement (LSA) 10.1.1.1 10.1.1.2 10.1.1.0 / 24 3 2 .2 .2 .4 .4 .4 .3 .5 .3 .3 .5 10.1.5.5 The LSA of router 10.1.1.1 is as follows: Link State ID: 10.1.1.1 Advertising Router: Number of links: 10.1.1.1 = Router ID 3 = 2 links plus router itself Description of Link 1: Description of Link 2: Description of Link 3: Link ID = 10.1.1.2, Metric = 4 Link ID = 10.1.2.2, Metric = 3 Link ID = 10.1.1.1, Metric = 0 .6 .5 10.1.5.0/24 10.1.2.3 .6 10.1.7.0 / 24 10.1.4.0 / 24 10.1.6.0 / 24 .1 .2 10.1.3.0 / 24 4 .1 10.1.7.6 10.1.4.4 = Router ID Network and Link State Database 10.1.1.1 10.1.1.0 / 24 Each router has a database which contains the LSAs from all other routers .2 .2 .4 10.1.4.0 / 24 .4 .4 .3 .5 .3 .3 .6 10.1.7.0 / 24 10.1.6.0 / 24 .1 .2 10.1.7.6 10.1.4.4 10.1.3.0 / 24 .1 10.1.1.2 .6 .5 .5 10.1.5.0/24 10.1.5.5 10.1.2.3 LS Type Link StateID Adv. Router Checksum LS SeqNo LS Age Router-LSA 10.1.1.1 10.1.1.1 0x9b47 0x80000006 0 Router-LSA 10.1.1.2 10.1.1.2 0x219e 0x80000007 1618 Router-LSA 10.1.2.3 10.1.2.3 0x6b53 0x80000003 1712 Router-LSA 10.1.4.4 10.1.4.4 0xe39a 0x8000003a 20 Router-LSA 10.1.5.5 10.1.5.5 0xd2a6 0x80000038 18 Router-LSA 10.1.7.6 10.1.7.6 0x05c3 0x80000005 1680 Link State Database The collection of all LSAs is called the link-state database Each router has an identical link-state database Useful for debugging: Each router has a complete description of the network If neighboring routers discover each other for the first time, they will exchange their link-state databases The link-state databases are synchronized using reliable flooding OSPF Packet Format OSPF Message IP header OSPF Message Header OSPF packets are not carried as UDP payload! OSPF has its own IP protocol number: 89 TTL: set to 1 (in most cases) Body of OSPF Message Message Type Specific Data LSA LSA Header Destination IP: neighbor’s IP address or 224.0.0.5 (ALLSPFRouters) or 224.0.0.6 (AllDRouters) LSA LSA Data ... ... LSA OSPF Packet Format OSPF Message Header 2: current version is OSPF V2 version Message types: 1: Hello (tests reachability) 2: Database description 3: Link Status request 4: Link state update 5: Link state acknowledgement Standard IP checksum taken over entire packet Authentication passwd = 1: Authentication passwd = 2: Body of OSPF Message type message length source router IP address ID of the Area from which the packet originated Area ID checksum authentication type authentication authentication 32 bits 64 cleartext password 0x0000 (16 bits) KeyID (8 bits) Length of MD5 checksum (8 bits) Nondecreasing sequence number (32 bits) 0: no authentication 1: Cleartext password 2: MD5 checksum (added to end packet) Prevents replay attacks OSPF LSA Format LSA Link Age LSA Header LSA Header LSA Data Link Type Link State ID advertising router link sequence number checksum length Link ID Link 1 Link Data Link Type #TOS metrics Metric Link ID Link 2 Link Data Link Type #TOS metrics Metric Discovery of Neighbors Routers multicasts OSPF Hello packets on all OSPFenabled interfaces. If two routers share a link, they can become neighbors, and establish an adjacency 10.1.10.1 10.1.10.2 Scenario: Router 10.1.10.2 restarts OSPF Hello OSPF Hello: I heard 10.1.10.2 After becoming a neighbor, routers exchange their link state databases Neighbor discovery and database synchronization Scenario: Router 10.1.10.2 restarts Discovery of adjacency 10.1.10.1 10.1.10.2 OSPF Hello OSPF Hello: I heard 10.1.10.2 After neighbors are discovered the nodes exchange their databases Database Description: Sequence = X Sends database description. (description only contains LSA headers) Acknowledges receipt of description Database Description: Sequence = X, 5 LSA headers = Router-LSA, 10.1.10.1, 0x80000006 Router-LSA, 10.1.10.2, 0x80000007 Router-LSA, 10.1.10.3, 0x80000003 Router-LSA, 10.1.10.4, 0x8000003a Router-LSA, 10.1.10.5, 0x80000038 Router-LSA, 10.1.10.6, 0x80000005 Database Description: Sequence = X+1, 1 LSA header= Router-LSA, 10.1.10.2, 0x80000005 Database Description: Sequence = X+1 Sends empty database description Database description of 10.1.10.2 Regular LSA exchanges 10.1.10.1 Link State Request packets, LSAs = Router-LSA, 10.1.10.1, Router-LSA, 10.1.10.2, Router-LSA, 10.1.10.3, Router-LSA, 10.1.10.4, Router-LSA, 10.1.10.5, Router-LSA, 10.1.10.6, 10.1.10.1 sends requested LSAs Link State Update Packet, LSAs = Router-LSA, 10.1.10.1, 0x80000006 Router-LSA, 10.1.10.2, 0x80000007 Router-LSA, 10.1.10.3, 0x80000003 Router-LSA, 10.1.10.4, 0x8000003a Router-LSA, 10.1.10.5, 0x80000038 Router-LSA, 10.1.10.6, 0x80000005 10.1.10.2 10.1.10.2 explicitly requests each LSA from 10.1.10.1 Dissemination of LSA-Update A router sends and refloods LSA-Updates, whenever the topology or link cost changes. (If a received LSA does not contain new information, the router will not flood the packet) Exception: Infrequently (every 30 minutes), a router will flood LSAs even if there are not new changes. Acknowledgements of LSA-updates: explicit ACK, or implicit via reception of an LSA-Update Question: If a new node comes up, it could build the database from regular LSA-Updates (rather than exchange of database description). What role do the database description packets play? Dynamic Routing Protocols (Inter-domain) Border Gateway Protocol BGP Quick View BGP = Border Gateway Protocol . Currently in version 4, specified in RFC 1771. (~ 60 pages) Note: In the context of BGP, a gateway is nothing else but an IP router that connects autonomous systems. Interdomain routing protocol for routing between autonomous systems Uses TCP to establish a BGP session and to send routing messages over the BGP session BGP is a path vector protocol. Routing messages in BGP contain complete routes. Network administrators can specify routing policies BGP Policy-based Routing Each node is assigned an AS number (ASN) BGP’s goal is to find any AS-path (not an optimal one). Since the internals of the AS are never revealed, finding an optimal path is not feasible. Network administrator sets BGP’s policies to determine the best path to reach a destination network. How Many ASNs are there today? 20,570 14,588 origin only (no transit) Thanks to Geoff Huston. http://bgp.potaroo.net on October 9, 2005 ARDs versus ASes Autonomous Routing Domains Don’t Always Need BGP or an ASN Qwest Nail up routes 130.132.0.0/16 pointing to Yale Nail up default routes 0.0.0.0/0 pointing to Qwest Yale University 130.132.0.0/16 Static routing is the most common way of connecting an autonomous routing domain to the Internet. This helps explain why BGP is a mystery to many … ASNs Can Be “Shared” (RFC 2270) AS 701 UUNet AS 7046 Crestar Bank AS 7046 NJIT AS 7046 Hood College 128.235.0.0/16 ASN 7046 is assigned to UUNet. It is used by Customers single homed to UUNet, but needing BGP for some reason (load balancing, etc..) [RFC 2270] ARDs and ASes: Summary Most ARDs have no ASN (statically routed at Internet edge) Some unrelated ARDs share the same ASN (RFC 2270) Some ARDs are implemented with multiple ASNs (example: Worldcom) ASes are just an implementation detail of Inter-domain routing How many prefixes today? 221,002 IPv4 Address space covered 33.3% 23% Thanks to Geoff Huston. http://bgp.potaroo.net on October 9, 2005 Policy-Based vs. Distance-Based Routing? Minimizing “hop count” can violate commercial relationships that constrain interdomain routing. Host 1 Cust1 YES ISP1 NO ISP3 ISP2 Cust3 Host 2 Cust2 Thanks to Tim Griffin http://www.cl.cam.ac.uk/users/tgg22 Customer versus Provider provider provider customer IP traffic customer Customer pays provider for access to the Internet Why not minimize “AS hop Count”? National ISP1 National ISP2 YES NO Regional ISP3 Cust3 Regional ISP2 Cust2 Regional ISP1 Cust1 Shortest path routing is not compatible with commercial relations The “Peering” Relationship peer provider peer customer Peers provide transit between their respective customers Peers do not provide transit between peers traffic allowed traffic NOT allowed Peers (often) do not exchange $$$ Peering Provides Shortcuts Peering also allows connectivity between the customers of “Tier 1” providers. peer provider peer customer Peering Wars Peer Reduces upstream transit costs Can increase end-to-end performance May be the only way to connect your customers to some part of the Internet (“Tier 1”) Don’t Peer You would rather have customers Peers are usually your competition Peering relationships may require periodic renegotiation Peering struggles are by far the most contentious issues in the ISP world! Peering agreements are often confidential. The Border Gateway Protocol (BGP) BGP = + RFC 1771 “optional” extensions RFC 1997 (communities) RFC 2439 (damping) RFC 2796 (reflection) RFC3065 (confederation) … + routing policy configuration languages (vendor-specific) + Current Best Practices in management of Interdomain Routing BGP was not DESIGNED. It EVOLVED. BGP Route Processing Open ended programming. Constrained only by vendor configuration language Receive Apply Policy = filter routes & BGP Updates tweak attributes Apply Import Policies Based on Attribute Values Best Routes Best Route Selection Best Route Table Apply Policy = filter routes & tweak attributes Apply Export Policies Install forwarding Entries for best Routes. IP Forwarding Table Transmit BGP Updates BGP Attributes Value ----1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 ... 255 Code --------------------------------ORIGIN AS_PATH NEXT_HOP MULTI_EXIT_DISC LOCAL_PREF ATOMIC_AGGREGATE AGGREGATOR COMMUNITY ORIGINATOR_ID CLUSTER_LIST DPA ADVERTISER RCID_PATH / CLUSTER_ID MP_REACH_NLRI MP_UNREACH_NLRI EXTENDED COMMUNITIES Reference --------[RFC1771] [RFC1771] [RFC1771] [RFC1771] [RFC1771] [RFC1771] [RFC1771] [RFC1997] [RFC2796] [RFC2796] [Chen] [RFC1863] [RFC1863] [RFC2283] [RFC2283] [Rosen] Most important attributes reserved for development From IANA: http://www.iana.org/assignments/bgp-parameters Not all attributes need to be present in every announcement ASPATH Attribute AS 1129 135.207.0.0/16 AS Path = 1755 1239 7018 6341 135.207.0.0/16 AS Path = 1239 7018 6341 AS 1239 Sprint AS 1755 Ebon e AS 6341 AT&T Research 135.207.0.0/16 Prefix Originated 135.207.0.0/16 AS Path = 1129 1755 1239 7018 6341 AS 12654 RIPE NCC RIS project 135.207.0.0/16 AS Path = 7018 6341 AS7018 135.207.0.0/16 AS Path = 6341 Global Access 135.207.0.0/16 AS Path = 3549 7018 6341 AT&T 135.207.0.0/16 AS Path = 7018 6341 AS 3549 Global Crossing Shorter Doesn’t Always Mean Shorter In fairness: could you do this “right” and still scale? Mr. BGP says that path 4 1 is better than path 3 2 1 Duh! AS 4 AS 3 Exporting internal state would dramatically increase global instability and amount of routing state AS 2 AS 1 Routing Example 1 Thanks to Han Zheng Routing Example 2 Thanks to Han Zheng Tweak Tweak Tweak (TE) For inbound traffic Filter outbound routes Tweak attributes on outbound routes in the hope of influencing your neighbor’s best route selection inbound traffic outbound routes For outbound traffic Filter inbound routes Tweak attributes on inbound routes to influence best route selection outbound traffic inbound routes In general, an AS has more control over outbound traffic Backup Links with Local Preference (Outbound Traffic) AS 1 primary link Set Local Pref = 100 for all routes from AS 1 backup link AS 65000 Set Local Pref = 50 for all routes from AS 1 Forces outbound traffic to take primary link, unless link is down. Multihomed Backups (Outbound Traffic) AS 1 AS 3 provider provider primary link backup link Set Local Pref = 100 for all routes from AS 1 Set Local Pref = 50 for all routes from AS 3 AS 2 Forces outbound traffic to take primary link, unless link is down. Shedding Inbound Traffic with ASPATH Prepending AS 1 Prepending will (usually) force inbound traffic from AS 1 to take primary link provider 192.0.2.0/24 ASPATH = 2 2 2 192.0.2.0/24 ASPATH = 2 primary backup customer AS 2 192.0.2.0/24 Yes, this is a Glorious Hack … … But Padding Does Not Always Work AS 1 AS 3 provider provider 192.0.2.0/24 ASPATH = 2 192.0.2.0/24 ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 primary backup customer AS 2 192.0.2.0/24 AS 3 will send traffic on “backup” link because it prefers customer routes and local preference is considered before ASPATH length! Padding in this way is often used as a form of load balancing COMMUNITY Attribute to the Rescue! AS 1 AS 3 provider provider AS 3: normal customer local pref is 100, peer local pref is 90 192.0.2.0/24 ASPATH = 2 COMMUNITY = 3:70 192.0.2.0/24 ASPATH = 2 primary backup customer AS 2 192.0.2.0/24 Customer import policy at AS 3: If 3:90 in COMMUNITY then set local preference to 90 If 3:80 in COMMUNITY then set local preference to 80 If 3:70 in COMMUNITY then set local preference to 70 BGP Issues - What is a BGP Wedgie? BGP ¾ wedgie Full wedgie policies make sense locally Interaction of local policies allows multiple stable routings Some routings are consistent with intended policies, and some are not If an unintended routing is installed (BGP is “wedged”), then manual intervention is needed to change to an intended routing When an unintended routing is installed, no single group of network operators has enough knowledge to debug the problem Dynamic Routing Protocols: Summary Dynamic routing protocols: RIP, OSPF, BGP RIP uses distance vector algorithm, and converges slow (the count-to-infinity problem) OSPF uses link state algorithm, and converges fast. But it is more complicated than RIP. Both RIP and OSPF finds lowest-cost path. BGP uses path vector algorithm, and its path selection algorithm is complicated, and is influenced by policies. BGP has its own problems see WIDGI by Tim Griffin More Readings (Optional) BGP Wedgies: Bad Routing Policy Interactions that Cannot be Debugged JI’s Intro to interdomain routing. "Interdomain Setting of PlanetLab Nodes." PlanetLab Meeting, May 14, 2004. Understanding the Border Gateway Protocol (BGP) ICNP 2002 Tutorial Session References – – [VGE1996, VGE2000] Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan, and Deborah Estrin. Computer Networks, Jan. 2000. (Also USC Tech Report, Feb. 1996) [GW1999] An Analysis of BGP Convergence Properties. Timothy G. Griffin, Gordon Wilfong. SIGCOMM 1999 [GSW1999] Policy Disputes in Path Vector Protocols. Timothy G. Griffin, F. Bruce Shepherd, Gordon Wilfong. ICNP 1999 [GW2001] A Safe Path Vector Protocol. Timothy G. Griffin, Gordon Wilfong. INFOCOM 2001 [GR2000] Stable Internet Routing without Global Coordination. Lixin Gao, Jennifer Rexford. SIGMETRICS 2000 [GGR2001] Inherently safe backup routing with BGP. Lixin Gao, Timothy G. Griffin, Jennifer Rexford. INFOCOM 2001 [GW2002a] On the Correctness of IBGP Configurations. Griffin and Wilfong.SIGCOMM 2002. [GW2002b] An Analysis of the MED oscillation Problem. Griffin and Wilfong. ICNP 2002.