Internet Routing (COS 598A) Today: Topology Size Jennifer Rexford http://www.cs.princeton.edu/~jrex/teaching/spring2005 Tuesdays/Thursdays 11:00am-12:20pm Outline • BGP distribution inside an AS – Full-mesh iBGP – Route reflectors – Routing anomalies caused by route reflectors – Pros and cons of proposed solutions • IGP topology – OSPF areas – Summarization – Multiple ASes Routers Running eBGP, iBGP, and IGP destination AS A r2 r3 r1 2 r1 AS B 1 r4 2 r2 4 r4 Legend eBGP session iBGP session IGP link Route r1 has closest egress point Roles of eBGP, iBGP, and IGP • eBGP: External BGP – Learn routes from neighboring ASes – Advertise routes to neighboring ASes • iBGP: Internal BGP – Disseminate BGP information within the AS • IGP: Interior Gateway Protocol – Compute shortest paths between routers in AS – Identify closest egress point in BGP path selection Full Mesh iBGP Configuration • Internal BGP session – Forward best BGP route to a neighbor – Do not send from one iBGP neighbor to another • Full-mesh configuration – iBGP session between each pair of routers – Ensures complete visibility of BGP routes Why Do Point-to-Point Internal BGP? • Reusing the BGP protocol – iBGP is really just BGP – … except you don’t add an AS to the AS path – … or export routes between iBGP neighbors • No need to create a second protocol – Another protocol would add complexity • And, full-mesh is workable for many networks – Well, until they get too big… Scalability Limits of Full Mesh on the Routers • Number of iBGP sessions – TCP connection to every other router • Bandwidth for update messages – Every BGP update sent to every other router • Storage for the BGP routing table – Storing many BGP routes per destination prefix • Configuration changes when adding a router – Configuring iBGP session on every other router Route Reflectors • Relax the iBGP propagation rule – Allow sending updates between iBGP neighbors • Route reflector – Receives iBGP updates from neighbors – Send a single BGP route to the clients • Very much like provider, peer, and customer – To client: send all BGP routes – To peer route reflector: send client-learned routes – To route reflector: send all client-learned routes Example: Single Route Reflector r1 r2 r3 r1 r4 r2 1 2 r4 2 4 r2 Router only learns about r2 The Problem About Route Reflectors • Advantage: scalability – Fewer iBGP sessions – Lower bandwidth for update messages – Smaller BGP routing tables – Lower configuration overhead • Disadvantage: changes the answers – Clients only learn a subset of the BGP routes – Does not result is same choices as a full mesh – ... especially if RR sees different IGP distances Routing Anomaly: Forwarding Loop r1 r2 1 r2 r1 1 1 Picks r2 Picks r1 Packet deflected toward other egress point, causing a loop Routing Anomaly: Protocol Oscillation 1 5 r1 1 2 5 1 1 r2 RR1 prefers r2 over r1 RR2 prefers r3 over r2 RR3 prefers r1 over r3 3 5 r3 Avoiding Routing Anomalies • Reduce impact of route reflectors – Ensure route reflector is close to its clients – … so the RR makes consistent decisions • Sufficient conditions for ensuring consistency – RR preferring routes through clients over “peers” – BGP messages should traverse same path as data • Forces a high degree of replication – Many route reflectors in the network – E.g., a route reflector per PoP for correctness – E.g. have a second RR per PoP for reliability http://www.acm.org/sigs/sigcomm/sigcomm2002/papers/ibgp.pdf Possible Solution: Disseminating More Routes • Make route reflectors more verbose – Send all BGP routes to clients, not just best route – Send all equally-good BGP routes (up to IGP cost) r1 r2 r3 r4 • Advantages – Client routers have improved visibility r1 – Make the same decisions r2 as in a full mesh 1 2 • Disadvantages r4 – Higher overhead2for sending and storing routes – Requires protocol changes to send 4 r1, r2, r4 multiple routes – Not backwards compatible with legacy routers http://www.acm.org/sigs/sigcomm/sigcomm2002/papers/bgposci.pdf Possible Solution: Customized Dissemination • Make route reflector more intelligent – Send customized BGP route to each client – Tell each client what he would pick himself r1 r2 r3 r4 • Advantages – Make the same decisions as in a full mesh r1 r2 – Remain compatible with legacy routers 1 2 • Disadvantages r4 – Intelligent RR must 2 make decisions per client – … and select closest egress from each viewpoint 4 r1 http://www.rnp.br/ietf/internet-drafts/draft-bonaventure-bgp-route-reflectors-00.txt http://www.cs.princeton.edu/~jrex/papers/rcp-nsdi.pdf Possible Solution: Multicast/Flooding • Replace point-to-point distribution – Apply a multicast protocol to distribute messages – Or, flood the BGP messages to all routers • Advantages – Complete distribution without route reflectors – Avoids configuration overhead of a full mesh • Disadvantages – Requires an additional, new protocol – Not backwards-compatible with legacy routers – Large BGP routing tables, like in a full mesh http://www.nanog.org/mtg-0302/ppt/van.pdf Possible Solution: Tunnel Between Edge Routers • Tunneling through the core – Ingress router selects ingress point – Other routers blindly forward to the egress • Advantages r1 r2 – No risk of forwarding loops – No BGP running on interior routers • Disadvantages – Overhead of1tunneling protocol/technology r1 1 r2 tunnel tunnel – Still has a risk of protocol oscillations 1 State-of-the-Art of BGP Distribution in an AS • When full-mesh doesn’t scale – Hierarchical route-reflector configuration • One or two route reflectors per PoP – Some networks use “confederations” (mini ASes) • Recent ideas – Sufficient conditions to avoid anomalies – Enhanced RRs sending multiple or custom routes – Flooding/multicast of BGP updates – Tunneling to avoid packet deflections • Open questions – Are the sufficient conditions too restrictive? – Good comparison of the various approaches IGP Topology Interior Gateway Protocols (IGPs) • Protocol overhead depends on the topology – Bandwidth: flooding of link state advertisements – Memory: storing the link-state database – Processing: computing the shortest paths 2 3 2 1 1 1 3 5 4 3 Improving the Scaling • Dijkstra’s shortest-path algorithm – Simplest version: O(N2), where N is # of nodes – Better algorithms: O(L*log(N)), where L is # links – Incremental algorithms: great for small changes • Timers to pace operations – Minimum time between LSAs for the same link – Minimum time between path computations • More resources on the routers – Routers with more CPU and memory Introducing Hierarchy: OSPF Areas • Divide network into regions – Backbone (area 0) and non-backbone areas – Each area has its own link-state database – Advertise only path distances at area boundaries Area 2 cost 3 Area 1 2 3 area border router 2 Area 3 1 Area 0 1 3 5 4 3 1 To area 0 cost 8 Area 4 Summarization at Area Boundaries • Areas only help so much – Advertising path costs to reach each component – Single link failure may change multiple path costs • Summarization: LSA for multiple components – LSA for an IP prefix containing the addresses – LSA carries cost for the maximum path cost 2 3 2 1 1 1 3 5 4 3 To area 0 cost 8 Assigning OSPF Areas • Group related routers – E.g., in a Point-of-Presence – Assign to single OSPF area Inter-PoP – Put inter-PoP links in area 0 Intra-PoP • Enable summarization – Select an address block for the equipment in the area – Assign IP addresses in the block to router CPUs and interfaces Other networks Pros and Cons of Summarization • Advantages: scalability – Reduce the size of the link-state database • One entry per summary prefix – Isolate the rest of the network from changes • Only advertise when max path cost changes • Disadvantages – Complexity • Extra configuration details for areas & summarization • Requires tight coupling with IP address assignment – Inefficiency • Summarization hides details that affect path selection • Data packets may traverse a less-attractive path Dividing into Multiple ASes • Divide the network into regions – Separate instance of IGP per region – Interdomain routing between regions – Loss of visibility into differences within region 50 100 20 20 100 North America 20 50 50 100 50 20 20 100 50 Europe 100 50 20 100 Asia Multi-AS Networks, Not Just for Scalability • Administrative reasons – Separate networks per geographic region – Mergers/acquisitions that combine networks • Why not merge to single AS? – – – – Using different intradomain protocols Managed by different people Fear of encountering scalability problems Fear of losing the benefits of isolation • Why merge to a single AS? – Simpler configuration – More efficient routing – Avoid having separate AS hop in BGP AS paths Which Approach is Better? • Ideal: flat IGP network – Single AS – Single IGP instance, no areas • Hierarchical IGP – Single AS – Single IGP instance, using areas & summarization • Multiple ASes – Multiple ASes – Separate IGP instances • Some other approach??? Comparison Metrics • Scalability – Protocol overhead • Storing and flooding link-state advertisements • Overhead of Dijkstra shortest-path computation – Effects of topology changes • Number of advertisements after a change • Likelihood a change must be propagated • Efficiency – Stretch: comparing path lengths • In ideal flat intradomain routing • In alternative scheme – How much longer do the paths get? Interesting Research Questions • Routing protocols that achieve small stretch – Theory work on algorithms to minimize stretch – Protocol work on hierarchy and aggregation – Any new distributed protocols with low stretch? – Avoid sharp boundaries between areas/ASes? • Identifying good places to hide information – Given a network graph with link weights – Decide where to put area and AS boundaries – … with the goal of minimizing stretch – … within some max size of each area or AS Conclusion • Networks are getting bigger – Growth of a network topology – Merger/acquisition of other networks • Techniques for scaling the routing design – BGP route reflection – OSPF areas – Multiple BGP ASes • Relatively open research area – Rich theoretical tradition on compact routing – Common operational practices for protocol scaling – Not much work has been done in between Next Time: Router Configuration • Two papers – “Automated provisioning of BGP customers” (just sections 1-3) – “Detecting BGP faults with static analysis” • Review only of second paper – Summary – Why accept – Why reject – Future work • Optional – Short survey on BGP routing policies for ISPs – NANOG video covering material in second paper