Internet Routing (COS 598A)
Today: BGP Routing Table Size

Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm

Outline
• IP prefixes
  – Review of CIDR and hierarchical allocation
  – Resource constraints on IP routers
  – Impact of increasing number of prefixes
• Growth in BGP routing table size
  – Growth of global prefixes over time
  – Characterizing the causes of growth
• Limiting the number of prefixes
  – Techniques for limiting the size
  – Fundamental challenges of limiting size

Classless InterDomain Routing (CIDR)
Use two 32-bit numbers to represent a network.
Network number = IP address + Mask

IP Address: 12.4.0.0      00001100 00000100 00000000 00000000
IP Mask:    255.254.0.0   11111111 11111110 00000000 00000000
            (first 15 bits: network prefix; remaining 17 bits: for hosts)

Usually written as 12.4.0.0/15

Hierarchy in Allocating Address Blocks
• Prefixes are key to Internet scalability
  – Address allocation by ARIN/RIPE/APNIC and by ISPs
  – Routing protocols and packet forwarding based on prefixes
  – Today, routing tables contain ~150,000-200,000 prefixes
[Figure: 12.0.0.0/8 divided into /16 blocks (12.0.0.0/16, 12.1.0.0/16, 12.2.0.0/16, 12.3.0.0/16, ..., 12.253.0.0/16, 12.254.0.0/16); 12.3.0.0/16 further divided into /24 blocks (12.3.0.0/24, 12.3.1.0/24, ..., 12.3.254.0/24); 12.253.0.0/16 divided into /19 blocks (12.253.0.0/19, 12.253.32.0/19, 12.253.64.0/19, 12.253.96.0/19, 12.253.128.0/19, 12.253.160.0/19, 12.253.192.0/19, ...)]

Resource Constraints on a High-End Router
[Figure: a processor, which stores the routing table and processes routing protocol messages, and several line cards, which store the forwarding table and forward data packets, all connected by a switching fabric]

Routing Information Base (RIB)
• Routing table for the routing protocol
  – E.g., BGP routes learned from each neighbor
  – Typically managed in software in router CPU
• Factors affecting RIB size
  – Number of destination prefixes
  – Number of BGP routes per prefix
  – Size of each route (e.g., BGP attributes)
• Impact of a large RIB
  – Higher delay to index or scan the table
  – Ungraceful reaction to table overflow

Ungraceful Overload Behavior in BGP
• BGP is an incremental protocol
  – Announcement when new route available
  – Withdrawal when route no longer available
  – No messages when nothing is changing
• Cannot discard or delete state
  – … because you won’t receive the message again
  – When table is full, router must drop session(s)
• Router reaction in practice may be worse
  – E.g., drop all BGP sessions and reestablish
  – E.g., interface lock-up till router is rebooted
  – Reactions place heavy BGP load on neighbors

Forwarding Information Base
• Forwarding tables in IP routers
  – Maps each IP prefix to next-hop link(s)
  – Longest prefix match look-up for data packets
  – Hardware on line card in high-end routers
• Impact of a large FIB
  – Higher delay to construct/update the table
  – Higher delay for packet lookup
  – Incomplete table or router crash on overflow
[Figure: forwarding table (FIB) with prefixes 4.0.0.0/8, 4.83.128.0/17, 12.0.0.0/8, 12.34.158.0/24, and 126.255.103.0/24; a packet to destination 12.34.158.5 matches 12.34.158.0/24 as its longest prefix and leaves on outgoing link Serial0/0.1]

Impact of Table Size: Message Overhead
• More BGP update messages
  – More prefixes means more update messages
  – … and more bandwidth and CPU consumption
  – … and longer delays for bringing up a session
• More BGP route flapping
  – More likely to have one or more flapping prefixes
  – … which consumes even more resources
  – … and makes the routing system less stable

Growth in BGP Routing Table Size
http://www.cisco.com/en/US/about/ac123/ac147/ac174/ac176/about_cisco_ipj_archive_article09186a00800c83cc.html
http://www.cs.princeton.edu/~jrex/teaching/spring2005/reading/bu02.pdf

Pre-CIDR (1988-1994): Steep Growth Rate
Growth faster than improvements in equipment capability

CIDR Deployment (1994-1996): Much Flatter
Efforts to aggregate (even decreases after IETF meetings!)
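The CIDR slide and the hierarchy slide both come down to bit arithmetic on 32-bit numbers. The sketch below reproduces them with Python's standard `ipaddress` module (Python 3); the helper names `to_binary`, `to_cidr`, and `split_block` are hypothetical, and only the addresses are taken from the slides.

```python
from ipaddress import IPv4Network
from typing import List


def to_binary(dotted: str) -> str:
    """Render a dotted-quad address or mask as four 8-bit groups."""
    return " ".join(f"{int(octet):08b}" for octet in dotted.split("."))


def to_cidr(address: str, mask: str) -> IPv4Network:
    """Combine an address and a dotted-quad mask into one CIDR prefix.

    IPv4Network accepts "address/netmask" and converts the mask into a
    prefix length; it raises ValueError if any host bits are set.
    """
    return IPv4Network(f"{address}/{mask}")


def split_block(block: str, new_len: int) -> List[IPv4Network]:
    """Carve a block into equal-size sub-blocks, as a registry or ISP might."""
    return list(IPv4Network(block).subnets(new_prefix=new_len))


prefix = to_cidr("12.4.0.0", "255.254.0.0")
print(to_binary("12.4.0.0"))         # 00001100 00000100 00000000 00000000
print(to_binary("255.254.0.0"))      # 11111111 11111110 00000000 00000000
print(prefix, prefix.num_addresses)  # 12.4.0.0/15 131072
print([str(n) for n in split_block("12.253.0.0/16", 19)[:3]])
# ['12.253.0.0/19', '12.253.32.0/19', '12.253.64.0/19']
```

The /15 leaves 17 host bits, hence 2^17 = 131,072 addresses, and a /16 splits into eight /19 blocks, matching the 12.253.x.0/19 column in the hierarchy figure.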
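The longest-prefix-match lookup on the Forwarding Information Base slide can be illustrated with a deliberately naive linear scan. This is a minimal sketch, not how line cards do it (hardware typically uses tries or TCAMs); it does, however, show why lookup cost can grow with table size. Only the prefixes and the 12.34.158.0/24 to Serial0/0.1 pairing follow from the slide; the other interface names are hypothetical placeholders.

```python
from ipaddress import IPv4Address, IPv4Network
from typing import Dict, Optional

# FIB modeled on the slide; Serial0/0.2 through Serial0/0.5 are
# hypothetical interface names added for illustration.
FIB: Dict[IPv4Network, str] = {
    IPv4Network("4.0.0.0/8"): "Serial0/0.2",
    IPv4Network("4.83.128.0/17"): "Serial0/0.3",
    IPv4Network("12.0.0.0/8"): "Serial0/0.4",
    IPv4Network("12.34.158.0/24"): "Serial0/0.1",
    IPv4Network("126.255.103.0/24"): "Serial0/0.5",
}


def longest_prefix_match(fib: Dict[IPv4Network, str],
                         destination: str) -> Optional[str]:
    """Return the outgoing link of the most specific matching prefix."""
    addr = IPv4Address(destination)
    best: Optional[IPv4Network] = None
    for prefix in fib:
        # Keep the matching prefix with the longest mask seen so far.
        if addr in prefix and (best is None or prefix.prefixlen > best.prefixlen):
            best = prefix
    return fib[best] if best is not None else None
```

For example, `longest_prefix_match(FIB, "12.34.158.5")` returns `"Serial0/0.1"` even though 12.0.0.0/8 also contains that address, because the /24 is more specific; an address covered by no entry returns `None`.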
CIDR Growth (1996-1998): Roughly Linear
Good use of aggregation, and peer pressure in CIDR report

Boom Period (1998-2001): Steep Growth
Internet boom and increased multi-homing

Long-Term View (1989-2005): Post-Boom

Cause of Growth #1: Multi-Homing
• Connecting to multiple providers
  – All providers must advertise the prefix
  – Hole-punching: subnet contained in a supernet
[Figure: a stub AS with 12.1.1.0/24 connects to ISP #1 and ISP #2; ISP #1 advertises 12.0.0.0/8 and 12.1.1.0/24, and ISP #2 advertises 3.0.0.0/8 and 12.1.1.0/24]
• Detecting hole-punching
  – Stub AS connects to two or more ASes
  – Prefix is contained in one provider’s supernet

Cause of Growth #2: Failure to Aggregate
• Prefixes could be coalesced
  – Advertised exactly the same way
  – Adjacent prefixes or subnet/supernet relationship
[Figure: ISP #1 and ISP #2 with stub customers 12.1.1.0/24, 12.1.2.0/24, and 12.1.3.0/24; advertisements include 12.0.0.0/8, 12.1.1.0/24, 12.1.2.0/24, and 12.1.3.0/24, where the adjacent 12.1.2.0/24 and 12.1.3.0/24 could be coalesced into 12.1.2.0/23]
• Detecting failure to aggregate
  – Prefixes with same attributes in set of BGP tables
  – Could be reduced to fewer prefixes by combining

Cause of Growth #3: Load Balancing
• Larger block sub-divided for more control
  – Advertise multiple subnets of a larger prefix
  – Treat differently to influence incoming traffic
[Figure: a stub AS holding 12.1.2.0/23 advertises 12.1.2.0/23 and 12.1.2.0/24 to ISP #1, and 12.1.2.0/23 and 12.1.3.0/24 to ISP #2]
• Detecting load balancing
  – Prefixes originated by the same AS
  – Could be collapsed (e.g., contiguous or contained)
  – … but have different attributes, such as AS path

Cause of Growth #4: Address Fragmentation
• Different parts of the address space
  – Distinct address blocks allocated to same AS
  – Must be advertised separately in BGP
[Figure: a stub AS holding 18.8.0.0/16 and 12.1.1.0/24 connects to ISP #1]
• Detecting address fragmentation
  – Prefixes announced the same way by same AS
  – Cannot be collapsed into fewer prefixes

Significance of the Four Causes
• Overall contribution
  – Address fragmentation is the most significant
  – The other three causes are all important as well
• Growth over time
  – Increasing multi-homing
  – Increasing load balancing
• Architectural implications
  – Exploit commonality across non-contiguous address blocks?
  – Multi-homing without hole-punching?
  – Load balancing without de-aggregating?

Transient Growth in Table Size: Routing Leaks
Transient spike due to neighbor’s BGP mistake

Techniques for Limiting Table Size

Hierarchical Address Allocation
• Regional Internet Registries
  – Allocate large address blocks to ISPs
  – Publish guidelines for minimum block sizes
    • ARIN: in 63.0.0.0/8, no masks longer than /19
    • APNIC: in 211.0.0.0/8, no masks longer than /23
• Internet Service Providers
  – Allocate smaller blocks to customers
    • Reclaim address blocks when customers leave
  – Hierarchical address allocation inside the ISP
    • Advertise subnets only when necessary
    • Customer-owned addresses and multi-homing

Hierarchical Allocation: Only One Router Knows
• Three-level hierarchy
  – ISP as a whole: 12.0.0.0/8
  – Edge router in ISP: 12.1.0.0/16
  – Customer at edge router: 12.1.2.0/24, 12.1.5.0/24
[Figure: ISP 12.0.0.0/8 with an edge router for 12.1.0.0/16 serving stubs 12.1.2.0/24 and 12.1.5.0/24; only this router needs to know the small /24 blocks]

Hierarchical Allocation: Only the ISP Knows
• Customer connecting in multiple places
  – All routers in the ISP need to know the subnet
  – Otherwise they can’t reach all egress points
  – But the rest of the Internet doesn’t need to know
[Figure: stub 12.1.5.0/24 connects to the ISP (12.0.0.0/8, 12.1.0.0/16) at more than one router]

Hierarchical Allocation: Must Advertise
• Sometimes have to advertise the subnet
  – Customer doesn’t fall in ISP’s address block
  – Customer connects to multiple providers
[Figure: ISP (12.0.0.0/8, 12.1.0.0/16) with a stub using 78.34.0.0/16 and a stub 12.1.5.0/24 that also connects to another ISP]

Filtering Small Subnets on BGP Sessions
• Small address blocks
  – Mask longer than RIR guidelines
    • E.g., filter /20 and longer in 63.0.0.0/8
  – Or, all prefixes with mask longer than /24
• Trade-off on aggressive filtering
  – Don’t filter aggressively
    • Risk of exceeding memory limits on the router
  – Filter aggressively
    • Risk of disconnecting some parts of the Internet
    • Risk of thwarting stub ASes trying to load-balance
• Who should pay to store the small subnets???

Prefix Limits to Protect Against Route Leaks
• Vulnerability to other ASes
  – Sending many small subnets
  – Exporting address space they shouldn’t
• Filtering policies may not be enough
  – E.g., allowing all /24s still permits 2^24 (about 16.8 million) prefixes, which is still a lot
• Max-prefix limit on BGP session
  – Per-session configurable limit on # of prefixes
  – Tear down the session if number exceeded
  – Not great, but better than exceeding the memory

Fundamental Problems: Not Easily Automated
• Dependence on “side information”
  – Customer prefix falls in provider’s address space?
  – Customer connects to ISP in multiple places?
  – Customer connects to multiple providers?
• Auto-combining is hard in distributed system
  – Safe to combine 12.1.2.0/24 and 12.1.3.0/24???
  – Depends on whether other ASes need the details
[Figure: two scenarios advertising 12.1.2.0/24 and 12.1.3.0/24, one where combining them "seems safe" and one where it is "not safe"]

Optimization: Reducing Forwarding Table Size
• Local FIB minimization
  – Router locally minimizes size of forwarding table
  – E.g., a router installs a single FIB entry for 12.1.2.0/23
  – … while still keeping both subnets (12.1.2.0/24 and 12.1.3.0/24) in its BGP table
  – But, the size of the RIB may still be an issue

Architectural Idea: Reducing BGP Table Size
• Separating BGP propagation from the routers
  – Exchange BGP updates via separate servers
  – Servers tell routers only the BGP routes they need
  – … yet still propagate full details to neighbors
  – We’ll return to this idea in the coming weeks
[Figure: BGP servers propagate 12.1.2.0/24 and 12.1.3.0/24 in full, while a router is given only 12.1.2.0/23]

Conclusions
• Scalability limitations
  – Resource constraints on routers
  – … impose limits on number of prefixes
• Growth in the number of prefixes
  – Historical trends toward increasing table size
  – Multi-homing, failure to aggregate, load balancing, and address fragmentation
• Approaches to limiting growth
  – Hierarchical address allocation
  – Careful scoping of BGP route advertisements
  – Explicit minimization of FIB and RIB sizes

Next Time: Large Topologies
• Two papers
  – “Hierarchical routing for large networks: Performance evaluation and optimization”
  – “BGP route reflection: An alternative to full mesh IBGP”
• Review only of first paper
  – Summary
  – Why accept
  – Why reject
  – Avenues for future work
• Optional reading
  – Fun 1928 article “On Being the Right Size”
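Returning to "Failure to Aggregate" and "Local FIB minimization": a router can shrink its FIB without changing where any packet goes, while BGP keeps every subnet. Below is a minimal sketch assuming the FIB maps each prefix to a next-hop label (labels are hypothetical). It applies two conservative rules: drop a subnet whose closest covering prefix already uses the same next hop, and merge two sibling halves with the same next hop into their parent. This is one simple safe rule set chosen for illustration, not an optimal minimization algorithm.

```python
from ipaddress import IPv4Network
from typing import Dict, Optional

Fib = Dict[IPv4Network, str]  # prefix -> next hop (hypothetical labels)


def closest_cover(table: Fib, net: IPv4Network) -> Optional[IPv4Network]:
    """Return the longest prefix in the table that strictly contains net."""
    best = None
    for other in table:
        if other != net and net.subnet_of(other):
            if best is None or other.prefixlen > best.prefixlen:
                best = other
    return best


def minimize_fib(fib: Fib) -> Fib:
    """Shrink a FIB without changing the next hop chosen for any address."""
    table = dict(fib)
    changed = True
    while changed:  # every change removes an entry, so this terminates
        changed = False
        # Visit the most specific prefixes first.
        for net in sorted(table, key=lambda n: n.prefixlen, reverse=True):
            if net not in table or net.prefixlen == 0:
                continue  # already merged away, or nothing above it
            hop = table[net]
            # Rule 1: a subnet is redundant if the prefix that would match
            # in its place already forwards to the same next hop.
            cover = closest_cover(table, net)
            if cover is not None and table[cover] == hop:
                del table[net]
                changed = True
                continue
            # Rule 2: two sibling halves with the same next hop can be
            # replaced by their parent, if the parent is not already present.
            parent = net.supernet()
            sibling = next(s for s in parent.subnets() if s != net)
            if table.get(sibling) == hop and parent not in table:
                del table[net]
                del table[sibling]
                table[parent] = hop
                changed = True
    return table
```

With 12.1.2.0/24 and 12.1.3.0/24 both pointing at the same next hop, the sketch installs the single entry 12.1.2.0/23, as in the slide's example. A /24 whose nearest cover points elsewhere (say 12.1.1.0/24 under a 12.1.0.0/16 that uses a different next hop) is kept, since removing it would redirect traffic. The quadratic scan is for clarity only.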
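The two session-level defenses from "Filtering Small Subnets on BGP Sessions" and "Prefix Limits to Protect Against Route Leaks" can be combined in a toy model. This is a minimal sketch, assuming the thresholds from the slides' examples (no masks longer than /19 inside 63.0.0.0/8, none longer than /24 elsewhere). The class and method names are hypothetical, and counting only prefixes that pass the length filter is an assumption of this sketch, not a description of any vendor's max-prefix feature.

```python
from ipaddress import IPv4Network
from typing import Dict, Set

# Hypothetical import policy based on the slides' examples.
BLOCK_MAX_LEN: Dict[IPv4Network, int] = {IPv4Network("63.0.0.0/8"): 19}
DEFAULT_MAX_LEN = 24


def passes_length_filter(prefix: IPv4Network) -> bool:
    """Return True if the prefix is no more specific than policy allows."""
    for block, max_len in BLOCK_MAX_LEN.items():
        if prefix.subnet_of(block):
            return prefix.prefixlen <= max_len
    return prefix.prefixlen <= DEFAULT_MAX_LEN


class BgpSessionSketch:
    """Toy model of one session's Adj-RIB-In with a max-prefix limit."""

    def __init__(self, max_prefixes: int) -> None:
        self.max_prefixes = max_prefixes
        self.rib_in: Set[IPv4Network] = set()
        self.established = True

    def announce(self, prefix_text: str) -> None:
        if not self.established:
            return
        prefix = IPv4Network(prefix_text)
        if not passes_length_filter(prefix):
            return  # filtered: never stored, never counted (assumption)
        self.rib_in.add(prefix)
        if len(self.rib_in) > self.max_prefixes:
            # Limit exceeded: tear down the session and discard its routes
            # rather than risk exhausting router memory.
            self.established = False
            self.rib_in.clear()

    def withdraw(self, prefix_text: str) -> None:
        self.rib_in.discard(IPv4Network(prefix_text))
```

For example, a session created with `max_prefixes=2` ignores 63.1.0.0/20 and 12.1.1.0/25, accepts 63.1.0.0/19 and 12.1.1.0/24, and is torn down when a third acceptable prefix arrives. That is the "not great, but better than exceeding the memory" trade-off: the neighbor loses all of its routes, but the router stays up.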