Internet Routing (COS 598A) Jennifer Rexford Today: BGP Routing Table Size Tuesdays/Thursdays 11:00am-12:20pm

Internet Routing (COS 598A)
Today: BGP Routing Table Size
Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm
Outline
• IP prefixes
– Review of CIDR and hierarchical allocation
– Resource constraints on IP routers
– Impact of increasing number of prefixes
• Growth in BGP routing table size
– Growth of global prefixes over time
– Characterizing the causes of growth
• Limiting the number of prefixes
– Techniques for limiting the size
– Fundamental challenges of limiting size
Classless InterDomain Routing (CIDR)
Use two 32-bit numbers to represent a network.
Network number = IP address + Mask
IP Address: 12.4.0.0     00001100 00000100 00000000 00000000
IP Mask:    255.254.0.0  11111111 11111110 00000000 00000000
The 15 leading bits (where the mask is 1) are the network prefix; the remaining 17 bits are for hosts.
Usually written as 12.4.0.0/15
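The address/mask arithmetic above can be checked with Python's standard `ipaddress` module. This is only a small sketch; the variable names are illustrative and not part of the lecture.

```python
import ipaddress

# The slide's example: address 12.4.0.0 with mask 255.254.0.0.
net = ipaddress.ip_network("12.4.0.0/255.254.0.0")

prefix_len = net.prefixlen      # number of leading 1 bits in the mask
host_bits = 32 - prefix_len     # bits left over for hosts

print(net)                      # the same network in CIDR notation
print(net.netmask, prefix_len, host_bits, net.num_addresses)

# Membership test: an address belongs to the network if its first
# prefix_len bits equal the network prefix.
inside = ipaddress.ip_address("12.5.255.255") in net
outside = ipaddress.ip_address("12.6.0.0") in net
```

The block covers 12.4.0.0 through 12.5.255.255, so the first membership test succeeds and the second fails.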
Hierarchy in Allocating Address Blocks
• Prefixes are key to Internet scalability
– Address allocation by ARIN/RIPE/APNIC and by ISPs
– Routing protocols and packet forwarding based on prefixes
– Today, routing tables contain ~150,000-200,000 prefixes
[Figure: 12.0.0.0/8 is subdivided into 12.0.0.0/16, 12.1.0.0/16, 12.2.0.0/16, 12.3.0.0/16, ..., 12.253.0.0/16, 12.254.0.0/16. In turn, 12.3.0.0/16 is subdivided into 12.3.0.0/24, 12.3.1.0/24, ..., 12.3.254.0/24, and 12.253.0.0/16 is subdivided into 12.253.0.0/19, 12.253.32.0/19, 12.253.64.0/19, 12.253.96.0/19, 12.253.128.0/19, 12.253.160.0/19, 12.253.192.0/19, ...]
Resource Constraints on a High-End Router
[Figure: Router with a processor and six line cards connected by a switching fabric. Processor: store routing table and process routing protocol messages. Line cards: store forwarding table and forward data packets.]
Routing Information Base (RIB)
• Routing table for the routing protocol
– E.g., BGP routes learned from each neighbor
– Typically managed in software in router CPU
• Factors affecting RIB size
– Number of destination prefixes
– Number of BGP routes per prefix
– Size of each route (e.g., BGP attributes)
• Impact of a large RIB
– Higher delay to index or scan the table
– Ungraceful reaction to table overflow
Ungraceful Overload Behavior in BGP
• BGP is an incremental protocol
– Announcement when new route available
– Withdrawal when route no longer available
– No messages when nothing is changing
• Cannot discard or delete state
– … because you won’t receive the message again
– When table is full, router must drop session(s)
• Router reaction in practice may be worse
– E.g., drop all BGP sessions and reestablish
– E.g., interface lock-up till router is rebooted
– Reactions place heavy BGP load on neighbors
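A minimal sketch of the incremental behavior described above, assuming a per-neighbor table keyed by prefix. The function name, the tuple layout of an update, and the private AS numbers are all hypothetical choices for illustration.

```python
def apply_update(rib, update):
    """Apply one incremental update to a per-neighbor table (prefix -> AS path)."""
    kind, prefix, as_path = update
    if kind == "announce":
        rib[prefix] = as_path       # new or replacement route
    elif kind == "withdraw":
        rib.pop(prefix, None)       # route no longer available
    return rib

rib = {}
updates = [
    ("announce", "12.1.1.0/24", (64512,)),
    ("announce", "18.8.0.0/16", (64513,)),
    ("withdraw", "12.1.1.0/24", None),
]
for update in updates:
    apply_update(rib, update)

# Suppose the router discarded an entry to save memory. No later message
# restores it, because the neighbor stays silent while nothing changes.
discarded = dict(rib)
discarded.pop("18.8.0.0/16")
quiet_period = []                   # no updates arrive
for update in quiet_period:
    apply_update(discarded, update)
```

After the quiet period the discarded prefix is still missing, which is why a full table forces the router to drop the session rather than forget individual routes.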
Forwarding Information Base (FIB)
• Forwarding tables in IP routers
– Maps each IP prefix to next-hop link(s)
– Longest prefix match look-up for data packets
– Hardware on line card in high-end routers
• Impact of a large FIB
– Higher delay to construct/update the table
– Higher delay for packet lookup
– Incomplete table or router crash on overflow
[Figure: Forwarding table (FIB) lookup. A packet with destination 12.34.158.5 is looked up in a table holding 4.0.0.0/8, 4.83.128.0/17, 12.0.0.0/8, 12.34.158.0/24, and 126.255.103.0/24; the outgoing link is Serial0/0.1.]
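Longest prefix match can be sketched with a linear scan over the table. The prefixes and the destination come from the slide; pairing Serial0/0.1 with 12.34.158.0/24 is an assumption read off the figure, and every other link name is hypothetical. Routers on the line card do this in hardware, not with a loop like this one.

```python
import ipaddress

# Prefixes from the slide. Link names other than Serial0/0.1 are made up.
FIB = {
    "4.0.0.0/8": "link-A",
    "4.83.128.0/17": "link-B",
    "12.0.0.0/8": "link-C",
    "12.34.158.0/24": "Serial0/0.1",
    "126.255.103.0/24": "link-D",
}

def lookup(fib, destination):
    """Return (network, link) for the longest matching prefix, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, link in fib.items():
        net = ipaddress.ip_network(prefix)
        # Keep the match with the most prefix bits seen so far.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, link)
    return best

match = lookup(FIB, "12.34.158.5")
```

Both 12.0.0.0/8 and 12.34.158.0/24 contain the destination; the /24 wins because it is more specific.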
Impact of Table Size: Message Overhead
• More BGP update messages
– More prefixes means more update messages
– … and more bandwidth and CPU consumption
– … and longer delays for bringing up a session
• More BGP route flapping
– More likely to have one or more flapping prefixes
– … which consumes even more resources
– … and makes the routing system less stable
Growth in BGP Routing Table Size
http://www.cisco.com/en/US/about/ac123/ac147/ac174/ac176/about_cisco_ipj_archive_article09186a00800c83cc.html
http://www.cs.princeton.edu/~jrex/teaching/spring2005/reading/bu02.pdf
Pre-CIDR (1988-1994): Steep Growth Rate
Growth faster than improvements in equipment capability
CIDR Deployment (1994-1996): Much Flatter
Efforts to aggregate (even decreases after IETF meetings!)
CIDR Growth (1996-1998): Roughly Linear
Good use of aggregation, and peer pressure in CIDR report
Boom Period (1998-2001): Steep Growth
Internet boom and increased multi-homing
Long-Term View (1989-2005): Post-Boom
Cause of Growth #1: Multi-Homing
• Connecting to multiple providers
– All providers must advertise the prefix
– Hole-punching: subnet contained in a supernet
[Figure: Stub with 12.1.1.0/24 connects to ISP #1 (12.0.0.0/8) and ISP #2 (3.0.0.0/8); both ISPs advertise 12.1.1.0/24.]
• Detecting hole-punching
– Stub AS connects to two or more ASes
– Prefix is contained in one provider’s supernet
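The two detection conditions above can be sketched as a containment check. The function name and the shape of the `provider_blocks` input are assumptions for illustration; the prefixes are the ones in the slide's example.

```python
import ipaddress

def hole_punching_providers(stub_prefix, provider_blocks):
    """Return the providers whose own block contains the stub's prefix.

    provider_blocks maps each provider the stub connects to onto that
    provider's address blocks. An empty list means no hole-punching.
    """
    if len(provider_blocks) < 2:        # stub must connect to two or more ASes
        return []
    stub = ipaddress.ip_network(stub_prefix)
    return [
        name
        for name, blocks in provider_blocks.items()
        if any(stub.subnet_of(ipaddress.ip_network(b)) for b in blocks)
    ]

providers = {"ISP #1": ["12.0.0.0/8"], "ISP #2": ["3.0.0.0/8"]}
result = hole_punching_providers("12.1.1.0/24", providers)
```

Here 12.1.1.0/24 punches a hole in ISP #1's 12.0.0.0/8, since ISP #2 must advertise the /24 separately.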
Cause of Growth #2: Failure to Aggregate
• Prefixes could be coalesced
– Advertised exactly the same way
– Adjacent prefixes or subnet/supernet relationship
[Figure: ISP #1, ISP #2, and three stubs. Prefixes shown: 12.0.0.0/8, 12.1.1.0/24, 12.1.2.0/24, 12.1.3.0/24, and the combined 12.1.2.0/23.]
• Detecting failure to aggregate
– Prefixes with same attributes in set of BGP tables
– Could be reduced to fewer prefixes by combining
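One way to estimate how much a table could shrink is to group routes by their attributes and combine each group with `ipaddress.collapse_addresses`. This is a rough sketch: the AS paths are hypothetical, a route's attributes are reduced to the AS path alone, and the check ignores intermediate prefixes with different attributes that might sit between a subnet and its supernet.

```python
import ipaddress
from collections import defaultdict

def aggregate(routes):
    """routes: list of (prefix, as_path). Combine prefixes advertised the same way."""
    groups = defaultdict(list)
    for prefix, as_path in routes:
        groups[as_path].append(ipaddress.ip_network(prefix))
    result = []
    for as_path, nets in groups.items():
        # collapse_addresses merges adjacent blocks and drops contained ones.
        for net in ipaddress.collapse_addresses(nets):
            result.append((str(net), as_path))
    return sorted(result)

routes = [
    ("12.1.2.0/24", (64500, 64501)),
    ("12.1.3.0/24", (64500, 64501)),
    ("12.1.1.0/24", (64500, 64502)),
]
combined = aggregate(routes)
```

The two adjacent /24s with the same path become 12.1.2.0/23, while 12.1.1.0/24 stays separate because its path differs.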
Cause of Growth #3: Load Balancing
• Larger block sub-divided for more control
– Advertise multiple subnets of a larger prefix
– Treat differently to influence incoming traffic
[Figure: Stub connected to ISP #1 and ISP #2. Prefixes shown: 12.1.2.0/23, 12.1.2.0/24, 12.1.3.0/24.]
• Detecting load balancing
– Prefixes originated by the same AS
– Could be collapsed (e.g., contiguous or contained)
– … but, have different attributes, such as AS path
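The three conditions above can be combined into a single test. This is a sketch under simplifying assumptions: the function name and the `(prefix, origin_as, as_path)` tuple layout are hypothetical, the AS numbers are made up, and "different attributes" is reduced to "different AS paths".

```python
import ipaddress

def looks_like_load_balancing(routes):
    """routes: list of (prefix, origin_as, as_path) seen in one BGP table."""
    nets = [ipaddress.ip_network(prefix) for prefix, _, _ in routes]
    same_origin = len({origin for _, origin, _ in routes}) == 1
    # Collapsible: combining yields fewer prefixes (contiguous or contained).
    collapsible = len(list(ipaddress.collapse_addresses(nets))) < len(nets)
    different_paths = len({path for _, _, path in routes}) > 1
    return same_origin and collapsible and different_paths

balanced = [
    ("12.1.2.0/24", 64510, (64500, 64510)),   # via one provider
    ("12.1.3.0/24", 64510, (64501, 64510)),   # via the other provider
]
unaggregated = [
    ("12.1.2.0/24", 64510, (64500, 64510)),
    ("12.1.3.0/24", 64510, (64500, 64510)),   # same path: failure to aggregate
]
```

The second case fails the test because identical attributes point to failure to aggregate, not load balancing.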
Cause of Growth #4: Address Fragmentation
• Different parts of the address space
– Distinct address blocks allocated to same AS
– Must be advertised separately in BGP
[Figure: Stub connected to ISP #1, with address blocks 18.8.0.0/16 and 12.1.1.0/24.]
• Detecting address fragmentation
– Prefixes announced the same way by same AS
– Cannot be collapsed into fewer prefixes
Significance of the Four Causes
• Overall contribution
– Address fragmentation is the most significant
– The other three causes are all important as well
• Growth over time
– Increasing multi-homing
– Increasing load balancing
• Architectural implications
– Exploit commonality across non-contiguous address blocks?
– Multi-homing without hole-punching?
– Load balancing without de-aggregating?
Transient Growth in Table Size: Routing Leaks
Transient spike due to neighbor’s BGP mistake
Techniques for Limiting Table Size
Hierarchical Address Allocation
• Regional Internet Registries
– Allocate large address blocks to ISPs
– Publish guidelines for minimum block sizes
• ARIN: in 63.0.0.0/8, no mask lengths more than /19
• APNIC: in 211.0.0.0/8, no mask lengths more than /23
• Internet Service Providers
– Allocate smaller blocks to customers
• Reclaim address blocks when customers leave
– Hierarchical address allocation inside the ISP
• Advertise subnets only when necessary
• Customer-owned addresses and multi-homing
Hierarchical Allocation: Only One Router Knows
• Three-level hierarchy
– ISP as a whole: 12.0.0.0/8
– Edge router in ISP: 12.1.0.0/16
– Customer at edge router: 12.1.2.0/24, 12.1.5.0/24
[Figure: ISP 12.0.0.0/8 with edge router 12.1.0.0/16 serving two stubs, 12.1.2.0/24 and 12.1.5.0/24. Only this router needs to know the small /24 blocks.]
Hierarchical Allocation: Only the ISP Knows
• Customer connecting in multiple places
– All routers in the ISP need to know the subnet
– Otherwise they can’t reach all egress points
– But the rest of the Internet doesn’t need to know
[Figure: ISP 12.0.0.0/8 with edge router 12.1.0.0/16; stub 12.1.5.0/24 connects to the ISP in multiple places.]
Hierarchical Allocation: Must Advertise
• Sometimes have to advertise the subnet
– Customer doesn’t fall in ISP’s address block
– Customer connects to multiple providers
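[Figure: ISP 12.0.0.0/8 with edge router 12.1.0.0/16. One stub has 78.34.0.0/16, outside the ISP’s address block; another stub has 12.1.5.0/24 and also connects to another ISP.]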
Filtering Small Subnets on BGP Sessions
• Small address blocks
– Larger mask than RIR guidelines
• E.g., filter /20 and longer in 63.0.0.0/8
– Or, all prefixes with mask longer than /24
• Trade-off on aggressive filtering
– Don’t filter aggressively
• Risk of exceeding memory limits on the router
– Filter aggressively
• Risk of disconnecting some parts of the Internet
• Risk of thwarting stub ASes trying to load-balance
• Who should pay to store the small subnets???
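A prefix-length filter along these lines might look like the sketch below. The per-block limits echo the RIR examples given earlier in the lecture (/19 in 63.0.0.0/8, /23 in 211.0.0.0/8) with /24 as the default; the table and function names are hypothetical and this is not any router's configuration syntax.

```python
import ipaddress

# Longest mask length accepted inside each block.
MAX_LEN_BY_BLOCK = {
    ipaddress.ip_network("63.0.0.0/8"): 19,
    ipaddress.ip_network("211.0.0.0/8"): 23,
}
DEFAULT_MAX_LEN = 24    # elsewhere, drop anything longer than /24

def accept(prefix):
    """Return True if the advertised prefix passes the length filter."""
    net = ipaddress.ip_network(prefix)
    for block, max_len in MAX_LEN_BY_BLOCK.items():
        if net.subnet_of(block):
            return net.prefixlen <= max_len
    return net.prefixlen <= DEFAULT_MAX_LEN
```

With these limits, `accept("63.1.0.0/20")` is rejected while `accept("12.1.1.0/24")` passes. A multi-homed stub holding a /20 inside 63.0.0.0/8 would become unreachable through any AS applying this filter, which is the trade-off the slide describes.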
Prefix Limits to Protect Against Route Leaks
• Vulnerability to other ASes
– Sending many small subnets
– Exporting address space they shouldn’t
• Filtering policies may not be enough
– E.g., all /24s is still 2^24 (about 16.8 million) prefixes, which is still a lot
• Max-prefix limit on BGP session
– Per-session configurable limit on # of prefixes
– Tear down the session if number exceeded
– Not great, but better than exceeding the memory
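The max-prefix mechanism can be modeled in a few lines. This is a toy sketch: the class and method names are hypothetical, and real implementations differ in details such as warning thresholds and restart timers.

```python
class BgpSession:
    """Toy model of one BGP session with a per-session prefix limit."""

    def __init__(self, max_prefixes):
        self.max_prefixes = max_prefixes
        self.prefixes = set()
        self.established = True

    def announce(self, prefix):
        if not self.established:
            return
        self.prefixes.add(prefix)
        if len(self.prefixes) > self.max_prefixes:
            # Limit exceeded: tear down the session and flush its routes.
            self.prefixes.clear()
            self.established = False

    def withdraw(self, prefix):
        self.prefixes.discard(prefix)

# Why length filtering alone is not enough: the number of distinct /24s.
ALL_SLASH_24S = 2 ** 24

session = BgpSession(max_prefixes=3)
for p in ["12.1.1.0/24", "12.1.2.0/24", "12.1.3.0/24"]:
    session.announce(p)
still_up = session.established       # exactly at the limit
session.announce("12.1.4.0/24")      # one prefix too many
```

The session survives at exactly the limit and is torn down by the next new prefix, which bounds memory use at the cost of losing every route learned from that neighbor.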
Fundamental Problems: Not Easily Automated
• Dependence on “side information”
– Customer prefix falls in provider’s address space?
– Customer connects to ISP in multiple places?
– Customer connects to multiple providers?
• Auto-combining is hard in distributed system
– Safe to combine 12.1.2.0/24 and 12.1.3.0/24???
– Depends on whether other ASes need the details
[Figure: Prefixes 12.1.2.0/24 and 12.1.3.0/24, with one point where combining is marked “seems safe” and another marked “not safe”.]
Optimization: Reducing Forwarding Table Size
• Local FIB minimization
– Router locally minimizes size of forwarding table
– E.g., purple router has FIB entry for 12.1.2.0/23
– … while still keeping both subnets in BGP table
– But, the size of the RIB may still be an issue
[Figure: Purple router learning 12.1.2.0/24 and 12.1.3.0/24.]
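A minimal sketch of local FIB minimization, assuming the router has already picked one next hop per prefix. It merges only sibling prefixes (the two halves of a supernet) that share a next hop, which leaves forwarding unchanged; the function name and link names are hypothetical, and this is not necessarily the scheme the lecture has in mind.

```python
import ipaddress

def minimize_fib(best_routes):
    """best_routes: prefix string -> next hop. Returns a smaller, equivalent FIB."""
    fib = {ipaddress.ip_network(p): nh for p, nh in best_routes.items()}
    changed = True
    while changed:
        changed = False
        # Visit longest prefixes first; sorted() copies the keys, so the
        # dictionary can be modified inside the loop.
        for net in sorted(fib, key=lambda n: n.prefixlen, reverse=True):
            if net.prefixlen == 0 or net not in fib:
                continue
            parent = net.supernet()
            left, right = parent.subnets()      # the two halves of parent
            if left in fib and right in fib and fib[left] == fib[right]:
                next_hop = fib.pop(left)
                fib.pop(right)
                fib[parent] = next_hop          # halves fully cover parent
                changed = True
    return {str(net): nh for net, nh in fib.items()}

bgp_table = {
    "12.1.2.0/24": "link-1",
    "12.1.3.0/24": "link-1",
    "12.1.1.0/24": "link-2",
}
fib = minimize_fib(bgp_table)
```

The BGP table still holds all three prefixes, so the router can keep advertising both subnets, while the FIB needs only 12.1.2.0/23 and 12.1.1.0/24.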
Architectural Idea: Reducing BGP Table Size
• Separating BGP propagation from the routers
– Exchange BGP updates via separate servers
– Servers tell routers only the BGP routes they need
– … yet still propagate full details to neighbors
– We’ll return to this idea in the coming weeks
[Figure: Separate servers exchange 12.1.2.0/24 and 12.1.3.0/24 via BGP with neighbors, while the router holds only 12.1.2.0/23.]
Conclusions
• Scalability limitations
– Resource constraints on routers
– … impose limits on number of prefixes
• Growth in the number of prefixes
– Historical trends toward increasing table size
– Multi-homing, failure to aggregate, load balancing, and address fragmentation
• Approaches to limiting growth
– Hierarchical address allocation
– Careful scoping of BGP route advertisements
– Explicit minimization of FIB and RIB sizes
Next Time: Large Topologies
• Two papers
– “Hierarchical routing for large networks: Performance evaluation and optimization”
– “BGP route reflection: An alternative to full mesh IBGP”
• Review only of first paper
– Summary
– Why accept
– Why reject
– Avenues for future work
• Optional reading
– Fun 1928 article “On Being the Right Size”