Controlling the Impact of BGP Policy Changes on IP Traffic

Jennifer Rexford
IP Network Management and Performance
AT&T Labs – Research; Florham Park, NJ
http://www.research.att.com/~jrex/papers/tm011106.02.ps
Work with Nick Feamster & Jay Borkenhagen

Summary
• Why do interdomain traffic engineering?
  – We are just gluttons for punishment…
  – Operators need to control traffic on edge links
• Why work within BGP?
  – BGP is what we have to work with…
  – Hopefully we’ll get insight for improving BGP
• Our approach
  – Model influence of BGP policy changes on traffic
  – Find ways to minimize the overhead of changes
  – Limit the impact of changes on neighboring ASes
  – Evaluate ideas by using traffic and routing data

What is Traffic Engineering?
• Don’t IP networks manage themselves?
  – TCP adapts sending rate to network congestion
  – Routing protocols adapt to changes in topology
• Goals: user performance & network efficiency
  – Add routers/links to increase network capacity
  – Change routing to improve the flow of traffic
• Tuning the configuration of existing protocols
  – Works today without deploying new protocols
  – Avoids stability challenges of load-sensitive routing
  – Allows operators to incorporate diverse constraints
  – Gives insight to drive future changes to protocols

BGP Traffic Engineering?
• Limitations of intradomain traffic engineering
  – Alleviating congestion on edge links
  – Making use of new or upgraded edge links
  – Influencing choice of end-to-end path
• Extra flexibility by allowing changes to BGP policies
  – Direct traffic toward/from certain edge links
  – Change the set of egress links for a destination

Impact of Routing on Traffic Flow
[Diagram labels: Topology; BGP updates; Routing configuration; Distributed routing protocols; Offered traffic; Flow of traffic through the network]

Components of BGP
• BGP protocol
  – Definition of how two BGP neighbors communicate
  – Message formats, state machine, route attributes, etc.
  – Standardized by the IETF
• BGP decision process
  – Complex sequence of rules for selecting the best route
  – De facto standard applied by router vendors
  – Certain steps can be disabled or tuned by configuration
• Policy specification
  – Flexible language for filtering and manipulating routes
  – Indirectly affects the selection of the best route
  – Varies across vendors, though constructs are similar

BGP Decision Process
• Highest local preference
  – Set by import policies upon receiving advertisement
• Shortest AS path
  – Included in the route advertisement
• Lowest origin type
  – Included in advertisement or reset by import policy
• Smallest multi-exit discriminator (MED)
  – Included in the advertisement or reset by import policy
• Smallest internal path cost
  – Based on intradomain routing protocol (e.g., OSPF)
• Smallest next-hop router id
  – Final tie-break

Paths to a Destination Prefix
Two routes with shortest AS path among routes with max local-pref…
… each router selects “closest” egress point based on OSPF weights

Limitations of BGP for TE
• BGP protocol
  – Distributed, policy-based path-vector protocol
  – No capacity or traffic load information in attributes
• Commercial realities
  – ASes may have different/inconsistent policy goals
  – Change in best path may affect neighbor’s choices
  – Commercial relationships impose some policy constraints
• BGP decision process
  – Policies have only indirect influence on path selection
  – Each router has a single “best” BGP route per prefix
  – All “best” routes for a prefix must have same path length
  – Strict hierarchy of rules in the decision process

Scoping the Problem
• Predictability
  – Ensure the BGP decision process is deterministic
  – Assume that BGP updates are (relatively) stable
• Outbound traffic (import policy, local preference)
  – Easier to control how traffic leaves the network
  – Cooperate with neighbor ASes for inbound traffic
• Limit overhead introduced by routing changes
  – Minimize frequency of changes to routing policies
  – Limit number of prefixes affected by changes
• Limit impact on how traffic enters the network
  – Avoid new routes that might change neighbor’s mind
  – Select route with same attributes, or at least path length

AT&T Data (June 1, 2001)
• Analysis of peering links
  – Links connecting AT&T to other large providers
  – Relatively small number of high-end routers
  – Availability of both routing and traffic data
• BGP routing tables
  – Log daily output of “show ip bgp” command
  – Extract BGP advertisements for each destination prefix
  – Focus only on routes learned directly from peers
• Cisco NetFlow
  – Collect continuous flow-level measurement
  – Aggregate the traffic based on the destination prefix
  – Focus only on outbound traffic headed to peers

Modeling Routing Choices
• Representation of BGP updates per prefix
  – Consider all routes learned from neighboring ASes
  – Separate into rows based on AS path length
  – In each row, group routes by border router
  – Order routes by other attributes (origin, MED, id)
• Example
  – 3: cgcil01: {1239 2179 11485, 701 2179 11485}, la2ca01: {1239 12179 11485}
  – 5: attga01: {3561 12179 12179 12179 11485}, la2ca01: {3561 12179 12179 12179 11485}

Controlling the Scale
• Destination prefixes
  – More than 90,000 destination prefixes: don’t want to have per-prefix routing policies
  – 1% of prefixes account for 55% of traffic: focus on the small number of heavy hitters
  – Define routing policies for selected prefixes
• Routing choices
  – About 27,000 unique “routing choices”: helps in reducing the scale of the problem
  – 1% of routing choices account for 70% of traffic: focus on the very small number of “routing choices”
  – Define routing policies on common attributes

Avoid Impacting Downstream Neighbors
Will traffic volume change???

Predictable Routing Change
• Predictability
  – Do not change the route sent to downstream neighbor
  – Focus on prefixes where all “best” routes are identical
  – Neighbors do not even receive a new BGP advertisement!
• Example application
  – Multiple links to same peer, with one congested link
  – Assign lower local-pref at that link for some prefixes
• Empirical results
  – 83.5% of prefixes have shortest AS paths that are all identical (same next-hop AS, same AS path, etc.)
  – These prefixes are responsible for 45% of the traffic
  – Plenty of scope to move traffic in a predictable fashion

Semi-Predictable Routing Change
• Semi-predictability
  – Do not change the length of the AS path sent to neighbor
  – Neighbors receive new advertisement with same length
  – (Hopefully) they still make the same routing decision
• Example application
  – Need to move some traffic from one peer to another
  – Find prefixes with “best” paths via both neighbors
  – Assign lower local-pref at some links for some prefixes
• Empirical results
  – 10-15% of prefixes have shortest paths with 2 next hops
  – These prefixes contribute 35-40% of the traffic
  – Potential to move traffic in a semi-predictable fashion

Influence of AS Path Length
• AS path length
  – Plays a significant role in the BGP decision process
  – All “best” routes must have the same AS path length
  – 10% of prefixes have choices with different path lengths
• An idea: a more flexible approach
  – Possible to disable consideration of AS path length, and incorporate AS path length in local-pref assignment
  – E.g., treat paths of length 3 and 4 as equally good
• AS prepending by other ASes
  – Inflating AS path length by adding fake hops
  – E.g., “701 80 80 80” instead of “701 80”
  – 18% of routes had some form of AS prepending

Conclusions
• BGP traffic engineering
  – Alleviate congestion on the edge links between ASes
  – Configure BGP policies to control the flow of traffic
• Data analysis
  – Extract routing choices from the BGP routing tables
  – Collect prefix-level measurements of outgoing traffic
  – Analyze traffic to help in scoping the BGP TE problem
• Ongoing work
  – Toolkit for what-if questions about changes in BGP policies
  – Heuristics for suggesting possible BGP policy changes
  – Techniques for coordination between neighboring ASes
  – Lightweight support for measurement in IP routers

(Shameless) Psamp Plug
• IETF activity on packet sampling
  – Minimal functionality for packet-level measurement
  – Tunable trade-offs between overhead and accuracy
  – Measurement data for a variety of important applications
• Basic idea: parallel filter/sample banks
  – Filter on header fields (src/dest, port #s, protocol)
  – 1-out-of-N sampling (random, periodic, or hash)
  – Record key IP and TCP/UDP header fields
  – Send measurement record to a collection system
• Get involved!!!
  – http://www.ietf.org/internet-drafts/draft-duffield-framework-papame-01.txt
  – http://ops.ietf.org/lists/psamp/
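
Sketch: a filter/sample bank

The filter/sample bank idea on the Psamp slide can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and is not taken from the Psamp drafts: the `Packet` fields, the function names, and the choice of CRC32 as the hash. A plain list of tuples stands in for the collection system.

```python
import random
import zlib
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Packet:
    """Hypothetical header record; the field names are illustrative only."""
    src: str
    dst: str
    src_port: int
    dst_port: int
    protocol: str


Selector = Callable[[Packet], bool]
Record = Tuple[str, str, int, int, str]


def periodic_sampler(n: int) -> Selector:
    """Select every n-th packet that reaches the sampler."""
    count = 0

    def select(pkt: Packet) -> bool:
        nonlocal count
        count += 1
        return count % n == 0

    return select


def random_sampler(n: int, seed: Optional[int] = None) -> Selector:
    """Select each packet independently with probability 1/n."""
    rng = random.Random(seed)
    return lambda pkt: rng.randrange(n) == 0


def hash_sampler(n: int) -> Selector:
    """Select a packet when a hash of its header fields falls in 1 of n bins.

    The decision depends only on the header fields, so identical headers
    always get the same decision. CRC32 is an assumed, illustrative hash.
    """
    def select(pkt: Packet) -> bool:
        key = f"{pkt.src}|{pkt.dst}|{pkt.src_port}|{pkt.dst_port}|{pkt.protocol}"
        return zlib.crc32(key.encode()) % n == 0

    return select


def run_bank(packets: Iterable[Packet], flt: Selector, sampler: Selector) -> List[Record]:
    """Filter first, then sample, then record the key header fields."""
    records: List[Record] = []
    for pkt in packets:
        # The sampler only sees packets that pass the filter.
        if flt(pkt) and sampler(pkt):
            records.append((pkt.src, pkt.dst, pkt.src_port, pkt.dst_port, pkt.protocol))
    return records


if __name__ == "__main__":
    demo = [
        Packet("10.0.0.1", f"192.0.2.{i % 7}", 1024 + i, 80, "tcp" if i % 2 == 0 else "udp")
        for i in range(100)
    ]
    tcp_only = lambda p: p.protocol == "tcp"
    print(len(run_bank(demo, tcp_only, periodic_sampler(10))))
```

Several banks could run in parallel by calling `run_bank` with different filter and sampler pairs over the same packet stream. In this sketch the three samplers trade off differently: periodic sampling is the simplest, random sampling avoids synchronizing with periodic traffic, and hash-based sampling makes the same decision wherever the same header fields are seen.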