Internet Routing (COS 598A)
Today: Non-Convergence: Policy Conflicts
Jennifer Rexford
http://www.cs.princeton.edu/~jrex/teaching/spring2005
Tuesdays/Thursdays 11:00am-12:20pm

Outline
• Stable Paths Problem
  – The problem BGP is solving
  – Abstract model for BGP
  – Translating reality into SPP
• Conflicting routing policies
  – Examples of policy conflicts
  – Difficulty of identifying conflicts
• Guaranteeing convergence
  – Guidelines based on business relationships
  – Provable convergence without global control
• Recent work and a project idea

What Problem Does a Routing Protocol Solve?
• Most do shortest-path routing
  – Shortest hop count
    • Distance vector routing (e.g., RIP)
  – Shortest path as sum of link weights
    • Link-state routing (e.g., OSPF and IS-IS)
• Policy makes BGP more complicated
  – An AS might not tell a neighbor about a path
    • E.g., Sprint can’t reach UUNET through AT&T
  – An AS might prefer one path over a shorter one
    • E.g., ISP prefers to send traffic through a customer

What is a good model for BGP?

Could Use A Simulation Model
• Simulate the message passing
  – Advertisements and withdrawals
  – Message format
  – Timers
• Simulate the routing policy on each session
  – Filter certain route advertisements
  – Manipulate the attributes of others
• Simulate the decision process
  – Each router applying all the steps per prefix

Feasible, but tedious and ill-suited for formal arguments

Stable Paths Problem (SPP) Instance
• Node
  – BGP-speaking router
  – Node 0 is destination
• Edge
  – BGP adjacency
• Permitted paths
  – Set of routes to 0 at each node
  – Ranking of the paths

[Figure: example SPP instance. Permitted paths, listed from most preferred to least preferred: node 1: 130, 10; node 2: 210, 20; node 3: 30; node 4: 420, 430; node 5: 5210]

A Solution to a Stable Paths Problem
• Solution
  – Path assignment per node
  – Can be the “null” path
• If node u has path uwP
  – {u,w} is an edge in the graph
  – Node w is assigned path wP
• Each node is assigned
  – The highest ranked path consistent with the assignment of its neighbors

[Figure: the same example SPP instance]

A solution need not represent a shortest path tree, or a spanning tree.

Translating a Real Configuration into SPP
• Permitted paths at a node
  – Composition of export policies at other nodes
• Ranking of paths at a node
  – Import policies at the node
  – Rank in terms of BGP decision process (i.e., local preference, AS path length, origin type, MED, …)

[Figure: nodes 0, 1, 2, and 5, with permitted paths 210, 20 at node 2 and 5210 at node 5. Annotations: node 0 exports route to node 2; node 1 exports “1 0” to node 2; node 2 exports “2 1 0” but not “2 0”]

An SPP May Have Multiple Solutions
[Figure: nodes 0, 1, and 2, with ranked permitted paths 120, 10 at node 1 and 210, 20 at node 2, drawn three times; two of the drawings are labeled “First solution” and “Second solution”]

An SPP May Have No Solution
[Figure: example instance with no solution. Path lists recoverable from the extraction: node 1: 130, 10; node 2: 210, 20; node 3: 320, 30; a node 4 also appears]

Stable System Unstable After Failure
BGP is not robust: it is not guaranteed to recover from network failures.
Becomes a BAD GADGET if link (4, 0) goes down.
[Figure: ranked permitted paths at node 1: 130, 10; at node 2: 210, 20]
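
The SPP definitions above are small enough to check by brute force. What follows is a minimal Python sketch, not part of the original slides: it enumerates every path assignment and keeps those in which each node holds its highest-ranked permitted path consistent with its neighbors. The function names (`best_available`, `stable_solutions`, `fail_link`) and the tuple encoding of paths are assumptions made for illustration. The two instances are the two-node multiple-solutions example and the failure example, with rankings as they appear in the slide figures.

```python
from itertools import product

def best_available(node, ranked, assignment):
    """Highest-ranked permitted path of `node` consistent with its neighbors'
    assigned paths; () stands for the null path."""
    for path in ranked[node]:
        # Path (u, w, ...) is available if w is the destination 0, or if
        # w is currently assigned exactly the remainder (w, ...).
        if path[1] == 0 or assignment.get(path[1]) == path[1:]:
            return path
    return ()

def stable_solutions(ranked):
    """Enumerate all assignments and keep the stable ones (exponential time)."""
    nodes = sorted(ranked)
    choices = [list(ranked[n]) + [()] for n in nodes]
    solutions = []
    for combo in product(*choices):
        assignment = dict(zip(nodes, combo))
        if all(best_available(n, ranked, assignment) == assignment[n] for n in nodes):
            solutions.append(assignment)
    return solutions

def fail_link(ranked, u, v):
    """Remove every permitted path that traverses the link {u, v}."""
    def uses_link(path):
        return any({a, b} == {u, v} for a, b in zip(path, path[1:]))
    return {n: [p for p in paths if not uses_link(p)] for n, paths in ranked.items()}

# Two-node example: each node ranks the path through the other node first.
two_node = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}

# Failure example: rankings from the figure, with link (4, 0) still up.
four_node = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 4, 2, 0), (3, 0)],
    4: [(4, 0), (4, 2, 0), (4, 3, 0)],
}

print(len(stable_solutions(two_node)))                     # 2
print(len(stable_solutions(four_node)))                    # 1
print(len(stable_solutions(fail_link(four_node, 4, 0))))   # 0
```

Exhaustive enumeration is only practical for toy instances like these; it is used here because it mirrors the definition of a solution directly, not because it scales.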
[Figure, failure example continued: ranked permitted paths at node 3: 3420, 30; at node 4: 40, 420, 430]

Strawman Solution Doesn’t Work
• Create a global Internet routing registry
  – Store the AS-level graph and all routing policies
  – But, ASes may be unwilling to divulge them
• Check for conflicting policies
  – Analyze the global system and identify conflicts
  – Contact the affected ASes to resolve them
  – But, checking is an NP-complete problem
  – … and, a safe system may be unsafe after failure

Goal: sufficient condition for convergence with local control

Guaranteeing Convergence

Think Globally, Act Locally
• Key features of a good solution
  – Flexibility: allow diverse local policies for each AS
  – Privacy: do not force ASes to divulge their policies
  – Backwards-compatibility: no changes to BGP
  – Guarantees: convergence even if system changes
• Restrictions based on AS relationships
  – Path selection rules: which route you prefer
  – Export policies: who you tell about your route
  – AS graph structure: who is connected to whom

Customer-Provider Relationship
• Customer pays provider for Internet access
  – Provider exports customer’s routes to everybody
  – Customer exports provider’s routes only to downstream customers

[Figure: two panels, “Traffic to the customer” and “Traffic from the customer”, with labels: provider, customer, destination d, advertisements, traffic]

Peer-Peer Relationship
• Peers exchange traffic between customers
  – AS exports only customer routes to a peer
  – AS exports a peer’s routes only to its customers

[Figure: “Traffic to/from the peer and its customers”, with labels: peer, peer, destination d, advertisements, traffic]

Hierarchical AS Relationships
• Provider-customer graph is directed & acyclic
  – If u is a customer of v and v is a customer of w
  – … then w is not a customer of u

[Figure: ASes w, v, and u]

Local Path Selection Rules
• Classify routes based on next-hop AS
  – Customer routes, peer routes, and provider routes
• Rank routes based on classification
  – Prefer customer routes over peer/provider routes
• Allow any ranking of routes within a class
  – E.g., rank one customer route higher than another
  – Gives network
    operators the flexibility they need
• Consistent with traffic engineering practices
  – Customers pay for service, and providers are paid
  – Peer relationship based on balanced traffic load

Two Interpretations
• System is stable because ASes act like this
  – High-level argument
    • Export and topology assumptions are reasonable
    • Path selection rule matches with financial incentives
  – Empirical results
    • BGP routes for popular destinations stable for ~10 days
    • Most instability from a few flapping destinations
• ASes should follow rules for system stability
  – Encourage operators to obey these guidelines
  – … and provide ways to verify the configuration
  – Need to consider more complex relationships

Playing One Condition Off Against Another
• All three conditions are important
  – Path ranking, export policy, and graph structure
• Allowing more flexibility in ranking routes
  – Allow same preference for peer and customer routes
  – Never choose a peer route over a shorter customer route
• … at the expense of stricter AS graph assumptions
  – Hierarchical provider-customer relationship (as before)
  – No private peering with (direct or indirect) providers

Peer-peer Extension to Backup Relationships
• Backups: liberal export and ranking policies
  – The motivation is increased reliability
  – …but ironically it may cause routing instability!
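
Looking back at the basic guidelines (before the backup extensions), the export policies and local path selection rules can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not code from the lecture: the `Rel` and `Route` types and the `should_export` and `best_route` functions are hypothetical names, and the default within-class tie-break (shorter AS path) is just one of the many rankings the guidelines allow.

```python
from dataclasses import dataclass
from enum import Enum

class Rel(Enum):
    """Our business relationship with a neighboring AS."""
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"

@dataclass(frozen=True)
class Route:
    as_path: tuple       # AS numbers, next-hop AS first (hypothetical encoding)
    learned_from: Rel    # relationship with the next-hop AS

def should_export(route, neighbor):
    """Customer routes go to everybody; peer and provider routes go only
    to customers."""
    return route.learned_from is Rel.CUSTOMER or neighbor is Rel.CUSTOMER

def best_route(routes, within_class=lambda r: len(r.as_path)):
    """Prefer customer routes over peer/provider routes; inside a class,
    any ranking is allowed (assumed default: shorter AS path first)."""
    def key(route):
        route_class = 0 if route.learned_from is Rel.CUSTOMER else 1
        return (route_class, within_class(route))
    return min(routes, key=key)

via_customer = Route(as_path=(7, 8, 9), learned_from=Rel.CUSTOMER)
via_peer = Route(as_path=(3, 9), learned_from=Rel.PEER)

# The longer customer route wins over the shorter peer route.
print(best_route([via_peer, via_customer]).learned_from.value)
# A peer route is not re-advertised to another peer, only to customers.
print(should_export(via_peer, Rel.PEER), should_export(via_peer, Rel.CUSTOMER))
```

The ranking key deliberately puts peer and provider routes in the same class, since the slides only require that customer routes be preferred over both.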

Backup Provider, Peer-Peer Backup [RFC 1998]
[Figure: two panels, “Backup Provider” and “Peer-Peer Backup”, with labels: provider, primary provider, backup provider, peer, failure, backup path]

Backup Path Needs Global Significance
[Figure: nodes 0, 1, 2, 3, and 4]
• Peer-backup relationship between 0 and 1
  – Adds backup paths (2,1,0), (3,1,0), …
• When link {2,0} fails…
  – Node 2 prefers (2,3,1,0) through a peer over the backup path (2,1,0)
  – Leads to the “bad gadget” example

Backup Paths: Keeping Count of Backup Edges
• Solution
  – Prefer routes with fewest backup links
  – Then, break ties by preferring customer routes
• Mechanism
  – Tag BGP route advertisement with a counter
  – Increment the count as you cross a backup edge

[Figure: nodes 0, 1, 2, 3, and 4; paths 20, 210, 2310, 2410 at node 2; labels “No backup”, “One backup customer”, “One backup peer”]

Recent Work

Recent Work: Relaxing Export Rules
• Goal: no restrictions on export and topology
  – Allow an AS to decide whether to export
  – Do not require hierarchical relationships
• Question
  – How much do you have to restrict path ranking to have a guarantee that the system is safe?
• Answer
  – Limited to shortest-path routing
• Implications
  – Trade-off in safety, autonomy, & expressiveness

Recent work by Nick Feamster and Ramesh Johari

Recent Work: MED Oscillation (RFC 3345)
• MED is compared only when the next-hop AS is the same
• No total ordering at the leftmost router
  – B > A: preferring smaller router-id
  – C > B: preferring smaller MED attribute
  – A > C: preferring eBGP-learned over iBGP

[Figure: AS 1 and AS 2; routes A: Id=2; B: Id=1, MED=20; C: MED=10; iBGP]

Project Idea: Stable Paths Problem and Root-Cause Analysis

Project Idea: Root-Cause Analysis
• Root-cause analysis
  – Identify location and cause of routing changes
  – Inference from BGP protocol messages
• Active area of research
  – Several proposed algorithms
  – Limited accuracy in making inferences
• Research question
  – Is the problem just very hard?
  – Does the data not reveal enough information?
• Project idea: study using SPP

Project Idea, Continued
• Model root-cause analysis
  – Start with an SPP instance
  – Fail a link (or a node)
  – See what path changes would occur
• What events might cause these changes?

[Figure: example instance with ranked permitted paths: node 1: 120, 10; node 2: 2340, 20; node 3: 340, 320; node 4: 40]

Questions
• Can you infer cause and location?
  – If you observe routing changes at all nodes
  – If you observe only some of the nodes
• What if you make some assumptions?
  – E.g., policies based on business relationships
• Where would you place monitors?
  – Best locations to place n monitors
  – Minimum number of monitors you need
• What changes would you make to the routing protocol to make diagnosis easier?

Next Time: Hot-Potato Routing
• Two papers
  – “Dynamics of Hot-Potato Routing in IP Networks”
  – “TIE Breaking: Tunable Interdomain Egress Selection”
• NANOG video
  – Covering material in the first paper
• In honor of spring break
  – No written reviews
• Talk with me about your course project
  – ... by Thursday March 24
  – Final written report due Tuesday May 10
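
Returning to the project idea: one possible starting point is a small script that takes an SPP instance, fails a link, and reports which nodes change paths. The sketch below is an illustration only, not a method from the lecture. It uses the instance from the “Project Idea, Continued” figure, and it assumes a synchronous, round-based activation of all nodes (`converge`), which is a simplification of real BGP dynamics and can oscillate even on instances that have a solution. All function names are hypothetical.

```python
def best_available(node, ranked, assignment):
    """Highest-ranked permitted path consistent with the neighbors' paths;
    () stands for the null path."""
    for path in ranked[node]:
        if path[1] == 0 or assignment.get(path[1]) == path[1:]:
            return path
    return ()

def converge(ranked, max_rounds=50):
    """Activate every node once per round until nothing changes.
    Returns the stable assignment, or None if no fixed point is reached."""
    assignment = {n: () for n in ranked}
    for _ in range(max_rounds):
        updated = {n: best_available(n, ranked, assignment) for n in ranked}
        if updated == assignment:
            return assignment
        assignment = updated
    return None

def fail_link(ranked, u, v):
    """Remove every permitted path that traverses the link {u, v}."""
    def uses_link(path):
        return any({a, b} == {u, v} for a, b in zip(path, path[1:]))
    return {n: [p for p in paths if not uses_link(p)] for n, paths in ranked.items()}

def path_changes(before, after):
    """Map each node whose path changed to its (old, new) pair of paths."""
    return {n: (before[n], after[n]) for n in before if before[n] != after[n]}

# Rankings as read from the slide figure, most preferred first.
instance = {
    1: [(1, 2, 0), (1, 0)],
    2: [(2, 3, 4, 0), (2, 0)],
    3: [(3, 4, 0), (3, 2, 0)],
    4: [(4, 0)],
}

before = converge(instance)
after = converge(fail_link(instance, 4, 0))
for node, (old, new) in sorted(path_changes(before, after).items()):
    print(node, old, "->", new)
```

In this instance, failing link (4, 0) changes the path at every node, including node 1, whose old path (1, 0) never used the failed link. That is the kind of remote effect a root-cause analysis would have to explain from observed changes alone.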