An open problem in Internet Routing --- Policy Language Design for BGP

Timothy G. Griffin
Intel Research, Cambridge UK
tim.griffin@intel.com
Nov 3, 2003

Architecture of Dynamic Routing

[Figure: AS 1 and AS 2 each run an IGP internally; an EGP (= BGP) runs between them.]

IGP = Interior Gateway Protocol
Metric based: OSPF, IS-IS, RIP, EIGRP (Cisco)

EGP = Exterior Gateway Protocol
Policy based: BGP

The Routing Domain of BGP is the entire Internet

Technology of Distributed Routing

Link State
• Topology information is flooded within the routing domain
• Best end-to-end paths are computed locally at each router.
• Best end-to-end paths determine next-hops.
• Based on minimizing some notion of distance
• Works only if policy is shared and uniform
• Examples: OSPF, IS-IS

Vectoring
• Each router knows little about network topology
• Only best next-hops are chosen by each router for each destination network.
• Best end-to-end paths result from composition of all next-hop choices
• Does not require any notion of distance
• Does not require uniform policies at all routers
• Examples: RIP, BGP

The Gang of Four

              IGP            EGP
Link State    OSPF, IS-IS
Vectoring     RIP            BGP

Partial View of www.cl.cam.ac.uk (128.232.0.20) Neighborhood

[Figure: AS-level neighborhood with the following labels]
• AS 786 ja.net (UKERNA): Originates > 180 prefixes, Including 128.232.0.0/16
• AS 3356 Level 3
• AS 5459 LINX
• AS 6461 AboveNet
• AS 20965 GEANT
• AS 7 UK Defense Research Agency
• AS 1239 Sprint
• AS 702 UUNET
• AS 1213 HEAnet (Irish academic and research)
• AS 4373 Online Computer Library Center

How Many ASNs are there today?

16,046
Thanks to Geoff Huston. http://bgp.potaroo.net on November 3, 2003

Four Types of BGP Messages

• Open : Establish a peering session.
• Keep Alive : Handshake at regular intervals.
• Notification : Shuts down a peering session.
• Update : Announcing new routes or withdrawing previously announced routes.

announcement = prefix + attributes values

BGP Attributes

Value   Code                      Reference
-----   -----------------------   ---------
1       ORIGIN                    [RFC1771]
2       AS_PATH                   [RFC1771]
3       NEXT_HOP                  [RFC1771]
4       MULTI_EXIT_DISC           [RFC1771]
5       LOCAL_PREF                [RFC1771]
6       ATOMIC_AGGREGATE          [RFC1771]
7       AGGREGATOR                [RFC1771]
8       COMMUNITY                 [RFC1997]
9       ORIGINATOR_ID             [RFC2796]
10      CLUSTER_LIST              [RFC2796]
11      DPA                       [Chen]
12      ADVERTISER                [RFC1863]
13      RCID_PATH / CLUSTER_ID    [RFC1863]
14      MP_REACH_NLRI             [RFC2283]
15      MP_UNREACH_NLRI           [RFC2283]
16      EXTENDED COMMUNITIES      [Rosen]
...
255     reserved for development

[Slide callout: Most important attributes]
From IANA: http://www.iana.org/assignments/bgp-parameters
Not all attributes need to be present in every announcement

BGP Route Processing

Receive BGP Updates
  → Apply Import Policies (Apply Policy = filter routes & tweak attributes)
  → Best Route Selection (Based on Attribute Values)
  → Best Route Table (Best Routes)
  → Apply Export Policies (Apply Policy = filter routes & tweak attributes)
  → Transmit BGP Updates

Install forwarding Entries for best Routes → IP Forwarding Table

[Slide callout on the policy steps: Open ended programming. Constrained only by vendor configuration language]

Route Selection Summary

Highest Local Preference             Enforce relationships
Shortest ASPATH                  \
Lowest MED                        |  traffic engineering
i-BGP < e-BGP                     |
Lowest IGP cost to BGP egress    /
Lowest router ID                     Throw up hands and break ties

ASPATH Attribute

[Figure: prefix 135.207.0.0/16 propagating between the ASes below; each announcement carries the AS path accumulated so far]

ASes shown:
• AS 6341 AT&T Research (135.207.0.0/16 Prefix Originated)
• AS 7018 AT&T
• AS 1239 Sprint
• AS 1755 Ebone
• AS 1129 Global Access
• AS 3549 Global Crossing
• AS 12654 RIPE NCC RIS project

Announcements of 135.207.0.0/16 shown:
• AS Path = 6341
• AS Path = 7018 6341 (shown twice)
• AS Path = 1239 7018 6341
• AS Path = 1755 1239 7018 6341
• AS Path = 1129 1755 1239 7018 6341
• AS Path = 3549 7018 6341

Shorter Doesn’t Always Mean Shorter

In fairness: could you do this “right” and still scale?

Mr. BGP says that path 4 1 is better than path 3 2 1. Duh!
Exporting internal state would dramatically increase global instability and amount of routing state

[Figure: AS 4, AS 3, AS 2, AS 1]

Shedding Inbound Traffic with ASPATH Prepending

[Figure: customer AS 2 originates 192.0.2.0/24 and connects to provider AS 1 over a primary link and a backup link]
• primary link: 192.0.2.0/24 ASPATH = 2
• backup link: 192.0.2.0/24 ASPATH = 2 2 2

Prepending will (usually) force inbound traffic from AS 1 to take primary link

Yes, this is a Glorious Hack …

… But Padding Does Not Always Work

[Figure: customer AS 2 originates 192.0.2.0/24; primary link to provider AS 1, backup link to provider AS 3]
• primary link: 192.0.2.0/24 ASPATH = 2
• backup link: 192.0.2.0/24 ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 2

AS 3 will send traffic on “backup” link because it prefers customer routes and local preference is considered before ASPATH length!

Padding in this way is often used as a form of load balancing

COMMUNITY Attribute to the Rescue!

[Figure: customer AS 2 originates 192.0.2.0/24; primary link to provider AS 1, backup link to provider AS 3]
• primary link: 192.0.2.0/24 ASPATH = 2
• backup link: 192.0.2.0/24 ASPATH = 2 COMMUNITY = 3:70

AS 3: normal customer local pref is 100, peer local pref is 90

Customer import policy at AS 3:
• If 3:90 in COMMUNITY then set local preference to 90
• If 3:80 in COMMUNITY then set local preference to 80
• If 3:70 in COMMUNITY then set local preference to 70

Don’t celebrate just yet…

[Figure: Provider A (Tier 1) and Provider B (Tier 1) are joined by a peering link; two provider/customer links lead down to Provider C (Tier 2) and to the customer]

Now, customer wants a backup link to C….

Customer installs a “backup link” …

[Figure: Provider A (Tier 1), Provider B (Tier 1), Provider C (Tier 2); the customer has a primary link to Provider B and a backup link to Provider C]

customer sends “lower my preference” Community value

Disaster Strikes!

[Figure: same topology as the previous slide, with primary and backup links]

customer is happy that backup was installed …

The primary link is repaired, and something odd occurs…

[Figure: same topology as the previous slide, with primary and backup links]

YIKES --- routing DOES NOT return to normal!!!

WAIT! It Gets Better…

[Figure: nodes A, B, C and D joined by links labeled P or B]
P = primary, B = backup

OOOOOPS!
[Figure: nodes A, B, C and D joined by links labeled P or B]

Suppose A, B, C all break ties in the same direction (clockwise or counter-clockwise)

No solution = Protocol Divergence

What the heck is going on?

• There is no guarantee that a BGP configuration has a unique routing solution.
  – When multiple solutions exist, the (unpredictable) order of updates will determine which one wins.
• There is no guarantee that a BGP configuration has any solution!
  – And checking configurations is NP-Complete [GW1999]
• Complex policies (weights, communities setting preferences, and so on) increase chances of routing anomalies.
  – … yet this is the current trend!

What Problem is BGP Solving?

Underlying problem                            Distributed means of computing a solution
------------------                            -----------------------------------------
Shortest Paths                                RIP, OSPF, IS-IS
???? (Stable Paths) [GSW1998, GSW2002]        BGP

An instance of the Stable Paths Problem (SPP)

• A graph of nodes and edges,
• Node 0, called the origin,
• For each non-zero node, a set of permitted paths to the origin. This set always contains the “null path”.
• A ranking of permitted paths at each node. Null path is always least preferred. (Not shown in diagram)

When modeling BGP : nodes represent BGP speaking routers, and 0 represents a node originating some address block

[Figure: example SPP with origin 0 and nodes 1 to 5. Permitted paths listed at each node, most preferred … least preferred:]
• node 1: 130, 10
• node 2: 210, 20
• node 3: 30
• node 4: 420, 430
• node 5: 5210

A Solution to a Stable Paths Problem

A solution is an assignment of permitted paths to each node such that
• node u’s assigned path is either the null path or is a path uwP, where wP is assigned to node w and {u,w} is an edge in the graph,
• each node is assigned the highest ranked path among those consistent with the paths assigned to its neighbors.

[Figure: the same example SPP (node 1: 130, 10; node 2: 210, 20; node 3: 30; node 4: 420, 430; node 5: 5210) with a solution marked]

A Solution need not represent a shortest path tree, or a spanning tree.
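The two conditions that define a solution can be checked mechanically on a small instance. The following is a minimal sketch, not the author's code: paths are written as Python tuples, the empty tuple stands for the null path, and the rankings in `EXAMPLE` are an assumed reading of the five-node diagram (the top entry at each node is taken as most preferred). The function names are hypothetical, and the search is brute force, so it only suits toy instances.

```python
from itertools import product

ORIGIN = 0
NULL = ()  # the null path: always permitted, always least preferred

# Permitted paths per node, most preferred first. This ranking is an
# assumed reading of the flattened slide diagram, not a confirmed one.
EXAMPLE = {
    1: [(1, 3, 0), (1, 0)],
    2: [(2, 1, 0), (2, 0)],
    3: [(3, 0)],
    4: [(4, 2, 0), (4, 3, 0)],
    5: [(5, 2, 1, 0)],
}


def best_choice(u, assignment, permitted):
    """Highest ranked permitted path of u that extends its next hop's assigned path."""
    for path in permitted[u]:
        next_hop = path[1]
        if assignment[next_hop] == path[1:]:
            return path
    return NULL


def is_solution(assignment, permitted):
    """True if every node holds exactly its best consistent path."""
    return all(assignment[u] == best_choice(u, assignment, permitted)
               for u in permitted)


def solutions(permitted):
    """Enumerate all solutions by brute force (exponential in the node count)."""
    nodes = sorted(permitted)
    found = []
    for combo in product(*[permitted[u] + [NULL] for u in nodes]):
        assignment = dict(zip(nodes, combo))
        assignment[ORIGIN] = (ORIGIN,)  # the origin always holds the trivial path
        if is_solution(assignment, permitted):
            found.append({u: assignment[u] for u in nodes})
    return found


if __name__ == "__main__":
    for solution in solutions(EXAMPLE):
        print(solution)
```

Under these assumed rankings the search finds a single solution, in which node 5 is left with the null path because node 2 settles on 20 rather than 210. That matches the remark that a solution need not be a spanning tree.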
An SPP may have multiple solutions

DISAGREE

[Figure: origin 0 with nodes 1 and 2, shown three times: the instance, the First solution, and the Second solution. Permitted paths, most preferred first:]
• node 1: 120, 10
• node 2: 210, 20

BAD GADGET : No Solution

[Figure: origin 0 with nodes 1, 2, 3 and 4. Permitted paths recovered from the diagram, most preferred first:]
• node 1: 130, 10
• node 2: 210, 20
• node 3: 320, 30

This is an SPP version of the example first presented in Persistent Route Oscillations in Inter-Domain Routing. Kannan Varadhan, Ramesh Govindan, and Deborah Estrin. Computer Networks, Jan. 2000

SURPRISE!

BGP is not robust : it is not guaranteed to recover from network failures.

[Figure: origin 0 with nodes 1 to 4. Permitted paths, most preferred first:]
• node 1: 130, 10
• node 2: 210, 20
• node 3: 3420, 30
• node 4: 40, 420, 430

Becomes a BAD GADGET if link (4, 0) goes down.

PRECARIOUS

[Figure: origin 0 with nodes 1 to 6. Permitted paths, most preferred first:]
• node 1: 120, 10
• node 2: 210, 20
• node 3: 310, 3120
• node 4: 4310, 453120, 43120
• node 5: 5310, 563120, 53120
• node 6: 6310, 643120, 63120

• Nodes 3 to 6: This part has a solution only when node 1 is assigned the direct path (1 0).
• Nodes 1 and 2: As with DISAGREE, this part has two distinct solutions

Has a solution, but path vector may not find it!

A Sufficient Condition for Robustness

P ⊑ Q : transitive closure of (subpath relation on permitted paths union the path ranking relation at each node)

Partially Partially Ordered (PPO):
For all paths P and Q, P ⊑ Q and Q ⊑ P implies (P = Q or head(P) = head(Q))

• This is a sufficient condition for robustness
• PPO iff ranking functions can be rewritten to be strictly increasing along all paths
• Checking PPO at the “language level” is an NP-Complete problem

Why is BGP not causing more trouble?

If the provider/customer digraph is acyclic and every AS obeys the commandments
• Thou shall prefer customer routes over all others
• Thou shall use provider routes only as a last resort
• Thou shall not provide transit between peers or providers
then the BGP configuration is robust.

[see Gao-Rexford and Gao-Griffin-Rexford]

Hierarchical BGP (HBGP)

[Figure: diagram relating four variants]
• HBGP + PEER + BU
• HBGP + BU
• HBGP + PEER
• HBGP

[GR2000, GGR2001]

Can BGP be fixed?

• BGP policy languages have evolved organically
• A policy language really should be designed!
• But how?
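The three commandments from "Why is BGP not causing more trouble?" translate into a local preference rule, an export rule, and an acyclicity check on the provider/customer digraph. The sketch below is illustrative only. The customer and peer preference values echo the AS 3 example earlier in the talk (100 and 90); the provider value of 80 and all function names are assumptions made for this example, not vendor configuration.

```python
from enum import Enum


class Rel(Enum):
    CUSTOMER = "customer"
    PEER = "peer"
    PROVIDER = "provider"


# Customer 100 and peer 90 follow the AS 3 slide; provider 80 is assumed.
LOCAL_PREF = {Rel.CUSTOMER: 100, Rel.PEER: 90, Rel.PROVIDER: 80}


def local_pref(learned_from):
    """Commandments 1 and 2: customer routes first, provider routes last."""
    return LOCAL_PREF[learned_from]


def may_export(learned_from, export_to):
    """Commandment 3: no transit between peers or providers.

    Customer routes may go to any neighbor; routes learned from a peer
    or a provider may only be passed down to customers.
    """
    return learned_from is Rel.CUSTOMER or export_to is Rel.CUSTOMER


def has_provider_cycle(providers_of):
    """Depth-first search for a cycle in the customer -> provider digraph."""
    UNSEEN, ACTIVE, DONE = 0, 1, 2
    state = {}

    def visit(asn):
        state[asn] = ACTIVE
        for provider in providers_of.get(asn, ()):
            seen = state.get(provider, UNSEEN)
            if seen == ACTIVE:
                return True  # reached an AS already on the current DFS path
            if seen == UNSEEN and visit(provider):
                return True
        state[asn] = DONE
        return False

    return any(state.get(asn, UNSEEN) == UNSEEN and visit(asn)
               for asn in providers_of)


if __name__ == "__main__":
    # Hypothetical hierarchy: the customer buys transit from B and C; C buys from A.
    hierarchy = {"customer": {"B", "C"}, "C": {"A"}}
    print(has_provider_cycle(hierarchy))
    print(may_export(Rel.PEER, Rel.PEER))
```

The point of keeping the rules this small is that each one is purely local except the cycle check, which needs a view of the whole provider/customer graph.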
Joint work with Aaron Jaggard (UPenn Math) and Vijay Ramachandran (Yale CS) to appear at SIGCOMM 2003

Design Dimensions

• Robustness (required!)
• Transparency (required!)
• Expressive Power
• Autonomy (“local wiggle room”)
• Local vs. Global Constraints
• Policy Opacity

Tradeoffs galore

General Autonomy

Suppose C and K are any predicates that partition all routes. Then it is possible to write policies, with no inbound filtering, such that for all imported routes, those that satisfy C are ranked below those that satisfy K.

A Partial Order for the Design Space

A design is a pair ( J, L ): J is the Global Constraint, L is the Local Constraint.

( J1 , L1 ) < ( J2 , L2 ) if and only if for all S : SPP
1. J2(S) implies J1(S)
2. L1(S) implies L2(S)

Robust Designs

( J, L ) is a robust design if (J and L) implies PPO

Examples:
• ( True, SP )
• ( PPO, True )

[Figure: design space with axes Expressive Power and Constraint Simplicity and a region marked Robust Subspace. ( PPO, True ) is labeled Not tractable; ( True, SP ) is labeled Tractable.]

Need Global Constraints

Theorem: Any robust system supporting both transparency and autonomy must have a non-trivial global constraint

Global constraints must be a part of design from the start

Next?

• Need techniques for constructing policy languages.
• Design of protocols to enforce global constraints.
• Can ad-hocery be avoided?
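The robust design ( True, SP ) can be illustrated on one small instance by testing the PPO condition directly. This is a rough sketch under stated assumptions: the relation is built so that a subpath precedes its extension and a more preferred path precedes a less preferred one at the same node, SP is read as "rank permitted paths by length", and `ppo_violations` is a hypothetical helper name. It checks a single SPP instance only and says nothing about the language-level problem, which the talk states is NP-Complete.

```python
def ppo_violations(permitted):
    """Return pairs of paths that break the PPO condition on one SPP instance.

    permitted maps each node to its permitted paths (tuples ending at
    origin 0), most preferred first, with no ties.
    """
    paths = [p for ranked in permitted.values() for p in ranked]
    known = set(paths)
    # reach[p] holds every path q with p "before" q; it starts reflexive.
    reach = {p: {p} for p in paths}

    for ranked in permitted.values():
        for i, better in enumerate(ranked):
            reach[better].update(ranked[i + 1:])  # ranking relation
    for p in paths:
        for k in range(1, len(p) - 1):
            suffix = p[k:]
            if suffix in known:
                reach[suffix].add(p)  # subpath relation

    # Warshall-style transitive closure over the union of both relations.
    for k in paths:
        for i in paths:
            if k in reach[i]:
                reach[i] |= reach[k]

    # PPO fails for P, Q when each precedes the other, P != Q, and heads differ.
    return sorted(
        (p, q)
        for p in paths
        for q in reach[p]
        if p < q and p in reach[q] and p[0] != q[0]
    )


# DISAGREE as ranked on the slide: each node prefers the path via the other.
DISAGREE = {1: [(1, 2, 0), (1, 0)], 2: [(2, 1, 0), (2, 0)]}
# Same permitted paths, re-ranked by length (the assumed reading of SP).
DISAGREE_SP = {1: [(1, 0), (1, 2, 0)], 2: [(2, 0), (2, 1, 0)]}

if __name__ == "__main__":
    print(ppo_violations(DISAGREE))
    print(ppo_violations(DISAGREE_SP))
```

With the slide's rankings the four DISAGREE paths form a cycle (20 before 120 before 10 before 210 before 20), so paths with different heads end up mutually ordered and the condition fails. Re-ranking by length removes the cycle, which is the sense in which a purely local shortest-path constraint needs no global constraint.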