Consensus Routing
Antonio-Gabriel Sturzu, SCPD

Table of Contents
Introduction
Consistency Issues
Consensus Routing Overview
Stable Mode
Transient Mode
Performance and Overhead

Introduction
Internet routing, especially interdomain routing, has favored responsiveness over consistency
In interdomain routing, a router applies a received update to its forwarding table immediately, before propagating it to other routers
BGP updates are known to cause up to 30% packet loss for two minutes or more after a routing change
Transient loops account for 90% of all packet loss

Introduction(2)
The primary contribution of the article is that it separates the concept of safety from that of liveness, associating consistency with safety and responsiveness with liveness
Safety (consistency) means that a router forwards a packet along a path adopted by the upstream routers
Liveness means that the system reacts quickly to failures or policy changes
Separating safety and liveness improves end-to-end availability
The two properties are obtained through the stable and transient forwarding modes

Consistency Issues
BGP link failures

Consistency Issues(2)
BGP policy change

Consistency Issues(3)
iBGP link recovery
Such blackholes can cause packet loss for tens of seconds

Consistency Issues(4)
BGP policy cycles

Consensus Routing Overview
Forwards packets using
  – Stable mode
  – Transient mode
Consensus routers simply log the new routes computed by the policy engine
Periodically, all routers engage in a distributed coordination algorithm that determines the most recent set of complete updates

Consensus Routing Overview(2)
The coordination is based on classical distributed snapshot and consensus algorithms
The routers use the output of the coordination to compute a set of stable forwarding tables (SFTs) that are guaranteed to be consistent

Stable Mode
The distributed coordination algorithm proceeds in epochs
Steps of an epoch k:
  – Update log
  – Distributed snapshot
      The snapshot is a globally consistent view of all the updates in the system (complete or incomplete)
  – Frontier computation
      Aggregation
      Consensus
      Flood

Stable Mode(2)
  – SFT computation
  – View change
      Versioning
      Garbage collection

Router State
Routing Information Base (RIB)
  Stores for each destination
  – Route update received from each neighbor
  – Locally selected best route
  – Route advertised to each neighbor
History
  Stores for each destination a chronological list of received and selected routes in the RIB
SFTs
  Store for each destination the next-hop interfaces corresponding to the stable routes

Router State(2)
Triggers
  – Globally unique identifier for a set of causally related events propagating through the network
  – (AS number, trigger number)
In consensus routing, each update carries a trigger that is associated with the route being implicitly withdrawn and replaced by the route announced in the update
The trigger tracks when the implicit withdrawal is complete

Router State(3)
To maintain the safety property, an AS A generates a new trigger to be sent along with an update upon
  – A failure of the next hop in A's current route to the destination
  – A policy change that causes A to prefer another route to the destination over the current one
  – Receiving a route from a neighbor B that it prefers over its current route via a different neighbor C

Update Processing

Distributed Snapshot

Frontier Computation
Aggregation
  – Each AS sends its set of triggers (complete or incomplete) to the consolidators
Consensus
  – Using a set of consolidators ensures that
      There is no single point of failure
      No single AS is trusted with the task of consolidating the snapshot
      A consolidator is reachable from every AS with high probability
When consensus ends, the consolidators use the snapshot reports to compute the set I of incomplete triggers in the network

Frontier Computation(2)
The set I is computed using the following idea:
  – A trigger depends on all triggers that precede it in the history table
  – A trigger t is complete only if neither t nor any of its predecessors is incomplete
Flood
  – The set I of incomplete triggers and the set S of ASes that successfully participated in the distributed snapshot are sent to all ASes

Building SFTs

Transient Mode
Routing deflections
Backtracking
Detour routing
Backup routes
  – Use R-BGP
  – Choosing the backup route that is most link-disjoint from the primary route protects against single link failures

Performance
Link failures
  – For BGP, 13% of failures cause at least half of all ASes to experience routing loops
  – For consensus routing with transient forwarding
      Backtracking enables continuous connectivity for at least 74% of all ASes following 99% of failure cases
      With detouring, connectivity is 98.5%
      With backup routes, connectivity is 98%

Performance
Policy change
  – For BGP, in more than 55% of the test cases ASes were disconnected from the destination due to transient loops formed during convergence
  – Consensus routing transitions from one set of consistent, loop-free routes to another, completely avoiding transient loops

Overhead
Volume of control traffic

Overhead(2)
Cost of consensus
  – For 9 nodes, all the nodes learned the agreed value in under 450 milliseconds
  – For 18 and 27 nodes, the times were 1.4 and 1.8 seconds, respectively
Path dilation
  – Measures how far packets have to be redirected

Overhead(3)
Path dilation

Overhead(4)
Response time
A 30-second epoch results in more than 90% of the paths being adopted in less than 2 minutes

Overhead(5)
Implementation overhead
  – Consensus routing adds 8% to update processing and about 11% additional lines of code to the BGP implementation
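
Appendix: sketch of the incomplete-trigger computation
The dependency rule from the Frontier Computation(2) slide (a trigger depends on every trigger that precedes it in a history table, and is complete only if neither it nor any predecessor is incomplete) can be illustrated with a small fixed-point computation. This is a minimal sketch under stated assumptions, not the consolidators' actual implementation: the function name incomplete_closure, the representation of each history as a plain chronological list, and the sample triggers are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# A trigger is identified by (AS number, trigger number), as on the Triggers slide.
Trigger = Tuple[int, int]


def incomplete_closure(
    histories: Iterable[List[Trigger]],
    reported_incomplete: Set[Trigger],
) -> Set[Trigger]:
    """Return the set I: every trigger that is reported incomplete or that
    depends (directly or transitively) on an incomplete trigger.

    Assumption: each history is a chronological list, so a trigger depends
    on every trigger that appears before it in the same list.
    """
    # Collect, for each trigger, all triggers that precede it in some history.
    predecessors: Dict[Trigger, Set[Trigger]] = {}
    for history in histories:
        seen: List[Trigger] = []
        for trig in history:
            predecessors.setdefault(trig, set()).update(seen)
            seen.append(trig)

    # Propagate incompleteness until nothing changes (fixed point).
    incomplete = set(reported_incomplete)
    changed = True
    while changed:
        changed = False
        for trig, preds in predecessors.items():
            if trig not in incomplete and preds & incomplete:
                incomplete.add(trig)
                changed = True
    return incomplete


# Hypothetical example: two history tables and one trigger reported incomplete.
histories = [
    [(1, 1), (2, 1), (3, 1)],
    [(4, 1), (3, 1), (5, 1)],
]
I = incomplete_closure(histories, {(2, 1)})
print(sorted(I))  # [(2, 1), (3, 1), (5, 1)]
```

In the example, (3, 1) follows the incomplete trigger (2, 1) in the first history, and (5, 1) follows (3, 1) in the second, so both end up in I even though only (2, 1) was reported incomplete. Triggers (1, 1) and (4, 1) have no incomplete predecessor and remain complete.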