New Routing Architectures
Jennifer Rexford
Advanced Computer Networks
http://www.cs.princeton.edu/courses/archive/fall08/cos561/
Tuesdays/Thursdays 1:30pm-2:50pm

Outline
• Changing the routing architecture
  – Why?
  – Where and how?
• Example architectures
  – Removing routing from routers
  – Hybrid Link-state/Path-vector
  – Resilient Overlay Routing

Why Change Routing?
• Better performance
  – Scalability, security, convergence, reliability, flexibility, …
• Simpler management
  – For network operators
  – For folks deploying services
• Greater extensibility
  – To enable experimentation
  – To enable new services

What to Change, and Where?
• Add another layer above network routing
  – Routing functionality in overlay networks
• Change the routing protocols
  – To improve scalability, security, convergence, …
• Change the division of functionality
  – Data, control, and management planes
• Change the division of responsibility
  – End users, third parties, and service providers
• ???

Removing Routing from Routers: Routing Control Platform, Routing as a Service, 4D Control Plane, Ethane, …

Network Operators
• Network-wide views
  – Network topology (e.g., routers, links)
  – Mapping to lower-level equipment
  – Traffic matrix
• Network-level objectives
  – Load balancing
  – Survivability
  – Reachability
  – Security
• Direct control
  – Explicit configuration of data-plane mechanisms

What Should Routers Do?
• Forward packets: yes
  – Must be done at high speed
  – … in line-card hardware on fast routers
  – So, needs to be done on the routers
• Collect measurement data: yes
  – Traffic statistics
  – Topology information
• Compute routes: no???
  – Distributed computation of forwarding tables
  – Doesn’t inherently need to run on the routers

Reasons to Remove Routing From Routers
• Routing is hard to do in a distributed fashion
  – Beyond single-path and/or shortest-path routing
• Difficult to make load-sensitive routing stable
  – Over-reacting to out-of-date information
• Poor visibility to drive good decisions
  – Incomplete local views of topology and load
• Not flexible enough for end users
  – Cannot easily select customized routes
• Difficult to extend over time
  – Hard-coded into the underlying routers

Routing Control Platform
• Goal: Move beyond today’s artifacts, while remaining compatible with the legacy routers
• RCP computes routes for the routers
  – Network-wide visibility and control
• Backwards compatibility
  – RCP speaks to routers using the BGP protocol
[Figure: labels "RCP" and "AS 2"]

Example Services
• Selective denial-of-service attack blackholing
  – Identify entry point and victim of attack
  – Drop offending traffic at the entry point
• Planned maintenance dryout
  – Drain traffic off of an edge router
  – Before bringing it down for maintenance
• Flexible egress point selection
  – Multiple ways to reach the same destination
  – Giving customers control over the decision
• Enhanced interdomain routing security
  – Anomaly detection or security protocols

Routing As a Service
• Goal: third parties pick end-to-end paths for clients to satisfy diverse user objectives
• Forwarding infrastructure
  – Basic routing (e.g., default routing)
  – Primitives for inserting routes
• Route selector
  – Aggregates network information
  – Selects routes on behalf of clients
  – Competes with other selectors for customers
• End host
  – Queries route selector to set up paths

Feasibility
• Fast reaction to failures
  – Routers are closer to the failures
  – Can a service react quickly enough?
• Scalability with network size
  – State and computation grow with the topology
  – Can a service manage a large network?
• Reliability?
  – Service is now a point of failure
  – Is simple replication enough?
• Security?
  – Service is now a natural point of attack
  – Easier (or harder) to protect than the routers?

Improving BGP Convergence

Routing Change: Before and After
[Figure: destination AS 0 and ASes 1, 2, 3. Before: AS 1 uses (1,0), AS 2 uses (2,0), AS 3 uses (3,1,0). After: AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0).]

Routing Change: Path Exploration
• AS 1
  – Delete the route (1,0)
  – Switch to next route (1,2,0)
  – Send route (1,2,0) to AS 3
• AS 3
  – Sees (1,2,0) replace (1,0)
  – Compares to route (2,0)
  – Switches to using AS 2
[Figure: AS 1 uses (1,2,0), AS 2 uses (2,0), AS 3 uses (3,2,0)]

Routing Change: Path Exploration
• Initial situation
  – Destination 0 is alive
  – All ASes use direct path
• When destination dies
  – All ASes lose direct path
  – All switch to longer paths
  – Eventually withdrawn
• E.g., AS 2
  – (2,0) → (2,1,0)
  – (2,1,0) → (2,3,0)
  – (2,3,0) → (2,1,3,0)
  – (2,1,3,0) → null
[Figure: candidate paths per AS. AS 1: (1,0), (1,2,0), (1,3,0). AS 2: (2,0), (2,1,0), (2,3,0), (2,1,3,0). AS 3: (3,0), (3,1,0), (3,2,0).]

Convergence Overhead and Delay
• Path exploration is expensive
  – Large number of possible paths
  – Might have to explore (nearly) all of them
• Much slower than link-state routing
  – Simply floods the topology
  – And routers compute shortest path
• Any way to reduce BGP convergence time?
  – Avoid exploring paths with the same failure?
  – Hybrids of path vector and link state?

HLP: Hybrid Link-state/Path-vector
• Assume hierarchical AS structure
  – Provider-customer relationships dominate
  – And some peer-peer edges
  – (Are we willing to cook in these assumptions?)
• Hybrid of link state and path vector
  – Link state within a sub-tree
  – Path vector across peer-peer links
• Route on AS numbers
  – Rather than prefixes

Add New Features in an Overlay: Resilient Overlay Networks

Overlay Networks
[Two figure-only slides]

RON: Resilient Overlay Networks
Premise: by building application overlay network, can increase performance and reliability of routing
[Figure: hosts at Princeton, Yale, and Berkeley; labels "application-layer router" and "Two-hop (application-level) Berkeley-to-Princeton route"]
http://nms.csail.mit.edu/ron/

RON Circumvents Policy Restrictions
• IP routing depends on AS routing policies
  – But hosts may pick paths that circumvent policies
[Figure: labels "USLEC", "ISP", "PU", "Patriot", "me", "My home computer"]

RON Adapts to Network Conditions
• Start experiencing bad performance
  – Then, start forwarding through intermediate host
[Figure: hosts A, B, C]

RON Customizes to Applications
• VoIP traffic: low-latency path
• Bulk transfer: high-bandwidth path
[Figure: hosts A, B, C; label "bulk transfer"]

How Does RON Work?
• Keeping it small to avoid scaling problems
  – A few friends who want better service
  – Just for their communication with each other
  – E.g., VoIP, gaming, collaborative work, etc.
• Send probes between each pair of hosts
[Figure: hosts A, B, C]

How Does RON Work?
• Exchange the results of the probes
  – Each host shares results with every other host
  – Essentially running a link-state protocol!
  – So, every host knows the performance properties
• Forward through intermediate host when needed
[Figure: hosts A, B, C]

RON Works in Practice
• Faster reaction to failure
  – RON reacts in a few seconds
  – BGP sometimes takes a few minutes
• Single-hop indirect routing
  – No need to go through many intermediate hosts
  – One extra hop circumvents the problems
• Better end-to-end paths
  – Circumventing routing policy restrictions
  – Sometimes the RON paths are actually shorter

RON Limited to Small Deployments
• Extra latency through intermediate hops
  – Software delays for packet forwarding
  – Propagation delay across the access link
• Overhead on the intermediate node
  – Imposing CPU and I/O load on the host
  – Consuming bandwidth on the access link
• Overhead for probing the virtual links
  – Bandwidth consumed by frequent probes
  – Trade-off probe overhead vs. detection speed
• Possibility of causing instability
  – Moving traffic in response to poor performance
  – May lead to congestion on the new paths

Future Routing Architecture
• Who is in charge?
  – Network administrators?
  – End hosts?
  – Third-party overlays?
  – Third-party routing providers?
• Build on top of today’s network?
  – New AS-level control plane?
  – Overlays on top of existing Internet?
• Assume (restricted) economic models?
  – To improve scalability and convergence?
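
Sketch: RON Path Selection

The RON slides describe hosts probing every pairwise virtual link, sharing the results, and forwarding through a single intermediate host when the direct path performs badly. A minimal sketch of that choice follows, assuming latency is the only metric and that a failed link is recorded as `None`. The names `best_path`, `HOSTS`, and `PROBES` and the sample numbers are hypothetical and are not taken from the RON implementation.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical probe table: measured latency in ms for each directed
# virtual link; None means probes on that link are currently failing.
Latency = Dict[Tuple[str, str], Optional[float]]


def best_path(src: str, dst: str, hosts: List[str],
              latency: Latency) -> Optional[List[str]]:
    """Return the lowest-latency path using at most one intermediate host."""
    candidates = []

    # Direct path, if the probes say it is usable.
    direct = latency.get((src, dst))
    if direct is not None:
        candidates.append((direct, [src, dst]))

    # Single-hop indirect paths through every other overlay host.
    for mid in hosts:
        if mid in (src, dst):
            continue
        first = latency.get((src, mid))
        second = latency.get((mid, dst))
        if first is not None and second is not None:
            candidates.append((first + second, [src, mid, dst]))

    if not candidates:
        return None  # no usable direct or one-hop path
    # min() keeps the earliest entry on ties, so the direct path wins a tie.
    return min(candidates, key=lambda c: c[0])[1]


# Illustrative data only: the direct A-to-C link is failing.
HOSTS = ["A", "B", "C"]
PROBES: Latency = {("A", "C"): None, ("A", "B"): 20.0, ("B", "C"): 30.0}
```

With this data, `best_path("A", "C", HOSTS, PROBES)` routes through B because the direct link is marked as failed. For a bulk-transfer objective the same structure would apply with a bandwidth metric: take the minimum over the two hops and pick the maximum across candidates.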