Evolving Toward a Self-Managing Network
Jennifer Rexford
Princeton University
http://www.cs.princeton.edu/~jrex

Why is Network Management So Darn Hard?
• Oodles and oodles of complex features
  – Many protocols
  – Many mechanisms
  – Many configurable parameters
• Little guidance for network administrators
  – How to select and compose features?
  – How to set the configurable parameters?
• Managing boxes, rather than networks
  – Routers, switches, firewalls, IDSes, servers, etc.
  – Low-level, box-specific configuration languages

The Enemy is Complexity
• Goal: raising the level of abstraction
  – Network-level design and configuration
  – Composition of protocols and mechanisms
• Idea #1: add abstraction on top
  – Compile a high-level spec into box configuration
  – But must grapple with inherent complexity
• Idea #2: design the system for manageability
  – Identify network-level abstractions
  – … and change the boxes and protocols
  – But must grapple with backward compatibility

Example: Border Gateway Protocol
• ASes exchange reachability information
  – IP prefix: block of destination IP addresses
  – AS path: sequence of ASes along the path
• Configurable routing policies
  – Path selection (which route to use?)
  – Path export (who to tell about the route?)
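The two policy knobs on the BGP example slide, path selection and path export, can be sketched in a few lines of Python. This is a hypothetical, heavily simplified model rather than a real BGP implementation: the `Route` class, the `local_pref` default of 100, the helper names `receive`, `select_best`, and `export`, and the neighbor AS 64512 are illustrative assumptions, and the decision process is reduced to local preference, then AS-path length, then a deterministic tie-break.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str                 # block of destination IP addresses
    as_path: Tuple[int, ...]    # sequence of ASes, nearest AS first
    local_pref: int = 100       # hypothetical local policy knob (higher wins)


def receive(my_as: int, route: Route) -> Optional[Route]:
    """Discard a route whose AS path already contains this AS (loop prevention)."""
    return None if my_as in route.as_path else route


def select_best(candidates: List[Route]) -> Optional[Route]:
    """Path selection (simplified): highest local_pref, then shortest AS path,
    then lowest neighbor AS number as a deterministic tie-break."""
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: (-r.local_pref, len(r.as_path), r.as_path[0]))


def export(my_as: int, best: Route, neighbor: int, allowed: Set[int]) -> Optional[Route]:
    """Path export: advertise only to neighbors the policy allows, prepending
    this AS to the path. local_pref is not carried to the neighbor, so the
    exported route gets the default value."""
    if neighbor not in allowed:
        return None
    return Route(best.prefix, (my_as,) + best.as_path)


# AS 88 originates the prefix; the route travels 88 -> 1 -> 7018 -> a neighbor.
origin = Route("12.34.158.0/24", (88,))
at_as1 = receive(1, origin)
to_7018 = export(1, select_best([at_as1]), 7018, allowed={7018})
at_7018 = receive(7018, to_7018)
advertised = export(7018, select_best([at_7018]), 64512, allowed={64512})
print(advertised.as_path)   # (7018, 1, 88)
```

Running it reproduces the announcement “12.34.158.0/24: path (7018,1,88)”, while data traffic toward 12.34.158.5 follows the same ASes in the reverse order. Note that the policy lives in each AS's `select_best` and `export` choices, which is exactly why it has only an indirect effect on where traffic goes.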
[Figure: AS 88 announces “12.34.158.0/24: path (88)”; after propagating through AS 1, AS 7018 announces “12.34.158.0/24: path (7018,1,88)”. Data traffic toward 12.34.158.5 flows in the reverse direction, through ASes 7018, 1, and 88.]

Some Things I Hate About BGP…
Too distributed:
• Routers in an AS have different views
  – Effect: protocol oscillation and loops
  – Point fix: testing sufficient conditions
• Routing policy distributed across routers
  – Effect: routers need to share information
  – Point fix: complex “tagging” of BGP routes
Too indirect:
• Policy has only an indirect effect on traffic
  – Effect: selecting the right policy is hard
  – Point fix: “what if” tools for traffic engineering
• BGP route selection depends on the IGP
  – Effect: disruptions from small internal changes
  – Point fix: “what if” tools to identify risks

Interdomain Routing: Design for Manageability
• Routing Control Platform (RCP)
  – Represents the AS to others
  – Has a complete view of candidate routes
  – Computes answers for the AS’s routers
• Communicates with other ASes
  – Using BGP or (ideally) a brand-new protocol
[Figure: ASes 1, 2, and 3, each with its own RCP; the ASes are connected by physical peering, and the RCPs talk to one another over an inter-AS protocol.]

Advantages of the RCP Approach
• Lower management complexity
  – Complete, network-wide view
  – Direct control over the routers
  – Single specification of policies and objectives
• Simpler routers
  – Much less control-plane software
  – Much less configuration state
• Enabling innovation
  – New algorithms for selecting paths within an AS
  – New approaches to inter-AS routing

Deployability: Backward Compatibility Using BGP
• Border Gateway Protocol (BGP)
  – Protocol: messages sent between routers
  – Decision logic: route-selection process
  – Policy: configurable rules for path selection/export
• The key point is that BGP has
  – Complex decision logic and policies
  – Yet a simple protocol (and message format)
  – Use BGP messages to “program” the routers

Phase 1: Flexible Path Selection in One AS
• Before: conventional use of BGP in a backbone network
  – [Figure: eBGP sessions at the edge, iBGP inside the AS]
• After: RCP learns routes and sends answers to routers
  – [Figure: eBGP at the edge; the RCP speaks iBGP to the routers]

Phase 2: AS-Wide Path Selection and Export
• Before: RCP gets “best” iBGP routes (and an IGP feed)
  – [Figure: eBGP, RCP, iBGP]
• After: RCP gets all eBGP routes from neighbors
  – [Figure: eBGP, RCP, iBGP]

Phase 3: Direct Communication Between RCPs
• Before: RCP gets all eBGP routes from neighbors
  – [Figure: eBGP, RCP, iBGP]
• After: ASes exchange routes via RCP
  – [Figure: the RCPs of ASes 1, 2, and 3 exchange routes over an inter-AS protocol across the physical peering; each RCP uses iBGP to reach its own routers]

Systems Considerations (NSDI’05)
• Reliability
  – Problem: single point of failure
  – Solution: replication of RCP components
• Consistency
  – Problem: inconsistent decisions by replicas
  – Solution: consistency without an inter-replica protocol
• Scalability
  – Problem: storing routes and computing answers for all routers
  – Solution: store each route once and amortize work

Example Network Management Applications
• Customer-driven route selection
  – Customized load-balancing policies
  – Geographic rules for route selection
• Blocking denial-of-service attacks
  – “Blackhole” routes that drop traffic
  – Only for routers carrying attack traffic
• Hitless maintenance
  – Move traffic away from certain routers
  – Before the operators bring down the routers

Conclusion
• Network management is too hard
  – IP was not designed for management
  – Complex, distributed operation of routers
• Must reduce complexity
  – Network-wide views and objectives
  – Direct control over the data plane
• RCP approach is feasible
  – Deployable, scalable, and reliable
  – Solves important management problems
• Many interesting open problems

Backup Slides

Routing Control Platform (RCP)
[Figure: RCP architecture. A BGP Engine exchanges BGP updates with the routers and passes route options to the Route Control Server (RCS), which returns answers; an OSPF Viewer receives OSPF link-state advertisements and supplies the network topology to the RCS.]

Scalability: Standard Computing Platform
• Prototype on a high-end PC
  – 3.2 GHz Pentium 4 with 8 GB of RAM
  – Running the Linux 2.6.5 kernel
• Workload from the AT&T backbone
  – Replay the BGP and OSPF messages
• Good RCP performance
  – Memory usage: less than 2 GB
  – Speed, BGP changes: less than 40 msec
  – Speed, topology changes: 0.1-0.8 seconds
• Short answer: the system can keep up

Reliability: Replication and Consistency
• Replication: avoid a single point of failure
  – Multiple RCPs in a network
  – Connected at different places
• Consistency: no explicit coordination
  – Each replica has a full view of each partition
  – Replicas perform the same algorithm on the same data, and so get the same answer
[Figure: replicas RCP A and RCP B; each router is labeled with the replica(s) that reach it: A, B, or both A and B.]
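The consistency argument on the reliability slide can be made concrete with a small sketch of an RCP replica computing one answer per router. This is a hypothetical, simplified model, not the RCP implementation: the function names `igp_distances` and `rcp_answers`, the dictionary encodings of the topology and routes, and the reduced decision process (shortest AS path, then smallest IGP distance to the egress router, then fixed tie-breaks) are all assumptions. What it illustrates is that a deterministic selection function with a total tie-break order lets replicas that see the same BGP routes and IGP topology agree without any inter-replica protocol.

```python
import heapq
from typing import Dict, List, Tuple

Graph = Dict[str, Dict[str, int]]        # IGP view: router -> {neighbor: link weight}
Candidate = Tuple[str, Tuple[int, ...]]  # (egress router, AS path) learned via eBGP


def igp_distances(graph: Graph, source: str) -> Dict[str, int]:
    """Dijkstra shortest-path distances from `source` over the IGP topology."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def rcp_answers(graph: Graph, candidates: List[Candidate]) -> Dict[str, Candidate]:
    """Pick one route per router: shortest AS path, then nearest egress
    (smallest IGP distance), then egress name and AS path as final tie-breaks.
    Egresses a router cannot reach (another partition) are ignored."""
    answers = {}
    for router in sorted(graph):
        dist = igp_distances(graph, router)
        reachable = [c for c in candidates if c[0] in dist]
        if reachable:
            answers[router] = min(
                reachable, key=lambda c: (len(c[1]), dist[c[0]], c[0], c[1]))
    return answers


topology = {"A": {"B": 1, "C": 5}, "B": {"A": 1, "C": 1}, "C": {"B": 1, "A": 5}}
routes = [("A", (7018, 88)), ("C", (1, 88))]

# Two replicas, no coordination, same inputs delivered in different orders.
replica_1 = rcp_answers(topology, routes)
replica_2 = rcp_answers(dict(topology), list(reversed(routes)))
assert replica_1 == replica_2
print(replica_1)
# {'A': ('A', (7018, 88)), 'B': ('A', (7018, 88)), 'C': ('C', (1, 88))}
```

Because every tie is broken by a fixed key rather than by arrival order, feeding the routes in a different order does not change the result. An egress outside a router's IGP partition is never a candidate for that router, which loosely mirrors the slide's point that a replica needs a full view of each partition it serves.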