Evolving Toward a Self-Managing Network Jennifer Rexford Princeton University

http://www.cs.princeton.edu/~jrex
Why is Network Management So Darn Hard?
• Oodles and oodles of complex features
– Many protocols
– Many mechanisms
– Many configurable parameters
• Little guidance for network administrators
– How to select and compose features?
– How to set the configurable parameters?
• Managing boxes, rather than networks
– Routers, switches, firewalls, IDSes, servers, etc.
– Low-level, box-specific configuration languages
The Enemy is Complexity
• Goal: raising the level of abstraction
– Network-level design and configuration
– Composition of protocols and mechanisms
• Idea #1: add abstraction on top
– Compile high-level spec into box configuration
– But, must grapple with inherent complexity
• Idea #2: design system for manageability
– Identify network-level abstractions
– … and change the boxes and protocols
– But, must grapple with backwards compatibility
Example: Border Gateway Protocol
• ASes exchange reachability information
– IP prefix: block of destination IP addresses
– AS path: sequence of ASes along the path
• Configurable routing policies
– Path selection (which route to use?)
– Path export (who to tell about the route?)
[Figure: AS 88 originates “12.34.158.0/24: path (88)”; after passing through AS 1, AS 7018 advertises “12.34.158.0/24: path (7018,1,88)”. Data traffic toward 12.34.158.5 flows in the opposite direction.]
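The path growth in this example can be sketched in a few lines of Python. This is a minimal illustration, not router code: the `Route` class and the `advertise` and `accept` helpers are hypothetical names, and real BGP carries many more attributes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Route:
    prefix: str
    as_path: Tuple[int, ...]


def advertise(route: Route, local_as: int) -> Route:
    """Prepend the local AS number before exporting the route to a neighbor AS."""
    return Route(route.prefix, (local_as,) + route.as_path)


def accept(route: Route, local_as: int) -> Optional[Route]:
    """Reject a route whose AS path already contains the local AS (a loop)."""
    return None if local_as in route.as_path else route


# AS 88 originates the prefix; AS 1 and then AS 7018 pass it along.
from_88 = advertise(Route("12.34.158.0/24", ()), 88)
from_1 = advertise(from_88, 1)
from_7018 = advertise(from_1, 7018)
print(from_88.as_path)    # (88,)
print(from_7018.as_path)  # (7018, 1, 88)
```

Each AS adds itself to the front of the path, so the path doubles as a loop detector: AS 1 would reject the route if it ever came back.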
Some Things I Hate About BGP…
• Routers in an AS have different views
– Effect: protocol oscillation and loops
– Point fix: testing sufficient conditions
• Routing policy distributed across routers
– Effect: routers need to share information
– Point fix: complex “tagging” of BGP routes
• Policy has only an indirect effect on traffic
– Effect: selecting the right policy is hard
– Point fix: “what if” tools for traffic engineering
• BGP route selection depends on the IGP
– Effect: disruptions from small internal changes
– Point fix: “what if” tools to identify risks
[Slide callouts: “Too distributed” and “Too indirect”]
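The dependence of BGP route selection on the IGP can be shown with a toy example. It assumes two egress points with otherwise equally good BGP routes, so the router falls back on the lowest IGP cost. The egress names, the costs, and the `pick_egress` helper are made up for illustration.

```python
from typing import Dict, List


def pick_egress(igp_cost: Dict[str, int], egresses: List[str]) -> str:
    """Among equally good BGP routes, pick the egress with the lowest IGP cost.

    Ties are broken on the egress name so the result is deterministic.
    """
    return min(egresses, key=lambda e: (igp_cost[e], e))


egresses = ["east", "west"]
before = {"east": 10, "west": 12}
after = {"east": 13, "west": 12}  # one internal link weight changed

print(pick_egress(before, egresses))  # east
print(pick_egress(after, egresses))   # west
```

A change of three cost units inside the network moves all of this router's traffic for the prefix to a different exit point, even though no BGP route changed.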
Interdomain Routing: Design for Manageability
• Routing Control Platform
– Represents the AS to others
– Has complete view of candidate routes
– Computes answers for the AS’s routers
• Communicates with other ASes
– Using BGP or (ideally) a brand new protocol
[Figure: AS 1, AS 2, and AS 3 each have an RCP. The RCPs communicate over an inter-AS protocol, while the ASes remain connected by physical peering.]
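A rough sketch of "computes answers for the AS's routers", assuming the RCP holds every candidate route plus the IGP cost from each router to each egress. The ranking is a heavily simplified stand-in for the BGP decision process, and `Candidate`, `rcp_assign`, and the sample data are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Candidate:
    egress: str
    local_pref: int
    as_path: Tuple[int, ...]


def rcp_assign(
    candidates: List[Candidate],
    igp_cost: Dict[str, Dict[str, int]],
) -> Dict[str, Candidate]:
    """Choose one route per router from the complete set of candidates.

    Simplified ranking: highest local preference, then shortest AS path,
    then lowest IGP cost from that router to the egress, then egress name.
    """
    answers = {}
    for router, costs in igp_cost.items():
        answers[router] = min(
            candidates,
            key=lambda c: (-c.local_pref, len(c.as_path), costs[c.egress], c.egress),
        )
    return answers


candidates = [
    Candidate("e1", 100, (1, 88)),
    Candidate("e2", 100, (2, 88)),
    Candidate("e3", 90, (88,)),
]
igp_cost = {
    "r1": {"e1": 5, "e2": 20, "e3": 1},
    "r2": {"e1": 30, "e2": 4, "e3": 1},
}
answers = rcp_assign(candidates, igp_cost)
print({router: route.egress for router, route in answers.items()})
# {'r1': 'e1', 'r2': 'e2'}
```

Because one function sees every candidate and every router, all answers come from the same data, which is the point of the network-wide view.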
Advantages of RCP Approach
• Lower management complexity
– Complete, network-wide view
– Direct control over the routers
– Single specification of policies and objectives
• Simpler routers
– Much less control-plane software
– Much less configuration state
• Enabling innovation
– New algorithms for selecting paths within an AS
– New approaches to inter-AS routing
Deployability: Backwards Compatibility using BGP
• Border Gateway Protocol (BGP)
– Protocol: messages sent between routers
– Decision logic: route-selection process
– Policy: configurable rules for path selection/export
• The key point is that BGP has
– Complex decision logic and policies
– Yet a simple protocol (and message format)
– So the RCP can use BGP messages to “program” the routers
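One way to picture "programming" a router with ordinary BGP messages: if the RCP is a router's only source of routes for a prefix and sends a single route, the router's own decision process has nothing left to choose. The sketch below assumes exactly that situation. `router_decision` is a simplified stand-in for the router's logic, and all names and addresses are illustrative.

```python
from typing import Dict, List


def router_decision(received: List[Dict]) -> Dict:
    """Simplified router logic: shortest AS path, then lowest next hop."""
    return min(received, key=lambda r: (len(r["as_path"]), r["next_hop"]))


def rcp_program(chosen: Dict) -> List[Dict]:
    """The RCP announces only the route it wants the router to use."""
    return [chosen]


short = {"as_path": (88,), "next_hop": "10.0.0.1"}
long_ = {"as_path": (1, 88), "next_hop": "10.0.0.2"}

# Left to itself, the router picks the shorter path.
print(router_decision([short, long_])["next_hop"])         # 10.0.0.1
# Given only the RCP's choice, the router installs that route.
print(router_decision(rcp_program(long_))["next_hop"])     # 10.0.0.2
```

The router software is unchanged; only the set of routes it hears is controlled.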
Phase 1: Flexible Path Selection in One AS
Before: conventional use of BGP in backbone network
[Figure labels: eBGP, iBGP]
After: RCP learns routes and sends answers to routers
[Figure labels: eBGP, RCP, iBGP]
Phase 2: AS-Wide Path Selection and Export
Before: RCP gets “best” iBGP routes (and IGP feed)
[Figure labels: eBGP, RCP, iBGP]
After: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP]
Phase 3: Direct Communication Between RCPs
Before: RCP gets all eBGP routes from neighbors
[Figure labels: eBGP, RCP, iBGP]
After: ASes exchange routes via RCP
[Figure: AS 1, AS 2, and AS 3 each have an RCP; labels show an inter-AS protocol between the RCPs, iBGP, and physical peering between the ASes]
Systems Considerations (NSDI’05)
• Reliability
– Problem: single point of failure
– Solution: replication of RCP components
• Consistency
– Problem: inconsistent decisions by replicas
– Solution: consistency without inter-replica protocol
• Scalability
– Problem: storing and computing for all routers
– Solution: store each route once and amortize work
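The scalability bullet can be illustrated with two small data-structure ideas: keep one copy of each distinct route, and compute a decision once for every group of routers that would get the same answer. This is an illustrative sketch, not the NSDI'05 implementation. `RouteStore` and `group_routers` are hypothetical names, and grouping routers by identical IGP costs is an assumption made here.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple


class RouteStore:
    """Keep a single copy of each distinct route; callers hold an integer id."""

    def __init__(self) -> None:
        self._index: Dict[Hashable, int] = {}
        self.routes: List[Hashable] = []

    def intern(self, route: Hashable) -> int:
        if route not in self._index:
            self._index[route] = len(self.routes)
            self.routes.append(route)
        return self._index[route]


def group_routers(igp_cost: Dict[str, Dict[str, int]]) -> Dict[Tuple, List[str]]:
    """Group routers whose IGP costs to every egress are identical.

    Such routers would receive the same answer, so the decision can be
    computed once per group instead of once per router.
    """
    groups: Dict[Tuple, List[str]] = defaultdict(list)
    for router, costs in igp_cost.items():
        groups[tuple(sorted(costs.items()))].append(router)
    return dict(groups)


store = RouteStore()
first = store.intern(("12.34.158.0/24", (1, 88), "e1"))
second = store.intern(("12.34.158.0/24", (1, 88), "e1"))
print(first == second, len(store.routes))  # True 1

costs = {
    "r1": {"e1": 5, "e2": 20},
    "r2": {"e1": 5, "e2": 20},
    "r3": {"e1": 30, "e2": 4},
}
print(sorted(len(members) for members in group_routers(costs).values()))  # [1, 2]
```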
Example Network Management Applications
• Customer-driven route selection
– Customized load-balancing policies
– Geographic rules for route selection
• Blocking denial-of-service attacks
– “Blackhole” routes that drop traffic
– Only for routers carrying attack traffic
• Hitless maintenance
– Move traffic away from certain routers
– Before the operators bring down the routers
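As a sketch of the blackholing application, assume the RCP already knows each router's normal next hop for the victim prefix and which routers are carrying attack traffic. The `BLACKHOLE` marker, the router names, and the `answers_for_prefix` helper are hypothetical.

```python
from typing import Dict, Set

BLACKHOLE = "discard"  # hypothetical marker for a next hop that drops traffic


def answers_for_prefix(
    normal_next_hop: Dict[str, str],
    attack_routers: Set[str],
) -> Dict[str, str]:
    """Send a blackhole route only to routers that carry attack traffic."""
    return {
        router: BLACKHOLE if router in attack_routers else next_hop
        for router, next_hop in normal_next_hop.items()
    }


normal = {"r1": "e1", "r2": "e1", "r3": "e2"}
print(answers_for_prefix(normal, {"r2"}))
# {'r1': 'e1', 'r2': 'discard', 'r3': 'e2'}
```

Traffic entering at other routers still reaches the destination, which a single network-wide blackhole route would not allow.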
Conclusion
• Network management is too hard
– IP was not designed for management
– Complex, distributed operation of routers
• Must reduce complexity
– Network-wide views and objectives
– Direct control over the data plane
• RCP approach is feasible
– Deployable, scalable, and reliable
– Solves important management problems
• Many interesting open problems
Backup Slides
Routing Control Platform (RCP)
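[Figure: the Routing Control Platform (RCP) contains a Route Control Server (RCS), a BGP Engine, and an OSPF Viewer. The BGP Engine exchanges BGP updates with the routers, passing options to the RCS and receiving answers. The OSPF Viewer receives OSPF link-state advertisements and supplies the network topology.]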
Scalability: Standard Computing Platform
• Prototype on a high-end PC
– 3.2 GHz Pentium-4 with 8 GB of RAM
– Running the Linux 2.6.5 kernel
• Workload from the AT&T backbone
– Replay the BGP and OSPF messages
• Good RCP performance
– Memory usage: less than 2 GB
– Speed, BGP changes: less than 40 msec
– Speed, topology changes: 0.1-0.8 seconds
Short answer: the system can keep up
Reliability: Replication and Consistency
• Replication: avoid single point of failure
– Multiple RCPs in a network
– Connected at different places
• Consistency: no explicit coordination
– Replica has full view of each partition
– Replicas perform the same algorithm on the same data, and get the same answer
[Figure: partitions A and B, with replicas RCP A and RCP B]
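Consistency without coordination relies on the decision being a deterministic function of the data. A minimal sketch, assuming each replica holds the same set of routes but may have learned them in a different order. The tuple layout and the `decide` function are illustrative.

```python
from typing import Iterable, Tuple

# (local_pref, as_path, egress); the layout is chosen only for this example
Route = Tuple[int, Tuple[int, ...], str]


def decide(routes: Iterable[Route]) -> Route:
    """Pick a route using a total order, so arrival order cannot matter."""
    return min(set(routes), key=lambda r: (-r[0], len(r[1]), r[1], r[2]))


r1 = (100, (1, 88), "e1")
r2 = (100, (2, 88), "e2")
r3 = (90, (88,), "e3")

replica_a = decide([r1, r2, r3])
replica_b = decide([r3, r1, r2])  # same data, different arrival order
print(replica_a == replica_b)     # True
```

The ranking key never ties between distinct routes, so two replicas with the same inputs cannot disagree and need no protocol between them.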