Design Requirements for Bullet-Proof Packet Passers
Avi Freedman
avi@freedman.net
Chief Technical Officer, Netaxs
VP and Chief Network Architect, Akamai

Overview
• Goals and problems in Good Networking
• Current and future SLAs
• Failure analysis
• Hardware requirements
• Software requirements
• Sample architecture – Nortel OPC
• Open questions

Goals for Good Networking
• The four things that customers seem to want from IP networking:
  – Stability
  – Performance
  – Burstability/capacity assurance
  – Price
• Order varies, but Stability is almost always #1.

Problems in Good Networking
• Performance is often a backbone capacity issue, and more often a peering/transit issue.
• Burstability problems come from a lack of large aggregation capabilities (no 100 Gb ports to connect 1 Gb customers to). This is a solvable engineering problem, though, with enough of even today’s hardware.

Problems in Good Networking
• The biggest problem is stability. Four main causes:
  – Operator error
  – Software
  – Fiber cuts
  – Hardware
• One can argue over ranking, but all are important.
• Fiber is a solvable issue with money and engineering.
• We’ll revisit these.

Current and Future SLAs
• Today’s SLAs are fairly weak. SLAs of the future will trend towards minutes per year of outage, with large credits for complete outages.
• CDNs already offer SLAs that give 1 day of credit for a 15 minute slowdown (not even an outage).
• Today’s hardware and software cannot be relied upon to pass IP packets reliably enough to meet these SLAs.
• At some point, 5 minutes/year of systemwide outage is probably all that customers will tolerate, and the first network to offer it in a vacuum will win huge market share.

Failure Analysis – Op Error
• What causes operator error?
• Often it’s not ignorance, but the fact that doing distributed configuration is hard with today’s tools.
• Key point: the Cisco ‘no’ method has caused many a network outage.
• GUIs are unwieldy, though.
• And a Unix OS on routers is a security problem!
• Industry work on ‘safer’ GUIs is needed.

Failure Analysis – Hardware
• Hardware is typically less of a problem, but OIR often stands for “Online insert and reboot”.
• The design needs to be simple, elegant, and redundant.
• Ideally, scalable and expandable as well, but simplicity of design is the best assurance of stability.

Failure Analysis – Software
• Router software causes literally hundreds of outages per year, even (excuse the term) megalapses inside networks.
• Most of the problems do NOT relate to protocol design, though there are scaling issues to be solved there.
• Most of the problems come from:
  – Bad code
  – Bad OS (OS fails to protect against bad code)

Failure Analysis – CPU Protection
• Additionally, there is a chronic problem in that vendors are not providing sufficient protection for the route-processing engines, and as denial of service attacks get more aggressive, this is a growing problem!
• The industry needs to describe to vendors what rules are needed:
  – Don’t allow multicast except for OSPF to connected interfaces, etc.

Failure Analysis – Software Modularity
• In addition to contributing to bad code, the more monolithic nature of current router OSs makes it hard to avoid downtime while upgrading the network.
• Upgrade-on-the-fly (with a base OS that remains unchanged) is an elusive goal, but it is achievable: 5ESS and DMS boxes prove it.

Sample Architecture – Nortel OPC
• As a case study, we consider the Nortel OPTera Packet Core, which has been designed around carrier-class robustness, with feedback from industry and telephony-switch engineers.
• The OPC is a 3+-year-old research project that went into “product” mode about a year ago. Products are about a year out, so Nortel is aggressively seeking input about robustness!

OPC – Design Requirements
• The OPC team defined 99.999% as the target uptime, and defined “uptime” as uptime across ports.
  So, about 5 minutes of downtime per year across all (of up to) 480 ports, or potentially more downtime across fewer ports.
• Assumes 2 software upgrades/year, and splits “acceptable” failures roughly evenly between hardware and software.

OPC – Hardware Overview
• The OPC starts with a base 20-slot “application shelf” chassis of port and/or processor cards, and fabric slots. The base config can run an in-chassis fabric, but is not expandable on the fly.
• If broken out into an application shelf and a fabric shelf, it can be expanded to the full 480-slot config without downtime or packet loss.

OPC – Hardware Overview
• Each slot has (up to) 10 Gb/s of “port” capacity, and 16 Gb/s of backplane (14.5 Gb/s effective after overhead).
• Maximally configured, it is a 4.8 Tb/s router consisting of 24 application shelves in 12 racks, 16 fabric shelves in 4 bays, and a processor shelf.
• Shelves can be up to 1 km apart (the entire system must be within a 1 km diameter per spec), though it’s not clear this is a robustness-enhancing function until the router can operate partitioned.

OPC – Fabric
• The OPC fabric is “passive”: with each possible set of boards, the config is fixed, and no software is required to drive or configure the fabric.
• Can be imagined as parallel train tracks, with each board being a “station”, and slightly fewer “trains” shuttling 4 cells of traffic (each cell being one of 4 fixed priorities). More boards means more stations.

OPC – Card Architecture
• Each card has a general-purpose CPU (Motorola 750), and two packet-processor chips (the RSP2).
• The RSP2 runs “software”: mostly microcode, scheduling, etc.
• The RSP2 can do up to 100 instructions on each of 16 packets in parallel, and then in serial for packet modification.
• For read-only packet processing, within 1% of line rate is possible per card. 40-43 byte packets are line-rate, 65-70 byte packets yield < 1% loss, and anything beyond is line-rate.

OPC – Software
• The major cause of software-based router failures is bad code.
  Ultimately, better software engineering is required.
• Along the way, sound software architecture and protective features are needed.
• And on-the-fly upgradeability.
• As well as main-CPU protection.

OPC – Main-CPU Protection
• Each board’s RSP2s can do packet classification inbound or outbound, can throw away packets, replicate them (multicast or sniffing), kick them up to the main CPU, or send them to another port/card.
• The capability exists as well to shape different classes of traffic as part of kicking packets up to the main CPU on-card or on another card.
• The key is the ruleset; input is needed.

Main CPU Protection
• As a general issue, rules should be supported across multiple router vendors.
• Rules such as:
  – 64k/sec of BGP from an IP, only if we are talking to that IP
  – No non-OSPF multicast
  – 10 packets per second to each connected IP

Nortel OPC – CLI
• Nortel is soliciting input on robust CLI design to reduce operator error.
• Possibilities include support for comments, transactions (commit/rollback), and network-wide synchronized updates (though these can cause instability as well).

OPC – Software Architecture
• We now talk about the software that runs on the main CPUs, and the main Motorola 750 procs per board.
• Chorus multi-threaded, multi-CPU real-time OS as a base. Has memory protection and preemptive multitasking.
• IPC layer (“RACE”) on top, which handles communication between processes (“agents”) and threads. Among other things, RACE allows “virtual synchrony”: running multiple processes in parallel and taking the first answer as the result.
• This allows for easy upgrading of processes, and robustness in case of single- or multi-card failures.

Open Questions
• What are other vendors doing? Cisco, Juniper, Avici all seem to be missing in major areas Nortel is addressing.
  Of course, you can buy Cisco, Juniper, and Avici products now.
• CLI design input
• CPU protection rule input
• Software architecture input (what modules should be on-the-fly upgradeable); for example, tradeoffs in BGP convergence vs. upgradeability.
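
As a starting point for the CPU protection rule input requested above, the three example rules from the Main CPU Protection slide can be written down as a small rate-limiting ruleset. The Python sketch below is illustrative only and is not the OPC's implementation. The names TokenBucket and CpuGuard and the packet dictionary fields are hypothetical, "64k/sec" is assumed to mean 64,000 bytes per second, and "10 packets per second to each connected IP" is assumed to be a limit per destination address on the router.

```python
OSPF_PROTO = 89   # IP protocol number for OSPF
BGP_PORT = 179    # TCP port for BGP


class TokenBucket:
    """Simple token bucket; the caller supplies the clock so it is testable."""

    def __init__(self, rate, burst):
        self.rate = rate        # tokens added per second
        self.burst = burst      # maximum tokens held
        self.tokens = burst     # start full
        self.last = 0.0         # time of the last refill

    def allow(self, cost, now):
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class CpuGuard:
    """Decides whether a packet may be kicked up to the main CPU (hypothetical)."""

    def __init__(self, bgp_peers, connected_ips):
        # Assumption: 64k/sec is read as 64,000 bytes per second per peer.
        self.bgp = {ip: TokenBucket(64_000, 64_000) for ip in bgp_peers}
        # Assumption: 10 packets per second per connected (local) address.
        self.per_ip = {ip: TokenBucket(10, 10) for ip in connected_ips}

    def punt(self, pkt, now):
        # Rule: no non-OSPF multicast.
        if pkt.get("multicast"):
            return pkt.get("proto") == OSPF_PROTO
        # Rule: BGP only from configured peers, rate limited in bytes.
        if BGP_PORT in (pkt.get("sport"), pkt.get("dport")):
            bucket = self.bgp.get(pkt["src"])
            return bucket is not None and bucket.allow(pkt["size"], now)
        # Rule: everything else is limited per connected IP, in packets.
        bucket = self.per_ip.get(pkt["dst"])
        return bucket is not None and bucket.allow(1, now)


guard = CpuGuard(bgp_peers={"10.0.0.2"}, connected_ips={"10.0.0.1"})
ping = {"src": "192.0.2.9", "dst": "10.0.0.1", "proto": 1, "size": 64}
burst_results = [guard.punt(ping, now=0.0) for _ in range(11)]
```

In this sketch the eleventh packet in the same instant is refused, and the bucket refills as the supplied clock advances. Whether limits like these belong per peer, per interface, or per protocol is exactly the kind of input the ruleset needs.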