Engineering robust routers

Design Requirements
Bullet-Proof Packet Passers
Avi Freedman
Chief Technical Officer, Netaxs
VP and Chief Network Architect, Akamai
Goals and problems in Good Networking
Current and future SLAs
Failure analysis
Hardware requirements
Software requirements
Sample architecture – Nortel OPC
Open questions
Goals for Good Networking
• The three things that customers seem to
want from IP networking:
– Performance
– Burstability/capacity assurance
– Stability
• Order varies, but Stability is almost always
Problems in Good Networking
• Performance is often a backbone capacity issue,
and more often a peering/transit issue.
• Burstability problems come from lack of
large aggregation capabilities (no 100 Gb
ports to connect 1 Gb customers to); a
solvable engineering problem, though, with
enough of even today’s hardware.
Problems in Good Networking
• The biggest problem is stability. Four main
Operator error
Fiber cuts
• One can argue over ranking, but all are important.
• Fiber is a soluble issue with money and
• We’ll revisit these.
Current and Future SLAs
• Today’s SLAs are fairly weak. SLAs of the future
will trend towards minutes per year of outage,
with large credits for complete outages.
• CDNs already offer SLAs that give a 1-day credit
for a 15-minute slowdown (not even an outage).
• Today’s hardware and software cannot be relied
upon to pass IP packets reliably enough to meet
these SLAs.
• To meet these SLAs, 5 minutes/year of system-wide outage is probably all that customers will
tolerate at some point – and the first network to
offer it in a vacuum will win huge market share.
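As a rough check on what "5 minutes/year" means in availability terms, the arithmetic can be sketched as below. The function names are illustrative only, and a 365-day year is assumed.

```python
# Back-of-the-envelope conversion between availability and yearly downtime.
# Assumption: a 365-day year (leap years ignored).
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600


def downtime_minutes_per_year(availability: float) -> float:
    """Minutes of outage per year implied by an availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def availability_for_downtime(minutes: float) -> float:
    """Availability fraction implied by a yearly outage budget in minutes."""
    return 1.0 - minutes / MINUTES_PER_YEAR


if __name__ == "__main__":
    print(f"99.999% uptime -> {downtime_minutes_per_year(0.99999):.2f} min/year")
    print(f"5 min/year     -> {availability_for_downtime(5) * 100:.5f}% uptime")
```

Under these assumptions, a 5 minutes/year budget is slightly stricter than "five nines".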
Failure Analysis – Op Error
• What causes operator error?
• Often it’s not ignorance, but the fact that doing
distributed configuration is hard with today’s
• Key point – the Cisco ‘no’ method has caused many a
network outage.
• GUIs are unwieldy, though.
• And Unix OS on routers is a security problem!
• Industry work on ‘safer’ GUIs is needed.
Failure Analysis - Hardware
• Hardware is typically less of a problem, but
OIR often stands for “Online insert and
• The design needs to be simple, elegant, and
• Ideally, scalable and expandable as well, but
simplicity of design is the best assurance of
Failure Analysis - Software
• Router software causes literally hundreds of
outages per year – even (excuse the term)
megalapses inside networks.
• Most of the problems do NOT relate to
protocol design, though there are scaling
issues to be solved there.
• Most of the problems come from –
– Bad code
– Bad OS (OS fails to protect against bad code)
Failure Analysis – CPU Protection
• Additionally, there is a chronic problem in
that vendors are not providing sufficient
protection for the route-processing engines,
and as denial of service attacks get more
aggressive, this is a growing problem!
• The industry needs to describe to vendors
what rules are needed
– (Don’t allow multicast except for OSPF to
connected interfaces, etc…)
Failure Analysis – Software Modularity
• In addition to contributing to bad code, the
more monolithic nature of current router
OSs makes it hard to avoid downtime while
upgrading the network.
• Upgrade-on-the-fly (with a base OS that
remains unchanged) is an elusive goal, but
it is achievable – 5ESS and DMS boxes
prove it.
Sample Architecture – Nortel OPC
• As a case study, we consider the Nortel
OPTera Packet Core, which has been
designed around carrier-class robustness,
with feedback from industry and telephony-switch engineers.
• The OPC is a 3+-year-old research project
that went into “product” mode about a year
ago. Products are about a year out, so
Nortel is aggressively seeking input about
OPC – Design Requirements
• The OPC team defined 99.999% as the
target uptime, and defined “uptime” as
uptime across ports. So, 5 minutes of
downtime across all (of up to) 480 ports, or
potentially more downtime across fewer ports.
• Assumes 2 software upgrades/year, and splits
“acceptable” failures roughly evenly
between hardware and software.
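One way to read the port-weighted uptime definition is as a yearly budget of port-minutes. The sketch below assumes that reading, uses the round figure of 5 minutes per port, and treats the hardware/software split as exactly even. All names are hypothetical.

```python
# Port-weighted downtime budget, under the assumptions stated above.
TOTAL_PORTS = 480
MINUTES_PER_PORT_PER_YEAR = 5.0


def port_minute_budget(ports: int = TOTAL_PORTS,
                       minutes: float = MINUTES_PER_PORT_PER_YEAR) -> float:
    """Total port-minutes of outage allowed per year."""
    return ports * minutes


def allowed_outage_minutes(affected_ports: int) -> float:
    """How long an outage may last if it hits only `affected_ports` ports
    and consumes the whole yearly budget."""
    if affected_ports <= 0:
        raise ValueError("affected_ports must be positive")
    return port_minute_budget() / affected_ports


budget = port_minute_budget()
hardware_share = software_share = budget / 2  # "roughly evenly" taken as 50/50

if __name__ == "__main__":
    print(f"budget: {budget:.0f} port-minutes/year")
    print(f"outage on 48 ports may last {allowed_outage_minutes(48):.0f} min")
```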
OPC – Hardware Overview
• The OPC starts with a base 20-slot
“application shelf” chassis of port and/or
processor cards, and fabric slots. Base
config can run in-chassis fabric, but is not
expandable on the fly.
• If broken out into an application shelf and
fabric shelf, can be expanded to full 480-slot config without downtime or packet loss.
OPC – Hardware Overview
• Each slot has (up to) 10 Gb of “port” capacity, and
16 Gb of backplane (14.5 Gb effective after
• Maximally configured, a 4.8 Tb router consisting of
24 application shelves in 12 racks, 16 fabric
shelves in 4 bays, and a processor shelf.
• Shelves can be up to 1 km apart (the entire system
must fit within a 1 km diameter per spec), though it’s
not clear this enhances robustness
until the router can operate partitioned.
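The headline capacity can be cross-checked from the per-slot figures. The sketch below assumes every slot on all 24 application shelves carries a port card at the full 10 Gb, and that the figures are per-second rates. The constant and function names are illustrative.

```python
# Cross-check of the maximum-configuration numbers, under the assumptions above.
APPLICATION_SHELVES = 24
SLOTS_PER_SHELF = 20
PORT_GB_PER_SLOT = 10.0
BACKPLANE_GB_RAW = 16.0
BACKPLANE_GB_EFFECTIVE = 14.5


def total_slots() -> int:
    return APPLICATION_SHELVES * SLOTS_PER_SHELF


def total_port_capacity_tb() -> float:
    """Aggregate port capacity in Tb (1 Tb taken as 1000 Gb)."""
    return total_slots() * PORT_GB_PER_SLOT / 1000.0


def backplane_overhead_fraction() -> float:
    """Share of raw backplane bandwidth that is not usable."""
    return 1.0 - BACKPLANE_GB_EFFECTIVE / BACKPLANE_GB_RAW


def backplane_speedup() -> float:
    """Effective backplane bandwidth relative to port capacity per slot."""
    return BACKPLANE_GB_EFFECTIVE / PORT_GB_PER_SLOT


if __name__ == "__main__":
    print(total_slots(), "slots,", total_port_capacity_tb(), "Tb of ports")
    print(f"backplane overhead: {backplane_overhead_fraction():.1%}")
    print(f"backplane speedup over ports: {backplane_speedup():.2f}x")
```

On these numbers the 480 slots and 4.8 Tb figures agree with each other, and each slot has roughly 45% more effective backplane bandwidth than port bandwidth.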
OPC – Fabric
• The OPC fabric is “passive” – with each
possible set of boards, the config is fixed,
and no software is required to drive or
configure the fabric.
• Can be imagined as parallel train tracks,
with each board being a “station”, and
slightly fewer “trains” shuttling 4 cells of
traffic (each cell at one of 4 fixed
priorities). More boards is more
OPC – Card Architecture
• Each card has a general-purpose CPU (Motorola
750), and two packet-processor chips (the RSP2).
• The RSP2 runs “software”, mostly microcode,
scheduling, etc…
• The RSP2 can do up to 100 instructions on each of
16 packets in parallel, and then in serial for packet
• For read-only packet processing, within 1% of line
rate is possible per card. 40-43 byte packets are
line-rate, 65-70 byte packets yield < 1% loss,
beyond is line-rate.
OPC - Software
• The major cause of software-based router
failures is bad code. Ultimately, better
software engineering is required.
• Along the way, sound software architecture
and protective features are needed.
• And on-the-fly upgrade-ability.
• As well as main-CPU-protection.
OPC – Main-CPU Protection
• Each board’s RSP2s can do packet classification
inbound or outbound, can throw away packets,
replicate them (multicast or sniffing), kick them
up to the main CPU, or send them to another
• The capability exists as well to shape different
classes of traffic as part of kicking packets up to
the main CPU on-card or on another card.
• The key is the ruleset; input is needed.
Main CPU Protection
• As a general issue, rules should be reflected
in multiple router vendors.
• Rules such as –
– 64k/sec of BGP from an IP, only if we are
talking to that IP
– No non-OSPF multicast
– 10 packets per second to each connected IP
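To make the example rules concrete, here is a minimal sketch of a filter in front of the main CPU. It is not any vendor's implementation. Several points are assumptions: "64k/sec" is read as 64,000 bits per second, the 10 packets/sec limit is keyed on the destination address, OSPF multicast is admitted without a rate limit, and `Packet`, `TokenBucket` and `CpuGuard` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass

TCP, OSPF = 6, 89   # IP protocol numbers
BGP_PORT = 179      # well-known BGP TCP port


@dataclass
class Packet:
    src: str
    dst: str
    proto: int
    size_bytes: int
    sport: int = 0
    dport: int = 0


class TokenBucket:
    """Token bucket that starts full and refills at `rate` tokens/second."""

    def __init__(self, rate: float, burst: float):
        self.rate, self.burst = rate, burst
        self.tokens = burst
        self.last = None

    def allow(self, now: float, cost: float = 1.0) -> bool:
        if self.last is not None:
            elapsed = max(0.0, now - self.last)
            self.tokens = min(self.burst, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class CpuGuard:
    """Decides whether a packet may be kicked up to the main CPU."""

    def __init__(self, bgp_peers, bgp_bits_per_sec=64_000, pps_per_ip=10):
        self.bgp_peers = set(bgp_peers)
        self.bgp_rate = bgp_bits_per_sec
        self.pps = pps_per_ip
        self.bgp_buckets = {}   # one bucket per configured BGP peer
        self.ip_buckets = {}    # one bucket per destination address

    def admit(self, pkt: Packet, now: float) -> bool:
        # Rule: no non-OSPF multicast.
        if ipaddress.ip_address(pkt.dst).is_multicast:
            return pkt.proto == OSPF
        # Rule: BGP only from an IP we are talking to, capped per peer.
        if pkt.proto == TCP and BGP_PORT in (pkt.sport, pkt.dport):
            if pkt.src not in self.bgp_peers:
                return False
            bucket = self.bgp_buckets.setdefault(
                pkt.src, TokenBucket(self.bgp_rate, self.bgp_rate))
            return bucket.allow(now, cost=pkt.size_bytes * 8)
        # Rule: everything else, a packets-per-second cap per address.
        bucket = self.ip_buckets.setdefault(
            pkt.dst, TokenBucket(self.pps, self.pps))
        return bucket.allow(now)
```

A caller would pass each candidate packet and a monotonic timestamp to `admit()` and drop the packet when it returns `False`. In the architecture described here the decision would run in the packet processors, not on the CPU being protected.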
Nortel OPC - CLI
• Nortel is soliciting input on robust CLI
design to reduce operator error.
• Possibilities include ability for comments,
transactions (commit/rollback), network-wide synchronized update (though this can
cause instability as well)
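The commit/rollback idea can be sketched as a candidate configuration that is edited freely and only replaces the running configuration on an explicit commit. This is a toy model, not Nortel's CLI design. The flat key/value config, the `ConfigStore` name and the `validate` hook are all assumptions made for illustration.

```python
import copy


class ConfigStore:
    """Toy transactional config: edits touch a candidate, not the running config."""

    def __init__(self, running=None):
        self.running = dict(running or {})
        self.candidate = copy.deepcopy(self.running)
        self.history = []  # (comment, previous running config) per commit

    def set(self, key, value):
        self.candidate[key] = value

    def delete(self, key):
        # Unlike an immediate "no ..." command, nothing changes until commit().
        self.candidate.pop(key, None)

    def diff(self):
        """Map of key -> (running value, candidate value) for changed keys."""
        keys = sorted(set(self.running) | set(self.candidate))
        return {k: (self.running.get(k), self.candidate.get(k))
                for k in keys
                if self.running.get(k) != self.candidate.get(k)}

    def commit(self, comment="", validate=None):
        """Apply the candidate atomically; refuse if validation fails."""
        if validate is not None and not validate(self.candidate):
            raise ValueError("candidate rejected; running config unchanged")
        self.history.append((comment, self.running))
        self.running = copy.deepcopy(self.candidate)

    def rollback(self):
        """Restore the running config from before the last commit."""
        if not self.history:
            raise RuntimeError("nothing to roll back")
        _, self.running = self.history.pop()
        self.candidate = copy.deepcopy(self.running)


if __name__ == "__main__":
    store = ConfigStore({"bgp neighbor 192.0.2.1": "remote-as 64500"})
    store.delete("bgp neighbor 192.0.2.1")
    print("pending changes:", store.diff())
    store.commit(comment="decommission old peer")
    store.rollback()  # the peer is back in the running config
```

The point of the design is that a mistyped or misunderstood negation only shows up in `diff()` until the operator commits, and a bad commit can be undone in one step.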
OPC – Software Architecture
• We now talk about the software that runs on the
main CPUs, and the main Motorola 750 procs per
• Chorus multi-threaded, multi-CPU real-time OS as
a base. Has memory protection and preemptive
• IPC layer (“RACE”) on top, handles
communication between processes (“agents”) and
threads. Among other things, RACE allows
“virtual synchrony” – running multiple processes
in parallel and taking the first answer as a result.
• This allows for easy upgrading of processes, and
robustness in case of single- or multi-card failures.
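The "take the first answer" behaviour described above can be illustrated with ordinary threads. This is only a sketch of the idea, not the RACE API, which is not described here. The `first_answer` helper and the replica functions are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def first_answer(replicas, request):
    """Send `request` to every replica; return the first successful result.

    Replicas that raise are skipped, so one failed copy does not fail the call.
    """
    if not replicas:
        raise ValueError("need at least one replica")
    pool = ThreadPoolExecutor(max_workers=len(replicas))
    try:
        futures = [pool.submit(replica, request) for replica in replicas]
        for future in as_completed(futures):
            try:
                return future.result()
            except Exception:
                continue  # this replica failed; wait for another
        raise RuntimeError("all replicas failed")
    finally:
        # Do not block on slower replicas once an answer is in hand.
        pool.shutdown(wait=False)


def healthy_replica(prefix):
    return f"next-hop-for:{prefix}"


def slow_replica(prefix):
    time.sleep(0.2)  # e.g. a copy on a busy card
    return f"next-hop-for:{prefix}"


def crashed_replica(prefix):
    raise RuntimeError("agent on failed card")


if __name__ == "__main__":
    replicas = [crashed_replica, slow_replica, healthy_replica]
    print(first_answer(replicas, "10.0.0.0/8"))
```

Because any healthy copy can answer, a replica can be stopped and replaced with a new version while the others keep serving, which is the upgrade and failure-tolerance property claimed above.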
Open Questions
• What are other vendors doing? Cisco, Juniper,
Avici all seem to be missing in major areas Nortel
is addressing. Of course, you can buy Cisco,
Juniper, and Avici products now.
• CLI design input
• CPU protection rule input
• Software architecture input (what modules should
be on-the-fly upgrade-able); for example, trade-offs in BGP convergence vs. upgrade-ability.