ISIS Next Generation Router Shahzad Ali Xia Chen

Carnegie
Mellon
ISIS
Next Generation Router
Shahzad Ali
Xia Chen
Brendan Howell
Yu Zhong
Broadband Networks, 2000
Motivation

Anybody can make a router.
‡ Key is to make one with
– High speed (OC-48 and up)
– High port densities
– Better performance (throughput, delay, latency)
– Cheap
– etc.

But we always forget some requirements.
‡ The new network requires new services
– Guaranteed bandwidth/latency (QoS)
– Scalable design
Now the task is not that simple!!
Possible uses for our router









Aggregation point for large data centers
Backbone router at a major peering point
Backbone router for carrier facilities
Core router for IP transit providers
High bandwidth
Scalable
QoS
Robustness
Support for major routing protocols
So what did we come up with?







Design a router that meets the minimum specifications
Maximize switch throughput
Provide support for strong QoS in the switch with realistic assumptions
Build an extensible simulator
Evaluate the design using simulations
Make the design realistic!
Tackle the tough issue of scalability
Result: ISIS
Design Decisions

64 OC-48 ports with a total capacity of 160 Gb/s
‡ Our base design allows for 128 OC-48 ports (320 Gb/s)
‡ Uses the PMC-Sierra PM9311 and PM9312 crossbar and scheduler.

Buffers
‡ PC-100 memory with 10 ns access latency
‡ Buffer sizes are dimensioned by simulation

Queuing?
‡ Output Queuing
‡ Input Queuing
Output Queuing

Provides perfect QoS.
‡ Can order outgoing cells according to their priorities using FQ (WFQ, WF2Q+, etc.)
‡ Use fast sorting techniques to speed up the process*.

But it requires a speedup of N for the switching fabric.
‡ Not scalable for high-bandwidth switches.
‡ Not cheap either.

[Figure: input ports, switching fabric, output ports]

*“A Simple and Fast Hardware Implementation Architecture for WFQ Algorithms”, Nen-Fu Huang and Chi-An Su
Input Queuing

Provides no support for hard guarantees (QoS).
‡ Difficult to maximize switch throughput while providing these guarantees.

But it is simple to implement, with minimal speedup and maximum throughput.
‡ Make scheduling decisions using a smart scheduling algorithm.
‡ Use iSlip or similar algorithms to achieve high throughput.

[Figure: input ports, switching fabric, output ports]
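To make the scheduling step concrete, here is a minimal sketch of one iSlip iteration (request, grant, accept) for an N x N switch with virtual output queues. The function name `islip_iteration` and the data layout are illustrative assumptions, not taken from the simulator; a real scheduler would run several iterations per time slot over the ports left unmatched.

```cpp
#include <cassert>
#include <vector>

// One iSlip iteration (sketch). voq[i][j] is the number of cells queued at
// input i for output j. grant_ptr[j] and accept_ptr[i] are the round-robin
// pointers. Returns match[i] = output matched to input i, or -1.
std::vector<int> islip_iteration(const std::vector<std::vector<int>> &voq,
                                 std::vector<int> &grant_ptr,
                                 std::vector<int> &accept_ptr) {
    const int n = static_cast<int>(voq.size());
    std::vector<int> granted_to(n, -1);  // granted_to[j] = input chosen by output j

    // Request + grant: every input requests each output it has cells for;
    // each output grants the first requester at or after its grant pointer.
    for (int j = 0; j < n; ++j) {
        for (int k = 0; k < n; ++k) {
            int i = (grant_ptr[j] + k) % n;
            if (voq[i][j] > 0) { granted_to[j] = i; break; }
        }
    }

    // Accept: each input accepts the first granting output at or after its
    // accept pointer. Pointers advance only when a grant is accepted.
    std::vector<int> match(n, -1);
    for (int i = 0; i < n; ++i) {
        for (int k = 0; k < n; ++k) {
            int j = (accept_ptr[i] + k) % n;
            if (granted_to[j] == i) {
                match[i] = j;
                grant_ptr[j] = (i + 1) % n;
                accept_ptr[i] = (j + 1) % n;
                break;
            }
        }
    }
    return match;
}
```

Because the pointers move only on an accepted grant, outputs that collided on the same input in one slot tend to drift apart in later slots, which is where the high throughput comes from.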
Queuing

Output Queuing
‡ No!
‡ Violates our design goal of scalability.
‡ Too Expensive

Input Queuing
‡ Maybe
‡ If we can do proper QoS with it.

Maybe a third option?
QoS with Input Queuing







The only way we can imagine this happening is with buffered crossbars.
Buffers are provided at each cross-point.
The number of buffers is bounded (3 to 5 cells).
Use FQ servers at the inputs, the crossbar buffers, and the outputs to achieve QoS.
The solution is hard to implement.
Plus it only provides probabilistic guarantees.
Plus we don’t even know if it actually works.
QoS with Input Queuing

Some recent work focuses on this aspect.
‡ “Implementing Distributed Packet Fair Queuing in a Scalable Switch Architecture”, D. C. Stephens
‡ “A Distributed Scheduling Architecture for Scalable Packet Switches”, F. M. Chiussi and A. Francini
Is that the best we can do?
CIOQ

Combined input-output queuing (CIOQ) combines the benefits of both input and output queuing.
‡ Input queuing needs a speedup of 1.
‡ Output queuing needs a speedup of N.
‡ CIOQ uses a speedup between 1 and N, so cells must be buffered at both input and output.

Requires a speedup of only 2 to emulate an output queuing switch completely.
‡ Reasonable given the difficulty of our initial goal.

Guarantees exactly the same output as an OQ switch.
CIOQ

Assigns a value called slackness to each VOQ at the input.
‡ Slackness is the urgency of a packet. Slackness of 0 means high urgency.
‡ Inputs and outputs select ports based on a priority calculated by the FQ discipline used.
– For FCFS, the packet that arrived earlier gets higher priority.

Uses the Gale-Shapley algorithm to compute a stable matching of inputs to outputs after they have been selected in Phase I.
No analysis has been done on the throughput of a CIOQ switch, just an analytical bound on its proximity to an OQ switch.
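The stable matching step can be sketched with the textbook Gale-Shapley procedure, with inputs proposing to outputs. This is a minimal illustration only: the function name `stable_match` is hypothetical, and the preference lists are assumed to be complete and of size n, whereas in a switch they would be derived from the FQ priorities and would cover only non-empty VOQs.

```cpp
#include <cassert>
#include <vector>

// in_pref[i]  = outputs in input i's order of preference (best first).
// out_pref[j] = inputs in output j's order of preference (best first).
// Returns match[i] = output assigned to input i.
std::vector<int> stable_match(const std::vector<std::vector<int>> &in_pref,
                              const std::vector<std::vector<int>> &out_pref) {
    const int n = static_cast<int>(in_pref.size());
    // rank[j][i] = position of input i in output j's list (lower is better).
    std::vector<std::vector<int>> rank(n, std::vector<int>(n));
    for (int j = 0; j < n; ++j)
        for (int r = 0; r < n; ++r)
            rank[j][out_pref[j][r]] = r;

    std::vector<int> next(n, 0);     // next list position each input will try
    std::vector<int> holder(n, -1);  // holder[j] = input currently held by output j
    std::vector<int> free_inputs;
    for (int i = n - 1; i >= 0; --i) free_inputs.push_back(i);

    while (!free_inputs.empty()) {
        int i = free_inputs.back();
        free_inputs.pop_back();
        int j = in_pref[i][next[i]++];         // best output not yet tried
        if (holder[j] == -1) {
            holder[j] = i;                     // output was free
        } else if (rank[j][i] < rank[j][holder[j]]) {
            free_inputs.push_back(holder[j]);  // output trades up
            holder[j] = i;
        } else {
            free_inputs.push_back(i);          // rejected, try again later
        }
    }
    std::vector<int> match(n, -1);
    for (int j = 0; j < n; ++j) match[holder[j]] = j;
    return match;
}
```

The result is stable in the usual sense: no input and output both prefer each other over the partners they were given.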
CIOQ

Some recent work in this area
‡ “Delay-Bound Guarantee in CIOQ Switches”, H. Chao, L. Shen Chen
‡ “Matching Output Queuing with CIOQ Switch”, S. Chuang, A. Goel, N. McKeown, B. Prabhakar
‡ “College Admissions and the Stability of Marriage”, D. Gale, L. Shapley
ngrSim


A simulator for ISIS: Next Generation Router.
ngrSim follows a modular design approach.
‡ All components are coded as modules that can be
plugged in place of others.
‡ Allows for easy mix-and-match of various schemes.

It is event driven … completely.
‡ Everything is driven by the event queue.
Event handling


An abstract class handler is defined.
A class event is defined, which holds an object of type handler.
‡ The handler is the component that is responsible for the event.
‡ When the event is run, the handle function of its handler is called.

Events are enqueued into an event queue.
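class event;  // forward declaration; the event class is defined elsewhere

class handler {
public:
    handler() : next_handler(NULL) {}
    virtual ~handler() {}
    // Called when an event this component is responsible for is run.
    virtual void handle(event *e) = 0;
    // Links this component to the next handler.
    void set_next_handler(handler *h) { next_handler = h; }
protected:
    handler *next_handler;
};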
[Figure: an event holds a port ID, data, and a handler]
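The relationship between events, handlers, and the queue can be sketched as follows. This is an illustration under assumptions, not the simulator's code: the `event` fields follow the figure (port ID, data, handler), the `time` field is added so the queue can order events, `recorder` and `run_all` are hypothetical names, and `std::priority_queue` stands in for the simulator's own event queue.

```cpp
#include <cassert>
#include <queue>
#include <vector>

struct event;

struct handler {
    virtual ~handler() {}
    virtual void handle(event *e) = 0;
};

// Assumed event layout: the time field is not in the figure.
struct event {
    long time;
    int port_id;
    int data;
    handler *h;
};

// Comparator that puts the earliest event on top of the queue.
struct later {
    bool operator()(const event *a, const event *b) const {
        return a->time > b->time;
    }
};

typedef std::priority_queue<event*, std::vector<event*>, later> event_queue;

// Example component: records the port of each event it handles.
struct recorder : handler {
    std::vector<int> seen;
    void handle(event *e) { seen.push_back(e->port_id); }
};

// Runs every queued event in time order; returns the time of the last one.
long run_all(event_queue &q) {
    long now = 0;
    while (!q.empty()) {
        event *e = q.top();
        q.pop();
        now = e->time;    // simulation time follows the event being run
        e->h->handle(e);  // dispatch to the responsible component
    }
    return now;
}
```

Events pushed out of order are still handled in time order, because the queue, not the caller, decides what runs next.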
Event handling

Efficient event handling capability.
‡ Events are queued in a structure called a skip list, developed by William Pugh at the University of Maryland*.
‡ A probabilistic list which allows for inserts, deletes and searches in expected O(log n) time; taking the event at the head of the list is O(1).


Simulation time is kept track of in the event queue.
The event at the head of the event queue is run (handle
function called) and the simulation time is updated.
*http://www.cs.umd.edu/~pugh
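A minimal skip list keyed by event time might look like the sketch below. It is written from the general description of the data structure, not from the simulator's source; the class name `skip_list`, the level cap of 16, and the use of `std::rand` for coin flips are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

class skip_list {
    static const int MAX_LEVEL = 16;
    struct node {
        long time;
        std::vector<node*> next;  // next[k] = successor on level k
        node(long t, int levels) : time(t), next(levels, nullptr) {}
    };
    node head;
    int size_;

    // Each extra level is kept with probability 1/2.
    static int random_level() {
        int lvl = 1;
        while (lvl < MAX_LEVEL && (std::rand() & 1)) ++lvl;
        return lvl;
    }

public:
    skip_list() : head(0, MAX_LEVEL), size_(0) {}
    ~skip_list() { while (!empty()) pop_front(); }

    bool empty() const { return head.next[0] == nullptr; }
    int size() const { return size_; }

    // Expected O(log n): walk down from the top level, moving right while
    // the next key is not greater than t (equal times keep FIFO order).
    void insert(long t) {
        node *update[MAX_LEVEL];
        node *x = &head;
        for (int k = MAX_LEVEL - 1; k >= 0; --k) {
            while (x->next[k] && x->next[k]->time <= t) x = x->next[k];
            update[k] = x;
        }
        node *n = new node(t, random_level());
        for (int k = 0; k < static_cast<int>(n->next.size()); ++k) {
            n->next[k] = update[k]->next[k];
            update[k]->next[k] = n;
        }
        ++size_;
    }

    // The earliest event is always first on level 0, so removing it does
    // not depend on the length of the list.
    long pop_front() {
        node *n = head.next[0];
        for (int k = 0; k < static_cast<int>(n->next.size()); ++k)
            head.next[k] = n->next[k];
        long t = n->time;
        delete n;
        --size_;
        return t;
    }
};
```

For an event queue the common operations are exactly these two: insert at an arbitrary future time and remove the earliest entry.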
Main Loop for ngrSim
// Initialize the switch and declare the components (tgen, ipp, framer,
// sched, fab, reframer, opp). All settings are read from a configuration file.

// Start the periodic fabric pull and the traffic generation events.
((crossbar*)fab)->start();
((tgen*)tg)->start();

// Main loop
while (1) {
    // Get the event at the head of the queue. This also updates the system time.
    event* e = (eq.instance()).dequeue();
    if (e == NULL)
        break;  // nothing left to run; avoids dereferencing a NULL event
    // Call the handle function.
    (e->get_next_handler())->handle(e);
    // Check whether the simulation has ended.
    if (sim_running && (eq.instance()).get_time() > simTime) {
        sim_running = 0;
        ((tgen*)tg)->stop();
        ((crossbar*)fab)->stop();
    }
}

// Print the statistics and draw graphs.
Design of ngrSim

The design follows exactly from the design we had in
the earlier design review.
Object Diagram for ngrSim
[Figure: object diagram of ngrSim. Traffic flows through the components in this order.]
TGEN: generates traffic as packets (variable length)
IPP: route lookup, checksum, etc.
FRAMER: breaks packets into cells (64 bytes, 6-byte fixed header)
Scheduler: FIFO, iSlip, or CIOQ
Fabric: Crossbar
REFRAMER: assembles cells back into packets
OPP: puts packets on the outgoing link (use FQ here as well); simpleOPP or cioqOPP
Description of the components


Packets are generated in the TGEN module and are passed as events to the IPP, Framer and Scheduler.
Framer breaks packets into cells.
‡ Cells are 64 bytes with 6 bytes of header.
– Input and output port numbers, 2 bytes each = 4 bytes
– 1 byte for cell ID
– 1 byte for flag + priority

Scheduler is an abstract class. All schedulers that are implemented derive from this base class.
‡ We have FCFS, iSlip and CIOQ implemented.

The scheduler does not push cells to the fabric. The fabric runs on a fixed time slot, which is determined by the cell size and line speed.
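The cell format and the fabric time slot can be worked through in a few lines. The sketch below assumes the header fields are laid out in the order listed, uses hypothetical names (`cell_header`, `cells_for_packet`, `slot_ns`), and rounds the OC-48 line rate to 2.5 Gb/s; none of this is taken from the simulator itself.

```cpp
#include <cassert>
#include <cstdint>

// Assumed layout of the 6-byte cell header.
struct cell_header {
    std::uint16_t in_port;    // input port number (2 bytes)
    std::uint16_t out_port;   // output port number (2 bytes)
    std::uint8_t  cell_id;    // cell ID (1 byte)
    std::uint8_t  flag_prio;  // flag and priority bits (1 byte)
};
static_assert(sizeof(cell_header) == 6, "header must be 6 bytes");

const int CELL_BYTES = 64;
const int HEADER_BYTES = 6;
const int PAYLOAD_BYTES = CELL_BYTES - HEADER_BYTES;  // 58 bytes of data per cell

// Number of cells the framer needs for a packet of the given length.
int cells_for_packet(int packet_bytes) {
    return (packet_bytes + PAYLOAD_BYTES - 1) / PAYLOAD_BYTES;
}

// Fabric time slot in nanoseconds: the time to send one cell at the line
// rate (bits divided by Gb/s gives ns).
double slot_ns(double line_rate_gbps) {
    return CELL_BYTES * 8 / line_rate_gbps;
}
```

Under these assumptions a 1500-byte packet becomes 26 cells, and at 2.5 Gb/s the fabric pulls one cell about every 204.8 ns.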
Description of the components


The fabric pulls cells from the scheduler at these regular intervals.
Fabric is also an abstract class. All fabrics derive from this class.
‡ We currently have a crossbar fabric implemented.

The fabric sends the cells to the reframer.
‡ The reframer reassembles cells into packets.
‡ If a cell is not received within a certain time limit, the partial packet is discarded.

The reframer passes the reassembled packets to the OPP.
‡ The OPP queues the packets and sends them out at the link rate.
‡ We have simpleOPP and cioqOPP implemented.
Statistics

All modules have statistics.
‡ They calculate them independently and can be queried.
‡ We keep track of
– cell and packet count
– buffer sizes
– delays
– drops

We ran experiments for 100 million clock ticks.
‡ Each data point is the result of 10 runs of the same experiment with different random seeds.
‡ Values were collected only after the system reached steady state.
‡ Due to time constraints, we could only run with 16 ports.
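One way to turn 10 seeded runs into a single data point with an error bar is a mean plus a 95% confidence half-width. The sketch below is an assumption about how that could be done, not a description of what the experiments did; `summarize` is a hypothetical helper, and 2.262 is the Student-t critical value for 9 degrees of freedom (10 runs).

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct summary {
    double mean;
    double half_width;  // 95% confidence half-width around the mean
};

// Mean and confidence half-width over independent runs. t_crit must match
// the number of runs; the default is for 10 runs.
summary summarize(const std::vector<double> &runs, double t_crit = 2.262) {
    const double n = static_cast<double>(runs.size());
    double sum = 0.0;
    for (double x : runs) sum += x;
    const double mean = sum / n;

    double ss = 0.0;
    for (double x : runs) ss += (x - mean) * (x - mean);
    const double stddev = std::sqrt(ss / (n - 1.0));  // sample standard deviation

    return summary{mean, t_crit * stddev / std::sqrt(n)};
}
```

Two scheduling algorithms whose intervals do not overlap at a given load can then be told apart with some confidence.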
Throughput (FIFO)
Theoretical bound = 58%
Scheduler throughput is 45%
Overall throughput is 30%
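The 58% figure matches the well-known head-of-line blocking limit for FIFO input queues under uniform traffic, 2 − √2 ≈ 0.586 as the port count grows. A small Monte Carlo sketch can reproduce that behavior. It assumes every input is always backlogged and each output serves one contending input chosen at random; `hol_throughput` is a hypothetical function and is unrelated to ngrSim.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Saturation throughput of an n x n input-queued switch with one FIFO per
// input and uniformly random destinations.
double hol_throughput(int n, int slots, unsigned seed) {
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> pick_output(0, n - 1);
    std::vector<int> hol(n);  // destination of each head-of-line cell
    for (int i = 0; i < n; ++i) hol[i] = pick_output(rng);

    long served = 0;
    std::vector<std::vector<int>> contenders(n);
    for (int t = 0; t < slots; ++t) {
        for (int j = 0; j < n; ++j) contenders[j].clear();
        for (int i = 0; i < n; ++i) contenders[hol[i]].push_back(i);
        for (int j = 0; j < n; ++j) {
            if (contenders[j].empty()) continue;
            // Each output serves one contending input chosen at random.
            std::uniform_int_distribution<int> pick(
                0, static_cast<int>(contenders[j].size()) - 1);
            int winner = contenders[j][pick(rng)];
            hol[winner] = pick_output(rng);  // next cell reaches the head
            ++served;
        }
    }
    return static_cast<double>(served) / (static_cast<double>(n) * slots);
}
```

For 2 ports the result settles near 0.75, and it falls toward the 0.586 limit as the port count increases.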
Throughput (speedup = 1.25)
Theoretical bound = 58%
Scheduler throughput is 55%
Overall throughput is 40%
FIFO with different Speedup
[Figure: FIFO drop rate with different speedups. Drop rate vs. load for speedup 1, speedup 1.25, and speedup N (4).]
FIFO with different Speedup
[Figure: FIFO delay with different speedups. Delay (ns) vs. load for speedup 1, speedup 1.25, and speedup N (4).]
Scheduling Algorithm Throughput
[Figure: Throughput under different scheduling algorithms. Throughput ratio (in/out) vs. load for FIFO, CIOQ, and iSlip.]
Scheduling Algorithm Delay
[Figure: Delay under different scheduling algorithms. Delay (ns) vs. load for FIFO, CIOQ, and iSlip.]
Buffer sizes with iSlip
[Figure: Delay with different buffer sizes. Delay (ns) vs. load for the series buffer1000 and buffer5000.]
Scheduling Algorithm
[Figure: Overall delay of cioq, nFIFO, iSlip. Delay vs. load for the series islip-overall, fifo-overall, and cioq-overall.]
iSlip performs worse than CIOQ.
To make ISIS a reality …

Physical Specifications of base design
‡ Chassis height: 10 in., including the power module shelf (AC or DC)
‡ Chassis width: 17.25 in., not including rack mount flanges. Can be mounted in 19 in. or 22 in. racks.
‡ Chassis depth: 18 in., not including the cable management system
‡ Chassis weight: approximately 50 lbs., depending on configuration

Standards compliance
‡ Safety: UL1950, IEC60950, IEC60825, TS001, AS/NZS 3260
‡ Electromagnetic emissions: FCC Class A, ICES-003 Class A, EN55022 Class B, VCCI Class B, AS/NZS 3548 Class B
‡ NEBS: SR-3580 Level 3 compliant
Chassis Configuration




Modular design separates the line card module from the switch fabric module.
Redundant power supplies and fabrics can be used to ensure robustness.
Compact design allows flexibility in rack placement.
The modules are connected through a proprietary fiber interconnect using the LCS protocol*.
*http://www.pmcsierra.com/products/details/pm9311/
That’s cute … but show me something big!




The PM9311 has 10 Gb/s channels that can be used to connect to other switching modules or line cards through fiber and the LCS protocol.
So instead of using one 320 Gb/s switch, we can use more than one.
We can extend the capacity of the switch by connecting many switch modules in a structure.
The easiest structure we can think of is a mesh.
‡ Other possibilities include a hypercube.
ISIS-A: ISIS with an attitude
[Figure: an ISIS-A switching module and its connections]
Routing module
Switching module with 320 Gb/s capacity
Line card module (160 Gb/s)
16 10 Gb/s channels to other switching modules
16 such switches are connected in a mesh through 10 Gb/s fiber channels.
Full mesh of 16 equals 1024 OC-48 ports (2.5 Tb/s).
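The capacity figures on this slide follow from simple arithmetic, sketched below. The split of each 320 Gb/s module into 64 OC-48 ports for line cards plus 16 interconnect channels is a reading of the figure rather than something the slides state outright, the helper names are hypothetical, and OC-48 is rounded to 2.5 Gb/s as on the slides (the actual rate is about 2.488 Gb/s).

```cpp
#include <cassert>

const double OC48_GBPS = 2.5;      // rounded OC-48 line rate
const double CHANNEL_GBPS = 10.0;  // one interconnect channel

// Aggregate capacity in Gb/s of the given number of OC-48 ports.
double port_capacity_gbps(int ports) { return ports * OC48_GBPS; }

// Aggregate capacity in Gb/s of the given number of interconnect channels.
double channel_capacity_gbps(int channels) { return channels * CHANNEL_GBPS; }

// Total OC-48 ports when each switching module serves ports_per_module.
int total_ports(int modules, int ports_per_module) {
    return modules * ports_per_module;
}
```

With these numbers, 64 ports give 160 Gb/s, 64 ports plus 16 channels fill a 320 Gb/s module, and 16 modules give 1024 ports, or 2560 Gb/s, which the slide rounds to 2.5 Tb/s.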
ISIS-A: Routing

Based on source and destination address, line cards route to
‡ Ports on the card itself (through the line card)
‡ Other line cards on the same module (through the fabric)
‡ Other switching modules (through the interconnect)

Routing between switching modules is based on modified hot-potato routing with queue lengths.
‡ Queues only monitor bandwidth utilization of channels between modules.
– Built into the chip-set.
‡ Send to the recipient directly if its queue is not loaded.
‡ Otherwise, send to the recipient with the least loaded queue.
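The inter-module routing rule can be sketched as a small next-hop function. The slides do not define when a queue counts as loaded, so the `threshold` parameter, the integer load units, and the name `next_hop` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <vector>

// Picks the next-hop module for a cell bound for module dest.
// load[m] is the current load of the channel queue toward module m;
// self is the module making the decision.
int next_hop(int self, int dest, const std::vector<int> &load, int threshold) {
    // Direct channel is not loaded: send straight to the recipient.
    if (load[dest] <= threshold) return dest;

    // Otherwise deflect to the neighbor whose channel queue is least loaded.
    int best = dest;
    for (int m = 0; m < static_cast<int>(load.size()); ++m) {
        if (m == self) continue;
        if (load[m] < load[best]) best = m;
    }
    return best;
}
```

In a full mesh the deflected cell is one hop from its destination again at the intermediate module, so the detour costs at most one extra channel as long as the second hop is direct.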
ISIS-A: Routing and QoS.

Routing is more like load balancing
‡ Only more intelligent
‡ Controls delays

For QoS,
‡ Some bandwidth is reserved for guaranteed traffic (max 10 Gb/s between two modules)
‡ If such traffic arrives, send it directly to the recipient (one link) using this reserved bandwidth.
‡ If there is no such traffic, use the bandwidth for normal traffic.

Requires modification to the scheduler (which is programmable)
‡ Addressing has to be universal.
‡ Routing between modules has to be added.
Future Work

Short-term
‡ Need to run more experiments with different parameters.
‡ Plan to follow up on the scalability options.
‡ Get statistical significance for the results.

Long-term
‡ Implement other scheduling algorithms.
‡ Implement other fabric types.
‡ Explore other possibilities for QoS with IQ switches.