Last Lecture: Network Layer
1. Design goals and issues
2. Basic Routing Algorithms & Protocols
3. Addressing, Fragmentation and reassembly
4. Internet Routing Protocols and Inter-networking
   - Intra- and Inter-domain Routing Protocols
   - Introduction to BGP
   - Why is routing so hard to get right? ✔
5. Router design
6. Congestion Control, Quality of Service
7. More on the Internet’s Network Layer

Credits: slides from Jennifer Rexford, Nick Feamster, Hari Balakrishnan, Timothy Griffin (ICNP’02 Tutorial), Xin Hu & Z. Morley Mao

SUNY at Buffalo; CSE 489/589 – Modern Networking Concepts; Fall 2010; Instructor: Hung Q. Ngo
This Lecture: Network Layer
1. Design goals and issues
2. Basic Routing Algorithms & Protocols
3. Addressing, Fragmentation and reassembly
4. Internet Routing Protocols and Inter-networking
5. Router design
   1. Short history ✔
   2. Router architectures
   3. Address lookup problem
6. Congestion Control, Quality of Service
7. More on the Internet’s Network Layer
Router Design: Outline
- What is an Internet router?
- What limits performance: memory access time
- The early days: modified computers (programmable against uncertainty)
- The middle years: specialized for performance (needed new architectures, theory, and practice)
- So how did we do?

Slides from Nick McKeown @ Stanford
[Slide image: Ada Lovelace]
[Slide: a packet = Header + Data]

[Slide: a router chassis with N linecards, numbered 1..N, each attached to a link of rate R]
- N = number of linecards. Typically 8-32 per chassis
- R = line-rate. 1Gb/s, 2.5Gb/s, 10Gb/s, 40Gb/s, 100Gb/s
- Capacity of router = N x R
Examples of big, fast routers:
- Cisco CRS-1 (6 ft tall, 19" wide): capacity 1.2 Tb/s, power 10.4 kW, weight 0.5 ton, cost $500k
- Juniper M320 (3 ft tall, 17" wide, 2 ft x 2 ft footprint): capacity 320 Gb/s, power 3.1 kW

Examples of linecards (roughly 2 in x 10 in x 21 in; power: about 150 Watts each):
- 10-Port GigE (for Cisco 12000 Series)
- 1-Port OC48 (2.5 Gb/s) (for Juniper M40)
- 4-Port 10 GigE (for Cisco CRS-1)
More recent examples:
- Cisco’s ASR 9000: Cisco 100 GE linecard (100 Gb/s); max capacity 25 Tb/s
- Juniper T1600: 17.43 x 37.45 x 31 in, 606 lbs
- Juniper TX Matrix Plus: 21.4 x 52 x 36.2 in, 900 lbs; interconnects up to 16 T1600 chassis into a single routing entity
[Slide: a packet with Data and a Header, e.g. 01000111100010101001110100011001]

The header contains:
1. Internet address
2. Age
3. Checksum to protect the header

For each packet, the router must:
- Look up the internet address
- Check and update the age
- Check and update the checksum
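These three per-packet operations are easy to see in code. Below is a minimal, illustrative sketch (not any vendor’s actual datapath) of the age and checksum steps on an IPv4-style header, using a full ones’-complement recomputation of the 16-bit header checksum; the word indices are assumptions chosen to roughly match IPv4.

```python
def ones_complement_checksum(header_words):
    """16-bit ones'-complement sum over the header words, complemented."""
    total = sum(header_words)
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header_words, ttl_index=4, checksum_index=5):
    """Check and update the age (TTL) and checksum of one packet.

    header_words: list of 16-bit words. Indices are illustrative:
    TTL in the high byte of word 4, checksum in word 5, as in IPv4.
    Returns the updated header, or None if the packet must be dropped.
    """
    # 1. Verify the checksum: the complemented sum over a valid header
    #    (including the stored checksum) must be zero.
    if ones_complement_checksum(header_words) != 0:
        return None                         # corrupted header: drop
    words = list(header_words)
    ttl = words[ttl_index] >> 8
    if ttl <= 1:
        return None                         # aged out: drop
    # 2. Decrement the age (TTL), keeping the low byte untouched.
    words[ttl_index] = ((ttl - 1) << 8) | (words[ttl_index] & 0xFF)
    # 3. Recompute the checksum over the header with the field zeroed.
    words[checksum_index] = 0
    words[checksum_index] = ones_complement_checksum(words)
    return words
```

Real routers use an incremental checksum update rather than a full recomputation, but the full version makes the invariant clearest.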
Router Control and Management

What limits performance? Memory, memory, …
DRAM then, DRAM now:
- DRAMs are designed to maximize the number of bytes stored, not access speed
- Access time (“speed”) has stayed pretty much constant: in 11 years, from 50ns down to only 20ns, much slower than Moore’s law
Outline, continued:
- What is an Internet router?
- What limits performance: memory access time
- The early days: modified computers (programmable against uncertainty)
- The middle years: specialized for performance (needed new architectures, theory, and practice)
- So how did we do?
- The present: Internet showing its age; the simple model is breaking down
First Generation Routers: Shared Bus
[Slide: N links at rate R attach to line interfaces (MAC); a shared bus connects them to a CPU holding the route table, with off-chip buffer memory. The central path must run at rate N x R.]
- Bottlenecks: off-chip buffer, shared bus, CPU
- Typically <0.5Gb/s aggregate capacity
Innovation #1: Linecards have routing tables
Prevents central table from becoming a bottleneck at high speeds
Complication: Must update forwarding tables on the fly.
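The forwarding table on a linecard answers longest-prefix-match queries. As a hedged illustration of the semantics only (real linecards use TCAMs or compressed tries, not Python dicts; the prefixes and port names below are made up):

```python
import ipaddress

# Illustrative forwarding table: prefix -> outgoing port (hypothetical values)
TABLE = {
    ipaddress.ip_network("0.0.0.0/0"): "port0",        # default route
    ipaddress.ip_network("128.205.0.0/16"): "port1",
    ipaddress.ip_network("128.205.32.0/24"): "port2",
}

def lookup(dst):
    """Longest-prefix match: the most specific table entry containing dst."""
    addr = ipaddress.ip_address(dst)
    best = max(
        (net for net in TABLE if addr in net),       # all matching prefixes
        key=lambda net: net.prefixlen,               # pick the longest one
    )
    return TABLE[best]
```

For example, `lookup("128.205.32.7")` matches all three prefixes but returns `"port2"`, the /24 entry. The address-lookup problem mentioned in the outline is exactly how to do this at line rate.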
Second Generation Routers: Forwarding Caches on the Line Cards
[Slide: N links at rate R attach to line cards, each with its own buffer memory, forwarding cache, and MAC; a shared bus still connects them to a CPU holding the full route table and central buffer memory.]
- Typically <5Gb/s aggregate capacity
- Function was more important than speed
- 1993 (the WWW) changed everything
- We badly needed:
  - some new architecture
  - some theory
  - some practice
Innovation #2: Switched Backplane
Using a switching fabric:
- Input ports can simultaneously connect to output ports in one time slot (in a 1-to-1 manner)
- Advantage: exploits parallelism
- Disadvantage: needs a scheduling algorithm
Switching Fabrics Allow for Parallel Transfer

Output-Queued (OQ) Switch
[Slide: each of the N inputs performs header processing (look up the IP address in the address table, update the header); the fabric then writes each packet into the packet buffer memory of its output queue. In the worst case an output memory must absorb writes at N times the line rate, so the switch as a whole runs at N x R.]
Simple Model to View an OQ-Switch
[Slide: router R1 with links 1-4. Each link has an ingress and an egress, all at link rate R; the switch is modeled as moving packets from the ingresses directly into per-output egress queues.]
Characteristics of an OQ-Switch: Nice!
- Arriving packets are immediately written into the output queue, without intermediate buffering.
- The flow of packets to one output does not affect the flow to another output.
- An OQ switch is work conserving: an output line is always busy when there is a packet in the switch for it.
- OQ switches have the highest throughput and lowest average delay.
- The rate of individual flows and the delay of packets can be controlled (with Weighted Fair Queueing + leaky bucket).
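The last bullet’s leaky (token) bucket is simple enough to sketch: tokens accrue at the configured rate, and each arriving packet conforms only if enough tokens are available. A minimal discrete-event version (the class and parameter names are mine, not from any standard API):

```python
class TokenBucket:
    """Tokens accrue at `rate` units per second, capped at `burst`.
    A packet of `size` conforms iff that many tokens are available."""

    def __init__(self, rate, burst):
        self.rate = rate          # long-term allowed rate
        self.burst = burst        # maximum token accumulation
        self.tokens = burst       # start with a full bucket
        self.last = 0.0           # time of the previous check

    def conforms(self, now, size):
        # Refill tokens for the elapsed time, up to the burst cap.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if size <= self.tokens:
            self.tokens -= size
            return True           # within profile: enqueue normally
        return False              # out of profile: drop or mark
```

Combined with WFQ at the output, a policer like this is what lets an OQ switch bound both the rate and the delay of an individual flow.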
Example OQ: Shared Memory Switch (SMS)
[Slide: all N ingress links at rate R write into a single, physical memory device, from which all N egress links at rate R read.]
Required Memory Bandwidth
Basic OQ switch:
- Consider an OQ switch with N different physical memories, and all links operating at rate R bits/s.
- In the worst case, packets may arrive continuously from all inputs, destined to just one output.
- Worst-case memory bandwidth requirement for each memory is (N+1)R bits/s: N simultaneous writes plus one read.
Shared Memory Switch:
- Maximum memory bandwidth requirement for the single memory is 2NR bits/s: N writes plus N reads per time slot.
- Also, a single point of failure!
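Both requirements come from counting worst-case reads and writes per time slot; a quick check of the arithmetic (the N and R values are arbitrary examples):

```python
def oq_per_memory_bw(n, r):
    """Worst case for ONE output memory in a basic OQ switch:
    N simultaneous writes (all inputs target this output) + 1 read."""
    return (n + 1) * r

def sms_memory_bw(n, r):
    """Shared Memory Switch: the single memory absorbs
    N writes and N reads every time slot."""
    return 2 * n * r

R = 10e9                        # 10 Gb/s links
print(oq_per_memory_bw(8, R))   # (8+1) x 10 Gb/s = 90 Gb/s per memory
print(sms_memory_bw(8, R))      # 2 x 8 x 10 Gb/s = 160 Gb/s total
```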
How Fast Can A “Practical” SMS Be?
[Slide: N links share one memory over a 200-byte-wide bus]
- 5ns SRAM: one memory operation per 5ns
- Two memory operations per packet (one write, one read)
- Therefore, up to 160Gb/s
- In practice, closer to 80Gb/s
- Note: SRAMs are very expensive!

Commercial routers with SM architecture:
- Juniper’s E-series/ERX edge routers
- M-series M20, M40, and M160 core routers
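The 160Gb/s figure follows directly from the access time and bus width on this slide; the arithmetic, as a quick check:

```python
BUS_BYTES = 200          # bus width: 200 bytes moved per memory operation
ACCESS_NS = 5            # 5ns SRAM: one operation every 5ns
OPS_PER_PKT = 2          # each packet is written once and read once

bits_per_op = BUS_BYTES * 8                   # 1600 bits per operation
ops_per_sec = 1e9 / ACCESS_NS                 # 2e8 operations per second
throughput = bits_per_op * ops_per_sec / OPS_PER_PKT
print(throughput / 1e9)  # -> 160.0 (Gb/s)
```

Halving again for real-world overheads gives the “closer to 80Gb/s” practical figure.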
OQ Example: Shared Medium Switch
- Pro: good delay & throughput; broadcast/multicast possible
- Con: high bus speed (NR), high memory bandwidth ((N+1)R), high-speed address filter (NR)
- Commercial routers: Cisco 7500 series
Summary of OQ Switches
OQ switches are ideal:
- Work-conserving
- Maximize throughput
- Minimize expected delay
- Permit delay guarantees for constrained traffic
OQ switches don’t scale well:
- Require N memory writes per time slot (output speedup of N)
- Memory bandwidth is the bottleneck
- Parallelism is not straightforward
Input-Queued (IQ) Switch
[Slide: each input interface stores packets and feeds the backplane at 1 x R; an arbiter schedules the fabric]
Only input interfaces store packets.
Advantages:
- Easier to build (store packets at the inputs when there is contention at the outputs)
- Relatively easy to design algorithms
Disadvantages:
- In general, hard to achieve high utilization
We can show that:
- An input/output speedup of 2 suffices to achieve 100% throughput
- With higher speedup, rate guarantees are possible too!
Main IQ Problem: Head of Line (HoL) Blocking

HoL Blocking Leads to Low Throughput
- Karol, Hluchyj & Morgan (IEEE Trans. Comm. ’87): with FIFO input queues and uniform traffic, throughput saturates at 2 − √2 ≈ 58.6%
[Slide: plot of average delay vs. offered load, 0% to 100%, blowing up well before full load]
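The throughput collapse is easy to reproduce in simulation. The sketch below saturates a FIFO-input switch with uniform traffic and measures delivered throughput; for moderate N it lands a bit above the 2 − √2 ≈ 0.586 asymptotic limit (all parameters here are illustrative choices of mine).

```python
import random
from collections import defaultdict

def hol_throughput(n_ports, slots, seed=0):
    """Saturated FIFO-IQ switch: every input always has a head-of-line
    cell with a uniformly random, fixed destination; each output serves
    one of its contending inputs per slot. Returns delivered fraction
    of the switch's full capacity."""
    rng = random.Random(seed)
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]  # HoL destinations
    served = 0
    for _ in range(slots):
        contenders = defaultdict(list)
        for inp, dst in enumerate(hol):
            contenders[dst].append(inp)        # inputs blocked on same output
        for dst, inputs in contenders.items():
            winner = rng.choice(inputs)        # output picks one input
            hol[winner] = rng.randrange(n_ports)  # winner reveals next cell
            served += 1
    return served / (n_ports * slots)

print(hol_throughput(16, 20000))  # roughly 0.6: far below full capacity
```

The losing inputs keep their stuck HoL cell, so cells behind it for idle outputs cannot move; that is exactly the blocking the next slides remove with virtual output queues.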
Solution to HoL: Virtual Output Queues (VOQ)

A Router with Virtual Output Queues
- Each input keeps a separate queue per output, so a cell blocked at the head of one queue cannot block cells destined to other outputs
- A VOQ switch can reach the best throughput that any queueing system can achieve
- Caveat: it has to run a sophisticated scheduling algorithm to compute a “maximum weight matching”!
[Slide: plot of average delay vs. offered load, 0% to 100%]
Jim Dai & Balaji Prabhakar (2000) showed that an IQ switch using a maximum weight matching algorithm can achieve a throughput of up to 100% under arbitrarily distributed input traffic, as long as:
(i) it obeys the strong law of large numbers, and
(ii) it does not oversubscribe any input or output.
More Precise Statement of the Result
- Let A_ij(n) = number of packets that have arrived at input i destined to output j up to time n.
- Strong law of large numbers assumption: A_ij(n)/n → λ_ij as n → ∞ (with probability 1); λ_ij is called the arrival rate at VOQ (i, j).
- Let D_ij(n) = number of departures from VOQ (i, j) up to time n.
- Non-overloading assumption: Σ_j λ_ij < 1 for every input i, and Σ_i λ_ij < 1 for every output j.
- We want a scheduling algorithm to be efficient: D_ij(n)/n → λ_ij for every pair (i, j).

Maximum Weight Matching (MWM)
- Let L_ij(n) be the length of the VOQ at input i holding packets for output j.
- Form the bipartite “request” graph on inputs 1..N and outputs 1..N, where edge (i, j) is present when L_ij(n) > 0 and carries weight L_ij(n).
- The MWM schedule is the matching S*(n) = arg max_S L(n)^T S(n): the bipartite matching that maximizes the total queue length it serves.
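For tiny switches the MWM schedule can be found by brute force over all N! matchings, which makes the arg max definition concrete (real schedulers obviously cannot afford this; the queue lengths below are invented examples):

```python
from itertools import permutations

def mwm_schedule(L):
    """L[i][j] = length of the VOQ at input i for output j.
    Returns the matching m (input i connects to output m[i]) that
    maximizes the total served queue length: arg max_S L(n)^T S(n)."""
    n = len(L)
    return max(
        permutations(range(n)),                    # all N! matchings
        key=lambda m: sum(L[i][m[i]] for i in range(n)),
    )

# Example: three inputs, three outputs, current VOQ lengths
L = [[3, 0, 1],
     [2, 5, 0],
     [0, 1, 4]]
print(mwm_schedule(L))  # (0, 1, 2): serves weight 3 + 5 + 4 = 12
```

Serving the longest queues first is what keeps every VOQ stable under the assumptions above.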
Problem with Running an MWM Algorithm
- The best known algorithm is still too slow!
- n = # of inputs/outputs, m = # of edges in the bipartite graph
- (Because we have to compute an MWM every few nanoseconds!)
From Maximum to Maximal to Randomized!
In practice: maximal matching, not maximum.
Maximal matching schemes:
- Wavefront Arbiter (WFA)
- Parallel Iterative Matching (PIM)
- iSLIP
Justification: Dai & Prabhakar [INFOCOM 2000]:
- Give the fabric a speedup of 2 (thus CIOQ) and even a maximal matching yields 100% throughput too!
Several other works use randomized algorithms:
- M. Mitzenmacher, B. Prabhakar, and D. Shah (FOCS ’02)
- P. Giaccone, B. Prabhakar, and D. Shah (INFOCOM ’02)
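A maximal matching simply keeps adding compatible edges until none can be added, with no optimality claim. A sequential greedy sketch makes the distinction from maximum matching visible (schemes like PIM and iSLIP achieve maximality with parallel request/grant/accept rounds in hardware, not a scan like this):

```python
def greedy_maximal_matching(requests):
    """requests: set of (input, output) pairs with non-empty VOQs.
    Returns a maximal matching: no remaining request can be added."""
    matched_in, matched_out, match = set(), set(), []
    for i, j in sorted(requests):            # fixed order for determinism
        if i not in matched_in and j not in matched_out:
            match.append((i, j))
            matched_in.add(i)
            matched_out.add(j)
    return match

reqs = {(0, 0), (0, 1), (1, 0), (2, 2)}
print(greedy_maximal_matching(reqs))
# [(0, 0), (2, 2)]: maximal, but not maximum (the maximum matching
# {(0, 1), (1, 0), (2, 2)} has size 3)
```

This size gap is exactly why a maximal-matching scheduler needs the speedup of 2 to recover 100% throughput.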
Evolution of IQ-Switching Till Early 2000

Theory:
- Input Queueing (IQ): 58% [Karol, 1987]
- IQ + VOQ, maximum weight matching: 100% [McKeown et al., 1995]
- IQ + VOQ, randomized algorithms: 100% [Tassiulas, 1998]
- IQ + VOQ, maximal size matching, speedup of two: 100% [Dai & Prabhakar, 2000]

Practice:
- Input Queueing (IQ)
- IQ + VOQ, sub-maximal size matching, e.g., PIM, iSLIP
- Different weight functions, incomplete information, pipelining: 100% [Various]
- Various heuristics, distributed algorithms, and amounts of speedup
Third Generation Routers
“Crossbar”: Switched Backplane
[Slide: line cards, each with local buffer memory, forwarding table, and MAC, connect through a switched crossbar; a CPU card holds the routing table; per-port arbiters schedule the crossbar.]
- Typically <50Gb/s aggregate capacity
Mimicking an OQ Switch for 100% Throughput
- The OQ switch buffers at the outputs, with memory running at N x R; the IQ switch buffers at the inputs, with memory running at only 1 x R. Are they equivalent? No.
- Combined Input-Output Queueing (CIOQ): buffer at both inputs and outputs, and run a scheduling algorithm; the question becomes what speedup the fabric and memories need.
- Now are they equivalent? Yes, if the CIOQ switch runs 2 times faster (memories at 2R).
CIOQ Switches
Both input and output interfaces store packets.
Advantages:
- Easy to build
- Utilization 1 can be achieved with limited input/output speedup (≤ 2)
Disadvantages:
- Harder to design algorithms
- Two congestion points
- Need to design flow control
With an input/output speedup of 2, a CIOQ switch can emulate any work-conserving OQ switch [G+98, SZ98]; it needs to run a stable marriage matching algorithm, or a maximal matching algorithm.
Stable Marriage Problem
- Consider N women and N men.
- Each woman/man ranks each man/woman in the order of their preferences.
- A stable matching is a matching with no blocking pairs.
- Blocking pair: let p(i) denote the partner of i. Matched pairs (k, p(k)) and (j, p(j)) form a blocking pair if k prefers p(j) to p(k), and p(j) prefers k to j.

Gale-Shapley Algorithm (GSA)
- As long as there is a free man m:
  - m proposes to the highest-ranked woman w in his list to whom he hasn’t proposed yet
  - If w is free, m and w are engaged
  - If w is engaged to m’ and w prefers m to m’, w releases m’ (and m and w are engaged)
  - Otherwise m remains free
- A stable matching exists for every set of preference lists.
- Complexity: worst-case O(N^2).
Example
Men’s preference lists:
- man 1: 2 4 3 1
- man 2: 1 4 3 2
- man 3: 4 3 2 1
- man 4: 1 2 4 3
Women’s preference lists:
- woman 1: 1 4 3 2
- woman 2: 3 1 4 2
- woman 3: 1 2 3 4
- woman 4: 2 1 4 3
If men propose to women, the stable matching is (1,2), (2,4), (3,3), (4,1).
What is the stable matching if women propose to men?
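The algorithm and example above can be checked directly. A compact man-proposing Gale-Shapley sketch, 1-indexed to match the slide’s lists (swap the two arguments to answer the woman-proposing question):

```python
def gale_shapley(men_prefs, women_prefs):
    """Man-proposing GSA. Prefs are dicts: person -> list of the
    opposite side in decreasing preference. Returns {man: woman}."""
    # Precompute each woman's ranking of the men (lower = preferred).
    rank = {w: {m: r for r, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_idx = {m: 0 for m in men_prefs}   # next woman each man proposes to
    engaged = {}                           # woman -> man
    free = list(men_prefs)
    while free:
        m = free.pop()
        w = men_prefs[m][next_idx[m]]      # highest-ranked not yet proposed
        next_idx[m] += 1
        if w not in engaged:
            engaged[w] = m                 # w was free: engage
        elif rank[w][m] < rank[w][engaged[w]]:
            free.append(engaged[w])        # w prefers m: release m'
            engaged[w] = m
        else:
            free.append(m)                 # w rejects: m remains free
    return {m: w for w, m in engaged.items()}

men = {1: [2, 4, 3, 1], 2: [1, 4, 3, 2], 3: [4, 3, 2, 1], 4: [1, 2, 4, 3]}
women = {1: [1, 4, 3, 2], 2: [3, 1, 4, 2], 3: [1, 2, 3, 4], 4: [2, 1, 4, 3]}
print(gale_shapley(men, women))  # matches the slide: 1-2, 2-4, 3-3, 4-1
```

Man-proposing GSA yields the same (man-optimal) matching regardless of the order in which free men are processed.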
OQ Emulation with a Speedup of 2
- Input preference list: the cells at that input, ordered in the inverse order of their arrival
- Output preference list: all input cells to be forwarded to that output, ordered by the times they would be served in an OQ schedule
- Use GSA to match inputs to outputs; the outputs initiate the matching
- Can emulate all work-conserving schedulers
[Slide, panels (a)-(d): a worked example of the output-initiated GSA matching cells a.1-c.3, queued at inputs 1-3, to outputs a-c over successive rounds.]
Fourth Generation Routers: Multirack; Optics Inside
[Slide: racks of linecards connected over optical links, 100s of metres long, to a separate switch rack.]
Examples: Alcatel 7670 RSP, Juniper TX8/T640, Avici TSR, Cisco CRS-1
[Slide: plot of single-router capacity growth, 1990-2004.]
A Typical High Speed Router Today
- CIOQ architecture with input/output speedup ≤ 2
- Input interface: performs packet forwarding (and classification)
- Output interface: performs packet (classification and) scheduling
- Backplane: switching fabric with speedup N; schedules packet transfers from inputs to outputs
Fifth Generation Routers? Load-Balancing over Passive Optics
- Electronic processing at rate R; the optical interconnect is passive
- Very scalable. Petabits?

© Nick McKeown 2006