High-Performance Routing Overview

High-Performance Routing
Tarik Cicic
University of Oslo
December 2001
Overview
• What is routing and what is switching
• Router and switch architecture in short
• Address structure of the Internet
• Routing and forwarding procedure
• Performance issues
• Routing lookups and packet classification
• Switch interconnect
Router
• “L3 device”
• does L3 (IP) packet forwarding
• supports L3 routing protocols (IP)
• possibility to interconnect different L2 technologies (IP/ATM with IP/SDH)
Switch
• “L2 device”
• L2 packet (or cell) forwarding
• forwarding decision based e.g. on flow ID
• all links (interfaces / ports) use the same L2 technology
• the distinction between routers and switches is fluid; some argue it is disappearing
Basic architectural components
[Figure: control functions (routing, admission control, congestion control, reservation) above the datapath, which does per-packet processing (policing, switching, output scheduling)]
(as we shall see, not all routers are switching routers)
Per-packet processing
[Figure: several inputs, each with a forwarding table and a forwarding decision, connected through an interconnect to output scheduling]
ATM forwarding
• Forwarding procedure:
– Lookup cell VCI/VPI in VC table
– replace old VCI/VPI with new
– forward cell to outgoing interface
– transmit cell onto link
• VPI/VCI tables are built using a separate routing protocol (PNNI)
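The forwarding procedure above amounts to an exact-match lookup followed by a label swap. A minimal sketch, assuming a VC table keyed by (incoming port, VPI, VCI); the table contents and the names VC_TABLE and forward_cell are invented for illustration:

```python
# Hypothetical VC table: (in_port, VPI, VCI) -> (out_port, new VPI, new VCI).
VC_TABLE = {
    (1, 0, 32): (3, 5, 101),
    (2, 7, 40): (1, 0, 33),
}

def forward_cell(in_port, vpi, vci, payload):
    """Return (out_port, cell) after label swapping, or None if no VC exists."""
    entry = VC_TABLE.get((in_port, vpi, vci))    # exact-match lookup
    if entry is None:
        return None                              # unknown VC: discard the cell
    out_port, new_vpi, new_vci = entry
    # Replace the old VPI/VCI with the new pair; the payload is untouched.
    cell = {"vpi": new_vpi, "vci": new_vci, "payload": payload}
    return out_port, cell
```

In a real switch the table would be filled by signalling (PNNI), not written by hand.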
Ethernet switch forwarding
• Lookup frame destination address in forwarding table
– if known, forward to correct port
– if unknown, flood to all other ports
• learn source address of incoming frame
• forward frame to outgoing interface
• transmit frame onto link
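The lookup-and-learn behaviour can be sketched in a few lines. This is an illustration only: the class name LearningSwitch is invented, the table never ages entries out, and it floods an unknown destination to every port except the one the frame arrived on.

```python
class LearningSwitch:
    def __init__(self, ports):
        self.ports = list(ports)
        self.table = {}                  # MAC address -> port

    def receive(self, in_port, src, dst):
        """Return the list of ports the frame is sent out on."""
        out = self.table.get(dst)        # lookup destination address
        self.table[src] = in_port        # learn source address
        if out is None:
            # Unknown destination: flood to all ports except the arrival port.
            return [p for p in self.ports if p != in_port]
        if out == in_port:
            return []                    # destination is on the arrival port
        return [out]
```

After one frame in each direction between two stations, traffic between them is no longer flooded.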
IP router forwarding
• Lookup packet destination address in forwarding table
– if known, forward to correct port
– if unknown, drop packet
• decrement TTL, update header checksum
• forward packet to outgoing interface
• transmit packet onto link
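The "decrement TTL, update header checksum" step is the header modification that a switch does not need. A minimal sketch, assuming a 20-byte IPv4 header without options; it recomputes the checksum from scratch, whereas real routers usually update it incrementally. The function names are illustrative.

```python
import struct

def checksum(header: bytes) -> int:
    """Internet checksum: one's-complement sum of the 16-bit header words."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:                       # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def decrement_ttl(header: bytes):
    """Return a header with TTL-1 and a fresh checksum, or None if TTL expires."""
    h = bytearray(header)
    if h[8] <= 1:                            # byte 8 is the TTL field
        return None                          # packet must be dropped
    h[8] -= 1
    h[10:12] = b"\x00\x00"                   # bytes 10-11 hold the checksum
    h[10:12] = struct.pack("!H", checksum(bytes(h)))
    return bytes(h)
```

A header with a valid checksum sums to zero under checksum(), which gives a quick way to verify the result.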
Additional delay in routing
[Figure: routing kernel and switch controller on top of an ATM switch]
An extra queue must be passed.
Comparison
                      Switch                     Router
Lookup                Simple table lookup        Hierarchical addresses, CIDR
Header modification   Label swapping or none     CRC, TTL
Queues                n                          n+1
Not so big difference, after all?
First-generation IP routers
[Figure: a shared backplane connects the CPU and its buffer memory to the line interfaces, each with DMA and MAC]
Second-generation IP routers
[Figure: CPU with buffer memory, and line cards that each have DMA, local buffer memory and MAC]
Third-generation switches/routers
[Figure: a switched backplane connects the line cards (each with local buffer memory and MAC) and a CPU card]
• Third-generation routers / switches further obscure the difference
• recall that ATM is considered to be a complex technology inducing too much overhead
• a hybrid routing-switching technology is introduced
Addressing basics
• Each node has a unique address
• flat addressing:
– twenty nodes need twenty entries in the routing table
– two million nodes need two million entries
• hierarchical addressing:
– addresses composed of
• network address
• node address
IP addresses
Class-based:
[Figure: the address range 0 to 2^32-1 divided into classes A, B and C, each with its own net mask]
• Class A: 127 networks
• Class B: ~16000 networks
• Class C: ~2 million networks
IP routers: Class-based addresses
[Figure: the IP address space divided into classes A, B, C and D. Destination 212.17.9.4 falls in class C; its network 212.17.9.0 is found in the routing table by exact match and maps to port 4]
Forwarding decision (class-based)
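[Figure: number line from 0 to 2^32-1 in classless coding, showing the networks 65/24, 128.9/16 (2^16 addresses starting at 128.9.0.0) and 142.12/19; destination 128.9.16.14 falls inside 128.9/16]
• Only the shown networks are known to the router
• other packets are sent to a default interface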
Problems with Class-Based Addressing
• Fixed net id – host id boundaries too inflexible: rapid depletion of address space
• Exponential growth of routing table size
Classless Inter-Domain Routing
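[Figure: number line from 0 to 2^32-1 with nested prefixes: 128.9/16 contains 128.9.16/20 and 128.9.176/20, and 128.9.16/20 in turn contains 128.9.19/24 and 128.9.25/24; destination 128.9.16.14]
Most specific route = “longest matching prefix”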
CIDR routing table
Destination: 128.9.16.14
Prefix          Port
65/24           3
128.9/16        5
128.9.16/20     2
128.9.19/24     7
128.9.25/24     10
128.9.176/20    1
142.12/19       3
CIDR saved IPv4 from running out of addresses
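The table above can be searched with a plain linear scan that keeps the longest matching prefix. This is a sketch for illustration, not a fast algorithm; the short prefixes from the slide (for example 128.9/16) are written out in full because Python's ipaddress module requires four octets.

```python
import ipaddress

# Table from the slide, with prefixes expanded to dotted-quad form.
TABLE = [
    ("65.0.0.0/24", 3),
    ("128.9.0.0/16", 5),
    ("128.9.16.0/20", 2),
    ("128.9.19.0/24", 7),
    ("128.9.25.0/24", 10),
    ("128.9.176.0/20", 1),
    ("142.12.0.0/19", 3),
]
ROUTES = [(ipaddress.ip_network(p), port) for p, port in TABLE]

def lookup(dst):
    """Longest-prefix match by linear scan; returns the port or None."""
    addr = ipaddress.ip_address(dst)
    best = None
    for net, port in ROUTES:
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, port)
    return best[1] if best else None
```

For 128.9.16.14 both 128.9/16 and 128.9.16/20 match; the /20 is longer, so the packet goes to port 2.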
CIDR: Hierarchical Route Aggregation
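[Figure: backbone routers R1–R4. ISP P (192.2.0/22) is reached through R2 and serves Site T (192.2.1/24) and Site S (192.2.2/24); ISP Q is 200.11.0/22. The backbone holds a single entry, 192.2.0/22 → R2. On the IP number line, 192.2.1/24 and 192.2.2/24 lie inside 192.2.0/22]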
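Aggregation works because the site prefixes are contained in the provider prefix. A small check with Python's ipaddress module, using the prefixes from the figure written in dotted-quad form; the four-block list at the end is an added illustration of how contiguous /24 blocks collapse into one /22.

```python
import ipaddress

aggregate = ipaddress.ip_network("192.2.0.0/22")          # ISP P
sites = {
    "T": ipaddress.ip_network("192.2.1.0/24"),
    "S": ipaddress.ip_network("192.2.2.0/24"),
}

# Both site prefixes fall inside the aggregate, so the backbone
# needs only the single /22 entry.
covered = {name: net.subnet_of(aggregate) for name, net in sites.items()}

# Illustration: the four /24 blocks 192.2.0 .. 192.2.3 collapse into the /22.
blocks = [ipaddress.ip_network("192.2.%d.0/24" % i) for i in range(4)]
collapsed = list(ipaddress.collapse_addresses(blocks))
```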
Problems with Route Aggregation
• Change of provider
• Multi-homed networks
Multi-Homed Networks
[Figure: backbone routers R1–R4 with sites 192.2.1/24 and 192.2.2/24; ISP Q is 200.11.0/22. The backbone now holds two entries: 192.2.2/24 → R3 and 192.2.0/22 → R2]
Change of Provider
[Figure: backbone routers R1–R4. ISP P is 192.2.0/22 with Site T (192.2.1/24); ISP Q is 200.11.0/22 with Site S (192.2.2/24). The backbone holds two entries: 192.2.2/24 → R3 and 192.2.0/22 → R2]
Active BGP Entries
http://www.telstra.net/ops/bgp/index.html
Global Internet routing
• The Internet is divided into routing domains
– it is too large to route only using OSPF
– policy must be imposed
• Border Gateway Protocol (BGP) is used to interconnect the domains
[Figure: Domain1, Domain2 and Domain3 interconnected]
Border Gateway Protocol
• Dominant inter-domain routing protocol
• Domains (in this context) = administrative units in the Internet (Autonomous Systems)
• BGP “speakers” in a domain announce the routes this domain uses to reach networks in the Internet (and much more, e.g. willingness to be used as transit)
Name Service (DNS)
• Way to map human-understandable names to the hardly intelligible IP addresses
• largely orthogonal to addressing
• name hierarchy, comparable to e.g. a file system
[Figure: name tree with top-level domains such as edu, com, org, uk and no; names such as ucla, mit, apple, cisco, uio and vg below them; www and cs further down]
DNS
• End hosts are the leaves in this tree
• there is no single root server – the root is distributed!
• the hierarchy is divided into zones
• each zone runs a name server
• each server can resolve all names in its zone
• it also “talks” to other name servers
– caches the name information
– learns by interrogating other servers
DNS
[Figure: the client asks its local name server for www.ifi.uio.no; the local name server queries the root name server, the UIO name server and the IFI name server, and the answer 129.240.64.2 is returned to the client]
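The resolution path in the figure can be imitated with a toy model. Everything here is a simplification: the SERVERS dictionary, its referral entries and the LocalNameServer class are invented, and only the name www.ifi.uio.no and the address 129.240.64.2 come from the slide. Like the figure, the sketch goes from the root straight to the UIO server.

```python
# Hypothetical zone data for three name servers.
SERVERS = {
    "root": {"referrals": {"uio.no": "uio"}, "answers": {}},
    "uio":  {"referrals": {"ifi.uio.no": "ifi"}, "answers": {}},
    "ifi":  {"referrals": {}, "answers": {"www.ifi.uio.no": "129.240.64.2"}},
}

class LocalNameServer:
    def __init__(self):
        self.cache = {}
        self.queries = 0          # questions sent to other servers

    def resolve(self, name):
        if name in self.cache:    # answered from cached information
            return self.cache[name]
        server = "root"
        while server is not None:
            self.queries += 1     # interrogate the next server
            zone = SERVERS[server]
            if name in zone["answers"]:
                self.cache[name] = zone["answers"][name]
                return self.cache[name]
            # Follow the referral whose suffix matches the name, if any.
            server = next((s for suffix, s in zone["referrals"].items()
                           if name.endswith("." + suffix)), None)
        return None
```

The first lookup costs three queries (root, UIO, IFI); a repeated lookup is served from the cache without any.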
Challenges of Modern IP Routing
• A high-performance core IP router coming to the market today should
– have capacity for keeping 200000+ routes
– support 10 - 40 Gb/s lines
– support all new, advanced functionality (policing, service differentiation, and more)
Lookup Time
                                     Capacity (Mpps) at packet size
Year      Line     Line rate (Gb/s)  40 B     84 B     354 B
1997-98   OC3      0.155             0.48     0.23     0.054
1998-99   OC12     0.622             1.94     0.92     0.22
1999-00   OC48     2.5               7.81     3.72     0.88
2000-01   OC192    10                31.25    15       3.53
2002-03   OC768    40                125      60       14.12
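The capacity figures in the table are the line rate divided by the packet size in bits. A short sketch that reproduces them and turns them into a time budget per lookup; the function names are illustrative.

```python
LINE_RATES_GBPS = {"OC3": 0.155, "OC12": 0.622, "OC48": 2.5,
                   "OC192": 10, "OC768": 40}

def mpps(rate_gbps, packet_bytes):
    """Million packets per second when the line is full of equal-size packets."""
    return rate_gbps * 1e9 / (packet_bytes * 8) / 1e6

def lookup_budget_ns(rate_gbps, packet_bytes):
    """Time available for one lookup, in nanoseconds."""
    return 1e3 / mpps(rate_gbps, packet_bytes)
```

At OC192 with 40-byte packets this gives 31.25 Mpps, i.e. 32 ns per lookup; at OC768 the budget shrinks to 8 ns.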
Lookup Time (cont.)
• Basic functionality requires only address lookup
• Advanced functions require
– flow classification (e.g. IntServ), or
– other header analysis (e.g. TOS or policing)
• We first discuss the address lookups
Three Stages of Packet Processing
[Figure: (1) forwarding decision against the forwarding table at each input, (2) interconnect, (3) output scheduling]
Forwarding Decision
[Figure: on the linecard, the destination address of a packet in the input buffer is looked up in the routing table; the forwarding decision moves the packet towards the output buffer]
IP Router Lookup
[Figure: the forwarding engine takes the destination address from the header of the incoming packet and computes the next hop from the forwarding table (destination → next hop)]
Lookup Algorithms
• Clever data structure needed
• Optimize:
– lookup time
– memory requirements
– incremental update time
Trivial Schemes
• Address caching: remember last addresses in hope that more packets addressed to the same destination will appear soon
• List of (address, next hop) pairs:
– O(N) entries
– O(N) lookup time
– O(1) update time
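Both trivial schemes fit in one small class. A sketch under simplifying assumptions: addresses are matched exactly (no prefixes), add() assumes the address is not already in the list, and the cache is unbounded, which a real router could not afford. The name ListTable is invented.

```python
class ListTable:
    def __init__(self):
        self.entries = []      # unsorted (address, next_hop) pairs
        self.cache = {}        # addresses seen recently

    def add(self, address, next_hop):
        self.entries.append((address, next_hop))    # O(1) update
        self.cache.pop(address, None)               # forget stale cached result

    def lookup(self, address):
        if address in self.cache:                   # cache hit
            return self.cache[address]
        for a, hop in self.entries:                 # O(N) scan
            if a == address:
                self.cache[address] = hop
                return hop
        return None
```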
Example Forwarding Table
5-bit Prefixes
      Prefix    Next Hop
P1    111*      H1
P2    10*       H2
P3    1010*     H3
P4    10101     H4
Radix Trie
• O(W) lookup
• O(NW) storage
• O(W) update
[Figure: binary trie holding P1 (111*), P2 (10*), P3 (1010*) and P4 (10101), one address bit per level; lookup: 10111]
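A one-bit-per-level trie for the example prefixes can be sketched as follows. Prefixes and addresses are given as bit strings for readability; the class and function names are illustrative.

```python
class Node:
    def __init__(self):
        self.child = {}          # "0" or "1" -> Node
        self.next_hop = None     # set if a prefix ends at this node

def insert(root, prefix, next_hop):
    node = root
    for bit in prefix:                       # at most W steps
        node = node.child.setdefault(bit, Node())
    node.next_hop = next_hop

def lookup(root, address):
    node, best = root, None
    for bit in address:                      # at most W steps
        node = node.child.get(bit)
        if node is None:
            break
        if node.next_hop is not None:
            best = node.next_hop             # remember the longest match so far
    return best

root = Node()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
```

Looking up 10111 passes P2 after two bits and then falls off the trie at the fourth bit, so the answer is H2.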
PATRICIA Trie
• O(W^2) lookup
• O(N) storage
• O(W) update
[Figure: PATRICIA trie holding P1 (111*), P2 (10*), P3 (1010*) and P4 (10101); internal nodes are labelled with the bit position to test (2, 3, 5); lookup: 10111 requires backtracking!]
Multi-bit Tries
[Figure: a trie of depth W shrinks to depth W/k when k bits are examined per step]
4-ary Trie (k=2)
• O(W/2) lookup
• O(N*4) storage
[Figure: trie with two address bits per level holding P2 and P3; P1 (111*) and P4 (10101) have odd length and are expanded into P11, P12 and P41, P42; lookup: 10111]
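A multi-bit trie with stride k=2 can be sketched as below. Prefixes whose length is not a multiple of k are expanded into all k-bit chunks they cover, which is how P1 and P4 each turn into two entries. Addresses are padded with zeros to a multiple of k. The slot layout and the names are choices made for this illustration.

```python
K = 2  # stride: address bits consumed per step

class Node:
    def __init__(self):
        # One slot per K-bit chunk: [length of stored prefix, next hop, child].
        self.slots = {format(i, "0%db" % K): [-1, None, None]
                      for i in range(2 ** K)}

def insert(root, prefix, next_hop):
    node, rest = root, prefix
    while len(rest) > K:                     # walk down whole chunks
        slot = node.slots[rest[:K]]
        if slot[2] is None:
            slot[2] = Node()
        node, rest = slot[2], rest[K:]
    pad = K - len(rest)                      # bits missing in the last chunk
    for i in range(2 ** pad):                # prefix expansion
        chunk = rest + (format(i, "0%db" % pad) if pad else "")
        slot = node.slots[chunk]
        if len(prefix) > slot[0]:            # the longer original prefix wins
            slot[0], slot[1] = len(prefix), next_hop

def lookup(root, address):
    address += "0" * (-len(address) % K)     # pad to a multiple of K
    node, best = root, None
    for i in range(0, len(address), K):      # about W/K steps
        _, hop, child = node.slots[address[i:i + K]]
        if hop is not None:
            best = hop
        if child is None:
            break
        node = child
    return best

root = Node()
for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
    insert(root, prefix, hop)
```

The lookup of 10111 now takes two steps instead of four, at the price of extra slots per node.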
Hardware Lookups
• Content-Addressable Memory (CAM)
• Information is not fetched from a fixed physical location as in a RAM chip, but found by its content (a search key)
• If the destination address is used as the key, O(1) lookup time is achieved!
• Ternary CAM is commercially available
• Power consumption proportional to the routing table size, 6-8 W
• 0.5 MB at 66 MHz costs ~$100
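A ternary CAM stores a value and a mask per entry, so "don't care" bits can express prefixes. The sketch below emulates this in software for the 5-bit example prefixes used earlier. The loop is only a stand-in for the parallel comparison done in hardware, and ordering the entries longest prefix first is an assumption that makes the first hit the longest match.

```python
def to_entry(prefix, width=5):
    """Turn a prefix such as '10*' into a (value, mask) pair of integers."""
    bits = prefix.rstrip("*")
    pad = width - len(bits)
    value = int(bits, 2) << pad if bits else 0
    mask = ((1 << len(bits)) - 1) << pad     # 1 = compare this bit, 0 = don't care
    return value, mask

# Entries ordered longest prefix first.
PREFIXES = [("10101", "H4"), ("1010*", "H3"), ("111*", "H1"), ("10*", "H2")]
TCAM = [to_entry(p) + (hop,) for p, hop in PREFIXES]

def tcam_lookup(address):
    key = int(address, 2)
    for value, mask, hop in TCAM:            # hardware checks all rows at once
        if (key & mask) == value:
            return hop
    return None
```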
Lookup Comparison
Algorithm                       Lookup    Storage    Update
Binary Trie                     W         NW         W
Patricia                        W^2       N          W
Multiary Trie                   W/k       N*2^k      -
Binary search on trie levels    log W     N log W    -
T-CAM                           1         N          W
Providing Value-Added Services
• Differentiated services
– Regard traffic from AS#33 as “platinum-grade”
• Access Control Lists
– Deny UDP host 194.72.72.33
• Committed Access Rate
– Rate limit WWW traffic from sub-interface#739 to 10Mbps
• Policy-based Routing
– Route all voice traffic through the ATM network
• Peering Arrangements
– Restrict the total amount of traffic of precedence 7 from MAC address N to 20 Mbps between 10 am and 5pm
• Accounting and Billing
– Generate hourly reports of traffic from MAC address M
Flow Classification
[Figure: the forwarding engine passes the header of the incoming packet to flow classification, which consults the policy database (predicate → action) and returns a flow index]
A Packet Classifier
          Field 1              Field 2               …   Field k   Action
Rule 1    152.163.190.69/21    152.163.80.11/32      …   UDP       A1
Rule 2    152.168.3.0/24       152.163.200.157/16    …   TCP       A2
…         …                    …                     …   …         …
Rule N    152.168.3.0/16       152.163.80.11/32      …   Any       An
Given a classifier, find the action associated with the highest priority rule (here, the lowest numbered rule) matching an incoming packet.
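The simplest way to evaluate such a classifier is to test the rules one by one in priority order. The sketch below uses only the three fields visible in the table and treats Field 1 as source prefix, Field 2 as destination prefix and Field k as protocol; that interpretation, and the omission of the elided fields, are assumptions.

```python
import ipaddress

def rule(src, dst, proto, action):
    # strict=False accepts prefixes with host bits set, as written on the slide.
    return (ipaddress.ip_network(src, strict=False),
            ipaddress.ip_network(dst, strict=False), proto, action)

RULES = [  # listed in priority order, highest first
    rule("152.163.190.69/21", "152.163.80.11/32", "udp", "A1"),
    rule("152.168.3.0/24", "152.163.200.157/16", "tcp", "A2"),
    rule("152.168.3.0/16", "152.163.80.11/32", "any", "An"),
]

def classify(src, dst, proto):
    """Return the action of the first (highest priority) matching rule."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for src_net, dst_net, rule_proto, action in RULES:
        if s in src_net and d in dst_net and rule_proto in ("any", proto):
            return action
    return None
```

A TCP packet from 152.168.3.7 to 152.163.80.11 matches both Rule 2 and Rule N; the lower-numbered rule wins, so the action is A2.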
Geometric Interpretation in 2D
[Figure: a packet header with Field #1 and Field #2 followed by data; rules R1–R7 drawn as regions in the plane spanned by Field #1 and Field #2, e.g. (144.24/16, 64/24) and (128.16.46.23, *)]
Proposed Schemes
• Sequential Evaluation
– Pros: Small storage, scales well with number of fields
– Cons: Slow classification rates
• Ternary CAMs
– Pros: Single cycle classification
– Cons: Cost, density, power consumption
• Grid of Tries
– Pros: Small storage requirements and fast lookup rates for two fields. Suitable for big classifiers
– Cons: Not easily extendible to more than two fields.
Three Stages of Packet Processing
[Figure: (1) forwarding decision against the forwarding table at each input, (2) interconnect, (3) output scheduling]
Interconnects: Two basic techniques
• Input Queuing: usually a non-blocking switch fabric (e.g. crossbar)
• Output Queuing: usually a fast bus
Interconnects: Output Queuing
• Individual Output Queues: memory b/w = R*(N+1)
• Centralized Shared Memory: memory b/w = 2RN
[Figure: both arrangements drawn with ports 1, 2, …, N]
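The two memory bandwidth formulas follow from counting accesses per packet time: an individual output queue may receive a packet from every input and sends one, while a shared memory serves all writes and all reads. A small sketch with illustrative function names, where R is the line rate and N the number of ports:

```python
def individual_queue_bw(rate, n_ports):
    """Per output queue: up to N writes plus one read per packet time."""
    return rate * (n_ports + 1)

def shared_memory_bw(rate, n_ports):
    """One memory for all ports: N writes plus N reads per packet time."""
    return 2 * rate * n_ports
```

For example, with 16 ports at 10 Gb/s each output queue needs 170 Gb/s of memory bandwidth, and a centralized shared memory needs 320 Gb/s.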
“Ideal” Output Queuing
[Figure: packets labelled 1 and 2 arriving on the inputs and waiting in the output queues]
How fast can we make centralized shared memory?
[Figure: shared memory built from 5ns SRAM, connected to ports 1, 2, …, N over a 200 byte bus]
• 5ns per memory operation
• Two memory operations per packet
• Therefore, up to 160Gb/s
• In practice, closer to 80Gb/s
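The 160 Gb/s figure follows directly from the bus width and the access time: each access moves 200 bytes, and every packet needs one write and one read. A short worked calculation, with an illustrative function name:

```python
def shared_memory_gbps(bus_bytes, access_ns, ops_per_packet=2):
    """Peak throughput of a shared memory; bits per nanosecond equals Gb/s."""
    bits_per_access = bus_bytes * 8
    return bits_per_access / (access_ns * ops_per_packet)

peak = shared_memory_gbps(200, 5)   # 1600 bits every 10 ns
```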
Input Queuing with Crossbar
Memory b/w = 2R
[Figure: input queues (data in) feeding a crossbar (data out) that is configured by a scheduler]
Head of Line Blocking
[Figure: delay versus load; with head-of-line blocking the delay curve rises steeply at 58.6% load, well short of 100%]
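The 58.6% limit can be checked with a small simulation. The model is the usual textbook one and is an assumption here: every input always has packets waiting in a single FIFO, destinations are uniformly random, and each output serves one randomly chosen contender per time slot. For many ports the throughput is known to approach 2 - sqrt(2), about 0.586.

```python
import random

def hol_throughput(n_ports, slots, seed=1):
    """Fraction of output capacity used by a saturated FIFO input-queued switch."""
    rng = random.Random(seed)
    # Destination of the head-of-line packet at every input.
    hol = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(hol):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            winner = rng.choice(inputs)            # one packet per output per slot
            hol[winner] = rng.randrange(n_ports)   # next packet moves to the head
            delivered += 1
        # Losers keep their head-of-line packet and block the packets behind it.
    return delivered / (slots * n_ports)
```

With 32 ports the result should be close to 0.59; with only 2 ports it is close to 0.75, so the loss grows with the switch size.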
Virtual Output Queues
[Figure: delay versus load with virtual output queues; the load axis extends to 100%]
Input Queueing
Memory b/w = 2R
Scheduler: can be quite complex!
Three Stages of Packet Processing
[Figure: (1) forwarding decision against the forwarding table at each input, (2) interconnect, (3) output scheduling]
Summary
• Extreme requirements for modern routing equipment
• Scalability: 110 000+ routing entries
• Performance: 30 000 000+ lookups per second
• Modern services demand far more processing power