LOOKUPS AND CLASSIFICATION

advertisement
Address Lookup and Classification
EE384Y
May 23, 2006
High Performance
Switching and Routing
Telecom Center Workshop: Sept 4, 1997.
Pankaj Gupta
Principal Architect and Member of Technical Staff,
Netlogic Microsystems
pankaj@netlogicmicro.com
http://klamath.stanford.edu/~pankaj
1
Generic Router Architecture
(Review from EE384x)
Header Processing
Data
Hdr
Lookup
Update
IP Address Header
IP Address
~1M prefixes
Off-chip DRAM
Queue
Packet
Data
Hdr
Next Hop
Address
Table
Buffer
Memory
~1M packets
Off-chip DRAM
2
Lookups Must be Fast
Year
Aggregate
Line-rate
1997
1999
2001
2003
2006
622 Mb/s
2.5 Gb/s
10 Gb/s
40 Gb/s
80 Gb/s
Arriving rate of 40B
POS packets (Million
pkts/sec)
1.56
6.25
25
100
200
1. Lookup mechanism must be simple and easy to implement
2. Memory access time is the bottleneck
200Mpps × 2 lookups/pkt = 400 Mlookups/sec → 2.5ns per lookup3
Memory Technology (2006)
Technology Max
$/chip
single
($/MByte)
chip
density
Access
speed
Watts/
chip
Networking
DRAM
64 MB
$30-$50
($0.50-$0.75)
40-80ns
0.5-2W
SRAM
8 MB
$50-$60
($5-$8)
3-4ns
2-3W
TCAM
2 MB
$200-$250
($100-$125)
4-8ns
15-30W
Note: Price, speed and power are manufacturer and market dependent.
4
Lookup Mechanism is
Protocol Dependent
Networking Lookup
Techniques we will
Protocol
Mechanism study
MPLS, ATM,
Ethernet
Exact match
search
–Direct lookup
–Associative lookup
–Hashing
–Binary/Multi-way Search
Trie/Tree
IPv4, IPv6
Longest-prefix -Radix trie and variants
match search
-Compressed trie
-Binary search on prefix
intervals
5
Outline
I.
•
•
•
•
Routing Lookups
–
–
–
–
–
–
–
Overview
Exact matching
Direct lookup
Associative lookup
Hashing
Trees and tries
Longest prefix matching
Why LPM?
Tries and compressed tries
Binary search on prefix intervals
References
II. Packet Classification
6
Exact Matches in ATM/MPLS
Direct Memory Lookup
VCI/MPLS-label
Memory
(Outgoing Port, new VCI/label)
• VCI/Label space is 24 bits
- Maximum 16M addresses. With 64b data,
this is 1Gb of memory.
• VCI/Label space is private to one link
• Therefore, table size can be “negotiated”
• Alternately, use a level of indirection
7
Exact Matches in Ethernet
Switches
• Layer-2 addresses are usually 48-bits long,
• The address is global, not just local to the
link,
• The range/size of the address is not
“negotiable” (like it is with ATM/MPLS)
• 248 > 1012, therefore cannot hold all
addresses in table and use direct lookup.
8
Exact Matches in Ethernet
Switches (Associative Lookup)
• Associative memory (aka Content Addressable
Memory, CAM) compares all entries in parallel
against incoming data.
Associative
Memory
(“CAM”)
Network address
48bits
“Normal”
Memory
Location
Port
Match
9
Exact Matches
Hashing
Pointer
List/Bucket
Data
16, say
Memory
Address
48
Hashing
Function
Data
Network
Address
Address
Memory
List of network addresses in
this bucket
• Use a pseudo-random hash function (relatively insensitive to
actual function)
• Bucket linearly searched (or could be binary search, etc.)
• Leads to unpredictable number of memory references
10
Exact Matches Using Hashing
Number of memory references
Expected number of memory references :
1
(Expected length of list | list is not empty)
2




1


 1 
2  1  (1  1 ) M 


N 

ER 
Where:
ER = Expected number of memory references
M = Number of memory addresses in table
N = Number of linked lists
 = M N
11
Exact Matches in Ethernet
Switches
16, say
Data
48
Hashing
Function
Address
Network
Address
Perfect Hashing
Memory
Port
There always exists a perfect hash function.
Goal: With a perfect hash function, memory lookup always takes
O(1) memory references.
Problem:
- Finding perfect hash functions (particularly a minimal
perfect hash) is complex.
- Updates make such a hash function yet more complex
- Advanced techniques: multiple hash functions, bloom
12
filters…
Exact Matches in Ethernet
Switches
Hashing
• Advantages:
– Simple to implement
– Expected lookup time is small
– Updates are fast (except with perfect hash functions)
• Disadvantages
– Relatively inefficient use of memory
– Non-deterministic lookup time (in rare cases)
•  Attractive for software-based switches.
However, hardware platforms are moving to other
techniques (but they can do well with a more sophisticated
form of hashing)
13
Exact Matches in Ethernet
Switches
Trees and Tries
Binary Search Tree
<
>
>
<
>
log2N
<
Binary Search Trie
N entries
Lookup time dependent on table size,
but independent of address length,
storage is O(N)
0
0
1
010
1
0
1
111
Lookup time bounded and independent
of table size, storage is O(NW)
14
Exact Matches in Ethernet
Switches
Multiway tries
16-ary Search Trie
Ptr=0 means
no children
0000, 0
0000, ptr
1111, ptr
000011110000
1111, ptr
0000, 0
1111, ptr
111111111111
Q: Why can’t we just make it a 248-ary trie?
15
Exact Matches in Ethernet
Switches
Multiway tries
As degree increases, more and more pointers are “0”
Degree of
Tree
# Mem
References
2
4
8
16
64
256
48
24
16
12
8
6
# Nodes Total Memory Fraction
(Mbytes)
Wasted (%)
(x106)
1.09
0.53
0.35
0.25
0.17
0.12
4.3
4.3
5.6
8.3
21
64
49
73
86
93
98
99.5
Table produced from 215 randomly generated 48-bit addresses
16
Exact Matches in Ethernet
Switches
Trees and Tries
• Advantages:
– Fixed lookup time
– Simple to implement and update
• Disadvantages
– Inefficient use of memory and/or
requires large number of memory
references
More sophisticated algorithms compress ‘sparse’ nodes.
17
Outline
I.
•
•
•
•
Routing Lookups
–
–
–
–
–
–
–
Overview
Exact matching
Direct lookup
Associative lookup
Hashing
Trees and tries
Longest prefix matching
Why LPM?
Tries and compressed tries
Binary search on prefix intervals
References
II. Packet Classification
18
Longest Prefix Matching:
IPv4 Addresses
• 32-bit addresses
• Dotted quad notation: e.g. 12.33.32.1
• Can be represented as integers on
the IP number line [0, 232-1]: a.b.c.d
denotes the integer:
(a*224+b*216+c*28+d)
0.0.0.0
IP Number Line
255.255.255.255
19
Class-based Addressing
A
B
C
128.0.0.0
0.0.0.0
D
E
192.0.0.0
Class
Range
MS bits netid
hostid
A
0.0.0.0 – 128.0.0.0
0
bits 1-7
bits 8-31
B
128.0.0.0 191.255.255.255
10
bits 2-15
bits 16-31
C
192.0.0.0 223.255.255.255
110
bits 3-23
bits 24-31
1110
-
-
11110
-
-
D
(multicast)
E (reserved)
224.0.0.0 239.255.255.255
240.0.0.0 255.255.255.255
20
Lookups with Class-based
Addresses
netid
port#
23
Port 1
186.21
Port 2
Class A
192.33.32.1
Class B
Class C
Exact match
192.33.32 Port 3
21
Problems with Class-based
Addressing
• Fixed netid-hostid boundaries too
inflexible
– Caused rapid depletion of address space
• Exponential growth in size of routing
tables
22
Number of BGP routes advertised
Early Exponential Growth in
Routing Table Sizes
23
Classless Addressing (and
CIDR)
• Eliminated class boundaries
• Introduced the notion of a variable
length prefix between 0 and 32 bits
long
• Prefixes represented by P/l: e.g.,
122/8, 212.128/13, 34.43.32/22,
10.32.32.2/32 etc.
• An l-bit prefix represents an
aggregation of 232-l IP addresses
24
CIDR:Hierarchical Route
Aggregation
Backbone routing table
Router
192.2.0/22, R2
R1
R2
Backbone
R3
R2
ISP, P
192.2.0/22
Site, S
Site, T
192.2.1/24
192.2.2/24
R4
ISP, Q
200.11.0/22
192.2.1/24
192.2.2/24
192.2.0/22
200.11.0/22
IP Number Line
25
Number of active BGP prefixes
Post-CIDR Routing Table
sizes
Optional Exercise: What would this graph
look like without CIDR? (Pick any one random
AS, and plot the two curves side-by-side)
Source: http://bgp.potaroo.net
26
Routing Lookups with CIDR
Optional Exercise: Find the ‘nesting
distribution’ for routes in your randomly-picked AS
192.2.2/24
192.2.2/24, R3
192.2.0/22, R2
200.11.0/22, R4
192.2.0/22
192.2.0.1
192.2.2.100
200.11.0/22
200.11.0.33
LPM: Find the most specific route, or the longest
matching prefix among all the prefixes matching the
destination address of an incoming packet
27
Longest Prefix Match is
Harder than Exact Match
• The destination address of an
arriving packet does not carry with it
the information to determine the
length of the longest matching prefix
• Hence, one needs to search among
the space of all prefix lengths; as well
as the space of all prefixes of a given
length
28
LPM in IPv4
Use 32 exact match algorithms for
LPM!
Exact match
against prefixes
of length 1
Network Address
Exact match
against prefixes
of length 2
Priority
Encode
and pick
Port
Exact match
against prefixes
of length 32
29
Metrics for Lookup Algorithms
• Speed (= number of memory accesses)
• Storage requirements (= amount of
memory)
• Low update time (support >10K updates/s)
• Scalability
– With length of prefix: IPv4 unicast (32b),
Ethernet (48b), IPv4 multicast (64b), IPv6
unicast (128b)
– With size of routing table: (sweetspot for
today’s designs = 1 million)
• Flexibility in implementation
• Low preprocessing time
30
Radix Trie (Recap)
Trie node
A
1
P1
111*
H1
P2
10*
H2
P3
1010*
H3
P4
10101
H4
Lookup 10111
C
P2
G
P3
next-hop-ptr (if prefix)
right-ptr
left-ptr
B
1
0
1
D
1
E
0
1
Add P5=1110*
0
P4
H
P5
P1
F
I
31
Radix Trie
• W-bit prefixes: O(W) lookup, O(NW)
storage and O(W) update complexity
Advantages
Disadvantages
Simplicity
Worst case lookup slow
Wastage of storage space in
chains
Extensible to wider fields
32
Leaf-pushed Binary Trie
Trie node
A
1
P1
111*
H1
P2
10*
H2
P3
1010*
H3
P4
10101
H4
C
P2
G
B
1
0
1
0
left-ptr or
next-hop
right-ptr or
next-hop
D
P1
E
P2
P3 P4
33
PATRICIA
B
D
2
0
3
0
1
A
Patricia tree internal node
1
P1
E
C
bit-position
right-ptr
left-ptr
5
P2
F
P1
111*
H1
P2
10*
H2
P3
1010*
H3
P4
10101
H4
0
P3
1
P4
G
Lookup 10111
Bitpos 12345
34
PATRICIA
• W-bit prefixes: O(W2) lookup, O(N)
storage and O(W) update complexity
Advantages
Disadvantages
Decreased storage
Worst case lookup slow
Backtracking makes
implementation complex
Extensible to wider fields
35
Path-compressed Tree
Lookup 10111
1, , 2
0
0
10,P2,4
1
P4
111*
H1
P2
10*
H2
P3
1010*
H3
P4
10101
H4
1
P1
C
D
1010,P3,5
P1
B
A
E
Path-compressed tree node structure
variable-length next-hop (if
prefix present)
bitstring
left-ptr
bit-position
right-ptr36
Path-compressed Tree
• W-bit prefixes: O(W) lookup, O(N)
storage and O(W) update complexity
Advantages
Disadvantages
Decreased storage
Worst case lookup slow
37
Multi-bit Tries
W
Binary trie
Depth = W
Degree = 2
Stride = 1 bit
Multi-ary trie
W/k
Depth = W/k
Degree = 2k
Stride = k bits
38
Prefix Expansion with Multi-bit
Tries
If stride = k bits, prefix lengths that
are not a multiple of k need to be
expanded
E.g., k = 2:
Prefix
Expanded prefixes
0*
00*, 01*
11*
11*
Maximum number of expanded prefixes
corresponding to one non-expanded prefix = 2k-1
39
Four-ary Trie (k=2)
A four-ary trie node
next-hop-ptr (if prefix)
ptr00 ptr01 ptr10 ptr11
A
10
11
B
P2
D
P3
10
G
P1
111*
H1
P2
10*
H2
P3
1010*
H3
P4
10101
H4
Lookup 10111
C
10
10
E
P11
11
11
P41
P42
F
P12
H
40
Prefix Expansion Increases
Storage Consumption
• Replication of next-hop ptr
• Greater number of unused (null)
pointers in a node
Time ~ W/k
Storage ~ NW/k * 2k-1
Optional Exercise: The increase in number of
null pointers in LPM is a worse problem than in
exact match. Why?
41
Generalization: Different
Strides at Each Trie Level
•
•
•
•
16-8-8 split
4-10-10-8 split
24-8 split
21-3-8 split
Optional Exercise: Why does this not work
well for IPv6?
42
Choice of Strides: Controlled
Prefix Expansion [Sri98]
Given a forwarding table and a desired
number of memory accesses in the worst
case (i.e., maximum tree depth, D)
A dynamic programming algorithm to compute
the optimal sequence of strides that minimizes
the storage requirements: runs in O(W2D) time
Advantages
Disadvantages
Optimal storage under
these constraints
Updates lead to suboptimality anyway
Hardware implementation
difficult
43
Binary Search on Prefix
Intervals [Lampson98]
P2
I1
0000
Prefix
Interval
P1
/0
0000…1111
P2
00/2
0000…0011
P3
1/1
1000…1111
P4
1101/4
1101…1101
P5
001/3
0010…0011
P5
0010
P3
P1
I2
I3
0100
P4
I4
0110
1000 10011010
I5
1100
I6
1110 1111
44
Alphabetic Tree
0111
>

0011

>
0001
1/2

>
I1
I1
0000
P5
0010
I3
0100
>
1/32
I5
P4
I4
0110
1/32
I6
P3
P1
I2
>
1100

1/16
I4
I2
P2

1/8
I3
1/4
1101
1000 10011010
I5
1100
I6
1110 451111
Another Alphabetic Tree
0001
I1
0011
1/2
0111
I2
1/4
I3
1/8
1100
I4
1/16
1/32
1101
I5
1/32
46
I6
Multiway Search on Intervals
•W-bit N prefixes: O(logN) lookup,
O(N) storage
Advantages
Disadvantages
Storage is linear
Can be ‘balanced’
Lookup time independent
of W
But, lookup time is dependent
on N
Incremental updates more
complex than tries
Each node is big in size:
requires higher memory
bandwidth
47
Routing Lookups: References
• [lulea98] A. Brodnik, S. Carlsson, M. Degermark, S. Pink.
“Small Forwarding Tables for Fast Routing Lookups”,
Sigcomm 1997, pp 3-14. [Example of techniques for decreasing
storage consumption]
• [gupta98] P. Gupta, S. Lin, N.McKeown. “Routing lookups in
hardware at memory access speeds”, Infocom 1998, pp 12411248, vol. 3. [Example of hardware-optimized trie with increased storage
consumption]
• P. Gupta, B. Prabhakar, S. Boyd. “Near-optimal routing
lookups with bounded worst case performance,” Proc.
Infocom, March 2000 [Example of deliberately skewing alphabetic
trees]
• P. Gupta, “Algorithms for routing lookups and packet
classification”, PhD Thesis, Ch 1 and 2, Dec 2000, available
at http://yuba.stanford.edu/ ~pankaj/phd.html [Background and
introduction to LPM]
48
Routing lookups : References
(contd)
• [lampson98] B. Lampson, V. Srinivasan, G. Varghese. “ IP
lookups using multiway and multicolumn search”, Infocom
1998, pp 1248-56, vol. 3. [Multi-way search]
• [PerfHash] Y. Lu, B. Prabhakar and F. Bonomi. “Perfect
hashing for network applications”, ISIT, July 2006
• [LC-trie] S. Nilsson, G. Karlsson. “Fast address lookup for
Internet routers”, IFIP Intl Conf on Broadband
Communications, Stuttgart, Germany, April 1-3, 1998.
• [sri98] V. Srinivasan, G.Varghese. “Fast IP lookups using
controlled prefix expansion”, Sigmetrics, June 1998.
• [wald98] M. Waldvogel, G. Varghese, J. Turner, B. Plattner.
“Scalable high speed IP routing lookups”, Sigcomm 1997, pp
25-36.
49
Download