Routing in Routers

Hans Jørgen Furre Nygårdshaug (hansjny)
Christian Fredrik Fossum Resell (chrifres)
Hans Petter Taugbøl Kragset (hpkragse)
Outline
● Router architecture
  ○ Network Processor
● Lookup algorithms
  ○ Address-cache based
  ○ Binary tries
    ■ Path compressed
    ■ Multibit
    ■ Level compressed
  ○ Hardware based
● Packet classification
● Switch fabric routing
Large scale router architecture
● A router consists of several logical components:
  ○ Packet processing
  ○ Address lookup
  ○ Traffic management
  ○ Physical layer
The Network Processor
● NPs are specialised processors
● High rate of change in the network world
  ○ Flexible processors wanted
    ■ Performance vs adaptability tradeoff
● Line rate is faster than processing rate (2002)
  ○ To process all packets, parallelisation is required
  ○ Traffic can be split between processors based on flow (destination/source)
    ■ Traffic management etc. is only applicable per flow
  ○ Parallelisation introduces IPC issues
● NPs are broadly classified as either configurable or programmable
Configurable NPs
● Co-processors with specialised abilities
  ○ Lookup, classification, forwarding
  ○ Connected by configurable connections
● Pipeline packets through different steps in parallel
  ○ A manager controls and schedules transitions
● Optimised with a narrow instruction set
● Designed for performance
  ○ Not adaptive
Programmable NPs
● Multiple RISC or RISC-cluster task units
● Controller provides the instruction set to the RISCs
● RISCs handle packets as instructed
  ○ May distribute tasks to co-processors
● Flexible and adaptive; instruction sets can change
  ○ This means the RISCs can't be optimised
● More time consuming
  ○ May not meet line speed
Lookup algorithms
● Why is prefix matching worse than exact lookup?
● Cache based
● Trie-based lookup schemes
Why is prefix matching so hard?
● Incoming packet: dst addr 10.138.5.8
● The router does not know the length of the prefix
● Hence: searching in the space of all prefix lengths
● AND in the space of all prefixes of a given length
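As a toy illustration of that search (a sketch, not from the slides; the table contents are invented), an exact lookup has to be repeated once per candidate prefix length, longest first:

```python
def lpm(table, addr_bits):
    # The router does not know the matching prefix's length in advance,
    # so it tries every possible length, longest first.
    for length in range(len(addr_bits), -1, -1):
        prefix = addr_bits[:length]
        if prefix in table:          # exact-match lookup at this length
            return table[prefix]
    return None                      # no default route configured

# Hypothetical forwarding table: bit-string prefixes -> next-hop names.
table = {"0": "P1", "10": "P2", "1011": "P4"}
print(lpm(table, "10111010"))  # -> P4 (longest matching prefix is 1011)
```

With W-bit addresses this is W+1 exact lookups in the worst case, which is why the trie and hardware schemes below exist.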
Cache based
● Cache recently used addresses
● Depends on locality of the packet stream
● With more multiplexing into high-speed links, locality decreases
● Not sufficient anymore
Trie-based schemes
● Binary trie: prefixes are stored along paths of 0/1 edges
  ○ Node structure: next-hop pointer, left pointer (0), right pointer (1)
  ○ The example trie stores the prefixes P1–P5
● Lookup example, address 10111010:
  ○ Walk the trie bit by bit, remembering the last next-hop pointer seen
  ○ Current next-hop pointer: none → P2 → P4
  ○ When no matching child remains, return the last next-hop pointer: P4
● Lookup time complexity: O(W)
● Storage requirement: O(NW)
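The binary-trie walk can be sketched in a few lines (a minimal illustration using the example prefixes P1–P5; the node layout mirrors the slide's next-hop/left/right structure):

```python
class Node:
    def __init__(self):
        self.child = {"0": None, "1": None}  # left (0) and right (1) pointers
        self.next_hop = None                 # optional next-hop pointer

def insert(root, prefix, next_hop):
    node = root
    for bit in prefix:
        if node.child[bit] is None:
            node.child[bit] = Node()
        node = node.child[bit]
    node.next_hop = next_hop

def lookup(root, addr_bits):
    node, best = root, None
    for bit in addr_bits:
        node = node.child[bit]
        if node is None:                 # fell off the trie
            break
        if node.next_hop is not None:
            best = node.next_hop         # remember the deepest next hop seen
    return best                          # last next-hop pointer on the path

root = Node()
for prefix, hop in [("0", "P1"), ("10", "P2"), ("11", "P3"),
                    ("1011", "P4"), ("10110", "P5")]:
    insert(root, prefix, hop)
print(lookup(root, "10111010"))  # passes P2, then P4, then stops -> P4
```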
Path-compressed tries
● Removes single-descendant internal nodes (nodes without a next-hop pointer)
● All leaf nodes contain a prefix and a next-hop pointer
● Node structure: bit string, next-hop pointer, bit position, left pointer, right pointer
  ○ Example nodes from the figure: (,,1), (1,,2), (10, P2, 4), (1011, P4, 5)
● Lookup example, address 1001:
  ○ Branch on the bit position stored in each node
  ○ At the end, compare the address against the stored bit string
  ○ Prefix not matching: return P2
Multibit tries
● Each node contains 2^k pointers
● If a prefix length is not a multiple of k, expand the prefix

Example: 2-bit tries

Ptr | Prefix | Expanded
P1  | 0      | 00
P1  | 0      | 01
P2  | 10     | 10
P3  | 11     | 11
P4  | 1011   | 1011
P5  | 10110  | 101100
P5  | 10110  | 101101

● Lookup example, address 10011: one step instead of five
● Lookup time complexity: O(W/k)
● Storage requirement: O(2^(k-1) NW)
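A minimal stride-2 multibit lookup, built directly from the expanded table above (a sketch; the list-based node layout and the padding of addresses to a multiple of the stride are illustrative choices):

```python
STRIDE = 2

def new_node():
    # 2**STRIDE slots per node, each holding [next_hop, child].
    return {f"{i:02b}": [None, None] for i in range(2 ** STRIDE)}

def insert_expanded(root, prefix, next_hop):
    # Prefixes are already expanded to a multiple of the stride.
    node = root
    chunks = [prefix[i:i + STRIDE] for i in range(0, len(prefix), STRIDE)]
    for chunk in chunks[:-1]:
        if node[chunk][1] is None:
            node[chunk][1] = new_node()
        node = node[chunk][1]
    node[chunks[-1]][0] = next_hop

def lookup(root, addr_bits):
    # addr_bits padded to a multiple of STRIDE; consume STRIDE bits per step.
    node, best = root, None
    for i in range(0, len(addr_bits), STRIDE):
        hop, child = node[addr_bits[i:i + STRIDE]]
        if hop is not None:
            best = hop
        if child is None:
            break
        node = child
    return best

root = new_node()
# Expanded entries straight from the table above.
for prefix, hop in [("00", "P1"), ("01", "P1"), ("10", "P2"), ("11", "P3"),
                    ("1011", "P4"), ("101100", "P5"), ("101101", "P5")]:
    insert_expanded(root, prefix, hop)
print(lookup(root, "10111010"))  # two bits per step, still ends at P4
```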
Level-compressed tries
● Combines path compression and multibit
● Replaces the i'th complete level with a single node with 2^i descendants
● Performed recursively on each subtrie
● Updates? Rebuild the structure

ref [6]: Fast IP Routing with LC-Tries, Stefan Nilsson and Gunnar Karlsson, August 1998
Hardware based schemes

DIR-24-8-BASIC
● Fantastic name
● Pipeline in hardware
● Approx. 100 M packets/s with "current" SDRAM technology
● Two tables used for lookup
  ○ TBL24: prefixes up to and including 24 bits long
  ○ TBLlong: prefixes longer than 24 bits
● The 24-8 split is reasonable
  ○ Size is no problem
  ○ Over 99 percent of prefixes are 24 bits long or less
● Simple and efficient
● Requires large amounts of memory
● Scales badly
TBL24 entry format

If the longest route with this 24-bit prefix is at most 24 bits long:
| 0 (1 bit) | Next hop (15 bits) |

If the longest route with this 24-bit prefix is longer than 24 bits:
| 1 (1 bit) | Index into 2nd table (15 bits) |
DIR-24-8-BASIC

[Figure: bits 0–23 of the destination address index TBL24 (2^24 entries); for
long prefixes, the stored index combined with the last 8 bits (24–31) selects
the TBLlong entry holding the next hop.]
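A sketch of the two-table lookup, assuming the layout described in ref [3]: a flagged TBL24 entry holds a base index into TBLlong, and each base index addresses a 256-entry block selected by the low 8 address bits. The dicts stand in for the real SRAM tables, and the routes are made up:

```python
TBL24 = {}    # dict stands in for the 2**24-entry table
TBLLONG = {}  # second table, 256-entry blocks for long prefixes

def lookup(addr):
    top24, low8 = addr >> 8, addr & 0xFF
    entry = TBL24.get(top24)
    if entry is None:
        return None
    flag, value = entry
    if flag == 0:                        # value is the next hop itself
        return value
    return TBLLONG[value * 256 + low8]   # value selects a 256-entry block

# Toy routes (invented): 10.0.0.0/8 -> "A", 10.1.2.200/32 -> "B".
for top in range(10 << 16, (10 << 16) + (1 << 16)):
    TBL24[top] = (0, "A")                # every 24-bit prefix under 10/8
base = 0                                 # first TBLlong block
TBL24[(10 << 16) | (1 << 8) | 2] = (1, base)
for low in range(256):
    TBLLONG[base * 256 + low] = "B" if low == 200 else "A"

print(lookup((10 << 24) | (1 << 16) | (2 << 8) | 200))  # -> B
print(lookup((10 << 24) | (5 << 16)))                    # -> A
```

Every lookup is at most two memory accesses, which is what makes the hardware pipeline fast; the cost is the huge, mostly redundant TBL24.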
SRAM-based Pipeline Scheme
● Needs only SRAMs
  ○ Avoids the memory refresh overhead of DRAM
● Addresses are split into segment and offset pairs
● Segmentation table: 64 k entries, each holding either
  ○ a next hop, or
  ○ a pointer to a second-level table (next-hop array, NHA)
● Offset length k: 0 < k <= 16
  ○ Each NHA has a k determined by the longest prefix it holds
  ○ k is stored in the last 4 bits of the entry
● Difficult to scale to 128-bit IPv6 addresses
Ternary CAM
● Specialised memory
● (Binary) CAMs can only perform exact matching
● TCAMs store (value, mask) pairs
● A priority encoder is used to select one match
● Forwarding table stored in decreasing order of prefix length
● Incremental update problem
● High price, large power consumption
TCAM

Line number | Address (binary) | Output port
1           | 101XX            | A
2           | 0110X            | B
3           | 011XX            | C
4           | 10011            | D
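The table above can be modelled in software (a sketch: real TCAM hardware compares all lines in parallel; here the priority encoder becomes a first-match loop, which is why entries are kept in decreasing order of prefix length):

```python
def tcam_entry(pattern):
    # "101XX" -> (value, mask); an X bit gets mask 0 ("don't care").
    value = int(pattern.replace("X", "0"), 2)
    mask = int("".join("0" if c == "X" else "1" for c in pattern), 2)
    return value, mask

# The four lines from the slide's table, in priority order.
TCAM = [(tcam_entry(p), port) for p, port in
        [("101XX", "A"), ("0110X", "B"), ("011XX", "C"), ("10011", "D")]]

def lookup(addr):
    for (value, mask), port in TCAM:   # compared in parallel in hardware
        if addr & mask == value:
            return port                # priority encoder: first match wins
    return None

print(lookup(0b10111))  # matches line 1 (101XX) -> A
print(lookup(0b01101))  # matches lines 2 and 3; line 2 wins -> B
```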
Packet classification
● ISPs want to provide different QoS
  ○ Different customers pay for higher QoS
    ■ Bitrate, ping, transmission limit, ...
● Routers must distinguish customers
  ○ Rules must be implemented in the routers
  ○ For a flow, all applicable rules form a classifier
● For every packet, the router must apply the classifier to see if the packet
  is allowed through
  ○ Needless to say, this isn't trivial
● Classification algorithms
  ○ Speed, scalability, adaptability and flexibility are important
  ○ Not discussed further here!
Arbitration Algorithms for Single-Stage Switches

iSLIP
● Algorithm for switch fabric routing
● Input buffered
● Adaptive
  ○ Self balancing
  ○ Fair
● Non-starving

ref [2]: The iSLIP Scheduling Algorithm for Input-Queued Switches, Nick McKeown, 1999
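One iSLIP iteration can be sketched as follows (my reading of ref [2]; the 4x4 size and the fully loaded VOQ state are illustrative assumptions):

```python
# voq[i][j] is True when input i has a cell queued for output j.
# Grant and accept pointers are round-robin and, crucially, advance
# only when a grant is actually accepted.

N = 4
grant_ptr = [0] * N    # one pointer per output arbiter
accept_ptr = [0] * N   # one pointer per input arbiter

def islip_iteration(voq):
    # Step 1 - Request: each input requests every output it has a cell for.
    requests = [[voq[i][j] for j in range(N)] for i in range(N)]

    # Step 2 - Grant: each output grants the requesting input at or
    # after its round-robin pointer.
    grants = {}
    for j in range(N):
        for k in range(N):
            i = (grant_ptr[j] + k) % N
            if requests[i][j]:
                grants[j] = i
                break

    # Step 3 - Accept: each input accepts the granting output at or after
    # its pointer; only then do both pointers move past the match.
    matches = []
    for i in range(N):
        granting = {j for j, gi in grants.items() if gi == i}
        for k in range(N):
            j = (accept_ptr[i] + k) % N
            if j in granting:
                matches.append((i, j))
                accept_ptr[i] = (j + 1) % N
                grant_ptr[j] = (i + 1) % N   # advance only on accept
                break
    return matches

voq = [[True] * N for _ in range(N)]
print(islip_iteration(voq))  # synchronised pointers: only [(0, 0)] at first
```

Running further iterations shows the pointers desynchronising, which is exactly the effect the following slides illustrate.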
[Figure: a 4x4 input-queued switch. Each input keeps Virtual Output Queues
(VOQs), one per output; input and output arbiters connect the inputs to the
outputs through the switch fabric.]
Step 1 - Request
4
1
1
4
2
4
1
1
4
2
3
2
3
3
4
2
3
3
3
2
4
2
3
1
1
1
4
2
4
3
3
2
1
1
4
2
4
1
1
4
2
4
1
1
4
2
3
2
3
3
4
2
3
3
3
2
4
2
3
1
1
1
4
4
2
3
3
2
1
1
4
2
Step 2 - Grant
4
1
1
4
2
4
1
1
4
2
3
2
3
3
4
2
3
3
3
2
4
2
3
1
1
1
4
2
4
3
3
2
1
1
4
2
4
1
1
4
2
4
1
1
4
2
3
2
3
3
4
2
3
3
3
2
4
2
3
1
1
1
4
4
2
3
3
2
1
1
4
2
Step 3 - Accept
4
1
1
4
2
4
1
1
4
2
3
2
3
3
4
2
3
3
3
2
4
2
3
1
1
1
4
2
4
3
3
2
1
1
4
2
4
1
1
4
2
4
1
1
4
2
3
2
3
3
4
2
3
3
3
2
4
2
3
1
1
1
4
4
2
3
3
2
1
1
4
Changes
● In standard RRM, all output arbiters would advance because they issued a
  grant.
● iSLIP advances an arbiter only when its grant is accepted; this is iSLIP's
  advantage.
[Figure: successive Request, Grant, Accept rounds on the 4x4 example.]

● Now the arbiters are perfectly unsynchronised!
● 100% throughput
● This goes on...
Traffic pattern
● Note that the previous examples were with uniform traffic.
● With non-uniform traffic, iSLIP still provides fair queueing.
  ○ Adapts fast and maintains RR fairness.
● During bursty traffic the delay will be proportional to the burst length and
  N (the number of switch ports).
● Improvements and modifications exist:
  ○ iterations
  ○ priority
  ○ weight

ref [2]: The iSLIP Scheduling Algorithm for Input-Queued Switches, Nick McKeown, 1999
Performance
● Performs better than naive algorithms

[Figure: further Request, Grant, Accept rounds on the 4x4 example.]

● Inputs 1 and 2 go in turns: round robin.
● It is important to move the arbiter past the last accept, in case a new
  packet arrives.
iDRRM
● Starts at the inputs, not at the outputs like iSLIP
● Very similar otherwise

[Figure: Step 1 - Request, Step 2 - Grant, Step 3 - Accept on the 4x4
example.]
iDRRM
● Comparable throughput to iSLIP
● Less arbitration time and an easier implementation
  ○ Less information exchange between input and output arbiters
    ■ Saves the time for the initial requests

ref [1]: Next Generation Routers, H. Jonathan Chao, 2002
Comparison (worst case)

iSLIP
● Request: N² messages
● Grant: N
● Accept: N

iDRRM
● Request: N
● Grant: N
● Accept: N
EDRRM
● Exhaustive DRRM
● The arbiters DON'T change when a request is accepted
  ○ This can lead to starvation; a timer is needed to override
● Higher throughput in non-uniform traffic
● In uniform and random traffic the advantage is lost
Conclusion
● Routers are hard to make good in all aspects
● Size of routing tables grows fast
● Network traffic grows fast
  ○ Traffic patterns are also changing (video streaming etc.)

Questions?
References
1. Next Generation Routers, H. Jonathan Chao, 2002
2. The iSLIP Scheduling Algorithm for Input-Queued Switches, Nick McKeown, 1999:
   https://www.cse.iitb.ac.in/~mythili/teaching/cs641_autumn2015/references/16-iSlip-crossbar-sched.pdf
3. http://tiny-tera.stanford.edu/~nickm/papers/Infocom98_lookup.pdf
4. https://www.pagiamtzis.com/cam/camintro/
5. N. Huang, S. Zhao, J. Pan, and C. Su, "A fast IP routing lookup scheme for gigabit switching routers," Proc. IEEE INFOCOM '99, Mar. 1999.
6. Fast IP Routing with LC-Tries, Stefan Nilsson and Gunnar Karlsson, August 1998:
   http://www.drdobbs.com/cpp/fast-ip-routing-with-lc-tries/184410638