All-Path Bridging Update
Jun Tanaka (Fujitsu Labs. Ltd.)
Guillermo Ibanez (UAH, Madrid, Spain)
Vinod Kumar (Tejas Networks)
IEEE Plenary meeting, Atlanta, 7-10 Nov. 2011
Contents
• All-Path Basics
• Issues
• Report of All-Path Demos
• Report of proposal to AVB WG
Problem Statement
IEEE 802.1D RSTP has the following limitations:
– Not all links can be used
– The shortest path is not always used
– No multipath is available
– The root bridge tends to be heavily loaded
– Not scalable
Objectives
• To overcome the RSTP limitations:
– Loop free
– All links to be used
– Provide shortest path
– Provide multipath
– Compatible with 802.1D/Q
– No new tag or new frame to be defined
– Zero configuration
(The slide annotates these goals with TRILL and SPB for comparison.)
All-Path Basics (One-way)
[Figure: an ARP_req from S is flooded through bridges 1-5 toward D. Legend: S = port locked to S, D = port locked to D.]
All-Path Basics (One-way)
• The first port to receive the frame is locked to S:
– Register S in a table
– Start the lock timer
– Learn S at the port
• A port that receives the frame later discards it:
– Check S against the table while the lock timer is in effect
[Figure: late copies of the ARP_req arriving at already-locked bridges are discarded (X). Legend: S = port locked to S, D = port locked to D.]
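The locking rule just described can be summarized in a minimal sketch; the class name, table layout, and timeout value here are illustrative assumptions, not the authors' implementation:

```python
import time

LOCK_TIMEOUT = 1.0  # seconds; All-Path uses a short lock timer (see "Needful Things")

class ArpPathBridge:
    def __init__(self):
        # First-come table: source MAC -> (locked port, lock expiry time)
        self.first_come = {}

    def on_broadcast(self, src_mac, in_port):
        """Handle one copy of a flooded frame (e.g. an ARP_req) from src_mac.

        Returns True if this is the first copy (forward it), False if it is
        a late copy arriving on a non-locked port (discard it).
        """
        now = time.monotonic()
        entry = self.first_come.get(src_mac)
        if entry is not None:
            locked_port, expiry = entry
            if now < expiry and locked_port != in_port:
                return False  # later-received port: discard while the lock is active
        # First copy, same port, or expired lock: (re)lock this port to src_mac.
        self.first_come[src_mac] = (in_port, now + LOCK_TIMEOUT)
        return True
```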
All-Path Basics (One-way)
[Figure: the ARP_req flood completes; duplicate copies are discarded (X) and a single locked path from S emerges across bridges 1-5 toward D. Legend: S = port locked to S, D = port locked to D.]
All-Path Basics (Two-way)
• If the DA is already in the FDB, the frame is unicast-forwarded, the same as in 802.1D.
[Figure: the ARP_reply from D is forwarded back along the path locked to S, locking ports to D on the way. Legend: S = port locked to S, D = port locked to D.]
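Putting the two directions together, the per-frame decision might look like the following sketch. It assumes the ArpPathBridge sketch above plus an ordinary FDB dict and send/flood primitives, all of which are assumptions for illustration:

```python
def handle_frame(bridge, frame, in_port):
    """Sketch of an All-Path bridge's combined forwarding decision."""
    # Lock/filter on the source address, as in the one-way slides.
    if not bridge.on_broadcast(frame.src_mac, in_port):
        return  # late copy on a non-locked port: discard

    out_port = bridge.fdb.get(frame.dst_mac)  # assumed: MAC -> port dict
    if out_port is not None:
        bridge.send(frame, out_port)          # DA known: plain 802.1D unicast
    else:
        bridge.flood(frame, exclude=in_port)  # DA unknown: flood, and let
                                              # first-come locking pick the path
```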
All-Path Basics (Two-way)
[Figure: the completed two-way path across bridges 1-5; ports along it are locked to S in one direction and to D in the other. Legend: S = port locked to S, D = port locked to D.]
Needful Things
802.1D:
• Forwarding database (large: e.g. 16k+ entries)
• Aging timer (long: e.g. 300 s)
plus, for All-Path:
• First-come table (small: e.g. ~1k entries)
• Lock timer (short: e.g. ~1 s)
• Filtering logic (for late-arriving frames)
Minimum aging time of the lock timer
[Timing diagram: FP = first port, SP = second port. The first port receives the frame and learns; processing time follows (forwarding, learning, classification, tagging, queuing, etc.); the copy then arrives at the second port and must be discarded (x). The lock timer must still be in effect at that moment for the second-port copy to be discarded.]
• The first-come table aging time shall be longer than 2 × (one-way link delay + processing delay).
• For a data center, it can be less than 1 ms.
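As a quick check of this bound, a minimal sketch; the delay figures are assumed, illustrative data-center values, not measurements from the deck:

```python
def min_lock_aging_time(one_way_link_delay, processing_delay):
    # Lower bound from the slide: 2 x (one-way link delay + processing delay)
    return 2 * (one_way_link_delay + processing_delay)

# Assumed values: 5 us of link delay, 100 us of per-hop processing.
bound = min_lock_aging_time(5e-6, 100e-6)
print(f"{bound * 1e3:.2f} ms")  # 0.21 ms, consistent with "less than 1 ms"
```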
Scope of All-Path
[Positioning chart along manageability and scalability axes, LAN vs. MAN/WAN: All-Path is simple, needs less operation, and balances load naturally, targeting enterprise, campus, small data center, and home networks (LAN). SPB/ECMP and TRILL (both loop free with shortest-path support) target large areas, provider networks, and large data centers (MAN/WAN). RSTP/MSTP is shown for reference.]
Issues
1. Path recovery
2. Server edge
3. Load balance
1. Path Recovery – Original idea
[Figure: after a link failure, an ARP_req from S is re-flooded through bridges 1-5 to re-establish the path toward D.]
• Mechanism: when an unknown-unicast frame arrives at a bridge with a failed link, a path-fail message is generated per MAC entry toward the source bridge, which generates a corresponding ARP to re-establish the tree.
• Question: if 10K MAC entries exist in the FDB, 10K path-fail frames must be generated. Is that feasible for a local CPU, especially on a high-speed link (e.g. 10GE)?
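The scale behind the question can be made concrete with a naive sketch of the per-entry generation; the names and structures here are assumed for illustration:

```python
def recover_failed_link(fdb, failed_port, send_path_fail):
    """Naive path-fail generation: one control frame per FDB entry
    learned on the failed port. With ~10K such entries this means
    ~10K frames produced in the bridge's (slow) CPU path."""
    for mac, port in list(fdb.items()):
        if port == failed_port:
            send_path_fail(mac)  # toward the source bridge for this MAC
            del fdb[mac]
```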
1. Path Recovery – Original idea
[Figure: Path_fail messages propagate from the bridge at the failed link back toward S across bridges 1-5. Legend: S = port locked to S, D = port locked to D.]
1. Path recovery – Selective flush (Fujitsu)
[Figure: SW1-SW6 with hosts MAC=a and MAC=b. When the link failure is detected, SW4 floods a flush "b" message. SW2 deletes entry "b" from its FDB and re-sends the flush message to SW1. At SW5 the flush message is terminated because "b" is not bound to port 1. A flush message may carry a list of many (e.g. hundreds of) MAC addresses.]
 When a link failure is detected, MAC flush lists are flooded: 54 frames (at 187 MACs per 1500 B frame) for 10K MAC entries.
 To avoid unnecessary flooding, MAC entries are removed from the list at each hop to shorten it.
 Issues: how to prevent flush-frame loss; may require CPU processing power.
 Experience: 15 ms to flush 10K MACs in a node (1 GHz MIPS core).
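A sketch of the per-hop flush handling described above; the message format and function names are assumptions, and the pruning rule follows the SW2/SW5 example:

```python
def handle_flush(fdb, flush_macs, in_port, resend):
    """Process a selective-flush message carrying a list of MACs.

    fdb:        the bridge's FDB as a MAC -> port dict
    flush_macs: MAC addresses listed in the flush message
    in_port:    port the flush message arrived on
    resend:     callback to forward the shortened list onward
    """
    survivors = []
    for mac in flush_macs:
        # Propagate a MAC only if our path toward it uses the arrival port
        # (as SW2 does for "b"); otherwise prune it (as SW5 does).
        if fdb.get(mac) == in_port:
            del fdb[mac]
            survivors.append(mac)
    if survivors:
        resend(survivors)  # the message terminates when the list empties
```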
1. Path Recovery – Loopback (UAH)
• Low processing load at the bridges adjacent to the failed link: loopback is part of the standard forwarding table.
• Processing load is distributed among the source edge bridges involved in the flows; only one side (SA→DA) asks for repair.
• Resiliency: if the first looped-back packet is lost, the following looped-back frames will follow.
2. Server Edge
• Question: if a server has two or more NICs, how does All-Path find which port is first?
• vswitch: only the vswitch needs to support All-Path
• VEB: both the VEB and the vswitch need to support All-Path
• VEPA: only the external switch needs to support All-Path
[Figure: the three edge models with their NICs: vswitch; VEB with vswitch; VEPA with an external switch.]
3. Load Balance (Fujitsu)
[Chart: per-switch throughput (SW1-SW5, y-axis up to ~140,000,000) vs. elapsed time (0-134.4 s), showing traffic spreading across the switches.]
• Load balancing happens naturally, because heavily loaded links tend not to be selected due to their queuing delay.
• Pros: zero-configuration load balancing
• Cons: you cannot control the load balancing as you can with SPB/ECMP
Load Distribution (UAH simulations)
• Objectives:
– Explain the native load distribution results from the Singapore presentation
– Visualize how on-demand path selection avoids the loaded links
• Topology:
– A subset of the links of a small data center topology, to show path selection at the core
– Core link capacity is lower (100 Mbps) to force load distribution and congestion only at the core
– Queues hold up to 100,000 frames (so congestion shows up as delay rather than discarded frames)
• Traffic: stepped in sequence, left to right
– Green servers send UDP packets toward red servers
– Groups of 25 servers initiate communication every second: the first group at second 1, the second at second 2, and so on; the last group is a single server that starts at second 4 of the simulation
– UDP packets (1 packet every 1 ms, simultaneous for all servers); the packet size varies between 90 and 900 bytes across the simulations to simulate increasing traffic loads
Simulation I – UDP packet size: 90B
[Figure: topology S1-S4, with groups of 25 servers starting at 1 s, 2 s, 3 s, and 4 s; flow counts on the core links: 51, 1, 24, 1, 24, 0.]
Server group 1: s3-s4
Server group 2: s3-s4 and s3-s2-s4
Server group 3: s3-s4 and s3-s2-s4
Server group 4: s3-s1-s4
Note: the path s3-s4 is reused several times because it is still not very loaded (low traffic).
Simulation I – UDP packet size: 300B
[Figure: same topology; flow counts on the core links: 26, 14, 36, 14, 36, 0.]
Server group 1: s3-s4
Server group 2: s3-s1-s4 and s3-s2-s4
Server group 3: s3-s2-s4
Server group 4: s3-s4
Note: the path s3-s4 is not reused when the 2nd group starts; instead s3-s1-s4 and s3-s2-s4 are used, and similarly for the 3rd group. The 4th group reuses s3-s4 because it is again the fastest once s1 and s2 are loaded by groups 2 and 3.
Simulation I – UDP packet size: 900B
[Figure: same topology, with groups of 25 servers starting at 1 s through 4 s; flow counts on the core links: 26, 25, 25, 25, 25, 0.]
Server group 1: s3-s4
Server group 2: s3-s1-s4
Server group 3: s3-s2-s4
Server group 4: s3-s4
Note: at 900B some frames are discarded at the queues (too much traffic). Group 1 chooses s3-s4 and fully loads it; group 2 chooses s3-s1-s4 with the same result; group 3 likewise chooses s3-s2-s4. When group 4 starts, every link (except the one from s1-s2) is fully loaded, so s3-s4 is again the fastest path and is chosen.
Load distribution conclusions
• Notice how the number of flows gets distributed across the core links as the traffic increases, due to the increased latency.
– Load distribution starts at low loads
– Path diversity increases with load
• A similar balancing effect was observed on redundant links from an access switch to two core switches.
• On-demand path selection finds paths adapted to current, instantaneous conditions, not to a past or assumed traffic matrix.
Report on Proposal for AVB TG
• May 12, Thu, morning session @ AVB
• Dr. Ibanez presented the materials as used in the IW session (Singapore and Santa Fe)
• Questions and comments:
– Any metric other than latency, e.g. bandwidth?
– Path recovery time compared with RSTP?
– Does a broadcast storm occur when a link fails?
– What's the status in the IW session; has any PAR been created?
• AVB status:
– They are trying to solve this in their own way, using SRP
– Not only latency but also bandwidth can be used as a metric
– Redundant paths can also be calculated
Path Selection with SRP
at-phkl-SRP-Stream-Path-Selection-0311-v01.pdf
REPORT OF ALL-PATH DEMOS
- TORONTO: SIGCOMM AUGUST 2011
- BONN: LCN OCTOBER 2011
Demo at SIGCOMM 2011
• HW NetFPGA implementation
• Four NetFPGAs (4 × 1 Gbps)
• Demo:
– Zero configuration
– Video streaming, high throughput
– Robustness, no frame loops
– Fast path recovery
– Internet connection, standard hosts
• http://conferences.sigcomm.org/sigcomm/2011/papers/sigcomm/p444.pdf
Demo at IEEE LCN 2011 (October, Bonn)
OpenFlow and Linux (OpenWRT) All-Path switches
[Figure: demo topology with a NOX OpenFlow controller and an Ethernet switch.]
Demo at IEEE LCN 2011 (October, Bonn)
OpenFlow and Linux (OpenWRT) All-Path switches
• One NEC switch split into 4 OpenFlow switches
• Four Soekris boards as 4 OpenFlow switches
• Two Linksys WRT routers running the ARP-Path-over-Linux implementation
• Video streaming and internet access without host changes
– Some video limitations at the OpenWRT routers
– Smooth operation on Soekris and NEC
Reference: "A Small Data Center Network of ARP-Path Bridges Made of OpenFlow Switches." Guillermo Ibáñez (UAH), Jad Naous (MIT/Stanford Univ.), Elisa Rojas (UAH), Bart De Schuymer (Art in Algorithms, Belgium), Thomas Dietz (NEC Europe Ltd., Germany).
Feedback from All-Path UAH demos
• At every demo, most people asked for an explanation of how ARP-Path works (the available video was shown)
• Attendees were intrigued by the mechanism and interested in the reconfiguration of flows and the native loop avoidance
• Amount of state stored per bridge: per host or per bridge (encapsulating versions, Q-in-Q and M-in-M, are possible but not the target; that space is already covered by SPB)
• Questions on compatibility and miscibility with standard bridges (automatic core-island mode, no full miscibility)
• Collateral questions on NetFPGA and on the LCN demo topology
• Next step: implementation on a commodity Ethernet switch (FPGA) (chip/switch manufacturers are invited to provide a switch platform) and implementation of interoperability with 802.1D bridges in the Linux version
Conclusions
• All-Path bridging is a reality
– A new class of transparent low-latency bridges
– They do not compute the path; they find it by direct probing
• Zero configuration
• Robust, loop free
• Native load distribution
• Paths are not predictable, but resilient: paths adapt to traffic, and traffic is not predictable
• Low latency