Practical TDMA for Datacenter Ethernet
Bhanu C. Vattikonda, George Porter, Amin Vahdat, Alex C. Snoeren
Variety of applications hosted in datacenters
• All-to-all: performance depends on throughput-sensitive traffic in the shuffle phase
• Gather/scatter: generates latency-sensitive traffic
The network is treated as a black box
• Applications like Hadoop MapReduce perform inefficiently
• Applications like Memcached experience high latency
Why does the lack of coordination hurt performance?
Example datacenter scenario
[Figure: a traffic receiver shared by two bulk-transfer senders and one latency-sensitive sender]
Drops and queuing lead to poor performance
• Bulk transfer traffic experiences packet drops
• Latency-sensitive traffic gets queued in the buffers
[Figure: the same scenario, with drops and queuing at the shared receiver]
Current solutions do not take a holistic approach
• Facebook uses a custom UDP-based transport protocol
• Alternative transport protocols like DCTCP address TCP's shortcomings
• InfiniBand and Myrinet offer boutique hardware solutions to these problems, but they are expensive
Since the demand can be anticipated, can we coordinate hosts?
Taking turns to transmit packets
[Figure: the bulk-transfer and latency-sensitive senders take turns transmitting to the receiver]
TDMA: An old technique
Enforcing TDMA is difficult
• It is not practical to task hosts with keeping track of time and controlling transmissions
• End-host clocks quickly go out of synchronization
Existing TDMA solutions need special support
• Since end-host clocks cannot be synchronized, special support is needed from the network: FTT-Ethernet, RTL-TEP, and TT-Ethernet require modified switching hardware
• Even with special support, the hosts need to run real-time operating systems to enforce TDMA (FTT-Ethernet, RTL-TEP)
Can we do TDMA with commodity Ethernet?
TDMA using Pause Frames
• Flow control packets (pause frames) can be used to control Ethernet transmissions
• Pause frames are processed in hardware, so the flow control packets are handled very efficiently
Measurement: have a sender blast UDP packets, send it 802.3x pause frames, and measure the time the sender takes to react to them
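A minimal sketch of the pause-frame side of this measurement, assuming a Linux host, root privileges, and an AF_PACKET raw socket; the interface name, source MAC, and pause quanta are placeholders rather than the values used in the talk:

```python
# Sketch: craft and send an IEEE 802.3x PAUSE frame from a raw socket (Linux).
# The interface name, source MAC, and quanta below are placeholders.
import socket
import struct

PAUSE_DST = bytes.fromhex("0180c2000001")  # reserved MAC-control multicast address
SRC_MAC   = bytes.fromhex("001122334455")  # placeholder source MAC
ETHERTYPE = 0x8808                         # MAC control
OPCODE    = 0x0001                         # PAUSE
QUANTA    = 0xFFFF                         # pause time in units of 512 bit times

def build_pause_frame(quanta=QUANTA):
    header  = PAUSE_DST + SRC_MAC + struct.pack("!H", ETHERTYPE)
    payload = struct.pack("!HH", OPCODE, quanta)
    frame   = header + payload
    return frame + b"\x00" * (60 - len(frame))  # pad to the minimum frame size

def send_pause(iface="eth0"):
    s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    s.bind((iface, 0))
    s.send(build_pause_frame())
    s.close()

if __name__ == "__main__":
    # While a sender blasts UDP packets at line rate, send it PAUSE frames like
    # this and record when its transmissions stop to obtain the reaction time.
    send_pause()
```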
TDMA using Pause Frames: measured reaction time
• Reaction time to the pause frames is 2–6 μs, with low variance
(Measurement done using 802.3x pause frames)
TDMA using commodity hardware
TDMA is imposed over Ethernet using a centralized fabric manager, which:
• Collects demand information from the end hosts
• Computes the schedule for communication
• Controls end host transmissions
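A rough sketch of the fabric manager's loop over these three responsibilities; the host interface (report_demand, unpause_traffic_to, pause_all_traffic), the schedule representation, and the timing constants are illustrative assumptions, not the system's actual implementation:

```python
# Sketch of a centralized fabric manager loop. The host interface and the
# schedule representation are assumptions made for illustration only.
import time

SLOT_US  = 300   # TDMA slot length (the value used later in the evaluation)
GUARD_US = 15    # guard time at the end of each slot

def collect_demand(hosts):
    """Ask every host how many bytes it wants to send to each destination."""
    return {host: host.report_demand() for host in hosts}   # {src: {dst: bytes}}

def compute_schedule(demand):
    """Turn the demand matrix into a list of rounds, each mapping src -> dst."""
    rounds = []
    # ... any scheduling policy goes here (e.g. round-robin over destinations)
    return rounds

def run(hosts):
    while True:
        demand = collect_demand(hosts)
        for rnd in compute_schedule(demand):
            for src, dst in rnd.items():
                src.unpause_traffic_to(dst)        # slot begins: transmit to dst
            time.sleep((SLOT_US - GUARD_US) / 1e6)
            for src in rnd:
                src.pause_all_traffic()            # guard time begins
            time.sleep(GUARD_US / 1e6)
```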
TDMA example
• Demand collected from the end hosts by the fabric manager: S -> D1: 1MB and S -> D2: 1MB
• The computed schedule alternates between the two destinations:
  • round 1: S -> D1
  • round 2: S -> D2
  • round 3: S -> D1
  • round 4: S -> D2
  • ...
• The fabric manager controls S's transmissions round by round according to this schedule
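A small sketch of how the fabric manager could turn this example demand into the alternating schedule above; the per-round chunk size is an assumption made for illustration:

```python
# Sketch: derive the alternating round schedule for the example demand.
# The amount of data served per round (CHUNK) is an illustrative assumption.
from itertools import cycle

MB = 1 << 20
demand = {("S", "D1"): 1 * MB, ("S", "D2"): 1 * MB}  # bytes left to send
CHUNK = 256 * 1024                                   # bytes served per round

def alternating_schedule(demand, chunk=CHUNK):
    rounds, pending = [], dict(demand)
    for flow in cycle(list(demand)):
        if not pending:
            break
        if flow in pending:
            rounds.append(flow)
            pending[flow] -= chunk
            if pending[flow] <= 0:
                del pending[flow]
    return rounds

if __name__ == "__main__":
    for i, (src, dst) in enumerate(alternating_schedule(demand), 1):
        print(f"round {i}: {src} -> {dst}")   # round 1: S -> D1, round 2: S -> D2, ...
```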
More than one host
With multiple hosts, the fabric manager's control packets must:
• Be processed with low variance
• Arrive at the end hosts synchronously
Synchronized arrival of control packets
• We cannot directly measure synchronous arrival, so we measure the difference in arrival times of a pair of control packets at 24 hosts
• Result: variation of ~15μs across different sending rates at the end hosts
Ideal scenario: control packets arrive synchronously
[Figure: timelines for Host A and Host B with rounds 1-3 starting at exactly the same times]
Experiments show that packets do not arrive synchronously
[Figure: timelines for Host A and Host B with round boundaries shifted relative to each other]
• The rounds at different hosts are out of sync by <15μs
Guard times to handle lack of synchronization
[Figure: timelines for Host A and Host B with a stop period inserted before each round boundary]
• Guard times (15μs) handle out-of-sync control packets
TDMA for Datacenter Ethernet
Controlling end host transmissions:
• Use flow control packets to achieve low variance
• Guard times adjust for variance in control packet arrival
Encoding scheduling information
• We use IEEE 802.1Qbb priority flow control frames to encode scheduling information
• Using iptables rules, traffic for different destinations can be classified into different Ethernet classes (a classification sketch follows below)
• 802.1Qbb priority flow control frames can then be used to selectively start transmission of packets to a destination
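One plausible way to wire up this per-destination classification on Linux, assuming iptables' CLASSIFY target in the mangle/POSTROUTING chain and a VLAN interface whose egress-qos-map turns the packet priority into an 802.1p class; the interface names, addresses, VLAN id, and class numbers are illustrative, not the configuration used in the talk:

```python
# Sketch: classify traffic to each destination into a distinct Ethernet class.
# Assumptions (not the talk's exact setup): a VLAN interface whose
# egress-qos-map maps skb priority N to 802.1p class N, plus an iptables
# CLASSIFY rule that sets that priority per destination IP.
import subprocess

def sh(cmd):
    print("+", cmd)
    subprocess.run(cmd.split(), check=True)

def setup_vlan(iface="eth0", vlan_id=2):
    sh(f"ip link add link {iface} name {iface}.{vlan_id} type vlan id {vlan_id} "
       "egress-qos-map 0:0 1:1 2:2 3:3 4:4 5:5 6:6 7:7")
    sh(f"ip link set {iface}.{vlan_id} up")

def classify_destination(dst_ip, prio_class):
    # Packets to dst_ip get priority prio_class and hence 802.1p class prio_class.
    sh(f"iptables -t mangle -A POSTROUTING -d {dst_ip} "
       f"-j CLASSIFY --set-class 0:{prio_class}")

if __name__ == "__main__":
    setup_vlan()
    classify_destination("10.0.0.2", 1)   # placeholder destination and class
    classify_destination("10.0.0.3", 2)
```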
Methodology to enforce TDMA slots
1. Pause all traffic
2. Un-pause traffic to a particular destination
3. Pause all traffic again to begin the guard time
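A sketch of these three steps using IEEE 802.1Qbb priority flow control frames, reusing the raw-socket setup from the earlier 802.3x sketch; the class-to-destination mapping, timing constants, and the delivery path of these frames from the fabric manager to the hosts' NICs are assumptions glossed over here:

```python
# Sketch: enforce one TDMA slot with 802.1Qbb priority flow control frames.
# Class assignments, timing, and the raw-socket delivery path are assumptions.
import socket
import struct
import time

PFC_DST    = bytes.fromhex("0180c2000001")
SRC_MAC    = bytes.fromhex("001122334455")   # placeholder
ETHERTYPE  = 0x8808
PFC_OPCODE = 0x0101
MAX_QUANTA = 0xFFFF

def build_pfc_frame(paused_classes):
    """PFC frame that pauses the classes in `paused_classes` and un-pauses the rest."""
    enable_vector = 0x00FF   # per-class pause times are valid for all 8 classes
    times = [MAX_QUANTA if c in paused_classes else 0 for c in range(8)]
    payload = struct.pack("!HH8H", PFC_OPCODE, enable_vector, *times)
    frame = PFC_DST + SRC_MAC + struct.pack("!H", ETHERTYPE) + payload
    return frame + b"\x00" * max(0, 60 - len(frame))

def enforce_slot(sock, dest_class, slot_us=300, guard_us=15):
    all_classes = set(range(8))
    sock.send(build_pfc_frame(all_classes))                  # 1. pause all traffic
    sock.send(build_pfc_frame(all_classes - {dest_class}))   # 2. un-pause one destination's class
    time.sleep((slot_us - guard_us) / 1e6)
    sock.send(build_pfc_frame(all_classes))                  # 3. pause all: guard time begins

# Usage: sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW); sock.bind(("eth0", 0))
```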
Evaluation
• MapReduce shuffle phase: all-to-all transfer
• Memcached-like workloads: latency between nodes in a mixed environment, in the presence of background flows
• Hybrid electrical and optical switch architectures: performance in dynamic network topologies
Experimental setup
• 24 HP DL380 servers, each with dual Myricom 10G NICs using kernel bypass to access packets
• 1 Cisco Nexus 5000 series 10G 96-port switch and 1 Cisco Nexus 5000 series 10G 52-port switch
• 300μs TDMA slot and 15μs guard time: an effective 5% overhead (15μs / 300μs)
All to all transfer in multi-hop topology
• 10GB all-to-all transfer
[Figure: three groups of 8 hosts connected in a multi-hop topology]
All to all transfer in multi-hop topology: results
• We use a simple round-robin scheduler at each level (a sketch follows below)
• TDMA incurs 5% inefficiency owing to guard time
• Ideal transfer time: 1024s
[Figure: TCP all-to-all vs. TDMA all-to-all completion times for the 10GB transfer]
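One simple round-robin all-to-all schedule of the kind the slide alludes to: in round r, host i sends to host (i + r) mod N, so every round is a conflict-free permutation and each source-destination pair is served once every N - 1 rounds. This is a standard construction, not necessarily the exact scheduler used at each level:

```python
# Sketch: a simple round-robin all-to-all schedule for N hosts.
# In round r, host i sends to host (i + r) % N; each round is a permutation,
# so no destination receives from two sources at once.
def round_robin_all_to_all(n_hosts):
    for r in range(1, n_hosts):                  # r = 0 would be a self-transfer
        yield [(i, (i + r) % n_hosts) for i in range(n_hosts)]

if __name__ == "__main__":
    for r, pairs in enumerate(round_robin_all_to_all(8), 1):
        print(f"round {r}: {pairs}")
```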
Latency in the presence of background flows
• Start both bulk transfers
• Measure the latency between nodes using UDP (a probe sketch follows below)
[Figure: the earlier scenario with two bulk-transfer senders and one latency-sensitive sender sharing a receiver]
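A minimal sketch of a UDP latency probe of the kind this measurement describes; the port, server address, and probe count are placeholders:

```python
# Sketch: measure node-to-node latency with UDP echo probes.
# Port, server address, and probe count are placeholders.
import socket
import time

def udp_echo_server(port=9000):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.bind(("", port))
    while True:
        data, addr = s.recvfrom(64)
        s.sendto(data, addr)                  # echo the probe straight back

def udp_probe(server_ip, port=9000, count=1000):
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1.0)
    rtts_us = []
    for i in range(count):
        t0 = time.perf_counter()
        s.sendto(i.to_bytes(8, "big"), (server_ip, port))
        s.recvfrom(64)
        rtts_us.append((time.perf_counter() - t0) * 1e6)   # round-trip time in μs
    return rtts_us
```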
Latency in the presence of background flows: results
• Latency between the nodes in the presence of TCP flows is high and variable
• The TDMA system achieves lower latency
[Figure: latency for TCP, TDMA, and TDMA with kernel bypass]
Adapting to dynamic network configurations
[Figure: hosts connected through both an electrical packet switch and an optical circuit switch]
Adapting to dynamic network configurations: results
• The link capacity between the hosts is varied between 10Gbps and 1Gbps every 10ms
• TDMA is better suited to such dynamic configurations since it prevents packet losses
[Figure: sender-to-receiver throughput over time, comparing ideal performance and TCP performance]
Conclusion
• TDMA can be achieved using commodity hardware by leveraging existing Ethernet standards
• TDMA can lead to performance gains in current networks: 15% shorter finish times for all-to-all transfers and 3x lower latency
• TDMA is well positioned for emerging network architectures which use dynamic topologies: 2.5x throughput improvement in dynamic network settings
Thank You