Practical TDMA for Datacenter Ethernet
Bhanu C. Vattikonda, George Porter, Amin Vahdat, Alex C. Snoeren

A variety of applications are hosted in datacenters
• All-to-all shuffle phases generate throughput-sensitive traffic, so application performance depends on sustained throughput
• Gather/scatter operations generate latency-sensitive traffic
• The network is treated as a black box
• As a result, applications like Hadoop MapReduce perform inefficiently, and applications like Memcached experience high latency

Why does the lack of coordination hurt performance?

Example datacenter scenario
• Two bulk-transfer flows and one latency-sensitive flow converge on a single traffic receiver

Drops and queuing lead to poor performance
• Bulk-transfer traffic experiences packet drops
• Latency-sensitive traffic gets queued in the switch buffers

Current solutions do not take a holistic approach
• Facebook uses a custom UDP-based transport protocol
• Alternative transport protocols like DCTCP address TCP's shortcomings
• InfiniBand and Myrinet offer boutique hardware solutions to these problems, but they are expensive
• Since the demand can be anticipated, can we coordinate the hosts instead?

Taking turns to transmit packets
• The same senders transmit toward the receiver one at a time, in turns, so bulk and latency-sensitive traffic no longer collide (see the sketch below)
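As a purely illustrative aside, the toy C sketch below shows what "taking turns" means: fixed-length transmission slots handed out round-robin among senders that still have data to send. The sender count, demands, and slot capacity are invented numbers, and this is not the scheduler used in the system described below; it only illustrates the turn-taking idea.

```c
/* schedule_sketch.c - toy illustration of "taking turns": interleave
 * fixed-length transmission slots round-robin among senders that have
 * outstanding demand. All numbers here are made up for illustration. */
#include <stdio.h>

#define NSENDERS 3

int main(void)
{
    /* Outstanding demand per sender, in MB (illustrative values). */
    long demand_mb[NSENDERS] = {300, 100, 200};
    const long slot_capacity_mb = 100;   /* data one slot can carry */
    long remaining = 0;
    for (int i = 0; i < NSENDERS; i++) remaining += demand_mb[i];

    /* Walk the senders round-robin; a sender skips its turn once its
     * demand is satisfied, so no slot is wasted on an idle sender.   */
    int slot = 0;
    for (int s = 0; remaining > 0; s = (s + 1) % NSENDERS) {
        if (demand_mb[s] == 0)
            continue;
        long sent = demand_mb[s] < slot_capacity_mb ? demand_mb[s]
                                                    : slot_capacity_mb;
        demand_mb[s] -= sent;
        remaining -= sent;
        printf("slot %2d: sender %d transmits %ld MB\n", slot++, s, sent);
    }
    return 0;
}
```

Running it prints which sender owns each slot until all of the demand is drained.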
TDMA: An old technique

Enforcing TDMA is difficult
• It is not practical to task hosts with keeping track of time and controlling their own transmissions
• End-host clocks quickly drift out of synchronization

Existing TDMA solutions need special support
• Since end-host clocks cannot be kept synchronized, special support is needed from the network
• FTT-Ethernet, RTL-TEP, and TT-Ethernet require modified switching hardware
• Even with that support, the hosts need to run real-time operating systems to enforce TDMA

Can we do TDMA with commodity Ethernet?

TDMA using Pause Frames
• Flow control packets (pause frames) can be used to control Ethernet transmissions
• Pause frames are processed in NIC hardware, so flow control packets are handled very efficiently
• Experiment: a sender blasts UDP packets, 802.3x pause frames are sent to it, and the time it takes to react to the pause frames is measured
• The reaction time to pause frames is 2–6 μs, with low variance (measurement done using 802.3x pause frames)

TDMA using commodity hardware
• TDMA is imposed over Ethernet using a centralized fabric manager
• The fabric manager collects demand information from the end hosts, computes the schedule for communication, and controls end-host transmissions

TDMA example
• A sender S has 1 MB to send to D1 and 1 MB to send to D2
• The fabric manager collects this demand and computes a schedule that alternates between the destinations: round 1: S -> D1, round 2: S -> D2, round 3: S -> D1, round 4: S -> D2, ...
• The fabric manager then controls S's transmissions round by round

More than one host
• The fabric manager's control packets should be processed with low variance
• The control packets should also arrive at the end hosts synchronously

Synchronized arrival of control packets
• Synchronous arrival cannot be measured directly, so we measure the difference in arrival times of a pair of control packets at 24 hosts
• The variation is roughly 15 μs across different sending rates at the end hosts

Ideal scenario: control packets arrive synchronously
• In the ideal case, every host would start and end each round at the same instant

Experiments show that packets do not arrive synchronously
• Hosts start their rounds out of sync by less than 15 μs

Guard times to handle the lack of synchronization
• A 15 μs guard time at the end of each round absorbs out-of-sync control packets

TDMA for Datacenter Ethernet
• Use flow control packets to control end-host transmissions with low variance
• Use guard times to adjust for the variance in control packet arrival

Encoding scheduling information
• We use IEEE 802.1Qbb priority flow control frames to encode scheduling information
• Using iptables rules, traffic for different destinations is classified into different Ethernet classes
• 802.1Qbb priority flow control frames can then be used to selectively start the transmission of packets to a particular destination

Methodology to enforce TDMA slots
• Pause all traffic
• Un-pause traffic to a particular destination (sketched below)
• Pause all traffic again to begin the guard time
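The deck does not show the control frames themselves, so the following is a minimal sketch, assuming a Linux AF_PACKET raw socket, of how an IEEE 802.1Qbb priority flow control frame could be built to pause every traffic class except the one that iptables (as described above) has mapped to the destination whose slot is starting. The interface name, the choice of priority class 3, and the use of a raw socket rather than the kernel-bypass NICs from the evaluation are assumptions; how the real fabric manager addresses and delivers its control frames to each host is not covered here.

```c
/* pfc_sketch.c - illustrative only: build an IEEE 802.1Qbb priority flow
 * control (PFC) frame and send it from a Linux AF_PACKET raw socket.
 * Requires root; interface name and priority choices are assumptions. */
#include <arpa/inet.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <net/if.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAC_CONTROL_ETHERTYPE 0x8808   /* Ethernet MAC Control          */
#define PFC_OPCODE            0x0101   /* 802.1Qbb per-priority pause   */

/* Fill a 60-byte PFC frame. pause_quanta[i] applies to priority i when
 * bit i of class_enable is set; a value of 0 resumes the class and
 * 0xFFFF pauses it for the maximum time (one quantum = 512 bit times). */
static size_t build_pfc(uint8_t *f, const uint8_t *src_mac,
                        uint16_t class_enable, const uint16_t pause_quanta[8])
{
    static const uint8_t dst_mac[6] = {0x01, 0x80, 0xC2, 0x00, 0x00, 0x01};
    uint16_t v;
    size_t off = 0;
    memset(f, 0, 60);                          /* pad to minimum frame size */
    memcpy(f + off, dst_mac, 6); off += 6;
    memcpy(f + off, src_mac, 6); off += 6;
    v = htons(MAC_CONTROL_ETHERTYPE); memcpy(f + off, &v, 2); off += 2;
    v = htons(PFC_OPCODE);            memcpy(f + off, &v, 2); off += 2;
    v = htons(class_enable);          memcpy(f + off, &v, 2); off += 2;
    for (int i = 0; i < 8; i++) {              /* eight per-priority timers */
        v = htons(pause_quanta[i]);   memcpy(f + off, &v, 2); off += 2;
    }
    return 60;
}

int main(void)
{
    const char *ifname = "eth0";               /* assumed interface name    */
    int fd = socket(AF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) { perror("socket"); return 1; }

    struct ifreq ifr;
    memset(&ifr, 0, sizeof(ifr));
    strncpy(ifr.ifr_name, ifname, IFNAMSIZ - 1);
    if (ioctl(fd, SIOCGIFINDEX, &ifr) < 0) { perror("SIOCGIFINDEX"); return 1; }
    int ifindex = ifr.ifr_ifindex;
    if (ioctl(fd, SIOCGIFHWADDR, &ifr) < 0) { perror("SIOCGIFHWADDR"); return 1; }

    /* Pause every priority class except class 3, which (by assumption)
     * iptables has mapped to the destination whose slot is starting.   */
    uint16_t quanta[8];
    for (int i = 0; i < 8; i++) quanta[i] = 0xFFFF;
    quanta[3] = 0;                              /* un-pause class 3         */

    uint8_t frame[60];
    size_t len = build_pfc(frame, (const uint8_t *)ifr.ifr_hwaddr.sa_data,
                           0x00FF /* all eight class bits valid */, quanta);

    struct sockaddr_ll addr;
    memset(&addr, 0, sizeof(addr));
    addr.sll_family  = AF_PACKET;
    addr.sll_ifindex = ifindex;
    addr.sll_halen   = 6;
    memcpy(addr.sll_addr, frame, 6);            /* destination MAC          */
    if (sendto(fd, frame, len, 0, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        perror("sendto");
    close(fd);
    return 0;
}
```

Per the methodology above, a frame along these lines (un-pausing a single class) would start a slot, and an all-classes-paused variant would begin each guard time.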
Evaluation
• MapReduce shuffle phase: all-to-all transfer
• Memcached-like workloads: latency between nodes in a mixed environment, in the presence of background flows
• Hybrid electrical and optical switch architectures: performance in dynamic network topologies

Experimental setup
• 24 HP DL380 servers
• Dual Myricom 10G NICs, with kernel bypass to access packets
• One Cisco Nexus 5000 series 10G 96-port switch and one Cisco Nexus 5000 series 10G 52-port switch
• 300 μs TDMA slots with 15 μs guard times, an effective overhead of 5%

All-to-all transfer in a multi-hop topology
• 10 GB all-to-all transfer across three groups of 8 hosts
• We use a simple round-robin scheduler at each level of the topology
• The ideal transfer time is 1024 s; the TDMA all-to-all comes within 5% of it, the inefficiency owing to the guard times, and finishes ahead of the TCP all-to-all

Latency in the presence of background flows
• Start both bulk transfers, then measure the latency between the nodes using UDP probes
• The latency between the nodes in the presence of TCP flows is high and variable
• The TDMA system achieves lower latency, and lower still when combined with kernel bypass

Adapting to dynamic network configurations
• Emerging architectures pair an electrical packet switch with an optical circuit switch, so the capacity available between hosts changes over time
• In the experiment, the link capacity between the hosts is varied between 10 Gbps and 1 Gbps every 10 ms
• TCP throughput falls well short of the ideal under these changes
• TDMA is better suited because it prevents the packet losses that the capacity changes would otherwise cause

Conclusion
• TDMA can be achieved using commodity hardware by leveraging existing Ethernet standards
• TDMA can lead to performance gains in current networks: 15% shorter finish times for all-to-all transfers and 3x lower latency
• TDMA is well positioned for emerging network architectures that use dynamic topologies: 2.5x throughput improvement in dynamic network settings

Thank You