Data Center Traffic and Measurements: SoNIC

Hakim Weatherspoon
Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and Networking
November 12, 2014

Slides from USENIX Symposium on Networked Systems Design and Implementation (NSDI) 2013 presentation of “SoNIC: Precise Realtime Software Access and Control of Wired Networks”

Goals for Today
• Analysis and Network Traffic Characteristics of Data Centers in the wild
  – T. Benson, A. Akella, and D. A. Maltz. In Proceedings of the 10th ACM SIGCOMM conference on Internet measurement (IMC), pp. 267-280. ACM, 2010.

Interpacket Delay and Network Research
• Interpacket gap, spacing, arrival time, …
• Important metric for network research
  – Can be improved with access to the PHY
[Figure: protocol stack (Application, Transport, Network, Data Link, Physical) beside Packet i and Packet i+1, annotated with the interpacket gap (IPG) and the interpacket delay (IPD). Application labels: Packet Generation, Packet Capture, Increasing Throughput, Detecting timing channel, Characterization, Estimating bandwidth.]

7/12/2016 SoNIC NSDI 2013

Network Research enlightened via the PHY
• Valuable information: Idle characters
  – One Idle character (/I/) = 7~8 bits
  – 12 /I/s = 100 bits = 9.7 ns
  – Can provide precise timing base for control
• Each bit is ~97 ps wide
[Figure: same stack and packet diagram. Application labels: Packet Generation, Detecting timing channel, Packet Capture.]

Principle #1: Precision
Precise network measurement is enabled by access to the physical layer (and to the idle characters and bits within the interpacket gap)

How to control the idle characters (bits)?
• Access to the entire stream is required
• Issue 1: The PHY is simply a black box
  – No interface from NIC or OS
  – Valuable information is invisible (discarded)
• Issue 2: Limited access to hardware
  – We are network systems researchers, a.k.a. we like software

Principle #2: Software
Network systems researchers need software access to the physical layer

Precision + Software = Physics equipment???
• BiFocals [IMC’10 Freedman, Marian, Lee, Birman, Weatherspoon, Xu]
  – Enabled novel network research
  – Precision + Software = Laser + Oscilloscope + Offline analysis
  – Allowed precise control in software
• Limitations
  – Offline (not realtime)
  – Limited buffering
  – Expensive

Principle #3: Realtime
Network systems researchers need access to, and control of, the physical layer (interpacket gap) continuously in realtime

Challenge
• Goal: Control every bit in software in realtime
  – Enable novel network research
• Challenge
  – Requires unprecedented software access to the PHY

Outline
• Introduction
• SoNIC: Software-defined Network Interface Card
  – Background: 10GbE Network Stack
  – Design
• Network Research Applications
• Conclusion

SoNIC: Software-defined Network Interface Card
• Implements the PHY in software
  – Enabling control and access to every bit in realtime
  – With commodity components
  – Thus, enabling novel network research
• How?
  – Background: 10GbE network stack
  – Design and implementation
    • Hardware & Software
    • Optimizations

10GbE Network Stack
[Figure: encapsulation down the stack (Application, Transport, Network, Data Link, Physical). Each layer wraps the data from the layer above: Data; L3 Hdr + Data; L2 Hdr + L3 Hdr + Data; Preamble + Eth Hdr + L2 Hdr + L3 Hdr + Data + CRC + Gap. The Physical layer is split into PCS (64/66b Encode/Decode, Scrambler/Descrambler, Gearbox/Blocksync), PMA and PMD. The PCS turns the frame into /S/ /D/ /D/ /D/ /D/ /T/ /E/ blocks, with Idle characters (/I/) in the gap. Annotations: 2 bit syncheader, 64 bit, 16 bit, 10.3125 Gigabits; the bottom row is the raw bitstream.]
[Figure builds: the same stack with the SW/HW split marked for a Commodity NIC, then for NetFPGA beside SoNIC.]

SoNIC Design
[Figure: the same stack for SoNIC. The SW label sits at the PCS Encode/Decode and Scrambler/Descrambler; the HW label sits at Gearbox/Blocksync, PMA and PMD.]

SoNIC Design and Architecture
[Figure: the stack mapped onto SoNIC components. Userspace: APP. Kernel: TX MAC, RX MAC, TX PCS, RX PCS. Hardware: Gearbox, Blocksync, Transceivers, SFP+.]
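The Scrambler/Descrambler pair in the PCS is one of the stages the design slides place on the software side. As a rough sketch only (this is not SoNIC's code), the C below scrambles one 64-bit payload with the polynomial G(x) = 1 + x^39 + x^58 that appears on the Optimizations slide, first bit by bit and then with a few word-wide shifts and XORs. The function names and the bit-order convention are assumptions made for illustration: bit 0 of each word is taken to be transmitted first, and the state is simply the previous scrambled word.

```c
#include <stdint.h>

/* Bit-serial reference scrambler for G(x) = 1 + x^39 + x^58.
 * Assumed convention: *state holds the previous scrambled word, so
 * bit 63 is the most recent output bit and bit 0 the oldest. */
uint64_t scramble_bitwise(uint64_t *state, uint64_t data)
{
    uint64_t s = *state;
    for (int i = 0; i < 64; i++) {
        uint64_t in = (data >> i) & 1u;
        /* bit 25 of s is the output from 39 bit times ago, bit 6 from 58 */
        uint64_t out = (in ^ (s >> 25) ^ (s >> 6)) & 1u;
        s = (s >> 1) | (out << 63);   /* newest output enters at the top */
    }
    *state = s;   /* after 64 steps s is exactly the scrambled word */
    return s;
}

/* Word-parallel form: the same result without the 64-step loop. */
uint64_t scramble_word(uint64_t *state, uint64_t data)
{
    uint64_t s = *state;
    /* taps that reach back into the previous scrambled word */
    uint64_t r = (s >> 6) ^ (s >> 25) ^ data;
    /* taps that land inside the current word (both use the old r) */
    r ^= (r << 39) ^ (r << 58);
    *state = r;
    return r;
}

/* Matching descrambler: every tap is an already-received scrambled bit. */
uint64_t descramble_word(uint64_t *state, uint64_t scrambled)
{
    uint64_t s = *state;
    uint64_t d = scrambled ^ (s >> 6) ^ (s >> 25)
               ^ (scrambled << 39) ^ (scrambled << 58);
    *state = scrambled;
    return d;
}
```

Starting both scramblers from the same state and feeding them the same words should give identical output, and `descramble_word` run over that output from the same initial state should return the original words.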
SoNIC Design: Hardware
• To deliver every bit from/to software
  – High-speed transceivers
  – PCIe Gen2 (= 32 Gbps)
• Optimized DMA engine
[Figure: FPGA with two SFP+ ports, attached over PCIe Gen2.]

SoNIC Design: Software
• Dedicated Kernel Threads
  – TX / RX PCS, TX / RX MAC threads
  – APP thread: Interface to userspace
[Figure: Port 0 and Port 1, each with APP, TX MAC, RX MAC, TX PCS and RX PCS.]

SoNIC Design: Synchronization
• Low-latency FIFOs
• Pointer-polling
• No interrupts
[Figure: the Port 0 and Port 1 thread pipelines above the FPGA (two SFP+ ports, PCIe Gen2).]

SoNIC Design: Optimizations
• Scrambler, G(x) = 1 + x^39 + x^58

  Naïve implementation (0.436 Gbps):
    s ← state
    d ← data
    for i = 0 → 63 do
      in ← (d >> i) & 1
      out ← (in ⊕ (s >> 38) ⊕ (s >> 57)) & 1
      s ← (s << 1) | out
      r ← r | (out << i)
      state ← s
    end for

  Optimized implementation (21 Gbps):
    s ← state
    d ← data
    r ← (s >> 6) ⊕ (s >> 25) ⊕ d
    r ← r ⊕ (r << 39) ⊕ (r << 58)
    state ← r

• CRC computation
• DMA engine

SoNIC Design: Interface and Control
• Hardware control: ioctl syscall
• I/O: character device interface
• Sample C code for packet generation and capture

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <unistd.h>
    #include "sonic.h"

    /* Parameters of the packet stream to generate */
    struct sonic_pkt_gen_info info = {
        .mode = 0,
        .pkt_num = 1000000000UL,
        .pkt_len = 1518,
        .mac_src = "00:11:22:33:44:55",
        .mac_dst = "aa:bb:cc:dd:ee:ff",
        .ip_src = "192.168.0.1",
        .ip_dst = "192.168.0.2",
        .port_src = 5000,
        .port_dst = 5000,
        .idle = 12,
    };

    int main(void)
    {
        char buf[65536];
        ssize_t ret;

        /* OPEN DEVICE */
        int fd1 = open(SONIC_CONTROL_PATH, O_RDWR);
        int fd2 = open(SONIC_PORT1_PATH, O_RDONLY);
        if (fd1 < 0 || fd2 < 0)
            return 1;

        /* CONFIG SONIC CARD FOR PACKET GEN */
        ioctl(fd1, SONIC_IOC_RESET);
        ioctl(fd1, SONIC_IOC_SET_MODE, PKT_GEN_CAP);
        ioctl(fd1, SONIC_IOC_PORT0_INFO_SET, &info);

        /* START EXPERIMENT */
        ioctl(fd1, SONIC_IOC_START);
        /* wait till experiment finishes */
        ioctl(fd1, SONIC_IOC_STOP);

        /* CAPTURE PACKET */
        while ((ret = read(fd2, buf, sizeof(buf))) > 0) {
            /* process data */
        }

        close(fd1);
        close(fd2);
        return 0;
    }

Outline
• Introduction
• SoNIC: Software-defined Network Interface Card
• Network Research Applications
  – Packet Generation
  – Packet Capture
  – Covert timing channel
• Conclusion

Network Research Applications
• Interpacket delays and gaps
[Figure: stack and packet diagram with IPG and IPD. Application labels: Packet Generation, Detecting timing channel, Packet Capture.]

Packet Generation and Capture
• Basic functions for network research
  – Generation: SoNIC allows control of IPGs in # of /I/s
  – Capture: SoNIC captures what was sent with IPGs in bits
[Figure: two SoNIC ports (APP, TX/RX MAC, TX/RX PCS) exchanging a train of 1518B packets at 9 Gbps, IPD = 13992 bits (1357 ns).]

Packet Generation
• SoNIC allows precise control of IPGs
[Figure: CDF of generated IPDs; x-axis: interpacket delays (ns), 1000 to 3000. Legend: SoNIC, Sniffer 10G. Annotations: "SoNIC: Zero variance!!!", "Specialized NIC: Higher variance". Workload: 1518B packets at 9 Gbps, IPD = 13992 bits (1357 ns).]
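The figures quoted on these slides (a bit that is about 97 ps wide, IPD = 13992 bits = 1357 ns for 1518B packets) follow from the 10.3125 Gb/s line rate and the 64/66b encoding. The sketch below redoes that arithmetic in C. The function names are hypothetical, and two assumptions are baked in that the slides do not state: the 8-byte preamble is counted as part of each packet, and every 8 bytes cost 66 bits on the wire.

```c
#include <stdint.h>

#define LINE_RATE_BPS  10312500000.0  /* 10GbE serial line rate, bits per second */
#define PREAMBLE_BYTES 8u             /* preamble incl. start-of-frame delimiter */

/* Width of one bit on the wire, in picoseconds (about 97 ps). */
double bit_time_ps(void)
{
    return 1e12 / LINE_RATE_BPS;
}

/* Line bits needed for `bytes` bytes under 64b/66b:
 * every 8 bytes (64 bits) travel as one 66-bit block. */
uint64_t bytes_to_line_bits(uint64_t bytes)
{
    return bytes * 66u / 8u;
}

/* Duration of `bits` line bits, in nanoseconds. */
double line_bits_to_ns(uint64_t bits)
{
    return (double)bits * 1e9 / LINE_RATE_BPS;
}

/* Interpacket delay in line bits: one packet (with preamble)
 * plus the gap that follows it. */
uint64_t ipd_line_bits(uint64_t frame_bytes, uint64_t gap_bytes)
{
    return bytes_to_line_bits(PREAMBLE_BYTES + frame_bytes + gap_bytes);
}
```

Under these assumptions, a 1518-byte frame followed by a 170-byte gap gives `ipd_line_bits(1518, 170) == 13992`, and `line_bits_to_ns(13992)` is about 1356.8 ns, which matches the slide. The 170-byte gap is not taken from the slides; it is the value that reproduces their IPD. With it, 1526 of every 1696 byte times carry packet bytes, roughly 90% of the link, which is consistent with the quoted 9 Gbps.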
Packet Capture
• SoNIC captures what is sent
[Figure: CDF of captured IPDs; x-axis: interpacket delays (ns), 0 to 5000. Legend: SoNIC, Kernel, Userspace, Sniffer 10G. Workload: 1518B packets at 9 Gbps, IPD = 13992 bits (1357 ns).]

Covert Timing Channel
• Embedding signals into interpacket gaps
  – Large gap: ‘1’
  – Small gap: ‘0’
• Covert timing channel by modulating IPGs at 100 ns
• Overt channel at 3 Gbps
• Covert channel at 250 kbps
• Over 4 hops with < 1% BER

Covert Timing Channel
• Modulating IPGs at 100 ns scale (= 128 /I/s)
  – ‘1’: 3562 + a /I/s, here 3562 + 128 /I/s
  – ‘0’: 3562 - a /I/s, here 3562 - 128 /I/s
• BER = 0.37%
[Figure: CDF of interpacket delays (ns), 500 to 4500, with curves for 3562 /I/s, 3562 - 128 /I/s and 3562 + 128 /I/s. Legend: SoNIC, Kernel.]

Contributions
• Network Research
  – Unprecedented access to the PHY with commodity hardware
  – A platform for cross-network-layer research
  – Can improve network research applications
• Engineering
  – Precise control of interpacket gaps (delays)
  – Design and implementation of the PHY in software
  – Novel scalable hardware design
  – Optimizations / Parallelism
• Status
  – Measurements in large scale: DCN, GENI, 40 GbE

Conclusion
• Precise Realtime Software Access to the PHY
• Commodity components
  – An FPGA development board, Intel architecture
• Network applications
  – Network measurements
  – Network characterization
  – Network steganography
• Webpage: http://sonic.cs.cornell.edu
  – SoNIC is available as open source.

Before Next time
• Project Interim report
  – Due Monday, November 24.
  – And meet with groups, TA, and professor
• Fractus Upgrade: Should be back online
• Required review and reading for Friday, November 14
  – Timing is Everything: Accurate, Minimum Overhead, Available Bandwidth Estimation in High-speed Wired Networks, H. Wang, K. Lee, E. Li, C. L. Lim, A. Tang, and H. Weatherspoon. ACM SIGCOMM Internet Measurement Conference (IMC), November 2014.
  – http://conferences2.sigcomm.org/imc/2014/papers/p407.pdf
• Check piazza: http://piazza.com/cornell/fall2014/cs5413
• Check website for updated schedule
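Returning to the covert timing channel slides: the encoding they describe (a ‘1’ is a gap of 3562 + 128 /I/s, a ‘0’ is a gap of 3562 - 128 /I/s) can be sketched in a few lines of C. This is a toy model rather than SoNIC's implementation. The function names are hypothetical, and the receiver's rule of comparing each gap against the 3562 /I/s midpoint is an assumption; the slides do not say how the receiver decides.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE_IDLES  3562u  /* unmodulated gap in /I/s, from the slide */
#define DELTA_IDLES 128u   /* modulation depth in /I/s (about 100 ns) */

/* Sender: one gap per covert bit, widened for '1' and narrowed for '0'. */
void covert_encode(const uint8_t *bits, size_t n, uint32_t *gaps)
{
    for (size_t i = 0; i < n; i++)
        gaps[i] = bits[i] ? BASE_IDLES + DELTA_IDLES
                          : BASE_IDLES - DELTA_IDLES;
}

/* Receiver (assumed rule): a gap above the midpoint decodes as '1'. */
void covert_decode(const uint32_t *gaps, size_t n, uint8_t *bits)
{
    for (size_t i = 0; i < n; i++)
        bits[i] = (uint8_t)(gaps[i] > BASE_IDLES);
}

/* Number of positions where sent and received bits differ. */
size_t count_bit_errors(const uint8_t *sent, const uint8_t *recv, size_t n)
{
    size_t errors = 0;
    for (size_t i = 0; i < n; i++)
        errors += (sent[i] != 0) != (recv[i] != 0);
    return errors;
}
```

In this model a gap decodes correctly as long as the network perturbs it by less than 128 /I/s. Larger perturbations flip bits, which is one way to read the nonzero BER reported on the slide.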