Data Center Traffic and Measurements: SoNIC Hakim Weatherspoon

advertisement
Data Center Traffic and
Measurements: SoNIC
Hakim Weatherspoon
Assistant Professor, Dept of Computer Science
CS 5413: High Performance Systems and Networking
November 12, 2014
Slides from USENIX symposium on Networked Systems Design and Implementation
(NSDI) 2013 presentation of “SoNIC: Precise Realtime Software Access and Control
of Wired Networks,”
Goals for Today
• Analysis and Network Traffic Characteristics of
Data Centers in the wild
– T. Benson, A. Akella, and D. A. Maltz. In Proceedings of
the 10th ACM SIGCOMM conference on Internet
measurement (IMC), pp. 267-280. ACM, 2010.
Interpacket Delay and Network Research
Application
• Interpacket gap, spacing, arrival time, …
IPG
Transport
Packet i
Network
Data Link
Physical
Packet i+1
IPD
• Important metric for network research
– Can be improved with access to the PHY
Packet
Generation
Increasing
Throughput
Detecting
timing channel
Characterization
7/12/2016
SoNIC NSDI 2013
Packet Capture
Estimating
bandwidth
4
Network Research enlightened via the PHY
Application
• Valuable information: Idle characters
IPG
Transport
Network
Data Link
Physical
7/12/2016
Packet i
Packet i+1
IPD
– Can provide precise timing base for control
• Each bit is ~97 ps wide
SoNIC NSDI 2013
5
Network Research enlightened via the PHY
Application
• Valuable information:
characters
12 /I/sIdle
= 100bits
= 9.7ns
IPG
Transport
Network
Data Link
Physical
–
for control
• Each bit is ~97 ps wide
Packet
Generation
7/12/2016
Packet i
Packet i+1
One Idle character
Can provide(/I/)
precise timing base
= 7~8 bits
Detecting
timing channel
SoNIC NSDI 2013
Packet Capture
6
Principle #1: Precision
Precise network measurements is
enabled via access to the physical layer
(and the idle characters and bits within
interpacket gap)
7/12/2016
SoNIC NSDI 2013
7
How to control the idle characters (bits)?
Application
• Access to the entire stream is required
IPG
Transport
Network
Data Link
Physical
Packet i
Packet i+1
• Issue1: The PHY is simply a black box
– No interface from NIC or OS
– Valuable information is invisible (discarded)
Packet i
Packet i
Packet i+1
Packet i+1
Packet i+2
Packet i+2
• Issue2: Limited access to hardware
– We are network systems researchers
a.k.a. we like software
7/12/2016
SoNIC NSDI 2013
8
Principle #2: Software
Network Systems researchers need
software access to the physical layer
7/12/2016
SoNIC NSDI 2013
9
Precision + Software = Physics equipment???
• BiFocals [IMC’10 Freedman, Marian, Lee, Birman, Weatherspoon, Xu]
– Enabled novel network research
– Precision + Software =
Laser + Oscilloscope + Offline analysis
– Allowed precise control in software
• Limitations
– Offline (not realtime)
– Limited Buffering
– Expensive
7/12/2016
SoNIC NSDI 2013
10
Principle #3: Realtime
Network systems researchers need access
and control of the physical layer
(interpacket gap) continuously in realtime
7/12/2016
SoNIC NSDI 2013
11
Challenge
Application
• Goal: Control every bit in software in realtime
IPG
Transport
Network
Data Link
Physical
Packet i
Packet i+1
IPD
– Enable novel network research
• Challenge
– Requires unprecedented software access to the PHY
7/12/2016
SoNIC NSDI 2013
12
Outline
• Introduction
• SoNIC: Software-defined Network Interface Card
– Background: 10GbE Network Stack
– Design
• Network Research Applications
• Conclusion
7/12/2016
SoNIC NSDI 2013
13
SoNIC: Software-defined Network Interface Card
Application
• Implements the PHY in software
IPG
Transport
Network
Data Link
Physical
Packet i
Packet i+1
IPD
– Enabling control and access to every bit in realtime
– With commodity components
– Thus, enabling novel network research
• How?
– Backgrounds: 10 GbE Network stack
– Design and implementation
• Hardware & Software
• Optimizations
7/12/2016
SoNIC NSDI 2013
14
10GbE Network Stack
Application
Data
Transport
Network
Data Link
Preamble
Physical
64/66b PCS
Encode
Decode
Scrambler
Descrambler
Gearbox
Blocksync
PMA
Eth Hdr
64 bit
/S/
/D/
L3 Hdr
Data
L2 Hdr
L3 Hdr
Data
L2 Hdr
L3 Hdr
Data
Idle
characters
(/I/)
2 bit
syncheader
/D/
/D/
/D/
CRC
Gap
10.3125 Gigabits
/T/
/E/
16 bit
011010010110100101101001011010010110100101101001011010010110100101101
PMD
7/12/2016
SoNIC NSDI 2013
15
10GbE Network Stack
Application
Data
Transport
Network
SW
Data Link
Physical
64/66b PCS
Encode
Decode
Scrambler
Descrambler
Gearbox
Blocksync
PMA
PMD
7/12/2016
L3 Hdr
Data
L2 Hdr
L3 Hdr
Data
L2 Hdr
L3 Hdr
Data
Packet i
Preamble
Eth Hdr
Packet i+1
CRC
Gap
HW
/S/
/D/
/D/
/D/
Packet i
/D/
/T/
/E/
Packet i+1
011010010110100101101001011010010110100101101001011010010110100101101
Commodity NIC
SoNIC NSDI 2013
16
10GbE Network Stack
Application
Data
Application
L3 Hdr
Data
Transport
L3 Hdr
Data
Network
Data
DataCRC
Link
SW
Transport
Network
L2 Hdr
Data Link
Preamble
Eth Hdr
L2 Hdr
L3 Hdr
HW
Physical
64/66b PCS
Encode
/S/
Packet i /D/
Decode
/D/ Packet/D/
i+1
Gap
Physical
64/66b PCS
Encode /T/ Decode /E/
/D/
SW
Scrambler
Descrambler
Scrambler
Descrambler
Gearbox
Blocksync
Gearbox
Blocksync
HW
PMA
PMD
7/12/2016
011010010110100101101001011010010110100101101001011010010110100101101
PMA
SoNIC
NetFPGA
SoNIC NSDI 2013
PMD
17
SoNIC Design
Application
Data
Transport
Network
Data Link
Preamble
Eth Hdr
L3 Hdr
Data
L2 Hdr
L3 Hdr
Data
L2 Hdr
L3 Hdr
Data
CRC
Gap
Physical
64/66b PCS
Encode
/S/
Decode
/D/
/D/
/D/
/D/
/T/
/E/
SW
Scrambler
Descrambler
Gearbox
Blocksync
HW
PMA
PMD
7/12/2016
011010010110100101101001011010010110100101101001011010010110100101101
SoNIC
SoNIC NSDI 2013
18
SoNIC Design and Architecture
Application
Data
L3 Hdr
APP Data
Userspace
L3 Hdr
APP Data
Kernel
L2 Hdr TX MAC
L3 Hdr
Data
Transport
Network
L2 Hdr
Data Link
Preamble
Eth Hdr
RX MACCRC
Gap
Physical
64/66b PCS
Encode
/S/
Decode
/D/
/D/
SW
Scrambler
Descrambler
Gearbox
Blocksync
HW
PMA
PMD
7/12/2016
/D/
/D/
/T/
TX PCS
RX PCS
Gearbox
Blocksync
/E/
Hardware
011010010110100101101001011010010110100101101001011010010110100101101
Transceiver
Transceiver
SoNIC
SoNIC NSDI 2013
SFP+
19
SoNIC Design: Hardware
• To deliver every bit from/to software
Application
Transport
– High-speed transceivers
– PCIe Gen2 (=32Gbps)
Network
• Optimized DMA engine
Data Link
Physical
64/66b PCS
Encode
Decode
SW
Scrambler
Descrambler
Gearbox
Blocksync
SFP+
SFP+
FPGA
HW
PMA
PMD
7/12/2016
PCIeGen2
SoNIC NSDI 2013
20
SoNIC Design: Software
Application
Port 0
Port 1
Transport
APP
APP
Network
TX MAC
RX MAC
TX MAC
RX MAC
TX PCS
RX PCS
TX PCS
RX PCS
Data Link
Physical
64/66b PCS
Encode
Decode
SW
Scrambler
Descrambler
Gearbox
Blocksync
HW
• Dedicated Kernel Threads
– TX / RX PCS, TX / RX MAC threads
– APP thread: Interface to userspace
PMA
PMD
7/12/2016
Packet i
SoNIC NSDI 2013
Packet i+1
21
SoNIC Design: Synchronization
Application
Port 0
Transport
APP
Low-latency
FIFOs
Port 1
APP
Network
TX MAC
RX MAC
TX MAC
RX MAC
TX PCS
RX PCS
TX PCS
RX PCS
Data Link
Physical
64/66b PCS
Encode
Decode
SW
Scrambler
Descrambler
Gearbox
Blocksync
SFP+
SFP+
FPGA
Pointer-polling
No Interrupts
HW
PMA
PMD
7/12/2016
PCIeGen2
SoNIC NSDI 2013
22
SoNIC Design: Optimizations
Application
• Scrambler
Transport
Naïve Implementation
Network
s  state
d  data
for i = 0 63 do
in  (d >> i) & 1
out  (in Å(s >> 38)Å(s >> 57))&1
s  (s << 1) | out
r  r | (out << i)
state  s
end for
Data Link
Physical
64/66b PCS
Encode
Decode
Scrambler
Descrambler
G( x)  x58  x39 1
0.436 Gbps
Gearbox
Blocksync
PMA
PMD
7/12/2016
Optimized Implementation
s  state
d  data
r  (s >> 6)Å (s >> 25) Å d
r  r Å(r << 39) Å (r << 58)
state  r
21 Gbps
• CRC computation
• DMA engine
SoNIC NSDI 2013
23
SoNIC Design: Interface and Control
• Hardware control: ioctl syscall
• I/O : character device interface
• Sample C code for packet generation and capture
1: #include "sonic.h"
2:
3: struct sonic_pkt_gen_info info = {
4: .mode = 0,
5: .pkt_num = 1000000000UL,
6: .pkt_len = 1518,
7: .mac_src = "00:11:22:33:44:55",
8: .mac_dst = "aa:bb:cc:dd:ee:ff",
9: .ip_src = "192.168.0.1",
10: .ip_dst = "192.168.0.2",
11: .port_src = 5000,
12: .port_dst = 5000,
13: .idle = 12,
14: };
15:
16: /* OPEN DEVICE*/
17: fd1 = open(SONIC_CONTROL_PATH, O_RDWR);
18: fd2 = open(SONIC_PORT1_PATH, O_RDONLY);
7/12/2016
19: /* CONFIG SONIC CARD FOR PACKET GEN*/
20: ioctl(fd1, SONIC_IOC_RESET)
21: ioctl(fd1, SONIC_IOC_SET_MODE, PKT_GEN_CAP)
22: ioctl(fd1, SONIC_IOC_PORT0_INFO_SET, &info)
23
24: /* START EXPERIMENT*/
25: ioctl(fd1, SONIC_IOC_START)
26: // wait till experiment finishes
27: ioctl(fd1, SONIC_IOC_STOP)
28:
29: /* CAPTURE PACKET */
30: while ((ret = read(fd2, buf, 65536)) > 0) {
31: // process data
32: }
33:
34: close(fd1);
35: close(fd2);
SoNIC
24
Outline
• Introduction
• SoNIC: Software-defined Network Interface Card
• Network Research Applications
– Packet Generation
– Packet Capture
– Covert timing channel
• Conclusion
7/12/2016
SoNIC NSDI 2013
25
Network Research Applications
Application
• Interpacket delays and gaps
IPG
Transport
Packet i
Network
Packet i+1
IPD
Data Link
Physical
Packet
Generation
7/12/2016
Detecting
timing channel
SoNIC NSDI 2013
Packet Capture
26
Packet Generation and Capture
• Basic functions for network research
– Generation: SoNIC allows control of IPGs in # of /I/s
– Capture: SoNIC captures what was sent with IPGs in bits
APP
APP
TX MAC
RX MAC
TX MAC
RX MAC
TX PCS
RX PCS
TX PCS
RX PCS
1518B
1518B
1518B
1518B
1518B
9Gbps, IPD =13992 bits (1357ns)
7/12/2016
SoNIC NSDI 2013
27
Packet Generation
• SoNIC allows precise
control
of IPGs
CDF of
generated
IPDs
1
SoNIC
CDF
0.8
Specialized NIC
Higher variance
0.6
APP
TX MAC
RX MAC
TX PCS
RX PCS
0.4
1518B
APP
SoNIC
Zero variance!!!
0.2
0
1000
Sniffer 10G
1500
1518B
2000
2500
Interpacket delays (ns)
1518B
1518B
TX MAC
RX MAC
TX PCS
RX PCS
3000
1518B
9Gbps, IPD =13992 bits (1357ns)
7/12/2016
SoNIC NSDI 2013
28
Packet Capture
• SoNIC captures what
is sent
CDF of
captured IPDs
1
CDF
0.8
SoNIC
Kernel
Userspace
Sniffer 10G
0.6
APP
TX MAC
RX MAC
TX PCS
RX PCS
0.4
0.2
APP
TX MAC
RX MAC
TX PCS
RX PCS
0
0
1518B
1000
1518B
2000
3000
4000
Interpacket delays (ns)
1518B
1518B
5000
1518B
9Gbps, IPD =13992 bits (1357ns)
7/12/2016
SoNIC NSDI 2013
29
Covert Timing Channel
• Embedding signals into interpacket gaps.
– Large gap: ‘1’
– Small gap: ‘0’
Packet i
Packet i
Packet i+1
Packet i+1
• Covert timing channel by modulating IPGs at 100ns
APP
TX MAC
RX MAC
TX PCS
RX PCS
7/12/2016
• Overt channel at 3 Gbps
• Covert channel at 250 kbps
• Over 4-hops with < 1% BER
SoNIC NSDI 2013
APP
TX MAC
RX MAC
TX PCS
RX PCS
30
Covert Timing Channel
• Modulating IPGS at 100ns scale (=128 /I/s)
1
SoNIC
Kernel
CDF
0.8
APP
3562 /I/s
3562 - 128 /I/s
3562 + 128 /I/s
BER = 0.37%
0.6
APP
0.4
TX MAC
RX MAC
TX PCS
RX PCS
0.2
‘1’
‘0’
TX MAC
RX MAC
TX PCS
RX PCS
0
500
1500
2500
‘1’: 3562 + 128 /I/s
‘0’: 3562 – 128 /I/s
7/12/2016
3500
Interpacket delays (ns)
4500
‘1’: 3562 + a /I/s
‘0’: 3562 – a /I/s
SoNIC NSDI 2013
31
Contributions
• Network Research
– Unprecedented access to the PHY with commodity hardware
– A platform for cross-network-layer research
– Can improve network research applications
• Engineering
–
–
–
–
Precise control of interpacket gaps (delays)
Design and implementation of the PHY in software
Novel scalable hardware design
Optimizations / Parallelism
• Status
– Measurements in large scale: DCN, GENI, 40 GbE
7/12/2016
SoNIC NSDI 2013
32
Conclusion
• Precise Realtime Software Access to the PHY
• Commodity components
– An FPGA development board, Intel architecture
• Network applications
– Network measurements
– Network characterization
– Network steganography
• Webpage: http://sonic.cs.cornell.edu
– SoNIC is available Open Source.
7/12/2016
SoNIC NSDI 2013
33
Before Next time
• Project Interim report
– Due Monday, November 24.
– And meet with groups, TA, and professor
• Fractus Upgrade: Should be back online
• Required review and reading for Friday, November 14
– Timing is Everything: Accurate, Minimum Overhead, Available Bandwidth
Estimation in High-speed Wired Networks, H. Wang, K. Lee, E. Li, C. L. Lim, A.
Tang, and H. Weatherspoon. ACM SIGCOMM Internet Measurement
Conference (IMC), November 2014.
– http://conferences2.sigcomm.org/imc/2014/papers/p407.pdf
• Check piazza: http://piazza.com/cornell/fall2014/cs5413
• Check website for updated schedule
Download