Network Research and Linux at the Hamilton Institute, NUIM.
David Malone
4 November 2006

What has TCP ever done for Us?
• Demuxes applications (using port numbers).
• Makes sure lost data is retransmitted.
• Delivers data to application in order.
• Engages in congestion control.
• Allows OOB data.
• Some weird stuff with TCP options.

Standard Picture of TCP
[Figure: two packet-exchange diagrams. Connector and Listener: SYN, SYN+ACK, ACK, Data, ACKs, FIN, FIN. Connector and No Listener: SYN, RST. Connector side: High Port (Usually). Listener side: Known Port (Usually).]

Other Views of TCP
[Figure: Programmer's View (Data In, Socket, Magic, Socket, Data Out) and Network View (Packets In, Packets Out at link rate).]

Congestion Control
• TCP controls number of packets in network.
• Packets are acknowledged, so flow of ACKs.
• Receiver advertises window to avoid overflow.
• Congestion window tries to adapt to network.
• Slow start to roughly find capacity.
• Congestion avoidance gradually adapts.

The Congestion Window
[Figure: congestion window w_i against Time (RTT) over the k'th congestion epoch, marking w_i(k), w_i(k+1), the times t_a(k), t_b(k), t_c(k) and the k'th congestion event.]
• Additive increase, multiplicative decrease (AIMD).
• To fill link need to reach BW × τ.
• Backoff by 1/2 implies buffer is BW × τ.
• Fairness, responsiveness, stability, . . .

TCP On Linux
• Network stack buffers in-flight data.
• Socket buffer must be BW × τ.
• /proc/sys/net/core/{r,w}mem_max → sockbuf sizes.
• /proc/sys/net/ipv4/tcp_{r,w,}mem → min/def/max tcp window.
• Trade off: memory is wired, so valuable.
• Defaults have recently been increased.

Research Work
• Packet loss not caused by congestion.
• Filling big BW × τ product a packet at a time.
• Bad for long-distance high-bandwidth links.
• Various solutions in pipeline (BIC, Scalable, High-Speed, FAST, H-TCP).
• Pluggable congestion control in Linux (behind TCP_CONG_ADVANCED).
• /proc/sys/net/ipv4/tcp_congestion_control
• Working on other congestion detection techniques.
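
To make the "packet at a time" problem concrete, here is a minimal sketch of the arithmetic behind the AIMD bullets: it computes BW × τ in packets and counts how many RTTs of additive increase are needed to climb back after a backoff by 1/2. The function names, the 1500-byte packet size and the two example links are illustrative assumptions, not figures from the talk, and the model ignores slow start, queueing and loss.

```python
def bdp_packets(bandwidth_bps, rtt_s, packet_bytes=1500):
    """Bandwidth-delay product BW x tau, expressed in packets.

    packet_bytes=1500 is an assumed packet size, not a value from the slides.
    """
    return bandwidth_bps * rtt_s / (8 * packet_bytes)


def rtts_to_refill(bdp, backoff=0.5, increase=1.0):
    """Count RTTs of additive increase needed to climb from backoff*bdp to bdp."""
    cwnd = backoff * bdp  # multiplicative decrease after a congestion event
    rtts = 0
    while cwnd < bdp:
        cwnd += increase  # additive increase: one packet per RTT
        rtts += 1
    return rtts


if __name__ == "__main__":
    # Hypothetical links: (name, bandwidth in bit/s, RTT in seconds).
    links = [("LAN", 100e6, 0.001), ("long fat pipe", 1e9, 0.2)]
    for name, bw, rtt in links:
        bdp = bdp_packets(bw, rtt)
        n = rtts_to_refill(bdp)
        print(f"{name}: BDP ~ {bdp:.0f} packets, "
              f"{n} RTTs ({n * rtt:.3f} s) to recover after one backoff")
```

Recovery time grows with both bandwidth and RTT, which is why standard AIMD is a poor fit for long-distance high-bandwidth links.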

Throughput
[Figure: two panels of Throughput (KBytes) against time (sec), each showing Throughput Flow 1 and Throughput Flow 2.]

Cwnd
[Figure: two panels of Cwnd (packets) against time (sec), each showing Cwnd flow 1 and Cwnd flow 2.]

Practical Stuff
• Other issues at play, such as implementation quality.
• For example ACK processing and queueing problems.
• Testing is important: land speed records.
• Project with OSDL to build validation suite.

Before
[Figure: left panel, Segments (cwnd, ssthresh) and Bytes (sequence) against time (seconds), with series snd_cwnd, snd_nxt and throttle. Right panel, Distribution of slow path against microsecond.]

After
[Figure: left panel, Segments (cwnd, ssthresh) and Bytes (sequence) against time (seconds), with series snd_nxt, snd_una, snd_cwnd, snd_ssthresh, qlen and throttle. Right panel, Distribution of slow path against microsecond.]

802.11(e) MAC Summary
• After TX choose rand(0, CW − 1).
• Wait until medium idle for DIFS (50µs).
• While idle count down in slots (20µs).
• TX when counter gets to 0, ACK after SIFS (10µs).
• If ACK then CW = CWmin else CW *= 2.
Ideally produces even distribution of packet TX.
In 11e have multiple queues. Each has own CWmin, DIFS (aka AIFS) and can have TXOP.

Testbed setup
Multiple STA (Linux) connected to AP (Linux hostap).

Hardware    Model
1× AP       Desktop PC
18× STA     Soekris
1× STA      Desktop PC
WLAN NIC    Atheros AR5212

External antenna, PCI interface, Madwifi driver with local patches for 11e parameter setting.
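
The backoff rules on the 802.11(e) MAC Summary slide can be sketched as a small slotted simulation of saturated stations. This is a simplified illustration, not the model used in the talk: each loop iteration stands for either one idle slot or one transmission attempt, DIFS/SIFS timing is ignored, and the cap `CW_MAX` is an assumed value since the slides do not give one. The name `simulate` and its parameters are hypothetical.

```python
import random

CW_MIN = 32    # default CWmin quoted in the talk's validation plots
CW_MAX = 1024  # assumed cap on CW; not stated in the slides


def simulate(n_stations, n_steps, seed=0):
    """Return per-station counts of successful transmissions.

    Every station is saturated (always has a packet to send).
    """
    rng = random.Random(seed)
    cw = [CW_MIN] * n_stations
    # After TX choose rand(0, CW - 1); also used for the initial draw.
    counter = [rng.randrange(cw[i]) for i in range(n_stations)]
    successes = [0] * n_stations

    for _ in range(n_steps):
        ready = [i for i in range(n_stations) if counter[i] == 0]
        if not ready:
            # Medium idle: every station counts down one slot.
            counter = [c - 1 for c in counter]
            continue
        if len(ready) == 1:
            # Exactly one transmitter: ACK received, reset CW.
            successes[ready[0]] += 1
            cw[ready[0]] = CW_MIN
        else:
            # Collision: no ACK, so each colliding station doubles CW.
            for i in ready:
                cw[i] = min(cw[i] * 2, CW_MAX)
        # Transmitters draw a new backoff; the others keep frozen counters.
        for i in ready:
            counter[i] = rng.randrange(cw[i])
    return successes


if __name__ == "__main__":
    print(simulate(n_stations=5, n_steps=100_000))
```

With identical parameters on every station the per-station counts come out roughly even, which is the "even distribution of packet TX" the slide describes. Giving stations different `CW_MIN` values would be one way to mimic 11e prioritisation in this sketch.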

Validation
[Figure: three panels comparing Analytical Model and Experimental Results. Relative Throughput against TXOP parameter (TXOP in unit of us). Relative Throughput against AIFS parameter (AIFS in unit of 20us timeslot). Relative Throughput against CWmin (CWmin Shift, 0 corresponds to the default CWmin 32).]
Measure relative performance of two saturated flows while varying TXOP, AIFS and CWmin. Compare to well-known models.

Before
[Figure: Throughput (kbits/sec) per STA id (1 to 12). Left panel: 12 TCP Uploads to AP for 180 sec. Right panel: 6 Uploads and 6 Downloads for 180 sec.]

After
[Figure: Throughput (kbits/sec) per STA id (1 to 12). Left panel: 12 TCP Uploads to AP for 180 sec. Right panel: 6 Uploads and 6 Downloads.]

Unprioritised Voice
[Figure: left panel, Throughput (kbps) against Number of competing stations, with series Unprioritised Station, Ideal Throughput and 90% Throughput. Right panel, Average Delay (seconds x 10^-6) against Number of competing stations, with series Unprioritised Station and Delay bound for queue stability.]

Measuring Delay
• Want to measure one-way MAC delay.
• NTP slow and insufficiently accurate.
• Simultaneously observable TX better, largish noise.
[Figure: Offset (s) against Time (s) for station 1, station 2, station 3 and station 4.]

Delay Technique
• Transmission not complete until MAC ACK.
• Hardware supports interrupt after ACK.
1. Driver notes enqueue time.
2. Hardware contends until ACK received.
3. Hardware interrupts driver.
4. Driver notes completion time.
[Figure: path from Interface TX Queue through Driver and Driver TX Descriptor Queue to Hardware, marking Packet transmitted and ACK received.]

Validation
[Figure: Median delay (seconds x 10^-6) against Packet size (bytes, excluding headers). Long preamble fit line: 11.0012Mbps x + 484.628us. Short preamble fit line: 11.0016Mbps x + 292.565us.]

[Figure: four histograms of Fraction of Packets against Delay (seconds x 10^-6).]

AIFS Impact
[Figure: left panel, Throughput (kbps) against Number of competing stations, with series Unprioritised Station, Competitors AIFS = 4, Competitors AIFS = 6, Ideal Throughput and 90% Throughput. Right panel, Average Delay (seconds x 10^-6) against Number of competing stations, with series Unprioritised Station, Competitors AIFS = 4, Competitors AIFS = 6 and Delay bound for queue stability.]
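
As a worked example of the fit lines quoted in the median-delay validation plot, the sketch below evaluates them for a given packet size. It assumes the fits read as "payload transmitted at the fitted rate plus a fixed per-packet overhead", which is an interpretation of the plot legend rather than something the slides state. The function and constant names are hypothetical.

```python
def fitted_delay_us(payload_bytes, rate_mbps, overhead_us):
    """Delay in microseconds under the assumed linear model.

    payload bits divided by a rate in Mbit/s gives microseconds directly.
    """
    return payload_bytes * 8 / rate_mbps + overhead_us


# (rate in Mbps, fixed overhead in microseconds), copied from the fit lines.
LONG_PREAMBLE = (11.0012, 484.628)
SHORT_PREAMBLE = (11.0016, 292.565)

if __name__ == "__main__":
    for size in (100, 500, 1000, 1400):
        long_us = fitted_delay_us(size, *LONG_PREAMBLE)
        short_us = fitted_delay_us(size, *SHORT_PREAMBLE)
        print(f"{size:5d} bytes: long {long_us:7.1f} us, short {short_us:7.1f} us")
```

Under this reading the slope matches the nominal 11 Mbps rate almost exactly, and the difference between the two intercepts is the cost of the long preamble relative to the short one.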