Assignment 1: Network Emulation of TCP CUBIC and SACK (Draft #1)
Naeem Khademi
Networks and Distributed Systems Group, Department of Informatics, University of Oslo, Norway
naeemk@ifi.uio.no

ABSTRACT
The goal of this report is to experimentally evaluate the performance of two TCP congestion control mechanisms, namely SACK and CUBIC, as implemented in the Linux kernel TCP suite, using network emulation techniques. The performance of these TCP variants has been evaluated using real-life measurements under different parameter settings, namely varying bottleneck buffer sizes and numbers of concurrent download flows.

Keywords
TCP, Measurement, Network emulation

1. INTRODUCTION
TCP has been the dominant reliable data transfer protocol on the Internet for over a decade and is expected to remain so in the future. One of the most challenging issues for researchers has been to maximize TCP's performance (e.g., goodput) under the different scenarios encountered on the Internet. Several congestion control mechanisms have been proposed to achieve this goal, with CUBIC currently being the default congestion control mechanism in the Linux kernel. The TCP suite implemented in the Linux kernel also provides the functionality to select congestion control mechanisms other than the default, and their source code is openly available. This gives researchers the opportunity to use passive measurements to evaluate the performance of these mechanisms under varying conditions and to enhance or modify the existing congestion control mechanisms to optimize performance.

Buffer size plays an important role in determining TCP's performance. The common assumption about the buffer sizing requirement for a single TCP flow follows the rule-of-thumb which identifies the required buffer size at the bottleneck as the bandwidth×delay product of the flow's path. This amount of buffering is associated with the sawtooth behavior of the TCP congestion window (cwnd) in NewReno or SACK. TCP is designed to saturate the maximum available bandwidth over the long term, and it eventually fills up the total buffer space provided along the path. Larger buffer sizes introduce higher queueing delays and are therefore not favorable for delay-sensitive applications; they also increase TCP's round-trip time (RTT). It has been shown that, under a realistic traffic scenario, such as when multiple TCP flows coexist along the path between sender and receiver, the cwnd sawtooths of desynchronized TCP flows cancel each other out and yield an almost flat aggregate cwnd, easing the buffer sizing requirement to a value smaller than the bandwidth×delay product. Research has shown that this value can be scaled down by the square root of the number of flows. This phenomenon facilitates building core routers at the Internet backbone with smaller buffers that provide almost the same utilization level while requiring fewer RAM chips, lowering the products' cost.

In this paper, we study the impact of different buffer sizes at the bottleneck router (here, a network emulator node), jointly with various numbers of coexisting flows between a sender and a receiver, for both TCP CUBIC and SACK. We study how these settings affect TCP's throughput and RTT and compare the behavior of CUBIC and SACK in these respects.

The rest of this paper is organized as follows: Section 2 presents the experimental setup and methodology used in the measurements. Section 3 demonstrates the measurement results and evaluates the performance of TCP CUBIC and SACK under various parameter settings, and finally Section 4 discusses the presented results and concludes the paper.
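As a rough reference point for the buffer sizes considered later, the rule-of-thumb and its square-root refinement can be evaluated for parameters similar to this testbed's. The sketch below is purely illustrative: the 10 Mbps bottleneck rate, the roughly 100 ms base RTT and the 1500-byte packet size are assumptions taken from the setup described in Section 2, and the printed figures are estimates, not measurement results.

# Illustrative buffer-sizing estimates (not measurement results):
# rule-of-thumb B = C * RTT, and the refinement B = C * RTT / sqrt(n).
# Assumed values: 10 Mbps bottleneck, ~100 ms base RTT, 1500-byte packets.
from math import sqrt

C_bps = 10e6          # bottleneck capacity in bits per second (assumed)
rtt_s = 0.100         # base round-trip time in seconds (assumed)
pkt_bytes = 1500      # MTU-sized packets, as used in the experiments

bdp_bytes = C_bps * rtt_s / 8
print(f"BDP: {bdp_bytes:.0f} bytes ~= {bdp_bytes / pkt_bytes:.0f} packets")

for n in (1, 2, 5, 10, 20):
    b = bdp_bytes / sqrt(n)
    print(f"n={n:2d} flows: {b:8.0f} bytes ~= {b / pkt_bytes:4.1f} packets")

Under these assumptions the single-flow rule-of-thumb corresponds to roughly 83 MTU-sized packets, while for 20 desynchronized flows the estimate drops to roughly 19 packets, which puts the 20-100 packet buffer range examined in the experiments into perspective.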
2. EXPERIMENTAL METHODOLOGY
This section describes the experimental setup and methodology used in this paper. To passively measure the performance of different TCP variants, we set up a testbed using three computers acting as sender, receiver and an emulator/router node in between. Each of the TCP peers is physically connected to the emulator node with a 100 Mbps wired link (however, we have used an emulation tool to limit the available bandwidth to 10 Mbps). Figure 1 is a schematic diagram of the network topology used in the tests. We have used Emulab to set up this testbed and conduct our experiments. Emulab is an open network testbed at the University of Utah which provides a platform for researchers to conduct real-life network experiments (Figure 2). The specifications of this testbed are presented in Table 1. The PCs employed for the measurements are located in clusters of nodes stacked in a network lab environment.

[Figure 1: Network topology -- node A (sender) -- 10 Mbps -- emulator node (delay=100 ms) -- 10 Mbps -- node B (receiver)]
[Figure 2: Emulab testbed]
[Figure 3: Aggregate throughput (b/s) of parallel flows vs. number of parallel flows (buffer size=20 pkts), CUBIC and SACK]

Table 1: Test-bed setup
  Test-bed                        Emulab
  PC                              Pentium III 600 MHz
  Memory                          256 MB
  Operating System                Fedora Core 4
  Linux kernel                    2.6.18.6
  Default interface queue size    1000 pkts
  Number of nodes                 3

Each experiment was repeated for 10 runs, each lasting 50 seconds, with a 20-second gap between consecutive runs. The presented results are averages over all runs. The iperf traffic generation tool was used to generate the TCP traffic in each test. TCP's send and receive buffers were set large enough (256 KB) to allow cwnd to grow without restriction. The Maximum Segment Size (MSS) was set to 1448 bytes (MTU=1500 bytes). Two performance metrics are measured: 1) TCP throughput and 2) round-trip time (RTT). To gather traffic statistics, we used tcpdump to monitor the traffic traversing the path between sender and receiver. To calculate the throughput and the average RTT, we employed tcptrace, a tool which parses dumped traffic files in pcap format and reports various statistics about the TCP flows.

Each experiment consists of 50 seconds of iperf TCP traffic from node A to node B. The number of coexisting flows between these nodes varies from 1 to 20. The network emulator's role (as the intermediate node) is to impose a fixed delay on each arriving TCP packet (100 ms) and to limit the maximum available bandwidth on the link to 10 Mbps. In addition, the emulator buffers the arriving TCP packets in its ingress and egress buffers, both of equal size (ranging from 20 to 100 packets). To provide a fine-grained distribution of the packets serviced by the emulator node among the different TCP flows, we have limited the maximum burst size of a TCP flow at the emulator node to 10 consecutive packets. The TCP congestion control mechanism in the Linux TCP suite can be changed using the /proc file system or the sysctl command.
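To make the procedure concrete, a minimal sketch of one run is given below. This is an illustration only, not the scripts used for the measurements: the receiver name ("nodeB"), the capture interface ("eth0") and the file names are hypothetical, it assumes an iperf (version 2) server is already running on the receiver, and it must run with sufficient privileges to write to /proc and to invoke tcpdump. The resulting pcap files are then analyzed offline with tcptrace.

# Minimal sketch of one experiment run (illustrative, not the authors' scripts).
# Hypothetical names: "nodeB" (receiver running "iperf -s"), "eth0" (interface).
import subprocess, time

def set_congestion_control(name):
    # Same effect as: sysctl -w net.ipv4.tcp_congestion_control=<name>
    with open("/proc/sys/net/ipv4/tcp_congestion_control", "w") as f:
        f.write(name)

def run_once(cc, n_flows, duration=50, pcap="run.pcap"):
    # Select the congestion control for this run. "cubic" is a standard kernel
    # keyword; the SACK variant in this paper corresponds to standard ("reno")
    # congestion control with SACK enabled (net.ipv4.tcp_sack).
    set_congestion_control(cc)
    # Capture the run's traffic for later analysis with tcptrace.
    dump = subprocess.Popen(["tcpdump", "-i", "eth0", "-w", pcap, "host", "nodeB"])
    try:
        # n_flows parallel TCP flows for `duration` seconds, 256 KB socket buffers.
        subprocess.run(["iperf", "-c", "nodeB", "-t", str(duration),
                        "-P", str(n_flows), "-w", "256K"], check=True)
    finally:
        dump.terminate()

if __name__ == "__main__":
    for cc in ("cubic", "reno"):
        for n in (1, 2, 5, 10, 20):
            run_once(cc, n, pcap=f"{cc}-{n}flows.pcap")
            time.sleep(20)   # 20-second gap between consecutive runs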
3. MEASUREMENT RESULTS
This section presents the performance measurement results of TCP CUBIC and SACK for different buffer sizes and various numbers of parallel flows.

3.1 The Impact of the Number of Parallel TCP Flows
Figure 3 shows the impact of the number of coexisting flows on the overall TCP performance when the buffer size at the bottleneck node is 20 packets. We can observe that as the number of concurrent TCP flows grows from 1 to 20, the aggregate throughput of the system increases from 2.5 and 2.9 Mbps to 4.5 and 4.9 Mbps for SACK and CUBIC respectively, with CUBIC achieving a slightly higher TCP throughput than SACK. This almost two-fold increase in throughput is due to the fact that, as the number of coexisting flows increases, the flows become more desynchronized over time, providing a better utilization level than the single-flow scenario. In this scenario, CUBIC's performance stands at a higher level than SACK's due to the aggressive nature of CUBIC's cwnd growth (a cubic function of the time elapsed since the last congestion event).

Figure 4 demonstrates the impact of various numbers of coexisting TCP flows on the average RTT of all flows. The TCP flows' average RTT remains almost fixed at a minimal level across the different numbers of parallel flows when the bottleneck's buffer size is set to the small value of 20 packets, for both CUBIC and SACK. This is because small buffer sizes incur very little queuing delay, and packet drop events become more frequent instead.

[Figure 4: RTT (ms) vs. number of parallel flows, for CUBIC and SACK with buffer sizes of 20, 50 and 100 packets]
[Figure 5: Aggregate throughput (b/s) vs. buffer size (packets), for CUBIC and SACK with 1, 2, 5, 10 and 20 flows]
[Figure 6: RTT (ms) vs. buffer size (packets), for CUBIC and SACK with 1, 2, 5, 10 and 20 flows]

However, the difference between CUBIC's and SACK's RTT becomes more significant as the buffer size grows, with CUBIC's RTT being higher than SACK's across the various numbers of parallel flows. This is due to the aggressive behavior of CUBIC after a loss event, most probably caused by a buffer overflow, which leads to a higher number of TCP packets sitting in the bottleneck buffer at any instant of time and therefore increases the queuing delay. In contrast, SACK performs conservatively by halving the cwnd (similar to NewReno), so fewer packets traverse the path, the bottleneck buffer drains after a loss event, and consequently the queuing delay is smaller. (The two growth patterns are sketched below.)
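To make the contrast above concrete, the sketch below evaluates the idealized post-loss window growth of the two schemes: the CUBIC curve follows the RFC 8312 formulation (C = 0.4, beta = 0.7), and the SACK/NewReno-style curve halves the window and then grows by one MSS per RTT. The constants and implementation details in the kernel studied here (2.6.18) differ, and real CUBIC additionally bounds itself by a Reno-friendly estimate, so this illustrates the shapes only, not the implementation.

# Idealized post-loss window growth: CUBIC (RFC 8312 shape) vs. Reno/SACK-style.
# Illustrative constants only; the Linux implementation differs in detail.

C = 0.4        # CUBIC scaling constant (RFC 8312)
BETA = 0.7     # CUBIC multiplicative decrease factor (RFC 8312)

def cubic_window(t, w_max):
    """Window (in MSS) t seconds after a loss; starts from BETA * w_max."""
    k = ((w_max * (1.0 - BETA)) / C) ** (1.0 / 3.0)  # time to regain w_max
    return C * (t - k) ** 3 + w_max

def reno_window(t, w_max, rtt=0.1):
    """Reno/SACK-style: halve to w_max/2, then grow by one MSS per RTT."""
    return w_max / 2.0 + t / rtt

if __name__ == "__main__":
    w_max = 100.0  # window (in MSS) at the last loss event (assumed)
    for t in (0.0, 1.0, 2.0, 3.0, 4.0):
        print(f"t={t:3.1f}s  cubic={cubic_window(t, w_max):6.1f}  "
              f"reno={reno_window(t, w_max):6.1f}")

The relevant point for the RTT results is that CUBIC resumes from about 0.7 of the pre-loss window rather than 0.5 and climbs back toward it quickly, so it tends to keep more packets queued at the bottleneck than SACK after each loss.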
3.2 The Impact of Buffer Sizing
Figure 5 shows the impact of buffer sizing on the aggregate TCP throughput of the system (this graph will be explained in the next draft/presentation). Figure 6 demonstrates the average RTT of the TCP flows for varying buffer sizes (this graph will be explained in the next draft/presentation).

4. CONCLUSIVE REMARKS
In this paper, we evaluated the performance of two congestion control mechanisms, namely CUBIC and SACK, using passive measurements and emulation techniques under various scenarios. The impact of buffer sizing at the bottleneck on the throughput and RTT of these TCP variants has been studied. Furthermore, the impact of the number of coexisting TCP flows has been studied as well. While a higher number of parallel flows leads to an increase in aggregate throughput for both CUBIC and SACK (with CUBIC performing slightly better most of the time), it also increases the average RTT of CUBIC compared to SACK. Considering various buffer sizes, an increase in buffer size leads to an increase in throughput and RTT for both SACK and CUBIC. However, TCP throughput remains constant after a certain threshold while the queuing delay (and therefore the RTT) keeps increasing; in this case, CUBIC is more exposed to the increase in RTT.

This draft will be modified accordingly to include more results/figures. We are already continuing the experiments for buffer sizes larger than 100 packets. Each test is repeated for 10 runs in Emulab, and the final averaged results will be presented with 95% confidence intervals.