Loss Synchronization of TCP Connections at a Shared Bottleneck Link

Chakchai So-In (cs5@cec.wustl.edu)
Department of Computer Science and Engineering
Washington University in St. Louis
Campus Box 1045, One Brookings Drive, St. Louis, MO 63130-4899
May 9th, 2006

Abstract

In this project, we study the degree of loss synchronization between TCP connections sharing a bottleneck network link, i.e., the fraction of the connections that lose a packet during a congestion instance. Loss synchronization determines the variability of the total TCP load on the link and therefore affects the loss, delay, and throughput characteristics of network applications. We conduct our investigation in the Open Network Laboratory (ONL) [Turner et al., 2005], a network testbed maintained by the Applied Research Laboratory at Washington University in St. Louis. To measure loss synchronization, we develop an ONL plugin that collects packet-level statistics (arrival time, size, flow affiliation, sequence number) at a router and communicates the recorded data to an end host for generating a log file. The statistics plugin and log-generating software can also be used in other ONL studies that require knowledge of packet-level dynamics at routers. Control parameters in our investigation of loss synchronization include the types and intensity of traffic, network topology, link capacities, propagation delays, and router buffer sizes. We report the loss rates, queue sizes, degree of TCP fairness, and link utilizations observed in our experiments.

1. Introduction

Apart from keeping the loss rate low, the cost of transmission media makes high utilization of the bottleneck link a main goal of any congestion control scheme. Many approaches have been proposed to achieve this goal, such as modifying router functionality or tuning transmission protocol parameters. One practical solution is to increase the size of the router buffer; however, unlimited router buffer sizes are not feasible. Also, increasing router buffer sizes tends to increase queuing delays, so we still face the problem of defining an optimal router buffer size.

Given the limits on router buffer sizes, the next issue is to choose a queue scheduling discipline that decides which packet to drop at the router. The First In First Out (FIFO) or Drop Tail queue is one of three traditional queue scheduling algorithms used to drop packets when a router queue overflows. This simple technique is widely used in the Internet; however, it often introduces total synchronization, in which packets are dropped from several connections within a single congestion instance. A second technique, the Random Drop queue, randomly chooses a packet from the queue to drop when a packet arrives and the queue is full. A third technique is the Random Early Drop queue: if the queue length exceeds a drop level, the router drops each arriving packet with a fixed drop probability. The big advantage of this technique is that it can reduce total synchronization.

TCP flows, such as those generated by HTTP, DNS, and FTP applications, consume more than 80 percent of network traffic [NLANR, 1996] and have their own problems with total synchronization. TCP sends as many packets as possible until it detects a lost packet, at which point it reduces its transmission rate.
Like many congestion control protocols, TCP uses packet loss as an indication of congestion. If losses are synchronized, TCP flows with identical RTTs sharing a bottleneck receive the loss indications at around the same time and decrease their transmission rates at around the same time. This leads to underutilization of the output link (Figure 1a). As a result, if loss synchronization can be prevented, the bandwidth can be used more efficiently (Figure 1b).

Figure 1: TCP synchronization and de-synchronization behavior. (a) Synchronization of TCP behavior; (b) de-synchronization of TCP behavior.

In this project, we investigate the loss synchronization of TCP connections in the ONL (Open Network Laboratory) testbed, emulating a single-bottleneck dumbbell topology. Compared to the Internet, the ONL network provides a more controlled environment using dedicated routers, transmission lines, and computer hosts. Parameters examined in our evaluation include traffic load, link delays, and router buffer sizes. We also report loss rates, link utilizations, and the degree of TCP fairness.

In section 2, the simulation scenario and TCP parameters are described; we also explain the "rstat555" plugin design, features, and setup. Section 3 shows the results of each experiment, where the main parameter is the number of TCP flows. Section 4 shows further experimental results when the number of flows is fixed at 50 connections but the buffer sizes, delays, and bottleneck link bandwidths differ. Several recently proposed router buffer sizing models [Appenzeller et al., 2004], [Dhamdhere et al., 2005], and [Le et al., 2005] are evaluated in section 5, and finally we draw conclusions and recommend a suitable model for sizing router buffers.

2. Simulation Methodology

2.1 Simulation Parameters

We perform the experiments on the ONL testbed. In general, N TCP flows (S1…SN) are transmitted through a shared bottleneck link with capacity L to receivers (R1…RN), as shown in Figure 2.1a. Figure 2.1b shows the ONL setup that simulates the topology of Figure 2.1a.

Figure 2.1: Simulation setup. (a) Simulation topology; (b) ONL network setup.

In our experiments, we use iperf [Tirumala et al., 2003] as a TCP traffic generator running on top of Linux TCP [Sarolahti and Kuznetsov, 2002] because of the popularity and availability of the Linux operating system; however, we disable the SACK and timestamp options. Generally, iperf generates many long-lived TCP flows for a fixed period of time (a sketch of a typical invocation is shown below, just before Table 2.1). The FIFO queue scheduling discipline is configured at the router. Table 2.1 shows the other parameters. Because the experiments run on a real network, there are hardware limitations; for example, the start time of each flow is not exactly the same.

To simulate the topology of Figure 2.1a, Figure 2.1b uses a loop-back design in which a number of flows originate from a single host (n1p3) with multiple sessions and are forwarded to a receiver (n1p2). TCP flows are transmitted from n1p3 to n1p2, but instead of forwarding directly from port 3 to port 2, we make the router forward all packets from port 3 to port 5. Due to the loop-back wiring, all packets then re-enter the router at port 6, and finally the routing is set to send them directly to port 2 (n1p2). Acknowledgement packets are transmitted along the opposite path.
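The long-lived flows described above are generated with iperf. The following is a minimal sketch of how such a run might be launched from the sending host; the host name, flow count, and option set are illustrative assumptions based on Table 2.1 rather than the exact commands used in our experiments, and the options shown (-P, -t, -w, -M) should be checked against the iperf version installed on the ONL hosts.

```python
# Sketch: launching N parallel long-lived iperf TCP flows from the sending host.
# Assumptions: "n1p2" runs "iperf -s -w 64K"; options match a typical iperf build.
import subprocess

RECEIVER = "n1p2"      # assumed hostname of the receiving ONL host
NUM_FLOWS = 50         # number of long-lived TCP flows
DURATION_S = 20        # simulation length in seconds (Table 2.1)
WINDOW = "64K"         # maximum window of 64 kilobytes (Table 2.1)
MSS = 536              # 576-byte packets minus 40 bytes of IP/TCP headers

cmd = ["iperf", "-c", RECEIVER,
       "-P", str(NUM_FLOWS),     # parallel TCP connections
       "-t", str(DURATION_S),    # run length
       "-w", WINDOW,             # socket buffer / maximum window
       "-M", str(MSS)]           # request an MSS that yields 576-byte packets
subprocess.run(cmd, check=True)
```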
Table 2.1: Simulation parameters

  Packet size                         576 bytes (including 40 bytes of IP and TCP headers)
  TCP flows                           between 5 and 400 flows
  Maximum window                      64 kilobytes
  Propagation delay of L              between 5 and 250 milliseconds
  Bandwidth of L                      between 1 and 50 megabits per second
  Bandwidth of ∑S1…SN and ∑R1…RN      600 megabits per second
  Router buffer sizes                 between 3,125 and 50,000 bytes
  Simulation length                   20 seconds

In this project, we first observe the degree of TCP loss synchronization by varying the number of source flows from 5 to 400 with fixed delay, buffer size, and bottleneck link bandwidth. We conduct three such experiments: in the first, the buffer size is less than BW*DELAY [Villamizar and Song, 1994]; in the second, it is equal to BW*DELAY; and in the third, it is greater. Next, with 50 TCP flows, we vary the delays, buffer sizes, and bottleneck link bandwidths. Finally, using recently recommended router buffer sizes, we run another experiment in order to observe the loss rates, link utilizations, and degree of TCP fairness.

2.2 Plugin

To measure TCP loss synchronization, we developed an ONL plugin that collects packet-level statistics. This section briefly describes the plugin design, the plugin features, and the plugin configuration for delivering the packets from the ONL router to the end host.

2.2.1 Plugin Design

The statistical plugin (rstat555) is designed to gather flow statistics at the packet level at the ONL router and send them back to an end host. The rstat555 plugin is loaded on a Smart Port Card (SPC) embedded in the ONL routers (Figure 2.2). Timestamps, source and destination addresses, source and destination ports, packet size, sequence and acknowledgement numbers, and queue size are examples of the collected information. Since buffer storage and processing capacity on the card are limited, the necessary information is forwarded for storage at the end system for offline analysis.

Figure 2.2: ONL gigabit router architecture [Turner et al., 2005]

The plugin is bundled in the rstat-555-v1.tar package. The files rstat.c and rstat.h are the main components, and a developer can modify these files to add extra functionality. Generally, the statistical plugin creates a UDP packet, attaches the first N bytes of a received packet (counted from the beginning of the IP header) plus some extra information, and then forwards it to the end host. Developers can implement their own logging daemons in order to receive these packets. For the logging and analysis in this project, however, we developed simple logging and tracing tools. Two daemons run at the end host: the net collector (udp-echo-logger) and the net analyzer (udp-echo-trace), which are bundled in the udp-echo-tool-ver1.tar package. The net collector gathers all messages from the routers and stores them in raw format; the raw file can then be converted to ASCII format by the net analyzer. Table 2.2 shows the net collector and net analyzer usage. A trace file example is shown in Table 2.3.

Table 2.2: Net collector and net analyzer usage

• udp-echo-logger : Log all packets in binary format
  Usage: udp-echo-logger [-P Listening Port] [-x First N Bytes] [-F LogFile]
  -x : 56 or 44 {TCP (56) = 16 (APP) + 20 (IP) + 20 (TCP); UDP (44) = 16 (APP) + 20 (IP) + 8 (UDP)}
  E.g. udp-echo-logger -P 8000 -x 56 -F i8000binary
  Note: Saving all packets to a local disk is strongly recommended in order to avoid packet loss due to the NFS bottleneck (disk transfer speed). "tcpdump -w <file>" can be used to check whether the kernel lost any packets.
• udp-echo-trace : Convert binary to ASCII format
  Usage: udp-echo-trace [-P Protocol] [-N First N Bytes] [-f EnableFlowID] [-t Time & seq rounding] [-I Input File] [-O Output File]
  -P : 2 (TCP), 3 (UDP)
  -f : 1 (enable FlowID trace), 0 (disable FlowID trace)
  -t : 1 (round packet sequence numbers and times to start at 0), 0 (normal operation)
  -N : 56 or 44 {TCP (56) = 16 (APP) + 20 (IP) + 20 (TCP); UDP (44) = 16 (APP) + 20 (IP) + 8 (UDP)}
  E.g. udp-echo-trace -P 2 -N 56 -f 0 -t 1 -I i8000binary -O i8000ascii
  For UDP packets, the sequence number, acknowledgement number, and TCP window are set to 0.

Table 2.3: Trace file example
(Packet ID, Timestamp (msec), QID, Qlength (bytes), Packet size (bytes), Source IP, Destination IP, Protocol ID, IP identification, IP offset, Source port, Destination port, Packet sequence number, Acknowledgement number, TCP window)
10 22 256 1152 576 192.168.1.64 192.168.1.48 6 7252 16384 32782 8000 256079826 251729665 5840

2.2.2 Plugin Features

The rstat555 plugin supports up to eight control messages, listed below:

• "1" : Return the ICMP, TCP, UDP, and total packet counts to the RLI, then clear all counters.
• "2" : Return the ICMP, TCP, UDP, and total packet counts to the RLI.
• "3" : Change the log server port number (valid range 1025 to 65534; the default port number is 8000).
        #define LOG_PORT 8000
• "4" : Change the log server IP. The IP format is <0-255> <0-255> <0-255> <0-255>.
        Note: The delimiter is a single space, e.g., 192 168 1 128.
        #define LOG_IP 0xc0a80180
• "5" : Change the QID number in order to observe a specific queue during transmission.
        Note: An extra GM filter with a specific queue may need to be configured.
        #define LOG_QID 256
• "6" : Specify the first N bytes counted from the beginning of the IP packet, e.g., 20 + 20 = 40 for IP and TCP headers (no options) or 20 + 8 = 28 for IP and UDP headers (no options).
        Note: When either 40 or 28 bytes is specified, an application header (16 bytes) containing the packet index, current time, qid, and qlength (4 bytes each) is inserted before the first N bytes and sent back, which makes the total packet size either 56 or 44 bytes.
• "9" : Enable or disable the logging functionality, e.g., 0 = disable and 1 = enable.
• "10" : Return errors due to system calls as a five-tuple: "Interrupt, Clock, Qlength, Buffer Allocation, and Packet Forwarding".

2.2.3 Plugin Configuration

The main purpose of the statistical plugin is to collect flow statistics at the router and send them back to the end host. In order to gather all flows, a special general/exact match filter is used to match all flows. The router then makes duplicate packets and sends the copies to the Smart Port Card (SPC); the ONL auxiliary function makes the copy of each flow. Currently, the SPC can support a total bandwidth of up to 200 Mbps. Table 2.4 shows how the rstat555 plugin is configured.

Table 2.4: How to set up the rstat555 plugin

• Add the plugin directory to the RLI (e.g., /users/chakchai/myplugins/).
• Add an rstat555 instance to the RLI and bind the plugin ID to the SPC ID (usually 8).
• Create a general match filter and bind the SPC ID to the plugin (usually 8).
  Note: Choose the "aux" option to make duplicate packets that are sent to the rstat555 plugin.
• Configure the logging IP and port by sending messages to the plugin (codes 3 and 4).
  Note: We specify the address and network prefix in both the source and destination addresses as 192.168.0.0/16 in the ONL private network. At the router, the UDP source address is set to 172.16.0.1 to avoid an IP conflict; however, the source IP can be changed in rstat.h (in HEX format), e.g., #define LOG_SOURCE 0xac100001
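For offline analysis, the ASCII trace produced by udp-echo-trace can be read with a few lines of code. The sketch below is our own illustration and not part of the released tool set; the field order is assumed to follow Table 2.3, and the record and function names are hypothetical.

```python
# Sketch: reading a udp-echo-trace ASCII file, assuming the field order of Table 2.3.
from collections import namedtuple

Record = namedtuple("Record", [
    "pkt_id", "time_ms", "qid", "qlen_bytes", "size_bytes",
    "src_ip", "dst_ip", "proto", "ip_id", "ip_offset",
    "src_port", "dst_port", "seq", "ack", "tcp_window"])

def read_trace(path):
    """Yield one Record per line of an ASCII trace such as i8000ascii."""
    with open(path) as f:
        for line in f:
            fields = line.split()
            if len(fields) != 15:
                continue          # skip malformed or truncated lines
            # keep the two IP addresses as strings, convert everything else to int
            yield Record(*(v if i in (5, 6) else int(v) for i, v in enumerate(fields)))

# Example: count packets per flow, identifying a flow by its address/port 4-tuple.
if __name__ == "__main__":
    counts = {}
    for r in read_trace("i8000ascii"):
        key = (r.src_ip, r.src_port, r.dst_ip, r.dst_port)
        counts[key] = counts.get(key, 0) + 1
    print(len(counts), "flows observed")
```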
2.3 Experimental Setup

The ONL network is designed to support offline analysis. Apart from the statistical plugin, in this project we need a delay plugin in order to vary the RTT. Figure 2.3 shows how we configure the delay plugin. The delay plugin is installed at egress port 3 to delay the acknowledgement packets (Figure 2.1b). A general match filter is set to match all packets and send them directly to the delay plugin. After delaying the packets, the plugin forwards them to the output link. To find out which packets are dropped, we design the loop-back network so that we can collect all packets before and after they enter the specific FIFO queue.

Figure 2.3: The pdelay plugin configuration at egress port 3

To gather the statistics from the router, a general match (GM) filter is set at both egress port 5 and ingress port 6. Figure 2.4 shows the GM filter for collecting all incoming packets, where the source and destination IP are 192.168.0.0/16 (the ONL private network). The first rule makes all packets go through a single egress FIFO queue, queue 256 (qid 256). The second rule duplicates incoming packets and forwards them to the rstat555 plugin at "qid 136 and spc qid 8". The plugin sends all packets back to the end host (n1p4), as shown in Figure 2.1b. Figure 2.5 shows the GM filter configuration for collecting the outgoing packets from egress port 6 to ingress port 5. This filter is configured to duplicate packets and forward the copies to the rstat555 plugin, which sends them to the end host (n1p7). According to our observations, in order to mitigate the loss of logging packets, the net collector should be run on a dedicated or local disk (not NFS).

Figure 2.4: General match filter at egress port 5

Figure 2.5: General match filter at ingress port 6

3. TCP behavior for different numbers of flows with fixed propagation delay, router buffer size, and bottleneck link bandwidth

We conduct three experiments to observe whether the number of flows affects the degree of TCP loss synchronization (the number of TCP flows ranges between 5 and 400). We run each experiment for 20 sec. We set the RTT to 10 msec and the bottleneck link bandwidth to 10 Mbps for all three experiments. In the first experiment, the buffer size is set to 3,125 bytes, which equals BW*RTT/4. In the second, the buffer size is set to BW*RTT (12,500 bytes), and in the third, the buffer size is set to 50,000 bytes (BW*RTT*4).

The fraction of affected flows and the cumulative fraction of congestion events are the main metrics used to quantify the degree of TCP loss synchronization. For a congestion event, the fraction of affected flows equals the number of affected flows divided by the total number of flows. The cumulative fraction of congestion events corresponding to a fraction x of affected flows is the fraction of congestion events in which the fraction of affected flows is at most x. Table 3.1 shows an example of how to calculate both values with 10 total flows and 5 congestion events; a sketch of the same computation follows the table.

Table 3.1: Calculating the fraction of affected flows and the cumulative fraction of congestion events

  Congestion event                              1st    2nd    3rd    4th    5th
  Number of affected flows within the event      8      5      2      2      5
  1. Fraction of affected flows                 8/10   5/10   2/10   2/10   5/10
  2. Fractions from (1), sorted                 2/10   2/10   5/10   5/10   8/10
  3. Cumulative fraction of congestion events   2/5 (at x = 2/10), 4/5 (at x = 5/10), 1 (at x = 8/10)
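As a concrete illustration of these two metrics, the short sketch below reproduces the calculation of Table 3.1 from the list of per-event affected-flow counts. It is a minimal post-processing example of ours, not part of the measurement tools, and the function name is hypothetical.

```python
# Sketch: fraction of affected flows and cumulative fraction of congestion events,
# reproducing the example of Table 3.1 (10 flows, 5 congestion events).
from fractions import Fraction

def loss_sync_metrics(affected_per_event, total_flows):
    """Return per-event fractions of affected flows and the points (x, F(x)) of the
    cumulative fraction of congestion events whose affected fraction is at most x."""
    fractions = [Fraction(n, total_flows) for n in affected_per_event]
    events = len(fractions)
    cdf = [(x, Fraction(sum(1 for f in fractions if f <= x), events))
           for x in sorted(set(fractions))]
    return fractions, cdf

fracs, cdf = loss_sync_metrics([8, 5, 2, 2, 5], 10)
print([str(f) for f in fracs])             # ['4/5', '1/2', '1/5', '1/5', '1/2']
print([(str(x), str(c)) for x, c in cdf])  # [('1/5', '2/5'), ('1/2', '4/5'), ('4/5', '1')]
```

The printed (x, F(x)) points correspond to row 3 of Table 3.1: 2/5 of the congestion events affect at most 2/10 of the flows, 4/5 affect at most 5/10, and all of them affect at most 8/10.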
Figure 3.1 shows the degree of TCP loss synchronization for fixed buffer sizes with different numbers of TCP flows. Figures 3.2 and 3.3 show the maximum and average numbers of synchronized TCP flows. The router queues are illustrated in Figure 3.4. The loss rates are shown in Figure 3.5. Figure 3.6 shows the link utilizations. Finally, the degree of TCP fairness is plotted in Figure 3.7. Other observations from the experiments are listed below.

• From Figure 3.1a (with the BW*RTT buffer size), in general, when the number of flows is increased, most of the flows desynchronize; the exception is the five-flow case, which shows total synchronization in 18% of congestion events. Other than that, there is no total synchronization at all (from 10 to 400 flows). As found in [Appenzeller et al., 2004], small numbers of flows tend to be fully synchronized, but large aggregates of flows do not. However, we also believe that synchronization depends not only on the number of flows but also on the buffer sizes. Figures 3.1b and 3.1c show that with increasing buffer sizes, the percentage of totally synchronized congestion events also increases; for example, for five flows the percentages were 2%, 18%, and 42% for the 3,125, 12,500, and 50,000 byte buffer sizes, respectively.

• For numbers of flows greater than 50, the fraction of affected flows is generally less than 0.2, 0.4, and 0.6 for the 3,125, 12,500, and 50,000 byte buffer sizes, respectively. Figure 3.2a (BW*RTT buffer size) also shows that within a congestion event, at most 16, 21, 27, and 33 flows suffer loss at the same time for 50, 100, 200, and 400 flows, respectively. Thus, the degree of TCP loss synchronization decreases exponentially, as shown in Figure 3.2b.

• The results shown in Figures 3.3a and 3.3b (the average number of synchronized TCP flows vs. the number of TCP flows) follow the same patterns as the results in Figures 3.2a and 3.2b.

• Figure 3.4 shows the router queue characteristics for different numbers of flows. In general, there is no total synchronization among flows. It also shows that the output link is almost always busy, since the queue is never empty (except for the small 3,125 byte buffer, which often goes empty). Also, increasing the number of flows smooths out the queue fluctuation at the router.

• Figure 3.5 shows the relationship between the loss rates and the number of flows. Generally, the loss rate increases dramatically as the number of flows increases, as Robert Morris found [Morris, 1997]. However, as the buffer sizes increase, the loss rates decrease accordingly.

• The relationship between the link utilizations and the number of flows is shown in Figure 3.6. With enough buffering (greater than or equal to BW*RTT in this experiment), the link utilization increases only slightly because the link is already well utilized; however, with the small buffer size, a small number of flows leaves the link underutilized.

• Figure 3.7 shows the degree of TCP fairness among flows (in total bandwidth). For 5 and 10 flows, increasing the number of flows starts to slightly decrease the degree of TCP fairness (with few flows, each flow performs quite fairly). Beyond 50 flows, it is very obvious that TCP flows are unfair (the degree of fairness is less than 0.2). Moreover, we can observe roughly that increasing the buffer size affects the fairness index, but unpredictably.

Figure 3.1 (a: buffer size 12,500 bytes; b: buffer size 3,125 bytes; c: buffer size 50,000 bytes): Cumulative fraction of congestion events with 10 msec RTT. The notations 5, 10, 50, 100, 200, and 400 represent the number of TCP flows, from 5 to 400.
Figure 3.2 (a: maximum number of synchronized TCP flows; b: percentage of the maximum number of synchronized TCP flows): Maximum TCP flow synchronization vs. number of flows with 10 msec RTT. The notations 3125B, 12500B, and 50000B represent router buffer sizes of 3,125, 12,500, and 50,000 bytes.

Figure 3.3 (a: average number of synchronized TCP flows; b: percentage of the average number of synchronized TCP flows): Average TCP flow synchronization vs. number of flows with 10 msec RTT. The notations 3125B, 12500B, and 50000B represent router buffer sizes of 3,125, 12,500, and 50,000 bytes.

Figure 3.4 (a: 5 flows; b: 10 flows; c: 50 flows; d: 100 flows; e: 200 flows; f: 400 flows): Router queues with 10 msec RTT and 3,125/12,500/50,000 byte buffer sizes for 5 to 400 TCP flows. In each graph, the top line represents the 50,000 byte buffer size, the middle line the 12,500 byte buffer size, and the bottom line the 3,125 byte buffer size.

Figure 3.5: Number of flows vs. loss rates with different buffer sizes

Figure 3.6: Number of flows vs. link utilizations with different buffer sizes

Figure 3.7: Number of flows vs. fairness index with different buffer sizes. The notations 3125B, 12500B, and 50000B represent router buffer sizes of 3,125, 12,500, and 50,000 bytes.

4. TCP behavior for 50 flows with different RTTs, buffer sizes, and bottleneck link bandwidths

From section 3, we conclude that the number of flows does affect the degree of TCP loss synchronization: increasing the number of flows reduces it. In this section, we explore whether the degree of TCP loss synchronization is also affected by different buffer sizes, RTTs, and bottleneck link bandwidths. We run three more experiments, fixing the number of flows at 50 and varying the router buffer sizes, RTTs, and bottleneck link bandwidths. Each experiment runs for 20 sec. First, with 10 msec RTT and 25 Mbps link bandwidth, the buffer sizes are varied from 3,125 to 50,000 bytes (Figure 4.1). Second, we vary the RTT from 5 to 250 msec with a 62,500 byte buffer size and 25 Mbps bottleneck link bandwidth (Figure 4.2). Finally, with 10 msec RTT and a 12,500 byte buffer size, we vary the bottleneck link bandwidth from 1 to 50 Mbps (Figure 4.3). Table 4.1 shows the average percentage of affected flows. The loss rates, link utilizations, and TCP fairness with different buffer sizes are shown in Table 4.2; Table 4.3 reports these parameters for different RTTs, and Table 4.4 for different bottleneck link bandwidths. Other observations from the experiments are listed below.

• In general, Figure 4.1 shows that increasing the buffer sizes also increases the degree of TCP loss synchronization.

• Figure 4.2 shows that increasing the delays clearly increases the degree of TCP loss synchronization.

• Increasing or decreasing the bottleneck link bandwidth affects the degree of TCP loss synchronization, but unpredictably (Figure 4.3): increasing the bottleneck link bandwidth can either increase or reduce the degree of TCP loss synchronization. In any case, the degree of TCP loss synchronization does not change noticeably.

• Table 4.1 also shows that increasing either the buffer sizes or the RTT increases the average percentage of affected flows, whereas increasing the bottleneck link bandwidth does not.
• Tables 4.2, 4.3, and 4.4 show the packet loss rates for different buffer sizes, RTTs, and bottleneck link bandwidths. As in section 3, the larger the buffer, the smaller the loss rate; the longer the delay, the smaller the loss rate; and the greater the bottleneck link bandwidth, the smaller the loss rate.

• The link utilizations are also shown in these three tables. With 50 TCP flows, when the buffer sizes are increased, the link utilizations increase only slightly because the links are already well utilized. Increasing the bottleneck link bandwidth also increases the link utilization. However, increasing the RTT reduces the link utilization.

• Tables 4.2, 4.3, and 4.4 also show the degree of TCP fairness. In general, increasing either the buffer sizes or the bottleneck link bandwidths increases the degree of TCP fairness, whereas increasing the RTT can either increase or reduce it.

Figure 4.1: Cumulative fraction of congestion events with 10 msec RTT and different buffer sizes for 50 TCP flows. The notations 3125B, 6250B, 12500B, 25000B, and 50000B represent router buffer sizes of 3,125, 6,250, 12,500, 25,000, and 50,000 bytes.

Figure 4.2: Cumulative fraction of congestion events with a 62,500 byte buffer size and different RTTs (5, 10, 50, 100, and 250 msec) for 50 TCP flows

Figure 4.3: Cumulative fraction of congestion events with 10 msec RTT, a 12,500 byte buffer size, and different bottleneck link bandwidths (1, 5, 10, 25, and 50 Mbps) for 50 TCP flows

Table 4.1: Average percentage of affected flows

(a) With different buffer sizes
  Buffer size (bytes)                        3,125   6,250   12,500   25,000   50,000
  Average percentage of affected flows (%)     8      10       12       16       20

(b) With different RTTs
  RTT (msec)                                   5      10       50      100      250
  Average percentage of affected flows (%)     6      10       32       48       62

(c) With different bottleneck link bandwidths
  Bandwidth (Mbps)                             1       5       10       25       50
  Average percentage of affected flows (%)    14      12       12       14       16

Table 4.2: Buffer sizes vs. loss rates, link utilization, and fairness for 50 TCP flows with 10 msec RTT and 25 Mbps bottleneck link bandwidth
  Buffer size (bytes)     3,125    6,250    12,500   25,000   50,000
  Loss rate (%)           14.09    14.81    13.28    12.05     9.05
  Link utilization (%)    98.85    99.16    99.16    99.16    99.16
  Fairness index          0.0088   0.1338   0.1554   0.1670   0.2235

Table 4.3: RTT vs. loss rates, link utilization, and fairness for 50 TCP flows with a 62,500 byte buffer size and 25 Mbps bottleneck link bandwidth
  RTT (msec)                5       10       50      100      250
  Loss rate (%)            8.85     8.08     5.59     3.48     2.98
  Link utilization (%)    99.21    99.16    98.74    96.01    83.74
  Fairness index          0.3254   0.1959   0.5793   0.3988   0.2700

Table 4.4: Bottleneck link bandwidths vs. loss rates, link utilization, and fairness for 50 TCP flows with 10 msec RTT and a 12,500 byte buffer size
  Bandwidth (Mbps)          1        5       10       25       50
  Loss rate (%)           21.98    14.83    12.28     9.60     6.83
  Link utilization (%)    97.48    98.59    99.16    99.47    99.05
  Fairness index          0.0033   0.0037   0.0128   0.4208   0.4997
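For reference, the sketch below shows one way the three per-experiment metrics reported above could be derived from per-flow counters. The helper names and example numbers are ours, and the fairness definition is an assumption: we take it to be the ratio of the minimum to the maximum per-flow throughput, as suggested by the "min/max" label of Figure 5.3.

```python
# Sketch: loss rate, link utilization, and a min/max fairness index from per-flow
# counters. The definitions are assumptions stated in the lead-in, not the exact
# post-processing code used for Tables 4.2-4.4.

def loss_rate(packets_sent, packets_dropped):
    """Percent of transmitted packets dropped at the bottleneck."""
    return 100.0 * packets_dropped / packets_sent

def link_utilization(bytes_delivered, link_capacity_bps, duration_s):
    """Percent of the bottleneck capacity carried during the run."""
    return 100.0 * (8 * bytes_delivered) / (link_capacity_bps * duration_s)

def fairness_min_max(per_flow_goodput):
    """Ratio of the slowest flow to the fastest flow (1.0 = perfectly fair)."""
    return min(per_flow_goodput) / max(per_flow_goodput)

# Example with made-up per-flow goodputs (bits per second) over a 20 s run
# on a 25 Mbps bottleneck:
goodputs = [400e3, 520e3, 310e3, 480e3, 450e3]
print(round(fairness_min_max(goodputs), 4))                          # 0.5962
print(round(link_utilization(sum(goodputs) / 8 * 20, 25e6, 20), 2))  # 8.64
```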
5. Discussion

We found in sections 3 and 4 that most of the time TCP flows desynchronize. Also, the extent of synchronized TCP losses depends mostly on the number of flows, the router buffer sizes, and the RTT. With an increased number of flows, we believe that the buffer sizes could be reduced substantially due to the lack of synchronization at the router. Thus, in this section, we run experiments based on recently proposed router buffer sizes: [Appenzeller et al., 2004], [Dhamdhere et al., 2005], and [Le et al., 2005]. We also compare these results to the results obtained with the originally recommended buffer size, the bandwidth-delay product [Villamizar and Song, 1994]. We plot the loss rates, link utilizations, and degree of TCP fairness for each model and observe how these buffer sizes perform when applied to a real network. We fix the number of flows at 50 connections and the bottleneck link bandwidth at 25 Mbps, and vary the total delays from 5 to 250 msec.

In 1994, [Villamizar and Song, 1994] proposed the bandwidth-delay product model (BW*RTT), where the delay refers to the RTT. They measured the link utilization of an approximately 40 Mbps bottleneck link with 1, 4, and 8 long-lived TCP flows. They found that, in order to guarantee full link utilization, the router needs a buffer at least as large as the product of the link capacity and the delay.

A second model, the so-called "Stanford model" [Appenzeller et al., 2004], not only minimizes the buffer size but also achieves link utilization of almost 100%. Its authors ran tests on both simulated and real networks (dedicated routers and hosts). They claim that the model works especially well for core routers, where the number of flows is quite large. Given N flows, the required buffer size is RTT*BW/√N; with one flow, the buffer size is the same as BW*RTT. This result is based on the degree of flow desynchronization and independence. As a result, a 2.5 Gbps link that carries 10,000 flows can reduce its router buffer size by 99%.

A third model, the "Georgia Tech model" [Dhamdhere et al., 2005], bases the recommended buffer size on constraints on the loss rate, queuing delay, and link utilization. Based on NS2 simulations, its authors claim that the buffer requirement should depend not only on the number of flows but also on the degree of synchronization and the harmonic mean of the RTTs. We use this model's formula for Buffer Sizing for Congested Links (BSCL). Table 5.1 shows the BSCL formula and the values that we use in our experiment.

Table 5.1: BSCL formula and parameters

  BSCL formula:
    B = max{Bq, Bp}
    Bq = [q(Nb)·Ce·Te - 2·M·Nb·(1 - q(Nb))] / [2 - q(Nb)]
    Bp = Kp·Nb - Ce·Te
    Nb = number of LBP (locally bottlenecked persistent) flows at the target link
    Ce = effective capacity for LBP flows
    Te = effective RTT of LBP flows
    M  = maximum segment size
    Kp = 0.87/√(loss rate)
    q(Nb) = 1 - (1 - 1/Nb)^(L'Nb), with L'Nb ≈ α·Nb

  Our parameters:
    Nb = 50 long-lived TCP flows
    Ce = 25 Mbps (bottleneck link)
    Te = RTT, between 5 and 250 msec
    M  = 576 bytes (including IP and TCP headers)
    Loss rate set to 10%
    α (loss synchronization factor) = 0.5, as the paper recommends

[Le et al., 2005] present a revised model of router buffer sizing. They find that the Stanford model tends to underestimate the router buffer size, while the Georgia Tech model and the bandwidth-delay product model overestimate it. Based on their analysis of the Georgia Tech model, they claim that the recommended buffer size should be 1.8*N - C*T (the UNC model), where N is the number of flows, C is the link capacity, and T is the median RTT. Table 5.2 shows the recommended buffer sizes for each model with the number of flows set to 50 connections; a sketch of these calculations follows.
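To make the four sizing rules concrete, the sketch below computes the recommended buffer sizes (in bytes) for our parameters. It is our own illustration of the quoted formulas, with the packet-based terms (Kp·Nb and 1.8·N) interpreted as packets of one 576-byte MSS; this interpretation reproduces the BW*RTT, Stanford, and UNC columns of Table 5.2, while the BSCL results may differ slightly from the Georgia column depending on rounding and parameter choices.

```python
# Sketch: recommended buffer sizes (bytes) for 50 flows on a 25 Mbps bottleneck,
# following the formulas quoted in the text and Table 5.1. Packet-based terms are
# converted to bytes with a 576-byte MSS (an interpretation of ours).
from math import sqrt

N, C_BPS, MSS = 50, 25e6, 576      # flows, bottleneck capacity (bits/s), segment size
P_LOSS, ALPHA = 0.10, 0.5          # target loss rate and sync factor (Table 5.1)

def bdp(rtt):                      # [Villamizar and Song, 1994]: BW*RTT
    return C_BPS * rtt / 8

def stanford(rtt):                 # [Appenzeller et al., 2004]: BW*RTT / sqrt(N)
    return bdp(rtt) / sqrt(N)

def bscl(rtt):                     # [Dhamdhere et al., 2005], formula of Table 5.1
    q = 1 - (1 - 1.0 / N) ** (ALPHA * N)
    ce_te = bdp(rtt)                                     # Ce*Te in bytes
    bq = (q * ce_te - 2 * MSS * N * (1 - q)) / (2 - q)
    bp = (0.87 / sqrt(P_LOSS)) * N * MSS - ce_te
    return max(bq, bp)

def unc(rtt):                      # [Le et al., 2005]: 1.8*N - C*T (in packets)
    return 1.8 * N * MSS - bdp(rtt)

for rtt in (0.005, 0.01, 0.05, 0.1, 0.25):
    print(rtt, [round(f(rtt)) for f in (bdp, stanford, bscl, unc)])
```

For example, at an RTT of 5 msec this yields roughly 15,625, 2,210, 63,609, and 36,215 bytes, matching the first row of Table 5.2.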
We also use another loss rate model [Morris, 2000] (Table 5.3), which establishes the relationship between the number of flows and the router queue size as loss rate = 0.76·N²/S², where N is the total number of flows and S is the buffer size plus the in-flight packet size. Figures 5.1, 5.2, and 5.3 show the loss rates, link utilizations, and TCP fairness index for each model.

Table 5.2: Simulation parameters for each model (50 flows)

  RTT (sec)   BW (Mbps)   BW*RTT (bytes)   Stanford (bytes)   Georgia (bytes)   UNC (bytes)
  0.005          25            15,625            2,210             63,609          36,215
  0.01           25            31,250            4,419             47,984          20,590
  0.05           25           156,250           22,097             33,482        -104,410
  0.1            25           312,500           44,194             16,963        -260,660
  0.25           25           781,250          110,485            171,525        -729,410

Table 5.3: Percent loss rates of the four models according to the Robert Morris loss rate prediction

  Delay (msec)   BW*RTT (%)   Stanford (%)   Georgia (%)   UNC (%)
  5               64.5503      198.1766       10.0410       23.4568
  10              16.1376       49.5469       10.0410       23.4568
  50               0.6455        1.9818        1.7511        N/A
  100              0.1614        0.4955        0.5807        N/A
  250              0.0258        0.0793        0.0694        N/A

One interesting observation from Table 5.2 is that the UNC model cannot predict a buffer size when the RTT is large (RTTs of 50 msec and above yield negative values). Table 5.3 shows that the Morris loss rate prediction cannot produce a feasible ratio if the buffer is not big enough, such as the 198.1766% predicted for the Stanford model at 5 msec RTT.

Figure 5.1: Percent loss rates for each model

Figure 5.2: Percent link utilizations for each model

Figure 5.3: Fairness index (min/max) for each model

Figures 5.1, 5.2, and 5.3 compare the loss rates, link utilizations, and fairness index of each model with a 25 Mbps bottleneck link speed and 50 TCP flows. The results show that the Stanford model performs the worst in terms of loss rate (14%, compared to less than 10% for the others), but as the delays increase, the Stanford model's loss rates drop below 5%. Considering the link utilizations, the UNC model outperforms the others (99.55%) with a small buffer size. On the other hand, the Stanford model performs the worst (55.76% at 5 msec RTT and 75.21% at 10 msec RTT, compared to more than 90% for the other models); however, the Stanford model's link utilization keeps increasing with longer delays. For the fairness index, in general, the Georgia Tech model outperforms the others except when the average RTT is very large.

Overall, the Georgia Tech model achieves not only the minimum loss rate but also high link utilization and good TCP fairness for small delays. For the UNC model, although the buffer requirement is smaller than the Georgia Tech model's, the loss rates are higher and the fairness index is lower. The Stanford model cannot explain why the link utilization is quite low (below 80%) at either 5 or 10 msec RTT; however, it performs well if the RTT is quite large: 90.56% at 100 msec, compared to 89.54% for the Georgia Tech model. We believe the high loss rate may affect the link utilization: the router queue often goes empty if the buffer size is not big enough, as shown in Figure 3.4 (a 3,125 byte buffer, versus 2,210 bytes in the Stanford model). We conclude that, comparatively, the Georgia Tech model provides high link utilization and a lower loss rate with good fairness if the average RTT is small (less than 50 msec); for longer delays, the Stanford model outperforms the others. Moreover, Figure 5.1 confirms that the Stanford model really does ignore the loss rate, which might lead to lower throughput when the buffer size is small.
However, as its authors claim, the Stanford model works well if the number of flows is quite large (up to 10,000 flows) and the bottleneck link bandwidth is huge (2.5 Gbps). The Georgia Tech model, on the other hand, is most suitable for access links (lower bottleneck link bandwidths).

6. Conclusion

The degree of loss synchronization between TCP connections sharing a bottleneck network link is investigated in this project in order to assess the variability of the link utilization. We run the experiments in the ONL and construct a special statistical plugin to obtain router statistics at the packet level. The parameters in our study are the router buffer sizes, propagation delays, and link capacities. The loss rates, link utilizations, and TCP fairness index are also observed.

To observe the impact of the number of flows on the degree of TCP loss synchronization, we vary the number of flows from 5 to 400 connections with a fixed delay, buffer size, and bottleneck link bandwidth. We found that as the number of flows increases, some connections still remain synchronized, but never all of them; most of the flows desynchronize. We also observe the impact of buffer sizes, delays, and bottleneck link bandwidths on the degree of TCP loss synchronization. With increased delays, the degree of total synchronization clearly increases. It does not change noticeably when the bottleneck link speed is increased. The buffer size affects the degree of TCP loss synchronization, but erratically. Moreover, we also report the loss rates, link utilizations, and fairness index for each experiment.

Finally, since the chance of total TCP loss synchronization is slight, we believe the router buffer size could be reduced below BW*RTT in order to keep the bottleneck link utilized while reducing the queuing delay. For small RTTs, the Georgia Tech model provides both a lower loss rate and high link utilization with good TCP fairness. For longer delays, the Stanford model outperforms the other models with a small buffer size.

References

[Turner et al., 2005] Jonathan S. Turner, Ken Wong, Jyoti Parwatikar, and John Lockwood, "Open Network Laboratory Tutorial," 2005. Available online at http://onl.wustl.edu.

[Tirumala et al., 2003] Ajay Tirumala, Feng Qin, Jon Dugan, Jim Ferguson, and Kevin Gibbs, "Iperf: Traffic Generator," March 2003. Available online at http://dast.nlanr.net/Projects/Iperf.

[Appenzeller et al., 2004] Guido Appenzeller, Isaac Keslassy, and Nick McKeown, "Sizing Router Buffers," in Proceedings of ACM SIGCOMM 2004, September 2004.

[Le et al., 2005] Long Le, Kevin Jeffay, and F. Donelson Smith, "Sizing Router Buffers for Application Performance," Technical Report UNC-CS-TR05-111, Department of Computer Science, University of North Carolina, January 2005.

[Dhamdhere et al., 2005] Amogh Dhamdhere, Hao Jiang, and Constantinos Dovrolis, "Buffer Sizing for Congested Internet Links," in Proceedings of IEEE INFOCOM 2005, March 2005.

[Morris, 2000] Robert Morris, "Scalable TCP Congestion Control," in Proceedings of IEEE INFOCOM 2000.

[Villamizar and Song, 1994] Curtis Villamizar and Cheng Song, "High Performance TCP in ANSNET," ACM SIGCOMM Computer Communication Review, Volume 24, Issue 5, October 1994, pp. 45-60.

[Sarolahti and Kuznetsov, 2002] Pasi Sarolahti and Alexey Kuznetsov, "Congestion Control in Linux TCP," technical report, Institute for Nuclear Research, Moscow, 2002. Available online at www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf.
[Morris, 1997] Robert Morris, "TCP Behavior with Many Flows," in Proceedings of the IEEE International Conference on Network Protocols (ICNP), October 1997.

[Wischik and McKeown, 2005] Damon Wischik and Nick McKeown, "Part I: Buffer Sizes for Core Routers," ACM SIGCOMM Computer Communication Review, Vol. 35, No. 3, July 2005, pp. 75-78.

[Qiu et al., 1999] Lili Qiu, Yin Zhang, and Srinivasan Keshav, "Understanding the Performance of Many TCP Flows," in Proceedings of the 7th International Conference on Network Protocols (ICNP'99), 1999.

[Sun et al., 2004] Jinsheng Sun, Moshe Zukerman, King-Tim Ko, Guanrong Chen, and Sammy Chan, "Effect of Large Buffers on TCP Queuing Behavior," in Proceedings of IEEE INFOCOM 2004.

[Barman et al., 2004] Dhiman Barman, Georgios Smaragdakis, and Ibrahim Matta, "The Effect of Router Buffer Size on HighSpeed TCP Performance," in Proceedings of IEEE Globecom 2004.

[Gorinsky et al., 2005] Sergey Gorinsky, Anshul Kantawala, and Jonathan S. Turner, "Link Buffer Sizing: A New Look at the Old Problem," in Proceedings of ISCC 2005, 2005, pp. 507-514.

[Avrachenkov et al., 2002] K. E. Avrachenkov, U. Ayesta, E. Altman, P. Nain, and C. Barakat, "The Effect of Router Buffer Size on the TCP Performance," in Proceedings of the LONIIS Workshop on Telecommunication Networks and Teletraffic Theory, January 2002.

[Riley, 2002] George F. Riley, "On Standardized Network Topologies for Network Research," in Proceedings of the 2002 Winter Simulation Conference, pp. 664-670.

[Sawashima et al., 1997] Hidenari Sawashima, Yoshiaki Hori, Hideki Sunahara, and Yuji Oie, "Characteristics of UDP Packet Loss: Effect of TCP Traffic," INET'97, Japan, 1997.

[Floyd and Jacobson, 1993] Sally Floyd and Van Jacobson, "Random Early Detection Gateways for Congestion Avoidance," IEEE/ACM Transactions on Networking, Vol. 1, No. 4, August 1993, pp. 397-413.

[NLANR, 1996] National Laboratory for Applied Network Research (NLANR), "Flow Characterization," available online at http://www.nlanr.net/Flowsresearch/fixstats.21.6.html.

7. Appendix

The graphs below show the average link utilizations and queue sizes of the BW*RTT model from section 5 (Figure 5.2) at 5, 10, 100, and 250 msec RTT with the bottleneck link bandwidth at 25 Mbps. The green line represents the total bandwidth at the bottleneck link; the black and red lines represent the logging hosts' link bandwidths.

Figure 7.1 (a: RTT = 5 msec, 99.54% average link utilization; b: RTT = 10 msec, 98.48%; c: RTT = 100 msec, 93.82%; d: RTT = 250 msec, 77.38%): Average link utilizations and queue sizes of the BW*RTT model at 5, 10, 100, and 250 msec RTT with 25 Mbps bottleneck link bandwidth