Loss Synchronization of TCP Connections
at a Shared Bottleneck Link
Chakchai So-In
Department of Computer Science and Engineering
Washington University in St. Louis
cs5@cec.wustl.edu
May 9th, 2006
Abstract
In this project, we study the degree of loss synchronization between TCP connections sharing a bottleneck
network link, i.e., the fraction of the connections that lose a packet during a congestion instance. Loss
synchronization determines the variability of the total TCP load on the link and therefore affects the loss,
delay, and throughput characteristics of network applications.
We conduct our investigation in the Open Network Laboratory (ONL) [Turner et al., 2005], a network
testbed maintained by the Applied Research Laboratory at Washington University in St. Louis. To
measure loss synchronization, we develop an ONL plugin that collects packet-level statistics (arrival time,
size, flow affiliation, sequence number) at a router and communicates the recorded data to an end host for
generating a log file. The statistics plugin and log-generating software can also be used in other ONL
studies that require knowledge of packet-level dynamics at routers.
Control parameters in our investigation of loss synchronization include the types and intensity of traffic,
network topology, link capacities, propagation delays, and router buffer sizes. We report the loss rates,
queue sizes, degree of TCP fairness, and link utilizations observed in our experiments.
1. Introduction
Apart from keeping the loss rate low, the cost of transmission media makes high utilization of the
bottleneck link a main goal of any congestion control scheme. Many approaches have been proposed to
achieve this, such as modifying router functionality or tuning transport protocol parameters. One
practical solution is to increase the size of the router buffer; however, unlimited router buffers are not
feasible, and larger buffers tend to increase queuing delays, so we still face the problem of determining
the optimum router buffer size.
Given the limits on router buffer sizes, the next issue is to choose a queue scheduling discipline
that decides which packet must be dropped at the router. The First In First Out (FIFO), or Drop Tail,
queue is one of three traditional queue scheduling algorithms for dropping packets when a router queue
overflows. This simple technique is widely used in the Internet; however, it commonly introduces total
synchronization because packets from several connections are dropped within the same congestion
instance. A second technique, the Random Drop queue, randomly chooses a packet from the queue to drop
when a packet arrives and the queue is full. Another queuing technique is the Random Early Drop queue:
if the queue length exceeds a drop level, the router drops each arriving packet with a fixed drop
probability. The big advantage of this technique is that it can reduce total synchronization.
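As a minimal illustration of these three drop decisions (not part of the ONL software; the drop level and
drop probability below are illustrative values only, not parameters from our experiments), consider the
following sketch:

    import random

    def drop_tail(queue_len, capacity):
        """Drop Tail: drop the arriving packet only when the queue is full."""
        return queue_len >= capacity

    def random_drop(queue, capacity):
        """Random Drop: when the queue is full, drop one randomly chosen queued packet."""
        if len(queue) >= capacity:
            queue.pop(random.randrange(len(queue)))  # a queued packet is dropped at random
            return True
        return False

    def random_early_drop(queue_len, drop_level=50, drop_prob=0.1):
        """Random Early Drop: above the drop level, drop each arrival with a fixed probability."""
        return queue_len > drop_level and random.random() < drop_prob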
TCP flows, such as those generated by HTTP, DNS, and FTP applications, consume more
than 80 percent of network traffic [NLANR, 1996] and have their own problems with total synchronization.
TCP sends as many packets as possible until it detects a lost packet, at which point it reduces its
transmission rate. Like many congestion control protocols, TCP uses packet loss as an indication of
congestion. If losses are synchronized, TCP flows with identical RTTs sharing a bottleneck receive the
loss indications at around the same time and decrease their transmission rates at around the same time.
This leads to underutilization of the output link (Figure 1a). As a result, if loss synchronization can be
prevented, then the bandwidth can be used more efficiently (Figure 1b).
(a) Synchronization of TCP behavior
(b) De-synchronization of TCP behavior
Figure 1: TCP synchronization and de-synchronization behavior
In this project, we investigate the loss synchronization of TCP connections in the ONL (Open
Network Laboratory) testbed, emulating a single-bottleneck dumbbell topology. Compared to the Internet,
the ONL network provides a more controlled environment using dedicated routers, transmission lines, and
computer hosts. Parameters examined in our evaluation include traffic load, link delays, and router buffer
sizes. We also report loss rates, link utilizations, and degree of TCP fairness.
In section 2, the simulation scenario and TCP parameters are described. We also explain the
“rstat555” plugin design, features, and setup in this section. Section 3 shows the results of each
experiment, where the main parameter is the number of TCP flows. Section 4 shows other experimental
results when the number of flows is set to 50 connections, but the buffer sizes, delays, and bottleneck link
bandwidths are different. Several recently proposed router buffer sizing models [Appenzeller et al., 2004],
[Dhamdhere et al., 2005], and [Le et al., 2005] are evaluated in section 5, and finally we draw conclusions
and recommend a suitable model for sizing router buffers.
2. Simulation Methodology
2.1 Simulation Parameters
We perform the experiments on the ONL testbed. In general, N TCP flows (S1…SN) are
transmitted through a shared bottleneck link with capacity L to receivers (R1…RN) as shown in Figure
2.1a. Figure 2.1b shows the ONL setup that simulates the Figure 2.1a topology.
(a) Simulation topology
(b) ONL network setup
Figure 2.1: Simulation Setup
In our experiments, we use iperf [Tirumala et al., 2003] as a TCP traffic generator running on top
of Linux TCP [Sarolahti and Kuznetsov, 2002] because of the popularity and availability of the Linux
operating system; however, we disable the SACK and timestamp functionalities. Generally, iperf
generates many long-lived TCP flows for a fixed period of time. The FIFO queue scheduling discipline is
configured at the router. Table 2.1 shows other parameters.
Because the experiments run on a real network, there are hardware limitations. For
example, the start times of the flows are not exactly identical. To emulate the topology in Figure 2.1a,
Figure 2.1b also shows the loopback design, in which a number of flows originate from a single host (n1p3)
with multiple sessions and are forwarded to a receiver (n1p2) within the same sender session.
In Figure 2.1b, TCP flows are transmitted from n1p3 to n1p2, but instead of forwarding them
directly from port 3 to port 2, we make the router forward all packets from port 3 to port 5. Due to the
loopback design, all packets are then forwarded to port 6, and finally the routing is set to send them
directly to port 2 (n1p2). Acknowledgement packets travel the opposite way.
Table 2.1: Simulation parameters

  Packet size                                576 bytes (including 40 bytes of IP and TCP headers)
  TCP flows                                  between 5 and 400 flows
  Maximum window                             64 kilobytes
  Propagation delay of L                     between 5 and 250 milliseconds
  Bandwidth of L                             between 1 and 50 megabits per second
  Bandwidth of ∑S1…SN and ∑R1…RN             600 megabits per second
  Router buffer sizes                        between 3,125 and 50,000 bytes
  Simulation length                          20 seconds
In this project, we first observe the degree of TCP loss synchronization by varying the number of
source flows from 5 to 400 with fixed delay, buffer size, and bottleneck link bandwidth. We conduct three
such experiments: in one, the buffer size is less than BW*DELAY [Villamizar and Song, 1994]; in the
second, it is equal; and in the third, it is greater. Next, with 50 TCP flows, we vary the delays, buffer sizes,
and bottleneck link bandwidths. Finally, using recently recommended router buffer sizes, we run another
experiment in order to observe the loss rates, link utilizations, and degree of TCP fairness.
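For reference, the bandwidth-delay product for the 10 Mbps, 10 msec RTT configuration used in section 3
works out to 12,500 bytes; a minimal calculation of the three buffer settings is sketched below.

    def bdp_bytes(bandwidth_bps, rtt_sec):
        """Bandwidth-delay product (BW*RTT) in bytes."""
        return bandwidth_bps * rtt_sec / 8

    bdp = bdp_bytes(10e6, 0.010)           # 10 Mbps, 10 msec RTT -> 12,500 bytes
    buffers = {"BW*RTT/4": bdp / 4,        # 3,125 bytes
               "BW*RTT":   bdp,            # 12,500 bytes
               "BW*RTT*4": bdp * 4}        # 50,000 bytes
    print(buffers)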
2.2 Plugin
To measure TCP loss synchronization, we developed an ONL plugin that collects packet-level
statistics. This section briefly describes our plugin design, plugin features, and plugin configuration for
gathering the packets from the ONL router to the end host.
2.2.1 Plugin Design
The statistics plugin (rstat555) is designed to gather packet-level flow statistics at the
ONL router and send them back to an end host. The rstat555 plugin is loaded on a Smart Port Card (SPC)
embedded in the ONL routers (Figure 2.2). Timestamps, source and destination addresses, source and
destination ports, packet size, sequence and acknowledgement numbers, and queue size are examples of
the recorded information. Since the SPC has limited buffer storage and processing capacity, the necessary
information is forwarded to the end system for storage and offline analysis.
Figure 2.2: ONL gigabit router architecture [Turner et al., 2005]
The plugin is bundled in the rstat-555-v1.tar package. The files rstat.c and rstat.h are the main
components, and a developer can modify them to add extra functionality. In general, the plugin builds a
UDP packet, attaches the first N bytes of a received packet (counted from the beginning of the IP header)
plus some extra information, and then forwards it to the end host. Developers can implement their own
logging daemons to receive these packets. For the logging and analysis in this project, however, we
develop simple logging and tracing tools. Two daemons run at the end host: the net collector
(udp-echo-logger) and the net analyzer (udp-echo-trace), which are bundled in the udp-echo-tool-ver1.tar
package. The net collector gathers all messages from the routers and stores them in raw format; the raw
file can then be converted to ASCII format by the net analyzer. Table 2.2 shows the net collector and net
analyzer usages. A trace file example is shown in Table 2.3.
Table 2.2: Net collector and net analyzer usages

• udp-echo-logger : Log all packets in binary format
  Usage: udp-echo-logger [-P Listening Port] [-x First N Bytes] [-F LogFile]
  -x : 56 or 44 {TCP (56) = 16 (APP) + 20 (IP) + 20 (TCP), UDP (44) = 16 (APP) + 20 (IP) + 8 (UDP)}
  E.g., udp-echo-logger -P 8000 -x 56 -F i8000binary
  Note: Saving all packets to a local disk is strongly recommended to avoid losing packets to the
  NFS bottleneck (disk transfer speed). "tcpdump -w <file>" can be used to check whether the kernel
  lost any packets.

• udp-echo-trace : Convert binary to ASCII format
  Usage: udp-echo-trace [-P Protocol] [-N First N Bytes] [-f EnableFlowID] [-t Time & seq rounding]
                        [-I Input File] [-O Output File]
  -P : 2 (TCP), 3 (UDP)
  -f : 1 (Enable FlowID trace), 0 (Disable FlowID trace)
  -t : 1 (Round packet sequence numbers and times to start at 0), 0 (Normal operation)
  -N : 56 or 44 {TCP (56) = 16 (APP) + 20 (IP) + 20 (TCP), UDP (44) = 16 (APP) + 20 (IP) + 8 (UDP)}
  E.g., udp-echo-trace -P 2 -N 56 -f 0 -t 1 -I i8000binary -O i8000ascii
  For UDP packets, the sequence number, acknowledgement number, and TCP window are set to 0.
Table 2.3: Trace file example. Fields: Packet ID, Timestamp (msec), QID, Qlength (bytes), Packet size
(bytes), Source IP, Destination IP, Protocol ID, IP identification, IP offset, Source port, Destination
port, Packet sequence number, Acknowledgement number, TCP window.

10 22 256 1152 576 192.168.1.64 192.168.1.48 6 7252 16384 32782 8000 256079826 251729665 5840
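For offline analysis, an ASCII trace line such as the one above can be split into named fields following
the order given in the Table 2.3 caption. The sketch below is a minimal parser for that format; the field
names are ours and are not part of the udp-echo-tool package.

    from collections import namedtuple

    # Field order follows the Table 2.3 caption.
    TraceRecord = namedtuple("TraceRecord",
        "pkt_id time_ms qid qlen_bytes pkt_size src_ip dst_ip proto "
        "ip_id ip_offset src_port dst_port seq ack tcp_window")

    def parse_trace_line(line):
        """Parse one ASCII line produced by udp-echo-trace into a TraceRecord."""
        f = line.split()
        return TraceRecord(int(f[0]), int(f[1]), int(f[2]), int(f[3]), int(f[4]),
                           f[5], f[6], int(f[7]), int(f[8]), int(f[9]),
                           int(f[10]), int(f[11]), int(f[12]), int(f[13]), int(f[14]))

    rec = parse_trace_line("10 22 256 1152 576 192.168.1.64 192.168.1.48 "
                           "6 7252 16384 32782 8000 256079826 251729665 5840")
    print(rec.qlen_bytes, rec.src_port, rec.dst_port)   # 1152 32782 8000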
2.2.2 Plugin Features
The rstat555 plugin supports eight control messages, listed below:

• "1" : Return the ICMP, TCP, UDP, and total packet counts to the RLI, then clear all counters.
• "2" : Return the ICMP, TCP, UDP, and total packet counts to the RLI.
• "3" : Change the log server port number (valid range 1025 to 65534; the default is 8000).
        #define LOG_PORT 8000
• "4" : Change the log server IP. The format is <0-255> <0-255> <0-255> <0-255>, with one space as
        the delimiter, e.g., 192 168 1 128.
        #define LOG_IP 0xc0a80180
• "5" : Change the QID number, to observe a specific queue during transmission.
        Note: An extra GM filter with a specific queue may need to be configured.
        #define LOG_QID 256
• "6" : Specify the first N bytes to copy from the beginning of each IP packet,
        e.g., 20 + 20 = 40 for the IP and TCP headers (no options), or
              20 + 8 = 28 for the IP and UDP headers (no options).
        Note: When either 40 or 28 bytes is specified, an application header (16 bytes) containing the
        packet index, current time, qid, and qlength (4 bytes each) is prepended to the first N bytes
        before the record is sent back, making the total packet size 56 or 44 bytes.
• "9" : Enable or disable the logging functionality (0 = disable, 1 = enable).
• "10": Return system call errors as a five-tuple:
        Interrupt, Clock, Qlength, Buffer Allocation, and Packet Forwarding.
2.2.3 Plugin Configuration
The main purpose of the statistics plugin is to collect flow statistics at the router and send them
back to the end host. To gather all flows, a special general/exact match filter is configured to match every
flow; the router then makes duplicate packets and sends the copies to the Smart Port Card (SPC). The
ONL auxiliary ("aux") function makes the copy of each flow. Currently, the SPC can support a total
bandwidth of up to 200 Mbps. Table 2.4 shows how the rstat555 plugin is configured.
Table 2.4: How to set up the rstat555 plugin

• Add the plugin directory to the RLI (e.g., /users/chakchai/myplugins/).
• Add an rstat555 instance to the RLI, and bind the plugin ID to the SPC ID (usually 8).
• Create a general match filter and bind the SPC ID to the plugin (usually 8).
  Note: Choose the "aux" option to make duplicate packets to send to the rstat555 plugin.
• Configure the logging IP and port by sending messages to the plugin (codes 3 and 4).
  Note: We specify the address and network prefix in both the source and destination addresses as
  192.168.0.0/16 in the ONL private network. At the router, the UDP source address is set to
  172.16.0.1 to avoid an IP conflict; however, the source IP can be changed in rstat.h (HEX format),
  e.g., #define LOG_SOURCE 0xac100001
2.3 Experimental Setup
The ONL network is designed to support offline analysis. Apart from the statistical plugin, in this
project we need a delay plugin in order to vary RTT. Figure 2.3 shows how we configure the delay
plugin. The delay plugin is installed at egress port 3 to delay the acknowledgement packets (Figure 2.1b).
A general match filter is set to match all packets and send them directly to the delay plugin. After
delaying, the plugin forwards these packets to the output link. To find out which packets are dropped, we
design the loop back network to collect all packets before and after entering the specific queue (FIFO).
Figure 2.3: The pdelay plugin configuration at egress port 3
To gather the statistics from the router, a general match filter (GM) is set at both egress port 5 and
ingress port 6. Figure 2.4 shows the GM filter for collecting all incoming packets, where the source and
destination IP are 192.168.0.0/16 (ONL private network). The first rule is set to make all packets go
through a single egress FIFO queue 256 (qid 256). The second rule is set to make duplicate incoming
packets and forward them to the rstat555 plugin at “qid 136 and spc qid 8”. The plugin sends all packets
back to the end host (n1p4) as shown in Figure 2.1b.
Figure 2.5 shows the GM filter configuration for collecting the outgoing packets from egress port
6 to ingress port 5. This filter is configured to duplicate packets and forward the copies to the rstat555
plugin, which sends them to the end host (n1p7). According to our observations, to mitigate the loss of
logging packets, the net collector should write to a dedicated or local disk (not NFS).
Figure 2.4: General match filter at egress port 5
Figure 2.5: General match filter at ingress port 6
3. TCP behavior for different numbers of flows with fixed propagation delay, router
buffer size, and bottleneck link bandwidth
We conduct three experiments to observe whether the number of flows affects the degree of TCP
loss synchronization (the number of TCP flows ranges between 5 and 400). We run each experiment for 20
sec. We set the RTT to 10 msec and the bottleneck link bandwidth to 10 Mbps for all three experiments. In
the first experiment, the buffer size is set to 3,125 bytes, which equals BW*RTT/4. In the second, the
buffer size is set to BW*RTT (12,500 bytes), and in the third, the buffer size is set to 50,000 bytes
(BW*RTT*4).
The fraction of affected flows and the cumulative fraction of congestion events are the main
metrics used to quantify the degree of TCP loss synchronization. For a congestion event, the fraction
of affected flows equals the number of affected flows divided by the total number of flows. The
cumulative fraction of congestion events corresponding to a fraction x of affected flows is the fraction of
congestion events in which the fraction of affected flows is at most x. Table 3.1 shows an example of how
to calculate both values with 10 total flows and 5 congestion events.
Table 3.1: Fraction of affected flows and cumulative fraction of congestion events (10 flows,
5 congestion events)

  Congestion event                          1st     2nd     3rd     4th     5th
  Number of affected flows
  within a congestion event                  8       5       2       2       5
  1. Fraction of affected flows             8/10    5/10    2/10    2/10    5/10
  2. Sorted fractions from row 1            2/10    2/10    5/10    5/10    8/10
  3. Cumulative fraction of congestion events: 2/5 at x = 2/10, 4/5 at x = 5/10, and 1 at x = 8/10
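The calculation in Table 3.1 can be reproduced in a few lines; the sketch below is a minimal example and
is not part of the ONL analysis tools.

    from collections import Counter

    affected = [8, 5, 2, 2, 5]       # affected flows per congestion event (Table 3.1)
    total_flows, events = 10, len(affected)

    # Row 1: fraction of affected flows for each congestion event
    fractions = [a / total_flows for a in affected]          # [0.8, 0.5, 0.2, 0.2, 0.5]

    # Row 3: cumulative fraction of congestion events whose fraction of affected flows is <= x
    cdf, count = {}, 0
    for x, n in sorted(Counter(fractions).items()):
        count += n
        cdf[x] = count / events                              # {0.2: 0.4, 0.5: 0.8, 0.8: 1.0}

    print(fractions)
    print(cdf)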
Figure 3.1 shows the degree of TCP loss synchronization for fixed buffer sizes with different
numbers of TCP flows. Figures 3.2 and 3.3 show the maximum and average numbers of synchronized TCP
flows. The router queues are illustrated in Figure 3.4. The loss rates are shown in Figure 3.5. Figure 3.6
shows the link utilizations. Finally, the degree of TCP fairness is plotted in Figure 3.7. Other observations
from the experiments are listed below.
• From Figure 3.1a (with the BW*RTT buffer size), in general, when the number of flows increases,
  most flows desynchronize; the exception is the 5-flow case, in which 18% of the congestion events are
  totally synchronized. Other than that, there is no total synchronization at all (from 10 to 400 flows).
  As found in [Appenzeller et al., 2004], small numbers of flows tend to be fully synchronized, but large
  aggregates of flows do not. However, we also believe that synchronization depends not only on the
  number of flows but also on the buffer size. Figures 3.1b and 3.1c show that with increasing buffer
  sizes, the percentage of totally synchronized congestion events also increases: for five flows, the
  percentages are 2%, 18%, and 42% for the 3,125, 12,500, and 50,000 byte buffer sizes, respectively.
• For more than 50 flows, the fraction of affected flows is generally less than 0.2, 0.4, and 0.6 for the
  3,125, 12,500, and 50,000 byte buffer sizes, respectively. Figure 3.2a (BW*RTT buffer size) also
  shows that within a congestion event, at most 16, 21, 27, and 33 flows suffer loss at the same time for
  50, 100, 200, and 400 flows, respectively. Thus, the degree of TCP loss synchronization decreases
  exponentially, as shown in Figure 3.2b.
• The results in Figures 3.3a and 3.3b (the average number of synchronized TCP flows vs. the number
  of TCP flows) follow the same patterns as the results in Figures 3.2a and 3.2b.
• Figure 3.4 shows the router queue characteristics for different numbers of flows. In general, there is
  no total synchronization among flows. It also shows that the output link is almost always fully
  utilized, since the queue never goes empty (except for the small 3,125 byte buffer, which often
  drains). Also, increasing the number of flows smoothes out the queue fluctuation at the router.
• Figure 3.5 shows the relationship between the loss rate and the number of flows. Generally, the loss
  rate increases dramatically with the number of flows, as Robert Morris found [Morris, 1997].
  However, as the buffer size increases, the loss rate decreases accordingly.
• The relationship between the link utilization and the number of flows is shown in Figure 3.6. With
  enough buffer space (greater than or equal to BW*RTT in this experiment), the link utilization
  increases only slightly because the link is already well utilized; with the small buffer size, however, a
  small number of flows leaves the link underutilized.
• Figure 3.7 shows the degree of TCP fairness among flows (in terms of total bandwidth). For 5 and 10
  flows, increasing the number of flows starts to slightly decrease the degree of TCP fairness (with few
  flows, each flow receives a fairly even share). Beyond 50 flows, TCP flows are clearly unfair (the
  fairness index is less than 0.2). Moreover, we can roughly observe that increasing the buffer size
  affects the fairness index, but unpredictably. (A sketch of the min/max fairness computation follows
  this list.)
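The fairness index reported in this project is the min/max ratio noted in the Figure 5.3 caption; the sketch
below shows that computation on hypothetical per-flow throughputs (illustrative values only).

    def min_max_fairness(throughputs):
        """Min/max fairness index: 1.0 means all flows receive equal throughput."""
        return min(throughputs) / max(throughputs)

    # Hypothetical per-flow throughputs in Mbps, for illustration only.
    flows = [0.9, 1.4, 0.3, 2.0, 0.5]
    print(round(min_max_fairness(flows), 3))   # 0.15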
(a) Buffer size at 12,500 bytes
Figure 3.1: Cumulative fraction of congestion events of 10 msec RTT. The notations 5, 10, 50, 100, 200,
and 400 represent the number of flows from 5 to 400 TCP flows.
(b) Buffer size at 3,125 bytes
(c) Buffer size at 50,000 bytes
Figure 3.1: Cumulative fraction of congestion events of 10 msec RTT. The notations 5, 10, 50, 100, 200,
and 400 represent the number of flows from 5 to 400 TCP flows (continued).
(a) Maximum number of TCP flow synchronization
(b) Percentage of maximum number of TCP flow
synchronization
Figure 3.2: Maximum TCP flow synchronization and number of flows with 10 msec RTT. The notations
3125B, 12500B, and 50000B represent the router buffer size at 3,125, 12,500, and 50,000 bytes.
(a) Average number of TCP flow synchronization
(b) Percentage of average number of TCP flow
synchronization
Figure 3.3: Average TCP flow synchronization and number of flows with 10 msec RTT. The notations
3125B, 12500B, and 50000B represent the router buffer size at 3,125, 12,500, and 50,000 bytes.
(a) 5 flows
(b) 10 flows
(c) 50 flows
(d) 100 flows
(e) 200 flows
(f) 400 flows
Figure 3.4: Router queues with 10 msec RTT and 3,125/12,500/50,000 byte buffer sizes for 5 to 400 TCP
flows. In each graph, for (a) to (f), the top line represents a 50,000 byte buffer size, the
middle line represents a 12,500 byte buffer size, and the bottom line represents a 3,125 byte
buffer size.
Figure 3.5: Number of flows vs. loss rates with different buffer sizes
Figure 3.6: Number of flows vs. link utilizations with different buffer sizes
Figure 3.7: Number of flows vs. fairness index with different buffer sizes. The notations
3125B, 12500B, and 50000B represent the router buffer size at 3,125, 12,500, and
50,000 bytes.
4. TCP behavior for 50 flows with different RTT, buffer sizes, and bottleneck
link bandwidths
From section 3, we conclude that the number of flows does affect the degree of TCP loss
synchronization: increasing the number of flows reduces it. In this section, we explore whether the degree
of TCP loss synchronization is also affected by different buffer sizes, RTTs, and bottleneck link
bandwidths.
We run three more experiments, fixing the number of flows at 50 and varying the router buffer
sizes, RTTs, and bottleneck link bandwidths. Each experiment runs for 20 sec. First, with 10 msec RTT
and 25 Mbps link bandwidth, the buffer sizes are varied from 3,125 to 50,000 bytes (Figure 4.1). Second,
we vary the RTT from 5 to 250 msec with a 62,500 byte buffer size and 25 Mbps bottleneck link bandwidth
(Figure 4.2). Finally, with 10 msec RTT and a 12,500 byte buffer size, we vary the bottleneck link
bandwidth from 1 to 50 Mbps (Figure 4.3). Table 4.1 also shows the average percentage of affected flows.
The loss rates, link utilizations, and TCP fairness with different buffer sizes are shown in Table
4.2. Table 4.3 illustrates these metrics for different RTTs, and Table 4.4 shows them for different
bottleneck link bandwidths. Other observations from the experiments are listed below.
• In general, Figure 4.1 shows that increasing the buffer size also increases the degree of TCP loss
  synchronization.
• Figure 4.2 shows that increasing the delay clearly increases the degree of TCP loss synchronization.
• Increasing or decreasing the bottleneck link bandwidth affects the degree of TCP loss synchronization
  unpredictably (Figure 4.3): increasing the bandwidth can either increase or reduce the degree of
  synchronization, and in any case the degree of TCP loss synchronization does not change noticeably.
• Table 4.1 also shows that increasing either the buffer size or the RTT increases the average percentage
  of affected flows, whereas increasing the bottleneck link bandwidth does not.
• Tables 4.2, 4.3, and 4.4 show the packet loss rates with different buffer sizes, RTTs, and bottleneck
  link bandwidths. As in section 3, the larger the buffer, the smaller the loss rate; the longer the delay,
  the smaller the loss rate; and the greater the bottleneck link bandwidth, the smaller the loss rate.
• The link utilizations are also shown in these three tables. With 50 TCP flows, when the buffer size is
  increased, the link utilization increases only slightly because the link is already well utilized.
  Increasing the bottleneck link bandwidth also increases the link utilization, whereas increasing the
  RTT reduces it.
• Tables 4.2, 4.3, and 4.4 also show the degree of TCP fairness. In general, increasing either the buffer
  size or the bottleneck link bandwidth increases the degree of TCP fairness, but increasing the RTT can
  either increase or reduce it.
Figure 4.1: Cumulative fraction of congestion events of 10 msec RTT with different buffer sizes (The
notations 3125B, 6250B, 12500B, 25000B, and 50000B represent the router buffer size at
3,125, 6,250, 12,500, 25,000 and 50,000 bytes.) for 50 TCP flows
Figure 4.2: Cumulative fraction of congestion events of 62,500 byte buffer size with different RTT (5,
10, 50, 100, and 250 msec) for 50 TCP flows
Figure 4.3: Cumulative fraction of congestion events of 10 msec RTT and 12,500 byte buffer size with
different bottleneck link bandwidths (1, 5, 10, 25, and 50 Mbps) for 50 TCP flows
Table 4.1: Average percentage of affected flows

(a) With different buffer sizes
  Buffer size (bytes)               3,125    6,250    12,500   25,000   50,000
  Average affected flows (%)          8        10       12       16       20

(b) With different RTT
  RTT (msec)                          5        10       50       100      250
  Average affected flows (%)          6        10       32        48       62

(c) With different bottleneck link bandwidths
  Bandwidth (Mbps)                    1         5       10        25       50
  Average affected flows (%)         14        12       12        14       16
Table 4.2: Buffer sizes vs. loss rate, link utilization, and fairness for 50 TCP flows with 10 msec RTT
and 25 Mbps bottleneck link bandwidth

  Buffer size (bytes)       3,125    6,250    12,500   25,000   50,000
  Loss rate (%)             14.09    14.81    13.28    12.05     9.05
  Link utilization (%)      98.85    99.16    99.16    99.16    99.16
  Fairness index            0.0088   0.1338   0.1554   0.1670   0.2235
Table 4.3: RTT vs. loss rate, link utilization, and fairness for 50 TCP flows with 62,500 byte buffer size
and 25 Mbps bottleneck link bandwidth

  RTT (msec)                  5        10       50       100      250
  Loss rate (%)              8.85     8.08     5.59     3.48     2.98
  Link utilization (%)      99.21    99.16    98.74    96.01    83.74
  Fairness index            0.3254   0.1959   0.5793   0.3988   0.2700
Table 4.4: Bottleneck link bandwidths vs. loss rate, link utilization, and fairness for 50 TCP flows with
10 msec RTT and 12,500 byte buffer size

  Bandwidth (Mbps)            1         5       10       25       50
  Loss rate (%)             21.98    14.83    12.28     9.60     6.83
  Link utilization (%)      97.48    98.59    99.16    99.47    99.05
  Fairness index            0.0033   0.0037   0.0128   0.4208   0.4997
5. Discussion
We found in sections 3 and 4 that most of the time TCP flows desynchronize. Also, the extent of
TCP loss synchronization depends mostly on the number of flows, the router buffer size, and the RTT.
With an increased number of flows, we believe that the buffer size could be reduced substantially due to
the lack of synchronization at the router. Thus, in this section, we run experiments based on recently
proposed router buffer sizes: [Appenzeller et al., 2004], [Dhamdhere et al., 2005], and [Le et al., 2005].
We also compare these results to the originally recommended buffer size, the bandwidth-delay product
[Villamizar and Song, 1994]. We plot the loss rates, link utilizations, and degree of TCP fairness for each
model and observe how these buffer sizes behave in a real network. We fix the number of flows at 50
connections and the bottleneck link bandwidth at 25 Mbps, and vary the total delay from 5 to 250 msec.
In 1994, Villamizar and Song [Villamizar and Song, 1994] proposed the bandwidth-delay product
(BW*RTT) rule, where the delay refers to the RTT. They measured the link utilization of an approximately
40 Mbps bottleneck link with 1, 4, and 8 long-lived TCP flows and found that, in order to guarantee full
link utilization, the router needs a buffer at least as big as the product of the delay and the link capacity.
A second model, the so-called "Stanford model" [Appenzeller et al., 2004], aims not only to
minimize the buffer size but also to achieve link utilization of almost 100%. Its authors ran tests on both
simulated and real networks (with dedicated routers and hosts) and claim that the model works especially
well for core routers, where the number of flows is quite large. Given N flows, the required buffer size is
RTT*BW/√N; with one flow, the buffer size is the same as BW*RTT. This result is based on the degree of
flow desynchronization and independence. As a result, a 2.5 Gbps link carrying 10,000 flows can reduce
its router buffer size by 99%.
A third model, the “Georgia Tech model” [Dhamdhere et al., 2005], based the recommended
buffer size on the constraints of loss rate, queuing delay, and link utilization. Based on NS2 simulation,
they claim that the buffer requirement should depend on not only the number of flows but also on both the
degree of synchronization and the harmonic mean of RTT. We use this model to determine buffer sizing
for congested links (BSCL). Table 5.1 shows the BSCL formula and the values that we use for our
experiment.
Table 5.1: BSCL formula and parameters

BSCL formula:
  B = max{Bq, Bp}
  Bq = {q(Nb)*Ce*Te - 2*M*Nb*[1 - q(Nb)]} / [2 - q(Nb)]
  Bp = Kp*Nb - Ce*Te
  Kp = 0.87/√(loss rate)
  q(Nb) = 1 - (1 - 1/Nb)^(L'Nb), with L'Nb ≈ α*Nb
  where Nb is the number of LBP (Locally Bottlenecked Persistent) flows at the target link, Ce is the
  effective capacity for the LBP flows, Te is the effective RTT of the LBP flows, and M is the
  maximum segment size.

Our parameters:
  Nb = 50 long-lived TCP flows
  Ce = 25 Mbps (bottleneck link)
  Te = RTT, between 5 and 250 msec
  M = 576 bytes (including IP and TCP headers)
  Loss rate = 10%
  α (loss synchronization factor) = 0.5, as the paper recommends
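The sketch below is a direct transcription of the Table 5.1 formula, under the assumption that the
per-flow window term Kp*Nb is in packets and must be converted to bytes via M; that unit convention is
not spelled out in the table, and the resulting values are close to, but not identical with, the Georgia Tech
column of Table 5.2, so they should be checked against the original paper before reuse.

    import math

    def bscl_bytes(n_b, ce_bps, te_sec, mss=576, loss_rate=0.10, alpha=0.5):
        """BSCL buffer size (Table 5.1), with window terms converted to bytes (our assumption)."""
        ce_te = ce_bps * te_sec / 8                    # bandwidth-delay product in bytes
        q = 1 - (1 - 1 / n_b) ** (alpha * n_b)         # loss synchronization probability q(Nb)
        k_p = 0.87 / math.sqrt(loss_rate)              # per-flow window (packets) for the target loss
        b_q = (q * ce_te - 2 * mss * n_b * (1 - q)) / (2 - q)
        b_p = k_p * n_b * mss - ce_te
        return max(b_q, b_p)

    for rtt in (0.005, 0.01, 0.05, 0.1, 0.25):
        print(rtt, round(bscl_bytes(50, 25e6, rtt)))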
[Le et al., 2005] presented a revised model of router buffer sizing. They found that the Stanford
model tends to underestimate the router buffer size, while the Georgia Tech model and the bandwidth-delay
product model overestimate it. Based on their analysis of the Georgia Tech model, they claim that the
recommended buffer size should be 1.8*N - C*T (the UNC model), where N is the number of flows (the
1.8*N term is in packets), C is the link capacity, and T is the median RTT.
Table 5.2 shows the recommended buffer sizes for each model with the number of flows set to 50
connections. We also use the loss rate model of [Morris, 2000] (Table 5.3), which relates the number of
flows to the router queue size as loss rate = 0.76*N^2/S^2, where N is the total number of flows and S is
the buffer size plus the amount of in-flight data, measured in packets. Figures 5.1, 5.2, and 5.3 show the
loss rates, link utilizations, and TCP fairness index for each model.
Table 5.2: Recommended buffer sizes for each model (50 flows)

  RTT (sec)   BW (Mbps)   BW*RTT (bytes)   Stanford (bytes)   Georgia (bytes)   UNC (bytes)
    0.005        25           15,625            2,210             63,609           36,215
    0.01         25           31,250            4,419             47,984           20,590
    0.05         25          156,250           22,097             33,482         -104,410
    0.1          25          312,500           44,194             16,963         -260,660
    0.25         25          781,250          110,485            171,525         -729,410
Table 5.3: Percent loss rates of the four models predicted by the Robert Morris loss rate model

  Delay (msec)    BW*RTT (%)    Stanford (%)    Georgia (%)    UNC (%)
        5           64.5503       198.1766        10.0410       23.4568
       10           16.1376        49.5469        10.0410       23.4568
       50            0.6455         1.9818         1.7511        N/A
      100            0.1614         0.4955         0.5807        N/A
      250            0.0258         0.0793         0.0694        N/A
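As a cross-check of Tables 5.2 and 5.3, the BW*RTT, Stanford, and UNC buffer sizes and the Morris
loss-rate prediction can be computed directly. The sketch below assumes a 576-byte packet size and counts
the in-flight data as one BW*RTT, which reproduces the tabulated values.

    import math

    MSS = 576                        # bytes per packet (Table 2.1)
    N, BW = 50, 25e6                 # 50 flows, 25 Mbps bottleneck

    def bdp(rtt):      return BW * rtt / 8                # BW*RTT model, bytes
    def stanford(rtt): return bdp(rtt) / math.sqrt(N)     # BW*RTT / sqrt(N), bytes
    def unc(rtt):      return 1.8 * N * MSS - bdp(rtt)    # 1.8*N packets minus C*T, in bytes

    def morris_loss(buffer_bytes, rtt):
        """Morris prediction: loss = 0.76 * N^2 / S^2, where S = buffer + in-flight data, in packets."""
        s_packets = (buffer_bytes + bdp(rtt)) / MSS
        return 100 * 0.76 * N ** 2 / s_packets ** 2       # percent

    for rtt in (0.005, 0.01, 0.05, 0.1, 0.25):
        print(rtt, round(bdp(rtt)), round(stanford(rtt)), round(unc(rtt)),
              round(morris_loss(bdp(rtt), rtt), 4), round(morris_loss(stanford(rtt), rtt), 4))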
One interesting observation from Table 5.2 is that the UNC model cannot predict the buffer size
when the RTT is large (an RTT of 50 msec or more yields negative values). Table 5.3 shows that the
Morris loss rate prediction does not produce a feasible ratio when the buffer is not big enough, such as for
the 5 msec RTT case of the Stanford model (198.1766%).
Figure 5.1: Percent loss rates for each model
Figure 5.2: Percent link utilizations for each model
Figure 5.3: Fairness index (min/max) for each model
Figures 5.1, 5.2, and 5.3 show the loss rates, link utilizations, and fairness index for each model
with a 25 Mbps bottleneck link and 50 TCP flows. The results show that the Stanford model performs the
worst with respect to the loss rate (14%, compared to less than 10% for the others), but as the delay
increases, the Stanford model's loss rate drops below 5%. Considering the link utilizations, the UNC
model outperforms the others (99.55%) with a small buffer size. On the other hand, the Stanford model
performs the worst (55.76% at 5 msec RTT and 75.21% at 10 msec RTT, compared to more than 90% for
the other models). However, the link utilization of the Stanford model keeps increasing with longer
delays. For the fairness index, in general, the Georgia Tech model outperforms the others except when the
average RTT is very large.
Overall, the Georgia Tech model achieves not only the minimum loss rate but also high link
utilization and good TCP fairness for small delays. For the UNC model, although the buffer requirement
is smaller than the Georgia Tech model's, the loss rates are higher and the fairness index is lower. The
Stanford model does not explain why its link utilization is quite low (below 80%) at either 5 or 10 msec
RTT. However, it performs well when the RTT is large: 90.56% at 100 msec, compared to 89.54% for the
Georgia Tech model. We believe the high loss rate may affect the link utilization: the router queue often
goes empty if the buffer is not big enough, as shown in Figure 3.4 (a 3,125 byte buffer vs. 2,210 bytes in
the Stanford model).
We conclude that, comparatively, the Georgia Tech model provides high link utilization and a
lower loss rate with good fairness if the average RTT is small (less than 50 msec). For longer delays, the
Stanford model outperforms the others. Moreover, Figure 5.1 confirms that the Stanford model effectively
ignores the loss rate, which might lead to lower throughput when the buffer size is small. However, as its
authors claim, this model works well if the number of flows is quite large (up to 10,000 flows) and the
bottleneck link bandwidth is high (2.5 Gbps). On the other hand, the Georgia Tech model is most suitable
for access links (lower bottleneck link bandwidths).
6. Conclusion
The degree of loss synchronization between TCP connections sharing a bottleneck network link
is investigated in this project to understand the variability of the total TCP load and the resulting link
utilization. We run the experiments in the ONL and build a dedicated statistics plugin to obtain
packet-level statistics at the router. The parameters in our study are the router buffer sizes, propagation
delays, and link capacities. The loss rates, link utilizations, and TCP fairness index are also observed.
To observe the impact of the number of flows on the degree of TCP loss synchronization, we vary
the number of flows from 5 to 400 connections with fixed delay, buffer size, and bottleneck link
bandwidth. We found that as the number of flows increases, some connections remain synchronized, but
never all of them; most flows desynchronize. We also observe the impact of buffer sizes, delays, and
bottleneck link bandwidths on the degree of TCP loss synchronization. With increased delays, the degree
of total synchronization clearly increases. It does not change noticeably when the bottleneck link speed is
increased. The buffer size affects the degree of TCP loss synchronization, but erratically. Moreover, we
also report the loss rates, link utilizations, and fairness index for each experiment.
Finally, since the chance of total TCP loss synchronization is slight, to utilize the bottleneck link
and to reduce queuing delay, we believe the router buffer size could be reduced below BW*RTT. For
small RTT, the Georgia Tech model provides both a lower loss rate and high link utilization with good
TCP fairness. For longer delays, the Stanford model outperforms the other models with a small buffer
size.
7. References
[Turner et al., 2005] Jonathan S. Turner, Ken Wong, Jyoti Parwatikar, and John Lockwood, “Open
Network Laboratory Tutorial,” 2005, available online at http://onl.wustl.edu.
[Tirumala et al., 2003] Ajay Tirumala, Feng Qin, Jon Dugan, Jim Ferguson, and Kevin Gibbs, “Iperf:
Traffic Generator,” March 2003, available online at http://dast.nlanr.net/Projects/Iperf.
[Appenzeller et al., 2004] Guido Appenzeller, Isaac Keslassy, and Nick McKeown, “Sizing Router
Buffers,” In Proceedings ACM SIGCOMM 2004, September 2004.
[Le et al., 2005] Long Le, Kevin Jeffay, and F. Donelson Smith, “Sizing Router Buffers for Application
Performance,” Technical Report UNC-CS-TR05-111, Department of Computer Science, University of
North Carolina, January 2005.
[Dhamdhere et al., 2005] Amogh Dhamdhere, Hao Jiang, and Constantinos Dovrolis, “Buffer Sizing for
Congested Internet Links,” In Proceedings IEEE INFOCOM 2005, March 2005.
[Morris, 2000] Robert Morris, “Scalable TCP Congestion Control,” In Proceedings IEEE INFOCOM
2000.
[Villamizar and Song, 1994] Curtis Villamizar and Cheng Song, “High Performance TCP in ANSNET,”
ACM SIGCOMM Computer Communication Review Volume 24, Issue 5, October 1994, Pages: 45 - 60.
[Sarolahti and Kuznetsov, 2002] Pasi Sarolahti and Alexey Kuznetsov, “Congestion Control in Linux
TCP,” technical report (Institute for Nuclear Research, Moscow), 2002, available online at
www.cs.helsinki.fi/research/iwtcp/papers/linuxtcp.pdf.
[Morris, 1997] Robert Morris, “TCP Behavior with Many Flows,” IEEE International Conference on
Network Protocols, October 1997.
[Wischik and McKeown, 2005] Damon Wischik and Nick McKeown, “Part I: Buffer Sizes for Core
Routers,” SIGCOMM Computer Communication Review, Vol. 35, No. 3, July 2005, Pages: 75-78.
[Qiu et al., 1999] Lili Qiu, Yin Zhang, and Srinivasan Keshav, “Understanding the Performance of Many
TCP Flows,” In Proceedings of the 7th International Conference on Network Protocols (ICNP’99) 1999.
[Sun et al., 2004] Jinsheng Sun, Moshe Zukerman, King-Tim Ko, Guanrong Chen, and Sammy Chan,
“Effect of Large Buffers on TCP Queuing Behavior,” In Proceedings IEEE INFOCOM 2004.
[Barman et al., 2004] Dhiman Barman, Georgios Smaragdakis, and Ibrahim Matta, “The Effect of Router
Buffer Size on HighSpeed TCP Performance,” In Proceedings IEEE Globecom 2004.
[Gorinsky et al., 2005] Sergey Gorinsky, Anshul Kantawala, and Jonathan S. Turner, “Link Buffer Sizing:
A New Look at the Old Problem,” In Proceedings ISCC 2005, 2005, Pages: 507 - 514.
[Avrachenkov et al., 2002] K.E. Avrachenkov, U. Ayesta, E. Altman, P. Nain, and C. Barakat, “The
Effect of Router Buffer Size on the TCP performance,” In Proceedings of LONIIS Workshop on
Telecommunication Networks and Teletraffic Theory, January 2002.
[F. Riley, 2002] George F. Riley, “On Standardized Network Topologies for Network Research,” In
Proceedings of the 2002 Winter Simulation Conference, Pages: 664 - 670.
[Sawashima et al., 1997] Hidenari Sawashima, Yoshiaki Hori, Hideki Sunahara, and Yuji Oie,
“Characteristics of UDP Packet Loss: Effect of TCP Traffic,” INET97, Japan 1997.
[Floyd and Jacobson, 1993] Sally Floyd and Van Jacobson, “Random Early Detection gateways for
Congestion Avoidance,” IEEE/ACM Transactions on Networking, Vol. 1, No. 4, August 1993, Pages: 397-413.
[NLANR, 1996] National Laboratory for Applied Network Research (NLANR), “Flow
Characterization,” available online at http://www.nlanr.net/Flowsresearch/fixstats.21.6.html.
8. Appendix
The graphs below show the average link utilizations and queue sizes of the BW*RTT model from
section 5 (Figure 5.2) at 5, 10, 100, and 250 msec RTT with a 25 Mbps bottleneck link bandwidth.
The green line represents the total bandwidth at the bottleneck link. The black and red lines represent the
logging hosts’ link bandwidths.
(a) RTT = 5 msec
99.54% average link utilization
(b) RTT = 10 msec
98.48% average link utilization
(c) RTT = 100 msec
93.82% average link utilization
(d) RTT = 250 msec
77.38% average link utilization
Figure 7.1: Average link utilizations and queue sizes of BW*RTT model at 5, 10, 100, and 250 msec
RTT with 25 Mbps bottleneck link bandwidth