TCP CONGESTION CONTROL
by Srinath Gopalan and Suranjan Pramanik
Table of Contents
• Motivation
• Terminology
• Implementation Schemes
• Simulation Results
• References
Motivation
• Exponential Increase in Network Demand
— Rising packet loss rates
— Low utilization and goodput
— Potential for congestion collapse
• Need for End-to-End congestion control
— To avoid congestion collapse
— Fairness
— As a tool for the application to better achieve its
own goals:
e.g. minimizing loss and delay and
maximizing throughput
Congestion Control Before
TCP in the early 80’s
— TCP flow control to avoid overflowing receiver’s
buffer.
— TCP’s Go-Back-N retransmission.
— FIFO scheduling, drop tail queue management.
A series of congestion collapses in 1986
— Congestion collapse: Paths clogged with
unnecessarily-retransmitted packets [Nagle 84]
Congestion Control Today
• TCP
— Instrumental in preventing congestion collapse
— Limits transmission rate at the source
— Window-based rate control
• Increased and decreased based on network feedback
• Implicit congestion signal based on packet loss
• Slow-start, Congestion avoidance, Fast-retransmit,
Fast-recovery
• Exponential backoff of the retransmit timer, when
a retransmitted packet is itself dropped.
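The exponential backoff of the retransmit timer can be sketched as follows. This is a minimal illustration, not any particular stack's implementation: the function names and the 64-second cap (`max_rto`) are assumptions chosen for the example.

```python
def backoff_rto(rto, max_rto=64.0):
    """Double the retransmission timeout, up to an assumed cap."""
    return min(rto * 2, max_rto)


def rto_sequence(initial_rto, retries, max_rto=64.0):
    """Timeouts used for successive retransmissions of the same packet."""
    timeouts = []
    rto = initial_rto
    for _ in range(retries):
        timeouts.append(rto)
        # The retransmitted packet was itself dropped: back off again.
        rto = backoff_rto(rto, max_rto)
    return timeouts
```

Each further loss of the same packet doubles the wait, so a congested path sees retransmissions from this sender thin out quickly.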
Terminology
Sender Maximum Segment Size (SMSS) - The size
of the largest segment that the sender can transmit.
Receiver Window (rwnd) - The most recently
advertised receiver window.
Congestion Window (cwnd) - A TCP state variable
which limits the amount of data a TCP can send.
Initial Window (IW) - Size of the sender’s congestion
window after the three-way handshake is completed.
Terminology contd....
Flight Size - The amount of data that has been sent
but not yet acknowledged.
Slow Start Threshold (ssthresh) - A TCP state
variable used to determine whether the slow start or the
congestion avoidance algorithm controls transmission.
Maximum Burst (maxburst) - A TCP state variable
which limits the amount of data that can be sent after
coming out of Fast Recovery.
TCP Congestion Control
Mechanisms/Algorithms
Basic control mechanism: sliding windows
Modern TCP implementations contain a number of
algorithms aimed at controlling network
congestion while maintaining good user
throughput
— Slow Start
— Congestion avoidance
— Fast retransmit
— Fast recovery
TCP-Tahoe implements the first 3 algorithms
TCP-Reno implements all 4 algorithms
Slow Start
Why is slow start needed?
With network conditions unknown, TCP needs to probe
the network slowly to determine the available capacity
Slow start is used at the beginning of a transfer or
after a retransmission timeout
TCP increments cwnd by at most SMSS bytes for
each ACK received (exponential increase: cwnd roughly
doubles every round-trip time)
Slow start ends when cwnd > ssthresh or when
congestion is observed.
On timeout, ssthresh = max(Flight Size/2, IW)
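A minimal sketch of the slow-start rules above, under stated assumptions: the 1460-byte `SMSS`, the function names, and one ACK per full-sized segment are choices made for the example, and resetting cwnd to one segment after a timeout follows RFC 2581 rather than anything stated on the slide.

```python
SMSS = 1460  # assumed sender maximum segment size in bytes


def slow_start_on_ack(cwnd, ssthresh, acked_bytes, smss=SMSS):
    """Grow cwnd by at most one SMSS per ACK while in slow start."""
    if cwnd > ssthresh:
        return cwnd  # slow start has ended; congestion avoidance takes over
    return cwnd + min(acked_bytes, smss)


def on_timeout(flight_size, iw, smss=SMSS):
    """Return (ssthresh, cwnd) after a retransmission timeout."""
    ssthresh = max(flight_size // 2, iw)
    cwnd = smss  # assumption: restart from one segment, as in RFC 2581
    return ssthresh, cwnd


def simulate_rounds(rounds, ssthresh, smss=SMSS):
    """Return cwnd in segments at the start of each round trip."""
    cwnd = smss
    history = []
    for _ in range(rounds):
        history.append(cwnd // smss)
        acks = cwnd // smss  # assumption: one ACK per segment in flight
        for _ in range(acks):
            cwnd = slow_start_on_ack(cwnd, ssthresh, smss, smss)
    return history
```

Because every ACK adds one segment and a window of n segments returns n ACKs, the window doubles each round trip, which is why the per-ACK increase is exponential rather than additive.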
Figures: TCP without Slow-Start, TCP with Slow-Start
Congestion Avoidance
Starts when cwnd > ssthresh
cwnd is incremented by at most 1 full-sized segment
per round-trip time (additive increase)
On each ACK: cwnd += SMSS * SMSS / cwnd
Stops when congestion is detected (timeout)
Sender keeps at most min(cwnd, rwnd) bytes outstanding
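The congestion avoidance update and the send limit can be sketched as below. The function names are hypothetical, byte counts are integers, and the one-byte floor on the increment is an assumption that keeps the window from stalling when integer division yields zero.

```python
def congestion_avoidance_on_ack(cwnd, smss):
    """Apply cwnd += SMSS * SMSS / cwnd for one ACK (integer bytes)."""
    increment = smss * smss // cwnd
    return cwnd + max(increment, 1)  # assumed floor of one byte


def usable_window(cwnd, rwnd, flight_size):
    """Bytes that may still be sent: min(cwnd, rwnd) minus data in flight."""
    return max(min(cwnd, rwnd) - flight_size, 0)


def one_round_trip(cwnd, smss):
    """Apply the per-ACK update once for each segment in the window."""
    for _ in range(cwnd // smss):
        cwnd = congestion_avoidance_on_ack(cwnd, smss)
    return cwnd
```

Over one round trip the per-ACK increments add up to a little less than one SMSS, which matches the "at most 1 full-sized segment per round-trip time" rule.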
Fast Retransmit
TCP’s coarse-grained timeout is inefficient: the sender waits too
long before it retransmits
The receiver gets out-of-order packets and sends an ACK for
the packet it expects next
The sender sees these as duplicate ACKs.
After 3 duplicate ACKs, the sender retransmits the first
unacknowledged packet without waiting for the retransmit
timeout
set ssthresh = max(Flight Size/2, IW) ----- (1)
set cwnd = ssthresh + 3*SMSS
Fast Recovery
For each additional dup. ACK increase cwnd by
SMSS
— Slow start is not performed because dup. ACK
indicates additional segment has left the network
Transmit a segment if allowed by cwnd and rwnd
When the next ACK acknowledges new data,
set cwnd = ssthresh (the value computed in (1)) and leave fast
recovery
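The fast retransmit and fast recovery steps can be put together in a small state sketch. The `RenoState` class and the function names are hypothetical; window sizes are in bytes, and the caller is assumed to perform the actual retransmission when `on_duplicate_ack` returns True.

```python
from dataclasses import dataclass


@dataclass
class RenoState:
    cwnd: int
    ssthresh: int
    dup_acks: int = 0
    in_fast_recovery: bool = False


def on_duplicate_ack(state, flight_size, smss, iw):
    """Process one duplicate ACK; return True if a fast retransmit is due."""
    state.dup_acks += 1
    if state.in_fast_recovery:
        # Each further duplicate ACK means a segment has left the network.
        state.cwnd += smss
        return False
    if state.dup_acks == 3:
        state.ssthresh = max(flight_size // 2, iw)  # equation (1)
        state.cwnd = state.ssthresh + 3 * smss
        state.in_fast_recovery = True
        return True
    return False


def on_new_ack(state):
    """Process an ACK for new data; deflate the window after recovery."""
    if state.in_fast_recovery:
        state.cwnd = state.ssthresh
        state.in_fast_recovery = False
    state.dup_acks = 0
```

With a 10000-byte flight and 1000-byte segments, the third duplicate ACK sets ssthresh to 5000 and cwnd to 8000, and the ACK for new data brings cwnd back to 5000 without a slow start.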
Example of TCP Windowing
[Figure: congestion window versus time in RTTs, with window marks at 1, 2, 4, W, W+1 and 2W, showing the slow-start, congestion avoidance and fast retransmit/recovery phases]
TCP Tahoe
First implementation to include congestion avoidance
mechanisms
Introduced new algorithms: slow-start, congestion
avoidance and fast retransmit
Modified the RTT estimator used for setting
retransmission timeout values
Disadvantages:
• May retransmit packets which have already been
successfully delivered.
TCP Reno
Enhancement of TCP Tahoe
Modifies the Fast retransmit operation to include Fast
recovery
Prevents the pipe from going empty after Fast
retransmit
Avoids the need to slow-start after Fast retransmit, as TCP Tahoe does
Disadvantages:
• Retransmits at most one dropped packet per RTT
• Suffers when multiple packets are dropped from a single
window of data
Two States for TCP Reno
Regular -> Fast Recovery: on 3 duplicate ACKs
Fast Recovery -> Regular: when the ACK for the retransmitted pkt is received
TCP Sack
Implementation
• Three duplicate ACKs are required to trigger
Fast-Recovery.
• Reduce congestion window by half; don’t slow-start
• Response to further duplicate ACKs
Main difference from Reno: behavior when multiple pkts are
lost from a single window of data
Two States for TCP SACK
Regular -> Fast Recovery: on 3 duplicate ACKs
Fast Recovery -> Regular: on an ACK for everything sent before Fast Recovery
TCP SACK Header
IP Header: 20 bytes
TCP Header: 20 bytes
TCP Options: max 40 bytes
SACK option fields, in order: Kind (05), Length, Left edge of Block 1, Right edge of Block 1, Left edge of Block 2, Right edge of Block 2
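The option layout above can be read with a few lines of code. This is a sketch that assumes the RFC 2018 encoding: a one-byte kind (5), a one-byte length covering the whole option, then pairs of 32-bit sequence numbers in network byte order. The function names are hypothetical.

```python
import struct

SACK_KIND = 5  # the "05" field in the option layout


def build_sack_option(blocks):
    """Encode (left_edge, right_edge) pairs as one SACK option."""
    body = b"".join(struct.pack("!II", left, right) for left, right in blocks)
    return bytes([SACK_KIND, 2 + len(body)]) + body


def parse_sack_option(option):
    """Decode one SACK option into a list of (left_edge, right_edge) pairs."""
    if len(option) < 2:
        raise ValueError("option too short")
    kind, length = option[0], option[1]
    if kind != SACK_KIND:
        raise ValueError("not a SACK option")
    if length != len(option) or (length - 2) % 8 != 0:
        raise ValueError("bad SACK option length")
    blocks = []
    for offset in range(2, length, 8):
        blocks.append(struct.unpack_from("!II", option, offset))
    return blocks
```

Each block costs 8 bytes plus 2 bytes of kind and length, so the 40-byte option space holds at most 4 blocks (34 bytes), and fewer when other options share the space.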
TCP SACK contd..
On Entering Fast Recovery
• Retransmit one packet
• Cut the congestion window (“cwnd”) in half
• Estimate the number of packets in the pipe (“pipe”)
TCP SACK contd...
Behavior in Fast Recovery
• When and how much to send?
Whenever the number of packets in the pipe is less
than cwnd.
• What to send?
Fill “holes”, one packet at a time, in sequence
number order. If there are no holes, send new packets
• If a retransmitted packet is itself dropped, then slow-start
• The current implementation waits for a retransmit timer
to detect the dropped packet
TCP SACK contd..
Behavior in Fast Recovery: receiving an ACK
• Duplicate ACK: decrement “pipe”, then call “send”
• An ACK that ends Fast Recovery: call “send”
• An ACK that does not end Fast Recovery (SACK):
decrement “pipe” by two packets, once for the
retransmitted packet and once for the original packet
(now presumed to have been dropped). Call “send”
TCP SACK contd...
Behavior in Fast Recovery: sending a data pkt
• Send if the number of packets in the “pipe” is less than
cwnd
• Use the SACK scoreboard to determine which pkt to send
• Increment “pipe”
• Use the maxburst parameter to limit bursts of new data.
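The “pipe” bookkeeping from the last few slides can be sketched as follows. Packets are counted in whole units, `holes` stands in for the SACK scoreboard as a sorted list of missing packet numbers, and the function names and the default `maxburst` of 4 are assumptions for the example. Applying `maxburst` to every packet sent in one call is also a simplification.

```python
def on_dup_ack(pipe):
    """A duplicate ACK means one packet has left the pipe."""
    return max(pipe - 1, 0)


def on_ack_not_ending_recovery(pipe):
    """Two packets left the pipe: the retransmission and the lost original."""
    return max(pipe - 2, 0)


def sack_send(pipe, cwnd, holes, next_new, maxburst=4):
    """Send while pipe < cwnd; fill holes first, then send new packets.

    Returns (sent_packets, pipe, remaining_holes, next_new).
    """
    sent = []
    holes = list(holes)
    while pipe < cwnd and len(sent) < maxburst:
        if holes:
            pkt = holes.pop(0)  # lowest sequence number first
        else:
            pkt = next_new
            next_new += 1
        sent.append(pkt)
        pipe += 1  # every transmission adds one packet to the pipe
    return sent, pipe, holes, next_new
```

For example, with pipe = 2, cwnd = 5, holes at packets 3 and 5, and packet 9 as the next new packet, one call retransmits 3 and 5 and then sends 9, leaving the pipe full at 5.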
TCP SACK
[Figure: sender and receiver exchanging DATA and ACK packets; the send and receive buffers hold packets 1 to 8, the sender keeps a scoreboard, and Snd.una = 1, Snd.fack = 4, Snd.next = 9]
TCP SACK
[Figure: the same exchange at a later point; the send and receive buffers hold packets 1 to 8, the sender keeps a scoreboard, and Snd.una = 1, Snd.fack = 7, Snd.next = 9]
TCP Reno And Sack
Comparison of throughput and congestion window
One PC at UCLA and another at PSC
Behavior of two FTP connections, one using TCP SACK
and the other TCP Reno
The 2 FTP transfers were done at different times of the day,
with different network traffic
Key:
Seq.: sequence number of the packet
cwnd: congestion window
TCP Reno (high)
[Plot: UCLA-PSC Reno; Seq and Cwnd curves; axis ticks up to 1.2e+07 (vertical) and 140 (horizontal)]
TCP SACK (high)
[Plot: UCLA-PSC SACK; Seq and Cwnd curves; axis ticks up to 1.2e+07 (vertical) and 140 (horizontal)]
Results
Throughput
TCP Sack : 81 kbytes/s
TCP Reno : 63 kbytes/s
TCP Sack / TCP Reno : 1.29
TCP Reno (Avg.)
[Plot: UCLA-PSC Reno; Seq and Cwnd curves; axis ticks up to 1.8e+07 (vertical) and 140 (horizontal)]
TCP SACK (Avg.)
[Plot: UCLA-PSC SACK; Seq and Cwnd curves; axis ticks up to 1.8e+07 (vertical) and 140 (horizontal)]
Results
Throughput
TCP Sack : 132 Kbytes/s
TCP Reno : 104 Kbytes/s
TCP Sack / TCP Reno : 1.27
TCP Reno (Low)
[Plot: UCLA-PSC Reno; Seq and Cwnd curves; axis ticks up to 4.0e+07 (vertical) and 140 (horizontal)]
TCP SACK (Low)
[Plot: UCLA-PSC SACK; Seq and Cwnd curves; axis ticks up to 4.0e+07 (vertical) and 140 (horizontal)]
Results
Throughput
TCP Sack : 257 Kbytes/s
TCP Reno : 221 Kbytes/s
TCP Sack / TCP Reno : 1.16
TCP NewReno
Enhances the performance of TCP Reno without the
addition of SACK
Used to recover from multiple packet losses in a single
window of data
Eliminates TCP Reno’s wait for the retransmit timer
when multiple packets are lost from a window
Uses partial ACKs:
a partial ACK acknowledges some but not all of the packets that were
outstanding at the start of that Fast recovery period
TCP NewReno contd...
Behavior in Fast Recovery
• What to send?
The packet immediately following the packet
acknowledged by the partial ACK.
• If a retransmitted packet is itself dropped, then slow-start
• The current implementation waits for a retransmit timer
to detect the dropped packet
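The partial ACK rule can be sketched as a small classifier. Packets are numbered in whole units for simplicity; `recover` (the highest packet outstanding when Fast Recovery began) follows the naming in RFC 2582, while the function name and the return format are hypothetical.

```python
def newreno_on_ack(acked, last_acked, recover):
    """Classify a cumulative ACK received during Fast Recovery.

    acked      -- highest packet number this ACK acknowledges
    last_acked -- highest packet number acknowledged before this ACK
    recover    -- highest packet outstanding when Fast Recovery started
    Returns (kind, packet_to_retransmit_or_None).
    """
    if acked <= last_acked:
        return ("duplicate", None)
    if acked < recover:
        # Partial ACK: the packet right after the acknowledged one is
        # presumed lost and is retransmitted without waiting for a timeout.
        return ("partial", acked + 1)
    return ("full", None)  # everything outstanding is acknowledged
```

A partial ACK keeps the sender in Fast Recovery and triggers one retransmission per round trip, which is how NewReno avoids Reno's wait for the retransmit timer.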
TCP Vegas
Uses a different congestion avoidance mechanism
than TCP Reno
TCP Reno treats packet loss as the signal of
network congestion, while TCP Vegas uses the
difference between the expected and actual rates to adjust its
window size.
Diff = (Expected - Actual) * BaseRTT ------- (1)
TCP Vegas contd.
Source computes Expected = cwnd/BaseRTT
BaseRTT is the minimum round trip time.
Source computes Actual = cwnd/RTT
Computes the estimated backlog in the queue from Diff,
obtained using equation (1)
Source updates its window size based on Diff as follows
cwnd = cwnd + 1   if Diff < α
cwnd = cwnd - 1   if Diff > β
cwnd = cwnd       otherwise
where α < β are the lower and upper backlog thresholds
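The Vegas window update can be sketched directly from equation (1). The function name is hypothetical, cwnd is in packets, and the threshold values alpha = 1 and beta = 3 are assumptions chosen for illustration, not values given in the slides.

```python
def vegas_update(cwnd, base_rtt, rtt, alpha=1.0, beta=3.0):
    """Return the new cwnd (in packets) after one Vegas adjustment."""
    expected = cwnd / base_rtt  # rate if nothing were queued
    actual = cwnd / rtt         # rate actually achieved
    diff = (expected - actual) * base_rtt  # estimated backlog, equation (1)
    if diff < alpha:
        return cwnd + 1  # too little data queued: speed up
    if diff > beta:
        return cwnd - 1  # too much data queued: slow down
    return cwnd
```

With cwnd = 10 and BaseRTT = 0.1 s, a measured RTT of 0.1 s gives Diff = 0 and the window grows to 11, while an RTT of 0.2 s gives Diff = 5 and the window shrinks to 9.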
TCP Vegas contd.
TCP Vegas has a few problems
• Rerouting
— Rerouting a path may change the propagation delay of
the connection
— There is no serious problem for TCP Vegas if the new
route has shorter propagation delay
— For a greater propagation delay BaseRTT must be
updated else this could lead to a substantial decrease
in throughput
TCP Vegas contd..
Problems contd...
• Persistent Congestion
Delay can increase due to either congestion or rerouting
TCP Vegas updates its BaseRTT if there is an increase in
propagation delay
During congestion, however, BaseRTT should not be increased
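TCP Vegas & Reno compared
[Figure: network topology. Hosts S1 and S2 connect to router R1 and hosts S3 and S4 connect to router R2 over 10 Mbps access links with 1 ms delay, except for one 10 Mbps link with delay x ms; R1 and R2 are joined by a 1.5 Mbps, 1 ms link]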
Comparison contd..
x      W1     W2     ACK1      ACK2      Ratio
4      3.5    3.5    21,425    16,068    1.33
13     4.0    7.0    17,522    19,965    1.14
22     4.0    7.0    20,061    17,427    1.15
58     4.0    13.0   19,507    17,973    1.09
148    4.0    30.0   16,398    21,068    1.29
TCP Vegas with varying propagation delays
Comparison contd..
x      ACK1      ACK2      Ratio
4      21,100    15,637    1.35
13     25,460    11,785    2.16
22     25,684    11,672    2.20
58     34,429    2,627     13.11
148    35,598    959       37.12
TCP Reno with varying propagation delays
Comparison
Buffer   ACK(R)    ACK(V)    Reno/Vegas
4        13,010    24,308    0.535
7        16,434    20,903    0.786
10       22,091    15,365    1.438
15       25,397    12,051    2.107
25       30,798    6,621     4.652
50       34,443    2,936     11.730
Throughput of TCP Reno Vs Vegas
TCP Pacing
TCP’s congestion control mechanism can produce
bursty traffic.
Explicit rate control sends packets at a
predetermined rate.
Pacing is a hybrid between pure rate control and
TCP’s use of acknowledgements: it uses the TCP
window to determine how much to send and uses
rates instead of ACKs to determine when to send.
TCP Pacing contd.
Implementation
Timer events are scheduled at regular intervals of duration
RTT/cwnd
A packet is transmitted from the window whenever
the timer fires; this ensures that packet
transmissions are spread across the whole duration
of the RTT.
Pacing imposes the extra overhead of using a timer
for each packet transmitted.
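The timer spacing described above works out to one packet every RTT/cwnd. A minimal sketch, assuming cwnd is counted in packets and using a hypothetical function name:

```python
def pacing_schedule(rtt, cwnd_packets, start=0.0):
    """Return the send time of each packet in one window, spaced RTT/cwnd apart."""
    interval = rtt / cwnd_packets  # timer period
    return [start + i * interval for i in range(cwnd_packets)]
```

For an RTT of 1.0 s and a window of 4 packets the timer fires every 0.25 s, so the packets leave at 0.0, 0.25, 0.5 and 0.75 s instead of back to back at the start of the round trip.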
Paced Reno & Reno compared
[Figure: network topology for simulation experiments. Senders S1 to Sn connect to BS and receivers R1 to Rn connect to BR over 4x Mbps, 5 ms links; BS and BR are joined by an x Mbps, 40 ms link]
Simulation Results
[Figures: simulation result plots]
Comparisons between SACK,
Reno, NewReno and Tahoe
[Figure: network topology for simulation experiments. S1 connects to R1 over an 8 Mbps, 0.1 ms link; R1 connects to K1 over a 0.8 Mbps, 100 ms link]
R1 is a finite-buffer drop-tail gateway
Simulation with 1 dropped pkt [figures]
Simulation with 2 dropped pkts [figures]
Simulation with 3 dropped pkts [figures]
Simulation with 4 dropped pkts [figures]
References
• [Nagle 84] RFC 896, Congestion Control in IP/TCP Internetworks - J. Nagle
• Congestion Avoidance and Control - Van Jacobson
• [F 98] Revisions to RFC 2001 - Sally Floyd
ftp://ftp.ee.lbl.gov/talks/sf-tcpimpl-aug98.ps
• Simulation Based Comparison of Tahoe, Reno and SACK TCP - Sally
Floyd and Kevin Fall
ftp://ftp.ee.lbl.gov/papers/sacks.ps
• TCP and Successive Fast Retransmits - Sally Floyd
ftp://ftp.ee.lbl.gov/papers/fastretans.ps
• Improving the Start-up Behavior of a Congestion Control Scheme for TCP,
ACM SIGCOMM - J. Hoe
www.acm.org/sigcomm/sigcomm96/program.html
• Issues in TCP Slow Start Restart after Idle - A. Hughes, J. Touch, J. Heidemann
• RFC 2018, TCP Selective Acknowledgment Options - M. Mathis, J. Mahdavi, S. Floyd,
A. Romanow
• RFC 2001 - W. Stevens
• RFC 2581, TCP Congestion Control - M. Allman, V. Paxson, W. Stevens
• RFC 2582, The NewReno Modification to TCP’s Fast Recovery Algorithm - S. Floyd, T. Henderson
• Understanding the Performance of TCP Pacing - Thomas Anderson
• UCLA Internet Research Lab
http://irl.cs.ucla.edu/sack.psc.f.html
http://irl.cs.ucla.edu/sack.f.html