TCP Flow and Congestion Control

TCP - Part II
Relates to Lab 5. This is an extended module that covers TCP data
transport, and flow control, congestion control, and error control in TCP.
Interactive and bulk data
TCP applications can be put into the following categories:
• bulk data transfer
– ftp, mail, http
• interactive data transfer
– telnet, rlogin
TCP has algorithms to deal with each type of application efficiently.
tcpdump of an rlogin session
[Figure: rlogin session from Argon (Argon.cs.virginia.edu) to Neon (Neon.cs.virginia.edu)]
This is the output of typing three characters:
44.062449 argon.cs.virginia.edu.1023 > neon.cs.virginia.edu.login: P 0:1(1) ack 1
44.063317 neon.cs.virginia.edu.login > argon.cs.virginia.edu.1023: P 1:2(1) ack 1 win 8760
44.182705 argon.cs.virginia.edu.1023 > neon.cs.virginia.edu.login: . ack 2 win 17520
48.946471 argon.cs.virginia.edu.1023 > neon.cs.virginia.edu.login: P 1:2(1) ack 2 win 17520
48.947326 neon.cs.virginia.edu.login > argon.cs.virginia.edu.1023: P 2:3(1) ack 2 win 8760
48.982786 argon.cs.virginia.edu.1023 > neon.cs.virginia.edu.login: . ack 3 win 17520
55:00.116581 argon.cs.virginia.edu.1023 > neon.cs.virginia.edu.login: P 2:3(1) ack 3 win 17520
55:00.117497 neon.cs.virginia.edu.login > argon.cs.virginia.edu.1023: P 3:4(1) ack 3 win 8760
55:00.183694 argon.cs.virginia.edu.1023 > neon.cs.virginia.edu.login: . ack 4 win 17520
Rlogin
• “Rlogin” is a remote terminal application
• Originally built only for Unix systems.
• Rlogin sends one segment per character (keystroke)
• Receiver echoes the character back.
• So, we really expect to have four segments per keystroke
Rlogin
• We would expect that tcpdump shows this pattern:
– character
– ACK of character
– echo of character
– ACK of echoed character
• However, tcpdump shows this pattern:
– character
– ACK and echo of character
– ACK of echoed character
• So, TCP has delayed the transmission of an ACK
Delayed Acknowledgement
• TCP delays transmission of ACKs for up to 200ms
• The hope is to have data ready in that time frame. Then, the
ACK can be piggybacked with the data segment.
• Delayed ACKs explain why the ACK and the “echo of
character” are sent in the same segment.
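As a rough check of the 200 ms bound against the local rlogin trace, the delay between each echo arriving at Argon and Argon's ACK for it can be computed from the tcpdump timestamps. The sketch below copies the three timestamp pairs from that trace (only the seconds field is used); the variable names are illustrative.

```python
# (echo received, ACK sent) timestamps in seconds, copied from the local
# rlogin trace. Only the seconds field of each timestamp is used.
pairs = [
    (44.063317, 44.182705),
    (48.947326, 48.982786),
    (0.117497, 0.183694),
]

# Delay of each pure ACK in milliseconds.
ack_delays_ms = [(ack - echo) * 1000 for echo, ack in pairs]

for delay in ack_delays_ms:
    print(f"ACK sent {delay:.1f} ms after the echo arrived")
```

All three delays stay below 200 ms, which is consistent with the ACKs being held back by a delayed-ACK timer rather than sent immediately.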
tcpdump of a wide-area rlogin session
[Figure: rlogin session between argon.cs.virginia.edu and tenet.cs.berkeley.edu]
This is the output of typing 9 characters:
54:16.401963 argon.cs.virginia.edu.1023 > tenet.CS.Berkeley.EDU.login: P 1:2(1) ack 2 win 16384
54:16.481929 tenet.CS.Berkeley.EDU.login > argon.cs.virginia.edu.1023: P 2:3(1) ack 2 win 16384
54:16.482154 argon.cs.virginia.edu.1023 > tenet.CS.Berkeley.EDU.login: P 2:3(1) ack 3 win 16383
54:16.559447 tenet.CS.Berkeley.EDU.login > argon.cs.virginia.edu.1023: P 3:4(1) ack 3 win 16384
54:16.559684 argon.cs.virginia.edu.1023 > tenet.CS.Berkeley.EDU.login: P 3:4(1) ack 4 win 16383
54:16.640508 tenet.CS.Berkeley.EDU.login > argon.cs.virginia.edu.1023: P 4:5(1) ack 4 win 16384
54:16.640761 argon.cs.virginia.edu.1023 > tenet.CS.Berkeley.EDU.login: P 4:8(4) ack 5 win 16383
54:16.728402 tenet.CS.Berkeley.EDU.login > argon.cs.virginia.edu.1023: P 5:9(4) ack 8 win 16384
Wide-area Rlogin: Observation 1
• Transmission of segments follows a different pattern:
– char1
– ACK of char1 + echo of char1
– ACK + char2
– ACK + echo of char2
• The delayed acknowledgment does not kick in
• Reason is that there is always data at Argon when the ACK arrives.
Wide-area Rlogin: Observation 2
• There are fewer transmissions than there are characters.
• Argon never has multiple segments outstanding.
• This is due to Nagle’s Algorithm:
Each TCP connection can have only one small (1-byte) segment outstanding that has not been acknowledged.
• Implementation: Send one byte and buffer all subsequent bytes until the acknowledgement is received. Then send all buffered bytes in a single segment. (Only enforced if data is arriving from the application one byte at a time)
• Nagle’s rule reduces the number of small segments.
• The algorithm can be disabled.
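The buffering rule can be sketched as a small simulation. This is a simplified model, not an actual TCP implementation: it assumes one byte per keystroke, a fixed round-trip time, and an ACK that arrives exactly one RTT after each segment is sent. The function name `nagle_segments` is hypothetical.

```python
def nagle_segments(keystroke_times, rtt):
    """Return the sizes (in bytes) of the segments sent under Nagle's rule.

    keystroke_times: times at which the application writes one byte.
    rtt: assumed fixed delay between sending a segment and receiving its ACK.
    """
    segments = []
    buffered = 0      # bytes held back while a segment is unacknowledged
    ack_due = None    # arrival time of the ACK for the outstanding segment

    for t in sorted(keystroke_times):
        # Process every ACK that arrived before this keystroke.
        while ack_due is not None and ack_due <= t:
            if buffered:
                # ACK received: send all buffered bytes in one segment.
                segments.append(buffered)
                buffered = 0
                ack_due += rtt
            else:
                ack_due = None
        if ack_due is None:
            # Nothing outstanding: the byte is sent immediately.
            segments.append(1)
            ack_due = t + rtt
        else:
            # A small segment is outstanding: buffer the byte.
            buffered += 1

    if buffered:
        # Remaining bytes leave when the last ACK arrives.
        segments.append(buffered)
    return segments


# Fast typing relative to the RTT: bytes are coalesced.
print(nagle_segments([0, 10, 20, 30], rtt=80))
# Slow typing relative to the RTT: one segment per keystroke.
print(nagle_segments([0, 100, 200], rtt=80))
```

With keystrokes arriving faster than the round-trip time, later bytes share a segment, which matches the 4-byte segment seen in the wide-area trace.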
TCP:
Flow Control
Congestion Control
Error Control
What is Flow/Congestion/Error Control ?
• Flow Control: Algorithms to prevent the sender from overrunning the receiver with information
• Congestion Control: Algorithms to prevent the sender from overloading the network
• Error Control: Algorithms to recover from or conceal the effects of packet losses
• The goal of each of the control mechanisms is different.
• But the implementation is combined
TCP Flow Control
• TCP implements sliding window flow control
• Sending acknowledgements is separated from setting the window size at the sender.
• Acknowledgements do not automatically increase the window size
• Acknowledgements are cumulative
Sliding Window Flow Control
• Sliding Window Protocol is performed at the byte level:
[Figure: bytes 1 to 11 of the stream, grouped as follows]
– 1, 2: sent and acknowledged
– 3, 4, 5: sent but not acknowledged
– 6, 7, 8: can be sent (usable window)
– 9, 10, 11: can't be sent
– The advertised window spans bytes 3 to 8
• Here: Sender can transmit sequence numbers 6, 7, 8.
Sliding Window: “Window Closes”
• Transmission of a single byte (with SeqNo = 6), after which an acknowledgement is received (AckNo = 5, Win = 4):
[Figure: bytes 1 to 11 shown in three states: before the transmission, after "Transmit Byte 6", and after "AckNo = 5, Win = 4 is received"]
Sliding Window: “Window Opens”
• Acknowledgement is received that enlarges the window to the right (AckNo = 5, Win = 6):
[Figure: bytes 1 to 11 shown before and after "AckNo = 5, Win = 6 is received"]
• A receiver opens a window when the TCP buffer empties (meaning that data is delivered to the application).
Sliding Window: “Window Shrinks”
• Acknowledgement is received that reduces the window from the right (AckNo = 5, Win = 3):
[Figure: bytes 1 to 11 shown before and after "AckNo = 5, Win = 3 is received"]
• Shrinking a window should be avoided
Window Management in TCP
• The receiver is returning two parameters to the sender:
– AckNo (32 bits)
– window size, win (16 bits)
• The interpretation is:
– I am ready to receive new data with SeqNo = AckNo, AckNo+1, …, AckNo+Win-1
• Receiver can acknowledge data without opening the window
• Receiver can change the window size without acknowledging data
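The interpretation of (AckNo, Win) can be written down directly. The sketch below is illustrative only: the function names and the `next_seq` parameter (the next sequence number the sender has not yet transmitted) are assumptions, and the example values are the ones used on the window slides.

```python
def receive_window(ack_no, win):
    """Sequence numbers the receiver is ready to accept."""
    return range(ack_no, ack_no + win)


def usable_window(ack_no, win, next_seq):
    """Sequence numbers the sender may still transmit.

    next_seq is the first sequence number not yet sent (assumed to be
    tracked by the sender).
    """
    window = receive_window(ack_no, win)
    return range(max(next_seq, window.start), window.stop)


# Byte 6 has been transmitted, so the next unsent byte is 7.
print(list(usable_window(ack_no=5, win=4, next_seq=7)))  # window closes
print(list(usable_window(ack_no=5, win=6, next_seq=7)))  # window opens
print(list(usable_window(ack_no=5, win=3, next_seq=7)))  # window shrinks
```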
Sliding Window: Example
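[Figure: time sequence between Sender and Receiver; the receiver buffer holds 4K and is initially empty]
– Sender sends 2K of data (SeqNo=0); the receiver buffer holds 2K
– Receiver returns AckNo=2048, Win=2048
– Sender sends 2K of data (SeqNo=2048); the receiver buffer is full (4K)
– Receiver returns AckNo=4096, Win=0; the sender is blocked
– The receiver buffer drains to 3K; receiver returns AckNo=4096, Win=1024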
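The exchange in this example can be replayed with a toy model of the receiver buffer. This is a minimal sketch under the assumption that the advertised window always equals the free buffer space; the class and method names are hypothetical.

```python
class ReceiverBuffer:
    """Toy receiver: advertised window = free space in the buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.stored = 0   # bytes waiting to be read by the application
        self.ack_no = 0   # next byte expected from the sender

    def advertise(self):
        """Return the (AckNo, Win) pair sent back to the sender."""
        return self.ack_no, self.capacity - self.stored

    def receive(self, nbytes):
        """Accept nbytes of in-order data from the sender."""
        if nbytes > self.capacity - self.stored:
            raise ValueError("sender exceeded the advertised window")
        self.stored += nbytes
        self.ack_no += nbytes
        return self.advertise()

    def application_reads(self, nbytes):
        """The application removes up to nbytes from the buffer."""
        self.stored -= min(nbytes, self.stored)
        return self.advertise()


buf = ReceiverBuffer(4096)
print(buf.receive(2048))            # first 2K arrives
print(buf.receive(2048))            # second 2K arrives, buffer is full
print(buf.application_reads(1024))  # application reads 1K
```

The three printed pairs correspond to the three acknowledgements in the example: the window closes to zero when the buffer fills and reopens only when the application reads data, while AckNo stays at 4096.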
TCP Congestion Control
• TCP has a mechanism for congestion control. The
mechanism is implemented at the sender
• The window size at the sender is set as follows:
•Send Window = MIN (flow control window, congestion window)
where
• flow control window is advertised by the receiver
• congestion window is adjusted based on feedback from the
network
TCP Congestion Control
• The sender has two additional parameters:
– Congestion Window (cwnd)
Initial value is 1 MSS (= maximum segment size), counted in bytes
– Slow-start threshold value (ssthresh)
Initial value is the advertised window size
• Congestion control works in two modes:
– slow start (cwnd < ssthresh)
– congestion avoidance (cwnd >= ssthresh)
Slow Start
• Initial value:
– cwnd = 1 segment
• Note: cwnd is actually measured in bytes:
1 segment = MSS bytes
• Each time an ACK is received, the congestion window is increased by
MSS bytes.
– cwnd = cwnd + MSS
– If an ACK acknowledges two segments, cwnd is still increased by only 1
segment.
– Even if ACK acknowledges a segment that is smaller than MSS bytes long,
cwnd is increased by MSS.
• Does Slow Start increment slowly? Not really.
In fact, the increase of cwnd can be exponential
Slow Start Example
• The congestion window size grows very rapidly
– For every ACK, we increase cwnd by 1 MSS irrespective of the number of segments ACK’ed
• TCP slows down the increase of cwnd when cwnd > ssthresh
[Figure: time sequence. cwnd = 1xMSS: segment 1 is sent and ACKed. cwnd = 2xMSS: segments 2 and 3 are sent and ACKed. cwnd = 4xMSS: segments 4, 5 and 6 are sent and ACKed. cwnd = 7xMSS.]
Congestion Avoidance
• Congestion avoidance phase is started if cwnd has reached the slow-start threshold value
• If cwnd >= ssthresh then each time an ACK is received, increment cwnd as follows:
– cwnd = cwnd + MSS * (MSS / cwnd)
• So cwnd is increased by one segment (= MSS bytes) only if all segments in the window have been acknowledged.
Slow Start / Congestion Avoidance
• Here we give a more accurate version than in our earlier discussion of Slow Start:
If cwnd <= ssthresh then
    Each time an ACK is received:
        cwnd = cwnd + MSS
else /* cwnd > ssthresh */
    Each time an ACK is received:
        cwnd = cwnd + MSS * MSS / cwnd
endif
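The per-ACK rule above translates almost line for line into code. The following is a minimal sketch with cwnd, ssthresh and MSS all in bytes; the function name `on_ack` and the example values are illustrative.

```python
def on_ack(cwnd, ssthresh, mss):
    """Return the new congestion window (bytes) after one ACK arrives."""
    if cwnd <= ssthresh:
        # Slow start: grow by one full segment per ACK.
        return cwnd + mss
    # Congestion avoidance: grow by a fraction of a segment per ACK,
    # which adds up to about one segment per window of ACKs.
    return cwnd + mss * mss / cwnd


print(on_ack(cwnd=1000, ssthresh=8000, mss=1000))   # slow start step
print(on_ack(cwnd=10000, ssthresh=8000, mss=1000))  # congestion avoidance step
```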
Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8
[Figure: cwnd (in segments) plotted against roundtrip times. cwnd = 1, 2, 4, 8 (reaching ssthresh), then cwnd = 9 and cwnd = 10.]
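The sequence in this example can be reproduced with a coarse per-roundtrip model. This sketch counts cwnd in whole segments and makes two simplifying assumptions that are not part of the per-ACK rules: cwnd doubles once per roundtrip while it is below ssthresh (capped at ssthresh), and grows by one segment per roundtrip otherwise. The function name is hypothetical.

```python
def cwnd_per_rtt(ssthresh, rounds, cwnd=1):
    """Return cwnd (in segments) at the start of each roundtrip."""
    history = [cwnd]
    for _ in range(rounds - 1):
        if cwnd < ssthresh:
            # Slow start: one extra segment per ACK doubles cwnd per RTT.
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: about one extra segment per RTT.
            cwnd += 1
        history.append(cwnd)
    return history


print(cwnd_per_rtt(ssthresh=8, rounds=6))
```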
Responses to Congestion
• Most often, a packet loss in a network is due to an overflow at
a congested router (rather than due to a transmission error)
• So, TCP assumes there is congestion if it detects a packet
loss
• A TCP sender can detect lost packets via:
• Timeout of a retransmission timer
• Receipt of a duplicate ACK
• When TCP assumes that a packet loss is caused by
congestion it reduces the size of the sending window
TCP Tahoe
• Congestion is assumed if sender has timeout or receipt of duplicate ACK
• Each time when congestion occurs,
– ssthresh is set to half the current size of the congestion window:
ssthresh = cwnd / 2
– cwnd is reset to one segment:
cwnd = MSS
– and slow-start is entered
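A minimal sketch of the Tahoe reaction follows, with cwnd and MSS in bytes. The point it illustrates is the order of the two assignments: ssthresh has to be derived from the old cwnd before cwnd is reset. The function name is illustrative.

```python
def tahoe_on_congestion(cwnd, mss):
    """Return (new_cwnd, new_ssthresh) after a timeout or duplicate ACK."""
    ssthresh = cwnd / 2   # half of the window at the time of the loss
    cwnd = mss            # back to one segment; slow start begins again
    return cwnd, ssthresh


print(tahoe_on_congestion(cwnd=16000, mss=1000))
```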
Slow Start / Congestion Avoidance
• A typical plot of cwnd for a TCP connection (MSS = 1500
bytes) with TCP Tahoe:
TCP Error Control
Background on Error Control
TCP Error Control
Background: ARQ Error Control
• Two types of errors:
– Lost packets
– Damaged packets
• Most Error Control techniques are based on:
1. Error Detection Scheme (Parity checks, CRC).
2. Retransmission Scheme.
• Error control schemes that involve error detection and
retransmission of lost or corrupted packets are referred to as
Automatic Repeat Request (ARQ) error control.
Background: ARQ Error Control
• All retransmission schemes use all or a subset of the following procedures:
– Positive acknowledgments (ACK)
– Negative acknowledgments (NACK)
• All retransmission schemes (using ACK, NACK or both) rely on the use of timers
• The most common ARQ retransmission schemes are:
– Stop-and-Wait ARQ
– Go-Back-N ARQ
– Selective Repeat ARQ
• The protocols for sending ACKs in all ARQ schemes are based on the sliding window flow control scheme
Background: Stop-and-Wait ARQ
• Stop-and-Wait ARQ is an addition to the Stop-and-Wait flow
control protocol:
• Packets have 1-bit sequence numbers (SN = 0 or 1)
• Receiver sends an ACK (1-SN) if packet SN is correctly
received
• Sender waits for an ACK (1-SN) before transmitting the next
packet with sequence number 1-SN
• If sender does not receive anything before a timeout value
expires, it retransmits packet SN
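The alternating 1-bit sequence number and the timeout rule can be traced with a short simulation. This is a simplified sketch: only data packets can be lost (ACKs always arrive), and losses are specified by the index of the transmission attempt. The function name and the log format are hypothetical.

```python
def stop_and_wait(payloads, lost_attempts):
    """Simulate Stop-and-Wait ARQ with 1-bit sequence numbers.

    lost_attempts: set of 0-based transmission attempt indices that are
    lost in the network (assumption: ACKs are never lost).
    Returns the payloads delivered to the receiver and an event log.
    """
    sn = 0        # sequence number of the packet being sent (0 or 1)
    attempt = 0   # counts every transmission, including retransmissions
    delivered = []
    log = []
    for payload in payloads:
        acked = False
        while not acked:
            lost = attempt in lost_attempts
            attempt += 1
            if lost:
                # Nothing comes back before the timeout: send packet SN again.
                log.append(("timeout, retransmit", sn))
            else:
                # Packet SN arrives; the receiver answers with ACK (1-SN).
                delivered.append(payload)
                log.append(("ACK", 1 - sn))
                acked = True
        sn = 1 - sn   # next packet uses the other sequence number
    return delivered, log


delivered, log = stop_and_wait(["a", "b", "c"], lost_attempts={1})
for event in log:
    print(event)
```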
Background: Stop-and-Wait ARQ
• Lost Packet
[Figure: time sequence between A and B; a packet is lost and A retransmits it after the timeout]
Background: Go-Back-N ARQ
Operations:
– A station may send multiple packets as allowed by
the window size
– Receiver sends a NAK i if packet i is in error. After that, the
receiver discards all incoming packets until the packet in error
was correctly retransmitted
– If sender receives a NAK i it will retransmit packet i and all
packets i+1, i+2,... which have been sent, but not been
acknowledged
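Both sides of these rules can be sketched in a few lines. The model below is illustrative only: packets are identified by consecutive integers, `next_to_send` stands for the first packet number the sender has not yet transmitted, and the receiver simply discards anything that is not the packet it expects next. The function names are hypothetical.

```python
def go_back_n_retransmit(nak, next_to_send):
    """Packets the sender retransmits after NAK i (or a timeout for i):
    packet i and every later packet that has already been sent."""
    return list(range(nak, next_to_send))


def go_back_n_receiver(arrivals, first_expected=1):
    """Receiver side: deliver in-sequence packets, discard all others."""
    expected = first_expected
    delivered, discarded = [], []
    for seq in arrivals:
        if seq == expected:
            delivered.append(seq)
            expected += 1
        else:
            discarded.append(seq)
    return delivered, discarded


# Packets 1 to 4 have been sent and packet 2 was lost.
print(go_back_n_retransmit(nak=2, next_to_send=5))
# The receiver sees 1, 3, 4 and then the retransmitted 2, 3, 4.
print(go_back_n_receiver([1, 3, 4, 2, 3, 4]))
```

Because the receiver never accepts a packet out of order, it needs no buffer for later packets, at the cost of the sender retransmitting packets that had already arrived.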
Example of Go-Back-N ARQ
[Figure: three snapshots of sender A and receiver B, showing the packets waiting for ACK/NAK at A and the packets received at B]
– Packet 1 is received, send ACK 2
– Time out for Packet 2: retransmit packets 2, 3, 4
• In Go-back-N, if packets are correctly delivered, they are delivered in the correct sequence
• Therefore, the receiver does not need to keep track of “holes” in the sequence of delivered packets
Background: Go-Back-N ARQ
• Lost Packet
[Figure: time sequence between A and B with a lost packet. Labels: "Timeout for Packet 2", "Packets 4,5,6 are retransmitted", "Packets 5 and 6 are discarded"]
Background: Selective-Repeat ARQ
• Similar to Go-Back-N ARQ. However, the sender only retransmits packets for which a time-out occurred
• Advantage over Go-Back-N:
– Fewer Retransmissions.
• Disadvantages:
– More complexity at sender and receiver
– Each packet must be acknowledged individually (no
cumulative acknowledgements)
– Receiver may receive packets out of sequence
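The extra work at the receiver can be illustrated with a small model that buffers out-of-sequence packets and delivers them once the gap is filled. This is a sketch under simple assumptions (integer packet numbers, no limit on the buffer size); the function name and the snapshot list are hypothetical.

```python
def selective_repeat_receiver(arrivals, first_expected=1):
    """Buffer out-of-sequence packets until the missing ones arrive.

    Returns the in-order delivery sequence and, for each arrival, a
    snapshot of the packets held in the buffer afterwards.
    """
    expected = first_expected
    buffered = set()
    delivered = []
    snapshots = []
    for seq in arrivals:
        if seq >= expected:
            buffered.add(seq)
        # Deliver as far as the sequence is now complete.
        while expected in buffered:
            buffered.remove(expected)
            delivered.append(expected)
            expected += 1
        snapshots.append(sorted(buffered))
    return delivered, snapshots


# Packet 2 is lost at first; only packet 2 is retransmitted.
delivered, snapshots = selective_repeat_receiver([1, 3, 4, 2])
print(delivered)
print(snapshots)
```

Packets 3 and 4 sit in the buffer until the retransmitted packet 2 arrives, which is the bookkeeping that Go-Back-N avoids.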
Example of Selective-Repeat ARQ
[Figure: three snapshots of sender A and receiver B, showing the packets waiting for ACK/NAK at A and the frames received at B]
– Packet is correct, send ACK 2
– Packet 2 does not arrive
– Following packets are ACKed (ACK4, ACK5)
– Timeout for packet 2: retransmit only packet 2
• Receiver must keep track of “holes” in the sequence of delivered packets
• Sender must maintain one timer per outstanding packet
Background: Selective-Repeat ARQ
• Lost Packet
[Figure: time sequence between A and B with a lost packet. Timeout for Packet 4: only Packet 4 is retransmitted. Packets 5 and 6 are buffered.]
Error Control in TCP
• TCP implements a variation of the Go-back-N retransmission
scheme
• TCP maintains a Retransmission Timer for each
connection:
– The timer is started during a transmission. A timeout
causes a retransmission
• TCP couples error control and congestion control (i.e., it assumes that errors are caused by congestion)
• TCP allows accelerated retransmissions (Fast Retransmit)
TCP Retransmission Timer
• Retransmission Timer:
– The setting of the retransmission timer is crucial for
efficiency
– Timeout value too small: results in unnecessary retransmissions
– Timeout value too large: long waiting time before a retransmission can be issued
– A problem is that the delays in the network are not fixed
– Therefore, the retransmission timers must be adaptive
Round-Trip Time Measurements
• The retransmission mechanism of TCP is adaptive
• The retransmission timers are set based on round-trip time (RTT) measurements that TCP performs
• The RTT is based on the time difference between segment transmission and ACK
• But:
– TCP does not ACK each segment
– Each connection has only one timer
[Figure: time sequence of Segments 1 to 5 and their ACKs (ACK for Segment 1, ACK for Segments 2 + 3, ACK for Segment 4, ACK for Segment 5), yielding the measurements RTT #1, RTT #2 and RTT #3]
Round-Trip Time Measurements
• Retransmission timer is set to a Retransmission Timeout
(RTO) value.
• RTO is calculated based on the RTT measurements.
• The RTT measurements are smoothed by the following
estimators srtt and rttvar:
srtt_{n+1} = a RTT + (1 - a) srtt_n
rttvar_{n+1} = b |RTT - srtt_{n+1}| + (1 - b) rttvar_n
RTO_{n+1} = srtt_{n+1} + 4 rttvar_{n+1}
• The gains are set to a = 1/8 and b = 1/4
• srtt_0 = 0 sec, rttvar_0 = 3 sec. Also: RTO_1 = srtt_1 + 2 rttvar_1
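One update step of these estimators can be sketched as follows. The function takes the gains as explicit parameters; the example call uses 1/8 for the srtt gain and 1/4 for the rttvar gain, which are the values recommended in RFC 6298. The function name and the sample values are illustrative, and the special first-measurement rule (RTO_1) is not modeled.

```python
def update_rto(srtt, rttvar, rtt, a, b):
    """Apply one RTT measurement and return (srtt, rttvar, rto).

    srtt, rttvar: current smoothed estimate and variation (seconds).
    rtt: new round-trip time measurement (seconds).
    a, b: gains for srtt and rttvar.
    """
    srtt_next = a * rtt + (1 - a) * srtt
    rttvar_next = b * abs(rtt - srtt_next) + (1 - b) * rttvar
    rto_next = srtt_next + 4 * rttvar_next
    return srtt_next, rttvar_next, rto_next


# A measurement equal to the current estimate leaves srtt unchanged
# and lets rttvar decay, so the RTO tightens.
print(update_rto(srtt=1.0, rttvar=0.5, rtt=1.0, a=1/8, b=1/4))
```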
Karn’s Algorithm
• If an ACK for a retransmitted segment is received, the sender cannot tell if the ACK belongs to the original or the retransmission.
[Figure: a segment times out and is retransmitted; when the ACK arrives, it is unclear from which transmission the RTT should be measured]
Karn’s Algorithm:
• Don’t update srtt on any segments that have been retransmitted.
• Each time TCP retransmits, it sets:
RTO_{n+1} = min(2 RTO_n, 64)
(exponential backoff)
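The backoff rule is easy to trace in code. This minimal sketch doubles the RTO on every retransmission and caps it at 64 seconds; the function name and the starting value of 1.5 seconds are illustrative choices.

```python
def backoff_sequence(rto, retransmissions, cap=64):
    """Return the RTO values (seconds) used for successive retransmissions."""
    values = []
    for _ in range(retransmissions):
        rto = min(2 * rto, cap)   # double, but never exceed the cap
        values.append(rto)
    return values


print(backoff_sequence(rto=1.5, retransmissions=7))
```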
Measuring TCP Retransmission Timers
[Figure: argon.tcpip-lab.edu ("Argon") and neon.tcpip-lab.edu ("Neon"), labeled Web client and Web server, with a file transfer between them]
• Transfer file from Argon to Neon
• Unplug the Ethernet cable of Argon in the middle of the file transfer
Interpreting the Measurements
• The intervals between retransmission attempts in seconds are:
1.03, 3, 6, 12, 24, 48, 64, 64, 64, 64, 64, 64, 64.
• Time between retransmissions is doubled each time (Exponential Backoff Algorithm)
• Timer is not increased beyond 64 seconds
• TCP gives up after the 13th attempt and 9 minutes.
[Figure: Seconds (0 to 600) plotted against Transmission Attempts (0 to 12)]
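The "13 attempts and 9 minutes" figure can be checked by adding up the measured intervals. The list below is copied from the measurements on this slide; the variable names are illustrative.

```python
# Measured intervals between retransmission attempts, in seconds.
intervals = [1.03, 3, 6, 12, 24, 48] + [64] * 7

total_seconds = sum(intervals)
print(len(intervals), "intervals")
print(round(total_seconds / 60, 2), "minutes in total")
```

The total comes to roughly nine minutes, which matches the point at which TCP gives up.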