8. TCP Congestion Control

Yanghee Choi
Department of Computer Science and Engineering, Seoul National University
TCP Congestion Control
- Slow-start increase
- Multiplicative decrease
- Congestion avoidance
- Measurement of variation
- Exponential timer backoff
2002 Yanghee Choi
Congestion Control in TCP
- To avoid congestion collapse, TCP must reduce its transmission rate when congestion occurs
- Routers watch queue lengths and use techniques such as ICMP source quench to inform hosts that congestion has occurred
- TCP treats packet drops and timeouts as indications of congestion
- To avoid congestion in advance, the sender must adapt its transmission window to the available link bandwidth
- A TCP connection’s rate is approximately transmission window / round-trip time
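A quick way to see what the rate formula implies is to compute it. This is a minimal sketch: the function name and example numbers are illustrative, and it ignores header overhead, losses, and every limit other than the window.

```python
def window_limited_rate_bps(window_bytes: int, rtt_ms: int) -> int:
    """Hypothetical helper: rate ~= window / RTT, returned in bits per second.

    Integer arithmetic avoids floating-point rounding for whole-millisecond RTTs.
    """
    return window_bytes * 8 * 1000 // rtt_ms


# A full 65535-byte window every 100 ms caps the connection at about 5.2 Mbps,
# however fast the underlying link is.
print(window_limited_rate_bps(65535, 100))  # 5242800
```

Doubling the window or halving the RTT doubles this ceiling.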
Congestion
- Congestion: a condition of severe delay caused by an overload of datagrams at one or more switching points (e.g., at routers)
  • When the sum of the connection rates over a link is higher than the link’s rate, segments can be dropped
[Figure: transmission rate adjustment; labels: transmission network, internal congestion, small-capacity receiver, large-capacity receiver]
Multiplicative Decrease
- Upon loss of a segment, reduce the congestion window by half (down to a minimum of one segment)
- For the segments that remain in the allowed window, back off the retransmission timer exponentially
- Provides a quick and significant traffic reduction, giving routers enough time to clear the datagrams already in their queues
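A minimal sketch of the window reduction described above, with windows in bytes. The function name is hypothetical, and the retransmission-timer backoff is not modeled here.

```python
def multiplicative_decrease(cwnd: int, mss: int) -> int:
    """Hypothetical helper: halve cwnd after a segment loss, never below one segment.

    cwnd and mss are in bytes.
    """
    return max(cwnd // 2, mss)


MSS = 1460
cwnd = 32 * MSS
for loss in range(1, 7):
    cwnd = multiplicative_decrease(cwnd, MSS)
    print(f"after loss {loss}: {cwnd // MSS} segment(s)")
```

Starting from 32 segments, successive losses shrink the window to 16, 8, 4, 2, 1 and then 1 segment again, because the one-segment floor stops further halving.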
Additive Increase
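- Increment = (MSS × MSS) / CongestionWindow
  CongestionWindow = CongestionWindow + Increment
- Goal: add 1 segment to CongestionWindow if every packet sent during the last RTT has been ACKed
- In practice, increment CongestionWindow by a little (the Increment above) for each ACK that arrives; since about CongestionWindow / MSS ACKs arrive per RTT, the total growth is about one MSS per RTT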
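The per-ACK rule can be checked numerically: over one RTT a window of W segments receives about W ACKs, each adding MSS × MSS / CongestionWindow, which sums to roughly one segment. The sketch assumes one ACK per full-sized segment and integer byte arithmetic; the function name is hypothetical.

```python
def additive_increase(cwnd: int, mss: int) -> int:
    """Hypothetical per-ACK congestion-avoidance update: cwnd += MSS * MSS / cwnd (bytes)."""
    # Integer division, with at least 1 byte of growth so the window always advances.
    increment = max(mss * mss // cwnd, 1)
    return cwnd + increment


MSS = 1000
cwnd = 10 * MSS
acks_this_rtt = cwnd // MSS          # one ACK per full-sized segment in the window
for _ in range(acks_this_rtt):
    cwnd = additive_increase(cwnd, MSS)
print(cwnd)  # slightly under 11 * MSS: growth is about one segment per RTT
```

The total falls slightly short of one full segment because CongestionWindow grows while the ACKs arrive, making each later increment a little smaller.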
Slow Start
- On connection establishment, TCP sets the congestion window to 1 MSS
- At any time, the sender’s transmission window is
  Allowed_window = min(receiver_advertisement, congestion_window)
- Slow-Start (Additive) Recovery
  • Whenever starting traffic on a new connection or increasing traffic after a period of congestion, start the congestion window at the size of a single segment and increase the congestion window by one segment each time an ACK arrives
  • Avoids swamping the internet with additional traffic immediately after congestion clears or when new connections suddenly start
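The following sketch simulates slow start round by round, assuming no losses, one ACK per segment (no delayed ACKs), and windows counted in whole segments. The function name is hypothetical; the receiver’s advertisement limits the sender through Allowed_window.

```python
from typing import List


def slow_start_rounds(rwnd: int, rounds: int) -> List[int]:
    """Hypothetical simulation: segments the sender may inject in each RTT.

    Every ACK adds one segment to cwnd, and the sender never exceeds
    Allowed_window = min(rwnd, cwnd). Assumes no losses and no delayed ACKs.
    """
    cwnd = 1                              # one MSS at connection establishment
    sent_per_round = []
    for _ in range(rounds):
        allowed = min(rwnd, cwnd)         # Allowed_window
        sent_per_round.append(allowed)
        for _ack in range(allowed):       # one ACK returns for each segment sent
            cwnd += 1                     # +1 segment per ACK
    return sent_per_round


print(slow_start_rounds(rwnd=20, rounds=6))  # [1, 2, 4, 8, 16, 20]
```

Although each ACK adds only one segment, the number of ACKs doubles each round, so the window doubles every RTT until the receiver’s advertisement (20 segments here) becomes the limit.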
Slow Start
- With slow start, the congestion window grows exponentially: each ACK adds one segment, so a window of W segments becomes 2W after one RTT
- This can quickly congest the network and cause packet drops
- Once the congestion window reaches one half of its size before the last congestion, TCP enters the congestion avoidance phase
- During congestion avoidance, TCP increases the congestion window by one segment only when all segments in the window have been ACKed (i.e., about once per RTT)
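A per-RTT sketch of the switch from slow start to congestion avoidance, assuming no losses and windows counted in segments. Clamping the doubling exactly at the threshold is a simplification, and the names are hypothetical.

```python
from typing import List


def cwnd_per_rtt(ssthresh: int, rtts: int) -> List[int]:
    """Hypothetical trace of the congestion window (segments) at the start of each RTT.

    Below ssthresh the window doubles each RTT (slow start); from ssthresh on it
    grows by one segment per RTT (congestion avoidance). Assumes no losses.
    """
    cwnd = 1
    trace = []
    for _ in range(rtts):
        trace.append(cwnd)
        if cwnd < ssthresh:
            cwnd = min(2 * cwnd, ssthresh)   # slow start, clamped at the threshold
        else:
            cwnd += 1                        # congestion avoidance
    return trace


# A window of 32 segments before congestion gives a threshold of 16.
print(cwnd_per_rtt(ssthresh=16, rtts=8))  # [1, 2, 4, 8, 16, 17, 18, 19]
```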
Slow Start
- Packet injection rate = ACK return rate
- Congestion window (cwnd)
  • initialized to one segment
  • upon receiving an ACK, cwnd is increased by one segment
- current window = min(cwnd, advertised window)
- Congestion window = flow control by the sender
- Advertised window = flow control by the receiver
- Exponential increase
Congestion Avoidance Algorithm
- Slow start threshold size (ssthresh)
- ssthresh = 1/2 × (current window size), if congested, i.e., on a timeout or duplicate ACKs
- cwnd = one segment, if timeout
- cwnd < ssthresh: cwnd is incremented by one segment at every ACK (slow start)
- cwnd > ssthresh: cwnd is incremented by one segment per RTT (congestion avoidance)
Initial values:
  ssthresh = 65535 bytes, cwnd = one segment
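The rules above can be written as a small event-driven sketch. It is hypothetical and deliberately minimal: windows are in bytes, only ACKs and timeouts are handled (fast retransmit is not modeled), "current window size" is taken to be cwnd, and the one-segment floor on ssthresh is an assumption borrowed from the multiplicative-decrease rule.

```python
class CongestionControl:
    """Hypothetical sketch of the ssthresh/cwnd rules, with windows in bytes."""

    def __init__(self, mss: int) -> None:
        self.mss = mss
        self.cwnd = mss          # initial value: one segment
        self.ssthresh = 65535    # initial value: 65535 bytes

    def on_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += self.mss                                  # slow start
        else:
            self.cwnd += max(self.mss * self.mss // self.cwnd, 1)  # congestion avoidance

    def on_timeout(self) -> None:
        # Remember half the current window, then restart slow start from one segment.
        # The one-segment floor on ssthresh is an assumption, not from the slide.
        self.ssthresh = max(self.cwnd // 2, self.mss)
        self.cwnd = self.mss


cc = CongestionControl(mss=1000)
for _ in range(15):
    cc.on_ack()               # slow start: 1000 -> 16000 bytes
cc.on_timeout()               # ssthresh = 8000, cwnd = 1000
for _ in range(7):
    cc.on_ack()               # slow start again: cwnd reaches 8000 = ssthresh
cc.on_ack()                   # congestion avoidance: 8000 + 1000*1000//8000 = 8125
print(cc.cwnd, cc.ssthresh)   # 8125 8000
```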
Congestion Avoidance: Example
[Figure: congestion avoidance example]
Fast Retransmit/Recovery
- Out-of-order segment received ---> duplicate ACK sent ---> congestion avoidance
- Jacobson’s modification:
  • Wait for three successive duplicate ACKs, then retransmit without waiting for the retransmission timeout: Fast Retransmit
  • Then perform congestion avoidance (not slow start): Fast Recovery
- 20% improvement in throughput
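A minimal sketch of the duplicate-ACK logic, assuming cumulative ACK numbers and windows in bytes. The class is hypothetical, and the recovery step follows the slide’s simplified description (halve, then continue in congestion avoidance) rather than the full Reno procedure, which also inflates the window temporarily.

```python
from typing import Optional


class FastRetransmit:
    """Hypothetical duplicate-ACK counter with a simplified fast recovery.

    Window growth on new ACKs is left out to keep the sketch short.
    """

    DUP_ACK_THRESHOLD = 3

    def __init__(self, mss: int, cwnd: int) -> None:
        self.mss = mss
        self.cwnd = cwnd
        self.ssthresh = 65535
        self.last_ack: Optional[int] = None
        self.dup_acks = 0

    def on_ack(self, ack_no: int) -> bool:
        """Process an ACK; return True when the segment at ack_no should be resent now."""
        if ack_no != self.last_ack:          # new data acknowledged
            self.last_ack = ack_no
            self.dup_acks = 0
            return False
        self.dup_acks += 1                   # receiver saw an out-of-order segment
        if self.dup_acks == self.DUP_ACK_THRESHOLD:
            self.ssthresh = max(self.cwnd // 2, self.mss)
            self.cwnd = self.ssthresh        # fast recovery: skip slow start
            return True                      # fast retransmit: don't wait for the RTO
        return False


fr = FastRetransmit(mss=1000, cwnd=16000)
print([fr.on_ack(a) for a in (5000, 5000, 5000, 5000, 5000)])  # [False, False, False, True, False]
print(fr.cwnd)  # 8000
```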
Silly Window Syndrome
[Figure: silly window syndrome]
1. Receiver’s buffer is full
2. Application reads 1 byte: room for one more byte
3. Window update segment sent (header only)
4. New byte arrives (header + 1 byte)
5. Receiver’s buffer is full again
Silly Window Syndrome
- SWS (Silly Window Syndrome)
  • Each ACK advertises a small amount of available space, and each segment carries a small amount of data
  • Consumes unnecessary network bandwidth
  • Introduces unnecessary computational overhead
- Avoiding silly window syndrome
  • The sender avoids transmitting a small amount of data in each segment
  • The receiver avoids sending small increments in window advertisements that can trigger small data packets
  • TCP software must contain both sender and receiver silly window syndrome avoidance code
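The slide does not name a specific sender-side rule. One widely used rule is Nagle’s algorithm (RFC 896): while previously sent data is still unacknowledged, small writes are buffered until either a full segment accumulates or the outstanding data is ACKed. A minimal sketch, with hypothetical names and with the receiver’s window ignored:

```python
from typing import List


class NagleSender:
    """Hypothetical sender applying Nagle's rule; the receiver's window is not modeled."""

    def __init__(self, mss: int) -> None:
        self.mss = mss
        self.pending = bytearray()       # data written by the application, not yet sent
        self.unacked = 0                 # bytes sent but not yet acknowledged
        self.segments: List[bytes] = []  # segments handed to the network, in order

    def write(self, data: bytes) -> None:
        self.pending += data
        self._try_send()

    def on_ack(self, acked_bytes: int) -> None:
        self.unacked -= acked_bytes
        self._try_send()

    def _try_send(self) -> None:
        # Send a full segment whenever one is available; send a partial segment
        # only when nothing is outstanding.
        while self.pending and (len(self.pending) >= self.mss or self.unacked == 0):
            segment = bytes(self.pending[: self.mss])
            del self.pending[: self.mss]
            self.segments.append(segment)
            self.unacked += len(segment)


s = NagleSender(mss=4)
for ch in b"abc":
    s.write(bytes([ch]))  # "a" goes out at once; "b" and "c" wait for the ACK
s.on_ack(1)               # outstanding data acknowledged: "bc" goes out together
print(s.segments)         # [b'a', b'bc']
```

Three 1-byte writes produce two segments instead of three, and the gap shrinks further as writes become more frequent relative to the RTT.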
Silly Window Syndrome
- Receive-side silly window avoidance
  • After advertising a zero window, wait before sending an updated window advertisement until the available space is either at least 50% of the total buffer size or equal to a maximum-sized segment
- Delayed acknowledgements
  • TCP delays sending an ACK when silly window avoidance specifies that the window is not yet large enough to advertise
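The receive-side rule fits in a single function. This sketch assumes it is applied while the receiver is recovering from a zero-window advertisement; the function name and parameters are hypothetical, and "at least 50% of the buffer or one maximum-sized segment" is read as whichever threshold is smaller (the one reached first).

```python
def window_to_advertise(free_space: int, buffer_size: int, mss: int) -> int:
    """Hypothetical receive-side silly window avoidance after a zero window.

    Keep advertising zero until the free space reaches either half the buffer
    or one maximum-sized segment, whichever is smaller; then advertise it all.
    """
    threshold = min(buffer_size // 2, mss)
    return free_space if free_space >= threshold else 0


# 4096-byte buffer, 1460-byte MSS: an application draining 1 byte at a time
# does not reopen the window until 1460 bytes are free.
for free in (1, 512, 1459, 1460, 2048):
    print(free, window_to_advertise(free, buffer_size=4096, mss=1460))
```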
Bandwidth-Delay Product
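- Bandwidth-delay product
  Pipe capacity = BW × RTT
  • T1 (1.544 Mbps) across the USA (RTT ≈ 60 ms): 11,580 bytes
  • T3 (≈45 Mbps) across the USA (RTT ≈ 60 ms): 337,500 bytes
- The T3 pipe capacity exceeds the maximum allowable TCP window advertisement (65535 bytes), so the window scale option is used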
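The slide’s figures can be reproduced with integer arithmetic. The 60 ms round-trip time is an assumption: it is the value implied by both numbers (11,580 bytes at 1.544 Mbps and 337,500 bytes at 45 Mbps). Names are illustrative.

```python
TCP_MAX_WINDOW = 65535  # largest window the 16-bit header field can advertise


def pipe_capacity_bytes(bandwidth_bps: int, rtt_ms: int) -> int:
    """Hypothetical helper: bandwidth-delay product, the bytes in flight that fill the pipe."""
    return bandwidth_bps * rtt_ms // (8 * 1000)


for name, bps in (("T1", 1_544_000), ("T3", 45_000_000)):
    capacity = pipe_capacity_bytes(bps, rtt_ms=60)
    needs_scaling = capacity > TCP_MAX_WINDOW
    print(f"{name}: {capacity:,} bytes, window scale needed: {needs_scaling}")
```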
Timeout and Retransmission
- Exponential backoff
  • Upper limit = 64 s
- Round-trip time measurement
  • Original
  • Jacobson
  • Karn’s algorithm
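The slide only names the estimators. The sketch below combines three of the ideas under stated assumptions: Jacobson’s mean/deviation estimator with the constants later standardized in RFC 6298 (gains 1/8 and 1/4, RTO = SRTT + 4 × RTTVAR, initial RTO 1 s), Karn’s rule of ignoring RTT samples from retransmitted segments, and exponential backoff capped at the slide’s 64 s. RFC 6298’s 1 s minimum RTO is omitted, the "Original" estimator is not shown, and the class name is hypothetical.

```python
from typing import Optional


class RtoEstimator:
    """Hypothetical RTO calculator: Jacobson's estimator, Karn's rule, capped backoff."""

    ALPHA = 1 / 8    # gain for the smoothed RTT (RFC 6298 value)
    BETA = 1 / 4     # gain for the mean deviation (RFC 6298 value)
    K = 4            # deviation multiplier
    MAX_RTO = 64.0   # upper limit on the timer, in seconds (from the slide)

    def __init__(self) -> None:
        self.srtt: Optional[float] = None
        self.rttvar = 0.0
        self.rto = 1.0

    def on_ack(self, sample: float, retransmitted: bool) -> None:
        """Feed the measured RTT (seconds) of the segment this ACK covers."""
        if retransmitted:
            # Karn's algorithm: the ACK may belong to either transmission, so the
            # sample is discarded and the backed-off RTO is kept.
            return
        if self.srtt is None:               # first measurement
            self.srtt = sample
            self.rttvar = sample / 2
        else:                               # update deviation using the old SRTT first
            self.rttvar = (1 - self.BETA) * self.rttvar + self.BETA * abs(self.srtt - sample)
            self.srtt = (1 - self.ALPHA) * self.srtt + self.ALPHA * sample
        self.rto = min(self.srtt + self.K * self.rttvar, self.MAX_RTO)

    def on_timeout(self) -> None:
        """Exponential backoff: double the timer on every expiry, up to the limit."""
        self.rto = min(self.rto * 2, self.MAX_RTO)


est = RtoEstimator()
est.on_ack(0.5, retransmitted=False)  # first sample: SRTT 0.5, RTTVAR 0.25, RTO 1.5 s
est.on_timeout()                      # backoff: RTO 3.0 s
est.on_ack(2.9, retransmitted=True)   # ambiguous sample ignored, RTO stays 3.0 s
print(est.rto)
```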
Path MTU Discovery
- Path MTU = minimum MTU on the path between two hosts
- Discovered by setting the “don’t fragment” (DF) bit in the IP header
- A router returns an ICMP “can’t fragment” error ---> retransmit with a reduced segment size
- Route change ---> a larger MTU may be possible
  Try a larger MTU every 10 minutes (RFC 1191)
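A toy model of the discovery loop, assuming routers that report the next-hop MTU in their ICMP “can’t fragment” message, as RFC 1191 proposes (older routers do not, and the sender must then guess a smaller size). The path is a hypothetical list of link MTUs, and the 40-byte header allowance for the MSS assumes IPv4 and TCP headers without options.

```python
from typing import List, Optional


def send_with_df(link_mtus: List[int], size: int) -> Optional[int]:
    """Hypothetical network: forward a DF datagram of `size` bytes along the path.

    Returns None if it gets through, otherwise the next-hop MTU reported by the
    first router that would have had to fragment it.
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu
    return None


def discover_path_mtu(link_mtus: List[int]) -> int:
    size = link_mtus[0]                  # start with the first-hop (local) MTU
    while True:
        reported = send_with_df(link_mtus, size)
        if reported is None:
            return size                  # datagram got through: this is the path MTU
        size = reported                  # "can't fragment": retry with the reported MTU


path = [1500, 1492, 576, 1500]           # illustrative link MTUs
pmtu = discover_path_mtu(path)
print(pmtu, "-> MSS", pmtu - 40)         # 576 -> MSS 536
```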
Window Scale Option
- A long fat pipe network needs a very large window size
- Increases the effective TCP window from 16 bits to up to 30 bits
- 16 bits in the TCP header, scaled by a left shift whose count (0 to 14) is carried in the window scale option
  Window = W (in header) × 2^Scale (in option)
  Max. window = 65535 × 2^14 = 1,073,725,440 bytes
- Present only in SYN and SYN+ACK segments
- Can differ between the two directions
- The shift count is chosen automatically by TCP, based on the size of the receive buffer
- RFC 1323
Window Scale Option
[Figure: option format]
kind = 3 (1 byte) | length = 3 (1 byte) | shift count (1 byte)
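A sketch of how a sender might pick the shift count and build the 3-byte option. Choosing the smallest shift that covers the receive buffer is one plausible policy, not necessarily what a given TCP does; the helper names are hypothetical, and the cap of 14 is from RFC 1323.

```python
import struct

MAX_SHIFT = 14  # RFC 1323 limits the shift count to 14


def choose_shift(recv_buffer: int) -> int:
    """Hypothetical policy: smallest shift that lets the 16-bit field cover the buffer."""
    shift = 0
    while shift < MAX_SHIFT and (65535 << shift) < recv_buffer:
        shift += 1
    return shift


def window_scale_option(shift: int) -> bytes:
    """kind = 3, length = 3, shift count: three 1-byte fields."""
    return struct.pack("!BBB", 3, 3, shift)


def advertised_field(free_space: int, shift: int) -> int:
    """Value written into the 16-bit window field of later segments."""
    return min(free_space >> shift, 65535)


shift = choose_shift(337_500)                     # the T3 pipe capacity
print(shift, window_scale_option(shift).hex())    # 3 030303
print(advertised_field(337_500, shift) << shift)  # window the peer actually sees
```

Scaling costs granularity: with a shift of 3 the field can only express multiples of 8 bytes, so a 337,500-byte buffer is seen by the peer as 337,496 bytes.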
Timestamp Option
- The sender places a timestamp value in each segment it sends
- The receiver echoes the received timestamp in its ACK
- The receiver does not need to know the time unit (it just echoes the value)
- No clock synchronization is required
- Different from the ICMP timestamp
- Used for TCP-level RTT calculation
- RFC 1323
Timestamp Option
[Figure: option format]
kind = 8 (1 byte) | length = 10 (1 byte) | TSval: sender’s timestamp value (4 bytes) | TSecr: most recently received timestamp value (4 bytes)
TCP A                                        TCP B
<A,TSval=1,TSecr=120>   ------>
   RTT
                  <------   <ACK(A),TSval=127,TSecr=1>
   RTT
<B,TSval=5,TSecr=127>   ------>
                  <------   <ACK(B),TSval=131,TSecr=5>
......................
<C,TSval=65,TSecr=131>  ------>
                  <------   <ACK(C),TSval=191,TSecr=65>
(etc.)
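A sketch of encoding the option and taking an RTT sample from the echoed value, using Python’s `struct` module in network byte order. The helper names and the clock reading `now=4` are hypothetical, chosen only to illustrate the arithmetic on the first exchange above; the result is in the sender’s own clock ticks.

```python
import struct
from typing import Tuple

TS_KIND, TS_LEN = 8, 10


def pack_timestamps(tsval: int, tsecr: int) -> bytes:
    """kind = 8 (1 byte), length = 10 (1 byte), TSval and TSecr (4 bytes each)."""
    return struct.pack("!BBII", TS_KIND, TS_LEN, tsval, tsecr)


def unpack_timestamps(option: bytes) -> Tuple[int, int]:
    kind, length, tsval, tsecr = struct.unpack("!BBII", option)
    if (kind, length) != (TS_KIND, TS_LEN):
        raise ValueError("not a timestamp option")
    return tsval, tsecr


def rtt_sample(now: int, ack_option: bytes) -> int:
    """Hypothetical helper: RTT in the sender's clock ticks, now minus the echoed TSecr.

    Only the sender interprets the value, so no clock synchronization is needed.
    """
    _, tsecr = unpack_timestamps(ack_option)
    return (now - tsecr) % 2**32   # 32-bit timestamps wrap around


ack = pack_timestamps(127, 1)      # <ACK(A),TSval=127,TSecr=1>
print(len(ack), rtt_sample(now=4, ack_option=ack))  # 10 3
```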
TCP Performance
- The host network interface is generally the bottleneck
- Measured performance limits
  • Ethernet: 8.6 Mbps
  • FDDI: 80-98 Mbps
  • HIPPI (800 Mbps): 781 Mbps
TCP Timers
- Retransmission timer: waits for ACKs during normal data transfer
- Persist timer: queries the receiver to find out whether the window has been increased
- Keepalive timer: checks whether the other end is still there
- 2MSL timer: used when closing a connection; 120 seconds
TCP Persist Timer
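[Figure: persist timer exchange between sender and receiver]
- The receiver advertises win=0; the sender’s window probes are answered with ACK(win=0)
- The receiver’s window update (win=256) is lost: without probes, both sides would wait for each other forever (deadlock)
- Persist timer: window probes continue (normal TCP exponential backoff, max = 60 sec)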
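A sketch of the probe schedule under the backoff rule above. The first interval of 1.5 s is a hypothetical starting value, and the function name is illustrative.

```python
from typing import List


def persist_probe_times(first_interval: float, probes: int, cap: float = 60.0) -> List[float]:
    """Hypothetical schedule: seconds after the zero window at which probes are sent.

    The interval doubles after each probe (normal TCP exponential backoff) but
    never exceeds `cap`; probing continues while the window stays zero.
    """
    times = []
    t, interval = 0.0, first_interval
    for _ in range(probes):
        t += interval
        times.append(t)
        interval = min(interval * 2, cap)
    return times


print(persist_probe_times(first_interval=1.5, probes=9))
```

After the interval reaches the 60 s cap, probes keep going out once a minute for as long as the receiver advertises a zero window, so a lost window update delays the sender but cannot deadlock it.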
TCP Keepalive Timer
- To find out whether the other end is still there, a probe segment is sent after 2 hours of idle time
- Probe = a segment with no data but an incorrect (old) sequence number; the receiver responds with an ACK carrying the correct sequence number
- Cannot distinguish a network problem from a server problem
- Controversial: should this be a transport-layer or an application-layer function?