CS 268: TCP & DECbit Congestion Control

TCP Congestion Control
Anthony D. Joseph
CS262a
November 14, 2001
Topics
• End-to-end reliable transport layer
– Example: ARQ
• Flow-control (not today)
– Window- / rate-based control
– Avoid over-running receiver
• TCP congestion control
– Share the network fairly
February 9, 2000
CS 268 Lecture #7
Automatic Repeat/reQuest (ARQ)
• Motivation: congestion/flow control
intertwined with reliable transport
• Basis for most reliable transport schemes
• Relies on acknowledgments (ACK) and
timeouts
• Source sends packet
• Receiver ACKs each packet
• If data or ACK lost, timeout triggers and
source re-transmits
• Simplest version: Stop-and-Wait
What if ACK is Dropped?
• Receiver might assemble duplicate
frames
• Solve problem with sequence number
• How many bits?
Alternating Bit Protocol
[Figure: message exchange between hosts A and B: msg #0 / ack #0, msg #1 / ack #1, msg #0 / ack #0]
• 1-bit sequence number
• Why is this inefficient?
• Consider 1Mb/s link,
100ms path delay, 1000
byte packets
• Send rate is only ~10 kB/s (one 1000-byte packet
per 100 ms, i.e. ~80 kb/s) << 1 Mb/s!
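The stop-and-wait arithmetic above can be checked with a short sketch. It assumes the 100 ms figure is the round-trip time and ignores the time to transmit the ACK; the function name and parameters are illustrative, not from the slides.

```python
def stop_and_wait_throughput(link_bps, rtt_s, packet_bytes):
    """Bits per second delivered when only one packet is sent per round trip."""
    packet_bits = packet_bytes * 8
    transmit_s = packet_bits / link_bps   # time to put the packet on the wire
    cycle_s = transmit_s + rtt_s          # then idle until the ACK returns (assumed RTT)
    return packet_bits / cycle_s


link_bps = 1_000_000                      # 1 Mb/s link from the slide
rate = stop_and_wait_throughput(link_bps, rtt_s=0.100, packet_bytes=1000)
utilization = rate / link_bps
print(f"{rate / 1000:.1f} kb/s, {utilization:.1%} of the link")
```

Under these assumptions the sender uses well under a tenth of the link, which is the inefficiency the slide points at.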
Sliding Window
• Pipelining: transmit next packet
before current one ACK'd
• Window limits the amount of
outstanding data
• How big should this window be?
• Twice bandwidth-delay product
– Keep the “pipe” full
• Selfish! Really only your share of pipe
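If "delay" on this slide means the one-way path delay, then twice the bandwidth-delay product is simply bandwidth times round-trip time. A minimal sketch of that sizing rule, under that assumed reading and with illustrative names:

```python
import math


def window_packets(link_bps, rtt_ms, packet_bytes):
    """Packets that must be outstanding to keep the link busy for one full RTT."""
    bits_per_rtt = link_bps * rtt_ms / 1000       # bandwidth x round-trip time
    return math.ceil(bits_per_rtt / (packet_bytes * 8))


# Same example link as before: 1 Mb/s, 1000-byte packets, assumed 100 ms RTT.
full_pipe = window_packets(1_000_000, 100, 1000)
print(full_pipe, "packets in flight fill the pipe")
```

This is the window for a sender that has the whole pipe to itself; with competing flows each sender's fair window is only its share of this number.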
Congestion Control
• Goal is to fill pipe, but what if
resources cannot keep pace?
• Congestion is not solved with
– Faster links
• 2x every 6 months, demand 2x / 6 months
– Larger queue buffers
– Better routing protocols
Design Space
• Router versus host
• Reservations (proactive) versus
adaptive (reactive)
• Window-based versus rate-based
Congestion Avoidance and
Control Paper Contributions
• Seminal paper on congestion control
for TCP and the Internet
• Widely cited, huge impact
• Slow-start (similar to Jain's CUTE
algorithm)
• Lots of nice intuition
• Creative, intuitive, and artistic
application of theory to practice
– But not fully rigorous
Motivation: History
• Explosive growth in networks →
congestion problems
• Oct 1986: Internet has series of
“congestion collapses”
– “Collapse” = increment in offered load
causes decrement in performance
• LBL/UCB throughput down by a factor of ~1000
(32 kb/s to 40 b/s)
Collapse Questions
• Was 4.3BSD TCP mis-behaving?
• Could it be tuned to work under such
heavy network load?
• “Yes” to both
Results: 7 new algorithms in BSD/TCP
• (1) RTT variance estimation (not just mean)
• (2) Exponential timer backoff
• (3) Slow-start
• (4) Aggressive receiver ACKs (omitted)
• (5) Dynamic window adaptation
• (6) Karn's clamped retransmit backoff (omitted)
• (7) Fast retransmit (omitted)
Theme: “Conservation of Packets”
• In equilibrium, don't introduce new
packet until old packet leaves system
• 3 ways for conservation to fail:
– Connection doesn't reach equilibrium
– Sender injects new packets before old
ones leave
– Can't reach equilibrium because of
resource limits
Three Mechanisms
• (1) Getting to equilibrium
• (2) Staying at equilibrium
• (3) Adjusting equilibrium point for
dynamic contention
Getting to Equilibrium: Slow-Start
• Conservation easy to maintain with
“self-clocking”
– Generate a new data packet for each
received ACK
– But this makes the system hard to start
Solution: slow-start
• Subtle, but simple algorithm
• Introduce congestion window, cwnd, at
sender
• Upon start-up (or re-start after packet
loss), set cwnd to 1
• For each ACK, increase cwnd by 1
• When sending, send min(cwnd, wnd) where
“wnd” is receiver's advertised window
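The rules above can be sketched as a per-round-trip simulation. This is a simplified model that assumes no loss and one ACK per packet; the function and variable names other than cwnd and wnd are illustrative.

```python
def slow_start_rounds(target_window, wnd):
    """Return cwnd at the start of each round trip until it reaches target_window."""
    cwnd = 1                                  # start-up (or restart after loss)
    history = [cwnd]
    while cwnd < target_window:
        in_flight = min(cwnd, wnd)            # send min(cwnd, wnd)
        for _ in range(in_flight):            # one ACK comes back per packet sent
            cwnd += 1                         # each ACK increases cwnd by 1
        history.append(cwnd)
    return history


rounds = slow_start_rounds(target_window=16, wnd=64)
print(rounds)
```

Because every packet in a round is ACKed and each ACK adds one, the window doubles each round trip as long as the receiver's advertised window is not the limit.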
Slow-Start is Not Too Slow
• Window grows exponentially
• So time to get to window W (from 1)
is R log_2 W for round-trip time R
• Guarantees that source will blast at
most twice the bottleneck rate
– Peaks of 200x before SS!
• Note: slow-start does NOT obey the
packet conservation principle!
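To see why R log_2 W is "not too slow", the sketch below compares it with opening the window by one packet per round trip. The 100 ms round-trip time and the window sizes are assumed example values, not figures from the lecture.

```python
import math


def slow_start_time(rtt_s, window):
    """Seconds to grow the window from 1 to `window` by doubling each RTT."""
    return rtt_s * math.log2(window)


def additive_time(rtt_s, window):
    """Seconds to reach the same window by adding one packet per RTT."""
    return rtt_s * (window - 1)


rtt = 0.1  # assumed 100 ms round trip
for w in (16, 64, 1024):
    print(w, round(slow_start_time(rtt, w), 2), round(additive_time(rtt, w), 2))
```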
Staying at Equilibrium: RTT
• If property (2) from above violated,
means round-trip timer failed
– Caused retransmission before last
packet left network
– Real timer algorithms often inadequate
(see Zhang's work)
Common Mistake
• Not estimating variance
– R and σ_R (the variation in R) increase with load ρ in proportion to (1 - ρ)^-1
– For 75% utilization, R varies by a factor of 16
• TCP spec said R ← αR + (1 - α)M
– where M is the latest RTT measurement, α = 0.9, and the timeout value is βR (β = 2)
– but β = 2 works only up to 30% load...
– Instead, scale β dynamically in proportion to σ_R
• Actually compute mean deviation to make code simpler;
see appendix
– Note: improves low-load performance as well
(where β < 2)
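A minimal sketch of a timer that tracks both the smoothed RTT and its mean deviation, in the spirit of the bullets above. The gains (1/8 and 1/4), the multiplier of 4, and the first-sample initialization are assumptions taken from later standard practice (RFC 6298), not values stated on the slide; the class and attribute names are illustrative.

```python
class RttEstimator:
    """Smoothed RTT plus mean deviation, used to set the retransmit timeout."""

    def __init__(self, gain=0.125, dev_gain=0.25, k=4.0):
        self.gain = gain          # assumed weight of a new sample in the mean
        self.dev_gain = dev_gain  # assumed weight of a new sample in the deviation
        self.k = k                # assumed deviation multiplier for the timeout
        self.srtt = None
        self.rttvar = None

    def update(self, m):
        """Fold in one RTT measurement m (seconds) and return the new timeout."""
        if self.srtt is None:
            self.srtt = m                     # first sample seeds the estimate
            self.rttvar = m / 2
        else:
            err = m - self.srtt
            self.rttvar += self.dev_gain * (abs(err) - self.rttvar)
            self.srtt += self.gain * err
        return self.timeout()

    def timeout(self):
        return self.srtt + self.k * self.rttvar


est = RttEstimator()
for sample in [0.1] * 50:
    est.update(sample)
print("steady timeout:", round(est.timeout(), 3))
print("after a 300 ms sample:", round(est.update(0.3), 3))
```

With steady samples the deviation term decays and the timeout settles close to the RTT itself; one late sample widens it immediately, which a fixed multiple of the mean cannot do.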
Adjusting Equilibrium Point
for Dynamic Contention
• Two components of “Congestion
Avoidance”
• (1) Network must signal endpoints
• (2) Endpoints must react to signal by
decreasing utilization
How to Generate Congestion Signal
• Congestion → packet loss →
timeout → congestion signal
• No need to change network
– Unlike DECbit which requires all routers
on path to be modified
Endpoint action
• Queue length averaged over RTT is L_i
• For constant queue size (steady-state), L_i = N
• But, according to paper, under congestion:
L_i = N + γ L_{i-1}
– i.e., have to add in remnant leftover from previous
averaging interval
• Closed form is roughly: L_i ≈ γ^i L_0
• Hence queue grows exponentially fast
• Thus, sender should back-off exponentially
fast by adjusting its window as follows:
W_i = d W_{i-1} (d < 1)
Growing the Window
• While backoff triggered by packet
loss, how do we know when there's
spare capacity?
• Nothing tells us → experiment with
spontaneous increase
– Use multiplicative increase?
• Leads to instabilities
• Easy to drive system into saturation, but
hard to recover (rush-hour effect)
Growing the Window (cont’d)
• “Without justification” (see Jain and
Chiu), “best” increase policy is small,
constant changes (additive):
– W_i = W_{i-1} + u (u << W_max), where W_max is
the “pipe size”
• Borrowed additive increase /
multiplicative decrease algorithm
from Jain et al, except for choice of
parameters (claimed universal)
Easy Implementation
• On timeout, set cwnd to ½ current
window
• On each ACK, increase cwnd by 1/cwnd
• When sending, send min(cwnd, wnd)
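The three rules above fit in a few lines. This is a sketch only: cwnd is counted in packets, and the floor of one packet on timeout is an added assumption; the class and method names are illustrative.

```python
class CongestionAvoidance:
    """Additive increase, multiplicative decrease on a packet-counted cwnd."""

    def __init__(self, cwnd=1.0):
        self.cwnd = cwnd

    def on_ack(self):
        self.cwnd += 1.0 / self.cwnd          # about +1 packet per round trip

    def on_timeout(self):
        self.cwnd = max(1.0, self.cwnd / 2)   # halve; floor of 1 is an assumption

    def sendable(self, wnd):
        return min(self.cwnd, wnd)            # never exceed the advertised window


ca = CongestionAvoidance(cwnd=10.0)
for _ in range(10):                           # one round trip's worth of ACKs
    ca.on_ack()
print("after one RTT:", round(ca.cwnd, 2))
ca.on_timeout()
print("after timeout:", round(ca.cwnd, 2))
```

Adding 1/cwnd per ACK is what turns a per-ACK rule into roughly one extra packet per round trip, since about cwnd ACKs arrive in each round trip.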
Slow-Start and Congestion Avoidance
• Two separate algorithms with very
different objectives, but they should be
implemented together:
– Maintain variable “ssthresh”, which represents
crossover point
– On timeout
• Set ssthresh to 1/2 current window (multiplicative decrease)
• Set cwnd to 1 (initiate slow-start)
– Initiate slow-start up to ssthresh
– Beyond ssthresh, enter congestion avoidance
“linear mode”
• Yields impressive performance results
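A sketch of how the two algorithms combine around ssthresh, simulated one round trip at a time. The initial ssthresh of 16, the floor of 2 on ssthresh, and the helper names are assumptions for illustration; loss is modeled only as a timeout in the chosen rounds.

```python
def on_ack(cwnd, ssthresh):
    """Window after one ACK: exponential below ssthresh, linear above it."""
    if cwnd < ssthresh:
        return cwnd + 1.0             # slow-start: +1 per ACK
    return cwnd + 1.0 / cwnd          # congestion avoidance: about +1 per RTT


def on_timeout(cwnd):
    """Return (new cwnd, new ssthresh) after a timeout."""
    ssthresh = max(2.0, cwnd / 2)     # half the current window; floor of 2 is an assumption
    return 1.0, ssthresh              # cwnd back to 1 restarts slow-start


def simulate(rounds, loss_rounds, ssthresh=16.0):
    """cwnd at the start of each round trip; rounds in loss_rounds end in a timeout."""
    cwnd, trace = 1.0, []
    for r in range(rounds):
        trace.append(cwnd)
        if r in loss_rounds:
            cwnd, ssthresh = on_timeout(cwnd)
            continue
        for _ in range(int(cwnd)):    # one ACK per packet sent this round
            cwnd = on_ack(cwnd, ssthresh)
    return trace


trace = simulate(rounds=10, loss_rounds={5})
print([round(w, 2) for w in trace])
```

The trace doubles up to ssthresh, creeps upward linearly beyond it, then drops to 1 on the timeout and doubles again toward the new, halved threshold.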
Future Work
• Endpoints can ensure that capacity is not
exceeded, but cannot necessarily find a fair
operating point
• Only in gateways, where flows converge, is there
enough information to control sharing and fair
allocation (see Jaffe)
• Some research directions
– Send signal early (basis for Floyd's RED gateways)
– Punish misbehaving hosts by sending drop signal
(Floyd/Fall)
Fast Retransmit
• (Omitted in paper)
• Silly to wait for timeout if we have a good
idea packet was dropped
– Duplicate ACK means missing data
– Maybe just reordered, wait for 3 dup ACKs
– Respond with (fast) retransmit
– Back off by setting ssthresh to cwnd/2
– Fast recovery: set cwnd to cwnd/2 (no slow-start)
– If this doesn't work, window stalls and timeout
eventually fires
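The duplicate-ACK logic above can be sketched as a small sender-side state machine. It assumes cumulative ACKs that carry the next sequence number the receiver expects, and it omits the timeout path and any window inflation during recovery; the class and attribute names are illustrative.

```python
DUP_ACK_THRESHOLD = 3  # wait for 3 duplicate ACKs in case of simple reordering


class FastRetransmitSender:
    """Counts duplicate ACKs and reacts on the third one."""

    def __init__(self, cwnd=16.0, ssthresh=64.0):
        self.cwnd = cwnd
        self.ssthresh = ssthresh
        self.last_ack = None
        self.dup_acks = 0
        self.retransmitted = []   # sequence numbers resent (stand-in for real sends)

    def on_ack(self, ack_no):
        if ack_no != self.last_ack:
            self.last_ack = ack_no            # new data acknowledged
            self.dup_acks = 0
            return
        self.dup_acks += 1                    # same ACK again: data is missing
        if self.dup_acks == DUP_ACK_THRESHOLD:
            self.retransmitted.append(ack_no) # fast retransmit of the missing packet
            self.ssthresh = self.cwnd / 2     # back off
            self.cwnd = self.ssthresh         # fast recovery: halve, no slow-start


sender = FastRetransmitSender(cwnd=16.0)
for ack in [1, 2, 3, 3, 3, 3]:                # packet 3 was lost; later packets got through
    sender.on_ack(ack)
print(sender.retransmitted, sender.cwnd, sender.ssthresh)
```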
Discussion
• Pipes getting larger, connections getting
smaller (or staying fixed...)
• What happens if fair share of bandwidth is
less than one packet per RTT?
– Proposal for combined rate/window-based
congestion control
• All of this seems like trial and error?
– Where's the theory?
• What if we could change the routers?
– RED, FQ, real-time scheduling...
Discussion
• How to deal with malicious senders?
– Router-based controls
• How to deal with malicious receivers?
– Sender-side sanity checking / security
– Router-based controls
Take Aways / Recap
• Additive increase / multiplicative
decrease
• Slow-start bottleneck “search
algorithm”
• Distributed fairness sounds easy but
is hard (Jaffe)