12-Congestion.ppt

EECS 122:
Introduction to Computer Networks
Congestion Control
Computer Science Division
Department of Electrical Engineering and Computer Sciences
University of California, Berkeley
Berkeley, CA 94720-1776
Katz, Stoica F04
Today’s Lecture: 10
[Figure: course lecture map placing lecture numbers on the protocol stack layers: Application, Transport, Network (IP), Link, Physical]
Finishing Last Lecture
Big Picture

Where do IP routers belong?
Communication Network
- Switched Communication Network
  - Circuit-Switched Communication Network
  - Packet-Switched Communication Network
    - Datagram Network
    - Virtual Circuit Network
- Broadcast Communication Network
Packet (Datagram) Switching Properties

Expensive forwarding
- Forwarding table size depends on number of different
destinations
- Must lookup in forwarding table for every packet

Robust
- Link and router failure may be transparent for end-hosts

High bandwidth utilization
- Statistical multiplexing

No service guarantees
- Network allows hosts to send more packets than
available bandwidth -> congestion -> dropped packets
Virtual Circuit (VC) Switching

Packets not switched independently
- Establish virtual circuit before sending data

Forwarding table entry
- (input port, input VCI, output port, output VCI)
- VCI – Virtual Circuit Identifier


Each packet carries a VCI in its header
Upon a packet arrival at interface i
- Input port uses i and the packet’s VCI v to find the routing entry (i,
v, i’, v’)
- Replaces v with v’ in the packet header
- Forwards packet to output port i’
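The per-packet steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not a real switch implementation: the `Packet` class, the `forward` function, and the table entries are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass

@dataclass
class Packet:
    vci: int          # Virtual Circuit Identifier carried in the header
    payload: bytes

# Hypothetical forwarding table: (input port, input VCI) -> (output port, output VCI)
vc_table = {
    (1, 5): (3, 11),
    (2, 7): (4, 1),
}

def forward(in_port: int, pkt: Packet) -> int:
    """Rewrite the packet's VCI in place and return the output port."""
    # A KeyError here means no circuit was set up for this (port, VCI) pair.
    out_port, out_vci = vc_table[(in_port, pkt.vci)]
    pkt.vci = out_vci
    return out_port

pkt = Packet(vci=5, payload=b"data")
out = forward(1, pkt)
print(out, pkt.vci)  # 3 11
```

The lookup key is the pair (i, v), so the same VCI value can be reused on different input ports without ambiguity.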
VC Forwarding: Example
[Figure: a packet travels from source to destination through switches with ports numbered 1 to 4; each switch has a forwarding table with columns (in, in-VCI, out, out-VCI), and the VCI in the packet header is rewritten at each hop]
VC Forwarding (cont’d)

A signaling protocol is required to set up the state
for each VC in the routing table
- A source needs to wait for one RTT (round trip time)
before sending the first data packet

Can provide per-VC QoS
- When we set up the VC, we can also reserve bandwidth
and buffer resources along the path
VC Switching Properties

Less expensive forwarding
- Forwarding table size depends on number of different
circuits
- Must lookup in forwarding table for every packet

Much higher delay for short flows
- 1 RTT delay for connection setup

Less Robust
- End host must spend 1 RTT to establish new
connection after link and router failure

Flexible service guarantees
- Either statistical multiplexing or resource reservations
Circuit Switching

Packets not switched independently
- Establish circuit before sending data

Circuit is a dedicated path from source to
destination
- E.g., old style telephone switchboard, where
establishing circuit means connecting wires in all the
switches along path
- E.g., modern dense wave division multiplexing (DWDM)
form of optical networking, where establishing circuit
means reserving an optical wavelength in all switches
along path

No forwarding table
Circuit Switching Properties

Cheap forwarding
- No table lookup

Much higher delay for short flows
- 1 RTT delay for connection setup

Less robust
- End host must spend 1 RTT to establish new
connection after link and router failure

Must use resource reservations
Forwarding Comparison
                        pure packet   virtual circuit   circuit
                        switching     switching         switching
forwarding cost         high          low               none
bandwidth utilization   high          flexible          low
resource reservations   none          flexible          yes
robustness              high          low               low
Summary

Routers
- Key building blocks of today’s networks in general, and
the Internet in particular

Main functionalities implemented by a router
- Packet forwarding
- Buffer management
- Packet scheduling
- Packet classification

Forwarding techniques
- Datagram (packet) switching
- Virtual circuit switching
- Circuit switching
Starting New Lecture
Congestion Control
What We Know
We know:
- How to process packets in a switch
- How to route packets in the network
- How to send packets reliably
We don’t know:
- How fast to send
What’s at Stake?

Send too slow: link is not fully utilized
- wastes time

Send too fast: link is fully utilized but....
- queue builds up in router buffer (delay)
- overflow buffers in routers
- overflow buffers in receiving host (ignore)

Why are buffer overflows a problem?
- packet drops (mine and others)
- Interesting history....(Van Jacobson rides to the rescue)
Abstract View
[Figure: sending host A -> buffer in router -> receiving host B]
We ignore the internal structure of the router and model
it as having a single queue for a particular input-output pair
Three Congestion Control Problems

Adjusting to bottleneck bandwidth

Adjusting to variations in bandwidth

Sharing bandwidth between flows
Single Flow, Fixed Bandwidth
[Figure: host A sends to host B over a 100 Mbps bottleneck link]
Adjust rate to match bottleneck bandwidth
- without any a priori knowledge
- could be gigabit link, could be a modem
Single Flow, Varying Bandwidth
[Figure: host A sends to host B over a link whose bandwidth BW(t) varies with time]
Adjust rate to match instantaneous bandwidth
- assuming you have rough idea of bandwidth
Multiple Flows
Two Issues:
- Adjust total sending rate to match bandwidth
- Allocation of bandwidth between flows
[Figure: senders A1, A2, A3 share a 100 Mbps link to receivers B1, B2, B3]
Reality
Congestion control is a resource allocation problem involving
many flows, many links, and complicated global dynamics
General Approaches

Send without care
- many packet drops
- not as stupid as it seems

Reservations
- pre-arrange bandwidth allocations
- requires negotiation before sending packets
- low utilization

Pricing
- don’t drop packets for the high-bidders
- requires payment model
General Approaches (cont’d)

Dynamic Adjustment
- probe network to test level of congestion
- speed up when no congestion
- slow down when congestion
- suboptimal, messy dynamics, simple to implement

All three techniques have their place
- but for generic Internet usage, dynamic adjustment is
the most appropriate
- due to pricing structure, traffic characteristics, and good
citizenship
TCP Congestion Control

TCP connection has window
- controls number of unacknowledged packets

Sending rate: ~Window/RTT

Vary window size to control sending rate
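As a rough worked example of the Window/RTT relation, the sketch below converts a window in packets to an approximate bit rate. The function name and the 1500-byte packet size are assumptions made for illustration only.

```python
def sending_rate_bps(window_pkts: float, rtt_s: float, pkt_bytes: int = 1500) -> float:
    """Approximate sending rate in bits per second: one window per RTT."""
    return window_pkts * pkt_bytes * 8 / rtt_s

# 10 packets of 1500 bytes per 100 ms round trip: about 1.2 Mbps
print(sending_rate_bps(10, 0.1))

# Doubling the window doubles the rate; doubling the RTT halves it
print(sending_rate_bps(20, 0.1), sending_rate_bps(10, 0.2))
```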
Congestion Window (cwnd)



Limits how much data can be in transit
Implemented as # of bytes
Described as # packets in this lecture
MaxWindow = min(cwnd, AdvertisedWindow)
EffectiveWindow = MaxWindow – (LastByteSent – LastByteAcked)
[Figure: sequence-number line (sequence number increases to the right) marking LastByteAcked, LastByteSent, MaxWindow, and EffectiveWindow]
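The two window formulas can be sketched as a small Python function. The function and parameter names are illustrative, and clamping the result at zero is an added assumption for the case where the window has shrunk below the data already in flight.

```python
def effective_window(cwnd: int, advertised_window: int,
                     last_byte_sent: int, last_byte_acked: int) -> int:
    """Bytes the sender may still transmit right now."""
    max_window = min(cwnd, advertised_window)
    in_flight = last_byte_sent - last_byte_acked
    # Assumption: never report a negative window.
    return max(0, max_window - in_flight)

print(effective_window(cwnd=8000, advertised_window=16000,
                       last_byte_sent=5000, last_byte_acked=2000))  # 5000
```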
Two Basic Components

Detecting congestion

Rate adjustment algorithm
- depends on congestion or not
- three subproblems within adjustment problem
• finding fixed bandwidth
• adjusting to bandwidth variations
• sharing bandwidth
Detecting Congestion

Packet dropping is best sign of congestion
- delay-based methods are hard and risky

How do you detect packet drops? ACKs
- TCP uses ACKs to signal receipt of data
- ACK denotes last contiguous byte received
• actually, ACKs indicate next segment expected

Two signs of packet drops
- No ACK after certain time interval: time-out
- Several duplicate ACKs (ignore for now)
Rate Adjustment

Basic structure:
- Upon receipt of ACK (of new data): increase rate
- Upon detection of loss: decrease rate

But what increase/decrease functions should we
use?
- Depends on what problem we are solving
Problem #1: Single Flow, Fixed BW

Want to get a first-order estimate of the available
bandwidth
- Assume bandwidth is fixed
- Ignore presence of other flows

Want to start slow, but rapidly increase rate until
packet drop occurs (“slow-start”)

Adjustment:
- cwnd initially set to 1
- cwnd++ upon receipt of ACK
Slow-Start

cwnd increases exponentially: cwnd doubles
every time a full cwnd of packets has been sent
- Each ACK releases two packets
- Slow-start is called “slow” because of starting point
[Figure: sender/receiver packet and ACK timeline showing cwnd = 1, 2, 3, 4, then 8 as ACKs arrive]
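The doubling behavior can be checked with a short simulation. This is a sketch assuming no losses and that every packet sent in one RTT is acknowledged before the next RTT begins; the function name is illustrative.

```python
def slow_start_rounds(rounds: int) -> list[int]:
    """cwnd (in packets) at the start of each RTT, assuming no loss."""
    cwnd = 1
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        acks = cwnd          # every packet sent this RTT is ACKed
        for _ in range(acks):
            cwnd += 1        # cwnd++ upon receipt of each ACK
    return history

print(slow_start_rounds(5))  # [1, 2, 4, 8, 16]
```

Adding one packet per ACK means a window of n packets earns n increments per RTT, which is why the per-ACK rule yields exponential growth per RTT.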
Problems with Slow-Start

Slow-start can result in many losses
- roughly the size of cwnd ~ BW*RTT

Example:
- at some point, cwnd is enough to fill “pipe”
- after another RTT, cwnd is double its previous value
- all the excess packets are dropped!

Therefore, we need a gentler adjustment
algorithm once we have a rough estimate of bandwidth
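The overshoot in the example can be quantified with a small sketch. It assumes cwnd doubles exactly once per RTT and that the router has no buffer, so every packet beyond the pipe size (BW*RTT, in packets) is dropped; the function name is hypothetical.

```python
def slow_start_overshoot(pipe_pkts: int) -> tuple[int, int]:
    """Return (cwnd, excess) for the first RTT in which cwnd exceeds the pipe."""
    cwnd = 1
    while cwnd <= pipe_pkts:
        cwnd *= 2            # one doubling per RTT
    return cwnd, cwnd - pipe_pkts

print(slow_start_overshoot(50))  # (64, 14)
print(slow_start_overshoot(64))  # (128, 64)
```

The second call shows the worst case: when cwnd exactly fills the pipe, the next doubling sends a full extra pipe's worth of packets.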
Problem #2: Single Flow, Varying BW

Want to be able to track available bandwidth, oscillating
around its current value

Possible variations: (in terms of RTTs)
- multiplicative increase or decrease: cwnd -> a*cwnd
- additive increase or decrease: cwnd -> cwnd + b

Four alternatives:
- AIAD: gentle increase, gentle decrease
- AIMD: gentle increase, drastic decrease
- MIAD: drastic increase, gentle decrease (too many losses)
- MIMD: drastic increase and decrease
Problem #3: Multiple Flows

Want steady state to be “fair”

Many notions of fairness, but here all we require
is that two identical flows end up with the same
bandwidth

This eliminates MIMD and AIAD

AIMD is the only remaining solution!
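The fairness argument can be illustrated with a toy model of two flows sharing one link. This is a sketch under simplifying assumptions: both flows see the same congestion signal in the same RTT, congestion means the summed rate exceeds capacity, and the constants (capacity 50, increase factor 1.1, starting rates 5 and 30) are arbitrary choices for the example.

```python
def simulate(x, y, increase, decrease, capacity=50.0, rtts=500):
    """Synchronous two-flow model: both flows react to the same signal."""
    for _ in range(rtts):
        if x + y > capacity:
            x, y = decrease(x), decrease(y)
        else:
            x, y = increase(x), increase(y)
    return x, y

add_inc = lambda r: r + 1.0
add_dec = lambda r: max(r - 1.0, 0.0)
mul_inc = lambda r: r * 1.1
mul_dec = lambda r: r * 0.5

for name, inc, dec in [("AIMD", add_inc, mul_dec),
                       ("AIAD", add_inc, add_dec),
                       ("MIMD", mul_inc, mul_dec)]:
    x, y = simulate(5.0, 30.0, inc, dec)
    print(f"{name}: x={x:.2f} y={y:.2f} gap={abs(x - y):.2f} ratio={y / x:.2f}")
```

In this model, additive steps leave the gap between the flows unchanged and multiplicative steps leave their ratio unchanged. Only AIMD combines a gap-preserving increase with a gap-halving decrease, so only AIMD drives the two rates together.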
Buffer and Window Dynamics
[Figure: a single flow from A to B sends at rate x through a router with capacity C = 50 pkts/RTT; plots over time of the rate (pkts/RTT) and the backlog in the router (pkts), where the router is congested if the backlog is > 20]
No congestion -> x increases by one packet/RTT every RTT
Congestion -> decrease x by factor 2
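The dynamics on this slide can be approximated with a few lines of Python. This is a sketch of one possible discrete-time model, not the simulation behind the original plot: it assumes the backlog changes by (rate - capacity) each RTT and that the sender reacts in the same RTT in which the backlog crosses the threshold.

```python
def aimd_single_flow(capacity=50, threshold=20, rtts=100):
    """Return per-RTT (rate, backlog) pairs for one AIMD flow."""
    rate, backlog = 1.0, 0.0
    trace = []
    for _ in range(rtts):
        # Queue grows by the excess of arrivals over capacity, never below zero.
        backlog = max(0.0, backlog + rate - capacity)
        trace.append((rate, backlog))
        if backlog > threshold:
            rate /= 2          # congestion: multiplicative decrease
        else:
            rate += 1          # no congestion: additive increase
    return trace

trace = aimd_single_flow()
print("peak rate:", max(r for r, _ in trace))
print("peak backlog:", max(b for _, b in trace))
```

The rate keeps climbing past C while the queue absorbs the excess, so the sawtooth peaks above the link capacity before the backlog threshold triggers the halving.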
AIMD Sharing Dynamics
[Figure: flow A -> B sends at rate x and flow D -> E sends at rate y over a shared link; plot of both rates over time]
Rates equalize -> fair share
No congestion -> rate increases by one packet/RTT every RTT
Congestion -> decrease rate by factor 2
AIAD Sharing Dynamics
[Figure: flow A -> B sends at rate x and flow D -> E sends at rate y over a shared link; plot of both rates over time]
No congestion -> x increases by one packet/RTT every RTT
Congestion -> decrease x by 1
AIMD
[Figure: flows A -> B (rate x) and D -> E (rate y) share capacity C; phase plot of y against x, with both axes marked at C]
Limit rates: x = y
AIAD
[Figure: flows A -> B (rate x) and D -> E (rate y) share capacity C; phase plot of y against x, with both axes marked at C]
Limit rates: x and y depend on initial values
Implementing AIMD

After each ACK
- increment cwnd by 1/cwnd (cwnd += 1/cwnd)
- as a result, cwnd is increased by one only if all
segments in a cwnd have been acknowledged

But need to decide when to leave slow-start and
enter AIMD
- use ssthresh variable
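A quick numeric check of the per-ACK rule: applying cwnd += 1/cwnd once for each ACK in a window raises cwnd by roughly one packet per RTT. The helper name is illustrative, and the sketch assumes exactly one ACK per packet.

```python
def one_rtt_of_acks(cwnd: float) -> float:
    """Apply cwnd += 1/cwnd once per ACK for one full window of ACKs."""
    acks = int(cwnd)           # one ACK per packet in the current window
    for _ in range(acks):
        cwnd += 1.0 / cwnd
    return cwnd

# Starting from 10, the result is slightly below 11, because each
# increment is computed against an already slightly larger cwnd.
print(one_rtt_of_acks(10.0))
```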
Slow Start/AIMD Pseudocode
Initially:
    cwnd = 1;
    ssthresh = infinite;
New ack received:
    if (cwnd < ssthresh)
        /* Slow Start */
        cwnd = cwnd + 1;
    else
        /* Congestion Avoidance */
        cwnd = cwnd + 1/cwnd;
Timeout:
    /* Multiplicative decrease */
    ssthresh = cwnd/2;
    cwnd = 1;
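The pseudocode translates almost line for line into runnable Python. The class and method names below are hypothetical, and cwnd is kept as a float counted in packets, as in the lecture.

```python
import math

class SlowStartAimd:
    """Window state for the pseudocode above; cwnd is counted in packets."""

    def __init__(self) -> None:
        self.cwnd = 1.0
        self.ssthresh = math.inf

    def on_new_ack(self) -> None:
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0               # slow start
        else:
            self.cwnd += 1.0 / self.cwnd   # congestion avoidance

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2      # multiplicative decrease
        self.cwnd = 1.0

tcp = SlowStartAimd()
for _ in range(15):
    tcp.on_new_ack()        # slow start: 1 -> 16
tcp.on_timeout()            # ssthresh = 8, cwnd = 1
for _ in range(7):
    tcp.on_new_ack()        # slow start again: 1 -> 8
tcp.on_new_ack()            # cwnd == ssthresh, so additive increase
print(tcp.cwnd, tcp.ssthresh)  # 8.125 8.0
```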
The big picture (with timeouts)
[Figure: cwnd versus time, showing alternating Slow Start and AIMD phases, the ssthresh level, and two timeouts]
Congestion Detection Revisited

Wait for Retransmission Time Out (RTO)
- RTO kills throughput

In BSD TCP implementations, RTO is usually
more than 500ms
- the granularity of RTT estimate is 500 ms
- retransmission timeout is RTT + 4 * mean_deviation

Solution: Don’t wait for RTO to expire
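The timeout formula above (RTT + 4 * mean_deviation) can be sketched as a running estimator. The slide gives only the final formula; the smoothing gains (1/8 and 1/4) and the first-sample initialization below are assumptions borrowed from common practice, and the class name is hypothetical. The 500 ms timer granularity is not modeled.

```python
class RtoEstimator:
    def __init__(self, gain: float = 0.125, dev_gain: float = 0.25) -> None:
        self.gain = gain            # assumed smoothing gain for the RTT estimate
        self.dev_gain = dev_gain    # assumed smoothing gain for the deviation
        self.srtt = None            # smoothed RTT, seconds
        self.mean_dev = 0.0         # smoothed mean deviation, seconds

    def update(self, sample: float) -> float:
        """Fold in one RTT sample and return the new timeout."""
        if self.srtt is None:
            # Assumed initialization: the first sample seeds both estimates.
            self.srtt = sample
            self.mean_dev = sample / 2
        else:
            err = sample - self.srtt
            self.srtt += self.gain * err
            self.mean_dev += self.dev_gain * (abs(err) - self.mean_dev)
        return self.srtt + 4 * self.mean_dev

est = RtoEstimator()
for rtt in [0.100, 0.120, 0.080, 0.300]:
    print(f"sample={rtt:.3f}s  rto={est.update(rtt):.3f}s")
```

A single outlier sample (0.300 s here) inflates the deviation term much more than the smoothed RTT, which is what makes the timeout conservative.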
Fast Retransmits

Resend a segment after 3 duplicate ACKs
- a duplicate ACK means that an out-of-sequence
segment was received

Notes:
- ACKs are for next expected packet
- packet reordering can cause duplicate ACKs
- window may be too small to get enough duplicate ACKs

[Figure: sender/receiver timeline with cwnd = 1, 2, 4; a lost segment produces 3 duplicate ACKs]
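The duplicate-ACK rule can be sketched as a small detector over a stream of ACK numbers. This is an illustrative model only: ACKs are plain integers naming the next expected segment, and the function name is hypothetical.

```python
def fast_retransmit_points(acks: list[int], threshold: int = 3) -> list[int]:
    """Return the segment numbers that would be retransmitted.

    Each ACK carries the next expected segment number. A segment is
    resent once, when the threshold-th duplicate of the same ACK arrives.
    """
    resend = []
    last_ack, dup_count = None, 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                resend.append(ack)
        else:
            last_ack, dup_count = ack, 0
    return resend

# Segment 3 is lost; segments 4, 5, 6 each trigger a duplicate ACK for 3.
print(fast_retransmit_points([2, 3, 3, 3, 3, 7]))  # [3]

# Mild reordering yields only one duplicate, so nothing is resent.
print(fast_retransmit_points([2, 3, 3, 5]))        # []
```

The second call shows why the threshold is not 1: a single duplicate is as likely to mean reordering as loss.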
Fast Recovery:
After a Fast Retransmit


ssthresh = cwnd / 2
cwnd = ssthresh
- instead of setting cwnd to 1, cut cwnd in half (multiplicative
decrease)

for each dup ack arrival
- dupack++
- MaxWindow = min(cwnd + dupack, AdvWin)
- indicates packet left network, so we may be able to send
more

receive ack for new data (beyond initial dup ack)
- dupack = 0
- exit fast recovery

But when RTO expires still do cwnd = 1
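The fast recovery rules on this slide can be collected into a small state object. This is a sketch with hypothetical class and method names; windows are counted in packets, and the timeout handler reuses the timeout rule from the earlier slow start/AIMD pseudocode.

```python
class FastRecovery:
    """Sender window state following the fast recovery rules above."""

    def __init__(self, cwnd: float, adv_win: float) -> None:
        self.cwnd = cwnd
        self.adv_win = adv_win
        self.ssthresh = float("inf")
        self.dupack = 0
        self.in_recovery = False

    def on_fast_retransmit(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = self.ssthresh      # halve instead of dropping to 1
        self.dupack = 0
        self.in_recovery = True

    def on_dup_ack(self) -> None:
        if self.in_recovery:
            self.dupack += 1           # one more packet has left the network

    def on_new_ack(self) -> None:
        self.dupack = 0
        self.in_recovery = False       # exit fast recovery

    def on_timeout(self) -> None:
        self.ssthresh = self.cwnd / 2
        self.cwnd = 1.0                # RTO still resets cwnd to 1
        self.dupack = 0
        self.in_recovery = False

    def max_window(self) -> float:
        return min(self.cwnd + self.dupack, self.adv_win)

s = FastRecovery(cwnd=16, adv_win=64)
s.on_fast_retransmit()
s.on_dup_ack()
s.on_dup_ack()
print(s.cwnd, s.max_window())  # 8.0 10.0
s.on_new_ack()
print(s.max_window())          # 8.0
```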
Fast Retransmit and Fast Recovery
[Figure: cwnd versus time; an initial Slow Start followed by an AI/MD sawtooth, with fast retransmits marked]

Retransmit after 3 duplicate ACKs
- Prevent expensive timeouts

Reduce slow starts
At steady state, cwnd oscillates around the
optimal window size
TCP Congestion Control Summary

Measure available bandwidth
- slow start: fast, hard on network
- AIMD: slow, gentle on network

Detecting congestion
- timeout based on RTT
• robust, causes low throughput
- Fast Retransmit: avoids timeouts when few packets lost
• can be fooled, maintains high throughput

Recovering from loss
- Fast recovery: don’t set cwnd=1 with fast retransmits
Issues to Think About

What about short flows? (setting initial cwnd)
- most flows are short
- most bytes are in long flows

How does this work over wireless links?
- packet reordering fools fast retransmit
- loss not always congestion related

High speeds?
- to sustain 10 Gbps, packet losses can occur no more than about once every 90 minutes!

Why are losses bad?
- Tornado codes: can reconstruct data proportional to packets
that get through. Why not send at maximal rate?

Fairness: how do flows with different RTTs share link?
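The figure in the "High speeds?" item can be reproduced approximately with a back-of-the-envelope calculation. The sketch below uses the standard steady-state AIMD approximation (average window ~ sqrt(1.5 / p), where p is the loss probability), which the slides do not derive. The 1500-byte packets and 100 ms RTT are assumptions, since the slide does not state its parameters, and the function name is hypothetical.

```python
def seconds_between_losses(rate_bps: float, rtt_s: float, pkt_bytes: int) -> float:
    """Loss spacing needed for steady-state AIMD to average rate_bps."""
    window = rate_bps * rtt_s / (pkt_bytes * 8)   # average window, packets
    loss_prob = 1.5 / window ** 2                 # from window ~ sqrt(1.5 / p)
    pkts_per_sec = rate_bps / (pkt_bytes * 8)
    return (1 / loss_prob) / pkts_per_sec

# Assumed parameters: 1500-byte packets, 100 ms RTT (not stated on the slide).
minutes = seconds_between_losses(10e9, 0.1, 1500) / 60
print(f"about {minutes:.0f} minutes between losses")
```

Under these assumptions the result is on the order of 90 minutes, consistent with the slide. Because the required spacing grows with the square of the window, each tenfold increase in target rate demands a hundredfold lower loss probability.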
Bonus Question

Why is TCP like Blanche Dubois?

Because it “relies on the kindness of strangers...”

What happens if not everyone cooperates?