Transport: TCP Manpreet Singh (Slides borrowed from various sources on the web)

Announcements (1/2)



Everybody needs to join the class mailing list; otherwise I can't communicate class info.
Check the class archives to see if someone else has picked the same lecture or TCP application.
We have a group of machines you can use for simulation (snoopy, linus, etc.).
  You need CSUG accounts to access these machines.
  We’ll dig up more machines for those who want to do kernel hacking.
Announcements (2/2)


Need a volunteer to give the "postmodern" E2E lecture 9/9 (in class...).
The non-research track students will have to do an initial demo by 11/9.
  Most of the functionality should be there.
  Allows us to give feedback.
  Gives you time to do performance measurements.
Roadmap


Why is TCP fair ?
Loss-based congestion schemes







Tahoe
Reno
NewReno
SACK
Delay-based congestion control (Vegas)
Modeling TCP throughput
Equation-based congestion control
The Desired Properties of a
Congestion Control Scheme

Efficiency (high utilization)

Optimality (high throughput, utility)

Fairness (resource sharing)


Distributedness (no central knowledge for
scalability)
Convergence and stability (fast convergence
after disturbance, low oscillation)
TCP Fairness
AIMD
TCP congestion avoidance:
AIMD: additive increase, multiplicative decrease
  increase window by 1 per RTT
  decrease window by factor of 2 on loss event
Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of link capacity
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]
Why is TCP fair?
Two competing sessions:


Additive increase gives slope of 1, as throughput increases
Multiplicative decrease decreases throughput proportionally
[Figure: throughput phase plot with both axes running from 0 to R and Connection 1 throughput on the horizontal axis. The trajectory alternates between congestion avoidance (additive increase) and loss (decrease window by factor of 2), moving toward the equal bandwidth share line.]
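The convergence argument can be checked numerically. The following is a minimal sketch, not part of the original slides. It assumes two flows whose losses are synchronized (both halve whenever their combined rate exceeds the capacity), and the function name `aimd_two_flows` is hypothetical.

```python
def aimd_two_flows(x1, x2, capacity, rounds):
    """Simulate two AIMD flows sharing one bottleneck.

    Assumption (for illustration only): losses are synchronized, so
    both flows halve their rate whenever the sum exceeds capacity.
    """
    for _ in range(rounds):
        if x1 + x2 > capacity:
            # multiplicative decrease: the gap x1 - x2 is halved
            x1, x2 = x1 / 2, x2 / 2
        else:
            # additive increase: the gap x1 - x2 is unchanged
            x1, x2 = x1 + 1, x2 + 1
    return x1, x2


if __name__ == "__main__":
    a, b = aimd_two_flows(x1=80.0, x2=10.0, capacity=100.0, rounds=500)
    print(f"flow 1: {a:.2f}, flow 2: {b:.2f}, gap: {abs(a - b):.4f}")
```

Additive increase never changes the difference between the two rates, while every multiplicative decrease halves it, so the difference shrinks toward zero regardless of the starting point.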
Loss vs Delay as signal ???
Loss is a binary signal
Delay is a multi-bit signal
TCP
oscillation
Simulation-based
Comparisons of Tahoe, Reno,
and SACK TCP
Kevin Fall
Sally Floyd
Introduction


SACK compared with Tahoe, Reno and
New-Reno
Simulations designed to highlight
performance differences with and
without SACK
Comparison




Tahoe: Slow start, congestion avoidance
and fast retransmit
Reno: Tahoe + fast recovery
New-Reno: Reno with modified fast
recovery
SACK: Reno + selective ACKs
TCP Slowstart
Slowstart algorithm (non-linear phase)
    initialize: Congwin = 1
    for (each segment ACKed)
        Congwin++
    until (loss event OR Congwin > threshold)
Exponential increase (per RTT) in window size (not so slow!)
Loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
[Figure: time-sequence diagram of segments exchanged between Host A and Host B, with one RTT and the time axis marked.]
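To see why per-ACK increments give exponential growth per RTT, here is a small sketch of the slow start rule. It assumes no loss and that every segment sent in one RTT is ACKed within that RTT; the function name `slow_start_trace` is hypothetical.

```python
def slow_start_trace(threshold, rtts):
    """Return the congestion window at the start of each RTT in slow start.

    Assumptions: no loss, and all segments of a window are ACKed
    within one RTT.
    """
    cwnd = 1
    trace = [cwnd]
    for _ in range(rtts):
        if cwnd > threshold:
            break              # slow start is over
        acked = cwnd           # one ACK per segment in flight
        for _ in range(acked):
            cwnd += 1          # Congwin++ for each segment ACKed
        trace.append(cwnd)
    return trace


if __name__ == "__main__":
    print(slow_start_trace(threshold=16, rtts=10))
```

Each RTT delivers one ACK per segment in the window, and each ACK adds one segment, so the window doubles every RTT until it exceeds the threshold.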
TCP Congestion Avoidance
Congestion avoidance (linear phase)
/* slowstart is over */
/* Congwin > threshold */
Until (loss event) {
    every w segments ACKed:
        Congwin++
}
threshold = Congwin/2
Congwin = 1
perform slowstart (1)
(1) TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
Fast Retransmit



Receiving a small number of duplicate ACKs (3) signals packet loss
The lost packet can be retransmitted before the timeout
This improves channel utilization
TCP/Reno Congestion Control
Initially:
    cwnd = 1;
    ssthresh = infinite (64K);
For each newly ACKed segment:
    if (cwnd < ssthresh)
        /* slow start */
        cwnd = cwnd + 1;
    else
        /* congestion avoidance; cwnd increases (approx.) by 1 per RTT */
        cwnd += 1/cwnd;
Triple-duplicate ACKs:
    /* multiplicative decrease */
    cwnd = ssthresh = cwnd/2;
Timeout:
    ssthresh = cwnd/2;
    cwnd = 1;
    (if already timed out, double timeout value; this is called exponential backoff)
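The rules above translate almost line for line into a small state object. This is a toy sketch, not a real TCP implementation: the class name `RenoWindow` is hypothetical, the window is counted in segments rather than bytes (so the default `ssthresh` of 64 only stands in for the slide's "infinite (64K)"), and retransmission timers and exponential backoff are left out.

```python
class RenoWindow:
    """Toy model of the cwnd/ssthresh rules (units: segments)."""

    def __init__(self, ssthresh=64.0):
        self.cwnd = 1.0
        self.ssthresh = ssthresh  # assumption: stands in for "infinite"

    def on_new_ack(self):
        if self.cwnd < self.ssthresh:
            self.cwnd += 1.0              # slow start
        else:
            self.cwnd += 1.0 / self.cwnd  # congestion avoidance

    def on_triple_dup_ack(self):
        self.ssthresh = self.cwnd / 2.0   # multiplicative decrease
        self.cwnd = self.ssthresh

    def on_timeout(self):
        self.ssthresh = self.cwnd / 2.0
        self.cwnd = 1.0                   # back to slow start


if __name__ == "__main__":
    w = RenoWindow(ssthresh=8.0)
    for _ in range(8):
        w.on_new_ack()
    print(w.cwnd, w.ssthresh)
```

In congestion avoidance a full window of ACKs adds roughly `cwnd * (1/cwnd) = 1` segment, which is where the "1 per RTT" in the comment comes from.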
TCP/Reno: Big Picture
Tahoe + Fast Recovery
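[Figure: cwnd versus time. The connection begins in slow start and then enters congestion avoidance. Each triple-duplicate-ACK event (TD) halves cwnd to the new ssthresh, after which congestion avoidance resumes. The timeout (TO) is followed by slow start up to ssthresh and then congestion avoidance.]
TD: Triple duplicate acknowledgements
TO: Timeout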
Fast Recovery

Observation: Each duplicate ACK indicates some packet has left the pipe
[Figure: sender window after a packet loss. New cwnd = (old cwnd)/2; the left edge stays fixed until an ACK is received; the usable window is increased by 1 for each duplicate ACK.]
New-Reno extension

New-Reno continues with fast recovery if a partial ACK is received
[Figure: sender window with packets 1 and 2 lost. New cwnd = (old cwnd)/2; LP is the last packet sent before loss detection; the usable window is increased by 1 for each duplicate ACK until the ACK for LP is received.]
Why use SACK?

Without SACK, the sender has to use one of the following retransmission strategies
- Retransmit 1 dropped packet / RTT (Reno, New-Reno)
- Retransmit packets that might have been successfully delivered (Tahoe)
SACK option [RFC2018]

Ex: 2nd segment dropped (each segment has 500 bytes)
seg     ack     Sack1 left   Sack1 right
5000    5500
5500    lost
6000    5500    6000         6500
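The example can be reproduced with a short receiver-side sketch that derives the cumulative ACK and the SACK blocks from the set of segments that arrived. The function name `ack_with_sack` is hypothetical, and the sketch simplifies RFC 2018: it lists blocks in sequence order instead of most-recent-first and does not cap the number of blocks.

```python
def ack_with_sack(received, start, seg_size=500):
    """Return (cumulative_ack, sack_blocks) for the received segments.

    received: starting sequence numbers of the segments that arrived.
    start:    sequence number of the first byte expected.
    Each block is (left_edge, right_edge); the right edge is the
    sequence number just past the last byte of the block.
    """
    got = set(received)
    ack = start
    while ack in got:              # advance over in-order data
        ack += seg_size
    blocks = []
    for seq in sorted(s for s in got if s > ack):
        if blocks and blocks[-1][1] == seq:
            # contiguous with the previous block: extend it
            blocks[-1] = (blocks[-1][0], seq + seg_size)
        else:
            blocks.append((seq, seq + seg_size))
    return ack, blocks


if __name__ == "__main__":
    # Segments 5000 and 6000 arrive; 5500 is lost.
    print(ack_with_sack([5000, 6000], start=5000))
```

With segments 5000 and 6000 received, the cumulative ACK stays at 5500 and the single SACK block is (6000, 6500), matching the last row of the example.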
SACK Congestion Control (1/2)


Conservative extensions to Reno
Fast recovery algorithm modified
  Uses a variable called “pipe” to estimate outstanding data in the flow
  Rules for changing “pipe” variable
    + 1 when packet transmitted
    - 1 when dup ACK received
SACK Congestion Control (2/2)



SACK sender tracks successfully delivered packets using a “scoreboard” structure
Missing packets are retransmitted
Similar to New-Reno in exiting from fast recovery: exits after all data outstanding at the time of loss is ACKed
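A rough sketch of how the “pipe” counter and the scoreboard fit together is shown below. The class name `SackRecovery` and its methods are hypothetical, packets are numbered rather than byte-addressed, and the send rule (transmit only while pipe is below cwnd) is taken from the Fall and Floyd description of the algorithm rather than from these slides.

```python
class SackRecovery:
    """Toy fast-recovery bookkeeping with a pipe counter and a scoreboard."""

    def __init__(self, cwnd, in_flight):
        self.cwnd = cwnd                    # window after halving, in packets
        self.pipe = len(in_flight)          # estimated packets in the path
        self.outstanding = set(in_flight)   # sent but unACKed at loss time
        self.sacked = set()                 # scoreboard of SACKed packets

    def on_dup_ack(self, sacked_pkt):
        self.sacked.add(sacked_pkt)
        self.pipe -= 1                      # -1 when dup ACK received

    def on_transmit(self):
        self.pipe += 1                      # +1 when packet transmitted

    def can_send(self):
        # Assumed rule: send only while the path holds less than cwnd.
        return self.pipe < self.cwnd

    def missing(self):
        """Outstanding packets below the highest SACKed one, not SACKed."""
        if not self.sacked:
            return []
        top = max(self.sacked)
        return sorted(p for p in self.outstanding
                      if p < top and p not in self.sacked)

    def done(self, next_expected):
        """True once all data outstanding at loss time is ACKed.

        next_expected: the next packet number the receiver expects.
        """
        return next_expected > max(self.outstanding)
```

Because `pipe` drops by one for every duplicate ACK, several holes reported in the scoreboard can be retransmitted within the same RTT once `pipe` falls below `cwnd`.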
Simulation Model used

Three flows are set up from S1 to K1;
the 2nd and 3rd flows are used to change
the packet drop pattern of the 1st flow
One Packet Loss (1/2)
Performs slow start
[Figure labels: packet dropped, packet retransmitted]
One Packet Loss (2/2)
Performs fast recovery
[Figure labels: packet dropped, packet retransmitted]
Two Packet Loss (1/2)
Performs slow start
[Figure labels: packets dropped, packets retransmitted]
Two Packet Loss (2/2)
Performs fast recovery
[Figure labels: packets dropped, packets retransmitted]
Three Packet Losses (1/3)
Has to wait for timeout
[Figure labels: packets dropped, packets retransmitted]
Three Packet Losses (2/3)
No need for timeout
Retransmits 1 pkt/RTT
[Figure labels: packets dropped, packets retransmitted]
Three Packet Losses (3/3)
Retransmits more than 1 pkt/RTT
[Figure labels: packets dropped, packets retransmitted]
Observations




Tahoe: Robust, performs slow start
Reno: For > 2 losses, timeout is often needed
New-Reno: Can avoid timeouts, but still cannot retransmit > 1 pkt/RTT
SACK: Can retransmit > 1 pkt/RTT, thus recovers from losses faster
Conclusions



SACK can improve TCP performance
SACK can be used on high-loss links too (Ex: Wireless)
New-Reno demonstrates that certain problems of Reno can be avoided without SACK
Reno vs Vegas
(Congestion Avoidance)

Reno’s mechanism

Characteristics




uses the loss of segments as a signal
reactive not proactive
needs to create losses to find the available bandwidth
[Figure: example showing the send window, congestion window, and threshold window.]
TCP Vegas
Idea: the source watches for some sign that the router’s queue is building up and congestion will happen soon; e.g.,
  RTT grows
  sending rate flattens
[Figure: three stacked plots against time (seconds, 0.5 to 8.5): congestion window (KB), average source send rate (sending KBps), and queue size in router (buffer space at router).]
In the shaded region we expect throughput to increase, but it cannot increase beyond the available bandwidth
Vegas’ approach

Basic idea





Vegas tries not to send at a rate that causes buffers to fill
Maintain the right amount of extra data
  based on changes in the estimated amount of extra data
  window size vs. throughput
Keep the actual rate from straying too far from the available rate (resulting in a smooth congestion avoidance period)
Vegas Algorithm

Define a given connection’s BaseRTT
  BaseRTT = the minimum of all measured RTTs
Calculate the expected throughput (E) and the current actual sending rate (A)
  Expected throughput = WindowSize / BaseRTT
  Actual rate = Flight size / RTT
Compare Actual (A) to Expected (E) and adjust the window (linear increase or decrease)
  If (E-A) > beta, cwnd-- (congestion state)
  If (E-A) < alpha, cwnd++ (low utilization)
When a loss is detected, reduce the window by half
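The window adjustment can be written as a single function. This is a sketch under stated assumptions: the name `vegas_update` is hypothetical, the window and flight size are in packets, and `alpha` and `beta` are expressed in the same units as the rates (packets per second) so that they can be compared with E - A directly.

```python
def vegas_update(cwnd, flight_size, base_rtt, rtt, alpha, beta):
    """Return the new cwnd after one Vegas comparison.

    expected = cwnd / base_rtt        (rate if no queueing occurred)
    actual   = flight_size / rtt      (rate actually achieved)
    """
    expected = cwnd / base_rtt
    actual = flight_size / rtt
    diff = expected - actual
    if diff > beta:
        return cwnd - 1   # too much extra data queued: linear decrease
    if diff < alpha:
        return cwnd + 1   # link under-used: linear increase
    return cwnd           # between alpha and beta: leave window alone


if __name__ == "__main__":
    for rtt in (0.1, 0.125, 0.2):
        print(rtt, vegas_update(10, 10, base_rtt=0.1, rtt=rtt,
                                alpha=10, beta=30))
```

As the measured RTT rises above BaseRTT, the actual rate falls below the expected rate; the window grows while the gap is under `alpha`, holds between `alpha` and `beta`, and shrinks above `beta`.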
Algorithm (cont)
Parameters
  alpha = 1 buffer
  beta = 3 buffers
[Figure: two plots against time (seconds): congestion window (KB) and CAM KBps. Black line = actual rate; green line = expected rate; shaded = region between alpha and beta.]
Note: Linear decrease in Vegas does not violate AIMD since it
happens before packet loss
Comparison of Reno and
Vegas (Retransmission)

Reno’s retransmission mechanism
Retransmission timeout
  based on RTT and variance estimates
  BSD-based: 500 ms
Fast Retransmit and Fast Recovery
  When the sender receives duplicate ACKs, it halves the window size and avoids the timeout, which would cause retransmission with slow start
  If multiple drops occur, timeout and slow start will follow anyway.
19% increase in throughput
Vegas’ Retransmission




Reads and records the system clock each time a segment is sent
When an ACK arrives, Vegas reads the clock again
RTT is calculated from this time and the timestamp recorded for the relevant segment
Uses this more accurate RTT estimate to decide to retransmit
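The per-segment timestamping described above might look like the following sketch. The class name `RttSampler` is hypothetical, the clock is injectable so the example can be run deterministically, and retransmitted segments (whose ACKs are ambiguous) are not handled.

```python
import time


class RttSampler:
    """Record a send timestamp per segment and compute RTT on each ACK."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.sent_at = {}       # sequence number -> send timestamp
        self.base_rtt = None    # minimum RTT seen so far

    def on_send(self, seq):
        self.sent_at[seq] = self.clock()

    def on_ack(self, seq):
        sent = self.sent_at.pop(seq, None)
        if sent is None:
            return None         # no timestamp recorded for this segment
        rtt = self.clock() - sent
        if self.base_rtt is None or rtt < self.base_rtt:
            self.base_rtt = rtt
        return rtt
```

The running minimum kept in `base_rtt` is the same quantity the Vegas algorithm calls BaseRTT.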
Some fun topics to discuss…
Modeling TCP throughput

Consider congestion avoidance only
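[Figure: cwnd versus time during congestion avoidance. The window grows linearly from ssthresh = W/2 up to W (the bottleneck bandwidth) and then halves. One sawtooth cycle lasts W/2 RTTs with average window 3W/4, so its area is $\frac{3W}{4} \cdot \frac{W}{2} = \frac{3W^2}{8}$ packets.]
Assume one packet loss (loss event) per cycle
Total packets sent per cycle: $3W^2/8$
Thus $p = 1/(3W^2/8) = 8/(3W^2)$
=> $W = \sqrt{8/(3p)} \approx 1.6/\sqrt{p}$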
Modeling TCP throughput…

1/throughput = c * sqrt(p) * RTT
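For a concrete number, the model can be evaluated directly. In the sawtooth model the average window is $3W/4$ with $W = \sqrt{8/(3p)}$, which gives $\sqrt{3/(2p)}$ packets per RTT, so $c = \sqrt{2/3}$ when throughput is counted in packets per second. The sketch below assumes this simple model only (no timeouts, no delayed ACKs); the function name `tcp_throughput` and the sample values are hypothetical.

```python
import math


def tcp_throughput(mss_bytes, rtt_s, p):
    """Approximate steady-state TCP throughput in bytes per second.

    Uses throughput = (MSS / RTT) * sqrt(3 / (2p)), i.e. the
    congestion-avoidance-only sawtooth model.
    """
    return (mss_bytes / rtt_s) * math.sqrt(3.0 / (2.0 * p))


if __name__ == "__main__":
    # Assumed sample values: 1500-byte segments, 100 ms RTT, 1% loss.
    rate = tcp_throughput(mss_bytes=1500, rtt_s=0.1, p=0.01)
    print(f"{rate:.0f} bytes/s")
```

Because throughput scales with $1/\sqrt{p}$, quadrupling the loss rate halves the achievable rate, while doubling the RTT halves it as well.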
Equation-based Congestion Control






Don’t need reliability
But still want to be friendly to the network
At what rate should we send the UDP traffic?
Use detailed TCP analysis to relate throughput to loss and RTT.
Measure these values and then calculate the appropriate throughput directly.
The result is a rate-based, equation-driven protocol called TFRC.
mulTCP

Effect of AIMD parameters on the
throughput of TCP