cs143-Lecture5-transport

cs/ee/ids 143 Communication Networks
Chapter 4 Transport
Text: Walrand & Parekh, 2010
Steven Low
CMS, EE, Caltech
Agenda
Internetworking
 Routing across LANs, layer2-layer3
 DHCP
 NAT
Transport layer
 Connection setup
 Error recovery: retransmission
 Congestion control
Protocol stack
Network mechanisms implemented as
protocol stack
Each layer designed separately, evolves
asynchronously
application
Many control mechanisms…
transport
Error control, congestion control (TCP)
network
Routing (IP)
link
Medium access control
physical
Coding, transmission, synchronization
Transport services
UDP
• Datagram service
• No congestion control
• No error/loss recovery
• Lightweight
TCP
• Connection-oriented service
• Congestion control
• Error/loss recovery
• Heavyweight
UDP
Port numbers: 1 – 65535 (2^16 − 1)
[Figure: UDP header]
Max payload ≤ 65535 bytes − 8 bytes (UDP header) − 20 bytes (IP header)
Usually smaller to avoid IP fragmentation (e.g., Ethernet MTU 1500 bytes)
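To make the datagram service concrete, here is a minimal Python sketch (the destination address is made up): a single sendto() call, no connection setup, and no recovery if the datagram is lost.

import socket

# Minimal UDP sender: datagram service, no connection setup, no congestion
# control, no retransmission. Destination address below is hypothetical.
MSG = b"hello"                          # payload must fit in one datagram
DEST = ("127.0.0.1", 9999)              # hypothetical destination host/port

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.sendto(MSG, DEST)                  # fire and forget: loss is not recovered
sock.close()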
TCP
TCP header
Example TCP states
3-way handshake
4-way handshake
Possible issue: SYN flood attack
Results in a large number of half-open connections, so no new connections can be accepted.
Window Flow Control
[Figure: sliding-window timing diagram. The source sends a window of W packets, the destination returns ACKs, and each returning ACK releases new data, so about W packets are in flight per RTT.]
 ~ W packets per RTT
 Lost packet detected by missing ACK
ARQ (Automatic Repeat Request)
Go-back-N
Selective repeat
TCP
• Sender & receiver negotiate whether or not to use Selective Repeat (SACK)
• Can ACK up to 4 blocks of contiguous bytes that the receiver got correctly,
  e.g. [3; 10, 14; 16, 20; 25, 33]
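As a rough Python sketch of the difference (the packet numbers and received set are made up, and this is not any particular TCP implementation): go-back-N resends everything from the first gap onward, while selective repeat resends only the gaps.

def go_back_n_retransmissions(sent, received):
    # Go-back-N: resend everything from the first missing packet onward.
    missing = [p for p in sent if p not in received]
    if not missing:
        return []
    return [p for p in sent if p >= missing[0]]

def selective_repeat_retransmissions(sent, received):
    # Selective repeat: resend only the packets that were actually lost.
    return [p for p in sent if p not in received]

sent = list(range(1, 9))              # packets 1..8 in flight
received = {1, 2, 3, 5, 6, 8}         # 4 and 7 were lost
print(go_back_n_retransmissions(sent, received))        # [4, 5, 6, 7, 8]
print(selective_repeat_retransmissions(sent, received)) # [4, 7]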
Window control
 Limit the number of packets in the
network to window W
W  MSS
 Source rate =
bps
RTT
 If W too small then rate « capacity
If W too big then rate > capacity
=> congestion
 Adapt W to network (and conditions)
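A quick back-of-the-envelope check of the rate formula in Python, with illustrative numbers (W = 10 packets, MSS = 1460 bytes, RTT = 100 ms):

# Source rate = W x MSS / RTT; all numbers below are illustrative.
W = 10              # window, packets
MSS = 1460          # bytes per packet (typical Ethernet payload)
RTT = 0.100         # seconds

rate_bps = W * MSS * 8 / RTT
print(f"rate = {rate_bps / 1e6:.2f} Mbps")   # 1.17 Mbps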
TCP window control
 Receiver flow control
 Avoid overloading receiver
 Set by receiver
 awnd: receiver (advertised) window
 Network congestion control
 Avoid overloading network
 Set by sender
 Infer available network capacity
 cwnd: congestion window
 Set W = min(cwnd, awnd)
TCP congestion control
 Source calculates cwnd from indication
of network congestion
 Congestion indications
 Losses
 Delay
 Marks
 Algorithms to calculate cwnd
 Tahoe, Reno, Vegas, …
TCP Congestion Controls
 Tahoe (Jacobson 1988)
 Slow Start
 Congestion Avoidance
 Fast Retransmit
 Reno (Jacobson 1990)
 Fast Recovery
 Vegas (Brakmo & Peterson 1994)
 New Congestion Avoidance
TCP Tahoe
(Jacobson 1988)
[Figure: window vs. time under Tahoe, alternating Slow Start (SS) and Congestion Avoidance (CA) phases, with the ssthresh threshold marking the switch from SS to CA.]
Slow Start
 Start with cwnd := 1 (slow start)
 On each successful ACK increment cwnd: cwnd := cwnd + 1
 Exponential growth of cwnd: each RTT, cwnd := 2 × cwnd
 Enter CA when cwnd ≥ ssthresh
Congestion Avoidance
 Starts when cwnd >= ssthresh
 On each successful ACK:
cwnd := cwnd + 1/cwnd
 Linear growth of cwnd
each RTT: cwnd := cwnd + 1
Packet Loss
 Assumption: loss indicates congestion
 Packet loss detected by
 Retransmission TimeOuts (RTO timer)
 Duplicate ACKs (at least 3) (Fast Retransmit)
[Figure: sender transmits packets 1-7; packet 4 is lost, so each later packet triggers another ACK for 3; the acknowledgement stream is 1, 2, 3, 3, 3, 3 (duplicate ACKs).]
Fast Retransmit
 Waiting for a timeout is quite long
 Retransmit immediately after 3 dupACKs, without waiting for the timeout
 Adjust ssthresh:
flightsize := min(awnd, cwnd)
ssthresh := max(flightsize/2, 2)
 Enter Slow Start (cwnd := 1)
Summary: Tahoe
 Basic ideas
 Gently probe network for spare capacity
 Drastically reduce rate on congestion
 Windowing: self-clocking
for every ACK {
    if (W < ssthresh) then W++          (SS)
    else W += 1/W                       (CA)
}
for every loss {
    ssthresh := W/2
    W := 1
}
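A toy per-RTT trace of these rules in Python (illustrative only: ssthresh starts at 16 and a single loss is injected by hand at RTT 12; real losses of course depend on the network):

def tahoe_trace(rtts=20, ssthresh=16.0, loss_at=(12,)):
    # Per-RTT approximation of Tahoe: double in SS, +1 in CA, reset on loss.
    W = 1.0
    trace = []
    for t in range(rtts):
        trace.append(round(W, 1))
        if t in loss_at:                       # loss: halve threshold, restart SS
            ssthresh = max(W / 2, 2)
            W = 1.0
        elif W < ssthresh:                     # slow start: double each RTT
            W = min(2 * W, ssthresh)
        else:                                  # congestion avoidance: +1 each RTT
            W += 1
    return trace

print(tahoe_trace())
# [1.0, 2.0, 4.0, 8.0, 16.0, 17.0, ..., 24.0, 1.0, 2.0, 4.0, 8.0, 12.0, 13.0, 14.0]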
Seems a little too conservative?
TCP Reno
(Jacobson 1990)
[Figure: window vs. time under Reno: Slow Start (SS), then a Congestion Avoidance (CA) sawtooth.]

for every ACK {
    W += 1/W          (AI)
}
for every loss {
    W := W/2          (MD)
}
How to halve W without emptying the pipe?
Fast Recovery
Fast recovery
 Idea: each dupACK represents a packet having left the pipe (successfully received)
 Enter FR/FR after 3 dupACKs
 Set ssthresh := max(flightsize/2, 2)
 Retransmit lost packet
 Set cwnd := ssthresh + ndup (window inflation)
 Wait till W := min(awnd, cwnd) is large enough; transmit new packet(s)
 On non-dup ACK, set cwnd := ssthresh (window deflation)
 Enter CA
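A Python sketch of this bookkeeping (simplified: one lost packet, no timeout handling, and the actual sending of packets is abstracted away; only the cwnd/ssthresh arithmetic from the list above is shown):

class RenoFRFR:
    # Fast retransmit / fast recovery window bookkeeping (simplified sketch).
    def __init__(self, cwnd, awnd):
        self.cwnd, self.awnd = cwnd, awnd
        self.dup = 0
        self.in_fr = False

    def on_dup_ack(self):
        self.dup += 1
        if not self.in_fr and self.dup == 3:
            flightsize = min(self.awnd, self.cwnd)
            self.ssthresh = max(flightsize / 2, 2)
            # (retransmit the lost packet here)
            self.cwnd = self.ssthresh + self.dup     # window inflation
            self.in_fr = True
        elif self.in_fr:
            self.cwnd += 1                           # each dupACK frees a slot in the pipe

    def on_new_ack(self):
        if self.in_fr:
            self.cwnd = self.ssthresh                # window deflation, then enter CA
            self.in_fr = False
        self.dup = 0

r = RenoFRFR(cwnd=8, awnd=20)
for _ in range(7):
    r.on_dup_ack()          # 7 dupACKs: enter FR/FR at the 3rd, then inflate
print(r.cwnd)               # 11.0 = ssthresh (4) + 7 dupACKs
r.on_new_ack()
print(r.cwnd)               # 4.0: deflated back to ssthresh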
Example: FR/FR
[Figure: FR/FR timeline. With cwnd = 8, the sender S transmits packets 1-8; packet 1 is lost and the receiver R returns 7 dupACKs. On the 3rd dupACK the sender sets ssthresh = 4, retransmits packet 1, and sets cwnd = 4 + 3 = 7; each further dupACK inflates cwnd (8, 9, 10, 11), letting new packets 9-11 go out. The non-dup ACK deflates cwnd to ssthresh = 4 and exits FR/FR.]
 Fast retransmit
 Retransmit on 3 dupACKs
 Fast recovery
 Inflate window while repairing loss to fill pipe
Summary: Reno
 Basic ideas
 dupACKs: halve W and avoid slow start
 dupACKs: fast retransmit + fast recovery
 Timeout: slow start
[Figure: Reno state transitions: dupACKs take congestion avoidance into FR/FR (retransmit, then back to congestion avoidance); a timeout triggers a retransmit and a return to slow start.]
Delay-based TCP: Vegas
(Brakmo & Peterson 1994)
[Figure: window vs. time under Vegas: Slow Start (SS), then Congestion Avoidance (CA).]
 Reno with a new congestion avoidance algorithm
 Converges (provided buffer is large) !
Congestion avoidance
 Each source estimates the number of its own packets in the pipe from RTT
 Adjusts window to keep this estimated number of queued packets between a and b

for every RTT {
    if W/RTTmin − W/RTT < a/RTTmin then W++
    if W/RTTmin − W/RTT > b/RTTmin then W--
}
for every loss
    W := W/2
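A Python sketch of this per-RTT update (a, b and the toy RTT model are illustrative; a real Vegas source measures RTT rather than computing it):

def vegas_update(W, rtt, rtt_min, a=1, b=3):
    # diff * rtt_min estimates this source's packets queued in the network;
    # Vegas keeps that estimate between a and b packets.
    diff = W / rtt_min - W / rtt          # expected rate minus actual rate
    if diff < a / rtt_min:
        return W + 1                      # too little queued: speed up
    if diff > b / rtt_min:
        return W - 1                      # too much queued: slow down
    return W

W = 10.0
for _ in range(20):
    rtt = 0.100 + 0.005 * W               # toy model: queueing delay grows with W
    W = vegas_update(W, rtt, rtt_min=0.100)
print(W)                                  # settles at 9.0 for these numbers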
Implications
 Congestion measure = end-to-end
queueing delay
 At equilibrium
 Zero loss
 Stable window at full utilization
 Nonzero queue, larger for more sources
 Convergence to equilibrium
 Converges if sufficient network buffer
 Oscillates like Reno otherwise
Theory-guided design: FAST
We will study these further in the TCP modeling lectures in the following weeks
Summary
 UDP header / TCP header
 TCP 3-way / 4-way handshake
 ARQ: Go-back-N / selective repeat
 Tahoe / Reno / New Reno / Vegas / FAST
-- useful details for your project
Why both TCP and UDP?
 Most applications use TCP, as this avoids reinventing error recovery in every application
 But some applications do not need TCP
 For example: Voice applications
Some packet loss is fine.
Packet retransmission introduces too much delay.
 For example: an application that sends just one message, like DNS/SNMP/RIP.
TCP sends several packets before the useful one.
We may add reliability at the application layer instead.
Mathematical model
TCP/AQM
[Figure: feedback loop between TCP sources and link AQM: sources set rates xi(t), links feed back congestion measures pl(t).]
TCP:
 Reno
 Vegas
 FAST
AQM:
 DropTail
 RED
 REM/PI
 AVQ
Congestion control is a distributed asynchronous algorithm to share bandwidth. It has two components:
 TCP: adapts sending rate (window) to congestion
 AQM: adjusts & feeds back congestion information
They form a distributed feedback control system:
 Equilibrium & stability depend on both TCP and AQM
 and on delay, capacity, routing, # of connections
Network model
Network
 Links l with capacities cl and congestion measures pl(t)
Sources i
 Source rates xi(t)
Routing matrix R

[Figure: example with three sources x1(t), x2(t), x3(t) over two links with congestion measures p1(t), p2(t): source 1 uses both links, source 2 uses link 1, source 3 uses link 2.]

        | 1  1  0 |
    R = |         |
        | 1  0  1 |

    x1 + x2 ≤ c1
    x1 + x3 ≤ c2
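A small numpy check of this example (rates, capacities and congestion measures are made up), showing the two roles of R: R x gives the aggregate rate yl on each link, and R^T p gives the end-to-end congestion measure qi seen by each source.

import numpy as np

# Routing matrix from the example: link 1 carries sources 1 and 2,
# link 2 carries sources 1 and 3. All numbers are illustrative.
R = np.array([[1, 1, 0],
              [1, 0, 1]])
x = np.array([2.0, 3.0, 4.0])     # source rates xi(t), Mbps
c = np.array([6.0, 6.0])          # link capacities cl, Mbps
p = np.array([0.1, 0.2])          # link congestion measures pl(t)

y = R @ x                         # aggregate link rates:  [5. 6.]
q = R.T @ p                       # per-source measures:   [0.3 0.1 0.2]
print(y, q, np.all(y <= c))       # constraints x1+x2 <= c1, x1+x3 <= c2 hold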
Network model
[Figure: block diagram. TCP sources F1, …, FN generate rates x; routing R maps x to link aggregate rates y; AQM algorithms G1, …, GL at the links generate congestion measures p; R^T maps p back to the end-to-end measures q seen by the sources.]
Rli = 1 if source i uses link l (0 otherwise)
The TCP congestion control model consists of specs for Fi (TCP algorithm: Reno, Vegas, …) and Gl (AQM: DropTail, RED, …), coupled through IP routing R:

    x(t+1) = F(x(t), R^T p(t))
    p(t+1) = G(R x(t), p(t))
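In Python, the whole model is just this iteration; F and G below are placeholders to be filled in with a concrete protocol pair (the Reno/RED instance appears later).

import numpy as np

def simulate(F, G, R, x0, p0, steps=100):
    # Generic TCP/AQM iteration: x(t+1) = F(x, R^T p), p(t+1) = G(R x, p).
    x, p = np.array(x0, float), np.array(p0, float)
    for _ in range(steps):
        q = R.T @ p               # end-to-end congestion measure per source
        y = R @ x                 # aggregate rate per link
        x, p = F(x, q), G(y, p)   # source update and link update
    return x, p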
Examples
Derive (Fi, Gl) model for
 Reno/RED
 Vegas/Droptail
 FAST/Droptail
Focus on Congestion Avoidance
Model: Reno

for every ACK (CA) {
    W += 1/W
}
for every loss {
    W := W/2
}

In congestion avoidance, ACKs arrive at rate xi(t)(1 − qi(t)) and each increases the window by 1/wi(t); losses arrive at rate xi(t)qi(t) and each halves the window. So the window of source i changes per unit time by

    Δwi(t) = xi(t)(1 − qi(t)) · 1/wi(t)  −  xi(t)qi(t) · wi(t)/2

where
    xi(t) = wi(t)/Ti : throughput (window size over round-trip time)
    qi(t) = Σl Rli pl(t) : round-trip loss probability (pl(t): link loss probability)

Dividing by Ti, substituting xi(t) = wi(t)/Ti, and using qi(t) ≈ 0 in the first term gives

    xi(t+1) = xi(t) + 1/Ti² − (1/2) xi²(t) qi(t)  =  Fi(xi(t), qi(t))
Model: RED

aggregate link rate:     yl(t) = Σi Rli xi(t)     (xi: source rate)
queue length:            bl(t+1) = [ bl(t) + yl(t) − cl ]+
marking probability:     pl(t) = min { α bl(t), 1 }

i.e.  pl(t+1) = Gl(yl(t), pl(t))
Model: Reno/RED

    xi(t+1) = xi(t) + 1/Ti² − (1/2) xi²(t) qi(t)        [ xi(t+1) = Fi(xi(t), qi(t)) ]
    qi(t) = Σl Rli pl(t)

    bl(t+1) = [ bl(t) + yl(t) − cl ]+
    pl(t) = min { α bl(t), 1 }                          [ pl(t+1) = Gl(yl(t), pl(t)) ]
    yl(t) = Σi Rli xi(t)
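A small numpy simulation of exactly these update equations, for one bottleneck link shared by two sources with equal RTT (capacity, RTT and the RED slope α are illustrative). The rates should settle near c/2 each, with p ≈ 2/(xi²Ti²) from setting Fi's increment to zero.

import numpy as np

# Reno/RED model above: one link, two sources. All constants are illustrative.
R = np.array([[1.0, 1.0]])       # both sources use the single link
c = np.array([20.0])             # capacity, packets per time step
T = np.array([1.0, 1.0])         # round-trip times, in time steps
alpha = 0.001                    # RED slope: p_l = min(alpha * b_l, 1)

x = np.array([1.0, 1.0])         # source rates
b = np.zeros(1)                  # queue length
p = np.zeros(1)                  # marking probability

for _ in range(2000):
    q = R.T @ p                                         # q_i = sum_l R_li p_l
    x = np.maximum(x + 1 / T**2 - 0.5 * x**2 * q, 0.1)  # Reno:  x(t+1) = F_i(x, q)
    y = R @ x                                           # y_l = sum_i R_li x_i
    b = np.maximum(b + y - c, 0.0)                      # queue: b(t+1) = [b + y - c]+
    p = np.minimum(alpha * b, 1.0)                      # RED:   p_l = min(alpha b_l, 1)

print(np.round(x, 2), np.round(b, 1), np.round(p, 3))
# x -> [10. 10.], b -> [20.], p -> [0.02]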
Decentralization structure
[Figure: the same block diagram: TCP sources F1, …, FN and link AQMs G1, …, GL, coupled through R and R^T.]

    x(t+1) = F(x(t), q(t))
    p(t+1) = G(y(t), p(t))

where each source i sees only its own end-to-end congestion measure and each link l sees only its own aggregate rate:

    qi(t) = Σl Rli pl(t)
    yl(t) = Σi Rli xi(t)
Validation – Reno/REM
 30 sources, 3 groups with RTT = 3, 5, 7 ms
 Link capacity = 64 Mbps, buffer = 50 kB
 Smaller window due to small RTT (~0 queueing delay)
Model: Vegas/Droptail
for every RTT {
    if W/RTTmin − W/RTT < a then W++
    if W/RTTmin − W/RTT > a then W--
}
for every loss
    W := W/2
Fi:
    xi(t+1) = xi(t) + 1/Ti(t)²    if wi(t) − di xi(t) < ai di
    xi(t+1) = xi(t) − 1/Ti(t)²    if wi(t) − di xi(t) > ai di
    xi(t+1) = xi(t)               else

    where Ti(t) = di + qi(t)  (di: round-trip propagation delay, qi(t): end-to-end queueing delay)

Gl:
    pl(t+1) = [ pl(t) + yl(t)/cl − 1 ]+    (pl(t): queue size at link l in units of time, i.e., queueing delay)
Model: FAST/Droptail
periodically {
    W := (baseRTT / RTT) × W + a
}

    xi(t+1) = xi(t) + (γi / Ti(t)) (ai − xi(t) qi(t))

    pl(t+1) = [ pl(t) + (1/cl)(yl(t) − cl) ]+
Low, Peterson & Wang, JACM 2002
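A single-link, single-source Python sketch of this model (γ, a, capacity and propagation delay are illustrative). The source drives xi·qi toward a, so the rate settles at the link capacity with queueing delay a/c.

# FAST/Droptail model above, one source on one link; constants are illustrative.
c = 100.0            # link capacity, packets/ms
d = 10.0             # round-trip propagation delay (baseRTT), ms
a = 200.0            # target number of this source's packets in the queue
gamma = 0.5

x, p = 1.0, 0.0      # source rate (packets/ms), queueing delay at the link (ms)
for _ in range(500):
    q = p                                  # single link: q_i = p_l
    T = d + q                              # RTT = propagation + queueing delay
    x = x + (gamma / T) * (a - x * q)      # FAST: drive x*q toward a
    p = max(p + (x - c) / c, 0.0)          # Droptail: p(t+1) = [p + (y - c)/c]+
print(round(x, 1), round(p, 2))            # x -> 100.0 (= c), p -> 2.0 (= a/c)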
Validation: matching transients
[Equations: a dynamic model of the windows wi(t), link queueing delay p(t), and capacity c with per-source feedback delays (Jacobsson et al. 2009).]
[Figures: matching transients in three scenarios: same RTT, no cross traffic; same RTT, cross traffic; different RTTs, no cross traffic.]
Recap
Protocol (Reno, Vegas, FAST, Droptail, RED…)
x(t +1) = F (x(t), q(t))
p(t +1) = G (y(t), p(t))
Equilibrium
 Performance
 Throughput, loss, delay
 Fairness
 Utility
Dynamics
 Local stability
 Global stability