Chapter 3 outline

advertisement
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer
3-1
TCP: Overview
 Point-to-point:
 one sender, one receiver
 reliable, in-order byte
steam:

no “message boundaries”
 pipelined:
 TCP congestion and flow
control set window size
 send & receive buffers
socket
door
application
writes data
application
reads data
TCP
send buffer
TCP
receive buffer
RFCs: 793, 1122, 1323, 2018, 2581
 full duplex data:
 bi-directional data flow
in same connection
 MSS: maximum segment
size
 connection-oriented:
 handshaking (exchange
of control msgs) init’s
sender, receiver state
before data exchange
 flow controlled:
 sender will not
socket
door
overwhelm receiver
segment
Transport Layer
3-2
TCP segment structure
32 bits
URG: urgent data
(generally not used)
ACK: ACK #
valid
PSH: push data now
RST, SYN, FIN:
connection estab
(setup, teardown
commands)
Internet
checksum
(as in UDP)
source port #
dest port #
sequence number
acknowledgement number
head not
UA P R S F
len used
checksum
Receive window
Urg data pnter
Options (variable length)
counting
by bytes
of data
(not segments!)
# bytes
rcvr willing
to accept
application
data
(variable length)
Transport Layer
3-3
TCP seq. #’s and ACKs
Seq. #’s:
 byte stream “number” of first byte in segment’s data
ACKs:
 seq # of next byte expected from other side
• In Rdt x.x protocols, the ack seq# is the current received one

cumulative ACK
• different from Selective Repeat
Q: how receiver handles out-of-order segments?


A: TCP spec doesn’t say
Practical approach: save in buffer
Q: How TCP implement duplex communication?

Seq. # for sending data, Ack# for receiving data
Transport Layer
3-4
An example
Host A
Host B
User
79
42
host ACKs
receipt, echoes
back ‘pass’
host ACKs
receipt, send
back use
password
Sequence number is
based on bytes, not packets
simple telnet scenario
time
Transport Layer
3-5
TCP Round Trip Time and Timeout
Q: how to set TCP
timeout value?
 longer than RTT

but RTT varies
 too short: premature
timeout
 unnecessary
retransmissions
 too long: slow reaction
to segment loss
Q: how to estimate RTT?
 SampleRTT: measured time from
segment transmission until ACK
receipt
 ignore retransmissions
 SampleRTT will vary, want
estimated RTT “smoother”
 average several recent
measurements, not just
current SampleRTT
Transport Layer
3-6
TCP Round Trip Time and Timeout
 Exponential weighted moving average
 influence of past sample decreases exponentially fast
 typical value:  = 0.125
[RFC 2988]
EstimatedRTT = (1- )*EstimatedRTT + *SampleRTT
Transport Layer
3-7
Example RTT estimation:
RTT: gaia.cs.umass.edu to fantasia.eurecom.fr
350
RTT (milliseconds)
300
250
200
150
100
1
8
15
22
29
36
43
50
57
64
71
78
85
92
99
106
time (seconnds)
SampleRTT
Estimated RTT
Transport Layer
3-8
TCP Round Trip Time and Timeout
Setting the timeout
 EstimtedRTT plus “safety margin”

large variation in EstimatedRTT -> larger safety margin
 first estimate of how much SampleRTT deviates from
EstimatedRTT:
DevRTT = (1-)*DevRTT +
*|SampleRTT-EstimatedRTT|
(typically,  = 0.25)
[RFC 2988]
Then set timeout interval:
TimeoutInterval = EstimatedRTT + 4*DevRTT
Transport Layer
3-9
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-10
TCP reliable data transfer
 TCP creates rdt
service on top of IP’s
unreliable service
 Pipelined segments
 Cumulative acks

Similar to GBN
 TCP uses single
retransmission timer


 Out of order packets
 Not specified
 Usually buffered
• Similar to SR
 Retransmissions are
triggered by:


timeout events
duplicate acks
Similar to GBN
Remove the timer
management overhead
Transport Layer
3-11
TCP sender events:
data rcvd from app:
 Create segment with
seq #
 seq # is byte-stream
number of first data
byte in segment
 start timer if not
already running (think
of timer as for oldest
unacked segment)
 expiration interval:
TimeOutInterval
timeout:
 retransmit segment that
caused timeout

Not Go Back all N
segments
 restart timer
Ack rcvd:
 If acknowledges previously
unacked segments


update what is known to
be acked
start timer if there are
outstanding segments
(since cumulative acks)
• Similar to GBN
Transport Layer 3-12
NextSeqNum = InitialSeqNum
SendBase = InitialSeqNum
loop (forever) {
switch(event)
event: data received from application above
create TCP segment with sequence number NextSeqNum
if (timer currently not running)
start timer
pass segment to IP
NextSeqNum = NextSeqNum + length(data)
event: timer timeout
retransmit not-yet-acknowledged segment with
smallest sequence number
start timer
event: ACK received, with ACK field value of y
if (y > SendBase) {
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
}
} /* end of loop forever */
TCP
sender
(simplified)
Comment:
• One direction only
• SendBase-1: last
cumulatively
ack’ed byte
Example:
• SendBase-1 = 71;
y= 73, so the rcvr
wants 73+ ;
y > SendBase, so
that new data is
acked
Transport Layer 3-13
TCP: retransmission scenarios
Host A
X
loss
Sendbase
= 100
SendBase
= 120
SendBase
= 100
time
SendBase
= 120
lost ACK scenario
Host B
Seq=92 timeout
Host B
Seq=92 timeout
timeout
Host A
time
premature timeout
Transport Layer 3-14
TCP retransmission scenarios (more)
Host B
Host A
SendBase
= 120
time
Host B
X
loss
SendBase
= 120
Seq=92 timeout
Sendbase
= 100
SendBase
= 120
timeout
Seq=92 timeout
Host A
premature timeout
time
Cumulative ACK scenario
Double Timeout method in run time
Transport Layer 3-15
TCP ACK generation
[RFC 1122, RFC 2581]
Event at Receiver
TCP Receiver action
Arrival of in-order segment with
expected seq #. All data up to
expected seq # already ACKed
Delayed ACK. Wait up to 500ms
for next segment. If no next segment,
send ACK (why?)
Arrival of in-order segment with
expected seq #. One other
segment has ACK pending
Immediately send single cumulative
ACK, ACKing both in-order segments
Arrival of out-of-order segment
higher-than-expect seq. # .
Gap detected
Immediately send duplicate ACK,
indicating seq. # of next expected byte
Arrival of segment that
partially or completely fills gap
Immediate send ACK, provided that
segment starts at lower end of gap
Transport Layer 3-16
Fast Retransmit
 Retransmission
triggered by timeout
 Time-out period often
relatively long:

long delay before
resending lost packet
 Detect lost segments
via duplicate ACKs.


 If sender receives 3
ACKs for the same
data, it supposes that
segment after ACKed
data was lost:

fast retransmit: resend
segment before timer
expires
Sender often sends
many segments back-toback (pipelining)
If segment is lost,
there will likely be many
duplicate ACKs.
Transport Layer 3-17
Fast retransmit algorithm:
event: ACK received, with ACK field value of y
if (y > SendBase) {
SendBase = y
if (there are currently not-yet-acknowledged segments)
start timer
}
else {
increment count of dup ACKs received for y
if (count of dup ACKs received for y = 3) {
resend segment with sequence number y
}
a duplicate ACK for
already ACKed segment
Y == SendBase (why?)
fast retransmit
Transport Layer 3-18
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-19
TCP: GBN or SR?
 Cumulative Acks --- GBN
 Buffer out-of-order packets --- SR
 Retransmit current BaseSeq only when
timeout --- SR
 TCP: a hybrid protocol
ACK #: not packet number, but byte number
 ACK #: expected (not like rdt x.x)

Transport Layer 3-20
TCP Flow Control
 receive side of TCP
connection has a
receive buffer:
flow control
sender won’t overflow
receiver’s buffer by
transmitting too much,
too fast
 speed-matching
 app process may be
service: matching the
send rate to the
receiving app’s drain
rate
slow at reading from
buffer
Transport Layer 3-21
TCP Flow control: how it works
 Rcvr advertises spare
(Suppose TCP receiver
discards out-of-order
segments)
 spare room in buffer
room by including value
of RcvWindow in TCP
header
 Sender limits unACKed
data to RcvWindow

guarantees receive
buffer doesn’t overflow
= RcvWindow
= RcvBuffer-[LastByteRcvd LastByteRead]
Transport Layer 3-22
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-23
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before
exchanging data segments
 initialize TCP variables:
 seq. #s
• Why not always 0?
– No confusion
– Security

buffers, flow control info (e.g. RcvWindow)
 client: connection initiator
connect();
 server: contacted by client
accept();
Transport Layer 3-24
TCP Connection Setup --Three-Way Handshaking
Step 1: client host sends TCP SYN
segment to server
 specifies initial seq #
 no data
client
server
Step 2: server host receives SYN,
replies with SYN/ACK segment
server allocates buffers
 specifies server initial seq.
#
Step 3: client receives SYN/ACK,
replies with ACK segment, which
may contain data

Wireshark Example
Transport Layer 3-25
TCP Connection Management (cont.)
Closing a connection:
close();
client
server
close
Step 1: client end system
sends TCP/FIN control
segment to server
FIN, replies with ACK.
Closes connection, sends
FIN.
timed wait
Step 2: server receives
close
closed
Transport Layer 3-26
TCP Connection Management (cont.)
Step 3: client receives FIN,
replies with ACK.

client
server
closing
Enters “timed wait” will respond with ACK
to received FINs
closing
Step 4: server, receives
Some applications simply
send RST to terminate TCP
connections immediately
Wireshark Example
timed wait
ACK. Connection closed.
closed
closed
Transport Layer 3-27
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-28
Principles of Congestion Control
Congestion:
 informally: “too many sources sending too much
data too fast for network to handle”
 different from flow control!
 manifestations:
 lost packets (buffer overflow at routers)
 long delays (queueing in router buffers)
 a top-10 problem!
Transport Layer 3-29
Causes/costs of congestion: scenario 1
Host A
 two senders, two
receivers
 one router,
infinite buffers
 no retransmission
Host B
lout
lin : original data
unlimited shared output
link buffers
 large delays
when congested
 maximum
achievable
throughput
Remember the queue delay formula?
Transport Layer 3-30
Causes/costs of congestion: scenario 2
 one router, finite buffers
 sender retransmission of lost packet
Host A
lin : original data
lout
l'in : original data, plus
retransmitted data
Host B
finite shared output
link buffers
Transport Layer 3-31
Causes/costs of congestion: scenario 2
= l
(goodput)
out
in
 “perfect” retransmission only when loss:
 Always want :
l
l > lout
in
 retransmission of delayed (not lost) packet makes
(than perfect case) for same
R/2
lout
R/2
R/2
lin
a.
R/2
lout
lout
lout
R/3
lin
b.
R/2
l
larger
in
If every packet
forwarded twice
R/4
lin
R/2
c.
“costs” of congestion:
 more work (retrans) for given “goodput”
 unneeded retransmissions: link carries multiple copies of pkt
Transport Layer 3-32
Causes/costs of congestion: scenario 3
 four senders
Q: what happens as l
in
and l increase ?
 multihop paths
 timeout/retransmit
in
Host A
lin : original data
lout
l'in : original data, plus
retransmitted data
finite shared output
link buffers
Host B
Transport Layer 3-33
Causes/costs of congestion: scenario 3
H
o
s
t
A
l
o
u
t
H
o
s
t
B
Another “cost” of congestion:
 when packet dropped, any “upstream” transmission
capacity used for that packet was wasted!
Transport Layer 3-34
Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion
control:
 no explicit feedback from
network
 congestion inferred from
end-system observed loss,
delay
 approach taken by TCP
Network-assisted
congestion control:
 routers provide feedback
to end systems
 single bit indicating
congestion (SNA,
DECbit, TCP/IP ECN,
ATM)
 explicit rate sender
should send at
Transport Layer 3-35
Chapter 3 outline
 3.1 Transport-layer
services
 3.2 Multiplexing and
demultiplexing
 3.3 Connectionless
transport: UDP
 3.4 Principles of
reliable data transfer
 3.5 Connection-oriented
transport: TCP




segment structure
reliable data transfer
flow control
connection management
 3.6 Principles of
congestion control
 3.7 TCP congestion
control
Transport Layer 3-36
TCP Congestion Control
 end-end control (no network
assistance)
 sender limits transmission:
LastByteSent-LastByteAcked
 CongWin
 Roughly,
rate =

CongWin
Bytes/sec
RTT
Why this formula?
 CongWin is dynamic, function
of perceived network
congestion
How does sender
perceive congestion?
 loss event = timeout or
3 duplicate acks
 TCP sender reduces
rate (CongWin) after
loss event
three mechanisms:



AIMD
slow start
conservative after
timeout events
Transport Layer 3-37
TCP AIMD
multiplicative decrease:
cut CongWin in half
after loss event
congestion
window
24 Kbytes
additive increase:
increase CongWin by
1 MSS every RTT in
the absence of loss
events: probing
MSS: Maximum Segment Size
16 Kbytes
8 Kbytes
time
Long-lived TCP connection
Transport Layer 3-38
TCP Slow Start
 When connection begins,
CongWin = 1 MSS


Example: MSS = 500
bytes & RTT = 200 msec
initial rate = 20 kbps
 When connection begins,
increase rate
exponentially fast until
first loss event
 available bandwidth may
be >> MSS/RTT

desirable to quickly ramp
up to respectable rate
Transport Layer 3-39
TCP Slow Start (more)
 When connection


Host B
RTT
begins, increase rate
exponentially until
first loss event:
Host A
double CongWin every
RTT
done by incrementing
CongWin for every
ACKed segment
 Summary: initial rate
is slow but ramps up
exponentially fast
time
Transport Layer 3-40
Refinement (more)
Q: When should the
exponential
increase switch to
linear?
A: When CongWin
gets to 1/2 of its
value before
timeout.
Implementation:
 Variable Threshold
 At loss event, Threshold is
set to 1/2 of CongWin just
before loss event
Transport Layer 3-41
Refinement
Philosophy:
 After 3 dup ACKs:
is cut in half
 window then grows
linearly
 But after timeout event:
 CongWin instead set to
1 MSS;
 window then grows
exponentially
 to a threshold, then
grows linearly
 CongWin
• 3 dup ACKs indicates
network capable of
delivering some segments
• timeout before 3 dup
ACKs is “more alarming”
Transport Layer 3-42
Summary: TCP Congestion Control
 When CongWin is below Threshold, sender in
slow-start phase, window grows exponentially.
 When CongWin is above Threshold, sender is in
congestion-avoidance phase, window grows linearly.
 When a triple duplicate ACK occurs, Threshold
set to CongWin/2 and CongWin set to
Threshold.
 When timeout occurs, Threshold set to
CongWin/2 and CongWin is set to 1 MSS.
Transport Layer 3-43
TCP sender congestion control
State
Event
TCP Sender Action
Commentary
Slow Start
(SS)
ACK receipt
for previously
unacked
data
CongWin = CongWin + MSS,
If (CongWin > Threshold)
set state to “Congestion
Avoidance”
Resulting in a doubling of
CongWin every RTT
Congestion
Avoidance
(CA)
ACK receipt
for previously
unacked
data
CongWin = CongWin+MSS *
(MSS/CongWin)
Additive increase, resulting
in increase of CongWin by
1 MSS every RTT
SS or CA
Loss event
detected by
triple
duplicate
ACK
Threshold = CongWin/2,
CongWin = Threshold,
Set state to “Congestion
Avoidance”
Fast recovery,
implementing multiplicative
decrease. CongWin drops
to half (not drop below 1
MSS).
SS or CA
Timeout
Threshold = CongWin/2,
CongWin = 1 MSS,
Set state to “Slow Start”
Enter slow start
SS or CA
Duplicate
ACK
Increment duplicate ACK count
for segment being acked
CongWin and Threshold not
changed
Transport Layer 3-44
TCP Fairness
Fairness goal: if K TCP sessions share same
bottleneck link of bandwidth R, each should have
average rate of R/K
TCP connection 1
TCP
connection 2
bottleneck
router
capacity R
Transport Layer 3-45
Why is TCP fair?
Two competing sessions:
 Additive increase gives slope of 1, as throughout increases
 multiplicative decrease decreases throughput proportionally
R
equal bandwidth share
loss: decrease window by factor of 2
congestion avoidance: additive increase
loss: decrease window by factor of 2
congestion avoidance: additive increase
Connection 1 throughput R
Transport Layer 3-46
Fairness (more)
Fairness and UDP
 Multimedia apps often
do not use TCP

do not want rate
throttled by congestion
control
 Instead use UDP:
 pump audio/video at
constant rate, tolerate
packet loss
 Research area: TCP
friendly
Fairness and parallel TCP
connections
 nothing prevents app from
opening parallel cnctions
between 2 hosts.
 Web browsers do this
 Example: link of rate R
supporting 9 cnctions;


new app asks for 1 TCP, gets
rate R/10
new app asks for 9 TCPs,
gets R/2 !
Transport Layer 3-47
Chapter 3: Summary
 principles behind transport
layer services:
 multiplexing,
demultiplexing
 reliable data transfer
 flow control
 congestion control
 instantiation and
implementation in the
Internet
 UDP
 TCP
Next:
 leaving the network
“edge” (application,
transport layers)
 into the network
“core”
Transport Layer 3-48
Download