TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
• point-to-point: one sender, one receiver
• connection-oriented:
  - exchange control messages first to initialize sender & receiver state
• full-duplex data delivery:
  - bi-directional data flow over the same connection
• reliable, in-order byte stream delivery:
  - no “message boundaries”
  - sender & receiver must buffer data
• flow controlled:
  - prevent the sender from flooding the receiver
• congestion controlled:
  - reduce potential jams in the network
[Figure: the application writes data through the socket interface into the TCP send buffer; the peer application reads data from the TCP receive buffer; each end keeps TCP control parameters (state).]
4/26/05
CS118
What defines a TCP connection?
• TCP uses 4 values to define a connection (a communication association):
  - local-host-addr, local-port#, remote-host-addr, remote-port#
• each of the two ends keeps state for the ongoing communication:
  - sequence #s for data sent, received, and ACKed
  - retransmission timer, flow & congestion windows
[Figure: protocol stack with TCP and UDP above IP, above Ethernet]
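A minimal sketch of the idea that the 4-tuple identifies a connection: each end can keep a table keyed by (local-host-addr, local-port#, remote-host-addr, remote-port#) and look up the per-connection state for every arriving segment. The class names, the field set, and the zero placeholder values are hypothetical, chosen only to mirror the state listed on the slide.

```python
from typing import NamedTuple

class ConnKey(NamedTuple):
    """The 4 values that identify one TCP connection, seen from this host."""
    local_addr: str
    local_port: int
    remote_addr: str
    remote_port: int

class ConnState:
    """Hypothetical subset of the state each end keeps for an ongoing connection."""
    def __init__(self, initial_seq: int):
        self.send_base = initial_seq   # oldest sequence # sent but not yet ACKed
        self.next_seq = initial_seq    # sequence # for the next new byte
        self.rcv_next = None           # next sequence # expected from the peer
        self.rcv_window = 0            # flow-control window from the peer (placeholder)
        self.cong_window = 0           # congestion window (placeholder)

connections: dict = {}                 # ConnKey -> ConnState

def demux(local_addr, local_port, remote_addr, remote_port):
    """Find the state for an arriving segment; all four values must match."""
    return connections.get(ConnKey(local_addr, local_port, remote_addr, remote_port))
```

Two connections from the same client to the same server port differ in at least the client's port number, so they map to separate entries.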
Issues To Consider
• packets may be lost, duplicated, or re-ordered
• packets can be delayed arbitrarily long inside the network
• the delay between two communicating ends is unknown beforehand and may vary over time
• port numbers can be reused later
  - a later connection must not mistake packets from an earlier connection as its own
TCP segment format
Fields in order (each row is 32 bits wide; the segment is carried after the IP header):
• source port # | dest port #
• sequence number
• acknowledgement number
• head len | not used | U A P R S F | rcvr window size
• checksum | ptr to urgent data
• Options (variable length)
• application data (variable length)
Notes:
• sequence and acknowledgement numbers count by bytes of data
• rcvr window size: # bytes the receiver is willing to accept
• URG: urgent data (generally not used)
• ACK: ACK # field valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection establishment (setup, teardown commands)
• checksum: as in UDP
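To make the layout concrete, here is a minimal sketch that decodes the fixed 20-byte part of a TCP header with Python's standard `struct` module. The function name and the returned dictionary are illustrative choices; the flag bit values are the six original control bits (URG, ACK, PSH, RST, SYN, FIN) in the low bits of the 16-bit word that also holds "head len".

```python
import struct

# Masks for the six control flags in the low bits of the offset/flags word.
FLAGS = {"URG": 0x20, "ACK": 0x10, "PSH": 0x08, "RST": 0x04, "SYN": 0x02, "FIN": 0x01}

def parse_tcp_header(segment: bytes) -> dict:
    """Decode a TCP segment (network byte order) into its header fields and data."""
    if len(segment) < 20:
        raise ValueError("a TCP header is at least 20 bytes")
    (src_port, dst_port, seq, ack,
     offset_flags, window, checksum, urg_ptr) = struct.unpack("!HHIIHHHH", segment[:20])
    header_len = (offset_flags >> 12) * 4          # "head len" counts 32-bit words
    flags = {name for name, bit in FLAGS.items() if offset_flags & bit}
    return {
        "src_port": src_port, "dst_port": dst_port,
        "seq": seq, "ack": ack,                    # both count bytes of data
        "header_len": header_len, "flags": flags,
        "window": window,                          # bytes the receiver will accept
        "checksum": checksum, "urg_ptr": urg_ptr,
        "options": segment[20:header_len],         # variable-length options
        "data": segment[header_len:],              # application data
    }
```

For example, a 20-byte header packed with `(5 << 12) | 0x02` in the offset/flags word decodes as a SYN with no options.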
TCP Connection Establishment
• Before exchanging data, client and server initialize TCP control variables:
  - initial seq. # used in each direction
  - buffer size (rcvWindow)
• server: listen( ); client: connect( )
Three-way handshake:
1: client host sends a TCP SYN segment to the server
  - specifies the client's initial seq #
  - does not carry data
2: server receives the SYN and replies with a SYN+ACK control segment (ACKs the client's SYN and carries the server's own SYN)
3: client receives the SYN+ACK and replies with an ACK; connection established at both ends
  - this segment may carry data
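A minimal sketch of the sequence and ACK numbers in the three segments. It relies on one fact not spelled out on the slide: a SYN consumes one sequence number, so each side ACKs the other's initial seq # plus one. The initial sequence numbers are arbitrary inputs here; real stacks choose them themselves.

```python
SEQ_MOD = 2**32  # sequence numbers are 32 bits and wrap around

def three_way_handshake(client_isn: int, server_isn: int):
    """Return the handshake segments as (sender, flags, seq, ack) tuples."""
    syn = ("client", {"SYN"}, client_isn, None)             # 1: no data, ACK field not valid
    syn_ack = ("server", {"SYN", "ACK"}, server_isn,
               (client_isn + 1) % SEQ_MOD)                  # 2: ACKs the client's SYN
    ack = ("client", {"ACK"}, (client_isn + 1) % SEQ_MOD,
           (server_isn + 1) % SEQ_MOD)                      # 3: ACKs the server's SYN; may carry data
    return [syn, syn_ack, ack]
```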
TCP Connection Close
• Either end can initiate the close of its end of the connection at any time
1: one end (A) sends a TCP FIN control segment to the other
2: the other end (B) receives the FIN and replies with a FIN_ACK; when it is ready to close too, it sends its own FIN
3: A receives the FIN and replies with a FIN_ACK
4: B receives the FIN_ACK and closes the connection
[Figure: A (client) and B (server) each call close( ); B's connection is closed after step 4, while A is left with a “?” after step 3]
What problem does A have?
The well-known “two-army problem”
[Figure: two Red armies that must coordinate, and a Blue army]
Q: how can the 2 Red armies agree on an attack time?
Fact: the last one who sends a message does not know whether the message was delivered
Basic rule: one cannot send an ACK to acknowledge an ACK
TCP Connection Close
1: one end (A) sends a TCP FIN control segment to the other
2: the other end (B) receives the FIN and replies with a FIN_ACK; when it is ready to close too, it sends its own FIN
3: A receives the FIN and replies with an ACK; A enters “timed wait” and waits for 2 min before deleting the connection state
4: B receives the FIN_ACK and closes the connection
• Abort a connection: send a “reset” to the other end and enter the closed state immediately; all data is assumed lost
[Figure: A (client) and B (server) each call close( ); B's connection is closed after step 4, A's after the timed wait]
TCP Connection Management (cont)
[Figure: TCP server lifecycle and TCP client lifecycle, annotated “wait 2 min”]
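TCP state-transition diagram (transitions labeled event/action)
• CLOSED → LISTEN: passive open
• CLOSED → SYN_SENT: active open/SYN
• LISTEN → SYN_RCVD: SYN/SYN + ACK
• LISTEN → SYN_SENT: send/SYN
• LISTEN → CLOSED: close
• SYN_SENT → CLOSED: close
• SYN_SENT → SYN_RCVD: SYN/SYN + ACK
• SYN_SENT → ESTABLISHED: SYN + ACK/ACK
• SYN_RCVD → ESTABLISHED: ACK
• SYN_RCVD → FIN_WAIT_1: close/FIN
• ESTABLISHED → FIN_WAIT_1: close/FIN
• ESTABLISHED → CLOSE_WAIT: FIN/ACK
• FIN_WAIT_1 → FIN_WAIT_2: ACK
• FIN_WAIT_1 → CLOSING: FIN/ACK
• FIN_WAIT_2 → TIME_WAIT: FIN/ACK
• CLOSING → TIME_WAIT: ACK
• TIME_WAIT → CLOSED: timeout after two segment lifetimes
• CLOSE_WAIT → LAST_ACK: close/FIN
• LAST_ACK → CLOSED: ACK
Side annotation (close exchange): A sends I-finished(M); B replies ACK(M+1); B sends I-finished(N); A replies ack(N+1) and waits for 2MSL before deleting the conn state; B is done.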
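The transitions listed above can be encoded as a lookup table. This is a minimal sketch: the event names (`passive_open`, `rcv_SYN`, `timeout_2MSL`, and so on) are a hypothetical naming convention, and an event that has no transition from the current state is treated as an error.

```python
# (state, event) -> (next_state, segment sent); None means nothing is sent.
TRANSITIONS = {
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "rcv_SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("LISTEN", "send"): ("SYN_SENT", "SYN"),
    ("LISTEN", "close"): ("CLOSED", None),
    ("SYN_SENT", "close"): ("CLOSED", None),
    ("SYN_SENT", "rcv_SYN"): ("SYN_RCVD", "SYN+ACK"),
    ("SYN_SENT", "rcv_SYN+ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "rcv_ACK"): ("ESTABLISHED", None),
    ("SYN_RCVD", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),
    ("ESTABLISHED", "rcv_FIN"): ("CLOSE_WAIT", "ACK"),
    ("FIN_WAIT_1", "rcv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "rcv_FIN"): ("CLOSING", "ACK"),
    ("FIN_WAIT_2", "rcv_FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "rcv_ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "rcv_ACK"): ("CLOSED", None),
}

def run(events, state="CLOSED"):
    """Apply events in order; return the final state and the segments sent."""
    sent = []
    for ev in events:
        if (state, ev) not in TRANSITIONS:
            raise ValueError(f"event {ev!r} not allowed in state {state}")
        state, action = TRANSITIONS[(state, ev)]
        if action:
            sent.append(action)
    return state, sent
```

For example, an active opener that later closes first passes through SYN_SENT, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, and TIME_WAIT before returning to CLOSED.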
How to Set TCP Retransmission Timer
• TCP sets the rxt timer based on measured RTT
  - SRTT: EstimatedRTT
  - $\mathrm{SRTT} = (1-\alpha)\,\mathrm{SRTT} + \alpha \cdot \mathrm{SampleRTT}$
• Setting the retransmission timer: SRTT plus a “safety margin”
  - $\mathrm{Timer} = \mathrm{SRTT} + 4 \cdot \mathrm{rttvar}$
[Figure: data/ACK timelines showing a SampleRTT measurement, and a timeout (“Timeout!”) followed by retransmitted data]
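After obtaining a new RTT sample:
• $\text{difference} = \mathrm{SampleRTT} - \mathrm{SRTT}$
• $\mathrm{SRTT} = (1-\alpha)\,\mathrm{SRTT} + \alpha \cdot \mathrm{SampleRTT} = \mathrm{SRTT} + \alpha \cdot \text{difference}$
• $\mathrm{rttvar} = (1-\beta)\,\mathrm{rttvar} + \beta \cdot |\text{difference}| = \mathrm{rttvar} + \beta\,(|\text{difference}| - \mathrm{rttvar})$
• Retransmission Timer: $\mathrm{RTO} = \mathrm{SRTT} + 4 \cdot \mathrm{rttvar}$
Typically $\alpha = 1/8$, $\beta = 1/4$.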
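The update rules above translate almost line for line into code. This is a minimal sketch, assuming times in milliseconds and caller-supplied starting values; note that `difference` is computed from SRTT before SRTT is updated, so both updates use the same difference.

```python
class RttEstimator:
    """SRTT/rttvar smoothing and RTO, following the slide's formulas (ms)."""

    def __init__(self, srtt: float, rttvar: float, alpha: float = 1/8, beta: float = 1/4):
        self.srtt, self.rttvar = srtt, rttvar
        self.alpha, self.beta = alpha, beta

    def update(self, sample_rtt: float) -> float:
        """Fold in one RTT sample and return the new RTO."""
        difference = sample_rtt - self.srtt                          # uses SRTT before this update
        self.srtt += self.alpha * difference                         # SRTT + alpha * difference
        self.rttvar += self.beta * (abs(difference) - self.rttvar)   # rttvar + beta(|diff| - rttvar)
        return self.rto()

    def rto(self) -> float:
        return self.srtt + 4 * self.rttvar
```

Starting from SRTT = 500 and rttvar = 120, a 600 ms sample gives SRTT = 512.5, rttvar = 115, and RTO = 972.5 ms.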
An Example
Assume SRTT = 500 ms, rttvar = 120 ms
RTT(3) = 600 ms, Δ = |RTT - SRTT| = 100 ms
  SRTT = 500 + 0.125 * 100 = 512.5 ms
  rttvar = 120 + 0.25 * (100 - 120) = 115 ms
  RTO = SRTT + 4 * rttvar = 512.5 + 460 = 972.5 ms
RTT(4) = 650 ms, Δ = |RTT - SRTT| ≈ 137 ms
  SRTT ≈ 512 + 0.125 * 137 ≈ 529 ms
  rttvar ≈ 115 + 0.25 * (137 - 115) ≈ 120 ms
[Figure: sender/receiver timeline with RTT samples of 600 ms and 650 ms]
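The slide rounds Δ and SRTT for RTT(4). Carrying the exact values through the same update rules ($\alpha = 1/8$, $\beta = 1/4$) gives the following; the RTO line is an extra step the slide does not compute:

```latex
\begin{aligned}
\Delta &= |650 - 512.5| = 137.5~\text{ms}\\
\mathrm{SRTT} &= 512.5 + 0.125 \times 137.5 = 529.6875~\text{ms}\\
\mathrm{rttvar} &= 115 + 0.25 \times (137.5 - 115) = 120.625~\text{ms}\\
\mathrm{RTO} &= \mathrm{SRTT} + 4 \cdot \mathrm{rttvar} = 529.6875 + 482.5 = 1012.1875~\text{ms}
\end{aligned}
```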
Example RTT estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr; y-axis RTT (milliseconds), 100 to 350; x-axis time (seconds), 1 to 106; curves: SampleRTT and Estimated RTT]
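How to measure RTT in cases of retransmissions?
Options:
• take the delay between the first transmission and the final ACK?
• take the delay between the last retransmission of segment(n) and ACK(n)?
• don't measure?
[Figure: source S sends a segment to destination D; after a timeout the segment is retransmitted, and it is unclear which transmission the returning ACK should be timed from (“RTT?”)]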
Karn’s algorithm
In case of retransmission:
• do not take the RTT sample (do not update SRTT or rttvar)
• double the retransmission timer value (RTO) after each timeout
• take RTT measurements again upon the next transmission (without retransmission)
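A minimal sketch of Karn's algorithm layered on the SRTT/rttvar update rules from the earlier slide. Assumptions: times in milliseconds, caller-supplied starting values, and segments identified by their sequence number. An ACK for a retransmitted segment yields no sample and leaves the backed-off RTO in place; the next clean sample recomputes RTO from the formulas.

```python
class KarnTimer:
    """Retransmission timer with Karn's rules (ms)."""

    def __init__(self, srtt: float, rttvar: float, alpha: float = 1/8, beta: float = 1/4):
        self.srtt, self.rttvar = srtt, rttvar
        self.alpha, self.beta = alpha, beta
        self.rto = srtt + 4 * rttvar
        self.retransmitted = set()          # seq #s of segments sent more than once

    def on_timeout(self, seq: int):
        self.retransmitted.add(seq)         # this segment's RTT is now ambiguous
        self.rto *= 2                       # double RTO after each timeout

    def on_ack(self, seq: int, sample_rtt: float) -> float:
        if seq in self.retransmitted:       # ambiguous sample: do not update SRTT or rttvar
            self.retransmitted.discard(seq)
            return self.rto                 # keep the backed-off RTO
        difference = sample_rtt - self.srtt # clean sample: normal update
        self.srtt += self.alpha * difference
        self.rttvar += self.beta * (abs(difference) - self.rttvar)
        self.rto = self.srtt + 4 * self.rttvar
        return self.rto
```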
One more question
What initial SRTT, rttvar values to start with?
Currently set by engineered guessing
• what if the guessed value is too small? Unnecessary retransmissions
• what if the guessed value is too large? If the first (or first few) packets are lost, the sender waits longer than necessary before retransmitting
Current practice:
• initial SRTT value: 3 sec, rttvar: 3 sec
• on the first RTT sample: SRTT ← RTT, rttvar = SRTT/2
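A minimal sketch of the first-sample rule above, assuming times in milliseconds. With rttvar = SRTT/2 and the same RTO formula used earlier, the first RTO works out to three times the first measured RTT; later samples would then use the normal update rules.

```python
def init_from_first_sample(rtt_ms: float):
    """Apply the first-sample rule: SRTT <- RTT, rttvar = SRTT / 2.
    Returns (srtt, rttvar, rto)."""
    srtt = rtt_ms
    rttvar = srtt / 2
    rto = srtt + 4 * rttvar   # RTO = SRTT + 4 * rttvar, i.e. 3 x the first RTT
    return srtt, rttvar, rto
```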
TCP’s seq. #s and ACK #s
Seq. #:
• the number of the first byte in the segment's data
ACK #:
• seq # of the next byte expected from the other side
• cumulative ACK
A simple example (Host A / Host B timeline, time flowing downward):
1: Host A sends 10 bytes of data
2: Host B ACKs receipt of the 10 B of data from A, and sends 5 bytes of data
3: Host A ACKs receipt of the 5 B
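A minimal sketch of the seq/ACK numbers in the simple example. `a_seq` and `b_seq` are hypothetical starting values (the next byte numbers A and B will send); assuming the connection is already established, every segment here carries a valid ACK.

```python
def simple_exchange(a_seq: int, b_seq: int):
    """Seq/ACK numbers for: A sends 10 B, B ACKs and sends 5 B, A ACKs."""
    seg1 = {"from": "A", "seq": a_seq, "ack": b_seq, "data": 10}          # A sends 10 bytes
    seg2 = {"from": "B", "seq": b_seq, "ack": a_seq + 10, "data": 5}      # ACK = next byte B expects
    seg3 = {"from": "A", "seq": a_seq + 10, "ack": b_seq + 5, "data": 0}  # pure ACK of B's 5 bytes
    return [seg1, seg2, seg3]
```

With starting values 42 and 79, B's segment carries ACK 52 and A's final ACK is 84: each ACK names the next byte expected, which is why TCP ACKs are cumulative.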
How to guarantee seq. # uniqueness
• sequence #s will eventually wrap around
• TCP assumes a Maximum Segment Lifetime (MSL) of 120 sec
• make sure that, for the same [src-addr, src-port, dest-addr, dest-port] tuple, the same sequence number does not get reused within 2×MSL
  - this assures that no two different data segments can bear the same sequence number, as long as the data's lifetime is < 120 sec
TCP: reliable data transfer
Simplified sender, assuming:
• one-way data transfer
• no flow/congestion control
[Figure: sender state machine with a single “wait for event” state and three events: data received from application → create, send segment; timeout for segment with seq # y → retransmit segment; ACK received, with ACK # y → ACK processing]
SendBase = Initial_SeqNumber
NextSeqNum = Initial_SeqNumber

loop (forever) {
  switch (event) {
    event: data received from application above
      create TCP segment with seq. number NextSeqNum
      start timer for segment NextSeqNum
      pass segment to IP
      NextSeqNum = NextSeqNum + length(data)
    event: timer timeout for segment with seq. number y
      retransmit segment with sequence number y
      compute new timeout interval for segment y
      restart timer
    event: ACK received, with ACK field value of y
      if (y > SendBase) { /* cumulative ACK of all data up to y */
        SendBase = y
        if (any outstanding not-yet-ACKed segments)
          start timer
      }
      else { /* a duplicate ACK for an already ACKed segment */
        increment count of duplicate ACKs received for y
        if (count of dup. ACKs received for y == 3) {
          resend segment with sequence number y
          reset dup. count
        }
      }
  } /* end of switch */
} /* end of loop forever */
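A runnable sketch of the simplified sender pseudocode above, with each event as a method. Assumptions: "pass segment to IP" is modeled by appending `(seq, data)` to a log, the timer is only a running/stopped flag (no real clock or new timeout interval), and dup-ACK counts are kept per ACK value `y` as in the pseudocode.

```python
class SimplifiedTcpSender:
    """One-way data, no flow/congestion control; events are method calls."""

    def __init__(self, initial_seq: int):
        self.send_base = initial_seq   # oldest unACKed byte
        self.next_seq = initial_seq    # seq # for the next new segment
        self.unacked = {}              # seq # -> segment data not yet ACKed
        self.dup_acks = {}             # ACK value y -> duplicate ACKs seen for y
        self.sent = []                 # log of every segment passed to IP
        self.timer_running = False

    def on_app_data(self, data: bytes):
        self.unacked[self.next_seq] = data
        self.sent.append((self.next_seq, data))        # pass segment to IP
        self.timer_running = True                      # start timer
        self.next_seq += len(data)

    def on_timeout(self, y: int):
        self.sent.append((y, self.unacked[y]))         # retransmit segment y
        self.timer_running = True                      # restart timer

    def on_ack(self, y: int):
        if y > self.send_base:                         # cumulative ACK of all data below y
            self.send_base = y
            self.unacked = {s: d for s, d in self.unacked.items() if s >= y}
            self.timer_running = bool(self.unacked)    # time only if data is outstanding
        else:                                          # duplicate ACK
            self.dup_acks[y] = self.dup_acks.get(y, 0) + 1
            if self.dup_acks[y] == 3 and y in self.unacked:
                self.sent.append((y, self.unacked[y])) # resend segment y (fast retransmit)
                self.dup_acks[y] = 0                   # reset dup. count
```

For instance, after sending segments at 100, 110, and 120 and receiving ACK 110 four times, the fourth arrival (the third duplicate) resends the segment starting at byte 110.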
Fast Retransmit
• Time-out period is often relatively long:
  - long delay before resending a lost packet
• Detect lost segments via duplicate ACKs:
  - the sender often sends many segments back-to-back
  - if a segment is lost, there will likely be many duplicate ACKs
• If the sender receives 3 duplicate ACKs for the same data, it supposes that the segment after the ACKed data was lost:
  - fast retransmit: resend the segment before its timer expires
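A minimal sketch of the duplicate-ACK rule on its own: scan a stream of ACK numbers as the sender sees them and report where fast retransmit fires. The function name is hypothetical and the threshold of 3 duplicates is a parameter.

```python
def fast_retransmit_triggers(acks, dup_threshold=3):
    """Return the ACK values at which the sender would fast-retransmit the
    segment starting at that byte, after `dup_threshold` duplicate ACKs."""
    triggers = []
    last_ack, dups = None, 0
    for ack in acks:
        if ack == last_ack:
            dups += 1
            if dups == dup_threshold:
                triggers.append(ack)    # resend segment with seq # == ack
        else:
            last_ack, dups = ack, 0     # a new cumulative ACK resets the count
    return triggers
```

The first ACK for a given value is not a duplicate, so the sequence 120, 120, 120, 120 (one original plus three duplicates) is what triggers a resend of byte 120 onward.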
TCP: retransmission scenarios
[Figure: Host A / Host B timelines for a lost ACK scenario and a premature timeout scenario; labels include a loss (X), Seq=92 timeouts, and SendBase = 100 and SendBase = 120]
TCP retransmission scenarios (more)
[Figure: Host A / Host B timelines for a Fast RXT scenario and a Cumulative ACK scenario; labels include a loss (X), four ACK592s, timeouts, and SendBase = 120]
TCP Receiver: when to send ACK?
Event → TCP receiver action
• in-order segment arrival, no gaps, everything earlier already ACKed → delayed ACK: wait up to 500 ms for the next segment; if nothing arrives, send ACK
• in-order segment arrival, no gaps, one delayed ACK pending → immediately send one cumulative ACK
• out-of-order arrival: higher-than-expected seq. #, gap detected → send duplicate ACK, indicating seq. # of next expected byte
• arrival of a segment that partially or completely fills a gap → immediate ACK if the segment starts at the lower end of the gap
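A minimal sketch of the receiver rules above. Assumptions: the 500 ms delayed-ACK timer itself is not modeled (the sketch only reports that an ACK is being delayed), out-of-order data is buffered by seq # and length, and re-ACKing a segment older than the next expected byte is a choice the rules above do not cover.

```python
class AckPolicyReceiver:
    """Returns (action, ack_no) for each arriving segment."""

    def __init__(self, rcv_next: int):
        self.rcv_next = rcv_next          # seq # of next expected byte
        self.out_of_order = {}            # buffered seq # -> length, above a gap
        self.delayed_ack_pending = False

    def on_segment(self, seq: int, length: int):
        if seq == self.rcv_next:                        # in-order arrival
            filled_gap = bool(self.out_of_order)        # starts at the lower end of a gap
            self.rcv_next += length
            while self.rcv_next in self.out_of_order:   # absorb now-contiguous data
                self.rcv_next += self.out_of_order.pop(self.rcv_next)
            if filled_gap or self.delayed_ack_pending:
                self.delayed_ack_pending = False        # one cumulative ACK covers all
                return ("ack", self.rcv_next)
            self.delayed_ack_pending = True             # wait up to 500 ms
            return ("delay", None)
        if seq > self.rcv_next:                         # gap detected: buffer it
            self.out_of_order[seq] = length
        # Duplicate ACK naming the next expected byte (also used, as an
        # assumption, for old segments below rcv_next).
        self.delayed_ack_pending = False
        return ("dup_ack", self.rcv_next)
```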
TCP Flow Control
Flow control: prevent the sender from overrunning the receiver's buffer by transmitting too much, too fast
• receiver: informs the sender of the (dynamically changing) amount of free buffer space
  - RcvWindow field in the TCP header
• sender: keeps the amount of transmitted, unACKed data no more than the most recently received RcvWindow
• max throughput ≈ $\text{window size} / \text{RTT}$ bytes/sec
Special case: when RcvWindow = 0
• the sender can send a 1-byte segment
• the receiver can respond with its current window size
• the receiver buffer is eventually freed → window size increased
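A minimal sketch of the sender-side rule and the throughput bound. The function names are hypothetical, and allowing only one outstanding 1-byte probe when RcvWindow = 0 is an assumption of this sketch.

```python
def bytes_allowed(last_byte_sent: int, last_byte_acked: int, rcv_window: int) -> int:
    """How many new bytes the sender may transmit under flow control."""
    in_flight = last_byte_sent - last_byte_acked   # transmitted, unACKed data
    if rcv_window == 0:
        # Zero window: a 1-byte segment lets the receiver report its current
        # window (one probe at a time is an assumption here).
        return 1 if in_flight == 0 else 0
    return max(0, rcv_window - in_flight)

def max_throughput_bps(window_bytes: int, rtt_ms: int) -> float:
    """Throughput bound when the window is the limit: window / RTT, in bits/s."""
    return window_bytes * 8 * 1000 / rtt_ms
```

With a 64 Kbyte (65535-byte) window and a 100 ms RTT the bound is about 5.2 Mbps.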
Design Choice:
Counting bytes or counting packets?
Pros of counting bytes: flexibility
• need a byte counter somewhere anyway
• can repackage data for retransmission, e.g.:
  - first sent segment-1 with 200 bytes
  - 300 more bytes are passed down from the application
  - segment-1 times out; send a new segment with 500 bytes of data
[Figure: the 200-byte segment and the following 300 bytes combined into one 500-byte segment]
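A minimal sketch of why byte counting permits repackaging: since a segment is just "bytes starting at seq #", a retransmission can cover whatever unACKed bytes are buffered, not only the original segment's boundaries. The class name and the 1460-byte maximum segment size are hypothetical choices.

```python
class ByteStreamSender:
    """Send buffer indexed by byte sequence numbers."""

    def __init__(self, send_base: int = 0, mss: int = 1460):
        self.send_base = send_base      # seq # of the oldest unACKed byte
        self.buffer = bytearray()       # bytes from send_base onward (sent or not)
        self.mss = mss                  # assumed maximum segment size

    def write(self, data: bytes):
        self.buffer += data             # data passed down from the application

    def segment_at(self, seq: int, max_len: int):
        off = seq - self.send_base
        return seq, bytes(self.buffer[off:off + max_len])

    def retransmit_on_timeout(self):
        # Resend from send_base, packing as many buffered bytes as fit.
        return self.segment_at(self.send_base, self.mss)
```

Following the slide's example: 200 bytes go out as segment-1, 300 more arrive, and the timeout produces one 500-byte segment starting at the same seq #.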
Counting Bytes: cons
• sequence numbers run out faster
  - needs a larger sequence # field
• easily falls into the trap of transmitting small packets
  - network overhead goes up with the number of packets transmitted
• silly window syndrome: the receiver ACKs (frees) a single byte at a time, causing the sender to send single-byte segments forever
Design Choices:
Understand the consequences of the design
• TCP sequence number: 32 bits → 4 Gbytes
  - wrap-around time:
    • 50 Kbps: ~8 days
    • Ethernet (10 Mbps): about an hour
    • FDDI (100 Mbps): about 6 minutes
    • at 1 Gbps: about 30 seconds
• TCP window size: 16 bits → 64 Kbytes max
  - assume RTT = 100 msec
  - can keep a channel of 5 Mbps fully utilized
  - OC3 (155 Mbps) × 100 msec = 1.9 MB: needs a window size of at least 21 bits
  - 1 Gbps × 100 msec = 12.5 MB: needs a window size of at least 24 bits
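The numbers above come from two short calculations: the time to send 2^32 bytes at a given rate, and the number of bits needed to express the bandwidth-delay product in bytes. A minimal sketch, using integer arithmetic for the window size so the bit count is exact:

```python
SEQ_SPACE_BYTES = 2**32  # 32-bit sequence numbers count bytes

def wraparound_seconds(rate_bps: float) -> float:
    """Time to send 2^32 bytes, i.e. one pass through the sequence space."""
    return SEQ_SPACE_BYTES * 8 / rate_bps

def window_bits_needed(rate_bps: int, rtt_ms: int) -> int:
    """Bits a window field needs to cover rate x RTT (in bytes)."""
    bdp_bytes = rate_bps * rtt_ms // 8000   # bits/s * ms -> bytes
    return bdp_bytes.bit_length()           # smallest n with 2**n - 1 >= bdp_bytes
```

At 10 Mbps the wrap-around time is about 3,436 s (just under an hour), and 155 Mbps × 100 msec needs 21 bits.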
Always Keep the Big Picture in Mind
[Figure: encapsulation down the layers (application: M; transport: Ht M; network: Hn Ht M; link: Hl Hn Ht M; physical); a Web browser and a Web server, each an application process running HTTP over the socket interface and TCP, connected by an unreliable network offering data packet delivery; the sending process writes bytes into the TCP send buffer, TCP carries them as segments, and the receiving process reads bytes from the TCP receive buffer]