CS244a: An Introduction to Computer Networks

Transport Layer

Transport Layer Services




connection-oriented vs. connectionless
multiplexing and demultiplexing
UDP: Connectionless Unreliable Service
TCP: Connection-Oriented Reliable Service



connection management: set-up and tear down
reliable data transfer protocols
flow and congestion control
Readings: Chapter 5
Fall 2007
CSci232:
Transport Layer & TCP
1
Transport Protocols
• Lowest-level end-to-end protocol.
– Header generated by
sender is interpreted
only by the destination
– Routers view transport
header as part of the
payload
[Figure: protocol stacks of two end hosts (layers 1 through 7, including Transport, IP, Datalink, and Physical) connected through a router that implements only IP, Datalink, and Physical.]
Transport Services and Protocols
• provide logical communication between app processes running on different hosts
• transport protocols run in end systems
– send side: breaks app messages into segments, passes to network layer
– rcv side: reassembles segments into messages, passes to app layer
• more than one transport protocol available to apps
– Internet: TCP and UDP
[Figure: two end systems with full application/transport/network/data link/physical stacks, joined by routers that implement only the network, data link, and physical layers.]
Transport Layer Services
• Underlying best-effort network
– drops messages
– re-orders messages
– delivers duplicate copies of a given message
– delivers messages after an arbitrarily long delay
• Common end-to-end services
– guarantee message delivery
– deliver messages in the same order they are sent
– deliver at most one copy of each message
– allow the receiver to flow control the sender
– support multiple application processes on each host
Transport vs.
Application and Network Layer
• application layer: application processes and message exchange
• network layer: logical communication between hosts
• transport layer: logical communication support for app processes
– relies on, enhances, network layer services
Household analogy:
12 kids sending letters to 12 kids
• processes = kids
• app messages = letters in envelopes
• hosts = houses
• transport protocol = Ann and Bill
• network-layer protocol = postal service
End to End Issues
• Transport services built on top of (potentially)
unreliable network service
– Packets can be corrupted or lost
– Packets can be delayed or arrive “out of order”
• Do we detect and/or recover from errors for apps?
– Error Control & Reliable Data Transfer
• Do we provide “in-order” delivery of packets?
– Connection Management & Reliable Data Transfer
• Potentially different capacity at destination, and potentially different network capacity
– Flow and Congestion Control
Internet Transport Protocols
TCP service:
• connection-oriented: setup
required between client,
server
• reliable transport between
sender and receiver
• flow control: sender won’t
overwhelm receiver
• congestion control: throttle
sender when network
overloaded
UDP service:
• unreliable data transfer
between sender and
receiver
• does not provide:
connection setup,
reliability, flow control,
congestion control
Both provide logical communication between app processes
running on different hosts!
Multiplexing/Demultiplexing
Multiplexing at send host: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)
Demultiplexing at rcv host: delivering received segments to correct application process
[Figure: hosts 1, 2, and 3, each with application, transport, network, link, and physical layers; processes P1 through P4 attach to the transport layer through sockets (the API).]
How Demultiplexing Works
• host receives IP datagrams
– each datagram has source IP address, destination IP address
– each datagram carries 1 transport-layer segment
– each segment has source, destination port number (recall: well-known port numbers for specific applications)
• host uses IP addresses & port numbers to direct segment to appropriate app process (identified by “socket”)
TCP/UDP segment format (32 bits wide):
source port # | dest port #
other header fields
application data (message)
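The demultiplexing step can be sketched as a table lookup. This is only an illustration: the table names and the `demux` function are hypothetical, and the choice of key (destination IP and port for UDP, the full 4-tuple for TCP) is an assumption that follows the usual textbook description rather than anything stated on this slide.

```python
from typing import Dict, Optional, Tuple

# Hypothetical socket tables; names are illustrative, not from the slides.
UdpKey = Tuple[str, int]              # (dest IP, dest port)
TcpKey = Tuple[str, int, str, int]    # (src IP, src port, dest IP, dest port)

udp_sockets: Dict[UdpKey, str] = {}
tcp_sockets: Dict[TcpKey, str] = {}


def demux(proto: str, src_ip: str, src_port: int,
          dst_ip: str, dst_port: int) -> Optional[str]:
    """Return the process (socket owner) that should receive the segment."""
    if proto == "UDP":
        # Assumption: UDP sockets are identified by destination only.
        return udp_sockets.get((dst_ip, dst_port))
    if proto == "TCP":
        # Assumption: TCP connections are identified by the full 4-tuple.
        return tcp_sockets.get((src_ip, src_port, dst_ip, dst_port))
    return None


# Example tables: one DNS server socket and two SSH connections to one port.
udp_sockets[("10.0.0.1", 53)] = "P1 (DNS)"
tcp_sockets[("10.0.0.2", 1414, "10.0.0.1", 22)] = "P2 (ssh session A)"
tcp_sockets[("10.0.0.3", 1567, "10.0.0.1", 22)] = "P3 (ssh session B)"

print(demux("TCP", "10.0.0.3", 1567, "10.0.0.1", 22))
```

Two TCP segments addressed to the same destination port reach different processes because their source address and port differ, while every UDP datagram for port 53 lands on the same socket.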
UDP: User Datagram Protocol [RFC 768]
• “no frills,” “bare bones” Internet transport protocol
• “best effort” service, UDP segments may be:
– lost
– delivered out of order to app
• connectionless:
– no handshaking between UDP sender, receiver
– each UDP segment handled independently of others
Why is there a UDP?
• no connection establishment (which can add delay)
• simple: no connection state at sender, receiver
• small segment header
• no congestion control: UDP can blast away as fast as desired
UDP (cont’d)
• often used for streaming multimedia apps
– loss tolerant
– rate sensitive
• other UDP uses
– DNS
– SNMP
• reliable transfer over UDP: add reliability at application layer
– application-specific error recovery!
UDP segment format (32 bits wide):
source port # | dest port #
length | checksum
application data (message)
Length: in bytes of UDP segment, including header
UDP Checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment
Sender:
• treat segment contents as sequence of 16-bit integers
• checksum: addition (1’s complement sum) of segment contents
• sender puts checksum value (1’s complement of 1’s complement sum of 16-bit words) into UDP checksum field
Receiver:
• compute checksum of received segment
• check if computed checksum equals checksum field value:
– NO - error detected
– YES - no error detected. But maybe errors nonetheless? More later ….
Checksum: Example
arrange data segment in sequences of 16-bit words:
  0110011001100110
  1101010101010101
+ 0000111100001111
sum: 0100101011001011
checksum (1’s complement): 1011010100110100
verify by adding sum and checksum: 1111111111111111
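The example above can be reproduced with a short sketch. The function names are illustrative; the only technique used is the one the slides describe, a 16-bit 1’s complement sum (any carry out of bit 15 is wrapped around and added back in) followed by a bitwise complement.

```python
def ones_complement_sum16(words):
    """Add 16-bit words, wrapping any carry out of bit 15 back into the sum."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(words):
    """1's complement of the 1's complement sum of the 16-bit words."""
    return ~ones_complement_sum16(words) & 0xFFFF


# The three 16-bit words from the slide example.
words = [0b0110011001100110, 0b1101010101010101, 0b0000111100001111]

checksum = internet_checksum(words)
print(format(ones_complement_sum16(words), "016b"))  # sum
print(format(checksum, "016b"))                      # checksum

# Receiver-side check: sum of all words plus the checksum is all ones.
print(format(ones_complement_sum16(words + [checksum]), "016b"))
```

The first two words overflow 16 bits when added, so the wraparound step matters here: without it the sum would not match the value on the slide.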
TCP Overview
• Connection-oriented
• Byte-stream
– app writes bytes
– TCP sends segments
– app reads bytes
• Full duplex
• Flow control: keep sender from overrunning receiver
• Congestion control: keep sender from overrunning network
[Figure: the sending application process writes bytes into TCP’s send buffer; TCP transmits segments; the receiving TCP places them in its receive buffer, from which the application process reads bytes.]
Functionality Split
• Network provides best-effort delivery
• End-systems implement many functions
– Reliability
– In-order delivery
– Demultiplexing
– Message boundaries
– Connection abstraction
– Flow control
– Congestion control
– …
High-Level TCP Characteristics
• Protocol implemented entirely at the ends
– Fate sharing
• Protocol has evolved over time and will
continue to do so
– Nearly impossible to change the header
– Use options to add information to the header
– Change processing at endpoints
– Backward compatibility is what makes it TCP
Evolution of TCP
[Timeline, 1975 to 1990]
1974: TCP described by Vint Cerf and Bob Kahn in IEEE Trans Comm
1975: Three-way handshake, Raymond Tomlinson, in SIGCOMM 75
1982: TCP & IP, RFC 793 & 791
1983: BSD Unix 4.2 supports TCP/IP
1984: Nagle’s algorithm to reduce overhead of small packets; predicts congestion collapse
1986: Congestion collapse observed
1987: Karn’s algorithm to better estimate round-trip time
1988: Van Jacobson’s algorithms: congestion avoidance and congestion control (most implemented in 4.3BSD Tahoe)
1990: 4.3BSD Reno: fast retransmit, delayed ACKs
TCP Through the 1990s
[Timeline, 1993 to 1996]
1993: TCP Vegas (Brakmo et al): real congestion avoidance
1994: T/TCP (Braden): Transaction TCP
1994: ECN (Floyd): Explicit Congestion Notification
1996: SACK TCP (Floyd et al): Selective Acknowledgement
1996: Hoe: Improving TCP startup
1996: FACK TCP (Mathis et al): extension to SACK
TCP Segment Header Structure
TCP segment format (32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)
Field notes:
• sequence number, acknowledgement number: counting by bytes of data (not segments!)
• URG: urgent data (generally not used)
• ACK: ACK # valid
• PSH: push data now (generally not used)
• RST, SYN, FIN: connection estab (setup, teardown commands)
• rcvr window size: # bytes rcvr willing to accept
• checksum: Internet checksum (as in UDP)
TCP Segment Format (cont)
• Each connection identified with 4-tuple:
– (SrcPort, SrcIPAddr, DstPort, DstIPAddr)
• Sliding window + flow control
– Acknowledgment, SequenceNum, AdvertisedWindow
– Sender to receiver: Data(SequenceNum)
– Receiver to sender: Acknowledgment + AdvertisedWindow
• Flags
– SYN, FIN, ACK, RESET, PUSH, URG
• Checksum
– pseudo header (src & dst IP addresses) + TCP header + data
TCP Connection Set Up
TCP sender, receiver establish “connection” before exchanging data segments
• initialize TCP variables:
– seq. #
– buffers, flow control info
• client: end host that initiates connection
• server: end host contacted by client
Three way handshake:
Step 1: client sends TCP SYN control segment to server
– specifies initial seq #
Step 2: server receives SYN, replies with SYN+ACK control segment
– ACKs received SYN
– specifies server → receiver initial seq. #
Step 3: client receives SYN+ACK, replies with ACK segment (which may contain 1st data segment)
TCP 3-Way Hand-Shake
[Figure: time-sequence diagram between client and server: client initiates connection; server receives SYN; connection established at client, then at server.]
Question:
a. What kind of “state” do client and server need to maintain?
b. What initial sequence # should client (and server) use?
TCP Connection Setup Example
No.  Time       Source          > Destination     Proto  SrcPort > DstPort [Flags]
1    13.734375  70.13.155.114   > 128.101.35.150  TCP    1414 > 22 [SYN]
     Seq=758244755 Len=0 MSS=1260
2    13.968750  128.101.35.150  > 70.13.155.114   TCP    22 > 1414 [SYN, ACK]
     Seq=3778406755 Ack=758244756 Win=25200 Len=0 MSS=1460
3    13.968750  70.13.155.114   > 128.101.35.150  TCP    1414 > 22 [ACK]
     Seq=758244756 Ack=3778406756 Win=16384 Len=0
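The sequence and acknowledgement numbers in the trace above follow a simple rule: a SYN consumes one sequence number, so each side acknowledges the other’s initial sequence number plus one. A minimal sketch of that arithmetic, assuming a hypothetical `Segment` record and `three_way_handshake` helper (neither comes from the slides):

```python
from dataclasses import dataclass
from typing import List, Optional

SEQ_MOD = 2 ** 32  # TCP sequence numbers are 32-bit values that wrap around


@dataclass
class Segment:
    flags: str
    seq: int
    ack: Optional[int] = None


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three handshake segments for the given initial seq numbers."""
    syn = Segment("SYN", client_isn)
    # The SYN occupies one sequence number, so the server ACKs client_isn + 1.
    syn_ack = Segment("SYN,ACK", server_isn, (syn.seq + 1) % SEQ_MOD)
    # The client's next sequence number is the value the server just ACKed.
    ack = Segment("ACK", syn_ack.ack, (syn_ack.seq + 1) % SEQ_MOD)
    return [syn, syn_ack, ack]


# Initial sequence numbers taken from the trace.
for segment in three_way_handshake(758244755, 3778406755):
    print(segment)
```

Running it with the two initial sequence numbers from the capture gives the same Seq and Ack values as packets 1 to 3.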
TCP Connection Setup Example
No.  Time        Source          > Destination     Proto  SrcPort > DstPort [Flags]
1    13.6611233  70.13.155.114   > 128.101.35.204  TCP    1567 > 80 [SYN]
     Seq=3724852786 Len=0 MSS=1260
2    13.890625   128.101.35.204  > 70.13.155.114   TCP    80 > 1567 [SYN, ACK]
     Seq=484733971 Ack=3724852787 Win=25200 Len=0 MSS=1460
3    13.890625   70.13.155.114   > 128.101.35.204  TCP    1567 > 80 [ACK]
     Seq=3724852787 Ack=484733972 Win=17640 Len=0
4    13.890625   70.13.155.114   > 128.101.35.204  TCP    1567 > 80 [PSH, ACK]
     Seq=3724852787 Ack=484733972 Win=17640 Len=564
5    14.630860   128.101.35.204  > 70.13.155.114   TCP    80 > 1567 [ACK]
     Seq=484733972 Ack=3724853351 Win=25200 Len=0 MSS=1460
3-Way Handshake: Finite State Machine
Client FSM? Server FSM?
[Figure: partially filled-in client FSM for students to complete: closed, then (upper layer: initiate connection; sent SYN w/ initial seq = x) SYN sent, then (SYN+ACK received; sent ACK) conn estab’ed. The remaining states and transitions are marked “?”.]
info (“state”) maintained at client?
Connection Setup Error Scenarios
• Lost (control) packets
– What happens if SYN is lost? client vs. server actions
– What happens if SYN+ACK is lost? client vs. server actions
– What happens if ACK is lost? client vs. server actions
• Duplicate (control) packets
– What does server do if duplicate SYN received?
– What does client do if duplicate SYN+ACK received?
– What does server do if duplicate ACK received?
Connection Setup Error Scenarios
(cont’d)
• Importance of (unique) initial seq. no.?
– When receiving SYN, how does server know it’s a new
connection request?
– When receiving SYN+ACK, how does client know it’s a
legitimate, i.e., a response to its SYN request?
• Dealing with old duplicate packets from old
connections (or from malicious users)
– If not careful: “TCP Hijacking”
• How to choose unique initial seq. no.?
– randomly choose a number (and add to last syn# used)
• Other security concern:
– “SYN Flood” -- denial-of-service attack
TCP State Diagram: Connection Setup
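[Figure: TCP state diagram for connection setup.
Server side: CLOSED to LISTEN on passive OPEN (create TCB); LISTEN to SYN RCVD on rcv SYN (snd SYN, ACK); SYN RCVD to ESTAB on rcv ACK of SYN.
Client side: CLOSED to SYN SENT on active OPEN (create TCB, snd SYN); SYN SENT to ESTAB on rcv SYN, ACK (snd ACK).
Other transitions: LISTEN to SYN SENT on SEND (snd SYN); SYN SENT to SYN RCVD on rcv SYN (snd ACK); LISTEN or SYN SENT back to CLOSED on CLOSE (delete TCB); CLOSE (send FIN) begins teardown.]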
TCP: Closing Connection
Remember: a TCP connection is duplex!
Client wants to close connection:
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. half closed
Step 3: client receives ACK. half closed, wait for server to close
Server finishes sending data, also ready to close:
Step 4: server sends FIN.
[Figure: client/server timeline: client closing, both sides half closed, then server closing.]
TCP: Closing Connection (cont’d)
Step 5: client receives FIN, replies with ACK. connection fully closed
Step 6: server receives ACK. connection fully closed
Well Done! Problem Solved?
[Figure: client/server timeline: client closing, both sides half closed, server closing, then both sides fully closed.]
TCP: Closing Connection (revised)
Two Army Problem!
Step 5: client receives FIN, replies with ACK.
– Enters “timed wait” - will respond with ACK to received FINs
Step 6: server receives ACK. connection fully closed
Step 7: client, timer expires, connection fully closed
[Figure: client/server timeline: client closing, both sides half closed, server closing; a lost segment (X) and a timeout are shown while the client stays in timed wait, after which both sides are fully closed.]
TCP Connection Tear-Down Example
No.  Time       Source          > Destination     Proto  SrcPort > DstPort [Flags]
80   35.156250  70.13.155.114   > 128.101.35.150  TCP    1414 > 22 [PSH, ACK]
     Seq=758246388 Ack=3778411633 Win=15920 Len=32
81   35.156250  70.13.155.114   > 128.101.35.150  TCP    1414 > 22 [FIN, ACK]
     Seq=758246420 Ack=3778411633 Win=15920 Len=0
82   35.437500  128.101.35.150  > 70.13.155.114   TCP    22 > 1414 [ACK]
     Seq=3778411633 Ack=758246420 Win=25200 Len=0
83   35.453125  128.101.35.150  > 70.13.155.114   TCP    22 > 1414 [ACK]
     Seq=3778411633 Ack=758246421 Win=25200 Len=0
84   35.453125  128.101.35.150  > 70.13.155.114   TCP    22 > 1414 [FIN, ACK]
     Seq=3778411633 Ack=758246421 Win=25200 Len=0
85   35.453125  70.13.155.114   > 128.101.35.150  TCP    1414 > 22 [ACK]
     Seq=758246421 Ack=3778411634 Win=15920 Len=0
State Diagram: Connection Tear-down
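[Figure: TCP state diagram for connection tear-down.
Active close: ESTAB to FIN WAIT-1 on CLOSE (send FIN); FIN WAIT-1 to FIN WAIT-2 on rcv ACK of FIN; FIN WAIT-2 to TIME WAIT on rcv FIN (snd ACK); FIN WAIT-1 to CLOSING on rcv FIN (snd ACK); CLOSING to TIME WAIT on rcv ACK of FIN; FIN WAIT-1 to TIME WAIT on rcv FIN+ACK (snd ACK); TIME WAIT to CLOSED on Timeout=2min (delete TCB).
Passive close: ESTAB to CLOSE WAIT on rcv FIN (send ACK); CLOSE WAIT to LAST-ACK on CLOSE (snd FIN); LAST-ACK to CLOSED on rcv ACK of FIN.]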
TCP Connection Management FSM
[Figure: TCP client lifecycle]
TCP Connection Management FSM
[Figure: TCP server lifecycle]
Reliability and Error Recovery
• ARQ vs. FEC
– ARQ: automatic repeat request (retransmission)
– FEC: forward error correction
• General ARQ Algorithms
– Stop & Wait
• Performance issue: low utilization when delay-bandwidth product is large
– Sliding Window Protocols
• Go-Back-N
• Selective Repeat
• Key design issues: window size vs. size of seq. no. space
Error Recovery: Stop and Wait
• ARQ
– Receiver sends acknowledgement (ACK) when it receives packet
– Sender waits for ACK and times out if it does not arrive within some time period
• Simplest ARQ protocol
• Send a packet, stop and wait until ACK arrives
[Figure: sender/receiver time-sequence diagram showing a packet, its ACK, and the sender’s timeout interval.]
Recovering from Error
[Figure: three stop-and-wait timelines with timeouts: packet lost, ACK lost, and early timeout. The ACK-lost and early-timeout cases produce DUPLICATE PACKETS!!!]
Problems with Stop and Wait
• How to recognize a duplicate
• Performance
– Can only send one packet per round trip
How to Recognize Resends?
• Use sequence numbers
– both packets and acks
• Sequence # in packet is finite → How big should it be?
– For stop and wait?
• One bit – won’t send seq #1 until received ACK for seq #0
Problem with Stop & Wait Protocol
[Figure: sender/receiver timeline: first packet bit transmitted at t = 0; first packet bit arrives at receiver; one RTT later the ACK arrives and the sender sends the next packet at t = RTT + L / R.]
• Can’t keep the pipe full
– Utilization is low when bandwidth-delay product (R x RTT) is large!
Stop & Wait: Performance Analysis
Example: 1 Gbps connection, 15 ms end-end prop. delay, data segment size: 1 KB = 8 Kb

$$T_{\text{transmit}} = \frac{L\ \text{(packet length in bits)}}{R\ \text{(transmission rate, bps)}} = \frac{8\ \text{kb}}{10^{9}\ \text{b/s}} = 8 \times 10^{-6}\ \text{s} = 0.008\ \text{ms}$$

$$U_{\text{sender}} = \frac{L/R}{RTT + L/R} = \frac{L}{RTT \cdot R + L} = \frac{0.008}{30.008} = 0.00027$$

– U sender: utilization, i.e., fraction of time sender busy sending
– 1KB data segment every 30 msec (round trip time)
--> 0.027% x 1 Gbps = 33 kB/sec throughput over 1 Gbps link
Moral of story: network protocol limits use of physical resources!
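The numbers in this example can be checked with a few lines of arithmetic. The function name and the optional `window` parameter are illustrative; with `window=1` the formula is exactly the stop-and-wait utilization L/R divided by (RTT + L/R), and a larger window simply multiplies the busy time, capped at full utilization.

```python
def sender_utilization(packet_bits: float, rate_bps: float,
                       rtt_s: float, window: int = 1) -> float:
    """Fraction of time the sender is busy transmitting."""
    t_transmit = packet_bits / rate_bps          # L / R
    return min(1.0, window * t_transmit / (rtt_s + t_transmit))


# Values from the slide: 8 kb packet, 1 Gbps link, RTT = 2 x 15 ms.
u = sender_utilization(packet_bits=8_000, rate_bps=1e9, rtt_s=0.030)
print(f"stop-and-wait utilization: {u:.5f}")
print(f"effective throughput: {u * 1e9 / 8 / 1000:.1f} kB/s")

# Three packets in flight per round trip triple the utilization.
print(f"window of 3: {sender_utilization(8_000, 1e9, 0.030, window=3):.5f}")
```

Even with a perfect link, the sender sits idle for nearly the whole round trip, which is why the throughput is a tiny fraction of the 1 Gbps capacity.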
How to Keep the Pipe Full?
• Send multiple packets without waiting for first to be acked
– Number of pkts in flight = window
• Reliable, unordered delivery
– Several parallel stop & waits
– Send new packet after each ack
– Sender keeps list of unack’ed packets; resends after timeout
– Receiver same as stop & wait
• How large a window is needed?
– Suppose 10Mbps link, 4ms delay, 500byte pkts
• 1? 10? 20?
– Round trip delay * bandwidth = capacity of pipe
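The window question can be worked out directly from "round trip delay * bandwidth = capacity of pipe". The sketch below assumes the 4 ms figure is a one-way delay, so the round trip is 8 ms; that reading is an assumption, and the second call shows the result if 4 ms were already the round-trip time. The function name is illustrative.

```python
def window_in_packets(bandwidth_bps: int, rtt_ms: int, packet_bytes: int) -> int:
    """Packets needed in flight to fill the pipe (rounded up).

    Integer arithmetic is used so the rounding is exact.
    """
    pipe_bits_x1000 = bandwidth_bps * rtt_ms       # bits * 1000
    packet_bits_x1000 = packet_bytes * 8 * 1000
    return (pipe_bits_x1000 + packet_bits_x1000 - 1) // packet_bits_x1000


# 10 Mbps link, 500-byte packets.
print(window_in_packets(10_000_000, rtt_ms=8, packet_bytes=500))  # 4 ms one way
print(window_in_packets(10_000_000, rtt_ms=4, packet_bytes=500))  # 4 ms round trip
```

With an 8 ms round trip the pipe holds 80,000 bits, or 10,000 bytes, which is 20 packets of 500 bytes.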
Pipelined (Sliding Window) Protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged data segments
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols:
Go-Back-N and Selective Repeat
Pipelining: Increased Utilization
[Figure: sender/receiver timeline: first packet bit transmitted at t = 0; last bit transmitted at t = L / R; first packet bit arrives; last packet bit arrives, send ACK; last bit of 2nd packet arrives, send ACK; last bit of 3rd packet arrives, send ACK; one RTT later the first ACK arrives and the sender sends the next packet at t = RTT + L / R.]
Increase utilization by a factor of 3!

$$U_{\text{sender}} = \frac{3 \cdot L/R}{RTT + L/R} = \frac{0.024}{30.008} = 0.0008$$
Sliding Window
• Reliable, ordered delivery
• Receiver has to hold onto a packet until all
prior packets have arrived
– Why might this be difficult for just parallel stop & wait?
– Sender must prevent buffer overflow at receiver
• Circular buffer at sender and receiver
– Packets in transit ≤ buffer size
– Advance when sender and receiver agree packets at beginning have been received
Sender/Receiver State
[Figure: sliding-window state at both ends.
Sender: sequence space divided into Sent & Acked, Sent Not Acked, OK to Send, and Not Usable; the sender window starts at max ACK received and contains next seqnum.
Receiver: sequence space divided into Received & Acked, Acceptable Packet, and Not Usable; the receiver window runs from next expected to max acceptable.]
Window Sliding – Common Case
• On reception of new ACK (i.e. ACK for something that
was not acked earlier)
– Increase sequence of max ACK received
– Send next packet
• On reception of new in-order data packet (next
expected)
– Hand packet to application
– Send cumulative ACK – acknowledges reception of all packets up to
sequence number
– Increase sequence of max acceptable packet
Loss Recovery
• On reception of out-of-order packet
– Send nothing (wait for source to timeout)
– Cumulative ACK (helps source identify loss)
• Timeout (Go-Back-N recovery)
– Set timer upon transmission of packet
– Retransmit all unacknowledged packets
• Performance during loss recovery
– No longer have an entire window in transit
– Can have much more clever loss recovery
Go-Back-N in Action
Selective Repeat
• Receiver individually acknowledges all correctly
received pkts
– Buffers packets, as needed, for eventual in-order delivery to
upper layer
• Sender only resends packets for which ACK not
received
– Sender timer for each unACKed packet
• Sender window
– N consecutive seq #’s
– Again limits seq #s of sent, unACKed packets
Selective Repeat:
Sender, Receiver Windows
Sequence Numbers
• How large does size of sequence number space
need to be?
– Must be able to detect wrap-around
– Depends on sender/receiver window size
• E.g.
– size of seq. no. space = 8, send win=recv win=7
– If pkts 0..6 are sent successfully and all acks lost
• Receiver expects 7,0..5, sender retransmits old 0..6!!!
• size of sequence no. space must be ≥ send window + recv window
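The wrap-around problem in this example can be checked mechanically. The sketch below models only the scenario on the slide (a full window is delivered, every ACK is lost, and the sender retransmits the old window); the function names are illustrative.

```python
def window(start: int, size: int, seq_space: int) -> set:
    """Sequence numbers covered by a window, wrapping modulo seq_space."""
    return {(start + i) % seq_space for i in range(size)}


def is_ambiguous(seq_space: int, send_win: int, recv_win: int) -> bool:
    """True if retransmitted old packets fall inside the receiver's new window.

    Scenario: packets 0..send_win-1 all arrive, all ACKs are lost, and the
    sender retransmits them while the receiver has already moved on.
    """
    retransmitted = window(0, send_win, seq_space)
    receiver_expects = window(send_win, recv_win, seq_space)
    return bool(retransmitted & receiver_expects)


print(sorted(window(7, 7, 8)))     # what the receiver expects after 0..6
print(is_ambiguous(8, 7, 7))       # slide example: old 0..5 look like new data
print(is_ambiguous(14, 7, 7))      # seq space = send window + recv window
```

In this scenario the overlap disappears exactly when the sequence number space is at least the send window plus the receive window.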
Sequence Numbers in TCP
• TCP regards data as a “byte-stream”
– each byte in byte stream is numbered.
• 32 bit value, wraps around
• initial values selected at start up time
• TCP breaks up byte stream in packets.
– Packet size is limited to the Maximum Segment Size (MSS)
• Each packet has a sequence number
– seq. no of 1st byte indicates where it fits in the byte stream
• TCP connection is duplex
– data in each direction has its own sequence numbers
[Figure: byte stream split into consecutive packets: packet 8 covers bytes 13450 to 14950, packet 9 covers 14950 to 16050, and packet 10 covers 16050 to 17550.]
TCP Seq. #’s and ACKs
Seq. #’s: byte stream “number” of first byte in segment’s data
ACKs: seq # of next byte expected from other side
[Figure: simple telnet scenario between Host A and Host B over time: user at Host A types ‘C’; Host B ACKs receipt of ‘C’ and echoes back ‘C’; Host A ACKs receipt of echoed ‘C’.]
TCP Reliable Data Transfer
• TCP creates reliable data transfer service on top of IP’s unreliable service
• Pipelined segments
• Cumulative ACKs
• TCP uses single retransmission timer
• Retransmissions are triggered by:
– timeout events
– duplicate acks
• Initially consider simplified TCP sender:
– ignore duplicate acks
– ignore flow control, congestion control
TCP = Go-Back-N Variant
• Sliding window with cumulative acks
– Receiver can only return a single “ack” sequence number to the
sender.
– Acknowledges all bytes with a lower sequence number
– Starting point for retransmission
– Duplicate acks sent when out-of-order packet received
• But: sender only retransmits a single packet.
– Reason???
• Only one that it knows is lost
• Network is congested → shouldn’t overload it
• Error control is based on byte sequences, not
packets.
– Retransmitted packet can be different from the original lost
packet – Why?
TCP Sender Events:
data rcvd from app:
• Create segment with seq #
• seq # is byte-stream number of first data byte in segment
• start timer if not already running (think of timer as for oldest unacked segment)
• expiration interval: TimeOutInterval
timeout:
• retransmit segment that caused timeout
• restart timer
ACK received:
• If acknowledges previously unACKed segments
– update what is known to be ACKed
– start timer if there are outstanding segments
TCP ACK generation
[RFC 1122, RFC 2581]
Event at Receiver | TCP Receiver Action
Arrival of in-order segment with expected seq #. All data up to expected seq # already ACKed | Delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Arrival of in-order segment with expected seq #. One other segment has ACK pending | Immediately send single cumulative ACK, ACKing both in-order segments
Arrival of out-of-order segment with higher-than-expected seq. #. Gap detected | Immediately send duplicate ACK, indicating seq. # of next expected byte
Arrival of segment that partially or completely fills gap | Immediately send ACK, provided that segment starts at lower end of gap
TCP Flow Control
• receive side of TCP connection has a receive buffer
• app process may be slow at reading from buffer
flow control: sender won’t overflow receiver’s buffer by transmitting too much, too fast
• speed-matching service: matching the send rate to the receiving app’s drain rate
TCP Flow Control: How It Works
(Suppose TCP receiver discards out-of-order segments)
• spare room in buffer = RcvWindow (dynamically changes) = RcvBuffer - [LastByteRcvd - LastByteRead]
• Rcvr advertises spare room by including value of RcvWindow in segments
• Sender limits unACKed data to RcvWindow
– guarantees receive buffer doesn’t overflow
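The bookkeeping behind RcvWindow is small enough to sketch. `rcv_window` follows the formula on the slide; `sendable_bytes` is a hypothetical helper for the sender side, and its `last_byte_sent` and `last_byte_acked` parameters are assumed names for the amount of unACKed data, which the slide mentions but does not name.

```python
def rcv_window(rcv_buffer: int, last_byte_rcvd: int, last_byte_read: int) -> int:
    """Spare room in the receive buffer, as advertised to the sender."""
    return rcv_buffer - (last_byte_rcvd - last_byte_read)


def sendable_bytes(advertised_window: int, last_byte_sent: int,
                   last_byte_acked: int) -> int:
    """How much more the sender may transmit without exceeding RcvWindow.

    Assumption: unACKed data = last_byte_sent - last_byte_acked.
    """
    unacked = last_byte_sent - last_byte_acked
    return max(0, advertised_window - unacked)


# Receiver: 4096-byte buffer, 3000 bytes received, app has read 1000 of them.
window = rcv_window(rcv_buffer=4096, last_byte_rcvd=3000, last_byte_read=1000)
print(window)

# Sender: 1000 bytes already in flight against that advertised window.
print(sendable_bytes(window, last_byte_sent=5000, last_byte_acked=4000))
```

As the application reads from the buffer, LastByteRead advances and the advertised window grows again, which is what makes the window "dynamically change".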
TCP Segment Structure
[Figure: TCP segment structure, 32 bits wide: source port #, dest port #, sequence number, acknowledgement number, head len, flags (U A P R S F), rcvr window size, checksum, ptr urgent data, options, application data. For flow control, the rcvr window size field carries the # bytes rcvr willing to accept.]
Triggering Transmission
• How does TCP decide to transmit a segment?
– MSS (maximum segment size) worth of data is available
• Set to size of the largest segment TCP can send without local IP fragmentation (MTU of the directly connected network)
– Sending process explicitly asks it to (push to flush)
– A timer fires
• Silly Window Syndrome
– Flow control needs to be maintained
– Sender can transmit full segment (MSS) when ACKed by receiver
Silly Window Syndrome (cont’d)
– Window currently closed from receiver
– ACK opens MSS/2 bytes
– Should sender transmit MSS/2?
• Original TCP implementation silent
• Early implementation of TCP decided to go ahead
• Sender cannot know when the window will open for a full MSS
– If sender is aggressive, sending available window size
• results in silly window syndrome
• small segment size remains indefinitely
– Hence a problem when either sender transmits a small segment or receiver opens window a small amount
Triggering Transmission (cont’d)
– Receiver may delay ACKs, but how long?
– Ultimate solution lies with sender:
• When does the TCP sender decide to transmit a segment?
• Nagle’s Algorithm:
– Waiting too long hurts interactive applications (Telnet)
– Without waiting, risk of sending a bunch of tiny packets (silly window syndrome)
– Wait till timer expires:
• Self clocking: As long as TCP has any data in flight, sender receives an ACK which can be used to trigger transmission
• If no data in flight, immediately send the segment
• Applications can turn Nagle’s algorithm off by setting the TCP_NODELAY option
TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
• longer than RTT
– but RTT varies
• too short: premature timeout
– unnecessary retransmissions
• too long: slow reaction to segment loss
Q: how to estimate RTT?
• SampleRTT: measured time from segment transmission until ACK receipt
– ignore retransmissions, why?
• SampleRTT will vary, want estimated RTT “smoother”
– average several recent measurements, not just current SampleRTT
Round-trip Time Estimation
• Wait at least one RTT before retransmitting
• Importance of accurate RTT estimators:
– Low RTT estimate
• unneeded retransmissions
– High RTT estimate
• poor throughput
• RTT estimator must adapt to change in RTT
– But not too fast, or too slow!
• Spurious timeouts
– “Conservation of packets” principle – never more than a
window worth of packets in flight
Adaptive Retransmission
(Original Algorithm)
• Measure SampleRTT for each segment/ACK pair
• Compute weighted running average of RTT
– EstRTT = a x EstRTT + (1-a) x SampleRTT
• a between 0.8 and 0.9 (to smooth EstRTT)
• A small a tracks temporary fluctuations; a large a is more stable but may not be quick to adapt to real changes
• Set timeout based on EstRTT
– TimeOut = 2 x EstRTT
Retransmission Ambiguity
[Figure: two sender/receiver timelines showing the SampleRTT measured when a segment is retransmitted.]
• ACK is assumed to be for the original transmission but was really for the retransmission => SampleRTT is too large
• ACK is assumed to be for the retransmission but was really for the original => SampleRTT is too small
Karn/Partridge Algorithm
• Solution:
• Do not sample RTT when retransmitting
– only measures sample RTT for segments sent once
• Double timeout for each retransmission
– Next timeout to be twice the last timeout, rather than basing it on the
last Estimated RTT
• Karn and Partridge proposal is exponential backoff
– Congestion is most likely cause of lost segments
– TCP sources should not react too aggressively to a timeout
– The more timeouts occur, the more cautious the source should become (congestion problem)
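The two rules of the Karn/Partridge algorithm (skip RTT samples for retransmitted segments, double the timeout on each retransmission) can be sketched as follows. The class name, the `max_timeout` cap, and the method names are assumptions made for illustration; the slides specify only the doubling and the sampling rule.

```python
class KarnPartridgeTimer:
    """Sketch of retransmission-timer handling with exponential backoff."""

    def __init__(self, initial_timeout: float, max_timeout: float = 64.0):
        self.timeout = initial_timeout
        self.max_timeout = max_timeout   # assumed cap, not from the slides
        self.samples = []                # RTT samples accepted so far

    def on_timeout(self) -> float:
        """A retransmission is due: double the timeout (exponential backoff)."""
        self.timeout = min(self.timeout * 2, self.max_timeout)
        return self.timeout

    def on_ack(self, sample_rtt: float, was_retransmitted: bool) -> bool:
        """Record an RTT sample only for segments that were sent exactly once."""
        if was_retransmitted:
            return False                 # ambiguous sample: ignore it
        self.samples.append(sample_rtt)
        return True


timer = KarnPartridgeTimer(initial_timeout=1.0)
print(timer.on_timeout(), timer.on_timeout(), timer.on_timeout())
print(timer.on_ack(0.30, was_retransmitted=True), timer.samples)
print(timer.on_ack(0.25, was_retransmitted=False), timer.samples)
```

A real implementation would also reset the timeout from the RTT estimate once a clean sample arrives; that step is left out to keep the sketch focused on the two rules above.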
Jacobson/ Karels Algorithm
• Original computation for RTT did not take the
variance of sample RTTs into account
– If variation among samples is small, EstRTT can be trusted more, and there is no need to double it for the timeout
– A large variance in the samples means timeout values should not be too tightly coupled to EstRTT
• New calculations for average RTT

$$\text{Diff} = \text{SampleRTT} - \text{EstRTT}$$
$$\text{EstRTT} = \text{EstRTT} + \delta \cdot \text{Diff}$$
$$\text{Dev} = \text{Dev} + \delta \cdot (|\text{Diff}| - \text{Dev})$$

• where δ is a fraction between 0 and 1
• Consider variance when setting timeout value

$$\text{TimeOut} = \mu \cdot \text{EstRTT} + \phi \cdot \text{Dev}$$

• where μ = 1 and φ = 4
TCP Round Trip Time Estimation
$$\text{EstimatedRTT} = (1 - a) \cdot \text{EstimatedRTT} + a \cdot \text{SampleRTT}$$

• Exponential weighted moving average
• influence of past sample decreases exponentially fast
• typical value: a = 0.125
Setting the timeout interval
• EstimatedRTT plus “safety margin”
– large variation in EstimatedRTT -> larger safety margin
• “safety margin”: accommodate variations in EstimatedRTT

$$\text{DevRTT} = (1 - \beta) \cdot \text{DevRTT} + \beta \cdot |\text{SampleRTT} - \text{EstimatedRTT}|$$

(typically, β = 0.25)

$$\text{TimeoutInterval} = \text{EstimatedRTT} + 4 \cdot \text{DevRTT}$$
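The three formulas above translate into a small estimator. Two details are assumptions rather than something the slides state: the initial values (EstimatedRTT set to the first sample and DevRTT to half of it) and the update order (DevRTT is updated using the EstimatedRTT from before the new sample is folded in). The class and attribute names are illustrative.

```python
class RttEstimator:
    """EWMA round-trip-time estimator with a deviation-based timeout."""

    def __init__(self, first_sample: float, a: float = 0.125, beta: float = 0.25):
        self.a = a
        self.beta = beta
        # Assumed initialisation: not given on the slides.
        self.estimated_rtt = first_sample
        self.dev_rtt = first_sample / 2

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new TimeoutInterval."""
        # Assumed order: deviation first, against the previous estimate.
        self.dev_rtt = ((1 - self.beta) * self.dev_rtt
                        + self.beta * abs(sample_rtt - self.estimated_rtt))
        self.estimated_rtt = ((1 - self.a) * self.estimated_rtt
                              + self.a * sample_rtt)
        return self.timeout_interval()

    def timeout_interval(self) -> float:
        return self.estimated_rtt + 4 * self.dev_rtt


estimator = RttEstimator(first_sample=100.0)        # milliseconds
for sample in (200.0, 110.0, 105.0, 100.0):
    timeout = estimator.update(sample)
    print(f"sample={sample:6.1f}  est={estimator.estimated_rtt:7.2f}  "
          f"dev={estimator.dev_rtt:6.2f}  timeout={timeout:7.2f}")
```

A single outlier (the 200 ms sample) moves EstimatedRTT only slightly but widens the safety margin noticeably, and the margin then shrinks again as the samples settle.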
Example RTT Estimation:
[Figure: RTT from gaia.cs.umass.edu to fantasia.eurecom.fr. x-axis: time (seconds); y-axis: RTT (milliseconds), 100 to 350; series: SampleRTT and Estimated RTT.]
Timestamp Extension
• Used to improve timeout mechanism by
more accurate measurement of RTT
• When sending a packet, insert current time
into option
– 4 bytes for time, 4 bytes to echo a received timestamp
• Receiver echoes timestamp in ACK
– Actually will echo whatever is in timestamp
• Removes retransmission ambiguity
– Can get RTT sample on any packet
Timer Granularity
• Many TCP implementations set RTO
(Retransmission Timeout) in multiples of
200,500,1000ms
• Why?
– Avoid spurious timeouts – RTTs can vary quickly due to
cross traffic
– Make timer interrupts efficient
• What happens for the first couple of packets?
– Pick a very conservative value (seconds)
Important Lessons
• TCP state diagram → setup/teardown
• TCP timeout calculation → how is RTT estimated
• Modern TCP loss recovery
– Why are timeouts bad?
– How to avoid them? → e.g. fast retransmit
Fast Retransmit
• What are duplicate acks (dupacks)?
– Repeated acks for the same sequence
• When can duplicate acks occur?
– Loss
– Packet re-ordering
– Window update – advertisement of new flow control window
• Assume re-ordering is infrequent and not of large
magnitude
– Use receipt of 3 or more duplicate acks as indication of loss
– Don’t wait for timeout to retransmit packet
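The duplicate-ack rule can be sketched as a counter over the stream of incoming ACK numbers. This is a simplified illustration: the function name is hypothetical, it ignores window updates, and it triggers once when the third duplicate of the same ACK arrives.

```python
def fast_retransmit_points(acks, threshold: int = 3):
    """Return the ACK numbers whose segment should be fast-retransmitted.

    A retransmission is triggered when `threshold` duplicate ACKs (repeats of
    the same ACK number after its first arrival) have been seen.
    """
    retransmits = []
    last_ack = None
    dup_count = 0
    for ack in acks:
        if ack == last_ack:
            dup_count += 1
            if dup_count == threshold:
                retransmits.append(ack)   # resend segment starting at `ack`
        else:
            last_ack = ack                # new cumulative ACK: reset counter
            dup_count = 0
    return retransmits


# Segment starting at byte 2000 is lost; later segments each trigger a dupack.
print(fast_retransmit_points([1000, 2000, 2000, 2000, 2000, 6000]))

# Mild reordering produces only two dupacks, so nothing is retransmitted.
print(fast_retransmit_points([1000, 2000, 2000, 2000, 3000]))
```

Waiting for three duplicates rather than one is what lets the sender tolerate small amounts of reordering without retransmitting unnecessarily.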
Fast Retransmit
[Figure: sequence number vs. time plot of packets and acks: a lost packet (X) produces duplicate acks, which trigger a retransmission.]
TCP (Reno variant)
[Figure: sequence number vs. time plot of packets and acks with several losses (X) in one window. Now what? - timeout]
SACK
• Basic problem is that cumulative acks
provide little information
• Selective acknowledgement (SACK)
essentially adds a bitmask of packets
received
– Implemented as a TCP option
– Encoded as a set of received byte ranges (max of 4
ranges/often max of 3)
• When to retransmit?
– Still need to deal with reordering → wait for out of order by 3 pkts
SACK
[Figure: sequence number vs. time plot of packets and acks with several losses (X) in one window. Now what? – send retransmissions as soon as detected]