User Datagram Protocol (UDP)

• Thin wrapper around IP services
• Service Model
  – Unreliable, unordered datagram service
  – Addresses multiplexing of multiple connections
• Multiplexing
  – 16-bit port numbers (some are “well-known”)
• Checksum
  – Validates header
  – Optional in IPv4
  – Mandatory in IPv6

UDP Header Format

[Figure: UDP header, two 32-bit words: Source Port | Destination Port | UDP Length | UDP Checksum]
• Length includes 8-byte header and data
• Checksum
  – Uses IP checksum algorithm
  – Computed on header, data and “pseudo header”:
[Figure: pseudo header: Source IP Address | Destination IP Address | 0 | 17 (UDP) | UDP Length]
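As an illustration of the checksum rule, here is a minimal sketch (not a production implementation) of the Internet checksum computed over the IPv4 pseudo header plus the UDP header and payload; the addresses, ports and payload are made-up example values.

```python
# Minimal sketch of the UDP checksum over the IPv4 pseudo header,
# UDP header, and data. Addresses, ports, and payload are example values.
import struct
import socket

def internet_checksum(data: bytes) -> int:
    if len(data) % 2:                      # pad to an even number of bytes
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                     # fold carries back in
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def udp_checksum(src_ip, dst_ip, src_port, dst_port, payload: bytes) -> int:
    udp_len = 8 + len(payload)             # 8-byte header + data
    pseudo = struct.pack("!4s4sBBH",
                         socket.inet_aton(src_ip), socket.inet_aton(dst_ip),
                         0, 17, udp_len)   # zero byte, protocol 17 (UDP), length
    header = struct.pack("!HHHH", src_port, dst_port, udp_len, 0)  # checksum field = 0
    return internet_checksum(pseudo + header + payload)

print(hex(udp_checksum("10.0.0.1", "10.0.0.2", 5000, 53, b"hello")))
```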
Transmission Control Protocol (TCP)

• Guaranteed delivery
  – Messages delivered in the order they were sent
  – Messages delivered at most once
• No limit on message size
• Synchronization between sender and receiver
• Multiple connections per host
• Flow control
TCP

• Connection oriented
  – Explicit setup and teardown required
• Byte stream abstraction
  – No boundaries in data
  – App writes bytes, TCP sends segments, App receives bytes
• Full duplex
  – Data flows in both directions simultaneously
  – Point-to-point connection
• Implements congestion control
  – Flow control: receiver controls sender rate
  – Congestion control: network indirectly controls sender rate

TCP vs. Direct Link

• Explicit connection setup required
• RTT varies, depending on destination and network condition
  => adaptive approach to retransmission
• Packets may be
  – Delayed
  – Reordered
  – Late
• Peer capabilities vary
  – Minimum link speed on route
  – Buffering capacity at destination
  => adaptive approach to window sizes
• Network capacity varies
  – Other traffic competes for most links
  => requires a global congestion control strategy

TCP: Connection Stages

1. Connection setup
  – 3-way handshake
2. Data transport: sender writes data, and TCP…
  – Breaks data into segments
  – Sends segments in IP packets
  – Retransmits, reorders and removes duplicates as necessary
  – Delivers data to receiver
3. Teardown
  – 4-step exchange
TCP Segment Header Format

[Figure: TCP header: Source Port | Destination Port | Sequence Number | ACK Sequence Number | Header Length, Flags, Advertised Window | TCP Checksum, Urgent Pointer | Options]
• 16-bit source and destination ports
• 32-bit send and ACK sequence numbers
• 4-bit header length (unit = 32 bits)
  – Minimum 5 (20 bytes)
  – Used as offset to first data byte
• 6 × 1-bit flags
  – URG: segment contains urgent data
  – ACK: ACK sequence number is valid
  – PSH: do not delay delivery of data
  – RST: reset connection (reject or abnormal termination)
  – SYN: synchronize segment for setup
  – FIN: final segment for teardown
• Pseudo header (“meta header”) included in the checksum:
[Figure: pseudo header: Source IP Address | Destination IP Address | 0 | 6 (TCP) | TCP Segment Length]
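To make the field layout concrete, the sketch below packs a 20-byte TCP header with Python's struct module; every field value (ports, sequence numbers, flags, window) is an arbitrary example, not taken from the slides.

```python
# Sketch of the 20-byte TCP header layout described above.
# All field values are arbitrary examples.
import struct

src_port, dst_port = 12345, 80
seq, ack_seq = 1000, 2000
header_len = 5                      # in 32-bit words => 20 bytes, no options
flags = 0b010010                    # ACK and SYN bits set (bit order: URG ACK PSH RST SYN FIN)
adv_window, checksum, urgent_ptr = 65535, 0, 0

offset_and_flags = (header_len << 12) | flags   # 4-bit header length, 6 reserved bits, 6 flag bits

header = struct.pack("!HHIIHHHH",
                     src_port, dst_port, seq, ack_seq,
                     offset_and_flags, adv_window, checksum, urgent_ptr)
print(len(header))                  # 20 bytes
```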
TCP Segment Header (cont.)

• 16-bit advertised window
  – Space remaining in receive window
• 16-bit checksum
  – Uses IP checksum algorithm
  – Computed on header, data and pseudo header
• 16-bit urgent data pointer
  – Used if URG = 1
  – Index of last byte of urgent data in segment

TCP Options

• Negotiate maximum segment size (MSS)
  – Each host suggests a value
  – Minimum of the two values is chosen
  – Prevents IP fragmentation over first and last hops
• Packet timestamp
  – Allows RTT calculation for retransmitted packets
  – Extends sequence number space for identification of stray packets
• Negotiate advertised window scaling factor
  – Allows larger windows: 64 KB too small for routes with large bandwidth-delay products
TCP: Data Transport

• Data broken into segments
  – Limited by maximum segment size (MSS)
  – Negotiable during connection setup
  – Typically set to MTU of directly connected network − size of TCP and IP headers (see the example below)
• Three events cause a segment to be sent
  – At least MSS bytes of data ready to be sent
  – Explicit PUSH operation by application
  – Periodic timeout
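A quick check of the "MTU minus headers" rule, assuming an Ethernet MTU of 1500 bytes and 20-byte IP and TCP headers with no options:

```python
# MSS for a directly connected Ethernet network, assuming no IP/TCP options.
mtu = 1500
ip_header, tcp_header = 20, 20
mss = mtu - ip_header - tcp_header
print(mss)   # 1460 bytes
```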
TCP Byte Stream

[Figure: the sending application process writes bytes into TCP's send buffer; TCP carries them as a sequence of TCP segments; the receiving TCP places them in its receive buffer, from which the receiving application process reads bytes]
TCP SNs and ACKs

• Sequence numbers
  – Count bytes, not packets
  – First SN chosen to avoid insertion of segments from old connections
• ACKs
  – SN of next byte expected from other side
  – Cumulative ACK
• GBN? The TCP spec doesn’t say what to do with premature (out-of-order) packets; it is up to the implementation
[Figure: simple telnet scenario: the user at Host A types ‘C’; Host B ACKs receipt of ‘C’ and echoes ‘C’ back; Host A ACKs receipt of the echoed ‘C’; time flows down]
TCP ACK rules

Event: in-order segment arrival, no gaps, everything else already ACKed
Receiver action: delayed ACK; wait up to 500 ms for the next segment, and if none arrives, send the ACK

Event: in-order segment arrival, no gaps, one delayed ACK pending
Receiver action: immediately send a single cumulative ACK

Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
Receiver action: send a duplicate ACK, indicating the seq. # of the next expected byte

Event: arrival of a segment that partially or completely fills a gap
Receiver action: immediate ACK if the segment starts at the lower end of the gap
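The sketch below is a simplified rendering of these rules: it tracks only the next expected byte and a flag that stands in for the 500 ms delayed-ACK timer, and it omits the out-of-order buffer, so it is an illustration rather than a faithful receiver.

```python
# Simplified sketch of the receiver ACK rules tabulated above.
# State: next_expected (next in-order byte) and ack_pending (stands in for
# the 500 ms delayed-ACK timer); the out-of-order buffer is omitted.

class Receiver:
    def __init__(self):
        self.next_expected = 0
        self.ack_pending = False

    def on_segment(self, seg_start, seg_len):
        if seg_start == self.next_expected:          # in-order arrival
            self.next_expected += seg_len
            if self.ack_pending:                     # one delayed ACK already pending
                self.ack_pending = False
                return f"ACK {self.next_expected} (immediate, cumulative)"
            self.ack_pending = True
            return "delay ACK up to 500 ms"
        if seg_start > self.next_expected:           # out-of-order: gap detected
            return f"duplicate ACK {self.next_expected}"
        return f"ACK {self.next_expected} (data already covered)"
```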
TCP: Retransmission Scenarios

[Figure: retransmission scenarios between Host A and Host B: a lost ACK (Seq=92 retransmitted after its timeout), a premature timeout, and recovery via cumulative ACKs (Seq=92 and Seq=100 timeouts shown); round-trip time (RTT) marked; time flows down]

TCP: Retransmission and Timeouts

[Figure: Host A sends Data1 and Data2 to Host B and receives ACKs; the Retransmission TimeOut (RTO) is the estimated RTT plus a guard band]
• TCP uses an adaptive retransmission timeout value
• Dynamic network (congestion, changes in routing) => RTT cannot be static
TCP: Retransmission and Timeouts

RTO value is important:
  – too big: wait too long to retransmit a packet
  – too small: unnecessarily retransmit packets

Original algorithm for picking RTO:
  1. EstimatedRTT = α · EstimatedRTT + (1 − α) · SampleRTT
  2. RTO = 2 · EstimatedRTT

Characteristics of the original algorithm:
  – Std. dev. implicitly assumed to be bounded by the RTT
  – But if utilization = 75%, there can be a factor of 16 between “typical” (mean ± 2 stdev) short and long RTTs

TCP: Retransmission and Timeouts (Jacobson/Karels alg.)

Newer algorithm estimates the std. dev. of the RTT:
  1. Diff = SampleRTT − EstimatedRTT
  2. EstimatedRTT = EstimatedRTT + δ · Diff   (for some 0 < δ < 1)
  3. Deviation = Deviation + δ · (|Diff| − Deviation)
  4. RTO = μ · EstimatedRTT + φ · Deviation, with μ ≈ 1 and φ ≈ 4
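A minimal sketch of this estimator, assuming δ = 1/8 and an initial EstimatedRTT of 1 s (both values are assumptions for illustration, not from the slides):

```python
# Hedged sketch of the Jacobson/Karels RTO estimator described above.
# delta, mu, and phi follow the slide's update rules; the initial
# EstimatedRTT/Deviation values are assumptions for illustration.

class RTOEstimator:
    def __init__(self, initial_rtt=1.0, delta=0.125, mu=1.0, phi=4.0):
        self.estimated_rtt = initial_rtt   # EstimatedRTT (seconds)
        self.deviation = 0.0               # Deviation (seconds)
        self.delta, self.mu, self.phi = delta, mu, phi

    def update(self, sample_rtt):
        diff = sample_rtt - self.estimated_rtt
        self.estimated_rtt += self.delta * diff
        self.deviation += self.delta * (abs(diff) - self.deviation)
        return self.rto()

    def rto(self):
        return self.mu * self.estimated_rtt + self.phi * self.deviation

# Example: feed in a few RTT samples (seconds)
est = RTOEstimator()
for sample in [0.9, 1.1, 1.0, 2.5, 1.0]:
    print(round(est.update(sample), 3))
```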
TCP: Retransmission and Timeouts (Karn’s Alg.)

• Problem: how to estimate the RTT of retransmitted packets?
[Figure: two ambiguous cases between Host A and Host B; after a retransmission, timing the ACK against either the original transmission or the retransmission gives a wrong RTT sample]
• Solution: don’t! Also: double the RTO.
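A short sketch of Karn's rule on top of an estimator like the one above: skip RTT samples for retransmitted segments and back off the RTO exponentially on each retransmission (the 60 s cap is an assumption).

```python
# Sketch of Karn's rule: take no RTT sample from a retransmitted segment,
# and double the RTO on each retransmission (bounded by an assumed cap).

def on_ack(estimator, was_retransmitted, sample_rtt):
    if not was_retransmitted:          # only sample unambiguous transmissions
        estimator.update(sample_rtt)
    return estimator.rto()

def on_retransmit(current_rto, cap=60.0):
    return min(2 * current_rto, cap)   # exponential backoff
```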
TCP Sliding Window Protocol – Sender Side

• LastByteAcked <= LastByteSent
• LastByteSent <= LastByteWritten
• Buffer bytes between LastByteAcked and LastByteWritten
[Figure: sender buffer of maximum size, showing the advertised window starting at the first unacknowledged byte, the last byte sent within it, and data that is available but outside the window]
TCP Sliding Window Protocol – Receiver Side

• LastByteRead < NextByteExpected
• NextByteExpected <= LastByteRcvd + 1
• Buffer bytes between NextByteRead and LastByteRcvd
[Figure: receive buffer of maximum size, showing the advertised window, buffered out-of-order data, the next byte expected (the ACK value), and the next byte to be read by the application]

TCP Flow Control

• Receiving side
  – Receive buffer size = MaxRcvBuffer
  – LastByteRcvd − LastByteRead <= MaxRcvBuffer
  – AdvertisedWindow = MaxRcvBuffer − (NextByteExpected − NextByteRead)
  – Shrinks as data arrives and grows as the application consumes data
• Sending side
  – Send buffer size = MaxSendBuffer
  – LastByteSent − LastByteAcked <= AdvertisedWindow
  – EffectiveWindow = AdvertisedWindow − (LastByteSent − LastByteAcked)
  – EffectiveWindow > 0 to send data
  – LastByteWritten − LastByteAcked <= MaxSendBuffer
  – Block the sender if (LastByteWritten − LastByteAcked) + y > MaxSendBuffer, where y is the number of bytes the application is trying to write
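The two window formulas above, written out as code with made-up buffer sizes and byte counters:

```python
# Sketch of the window bookkeeping defined above. Variable names follow
# the slides; the concrete byte counts are made-up example values.

MAX_RCV_BUFFER = 16384
MAX_SEND_BUFFER = 16384

def advertised_window(next_byte_expected, next_byte_read):
    # Space remaining in the receive buffer
    return MAX_RCV_BUFFER - (next_byte_expected - next_byte_read)

def effective_window(adv_window, last_byte_sent, last_byte_acked):
    # How much new data the sender may still put on the wire
    return adv_window - (last_byte_sent - last_byte_acked)

adv = advertised_window(next_byte_expected=9000, next_byte_read=5000)
eff = effective_window(adv, last_byte_sent=9000, last_byte_acked=7000)
print(adv, eff)   # 12384 10384
```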
TCP Flow Control

• Problem: slow receiver application
  – Advertised window goes to 0
  – Sender cannot send more data
  – Receiver may not spontaneously generate an update, or the update may be lost
  – Sender gets stuck
• Solution
  – Sender periodically sends a 1-byte segment, ignoring the advertised window of 0
  – Eventually the window opens
  – Sender learns of the opening from the next ACK of the 1-byte segment

TCP Flow Control

• Problem: application delivers tiny pieces of data to TCP
  – Example: telnet in character mode
  – Each piece sent as a segment, returned as an ACK
  – Very inefficient
• Solution
  – Delay transmission to accumulate more data
  – Nagle’s algorithm (sketched below)
    • Send first piece of data
    • Accumulate data until first piece ACK’ed
    • Send accumulated data and restart accumulation
    • Not ideal for some traffic (e.g. mouse motion)
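A minimal sketch of the Nagle send decision described above: send right away when a full segment is ready or nothing is in flight, otherwise keep accumulating until the outstanding data is ACKed. The MSS value and byte counters are illustrative assumptions.

```python
# Minimal sketch of the Nagle decision: send if a full MSS is ready or
# nothing is in flight; otherwise buffer until outstanding data is ACKed.

MSS = 1460

def nagle_should_send(bytes_buffered, bytes_unacked):
    if bytes_buffered >= MSS:      # a full segment is ready
        return True
    if bytes_unacked == 0:         # nothing in flight: send the small piece now
        return True
    return False                   # otherwise keep accumulating

# Example: 1 byte buffered while earlier data is still unACKed -> wait
print(nagle_should_send(bytes_buffered=1, bytes_unacked=500))   # False
print(nagle_should_send(bytes_buffered=1, bytes_unacked=0))     # True
```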
TCP Flow Control

• Problem: slow application reads data in tiny pieces
  – Receiver advertises a tiny window
  – Sender fills the tiny window
  – Known as silly window syndrome
• Solution
  – Advertise a window opening only when an MSS or ½ of the buffer is available
  – Sender delays sending until the window is an MSS or ½ of the receiver’s buffer (estimated)

TCP Bit Allocation Limitations

• Sequence numbers vs. packet lifetime
  – Assumed that IP packets live less than 60 seconds
  – Can we send 2^32 bytes in 60 seconds?
  – Approx. 573 Mbps: less than an STS-12 line
• Advertised window vs. delay × bandwidth
  – Only 16 bits for the advertised window
  – Coast-to-coast RTT = 100 ms
  – Adequate for only 5.24 Mbps!
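The arithmetic behind both limits (approximate, using the slide's 60 s lifetime and 100 ms RTT):

```python
# Quick arithmetic behind the two limits above (values are approximate).

seq_space_bytes = 2**32                      # 32-bit sequence number space
wrap_rate_bps = seq_space_bytes * 8 / 60     # bits/s to wrap in 60 seconds
print(wrap_rate_bps / 1e6)                   # ~572.7 Mbps

max_window_bytes = 2**16                     # 16-bit advertised window
rtt = 0.100                                  # coast-to-coast RTT, seconds
throughput_bps = max_window_bytes * 8 / rtt  # one full window per RTT
print(throughput_bps / 1e6)                  # ~5.24 Mbps
```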
TCP Sequence Numbers (32-bit)

Network    Bandwidth   Time until wraparound
T1         1.5 Mbps    6.4 hours
Ethernet   10 Mbps     57 minutes
T3         45 Mbps     13 minutes
FDDI       100 Mbps    6 minutes
STS-3      155 Mbps    4 minutes
STS-12     622 Mbps    55 seconds
STS-24     1.2 Gbps    28 seconds

TCP Connection Establishment

• 3-way handshake
  – Exchange initial sequence numbers (j, k)
• Message types
  – Synchronize (SYN)
  – Acknowledge (ACK): cumulative!
• Passive open
  – Server listens for a connection from the client
• Active open
  – Client initiates a connection to the server
[Figure: 3-way handshake between client (active open) and server (listening); time flows down]
TCP: Connection Termination

• Message types
  – Finished (FIN)
  – Acknowledge (ACK)
• Active close
  – Sends no more data
• Passive close
  – Accepts no more data
• Connection can be half closed (one-way)
[Figure: FIN/ACK exchange between client and server; time flows down]

TCP State Descriptions

CLOSED        Disconnected
LISTEN        Waiting for incoming connection
SYN_RCVD      Connection request received
SYN_SENT      Connection request sent
ESTABLISHED   Connection ready for data transport
CLOSE_WAIT    Connection closed by peer
LAST_ACK      Connection closed by peer, closed locally, await ACK
FIN_WAIT_1    Connection closed locally
FIN_WAIT_2    Connection closed locally and ACK’d
CLOSING       Connection closed by both sides simultaneously
TIME_WAIT     Wait for network to discard related packets
TCP State Transition Diagram

[Figure: TCP state transition diagram with states CLOSED, LISTEN, SYN_SENT, SYN_RCVD, ESTABLISHED, FIN_WAIT_1, FIN_WAIT_2, CLOSE_WAIT, CLOSING, LAST_ACK and TIME_WAIT; transitions are labeled event/action, e.g. passive open, active open/SYN, send/SYN, SYN/SYN + ACK, SYN + ACK/ACK, ACK, Close/FIN, FIN/ACK, FIN + ACK/ACK, Timeout. The same diagram is repeated on the following slides.]

• Questions
  – State transitions
    • Describe the path taken by a server under normal conditions
    • Describe the path taken by a client under normal conditions
    • Describe the path taken assuming the client closes the connection first
  – TIME_WAIT state
    • What purpose does this state serve?
    • Prove that at least one side of a connection enters this state
    • Explain how both sides might enter this state
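As a hint for the client-path question, here is an illustrative sketch (not the full RFC 793 machine) of the normal client traversal through the diagram, written as a (state, event) to (action, next state) table; the event and action names follow the diagram labels, and everything else is an assumption.

```python
# Illustrative sketch: transition table for the normal client path through
# the diagram above, keyed by (state, event) -> (action, next_state).

TRANSITIONS = {
    ("CLOSED",      "active open"): ("send SYN", "SYN_SENT"),
    ("SYN_SENT",    "SYN+ACK"):     ("send ACK", "ESTABLISHED"),
    ("ESTABLISHED", "close"):       ("send FIN", "FIN_WAIT_1"),
    ("FIN_WAIT_1",  "ACK"):         (None,       "FIN_WAIT_2"),
    ("FIN_WAIT_2",  "FIN"):         ("send ACK", "TIME_WAIT"),
    ("TIME_WAIT",   "timeout"):     (None,       "CLOSED"),
}

state = "CLOSED"
for event in ["active open", "SYN+ACK", "close", "ACK", "FIN", "timeout"]:
    action, state = TRANSITIONS[(state, event)]
    print(f"{event:12s} -> {state:12s} ({action or 'no action'})")
```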
Congestion Control & Avoidance

Congestion

[Figure: hosts H1 (10 Mb/s link) and H2 (100 Mb/s link) send traffic A1(t) and A2(t) through router R1, whose outgoing 1.5 Mb/s link leads to H3; a cumulative-bytes plot over time t shows the arrivals A1(t), A2(t) and A1(t)+A2(t), the departures D(t), and the backlog X(t)]
TCP Congestion Control

• Basic idea: control rate by window size
  – Average rate ≤ (window)/RTT
  – Crude
• Add notion of congestion window
  – Effective window is the minimum of
    • Advertised window (flow control), and
    • Congestion window (congestion control)

Ideal steady state: self-clocking
(ACK arrivals pace the injection of new segments into the network)
TCP Congestion Control

• Objective: determine available capacity
• Idea:
  – “Slow Start”
    • Start-up phase: quickly find the correct rate
  – “Congestion Avoidance”
    • Steady state: gently try to increase rate, back off quickly when congestion is detected
• Phases are determined by the value of the variable ssthresh

Slow Start

• Begin with cwnd = 1 packet
• Increment cwnd by 1 packet for each ACK
• Meaning: double every RTT!
[Figure: segments and ACKs between source and destination during slow start]
Slow Start Implementation

• When starting, or restarting after a timeout, cwnd = 1
• On each ACK for a new segment, cwnd += segSize
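A sketch of this rule with cwnd kept in bytes, as on the slide; the segment size and loop structure are assumptions for illustration. It shows the doubling per RTT:

```python
# Sketch of the slow-start increase described above, with cwnd in bytes.
# SEG_SIZE and the starting values are assumptions.

SEG_SIZE = 1460

def on_timeout():
    return SEG_SIZE                 # restart with cwnd = 1 segment

def on_new_ack(cwnd):
    return cwnd + SEG_SIZE          # +1 segment per ACK => doubles per RTT

cwnd = on_timeout()
for rtt in range(4):                # each RTT, every outstanding segment is ACKed
    acks = cwnd // SEG_SIZE
    for _ in range(acks):
        cwnd = on_new_ack(cwnd)
    print(f"after RTT {rtt + 1}: cwnd = {cwnd // SEG_SIZE} segments")
# prints 2, 4, 8, 16 segments: exponential growth
```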
Slow Start Trace

[Figure: trace of a connection; each dot is a 512 B packet sent, the y-axis is the sequence number, the x-axis is time, and the straight line marks 20 KBps of available bandwidth]
• Without slow start: ~7 KBps; with slow start: ~19 KBps
Host Solutions

• Q: How does the source determine whether or not the network is congested?
• A: A timeout signals packet loss
  – Packet loss is rarely due to transmission error (on wired networks)
  – A lost packet implies congestion!

Congestion is good?

• Empty buffers => low delay, low utilization
• Full buffers => good utilization, but high delay and potential loss
• Real question: how much congestion is too much?
Congestion Avoidance

• Control vs. avoidance
  – Control: minimize the impact of congestion when it occurs
  – Avoidance: avoid producing congestion
• In terms of operating point limits
[Figure: idealized power curve, power vs. load; avoidance aims to operate near the optimal load, control reacts beyond it]

How to get to steady state?

• If we overuse the link => packet loss => decrease rate
• Why increase at all?
  – Must keep probing so as not to leave “dead” bandwidth; the only indication is dropped packets
• Slow start: multiplicative increase
  – Timeout: decrease to 1!
• Symmetric multiplicative increase and decrease: strong oscillation, poor throughput. “Rush-hour effect.”
Rush Hour Effect

• Easy to drive the network into saturation, but difficult for the network to recover
• Analogy to rush-hour traffic
[Figure: arrival and departure rates and the queue size over time]

Additive Increase / Multiplicative Decrease

• Algorithm
  – Increment cwnd by one packet per RTT
    • Linear increase
  – Divide CongestionWindow by two whenever a timeout occurs
    • Multiplicative decrease
[Figure: source and destination exchanging segments and ACKs]
AIMD: additive increase, multiplicative decrease

• Increase window by 1 per RTT
• Decrease window by a factor of 2 on a loss event

Why AIMD?

• Fairness goal: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity
• Model: two sessions, TCP connection 1 and TCP connection 2, compete for bandwidth R at a bottleneck router of capacity R
[Figure: phase plot of Conn 1 throughput vs. Conn 2 throughput (each up to R); the full utilization line separates underutilized from overutilized regions, each unfair to one of the connections; the desired region lies near full utilization with equal shares]
Model assumptions

• Sessions know if the link is overused (losses)
• Sessions don’t know relative rates
• Simplification: sessions respond simultaneously, and in the same direction (both increase or both decrease)

AIMD Convergence

• Additive increase: up at a 45° angle (both connections add 1)
• Multiplicative decrease: down toward the origin
[Figure: phase plot of the two connections’ throughputs; the alternating moves converge to the point of convergence on the full utilization line]
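A toy simulation of this convergence argument, where both flows apply the same AIMD rules against an assumed capacity of 100 units and the starting rates are deliberately unequal:

```python
# Toy simulation of AIMD convergence: on overload both flows halve,
# otherwise both add 1. Capacity and starting rates are example values.

CAPACITY = 100

def step(x1, x2):
    if x1 + x2 > CAPACITY:          # link overused: both see loss
        return x1 / 2, x2 / 2       # multiplicative decrease
    return x1 + 1, x2 + 1           # additive increase

x1, x2 = 10.0, 70.0                 # start far from the fair share
for _ in range(300):
    x1, x2 = step(x1, x2)
print(round(x1, 1), round(x2, 1))   # both approach ~CAPACITY/2
```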
TCP Congestion Avoidance

• When a new segment is ACKed, the sender does the following:
  – If (cwnd < ssthresh) cwnd += segSize
  – else cwnd += segSize · segSize / cwnd
  – (What happens when an ACK arrives for x new segments?)
• On timeout:
  – ssthresh := cwnd / 2
  – cwnd := 1 (i.e., slow start)
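These two rules as code, with cwnd and ssthresh in bytes; the segment size and the lower bound on ssthresh are assumptions:

```python
# Sketch of the per-ACK rule above, with cwnd and ssthresh in bytes;
# segSize*segSize/cwnd adds roughly one segment per RTT in congestion
# avoidance. SEG_SIZE and the minimum ssthresh are assumptions.

SEG_SIZE = 1460

def on_new_ack(cwnd, ssthresh):
    if cwnd < ssthresh:
        return cwnd + SEG_SIZE                   # slow start: +1 segment per ACK
    return cwnd + SEG_SIZE * SEG_SIZE / cwnd     # congestion avoidance: ~+1 segment per RTT

def on_timeout(cwnd):
    ssthresh = max(cwnd / 2, 2 * SEG_SIZE)
    return SEG_SIZE, ssthresh                    # cwnd back to 1 segment, i.e. slow start
```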
Congestion Avoidance Typical Trace

• Trace: sawtooth behavior
[Figure: congestion window in KB (10 to 70) versus time in seconds (1.0 to 10.0), showing the sawtooth]
Fast Retransmit and Fast Recovery

• Problem: crude TCP timeouts lead to idle periods, and slow start is not fast
• Fast retransmit:
  – Use duplicate ACKs to trigger retransmission
• Fast recovery:
  – Skip slow start; go directly to half the last successful cwnd (called ssthresh)
[Figure: sender/receiver timeline; packets 1 through 6 are sent, the receiver repeatedly returns ACK 2 (duplicate ACKs), packet 3 is retransmitted, and the receiver then returns ACK 6]

TCP Congestion Control: summary

• Maintain a threshold window size (“last good estimate”)
• Threshold value
  – Initially set to maximum window size
  – Set to 1/2 of current window on timeout or 3 dup ACKs
• Congestion window drops to 1 on timeout, drops by half on 3 dup ACKs
• When congestion window is smaller than threshold:
  – Double window for each window ACK’d (multiplicative increase)
• When congestion window is larger than threshold:
  – Increase window by one MSS for each window ACK’d
• Try to avoid timeouts by fast retransmit
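A sketch of the 3-duplicate-ACK rule from the summary, keeping the window in segments; the class and its initial values are illustrative, not a real TCP implementation:

```python
# Sketch of the 3-duplicate-ACK rule: halve the window on 3 dup ACKs,
# drop to 1 segment on timeout. Values in segments; names are illustrative.

DUP_ACK_THRESHOLD = 3

class RenoSender:
    def __init__(self):
        self.cwnd = 16          # segments
        self.ssthresh = 64
        self.dup_acks = 0
        self.last_ack = 0

    def on_ack(self, ack_no):
        if ack_no == self.last_ack:                 # duplicate ACK
            self.dup_acks += 1
            if self.dup_acks == DUP_ACK_THRESHOLD:  # fast retransmit + fast recovery
                self.ssthresh = max(self.cwnd // 2, 2)
                self.cwnd = self.ssthresh           # skip slow start
                print(f"fast retransmit of segment {ack_no + 1}")
        else:                                       # new data ACKed
            self.last_ack = ack_no
            self.dup_acks = 0

    def on_timeout(self):
        self.ssthresh = max(self.cwnd // 2, 2)
        self.cwnd = 1                               # back to slow start
```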
TCP Congestion Window Trace

• TCP Reno
[Figure: congestion window and threshold (0 to 70 KB) versus time (0 to 60 s), showing a slow start period, additive increase, fast retransmissions, and timeouts]

TCP Dynamics: Rate

• Sending rate: Congwin · MSS / RTT
• Assume fixed RTT
[Figure: window sawtooth oscillating between W/2 and W]
• Actual sending rate:
  – between (1/2) · W · MSS / RTT and W · MSS / RTT
  – average (3/4) · W · MSS / RTT
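A quick numeric check of these bounds for example values (W in segments, MSS in bytes, RTT in seconds; all numbers are made up):

```python
# Rate bounds over the sawtooth for example values.
W, MSS, RTT = 20, 1460, 0.1

low  = (W / 2) * MSS / RTT      # just after a halving
high = W * MSS / RTT            # just before a loss
avg  = 0.75 * W * MSS / RTT     # average over the sawtooth

print(low, high, avg)           # 146000.0 292000.0 219000.0 bytes/s
```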
TCP Dynamics: Loss

• Loss rate (TCP Reno)
• Consider one cycle of the sawtooth (the window grows from W/2 to W)
[Figure: window sawtooth between W/2 and W over one cycle]
• Total packets sent per cycle: about (3/8) · W² = O(W²)
• One packet loss per cycle
• Loss probability: p = O(1/W²), or W = O(1/√p)

Congestion Avoidance

• TCP’s strategy: increase load until congestion occurs, then back off
• Alternative strategy
  – Predict when congestion is about to happen and reduce rate just before packets start being discarded
• Two possibilities
  – Some help from the network:
    • DECbit, RED
  – Host-centric:
    • TCP Vegas