2: Application Layer

Multiplexing/demultiplexing
Recall: segment - unit of data exchanged between transport layer entities
❍ aka TPDU: transport protocol data unit

Multiplexing: gathering data from multiple app processes, enveloping data with header (later used for demultiplexing)

[figure: application-layer data M is enveloped with a transport header Ht to form a segment; the network layer adds header Hn; the segment is delivered to the receiver]
multiplexing/demultiplexing:
❒ based on sender, receiver port numbers, IP addresses
❍ source, dest port #s in each segment
❍ recall: well-known port numbers for specific applications

[figure: segments from app processes P1-P4 are multiplexed/demultiplexed through the transport layers of three hosts]
Multiplexing/demultiplexing: examples

port use: simple telnet app
[figure: host A ↔ server B; A→B segment: source port x, dest. port 23; B→A segment: source port 23, dest. port x]

port use: Web server
[figure: Web client host A (Source IP: A, source port x) and Web client host C (Source IP: C, source ports x and y) each send segments with Dest IP: B, dest. port 80 to Web server B]

TCP/UDP segment format (32 bits wide): source port #, dest port #, other header fields, application data (message)
Demultiplexing: delivering received segments to correct app layer processes
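The port-based dispatch described above can be sketched as a lookup table keyed on the destination port in each segment header. This is an illustrative sketch in Python (the deck itself uses pseudocode); the segment fields and process objects are simplified stand-ins for real sockets.

```python
# Sketch of transport-layer demultiplexing: a dispatch table keyed on
# the destination port number carried in each segment header. The
# "processes" here are just lists standing in for app-layer sockets.

def demultiplex(segment, port_table):
    """Deliver the segment's data to the process bound to its dest port."""
    process = port_table.get(segment["dest_port"])
    if process is None:
        return False                      # no socket bound to this port: discard
    process.append(segment["data"])       # hand data up to the application
    return True

# Two app processes bound to well-known ports (telnet: 23, HTTP: 80).
telnet_proc, web_proc = [], []
table = {23: telnet_proc, 80: web_proc}

demultiplex({"source_port": 1234, "dest_port": 80, "data": "GET /"}, table)
demultiplex({"source_port": 5678, "dest_port": 23, "data": "C"}, table)
```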
Transport services and protocols
❒ provide logical communication between app’ processes running on different hosts
❒ transport protocols run in end systems
❒ transport vs network layer services:
❍ network layer: data transfer between end systems
❍ transport layer: data transfer between processes
❍ relies on, enhances, network layer services

[figure: logical end-end transport between the application/transport layers of two end systems; intermediate routers implement only the network, data link, and physical layers]

Transport-layer protocols
Internet transport services:
❒ reliable, in-order unicast delivery (TCP)
❍ congestion
❍ flow control
❍ connection setup
❒ unreliable (“best-effort”), unordered unicast or multicast delivery: UDP
❒ services not available:
❍ real-time
❍ bandwidth guarantees
❍ reliable multicast

UDP: User Datagram Protocol [RFC 768]
❒ “no frills,” “bare bones” Internet transport protocol
❒ “best effort” service, UDP segments may be:
❍ lost
❍ delivered out of order to app
❒ connectionless:
❍ no handshaking between UDP sender, receiver
❍ each UDP segment handled independently of others
Why is there a UDP?
❒ no connection establishment (which can add delay)
❒ simple: no connection state at sender, receiver
❒ small segment header
❒ no congestion control: UDP can blast away as fast as desired
UDP: more
❒ often used for streaming multimedia apps
❍ loss tolerant
❍ rate sensitive
❒ other UDP uses (why?):
❍ DNS
❍ SNMP
❒ reliable transfer over UDP: add reliability at application layer
❍ application-specific error recovery!

UDP segment format (32 bits wide): source port #, dest port #, length, checksum, application data (message). Length is in bytes of UDP segment, including header.

UDP checksum
Goal: detect “errors” (e.g., flipped bits) in transmitted segment

Sender:
❒ treat segment contents as sequence of 16-bit integers
❒ checksum: addition (1’s complement sum) of segment contents
❒ sender puts checksum value into UDP checksum field

Receiver:
❒ compute checksum of received segment
❒ check if computed checksum equals checksum field value:
❍ NO - error detected
❍ YES - no error detected. But maybe errors nonetheless? More later ….
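The sender and receiver steps above can be sketched directly. This is an illustrative Python version (the deck itself stays at the pseudocode level); the 16-bit words below are arbitrary sample data, not a real UDP header.

```python
# 1's complement checksum over 16-bit words, as described above:
# sender stores the complement of the sum; receiver sums data plus
# checksum and expects all-ones (0xFFFF) when no error is detected.

def ones_complement_sum16(words):
    """1's complement sum of 16-bit words, folding carries back in."""
    s = 0
    for w in words:
        s += w
        s = (s & 0xFFFF) + (s >> 16)     # end-around carry
    return s

def udp_checksum(words):
    """Sender side: complement of the 1's complement sum."""
    return ~ones_complement_sum16(words) & 0xFFFF

def verify(words, checksum):
    """Receiver side: data plus checksum must sum to 0xFFFF."""
    return ones_complement_sum16(words + [checksum]) == 0xFFFF

data = [0x4500, 0x0030, 0xAAAA]          # arbitrary 16-bit sample words
c = udp_checksum(data)
assert verify(data, c)                           # no error detected
assert not verify([0x4500, 0x0031, 0xAAAA], c)   # flipped bit detected
```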
Principles of Reliable data transfer
❒ important in app., transport, link layers
❒ top-10 list of important networking topics!
❒ characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)

Reliable data transfer: getting started
rdt_send(): called from above (e.g., by app.). Passed data to deliver to receiver upper layer
deliver_data(): called by rdt to deliver data to upper layer
udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
rdt_rcv(): called when packet arrives on rcv-side of channel

[figure: rdt_send/deliver_data form the send-side and receive-side interfaces to the upper layer; udt_send/rdt_rcv form the interfaces to the unreliable channel]
Reliable data transfer: getting started
We’ll:
❒ incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
❒ consider only unidirectional data transfer
❍ but control info will flow on both directions!
❒ use finite state machines (FSM) to specify sender, receiver

state: when in this “state” next state uniquely determined by next event
[FSM notation: a transition from state 1 to state 2 is labeled with the event causing the state transition and the actions taken on the state transition]

Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
❍ range of sequence numbers must be increased
❍ buffering at sender and/or receiver
❒ Two generic forms of pipelined protocols: go-Back-N, selective repeat
Go-Back-N
Sender:
❒ k-bit seq # in pkt header
❒ “window” of up to N, consecutive unack’ed pkts allowed
❒ ACK(n): ACKs all pkts up to, including seq # n - “cumulative ACK”
❍ may receive duplicate ACKs (see receiver)
❒ timer for each in-flight pkt
❒ timeout(n): retransmit pkt n and all higher seq # pkts in window

GBN in action
[figure: sender/receiver timeline showing cumulative ACKs and go-back-N retransmission of the whole window after a timeout]
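The sender rules above (window limit, cumulative ACK, retransmit-all on timeout) can be sketched as a small state machine. This is an illustrative Python sketch, not the deck's protocol: timers and the actual channel are stubbed out, and the `sent` list simply logs every (re)transmission.

```python
# Toy Go-Back-N sender: window bookkeeping only. base is the oldest
# unACKed seq #; nextseq is the next seq # to use; sent logs every
# transmission and retransmission.

class GBNSender:
    def __init__(self, N):
        self.N = N                 # window size
        self.base = 0              # oldest unACKed seq #
        self.nextseq = 0
        self.sent = []             # log of (re)transmissions

    def send(self):
        if self.nextseq < self.base + self.N:   # room in window?
            self.sent.append(self.nextseq)
            self.nextseq += 1
            return True
        return False               # window full: refuse new data

    def ack(self, n):
        # cumulative ACK: everything up to and including n is ACKed
        if n >= self.base:
            self.base = n + 1

    def timeout(self):
        # retransmit pkt base and all higher seq # pkts in the window
        for s in range(self.base, self.nextseq):
            self.sent.append(s)

s = GBNSender(N=4)
for _ in range(4):
    s.send()
assert not s.send()      # window of 4 is full
s.ack(1)                 # cumulative ACK covers 0 and 1
assert s.send()          # window slides, room again
s.timeout()              # retransmits 2, 3, 4
```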
Selective Repeat
❒ receiver individually acknowledges all correctly received pkts
❍ buffers pkts, as needed, for eventual in-order delivery to upper layer
❒ sender only resends pkts for which ACK not received
❍ sender timer for each unACKed pkt
❒ sender window
❍ N consecutive seq #’s
❍ again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows
[figure: sender and receiver window diagrams over the sequence-number space]
Selective repeat

sender
data from above:
❒ if next available seq # in window, send pkt
timeout(n):
❒ resend pkt n, restart timer
ACK(n) in [sendbase,sendbase+N]:
❒ mark pkt n as received
❒ if n smallest unACKed pkt, advance window base to next unACKed seq #

receiver
pkt n in [rcvbase, rcvbase+N-1]:
❒ send ACK(n)
❒ out-of-order: buffer
❒ in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
❒ ACK(n)
otherwise:
❒ ignore

Selective repeat in action
[figure: sender/receiver timeline with individual ACKs and buffered out-of-order pkts]
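The receiver rules above can be sketched as a small class. This is an illustrative Python sketch under the deck's assumptions (window N, in-window pkts ACKed and buffered, below-window pkts re-ACKed, everything else ignored); names like `rcvbase` follow the slides.

```python
# Sketch of the selective-repeat receiver: ACK every correctly received
# pkt in [rcvbase, rcvbase+N-1], buffer out-of-order pkts, deliver
# in-order runs, re-ACK pkts in [rcvbase-N, rcvbase-1], ignore the rest.

class SRReceiver:
    def __init__(self, N):
        self.N = N
        self.rcvbase = 0
        self.buffer = {}           # out-of-order pkts awaiting delivery
        self.delivered = []

    def receive(self, n, data):
        if self.rcvbase <= n < self.rcvbase + self.N:
            self.buffer[n] = data
            # deliver any in-order run starting at rcvbase
            while self.rcvbase in self.buffer:
                self.delivered.append(self.buffer.pop(self.rcvbase))
                self.rcvbase += 1
            return ("ACK", n)
        if self.rcvbase - self.N <= n < self.rcvbase:
            return ("ACK", n)      # duplicate: re-ACK so sender can advance
        return None                # outside both windows: ignore

r = SRReceiver(N=4)
r.receive(1, "b")                  # out of order: buffered, still ACKed
r.receive(0, "a")                  # fills the gap: "a", "b" delivered
assert r.delivered == ["a", "b"]
```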
Selective repeat: dilemma
Example:
❒ seq #’s: 0, 1, 2, 3
❒ window size=3
❒ receiver sees no difference in two scenarios!
❒ incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?

TCP: Overview [RFCs: 793, 1122, 1323, 2018, 2581]
❒ point-to-point:
❍ one sender, one receiver
❒ reliable, in-order byte stream:
❍ no “message boundaries”
❒ pipelined:
❍ TCP congestion and flow control set window size
❒ send & receive buffers
[figure: sending application writes data into the TCP send buffer at the socket door; segments carry it to the receiver’s TCP receive buffer, from which the application reads data]
❒ full duplex data:
❍ bi-directional data flow in same connection
❍ MSS: maximum segment size
❒ connection-oriented:
❍ handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
❒ flow controlled:
❍ sender will not overwhelm receiver
TCP segment structure (32 bits wide):

  source port # | dest port #
  sequence number
  acknowledgement number
  head len | not used | U A P R S F | rcvr window size
  checksum | ptr urgent data
  Options (variable length)
  application data (variable length)

❒ URG: urgent data (generally not used)
❒ ACK: ACK # valid
❒ PSH: push data now (generally not used)
❒ RST, SYN, FIN: connection estab (setup, teardown commands)
❒ sequence number, acknowledgement number: counting by bytes of data (not segments!)
❒ checksum: Internet checksum (as in UDP)
❒ rcvr window size: # bytes rcvr willing to accept
TCP seq. #’s and ACKs
Seq. #’s:
❍ byte stream “number” of first byte in segment’s data
ACKs:
❍ seq # of next byte expected from other side
❍ cumulative ACK
Q: how receiver handles out-of-order segments
❍ A: TCP spec doesn’t say - up to implementor

TCP: reliable data transfer
Simplified TCP sender, assuming:
• one way data transfer
• no flow, congestion control
Wait for event, then:
❒ event: data received from application above → create, send segment
❒ event: timer timeout for segment with seq # y → retransmit segment
❒ event: ACK received, with ACK # y → ACK processing
simple telnet scenario
[figure: Host A ↔ Host B. User types ‘C’; A sends Seq=42, ACK=79, data = ‘C’. Host B ACKs receipt of ‘C’ and echoes back ‘C’: Seq=79, ACK=43, data = ‘C’. Host A ACKs receipt of the echoed ‘C’: Seq=43, ACK=80.]
sendbase = initial_sequence number
nextseqnum = initial_sequence number

loop (forever) {
  switch(event)

  event: data received from application above
    create TCP segment with sequence number nextseqnum
    start timer for segment nextseqnum
    pass segment to IP
    nextseqnum = nextseqnum + length(data)

  event: timer timeout for segment with sequence number y
    retransmit segment with sequence number y
    compute new timeout interval for segment y
    restart timer for sequence number y

  event: ACK received, with ACK field value of y
    if (y > sendbase) { /* cumulative ACK of all data up to y */
      cancel all timers for segments with sequence numbers < y
      sendbase = y
    }
    else { /* a duplicate ACK for already ACKed segment */
      increment number of duplicate ACKs received for y
      if (number of duplicate ACKs received for y == 3) {
        /* TCP fast retransmit */
        resend segment with sequence number y
        restart timer for segment y
      }
    }
} /* end of loop forever */
TCP ACK generation [RFC 1122, RFC 2581]

Event: in-order segment arrival, no gaps, everything else already ACKed
  TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Event: in-order segment arrival, no gaps, one delayed ACK pending
  TCP Receiver action: immediately send single cumulative ACK
Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
  TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
Event: arrival of segment that partially or completely fills gap
  TCP Receiver action: immediate ACK if segment starts at lower end of gap

TCP: retransmission scenarios
[figure: lost ACK scenario - Host A sends Seq=92, 8 bytes data; Host B’s ACK=100 is lost; A times out and retransmits Seq=92. Premature timeout with cumulative ACKs - A sends Seq=92 (8 bytes) and Seq=100 (20 bytes); the Seq=92 timer expires prematurely and A retransmits, but the cumulative ACK=120 already covers both segments]
TCP Flow Control
flow control: sender won’t overrun receiver’s buffers by transmitting too much, too fast

receiver buffering:
❍ RcvBuffer = size of TCP Receive Buffer
❍ RcvWindow = amount of spare room in Buffer
❒ receiver: explicitly informs sender of (dynamically changing) amount of free buffer space
❍ RcvWindow field in TCP segment
❒ sender: keeps the amount of transmitted, unACKed data less than most recently received RcvWindow

TCP Round Trip Time and Timeout
Q: how to set TCP timeout value?
❒ longer than RTT
❍ note: RTT will vary
❒ too short: premature timeout
❍ unnecessary retransmissions
❒ too long: slow reaction to segment loss

Q: how to estimate RTT?
❒ SampleRTT: measured time from segment transmission until ACK receipt
❍ ignore retransmissions, cumulatively ACKed segments
❒ SampleRTT will vary, want estimated RTT “smoother”
❍ use several recent measurements, not just current SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
❒ Exponential weighted moving average
❒ influence of given sample decreases exponentially fast
❒ typical value of x: 0.1

Setting the timeout
❒ EstimatedRTT plus “safety margin”
❒ large variation in EstimatedRTT -> larger safety margin

Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
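The EstimatedRTT, Deviation, and Timeout formulas can be traced numerically. This is an illustrative Python sketch: the SampleRTT values are made up, the initial values are assumptions, and whether Deviation uses the old or the freshly updated EstimatedRTT is left open by the formulas (the updated value is used here).

```python
# EWMA RTT estimate and timeout rule from the formulas above, x = 0.1.
# SampleRTT values (in seconds) are illustrative, not measured.

def update(est_rtt, deviation, sample_rtt, x=0.1):
    est_rtt = (1 - x) * est_rtt + x * sample_rtt
    deviation = (1 - x) * deviation + x * abs(sample_rtt - est_rtt)
    timeout = est_rtt + 4 * deviation
    return est_rtt, deviation, timeout

est, dev = 0.100, 0.0                    # assumed initial values
for sample in [0.105, 0.095, 0.180]:     # last sample: a delayed ACK
    est, dev, timeout = update(est, dev, sample)
```

After the large 180 ms sample, Deviation grows and the timeout moves well above EstimatedRTT, which is exactly the "safety margin" behavior described above.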
TCP Connection Management
Recall: TCP sender, receiver establish “connection” before exchanging data segments
❒ initialize TCP variables:
❍ seq. #s
❍ buffers, flow control info (e.g. RcvWindow)
❒ client: connection initiator
  Socket clientSocket = new Socket("hostname","port number");
❒ server: contacted by client
  Socket connectionSocket = welcomeSocket.accept();

Three way handshake:
Step 1: client end system sends TCP SYN control segment to server
❍ specifies initial seq #
Step 2: server end system receives SYN, replies with SYNACK control segment
❍ ACKs received SYN
❍ allocates buffers
❍ specifies server->receiver initial seq. #
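At the application layer the handshake is implicit in the connect/accept calls, mirroring the Java lines above. A loopback sketch using Python's standard socket module (port 0 asks the OS to pick a free port; everything here runs on one machine for illustration):

```python
# The OS performs the SYN / SYNACK / ACK exchange inside connect();
# accept() on the server side corresponds to welcomeSocket.accept().

import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))           # 0: let the OS pick a free port
server.listen(1)                        # like Java's welcomeSocket
port = server.getsockname()[1]

client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
client.connect(("127.0.0.1", port))     # three-way handshake happens here

conn, addr = server.accept()            # connection socket for this client
client.sendall(b"hello")
data = b""
while len(data) < 5:                    # TCP is a byte stream: loop on recv
    data += conn.recv(5 - len(data))

client.close()                          # sends FIN; teardown begins
conn.close()
server.close()
```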
TCP Connection Management (cont.)
Closing a connection: client closes socket: clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
Step 3: client receives FIN, replies with ACK.
❍ Enters “timed wait” - will respond with ACK to received FINs
Step 4: server receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[figure: client/server timeline - client sends FIN, server ACKs and sends its own FIN, client ACKs and enters timed wait before the connection is closed on both sides]
TCP Connection Management (cont.)
[figure: TCP client lifecycle and TCP server lifecycle state diagrams]

Principles of Congestion Control
Congestion:
❒ informally: “too many sources sending too much data too fast for network to handle”
❒ different from flow control!
❒ manifestations:
❍ lost packets (buffer overflow at routers)
❍ long delays (queueing in router buffers)
❒ a top-10 problem!

Causes/costs of congestion: scenario 1
❒ two senders, two receivers
❒ one router, infinite buffers
❒ no retransmission
❒ large delays when congested
❒ maximum achievable throughput

Causes/costs of congestion: scenario 2
❒ one router, finite buffers
❒ sender retransmission of lost packet
Causes/costs of congestion: scenario 2
❒ always: λin = λout (goodput)
❒ “perfect” retransmission only when loss: λ'in > λout
❒ retransmission of delayed (not lost) packet makes λ'in larger (than perfect case) for same λout
“costs” of congestion:
❒ more work (retrans) for given “goodput”
❒ unneeded retransmissions: link carries multiple copies of pkt

Causes/costs of congestion: scenario 3
❒ four senders
❒ multihop paths
❒ timeout/retransmit
Q: what happens as λin and λ'in increase?
Causes/costs of congestion: scenario 3
Another “cost” of congestion:
❒ when packet dropped, any “upstream transmission capacity used for that packet was wasted!

Approaches towards congestion control
Two broad approaches towards congestion control:
End-end congestion control:
❒ no explicit feedback from network
❒ congestion inferred from end-system observed loss, delay
❒ approach taken by TCP
Network-assisted congestion control:
❒ routers provide feedback to end systems
❍ single bit indicating congestion (SNA, DECbit, TCP/IP ECN, ATM)
❍ explicit rate sender should send at

TCP Congestion Control
❒ end-end control (no network assistance)
❒ transmission rate limited by congestion window size, Congwin, over segments
❒ w segments, each with MSS bytes sent in one RTT:
  throughput = w * MSS / RTT Bytes/sec
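The throughput formula can be evaluated directly; the numbers below (a common MSS of 1460 bytes, a 100 ms RTT) are illustrative.

```python
# throughput = w * MSS / RTT (bytes/sec), for a window of w segments
# of MSS bytes each sent per RTT.

def tcp_throughput(w, mss_bytes, rtt_seconds):
    return w * mss_bytes / rtt_seconds   # bytes/sec

rate = tcp_throughput(10, 1460, 0.100)   # 10 segments, 100 ms RTT
```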
TCP congestion control:
❒ “probing” for usable bandwidth:
❍ ideally: transmit as fast as possible (Congwin as large as possible) without loss
❍ increase Congwin until loss (congestion)
❍ loss: decrease Congwin, then begin probing (increasing) again
❒ two “phases”
❍ slow start
❍ congestion avoidance
❒ important variables:
❍ Congwin
❍ threshold: defines threshold between the two phases: slow start phase, congestion control phase
TCP Slowstart
Slowstart algorithm:
  initialize: Congwin = 1
  for (each segment ACKed)
    Congwin++
  until (loss event OR CongWin > threshold)
❒ exponential increase (per RTT) in window size (not so slow!)
❒ loss event: timeout (Tahoe TCP) and/or three duplicate ACKs (Reno TCP)
[figure: Host A sends one segment, then two segments, then four segments, each burst one RTT apart]

TCP Congestion Avoidance
Congestion avoidance:
  /* slowstart is over */
  /* Congwin > threshold */
  Until (loss event) {
    every w segments ACKed:
      Congwin++
  }
  threshold = Congwin/2
  Congwin = 1
  perform slowstart¹

¹: TCP Reno skips slowstart (fast recovery) after three duplicate ACKs
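The two phases can be traced per RTT. This is an illustrative Tahoe-style Python sketch of the pseudocode above: one update per RTT, loss timing assumed, and the slow-start/avoidance boundary simplified to "double while Congwin is below threshold".

```python
# Toy Congwin trace: exponential growth in slow start, +1 per RTT in
# congestion avoidance, and on loss: threshold = Congwin/2, Congwin = 1.

def step(congwin, threshold, loss):
    if loss:                              # timeout or 3 duplicate ACKs
        return 1, max(congwin // 2, 1)
    if congwin < threshold:
        return congwin * 2, threshold     # slow start: double per RTT
    return congwin + 1, threshold         # congestion avoidance

congwin, threshold = 1, 8
trace = []
losses = {6}                              # assume one loss event, at RTT 6
for rtt in range(10):
    trace.append(congwin)
    congwin, threshold = step(congwin, threshold, rtt in losses)
```

The trace shows the characteristic sawtooth: 1, 2, 4, 8 (slow start), then 9, 10, 11 (congestion avoidance), then the loss resets Congwin to 1 with threshold 5 and slow start begins again.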
AIMD
TCP congestion avoidance:
❒ AIMD: additive increase, multiplicative decrease
❍ increase window by 1 per RTT
❍ decrease window by factor of 2 on loss event

TCP Fairness
Fairness goal: if N TCP sessions share same bottleneck link, each should get 1/N of link capacity
[figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R]

Why is TCP fair?
Two competing sessions:
❒ Additive increase gives slope of 1, as throughput increases
❒ multiplicative decrease decreases throughput proportionally
[figure: Connection 1 throughput vs Connection 2 throughput; repeated cycles of additive increase (along slope 1) and multiplicative decrease (by factor of 2 on loss) converge toward the equal bandwidth share line, below capacity R]
TCP latency modeling
Q: How long does it take to receive an object from a Web server after sending a request?
❒ TCP connection establishment
❒ data transfer delay

Notation, assumptions:
❒ Assume one link between client and server of rate R
❒ Assume: fixed congestion window, W segments
❒ S: MSS (bits)
❒ O: object size (bits)
❒ no retransmissions (no loss, no corruption)

Two cases to consider:
❒ WS/R > RTT + S/R: ACK for first segment in window returns before window’s worth of data sent
❒ WS/R < RTT + S/R: wait for ACK after sending window’s worth of data

TCP latency Modeling
K := O/WS (number of windows that cover the object)
Case 1: latency = 2RTT + O/R
Case 2: latency = 2RTT + O/R + (K-1)[S/R + RTT - WS/R]
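The two fixed-window cases can be packaged as a function. This is an illustrative Python sketch; the test numbers use S/R = 1 time unit and are not from the slides, and K is rounded up when O is not a whole number of windows (an assumption, since the slides write K = O/WS).

```python
import math

# Fixed-window latency: case 1 when the window keeps the pipe full,
# case 2 with a stall of S/R + RTT - WS/R after each of the first
# K-1 windows. Units: O, S in bits; R in bits/sec; RTT in seconds.

def latency_fixed_window(O, S, R, RTT, W):
    K = math.ceil(O / (W * S))           # number of windows covering O
    if W * S / R > RTT + S / R:          # case 1: ACK returns in time
        return 2 * RTT + O / R
    # case 2: sender idles after each of the first K-1 windows
    return 2 * RTT + O / R + (K - 1) * (S / R + RTT - W * S / R)
```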
TCP Latency Modeling: Slow Start
❒ Now suppose window grows according to slow start.
❒ Will show that the latency of one object of size O is:

  Latency = 2RTT + O/R + P[RTT + S/R] - (2^P - 1)S/R

where P is the number of times TCP stalls at server:
  P = min{Q, K-1}
- where Q is the number of times the server would stall if the object were of infinite size.
- and K is the number of windows that cover the object.

TCP Latency Modeling: Slow Start (cont.)
Example:
❒ O/S = 15 segments
❒ K = 4 windows
❒ Q = 2
❒ P = min{K-1,Q} = 2
❒ Server stalls P=2 times.
[figure: timeline - initiate TCP connection, request object; first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, separated by RTTs; complete transmission, object delivered; time at client, time at server]
TCP Latency Modeling: Slow Start (cont.)

  S/R + RTT = time from when server starts to send segment until server receives acknowledgement
  2^(k-1) * S/R = time to transmit the kth window
  [S/R + RTT - 2^(k-1) * S/R] = stall time after the kth window

  latency = O/R + 2RTT + Σ_{p=1..P} stallTime_p
          = O/R + 2RTT + Σ_{k=1..P} [S/R + RTT - 2^(k-1) * S/R]
          = O/R + 2RTT + P[RTT + S/R] - (2^P - 1) * S/R
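The derivation above can be checked by simulating the timeline directly: send windows of 1, 2, 4, ... segments and add a stall whenever the window drains before its ACK returns. This is an illustrative Python sketch; the test uses the worked example's O/S = 15 segments with S/R = 1 time unit and RTT = 2, chosen so that Q = 2 and P = 2.

```python
import math

# Direct simulation of the slow-start latency timeline: the kth window
# holds 2**(k-1) segments; after it the server stalls for
# S/R + RTT - 2**(k-1) * S/R when that is positive and data remains.

def latency_slow_start(O, S, R, RTT):
    segments_left = math.ceil(O / S)
    t = 2 * RTT                     # connection setup + request
    k = 1                           # window index
    while segments_left > 0:
        send = min(2 ** (k - 1), segments_left)
        t += send * S / R           # transmit the kth window
        segments_left -= send
        if segments_left > 0:       # possible stall before next window
            t += max(S / R + RTT - 2 ** (k - 1) * S / R, 0)
        k += 1
    return t

# Worked example in units of S/R = 1, RTT = 2, O = 15 segments: P = 2,
# so the closed form gives O/R + 2RTT + P(RTT + S/R) - (2**P - 1)S/R.
P = 2
closed_form = 15 + 2 * 2 + P * (2 + 1) - (2 ** P - 1) * 1
```

The simulation and the closed form agree, which is exactly what the derivation claims.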
Summary
❒ principles behind transport layer services:
❍ multiplexing/demultiplexing
❍ reliable data transfer
❍ flow control
❍ congestion control
❒ instantiation and implementation in the Internet
❍ UDP
❍ TCP

Next:
❒ leaving the network “edge” (application, transport layer)
❒ into the network “core”