Ch25 TCP: Reliable Transport Service

Department of Engineering Science
ES465/CES 440, Intro. to Networking & Network Management
TCP- Reliable Transport Service
http://www.sonoma.edu/users/k/kujoory
References
• “Computer Networks & Internet,” Douglas Comer, 6th ed, Pearson, 2014, Ch 25,
Textbook, 5th ed, slides by Lami Kaya (LKaya@ieee.org) with some changes.
• “Computer Networks,” A. Tanenbaum, 5th ed., Prentice Hall, 2011, ISBN:
13:978013212695-3.
• “Computer & Communication Networks,” Nader F. Mir, 2nd ed, Prentice Hall, 2015, ISBN:
13: 9780133814743.
• “Data Communications Networking,” Behrouz A. Forouzan, 4th ed, Mc-Graw Hill, 2007
• “Data & Computer Communications,” W. Stallings, 7th ed., Prentice Hall, 2004.
• “Computer Networks: A Systems Approach," L. Peterson, B. Davie, 4th Ed., Morgan
Kaufmann 2007.
Ali Kujoory
6/30/2016
Not to be reproduced without permission
Topics Covered
• 25.1 Introduction
• 25.2 The Transmission Control Protocol
• 25.3 The Service TCP Provides to Applications
• 25.4 End-to-End Service & Virtual Connections
• 25.5 Techniques That Transport Protocols Use
• 25.6 Techniques to Avoid Congestion
• 25.7 The Art of Protocol Design
• 25.8 Techniques Used in TCP to Handle Packet Loss
• 25.9 Adaptive Retransmission
• 25.10 Comparison of Retransmission Times
• 25.11 Buffers, Flow Control, & Windows
• 25.12 TCP's Three-Way Handshake
• 25.13 TCP Congestion Control
• 25.14 TCP Segment Format
25.1 Introduction
• This chapter
– considers transport protocols in general
– examines TCP
• the major transport protocol used in the Internet
– explains how the TCP protocol provides reliable delivery
– reviews the service that TCP provides to applications
– examines the techniques TCP uses to achieve reliability
• Service is the communication between TCP & the user application
– it uses primitives (e.g., send, receive, connect, disconnect) in software
• Protocol is the standardized communication between peer entities (TCP-TCP)
• An entity is a piece of software that does the job
25.2 The Transmission Control Protocol (TCP)
• Although IP provides a best-effort (unreliable) service, TCP
software must:
– Guarantee prompt & reliable communication
– Deliver data in exactly the same order that it was sent
– Allow no loss or duplication
• In the TCP/IP suite, the TCP provides reliable transport
service
25.3 The Service TCP Provides to Applications
The service offered by TCP has the following features:
• Connection-Oriented
1. an application must first request a connection to a destination
2. transfer data in order &
3. terminate connection gracefully
• Point-to-Point Communication
– each TCP connection has exactly two endpoints
• Complete Reliability
– TCP guarantees that the data sent across a connection will be delivered completely & in order
• Full Duplex Communication
– allows data to flow in either direction
• Stream Interface
– an application sends a continuous sequence of octets
– it does not group data into records or messages
• Reliable Connection Startup
– TCP allows the two applications to reliably start communication
• Graceful Connection Shutdown
– TCP ensures that both sides have agreed to shut down the connection
Overview of OSI & TCP/IP Protocol Suites
OSI Stack vs. TCP/IP Stack (by layer)
• Application, Presentation, Session (OSI): TCP/IP application protocols
– shown over TCP: File Transfer Protocol (FTP), RFC 959; Simple Mail Transfer Protocol (SMTP), RFC 821, 822; TELNET (Terminal Emulation), RFC 854, 861; Hypertext Transfer Protocol (HTTP), RFC 2616; ..
– shown over UDP: Domain Name System (DNS), RFC 1034-1035; Simple Network Management Protocol (SNMP), RFC 1441-1452; Video and Voice over IP
• Transport
– Transmission Control Protocol (TCP), RFC 793, 1122, 1323
– User Datagram Protocol (UDP), RFC 768
• Network
– Internet Protocol (IP), RFC 791
– Internet Control Message Protocol (ICMP), RFC 792
– Address Resolution: ARP RFC 826, RARP RFC 903
• Data Link
– Network Interface Cards: Ethernet, Token Ring, RFC 894, RFC 1042, RFC 1231
• Physical
– Transmission Media: Twisted Pair, Coax, or Fiber Optics
Notes:
• Other applications: Security, ..
• Trivial FTP (TFTP, RFC 783) runs over UDP
• Original RFCs are shown. For updated RFCs go to http://www.ietf.org
25.4 End-to-End Service & Virtual Connections
• TCP is classified as an end-to-end protocol
– It provides communication between an application on one
computer to an application on another computer
• The connections in TCP are called virtual connections
– because connections are achieved in software
• TCP software modules on two machines exchange
messages to achieve the illusion of a connection
– TCP provides the reliable delivery service for the application
• TCP uses IP to carry messages
– IP treats each TCP message as data to be transferred
– IP provides TCP the delivery service
• Fig. 25.1 illustrates how TCP views the Internet
– TCP software is needed at each end of a virtual connection
• but not on intermediate routers
25.4 End-to-End Service & Virtual Connections
Figure 25.1 Illustration of how TCP views the underlying Internet.
25.5 Techniques That Transport Protocols Use
• An end-to-end transport protocol must be carefully designed to achieve efficient & reliable transfer
• The major problems / issues to be considered are:
1. Unreliable Communication by IP beneath
• Messages sent across the Internet can be
– lost, duplicated, corrupted, delayed, or delivered out of order
2. End System Reboot
• either of the two end systems might crash & reboot
3. Heterogeneous End Systems
• Different OSs, e.g., Windows vs Apple
• Speed: a sender can generate data so fast that it overruns a slow receiver
4. Congestion in the Internet
• If senders aggressively transmit data
– intermediate switches & routers can become overrun
25.5 Techniques That Transport Protocols Use
• There are techniques that
communication systems
use to overcome some of
the problems, e.g.,
– to compensate for bits that
are changed during
transmission (error detection)
• a protocol might include parity
bits
• a checksum, or
• a cyclic redundancy check
(CRC)
• Transport protocols do
more than detect errors
– they employ techniques that
can repair or circumvent
problems (error correction)
• Transport protocols use a
variety of tools
– to handle some of the most
complicated communication
problems
• The next sections discuss
basic mechanisms
25.5.1 Sequencing to Handle Duplicates & Out-of-Order Delivery
• To handle duplicate packets &
out-of-order deliveries
– transport protocols use
sequencing
• The sender attaches a
sequence # to each packet
• The receiver stores both the
sequence # of the last packet
received in order, as well as
– a list of additional packets that
arrived out of order
• The receiver examines the
sequence # to determine how the
packet should be handled
• If the packet is the next one
expected (i.e., has arrived in
order) the
– protocol software delivers the
packet to the next highest layer
– protocol checks its list to see
whether additional packets can
also be delivered
• If the packet has arrived out of
order
– the protocol software adds the
packet to the list
• If the packet has already been
delivered or the seq # matches
one of the packets waiting on
the list, the
– software discards the new copy
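The receiver rules above can be sketched in a few lines of Python. This is only an illustration: the class name `SequencingReceiver`, the choice to count packets from 0, and the use of a dictionary for the out-of-order list are assumptions, not part of the slides.

```python
class SequencingReceiver:
    """Delivers packets in order, holds early arrivals, drops duplicates."""

    def __init__(self, first_seq=0):
        self.next_expected = first_seq  # sequence # of the next in-order packet
        self.pending = {}               # out-of-order packets, keyed by sequence #
        self.delivered = []             # payloads passed to the next highest layer

    def receive(self, seq, payload):
        # Already delivered, or already waiting on the list: discard the copy.
        if seq < self.next_expected or seq in self.pending:
            return "duplicate"
        # Arrived out of order: add it to the list.
        if seq > self.next_expected:
            self.pending[seq] = payload
            return "buffered"
        # The packet is the next one expected: deliver it, then check the
        # list for additional packets that can now be delivered.
        self.delivered.append(payload)
        self.next_expected += 1
        while self.next_expected in self.pending:
            self.delivered.append(self.pending.pop(self.next_expected))
            self.next_expected += 1
        return "delivered"


rx = SequencingReceiver()
for seq, data in [(0, "a"), (2, "c"), (2, "c"), (1, "b"), (0, "a")]:
    print(seq, rx.receive(seq, data))
print(rx.delivered)
```

Packet 2 arrives early and is held; its second copy and the late copy of packet 0 are discarded; packet 1 releases both 1 and 2 to the layer above.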
25.5.2 Retransmission to Handle Lost Packets
• To handle packet loss
– transport protocols use positive acknowledgement (ACK) with retransmission
• Whenever a frame arrives intact
– the receiver sends a small ACK message that reports successful reception
• Sender ensures that each packet is transferred successfully
• Whenever it sends a packet the sender starts a timer
• If an acknowledgement arrives before the timer expires
– the software cancels the timer
• If the timer expires before an acknowledgement arrives
– the protocol sends another copy of the packet & starts the timer again
• Sending a second copy is known as retransmitting
– retransmission cannot succeed if a hardware failure has permanently disconnected the network or if the receiving computer has crashed
– there is a bound for the maximum # of retransmissions
– if the bound is exceeded, the destination is declared unreachable
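A minimal sketch of positive acknowledgement with bounded retransmission, assuming the timer is folded into a hypothetical `channel` callable that returns True when an ACK arrives before the timer expires. The names `send_reliably` and `lossy_channel` and the default bound of 3 are illustrative choices only.

```python
def send_reliably(packet, channel, max_retransmissions=3):
    """Return the number of transmissions used; raise if the bound is exceeded.

    `channel(packet)` is a hypothetical callable: it sends one copy, starts
    the timer, and returns True if an ACK arrives before the timer expires.
    """
    # One original transmission plus at most `max_retransmissions` copies.
    for attempt in range(1, max_retransmissions + 2):
        if channel(packet):
            return attempt  # ACK arrived in time: the timer is cancelled
        # Timer expired: the loop sends another copy and restarts the timer.
    raise ConnectionError("destination unreachable")


def lossy_channel(drop_first_n):
    """Build a test channel that loses the first `drop_first_n` transmissions."""
    state = {"sent": 0}

    def channel(_packet):
        state["sent"] += 1
        return state["sent"] > drop_first_n

    return channel


print(send_reliably("packet 3", lossy_channel(2)))  # third copy gets through
```

With a channel that never delivers an ACK, the function raises `ConnectionError` after the original transmission and three retransmissions, which mirrors declaring the destination unreachable.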
25.5.3 Techniques to Avoid Replay
• Extraordinarily long delays can lead to replay errors
• E.g., consider the following sequence of events
– Assume two computers agree to communicate at 1 PM
– One computer sends a sequence of 10 packets to the other
– A hardware problem causes packet 3 to be delayed
• routes change to avoid the hardware problem
• protocol software on the sending computer retransmits packet 3 & sends the remaining packets without error
– At 1:05 PM the two computers agree to communicate again
– After the second packet arrives, the delayed copy of packet 3 arrives from the earlier conversation
– Packet 3 arrives from the second conversation
• A packet from an earlier conversation might be accepted &
– the correct packet discarded as a duplicate
25.5.3 Techniques to Avoid Replay
• Replay errors can also occur with control packets
• Consider a situation in which two application programs form a TCP connection, communicate, close the connection, & then form a new connection
– The message of closing the connection might be duplicated & one copy might be delayed long enough for the second connection to be established
• A protocol should be designed so that the duplicate message will not cause the second connection to be closed
• To prevent replays, protocols mark each session with a unique ID (e.g., the time the session was established), &
– require the unique ID to be present in each packet
• The protocol discards any arriving packet that contains an incorrect ID
• An ID must not be reused until a reasonable time has passed
25.5.4 Flow Control to Prevent Data Overrun
• Techniques are available to prevent a fast computer from sending so much data that it overruns a slower receiver
– Flow control techniques are employed to handle the problem
• The simplest form of flow control is stop-and-go
– a sender waits after transmitting each packet
– when the receiver is ready for another packet, the receiver sends a control message, usually a form of ACK
– stop-and-go protocols result in extremely low throughput
• Another flow control technique is known as sliding window
– The sender & receiver use a fixed window size
• which is the maximum amount of data that can be sent before an acknowledgement arrives
– The sender retains a copy in case retransmission is needed
– The receiver must have preallocated buffer space
25.5.4 Flow Control Prevents Data Overrun
• If a packet arrives in sequence, the receiver
– passes the packet to the receiving application &
– transmits an ACK to the sender
• When an ACK arrives, the sender
– discards its copy of the ACKed packet &
– transmits the next packet
• Fig. 25.2 illustrates the sliding window mechanism
• Sliding window can increase throughput dramatically
• Compare the sequence of transmissions with a stop-and-go scheme & a sliding window scheme
• Fig. 25.3 contains a comparison for a 4-packet transmission in either case
25.5.4 Flow Control Prevents Data Overrun
Figure 25.2 Illustration of a sliding window
in (a) initial, (b) intermediate, & (c) final positions.
25.5.4 Flow Control Prevents Data Overrun
Figure 25.3 Comparison of a transmission using (a) stop-and-go, &
(b) sliding window.
25.5.4 Flow Control Prevents Data Overrun
• To understand the significance of sliding window
– imagine an extended communication that involves many packets
– For such networks, a sliding window protocol can increase performance substantially.
– The potential improvement is:
$$T_w = T_g \times W$$
where
• $T_w$ is the throughput that can be achieved with a sliding window protocol
• $T_g$ is the throughput that can be achieved with a stop-and-go protocol
• $W$ is the window size
• Throughput cannot be increased arbitrarily, by just increasing the window size
– The bandwidth of the underlying network imposes an upper bound;
• bits cannot be sent faster than the hardware can carry them
– The equation can be rewritten ($B$ is the underlying bandwidth):
$$T_w = \min(B,\; T_g \times W)$$
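As a quick numeric check of the bound, the sketch below evaluates the throughput for a few window sizes. The function name and the sample figures (a stop-and-go throughput of 1 Mbps on a 100 Mbps link) are assumed for illustration and are not taken from the slides.

```python
def sliding_window_throughput(t_g, window, bandwidth):
    """Throughput with a sliding window, capped by the underlying bandwidth.

    t_g       -- throughput achievable with stop-and-go
    window    -- window size W, in packets
    bandwidth -- underlying bandwidth B, in the same units as t_g
    """
    return min(bandwidth, t_g * window)


# Assumed example: stop-and-go achieves 1 Mbps on a 100 Mbps link.
for w in (1, 8, 64, 200):
    print(w, sliding_window_throughput(1.0, w, 100.0))
```

Throughput grows in proportion to W until the product reaches B; beyond that point a larger window adds nothing.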
25.6 Techniques to Avoid Congestion
• How easily can congestion occur?
• Consider case in Fig. 25.4
Figure 25.4 Four hosts connected by two switches.
• Assume each connection in Fig. 25.4 operates at 1 Gbps
• Consider what happens if both computers attached to switch1
attempt to send data to a computer attached to switch2
– Switch1 receives data at an aggregate rate of 2 Gbps, but can only
forward 1 Gbps to switch2
– This situation is known as congestion
25.6 Techniques to Avoid Congestion
• Congestion results in delay
• If congestion persists
– the switch will run out of memory & begin discarding packets
• Retransmission can be used to recover lost packets
– But retransmission sends more packets into the network
• If the situation persists, network can become unusable
– this condition is known as congestion collapse
25.6 Techniques to Avoid Congestion
• In the Internet, congestion usually occurs in routers
• Transport protocols attempt to avoid congestion collapse
– by monitoring the network & reacting quickly once congestion starts
• There are two basic approaches:
1. Arrange for intermediate systems (i.e., routers) to inform a sender
when congestion occurs, implemented either by:
• having routers send a special message to the source of packets when
congestion occurs, or by
• having routers set a bit in the header of each packet that experiences
delay caused by congestion
– the computer that receives the packet includes information in the ACK
to inform the original sender
– however, a long delay can pass before the original sender is informed
2. Use increased delay or packet loss as an estimate of congestion
25.6 Techniques to Avoid Congestion
• Using delay & loss to estimate congestion is
reasonable in the Internet because:
– Modern network hardware works well
– Most delay & loss results from congestion, not hardware failures
• The appropriate response to congestion
– Reducing the rate at which packets are being transmitted
– Sliding window protocols can achieve the effect of reducing the
rate by temporarily reducing the window size
25.7 The Art of Protocol Design
• Techniques needed to solve specific problems are well known, but protocol design is nontrivial, because:
1st, Protocol details must be chosen carefully
• Small design errors can result in incorrect operation,
unnecessary packets, or delays, e.g.,
• If sequence #s are used, each packet must contain a sequence
# in the packet header
• The field must be large enough so sequence #s are not reused
frequently, but small enough to avoid wasting unnecessary
bandwidth
2nd, Protocols can interact in an unexpected way, e.g.,
• Consider the interaction between flow control & congestion
control mechanisms
25.7 The Art of Protocol Design
• A sliding window scheme uses more of the network
bandwidth to improve throughput
• A congestion control mechanism does the opposite
– It reduces the # of packets being inserted to prevent the network
from collapsing
• Computer system reboot poses another serious
challenge to transport protocol design
– Imagine a situation where two applications
• establish a connection
• begin sending data, & then the computer receiving data reboots
• software on the rebooted computer has no knowledge of a connection
• protocol software on the sending computer considers the connection valid
– If a protocol is not designed carefully
• a duplicate packet can cause a computer to incorrectly create a
connection & begin receiving data in midstream
25.8 Techniques Used in TCP to Handle Packet Loss
• Which techniques does TCP use to achieve reliability?
– The answer is complex
• because TCP uses a variety of schemes that are combined in novel
ways
• TCP uses retransmission to compensate for packet loss
• TCP provides data flow in both directions
– both sides of a communication participate in retransmission
– when TCP receives data, it sends an ACK back to the sender
• Whenever it sends data
– TCP starts a timer, & retransmits the data if the timer expires
• TCP retransmission operates as Fig. 25.5 illustrates
25.8 Techniques Used in TCP to Handle Packet Loss
Figure 25.5 Illustration of TCP retransmission after a packet loss (timer labels in the figure: starts, resets, restarts).
• TCP's retransmission is the key to its success
– because it handles communication across an arbitrary path
• TCP must be ready to retransmit any lost message
25.8 Techniques Used in TCP to Handle Packet Loss
• How long should TCP wait before retransmitting?
– ACKs from a computer on a LAN are expected to arrive within a few ms
– but a satellite connection requires hundreds of ms
• On one hand
– waiting too long for such an ACK leaves the network idle & does not maximize throughput
• On the other hand
– retransmitting quickly does not work well on a satellite connection
• because the unnecessary traffic consumes network bandwidth & lowers throughput
• TCP faces a difficult challenge:
– Bursts of datagrams can cause congestion
• which causes transmission delays along a given path to change rapidly
– The total time required to send a message & receive an ACK can increase
• Since TCP lets multiple apps communicate with multiple destinations concurrently, & traffic conditions affect delay
– TCP must handle a variety of delays that can change rapidly
25.9 Adaptive Retransmission
• Before TCP was invented
– transport protocols used a
fixed value for
retransmission delay, &
– protocol designers or network
managers chose a value that
was large enough for the
expected delay
• TCP designers realized
that a fixed timeout
would not operate well for
the Internet
– Thus, they chose to make
TCP's retransmission
adaptive
– TCP monitors current delay
on each connection
• It adapts (changes) the
retransmission timer
accordingly
25.9 Adaptive Retransmission
• How can TCP monitor Internet delays?
• TCP cannot know the exact delays
– TCP estimates round-trip delay for each active connection
• By measuring the time needed to receive a response
• TCP records the time at which the message was sent
• When a response arrives
– TCP subtracts the time the message was sent from the current
time to produce a new estimate of the round-trip delay for that
connection
• As it sends data packets & receives ACKs
– TCP generates a sequence of round-trip estimates
– It uses a statistical function to produce a weighted average
– TCP keeps an estimate of the variance
– It uses a linear combination of the estimated mean & variance to compute the retransmission timeout
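One way to realize the weighted average and the linear combination is the estimator standardized in RFC 6298, sketched below. The slides do not name the formula, so treat the constants (alpha = 1/8, beta = 1/4, K = 4) as an assumption drawn from that RFC; note also that the RFC tracks a smoothed mean deviation rather than a true variance, and this sketch omits the RFC's minimum-timeout clamp.

```python
class RttEstimator:
    """Smoothed round-trip time and retransmission timeout (RFC 6298 style)."""

    def __init__(self, alpha=0.125, beta=0.25, k=4):
        self.alpha, self.beta, self.k = alpha, beta, k
        self.srtt = None     # weighted average of the round-trip samples
        self.rttvar = None   # smoothed deviation of the samples

    def update(self, sample):
        """Fold in one measured round trip and return the new timeout."""
        if self.srtt is None:
            # First measurement on the connection.
            self.srtt = sample
            self.rttvar = sample / 2
        else:
            # Update the deviation first, using the old mean.
            self.rttvar = ((1 - self.beta) * self.rttvar
                           + self.beta * abs(self.srtt - sample))
            self.srtt = (1 - self.alpha) * self.srtt + self.alpha * sample
        return self.timeout()

    def timeout(self):
        # Linear combination of the estimated mean and deviation.
        return self.srtt + self.k * self.rttvar


est = RttEstimator()
for rtt_ms in (100.0, 100.0, 100.0, 200.0, 100.0):
    print(rtt_ms, est.update(rtt_ms))
```

While the samples stay at 100 ms the timeout drifts down toward the mean; the single 200 ms sample raises the deviation term and pushes the timeout back up.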
25.9 Adaptive Retransmission
• TCP adaptive
retransmission works well
• Using the variance helps
TCP react quickly
– when delay increases
following a burst of packets
• Using a weighted average
helps TCP reset the
retransmission timer
– if the delay returns to a lower
value after a temporary burst
• When the delay remains
constant
– TCP adjusts the
retransmission timeout to a
value that is slightly longer
than the mean round-trip
delay
• When delays start to vary
– TCP adjusts the
retransmission timeout to a
value greater than the mean
to accommodate peaks
25.10 Comparison of Retransmission Times
• How does adaptive retransmission help TCP to maximize throughput on each connection?
– consider a case of packet loss on two connections that have different round-trip delays
• Fig. 25.6 illustrates traffic on two such connections
– TCP sets the retransmission timeout to be slightly longer than the mean round-trip delay
• If the delay is large
– TCP uses a large retransmission timeout
• If the delay is small
– TCP uses a small timeout
• Goal: wait long enough to determine that a packet was lost without waiting longer than necessary
Figure 25.6 Timeout & retransmission of two TCP connections that have different round-trip delays.
25.11 Buffers, Flow Control, & Windows
• TCP uses a window mechanism to control the flow of
data
• Unlike the simplistic packet-based window scheme
described above
– a TCP window is measured in bytes
• When a connection is established
– each end of the connection allocates a buffer
• to hold incoming data & sends the size of the buffer to the other end
• As data arrives
– receiving TCP sends ACKs, which specify the remaining buffer
size
• Window refers to the buffer space available at any time
– a notification that specifies the size of the window is known as a
window advertisement
– a receiver sends a window advertisement with each ACK
25.11 Buffers, Flow Control, & Windows
• If the receiver can read data as quickly as it arrives
– a receiver will send a positive window advertisement along with
each ACK
• If the sender operates faster than the receiver
– incoming data will eventually fill the receiver's buffer
– causing the receiver to advertise a zero (0) window
• A sender that receives a zero window advertisement
– must stop sending
• until the receiver again advertises a positive window
• Fig. 25.7 illustrates window advertisements
25.11 Buffers, Flow Control, & Windows
Figure 25.7 A sequence of messages that illustrates TCP window
advertisements for a maximum segment size of 1000 bytes.
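The bookkeeping behind window advertisements can be sketched as follows. The class `ReceiveBuffer`, its method names, and the 2500-byte buffer are assumptions chosen for illustration; only the 1000-byte maximum segment size comes from the figure caption.

```python
class ReceiveBuffer:
    """Tracks free buffer space and reports it as the advertised window."""

    def __init__(self, size):
        self.size = size   # bytes allocated when the connection was established
        self.used = 0      # bytes received but not yet read by the application

    def window(self):
        return self.size - self.used

    def accept(self, nbytes):
        """Store arriving data; return the window advertised with the ACK."""
        if nbytes > self.window():
            raise ValueError("sender exceeded the advertised window")
        self.used += nbytes
        return self.window()

    def app_read(self, nbytes):
        """The application consumes data; return the new advertisement."""
        self.used -= min(nbytes, self.used)
        return self.window()


buf = ReceiveBuffer(2500)        # assumed buffer size
print(buf.accept(1000))          # first full segment
print(buf.accept(1000))          # second full segment
print(buf.accept(500))           # buffer is now full: zero window
print(buf.app_read(2000))        # application reads, window reopens
```

The third ACK carries a zero window, so the sender must stop until the application reads data and a positive window is advertised again.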
25.12 TCP's Three-Way Handshake
• To establish or terminate connections reliably
– TCP uses a 3-way handshake
• in which three messages are exchanged
• During the 3-way handshake to start a connection
– each side sends a control message that specifies
• an initial buffer size (for flow control) &
• a sequence #
• TCP's 3-way exchange is necessary & sufficient to
ensure unambiguous agreement
– despite packet loss, duplication, delay, & replay events
• The handshake ensures that TCP
– will not open or close a connection until both ends have agreed
25.12 TCP's Three-Way Handshake
• TCP uses the term synchronization segment (SYN segment)
– to describe the control messages used in a 3-way handshake to
create a connection
• TCP uses the term FIN segment (short for finish segment)
– to describe control messages used in a 3-way handshake to close
a connection
• Fig. 25.8 illustrates the 3-way handshake to create a
connection
• A key aspect of the 3-way handshake is
– the selection of sequence #s
– TCP requires each end to generate a random 32-bit sequence #
that becomes the initial sequence #
25.12 TCP's Three-Way Handshake
• Connection Establishment
Figure 25.8 The 3-way handshake used to create a TCP connection.
25.12 TCP's Three-Way Handshake
• If an application attempts to establish a new TCP connection after a computer reboots
– TCP chooses a new random #
• The probability of selecting a random value that matches the sequence used on a previous connection is low
– The sequence #s on the new connection will differ from the sequence #s used on the old connection
• So TCP avoids replay problems
• The 3-way handshake uses FIN segments to close
– An ACK is sent in each direction along with a FIN to guarantee that all data has arrived before the connection is terminated
• Fig. 25.9 illustrates the exchange
25.12 TCP's Three-Way Handshake
• Connection Termination
Figure 25.9 The 3-way handshake used to close a connection.
25.13 TCP Congestion Control
• Congestion control is one of the most interesting mechanisms in TCP
• In the Internet, delay or packet loss is more likely to be caused by congestion than a hardware failure
• Retransmission can worsen the problem of congestion
– by injecting additional copies of a packet
• To avoid congestion collapse
– TCP uses changes in delay as a measure of congestion
– It responds to congestion by reducing the rate at which it retransmits data
• Although we think of reducing the rate of transmission,
– TCP does not compute a data rate, instead,
– TCP bases transmission on buffer size, i.e.,
• the receiver advertises a window size, &
• the sender can transmit data to fill the receiver's window before an ACK is received
25.13 TCP Congestion Control
• To control the data rate
– TCP imposes a restriction
on the window size
• By temporarily reducing
the window size at the
receiving TCP,
– the sending TCP effectively
reduces the data rate
• TCP can achieve a
reduction in data rate
– by temporarily reducing the
window size
• In the extreme case where
loss occurs
– TCP temporarily reduces the
window to one-half of its
current value
• TCP uses a special
congestion control
mechanism when starting
a new connection or when
a message is lost
– instead of transmitting
enough data to fill the
receiver's buffer
• TCP begins by sending a
single message containing
data
25.14 Versions of TCP Congestion Control
• If an acknowledgement
arrives without additional
loss, TCP
– doubles the amount of data
being sent, &
– sends two additional
messages
• If both acknowledgements
arrive
– TCP sends 4 messages, & so
on
• The exponential
increase continues
– until TCP is sending ½ of the
receiver's advertised window
• When ½ of the original
window size is reached
– TCP slows the rate of
increase &
– increases the window size
linearly
• as long as congestion does
not occur
• The approach is known as
slow start
• TCP's congestion control mechanisms respond well to increases in traffic
– by backing off quickly, TCP is able to alleviate congestion
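The growth pattern described on this slide can be traced with a short sketch. It counts whole messages per round trip and models no loss; the function name `window_growth` and the 32-message advertised window are assumptions for illustration, and real TCP implementations count bytes and differ in detail.

```python
def window_growth(advertised_window, rounds):
    """Messages sent in each round trip, following the slide's description.

    The amount doubles every round trip until it reaches half of the
    receiver's advertised window, then grows by one message per round trip.
    """
    threshold = advertised_window // 2
    cwnd = 1                      # begin with a single message
    history = []
    for _ in range(rounds):
        history.append(cwnd)
        if cwnd < threshold:
            cwnd = min(cwnd * 2, threshold)          # exponential increase
        else:
            cwnd = min(cwnd + 1, advertised_window)  # linear increase
    return history


print(window_growth(32, 8))
```

For an advertised window of 32 messages the trace doubles up to 16 and then climbs one message at a time, which shows why the name slow start describes only the first message and not the rate of growth that follows.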
25.15 Other Variations: ACK & ECN
• TCP uses a single format for
all messages
– including messages that carry
data, those that carry ACKs, &
– messages that are part of the 3-way handshake used to create or
terminate a connection (SYN &
FIN)
• TCP uses the term segment to
refer to a message
• TCP segment format (next slide
or Fig. 25.10)
• A TCP connection contains two
streams of data
– one flowing in each direction
• ECN (Explicit Congestion
Notification)
• If the applications at each end
are sending data simultaneously
TCP can send a single segment
that carries
– Outgoing data
– ACK for incoming data, &
– A window advertisement that
specifies the amount of additional
buffer space available for
incoming data
• Some of the fields in the
segment refer to
– the data stream traveling in the
forward direction
– while other fields refer to the data
stream traveling in the reverse
direction
TCP Segment Format* (Substitutes Fig. 25.10)
• Header layout (each row is 32 bits wide, bits 0 to 31):
– Source Port (16 bits) | Destination Port (16 bits)
– Sequence Number (32 bits)
– Acknowledgment Number (32 bits)
– Header Length (4 bits) | Reserved (4 bits) | Flags (8 bits) | Window (16 bits)
– Checksum (16 bits) | Urgent Pointer (16 bits)
– Options | Padding
– User Data …
• Source/destination (S/D) port
– Identifies service S/D access points.
– Port #s < 1024, well-known ports,
• Used for standard services.
• Seq #
– Identifies seq # of the 1st data octet in this segment.
– Except when SYN is present.
– If SYN is present, it is the Initial Seq #, ISN (the first data octet is ISN + 1)
• ACK #
– A piggybacked ACK.
– Contains seq # of the next octet that Transport Entity expects to receive.
– Every byte is numbered in a TCP stream.
• TCP Header Length, 4 bits
– # of 32-bit words in header including options.
• Flags (Code bits), 8 bits
– CWR, ECE, URG, ACK, PSH, RST, SYN, FIN
• When a flag is set to 1:
– CWR - Congestion Window Reduced
– ECE - Explicit Congestion Notification Echo
– URG - Urgent pointer in use for Urgent data (e.g., Delete or Control C)
– ACK - Ack number valid (piggyback ack)
– PSH - Push: pass the buffered data to the receiving app without delay
– RST - Reset connection abruptly
– SYN - To establish connection
– FIN - To release connection
* (https://en.wikipedia.org/wiki/Transmission_Control_Protocol)
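To make the field layout concrete, the sketch below unpacks the fixed 20-byte part of a header with Python's standard `struct` module. The function name `parse_tcp_header`, the dictionary keys, and the sample port numbers are illustrative assumptions; the sample values for ISN and window follow the SYN shown in the appendix.

```python
import struct

# Flag names from least significant bit (FIN) to most significant (CWR).
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG", "ECE", "CWR"]


def parse_tcp_header(segment: bytes) -> dict:
    """Decode the fixed header fields; options and data are returned raw."""
    (src, dst, seq, ack, offset_reserved, flags,
     window, checksum, urgent) = struct.unpack("!HHIIBBHHH", segment[:20])
    header_len = (offset_reserved >> 4) * 4   # field counts 32-bit words
    return {
        "src_port": src,
        "dst_port": dst,
        "seq": seq,
        "ack": ack,
        "header_len": header_len,
        "flags": [n for bit, n in enumerate(FLAG_NAMES) if flags & (1 << bit)],
        "window": window,
        "checksum": checksum,
        "urgent": urgent,
        "options": segment[20:header_len],
        "data": segment[header_len:],
    }


# A SYN with ISN = 100 and win = 4096; ports 50000 and 80 are made up.
syn = struct.pack("!HHIIBBHHH", 50000, 80, 100, 0, 5 << 4, 0x02, 4096, 0, 0)
print(parse_tcp_header(syn))
```

The `!` in the format string selects network byte order, which is how the 16-bit and 32-bit fields are carried on the wire.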
TCP Segment Format (2)
• Window
– Flow control credit allocation.
– Contains # of data octets, beginning with the one indicated in the ACK field, which the sender is willing to accept.
– Window = 0
• Indicates that no buffers are available, though it can ACK a segment.
• Checksum
– Checksums header, data, and a conceptual pseudo-header.
– Algorithm: add up all 16-bit words in 1’s complement and then take the 1’s complement of the sum.
– Pseudo header checking helps detect misdelivered packets.
• Protocol # = 6 for TCP + byte count for TCP segment (including header).
– UDP uses a similar pseudo header for its checksum.
• Urgent Pointer
– When URG bit is set, e.g., when we interrupt or abort a session.
– Byte offset from current seq # at which urgent data are to be found.
• Options - several options, e.g.,
– Max TCP payload size to negotiate.
• Default payload size = 536 bytes
– Selective Acknowledgement lets receiver tell sender ranges of seq #s that it has received.
• Padding
– All zero bytes to make the header a round # of 32-bit words.
Pseudo TCP Segment Header (each row is 32 bits):
– IP Source address
– IP Destination address
– 00000000 | protocol | segment length
– followed by the TCP Header, Options, & User Data
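The checksum algorithm and the pseudo header can be sketched for IPv4 as follows. The helper names and the documentation addresses 192.0.2.1 and 192.0.2.2 are assumptions for illustration; the sample segment is built with its checksum field set to zero, as is done before the checksum is computed.

```python
import socket
import struct


def ones_complement_sum(data: bytes) -> int:
    """Add all 16-bit words with end-around carry (1's complement addition)."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length data with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)   # fold the carry back in
    return total


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over pseudo header + TCP header + data."""
    pseudo = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
              + struct.pack("!BBH", 0, 6, len(segment)))  # zero, protocol 6, length
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF


# Sample segment: 20-byte header (checksum field zero) plus 5 data bytes.
header = struct.pack("!HHIIBBHHH", 50000, 80, 101, 501, 5 << 4, 0x18, 4096, 0, 0)
segment = header + b"hello"
csum = tcp_checksum("192.0.2.1", "192.0.2.2", segment)

# Place the result in the checksum field (bytes 16-17) and re-check.
filled = segment[:16] + struct.pack("!H", csum) + segment[18:]
print(hex(csum), tcp_checksum("192.0.2.1", "192.0.2.2", filled) == 0)
```

A receiver repeats the same computation over the segment as received; a result of zero means the check passed, and a segment delivered to the wrong address fails because the pseudo header differs.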
Appendix
TCP Connection Establishment - 3-way Handshake (1)
Participants: Client A (User A) with TCP A, & Server B (User B) with TCP B; both TCPs are CLOSED initially
• Primitives: Client A issues ACTIVE OPEN, Server B issues PASSIVE OPEN; each TCP returns OPEN ID
• Protocol messages:
1. TCP A → TCP B: SYN, ISN=100, mss=1024, win=4096
2. TCP B → TCP A: SYN, ACK, ISN=500, AN=101, mss=1024, win=4096
3. TCP A → TCP B: ACK, SN=101, AN=501, win=4096
• Each TCP then reports OPEN SUCCESS to its user
• Server B passively waits for incoming connection.
– By executing PASSIVE OPEN and OPEN SUCCESS primitives.
– OPEN ID provides connection name, OPEN SUCCESS reports completion of OPEN.
• Client A executes an ACTIVE OPEN (CONNECT) primitive by specifying
– IP address and port of Server B,
– Max TCP segment size it is willing to accept, and,
– Optionally some user data (e.g., a password).
• TCP A sends a TCP segment with seq # 100 (as ISN = Initial Seq. #).
– SYN bit = 1, ACK bit = 0, mss = Max segment size = 1024.
– It then waits for a response.
TCP Connection Establishment - 3-way Handshake (2)
(Figure: the same three-way handshake message sequence as on the previous slide.)
• SYN segment uses 1 byte of seq space so it can be ACKed unambiguously.
• When SYN arrives at B, TCP B checks whether a process is listening on the destination port.
– If not, it sends a reply with RST bit=1 to reject the connection.
– Otherwise, the listening process can in turn accept or reject the incoming connection.
• If TCP B accepts the TCP segment,
– it sends back a segment with SYN and ACK set, which A answers with another ACK.
Notes:
• This is a full duplex connection and the seq numbering is independent in each direction.
• Both endpoints must agree to participate and exchange connection parameters during the “3-way handshake”.
TCP Data Transfer Simplified
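(Figure: message sequence between Client A (User A) / TCP A and TCP B / Server B (User B); primitives at the edges, protocol segments in the middle.)
Client A → TCP A: SEND 50
TCP A → TCP B: SN=101, AN=501, ACK, DATA(50)
TCP B → Server B: RECEIVE 50
TCP B → TCP A: SN=501, AN=151, ACK
Server B → TCP B: SEND 1000
TCP B → TCP A: SN=501, AN=151, ACK, DATA(1000)
TCP A → Client A: RECEIVE 1000
TCP A → TCP B: SN=151, AN=1501, ACK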
• Client A issues a SEND primitive with 50 bytes of data.
• TCP A issues a data segment with 50 bytes.
• TCP B ACKs the segment and sends a RECEIVE primitive to its higher layer.
• Some time later, TCP B is asked to send 1000 bytes to TCP A.
• TCP A ACKs the segment and sends a RECEIVE primitive to its higher layer.
• ACKs can be cumulative & acknowledge several packets.
Note:
• The data could be buffered before being transmitted to side A; if the PUSH bit is set (PUSH indication in the SEND request), a segment is forced out immediately.
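The sequence and acknowledgement numbers in these exchanges follow one rule: the AN a peer returns is the SN of the segment it received, plus the number of data bytes, plus one for a SYN or FIN flag (each uses 1 byte of seq space). A minimal sketch, with a hypothetical helper name chosen for illustration:

```python
def next_expected(sn: int, data_len: int = 0, syn: bool = False, fin: bool = False) -> int:
    """Return the acknowledgement number (AN) that acknowledges a segment.

    SYN and FIN each consume one sequence number; data consumes one per byte.
    """
    return sn + data_len + int(syn) + int(fin)

# Values taken from the slides:
an_for_syn_a = next_expected(100, syn=True)         # A's SYN with ISN=100
an_for_syn_b = next_expected(500, syn=True)         # B's SYN with ISN=500
an_for_data_a = next_expected(101, data_len=50)     # A's DATA(50) at SN=101
an_for_data_b = next_expected(501, data_len=1000)   # B's DATA(1000) at SN=501
```

A pure ACK carries no data and no SYN or FIN, so it does not advance the sender's own sequence number.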
TCP Normal (Graceful) Close Simplified
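(Figure: message sequence between User A / TCP A and TCP B / User B; primitives at the edges, protocol segments in the middle. Client A has no more data to send.)
User A → TCP A: CLOSE
TCP A → TCP B: SN=151, AN=1501, FIN, ACK
TCP B → User B: CLOSE
TCP B → TCP A: SN=1501, AN=152, ACK
User B → TCP B: SEND 1000
TCP B → TCP A: SN=1501, AN=152, ACK, DATA(1000)
TCP A → User A: RECEIVE 1000
User B → TCP B: CLOSE
TCP B → TCP A: SN=2501, AN=152, FIN, ACK
TCP A → TCP B: SN=152, AN=2502, ACK
Both sides: TERMINATE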
• Full-duplex TCP can be thought of as a pair of simplex connections.
– Each simplex connection is released independently.
• User A sends a CLOSE primitive to TCP A.
• To release the connection, TCP sends a TCP segment with FIN bit=1.
• When FIN is ACKed, that direction is shut down.
• Data may continue to flow in the other direction.
• When both directions have been shut down, the connection is terminated.
– One FIN and one ACK in each direction.
• If a response to a FIN is not received within 2 max segment lifetimes, sender of the FIN releases the connection.
Segment Lifetime = The time a segment may stay in the network.
Data Stream Push
• When application passes data to
TCP, TCP may send it immediately
or buffer it.
– Ordinarily, TCP TE decides when
sufficient data has accumulated to
form a TPDU for transmission.
• What if the application wants the
data to be sent immediately?
– E.g., interactive game
• Push is a notification from sender
to receiver to pass all the data that
it has to the receiving process.
– It avoids waiting for full buffers.
• Push is a data labeling facility
among TCP Services, a marker to
delineate message boundaries.
– TCP user can require transmission of
all data up to push flag.
– Receiver will deliver in same manner.
• TCP user can require TE to
transmit all outstanding data,
– up to and including that labeled with
a PUSH flag.
• On the receiving end,
– TCP TE will deliver these data to
user in the same manner.
• User might request this if it has
come to a logical break in the data.
• Push is used for interactive users
– User expects instant response for
each stroke to force delivery of octets
currently in the stream without
waiting for buffer to fill.
• Remote login (TELNET)
• Windows & Linux use the
TCP_NODELAY socket option.
TE = Transport Entity
TPDU = Transport Protocol Data Unit
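As a rough illustration of the TCP_NODELAY option mentioned above, the following Python sketch sets it on a socket. Note that TCP_NODELAY disables Nagle's algorithm (the sender-side coalescing of small writes) rather than setting the PUSH flag directly; treating it as the practical way to request immediate transmission is an assumption of this example.

```python
import socket

# Create a TCP socket; no connection is needed to set or read the option.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

# Ask the stack to send small writes immediately instead of buffering them.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)

# Read the option back to confirm it is enabled (non-zero means enabled).
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```

Interactive applications such as remote login typically enable this so that each keystroke is sent without waiting for the buffer to fill.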
Urgent Data Signal
• Another data labeling facility
among TCP services.
– When application has priority data
that needs to be processed
immediately, e.g., hitting CTRL-C.
• Indicates urgent data is upcoming
in stream.
• Provides a means to inform
destination TCP user
– that significant or “urgent” data is in
the upcoming data stream.
• When DEL or CTRL-C keys are hit
to break off a remote computation
that has already begun
– The application puts some control
info in the data stream and gives it to
TCP with the URGENT flag.
• TCP stops accumulating data
and transmits everything it has for
that connection immediately.
• When urgent data are received at
destination, the receiving
application is interrupted.
– So it can stop whatever it was doing
and read the data stream to find the
urgent data.
• The end of urgent data is marked
so the application knows when it is
over.
• The start of the urgent data is not
marked.
– It is up to the application to figure it
out, a crude signaling mechanism.
• Urgent data signal is rarely used.
TCP Congestion Control
• When the load offered to a
network is more than it can handle
– Router buffers fill up &
congestion builds up.
• Although IP process tries to
manage congestion,
– TCP process needs to slow down
sending rate at the source by
manipulating window size
dynamically.
– TCP's job is to provide end-to-end
reliability & avoid packet losses.
• Bit errors are generally taken care of by
the data link layer.
• TCP uses AIMD in response to
binary congestion signals to control
the bandwidth.
• Basic rule is “Do not inject a new
packet into the network until the
old one is delivered”.
AIMD = Additive Increase Multiplicative Decrease
• TCP sender maintains 2 windows:
– credit = Window the receiver has
granted (flow control).
– cwnd = Congestion window, network
capacity.
• Each window reflects # of bytes
the sender may transmit.
• In steady state on a non-congested
connection
credit = cwnd
• During congestion, define
allow_win = MIN (credit, cwnd)
i.e., the allowed window (# of bytes that may be sent) is the min of the two windows.
– If sender cwnd = 32kB, & receiver
offers credit=64kB, sender will send
only 32kB.
– If sender cwnd = 80kB, & receiver
offers credit=64kB, sender will send
only 64kB.
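The allowed-window rule and the two cases above can be checked with a few lines of Python. This is only a sketch; the function name is chosen for illustration and the values are in kB as on the slide.

```python
def allowed_window(credit_kb: int, cwnd_kb: int) -> int:
    """Bytes (here in kB) the sender may have outstanding: min of the two windows."""
    return min(credit_kb, cwnd_kb)

# Case 1: the network (cwnd) is the bottleneck.
case1 = allowed_window(credit_kb=64, cwnd_kb=32)

# Case 2: the receiver (credit) is the bottleneck.
case2 = allowed_window(credit_kb=64, cwnd_kb=80)
```

In the first case the congestion window limits the sender; in the second the receiver's flow control credit does.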
TCP Congestion Control (2)
• Congestion window controls the sending rate.
– Sender transmission rate = cwnd / RTT; window can stop sender quickly.
• Consider a sender that sends 4 packets over a fast link (100 Mbps) to a router that
is connected to a slow link (1 Mbps).
• Packets arrive at the router quickly, are buffered in the router, & reach the receiver at the rate of the slow link.
• The receiver sends ACKs, which arrive at the sender at about the rate of the slow link.
• If the sender sends new packets at this rate, they will not queue in the
router.
• This timing is called the ACK clock (regular receipt of ACKs).
– It paces new segments into the network & smooths out sender bursts.
(Figure: A burst of packets from a sender and the returning ACK clock.)
TCP Congestion Control - Slow Start (3)
• If we use AIMD for congestion control, it can be shown
that AIMD would be very slow for the transmissions to
reach the right speed.
• Consider a path supporting 10 Mbps with RTT = 100
msec.
cwnd = congestion window = Bandwidth-delay product =
10Mbps x 100msec = 1Mbits = 100 packets of 1250 bytes.
• So if cwnd starts at 1 packet = 10000 bits & increases by 1 packet
every RTT, it will take about 100 RTTs = 100 x 100 msec = 10
sec to reach the full window.
• This would be too long and unacceptable.
• Jacobson proposed a mix of linear and multiplicative increase to
solve the problem.
– Called slow start technique, RFC 3390
– Would provide an efficient solution.
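The arithmetic above can be reproduced with a short Python sketch that also counts how many RTTs doubling would need instead. The variable names are illustrative, and starting from exactly 1 packet is the slide's assumption.

```python
RATE_BPS = 10_000_000      # 10 Mbps path
RTT_MS = 100               # round-trip time in milliseconds
PACKET_BYTES = 1250        # packet size used on the slide

# Bandwidth-delay product in bits, then in packets.
bdp_bits = RATE_BPS * RTT_MS // 1000
bdp_packets = bdp_bits // (PACKET_BYTES * 8)

# Additive increase: +1 packet per RTT starting from 1 packet.
additive_rtts = bdp_packets - 1

# Doubling every RTT starting from 1 packet.
doubling_rtts = 0
cwnd = 1
while cwnd < bdp_packets:
    cwnd *= 2
    doubling_rtts += 1
```

Under these assumptions additive increase needs 99 RTTs (the slide rounds this to 100, about 10 sec), while doubling reaches the same window in 7 RTTs, about 0.7 sec.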
AIMD = Additive Increase Multiplicative Decrease
TCP Congestion Control - Slow Start (4)
• When a connection is established,
– sender initializes congestion window to size of max segment in use on the connection.
– It sends one max segment (allow_win = 1).
• If this segment is ACKed before the timer goes off,
– sender doubles the congestion window & sends 2 segments.
• Then, if these segments are ACKed in time,
– sender doubles the congestion window again.
• The congestion window grows exponentially until either
– a timeout occurs, or
– the receiver’s window is reached.
• E.g., if burst sizes of 1024, 2048, 4096 bytes work fine,
– but 8192 gives a timeout, then congestion window is set to 4096 to avoid congestion.
• Called the slow-start algorithm by Jacobson.
– Really exponential - not slow.
– All TCP implementations are required to support it.
TCP Slow Start (5)
• Slow Start algorithm works well for initializing a connection,
i.e., when
– the TCP sender needs to find a reasonable window size for the connection.
• Very easy to drive a network into saturation, but hard for
it to recover.
• Once congestion occurs, it takes a long time for
congestion to clear.
• Under slow start, exponential growth of cwnd may
worsen the congestion.
• So Jacobson made a modification to this, next slide.
TCP Slow Start (6)
Van Jacobson, LBNL, proposed use of slow start to begin with;
followed by a linear growth in cwnd as follows:
1. Start the TCP connection with an initial threshold, i.e.,
a) Slow_start_threshold = flow control window (e.g. 32 KB), and
b) cwnd (congestion window) = Max segment size (MSS).
2. Use slow-start (i.e., increase window exponentially every
RTT), till the network can handle it.
3. When the threshold is hit, stop increasing exponentially.
4. Increase the cwnd linearly (additive increase) by one max
segment size for each RTT whose window is acknowledged (successful transmission).
5. When there is a packet loss and a timeout occurs,
a) Set new_threshold = (current cwnd) / 2, and
b) Reset cwnd = MSS (does not cause loss).
6. Go to step 2.
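The steps above can be sketched as a small per-RTT simulation in Python. This is a simplified model under stated assumptions: windows are counted in units of MSS, each loop iteration is one RTT, exponential growth is capped at the threshold, and timeouts are injected at chosen rounds. The function names are hypothetical.

```python
def tahoe_step(cwnd: int, threshold: int, timeout: bool) -> tuple[int, int]:
    """Advance one RTT; returns the new (cwnd, threshold), both in MSS units."""
    if timeout:
        return 1, max(cwnd // 2, 1)               # steps 5a and 5b
    if cwnd < threshold:
        return min(cwnd * 2, threshold), threshold  # step 2: slow start
    return cwnd + 1, threshold                    # step 4: additive increase

def simulate(threshold: int, timeout_rounds: set[int], rounds: int) -> tuple[list[int], int]:
    """Return the cwnd used in each round and the final threshold."""
    cwnd = 1                                      # step 1b: start at one MSS
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        cwnd, threshold = tahoe_step(cwnd, threshold, r in timeout_rounds)
    return trace, threshold

# Threshold of 32 MSS, one timeout while cwnd = 40 MSS (round 13).
trace, final_threshold = simulate(threshold=32, timeout_rounds={13}, rounds=16)
```

With these inputs cwnd grows 1, 2, 4, 8, 16, 32, then linearly up to 40; after the timeout the threshold becomes 20 and cwnd restarts at 1.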
• See RFC 2001, “TCP Slow Start, Congestion Avoidance, Fast Retransmit, and Fast Recovery Algorithms,” and
RFC 3390, “Increasing TCP's Initial Window”. Other algorithms: HighSpeed TCP (RFC 3649), TCP
Friendly Rate Control (RFC 3448).
LBNL = Lawrence Berkeley National Laboratory
Example - TCP Slow Start (7)
• Slow start grows congestion window exponentially.
– Doubles every RTT while keeping ACK clock going.
(Figure: Slow start from an initial congestion window of one segment; cwnd is incremented for each new ACK.)
Example - TCP Slow Start (8)
• Additive increase grows cwnd slowly.
– Adds 1 every RTT.
– Keeps ACK clock.
(Figure: Additive increase from an initial congestion window of one segment.)
Example - TCP Slow Start (9)
TCP Tahoe (4.2BSD) - Assume that cwnd = 64 kB initially & after timeout, threshold
is set to 32 kB, & cwnd = max segment size = 1 kB. Let us follow the steps:
1. Let MSS = 1 kB; slow start begins, window is increased every time a new ACK arrives.
2. Slow start: cwnd grows exponentially.
2b. Transmission 0: cwnd = 1 kB = MSS.
3. After cwnd hits the threshold (32 kB), it grows linearly to 40 kB.
4. cwnd grows linearly.
5. After a timeout due to packet loss, the ACK clock stops and slow start begins again.
5a. After the timeout, threshold is set to 1/2 x current cwnd (40 kB) = 20 kB.
5b. cwnd = 1 kB = max segment.
(Figure: Slow start followed by additive increase in TCP Tahoe.)
BSD = Berkeley Software Distribution (various UNIX flavors), widely used by Sun Microsystems & DEC
TCP Congestion Control (10)
• Jacobson further improved the congestion control, TCP
Reno,
– Named after 4.3BSD Reno in 1990.
• For faster recovery, use sawtooth (linear) AIMD after a
packet loss.
– Retransmit lost packet after 3 duplicate ACKs.
– New packet for each duplicate ACK until loss is repaired.
(Figure: Fast recovery and the sawtooth pattern of TCP Reno. The ACK clock doesn’t stop, so there is no need to slow-start.)
TCP Congestion Control (11)
• TCP uses AIMD with loss signal to control congestion.
– Implemented as a congestion window (cwnd) for the number of
segments that may be in the network.
– Uses several mechanisms that work together.
Name | Mechanism | Purpose
ACK clock | Congestion window (cwnd) | Smooth out packet bursts
Slow-start | Double cwnd each RTT | Rapidly increase send rate to reach roughly the right level
Additive increase | Increase cwnd by 1 packet each RTT | Slowly increase send rate to probe at about the right level
Fast retransmit / recovery | Resend lost packet after 3 duplicate ACKs; send new packet for each new ACK | Recover from a lost packet without stopping ACK clock
AIMD = Additive Increase Multiplicative Decrease
TCP Congestion Control (12)
• SACK (Selective ACKs) extends ACKs with a vector to describe
received segments and hence losses.
– A negotiable option that allows more accurate retransmissions.
– A later improvement for more efficient recovery in congestion control,
(RFC 2883 & 3517); SACK is now widely used.
(Figure: Selective Acknowledgement. With cumulative ACKs alone, the sender has no way to know that packets 2
and 5 were lost.)
• Still another addition to alert the hosts to congestion is ECN (Explicit
Congestion Notification), using the ECN mechanism in the IP packet.
– A router informs the receiver via the ECN flag in the IP packet that congestion
is approaching & the receiver echoes this back to the sender in the TCP ACKs.
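To make the SACK idea concrete, the sketch below derives a cumulative ACK plus SACK ranges from the set of packets a receiver holds, and shows how the sender infers the losses. It is a simplification: real SACK blocks describe byte sequence ranges, while this example numbers whole packets from 1, and the function names are hypothetical.

```python
def sack_summary(received: set[int]) -> tuple[int, list[tuple[int, int]]]:
    """Return (highest in-order packet, list of (first, last) SACK ranges)."""
    cum = 0
    while cum + 1 in received:        # cumulative ACK covers the in-order prefix
        cum += 1
    blocks: list[list[int]] = []
    for n in sorted(p for p in received if p > cum):
        if blocks and n == blocks[-1][1] + 1:
            blocks[-1][1] = n         # extend the current contiguous range
        else:
            blocks.append([n, n])     # start a new range
    return cum, [(lo, hi) for lo, hi in blocks]

def missing(cum: int, blocks: list[tuple[int, int]]) -> list[int]:
    """Packets the sender can infer were lost from the ACK and SACK ranges."""
    covered: set[int] = set()
    for lo, hi in blocks:
        covered.update(range(lo, hi + 1))
    top = max(covered, default=cum)
    return [n for n in range(cum + 1, top) if n not in covered]

# Receiver holds packets 1, 3, 4 and 6; packets 2 and 5 were lost.
cum_ack, sack_blocks = sack_summary({1, 3, 4, 6})
lost = missing(cum_ack, sack_blocks)
```

With only the cumulative ACK (1), the sender would know just that packet 2 is outstanding; the SACK ranges reveal that 5 is missing as well.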
Retransmission Timer Management
• Jacobson (1988) proposed a dynamic algorithm.
• When a segment is sent, a retransmission timer RTO (chosen > RTT) starts in order to
– measure how long the ACK takes, and
– trigger a retransmission if the ACK takes too long.
• Let
SRTT = α × SRTT + (1 - α) × R
where SRTT = smoothed estimate of the RTT, R = measured time for the ACK to come back (current RTT), and α = smoothing factor, typically ~= 7/8.
• Then
RTO = b × SRTT
• Initial implementations used b ~= 2, i.e., RTO = 2 × SRTT, but this was
too inflexible & resulted in unexpected retransmissions.
• Jacobson proposed
RTTVAR = β × RTTVAR + (1 - β) × |SRTT - R|
with RTTVAR = RTT variation, β = 3/4, &
RTO = SRTT + 4 × RTTVAR
(the factor 4 can be implemented by a shift operation).
• This gave acceptable RTO & is easy to implement.
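The update rules above can be sketched in Python as follows. This is an illustrative model only: times are in milliseconds, the function name is hypothetical, and RTTVAR is updated before SRTT so that |SRTT - R| uses the previous estimate (an ordering assumption the slide does not spell out).

```python
ALPHA = 7 / 8   # smoothing factor for SRTT
BETA = 3 / 4    # smoothing factor for RTTVAR

def update_rto(srtt: float, rttvar: float, r: float) -> tuple[float, float, float]:
    """Fold one RTT measurement r into the estimates; returns (srtt, rttvar, rto)."""
    rttvar = BETA * rttvar + (1 - BETA) * abs(srtt - r)   # variation, using old SRTT
    srtt = ALPHA * srtt + (1 - ALPHA) * r                 # smoothed RTT
    rto = srtt + 4 * rttvar                               # retransmission timeout
    return srtt, rttvar, rto

# Previous estimates: SRTT = 100 ms, RTTVAR = 10 ms; new sample R = 120 ms.
srtt, rttvar, rto = update_rto(100.0, 10.0, 120.0)
```

A single slow sample raises RTO noticeably through the RTTVAR term, which is what lets the timer adapt when delay variance goes up.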
RTO = Retransmission Timeout, SRTT = Smoothed Round Trip Time, pdf = probability density function
Summary
• Transport layer provides cost-effective
end-to-end data
transport (source to destination) with
– Connection-oriented (reliable), or
– Connectionless (datagram)
services.
• UDP, independent datagram
– Unreliable
– used in Network Management &
Real-time applications.
• TCP for reliable transport
provides
– TCP uses a 20-byte header
– Accessible with service
primitives
– Allow segmentation
– Allow multiplex/demultiplex
multiple processes
– Implements several timers
– 3-way handshake connection
setup
– Error correction by
Retransmission
– TCP flow control with variable
sliding windows (credit).
– TCP congestion control by
Bandwidth allocation