CS514: Intermediate Course in Operating Systems Lecture 4: Sept. 5

Professor Ken Birman
Ben Atkin: TA
Lecture 4: Sept. 5
TCP Streams
• One week ago we very briefly saw
how TCP overcomes failures
• TCP is the workhorse of the Internet
– Under load, many routers drop all non-TCP traffic first!
– This is because TCP is a “good citizen”
– Every web operation uses its own TCP
connection!
• Today: look at TCP in more detail
• In what sense is the Internet “the
bottom half of the TCP protocol”?
TCP is a “stream”
protocol
• Basic concepts
• Implementation issues, usual
optimizations
• Where are the costs?
• Van Jacobson optimizations for
TCP
• Routers: RED, RSVP and RIO
• Reliability and consistency
Streams concept
• Reliable, point-to-point
communication channel
• Like a telephone connection:
– Information received in order sent, no
loss or duplication
– Call setup required before
communication is possible (in contrast
with basic message transport via UDP)
– No message structure: abstraction is a
stream of bytes
• Automatic flow control, error
correction
TCP sliding window
[Figure sequence: the sender provides data into a window of k “segments”, initially empty on both sides; IP packets carry segments m_i, m_i+1, ..., m_i+k to the receiver, which consumes data.]
• Receiver replies with acks and nacks; sender resends missing data
• When acknowledgement is received, segment number keeps incrementing but slot number is reused
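The window behavior in the figures can be sketched in a few lines. This is a toy model, not TCP itself: the function name, the `lost_first_try` parameter, and the round-based timing are assumptions made for illustration. Each segment number maps to slot `seg % k`, so segment numbers keep growing while slots are reused.

```python
def simulate_sliding_window(messages, k, lost_first_try=frozenset()):
    """Deliver `messages` in order through a window of k slots.

    lost_first_try holds segment numbers whose first transmission is
    dropped by the (simulated) network. Returns (delivered, log), where
    log lists (segment number, slot number) for every transmission.
    """
    base = 0                # oldest unacknowledged segment number
    next_new = 0            # next segment number never sent before
    received = {}           # receiver buffer: segment number -> data
    delivered = []          # data handed to the application, in order
    log = []
    already_dropped = set()

    def transmit(seg):
        log.append((seg, seg % k))        # slot number is reused mod k
        if seg in lost_first_try and seg not in already_dropped:
            already_dropped.add(seg)      # the network loses this copy
            return
        received[seg] = messages[seg]

    while base < len(messages):
        # Sender: fill every free window slot with a new segment.
        while next_new < len(messages) and next_new < base + k:
            transmit(next_new)
            next_new += 1
        # Receiver: consume in order; the ack is modeled by advancing base.
        while base in received:
            delivered.append(received.pop(base))
            base += 1
        # Receiver: nack the first gap; sender resends that segment.
        if base < next_new and base not in received:
            transmit(base)
    return delivered, log
```

With `simulate_sliding_window(list("abcdef"), 3, {1})`, segment 1 is sent twice in slot 1, and segment 3 later reuses slot 0 once segment 0 has been acknowledged.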
Typical implementation
issues?
• When to send the ack
– Send early: inefficient, channel clogged with
acks
– Send late: sender side fills window and waits
• When to send the nack
– Send early: sender will send duplicates of all
msgs
– Send late: long delay waiting for desired data
• How big to make the window
• Send messages in “bursts”?
Where are the costs?
• Excess packets sent/received: very costly
– Hence want minimal number of acks,
nacks
– Also want to avoid excess
retransmissions
• Notice “tension” between sending
acks/nacks too soon, and retransmission
too soon, and between doing so too late.
– Too soon: consumes bandwidth
– Too late: leaves processes idle
Costs (cont)
• Delays on sender side:
– Overheads associated with scheduling
(e.g. if window fills up)
• Avoiding “nervous” scheduling:
– High-water/low-water mark scheme lets
sender sit idle until there are several
window slots free
– Ideally, seek window size at which
sender, receiver are rate matched and
neither ever waits
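A minimal sketch of the high-water/low-water idea, assuming the marks are expressed as counts of unacknowledged segments. The class name and the `wakeups` counter are hypothetical, added only to make the hysteresis visible.

```python
class WatermarkSender:
    """Block at the high-water mark; wake only at the low-water mark."""

    def __init__(self, high_water, low_water):
        self.high_water = high_water   # block when this many are in flight
        self.low_water = low_water     # unblock once in-flight drops to this
        self.in_flight = 0
        self.blocked = False
        self.wakeups = 0               # how often the sender was rescheduled

    def try_send(self):
        """Return True if a segment was sent, False if the sender must wait."""
        if self.blocked:
            return False
        if self.in_flight >= self.high_water:
            self.blocked = True
            return False
        self.in_flight += 1
        return True

    def on_ack(self):
        """One segment acknowledged; wake the sender only at the low mark."""
        if self.in_flight > 0:
            self.in_flight -= 1
        if self.blocked and self.in_flight <= self.low_water:
            self.blocked = False
            self.wakeups += 1
```

Without the low-water mark, the sender would be woken after every single ack, send one segment, and block again; with it, one wakeup frees several slots at once.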
Costs (cont)
• Delays on receiver side
– Want a large enough window so that any
error correction is “in the future” for
receiver
– Don’t want to delay nacks too long (else
retransmission delayed too long)
• Nervous scheduling less of an issue
here
– Don’t use high-water/low-water scheme in
receiver
Timed approach
• Measure round-trip time (e.g. perhaps 1ms)
• Track rate of transmission for recent past
• Use to calibrate various constants:
– Nack if a missing packet is late by 50% of
expected time
– Calibrate window to be 50-75% full in steady
state
• Experience: very hard to make it work;
variability in network load/latencies too big
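The “late by 50% of expected time” rule might look like the sketch below. The class name, the history length, and the use of a plain average of recent inter-arrival gaps are all assumptions; the slide does not say how the expected time is computed.

```python
from collections import deque


class TimedNackDetector:
    """Decide when a missing packet is late enough to justify a nack."""

    def __init__(self, history=8, lateness=0.5):
        self.gaps = deque(maxlen=history)  # recent inter-arrival gaps
        self.last_arrival = None
        self.lateness = lateness           # 0.5 means "late by 50%"

    def on_packet(self, now):
        """Record a packet arrival at time `now` (any consistent unit)."""
        if self.last_arrival is not None:
            self.gaps.append(now - self.last_arrival)
        self.last_arrival = now

    def expected_gap(self):
        """Average of the recent gaps, or None if there is no history yet."""
        if not self.gaps:
            return None
        return sum(self.gaps) / len(self.gaps)

    def should_nack(self, now):
        """True once the next packet is overdue by more than `lateness`."""
        gap = self.expected_gap()
        if gap is None:
            return False
        return (now - self.last_arrival) > gap * (1 + self.lateness)
```

The weakness noted on the slide shows up directly: if the gaps in `history` vary widely, the average is a poor predictor and the detector nacks either too eagerly or too late.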
Van Jacobson
optimizations
• Dynamically adjust window size:
while no loss detected, repeatedly
increase size (linearly)
• Detect loss: halve size
(“exponential” backoff)
• Experience is very positive, many
TCP’s use this
• Also optimize to suppress unchanging
header fields
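The window rule above is commonly described as additive increase, multiplicative decrease. A minimal sketch, assuming one adjustment per round trip and ignoring slow start and timeouts, which real TCP implementations also have; the function name and parameters are hypothetical.

```python
def aimd_trace(loss_rounds, rounds, start=1, increment=1, minimum=1):
    """Return the window size used in each round.

    loss_rounds: set of round numbers in which a loss is detected.
    The window grows by `increment` after a loss-free round and is
    halved (never below `minimum`) after a round with a loss.
    """
    window = start
    trace = []
    for r in range(rounds):
        trace.append(window)
        if r in loss_rounds:
            window = max(minimum, window // 2)   # detect loss: halve size
        else:
            window += increment                  # no loss: linear increase
    return trace
```

For example, `aimd_trace({4}, 8)` gives `[1, 2, 3, 4, 5, 2, 3, 4]`: the familiar sawtooth.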
Dealing with failures
• Packets lost, duplicated, out of order:
easy, just use sequence numbers (TCP
calls these “segment” numbers)
• Sender or receiver fails, or line breaks:
– After excessive retransmissions, or
– After excessive wait for missing data, or
– After not seeing “keepalives” for too
long
... break the connection and report “end of
file”
Problems with this
approach?
• Channel can break because of a
transient condition!
• Example: overloaded machine,
connection that temporarily fails,
router crashes and must reboot
itself (all are relatively common
conditions)
• Systems with many TCP channels:
some may break but others stay
connected!
Inconsistently broken
TCP channels
[Figure sequence: clients connected to a primary and a backup server]
• Clients initially connected to primary, which keeps
backup up to date. (For example, in a database system)
• Transient problem causes some links to break but not all.
Backup thinks it is now primary, primary thinks backup is down
• Some clients still connected to primary, but one has switched
to backup and one is completely disconnected from both
Why should this matter?
• Suppose that primary and backup
are a service used for air traffic
control
• Service tells controllers which parts
of airspace are “available” for
routing flights towards airport
• Primary and backup may try and give
different controllers access to the
same airspace! Each thinks it is “in
charge” for the system as a whole!
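The danger can be made concrete with a toy model. Everything here is hypothetical (the `Server` class, the sector names, the takeover rule); it only illustrates what happens when each side treats a broken TCP connection as proof that the other side has crashed.

```python
class Server:
    """Toy airspace-allocation server that trusts TCP failure reports."""

    def __init__(self, name, in_charge):
        self.name = name
        self.in_charge = in_charge
        self.grants = {}               # sector -> controller holding it

    def on_peer_link_broken(self):
        # TCP reported "end of file": assume the peer crashed, take over.
        self.in_charge = True

    def grant(self, sector, controller):
        """Grant a sector if this server believes it is in charge."""
        if not self.in_charge or sector in self.grants:
            return False
        self.grants[sector] = controller
        return True


def conflicting_sectors(a, b):
    """Sectors that the two servers handed to different controllers."""
    return sorted(s for s in a.grants
                  if s in b.grants and a.grants[s] != b.grants[s])
```

If the primary-backup link breaks transiently while both machines stay up, both end up with `in_charge` set, and two controllers connected to different servers can each be granted the same sector.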
Subtle semantics
questions
• Are the “reliability semantics” of TCP
actually different from those of RPC?
– In both cases, what you “know” is limited to
what has been explicitly acknowledged
– Both can report “failures” when none has
occurred
– Ultimately, TCP and RPC give same
guarantees!
• Many systems run RPC over TCP as the
“reliable” RPC option. Is this different
from normal RPC?
Reliability/Consistency
summary
• TCP connections can overcome loss
of individual packets in
communication layer
• RPC protocols also overcome such
loss
• Both report failures inconsistently
• Not clear how either could be used
to implement a “safe” primary-backup
server for our ATC example!
TCP and Router Issues
[Figure: a router marked “Overload!”; label “Server”]
TCP and Router Issues
• Designers of routers need to deal
with overload
• Very hard to predict!
– Most studies show that load on routers
is nearly random
– At any point in time, most load is from
some set of TCP connections
– Goal: TCP flow control should kick in
before router is totally overloaded
Some options?
• We could just wait for the overload
to go away
– Eventually load will presumably drop
– Or routes will adapt
• But this could take a long time
– Jahanian study: can take hours for route
changes to propagate
– Usually, however, routes adapt within a
few minutes if better options exist
Some options?
• The router could send some sort of
“I’m getting overloaded” message
back to the TCP sender
– This could be done: TCP packets are
recognizable by their IP headers
– But it might be slow and when one
router is overloaded, perhaps many are –
potential for a storm of such messages
– Also seems to violate end-to-end
philosophy
• Can we signal without extra msgs?
TCP and Router Issues
[Figure: a router marked “Overload!”; labels “Server” and “Ouch! Slow down”]
Some options
• Also, keep in mind that at any
point in time, a router might be
handling thousands of TCP
connections!
– The networks crowd calls them
“flows”
– So the router, faced with load,
might have to send thousands of
separate “ouch” messages!
Some options?
• What about adding a bit to the
TCP/IP header: “encountered an
overloaded router”
– Router would set the bit if it was
overloaded
– But during overload, packets often
must be dropped
– Also, the bit would be seen at the
destination… not the sender
TCP and Router Issues
[Figure: a router marked “Overload!”; labels “Ouch!” and “Server”]
Some options
• Can we detect problems without
extra messages?
– Sender might notice a problem because
of NACKs
– Receiver could notice
• Missing packets
• Changing inter-packet spacing (assumes that
TCP normally achieves a very regular
spacing, which only happens under good
conditions)
– Problem is that by the time we notice
these things, the router may be in deep
doo-doo!
Random Early Detection
• Work was done by Van Jacobson
with Sally Floyd
• They used a network simulator to
understand how it would work.
– You can use simulators too, for your
projects
– The best one is from Estrin’s group and
is called NS-2 (widely used to evaluate
network protocols)
• Abbreviated as RED
Random Early Detection
• Idea is very simple
– Router senses that load is increasing
• It simply notices that it has less available
memory for buffering
• This is because packets are entering faster
than they can be forwarded
– Picks a packet at random and discards it
• Even though perhaps it could be forwarded
• Takes “unreliability” to a new level!
• E.g. “upper level of the bridge is crowded, so
toss a few cars off the edge”
Random Early Detection
• Receiver detects the loss and sends
a NACK
• The network isn’t completely
overloaded yet so the NACK gets
through
• Sender chokes back
• Often combined with flow control
that senses changing inter-packet
spacing
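A sketch of how a RED-style router might decide when to discard. It follows the usual formulation (a smoothed average queue length compared against two thresholds), which is more specific than the description above; all parameter values are illustrative, and the refinement that spaces drops out evenly over time is omitted.

```python
import random


class RedQueue:
    """Simplified RED: drop probability rises with average queue length."""

    def __init__(self, min_th=5, max_th=15, max_p=0.1, weight=0.2, rng=None):
        self.min_th = min_th      # below this average: never drop
        self.max_th = max_th      # at or above this average: always drop
        self.max_p = max_p        # drop probability just below max_th
        self.weight = weight      # smoothing factor for the average
        self.avg = 0.0            # smoothed queue length
        self.queue_len = 0        # packets currently buffered
        self.rng = rng or random.Random(0)

    def drop_probability(self):
        if self.avg < self.min_th:
            return 0.0
        if self.avg >= self.max_th:
            return 1.0
        return self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)

    def on_arrival(self):
        """Return True if the packet is queued, False if it is discarded."""
        self.avg = (1 - self.weight) * self.avg + self.weight * self.queue_len
        if self.rng.random() < self.drop_probability():
            return False          # early drop, even though buffer space exists
        self.queue_len += 1
        return True

    def on_departure(self):
        if self.queue_len > 0:
            self.queue_len -= 1
```

Because the drop is random, flows sending the most packets are the most likely to lose one, so the heaviest senders tend to be the first to choke back.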
How Internet
Companies think of the
Network
[Figure sequence: the network drawn as Layers 1-3; then TCP above it as Layer 4, “end-to-end”, with an “ease off!” signal and a Server]
TCP is a good citizen
• We view the Internet as the bottom
half of the TCP protocol
• And TCP is a good citizen that
behaves itself
– Chokes back as requested
– An elegant dialog between the network
and the protocol
• Notice that it is entirely stateless
– Cooperation between “network”
protocols, not “distributed system”
TCP issues
• We’ve seen that connections
can break inappropriately
• And now have seen that TCP
can choke back because of
congestion
• What if we want to run audio or
video over the Internet?
Styles of Audio/Video
• Asynchronous
– Play back a pre-recorded CD or a radio
broadcast
– Download a copy of a short news video
– For these cases, we don’t have any “real
time” requirements
• Synchronous or real-time
– More like a telephone conversation
– Need the data with short latencies
TCP challenges
• TCP works well for file transfer,
fetching web pages, email… etc
• The technology is not very good for
any sort of real-time use
– Telephone over the Internet
– Media delivery that lasts a long time and
can’t be transferred in advance, like a
live broadcast
• Also, not very robust against various
forms of attack by intruders
Research on better TCP
• One idea is to reserve resources
– RSVP: Resource Reservation Protocol
– Proposed by Zhang, Deering, Estrin, Shenker and others
– Idea is to set aside resources needed for
this TCP stream
• What’s a resource?
– Buffering space in routers
– Guarantee of a percentage of bandwidth
on the links out of routers
RSVP
• How it works:
– When making the TCP connection, user
specifies desired quality of service
(QoS)
– A reservation request is sent to the
destination
• Hop by hop we set aside the needed
resources
• Called a “lease”
• Upon successful traversal of the network, the
TCP session can start
– Now, each time a packet arrives, router
must do flow classification
Keeping RSVP stateless
• How can we avoid a form of
shared state between clients
and routers?
– Leases are designed to vanish if
not renewed
– They have a timeout, perhaps 10s
– Renewal benefits from QoS
properties of the connection!
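The lease idea can be sketched as a table of expiry times kept by each router. The class and method names are hypothetical, and the 10-second default simply mirrors the “perhaps 10s” above.

```python
class LeaseTable:
    """Per-router reservations that vanish unless renewed (soft state)."""

    def __init__(self, timeout_s=10.0):
        self.timeout_s = timeout_s
        self.expiry = {}           # flow id -> time at which the lease lapses

    def reserve_or_renew(self, flow, now):
        """Create a lease or push its expiry back; both are the same act."""
        self.expiry[flow] = now + self.timeout_s

    def expire(self, now):
        """Drop lapsed leases and return the flows that lost theirs."""
        lapsed = [f for f, t in self.expiry.items() if t <= now]
        for f in lapsed:
            del self.expiry[f]
        return lapsed

    def has_reservation(self, flow, now):
        self.expire(now)
        return flow in self.expiry
```

The router never needs to be told that a client has crashed or moved away: the reservation disappears on its own once renewals stop arriving.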
RSVP criticisms?
• Doesn’t work well if network routes
change dynamically or failures occur
• Router slows down because
– Flow classification is hard work
– Needs enough resource for guarantees
• Very hard to bill the user
– Resources cost real money!
– In this case, many ISPs participate in
session: how to split costs?
– ISPs may not want to disclose their
route information!
RSVP: A dead standard?
• Everyone knows the acronym
• Corresponds to an IETF standard
• But seems unlikely to be used
– Core issue is cost
– Number of reservations could rise with
number of endpoints squared!
– And most reserved resources sit mostly unused…
– And the billing issue may sound silly, but
not to ISPs
• Road Runner is a typical ISP: Internet
Service Provider (seller of Internet access)
How else can we get
QoS?
• One could argue that RSVP is not
really an end-to-end solution
• Problem is that routers have a form
of shared state, even if only leased
• Led to proposals by Clark and others
at MIT for an end-to-end
approximation with similar behavior
• Called Diffsrv (Diffserv): Differentiated
Services
Diffsrv idea
• Basic idea is that reservation is
tracked at entry to the network
– Need a form of network service to figure
out if reservation, theoretically, can be
satisfied
• Packets are marked “in profile” or
“out of profile”
– E.g. I reserve 100 kbit/s and am in profile
if I send < 100 kbit/s, out of profile if I
exceed my reservation
– Requires a single bit per packet
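One way to set the profile bit at the network edge is a token bucket. This is an assumption; the slide does not say how the sending rate is measured, and the class name and `burst_bits` parameter are hypothetical.

```python
class ProfileMarker:
    """Mark packets in profile while the sender stays within its rate."""

    def __init__(self, rate_bits_per_s, burst_bits):
        self.rate = rate_bits_per_s    # reserved rate
        self.burst = burst_bits        # bucket depth (tolerated burst)
        self.tokens = burst_bits       # bucket starts full
        self.last = 0.0                # time of the previous packet

    def mark(self, now, packet_bits):
        """Return True for in profile, False for out of profile."""
        # Refill tokens for the elapsed time, capped at the bucket depth.
        self.tokens = min(self.burst,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if packet_bits <= self.tokens:
            self.tokens -= packet_bits
            return True
        return False
```

With a 100 kbit/s reservation and a 10 kbit bucket, a second 8 kbit packet sent back-to-back with the first is marked out of profile, while one sent 0.1 s later is in profile again.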
RIO: RED with In/Out bits
• Routers now implement RED but
selectively drop out of profile (or
unreserved) packets in preference to
in-profile packets
• In limit, router drops all packets
except in-profile packets
• Statistically should average out
much as if real reservations were
done… but…
RIO: RED with In/Out bits
• Keep in mind that the Internet isn’t a
synchronous system
• During congestion one can easily
have bursts of load or other big
fluctuations
• This means that even for in-profile
packets and even with no outside
load a router could still become
overloaded!
• Thus, RIO can’t guarantee QoS,
unlike RSVP which can!
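The preferential dropping can be sketched as two RED-style curves, a lenient one for in-profile packets and an aggressive one for everything else. The thresholds are illustrative, and a single average queue length is used for brevity; published RIO designs typically track separate averages for in-profile and total traffic.

```python
def red_probability(avg_queue, min_th, max_th, max_p):
    """RED-style drop probability for a given average queue length."""
    if avg_queue < min_th:
        return 0.0
    if avg_queue >= max_th:
        return 1.0
    return max_p * (avg_queue - min_th) / (max_th - min_th)


# Hypothetical parameter sets: out-of-profile packets are dropped
# earlier and more aggressively than in-profile packets.
IN_PROFILE = dict(min_th=40, max_th=70, max_p=0.02)
OUT_OF_PROFILE = dict(min_th=10, max_th=30, max_p=0.5)


def rio_drop_probability(avg_queue, in_profile):
    """Pick the RED curve according to the packet's profile bit."""
    params = IN_PROFILE if in_profile else OUT_OF_PROFILE
    return red_probability(avg_queue, **params)
```

Note that the in-profile curve still reaches 1.0 once the queue is long enough, which is the point made above: a burst of in-profile traffic alone can overload the router.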
Convoy phenomenon
[Figure: routers A, B and C. Load injected is steady, but router B sees no load while C is overloaded]
Convoy phenomenon
• A well known problem in networks and
other systems
• Things “bunch up”
– Often due to locks associated with concurrency
– Also, little rate mismatches
– Once things start to pile up, convoys can form
• In many systems, most elements are idle
except for one component which is
overloaded!
• Convoys are the threat when using RIO
Readings and homework
• For hackers: Graph the performance
(latency, throughput) of TCP as seen
by receivers as one sender sends
the same data to gradually
increasing numbers of destinations
(1, 2, ...). How much of the behavior
can you “explain”?
• Read about CORBA (Chapter 6)