CS514: Intermediate Course in Operating Systems Lecture 24: Nov. 16

Professor Ken Birman
Ben Atkin: TA
Solving real problems with real-time protocols
• When does real-time matter?
– Air traffic application: we want rapid
response when events occur
– Telecommunications application: switch
requires real-time reactions to events
that occur
• Two categories of real-time
– We want an action to be predictably fast
– We want an action to occur before a
deadline passes
Predictability
• If this is our goal…
– Any well-behaved mechanism may
be adequate
– But we should be careful about
uncommon disruptive cases
• For example, cost of failure handling
is often overlooked
• Risk is that an infrequent scenario
will be very costly when it occurs
Predictability: Examples
• Probabilistic multicast protocol
– Very predictable if our desired latencies
are larger than the protocol's expected
convergence time
– Much less so if we seek latencies that
bring us close to the expected latency of
the protocol itself
• Rule of thumb?
– Real-time doesn’t mean “as fast as
possible” – more often it means “slow
and steady” !
Mixing issues
• Telephone networks need a
mixture of properties
– Real-time response
– High performance
– Stable behavior even when failures
and recoveries occur
• Can we use our tools to solve
such a problem?
Friedman’s SS7 experiment
• Used Horus to emulate a
telephone switching system
• Idea is to control a telephone
switch that handles 800
telephone numbers in software
• Horus runs the “800 number
database” on a cluster of
processors next to the switch
IN coprocessor example
[Figure: four SS7 switches, shown first on their own and then with four coprocessors]
Role of coprocessor
• A simple database
• Basically
– Switch does a query
• How should I route a call to 617-253-8117
from 607-227-3919?
• Reply: use output line 6
– Time limit of 100ms on transaction
• Also runs a background protocol to
update the database as things
change, on a separate network…
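The query above can be sketched as a lookup with a deadline check. This is a minimal illustration, not the code used in the experiment: the `ROUTES` table, the `route_call` name, and the injectable `clock` parameter are all assumptions made for the example.

```python
import time

# Hypothetical in-memory routing table: dialed number -> output line.
ROUTES = {"617-253-8117": 6}

DEADLINE_S = 0.100  # the 100 ms transaction limit from the slide


def route_call(dialed, caller, table=ROUTES, clock=time.monotonic):
    """Return the output line for a call, or None if unknown or too late."""
    start = clock()
    line = table.get(dialed)  # `caller` is unused in this toy lookup
    elapsed = clock() - start
    if line is None or elapsed > DEADLINE_S:
        return None  # no route known, or the reply would miss the deadline
    return line
```

A reply that arrives after the deadline is treated the same as no reply, which is the sense in which the 100ms limit is a hard bound rather than a performance target.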
Goals for coprocessor
• Right now, people use hardware
fault-tolerant machines for this
– E.g. Stratus “pair and a spare”
– Mimics one computer but tolerates
hardware failures
– Performance an issue…
Goals for coprocessor
• What we want
– Scalability: ability to use a cluster of
machines for the same task, with better
performance when we use more nodes
– Fault-tolerance: a crash or recovery
shouldn’t disrupt the system
– Real-time response: must satisfy the
100ms limit at all times
• Desired: “7 to 9-nines availability”
• Downtime: any period when a series
of requests might all be rejected
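To get a feel for what "7 to 9-nines" means, the allowed downtime per year can be computed directly. The sketch below assumes a 365-day year and treats availability as the fraction of time the system is up; the function name is made up for the example.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000, ignoring leap years


def downtime_seconds_per_year(nines):
    """Allowed downtime per year at an availability of `nines` nines."""
    unavailability = 10.0 ** (-nines)
    return SECONDS_PER_YEAR * unavailability


for n in (7, 8, 9):
    print(f"{n} nines: {downtime_seconds_per_year(n):.4f} s/year")
```

Seven nines allows roughly 3.15 seconds of downtime per year, and each extra nine divides that by ten.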
IN coprocessor example
[Figure: SS7 switch, two EA processors, and the QE processors]
• Switch itself asks for help when a remote number call is sensed
• External adaptor (EA) processors run the query protocol
• Query Element (QE) processors do the number lookup (in-memory database)
– Goals: scalable memory without loss of processing performance as the number of nodes is increased
• Primary-backup scheme adapted (using small Horus process groups) to provide fault-tolerance with real-time guarantees
Options?
• A simple scheme:
– Organize nodes as groups of 2
processes
– Use virtual synchrony multicast
• For query
• For response
• Also for updates and membership
tracking
– A bit like our ATC example…
IN coprocessor example
[Figure sequence: SS7 switch, two EA processors, and the query elements]
• Step 1: Switch sees incoming request
• Step 2: Switch waits while EA processes multicast the request to a group of query elements (“partitioned” database)
• Step 3: The query elements do the query in duplicate
• Step 4: They reply to the group of EA processes
• Step 6: EA processes reply to switch, which routes call
Results?
• Terrible performance!
– Solution has 2 Horus multicasts on each
critical path
– Experience: about 600 queries per
second but no more
• Also: slow to handle failures
– Freezes for as long as 6 seconds
• Performance doesn’t improve much
with scale either
Next try?
• Consider taking Horus off the critical
path
• Idea is to continue using Horus
– It manages groups
– And we use it for updates to the
database and for partitioning the QE set
• But no multicasts on critical path
– Instead use a hand-coded scheme
Roy’s hand-coded scheme
• Queue up a set of requests from
an EA to a QE
• Periodically, sweep the set into
a message and send as a batch
• Process, also as a batch
• Send the batch of replies back
to EA
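The queue, sweep, process, and reply steps can be sketched as follows. This is an illustration of the batching idea only; the `Batcher` class, the `send` callback, and `process_batch` are hypothetical names, and the real scheme used hand-optimized I/O rather than anything like this.

```python
from collections import deque


class Batcher:
    """Queue requests from one EA to one QE and flush them as a batch."""

    def __init__(self, send):
        self._pending = deque()
        self._send = send  # hypothetical transport callback: takes a list

    def enqueue(self, request):
        self._pending.append(request)

    def sweep(self):
        """Called periodically: send everything queued so far as one message."""
        if not self._pending:
            return 0
        batch = list(self._pending)
        self._pending.clear()
        self._send(batch)
        return len(batch)


def process_batch(batch, lookup):
    """QE side: answer a whole batch and return the replies as one batch."""
    return [(req, lookup(req)) for req in batch]
```

The point of the design is that the per-message cost is paid once per sweep rather than once per request, so throughput rises at the price of a bounded queuing delay.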
Clever twists?
• Split into a primary and
secondary EA for each request
– Secondary steps in if no reply
seen in 50ms
– Batch size calculated so that 50ms
should be “long enough”
• Hand optimized I/O and batching
code…
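A rough sketch of the timing logic, using integer microseconds to avoid rounding surprises. The 50ms and 100ms values come from the slides; the function names, the round-trip and per-request cost parameters, and the rule that the secondary gives up once the 100ms deadline has passed are assumptions for illustration.

```python
SECONDARY_TIMEOUT_US = 50_000   # 50 ms, from the slide
DEADLINE_US = 100_000           # overall 100 ms limit


def secondary_should_step_in(sent_at_us, now_us, reply_seen):
    """True when the secondary EA should take over a request."""
    waited = now_us - sent_at_us
    # Assumption: after the deadline there is no point in retrying.
    return (not reply_seen) and SECONDARY_TIMEOUT_US <= waited < DEADLINE_US


def max_batch_size(round_trip_us, per_request_us, budget_us=SECONDARY_TIMEOUT_US):
    """Largest batch whose processing plus round trip fits in the budget."""
    if per_request_us <= 0:
        raise ValueError("per_request_us must be positive")
    return max(0, (budget_us - round_trip_us) // per_request_us)
```

For example, with an assumed 10ms round trip and 1ms of work per request, `max_batch_size(10_000, 1_000)` gives 40 requests per batch.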
Results?
• Able to sustain 22,000 emulated
telephone calls per second
• Able to guarantee response within
100ms and no more than 3% of calls
are dropped (randomly)
• Performance is not hurt by a single
failure or recovery while switch is
running
• Can put database in memory:
memory size increases with number
of nodes in cluster
Keys to success
• Horus is doing the hard work of
configuration management
– But configuration is only “read” by code
on critical path
– Horus is not really in the performance-critical section of code
• Also: need enough buffering space to
keep running while a failure is
sensed and reported
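One way to picture "configuration is only read on the critical path" is a membership view that is replaced wholesale off the critical path and merely read on it. The sketch below is an assumption-laden toy: `Membership`, `install_view`, and `pick_qe` are invented names, the modulo partitioning is a stand-in for whatever partitioning the experiment used, and nothing here reflects the Horus API.

```python
class Membership:
    """Holds the current view; written off the critical path, read on it."""

    def __init__(self, members):
        self._view = tuple(members)  # immutable snapshot

    def install_view(self, members):
        # Called by the (hypothetical) group-management side on a change.
        self._view = tuple(members)  # replace the snapshot in one assignment

    def current(self):
        # Critical path: read only, no locks, no multicast.
        return self._view


def pick_qe(view, dialed):
    """Map a dialed number to a QE in the current view (toy partitioning).

    Assumes `view` is non-empty and `dialed` contains at least one digit.
    """
    digits = "".join(c for c in dialed if c.isdigit())
    return view[int(digits) % len(view)]
```

The query path calls `current()` once per request and works from that snapshot, so a membership change never blocks a query in progress.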
Coprocessors galore
• SS7 thinks of the scalable
cluster as a coprocessor
• But coprocessor thinks of Horus
as a sort of coprocessor
– It sits off to one side
– Reports membership changes
– But “interface” is really just a
shared memory segment
Same problem with Totem or CASD?
• Can’t use these technologies with
100ms timeout! The basic delivery
latency already exceeds 100ms
• Could probably tune either protocol
to this setup
• ... but Friedman can probably double
his performance too, by tuning Horus
to the setup
• Conclusion is that real-time should
be understood from needs of
application, not a specific theory
Other settings with a strong temporal element
• Load balancing
– Idea is to track load of a set of
machines
– Can do this at an access point or
in the client
– Then want to rebalance by issuing
requests preferentially to less
loaded servers
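A minimal sketch of the selection step, assuming each server reports a load value with a timestamp. The `pick_server` name, the data layout, and the `max_age_s` freshness cutoff are assumptions, and always taking the minimum is a simplification of "preferentially".

```python
def pick_server(loads, now, max_age_s=1.0):
    """Pick the least-loaded server among those with a fresh load report.

    `loads` maps server name -> (load, reported_at). `max_age_s` is an
    assumed freshness cutoff, not a value from the lecture.
    """
    fresh = {s: load for s, (load, ts) in loads.items() if now - ts <= max_age_s}
    if not fresh:
        return None  # no usable load information
    return min(fresh, key=fresh.get)
```

Ignoring stale reports matters because a server that looked idle a while ago may be the busiest one now.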
Load-balancing with an external adaptor
[Figure: external adaptor (EA)]
Load-balancing on client
[Figure: the client receives a load summary and picks a lightly-loaded machine]
Load balancing in farms
• Akamai widely cited
– They download the rarely-changing
content from customer web sites
– Distribute this to their own web farm
– Then use a hacked DNS to redirect web
accesses to a close-by, less-loaded
machine
• Real-time aspects?
– The data on which this is based needs to
be fresh or we’ll send to the wrong server
Real-time in industry
• Very common in factory settings
– At time t start the assembly line
– Planning: from time t0 to t1 produce
MIPS CPU chips on fab-unit 16…
– If the pressure rises too quickly, reduce
the temperature
• Often, we use real-time operating
systems in support of such
applications
Robotics, embedded systems
• Many emerging applications involve
coordination of action by many
components
• E.g. robots that cooperate to
construct something
• Demand for real-time embedded
systems technology will be
widespread in industry
• Little is understood about networks
in such settings… a big opportunity
Future directions in real-time
• Expect GPS time sources to be
common within five years
• Real-time tools like periodic process
groups will also be readily available
(members take actions in a
temporally coordinated way)
• Increasing focus on predictable high
performance rather than provable
worst-case performance
• Increasing use of probabilistic
techniques
Future Directions
• David Tennenhouse (MIT, then
DARPA ITO, then MCI):
– Get real
– Get small
– Get moving!
Conclusions?
• Protocols like pbcast are potentially
appealing in a subset of applications that
are naturally probabilistic to begin with,
and where we may have knowledge of
expected load levels, etc.
• More traditional virtual synchrony
protocols with strong consistency
properties make more sense in standard
networking settings
• Many ways to combine temporal and logical
properties
Ending on a thought question
• Distributed systems depend on many
layers of software, hardware, and many
assumptions
• New wave of embedded systems will
demand real-time solutions!
• Are such systems ultimately probabilistic,
or ultimately deterministic?
• Do current reliable systems converge
towards deterministic behavior or
converge towards chaotic behaviors?