CS514: Intermediate Course in Operating Systems
Professor Ken Birman
TA: Ben Atkin
Lecture 24: Nov. 16

Solving real problems with real-time protocols
• When does real-time matter?
  – Air traffic application: we want rapid response when events occur
  – Telecommunications application: the switch requires real-time reactions to events that occur
• Two categories of real-time
  – We want an action to be predictably fast
  – We want an action to occur before a deadline passes

Predictability
• If this is our goal…
  – Any well-behaved mechanism may be adequate
  – But we should be careful about uncommon disruptive cases
• For example, the cost of failure handling is often overlooked
• The risk is that an infrequent scenario will be very costly when it occurs

Predictability: Examples
• Probabilistic multicast protocol
  – Very predictable if our desired latencies are larger than the expected convergence time
  – Much less so if we seek latencies close to the expected latency of the protocol itself
• Rule of thumb?
  – Real-time doesn't mean "as fast as possible" – more often it means "slow and steady"!

Mixing issues
• Telephone networks need a mixture of properties
  – Real-time response
  – High performance
  – Stable behavior even when failures and recoveries occur
• Can we use our tools to solve such a problem?

Friedman's SS7 experiment
• Used Horus to emulate a telephone switching system
• Idea is to control a telephone switch that handles "800 number" (toll-free) calls in software
• Horus runs the 800-number database on a cluster of processors next to the switch

IN coprocessor example
[Diagram: a network of SS7 switches]

IN coprocessor example
[Diagram: the same SS7 switches, each with a coprocessor attached]

Role of coprocessor
• A simple database
• Basically
  – The switch does a query
    • How should I route a call to 617-253-8117 from 607-227-3919?
    • Reply: use output line 6
  – Time limit of 100ms on the transaction
• Also runs a background protocol to update the database as things change, on a separate network…

Goals for coprocessor
• Right now, people use hardware fault-tolerant machines for this
  – E.g. Stratus "pair and a spare"
  – Mimics one computer but tolerates hardware failures
  – Performance is an issue…

Goals for coprocessor
• What we want
  – Scalability: ability to use a cluster of machines for the same task, with better performance when we use more nodes
  – Fault-tolerance: a crash or recovery shouldn't disrupt the system
  – Real-time response: must satisfy the 100ms limit at all times
• Desired: "7 to 9 nines" of availability
• Downtime: any period when a series of requests might all be rejected

IN coprocessor example
[Diagram: the switch itself asks for help when a call to a remote number is sensed]
• External adaptor (EA) processors run the query protocol
• Query element (QE) processors do the number lookup (in-memory database)
• Goal: scalable memory without loss of processing performance as the number of nodes is increased
• A primary-backup scheme is adapted (using small Horus process groups) to provide fault-tolerance with real-time guarantees

Options?
• A simple scheme (sketched below):
  – Organize nodes as groups of 2 processes
  – Use virtual synchrony multicast
    • For the query
    • For the response
    • Also for updates and membership tracking
  – A bit like our ATC example…
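A minimal sketch of this simple scheme, in Python rather than Horus code: the database contents, group size, and function names are hypothetical, and real group multicast, failure handling, and membership tracking are elided. It condenses the step-by-step flow shown on the next slides, and makes the cost visible – two group multicasts sit on the critical path of every query.

ROUTES = {"617-253-8117": 6}        # toy in-memory "800 number" database
QE_GROUP = [0, 1]                    # each partition is a group of 2 QE replicas

def qe_lookup(qe_id, callee):
    # Every QE in the group performs the same lookup (the "think" step).
    return ROUTES.get(callee)

def ea_handle_query(callee):
    # Multicast 1: deliver the query to every member of the QE group.
    replies = [qe_lookup(qe, callee) for qe in QE_GROUP]
    # Multicast 2: the duplicated replies return to the EA group;
    # the replicas must agree, and one answer is forwarded to the switch.
    assert len(set(replies)) == 1, "QE replicas disagree"
    return replies[0]

print("route call on output line", ea_handle_query("617-253-8117"))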
IN coprocessor example
[Diagram] Step 1: The switch sees an incoming request

IN coprocessor example
[Diagram] Step 2: The switch waits while the EA processes multicast the request to a group of query elements ("partitioned" database)

IN coprocessor example
[Diagram] Step 3: The query elements do the query in duplicate (both "think")

IN coprocessor example
[Diagram] Step 4: They reply to the group of EA processes

IN coprocessor example
[Diagram] Step 5: The EA processes reply to the switch, which routes the call

Results?
• Terrible performance!
  – The solution has 2 Horus multicasts on each critical path
  – Experience: about 600 queries per second, but no more
• Also: slow to handle failures
  – Freezes for as long as 6 seconds
• Performance doesn't improve much with scale either

Next try?
• Consider taking Horus off the critical path
• Idea is to continue using Horus
  – It manages groups
  – And we use it for updates to the database and for partitioning the QE set
• But no multicasts on the critical path
  – Instead, use a hand-coded scheme

Roy's hand-coded scheme
• Queue up a set of requests from an EA to a QE
• Periodically, sweep the set into a message and send it as a batch
• Process, also as a batch
• Send the batch of replies back to the EA

Clever twists?
• Split into a primary and secondary EA for each request
  – The secondary steps in if no reply is seen within 50ms
  – Batch size calculated so that 50ms should be "long enough"
• Hand-optimized I/O and batching code…

Results?
• Able to sustain 22,000 emulated telephone calls per second
• Able to guarantee response within 100ms, with no more than 3% of calls dropped (randomly)
• Performance is not hurt by a single failure or recovery while the switch is running
• Can put the database in memory: memory size increases with the number of nodes in the cluster

Keys to success
• Horus is doing the hard work of configuration management
  – But the configuration is only "read" by code on the critical path
  – Horus is not really in the performance-critical section of code
• Also: need enough buffering space to keep running while a failure is sensed and reported

Coprocessors galore
• SS7 thinks of the scalable cluster as a coprocessor
• But the coprocessor thinks of Horus as a sort of coprocessor
  – It sits off to one side
  – Reports membership changes
  – But the "interface" is really just a shared memory segment

Same problem with Totem or CASD?
• Can't use these technologies with a 100ms timeout! The basic delivery latency already exceeds 100ms
• Could probably tune either protocol to this setup
• … but Friedman can probably double his performance too, by tuning Horus to the setup
• Conclusion: real-time should be understood from the needs of the application, not from a specific theory

Other settings with a strong temporal element
• Load balancing
  – Idea is to track the load of a set of machines
  – Can do this at an access point or in the client
  – Then rebalance by issuing requests preferentially to less loaded servers

Load-balancing with an external adaptor
[Diagram: the EA collects a load summary and picks a lightly loaded machine]

Load-balancing on client
[Diagram: the client uses the load summary to pick a lightly loaded machine]

Load balancing in farms
• Akamai widely cited
  – They download the rarely-changing content from customer web sites
  – Distribute this to their own web farm
  – Then use a hacked DNS to redirect web accesses to a close-by, less-loaded machine
• Real-time aspects? (see the sketch below)
  – The data on which this is based needs to be fresh, or we'll send requests to the wrong server
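A small Python sketch of the freshness issue in access-point (external adaptor) load balancing. All names and the 0.5-second staleness bound are hypothetical: servers report their load periodically, and the adaptor sends each new request to the least-loaded server whose report is still fresh, since acting on stale load data steers requests to an already-busy machine.

import time

MAX_AGE = 0.5   # hypothetical bound (seconds) on how old a load report may be

class LoadBalancer:
    def __init__(self):
        self.loads = {}                          # server -> (load, report time)

    def report(self, server, load):
        # Called when a server sends its periodic load summary.
        self.loads[server] = (load, time.time())

    def pick(self):
        # Choose the lightly loaded machine among servers with fresh reports.
        now = time.time()
        fresh = {s: load for s, (load, t) in self.loads.items()
                 if now - t <= MAX_AGE}
        if not fresh:
            raise RuntimeError("no fresh load reports")
        return min(fresh, key=fresh.get)

lb = LoadBalancer()
lb.report("server-a", 0.7)
lb.report("server-b", 0.2)
print(lb.pick())                                 # -> server-b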
Real-time in industry
• Very common in factory settings
  – At time t, start the assembly line
  – Planning: from time t0 to t1, produce MIPS CPU chips on fab-unit 16…
  – If the pressure rises too quickly, reduce the temperature
• Often, we use real-time operating systems in support of such applications

Robotics, embedded systems
• Many emerging applications involve coordination of action by many components
• E.g. robots that cooperate to construct something
• Demand for real-time embedded systems technology will be widespread in industry
• Little is understood about networks in such settings… a big opportunity

Future directions in real-time
• Expect GPS time sources to be common within five years
• Real-time tools like periodic process groups will also be readily available (members take actions in a temporally coordinated way)
• Increasing focus on predictable high performance rather than provable worst-case performance
• Increasing use of probabilistic techniques

Future directions
• David Tennenhouse (MIT, then DARPA ITO, then MCI):
  – Get real
  – Get small
  – Get moving!

Conclusions?
• Protocols like pbcast are potentially appealing in a subset of applications that are naturally probabilistic to begin with, and where we may have knowledge of expected load levels, etc.
• More traditional virtual synchrony protocols with strong consistency properties make more sense in standard networking settings
• There are many ways to combine temporal and logical properties

Ending on a thought question
• Distributed systems depend on many layers of software and hardware, and on many assumptions
• The new wave of embedded systems will demand real-time solutions!
• Are such systems ultimately probabilistic, or ultimately deterministic?
• Do current reliable systems converge toward deterministic behavior, or toward chaotic behavior?