Working with Mike on Distributed Computing Theory, 1978-1992
Nancy Lynch
Theory of Distributed Systems Group
MIT CSAIL
Fischer Fest
July 2003
My joint papers with Mike
• Abstract complexity theory [Lynch, Meyer, Fischer 72]
[Lynch, Meyer, Fischer 72, 76]
• Lower bounds for mutual exclusion
[Burns, Fischer, Jackson, Lynch, Peterson 78, 82]
• K-exclusion [Fischer, Lynch, Burns, Borodin 79]
• Describing behavior and implementation of distributed
systems [Lynch, Fischer 79, 80, 81]
• Decomposing shared-variable algorithms
[Lynch, Fischer 80, 83]
• Optimal placement of resources in networks
[Fischer, Griffeth, Guibas, Lynch 80, 81, 92]
• Time-space tradeoff for sorting [Borodin, Fischer,
Kirkpatrick, Lynch, Tompa 81]
More joint papers
• Global snapshots of distributed computations
[Fischer, Griffeth, Lynch 81, 82]
• Lower bound on rounds for Byzantine agreement
[Fischer, Lynch 81, 82]
• Synchronous vs. asynchronous distributed systems
[Arjomandi, Fischer, Lynch 81, 83]
• Efficient Byzantine agreement algorithms
[Dolev, Fischer, Fowler, Lynch, Strong 82]
• Impossibility of consensus
[Fischer, Lynch, Paterson 82, 83, 85]
• Colored ticket algorithm
[Fischer, Lynch, Burns, Borodin 83]
• Easy impossibility proofs for distributed consensus
problems [Fischer, Lynch, Merritt 85, 86]
And still more joint papers!
• FIFO resource allocation in small shared space
[Fischer, Lynch, Burns, Borodin 85, 89]
• Probabilistic analysis of network resource allocation
algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
• Reliable communication over unreliable channels
[Afek, Attiya, Fekete, Fischer, Lynch, Mansour, Wang,
Zuck 92, 94]
• 16 major projects…14 in the area of distributed
computing theory…
• Some well known, some not…
In this talk:
• I’ll describe what these papers are about, and
what I think they contributed to the field of
distributed computing theory.
• Put in context of earlier/later research.
• Give you a little background about why/how we
wrote them.
By topic:
1. Complexity theory
2. Mutual exclusion and related problems
3. Semantics of distributed systems
4. Sorting
5. Resource placement in networks
6. Consistent global snapshots
7. Synchronous vs. asynchronous distributed systems
8. Distributed consensus
9. Reliable communication from unreliable channels
1. Prologue
(Before distributed computing)
• MIT, 1970-72
• Mike Fischer + Albert Meyer’s research group in
algorithms and complexity theory.
– Amitava Bagchi, Donna Brown, Jeanne Ferrante, David
Johnson, Dennis Kfoury, me, Robbie Moll, Charlie Rackoff,
Larry Stockmeyer, Bostjan Vilfan, Mitch Wand, Frances Yao,…
• Lively…energetic…ideas…parties…fun
• My papers with Mike (based on my thesis):
– Priority arguments in complexity theory
[Lynch, Meyer, Fischer 72]
– Relativization of the theory of computational complexity
[Lynch, Meyer, Fischer 72, 76]
Abstract Complexity Theory
• Priority arguments in complexity theory
[Lynch, Meyer, Fischer 72]
• Relativization of the theory of computational complexity
[Lynch, Meyer, Fischer 72, Trans AMS 76]
• What are these papers about?
– They show the existence of pairs, A and B, of recursive problems
that are provably hard to solve, even given the other as an oracle.
– In fact, given any hard A, there’s a hard B that doesn’t help.
• What does this have to do with Distributed Computing
Theory?
– Nothing at all.
– Well, foreshadows our later focus on lower bounds/impossibility
results.
– Suggests study of relative complexity (and computability) of
problems, which has become an important topic in DCT.
2. Many Years Later…Mutual
Exclusion and Related Problems
• Background:
– We met again at a 1977 Theory conference, decided the world
needed a theory for distributed computing.
– Started working on one…with students: Jim Burns, Paul Jackson
(Georgia Tech), Gary Peterson (U. Wash.)
– Lots of visits, Georgia Tech and U. Wash.
– Mike’s sabbatical at Georgia Tech, 1980
– Read lots of papers:
• Dijkstra---Mutual exclusion,…
• Lamport---Time clocks,…
• Johnson and Thomas---Concurrency control
• Cremers and Hibbard---Lower bound on size of shared memory to
solve 2-process mutual exclusion
– Finally, we wrote one:
• Data Requirements for Implementation of N-Process Mutual Exclusion Using a Single Shared Variable [Burns, Fischer, Jackson, Lynch, Peterson 78, 82]
Mutual Exclusion
• Data Requirements for Implementation of N-Process
Mutual Exclusion Using a Single Shared Variable
[Burns, F, Jackson, L, Peterson ICPP 78, JACM 82]
• What is this paper about?
– N processes accessing read-modify-write shared memory,
solving N-process lockout-free mutual exclusion.
– Lower bound of (N+1)/2 memory states.
• Constructs bad executions that “look like” other executions,
as far as processes can tell.
– For bounded waiting, lower bound of N+1 states.
– Nearly-matching algorithms
• Based on distributed simulation of a centralized scheduler
process.
Mutual Exclusion
• Theorem: Lower bound of N states, bounded waiting:
– Let processes 1,…N enter the trying region, one by one.
– If the memory has < N states, two processes, say i and j, must
leave the memory in the same state.
– Then processes i+1,…j are hidden and can be bypassed
arbitrarily many times.
• Theorem: Lower bound of N+1 states, bounded waiting:
– Uses a slightly more complicated strategy of piecing together
execution fragments.
• Theorem: Lower bound of N/2 states, for lockout-freedom.
– Still more complicated strategy.
• Two algorithms…
– Based on simulating a centralized scheduler
– Bounded waiting algorithm with only ~N states.
– Surprising lockout-free algorithm with only ~N/2 states!
Mutual Exclusion
• Why is this interesting?
– Cute algorithms and lower bounds.
– Some of the first lower bound results in DCT.
– “Looks like” argument for lower bounds, typical of many later
impossibility arguments.
– Virtual scheduler process provides some conceptual modularity,
for the algorithms.
– We worked out a lot of formal definitions:
• State-machine model for asynchronous shared memory
systems.
• With liveness assumptions, input/output distinctions.
• Exclusion problems, safety and liveness requirements.
Decomposing Algorithms Using a
Virtual Scheduler
• Background:
– Continuing on the same theme…
– Mutual exclusion algorithms (ours and others’) were
complicated, hard to understand and prove correct.
– We realized we needed ways of decomposing them.
– Our algorithms used a virtual scheduler, informally.
– Could we make this rigorous?
– We did:
• A Technique for Decomposing Algorithms that Use
a Single Shared Variable [Lynch, Fischer 80,
JCSS 83]
Decomposing Distributed Algorithms
• A Technique for Decomposing Algorithms that Use a
Single Shared Variable [Lynch, Fischer 80, JCSS 83]
• What is this paper about?
– Defines a “supervisor system”, consisting of N contending processes + one distinguished, permanent supervisor process.
– Shows that, under certain conditions, a system of N contenders with no supervisor can simulate a supervisor system.
– Generalizes, makes explicit, the technique used in [BFJLP 78, 82].
– Applies this method to two mutual exclusion algorithms.
Decomposing Distributed Algorithms
• Why is this interesting?
– Explicit decomposition of a complex distributed
algorithm.
– Prior to this point, distributed algorithms were
generally presented as monolithic entities.
– Foreshadows later extensive uses of decomposition,
transformations.
– Resulting algorithm easier to understand.
– Not much increase in costs.
Generalization to K-Exclusion
• Background:
– Still continuing on the same theme…
– Started inviting other visitors to join us (Borodin, Lamport,
Arjomandi,…)
– Our mutual exclusion algorithms didn’t tolerate stopping
failures, either in critical region or in trying/exit regions.
– So we added in some fault-tolerance:
• K-exclusion, to model access to K resources (so progress can
continue if < K processes fail in the critical region).
• f-robustness, to model progress in the presence of at most f
failures overall.
– Still focused on bounds on the size of shared memory.
– Wrote:
• Resource allocation with immunity to limited process
failure [Fischer, Lynch, Burns, Borodin FOCS 79]
K-Exclusion
• Resource allocation with immunity to limited process
failure [Fischer, Lynch, Burns, Borodin FOCS 79]
• What is this paper about?
– N processes accessing RMW shared memory,
solving f-robust lockout-free K-exclusion.
– Three robust algorithms:
• Unlimited robustness, FIFO, O(N^2) states
– Simulates queue of waiting processes in shared memory.
– Distributed implementation of queue.
• f-robust, bounded waiting, O(N) states
– Simulates centralized scheduler.
– Fault-tolerant, distributed implementation of scheduler, using
2f+1 “helper” processes.
• f-robust, FIFO, O(N (log N)^c) states
– Corresponding lower bounds:
• Ω(N^2) states for unlimited robustness, FIFO
• Ω(N) states for f-robustness, bounded waiting
K-Exclusion
• Why is this interesting?
– New definitions:
• K-exclusion problem, studied later by [Dolev, Shavit], others.
• f-robustness (progress in the presence of at most f failures),
for exclusion problems
– Lots of cute algorithms and lower bounds.
– Among the earliest lower bound results in DCT; more “looks like”
arguments.
– Algorithmic ideas:
• Virtual scheduler process, fault-tolerant implementation.
• Distributed implementation of shared queue.
• Passive communication with possibly-failed processes.
– Proof ideas:
• Refinement mapping used to show that the real algorithm
implements the algorithm with a virtual scheduler.
• First example I know of a refinement mapping used to verify a
distributed algorithm.
K-Exclusion
• Background:
– Lots of results in the FOCS 79 paper.
– Only one seems to have made it to journal publication:
the Colored Ticket Algorithm.
K-Exclusion
• The Colored Ticket Algorithm [F, L, Burns, Borodin 83]
• Distributed FIFO resource allocation using small shared
space [F, L, Burns, Borodin 85, TOPLAS 89]
• What are these papers about?
– FIFO K-exclusion algorithm, unlimited robustness, O(N^2) states
– Simulates queue of waiting processes in shared memory; first K
processes may enter critical region.
– First distributed implementation: Use tickets with unbounded numbers. Keep track of last ticket issued and last ticket validated for entry to critical region (a rough sketch appears below).
– Second version: Use size N batches of
K + 1 different colors. Reuse a color if
no ticket of the same color is currently
issued or validated.
– Corresponding Ω(N^2) lower bound.
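To make the unbounded-ticket idea above concrete, here is a minimal illustrative sketch in Python. This is my own reconstruction, not the paper’s algorithm: the class and counter names are invented, a Python lock stands in for the single read-modify-write shared variable, and the FIFO bookkeeping and color-reuse trick are omitted.

import random
import threading
import time

class TicketKExclusion:
    """Illustrative K-exclusion via unbounded tickets (hypothetical names).
    One lock models the single atomic read-modify-write shared variable;
    'issued' and 'exited' loosely play the roles of the last-ticket-issued
    and last-ticket-validated counters described above."""

    def __init__(self, k):
        self.k = k
        self.rmw = threading.Lock()   # stands in for the RMW shared variable
        self.issued = 0               # tickets handed out so far
        self.exited = 0               # critical-region completions so far

    def enter(self):
        with self.rmw:                # atomically draw the next ticket
            ticket = self.issued
            self.issued += 1
        while True:                   # wait until fewer than K earlier holders remain inside
            with self.rmw:
                if ticket < self.exited + self.k:
                    return ticket
            time.sleep(0.001)

    def exit(self):
        with self.rmw:                # release one of the K resources
            self.exited += 1

def demo(n_procs=10, k=3):
    ke = TicketKExclusion(k)
    inside, max_inside = [0], [0]

    def worker():
        ke.enter()
        with ke.rmw:                  # demo bookkeeping only
            inside[0] += 1
            max_inside[0] = max(max_inside[0], inside[0])
        time.sleep(random.uniform(0.005, 0.02))
        with ke.rmw:
            inside[0] -= 1
        ke.exit()

    threads = [threading.Thread(target=worker) for _ in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("max simultaneous holders:", max_inside[0], "<= K =", k)

if __name__ == "__main__":
    demo()

The bounded-memory colored-ticket version replaces these unbounded counters with batches of tickets in K + 1 colors, reusing a color once no ticket of that color is issued or validated, as described above.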
K-Exclusion
• Why is this interesting?
– Algorithm modularity: Bounded-memory algorithm
simulating unbounded-memory version.
– Similar strategy to other bounded-memory
algorithms, like [Dolev, Shavit] bounded concurrent
timestamp algorithm
3. Semantics of Distributed Systems
• Background:
– When we began working in DCT, there were no usable
mathematical models for:
• Describing distributed algorithms.
• Proving correctness, performance properties.
• Stating and proving lower bounds.
– So we had to invent them.
– At first, ad hoc, for each paper.
– But soon, we were led to define something more general:
• On describing the behavior and implementation of
distributed systems [Lynch, Fischer 79, 80, 81]
– We got to present this in Evian-les-Bains, France
Semantics of Distributed Systems
• On describing the behavior and implementation of
distributed systems [Lynch, Fischer 79, 80, TCS 81]
• What is this paper about?
– It defined a mathematical modeling framework
for asynchronous interacting processes.
– Based on infinite-state state machines,
communicating using shared variables.
– Input/output distinction, fairness.
– Input-enabled, with respect to
environment’s changes to shared variables.
– External behavior notion:
Set of “traces” of accesses to shared variables that arise
during fair executions.
– Implementation notion: subset relation on sets of traces.
– Composition definition, compositionality results.
– Time measure. Time bound proofs using recurrences.
Semantics of Distributed Systems
• Why is this interesting?
– Very early modeling framework for asynchronous distributed
algorithms.
– Includes features that have become standard in later models, esp., implementation = subset relation on trace sets.
– Predecessor of I/O automata (but uses shared variables rather
than shared actions).
– Had notions of composition, hiding.
– Had a formal notion of implementation, though no notion of
simulation relations.
– Differs from prior models [Hoare] [Milne, Milner]:
• Based on math concepts (sets, sequences, etc.) instead of
notations (process expressions) and proof rules.
• Simpler notions of external behavior and implementation.
4. Sorting
• A time-space tradeoff for sorting on non-oblivious
machines [Borodin, F, Kirkpatrick, L, Tompa JCSS 81]
• What is this paper about?
– Defined DAG model for sorting programs.
– Defined measures:
• Time T = length of longest path
• Space S = log of number of vertices
– Proved lower bound on product S · T = Ω(N^2), for sorting N elements.
– Based on counting arguments for permutations.
• Why is this interesting?
– I don’t know.
– A neat complexity theory result.
– Tight bound [Munro, Paterson].
– It has nothing to do with distributed computing theory.
5. Resource Allocation in Networks
• Background:
– K-exclusion = allocation of K resources.
– Instead of shared memory, now consider allocating K resources
in a network.
– Questions:
• Where to place the resources?
• How to locate them efficiently?
– Experiments (simulations), analysis.
– Led to papers:
• Optimal placement of resources in networks
[Fischer, Griffeth, Guibas, Lynch 80, ICDCS 81, Inf &
Comp 92]
• Probabilistic analysis of network resource allocation
algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
Resource Allocation in Networks
• Optimal placement of resources in networks
[Fischer, Griffeth, Guibas, Lynch 80, ICDCS 81,
Inf & Comp 92]
• What is this paper about?
– How to place K resources in a tree to minimize the total expected
cost (path length) of servicing (matching) exactly K requests
arriving randomly at nodes.
– One-shot resource allocation problem.
– Characterizations, efficient divide-and-conquer algorithms for
determining optimal placements.
– Theorem: Fair placements (where each subtree has approx. the expected number of needed resources) are approximately optimal (illustrated in the sketch below).
– Cost O(K)
Resource Allocation in Networks
• Why is this interesting?
– Optimal placements not completely obvious, e.g., can’t
minimize flow on all edges simultaneously.
– Analysis uses interesting properties of convex
functions.
– Results suggested by Nancy Griffeth’s experiments.
– Uses interesting math observation: mean vs. median of binomial distributions are always within 1, that is, median(n, p) ∈ {⌊np⌋, ⌈np⌉} (quick numerical check below).
• Unfortunately, already discovered (but not that
long ago!) [Jogdeo, Samuels 68], [Uhlmann 63, 66]
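A quick numerical check of that observation (an illustrative script of mine, not from the paper):

import math

def binomial_median(n: int, p: float) -> int:
    """Smallest m with P[X <= m] >= 1/2 for X ~ Binomial(n, p)."""
    cdf = 0.0
    for m in range(n + 1):
        cdf += math.comb(n, m) * p**m * (1 - p)**(n - m)
        if cdf >= 0.5:
            return m
    return n

# The median always lands on floor(np) or ceil(np), i.e. within 1 of the mean
# [Uhlmann 63, 66], [Jogdeo, Samuels 68].
for n in range(1, 60):
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        m = binomial_median(n, p)
        assert math.floor(n * p) <= m <= math.ceil(n * p), (n, p, m)
print("median(n, p) lies in {floor(np), ceil(np)} for every case checked")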
Resource Allocation in Networks
• Probabilistic analysis of network resource allocation
algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
• What is this paper about?
– An algorithm for matching resources to requests in trees.
• The search for a resource to satisfy each request proceeds
sequentially, in larger and larger subtrees.
• Search in a subtree reverses direction when it discovers that
the subtree is empty.
• Cost is independent of size of network.
– Analysis:
• For simplified case:
– Requests and responses all at same depth in the tree
– Choose subtree to search probabilistically
– Constant message delay
• Proved that worst case = noninterfering requests.
• Expected time for noninterfering requests is constant,
independent of size of network or number of requests.
Resource Allocation in Networks
• Why is this interesting?
– Cute algorithm.
– Performance independent of size of network---an
interesting “scalability” criterion.
– Analysis decomposed in an interesting way:
• Analyze sequential executions (using traditional
methods).
• Bound performance of concurrent executions in
terms of performance of sequential executions.
6. Consistent Global Snapshots
• Background:
– We studied database concurrency control algorithms
[Bernstein, Goodman]
– Led us to consider implementing transaction
semantics in a distributed setting.
– Canonical examples:
• Bank transfer and audit transactions.
• Consistent global snapshot (distributed
checkpoint) transactions.
– Led to:
• Global states of a distributed system
[Fischer, Griffeth, Lynch 81, TOSE 82]
Consistent Global Snapshots
• Global states of a distributed system
[Fischer, Griffeth, Lynch 81, TOSE 82]
• What is this paper about?
– Models distributed computations using multi-site transactions.
– Defines correctness conditions for checkpoint (consistent global
snapshot):
• Returns transaction-consistent state that includes all
transactions completed before the checkpoint starts, and
possibly some of those that overlap.
– Provides a general, nonintrusive algorithm for computing a
checkpoint:
• Mark transactions as “pre-checkpoint” or “post-checkpoint”.
• Run pre-checkpoint transactions to completion, in a parallel
execution of the system.
– Applications to bank audit, detecting inconsistent data, system
recovery.
Global snapshots
• Why is this interesting?
– The earliest distributed snapshot algorithm I know of.
– Some similar ideas to [Chandy, Lamport 85] “marker”
algorithm, but:
• Presented in terms of transactions.
• Used formal automaton model [Lynch, Fischer 79],
which makes it a bit obscure.
7. Synchronous vs. Asynchronous
Distributed Systems
• Synchronous vs. asynchronous distributed systems
[Arjomandi, Fischer, Lynch 81, JACM 83]
• Background:
– Eshrat Arjomandi visited us at Georgia Tech, 1980.
– Working on PRAM algorithms for matrix computations
– We considered extensions to asynchronous, distributed setting.
• What is this paper about?
– Defines a synchronization problem, the s-session problem, in
which the processes perform at least s “sessions”, then halt.
– In a session, each process performs at least one output.
– Motivated by “sufficient interleaving” of matrix operations.
– Shows that this problem can be solved in time O(s) in a
synchronous system (obvious).
– But it takes time Ω(s · diam) in an asynchronous system (not obvious).
Synchronous vs. Asynchronous
Distributed Systems
• Why is this interesting?
– Cute impossibility proof.
– First proof I know that synchronous systems are
inherently faster than asynchronous systems, even in a
non-fault-prone setting.
– Interesting contrast with synchronizer results of
[Awerbuch], which seem to say that asynchronous
systems are just as fast as synchronous systems.
– The difference is that the session problem requires
preserving global order of events at different nodes.
8. Distributed Consensus
• Background:
– Leslie Lamport visited us at Georgia Tech, 1980.
– We discussed his new Albanian agreement problem and
algorithms.
– The algorithms took a lot of time (f+1 rounds) and a lot
of processes (3f + 1).
– We wondered why.
– Led us to write:
• A lower bound for the time to assure interactive
consistency [Fischer, Lynch 81, IPL 82]
Lower Bound on Rounds,
for Byzantine Agreement
• A lower bound for the time to assure interactive
consistency [Fischer, Lynch 81, IPL 82]
• What is this paper about?
– You probably already know.
– Shows that f+1 rounds are needed to solve consensus, with up
to f Byzantine failures.
• Uses a chain argument, constructing a chain spanning from the all-0, failure-free execution to the all-1, failure-free execution (schematic below).
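A schematic of the chain argument, in LaTeX (my paraphrase; the indistinguishability notation below is mine, not the paper’s):

% Suppose, for contradiction, that some algorithm always decides within f
% rounds while tolerating f Byzantine failures.  Construct a finite chain of
% f-round executions in which \alpha_0 is the failure-free execution with all
% inputs 0, \alpha_m is the failure-free execution with all inputs 1, and each
% adjacent pair is indistinguishable to some process p_i that is correct in both:
\[
  \alpha_0 \;\overset{p_1}{\sim}\; \alpha_1 \;\overset{p_2}{\sim}\; \cdots \;\overset{p_m}{\sim}\; \alpha_m .
\]
% Indistinguishable executions force the same decision at p_i, so agreement
% propagates a single decision value along the whole chain; but validity forces
% \alpha_0 to decide 0 and \alpha_m to decide 1, a contradiction.  Hence f+1
% rounds are necessary.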
Lower Bound on Rounds,
for Byzantine Agreement
• Why is this interesting?
– A fundamental fact about fault-tolerant distributed
computing.
– First chain argument I know of.
– Proved for Byzantine failures, but similar ideas used
later to prove the same bound for less drastic failure
models:
• Byzantine failures with authentication
[Dolev, Strong], [De Millo, Lynch, Merritt]
• Stopping failures [Merritt].
– Leaves open the questions of minimizing communication
and storage
BA with Small
Storage/Communication
• Background:
– 1981: by now we’ve moved to MIT and Yale.
– Next we considered the communication/storage
complexity.
– Most previous algorithms used exponential
communication.
– [Dolev 82] used 4f + 4 rounds, O(n^4 log n) bits of communication.
– We wrote:
• A Simple and Efficient Byzantine Generals Algorithm
[Lynch, Fischer, Fowler 82]
• Efficient Byzantine agreement algorithms
[Dolev, Fischer, Fowler, Lynch, Strong IC 82]
BA with Small
Storage/Communication
• A Simple and Efficient Byzantine Generals Algorithm
[Lynch, Fischer, Fowler 82]
• Efficient Byzantine agreement algorithms
[Dolev, Fischer, Fowler, Lynch, Strong IC 82]
• What is in these papers?
– A new algorithm, using 2f+3 rounds and O(nf + f^3 log f) bits.
– Asymmetric: Processes try actively to decide on 1.
– Processes “initiate” a proposal to decide 1, relay each other’s
proposals.
– Initiate at later stages based on number of known proposals.
– Threshold for initiation increases steadily at later rounds.
– Decide after 2f + 3 rounds based on sufficient known proposals.
BA with Small
Storage/Communication
• Why is this interesting?
– Efficient algorithm, at the time.
– Presentation was unstructured; later work of
[Srikanth, Toueg 87] added more structure to such
algorithms, in the form of a “consistent broadcast”
primitive.
– Later algorithms [Berman, Garay 93], [Garay, Moses
93], [Moses, Waarts 94], used poly communication and
only f+1 rounds (the minimum).
Impossibility of Consensus
• Background:
– For other consensus problems in fault-prone settings, like
approximate agreement [Dolev, Lynch, Pinter, Stark, Weihl],
algorithms for synchronous models always seemed to extend to
asynchronous models.
– So we tried to design a fault-tolerant asynchronous consensus
algorithm.
– We failed.
– Tried to prove an impossibility result, failed.
– Tried to find an algorithm,…
– Finally found the impossibility result:
• Impossibility of distributed consensus with one faulty
process [F, L, Paterson 82, PODS 83, JACM 85]
Impossibility of Consensus
• What is in this paper?
– You know. A proof that you can’t solve consensus in
asynchronous systems, with even one stopping failure.
– Uses a “bivalence” argument, which focuses on the form that a decision point would have to take (see the sketch below):
[figure: a configuration with one enabled step leading to a 0-valent configuration and another leading to a 1-valent configuration]
– And does a case analysis to prove that this configuration can’t occur.
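A compact statement of the bivalence machinery, in LaTeX (my paraphrase of the standard outline, not the paper’s exact wording):

% A reachable configuration C is v-valent if only decision value v is
% reachable from C, and bivalent if both values remain reachable:
\[
  \mathrm{val}(C) = \{\, v \in \{0,1\} : \text{some execution from } C \text{ decides } v \,\},
  \qquad
  C \text{ is bivalent} \iff \mathrm{val}(C) = \{0,1\}.
\]
% The proof shows (1) some initial configuration is bivalent, and (2) from any
% bivalent configuration, steps can be scheduled so that the resulting
% configuration is again bivalent.  Iterating (2) yields an admissible execution,
% with at most one crashed process, in which no process ever decides.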
Impossibility of Consensus
• Why is this interesting?
– A fundamental fact about fault-tolerant distributed
computing.
– Addresses issues of interest to system builders, not
just theoreticians.
– Makes it clear it’s necessary to strengthen the model
or weaken the problem.
– Result is not obvious.
– “Proof just hard enough” (Mike Fischer).
– Proof methods: Bivalence, chain arguments
• Leslie may say more about this…
The Beginnings of PODC
• Around this time (1981), we noticed that:
– There was a lot of distributed computing theory,
– But no conferences devoted to it.
• E.g., FLP appeared in a database conference
(PODS)
• 1982: Mike and I (and Robert Probert) started
PODC
Lower Bounds on Number of
Processes
• Background:
– [Pease, Shostak, Lamport 80] showed 3f+1 lower bound on number
of processes for Byzantine agreement.
– [Dolev 82] showed 2f+1 connectivity bound for BA.
– [Lamport 83] showed 3f+1 lower bound on number of processes
for weak BA.
– [Coan, Dolev, Dwork, Stockmeyer 85] showed 3f+1 lower bound for
Byzantine firing squad problem.
– [Dolev, Lynch, Pinter, Stark, Weihl 83] claimed 3f+1 bound for
approximate BA.
– [Dolev, Halpern, Strong 84] showed 3f+1 lower bound for
Byzantine clock synchronization.
– Surely there is some common reason for all these results.
– We unified them, in:
• Easy impossibility proofs for distributed consensus problems
[Fischer, Lynch, Merritt PODC 85, DC 86]
Lower Bounds on Number of
Processes
• What is this paper about?
– Gives a uniform set of proofs for Byzantine
agreement, weak BA, approximate BA, Byzantine firing
squad, and Byzantine clock synchronization.
– Proves 3f+1 lower bounds on number of processes, and
2f+1 lower bounds on connectivity.
– Uses “hexagon vs. triangle” arguments (sketched below):
[figure: a triangle of processes A, B, C vs. a hexagonal ring containing two copies of each]
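A compressed version of the hexagon-vs-triangle idea, again as a LaTeX note (my paraphrase):

% Suppose processes A, B, C solve Byzantine agreement with n = 3 and f = 1.
% Take two copies of each local program and wire them into a six-process ring
%   A_1 - B_1 - C_1 - A_2 - B_2 - C_2 - (back to A_1)
% with suitably chosen inputs.  No process in the ring is faulty, yet each
% adjacent pair of ring processes has exactly the view it would have in some
% triangle execution in which the remaining process is Byzantine.  Applying
% agreement and validity to different pairs around the ring yields
% contradictory decisions, so no such three-process algorithm exists; the
% argument generalizes to the 3f+1 process and 2f+1 connectivity bounds.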
Lower Bounds on Number of
Processes
• Why is this interesting?
– Basic results about fault-tolerant computation.
– Unifies a lot of results/proofs.
– Cute proofs.
9. Epilogue: Reliable Communication
• Reliable communication over unreliable channels
[Afek, Attiya, Fekete, Fischer, Lynch, Mansour,
Wang, Zuck 92, JACM 94]
• Background:
– Much later, we again became interested in a common problem.
– Two reliable processes, communicating over two unreliable
channels: reorder, lose, duplicate.
– Bounded size messages
(no sequence numbers).
– Can we implement a reliable FIFO channel?
– [Wang, Zuck 89] showed impossible for reordering + duplication.
– Any one kind of fault is easy to tolerate.
– Alternating Bit Protocol tolerates loss + duplication (sketched below).
– Remaining question: reordering + loss, bounded size messages.
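For reference, a toy simulation of the Alternating Bit Protocol over FIFO channels that may lose and duplicate packets (my own illustrative code; the function and parameter names are invented, and plain Python lists model the channels):

import random

def abp_transfer(messages, loss=0.3, dup=0.3, max_steps=100_000, seed=0):
    """Simulate ABP: FIFO channels in both directions may lose or duplicate
    packets (but never reorder them).  Returns the sequence delivered."""
    rng = random.Random(seed)
    fwd, back = [], []            # sender -> receiver and receiver -> sender channels
    s_idx, s_bit = 0, 0           # sender: next message index, current tag bit
    r_bit, delivered = 0, []      # receiver: expected tag bit, delivered messages

    def send(channel, packet):
        if rng.random() < loss:   # the channel may drop the packet...
            return
        channel.append(packet)
        if rng.random() < dup:    # ...or deliver an extra copy
            channel.append(packet)

    for _ in range(max_steps):
        if s_idx == len(messages) and not fwd and not back:
            break
        if s_idx < len(messages):           # sender keeps retransmitting the current message
            send(fwd, (s_bit, messages[s_idx]))
        if fwd:                             # receiver: deliver new messages, ack every packet's bit
            bit, msg = fwd.pop(0)
            if bit == r_bit:
                delivered.append(msg)
                r_bit ^= 1
            send(back, bit)
        if back:                            # sender: a matching ack means move to the next message
            bit = back.pop(0)
            if bit == s_bit and s_idx < len(messages):
                s_idx += 1
                s_bit ^= 1
    return delivered

if __name__ == "__main__":
    msgs = [f"m{i}" for i in range(20)]
    print("delivered in order:", abp_transfer(msgs) == msgs)

This is the Layer-2 ingredient mentioned on the next slide; Layer 1’s job there is to manufacture such non-reordering channels out of channels that reorder and lose messages.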
Reliable Communication
• What is this paper about?
– Resolves the question of implementability of reliable FIFO
channels over unreliable channels exhibiting reordering + loss,
using bounded size messages.
– An algorithm, presented in two layers:
• Layer 1: Uses the given channels to implement channels that
do not reorder (but may lose and duplicate messages).
– Receiver-driven (probes to request messages).
– Uses more and more messages as time goes on, to “swamp out”
copies of old messages.
– Lots of messages.
– Really lots of messages.
• Layer 2: Uses ABP over layer 1 to get reliable FIFO channel.
– A lower bound, saying that any such algorithm must use lots of
messages.
Reliable Communication
• Why is this interesting?
– Fundamental facts about communication in fault-prone
settings.
– Related impossibility results proved by
[Mansour, Schieber 92], [Wang, Zuck 89],
[Tempero, Ladner 90, 95].
Conclusions
• A lot of projects, a lot of papers.
• Especially during 1979-82.
• Especially during Mike’s sabbatical, 1980.
• They seem pretty interesting, even now!
• Some are well known, some not…
• Anyway, we had a lot of fun working on them.
• Thanks to Mike for a great collaboration!