Working with Mike on Distributed Computing Theory, 1978-1992
Nancy Lynch
Theory of Distributed Systems Group
MIT CSAIL
Fischer Fest
July 2003
My joint papers with Mike
• Abstract complexity theory [Lynch, Meyer, Fischer 72]
[Lynch, Meyer, Fischer 72, 76]
• Lower bounds for mutual exclusion
[Burns, Fischer, Jackson, Lynch, Peterson 78, 82]
• K-exclusion [Fischer, Lynch, Burns, Borodin 79]
• Describing behavior and implementation of distributed
systems [Lynch, Fischer 79, 80, 81]
• Decomposing shared-variable algorithms
[Lynch, Fischer 80, 83]
• Optimal placement of resources in networks
[Fischer, Griffeth, Guibas, Lynch 80, 81, 92]
• Time-space tradeoff for sorting [Borodin, Fischer,
Kirkpatrick, Lynch, Tompa 81]
More joint papers
• Global snapshots of distributed computations
[Fischer, Griffeth, Lynch 81, 82]
• Lower bound on rounds for Byzantine agreement
[Fischer, Lynch 81, 82]
• Synchronous vs. asynchronous distributed systems
[Arjomandi, Fischer, Lynch 81, 83]
• Efficient Byzantine agreement algorithms
[Dolev, Fischer, Fowler, Lynch, Strong 82]
• Impossibility of consensus
[Fischer, Lynch, Paterson 82, 83, 85]
• Colored ticket algorithm
[Fischer, Lynch, Burns, Borodin 83]
• Easy impossibility proofs for distributed consensus
problems [Fischer, Lynch, Merritt 85, 86]
And still more joint papers!
• FIFO resource allocation in small shared space
[Fischer, Lynch, Burns, Borodin 85, 89]
• Probabilistic analysis of network resource allocation
algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
• Reliable communication over unreliable channels
[Afek, Attiya, Fekete, Fischer, Lynch, Mansour, Wang,
Zuck 92, 94]
• 16 major projects…14 in the area of distributed
computing theory…
• Some well known, some not…
In this talk:
• I’ll describe what these papers are about, and
what I think they contributed to the field of
distributed computing theory.
• Put in context of earlier/later research.
• Give you a little background about why/how we
wrote them.
By topic:
1. Complexity theory
2. Mutual exclusion and related problems
3. Semantics of distributed systems
4. Sorting
5. Resource placement in networks
6. Consistent global snapshots
7. Synchronous vs. asynchronous distributed systems
8. Distributed consensus
9. Reliable communication from unreliable channels
1. Prologue
(Before distributed computing)
• MIT, 1970-72
• Mike Fischer + Albert Meyer’s research group in
algorithms and complexity theory.
– Amitava Bagchi, Donna Brown, Jeanne Ferrante, David
Johnson, Dennis Kfoury, me, Robbie Moll, Charlie Rackoff,
Larry Stockmeyer, Bostjan Vilfan, Mitch Wand, Frances Yao,…
• Lively…energetic…ideas…parties…fun
• My papers with Mike (based on my thesis):
– Priority arguments in complexity theory
[Lynch, Meyer, Fischer 72]
– Relativization of the theory of computational complexity
[Lynch, Meyer, Fischer 72, 76]
Abstract Complexity Theory
• Priority arguments in complexity theory
[Lynch, Meyer, Fischer 72]
• Relativization of the theory of computational complexity
[Lynch, Meyer, Fischer 72, Trans AMS 76]
• What are these papers about?
– They show the existence of pairs, A and B, of recursive problems
that are provably hard to solve, even given the other as an oracle.
– In fact, given any hard A, there’s a hard B that doesn’t help.
• What does this have to do with Distributed Computing
Theory?
– Nothing at all.
– Well, foreshadows our later focus on lower bounds/impossibility
results.
– Suggests study of relative complexity (and computability) of
problems, which has become an important topic in DCT.
2. Many Years Later…Mutual
Exclusion and Related Problems
• Background:
– We met again at a 1977 Theory conference, decided the world
needed a theory for distributed computing.
– Started working on one…with students: Jim Burns, Paul Jackson
(Georgia Tech), Gary Peterson (U. Wash.)
– Lots of visits, Georgia Tech and U. Wash.
– Mike’s sabbatical at Georgia Tech, 1980
– Read lots of papers:
• Dijkstra---Mutual exclusion,…
• Lamport---Time clocks,…
• Johnson and Thomas---Concurrency control
• Cremers and Hibbard---Lower bound on size of shared memory to
solve 2-process mutual exclusion
– Finally, we wrote one:
• Data Requirements for Implementation of N-Process Mutual Exclusion Using a Single Shared Variable [Burns, Fischer, Jackson, Lynch, Peterson 78, 82]
Mutual Exclusion
• Data Requirements for Implementation of N-Process
Mutual Exclusion Using a Single Shared Variable
[Burns, F, Jackson, L, Peterson ICPP 78, JACM 82]
• What is this paper about?
– N processes accessing read-modify-write shared memory,
solving N-process lockout-free mutual exclusion.
– Lower bound of (N+1)/2 memory states.
• Constructs bad executions that “look like” other executions,
as far as processes can tell.
– For bounded waiting, lower bound of N+1 states.
– Nearly-matching algorithms
• Based on distributed simulation of a centralized scheduler
process.
Mutual Exclusion
• Theorem: Lower bound of N states, bounded waiting:
– Let processes 1,…N enter the trying region, one by one.
– If the memory has < N states, two processes, say i and j, must
leave the memory in the same state.
– Then processes i+1,…j are hidden and can be bypassed
arbitrarily many times.
• Theorem: Lower bound of N+1 states, bounded waiting:
– Uses a slightly more complicated strategy of piecing together
execution fragments.
• Theorem: Lower bound of N/2 states, for lockout-freedom.
– Still more complicated strategy.
• Two algorithms…
– Based on simulating a centralized scheduler
– Bounded waiting algorithm with only ~N states.
– Surprising lockout-free algorithm with only ~N/2 states!
Mutual Exclusion
• Why is this interesting?
– Cute algorithms and lower bounds.
– Some of the first lower bound results in DCT.
– “Looks like” argument for lower bounds, typical of many later
impossibility arguments.
– Virtual scheduler process provides some conceptual modularity,
for the algorithms.
– We worked out a lot of formal definitions:
• State-machine model for asynchronous shared memory
systems.
• With liveness assumptions, input/output distinctions.
• Exclusion problems, safety and liveness requirements.
Decomposing Algorithms Using a
Virtual Scheduler
• Background:
– Continuing on the same theme…
– Mutual exclusion algorithms (ours and others’) were
complicated, hard to understand and prove correct.
– We realized we needed ways of decomposing them.
– Our algorithms used a virtual scheduler, informally.
– Could we make this rigorous?
– We did:
• A Technique for Decomposing Algorithms that Use
a Single Shared Variable [Lynch, Fischer 80,
JCSS 83]
Decomposing Distributed Algorithms
• A Technique for Decomposing Algorithms that Use a
Single Shared Variable [Lynch, Fischer 80, JCSS 83]
• What is this paper about?
– Defines a “supervisor system”, consisting of N contending processes + one distinguished, permanent supervisor process.
– Shows that, under certain conditions, a system of N contenders with no supervisor can simulate a supervisor system.
– Generalizes, makes explicit, the technique used in [BFJLP 78, 82].
– Applies this method to two mutual exclusion algorithms.
Decomposing Distributed Algorithms
• Why is this interesting?
– Explicit decomposition of a complex distributed
algorithm.
– Prior to this point, distributed algorithms were
generally presented as monolithic entities.
– Foreshadows later extensive uses of decomposition,
transformations.
– Resulting algorithm easier to understand.
– Not much increase in costs.
Generalization to K-Exclusion
• Background:
– Still continuing on the same theme…
– Started inviting other visitors to join us (Borodin, Lamport,
Arjomandi,…)
– Our mutual exclusion algorithms didn’t tolerate stopping
failures, either in critical region or in trying/exit regions.
– So we added in some fault-tolerance:
• K-exclusion, to model access to K resources (so progress can
continue if < K processes fail in the critical region).
• f-robustness, to model progress in the presence of at most f
failures overall.
– Still focused on bounds on the size of shared memory.
– Wrote:
• Resource allocation with immunity to limited process
failure [Fischer, Lynch, Burns, Borodin FOCS 79]
K-Exclusion
• Resource allocation with immunity to limited process
failure [Fischer, Lynch, Burns, Borodin FOCS 79]
• What is this paper about?
– N processes accessing RMW shared memory,
solving f-robust lockout-free K-exclusion.
– Three robust algorithms:
• Unlimited robustness, FIFO, O(N^2) states
– Simulates queue of waiting processes in shared memory.
– Distributed implementation of queue.
• f-robust, bounded waiting, O(N) states
– Simulates centralized scheduler.
– Fault-tolerant, distributed implementation of scheduler, using
2f+1 “helper” processes.
• f-robust, FIFO, O(N (log N)^c) states
– Corresponding lower bounds:
• Ω(N^2) states for unlimited robustness, FIFO
• Ω(N) states for f-robustness, bounded waiting
K-Exclusion
• Why is this interesting?
– New definitions:
• K-exclusion problem, studied later by [Dolev, Shavit], others.
• f-robustness (progress in the presence of at most f failures),
for exclusion problems
– Lots of cute algorithms and lower bounds.
– Among the earliest lower bound results in DCT; more “looks like”
arguments.
– Algorithmic ideas:
• Virtual scheduler process, fault-tolerant implementation.
• Distributed implementation of shared queue.
• Passive communication with possibly-failed processes.
– Proof ideas:
• Refinement mapping used to show that the real algorithm
implements the algorithm with a virtual scheduler.
• First example I know of a refinement mapping used to verify a
distributed algorithm.
K-Exclusion
• Background:
– Lots of results in the FOCS 79 paper.
– Only one seems to have made it to journal publication:
the Colored Ticket Algorithm.
K-Exclusion
• The Colored Ticket Algorithm [F, L, Burns, Borodin 83]
• Distributed FIFO resource allocation using small shared
space [F, L, Burns, Borodin 85, TOPLAS 89]
• What are these papers about?
– FIFO K-exclusion algorithm, unlimited robustness, O(N^2) states
– Simulates queue of waiting processes in shared memory; first K
processes may enter critical region.
– First distributed implementation: Use tickets with unbounded numbers. Keep track of last ticket issued and last ticket validated for entry to critical region (a rough sketch appears below).
– Second version: Use size N batches of
K + 1 different colors. Reuse a color if
no ticket of the same color is currently
issued or validated.
– Corresponding Ω(N^2) lower bound.
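To make the unbounded-ticket idea above concrete, here is a minimal illustrative sketch in Python. This is my own reconstruction, not the paper’s algorithm: the class and counter names are invented, a Python lock stands in for the single read-modify-write shared variable, and the FIFO bookkeeping and color-reuse trick are omitted.

import random
import threading
import time

class TicketKExclusion:
    """Illustrative K-exclusion via unbounded tickets (hypothetical names).
    One lock models the single atomic read-modify-write shared variable;
    'issued' and 'exited' loosely play the roles of the last-ticket-issued
    and last-ticket-validated counters described above."""

    def __init__(self, k):
        self.k = k
        self.rmw = threading.Lock()   # stands in for the RMW shared variable
        self.issued = 0               # tickets handed out so far
        self.exited = 0               # critical-region completions so far

    def enter(self):
        with self.rmw:                # atomically draw the next ticket
            ticket = self.issued
            self.issued += 1
        while True:                   # wait until fewer than K earlier holders remain inside
            with self.rmw:
                if ticket < self.exited + self.k:
                    return ticket
            time.sleep(0.001)

    def exit(self):
        with self.rmw:                # release one of the K resources
            self.exited += 1

def demo(n_procs=10, k=3):
    ke = TicketKExclusion(k)
    inside, max_inside = [0], [0]

    def worker():
        ke.enter()
        with ke.rmw:                  # demo bookkeeping only
            inside[0] += 1
            max_inside[0] = max(max_inside[0], inside[0])
        time.sleep(random.uniform(0.005, 0.02))
        with ke.rmw:
            inside[0] -= 1
        ke.exit()

    threads = [threading.Thread(target=worker) for _ in range(n_procs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print("max simultaneous holders:", max_inside[0], "<= K =", k)

if __name__ == "__main__":
    demo()

The bounded-memory colored-ticket version replaces these unbounded counters with batches of tickets in K + 1 colors, reusing a color once no ticket of that color is issued or validated, as described above.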
K-Exclusion
• Why is this interesting?
– Algorithm modularity: Bounded-memory algorithm
simulating unbounded-memory version.
– Similar strategy to other bounded-memory
algorithms, like [Dolev, Shavit] bounded concurrent
timestamp algorithm
3. Semantics of Distributed Systems
• Background:
– When we began working in DCT, there were no usable
mathematical models for:
• Describing distributed algorithms.
• Proving correctness, performance properties.
• Stating and proving lower bounds.
– So we had to invent them.
– At first, ad hoc, for each paper.
– But soon, we were led to define something more general:
• On describing the behavior and implementation of
distributed systems [Lynch, Fischer 79, 80, 81]
– We got to present this in Evian-les-Bains, France
Semantics of Distributed Systems
• On describing the behavior and implementation of
distributed systems [Lynch, Fischer 79, 80, TCS 81]
• What is this paper about?
– It defined a mathematical modeling framework
for asynchronous interacting processes.
– Based on infinite-state state machines,
communicating using shared variables.
– Input/output distinction, fairness.
– Input-enabled, with respect to
environment’s changes to shared variables.
– External behavior notion:
Set of “traces” of accesses to shared variables that arise
during fair executions.
– Implementation notion: subset relation on sets of traces.
– Composition definition, compositionality results.
– Time measure. Time bound proofs using recurrences.
Semantics of Distributed Systems
• Why is this interesting?
– Very early modeling framework for asynchronous distributed
algorithms.
– Includes features that have become standard in later models, esp., implementation = subset relation on trace sets.
– Predecessor of I/O automata (but uses shared variables rather
than shared actions).
– Had notions of composition, hiding.
– Had a formal notion of implementation, though no notion of
simulation relations.
– Differs from prior models [Hoare] [Milne, Milner]:
• Based on math concepts (sets, sequences, etc.) instead of
notations (process expressions) and proof rules.
• Simpler notions of external behavior and implementation.
4. Sorting
• A time-space tradeoff for sorting on non-oblivious
machines [Borodin, F, Kirkpatrick, L, Tompa JCSS 81]
• What is this paper about?
– Defined DAG model for sorting programs.
– Defined measures:
• Time T = length of longest path
• Space S = log of number of vertices
– Proved lower bound on product S · T = Ω(N^2), for sorting N elements.
– Based on counting arguments for permutations.
• Why is this interesting?
– I don’t know.
– A neat complexity theory result.
– Tight bound [Munro, Paterson].
– It has nothing to do with distributed computing theory.
5. Resource Allocation in Networks
• Background:
– K-exclusion = allocation of K resources.
– Instead of shared memory, now consider allocating K resources
in a network.
– Questions:
• Where to place the resources?
• How to locate them efficiently?
– Experiments (simulations), analysis.
– Led to papers:
• Optimal placement of resources in networks
[Fischer, Griffeth, Guibas, Lynch 80, ICDCS 81, Inf &
Comp 92]
• Probabilistic analysis of network resource allocation
algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
Resource Allocation in Networks
• Optimal placement of resources in networks
[Fischer, Griffeth, Guibas, Lynch 80, ICDCS 81,
Inf & Comp 92]
• What is this paper about?
– How to place K resources in a tree to minimize the total expected
cost (path length) of servicing (matching) exactly K requests
arriving randomly at nodes.
– One-shot resource allocation problem.
– Characterizations, efficient divide-and-conquer algorithms for
determining optimal placements.
– Theorem: Fair placements (where each subtree has approx. the expected number of needed resources) are approximately optimal (illustrated in the sketch below).
– Cost O(K)
Resource Allocation in Networks
• Why is this interesting?
– Optimal placements not completely obvious, e.g., can’t
minimize flow on all edges simultaneously.
– Analysis uses interesting properties of convex
functions.
– Results suggested by Nancy Griffeth’s experiments.
– Uses interesting math observation: mean vs. median of binomial distributions are always within 1, that is, median(n, p) ∈ {⌊np⌋, ⌈np⌉} (quick numerical check below).
• Unfortunately, already discovered (but not that
long ago!) [Jogdeo, Samuels 68], [Uhlmann 63, 66]
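A quick numerical check of that observation (an illustrative script of mine, not from the paper):

import math

def binomial_median(n: int, p: float) -> int:
    """Smallest m with P[X <= m] >= 1/2 for X ~ Binomial(n, p)."""
    cdf = 0.0
    for m in range(n + 1):
        cdf += math.comb(n, m) * p**m * (1 - p)**(n - m)
        if cdf >= 0.5:
            return m
    return n

# The median always lands on floor(np) or ceil(np), i.e. within 1 of the mean
# [Uhlmann 63, 66], [Jogdeo, Samuels 68].
for n in range(1, 60):
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        m = binomial_median(n, p)
        assert math.floor(n * p) <= m <= math.ceil(n * p), (n, p, m)
print("median(n, p) lies in {floor(np), ceil(np)} for every case checked")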
Resource Allocation in Networks
• Probabilistic analysis of network resource allocation
algorithm [Lynch, Griffeth, Fischer, Guibas 85, 86]
• What is this paper about?
– An algorithm for matching resources to requests in trees.
• The search for a resource to satisfy each request proceeds
sequentially, in larger and larger subtrees.
• Search in a subtree reverses direction when it discovers that
the subtree is empty.
• Cost is independent of size of network.
– Analysis:
• For simplified case:
– Requests and responses all at same depth in the tree
– Choose subtree to search probabilistically
– Constant message delay
• Proved that worst case = noninterfering requests.
• Expected time for noninterfering requests is constant,
independent of size of network or number of requests.
Resource Allocation in Networks
• Why is this interesting?
– Cute algorithm.
– Performance independent of size of network---an
interesting “scalability” criterion.
– Analysis decomposed in an interesting way:
• Analyze sequential executions (using traditional
methods).
• Bound performance of concurrent executions in
terms of performance of sequential executions.
6. Consistent Global Snapshots
• Background:
– We studied database concurrency control algorithms
[Bernstein, Goodman]
– Led us to consider implementing transaction
semantics in a distributed setting.
– Canonical examples:
• Bank transfer and audit transactions.
• Consistent global snapshot (distributed
checkpoint) transactions.
– Led to:
• Global states of a distributed system
[Fischer, Griffeth, Lynch 81, TOSE 82]
Consistent Global Snapshots
• Global states of a distributed system
[Fischer, Griffeth, Lynch 81, TOSE 82]
• What is this paper about?
– Models distributed computations using multi-site transactions.
– Defines correctness conditions for checkpoint (consistent global
snapshot):
• Returns transaction-consistent state that includes all
transactions completed before the checkpoint starts, and
possibly some of those that overlap.
– Provides a general, nonintrusive algorithm for computing a
checkpoint:
• Mark transactions as “pre-checkpoint” or “post-checkpoint”.
• Run pre-checkpoint transactions to completion, in a parallel
execution of the system.
– Applications to bank audit, detecting inconsistent data, system
recovery.
Global snapshots
• Why is this interesting?
– The earliest distributed snapshot algorithm I know of.
– Some similar ideas to [Chandy, Lamport 85] “marker”
algorithm, but:
• Presented in terms of transactions.
• Used formal automaton model [Lynch, Fischer 79],
which makes it a bit obscure.
7. Synchronous vs. Asynchronous
Distributed Systems
• Synchronous vs. asynchronous distributed systems
[Arjomandi, Fischer, Lynch 81, JACM 83]
• Background:
– Eshrat Arjomandi visited us at Georgia Tech, 1980.
– Working on PRAM algorithms for matrix computations
– We considered extensions to asynchronous, distributed setting.
• What is this paper about?
– Defines a synchronization problem, the s-session problem, in
which the processes perform at least s “sessions”, then halt.
– In a session, each process performs at least one output.
– Motivated by “sufficient interleaving” of matrix operations.
– Shows that this problem can be solved in time O(s) in a
synchronous system (obvious).
– But it takes time Ω(s · diam) in an asynchronous system (not obvious).
Synchronous vs. Asynchronous
Distributed Systems
• Why is this interesting?
– Cute impossibility proof.
– First proof I know that synchronous systems are
inherently faster than asynchronous systems, even in a
non-fault-prone setting.
– Interesting contrast with synchronizer results of
[Awerbuch], which seem to say that asynchronous
systems are just as fast as synchronous systems.
– The difference is that the session problem requires
preserving global order of events at different nodes.
8. Distributed Consensus
• Background:
– Leslie Lamport visited us at Georgia Tech, 1980.
– We discussed his new Albanian agreement problem and
algorithms.
– The algorithms took a lot of time (f+1 rounds) and a lot
of processes (3f + 1).
– We wondered why.
– Led us to write:
• A lower bound for the time to assure interactive
consistency [Fischer, Lynch 81, IPL 82]
Lower Bound on Rounds,
for Byzantine Agreement
• A lower bound for the time to assure interactive
consistency [Fischer, Lynch 81, IPL 82]
• What is this paper about?
– You probably already know.
– Shows that f+1 rounds are needed to solve consensus, with up
to f Byzantine failures.
• Uses a chain argument, constructing a chain spanning from the all-0, failure-free execution to the all-1, failure-free execution (schematic below).
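A schematic of the chain argument, in LaTeX (my paraphrase; the indistinguishability notation below is mine, not the paper’s):

% Suppose, for contradiction, that some algorithm always decides within f
% rounds while tolerating f Byzantine failures.  Construct a finite chain of
% f-round executions in which \alpha_0 is the failure-free execution with all
% inputs 0, \alpha_m is the failure-free execution with all inputs 1, and each
% adjacent pair is indistinguishable to some process p_i that is correct in both:
\[
  \alpha_0 \;\overset{p_1}{\sim}\; \alpha_1 \;\overset{p_2}{\sim}\; \cdots \;\overset{p_m}{\sim}\; \alpha_m .
\]
% Indistinguishable executions force the same decision at p_i, so agreement
% propagates a single decision value along the whole chain; but validity forces
% \alpha_0 to decide 0 and \alpha_m to decide 1, a contradiction.  Hence f+1
% rounds are necessary.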
Lower Bound on Rounds,
for Byzantine Agreement
• Why is this interesting?
– A fundamental fact about fault-tolerant distributed
computing.
– First chain argument I know of.
– Proved for Byzantine failures, but similar ideas used
later to prove the same bound for less drastic failure
models:
• Byzantine failures with authentication
[Dolev, Strong], [De Millo, Lynch, Merritt]
• Stopping failures [Merritt].
– Leaves open the questions of minimizing communication
and storage
BA with Small
Storage/Communication
• Background:
– 1981: by now we’ve moved to MIT and Yale.
– Next we considered the communication/storage
complexity.
– Most previous algorithms used exponential
communication.
– [Dolev 82] used 4f + 4 rounds, O(n^4 log n) bits of communication.
– We wrote:
• A Simple and Efficient Byzantine Generals Algorithm
[Lynch, Fischer, Fowler 82]
• Efficient Byzantine agreement algorithms
[Dolev, Fischer, Fowler, Lynch, Strong IC 82]
BA with Small
Storage/Communication
• A Simple and Efficient Byzantine Generals Algorithm
[Lynch, Fischer, Fowler 82]
• Efficient Byzantine agreement algorithms
[Dolev, Fischer, Fowler, Lynch, Strong IC 82]
• What is in these papers?
– A new algorithm, using 2f+3 rounds and O(nf + f^3 log f) bits.
– Asymmetric: Processes try actively to decide on 1.
– Processes “initiate” a proposal to decide 1, relay each other’s
proposals.
– Initiate at later stages based on number of known proposals.
– Threshold for initiation increases steadily at later rounds.
– Decide after 2f + 3 rounds based on sufficient known proposals.
BA with Small
Storage/Communication
• Why is this interesting?
– Efficient algorithm, at the time.
– Presentation was unstructured; later work of
[Srikanth, Toueg 87] added more structure to such
algorithms, in the form of a “consistent broadcast”
primitive.
– Later algorithms [Berman, Garay 93], [Garay, Moses
93], [Moses, Waarts 94], used poly communication and
only f+1 rounds (the minimum).
Impossibility of Consensus
• Background:
– For other consensus problems in fault-prone settings, like
approximate agreement [Dolev, Lynch, Pinter, Stark, Weihl],
algorithms for synchronous models always seemed to extend to
asynchronous models.
– So we tried to design a fault-tolerant asynchronous consensus
algorithm.
– We failed.
– Tried to prove an impossibility result, failed.
– Tried to find an algorithm,…
– Finally found the impossibility result:
• Impossibility of distributed consensus with one faulty
process [F, L, Paterson 82, PODS 83, JACM 85]
Impossibility of Consensus
• What is in this paper?
– You know. A proof that you can’t solve consensus in
asynchronous systems, with even one stopping failure.
– Uses a “bivalence” argument, which focuses on the form that a decision point would have to take (see the sketch below):
[figure: a configuration with one enabled step leading to a 0-valent configuration and another leading to a 1-valent configuration]
– And does a case analysis to prove that this configuration can’t occur.
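A compact statement of the bivalence machinery, in LaTeX (my paraphrase of the standard outline, not the paper’s exact wording):

% A reachable configuration C is v-valent if only decision value v is
% reachable from C, and bivalent if both values remain reachable:
\[
  \mathrm{val}(C) = \{\, v \in \{0,1\} : \text{some execution from } C \text{ decides } v \,\},
  \qquad
  C \text{ is bivalent} \iff \mathrm{val}(C) = \{0,1\}.
\]
% The proof shows (1) some initial configuration is bivalent, and (2) from any
% bivalent configuration, steps can be scheduled so that the resulting
% configuration is again bivalent.  Iterating (2) yields an admissible execution,
% with at most one crashed process, in which no process ever decides.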
Impossibility of Consensus
• Why is this interesting?
– A fundamental fact about fault-tolerant distributed
computing.
– Addresses issues of interest to system builders, not
just theoreticians.
– Makes it clear it’s necessary to strengthen the model
or weaken the problem.
– Result is not obvious.
– “Proof just hard enough” (Mike Fischer).
– Proof methods: Bivalence, chain arguments
• Leslie may say more about this…
The Beginnings of PODC
• Around this time (1981), we noticed that:
– There was a lot of distributed computing theory,
– But no conferences devoted to it.
• E.g., FLP appeared in a database conference
(PODS)
• 1982: Mike and I (and Robert Probert) started
PODC
Lower Bounds on Number of
Processes
• Background:
– [Pease, Shostak, Lamport 80] showed 3f+1 lower bound on number
of processes for Byzantine agreement.
– [Dolev 82] showed 2f+1 connectivity bound for BA.
– [Lamport 83] showed 3f+1 lower bound on number of processes
for weak BA.
– [Coan, Dolev, Dwork, Stockmeyer 85] showed 3f+1 lower bound for
Byzantine firing squad problem.
– [Dolev, Lynch, Pinter, Stark, Weihl 83] claimed 3f+1 bound for
approximate BA.
– [Dolev, Halpern, Strong 84] showed 3f+1 lower bound for
Byzantine clock synchronization.
– Surely there is some common reason for all these results.
– We unified them, in:
• Easy impossibility proofs for distributed consensus problems
[Fischer, Lynch, Merritt PODC 85, DC 86]
Lower Bounds on Number of
Processes
• What is this paper about?
– Gives a uniform set of proofs for Byzantine
agreement, weak BA, approximate BA, Byzantine firing
squad, and Byzantine clock synchronization.
– Proves 3f+1 lower bounds on number of processes, and
2f+1 lower bounds on connectivity.
– Uses “hexagon vs. triangle” arguments (sketched below):
[figure: a triangle of processes A, B, C vs. a hexagonal ring containing two copies of each]
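A compressed version of the hexagon-vs-triangle idea, again as a LaTeX note (my paraphrase):

% Suppose processes A, B, C solve Byzantine agreement with n = 3 and f = 1.
% Take two copies of each local program and wire them into a six-process ring
%   A_1 - B_1 - C_1 - A_2 - B_2 - C_2 - (back to A_1)
% with suitably chosen inputs.  No process in the ring is faulty, yet each
% adjacent pair of ring processes has exactly the view it would have in some
% triangle execution in which the remaining process is Byzantine.  Applying
% agreement and validity to different pairs around the ring yields
% contradictory decisions, so no such three-process algorithm exists; the
% argument generalizes to the 3f+1 process and 2f+1 connectivity bounds.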
Lower Bounds on Number of
Processes
• Why is this interesting?
– Basic results about fault-tolerant computation.
– Unifies a lot of results/proofs.
– Cute proofs.
9. Epilogue: Reliable Communication
• Reliable communication over unreliable channels
[Afek, Attiya, Fekete, Fischer, Lynch, Mansour,
Wang, Zuck 92, JACM 94]
• Background:
– Much later, we again became interested in a common problem.
– Two reliable processes, communicating over two unreliable
channels: reorder, lose, duplicate.
– Bounded size messages
(no sequence numbers).
– Can we implement a reliable FIFO channel?
– [Wang, Zuck 89] showed impossible for reordering + duplication.
– Any one kind of fault is easy to tolerate.
– Alternating Bit Protocol tolerates loss + duplication (sketched below).
– Remaining question: reordering + loss, bounded size messages.
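For reference, a toy simulation of the Alternating Bit Protocol over FIFO channels that may lose and duplicate packets (my own illustrative code; the function and parameter names are invented, and plain Python lists model the channels):

import random

def abp_transfer(messages, loss=0.3, dup=0.3, max_steps=100_000, seed=0):
    """Simulate ABP: FIFO channels in both directions may lose or duplicate
    packets (but never reorder them).  Returns the sequence delivered."""
    rng = random.Random(seed)
    fwd, back = [], []            # sender -> receiver and receiver -> sender channels
    s_idx, s_bit = 0, 0           # sender: next message index, current tag bit
    r_bit, delivered = 0, []      # receiver: expected tag bit, delivered messages

    def send(channel, packet):
        if rng.random() < loss:   # the channel may drop the packet...
            return
        channel.append(packet)
        if rng.random() < dup:    # ...or deliver an extra copy
            channel.append(packet)

    for _ in range(max_steps):
        if s_idx == len(messages) and not fwd and not back:
            break
        if s_idx < len(messages):           # sender keeps retransmitting the current message
            send(fwd, (s_bit, messages[s_idx]))
        if fwd:                             # receiver: deliver new messages, ack every packet's bit
            bit, msg = fwd.pop(0)
            if bit == r_bit:
                delivered.append(msg)
                r_bit ^= 1
            send(back, bit)
        if back:                            # sender: a matching ack means move to the next message
            bit = back.pop(0)
            if bit == s_bit and s_idx < len(messages):
                s_idx += 1
                s_bit ^= 1
    return delivered

if __name__ == "__main__":
    msgs = [f"m{i}" for i in range(20)]
    print("delivered in order:", abp_transfer(msgs) == msgs)

This is the Layer-2 ingredient mentioned on the next slide; Layer 1’s job there is to manufacture such non-reordering channels out of channels that reorder and lose messages.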
Reliable Communication
• What is this paper about?
– Resolves the question of implementability of reliable FIFO
channels over unreliable channels exhibiting reordering + loss,
using bounded size messages.
– An algorithm, presented in two layers:
• Layer 1: Uses the given channels to implement channels that
do not reorder (but may lose and duplicate messages).
– Receiver-driven (probes to request messages).
– Uses more and more messages as time goes on, to “swamp out”
copies of old messages.
– Lots of messages.
– Really lots of messages.
• Layer 2: Uses ABP over layer 1 to get reliable FIFO channel.
– A lower bound, saying that any such algorithm must use lots of
messages.
Reliable Communication
• Why is this interesting?
– Fundamental facts about communication in fault-prone
settings.
– Related impossibility results proved by
[Mansour, Schieber 92], [Wang, Zuck 89],
[Tempero, Ladner 90, 95].
Conclusions
• A lot of projects, a lot of papers.
• Especially during 1979-82.
• Especially during Mike’s sabbatical, 1980.
• They seem pretty interesting, even now!
• Some are well known, some not…
• Anyway, we had a lot of fun working on them.
• Thanks to Mike for a great collaboration!