On the Weakest Failure Detector Ever
Petr Kouznetsov (Max Planck Institute for SWS)
Joint work with: Rachid Guerraoui (EPFL), Maurice Herlihy (Brown), Nancy Lynch (MIT), Calvin Newport (MIT)
© 2007 P. Kouznetsov

Big picture
Choosing a model:
- Optimistic model: the system is very efficient but likely to fail
- Conservative model: the system is very robust but inefficient (or impossible to implement)
What is the right model?

Synchrony assumptions
Asynchronous read-write shared memory model: no bounds on relative processing speed
- Very appealing in practice!
- Too conservative: most problems are not solvable [FLP85, LA87; HS,SZ,BG93] (solvable in synchronous systems though)

So what do we need exactly?
What is the minimal amount of synchrony that circumvents some asynchronous impossibility?
“Minimal amount of synchrony”?
- The weakest failure detector

Model
Asynchronous read-write shared-memory system with failure detectors
[Figure: processes p, q and r, each with its own failure detector module FD]

Comparing failure detectors
Failure detector D is weaker than failure detector D’ if there exists an algorithm that emulates D using D’
[Figure: processes p, q and r each query D’ and emulate the output of D]

The weakest non-trivial failure detector
A failure detector X that is:
- non-trivial: circumvents some asynchronous impossibility
- weaker than any non-trivial failure detector
The “easiest” non-trivial problem?

A Very Weak Failure Detector
Y outputs a non-empty set of process ids
Eventually, the same set U is output at every correct process:
- U is not the current set of correct processes
Example: Π={p,q,r}, C={p,q}
- Y may eventually output {p}, {q}, {p,r}, {q,r} or {p,q,r}

Y is non-trivial
Theorem 1: Y solves (N-1)-set agreement
Every process in P1,…,PN proposes a value and must decide on some proposed value so that:
- At most N-1 distinct values are decided
(!) not solvable in asynchronous systems [HS93,BG93,SZ93]
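The definition of Y and the (N-1)-set agreement specification can be read as two simple predicates. The following is a minimal sketch, assuming a literal reading of the slides; the function names (`nonempty_subsets`, `valid_stable_outputs`, `satisfies_set_agreement`) are hypothetical and do not come from the talk.

```python
from itertools import combinations


def nonempty_subsets(processes):
    """Return all non-empty subsets of the process set as frozensets."""
    items = sorted(processes)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def valid_stable_outputs(processes, correct):
    """Sets U that Y may stabilize on: non-empty and not equal to `correct`.

    Assumption: this encodes only the eventual property stated on the slide.
    """
    correct = frozenset(correct)
    return [u for u in nonempty_subsets(processes) if u != correct]


def satisfies_set_agreement(proposed, decided, k):
    """Check k-set agreement on a finished run.

    Validity: every decided value was proposed.
    Agreement: at most k distinct values are decided.
    """
    distinct = set(decided)
    return distinct <= set(proposed) and len(distinct) <= k
```

Under this literal reading, the example with Π={p,q,r} and C={p,q} admits six candidate sets, including {r}, which the slide's list does not mention. For N=3, `satisfies_set_agreement([1, 2, 3], [1, 1, 2], 2)` holds, while a run in which all three proposed values are decided does not.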

Set agreement is almost solvable
- If N-1 or fewer distinct values are proposed, e.g., if N-1 or fewer processes participate
- k-convergence [YNG98]
Y should handle the case when N values are around

Citizens and gladiators
Split the system into Gladiators (the stable output of Y) and Citizens (all the rest)
Gladiators eliminate at least one value using (G-1)-convergence or adopt a value from Citizens
[Figure: the system split into Gladiators (Y) and Citizens (Π-Y)]

Correctness
Eventually, Gladiators are not the set of correct processes
⇨ At least one Gladiator is faulty, or at least one Citizen is correct
⇨ Gladiators commit on G-1 values or adopt a value from a Citizen
⇨ At least one process gives up its value
⇨ At most N-1 values survive!

Y is minimal
Theorem 2: Y is weaker than any stable non-trivial failure detector D
D is stable if, eventually, the same value is permanently output at every correct process (e.g., P, ◇P, Ω, Ωk)

Minimality proof: toy example
Consider a “faithful” failure detector D that solves a wait-free impossible problem P: in every execution E, D outputs the same value v that depends only on correct(E)
Claim 1: For all v, there is a non-empty set of processes C such that v cannot be output by D when C is the set of correct processes
Suppose not: v is valid for any C => D can be replaced with a “dummy” that always outputs v, a contradiction!

Minimality proof: general case
Consider any non-trivial stable D
Claim 2: For all v, there exists an infinite execution E in which v cannot be the only value output by D
Reduction: as long as D is stable on v, use E(v) to extract Y

Conclusions
- Y is the weakest non-trivial stable failure detector (can be generalized to the f-resilient case: Yf)
- (N-1)-set agreement is the easiest non-trivial problem?
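The toy example behind Claim 1 can be made concrete. The sketch below assumes a faithful detector is modelled as a plain function from the set of correct processes to a hashable value; the names `build_extraction_table`, `emulate_y` and the sample detector `min_correct` are hypothetical illustrations, not part of the talk. For each value v it picks a witness set C on which the detector does not output v; answering with that C whenever D outputs v yields a set that differs from the actual correct set, which is what Y requires.

```python
from itertools import combinations


def nonempty_subsets(processes):
    """Return all non-empty subsets of the process set as frozensets."""
    items = sorted(processes)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def build_extraction_table(processes, detector):
    """Map each possible output v of `detector` to a witness set C(v).

    C(v) is a non-empty set on which the detector outputs something other
    than v (Claim 1). If no witness exists, v is valid for every correct
    set, so the detector could be replaced by a constant "dummy".
    """
    subsets = nonempty_subsets(processes)
    table = {}
    for v in {detector(c) for c in subsets}:
        witnesses = [c for c in subsets if detector(c) != v]
        if not witnesses:
            raise ValueError(f"{v!r} is valid for every correct set: "
                             "the detector is trivial")
        table[v] = witnesses[0]
    return table


def emulate_y(table, detector_output):
    """Emulated output of Y, given the current output of the faithful detector."""
    return table[detector_output]


def min_correct(correct):
    """Hypothetical faithful detector: outputs the smallest correct process id."""
    return min(correct)
```

With `min_correct` over Π={p,q,r}, the set returned by `emulate_y` never equals the correct set that produced the detector's value, because the detector outputs a different value on the witness set. A constant detector makes `build_extraction_table` raise, mirroring the "dummy" argument in the proof.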

Future
- Establishing the “weakest ever” result in the most general class of failure detectors (not Y!)
- Y is not the weakest: an unstable “composition” of Ωn and Y is even weaker! [Chen et al., Zielinski, …]

Thank you!

k-convergence [YNG98]
Processes propose values and commit on or adopt one of the proposed values:
- If a process commits, then at most k values are committed or adopted
- If k or fewer values are proposed, every process commits
(!) wait-free solvable for any k
(!!) (N-1)-convergence almost solves (N-1)-set agreement!
But termination is an issue in case all N values are around: that’s where Y is of use!

Minimality proof: general case
Consider any non-trivial stable D
Claim 2: For all v, there exists an infinite execution E in which v cannot be the only value output by D
Reduction: as long as D is stable:
- Locate a faulty process in a finite prefix of E (including all steps of faulty(E))
- Or, output correct(E)
Y is extracted!

Generalization to f-resilience
f-resilient impossible problems: can be solved when fewer than f processes fail but cannot when f fail
Yf outputs a set of size ≥ N-f
Eventually, the same set U is permanently output at every correct process
Yf is the weakest stable failure detector to circumvent an f-resilient impossibility

Big picture
Addressing the WFD question contributes to:
- Understanding complexity and computability bounds of distributed abstractions
- Establishing a clean classification of problems in distributed computing
“WFD ever” corresponds to the easiest non-trivial problem in distributed computing
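The k-convergence properties from the [YNG98] slide above can be checked on a finished run with a small predicate. This is a minimal sketch under stated assumptions: each process's result is modelled as a `(value, committed)` pair, and the function name `check_k_convergence` is hypothetical. It checks the specification only and is not an implementation of the wait-free protocol.

```python
def check_k_convergence(proposed, outcomes, k):
    """Check the k-convergence properties on one finished run.

    proposed: values proposed by the processes.
    outcomes: list of (value, committed) pairs, one per process that
              returned; committed=False means the value was only adopted.
    """
    returned = {value for value, _ in outcomes}
    any_commit = any(committed for _, committed in outcomes)
    all_commit = all(committed for _, committed in outcomes)

    # Every committed or adopted value must have been proposed.
    validity = returned <= set(proposed)
    # If some process commits, at most k values are committed or adopted.
    agreement = (not any_commit) or len(returned) <= k
    # If k or fewer values are proposed, every process commits.
    convergence = len(set(proposed)) > k or all_commit

    return validity and agreement and convergence
```

With three proposed values and k=2, a run in which every process merely adopts its own value passes the check even though nobody commits. That is the termination gap the slide points to when all N values are around, and the case Y is meant to handle.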