Failure detector talk

On the Weakest Failure Detector Ever
Petr Kouznetsov (Max Planck Institute for SWS)
Joint work with:
Rachid Guerraoui (EPFL)
Nancy Lynch (MIT)
Maurice Herlihy (Brown)
Calvin Newport (MIT)
© 2007 P. Kouznetsov
Big picture
Choosing a model:


Optimistic model: the system is very efficient
but likely to fail
Conservative model: the system is very robust
but inefficient (or impossible to implement)
What is the right model?
Synchrony assumptions



Asynchronous read-write shared memory
model: no bounds on relative processing
speed
Very appealing in practice!
Too conservative: most problems are not
solvable [FLP85, LA87, HS93, SZ93, BG93]
(though they are solvable in synchronous systems)
So what do we need exactly?
What is the minimal amount of synchrony
that circumvents some asynchronous
impossibility?

“minimal amount of synchrony”?
- The weakest failure detector
Model
Asynchronous read-write shared-memory
system with failure detectors
[Figure: processes p, q and r, each with its own failure detector module FD]
Comparing failure detectors
Failure detector D is weaker than failure
detector D’ if there exists an algorithm that
emulates D using D’
[Figure: processes p, q and r each query D’ and emulate the output of D]
The weakest non-trivial failure detector
A failure detector X that is:
non-trivial: circumvents some asynchronous
impossibility
weaker than any non-trivial failure detector
The “easiest” non-trivial problem?
A Very Weak Failure Detector
Y outputs a non-empty set of process ids
Eventually, the same set U is output at every
correct process:
U is not the current set of correct processes
Example:
Π={p,q,r}, C={p,q}
Y may stabilize on, e.g., {p}, {q}, {p,r}, {q,r} or {p,q,r} (but never on {p,q})
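The definition of Y can be made concrete by enumerating the sets it may stabilize on. The sketch below is only an illustration: the function names are hypothetical, and it models nothing but the stable output (the arbitrary outputs before stabilization are ignored).

```python
from itertools import combinations


def nonempty_subsets(processes):
    """Return every non-empty subset of `processes` as a frozenset."""
    items = sorted(processes)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def valid_stable_outputs(processes, correct):
    """Sets Y may stabilize on: non-empty and different from `correct`."""
    correct = frozenset(correct)
    return [u for u in nonempty_subsets(processes) if u != correct]


outputs = valid_stable_outputs({"p", "q", "r"}, {"p", "q"})
print([sorted(u) for u in outputs])
```

Under the definition as stated, the enumeration for Π={p,q,r} and C={p,q} yields six sets: every non-empty subset except {p,q}. This includes {r}, which the example list above does not show.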
Y is non-trivial
Theorem 1 Y solves (N-1)-set agreement
Every process in P1,…,PN proposes a value and
must decide on some proposed value so that:
At most N-1 distinct values are decided
(!) not solvable in asynchronous systems
[HS93,BG93,SZ93]
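The two safety properties of set agreement stated above (decide only proposed values, at most N-1 distinct decisions) can be written as a small checker. This is a sketch with hypothetical names: it validates a finished run and says nothing about termination.

```python
def check_set_agreement(proposals, decisions, k):
    """Check the safety properties of k-set agreement on one run.

    proposals: dict mapping each process to the value it proposed
    decisions: dict mapping each process that decided to its decision
    k:         maximum number of distinct decided values (N-1 on the slide)
    """
    proposed = set(proposals.values())
    # Validity: every decided value was proposed by some process.
    validity = all(v in proposed for v in decisions.values())
    # Agreement: at most k distinct values are decided.
    agreement = len(set(decisions.values())) <= k
    return validity and agreement


proposals = {"p": 1, "q": 2, "r": 3}
ok_run = check_set_agreement(proposals, {"p": 1, "q": 2, "r": 2}, k=2)
bad_run = check_set_agreement(proposals, {"p": 1, "q": 2, "r": 3}, k=2)
```

With N=3 and k=N-1=2, the first run is accepted because r gave up its value, while the second is rejected because all three proposals survive.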
Set agreement is almost solvable

Solvable if N-1 or fewer distinct values are proposed,
e.g., if N-1 or fewer processes participate:
use k-convergence [YNG98]

Y should handle the case when all N values are
around
Citizens and gladiators


Split the system into Gladiators (the stable output of
Y) and Citizens (all the rest)
Gladiators eliminate at least one value using
(G-1)-convergence or adopt a value from Citizens
[Figure: the system Π split into Gladiators (Y) and Citizens (Π-Y)]
Correctness
Eventually, Gladiators are not the set of correct
processes
⇨ At least one Gladiator is faulty, or at least one
Citizen is correct
⇨ Gladiators commit on at most G-1 values or adopt a
value from a Citizen
⇨ At least one process gives up its value ⇨ at
most N-1 values survive!
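The first implication is pure set reasoning and can be checked by brute force on small systems. The sketch below uses hypothetical names and verifies only that step: whenever the Gladiator set differs from the set of correct processes, some Gladiator is faulty or some Citizen is correct. It does not model the convergence protocol itself.

```python
from itertools import combinations


def nonempty_subsets(processes):
    """Return every non-empty subset of `processes` as a frozenset."""
    items = sorted(processes)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def first_step_holds(processes):
    """Check: Gladiators != correct set implies a faulty Gladiator
    or a correct Citizen, for every choice of both sets."""
    subsets = nonempty_subsets(processes)
    for correct in subsets:
        for gladiators in subsets:
            if gladiators == correct:
                continue  # Y never stabilizes on the correct set
            faulty_gladiator = bool(gladiators - correct)
            # Citizens are the processes outside the Gladiator set,
            # so correct Citizens are exactly correct - gladiators.
            correct_citizen = bool(correct - gladiators)
            if not (faulty_gladiator or correct_citizen):
                return False
    return True


holds_for_three = first_step_holds({"p", "q", "r"})
```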
Y is minimal
Theorem 2 Y is weaker than any stable non-trivial failure detector D
D is stable if, eventually, the same value is
permanently output at every correct process
(e.g., P, ◇P, Ω, Ωk)
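For two of the detectors listed, the emulation of Y is simple enough to sketch directly. The code below is an illustration under assumptions, not the construction used in the proof: it assumes N ≥ 2, models only the stable period (Ω has settled on one correct leader, ◇P suspects exactly the faulty processes), and the function names are hypothetical.

```python
def y_from_omega(processes, leader):
    """Emulate Y from Ω: output everyone except the current leader.

    Once the leader is correct, the output excludes a correct process,
    so it cannot be the set of correct processes.
    """
    return frozenset(processes) - {leader}


def y_from_eventually_perfect(processes, suspected):
    """Emulate Y from ◇P given the currently suspected set.

    If someone is suspected, output the suspects: in the stable period
    they are all faulty, so the set differs from the correct set.
    If nobody is suspected, every process is correct, and any single
    process is a proper subset of the correct set (needs N >= 2).
    """
    suspected = frozenset(suspected)
    if suspected:
        return suspected
    return frozenset({min(processes)})


procs = {"p", "q", "r"}
from_leader = y_from_omega(procs, "p")
from_suspects = y_from_eventually_perfect(procs, {"r"})
```

Both functions are deterministic in the detector output, so once the underlying detector is stable every correct process computes the same set.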
Minimality proof: toy example
Consider a “faithful” failure detector D that solves a
wait-free impossible problem P:
in every execution E, D outputs the same value v that
depends only on correct(E)
Claim 1 For all v, there is a non-empty set of
processes C such that v cannot be output by D when
C is the set of correct processes
Suppose not: v is valid for any C
⇨ D can be replaced with a “dummy” that always
outputs v, so P would be wait-free solvable: a contradiction!
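Claim 1 already suggests how Y is extracted in the toy case: on seeing v, output a set C for which v is impossible. The sketch below assumes a faithful detector can be modelled as a plain function from the set of correct processes to a value; that modelling choice, the sample detector `count_correct` and all function names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(processes):
    """Return every non-empty subset of `processes` as a frozenset."""
    items = sorted(processes)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def forbidden_set(detector, processes, value):
    """Return a non-empty C such that `detector` cannot output `value`
    when C is the correct set, or None if `value` is valid everywhere."""
    for candidate in nonempty_subsets(processes):
        if detector(candidate) != value:
            return candidate
    return None


def extract_y(detector, processes, observed_value):
    """Toy extraction: the detector output `observed_value`, so the real
    correct set is not the forbidden set, which is a legal output of Y."""
    return forbidden_set(detector, processes, observed_value)


def count_correct(correct):
    """Sample faithful detector: outputs the number of correct processes."""
    return len(correct)


procs = {"p", "q", "r"}
u = extract_y(count_correct, procs, count_correct(frozenset({"p", "q"})))
```

The search order is deterministic, so all processes that observe the same v output the same set. A detector for which `forbidden_set` returns None is exactly the “dummy” case ruled out by Claim 1.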
Minimality proof: general case
Consider any non-trivial stable D
Claim 2 For all v, there exists an infinite
execution E in which v cannot be the only
value output by D
Reduction:
As long as D is stable on v: use E(v) to extract Y
Conclusions


Y is the weakest non-trivial stable failure
detector (can be generalized to the f-resilient
case – Yf)
(N-1)-set agreement is the easiest non-trivial
problem?
Future

Establishing the “weakest ever” result in the
most general class of failure detectors (not Y!)
Y is not the weakest: an unstable “composition” of
Ωn and Y is even weaker!
[Chen et al., Zielinski, …]
Thank you!
k-convergence [YNG98]
Processes propose values and commit on or adopt
one of the proposed values:
If a process commits, then at most k values are
committed or adopted
If k or fewer values are proposed, every process commits
(!) wait-free solvable for any k
(!!) (N-1)-convergence almost solves (N-1)-set agreement!
But termination is an issue in case all N values are
around: that’s where Y is of use!
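The k-convergence properties listed above can be expressed as a checker over one finished run. This is a sketch with hypothetical names that only validates outcomes. It is not an implementation of the wait-free protocol from [YNG98].

```python
def check_k_convergence(proposals, outcomes, k):
    """Check the k-convergence properties on one run.

    proposals: dict process -> proposed value
    outcomes:  dict process -> (status, value), status "commit" or "adopt"
    """
    proposed = set(proposals.values())
    returned = {value for _, value in outcomes.values()}
    statuses = [status for status, _ in outcomes.values()]

    # Only proposed values may be committed or adopted.
    validity = returned <= proposed
    # If some process commits, at most k values are committed or adopted.
    agreement = "commit" not in statuses or len(returned) <= k
    # If k or fewer values are proposed, every process commits.
    convergence = len(proposed) > k or all(s == "commit" for s in statuses)
    return validity and agreement and convergence


three_values = {"p": 1, "q": 2, "r": 3}
all_adopt = {"p": ("adopt", 1), "q": ("adopt", 2), "r": ("adopt", 3)}
stuck_run_is_legal = check_k_convergence(three_values, all_adopt, k=2)
```

The last line shows the termination issue mentioned above: with all N=3 values around, a run in which nobody commits satisfies (N-1)-convergence, so convergence alone does not force a value to be dropped.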
Minimality proof: general case
Consider any non-trivial stable D
Claim 2 For all v, there exists an infinite execution E
in which v cannot be the only value output by D
Reduction:
As long as D is stable:
Locate a faulty process in a finite prefix of E (including all
steps of faulty(E)),
or output correct(E)

Y is extracted!
Generalization to f-resilience
f-resilient impossible problems: can be solved
when fewer than f processes fail but cannot when f fail


Yf outputs a set of size ≥ N-f
Eventually, the same set U is permanently
output at every correct process
Yf is the weakest stable failure detector to
circumvent an f-resilient impossibility
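The size constraint of Yf can be illustrated by enumerating its stable outputs. The sketch below makes one assumption the slide does not restate: as for Y, the stable set U is taken to differ from the set of correct processes. Function names are hypothetical.

```python
from itertools import combinations


def nonempty_subsets(processes):
    """Return every non-empty subset of `processes` as a frozenset."""
    items = sorted(processes)
    return [frozenset(c)
            for size in range(1, len(items) + 1)
            for c in combinations(items, size)]


def valid_stable_outputs_f(processes, correct, f):
    """Sets Yf may stabilize on: size >= N-f and, by the assumption
    carried over from Y, different from the correct set."""
    n = len(processes)
    correct = frozenset(correct)
    return [u for u in nonempty_subsets(processes)
            if len(u) >= n - f and u != correct]


procs = {"p", "q", "r"}
one_resilient = valid_stable_outputs_f(procs, {"p", "q"}, f=1)
wait_free = valid_stable_outputs_f(procs, {"p", "q"}, f=2)
```

With f=N-1 the size bound becomes "non-empty", so under this assumption the enumeration coincides with that of Y.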
Big picture
Addressing the WFD question contributes to:


Understanding complexity and computability
bounds of distributed abstractions
Establishing a clean classification of
problems in distributed computing
“WFD ever” corresponds to the easiest non-trivial
problem in distributed computing