V p

advertisement
Distributed Consensus (continued)
Byzantine Generals Problem
Solution with signed message
A signed message satisfies all the conditions of oral message,
plus two extra conditions
• Signature cannot be forged. Forged message are detected and
discarded by loyal generals.
• Anyone can verify its authenticity of a signature.
Signed messages improve resilience.
Example
commander 0
1{0}
1{0}
commander 0
0{0}
1{0}
0{0,2}
0{0,2}
lieutenent 1
lieutenant 2 lieutenent 1
(a)
1{0,1}
lieutenant 2
(b)
discard
Using signed messages, byzantine consensus is feasible with
3 generals and 1 traitor. In (b) the the loyal lieutenants compute the
consensus value by applying some choice function on the set of values
Signature list
v{0}
1
v{0,1}
0
2
7
v{0,1,7}
v{0,1,7,4}
4
Byzantine consensus:
The signed message algorithms SM(m)
Commander i sends out a signed message v{i} to each lieutenant j ≠ i
Lieutenant j, after receiving a message v{S}, appends it to a set V.j, only if
(i) it is not forged, and (ii) it has not been received before.
If the length of S is less than m+1, then lieutenant j
(i) appends his own signature to S, and
(ii) sends out the signed message to every other lieutenant
whose signature does not appear in S.
Lieutenant j applies a choice function on V.j to make the final decision.
Theorem of signed messages
If n ≥ m + 2, where m is the maximum number of traitors,
then SM(m) satisfies both IC1 and IC2.
Proof.
Case 1. Commander is loyal. The bag of each process will
contain exactly one message, that was sent by the commander.
(Try to visualize this)
Proof of signed message theorem
Case 2. Commander is traitor.
• The signature list has a size (m+1), and there are m traitors, so
at least one lieutenant signing the message must be loyal.
• Every loyal lieutenant i will receive every other loyal
lieutenant’s message. So, every message accepted by j is also
accepted by i and vice versa. So V.i = V.j.
Example
0
a
{a, b, c}
3
1
c
a
b
c
b
f
2
{a, b,-}
3 accepts c,
but 2 rejects f
With m=2 and a signature list of length 2, the loyal generals may
not receive the same order from the commander who is a traitor.
When the length of the signature list grows to 3, the problem is resolved
Concluding remarks
• The signed message version tolerates a larger
number (n-2) of faults.
• Message complexity however is the same in both
cases.
Message complexity = (n-1)(n-2) … (n-m+1)
Failure detectors
Failure detector for crash failures
• The design of fault-tolerant algorithms will be simple if
processes can detect (crash) failures.
• In synchronous systems with bounded delay channels,
crash failures can definitely be detected using timeouts.
Failure detectors for asynchronous
systems
In asynchronous distributed systems, the detection of
crash failures is imperfect. There will be false positives
and false negatives. Two properties are relevant:
Completeness. Every crashed process is eventually suspected.
Accuracy. No correct process is ever suspected.
Failure Detectors
An FD is a distributed oracle that provides hints about the
operational status of processes.
However:
• Hints may be incorrect
• FD may give different hints to different processes
• FD may change its mind (over & over) about the
operational status of a process
13
Typical FD Behavior
FD at q
trust
trust
suspect
Process p
trust
suspect
suspect
(permanently)
up
down
14
Revisit the Consensus problem
input
1
2
3
4
output
Agreed value
Example
1
0
3
6
5
7
4
2
0 suspects {1,2,3,7} to have failed. Does this satisfy completeness?
Does this satisfy accuracy?
Classification of completeness
• Strong completeness. Every crashed process is
eventually suspected by every correct process, and
remains a suspect thereafter.
• Weak completeness. Every crashed process is
eventually suspected by at least one correct process,
and remains a suspect thereafter.
Note that we don’t care what mechanism is used for suspecting a process.
Classification of accuracy
• Strong accuracy. No correct process is ever
suspected.
• Weak accuracy. There is at least one correct
process that is never suspected.
Transforming completeness
Weak completeness can be transformed into strong completeness
Program strong completeness (program for process i};
define D: set of process ids (representing the suspects);
initially D is generated by the weakly complete failure detector of i;
{program for process i}
do true 
send D(i) to every process j ≠ i;
receive D(j) from every process j ≠ i;
D(i) := D(i) ∪ D(j);
if j ∈ D(i)  D(i) := D(i) \ j fi
od
Eventual accuracy
A failure detector is eventually strongly accurate, if there exists a time
T after which no correct process is suspected.
(Before that time, a correct process be added to and removed from
the list of suspects any number of times)
A failure detector is eventually weakly accurate, if there exists a time
T after which at least one process is no more suspected.
Classifying failure detectors
Perfect P. (Strongly) Complete and strongly accurate
Strong S. (Strongly) Complete and weakly accurate
Eventually perfect ◊P.
(Strongly) Complete and eventually strongly accurate
Eventually strong ◊S
(Strongly) Complete and eventually weakly accurate
Other classes are feasible: W (weak completeness) and
weak accuracy) and ◊W
Motivation
Question 1. Given a failure detector of a certain type,
how can we solve the consensus problem?
Question 2. How can we implement these classes of
failure detectors in asynchronous distributed
systems?
Question 3. What is the weakest class of failure
detectors that can solve the consensus problem?
(Weakest class of failure detectors is
closest to reality)
Application of Failure Detectors
Applications often need to determine which processes are up (operational)
and which are down (crashed). This service is provided by Failure Detector.
FDs are at the core of many fault-tolerant algorithms and applications, like
•
•
•
•
Group Membership
Group Communication
Atomic Broadcast
Primary/Backup systems
•
•
•
•
Atomic Commitment
Consensus
Leader Election
…..
23
q s
p
q s
q
t
q
q
s
SLOW
r
24
5
5
p
Consensus
7
8
q
t
Crash!
5
8
2
s
5
r
5
25
Solving Consensus
• In synchronous systems: Possible
• In asynchronous systems: Impossible [FLP83]
even if:
• at most one process may crash, and
• all links are reliable
26
A more complete classification
of failure detectors
strong accuracy
strong completeness
weak completeness
Perfect P
weak accuracy
Strong S
Weak W
◊ strong accuracy ◊ weak accuracy
◊P
◊S
◊
W
Consensus using P
{program for process p, t = max number of faulty processes}
initially Vp := (⊥, ⊥, ⊥, …, ⊥); {array of size n}
Vp[p] = input of p; Dp := Vp; rp :=1
{Vp[q] = ⊥ means, process p thinks q is a suspect. Initially everyone is a suspect}
{Phase 1} for round rp= 1 to t +1
send (rp, Dp, p) to all;
wait to receive (rp, Dq, q) from all q, {or else q becomes a suspect};
for k = 1 to n Vp[k] = ⊥ ∧ ∃ (rp, Dq, q): Dq[k] ≠ ⊥  Vp[k] := Dq[k]
end for
end for
{at the end of Phase 1, Vp for each correct process is identical}
{Phase 2} Final decision value is the input from the first element Vp[j]: Vp[j] ≠ ⊥
Understanding consensus using P
Why continue (t+1) rounds?
It is possible that a process p sends out the first message to q
and then crashes. If there are n processes and t of them
crashed, then after at most (t +1) asynchronous rounds, Vp for
each correct process p becomes identical, and contains all
inputs from processes that may have transmitted at least once.
Understanding consensus using P
Sends (1, D1) and
then crashes
1
Sends (2, D2) and
then crashes
Sends (t, Dt) and
then crashes
2
Well, I received D from 1, but
did everyone receive it? To ensure
multiple rounds of broadcasts are
necessary …
Completely connected topology
Well, I received D from 1, but
did everyone receive it? To ensure
multiple rounds of broadcasts are
necessary …
t
Consensus using other type of
failure detectors
Algorithms for reaching consensus with several other forms of
failure detectors exist. In general, the weaker is the failure
detector, the closer it is to reality (a truly asynchronous
system), but the harder is the algorithm for implementing
consensus.
Consensus using S
Vp := (⊥, ⊥, ⊥, …, ⊥); Vp[p] := input of p; Dp := Vp
(Phase 1) Same as phase 1 of consensus with P – it runs for (t+1) asynchronous rounds
(Phase 2) send (Vp, p) to all;
receive (Dq, q) from all q;
for k = 1 to n ∃Vq[k]: Vp[p] ≠ ⊥ ∧ Vq[k] = ⊥  Vp[k] := Dp[k] :=⊥ end for
(Phase 3) Decide on the first element Vp [j]: Vp [j] ≠ ⊥
Consensus using S: example
Assume that there are six processes: 0,1,2,3,4,5.
Of these 4, 5 crashed. And 3 is the process that
will never be suspected. Assuming that k is the
(0, ⊥, 2, 3, ⊥,⊥)
0
(⊥, 1, ⊥, 3, ⊥,⊥)
1
input from process k, at the end of phase 1, the
following is possible:
V0 = (0, ⊥, 2, 3, ⊥,⊥)
V1 = (⊥, 1, ⊥, 3, ⊥,⊥)
V2 = (0, 1, 2, 3, ⊥,⊥)
V3 = (⊥, 1, ⊥, 3, ⊥,⊥)
At the end of phase 3, the processes agree upon
the input from process 3
2
(0, 1, 2, 3, ⊥,⊥)
5
3
(⊥, 1, ⊥, 3, ⊥,⊥)
4
Conclusion
P
S
Consensus
Problem
◊P
◊S
Cannot solve
consensus
W
◊W
Asynchronous system
Cannot
solve
Can solve
consensus
consensus
Paxos
• A solution to the asynchronous consensus problem due to
Lamport.
• Runs on a completely connected network of n processes
• Tolerates up to m failures, where n >2m+1. Processes can
crash and messages may be lost, but Byzantine failures are
ruled out
• Although the requirements for consensus are agreement,
validity, and termination, Paxos primarily guarantees
agreement and validity.
(If it guaranteed all three properties, then that would violate FLP)
Properties
Safety Properties
Validity. Only a proposed value can be chosen as the final
decision.
Agreement. Two different processes cannot make different
decisions.
Liveness Properties
Termination. Some proposed value is eventually chosen.
(Is it really satisfied without some form of randomization?)
Notification. If a value has been chosen, a node can eventually
learn the value.
Three roles of processes
• Each process may play three different roles: proposer,
acceptor and learner
proposer
proposer
acceptor
decision
proposer
Too simplistic. What if the acceptor crashes?
Paxos algorithm
Phase 1 (prepare):
Step 1.1. Each proposer sends a proposal (v, n) to each acceptor
Step 1.2. If n is the largest sequence number of a proposal
received by an acceptor, then it sends an ack (n,-,-) to its
proposer, which is a promise that it will ignore all proposals
numbered lowered than n.
In case an acceptor has already accepted a proposal with a
sequence number n’< n and a proposed value v, it responds
with an ack (n, v, n’). (it implies that the proposer has no
point in trying to push the same value with a larger sequence
no) [It can however send a new request with the value v]
Paxos algorithm
Phase 2 (accept):
Step 2.1. If a proposer receives an ack (n,-,-) from a majority of
acceptors, then it sends accept (n,v) to all acceptors, asking
them to accept this value.
(Note. If however, an acceptor returned an ack (n,v,n’) to the proposer in
phase 1 (which means that it already accepted proposal with value v )
then the proposer must include the value v with the highest sequence
number in its request to the acceptors.)
Step 2.2. An acceptor accepts a proposal (n,v) unless it has
already promised to consider proposals with a sequence
number greater than n.
The final decision
When a majority of the acceptors accepts a proposed value, it
becomes the final decision value.
The acceptors multicast the accepted value to the learners. It
enables them to determine if a proposal has been accepted
by a majority of acceptors.
The learners convey it to the client processes.
Observations
Observation 1. An acceptor accepts a proposal with a sequence
number n if it has not sent a promise to any proposal with a
sequence number n’> n .
Observation 2. If a proposer sends an accept (v,n) message in
phase 2, then either no acceptor in a majority has accepted a
proposal with a sequence number n’< n , or v is the value in
the highest numbered proposal among all accepted proposals
with sequence numbers n’< n accepted by at least once
acceptor in a majority of them.
What about Liveness?
Consider the following scenario:
(Phase 1) Proposer 1 sends out prepare (v, n1);
(Phase 1) Proposer 2 sends out prepare (v,n2), where n2 > n1 ;
(Phase 2) Proposer 1’s accept (n1) is declined, since the acceptor
has already promised to proposer 2 that it will not accept any proposal
numbered lower than n2. So proposer 1 restarts phase 1 with a higher
number n3 > n2;
(Phase 2) Proposer 2’s accept request is now declined on a similar ground;
The race can go on forever! To avoid this, either elect a single
proposer (how?) or use randomization.
Download