Consensus

Distributed Algorithms:
Agreement Protocols
Problems of Agreement

A set of processes need to agree on a value
(decision), after one or more processes have
proposed what that value (decision) should be
 Examples:
mutual exclusion, election, transactions

Processes may be correct, crashed, or they may
exhibit arbitrary (Byzantine) failures
 Messages are exchanged on a one-to-one basis,
and they are not signed
Consensus and related problems
 System model
N processes {p1, p2, ..., pN}
Communication is reliable but processes may fail
At most f processes out of N may be faulty.
 Crash failure
 Byzantine failure (arbitrary)
The system is logically fully connected
A receiver process knows the identity of the sender process
Limiting faults solely to the processes simplifies the solution to
the agreement problems
 More recently, agreement problems have also been studied under failures of
communication channels only, and under failures of both processes and
communication channels
Authenticated & Non-authenticated messages
 To reach an agreement, processes have to exchange their
values and relay the received values to other processes
Authenticated or signed message system – A (faulty) process
cannot forge a message or change the contents of a received
message (before it relays the message to others).
 A process can verify the authenticity of a received message
Non-authenticated or unsigned or oral message system – A (faulty)
process can forge a message and claim to have received it
from another process, or change the contents of a received
message before it relays the message to others.
 A process has no way of verifying the authenticity of a received message
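As an illustration (not part of the original slides), here is a minimal Python sketch of the difference: with an authenticated message a receiver can detect tampering by a relaying process, whereas an oral message carries no such evidence. The shared key is a simplifying assumption; a real signed-message system would use per-process signatures so that no process can forge another's messages.

    import hashlib
    import hmac

    KEY = b"shared-secret"   # simplifying assumption: all correct processes hold this key

    def sign(sender, value):
        # Attach a MAC so a relaying process cannot alter sender or value undetected
        tag = hmac.new(KEY, f"{sender}:{value}".encode(), hashlib.sha256).hexdigest()
        return {"sender": sender, "value": value, "tag": tag}

    def verify(msg):
        expected = hmac.new(KEY, f"{msg['sender']}:{msg['value']}".encode(),
                            hashlib.sha256).hexdigest()
        return hmac.compare_digest(expected, msg["tag"])

    m = sign("p1", "attack")
    m["value"] = "retreat"   # a faulty relay changes the contents
    print(verify(m))         # False: the tampering is detected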
Two Agreement Problems

Consensus problem:
N processes agree on a value (e.g. synchronized action –
go / abort)
Consensus may have to be reached in the presence of
failure
 Process failure – crash/fail-stop, arbitrary failure

Communication failure
Every process pi starts in an “undecided” state
Every process pi proposes a value vi from a set D while in
the undecided state.
Process pi exchanges messages until it makes a decision di
and moves to the decided state.
A consensus is reached if all correct processes agree on the
same value di
Consensus Requirements

Termination: Eventually each correct process sets its
decision value
This may not be possible in the presence of process crashes in an
asynchronous system

Agreement: The decision value is the same for all correct
processes
Arbitrary (Byzantine) failures may cause inconsistency and prevent
agreement

Integrity: If all correct processes propose the same value,
any correct process decides that value

Consensus may involve a proposal stage and an
agreement stage
Byzantine Generals Problem
 Proposed and solved by Lamport
Consider a battleground. There are a number of generals
at different positions who want to reach agreement on
their attack plan, i.e., “attack” or “retreat”.
Generals are separated geographically and communicate
through messengers. Some of the generals are “loyal” and
some are “traitors”.
 Upper bound on number of traitors
Pease et al. showed that it is impossible to reach a
consensus if f exceeds (N-1)/3
Byzantine Generals Problem

“Byzantine generals” problem: a “commander”
process i orders value v.
The “lieutenant” processes must agree on what the
commander ordered.
Processes may be faulty

provide wrong or contradictory messages
Integrity requirement:

A distinguished process decides a value for others to agree upon
Solution only exists if N > 3f, where f : #faulty processes
Differs from consensus in that a distinguished process
supplies a value that the others are to agree upon,
instead of each of them proposing a value
Byzantine Generals Problem
 Requirements
Termination: Eventually each process sets its decision
variable
Agreement: The decision value of all correct processes is
the same
Integrity: If the commander is correct, then all correct
processes agree on the value the commander proposed
Note: integrity implies agreement when the commander
is correct; but the commander need not be correct
IC: A Variant of Consensus
 Interactive Consistency Problem
Every process proposes a single value.
The goal of the algorithm is for the correct processes
to agree on a vector of values, one for each process –
the “decision vector”
Example: for each of a set of processes to obtain the same
information about their respective states
IC: A Variant of Consensus
 Requirements
Termination: Eventually each process sets its decision
variable
Agreement: The decision vector of all correct processes
is the same
Integrity: If pi is correct, then all correct processes agree
on vi as the ith component of their decision vector
Relationship between C, BG & IC
 Although it is common to consider the BG problem
with arbitrary process failures, in fact each of the
three problems – C, BG, & IC – is meaningful in the
context of either arbitrary or crash failures
 Each can be framed assuming either a synchronous
or an asynchronous system
 It is sometimes possible to derive a solution to one
problem using a solution to another
Relationship between C, BG & IC
 Suppose that there exist solutions to C, BG & IC
Ci(v1, v2, … vN) returns the decision value of pi in a run of the
solution to the consensus problem where v1, v2, … are the
values that the processes proposed
BGi(j, v) returns the decision value of pi in a run of the
solution to the BG problem, where pj, the commander
proposed the value v
ICi(v1, v2, … vN)[ j ] returns the jth value in the decision vector
of pi in a run of the solution to the IC problem, where v1, v2,
… are the values that the processes proposed
 It is possible to construct solutions out of the solutions
to other problems
Relationship between C, BG & IC
IC can be solved by using BG’s solution by running it N times,
once with each process pj (j = 1, 2, … N) acting as the
commander:
 ICi(v1, v2, … vN)[ j ] = BGi(j, vj) (i = 1, 2, … N)
C can be solved by using IC’s solution by running IC to
produce a vector of values at each process, then applying an
appropriate function on the vector’s values to derive a single
value:
 Ci(v1, v2, … vN) = majority(ICi(v1, v2, … vN)[1], … ICi(v1, v2, … vN)[N] )
BG can be solved from C as follows:
 The commander pj sends its proposed value v to itself and each of the
remaining processes
 All processes run C with values v1, v2, … vN that they receive (pj may be faulty)
 They derive BGi(j, v) = Ci(v1, v2, … vN) (i = 1, 2, … N)
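As a minimal sketch of one of these constructions (assumptions mine, not from the slides), the C-from-IC reduction can be wired up as follows: given any IC solution that returns a decision vector, consensus is obtained by applying majority() to that vector. The IC solver is stubbed with a trivial failure-free version just to show the composition.

    from collections import Counter

    def majority(values):
        # Return the strict-majority value, or None if there is none
        value, count = Counter(values).most_common(1)[0]
        return value if count > len(values) / 2 else None

    def consensus_from_ic(ic_decide, proposals):
        # C_i(v1..vN) = majority(IC_i(v1..vN)[1], ..., IC_i(v1..vN)[N])
        vector = ic_decide(proposals)      # decision vector from an assumed IC solution
        return majority(vector)

    # trivial failure-free "IC": the decision vector is just the proposed values
    print(consensus_from_ic(lambda vs: list(vs), ["go", "go", "abort"]))   # -> go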
Consensus
 Solving consensus is equivalent to solving reliable
and totally ordered multicast
Given a solution to one, we can solve the other
 Implementing consensus with RTO-multicast
Collect all processes into a group g
Each process pi performs RTO-multicast(g, vi)
Each process pi chooses di = mi, where mi is the first value
that pi RTO-delivers
 Termination property follows from the reliability of the multicast
 The agreement and integrity properties follow from the reliability and total
ordering of multicast delivery
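A minimal sketch of this construction, assuming an RTO-multicast layer is already given (here passed in as two hypothetical functions): every process proposes by multicasting its value and decides on the first value it RTO-delivers.

    def consensus_via_rto(group, my_value, rto_multicast, rto_deliver):
        # rto_multicast(group, value): reliable, totally ordered multicast (assumed given)
        # rto_deliver(): blocks until the next message is RTO-delivered
        rto_multicast(group, my_value)   # every process pi performs RTO-multicast(g, vi)
        return rto_deliver()             # decide on the first value delivered; total order
                                         # guarantees all correct processes pick the same one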
Chandra & Toueg [1996] demonstrate how RTO-multicast
can be derived from consensus
Consensus in a synchronous system
 We discuss an algorithm that uses only a basic
multicast protocol to solve consensus in a
synchronous system
 The algorithm assumes that up to f of the N
processes exhibit crash failures
Communication Model
[Figure: five processes p1–p5 forming a complete graph]
•Complete graph (i.e. logically fully connected)
•Synchronous network
Multicast
[Figure: p1 multicasts a message a to p2–p5]
Send a message to all processors in one round
At the end of the round: everybody receives a
Multicast
[Figure: p1 multicasts a and p2 multicasts b in the same round; at the end of the round every process has received both a and b]
Two or more processes can multicast in the same round
Crash Failures
[Figure: a faulty processor crashes while multicasting a; only some of the other processes receive a]
Some of the messages are lost; they are never received
Consensus
[Figure: Start – processes hold initial values 0, 1, 4, 2, 3; Finish – every process decides the same value, e.g. 3]
Everybody has an initial value
Everybody must decide the same value
Validity condition:
If everybody starts with the same value
they must decide that value
[Figure: every process starts with 1 and every process decides 1]
A simple algorithm using B-multicast
Each processor:
1. B-multicast value to all processors
2. Decide on the minimum
(only one round is needed)
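A failure-free sketch of this one-round algorithm (a toy simulation, not a networked implementation; the process names are mine): every process's B-multicast reaches everybody, so each process sees the full set of values and decides on the minimum.

    def simple_consensus(initial_values):
        # Round 1: every process B-multicasts its value; with no failures
        # everybody receives every value
        received = {p: set(initial_values.values()) for p in initial_values}
        # Decide on the minimum of the values received
        return {p: min(vals) for p, vals in received.items()}

    print(simple_consensus({"p1": 0, "p2": 1, "p3": 4, "p4": 2, "p5": 3}))
    # every process decides 0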
[Figure: Start – initial values 0, 1, 4, 2, 3; after B-multicasting the values every process holds {0,1,2,3,4}; each decides on the minimum; Finish – every process decides 0]
This algorithm satisfies the validity condition
[Figure: every process starts with 1 and every process decides 1]
If everybody starts with the same initial value,
everybody decides on that value (the minimum)
Consensus with Crash Failures
The simple algorithm doesn’t work
Each processor:
1. B-multicast value to all processors
2. Decide on the minimum
[Figure: Start – initial values 0, 1, 4, 2, 3; the process holding 0 crashes during its multicast, so it doesn’t multicast its value to all processors; some processes receive {0,1,2,3,4} while others receive only {1,2,3,4}; deciding on the minimum, some processes decide 0 and others decide 1 – No Consensus!]
If an algorithm solves consensus with up to
f failed processes, we say it is:
an f-resilient consensus algorithm
Example: the input and output of
a 3-resilient consensus algorithm
[Figure: Start – initial values 0, 1, 2, 4, 3; Finish – the non-faulty processes decide 1]
New validity condition:
if all non-faulty processes start with
the same value then all non-faulty processes
decide that value
[Figure: all non-faulty processes start with 1 and all non-faulty processes decide 1]
An f-resilient algorithm
Round 1:
B-multicast my value
Rounds 2 to f+1:
B-multicast any newly received values
End of round f+1:
Decide on the minimum value received
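A simulation sketch of this (f+1)-round algorithm (process names and the crash model are my own assumptions): in each round every live process multicasts the values it has not relayed yet, and a crashing process may reach only some of the receivers in its final round.

    def f_resilient_consensus(initial, f, crashes=None):
        # Simulate rounds 1..f+1; each live process multicasts values it has not
        # relayed before. crashes maps a process to (crash_round, receivers_reached):
        # in that round its multicast reaches only those receivers, then it stops.
        crashes = crashes or {}
        known = {p: {v} for p, v in initial.items()}   # values each process knows
        sent = {p: set() for p in initial}             # values already relayed
        alive = set(initial)

        for rnd in range(1, f + 2):
            inbox = {p: set() for p in initial}
            for p in list(alive):
                new = known[p] - sent[p]
                crash = crashes.get(p)
                if crash and crash[0] == rnd:          # p crashes mid-multicast
                    receivers = crash[1]
                    alive.discard(p)
                else:
                    receivers = alive - {p}
                for q in receivers:
                    inbox[q] |= new
                sent[p] |= new
            for p in alive:
                known[p] |= inbox[p]

        # End of round f+1: every surviving process decides on the minimum it knows
        return {p: min(known[p]) for p in alive}

    # f = 1: the process holding 0 crashes in round 1, reaching only p2;
    # p2 relays 0 in round 2, so all surviving processes still decide 0
    print(f_resilient_consensus({"p1": 0, "p2": 1, "p3": 4, "p4": 2, "p5": 3},
                                f=1, crashes={"p1": (1, {"p2"})}))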
Example: f=1 failure, f+1 = 2 rounds needed
[Figure: Start – initial values 0, 1, 4, 2, 3.
Round 1 – every process B-multicasts its values to everybody; the process holding 0 crashes mid-multicast, so some processes receive {0,1,2,3,4} and others only {1,2,3,4}.
Round 2 – every process B-multicasts all newly received values; now every surviving process holds {0,1,2,3,4}.
Finish – every surviving process decides on the minimum value, 0]
Example: f=2 failures, f+1 = 3 rounds needed
An example execution with two failures
[Figure: Start – initial values 0, 1, 4, 2, 3.
Round 1 – processes multicast all their values to everybody; the process holding 0 crashes (Failure 1), so only some processes receive 0.
Round 2 – processes multicast the newly received values; a second process crashes while relaying (Failure 2), so 0 still does not reach every process.
Round 3 – processes multicast the newly received values; now every surviving process holds {0,1,2,3,4}.
Finish – every surviving process decides on the minimum value, 0]
Example: f=2 failures, f+1 = 3 rounds needed
Another example execution with two failures
[Figure: Start – initial values 0, 1, 4, 2, 3.
Round 1 – processes multicast all their values to everybody; the process holding 0 crashes (Failure 1), so only some processes receive 0.
Round 2 – processes multicast the newly received values.
Remark: at the end of this round all processes know about all the other values.
Round 3 – processes multicast the newly received values; a second process crashes (Failure 2), but no new values are learned in this round.
Finish – every surviving process decides on the minimum value, 0]
If there are f failures and f+1 rounds then
there is a round with no failed process
[Figure: example with 5 failures and 6 rounds – at least one of rounds 1–6 contains no failure]
In the algorithm, at the end of the
round with no failure:
• Every (non-faulty) process knows
about all the values of all other
participating processes
• This knowledge doesn’t change until
the end of the algorithm
Therefore, at the end of the
round with no failure:
everybody would decide the same value
However, we don’t know the exact position
of this round, so we have to let the algorithm
execute for f+1 rounds
Validity of algorithm:
when all processes start with the same
input value then the consensus is that value
This holds, since the value decided by
each process is some process’s input value
A Lower Bound
Theorem: Any f-resilient consensus algorithm
requires at least f+1 rounds
Proof sketch:
Assume for contradiction that f
or fewer rounds are enough
Worst case scenario:
There is a process that fails in
each round
[Figure: Round 1 – before process pi fails, it sends its value a to only one process pk.
Round 2 – before process pk fails, it sends the value a to only one process pm.
… Round f – the chain of failures continues; at decision time process pn may decide a while all the other processes decide another value b]
Therefore f rounds are not enough
At least f+1 rounds are needed
Consensus in synchronous systems
Up to f faulty processes
Duration of a round:
the maximum delay of a B-multicast
Dolev & Strong, 1983:
Any algorithm to reach consensus despite
up to f failures requires at least f+1 rounds.
Byzantine agreement: synchronous
[Figure: two three-process scenarios, each with one faulty process.
Left – lieutenant p3 is faulty: the commander p1 sends 1:v to p2 and p3; p2 relays 2:1:v to p3, but the faulty p3 relays 3:1:u to p2 (“3 says 1 says ‘u’”).
Right – the commander p1 is faulty: it sends 1:w to p2 and 1:x to p3; p2 relays 2:1:w and p3 relays 3:1:x]
Nothing can be done to improve a correct
process’ knowledge beyond the first stage:
- It cannot tell which process is faulty.
Lamport et al., 1982:
No solution for N = 3, f = 1
(assuming private comm. channels)
Pease et al., 1982:
No solution for N ≤ 3f
Byzantine agreement for N > 3*f
Example with N=4, f=1:
- 1st round: Commander sends a value to each lieutenant
- 2nd round: Each of the lieutenants sends the value it has
received to each of its peers.
- A lieutenant receives a total of (N – 2) + 1 values, of
which (N – 2) are correct.
- By majority(), the correct lieutenants compute the same value.
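A sketch of a single correct lieutenant's computation in this N = 4, f = 1 scheme (function and value names are my own): in round 1 it receives the commander's value, in round 2 it receives what each peer claims to have received, and it decides on the majority of the three values, as in the four-generals figures below.

    from collections import Counter

    def majority(values):
        # Majority value of the received multiset, or None (i.e. ⊥) if there is none
        value, count = Counter(values).most_common(1)[0]
        return value if count > len(values) / 2 else None

    def lieutenant_decision(from_commander, relayed):
        # from_commander: value received directly in round 1
        # relayed: {peer: value the peer claims the commander sent}, received in round 2
        return majority([from_commander] + list(relayed.values()))

    # Faulty lieutenant p3:
    print(lieutenant_decision("v", {"p3": "u", "p4": "v"}))   # p2 decides v
    print(lieutenant_decision("v", {"p2": "v", "p3": "w"}))   # p4 decides v
    # Faulty commander sending three different values:
    print(lieutenant_decision("u", {"p3": "v", "p4": "w"}))   # no majority -> None (⊥)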
[Figure: two four-process scenarios, N = 4, f = 1.
Left – lieutenant p3 is faulty: the commander p1 sends 1:v to p2, p3 and p4; p2 and p4 relay 2:1:v and 4:1:v, while the faulty p3 relays wrong values such as 3:1:u and 3:1:w.
Right – the commander p1 is faulty: it sends three different values (e.g. 1:v, 1:u, 1:w) to p2, p3 and p4; the correct lieutenants relay what they received]
In general, the unsigned algorithm requires O(N^(f+1)) messages
O(N^2) messages suffice for signed messages
Four Byzantine Generals: N = 4, f = 1 in a Synchronous DS
[Figure: Left – lieutenant p3 is faulty: the commander p1 sends 1:v to p2, p3 and p4; p2 and p4 relay 2:1:v and 4:1:v, while the faulty p3 relays 3:1:u and 3:1:w.
Right – the commander p1 is faulty: it sends three different values to p2, p3 and p4; the correct lieutenants relay what they received]
Faulty lieutenant p3:
p2 decides on majority(v,u,v) = v
p4 decides on majority(v,v,w) = v
Faulty commander p1:
p2, p3, p4 decide on
majority(u,v,w) = ⊥ (no majority)
Asynchronous system
 Solutions to the consensus and BG problems (and to IC)
exist in synchronous systems
 No algorithm can guarantee to reach consensus in
an asynchronous system, even with one process
crash failure
 In an asynchronous system, processes can respond
to messages at arbitrary times – so a crashed
process is indistinguishable from a slow one
 There is always some continuation of the processes’
execution that avoids consensus being reached
Impossibility of (deterministic) consensus in
asynchronous systems
M.J. Fischer, N. Lynch, and M. Paterson: “Impossibility of
distributed consensus with one faulty process”, J. ACM,
32(2), pp. 374-382, 1985.
A crashed process cannot be distinguished from a slow one.
- Not even with a 100% reliable comm. network!
There is always a chance that some continuation of the
processes’ execution avoids consensus being reached.
Contd
 Note the word “guarantee” in the statement of the
impossibility result
 The result does not mean that processes can never
reach consensus in an asynchronous system if one
is faulty – it allows that consensus can be reached
with some probability greater than zero
 For example, despite the fact that our systems are
often effectively asynchronous, transaction systems
have been reaching consensus regularly for many
years