consensus

Distributed Algorithms (22903)
The wait-free hierarchy and the universality of consensus
Lecturer: Danny Hendler
This presentation is based on the book “Distributed Computing”
by Hagit Attiya & Jennifer Welch
The Consensus object: each process has a private input
[Figure: processes with private inputs 32, 19 and 21. © 2003 Herlihy and Shavit]
They communicate
They agree on some process’ input
[Figure: every process decides 19.]
Formally: the Consensus Object
-Supports a single operation: decide
-Each process pi calls decide with some input vi
from some domain. decide returns a value from
the same domain.
-The following requirements must be met:
- Agreement: In any execution E, all decide
operations must return the same value.
- Validity: The values returned by the
operations must equal one of the inputs.
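The two requirements can be checked mechanically on a finished execution. The following is a minimal sketch; the function name check_consensus and the list-based representation of inputs and outputs are assumptions made for illustration, not part of the lecture.

```python
def check_consensus(inputs, outputs):
    """Return True iff the decide results satisfy agreement and validity.

    inputs  -- the values the processes passed to decide
    outputs -- the values the decide operations returned
    """
    # Agreement: all decide operations returned the same value.
    agreement = len(set(outputs)) <= 1
    # Validity: every returned value is one of the inputs.
    validity = all(out in inputs for out in outputs)
    return agreement and validity
```

For the inputs 32, 19 and 21, an execution in which every process returns 19 passes the check. One in which the processes return different values violates agreement, and one in which they all return a value nobody proposed violates validity.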
Wait-free consensus can be
solved easily by compare&swap
Compare&swap(b,old,new)
  atomically
    v := read from b
    if (v = old) {
      b := new
      return success
    }
    else
      return failure
Motorola 680x0, IBM 370, Sun SPARC, 80X86, MIPS, PowerPC, DEC Alpha
How?
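One possible answer, sketched under assumptions: every process tries to swap its input into a shared variable that initially holds null, and then returns whatever the variable holds. Only the first compare&swap can succeed, so all processes return the same input. Python has no hardware compare&swap, so the CASRegister class below simulates atomicity with a lock; that class, CASConsensus and run_processes are hypothetical names, and None plays the role of null (so None must not be used as an input).

```python
import threading


class CASRegister:
    """A shared variable with compare&swap (atomicity simulated by a lock)."""

    def __init__(self, initial=None):
        self._value = initial
        self._lock = threading.Lock()

    def read(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, old, new):
        with self._lock:
            if self._value == old:
                self._value = new
                return True   # success
            return False      # failure


class CASConsensus:
    """Wait-free consensus for any number of processes."""

    def __init__(self):
        self._decision = CASRegister(None)

    def decide(self, v):
        # Only the first compare&swap finds null and installs its value.
        self._decision.compare_and_swap(None, v)
        # Everybody returns the installed value.
        return self._decision.read()


def run_processes(inputs):
    """Run one thread per input and collect what each decide returns."""
    obj = CASConsensus()
    results = [None] * len(inputs)

    def worker(i):
        results[i] = obj.decide(inputs[i])

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(inputs))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each decide performs a constant number of steps regardless of what other processes do, which is what makes the sketch wait-free (given an atomic compare&swap).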
Would this consensus algorithm
from reads/writes work?
Initially decision=null
Decide(v) ; code for pi, i=0,1
  if (decision = null)
    decision := v
    return v
  else
    return decision
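To see why the answer is no, consider an interleaving in which both processes test decision before either one writes it. The sketch below replays that schedule deterministically in a single thread; the Register class and the function bad_interleaving are hypothetical helpers introduced only for this illustration.

```python
class Register:
    """A plain read/write register; None stands for null."""

    def __init__(self):
        self.value = None


def bad_interleaving(v0, v1):
    """Replay the schedule: p0 tests, p1 tests, p0 finishes, p1 finishes."""
    decision = Register()

    # Both processes evaluate "decision = null" before anyone writes.
    p0_saw_null = decision.value is None
    p1_saw_null = decision.value is None

    # p0 continues from the branch it chose.
    if p0_saw_null:
        decision.value = v0
        ret0 = v0
    else:
        ret0 = decision.value

    # p1 continues from the branch it chose (it already saw null).
    if p1_saw_null:
        decision.value = v1
        ret1 = v1
    else:
        ret1 = decision.value

    return ret0, ret1
```

With inputs 0 and 1 this schedule makes p0 return 0 and p1 return 1, so agreement is violated. The test and the write are two separate steps, and nothing prevents the other process from running between them.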
A proof that wait-free
consensus for 2 or more
processes cannot be solved by
registers.
A FIFO queue
Supports 2 operations:
• q.enqueue(x) – returns ack
• q.dequeue – returns the first
item in the queue or empty if
the queue is empty.
FIFO queue + registers can
implement 2-process consensus
Initially Q=<0> and Prefer[i]=null, i=0,1
Decide(v) ; code for pi, i=0,1
1. Prefer[i]:=v
2. qval=Q.deq()
3. if (qval = 0) then return v
4. else return Prefer[1-i]
There is no wait-free implementation of a
FIFO queue shared by 2 or more
processes from registers
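The queue-based Decide procedure above can be sketched as follows. This is an illustration under assumptions: the class name QueueConsensus2 is hypothetical, the shared FIFO queue is modelled by collections.deque guarded by a lock, and the string "empty" stands for the empty response.

```python
import threading
from collections import deque


class QueueConsensus2:
    """2-process consensus from one FIFO queue and two registers."""

    def __init__(self):
        self._q = deque([0])             # initially Q = <0>
        self._q_lock = threading.Lock()  # stands in for the atomic queue
        self._prefer = [None, None]      # Prefer[i] = null

    def _deq(self):
        with self._q_lock:
            return self._q.popleft() if self._q else "empty"

    def decide(self, i, v):
        """Code for process i (0 or 1) with input v."""
        self._prefer[i] = v        # announce my input first
        qval = self._deq()
        if qval == 0:              # I dequeued the token: I win
            return v
        # The other process won, and it wrote Prefer before dequeuing.
        return self._prefer[1 - i]
```

The loser can safely read Prefer[1-i] because the winner wrote its preference before it dequeued the 0, and the loser dequeues only after the winner did.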
A proof that wait-free
consensus for 3 or more
processes cannot be solved by
FIFO queue (+ registers)
The wait-free hierarchy
We say that object type X solves wait-free n-process
consensus if there exists a wait-free consensus
algorithm for n processes using only shared objects of
type X and registers.
The consensus number of object type X is n, denoted
CN(X)=n, if n is the largest integer for which X solves
wait-free n-process consensus. It is defined to be
infinity if X solves consensus for every n.
Lemma: If CN(X)=m and CN(Y)=n>m, then there is no
wait-free implementation of Y from instances of X and
registers in a system with more than m processes.
The wait-free hierarchy (cont’d)
Object type                        Consensus number
registers                          1
FIFO queue, stack, test-and-set    2
…                                  …
Compare-and-swap                   ∞
The universality of consensus
An object is universal if, together with registers, it
can implement any other object in a wait-free
manner.
We will show that any object X with
consensus number n is universal in a system
with n or fewer processes.
An algorithm is lock-free if it guarantees that some
operation terminates after some finite total number
of steps performed by processes.
The lock-freedom progress property is
weaker than wait-freedom.
Universal constructions
Given the sequential specification of any object,
implement a linearizable wait-free concurrent version
of it:
• A lock-free construction using CAS
• A lock-free construction using consensus
• A wait-free construction using consensus
• A bounded-memory wait-free construction using
consensus
A lock-free universal
algorithm using CAS
Each operation is represented by a shared record of type opr.
typedef opr structure {
  inv        ;the operation invocation, including its parameters
  new-state  ;the new state of the object, after applying the operation
  response   ;the response of the operation
}
[Figure: Head points to the most recent of a sequence of opr records, each with fields inv, new-state, response.]
A lock-free universal algorithm
using CAS (cont’d)
[Figure: Head points to the most recent opr record (inv, new-state, response); the first record is the anchor, with new-state=init.]
Initially Head points to the anchor record. Head.new-state is initialized with
the implemented object’s initial state.
When inv occurs
  point := new opr, point.inv := inv
  repeat
    h := Head
    <point.new-state, point.response> := apply(inv, h.new-state)
  until compare&swap(Head, h, point) = success
  return point.response
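The CAS-based construction can be sketched in Python as follows. This is an illustration under assumptions: the names Opr, CASUniversal and counter_apply are hypothetical, the compare&swap on Head is simulated with a lock, and apply is assumed to be a pure function that returns a fresh state instead of mutating the old one.

```python
import threading


class Opr:
    """Record describing one operation."""

    def __init__(self, inv, new_state=None, response=None):
        self.inv = inv
        self.new_state = new_state
        self.response = response


class CASUniversal:
    """Lock-free universal construction driven by CAS on Head."""

    def __init__(self, init_state, apply):
        # apply(inv, state) -> (new_state, response): the sequential spec.
        self._apply = apply
        self._head = Opr(None, new_state=init_state)  # the anchor record
        self._cas_lock = threading.Lock()  # only simulates atomic CAS

    def _cas_head(self, old, new):
        with self._cas_lock:
            if self._head is old:
                self._head = new
                return True
            return False

    def invoke(self, inv):
        point = Opr(inv)
        while True:
            h = self._head
            # Compute the result against the state we observed.
            point.new_state, point.response = self._apply(inv, h.new_state)
            # Install it only if Head did not change in the meantime.
            if self._cas_head(h, point):
                return point.response


def counter_apply(inv, state):
    """Sequential spec of a fetch-and-increment counter (example object)."""
    return state + 1, state
```

A failed CAS means some other operation succeeded, so the system as a whole makes progress (lock-freedom), but an individual process may retry forever.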
A lock-free universal
algorithm using consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ;the operation’s sequential number (register)
  inv        ;the operation invocation, including its parameters (register)
  new-state  ;the new state of the object, after applying the operation (register)
  response   ;the response of the operation, including its return value (register)
  after      ;a pointer to the next record (consensus object)
}
[Figure: Head points into a list of opr records linked through their after fields; the first record is the anchor, with seq=1, inv=null, new-state=init, response=null.]
A lock-free universal algorithm using
consensus (cont’d)
Initially all Head entries point to the anchor record.
When inv occurs ; code for pi
  point := new opr, point.inv := inv
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  repeat
    win := decide(Head[i].after, point) ; try to thread your operation
    win.seq := Head[i].seq+1
    <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
    Head[i] := win ; point to the following record
  until win = point
  return point.response
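The consensus-based lock-free construction can be sketched as follows. This is an illustration under assumptions: the Consensus class simulates a one-shot consensus object with a lock, the names Opr, LockFreeUniversal and counter_apply are hypothetical, and apply is assumed to be deterministic and side-effect free, since several processes may compute the same record's fields.

```python
import threading


class Consensus:
    """One-shot consensus object (atomicity simulated by a lock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, v
            return self._value


class Opr:
    def __init__(self, inv=None):
        self.seq = 0
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()  # decides which record comes next


class LockFreeUniversal:
    def __init__(self, n, init_state, apply):
        anchor = Opr()
        anchor.seq, anchor.new_state = 1, init_state
        self.n = n
        self._apply = apply        # apply(inv, state) -> (new_state, response)
        self.head = [anchor] * n   # Head[0..n-1]

    def invoke(self, i, inv):
        """Code for process i."""
        point = Opr(inv)
        # Find a record with the maximum sequence number.
        for j in range(self.n):
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while True:
            # Try to thread my record after the one I believe is last.
            win = self.head[i].after.decide(point)
            win.seq = self.head[i].seq + 1
            win.new_state, win.response = self._apply(
                win.inv, self.head[i].new_state)
            self.head[i] = win     # move to the following record
            if win is point:
                return point.response


def counter_apply(inv, state):
    """Sequential spec of a fetch-and-increment counter (example object)."""
    return state + 1, state
```

A process that loses a decide call still fills in the winner's fields and advances, so it always catches up with the list, but nothing bounds how many times it can lose.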
A wait-free universal algorithm using
consensus
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ;the operation’s sequential number (register)
  inv        ;the operation invocation, including its parameters (register)
  new-state  ;the new state of the object, after applying the operation (register)
  response   ;the response of the operation, including its return value (register)
  after      ;a pointer to the next record (consensus object)
}
We add a helping mechanism
[Figure: Announce points to an opr record (seq, inv, new-state, response, after).]
When performing the operation with sequence number j, try
to help process (j mod n)
A wait-free universal algorithm using
consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs ; code for pi
  Announce[i] := new opr, Announce[i].inv := inv, Announce[i].seq := 0
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  while Announce[i].seq=0 do
    priority := (Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq=0 ; if help is needed
      then point := Announce[priority] ; help the other process
      else point := Announce[i] ; perform own operation
    win := decide(Head[i].after, point)
    <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
    win.seq := Head[i].seq+1
    Head[i] := win
  return Announce[i].response
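The helping mechanism can be sketched as follows. This is an illustration under assumptions, not a definitive implementation: the Consensus class simulates a one-shot consensus object with a lock, the names Opr, WaitFreeUniversal and counter_apply are hypothetical, apply is assumed to be deterministic and side-effect free, and the sketch reads Announce[priority] once into a local variable.

```python
import threading


class Consensus:
    """One-shot consensus object (atomicity simulated by a lock)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False
        self._value = None

    def decide(self, v):
        with self._lock:
            if not self._decided:
                self._decided, self._value = True, v
            return self._value


class Opr:
    def __init__(self, inv=None):
        self.seq = 0              # 0 means "not threaded yet"
        self.inv = inv
        self.new_state = None
        self.response = None
        self.after = Consensus()


class WaitFreeUniversal:
    def __init__(self, n, init_state, apply):
        anchor = Opr()
        anchor.seq, anchor.new_state = 1, init_state
        self.n = n
        self._apply = apply           # apply(inv, state) -> (new_state, response)
        self.head = [anchor] * n      # Head[0..n-1]
        self.announce = [anchor] * n  # Announce[0..n-1]

    def invoke(self, i, inv):
        """Code for process i."""
        mine = Opr(inv)
        self.announce[i] = mine
        # Find a record with the maximum sequence number.
        for j in range(self.n):
            if self.head[j].seq > self.head[i].seq:
                self.head[i] = self.head[j]
        while mine.seq == 0:
            # ID of the process that has priority for the next position.
            priority = (self.head[i].seq + 1) % self.n
            candidate = self.announce[priority]
            # Help the other process if it needs help, else do my own op.
            point = candidate if candidate.seq == 0 else mine
            win = self.head[i].after.decide(point)
            win.new_state, win.response = self._apply(
                win.inv, self.head[i].new_state)
            win.seq = self.head[i].seq + 1   # written after the response
            self.head[i] = win
        return mine.response


def counter_apply(inv, state):
    """Sequential spec of a fetch-and-increment counter (example object)."""
    return state + 1, state
```

Because position j in the list gives priority to process (j mod n), a pending operation is proposed by every process that threads a position with its priority, so it cannot be overtaken indefinitely.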
A proof that the universal algorithm
using consensus is wait-free
A bounded-memory wait-free universal
algorithm using consensus
What is the number of records needed
by the algorithm?
Unbounded!
The following algorithm uses a bounded # of records
• Each process allocates records from its private pool
• A record is recycled once we’re sure it will not be
referenced anymore
• We don’t need this mechanism if we use a language
with a GC (such as Java)
A bounded-memory wait-free universal
algorithm using consensus (cont’d)
When can we recycle record #k?
No process trying to thread record (k+n+1) or higher
will write record k.
After all the processes that thread records
k…k+n terminate, record k can be freed.
When process p finishes threading record m, it releases
records m-1…m-n. After record k has been released by the
operations threading records k+1…k+n, it can be recycled.
A bounded-memory wait-free universal algorithm
using consensus: data structures
Each operation is represented by a shared record of type opr.
typedef opr structure {
  seq        ;the operation’s sequential number (register)
  inv        ;the operation invocation, including its parameters (register)
  new-state  ;the new state of the object, after applying the operation (register)
  response   ;the response of the operation, including its return value (register)
  after      ;a pointer to the next record (consensus object)
  before     ;a pointer to the previous record
  released[1..n] ;initially true
}
[Figure: Head points into a list of opr records (seq, inv, new-state, response, before, after) linked in both directions; the first record is the anchor.]
A bounded-memory wait-free universal algorithm
using consensus (cont’d)
Initially all Head and Announce entries point to the anchor record.
When inv occurs ; code for pi
  point := a free record from private pool, point.inv := inv, point.seq := 0
  for r:=1 to n do point.released[r] := false
  Announce[i] := point
  for j=0 to n-1 ; find a record with the maximum sequence number
    if Head[j].seq > Head[i].seq then Head[i] := Head[j]
  while Announce[i].seq=0 do
    priority := (Head[i].seq+1) mod n ; ID of process with priority
    if Announce[priority].seq=0 ; if help is needed
      then point := Announce[priority] ; help the other process
      else point := Announce[i] ; perform own operation
    win := decide(Head[i].after, point)
    <win.new-state, win.response> := apply(win.inv, Head[i].new-state)
    win.before := Head[i]
    win.seq := Head[i].seq+1
    Head[i] := win
  temp := Announce[i].before
  for r:=1 to n do
    if temp <> anchor then
      before-temp := temp.before, temp.released[r] := true, temp := before-temp
  return Announce[i].response
How many records are
required by the algorithm?
Each incomplete operation may waste n distinct records
There may be up to n incomplete operations
At any point in time, up to n^2 non-recyclable records
All non-recyclable records may belong to the same process!
Each pool should have O(n^2) records, O(n^3) total
records needed
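The counting argument above can be written out as a short calculation. This is a sketch that follows the slide's reasoning and ignores constant factors:

```latex
\begin{align*}
\text{non-recyclable records at any time}
  &\le \underbrace{n}_{\text{incomplete operations}}
       \cdot \underbrace{n}_{\text{records held by each}} = n^2 \\
\text{records per private pool} &= O(n^2)
  \quad \text{(all of them may come from one pool)} \\
\text{total records} &= n \cdot O(n^2) = O(n^3)
\end{align*}
```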