LEADER ELECTION
CS 271
Election Algorithms
• Many distributed algorithms need one process to
act as coordinator
– Doesn’t matter which process does the job, just need
to pick one
• Election algorithms: techniques to pick a unique
coordinator (aka leader election)
• Types of election algorithms: Bully and Ring
algorithms
Bully Algorithm
• Each process has a unique numerical ID
• Each process knows the IDs and addresses of all other
processes
• Communication is assumed reliable
• Key Idea: select process with highest ID
• Process initiates election if it just recovered from
failure or if coordinator failed
• 3 message types: election, OK, I won
• Processes can initiate elections simultaneously
– Need consistent result
Bully Algorithm Details
• Any process P can initiate an election
• P sends Election messages to all processes with
higher IDs and awaits OK messages
• If no OK messages arrive, P becomes coordinator &
sends I won to all processes with lower IDs
• If P receives an OK, it drops out & waits for I won
• If a process receives an Election msg, it returns OK
and starts an election
• If a process receives I won then sender is
coordinator
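The steps above can be sketched as a single-threaded simulation. This is only an illustration under stated assumptions: the function name `bully_election`, the `ids` and `alive` parameters, and the message log are hypothetical, and real processes would exchange these messages over a network and detect missing OK replies with timeouts. The example assumes processes 0 to 7, with process 7 (the old coordinator) crashed.

```python
from collections import deque

def bully_election(ids, alive, initiator):
    """Simulate bully elections; return (coordinator, message log).

    ids:       all process IDs, including crashed ones (assumed known to all)
    alive:     IDs of processes that are currently up
    initiator: live process that notices the coordinator has failed
    """
    alive = set(alive)
    log = []                       # entries are (sender, receiver, message type)
    queue = deque([initiator])     # processes that still have to hold an election
    started = set()
    coordinator = None
    while queue:
        p = queue.popleft()
        if p in started:
            continue               # in this sketch each process holds one election
        started.add(p)
        got_ok = False
        for q in sorted(i for i in ids if i > p):
            log.append((p, q, "Election"))
            if q in alive:         # a crashed process never answers
                log.append((q, p, "OK"))
                got_ok = True
                queue.append(q)    # q returns OK and starts its own election
        if not got_ok:
            # Nobody with a higher ID answered, so p becomes coordinator.
            coordinator = p
            # Simplification: only live lower-ID processes are notified.
            for q in sorted(i for i in ids if i < p and i in alive):
                log.append((p, q, "I won"))
    return coordinator, log

coordinator, log = bully_election(range(8), {0, 1, 2, 3, 4, 5, 6}, initiator=4)
print(coordinator)
```

Because only a process that hears no OK declares itself coordinator, the winner is always the live process with the highest ID, regardless of who initiates.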
Bully Algorithm Example
a) Process 4 holds an election
b) Processes 5 and 6 respond, telling 4 to stop
c) Now 5 and 6 each hold an election
d) Process 6 tells 5 to stop
e) Process 6 wins and tells everyone
Simple Ring-based Election
• Processes have unique IDs and are arranged in a logical ring
• Each process knows its neighbors
• Select process with highest ID as leader
• Begin election if just recovered or coordinator has failed
• Send Election to closest downstream node that is alive
– Sequentially poll each successor until a live node is found
• Each process tags its ID on the message
• Initiator picks node with highest ID and sends a coordinator
message
• Multiple elections can be in progress—no harm.
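A minimal sketch of one ring election, assuming the ring is given as a Python list in ring order and that crashed nodes are simply skipped. The names `ring_election` and `next_alive` and the message counter are hypothetical, and the coordinator message is assumed to make one full trip around the live ring.

```python
def ring_election(ring, alive, initiator):
    """Simulate a ring election; return (leader, number of messages sent)."""
    alive = set(alive)
    n = len(ring)
    start = ring.index(initiator)

    def next_alive(pos):
        # Sequentially poll each successor until a live node is found.
        for step in range(1, n + 1):
            cand = (pos + step) % n
            if ring[cand] in alive:
                return cand
        raise RuntimeError("no live process in the ring")

    # Phase 1: the Election message travels once around the ring,
    # and every live process tags its ID onto it.
    collected = [initiator]
    messages = 1                   # initiator sends to its first live successor
    pos = next_alive(start)
    while pos != start:
        collected.append(ring[pos])
        pos = next_alive(pos)
        messages += 1

    # The initiator picks the highest ID it collected.
    leader = max(collected)

    # Phase 2: the coordinator message makes the same trip around the ring.
    messages += len(collected)
    return leader, messages

leader, messages = ring_election(list(range(8)), set(range(7)), initiator=4)
print(leader, messages)
```

With n = 8 processes and the old coordinator (7) down, the sketch sends 7 Election hops plus 7 coordinator hops, that is 2(n-1) messages.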
Ring Algorithm Example
[Figure: ring algorithm example]
Comparison
• Assume n processes and one election in
progress
• Bully algorithm
– Worst case: initiator is node with lowest ID
• Triggers n-2 elections at higher ranked nodes: O(n^2)
msgs
– Best case: immediate election: n-2 messages
• Ring
– 2 (n-1) messages always
Highlights of Leader Election
• Basic idea: each process has a unique
process-id.
• Once the leader is discovered to have died, elect the
process with the highest (or lowest) process-id.
BROADCAST PROTOCOLS
Broadcast Protocols
• Why Broadcast protocols?
– Data replication
– Highly available servers
– Cluster management
– Distributed logging
– ……
• Sometimes a message is received but delivered
later to satisfy ordering requirements.
Ordering properties: FIFO (Cornell)
• FIFO or sender-ordered multicast: fbcast
Messages are delivered in the order they
were sent (by any single sender)
[Figure: timelines of processes p, q, r, s with messages a and e]
Ordering properties: FIFO
[Figure: timelines of processes p, q, r, s with messages a, b, c, d, e]
delivery of c to p is delayed until after b is delivered
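One common way to get this behavior is a per-sender sequence number plus a hold-back buffer at the receiver. The sketch below assumes that scheme; the class `FifoReceiver`, its fields, and the sender name "x" are hypothetical and not taken from the slides.

```python
from collections import defaultdict

class FifoReceiver:
    """Delivers each sender's messages in the order they were sent."""

    def __init__(self):
        self.next_seq = defaultdict(int)   # next expected sequence number per sender
        self.held = defaultdict(dict)      # sender -> {sequence number: payload}
        self.delivered = []                # payloads in delivery order

    def receive(self, sender, seq, payload):
        # Receiving only buffers the message; delivery may happen later.
        self.held[sender][seq] = payload
        # Deliver as many consecutive messages from this sender as possible.
        while self.next_seq[sender] in self.held[sender]:
            s = self.next_seq[sender]
            self.delivered.append(self.held[sender].pop(s))
            self.next_seq[sender] += 1

p = FifoReceiver()
p.receive("x", 1, "c")      # c arrives first and is held back
held_back = list(p.delivered)
p.receive("x", 0, "b")      # b arrives, so b and then c are delivered
print(held_back, p.delivered)
```

Messages from different senders are not ordered relative to each other, which is exactly the gap that causal ordering closes.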
Limitations of FIFO Broadcast
Scenario:
• User A broadcasts a message to a mailing list
• B delivers that message
• B broadcasts reply
• C delivers B's response without A's original
message and misinterprets it
Ordering properties: Causal
• Causal or happens-before ordering: cbcast
If send(a) → send(b), then deliver(a) occurs
before deliver(b) at common destinations
[Figure: timelines of processes p, q, r, s with messages a and b]
Ordering properties: Causal
[Figure: timelines of processes p, q, r, s with messages a, b, c]
delivery of c to p is delayed until after b is delivered
Ordering properties: Causal
[Figure: timelines of processes p, q, r, s with messages a, b, c, e]
delivery of c to p is delayed until after b is delivered
e is sent (causally) after b
Ordering properties: Causal
[Figure: timelines of processes p, q, r, s with messages a, b, c, d, e]
delivery of c to p is delayed until after b is delivered
delivery of e to r is delayed until after b and c are delivered
Limitation of Causal Broadcast
Causal broadcast does not impose any order on
unrelated messages.
Two replicas can deliver operations/requests in
different orders.
Ordering properties: Total
• Total or locally total multicast: atomic bcast
Messages are delivered in same order to all
recipients (including the sender)
[Figure: timelines of processes p, q, r, s with messages a, b, c, d, e]
all deliver a, b, c, d, then e
Simple Causal broadcast protocol
• Each broadcast message carries all causally
preceding messages
• Before delivery, ensure causality by delivering
any missed causally preceding messages.
Isis Causal Broadcast
• Each process maintains a time vector of size n.
• Initially VT[i] = 0.
• When p sends a new message m: VT[p]++
• Each message is piggybacked with VTm, which
is the current VT of the sender.
• When p delivers a message, p updates its
vector: for k in 1..n:
– VTp[k] = max{ VTp[k], VTm[k] }.
Isis Causal Order
• Requirement for delivery at node j:
– VTsender[sender] = VTreceiver[sender]+1
• This is the next message from sender
– VTsender[k] ≤ VTreceiver[k] for all k not sender
• Receiver has received all causally preceding messages
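The delivery rule can be sketched with a small in-memory model. The class `CausalProcess`, its methods, and the process indices 0 to 2 are hypothetical; as a simplification, a sender does not deliver its own message to itself here. The scenario assumes A broadcasts a, B delivers a and replies with b, and C receives b before a.

```python
class CausalProcess:
    """Vector-timestamp causal delivery, following the two conditions above."""

    def __init__(self, pid, n):
        self.pid = pid
        self.vt = [0] * n        # VT[i] = 0 initially
        self.held = []           # received but not yet deliverable messages
        self.delivered = []

    def send(self, payload):
        self.vt[self.pid] += 1                       # VT[p]++
        return (self.pid, list(self.vt), payload)    # piggyback a copy of VT

    def _deliverable(self, sender, vt_m):
        is_next = vt_m[sender] == self.vt[sender] + 1
        seen_rest = all(vt_m[k] <= self.vt[k]
                        for k in range(len(self.vt)) if k != sender)
        return is_next and seen_rest

    def receive(self, msg):
        self.held.append(msg)
        progress = True
        while progress:          # one delivery may unblock held messages
            progress = False
            for m in list(self.held):
                sender, vt_m, payload = m
                if self._deliverable(sender, vt_m):
                    self.held.remove(m)
                    self.delivered.append(payload)
                    # VTp[k] = max(VTp[k], VTm[k]) for every k
                    self.vt = [max(a, b) for a, b in zip(self.vt, vt_m)]
                    progress = True

A, B, C = (CausalProcess(i, 3) for i in range(3))
m_a = A.send("a")
B.receive(m_a)
m_b = B.send("b")        # causally after a
C.receive(m_b)           # held: C has not yet delivered a
C.receive(m_a)           # delivers a, then the held b
print(C.delivered)
```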
[Figure: sender with vector VTsender and receiver with vector VTreceiver]
Total order
• Different classes of total order broadcast:
– Fixed sequencer
– Moving sequencer using Token
– Distributed agreement using Timestamp
Using Sequencer (Amoeba)
• Delivery algorithm similar to FIFO except for
using a special “sequencer” to order messages
• Sender attaches unique id i to each message
m and sends <m,i> to the sequencer as well as
to all destinations
• Sequencer maintains sequence number S
(consecutive and increasing) and broadcasts
<i, S> to all destinations.
• Message(k) is delivered
– if all messages(j) (0 ≤ j < k) are received
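A minimal sketch of the fixed-sequencer scheme, assuming sequence numbers start at 0 and that each destination buffers both the data message <m, i> and the order message <i, S>. The classes `Sequencer` and `SequencedReceiver` and the message ids "x1" and "y1" are hypothetical.

```python
class Sequencer:
    """Hands out consecutive, increasing sequence numbers."""

    def __init__(self):
        self.s = 0

    def assign(self, msg_id):
        seq = self.s
        self.s += 1
        return (msg_id, seq)       # the <i, S> pair broadcast to all destinations

class SequencedReceiver:
    def __init__(self):
        self.payloads = {}         # message id -> payload, from <m, i>
        self.order = {}            # sequence number -> message id, from <i, S>
        self.next_seq = 0          # next sequence number to deliver
        self.delivered = []

    def on_message(self, msg_id, payload):
        self.payloads[msg_id] = payload
        self._try_deliver()

    def on_order(self, msg_id, seq):
        self.order[seq] = msg_id
        self._try_deliver()

    def _try_deliver(self):
        # Deliver message k only when its order and payload are both known
        # and every message j < k has already been delivered.
        while (self.next_seq in self.order
               and self.order[self.next_seq] in self.payloads):
            msg_id = self.order.pop(self.next_seq)
            self.delivered.append(self.payloads.pop(msg_id))
            self.next_seq += 1

seqr = Sequencer()
first = seqr.assign("x1")
second = seqr.assign("y1")

r = SequencedReceiver()
r.on_message("y1", "second")   # data arrives out of order
r.on_order(*second)
r.on_order(*first)
r.on_message("x1", "first")    # now both can be delivered in sequence order
print(r.delivered)
```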
Distributed Total Order Protocol (ISIS)
• Processes collectively agree on sequence
numbers (priority) in three rounds
• Sender sends message <m, id> to all receivers;
• Receivers suggest priority (sequence number)
and reply to sender with proposed priority;
• Sender collects all proposed priorities; decides
on final priority (breaking ties with process
ids), and resends the agreed final priority for
message m
• Receivers deliver message m according to
decided final priority
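The three rounds can be sketched as follows. This is an illustration under assumptions the slide does not spell out: each receiver proposes one more than the largest priority it has seen, the sender picks the maximum proposal as the final priority, and priorities are (number, process id) pairs so that ties break on process id. The names `IsisReceiver` and `final_priority` are hypothetical.

```python
class IsisReceiver:
    def __init__(self, pid):
        self.pid = pid
        self.counter = 0     # largest priority number seen so far
        self.queue = {}      # msg id -> [number, pid, is_final]
        self.delivered = []

    def propose(self, msg_id):
        # Round 2: suggest a priority and hold the message as undeliverable.
        self.counter += 1
        self.queue[msg_id] = [self.counter, self.pid, False]
        return (self.counter, self.pid)

    def finalize(self, msg_id, priority):
        # Round 3: record the agreed priority, then deliver from the head
        # of the queue while the lowest-priority entry is final.
        number, pid = priority
        self.counter = max(self.counter, number)
        self.queue[msg_id] = [number, pid, True]
        while self.queue:
            head = min(self.queue, key=lambda m: self.queue[m][:2])
            if not self.queue[head][2]:
                break        # an earlier message is still waiting for agreement
            del self.queue[head]
            self.delivered.append(head)

def final_priority(proposals):
    # Assumed rule: the sender picks the maximum proposed (number, pid) pair.
    return max(proposals)

r1, r2 = IsisReceiver(1), IsisReceiver(2)
# Two concurrent messages reach the receivers in different orders.
p_m1 = [r1.propose("m1")]
p_m2 = [r1.propose("m2")]
p_m2.append(r2.propose("m2"))
p_m1.append(r2.propose("m1"))
f_m1, f_m2 = final_priority(p_m1), final_priority(p_m2)
for r in (r1, r2):
    r.finalize("m1", f_m1)
    r.finalize("m2", f_m2)
print(r1.delivered, r2.delivered)
```

Because a final priority is never lower than any proposal for the same message, a message still awaiting agreement can only move later in the queue, so holding delivery behind it is safe.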
ISIS algorithm for total ordering
[Figure: group g: P1, P2, P3, P4; numbered arrows for steps 1 (Message), 2, and 3 (Agreed Seq)]