Lecture 14
Synchronization (cont.)

Logistics

• Project
  – P01 deadline on Wednesday, November 3rd.
  – Non-blocking IO lecture: November 4th.
  – P02 deadline on Wednesday, November 15th.
• Next quiz
  – Tuesday, November 16th.

Roadmap

• Clocks cannot be perfectly synchronized. What can I do in these conditions?
  – Figure out how large the drift is
  – Design the system to take drift into account
    • Example: GPS systems
    • Example: server design to provide at-most-once semantics
  – Do not use physical clocks! Consider only event order:
    (1) Logical clocks (Lamport); but these do not account for causality!
    (2) Vector clocks
• Mutual exclusion; leader election

Last time: “Happens-before” relation

The happened-before relation on the set of events in a distributed system:
• if a and b are in the same process, and a occurs before b, then a → b
• if a is the event of sending a message by one process, and b is the event of receiving the same message at another process, then a → b
• if a → b and b → c, then a → c (transitivity)
Two events are concurrent if nothing can be said about the order in which they happened (the relation is a partial order).

Lamport’s logical clocks
[Figure: three processes p1, p2, p3 on a physical-time axis. On p1: events a (clock 1) and b (2); message m1 goes from b to p2. On p2: events c (3) and d (4); message m2 goes from d to p3. On p3: events e (1) and f (5).]

Each process Pi maintains a local counter Ci and adjusts it according to the following rules (see the sketch below):
1. For any two successive events that take place within process Pi, the counter Ci is incremented by 1.
2. Each time a message m is sent by process Pi, the message receives the timestamp ts(m) = Ci.
3. Whenever a message m is received by a process Pj, Pj adjusts its local counter Cj to max{Cj, ts(m)}; it then executes step 1 before passing m to the application.
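A minimal Python sketch of these three rules (the class and method names are mine, not from the lecture):

    class LamportClock:
        """Local counter Ci for one process Pi."""

        def __init__(self):
            self.c = 0

        def tick(self):
            # Rule 1: increment between successive local events.
            self.c += 1
            return self.c

        def send(self):
            # Rule 2: sending is itself an event; the outgoing
            # message carries ts(m) = Ci.
            return self.tick()

        def receive(self, ts_m):
            # Rule 3: adjust Cj to max{Cj, ts(m)}, then apply rule 1
            # before passing the message to the application.
            self.c = max(self.c, ts_m)
            return self.tick()
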
Updating Lamport’s logical timestamps
[Figure: four processes p1–p4 exchanging messages along a physical-time axis; each event is labeled with its Lamport clock value and each message carries the sender's timestamp, so a receiver's clock jumps to max{local clock, ts(m)} + 1 at the receive event.]

Problem with Lamport logical clocks
Notation: timestamp(a) is the Lamport logical clock value associated with event a.
By definition, if a → b then timestamp(a) < timestamp(b).
Q: Is the converse true? That is, does timestamp(a) < timestamp(b) imply a → b?
A: No. timestamp(a) < timestamp(b) does NOT imply that a happens before b.

Example
[Figure: the same four-process diagram, with two highlighted events that are logically concurrent.]
Note: as Lamport timestamps, 3 < 7, but the event with timestamp 3 is concurrent with the event with timestamp 7, i.e., the events are not in the happens-before relation.

Causality

• Lamport timestamps don't capture causality
  – Example: news postings have multiple independent threads of messages
• To model causality, use vector timestamps
  – Intuition: each entry in the vector clock tracks one causality thread.

Vector Timestamps
[Figure: processes p1, p2, p3 on a physical-time axis. On p1: a = (1,0,0), b = (2,0,0); message m1 goes from b to p2. On p2: c = (2,1,0), d = (2,2,0); message m2 goes from d to p3. On p3: e = (0,0,1), f = (2,2,2).]

Vector clocks

• Each process Pi has an array VCi[1..n] of clocks (all initialized to 0)
  – VCi[j] denotes the number of events that process Pi knows have taken place at process Pj.
• Pi increments VCi[i] when an event occurs locally or when sending
  – The vector value is the timestamp of the event.
• When sending, messages from Pi include a vector timestamp vt(m)
  – Result: upon arrival, the recipient knows Pi's timestamp.
• When Pj receives a message sent by Pi with vector timestamp ts(m) (see the sketch below):
  – for k ≠ j: update VCj[k] to max{VCj[k], ts(m)[k]}
  – for k = j: VCj[k] = VCj[k] + 1
Note: vector timestamps require a static notion of system membership.
Question: what does VCi[j] = k mean in terms of messages sent and received?
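A sketch of these update rules in Python, assuming a static group of n processes (per the note above); names are mine:

    class VectorClock:
        def __init__(self, i, n):
            self.i = i             # index of this process Pi
            self.vc = [0] * n      # VCi[1..n], all initialized to 0

        def event(self):
            self.vc[self.i] += 1   # Pi increments VCi[i] on each event

        def send(self):
            self.event()           # sending counts as an event
            return list(self.vc)   # vt(m): a copy travels with the message

        def receive(self, ts_m):
            # For k != j: VCj[k] = max{VCj[k], ts(m)[k]} ...
            for k in range(len(self.vc)):
                if k != self.i:
                    self.vc[k] = max(self.vc[k], ts_m[k])
            # ... and for k = j: VCj[k] = VCj[k] + 1 (the receive event).
            self.vc[self.i] += 1
            return list(self.vc)
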
Example: Vector Logical Time
[Figure: four processes p1–p4 exchanging messages; each event is labeled with its vector timestamp (n,m,p,q) and each message carries the sender's vector timestamp, e.g. (1,0,0,0), (2,0,0,0), (1,2,0,0), (2,0,2,0), (2,0,2,2), (2,0,2,3), (4,0,2,2). A receiver takes the component-wise maximum and then increments its own entry.]

Comparing vector timestamps

• VT1 = VT2 (identical)
  iff VT1[i] = VT2[i] for all i = 1, …, n
• VT1 ≤ VT2
  iff VT1[i] ≤ VT2[i] for all i = 1, …, n
• VT1 < VT2 (happens-before relationship)
  iff VT1 ≤ VT2 and ∃j (1 ≤ j ≤ n) such that VT1[j] < VT2[j]
• VT1 is concurrent with VT2
  iff not VT1 ≤ VT2 and not VT2 ≤ VT1
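The four comparisons, written as Python predicates over equal-length timestamp lists (a small sketch; the names are mine):

    def vt_eq(a, b):            # VT1 = VT2 (identical)
        return a == b

    def vt_leq(a, b):           # VT1 <= VT2
        return all(x <= y for x, y in zip(a, b))

    def vt_lt(a, b):            # VT1 < VT2 ("happens-before")
        return vt_leq(a, b) and a != b   # some entry strictly smaller

    def vt_concurrent(a, b):    # neither dominates the other
        return not vt_leq(a, b) and not vt_leq(b, a)

    # e.g. (2,0,0,0) and (1,2,0,0) are concurrent:
    assert vt_concurrent([2, 0, 0, 0], [1, 2, 0, 0])
    assert vt_lt([1, 0, 0], [2, 2, 2])
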
Quiz-like problem

Show: a → b if and only if vectorTS(a) < vectorTS(b)

Message delivery for group communication

Assumptions:
• messages are multicast to named process groups
• reliable and FIFO channels (from a given source to a given destination)
• processes don't crash (failure and restart not considered)
• processes behave as specified, e.g., send the same values to all processes (i.e., we are not considering Byzantine behaviour)

The stack, top to bottom:
• application process: may specify a delivery order to the message service, e.g. total order, FIFO order, causal order (last time: total order)
• messaging middleware: may reorder delivery to the application by buffering messages
• OS comms. interface: assume FIFO from each source (done at lower levels)

[Last time] Totally Ordered Multicast

• Process Pi sends a timestamped message msgi to all others. The message itself is put in a local queue, queuei.
• Any incoming message at Pk is queued in queuek according to its timestamp, and acknowledged to every other process.
• Pk passes a message msgi to its application if:
  – msgi is at the head of queuek, and
  – for each process Px, there is a message msgx in queuek with a larger timestamp.
Note: we are assuming that communication is reliable and FIFO ordered.
Guarantee: all multicast messages are delivered in the same order at all destinations.
• Nothing is guaranteed about the actual (wall-clock) order! (See the sketch below.)
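A sketch of the receiver-side queue in Python. Using "acknowledged by everyone" as the release test is one common way to realize the "larger timestamp from every process" condition over FIFO channels; all names are illustrative:

    import heapq

    class TotalOrderQueue:
        def __init__(self, group):
            self.group = set(group)  # ids of all processes
            self.queue = []          # min-heap of (timestamp, sender, msg)
            self.acked = {}          # (timestamp, sender) -> set of ackers

        def on_message(self, ts, sender, msg):
            heapq.heappush(self.queue, (ts, sender, msg))
            self.acked.setdefault((ts, sender), set())
            # ...and we acknowledge (ts, sender) to every other process.

        def on_ack(self, ts, sender, acker):
            self.acked.setdefault((ts, sender), set()).add(acker)

        def deliverable(self):
            # Deliver the head of the queue once every process has acked
            # it: with FIFO channels this implies every process has some
            # later-timestamped message queued, so nothing can jump ahead.
            delivered = []
            while self.queue:
                ts, sender, msg = self.queue[0]
                if self.acked[(ts, sender)] >= self.group:
                    heapq.heappop(self.queue)
                    delivered.append(msg)
                else:
                    break
            return delivered
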
FIFO multicast

• FIFO (sender-ordered) multicast: messages are delivered in the order they were sent (by any single sender)

[Figure: processes P1–P4 multicasting messages a, b, c, d, e; b and c come from the same sender, and delivery of c to P1 is delayed until after b is delivered.]

Implementing FIFO multicast

• The basic reliable multicast algorithm already has this property
  – Without failures, all we need is to run it over FIFO channels (like TCP)
  – [Later: dealing with node failures]

Causal multicast

• Causal (happens-before) ordering: if send(a) → send(b), then deliver(a) occurs before deliver(b) at common destinations

[Figure: processes P1–P4 multicasting messages a, b, c, d, e; e is sent (causally) after b and c, while e and d are sent concurrently.]

Consequences:
• delivery of c to P1 is delayed until after b is delivered
• delivery of e to P3 is delayed until after b and c are delivered
• e and d may be delivered to P2 and P3 in any relative order (they are concurrent)

Causally ordered multicast
[Figure: three processes exchanging causally related multicasts, annotated with vector clocks; values shown include VC1 = (1,1,0) and (1,2,0), VC0 = (2,2,0), and VC2 = (1,0,1) and (1,2,2).]

Implementing causal order

• Start with a FIFO multicast
• We can strengthen this into a causal multicast by adding vector time (see the sketch below)
  – No additional messages needed!
• Advantage: FIFO and causal multicast are asynchronous
  – The sender doesn't get blocked and can deliver a copy to itself without “stopping” to learn a safe delivery order
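A sketch of the resulting delivery test. Note: the usual causal-multicast variant increments the vector only on sends (so VC[i] counts messages sent by Pi), which is what this check assumes; the function name is mine:

    def can_deliver(ts_m, sender, local_vc):
        """Hold back message m from `sender` until it is the next one
        we expect from that sender AND we have already delivered
        everything the sender had delivered when it sent m."""
        if ts_m[sender] != local_vc[sender] + 1:
            return False   # a FIFO-earlier message from sender is missing
        return all(ts_m[k] <= local_vc[k]
                   for k in range(len(local_vc)) if k != sender)
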
So far …

• Physical clocks
  – Two applications
    • Provide at-most-once semantics
    • Global Positioning Systems
• ‘Logical clocks’
  – Where only the ordering of events matters
• Other coordination primitives
  – Mutual exclusion
  – Leader election

Mutual exclusion algorithms

Problem: a number of processes in a distributed system want exclusive access to some resource.
Basic solutions:
• Via a centralized server.
• Completely decentralized.
• Completely distributed, with no roles imposed.
• Completely distributed along a (logical) ring.
Additional objective: fairness.

Mutual Exclusion: A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.
b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.
c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2.
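A sketch of the coordinator's bookkeeping; queueing the pending request is the usual way to implement "does not reply". All names are illustrative:

    from collections import deque

    class Coordinator:
        def __init__(self):
            self.holder = None       # who is in the critical region
            self.waiting = deque()   # pending, unanswered requests

        def on_request(self, pid):
            if self.holder is None:
                self.holder = pid
                return "GRANT"       # cases (a)/(b): permission granted
            self.waiting.append(pid) # case (b) for process 2: no reply yet
            return None

        def on_release(self, pid):
            assert pid == self.holder
            self.holder = self.waiting.popleft() if self.waiting else None
            if self.holder is not None:
                return ("GRANT", self.holder)  # case (c): reply to next waiter
            return None
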
Decentralized Mutual Exclusion

Principle: assume the resource is replicated n times, with each replica having its own coordinator.
• Access requires a majority vote from m > n/2 coordinators.
• A coordinator always responds immediately to a request.
Assumption: when a coordinator crashes, it will recover quickly, but it will have forgotten about any permissions it had granted.
Correctness: probabilistic!
Issue: how robust is this system?

Decentralized Mutual Exclusion (cont.)

Principle: assume every resource is replicated n times, with each replica having its own coordinator.
• Access requires a majority vote from m > n/2 coordinators.
• A coordinator always responds immediately to a request.
Issue: how robust is this system?
• Let p be the probability that a coordinator resets (crashes and recovers) in an interval Δt: p = Δt / T, where T is the average peer lifetime.
Quiz-like question: what is the probability of violating mutual exclusion?
• The probability that exactly k out of the m coordinators reset during Δt is P[k] = C(m, k) p^k (1 − p)^(m−k).
• Mutual exclusion can be violated only when at least 2m − n coordinators reset.
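A quick numeric check of this model, reading C(m, k) as the binomial coefficient "m choose k". The sample parameters below are illustrative, not from the lecture:

    from math import comb

    def p_violation(n, m, p):
        # Mutual exclusion is at risk once at least 2m - n of the m
        # granting coordinators reset within the interval.
        return sum(comb(m, k) * p**k * (1 - p)**(m - k)
                   for k in range(2 * m - n, m + 1))

    # e.g. n = 32 replicas, m = 24 (a 3/4 majority), Δt = 10 s,
    # average peer lifetime T = 3 hours, so p = 10 / 10800:
    print(p_violation(32, 24, 10 / 10800))  # astronomically small
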
Mutual Exclusion: A Distributed Algorithm (Ricart & Agrawala)

Idea: similar to the Lamport-ordered group communication above, except that acknowledgments aren't sent. Instead, replies (i.e., grants) are sent only when:
• the receiving process has no interest in the shared resource; or
• the receiving process is waiting for the resource but has lower priority (known through comparison of timestamps).
In all other cases, the reply is deferred (which requires some more local administration; see the sketch below).
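A sketch of the per-process reply logic; the RELEASED/WANTED/HELD state names follow the usual presentation of the algorithm, and the rest is illustrative:

    class RicartAgrawala:
        def __init__(self, pid):
            self.pid = pid
            self.state = "RELEASED"  # RELEASED, WANTED, or HELD
            self.req_ts = None       # Lamport timestamp of our own request
            self.deferred = []       # requesters whose replies we hold back

        def on_request(self, ts, pid):
            # Grant at once if we neither hold nor want the resource, or
            # if the requester's (timestamp, pid) beats our own request.
            if self.state == "RELEASED" or (
                    self.state == "WANTED"
                    and (ts, pid) < (self.req_ts, self.pid)):
                return "OK"
            self.deferred.append(pid)  # defer: reply when we release
            return None

        def on_release(self):
            self.state = "RELEASED"
            to_grant, self.deferred = self.deferred, []
            return to_grant            # send OK to each deferred requester
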
Mutual Exclusion: A Distributed Algorithm (II)

a) Two processes (0 and 2) want to enter the same critical region at the same moment.
b) Process 0 has the lowest timestamp, so it wins.
c) When process 0 is done, it also sends an OK, so 2 can now enter the critical region.
Question: is a fully distributed solution, i.e. one without a coordinator, always more robust than any centralized coordinated solution?

Mutual Exclusion: A Token Ring Algorithm

Principle: organize the processes in a logical ring, and let a token be passed between them. The one that holds the token is allowed to enter the critical region (if it wants to).
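The per-process behavior is tiny; a sketch, where node, its successor link, and the send helper are all assumed:

    def on_token(node):
        # Only the current token holder may enter the critical region.
        if node.wants_cs:
            node.enter_critical_section()
            node.exit_critical_section()
            node.wants_cs = False
        node.send(node.successor, "TOKEN")  # pass the token along the ring
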
Logistics

• Project
  – P01 deadline tomorrow.
  – Project / non-blocking IO lecture: Thursday.
  – P02 deadline on Wednesday, November 15th.
• Next quiz
  – Tuesday, November 16th.

So far …

• Physical clocks
  – Two applications
    • Provide at-most-once semantics
    • Global Positioning Systems
• ‘Logical clocks’
  – Where only the ordering of events matters
  – Lamport clocks
  – Vector clocks
• Other coordination primitives
  – Mutual exclusion
  – Leader election: how do I choose a coordinator?

[Last time] Mutual exclusion algorithms

Problem: a number of processes in a distributed system want exclusive access to some resource.
Basic solutions:
• Via a centralized server.
• Completely decentralized (voting based).
• Completely distributed, with no roles imposed.
• Completely distributed along a (logical) ring.
Additional objectives: fairness; no starvation.

Mutual Exclusion: Algorithm Comparison

Algorithm     | Messages per entry/exit      | Delay before entry (in message times) | Problems
--------------|------------------------------|---------------------------------------|---------------------------
Centralized   | 3                            | 2                                     | Coordinator crash
Decentralized | 3mk (k = number of attempts) | 2m                                    | Starvation, low efficiency
Distributed   | 2(n−1)                       | 2(n−1)                                | Crash of any process
Token ring    | 1..∞                         | 0 to n−1                              | Lost token, process crash

So far …

• Physical clocks
  – Two applications
    • Provide at-most-once semantics
    • Global Positioning Systems
• ‘Logical clocks’
  – Where only the ordering of events matters
• Other coordination primitives
  – Mutual exclusion
  – Leader election: how do I choose a coordinator?

Leader election algorithms

Context: an algorithm requires that some process act as a coordinator.
Question: how do we select this special process dynamically?
Note: in many systems the coordinator is chosen by hand (e.g. file servers). This leads to centralized solutions with a single point of failure.

Leader election algorithms

Context: each process has an associated priority (weight). The process with the highest priority needs to be elected as the coordinator.
Issue: how do we find the ‘heaviest’ process?
Two important assumptions:
• Processes are uniquely identifiable
• All processes know the identity of all participating processes
Traditional algorithm examples:
• The bully algorithm
• Ring-based algorithm

Election by Bullying

• Any process can just start an election by sending an election message to all other (heavier) processes.
• If a process Pheavy receives an election message from a lighter process Plight, it sends a take-over message to Plight. Plight is out of the race.
• If a process doesn't get a take-over message back, it wins, and sends a victory message to all other processes. (See the sketch below.)
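A sketch of the three rules in Python; pid ordering stands in for "weight", and the send transport is an assumed callback:

    class BullyNode:
        def __init__(self, pid, peers, send):
            self.pid, self.peers, self.send = pid, peers, send

        def start_election(self):
            heavier = [p for p in self.peers if p > self.pid]
            if not heavier:
                self.announce_victory()        # nobody outranks us
                return
            for p in heavier:
                self.send(p, "ELECTION", self.pid)
            # If no TAKE-OVER arrives before a timeout, call
            # announce_victory(); the timer is omitted here.

        def on_election(self, from_pid):
            if from_pid < self.pid:
                self.send(from_pid, "TAKE-OVER", self.pid)
                self.start_election()          # take over the election

        def announce_victory(self):
            for p in self.peers:
                self.send(p, "VICTORY", self.pid)
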
The Bully Algorithm

a) Process 4 detects that 7 has failed and holds an election.
b) Processes 5 and 6 respond, telling 4 to stop.
c) Now 5 and 6 each hold an election (they also send a message to 7, as they have not detected 7's failure).

The Bully Algorithm (2)

d) Process 6 tells 5 to stop.
e) Process 6 wins and announces itself to everyone.

Election in a Ring

Principle: organize processes into a (logical) ring. The process with the highest priority should be elected as coordinator.
• Any process can start an election by sending an election message to its successor. If the successor is down, the message is passed on to the next successor.
• As the message is passed on, each sender adds itself to the list it carries.
• When the message returns to the initiator, the initiator sends a coordinator message around the ring containing the list of all living processes. The one with the highest priority is elected as coordinator. (See the sketch below.)
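A sketch of the handler for the circulating election message; node, node.send, and the next_alive() helper (which skips crashed successors) are assumed:

    def on_election(node, candidates):
        # The ELECTION message carries the list of live processes so far.
        if node.pid in candidates:
            # The message has gone full circle: every live process is
            # on the list, so pick the winner and announce it.
            winner = max(candidates)   # highest priority wins
            node.send(node.next_alive(), ("COORDINATOR", winner))
        else:
            node.send(node.next_alive(),
                      ("ELECTION", candidates + [node.pid]))
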
The Ring Algorithm

• Question: what happens if two processes initiate an election at the same time? Does it matter?
• Question: what happens if a process crashes during the election?

Summary so far …

A distributed system is:
• a collection of independent computers that appears to its users as a single coherent system
Components need to:
• Communicate
  – Point to point: sockets, RPC/RMI
  – Point to multipoint: multicast, epidemic protocols
• Cooperate
  – Naming, to enable some resource sharing
    • Naming systems for flat (unstructured) namespaces: consistent hashing, DHTs
    • Naming systems for structured namespaces: DNS (see EECE 456)
  – Synchronization: physical clocks, logical clocks, mutual exclusion, leader election