DISTRIBUTED SYSTEMS
Principles and Paradigms
Second Edition
ANDREW S. TANENBAUM
MAARTEN VAN STEEN
Chapter 7
Consistency And Replication
(CONSISTENCY PROTOCOLS)
1
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
CONSISTENCY PROTOCOLS
•
•
•
•
Continuous Consistency
- Bounding Numerical Deviation
- Bounding Staleness Deviations
- Bounding Ordering Deviations
Sequential Consistency
- Primary-based Protocol
- Remote-write Protocol
- Local-write Protocol
- Replicated-write Protocol
- Active Replication
- Majority Voting (Quorum-Based)
Cache-Coherence Protocols
Implementing Client-Centric Consistency
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
2
consistency protocol
•
A consistency protocol describes an implementation of a
specific consistency model.
3
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Continuous Consistency
Bounding Numerical Deviation:
•
A solution for keeping the numerical deviation within bounds
•
We concentrate on writes to a single data item x
•
Each write W(x) has an associated weight that represents the
numerical value by which x is updated, denoted as weight
(W)
•
For simplicity, we assume that weight (W) > 0.
•
Each write W is initially submitted to one out of the N
available replica servers, in which case that server becomes
the write's origin, denoted as origin (W)
•
each server Si will keep track of a log Li of writes that it has
performed on its own local copy of x.
4
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Continuous consistency:
Numerical deviation
•
•
•
•
•
•
Consider a data item x and let weight(W) denote the
numerical change in its value after a write operation W.
Assume that weight(W) > 0.
W is initially forwarded to one of the N replicas, denoted as
origin(W).
TW[i,j] are the writes executed by server Si that originated
from Sj
TW[i,j] = ∑{weight(W) | origin(W) = Sj & W Є log(Si )}
The goal is for any time t, to let the current value Vi at server
Si deviate within bounds from the actual value v(t) of x.
This actual value is completely determined by all submitted
writes
5
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Continuous consistency:
Numerical deviation
Actual value v(t) of x:
N
v(t) = v(0) + ∑ TW[k,k]
k=1
value vi of x at replica i
N
v(i) = v(0) + ∑ TW[i,k]
k=1
6
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Continuous consistency:
Numerical deviation
•
•
•
for every server Si, we associate an upperbound δi such that
we need to enforce:
v(t) – vi ≤ δi for every server Si
when a server Si propagates a write originating from Sj to Sk
the latter will be able to learn about the value TW [i,j] at the
time the write was sent
Sk maintains a view TWk [i,j] of what it believes Si will have as
value for TW[i,j].
7
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Continuous consistency:
Numerical deviation
Solution
•
Sĸ sends operations from its log to Si when it sees that
TWĸ[i,k] is getting too far from TW[k,k], in particular, when
TW[k,k] - TWĸ[i,k] > δi / (N -1)
8
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Continuous consistency:
Numerical deviation
•
•
•
when server Sk notices that Si has not been keeping in the
right pace with the updates that have been submitted to Sk it
forwards writes from its log to Si .
This forwarding effectively advances the view TWĸ[i,k] that Sk
has of TW[i,k], making the deviation TW[i,k]- TWĸ[i,k] smaller.
In particular, Sk advances its view on TW[i,k]when an
application submits a new write that would increase
TW[k,k] - TWĸ[i,k] beyond δi / (N -1).
Advancement always ensures that v(t) – vi ≤ δi
9
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Bounding Staleness Deviations
•
•
•
•
There are many ways to keep the staleness of replicas within
specified bounds
One approach is to timestamp each submitted write by its
origin server
let server Sk keep a real-time vector clock RVCk where:
RVCk[i] = T(i) means that Sk has seen all writes that have
been submitted to Si up to time T(i)
T(i) denotes the time local to Si
10
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Bounding Staleness Deviations
•
•
•
•
If the clocks between the replica servers are loosely
synchronized:
Whenever server Sk notes that T(k) - RVCk[i] is about to
exceed a specified limit
it simply starts pulling in writes that originated from Si with a
timestamp later than RVCk[i]·
a replica server is responsible for keeping its copy of x up to
date regarding writes that have been issued elsewhere
11
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Numerical VS Staleness Bounds
•
Numerical bounds follows a push approach, by letting an
origin server keep replicas up to date by forwarding writes
•
Staleness Bounds follows a Pull approach, Replica servers
pull in updates from origin servers
12
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Bounding Ordering Deviations
•
•
•
•
•
ordering deviations in continuous consistency are caused by
the fact that a replica server tentatively applies updates that
have been submitted to it.
each server will have a local queue of tentative writes for
which the actual order in which they are to be applied to the
local copy of x still needs to be determined
The ordering deviation is bounded by specifying the maximal
length of the queue of tentative writes
enforce a globally consistent ordering of tentative writes
Using primary-based or quorum-based protocols
13
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Primary-Based Protocols
•
•
•
•
each data item x in the data store has an associated primary,
which is responsible for coordinating write operations on x
provide a straightforward implementation of sequential
consistency
all processes see all write operations in the same order, no
matter which backup server they use to perform read
operations
A distinction can be made as to whether:
- the primary is fixed at a remote server
- write operations can be carried out locally after moving the
primary to the process where the write operation is initiated
14
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Remote- Write Protocols
•
•
•
•
•
all write operations need to be forwarded to a fixed single
server
Read operations can be carried out locally
Also called (primary-backup protocol), Fig. 7-20
Disadvantage: it may take a relatively long time before the
process that initiated the update is allowed to continue, an
update is implemented as a blocking operation
Alternative: nonblocking approach
15
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Remote-Write Protocols
Figure 7-20. The principle of a primary-backup protocol.16
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Remote- Write Protocols
nonblocking approach:
•
As soon as the primary has updated its local copy of x, it
returns an acknowledgment. After that, it tells the backup
servers to perform the update as well
•
Advantage: write operations may speed up considerably
•
Disadvantage: fault tolerance, updates may not be backed up
by other servers
17
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Local- Write Protocols
•
•
•
when a process wants to update data item x, it locates the
primary copy of x, and subsequently moves it to its own
location (Fig. 7-21)
Advantage (in nonblocking protocol only):
multiple, successive write operations can be carried out
locally, while reading processes can still access their local
copy
updates are propagated to the replicas after the primary has
finished with locally performing the updates
18
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Local-Write Protocols
Figure 7-21. Primary-backup protocol in which the primary
19
migrates to the process wanting to perform an update.
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Local- Write Protocols
Example primary-backup protocol with local writes
•
Mobile computing in disconnected mode (ship all relevant
files to user before disconnecting, and update later on)
•
While being disconnected, all update operations are carried
out locally. while other processes can still perform read
operations (but no updates).
•
Later, when connecting again, updates are propagated from
the primary to the backups, bringing the data store in a
consistent state again
20
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Local- Write Protocols
Example primary-backup protocol with local writes
distributed file systems that require a high degree of fault
tolerance
•
a fixed central server through which all write operations take
place
•
The server temporarily allows one of the replicas to perform a
series of local updates to speed up performance
•
When the replica server is done, the updates are propagated
to the central server, from where they are then distributed to
the other replica servers
21
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Replicated-Write Protocols
•
•
write operations can be carried out at multiple replicas
instead of only one
A distinction can be made between:
- active replication, in which an operation is forwarded to all
replicas
- majority voting (Quorum-Based Protocols)
22
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Active Replication
•
•
•
•
•
each replica has an associated process that carries out
update operations
operations need to be carried out in the same order
everywhere.
totally-ordered multicast mechanism. Such a multicast can be
implemented using Lamport's logical clocks,
Disadvantage: this implementation of multicasting does not
scale well in large distributed systems.
total ordering can be achieved using a central coordinator,
also called a sequencer.
23
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Active Replication
•
•
first forward each operation to the sequencer, which
assigns it a unique sequence number and subsequently
forwards the operation to all replicas
Operations are carried out in the order of their sequence
number.
24
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Quorum-Based Protocols
•
•
A different approach to supporting replicated writes is to use
voting
clients to request and acquire the permission of multiple
servers before either reading or writing a replicated data item
25
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Quorum-Based Protocols
how the algorithm works?
•
a file is replicated on N servers
•
to update a file, a client must first contact at least half the
servers plus one (a majority) and get them to agree to do the
update.
•
Once they have agreed, the file is changed and a new
version number is associated with the new file.
26
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Quorum-Based Protocols
how the algorithm works?
•
To read a replicated file, a client must also contact at least
half the servers plus one and ask them to send the version
numbers associated with the file.
•
If all the version numbers are the same, this must be the most
recent version
27
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Quorum-Based Protocols
•
•
•
to read a file of which N replicas exist, a client needs to
assemble a read quorum, an arbitrary collection of any NR
servers, or more
to modify a file, a write quorum of at least Nw servers is
required. The values of NR and Nw are subject to the following
two constraints:
NR +NW > N and NW > N / 2
28
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Quorum-Based Protocols
Figure 7-22. Three examples of the voting algorithm. (a) A
correct choice of read and write set. (b) A choice that
may lead to write-write conflicts. (c) A correct choice,
known as ROWA (read one, write all).
29
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Cache-Coherence Protocols
cache-coherence protocols ensures that a cache is consistent
with the server-initiated replicas
•
coherence detection strategy (when)
•
coherence enforcement strategy (How)
30
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Cache-Coherence Protocols
•
•
•
•
•
Caches are controlled by clients instead of servers
much research in the design and implementation of caches
caching solutions differ in their coherence detection strategy,
when inconsistencies are detected.
Dynamic solutions: inconsistencies are detected at runtime.
For example, a check is made with the server to see whether
the cached data have been modified since they were cached
In the case of distributed databases, dynamic detectionbased protocols can be further classified by considering
exactly when during a transaction the detection is done
31
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Implementing Client-Centric
Consistency
A Naive Implementation:
•
each write operation W is assigned a globally unique
identifier. Such an identifier is assigned by the server to which
the write had been submitted (Origin of W).
•
for each client, a track of two sets of writes:
•
The read set for a client consists of the writes relevant for the
read operations performed by a client
•
The write set consists of the identifiers of the writes
performed by the client.
32
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Implementing Client-Centric
Consistency
Monotonic-read consistency:
•
When a client performs a read operation at a server, that
server is handed the client's read set to check whether all the
identified writes have taken place locally
•
(The size of the set may introduce a performance problem)
•
If not, it contacts the other servers to ensure that it is brought
up to date before carrying out the read operation
•
Alternatively, the read operation is forwarded to a server
where the write operations have already taken place
•
After the read operation is performed, the write operations
that have taken place at the selected server and which are
relevant for the read operation are added to the client‘s read
set
33
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Implementing Client-Centric
Consistency
Monotonic-write consistency:
•
When a client initiates a new write operation at a server, the
server is handed over the client's write set
•
It then ensures that the identified write operations are
performed first and in the correct order
•
After performing the new operation, that operation's write
identifier is added to the write set
34
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Implementing Client-Centric
Consistency
read-your-writes consistency :
•
requires that the server where the read operation is
performed has seen all the write operations in the client's
write set
•
The writes can be fetched from other servers before the read
operation is performed
35
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Implementing Client-Centric
Consistency
writes-follow-reads consistency:
•
implemented by first bringing the selected server up to date
with the write operations in the client's read set
•
Then, later adding the identifier of the write operation to the
write set, along with the identifiers in the read set
36
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Improving Efficiency
•
read set and write set associated with each client can
become very large
Solution:
•
a client's read and write operations are grouped into sessions
•
A session is typically associated with an application: it is
opened when the application starts and is closed when it exits
•
Whenever a client closes a session, the sets are cleared
37
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5
Summary
•
•
•
•
Consistency protocols describe specific implementations of
consistency models
a distinction can be made between primary-based protocols
and replicated-write protocols
In primary-based protocols, all update operations are
forwarded to a primary copy that subsequently ensures the
update is properly ordered and forwarded
In replicated-write protocols, an update is forwarded to
several replicas at the same time
38
Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5