DISTRIBUTED SYSTEMS Principles and Paradigms Second Edition ANDREW S. TANENBAUM MAARTEN VAN STEEN Chapter 7 Consistency And Replication (CONSISTENCY PROTOCOLS) 1 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 CONSISTENCY PROTOCOLS • • • • Continuous Consistency - Bounding Numerical Deviation - Bounding Staleness Deviations - Bounding Ordering Deviations Sequential Consistency - Primary-based Protocol - Remote-write Protocol - Local-write Protocol - Replicated-write Protocol - Active Replication - Majority Voting (Quorum-Based) Cache-Coherence Protocols Implementing Client-Centric Consistency Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 2 consistency protocol • A consistency protocol describes an implementation of a specific consistency model. 3 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Continuous Consistency Bounding Numerical Deviation: • A solution for keeping the numerical deviation within bounds • We concentrate on writes to a single data item x • Each write W(x) has an associated weight that represents the numerical value by which x is updated, denoted as weight (W) • For simplicity, we assume that weight (W) > 0. • Each write W is initially submitted to one out of the N available replica servers, in which case that server becomes the write's origin, denoted as origin (W) • each server Si will keep track of a log Li of writes that it has performed on its own local copy of x. 4 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Continuous consistency: Numerical deviation • • • • • • Consider a data item x and let weight(W) denote the numerical change in its value after a write operation W. Assume that weight(W) > 0. W is initially forwarded to one of the N replicas, denoted as origin(W). TW[i,j] are the writes executed by server Si that originated from Sj TW[i,j] = ∑{weight(W) | origin(W) = Sj & W Є log(Si )} The goal is for any time t, to let the current value Vi at server Si deviate within bounds from the actual value v(t) of x. This actual value is completely determined by all submitted writes 5 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Continuous consistency: Numerical deviation Actual value v(t) of x: N v(t) = v(0) + ∑ TW[k,k] k=1 value vi of x at replica i N v(i) = v(0) + ∑ TW[i,k] k=1 6 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Continuous consistency: Numerical deviation • • • for every server Si, we associate an upperbound δi such that we need to enforce: v(t) – vi ≤ δi for every server Si when a server Si propagates a write originating from Sj to Sk the latter will be able to learn about the value TW [i,j] at the time the write was sent Sk maintains a view TWk [i,j] of what it believes Si will have as value for TW[i,j]. 7 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Continuous consistency: Numerical deviation Solution • Sĸ sends operations from its log to Si when it sees that TWĸ[i,k] is getting too far from TW[k,k], in particular, when TW[k,k] - TWĸ[i,k] > δi / (N -1) 8 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Continuous consistency: Numerical deviation • • • when server Sk notices that Si has not been keeping in the right pace with the updates that have been submitted to Sk it forwards writes from its log to Si . This forwarding effectively advances the view TWĸ[i,k] that Sk has of TW[i,k], making the deviation TW[i,k]- TWĸ[i,k] smaller. In particular, Sk advances its view on TW[i,k]when an application submits a new write that would increase TW[k,k] - TWĸ[i,k] beyond δi / (N -1). Advancement always ensures that v(t) – vi ≤ δi 9 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Bounding Staleness Deviations • • • • There are many ways to keep the staleness of replicas within specified bounds One approach is to timestamp each submitted write by its origin server let server Sk keep a real-time vector clock RVCk where: RVCk[i] = T(i) means that Sk has seen all writes that have been submitted to Si up to time T(i) T(i) denotes the time local to Si 10 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Bounding Staleness Deviations • • • • If the clocks between the replica servers are loosely synchronized: Whenever server Sk notes that T(k) - RVCk[i] is about to exceed a specified limit it simply starts pulling in writes that originated from Si with a timestamp later than RVCk[i]· a replica server is responsible for keeping its copy of x up to date regarding writes that have been issued elsewhere 11 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Numerical VS Staleness Bounds • Numerical bounds follows a push approach, by letting an origin server keep replicas up to date by forwarding writes • Staleness Bounds follows a Pull approach, Replica servers pull in updates from origin servers 12 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Bounding Ordering Deviations • • • • • ordering deviations in continuous consistency are caused by the fact that a replica server tentatively applies updates that have been submitted to it. each server will have a local queue of tentative writes for which the actual order in which they are to be applied to the local copy of x still needs to be determined The ordering deviation is bounded by specifying the maximal length of the queue of tentative writes enforce a globally consistent ordering of tentative writes Using primary-based or quorum-based protocols 13 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Primary-Based Protocols • • • • each data item x in the data store has an associated primary, which is responsible for coordinating write operations on x provide a straightforward implementation of sequential consistency all processes see all write operations in the same order, no matter which backup server they use to perform read operations A distinction can be made as to whether: - the primary is fixed at a remote server - write operations can be carried out locally after moving the primary to the process where the write operation is initiated 14 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Remote- Write Protocols • • • • • all write operations need to be forwarded to a fixed single server Read operations can be carried out locally Also called (primary-backup protocol), Fig. 7-20 Disadvantage: it may take a relatively long time before the process that initiated the update is allowed to continue, an update is implemented as a blocking operation Alternative: nonblocking approach 15 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Remote-Write Protocols Figure 7-20. The principle of a primary-backup protocol.16 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Remote- Write Protocols nonblocking approach: • As soon as the primary has updated its local copy of x, it returns an acknowledgment. After that, it tells the backup servers to perform the update as well • Advantage: write operations may speed up considerably • Disadvantage: fault tolerance, updates may not be backed up by other servers 17 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Local- Write Protocols • • • when a process wants to update data item x, it locates the primary copy of x, and subsequently moves it to its own location (Fig. 7-21) Advantage (in nonblocking protocol only): multiple, successive write operations can be carried out locally, while reading processes can still access their local copy updates are propagated to the replicas after the primary has finished with locally performing the updates 18 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Local-Write Protocols Figure 7-21. Primary-backup protocol in which the primary 19 migrates to the process wanting to perform an update. Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Local- Write Protocols Example primary-backup protocol with local writes • Mobile computing in disconnected mode (ship all relevant files to user before disconnecting, and update later on) • While being disconnected, all update operations are carried out locally. while other processes can still perform read operations (but no updates). • Later, when connecting again, updates are propagated from the primary to the backups, bringing the data store in a consistent state again 20 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Local- Write Protocols Example primary-backup protocol with local writes distributed file systems that require a high degree of fault tolerance • a fixed central server through which all write operations take place • The server temporarily allows one of the replicas to perform a series of local updates to speed up performance • When the replica server is done, the updates are propagated to the central server, from where they are then distributed to the other replica servers 21 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Replicated-Write Protocols • • write operations can be carried out at multiple replicas instead of only one A distinction can be made between: - active replication, in which an operation is forwarded to all replicas - majority voting (Quorum-Based Protocols) 22 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Active Replication • • • • • each replica has an associated process that carries out update operations operations need to be carried out in the same order everywhere. totally-ordered multicast mechanism. Such a multicast can be implemented using Lamport's logical clocks, Disadvantage: this implementation of multicasting does not scale well in large distributed systems. total ordering can be achieved using a central coordinator, also called a sequencer. 23 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Active Replication • • first forward each operation to the sequencer, which assigns it a unique sequence number and subsequently forwards the operation to all replicas Operations are carried out in the order of their sequence number. 24 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Quorum-Based Protocols • • A different approach to supporting replicated writes is to use voting clients to request and acquire the permission of multiple servers before either reading or writing a replicated data item 25 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Quorum-Based Protocols how the algorithm works? • a file is replicated on N servers • to update a file, a client must first contact at least half the servers plus one (a majority) and get them to agree to do the update. • Once they have agreed, the file is changed and a new version number is associated with the new file. 26 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Quorum-Based Protocols how the algorithm works? • To read a replicated file, a client must also contact at least half the servers plus one and ask them to send the version numbers associated with the file. • If all the version numbers are the same, this must be the most recent version 27 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Quorum-Based Protocols • • • to read a file of which N replicas exist, a client needs to assemble a read quorum, an arbitrary collection of any NR servers, or more to modify a file, a write quorum of at least Nw servers is required. The values of NR and Nw are subject to the following two constraints: NR +NW > N and NW > N / 2 28 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Quorum-Based Protocols Figure 7-22. Three examples of the voting algorithm. (a) A correct choice of read and write set. (b) A choice that may lead to write-write conflicts. (c) A correct choice, known as ROWA (read one, write all). 29 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Cache-Coherence Protocols cache-coherence protocols ensures that a cache is consistent with the server-initiated replicas • coherence detection strategy (when) • coherence enforcement strategy (How) 30 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Cache-Coherence Protocols • • • • • Caches are controlled by clients instead of servers much research in the design and implementation of caches caching solutions differ in their coherence detection strategy, when inconsistencies are detected. Dynamic solutions: inconsistencies are detected at runtime. For example, a check is made with the server to see whether the cached data have been modified since they were cached In the case of distributed databases, dynamic detectionbased protocols can be further classified by considering exactly when during a transaction the detection is done 31 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Implementing Client-Centric Consistency A Naive Implementation: • each write operation W is assigned a globally unique identifier. Such an identifier is assigned by the server to which the write had been submitted (Origin of W). • for each client, a track of two sets of writes: • The read set for a client consists of the writes relevant for the read operations performed by a client • The write set consists of the identifiers of the writes performed by the client. 32 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Implementing Client-Centric Consistency Monotonic-read consistency: • When a client performs a read operation at a server, that server is handed the client's read set to check whether all the identified writes have taken place locally • (The size of the set may introduce a performance problem) • If not, it contacts the other servers to ensure that it is brought up to date before carrying out the read operation • Alternatively, the read operation is forwarded to a server where the write operations have already taken place • After the read operation is performed, the write operations that have taken place at the selected server and which are relevant for the read operation are added to the client‘s read set 33 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Implementing Client-Centric Consistency Monotonic-write consistency: • When a client initiates a new write operation at a server, the server is handed over the client's write set • It then ensures that the identified write operations are performed first and in the correct order • After performing the new operation, that operation's write identifier is added to the write set 34 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Implementing Client-Centric Consistency read-your-writes consistency : • requires that the server where the read operation is performed has seen all the write operations in the client's write set • The writes can be fetched from other servers before the read operation is performed 35 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Implementing Client-Centric Consistency writes-follow-reads consistency: • implemented by first bringing the selected server up to date with the write operations in the client's read set • Then, later adding the identifier of the write operation to the write set, along with the identifiers in the read set 36 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Improving Efficiency • read set and write set associated with each client can become very large Solution: • a client's read and write operations are grouped into sessions • A session is typically associated with an application: it is opened when the application starts and is closed when it exits • Whenever a client closes a session, the sets are cleared 37 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5 Summary • • • • Consistency protocols describe specific implementations of consistency models a distinction can be made between primary-based protocols and replicated-write protocols In primary-based protocols, all update operations are forwarded to a primary copy that subsequently ensures the update is properly ordered and forwarded In replicated-write protocols, an update is forwarded to several replicas at the same time 38 Tanenbaum & Van Steen, Distributed Systems: Principles and Paradigms, 2e, (c) 2007 Prentice-Hall, Inc. All rights reserved. 0-13-239227-5