CSS490 Replication & Fault Tolerance
Textbook Ch9 (p440 – 484)
Instructor: Munehiro Fukuda
These slides were compiled from the course textbook and the reference books.
Winter, 2004
CSS490 Fault Tolerance
File Replication
Concepts
Difference between replication and caching
A replica is associated with a server, whereas a cache is associated with a client.
A replica focuses on availability, while a cache focuses on locality.
A replica is more persistent than a cache.
A cache is contingent upon a replica
Advantages
Increased availability/reliability
Performance enhancement (response time and network traffic)
Scalability and autonomous operation
Requirements
Naming: clients need not be aware of multiple replicas.
Consistency: data consistency among replicated files.
Replication control: explicit vs. implicit/lazy replication
ACID: Atomicity, Consistency, Isolation, and Durability
File Replication
Basic Architectural Model
[Figure: clients send requests through front ends to a group of replica managers. Examples: DNS, Web server.]
1. Request: the front end sends a client request to a server.
2. Coordination: the request is delivered to each replica manager in some order.
3. Execution: each replica manager processes the client request but does not permanently commit it.
4. Agreement: the replica managers agree on whether the execution will be committed.
5. Response: the replica managers respond to the front end.
Group Communication
[Figure: a client sends a message to a group of replica managers.]
Group membership service
Create and destroy a group.
Add or withdraw a replica manager
to/from a group.
Detect a failure.
Notify members of group
membership changes.
Provide clients with a group
address.
Message delivery
Absolute ordering
Consistent ordering
Absolute Ordering
Linearizability
[Figure: mi is sent at time Ti and mj at time Tj, with Ti < Tj; every receiver gets mi before mj.]
Rule:
Rule:
mi must be delivered before mj if Ti < Tj
Implementation:
A clock synchronized among machines
A sliding time window used to commit the delivery of messages whose timestamps fall within the window.
Example:
Distributed simulation
Drawback
Too strict constraint
No absolute synchronized clock
No guarantee to catch all tardy messages
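The timestamp rule above can be sketched as a hold-back queue. This is a minimal illustration, not the textbook's algorithm: the class name `AbsoluteOrderReceiver`, the `window` parameter, and the reading of the sliding window as "hold each message until its timestamp is at least `window` time units old" are all assumptions.

```python
import heapq


class AbsoluteOrderReceiver:
    """Hold-back queue that releases messages in timestamp order (hypothetical sketch)."""

    def __init__(self, window):
        self.window = window   # assumed waiting time that lets tardy messages catch up
        self.pending = []      # min-heap of (timestamp, message)
        self.delivered = []    # messages in the order they were delivered

    def receive(self, timestamp, message):
        # Buffer the message; it is not delivered yet.
        heapq.heappush(self.pending, (timestamp, message))

    def tick(self, now):
        # Commit every buffered message whose timestamp is old enough.
        while self.pending and self.pending[0][0] <= now - self.window:
            _, message = heapq.heappop(self.pending)
            self.delivered.append(message)


receiver = AbsoluteOrderReceiver(window=5)
receiver.receive(12, "mj")   # arrives first but was sent later (Tj = 12)
receiver.receive(10, "mi")   # arrives second but was sent earlier (Ti = 10)
receiver.tick(now=16)        # releases mi only
receiver.tick(now=20)        # releases mj
```

A message that arrives after its timestamp has already left the window would still be delivered out of order, which is the "tardy message" drawback listed above.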
Consistent (Total) Ordering
Sequential Consistency
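[Figure: mi is sent at time Ti and mj at time Tj, with Ti < Tj; every receiver gets mj before mi, the same order at all receivers.]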
Rule:
All receivers get messages in the same order
(regardless of their timestamps).
Implementation:
A message is sent to a sequencer, assigned a sequence number, and finally multicast to receivers.
Each receiver retrieves messages in increasing sequence-number order.
Example:
Replicated database update
Drawback:
A centralized algorithm
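The sequencer scheme can be sketched as follows. The class names `Sequencer` and `Receiver` are hypothetical, and the network is replaced by direct method calls, so this only illustrates the ordering logic.

```python
class Sequencer:
    """Central process that stamps each message with the next sequence number."""

    def __init__(self):
        self.next_seq = 0

    def assign(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Receiver:
    """Delivers messages strictly in sequence-number order."""

    def __init__(self):
        self.expected = 0     # next sequence number to deliver
        self.held = {}        # out-of-order messages waiting for their turn
        self.delivered = []

    def receive(self, seq, message):
        self.held[seq] = message
        # Deliver as long as the next expected number is available.
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
stamped = [sequencer.assign(m) for m in ("mj", "mi", "mk")]

r1, r2 = Receiver(), Receiver()
for seq, msg in stamped:             # r1 sees the multicast in order
    r1.receive(seq, msg)
for seq, msg in reversed(stamped):   # r2 sees it in reverse network order
    r2.receive(seq, msg)
```

Both receivers end up with the same delivery order even though the network order differed. The single `Sequencer` object is the centralized point named as the drawback.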
Two-Phase Commit Protocol
[Figure: state diagrams for the coordinator and two workers.
Coordinator: starts in INIT; on Commit it sends Vote-request and enters WAIT; on a Vote-abort it sends Global-abort and enters ABORT; on Vote-commit from all workers it sends Global-commit and enters COMMIT.
Worker: starts in INIT; on Vote-request it replies Vote-abort and enters ABORT, or replies Vote-commit and enters READY; in READY, on Global-abort it sends Ack and enters ABORT, and on Global-commit it sends Ack and enters COMMIT.]
Other possible cases:
The coordinator did not receive all vote-commits. → Time out and send a global-abort.
A worker did not receive a vote-request. → All workers eventually receive a global-abort.
A worker did not receive a global-commit. → Time out and check the other workers' status.
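The two phases can be sketched in a few lines. This is a simplified, single-process illustration: the names `Worker`, `two_phase_commit`, `can_commit`, and `responds` are hypothetical, a missing vote stands in for a coordinator timeout, and the worker-side timeout (checking other workers' status) is not modeled.

```python
from enum import Enum


class Vote(Enum):
    COMMIT = "vote-commit"
    ABORT = "vote-abort"


class Worker:
    def __init__(self, name, can_commit=True, responds=True):
        self.name = name
        self.can_commit = can_commit   # whether this worker is able to commit
        self.responds = responds       # False models a lost vote (coordinator times out)
        self.state = "INIT"

    def on_vote_request(self):
        if not self.responds:
            return None                # no vote reaches the coordinator
        if self.can_commit:
            self.state = "READY"
            return Vote.COMMIT
        self.state = "ABORT"
        return Vote.ABORT

    def on_decision(self, decision):
        self.state = decision          # "COMMIT" or "ABORT"
        return "ack"


def two_phase_commit(workers):
    # Phase 1: collect votes; anything other than a full set of vote-commits aborts.
    votes = [w.on_vote_request() for w in workers]
    decision = "COMMIT" if all(v is Vote.COMMIT for v in votes) else "ABORT"
    # Phase 2: send the global decision to every worker and gather acks.
    acks = [w.on_decision(decision) for w in workers]
    assert len(acks) == len(workers)
    return decision


ok_group = [Worker("w1"), Worker("w2")]
veto_group = [Worker("w1"), Worker("w2", can_commit=False)]
silent_group = [Worker("w1"), Worker("w2", responds=False)]
results = [two_phase_commit(g) for g in (ok_group, veto_group, silent_group)]
```

In the third group, the silent worker causes a global-abort that every worker eventually receives, matching the timeout cases listed above.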
Multi-copy Update Problem
Read-only replication
Allow the replication of only immutable files.
Primary backup replication
Designate one copy as the primary copy and all the others as secondary copies.
Active backup replication
Access any or all of the replicas
Read-any-write-all protocol
Available-copies protocol
Quorum-based consensus
Primary-Copy Replication
[Figure: clients send requests through front ends to the primary replica manager, which propagates updates to the backup replica managers.]
1. Request: The front end sends a request to the primary replica.
2. Coordination: The primary takes the request atomically.
3. Execution: The primary executes the request and stores the results.
4. Agreement: The primary sends the updates to all the backups and receives an ack from each of them.
5. Response: The primary replies to the front end.
Advantage: an easy implementation, linearizable, coping with n-1 crashes.
Disadvantage: large overhead, especially if the failing primary must be replaced with a backup.
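The five steps can be sketched as below. This is an illustration under stated assumptions: `ReplicaManager` and `PrimaryBackupGroup` are hypothetical names, the "first live replica is the primary" rule is an assumed stand-in for a real election of a new primary, and atomic request handling is implied by running in a single thread.

```python
class ReplicaManager:
    def __init__(self, name):
        self.name = name
        self.store = {}
        self.alive = True

    def apply(self, key, value):
        self.store[key] = value
        return "ack"


class PrimaryBackupGroup:
    def __init__(self, names):
        self.replicas = [ReplicaManager(n) for n in names]

    @property
    def primary(self):
        # Assumption: the first live replica acts as the primary,
        # so a backup takes over once earlier replicas have crashed.
        return next(r for r in self.replicas if r.alive)

    def write(self, key, value):
        primary = self.primary
        primary.apply(key, value)                         # execution at the primary
        backups = [r for r in self.replicas if r.alive and r is not primary]
        acks = [b.apply(key, value) for b in backups]     # agreement: update backups
        return len(acks)                                  # response to the front end

    def read(self, key):
        return self.primary.store.get(key)


group = PrimaryBackupGroup(["rm1", "rm2", "rm3"])
acks = group.write("x", 1)
group.replicas[0].alive = False    # the primary crashes
group.replicas[1].alive = False    # a backup crashes too (n-1 crashes in total)
survivor_value = group.read("x")
```

The last read is served by the only surviving replica, which illustrates the "coping with n-1 crashes" claim.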
Active Replication
[Figure: each front end multicasts its client's request to all replica managers.]
1. Request: The front end multicasts the request to all replicas.
2. Coordination: All replicas take the request in the same sequential order.
3. Execution: Every replica executes the request.
4. Agreement: No agreement is needed.
5. Response: Each replica replies to the front end.
Advantage: achieves sequential consistency and copes with (n/2 - 1) Byzantine failures.
Disadvantage: no longer linearizable.
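One way a front end can mask Byzantine replies is to compare all responses and keep the majority value. The sketch below assumes that approach; `make_replica` and `front_end_request` are hypothetical names, and the faulty replica simply returns a fixed wrong value.

```python
from collections import Counter


def make_replica(faulty=False):
    """Return a request handler holding its own copy of a counter."""
    state = {"counter": 0}

    def handle(amount):
        state["counter"] += amount
        # A Byzantine replica may answer with an arbitrary wrong value.
        return -1 if faulty else state["counter"]

    return handle


def front_end_request(replicas, amount):
    # Multicast the request to every replica and collect all replies.
    replies = [replica(amount) for replica in replicas]
    value, count = Counter(replies).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError("no majority among replies")
    return value


# n = 4 replicas, one of them faulty (n/2 - 1 = 1).
replicas = [make_replica(), make_replica(), make_replica(), make_replica(faulty=True)]
first = front_end_request(replicas, 5)
second = front_end_request(replicas, 2)
```

Because every correct replica executes the same requests in the same order, their replies agree and outvote the faulty one.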
Read-Any-Write-All Protocol
[Figure: a front end reads from any one of the replica managers and writes to all of them.]
Read
Lock any one of the replicas for a read
Write
Lock all of the replicas for a write
Sequential consistency
Cannot tolerate even one failing replica upon a write.
Available-Copies Protocol
[Figure: a front end reads from any one of the replica managers and writes to all available replicas; one replica manager (marked X) is unavailable.]
Read
Lock any one of the replicas for a read
Write
Lock all available replicas for a write
Recovering replica
Bring itself up to date by copying from other servers before accepting any user request.
Better availability
Cannot cope with network partition (inconsistency between the two sub-divided network groups).
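The read, write, and recovery rules can be sketched as follows. Locking is omitted, and the names `Replica`, `write`, `read`, and `recover` are hypothetical; the sketch only shows which replicas each operation touches.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.available = True


def write(replicas, key, value):
    """Write to every replica that is currently available."""
    targets = [r for r in replicas if r.available]
    if not targets:
        raise RuntimeError("no available replica")
    for r in targets:
        r.data[key] = value
    return len(targets)


def read(replicas, key):
    """Read from any one available replica (here: the first one found)."""
    for r in replicas:
        if r.available:
            return r.data.get(key)
    raise RuntimeError("no available replica")


def recover(replica, replicas):
    """Copy the state of another available replica before serving requests again."""
    source = next(r for r in replicas if r.available and r is not replica)
    replica.data = dict(source.data)
    replica.available = True


rms = [Replica("rm1"), Replica("rm2"), Replica("rm3")]
write(rms, "x", 1)
rms[2].available = False          # rm3 fails
written = write(rms, "x", 2)      # the write still succeeds on rm1 and rm2
recover(rms[2], rms)              # rm3 catches up before accepting requests
```

A read-any-write-all variant would instead reject the second write, because one replica is down. If the failure were a network partition rather than a crash, each side would keep writing to its own "available" copies, which is the inconsistency noted above.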
Quorum-Based Protocols
r + w > n, where r = #replicas in read quorum, w = #replicas in write quorum, and n = total #replicas
[Figure: two front ends access eight replica managers; one front end contacts a read quorum and the other a write quorum.]
Read-any-write-all: r = 1, w = n
Read
Retrieve the read quorum.
Select the replica with the latest version.
Perform a read on it.
Write
Retrieve the write quorum.
Find the latest version and increment it.
Perform a write on the entire write quorum.
If a sufficient number of replicas cannot be obtained for the read/write quorum, the operation must be aborted.
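The read and write procedures can be sketched as below. `QuorumStore` and its parameters are hypothetical names, quorum members are picked at random, and locking is omitted. The example values also satisfy w > n/2 so that successive write quorums overlap; that extra condition is an assumption of this sketch, not something stated above.

```python
import random


class Replica:
    def __init__(self):
        self.version = 0
        self.value = None


class QuorumStore:
    def __init__(self, n, r, w):
        if r + w <= n:
            raise ValueError("need r + w > n")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def _quorum(self, size, available):
        pool = self.replicas if available is None else available
        if len(pool) < size:
            raise RuntimeError("quorum not reachable; operation aborted")
        return random.sample(pool, size)

    def read(self, available=None):
        quorum = self._quorum(self.r, available)
        latest = max(quorum, key=lambda rep: rep.version)  # latest version wins
        return latest.value

    def write(self, value, available=None):
        quorum = self._quorum(self.w, available)
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:                                 # update the whole quorum
            rep.version = new_version
            rep.value = value


store = QuorumStore(n=5, r=2, w=4)
store.write("a")
store.write("b")
reads = [store.read() for _ in range(50)]
```

Every read returns the latest value because any 2 replicas must include at least one of the 4 that took the last write. Setting r = 1 and w = n gives the read-any-write-all protocol.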
ISIS System
Process group: see the Group Communication slide (page 4 of this ppt file)
Group view
[Figure: timeline of processes p1 to p4; p1 joins the group, one process crashes and later rejoins, and multicasts are sent between the membership changes.]
Partially multicast messages must be discarded
Multicast to available processes
Reliable multicast
Causal multicast: see pages 5 & 6 of MPI ppt file
Atomic broadcast: see the Two-Phase Commit Protocol slide (page 7 of this ppt file)
Gossip Architecture
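[Figure: a client talks to a front end (FE, timestamp Tf); the FE talks to replica managers RMi (timestamp Ti), RMj (timestamp Tj), and RMk, which exchange gossip messages.]
Query: the FE sends (Query, Tf) to RMi, which replies with (Value, Ti)
If (Tf < Ti)
return value
else {
waits for RMi to be updated
or
query RMj/RMk}
Update: the FE sends (Update, Tf) to RMj, which replies with an update id
If (Tf > Tj)
update RMj
else {
update
or
ignore and update RMj}
Gossip: RMj sends a gossip message to RMk
If (Tj > Tk)
update RMk
else
discard the gossip message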
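The query and gossip paths can be sketched with scalar timestamps. This is a loose illustration, not the gossip architecture as published: `ReplicaManager` and `FrontEnd` are hypothetical names, a single updating front end is assumed, the update path simply bumps the timestamp, and the query test uses `<=` (an assumption that lets a front end re-read from the replica it last read).

```python
class ReplicaManager:
    def __init__(self, name):
        self.name = name
        self.timestamp = 0
        self.value = None

    def query(self, fe_timestamp):
        """Answer only if this replica is at least as recent as the front end."""
        if fe_timestamp <= self.timestamp:   # assumed <= rather than a strict <
            return self.value, self.timestamp
        return None                          # too stale; the FE must look elsewhere

    def update(self, value):
        # Simplified update: assumes one updating front end at a time.
        self.timestamp += 1
        self.value = value
        return self.timestamp                # serves as the update id

    def gossip_from(self, other):
        if other.timestamp > self.timestamp:
            self.value, self.timestamp = other.value, other.timestamp
        # otherwise the gossip message is discarded


class FrontEnd:
    def __init__(self):
        self.timestamp = 0

    def query(self, rms):
        for rm in rms:
            reply = rm.query(self.timestamp)
            if reply is not None:
                value, ts = reply
                self.timestamp = max(self.timestamp, ts)
                return value
        raise RuntimeError("all replica managers are stale; wait for gossip")

    def update(self, rm, value):
        self.timestamp = max(self.timestamp, rm.update(value))


rmi, rmj, rmk = ReplicaManager("RMi"), ReplicaManager("RMj"), ReplicaManager("RMk")
fe = FrontEnd()
fe.update(rmj, "v1")
seen = fe.query([rmi, rmj])      # RMi is stale, so the value comes from RMj
rmk.gossip_from(rmj)             # gossip brings RMk up to date
after_gossip = fe.query([rmk])
```

The front end never observes a value older than one it has already seen, at the cost of sometimes having to try another replica manager or wait for gossip.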
Bayou System
[Figure: clients send updates through front ends (FE) to replica managers (RM), one of which is the primary RM. Each RM holds committed updates (C0, C1, C2, ..., CN), which are sent first, followed by tentative updates (T0, T1, T2, T3, ..., Tn, Tn+1), which are sent later.]
To make a tentative update committed:
Perform a dependency check
Check conflicts
Check priority
Merge Procedure
Cancel tentative updates
Change tentative updates
Example:
Secretary and other employees: book 3pm
Executive: book 3pm
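The booking example can be sketched as a commit step at the primary. Everything here is an assumption made for illustration: the function `commit_tentative`, the tuple layout, the numeric priorities (the executive outranking the secretary), and the "try an alternative slot, else cancel" merge procedure.

```python
def commit_tentative(committed, tentative):
    """Commit tentative bookings in priority order.

    committed: dict mapping a time slot to its owner (already committed updates)
    tentative: list of (priority, owner, slot, alternative_slots)
    Returns a dict mapping each owner to the slot obtained, or None if cancelled.
    """
    outcome = {}
    # Check priority: higher priority first; the stable sort keeps arrival order for ties.
    for priority, owner, slot, alternatives in sorted(tentative, key=lambda u: -u[0]):
        # Dependency check: is the requested slot still free?
        if slot not in committed:
            committed[slot] = owner
            outcome[owner] = slot
            continue
        # Merge procedure: change the update to an alternative slot if one is free.
        for alt in alternatives:
            if alt not in committed:
                committed[alt] = owner
                outcome[owner] = alt
                break
        else:
            outcome[owner] = None   # no alternative left: cancel the tentative update
    return outcome


committed = {}
tentative = [
    (1, "secretary", "3pm", ["4pm"]),   # arrived first, lower priority
    (2, "executive", "3pm", []),        # arrived later, higher priority
]
outcome = commit_tentative(committed, tentative)
```

Here the executive keeps 3pm and the secretary's tentative booking is changed to 4pm. With no alternative slot it would be cancelled instead.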
Coda File System
1. Normal case:
• Read-any, write-all protocol
• Whenever a client writes back its file, it increments the file version at each server.
2. Network disconnection:
• A client writes back its file to only available servers.
• Version conflicts are detected and resolved automatically when the network is reconnected.
3. Client disconnection:
• A client caches as many files as possible (in hoarding mode, by hoard walking).
• A client works locally if disconnected (in emulation mode).
• A client writes back updated files to servers (in reintegration mode).
[Figure: version vectors at three servers. Servers 1 and 2 go from Version[1,1,1] to Version[2,2,2] to Version[3,3,2]; Server 3 goes from Version[1,1,1] to Version[2,2,2] to Version[2,2,3]. Client states: hoard, emulation, reintegration.]
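Conflict detection with version vectors can be sketched as an element-wise comparison. The function name `compare` and its return labels are hypothetical; the vectors are the ones shown for the three servers.

```python
def compare(v1, v2):
    """Compare two version vectors of equal length.

    Returns "equal", "dominates" (v1 is newer), "dominated" (v2 is newer),
    or "conflict" (each side has an update the other has not seen).
    """
    at_least = all(a >= b for a, b in zip(v1, v2))
    at_most = all(a <= b for a, b in zip(v1, v2))
    if at_least and at_most:
        return "equal"
    if at_least:
        return "dominates"
    if at_most:
        return "dominated"
    return "conflict"


server1 = [3, 3, 2]   # servers 1 and 2 received a write during the partition
server3 = [2, 2, 3]   # server 3 received a different write during the partition
status = compare(server1, server3)
```

Neither vector dominates the other, so the two sides were updated independently and the conflict must be resolved on reconnection.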
Paper Review by Students
ISIS System
Gossip Architecture
Bayou System
Coda