CSS434: Parallel & Distributed Computing

CSS490 Replication & Fault Tolerance
Textbook Ch9 (p440 – 484)
Instructor: Munehiro Fukuda
These slides were compiled from the course textbook and the reference books.
Winter, 2004
CSS490 Fault Tolerance
File Replication
Concepts



Difference between replication and caching
 A replica is associated with a server, whereas a cache is associated with a client.
 A replica focuses on availability, while a cache focuses on locality.
 A replica is more persistent than a cache.
 A cache is contingent upon a replica
Advantages
 Increased availability/reliability
 Performance enhancement (response time and network traffic)
 Scalability and autonomous operation
Requirements
 Naming: no need to be aware of multiple replicas.
 Consistency: data consistency among replicated files.
 Replication control: explicit vs. implicit/lazy replication
 ACID: Atomicity, Consistency, Isolation, and Durability
File Replication
Basic Architectural Model
[Figure: clients send requests through front ends to a set of replica managers. Ex: DNS, Web server]
1. Request: send a client request to a server.
2. Coordination: deliver the request to each replica manager in some order.
3. Execution: process the client request but do not permanently commit it.
4. Agreement: agree on whether the execution will be committed.
5. Response: respond to the front end.
Group Communication

[Figure: a client communicating with a group of replica managers]
Group membership service
 Create and destroy a group.
 Add or withdraw a replica manager to/from a group.
 Detect a failure.
 Notify members of group membership changes.
 Provide clients with a group address.
Message delivery
 Absolute ordering
 Consistent ordering
Absolute Ordering
Linearizability

[Figure: message mi is sent at time Ti and mj at time Tj, with Ti < Tj; every receiver gets mi before mj]

Rule:
 mi must be delivered before mj if Ti < Tj
Implementation:
 A clock synchronized among machines
 A sliding time window used to commit the delivery of messages whose timestamps fall within the window.
Example:
 Distributed simulation
Drawback
 Too strict a constraint
 No absolute synchronized clock
 No guarantee to catch all tardy messages
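The sliding-window idea above can be sketched as a hold-back queue. This is a minimal sketch, not the textbook's implementation: the class name `TimeWindowDelivery`, its methods, and the numeric clock values are all assumptions made for illustration.

```python
import heapq


class TimeWindowDelivery:
    """Hold-back queue: deliver messages in timestamp order once they
    are older than a fixed window (hypothetical names throughout)."""

    def __init__(self, window):
        self.window = window
        self.pending = []                      # min-heap of (timestamp, message)
        self.committed_up_to = float("-inf")   # newest timestamp already committed

    def receive(self, timestamp, message):
        """Buffer a message; return False if it arrived too late."""
        if timestamp <= self.committed_up_to:
            return False                       # tardy: its slot was already committed
        heapq.heappush(self.pending, (timestamp, message))
        return True

    def deliver(self, now):
        """Commit every buffered message whose timestamp left the window."""
        horizon = now - self.window
        out = []
        while self.pending and self.pending[0][0] <= horizon:
            out.append(heapq.heappop(self.pending)[1])
        self.committed_up_to = max(self.committed_up_to, horizon)
        return out


q = TimeWindowDelivery(window=5)
q.receive(12, "mj")          # arrives first but carries the later timestamp
q.receive(10, "mi")
first = q.deliver(now=16)    # only mi is old enough to commit
second = q.deliver(now=20)   # now mj is committed
late_ok = q.receive(9, "late")
```

The last call shows the drawback listed above: a message that arrives after its window has been committed can only be rejected, not placed in order.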
Consistent (Total) Ordering
Sequential Consistency

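[Figure: mi is sent at Ti and mj at Tj, with Ti < Tj; both receivers get mj before mi, so the order is the same everywhere even though it ignores the timestamps]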

Rule:
 Messages are received in the same order at every receiver (regardless of their timestamps).
Implementation:
 A message is sent to a sequencer, assigned a sequence number, and finally multicast to receivers.
 A receiver retrieves messages in increasing sequence-number order.
Example:
 Replicated database update
Drawback:
 A centralized algorithm
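The sequencer scheme above can be sketched as follows. This is a minimal sketch under assumed names (`Sequencer`, `Receiver`); the network is modeled by calling `receive` directly, in whatever order the messages happen to arrive.

```python
class Sequencer:
    """Central process that stamps each message with the next sequence number."""

    def __init__(self):
        self.next_seq = 0

    def assign(self, message):
        seq = self.next_seq
        self.next_seq += 1
        return seq, message


class Receiver:
    """Holds back out-of-order messages and delivers them in sequence order."""

    def __init__(self):
        self.expected = 0      # next sequence number to deliver
        self.held = {}         # seq -> message, waiting for earlier ones
        self.delivered = []

    def receive(self, seq, message):
        self.held[seq] = message
        while self.expected in self.held:
            self.delivered.append(self.held.pop(self.expected))
            self.expected += 1


sequencer = Sequencer()
first = sequencer.assign("mj")    # mj reaches the sequencer first
second = sequencer.assign("mi")

r1, r2 = Receiver(), Receiver()
r1.receive(*first)
r1.receive(*second)
r2.receive(*second)               # arrives out of order at r2
r2.receive(*first)
```

Both receivers end with the same delivery order, which is all that consistent ordering requires. The single `Sequencer` object is the centralized component named as the drawback.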
Two-Phase Commit Protocol
[Figure: state diagrams of the coordinator and of Worker 1 and Worker 2; transitions are listed as received message / sent message]
Coordinator (INIT, WAIT, ABORT, COMMIT)
 INIT to WAIT: Commit / Vote-request
 WAIT to ABORT: Vote-abort / Global-abort
 WAIT to COMMIT: Vote-commit / Global-commit
Worker (INIT, READY, ABORT, COMMIT)
 INIT to READY: Vote-request / Vote-commit
 INIT to ABORT: Vote-request / Vote-abort
 READY to ABORT: Global-abort / Ack
 READY to COMMIT: Global-commit / Ack
Other possible cases:
 The coordinator didn't receive all vote-commits. → Time out and send a global-abort.
 A worker didn't receive a vote-request. → All workers eventually receive a global-abort.
 A worker didn't receive a global-commit. → Time out and check the other workers' status.
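The two phases can be sketched in a few lines. This is a minimal single-process sketch, not a networked implementation: the `Worker` class, its `will_commit` and `reachable` flags, and the function `two_phase_commit` are hypothetical, and a lost vote is modeled as `None` to stand in for a coordinator time-out.

```python
class Worker:
    def __init__(self, name, will_commit=True, reachable=True):
        self.name = name
        self.will_commit = will_commit
        self.reachable = reachable     # False models a lost vote-request
        self.state = "INIT"

    def on_vote_request(self):
        if not self.reachable:
            return None                # coordinator will time out on this worker
        if self.will_commit:
            self.state = "READY"
            return "vote-commit"
        self.state = "ABORT"
        return "vote-abort"

    def on_global(self, decision):
        self.state = "COMMIT" if decision == "global-commit" else "ABORT"
        return "ack"


def two_phase_commit(workers):
    # Phase 1: collect votes from every worker.
    votes = [w.on_vote_request() for w in workers]
    # Phase 2: commit only if all votes are vote-commit; otherwise abort.
    if all(v == "vote-commit" for v in votes):
        decision = "global-commit"
    else:
        decision = "global-abort"
    for w in workers:
        w.on_global(decision)
    return decision


ok = [Worker("w1"), Worker("w2")]
refused = [Worker("w1"), Worker("w2", will_commit=False)]
lost = [Worker("w1"), Worker("w2", reachable=False)]
results = [two_phase_commit(group) for group in (ok, refused, lost)]
```

The third group corresponds to the first two failure cases listed above: a missing vote makes the coordinator send a global-abort, which every worker eventually receives.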
Multi-copy Update Problem

Read-only replication
 Allow the replication of only immutable files.
Primary backup replication
 Designate one copy as the primary copy and all the others as secondary copies.
Active backup replication
 Access any or all of the replicas
  Read-any-write-all protocol
  Available-copies protocol
  Quorum-based consensus
Primary-Copy Replication
[Figure: clients and front ends; each front end talks to the primary replica manager, which forwards updates to the backup replica managers]
1. Request: The front end sends a request to the primary replica.
2. Coordination: The primary takes the request atomically.
3. Execution: The primary executes the request and stores the results.
4. Agreement: The primary sends the updates to all the backups and receives an ack from them.
5. Response: The primary replies to the front end.
Advantage: an easy implementation, linearizable, coping with n-1 crashes.
Disadvantage: large overhead, especially if the failing primary must be replaced with a backup.
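The primary-copy steps can be sketched as below. This is a minimal in-memory sketch: the classes `ReplicaManager` and `PrimaryBackupGroup`, and the rule that the first live replica acts as primary, are assumptions for illustration, not part of the protocol description above.

```python
class ReplicaManager:
    def __init__(self):
        self.store = {}
        self.alive = True

    def apply(self, key, value):
        self.store[key] = value
        return "ack"


class PrimaryBackupGroup:
    def __init__(self, n):
        self.replicas = [ReplicaManager() for _ in range(n)]

    @property
    def primary(self):
        # Assumption: the first live replica takes over as primary.
        for rm in self.replicas:
            if rm.alive:
                return rm
        raise RuntimeError("all replicas crashed")

    def write(self, key, value):
        primary = self.primary
        primary.apply(key, value)                      # execution at the primary
        for rm in self.replicas:
            if rm is not primary and rm.alive:
                if rm.apply(key, value) != "ack":      # agreement with each backup
                    raise RuntimeError("backup did not acknowledge")
        return "ok"                                    # response to the front end

    def read(self, key):
        return self.primary.store.get(key)


group = PrimaryBackupGroup(3)
status = group.write("x", 1)
group.replicas[0].alive = False    # the primary crashes
group.replicas[1].alive = False    # so does one backup
survivor_value = group.read("x")   # the last replica still has the update
```

With n = 3, the value survives n-1 = 2 crashes because every backup acknowledged the update before the primary replied.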
Active Replication
[Figure: clients, front ends, and three replica managers; each front end multicasts to every replica manager]
1. Request: The front end multicasts the request to all replicas.
2. Coordination: All replicas take the request in the same sequential order.
3. Execution: Every replica executes the request.
4. Agreement: No agreement is needed.
5. Response: Each replica replies to the front end.
Advantage: achieves sequential consistency, copes with (n/2 – 1) Byzantine failures
Disadvantage: no longer linearizable
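One way a front end can mask a faulty replica is to take the majority of the replies. The sketch below assumes that approach; the `Replica` class, its `faulty` flag, and `front_end_request` are hypothetical names, and a faulty replica is modeled simply as one that returns a wrong value.

```python
from collections import Counter


class Replica:
    def __init__(self, faulty=False):
        self.value = 0
        self.faulty = faulty

    def execute(self, delta):
        self.value += delta
        # A faulty replica returns an arbitrary wrong answer.
        return -1 if self.faulty else self.value


def front_end_request(replicas, delta):
    # Request + execution: every replica processes the same request.
    replies = [r.execute(delta) for r in replicas]
    # Response: accept the answer given by a strict majority of replicas.
    answer, count = Counter(replies).most_common(1)[0]
    if count <= len(replicas) // 2:
        raise RuntimeError("no majority among replies")
    return answer


replicas = [Replica(), Replica(), Replica(), Replica(faulty=True)]
first = front_end_request(replicas, 5)
second = front_end_request(replicas, 5)
```

With n = 4 replicas, one faulty reply is outvoted by the three correct ones, which matches the (n/2 – 1) bound stated above.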
Read-Any-Write-All Protocol
[Figure: one front end reads from any one of the replica managers; another writes to all of them]
Read
 Lock any one of the replicas for a read
Write
 Lock all of the replicas for a write
Sequential consistency
Cannot tolerate even one failed replica on a write.
Available-Copies Protocol

[Figure: one front end reads from any one of the replica managers; another writes to all available replicas, skipping a failed one (X)]
Read
 Lock any one of the replicas for a read
Write
 Lock all available replicas for a write
Recovering replica
 Bring itself up to date by copying from other servers before accepting any user request.
Better availability
Cannot cope with network partition. (Inconsistency between the two sub-divided network groups)
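The read, write, and recovery rules can be sketched as below. This is a minimal sketch without locking; the `Replica` class and the functions `read`, `write`, and `recover` are hypothetical names chosen for illustration.

```python
class Replica:
    def __init__(self):
        self.data = {}
        self.available = True


def read(replicas, key):
    # Read from any one available replica.
    for r in replicas:
        if r.available:
            return r.data.get(key)
    raise RuntimeError("no available replica")


def write(replicas, key, value):
    # Write to all replicas that are currently available.
    targets = [r for r in replicas if r.available]
    if not targets:
        raise RuntimeError("no available replica")
    for r in targets:
        r.data[key] = value
    return len(targets)


def recover(replica, replicas):
    # Copy state from another available replica before serving requests.
    source = next(r for r in replicas if r.available and r is not replica)
    replica.data = dict(source.data)
    replica.available = True


replicas = [Replica(), Replica(), Replica()]
replicas[1].available = False          # one replica fails
written = write(replicas, "x", 1)      # the write still succeeds on the other two
stale = dict(replicas[1].data)         # the failed replica missed the update
recover(replicas[1], replicas)
```

If the "failed" replica were in fact alive on the other side of a network partition, both sides would accept writes independently, which is the inconsistency noted above.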
Quorum-Based Protocols
#replicas in read quorum + #replicas in write quorum > n

[Figure: eight replica managers; one front end contacts a read quorum and another contacts a write quorum, and the two quorums overlap]
Read-any-write-all: r = 1, w = n
Read
 Retrieve the read quorum.
 Select the replica with the latest version.
 Perform a read on it.
Write
 Retrieve the write quorum.
 Find the latest version and increment it.
 Perform a write on the entire write quorum.
If a sufficient number of replicas cannot be retrieved for the read/write quorum, the operation must be aborted.
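The quorum rules can be sketched as below. This is a minimal sketch: the classes `Replica` and `QuorumStore` are hypothetical, and a quorum is simply the first r (or w) available replicas, which is an assumption made to keep the example short.

```python
class Replica:
    def __init__(self):
        self.version = 0
        self.value = None
        self.available = True


class QuorumStore:
    def __init__(self, n, r, w):
        if r + w <= n:
            raise ValueError("need r + w > n so that quorums overlap")
        self.replicas = [Replica() for _ in range(n)]
        self.r, self.w = r, w

    def _gather(self, size):
        live = [rep for rep in self.replicas if rep.available]
        if len(live) < size:
            raise RuntimeError("quorum unavailable; operation aborted")
        return live[:size]

    def read(self):
        quorum = self._gather(self.r)
        latest = max(quorum, key=lambda rep: rep.version)
        return latest.value

    def write(self, value):
        quorum = self._gather(self.w)
        new_version = max(rep.version for rep in quorum) + 1
        for rep in quorum:
            rep.version = new_version
            rep.value = value


store = QuorumStore(n=5, r=3, w=3)
store.write("a")                         # lands on replicas 0, 1, 2
store.replicas[0].available = False
store.replicas[1].available = False
value = store.read()                     # quorum is replicas 2, 3, 4

store.replicas[2].available = False      # only two replicas left
try:
    store.read()
    aborted = False
except RuntimeError:
    aborted = True
```

The read quorum shares replica 2 with the write quorum, so the latest version is always visible. Setting r = 1 and w = n gives the read-any-write-all protocol.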
ISIS System


Process group: see the Group Communication slide (page 4 of this ppt file)
Group view
[Figure: timeline of processes p1 to p4: p1 joins the group, one process crashes and later rejoins, and each multicast goes to the processes available at that time]
 Partially multicast messages must be discarded
 Multicast to available processes
Reliable multicast

Causal multicast: see pages 5 & 6 of MPI ppt file

Atomic broadcast: see the Two-Phase Commit Protocol slide (page 7 of this ppt file)
Gossip Architecture
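[Figure: clients talk to front ends (FE, timestamp Tf); replica managers RMi (Ti), RMj (Tj), and RMk exchange gossip messages]
Query: the FE sends (Query, Tf) to RMi, which replies with (Value, Ti).
 If (Tf < Ti)
  return value
 else {
  waits for RMi to be updated
  or
  query RMj/RMk}
Update: the FE sends (Update, Tf) to RMj, which replies with an update id.
 If (Tf > Tj)
  update RMj
 else {
  update Client
  or
  ignore and update RMj}
Gossip: RMj sends a gossip message to RMk.
 If (Tj > Tk)
  update RMk
 else
  discard the gossip message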
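The timestamp checks in the query and gossip steps can be sketched as below. This is a deliberately simplified sketch with several assumptions: timestamps are single integers rather than the vector timestamps a real gossip system would need, the query test uses `<=` so that a front end can re-read a value it has already seen, and all class and method names are hypothetical.

```python
class ReplicaManager:
    def __init__(self):
        self.timestamp = 0
        self.value = None

    def update(self, value):
        self.timestamp += 1
        self.value = value
        return self.timestamp

    def gossip_to(self, other):
        """Send state to another RM; it is applied only if it is newer."""
        if self.timestamp > other.timestamp:
            other.timestamp, other.value = self.timestamp, self.value
            return True
        return False                      # gossip message discarded


class FrontEnd:
    def __init__(self):
        self.timestamp = 0                # Tf: newest state this client has seen

    def update(self, rm, value):
        self.timestamp = rm.update(value)

    def query(self, replica_managers):
        for rm in replica_managers:
            if self.timestamp <= rm.timestamp:   # RM is at least as fresh as the client
                self.timestamp = rm.timestamp
                return rm.value
        return None                       # caller must wait or try other RMs


rmi, rmj = ReplicaManager(), ReplicaManager()
fe = FrontEnd()
fe.update(rmj, "v1")
too_old = fe.query([rmi])                 # RMi has not heard of the update yet
applied = rmj.gossip_to(rmi)              # gossip brings RMi up to date
fresh = fe.query([rmi])
discarded = rmi.gossip_to(rmj)            # nothing newer to offer
```

The first query fails because RMi is older than what the client has already written, so returning its value would move the client backwards in time.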
Bayou System
[Figure: clients send updates through front ends (FE) to replica managers (RM), one of which is the primary. Each RM holds committed updates (C0 C1 C2 ... CN), which are sent first, and tentative updates (T0 T1 T2 T3 ... Tn Tn+1), which are sent later.]
To make a tentative update committed:
 Perform a dependency check
  Check conflicts
  Check priority
 Merge Procedure
  Cancel tentative updates
  Change tentative updates
Example: a secretary and other employees book 3pm; an executive also books 3pm.
Coda File System
1. Normal case:
 • Read-any, write-all protocol
 • Whenever a client writes back its file, it increments the file version at each server.
2. Network disconnection:
 • A client writes back its file to only the available servers.
 • Version conflicts are detected and resolved automatically when the network is reconnected.
3. Client disconnection:
 • A client caches as many files as possible (in hoard walking).
 • A client works locally if disconnected (in emulation mode).
 • A client writes back updated files to servers (in reintegration mode).
[Figure: version vectors after successive writes (W). Servers 1 and 2: Version[1,1,1], Version[2,2,2], Version[3,3,2]. Server 3: Version[1,1,1], Version[2,2,2], Version[2,2,3].]
[Figure: client states: hoard, emulation, reintegration]
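The conflict in the version-vector figure can be detected by comparing the vectors element by element. The sketch below is an assumption about how such a check might look; the function name `compare` and its return strings are hypothetical.

```python
def compare(v1, v2):
    """Compare two version vectors of equal length."""
    if v1 == v2:
        return "equal"
    if all(a >= b for a, b in zip(v1, v2)):
        return "dominates"     # v1 has seen every update that v2 has
    if all(a <= b for a, b in zip(v1, v2)):
        return "dominated"     # v2 is strictly newer; v1 can be overwritten
    return "conflict"          # each side has updates the other lacks


before_partition = compare([2, 2, 2], [1, 1, 1])
after_partition = compare([3, 3, 2], [2, 2, 3])
same_side = compare([3, 3, 2], [3, 3, 2])
```

Servers 1 and 2 hold [3,3,2] while server 3 holds [2,2,3]: neither vector dominates the other, so the writes happened on both sides of the partition and must be reconciled at reconnection.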
Paper Review by Students




ISIS System
Gossip Architecture
Bayou System
Coda