Distributed Deadlocks and Transaction Recovery.

Anil Bheemidi.
Deadlocks in distributed systems






A deadlock is a state in which each member of a group of
transactions is waiting for some other member to release a lock.
The result is that every process in the group hangs. Deadlocks occur most
commonly in multitasking and client/server (interactive)
environments.
Two types: communication deadlock and resource deadlock.
Same conditions for distributed systems as centralized
Harder to detect, avoid, prevent
Strategies
– ignore
– detect
– prevent
– avoid
A typical distributed transaction.
Deadlocks in distributed systems are similar to deadlocks in centralized systems.
In centralized systems, we have one operating system that can oversee resource
allocation and know whether deadlocks are (or will be) present. With distributed
processes and resources it becomes harder to detect, avoid, and prevent
deadlocks.
Several strategies can be used to handle deadlocks:
ignore: we can ignore the problem. This is one of the most popular solutions.
detect: we can allow deadlocks to occur, then detect that we have a deadlock in
the system, and then deal with the deadlock
prevent: we can place constraints on resource allocation to make deadlocks
impossible
avoid: we can choose resource allocation carefully and make deadlocks
impossible.
Deadlock avoidance is almost never used in practice (either in distributed or centralized systems).
The problem with deadlock avoidance is that the algorithm will need to know
resource usage requirements in advance so as to schedule them properly.
Deadlock detection



Preventing or avoiding deadlocks can be difficult.
Detecting them is easier.
When a deadlock is detected:
- kill off one or more processes
-> annoyed users
- if the system is based on atomic transactions, abort one or more transactions
-> transactions are designed to withstand being aborted
-> the system is restored to the state it was in before the transaction began
-> the transaction can start a second time
-> resource allocation in the system may be different by then, so the transaction may
succeed
‘Edge-chasing’ to detect deadlock

A ‘probe’ message is generated when a process has to wait for a resource
- it is sent to the processes holding the needed resources
- the message contains three numbers
-> the process that just blocked, the process sending the message, and the process to
whom it is sent
When the message arrives, the recipient checks whether it is itself waiting for any processes
- if so, it updates the message (keeping the first number, replacing the second and third) and forwards it
If a message goes all the way around and comes back to the original sender, a
cycle exists
-> the system is deadlocked.
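The probe-passing steps above can be sketched as a small simulation. This is only an illustration under stated assumptions: the `waits_for` table (which process waits for which) and the function name `detects_deadlock` are hypothetical, and real probes would travel as network messages between machines rather than through a local list.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical wait-for table: process id -> processes holding resources it needs.
WaitFor = Dict[int, List[int]]


def detects_deadlock(initiator: int, waits_for: WaitFor) -> bool:
    """Return True if a probe started by `initiator` comes back to it."""
    # Each probe is (process that blocked, sender, receiver), as in the text.
    probes: List[Tuple[int, int, int]] = [
        (initiator, initiator, holder) for holder in waits_for.get(initiator, [])
    ]
    forwarded: Set[Tuple[int, int]] = set()  # edges already used, to avoid re-sending
    while probes:
        blocked, _sender, receiver = probes.pop()
        if receiver == blocked:
            return True  # the probe went all the way around: a cycle exists
        for holder in waits_for.get(receiver, []):
            if (receiver, holder) not in forwarded:
                forwarded.add((receiver, holder))
                # Update the message: keep the first field, replace the other two.
                probes.append((blocked, receiver, holder))
    return False


# Process 0 waits for 1, 1 waits for 2, 2 waits for 0: a cycle.
print(detects_deadlock(0, {0: [1], 1: [2], 2: [0]}))
```

Note that the probe only reports a cycle that includes the process that started it; a process waiting on a deadlocked group it is not part of gets no probe back in this sketch.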
Distributed deadlock prevention


Design systems so that deadlocks are structurally impossible.
Various techniques exist
- allow processes to hold only one resource at a time
- require all processes to request all resources initially, and to release them all
when asking for a new one
With global time and atomic transactions, two other techniques are possible
- each transaction is assigned a global timestamp when it starts
- when one process is about to block waiting for a resource used by another,
check which has the larger timestamp (that is, which is ‘younger’)
- allow a process to wait only if the waiting process has a higher (younger)
timestamp than the process it is waiting for
-> timestamps then always decrease along any chain of waiting processes, so a cycle cannot form
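The timestamp rule can be illustrated with a few lines of code. This is a minimal sketch, assuming integer timestamps where a larger value means a younger transaction; the names `may_wait` and `allowed_edges` are made up for illustration. The slides do not say what happens to a request that is refused (abort, restart, or preemption), so the sketch leaves that out.

```python
from typing import List, Tuple


def may_wait(waiter_ts: int, holder_ts: int) -> bool:
    """Rule from the text: only a younger (larger timestamp) process may wait."""
    return waiter_ts > holder_ts


def allowed_edges(requests: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Keep only the (waiter, holder) pairs that the rule permits."""
    return [(w, h) for (w, h) in requests if may_wait(w, h)]


# Three transactions try to form the cycle 10 -> 20 -> 30 -> 10.
# Only the edge from the youngest (30) to the oldest (10) is permitted,
# so the cycle can never close.
print(allowed_edges([(10, 20), (20, 30), (30, 10)]))  # [(30, 10)]
```

Because every permitted edge goes from a larger timestamp to a smaller one, following a chain of waits always reaches strictly older transactions and cannot return to its starting point.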
Transaction Recovery.

Why we need Recovery

What is required for Recovery
Why transaction recovery

Transaction recovery is required to ensure failure atomicity and
durability in the presence of failure
- transaction abort
-> update-in-place & deferred update
- system crash
-> logging & shadow versions
When things go wrong…
A transaction processing service is ‘fault-tolerant’ if it fails temporarily but can
recover without loss of data.

There are many variants.

We are concerned with fault-tolerance in a simple distributed transaction
processing service.

Atomicity means all or nothing; a requirement that
-the effects of all committed transactions are reflected in the data items and
-none of the effects of incomplete or aborted transactions are reflected in the
data items.
● Durability and failure atomicity
- Durability requires that data items are saved in permanent storage, so that
when a two-phase commit protocol commits, the changes to the data items
are permanently stored.
- Failure atomicity requires that the effects of a transaction are atomic even when
a client or server fails.

Recovery Manager
● Data items are held in volatile memory (database memory) until the transaction commits,
at which point they are transferred to permanent storage and recorded in a recovery file.
● Recovery therefore consists of restoring the server from the most recent
recovery file.
● Recovery is performed by the ‘Recovery Manager’ (RM), which
- saves the data of committed transactions in permanent storage (the recovery file)
- restores the server’s data items after a crash
- reorganizes the recovery file to improve recovery efficiency
The Intentions list:
-> A server must keep track of the data accessed by clients’ transactions.
-> Transactions have transaction identifiers.
-> Each transaction has an intentions list (a list of the names and values
of all the data items altered by that transaction).
Intentions list



When a transaction commits, the server uses the intentions list to identify the
data items that it affected.
The committed version of each data item is replaced with the tentative version
made by that transaction, and the value is written to the server’s recovery file.
When a transaction aborts, the server uses the intentions list to delete all the
tentative versions of data items made by that transaction.
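The commit and abort handling described above can be sketched in a few lines. This is an illustrative model only: the `Server` class, its method names, and the use of an in-memory list as the recovery file are assumptions made for the example, not part of the original material.

```python
from typing import Any, Dict, List, Tuple


class Server:
    """Toy server that keeps tentative versions per transaction."""

    def __init__(self) -> None:
        self.committed: Dict[str, Any] = {}              # committed versions
        self.tentative: Dict[int, Dict[str, Any]] = {}   # tid -> tentative versions
        self.recovery_file: List[Tuple[int, str, Any]] = []  # stand-in for the file

    def write(self, tid: int, name: str, value: Any) -> None:
        # A write only creates a tentative version for this transaction.
        self.tentative.setdefault(tid, {})[name] = value

    def intentions_list(self, tid: int) -> List[Tuple[str, Any]]:
        # Names and values of all data items altered by the transaction.
        return list(self.tentative.get(tid, {}).items())

    def commit(self, tid: int) -> None:
        # Replace committed versions with tentative ones and record each value.
        for name, value in self.intentions_list(tid):
            self.committed[name] = value
            self.recovery_file.append((tid, name, value))
        self.tentative.pop(tid, None)

    def abort(self, tid: int) -> None:
        # Delete every tentative version made by the transaction.
        self.tentative.pop(tid, None)


server = Server()
server.write(1, "A", 100)
server.write(2, "B", 50)
server.commit(1)
server.abort(2)
print(server.committed)  # {'A': 100}
```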
Entries in the recovery file:
● Data item
- a value of a data item
● Transaction status
- transaction identifier and transaction status (prepared,
committed, aborted)
● Intentions list
- transaction identifier and a sequence of intentions, each of
which consists of the identifier of a data item and the position in
the recovery file of the value of that data item
RECOVERY OF DATA ITEMS:
- A server is restarted after a failure.
- Default values are restored and then the RM takes over.
- The RM must restore the data items so that they reflect the effects of all
committed transactions, performed in the correct order.
- The effects of incomplete or aborted transactions must not be included.
RECOVERY OF DATA ITEMS
- The recovery file is read backwards, restoring committed data items as
necessary
- This is made more efficient by the use of the intentions lists
- Checkpointing avoids the need to read the whole recovery file after a
failure
Recovery schemes:
● logging
● shadow versions
Logging





This is one of the uses of the recovery file (in this instance it is sometimes
called the log).
It is a log of all the values of data items, transaction status entries and
intentions lists of transactions processed by the server.
Normally the recovery manager is called whenever a transaction prepares to
commit, commits or aborts.
When a server is prepared to commit a transaction, the RM appends to the
recovery file all the data items in the transaction’s intentions list, followed by the current
transaction status (prepared) and the intentions list.
When the transaction is committed or aborted, the RM appends the
corresponding transaction status to the recovery file.
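The order in which the RM appends entries can be sketched as follows. This is a simplified model under stated assumptions: the log is an in-memory Python list rather than a file, the tuple layouts are invented for the example, and the prepared status and the intentions list are stored together in one entry for brevity.

```python
from typing import Any, Dict, List, Tuple

Log = List[Tuple]  # in-memory stand-in for the recovery file


def log_prepare(log: Log, tid: int, tentative: Dict[str, Any]) -> None:
    """Append the data items, then the prepared status with the intentions list."""
    intentions = []
    for name, value in tentative.items():
        log.append(("data", name, value))
        # Each intention records where in the log the value was written.
        intentions.append((name, len(log) - 1))
    log.append(("status", tid, "prepared", intentions))


def log_outcome(log: Log, tid: int, status: str) -> None:
    """Append the final status once the transaction commits or aborts."""
    if status not in ("committed", "aborted"):
        raise ValueError("status must be 'committed' or 'aborted'")
    log.append(("status", tid, status))


log: Log = []
log_prepare(log, 1, {"A": 80, "B": 220})
log_outcome(log, 1, "committed")
for position, entry in enumerate(log):
    print(position, entry)
```

Storing positions rather than values in the intentions list mirrors the entry layout described earlier, where each intention points at the place in the recovery file that holds the data item's value.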
Log for banking service

Recovery process
– when a server is restarted, it sets data items to their default values and
lets the recovery manager carry out recovery
– the recovery manager reads the recovery file “backward”
-> if a transaction is committed, restore its committed values
-> if not, record the status of the transaction as aborted
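The backward pass can be sketched as below. This is an illustration under assumptions: the log is an in-memory list, the entry layouts (`("data", name, value)` and `("status", tid, status, intentions)`) are invented for the example, and the account names and balances are made-up sample data.

```python
from typing import Any, Dict, List, Tuple


def recover(log: List[Tuple], defaults: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild committed values by reading the log from newest to oldest."""
    data = dict(defaults)          # the server first restores default values
    restored = set()               # items already set by a more recent commit
    outcome: Dict[int, str] = {}   # tid -> final status seen so far
    unresolved: List[int] = []     # prepared, but no commit or abort found
    for entry in reversed(log):
        if entry[0] != "status":
            continue
        tid, status = entry[1], entry[2]
        if status in ("committed", "aborted"):
            outcome.setdefault(tid, status)
        elif status == "prepared":
            if tid not in outcome:
                unresolved.append(tid)
            elif outcome[tid] == "committed":
                for name, position in entry[3]:
                    if name not in restored:
                        data[name] = log[position][2]
                        restored.add(name)
    for tid in unresolved:
        log.append(("status", tid, "aborted"))  # record it as aborted
    return data


example_log = [
    ("data", "A", 80), ("data", "B", 220),
    ("status", 1, "prepared", [("A", 0), ("B", 1)]), ("status", 1, "committed"),
    ("data", "A", 50),
    ("status", 2, "prepared", [("A", 4)]), ("status", 2, "committed"),
    ("data", "B", 999),
    ("status", 3, "prepared", [("B", 7)]),  # crashed before an outcome was logged
]
print(recover(example_log, {"A": 0, "B": 0, "C": 0}))  # {'A': 50, 'B': 220, 'C': 0}
```

Reading backward means the most recent committed value of each item is found first, so older values of the same item can be skipped.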
Checkpointing



Process of reclaiming space in the recovery file
-> to reduce the space required for logging
-> to speed up the recovery process
Records in the recovery file can be discarded, except for
-> the current committed values of data items
-> transaction status entries and intentions lists of transactions that have
not yet been fully resolved
Checkpoint
-> a mark in the recovery file showing where the checkpoint process began
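A checkpoint that keeps only the entries listed above might look like the following sketch. The assumptions are loud ones: the log is an in-memory list with invented tuple layouts, a transaction counts as "resolved" here as soon as a committed or aborted status appears (a real distributed service may need to wait longer), and the function name `checkpoint` is hypothetical.

```python
from typing import Any, Dict, List, Tuple


def checkpoint(log: List[Tuple], committed: Dict[str, Any]) -> List[Tuple]:
    """Build a new, shorter recovery file from the old one."""
    resolved = {
        e[1] for e in log
        if e[0] == "status" and e[2] in ("committed", "aborted")
    }
    new_log: List[Tuple] = [("checkpoint",)]  # mark where the checkpoint begins
    # Keep the current committed value of every data item.
    for name, value in committed.items():
        new_log.append(("data", name, value))
    # Keep status entries and intentions lists of unresolved transactions.
    for e in log:
        if e[0] == "status" and e[2] == "prepared" and e[1] not in resolved:
            intentions = []
            for name, old_position in e[3]:
                new_log.append(log[old_position])               # copy the value
                intentions.append((name, len(new_log) - 1))     # new position
            new_log.append(("status", e[1], "prepared", intentions))
    return new_log


old_log = [
    ("data", "A", 80), ("status", 1, "prepared", [("A", 0)]), ("status", 1, "committed"),
    ("data", "A", 50), ("status", 2, "prepared", [("A", 3)]), ("status", 2, "committed"),
    ("data", "B", 999), ("status", 3, "prepared", [("B", 6)]),
]
for entry in checkpoint(old_log, {"A": 50}):
    print(entry)
```

Positions inside the intentions lists have to be recomputed because the copied values land at different offsets in the new file.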
Shadow versions






The previous recovery technique stored all the recovery data in
the log file, which is read in reverse during the recovery process.
The shadow versions technique reduces the detail stored in the
recovery file and uses a map to access data items held in a
version store.
When a transaction prepares to commit, changed data items are
appended to the version store.
The versions written are shadows of the previous committed
versions.
When a transaction commits, a new map is made. The switch
between the old and new maps must be done atomically.
To recover, the recovery manager reads the map file to locate
the data items.
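The map switch can be sketched with a map file that is replaced in one step. This is a rough illustration, not a definitive design: the `ShadowStore` class and the file name `map.json` are hypothetical, the version store is kept in memory for brevity (a real one would live on disk), and atomicity relies on `os.replace`, which renames the new file over the old one in a single operation.

```python
import json
import os
import tempfile
from typing import Any, Dict, List


class ShadowStore:
    def __init__(self, directory: str) -> None:
        self.map_path = os.path.join(directory, "map.json")
        self.versions: List[Any] = []  # version store (in memory for this sketch)
        self._write_map({})

    def _write_map(self, new_map: Dict[str, int]) -> None:
        tmp_path = self.map_path + ".new"
        with open(tmp_path, "w") as f:
            json.dump(new_map, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.map_path)  # the switch between the two maps

    def read_map(self) -> Dict[str, int]:
        with open(self.map_path) as f:
            return json.load(f)

    def prepare(self, changes: Dict[str, Any]) -> Dict[str, int]:
        # Append shadow versions; the committed map is left untouched.
        shadow = {}
        for name, value in changes.items():
            self.versions.append(value)
            shadow[name] = len(self.versions) - 1
        return shadow

    def commit(self, shadow: Dict[str, int]) -> None:
        new_map = self.read_map()
        new_map.update(shadow)
        self._write_map(new_map)

    def recover(self) -> Dict[str, Any]:
        # Recovery only needs the map to locate the committed versions.
        return {name: self.versions[pos] for name, pos in self.read_map().items()}


store = ShadowStore(tempfile.mkdtemp())
store.commit(store.prepare({"A": 100, "B": 200}))
pending = store.prepare({"A": 80})   # prepared but not yet committed
print(store.recover())               # {'A': 100, 'B': 200}
store.commit(pending)
print(store.recover())               # {'A': 80, 'B': 200}
```

Until the map is switched, readers and the recovery manager still see the previous committed versions, which is why a crash between prepare and commit leaves no trace in the recovered state.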