Distributed Deadlock Detection

Gilbert K. Cheung
December 2004
Abstract – Without distributed shared memory, distributed systems are prone to deadlocks. A deadlock is the
result of an uncontrolled sequence of resource requests and releases among the processes of a distributed
system. This survey paper presents system models and deadlock handling techniques that deal with the
problem. Selected algorithms are also presented to show how distributed deadlocks can be detected.
Index Terms – Distributed system, deadlock detection, wait-for-graph
I. INTRODUCTION
When computers start to work together, interesting problems arise. As there is no shared memory, deadlock is
one of those problems. A primary motivation for using distributed systems is the possibility of resource sharing
[5]. A process requests and releases resources in an order that is not known a priori. In terms of locking, a
process holds the lock on a resource while it controls that resource. Because processes can request and release
resources in any order, deadlocks may occur in the system. A deadlock occurs when every process in a set is
waiting for the lock on some resource held by another process in the same set. In this paper, we present several
algorithms that handle deadlocks in a distributed system.
A deadlock can be resolved by aborting one or more processes in the deadlocked set and restarting them so that
their previous state is resumed [2]. A process is aborted by releasing all of the resources it holds and
withdrawing all of the resource requests it has made.
The rest of the paper is organized as follows. Section II discusses the system model for deadlock detection in
distributed systems. Section III covers the methods for handling deadlocks in a distributed system. Section IV
presents selected algorithms for deadlock detection. Section V discusses deadlock detection. Finally,
Section VI concludes the paper.
II. SYSTEM MODELS
A distributed system is a network that consists of a set of sites interconnected by communication links. Within a
site, there are processes and resources. Resources are managed by resource controllers. Multiple processes and
resources can coexist at the same site. When they are not at the same site, they communicate with each other by
messages. We assume that the communication links are reliable and follow a first-in-first-out ordering rule.
Reliability and ordering can be implemented with a combination of retransmissions, acknowledgements and
sequence numbers. There is no guaranteed upper bound on message delivery time, but process Pa can assume that
a message sent by Pa to Pb will eventually be delivered to Pb. We further assume that process aborts, failures
and malicious behavior do not occur.
At any given time, a process is either idle or executing. A process is idle when it is waiting for a resource
that is held by another process or for a message that another process has not yet sent. A process is executing
when it is not idle.
Resource and Deadlock model
Two deadlock models are of interest in the research literature: the resource model and the communication
model [3]. We discuss both.
Resource Model
In the resource model, processes make requests to acquire resources, such as data objects in a distributed
database. A process can wait for several resources simultaneously and cannot proceed until it has acquired the
locks on all of them. A set of processes is resource-deadlocked if every process in the set is waiting for some
other process in the set to release the lock on a resource [3]. The system model in [1] divides messages into
two categories: computation messages and control messages. A computation message is any message that is sent
because of the execution of an application process. It is either a REQUEST, REPLY, or CANCEL message. A REQUEST
message is sent to a resource controller when a process wants to access that resource. A REPLY message is sent
to a requesting process when the resource controller has determined that the resource is ready to be accessed
by that process. A CANCEL message is sent to a resource controller when the process decides to cancel a request
it made earlier for that resource. A resource requested by a process is said to be in the dependent set of the
process. A control message is any message that is sent because of the execution of the deadlock detection
algorithm.
We discuss the deadlock detection algorithms under the resource model presented in [2]:
1) Resources are reusable
2) Resources are not duplicated
3) No two processes can access a resource at the same time
A process Pj is said to be dependent on another process Pk if there exists a sequence of processes Pj, Pi(1),
Pi(2), Pi(3), …, Pk, where each process in the sequence is idle and each process (except the first) holds a
resource for which the previous process in the sequence is waiting. Pj is said to be locally dependent on Pk if
all the processes in the sequence are located at the same site [3].
Communication Model
In the communication model, the resources that processes acquire are messages. A blocked process is one that is
waiting for the arrival of several messages at the same time. A process can be unblocked when a subset of the
messages it is waiting for has arrived. A nonempty set of processes is communication-deadlocked if all
processes in the set are permanently idle. A process is permanently idle if it never receives a message from any
process in its dependent set [3]. However, permanent idleness cannot be detected by a timeout: we cannot
conclude that processes A and B are deadlocked on each other merely because no message has arrived, since B may
still send a message to A at some unknown later time. In [3], Chandy, Misra and Haas define a set of processes
S to be deadlocked if:
1) All processes in S are idle.
2) The dependent set of every process in S is a subset of S.
3) There are no messages in transit between processes in S.
Since all processes in S are idle and depend only on processes in S, no new message can be sent within the set.
With no messages in transit and every process dependent only on processes in the same set, all processes in S
remain idle.
AND, OR, and P-out-of-Q Models
A process changes its state from executing to idle when it waits for a reply to a REQUEST sent to a resource
controller or waits for some messages. Two resource request models describe this behavior in the work by Chandy,
Misra, and Haas [3]: the AND model and the OR model. In the AND model, an idle process changes its state to
executing when all requests to its dependent set have been replied to. In the OR model, an idle process changes
its state to executing when any request to its dependent set has been replied to. In particular, resource
deadlocks are modeled by the AND model and communication deadlocks are modeled by the OR model. The P-out-of-Q
model is a generalization of these two models. A process can change its state from idle to executing when it has
received p REPLYs out of q REQUESTs sent, where p ≤ q. In other words, the AND model is the special case of the
P-out-of-Q model with p equal to q, and the OR model is the special case with p equal to 1.
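The relationship between the three models can be written down in a few lines. The following is a minimal sketch, not code from any of the cited papers; the function names can_unblock, and_model and or_model are made up for illustration.

```python
def can_unblock(replies_received: int, p: int, q: int) -> bool:
    """P-out-of-Q rule: an idle process that sent q REQUESTs may resume
    once it has received at least p REPLYs (hypothetical helper)."""
    if not 1 <= p <= q:
        raise ValueError("the P-out-of-Q model requires 1 <= p <= q")
    return replies_received >= p


def and_model(replies_received: int, q: int) -> bool:
    """AND model: the special case p == q (all requests must be replied to)."""
    return can_unblock(replies_received, p=q, q=q)


def or_model(replies_received: int, q: int) -> bool:
    """OR model: the special case p == 1 (any single reply is enough)."""
    return can_unblock(replies_received, p=1, q=q)
```

For example, a process that sent three requests and received two replies stays idle under the AND model but resumes under the OR model or under a 2-out-of-3 request.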
Graph-Representation of Deadlocks
Because the deadlock state of a system changes dynamically at runtime, an effective data structure for
representing deadlocks is important. Deadlocks can be represented and visualized by Wait-For-Graphs (WFGs) [5].
A WFG is a directed graph (n, l) consisting of n nodes and l edges. An edge is drawn as follows [6]:
P1 --x--> P2
where P1 is blocked and waiting for a resource x held by P2. The edge represents a wait-for relationship. In the
resource model, this edge does not mean that P1 is communicating with P2 directly. P1 communicates with the
resource controller of x, which grants the locks on the resource to processes. A deadlock occurs if and only if
a cycle exists in the WFG of the system [5]. In the following example, P1, P2 and P3 are deadlocked as
P1 → P2 → P3 → P1.
[Figure: WFG with nodes P1, P2 and P3]
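As an illustration of the cycle criterion, the sketch below searches a WFG, given as an adjacency mapping, for a cycle with a depth-first search. It assumes the whole graph is available in one place, which is exactly what a distributed system does not provide, so it only shows what the detection algorithms discussed later have to establish. The function name find_cycle and the dictionary representation are assumptions made for this example.

```python
from typing import Dict, List, Optional


def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of nodes whose first
    node is repeated at the end, or None if the graph is acyclic."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = {node: WHITE for node in wfg}
    for targets in wfg.values():
        for target in targets:
            color.setdefault(target, WHITE)
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in wfg.get(node, []):
            if color[nxt] == GRAY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


# The three-process example from the text.
print(find_cycle({"P1": ["P2"], "P2": ["P3"], "P3": ["P1"]}))
```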
III. DEADLOCK HANDLING
There are three approaches [5] to handling the possibility of deadlocks in a distributed system: deadlock
prevention, deadlock avoidance, and deadlock detection.
Deadlock prevention
Deadlock can be prevented if every process begins its execution only after it has acquired all the resources
that it needs. A process can then assume that, when it begins execution, it controls all of these resources
exclusively. This approach has a number of drawbacks [5]. First, it eliminates concurrency, and thus the
primary motivation for distributed systems [5]. For example, if there is only one resource in the system and all
processes will eventually need it, then under deadlock prevention only one process can execute at any given
time. Second, the resource acquisition phase is itself a potential source of deadlock. For instance, when
processes P1 and P2 both request resources R1 and R2 at the same time, the resource controller of R1 may grant
its lock to P1 while the resource controller of R2 grants its lock to P2. P1 and P2 are then deadlocked in this
phase, as they are waiting for the resource controllers of R2 and R1, respectively, to reply to their requests.
Deadlock Avoidance
Deadlock avoidance is an approach in which a resource is granted to a process only if the resulting global
system state is safe [5]. This approach has two problems. 1) Every resource controller and process has to
maintain a large table of the global state in order to decide whether a requested lock should be granted.
2) At most one process or resource controller in the system can decide whether the global state is safe at any
given time. If more than one process or resource controller makes the decision concurrently, each may find the
global state safe while the net global state is not. In [7], Wojcik, B. E. and Wojcik, Z. M. point out that
deadlock avoidance is impractical because it needs a priori knowledge that is not available in a distributed
organization and results in very high contention for communication channels.
Deadlock Detection
In deadlock detection, there is no control over how and when processes acquire locks on resources. The probe or
query computation is a sequence of deadlock detection messages that is separate from the underlying
computation. The detection algorithm can thus run concurrently with the computation. A circular wait (and hence
a cycle in the WFG of the system) is only a necessary condition for deadlock in the communication model [7].
Once a deadlock is detected, it is resolved by a deadlock resolution algorithm.
Types of Deadlock Detections
There are three types of deadlock detections:
1. Centralized deadlock detection
2. Distributed deadlock detection
3. Hierarchical deadlock detection
Centralized Deadlock Detection
In this type of detection, a designated site, called the control site, is responsible for constructing the
global WFG of the system [5]. The control site searches for cycles and resolves them. Such a system is
conceptually simple to implement. Since the control site has the full picture of the system, it can make optimal
decisions. However, the control site is a single point of failure. As the number of sites and processes grows,
the control site has to serve more processes, so both the message traffic at that site and the computational
load of the cycle search increase. This affects the performance of the system as well as its stability.
Distributed Deadlock Detection
In distributed deadlock detection, the processes themselves are responsible for detecting deadlocks [5]. They
exchange control messages to do so. This type of detection benefits from the concurrency of the algorithm as
well as from tolerance to process failures. However, it also suffers from a number of drawbacks. First, because
the messages between processes are asynchronous and the system is dynamic, a distributed algorithm is hard to
design and implement correctly. Second, algorithms of this type are not as efficient as centralized ones because
no process has the full picture of the WFG. Third, all processes need to run the deadlock detection algorithm
continuously and concurrently with the underlying computation. Even if the algorithm uses only a small portion
of the computational resources, it is a performance overhead.
Hierarchical Deadlock Detection
In hierarchical deadlock detection, sites are arranged hierarchically into clusters [5], and each site detects
only the deadlocks that involve its descendant sites. Hierarchical algorithms try to combine the strengths of
the two types of deadlock detection presented above. There is no single point of failure, and sites are not
loaded with deadlock detection work that does not concern them. This kind of algorithm makes use of the access
patterns of the system to design the hierarchy of clusters so that deadlocks are as localized within a cluster
as possible. Designing such a hierarchy is one of the biggest challenges in implementing this kind of detection
algorithm.
In the following section, we discuss some algorithms for distributed deadlock detection.
IV. DISTRIBUTED DEADLOCK DETECTION ALGORITHMS
Obermarck’s Algorithm
Obermarck’s deadlock detection algorithm is a path-pushing algorithm [2], [4]. In path-pushing algorithms,
information about the global WFG is distributed in the form of paths. The algorithm was developed for the
distributed database system R* of the IBM Corporation. In a distributed database system, the computation is done
at a set of participating sites. A transaction is an abstraction for the application processing performed to
take the database from one consistent state to another consistent state in a way that can be viewed as
atomic [4]. Transactions involve agents at different sites. Therefore, at a given site, there can be multiple
agents working on different transactions or on the same transaction. When the first transaction is waiting for
the second transaction and the second transaction is waiting for the first at the same time, the system is said
to be deadlocked. In [4], the author uses the term Transaction Wait-For-Graph (TWFG) to stress the difference
between the transaction model and the communication model. However, the graph works in roughly the same way as
a WFG.
Each transaction is represented by a globally unique identifier. Since each site hosts a number of agents that
belong to different transactions, each site sees only a part of the global TWFG. At a given site S, the
algorithm uses a virtual agent EXTERNAL (“EX”) to denote the sites other than S. Only one T1 → T2 edge is
created in the TWFG no matter how many times transaction T1 is waiting for transaction T2. The algorithm at each
site builds and analyzes a directed TWFG, where the nodes represent the transaction agents and the edges denote
the wait-for relationships. When an agent is dependent on (waiting for) some agent at an external site, this is
denoted by an edge to “EX”. Similarly, when an agent determines that an agent at an external site is waiting
for it, this is represented by an edge from “EX” to that agent. For example, the TWFG
“EX” → T1 → T2 → “EX” denotes a potential global deadlock that involves some external sites, where an agent
at an external site is waiting for T1, and T2 is waiting for some agent at an external site.
The deadlock detectors are roughly synchronized when they exchange control messages. The control messages take
the form of strings. A string can be thought of as a partial TWFG that a site S1 sends to a site S2 when some
agents at S1 are waiting for some agents at S2. For the previous example of a TWFG, the site sends the string
“EX”, TRANS1, TRANS2 to the site of the agent for which T2 is waiting.
Elementary cycles are cycles that do not involve EXTERNAL. These cycles are detected by each site in step 5 of
the algorithm. When a cycle is detected, the site chooses a victim in the cycle to be removed.
Algorithm
For each site S,
1) Build a wait-for-graph using the transaction-to-transaction wait-for relationships.
2) Receive any strings of nodes transmitted from other sites and add them into the wait-for-graph.
a. For each transaction identified in the string, create a node of the TWFG if none exists at this
site.
b. For each transaction identified in the string, starting with the first, create an edge to the node
representing the next transaction in the string.
3) Create wait-for edges from EXTERNAL to each node representing a transaction’s agent that is
expected to receive on a communication link.
4) Create wait-for edges to EXTERNAL from each node representing a transaction’s agent that is
expected to send on a communication link.
5) Analyze the resulting graph, listing all elementary cycles.
6) Select a victim to break each cycle that does not contain the node EXTERNAL. As each victim is chosen for
a given cycle, remove all cycles that include the victim.
a. The site must remember the transaction identifier of the victim so that it can discard any strings it
receives that involve the victim.
b. If the victim transaction has an agent at this site, then the fact that the transaction was chosen
as a victim must be transmitted to each site known to contain an agent of the victim transaction.
Otherwise, the site has to transmit the fact to each site that sent S a string containing the
victim’s identifier.
7) Examine each remaining cycle that contains the node EXTERNAL. If the transaction identifier of the node
for which EXTERNAL is waiting is greater than that of the node that is waiting for EXTERNAL, then
a. Transform the cycle into a string, which starts with “EX” and terminates with the identifier of
the node that is waiting for EXTERNAL at this site.
b. Send the string to each site for which the terminating node in the string is waiting.
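Steps 2 and 7 above can be sketched as follows. This is an illustration under stated assumptions rather than Obermarck's implementation: the TWFG is a dictionary from node to a set of successors, transaction identifiers are integers, the path through EXTERNAL is passed in already extracted, and the names add_string and string_for_external_cycle are invented for this example.

```python
from typing import Dict, Hashable, List, Optional, Set

EX = "EX"  # stands for the virtual node EXTERNAL
Graph = Dict[Hashable, Set[Hashable]]


def add_string(twfg: Graph, string: List[Hashable]) -> None:
    """Step 2: merge a received string into the local TWFG.
    Using sets keeps at most one edge per ordered pair of transactions."""
    for node in string:
        twfg.setdefault(node, set())          # step 2a: create missing nodes
    for src, dst in zip(string, string[1:]):  # step 2b: chain the edges
        twfg[src].add(dst)


def string_for_external_cycle(path: List[int]) -> Optional[List[Hashable]]:
    """Step 7: `path` lists the transaction ids on the cycle
    EX -> path[0] -> ... -> path[-1] -> EX. The string is produced only if
    the id EXTERNAL waits for is greater than the id waiting for EXTERNAL."""
    if not path:
        return None
    if path[0] > path[-1]:
        return [EX, *path]
    return None
```

The comparison in step 7 means that, of the sites that see the same potential cycle, only some transmit a string, which is what keeps the message count down.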
[Figure: local TWFGs at sites S1 and S2]
S1: EX → T1 → T5 → EX
S2: EX → T1 → T5 → EX
False Deadlocks
False deadlocks are deadlocks that are detected but do not really exist in the system. The algorithm assumes
that the state of the system is static while strings are propagated to other sites to determine a deadlock.
However, the state of a distributed system can be rather dynamic, and a local TWFG may not be a true picture of
the global state. As shown in the figure above, suppose the detection algorithm at site S1 determines that
T1 → T5 exists. While the algorithm is still examining T2, T3 and T4, T5 may release its locks, which ends T1's
wait. The detection algorithm does not learn of this and still sends the string to the other site S2. When S2,
whose TWFG contains T1 → T5, receives the string, it concludes that a deadlock T1 → T5 → T1 exists, which is
not necessarily true.
Because real deadlocks persist until they are broken, the transaction wait-for relationships in a detected
cycle can be validated. If following the wait-for edges in the cycle leads back to the starting node, a deadlock
can be concluded.
Because step 7 of the algorithm relies on a total order of the transaction identifiers, the number of message
transfers and the deadlock detection overhead are reduced [2]. This also ensures that exactly one transaction
in each cycle detects the deadlock.
Chandy-Misra Algorithm
The Chandy-Misra approach to deadlock detection consists of two algorithms. The first algorithm is for the AND
model, and the second is for the OR model.
1) Algorithm for AND model (Resource Model)
This algorithm is an example of an edge-chasing algorithm [2]. In an edge-chasing algorithm, the
existence of a cycle in a WFG is detected by propagating special control messages called probes
along the edges of the wait-for-graph [8]. Probes are used only for deadlock detection and are
distinct from computation messages. When the initiator receives a probe that originated
from itself, it can conclude that there is a cycle in the graph and thus that a deadlock exists.
A probe is a triple [3] of the form (i, j, k), denoting that Pi is the initiator of the probe, Pj is the
sender of the probe and Pk is its receiver, where Pj and Pk are not at the same site. Pj sends the
probe to Pk when the following conditions hold:
a. Pj is idle,
b. Pj is waiting for Pk,
c. Pj has determined that Pi is dependent on Pj.
Pk can either accept or discard the probe. Pk accepts the probe if and only if:
a. Pk is also idle and is waiting for some other processes,
b. Pk did not already know that Pi is dependent on it,
c. Pk has not replied to all requests of Pj.
On accepting the probe, Pk records that Pi is dependent on it; otherwise, Pk discards the probe.
Coloring the edges
To prove the correctness of the algorithm, the edges of the WFG are colored [8]. An edge from Pi to Pj is:
- gray if Pi has sent a request to Pj that Pj has not yet received;
- black if Pj has received a request from Pi but has not yet sent a grant message to Pi;
- white if Pj has sent a grant message to Pi but Pi has not yet received it.
Gray and black edges are called dark edges. When a dark cycle exists, we can see that the cycle will
persist.
Algorithm
Each resource controller maintains an array dependentk for each process Pk, where dependentk(i) is
true only if Pk’s site knows that Pi is dependent on it (i.e., Pi → Pk).
Default: dependentk(i) = false for all k and i.
Initiation of probe by idle process Pi:
    if Pi is locally dependent on itself
    then
        declare that Pi is deadlocked
    else
        for all Pa, Pb such that
            a. Pi is locally dependent on Pa, and
            b. Pa is waiting for Pb, and
            c. Pa, Pb are on different sites,
        send probe (i, a, b)
On receiving probe (i, j, k) for Pk:
    if
        a. Pk is idle, and
        b. dependentk(i) = false, and
        c. Pk has not replied to all requests of Pj,
    then
        dependentk(i) = true
        if k == i
        then
            declare that Pi is deadlocked
        else
            for all Pa, Pb such that
                a. Pk is locally dependent on Pa, and
                b. Pa is waiting for Pb, and
                c. Pa, Pb are on different sites,
            send probe (i, a, b)
When a process Pk in the site becomes executing:
    set dependentk(i) = false for all i
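The probe computation can be simulated on a single machine to see how a probe returns to its initiator. The sketch below makes several simplifying assumptions that are not part of the algorithm as described: every process lives on its own site (so "locally dependent on Pa" reduces to Pa being the process itself), the wait-for relation is frozen while the probes travel, and messages are delivered from one FIFO queue. The function name detect_deadlock_and is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def detect_deadlock_and(waits_for: Dict[int, List[int]],
                        initiator: int,
                        idle: Set[int]) -> bool:
    """Simulate the probe computation for one initiator under the AND model.
    waits_for[j] lists the processes Pj is waiting for; `idle` is the set of
    idle processes. Returns True if the initiator declares itself deadlocked."""
    if initiator not in idle:
        return False
    # dependent[k] holds every i for which dependentk(i) is true.
    dependent: Dict[int, Set[int]] = {}
    probes: Deque[Tuple[int, int, int]] = deque(
        (initiator, initiator, b) for b in waits_for.get(initiator, []))
    while probes:
        i, j, k = probes.popleft()
        # Condition c (Pk has not replied to Pj) holds here because the
        # wait-for relation is assumed frozen: the edge j -> k still exists.
        if k not in idle or i in dependent.setdefault(k, set()):
            continue  # Pk discards the probe
        dependent[k].add(i)
        if k == i:
            return True  # the probe came back to its initiator
        for b in waits_for.get(k, []):
            probes.append((i, k, b))
    return False


print(detect_deadlock_and({1: [2], 2: [3], 3: [1]}, 1, {1, 2, 3}))
```

The dependentk(i) check is what bounds the number of probes: each process forwards a given initiator's probe at most once.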
2) Algorithm for OR model (Communication Model)
This algorithm is designed for the communication model, where a process can change from idle to
executing when it receives a reply to at least one of the requests it has sent. The algorithm is an example
of a diffusion computation based algorithm [8]. The basic idea of such an algorithm
is that it is activated by a process that suspects a deadlock. Query and reply messages are used to detect
the activity of the processes in the dependent set. A query message has the form (i, m, j, k), which
denotes the mth query initiated by process Pi, sent by process Pj
to process Pk. A reply message has the same form (i, m, j, k). When the initiator Pi initiates
deadlock detection, it sends a query message to all processes in its dependent set. When an executing process
receives a query message, it discards it. When an idle process Pk receives a query message (i, m, j, k)
for the first time, called the engaging query, Pk forwards the query
message to its dependent set. If the query message received by Pk is not an engaging query and Pk is still
idle, Pk immediately replies to Pj. These properties follow [3]:
1. If process Pi is deadlocked when it initiates its mth query, then it will receive a reply (i, m, j, i)
corresponding to every query (i, m, i, j) it sent.
2. If the initiator process Pi receives a reply (i, m, j, i) corresponding to every query (i, m, i, j)
that it sent, then it can declare itself deadlocked.
In the following algorithm, each process Pk has to maintain these four arrays:
a. latest(i)
- the largest sequence number m in any query (i, m, j, k) sent or received by Pk.
Initially, latest(i) = 0 for every i.
b. engager(i)
- for i ≠ k, the identity of the process that caused latest(i) to be set to its current
value m by sending Pk the message query (i, m, j, k).
c. num(i)
- the total number of messages of the form query (i, m, k, j) sent by Pk, minus the total
number of messages of the form reply (i, m, j, k) received by Pk, where m =
latest(i). num(i) = 0 means that Pk has received replies to all queries that Pk sent.
d. wait(i)
- a boolean variable that is true if and only if Pk has been idle continuously since
latest(i) was last updated. Initially, wait(i) = false for all i.
Algorithm
Initiation of query by an idle process Pi:
    latest(i) = latest(i) + 1
    wait(i) = true
    send query (i, latest(i), i, j) to all processes Pj in Pi’s dependent set
    num(i) = the number of processes in Pi’s dependent set
For all processes Pk:
    if executing, wait(i) = false for all i
    discard all query messages
On receiving a query message (i, m, j, k) sent from Pj to Pk:
    if m > latest(i)
    then
        latest(i) = m
        engager(i) = j
        wait(i) = true
        for all processes Pr in Pk’s dependent set
            send query (i, m, k, r)
        num(i) = number of processes in Pk’s dependent set
    else if (wait(i) == true) and (m == latest(i))
    then send reply (i, m, k, j) to Pj
On receiving a reply message (i, m, r, k) for Pk:
    if (m == latest(i)) and (wait(i) == true)
        num(i) = num(i) - 1
        if num(i) == 0
        then
            if i == k
            then
                declare Pk is deadlocked
            else
                send reply (i, m, k, j) to Pj where j = engager(i)
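The query and reply rules can be exercised with a small single-machine simulation. The sketch below is only an illustration and rests on assumptions the algorithm itself does not make: there is a single initiator and a single query round (m = 1), the set of idle processes does not change during the run, every process appears as a key of dependent_set, and all messages pass through one FIFO queue. The function name detect_deadlock_or is hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def detect_deadlock_or(dependent_set: Dict[int, List[int]],
                       initiator: int,
                       idle: Set[int]) -> bool:
    """Simulate one diffusion computation under the OR model.
    Returns True if the initiator declares itself deadlocked."""
    i, m = initiator, 1
    if i not in idle:
        return False
    # Per-process state for initiator i: latest(i), engager(i), num(i), wait(i).
    latest = {k: 0 for k in dependent_set}
    engager: Dict[int, int] = {}
    num = {k: 0 for k in dependent_set}
    wait = {k: False for k in dependent_set}

    msgs: Deque[Tuple[str, int, int]] = deque()  # (kind, sender, receiver)
    latest[i], wait[i] = m, True
    for j in dependent_set[i]:
        msgs.append(("query", i, j))
    num[i] = len(dependent_set[i])

    while msgs:
        kind, sender, k = msgs.popleft()
        if k not in idle:
            continue  # an executing process discards the message
        if kind == "query":
            if m > latest[k]:  # engaging query
                latest[k], engager[k], wait[k] = m, sender, True
                for r in dependent_set[k]:
                    msgs.append(("query", k, r))
                num[k] = len(dependent_set[k])
            elif wait[k]:  # non-engaging query: reply at once
                msgs.append(("reply", k, sender))
        elif wait[k]:  # reply to one of Pk's own queries
            num[k] -= 1
            if num[k] == 0:
                if k == i:
                    return True
                msgs.append(("reply", k, engager[k]))
    return False
```

With dependent_set = {1: [2, 3], 2: [1], 3: []} and process 3 executing, the call returns False: process 1 never collects a reply for the query it sent to process 3, which reflects that process 3 could still unblock it.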
To illustrate the two cases in which replies are sent, the following example is constructed.
[Figure: example WFG with nodes 1 to 8]
Example
In the example, the WFG is given as above. Node 3 and node 6 send replies to node 2 when they
receive a query message that they have received before. This indicates that there is a cycle in the WFG and
that node 3 and node 6 participate in the cycles. Thus, a reply is sent when a query is received with
wait(i) == true and latest(i) == m. At node 2, when the number of replies received equals the number of
queries sent, it sends a reply back to its engager. In this example, node 2 sends a
reply to node 1 when it has received 2 replies, from node 3 and node 6, indicating that there are cycles in its
dependent set and thus that deadlock is unavoidable in this situation. Therefore, when node 1 receives the
reply from node 2, it declares itself deadlocked.
Kshemkalyani-Singhal Algorithm
The Kshemkalyani-Singhal algorithm [1] is an example of a global state detection based algorithm [2]. Work in
this area is largely based on results by Chandy and Lamport [8]. The key idea is to find a consistent global
state without freezing the underlying computation. A consistent global state is a state that could possibly
exist in the system. Chandy and Lamport show how to obtain a consistent global state of the
computation by propagating markers along the links of the system. A marker separates the messages in the links
into those to be included in the snapshot (i.e. channel state or process state) and those not to be recorded in
the snapshot. It acts as a delimiter for the messages in a channel so that the channel state recorded by the
process at the receiving end of the channel satisfies the following condition: for a message m sent by i to j,
if the send event of m is not in the consistent state, then the receive event of m at j must not be in the
consistent state and m must not be in the channel
state [9]. After finding such a consistent global state, the algorithm
The algorithm consists of a single phase. The single phase comprises a diffusion of messages outward from the
initiator process along the edges of the WFG (outward sweep), followed by the echoing of these messages
inward to the initiator process (inward sweep). During the outward sweep, the system records a
snapshot: each process records its local state on receiving the message. During the inward sweep,
the system reduces the snapshot so as to simulate the possible future of the system in terms
of unblocking events [1].
System Model
In this algorithm, a distributed system consists of nodes. The P-out-of-Q model is used, so a node in the WFG is
reduced if P out of the Q requests on which it is blocked can be granted during the reduction process. Therefore,
at the end of the process, the nodes that are not reduced are deadlocked. In this system, every two nodes are
logically connected and there is no shared memory, so all communication between nodes
is done by sending and receiving messages. A node i has the following local variables to record its state:
ti: integer, current local time
t_blocki: integer, the time at which node i was last blocked
ini: set of integer, set of nodes whose requests are outstanding at node i
outi: set of integer, set of nodes for which node i has been waiting since t_blocki
pi: integer, the number of replies required for unblocking
wi: real, weight used for termination detection of the deadlock detection algorithm
waiti: boolean, records the current status [2]
Send REQUEST
- Executed by process i when it blocks on a pi-out-of-qi request. pi and qi depend on the application and
pi ≤ qi.
    set pi as required by the application
    for each node j of the qi nodes on which i depends, do
        outi = outi ∪ {j}
        send REQUEST(i) to j
    t_blocki = ti
Receive REQUEST
- Executed by process i when it receives a REQUEST made by process k
    ini = ini ∪ {k}
Send REPLY
- Executed by process i when it replies to a request by process k
    ini = ini - {k}
Receive REPLY(j)
- Executed by process i when it receives a reply from process j to its corresponding request. A reply to an
outdated request can be identified by the timestamp of the outdated request on the reply and is discarded
    outi = outi - {j}
    pi = pi - 1
    if pi == 0
    then
        t_blocki = 0
        for every j in outi, send CANCEL(i) to j
        outi = ∅
Receive CANCEL(k)
- Executed by process i when it receives a CANCEL message from process k
    ini = ini - {k}
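The handlers above translate almost directly into a small class. The sketch below is an illustration, not the authors' code. It departs from the text in two labeled ways: the local clock is advanced by one tick on every block (the text only says that t_blocki is set to ti), and a stale reply is recognized by the sender no longer being in outi rather than by a timestamp. The class name Node and its method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Message = Tuple[str, int, int]  # (kind, sender, receiver)


@dataclass
class Node:
    ident: int
    t: int = 0          # ti, local time
    t_block: int = 0    # t_blocki, 0 while the node is not blocked
    in_: Set[int] = field(default_factory=set)   # ini
    out: Set[int] = field(default_factory=set)   # outi
    p: int = 0          # pi, replies still needed to unblock

    def block(self, targets: Set[int], p: int) -> List[Message]:
        """Send REQUEST: block on a p-out-of-len(targets) request."""
        self.t += 1  # assumption: one clock tick per blocking event
        self.p = p
        self.out |= set(targets)
        self.t_block = self.t
        return [("REQUEST", self.ident, j) for j in sorted(targets)]

    def on_request(self, k: int) -> None:
        """Receive REQUEST from node k."""
        self.in_.add(k)

    def send_reply(self, k: int) -> Message:
        """Send REPLY to node k."""
        self.in_.discard(k)
        return ("REPLY", self.ident, k)

    def on_reply(self, j: int) -> List[Message]:
        """Receive REPLY(j); returns the CANCEL messages to send, if any."""
        if j not in self.out:
            return []  # assumption: treat as a reply to an outdated request
        self.out.discard(j)
        self.p -= 1
        if self.p > 0:
            return []
        cancels = [("CANCEL", self.ident, r) for r in sorted(self.out)]
        self.t_block = 0
        self.out = set()
        return cancels

    def on_cancel(self, k: int) -> None:
        """Receive CANCEL(k)."""
        self.in_.discard(k)
```

A node that blocks on a 2-out-of-3 request and then receives two replies unblocks and cancels the third request.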
When a process i is blocked, it initiates the deadlock detection algorithm with initiator = i. When more
than one process is blocked, multiple instances of the algorithm may run
concurrently. Each invocation of the deadlock detection algorithm is treated independently and is identified by
the initiator’s process ID and the timestamp. Therefore, every node has to maintain the snapshots
initiated by other nodes. A node needs to keep only the latest snapshot for each initiator: if process i
initiates a second snapshot, then between the time it initiated the first snapshot
and the time it initiated the second, process i was not idle and thus not deadlocked. Therefore,
previous snapshots can be discarded.
Basic Idea of the Algorithm
The detection algorithm uses three types of control messages:
FLOOD, ECHO and SHORT. FLOOD messages are sent by the initiator to the processes in its dependent set to
perform the outward sweep. Every process records its local state on receiving its first FLOOD message.
If the process receiving the FLOOD message is idle, it forwards the FLOOD message to every
process in its dependent set. If it is not idle, it starts the reduction
by sending an ECHO message back to the sender of the FLOOD message. On receiving an
ECHO message, process i determines whether it is deadlocked. If it is not, it sends an ECHO message back along
the path on which it received the FLOOD message. These activities proceed concurrently, and there is no explicit
synchronization point between the snapshot and the reduction in the algorithm.
The reduction works as follows: a node representing a process is reduced once it has received pi ECHOs in
response to the qi FLOOD messages it sent (a pi-out-of-qi request). This models the fact that the process
becomes unblocked when pi of the qi processes in its dependent set have sent or forwarded an ECHO message to
it, meaning that those processes are not blocked. Any node that has not been reduced when the algorithm
terminates is deadlocked, and any node that has been reduced is not. The initiator declares itself deadlocked
if it has not been reduced when the algorithm terminates.
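The reduction rule can be illustrated on a static WFG with a short Python sketch. It is a centralized simplification for illustration only: the real algorithm performs the same reduction through ECHO messages. The function name `reduce_wfg` and the dictionaries `out` and `p` are assumptions introduced here; `p[i] == 0` stands for a process that is not blocked.

```python
def reduce_wfg(out, p):
    """Return the set of processes that cannot be reduced.

    out[i] -- set of processes that process i waits on (its dependent set)
    p[i]   -- number of replies process i needs; 0 means i is not blocked
    """
    need = dict(p)
    # incoming[j] = processes that have a wait-for edge to j
    incoming = {i: set() for i in out}
    for i, deps in out.items():
        for j in deps:
            incoming[j].add(i)

    reduced = {i for i in out if need[i] == 0}
    frontier = list(reduced)
    while frontier:
        j = frontier.pop()
        # A reduced process j "echoes" once to everyone waiting on it.
        for i in incoming[j]:
            if i in reduced:
                continue
            need[i] -= 1
            if need[i] == 0:  # i has its p-out-of-q replies
                reduced.add(i)
                frontier.append(i)
    return set(out) - reduced
```

For example, with `out = {1: {2, 3}, 2: {4}, 3: {1}, 4: set()}`, process 1 is reducible when it needs one reply out of two, but not when it needs both, because process 3 waits on process 1 itself.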
WFG reduction can begin at a non-leaf node before recording of the WFG is complete. This happens when
ECHOs arrive at process i before process i has sent all of its FLOOD messages to the processes in its
dependent set. The two activities of the algorithm therefore run concurrently in a single phase, and no
serialization is needed.
Termination Detection
To decide whether it is deadlocked, the initiator needs a scheme for detecting the termination of the
deadlock detection algorithm. This is done with weights and SHORT messages. A weight is a real number between
0 and 1 inclusive, and the initiator holds weight 1 when the algorithm is initiated. Whenever a process sends
FLOOD messages to the processes in its dependent set, it divides the weight it holds equally among those
messages. In particular, if |DS(i)| denotes the number of processes in the dependent set DS(i) of process i,
the initiator attaches weight 1/|DS(i)| to every FLOOD message it sends, and a process forwarding a FLOOD
message that carried weight w attaches w/|DS(i)| to each copy. When the weights returned to the initiator add
up to the weight it gave out, the initiator knows that the algorithm has terminated.
When process i receives a subsequent FLOOD message, it does not forward the message to the processes in
DS(i). Instead, it returns the weight attached to that FLOOD message to the initiator in a SHORT message.
When a FLOOD message arrives at a leaf node of the WFG, which represents a process i that is not idle
(i.e. |DS(i)| = 0), the weight is returned by attaching it to the ECHO message. When a non-leaf node receives
returned weight in an ECHO message and that ECHO unblocks it, it passes the weight on by dividing it among the
ECHOs that it sends along the incoming edges in its WFG snapshot. When a non-leaf node receives returned
weight in an ECHO message and is not unblocked by it, it returns the weight to the initiator in a SHORT
message. In this way, the sum of the weights in the system is always equal to 1.
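The conservation of weight can be checked with a small Python sketch. It covers only the simplest case, in which every process is blocked, so no ECHO is ever sent and all weight comes back in SHORT messages. The function name `returned_weight` is an assumption introduced here, and `fractions.Fraction` is used instead of floating point (a choice made for this sketch) so that the comparison with 1 is exact.

```python
from collections import deque
from fractions import Fraction


def returned_weight(out, init):
    """Total weight returned to `init` when every process in `out` is blocked.

    out[i] is the non-empty dependent set of process i.
    """
    seen = {init}  # processes that have already recorded a snapshot
    returned = Fraction(0)
    # The initiator splits weight 1 equally over its dependent set.
    queue = deque((j, Fraction(1, len(out[init]))) for j in out[init])
    while queue:
        i, w = queue.popleft()
        if i in seen:
            # Subsequent FLOOD: the weight goes straight back in a SHORT.
            returned += w
        else:
            # First FLOOD: record the snapshot and forward w / |DS(i)|.
            seen.add(i)
            for k in out[i]:
                queue.append((k, w / len(out[i])))
    return returned
```

For the WFG `{1: {2, 3}, 2: {3}, 3: {1}}` with initiator 1, the returned weight is exactly 1, which is the condition under which the initiator declares termination.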
Algorithm
Variables
The variables used in the algorithms for process i:
init: the initiator process identifier
As noted before, multiple instances of the algorithm may run concurrently, and a process must keep the
snapshot for the latest initiation of the algorithm received from each process. Each node therefore
maintains LS, where
LS: array [1…n] of records, where n is the number of processes in the system.
A record in LS contains the fields:
LS[init].out: set of integers, nodes on which process i is waiting in the snapshot
LS[init].in: set of integers, nodes waiting on process i in the snapshot
LS[init].t: integer, the time when init initiated snapshot, = 0 initially
LS[init].s: boolean, local blocked state as seen by the snapshot, = false initially
LS[init].p: integer, value of pi as seen in the snapshot
Messages:
FLOOD(a, b, t, w): a is the sender of the message, b is the initiator process, t is the time the algorithm is initiated,
and w is the weight attached in this message
ECHO(a, b, t, w): a is the sender of the message, b is the initiator process, t is the time the algorithm is initiated,
and w is the weight attached in this message
SHORT(b, t, w): b is the initiator process, t is the time the algorithm is initiated, and w is the weight attached
in this message
Procedures executed by process i
To initiate a snapshot:
    init = i
    wi = 0
    LS[init].t = ti
    LS[init].out = out(i)
    LS[init].s = true
    LS[init].in = ∅
    LS[init].p = pi
    send FLOOD(i, i, ti, 1/|out(i)|) to each process j in out(i)
On receiving a FLOOD message (j, init, t_init, w) from j:
    if (LS[init].t < t_init) and (j is in in(i))
    then
        LS[init].t = t_init
        LS[init].out = out(i)
        LS[init].s = waiti
        LS[init].in = {j}
        if waiti == true
        then
            LS[init].p = pi
            send FLOOD(i, init, t_init, w/|out(i)|) to each process k in out(i)
        else
            LS[init].p = 0
            send ECHO(i, init, t_init, w) to j
            LS[init].in = LS[init].in – {j}
    else if (LS[init].t < t_init) and (j is not in in(i))
    then
        send ECHO(i, init, t_init, w) to j
    else if (LS[init].t == t_init) and (j is not in in(i))
    then
        send ECHO(i, init, t_init, w) to j
    else if (LS[init].t == t_init) and (j is in in(i))
    then
        if LS[init].s == false
        then
            send ECHO(i, init, t_init, w) to j
        else
            LS[init].in = LS[init].in ∪ {j}
            send SHORT(init, t_init, w) to init
    else if LS[init].t > t_init
    then
        discard the FLOOD message
On receiving an ECHO message (j, init, t_init, w) from j:
    if LS[init].t > t_init
    then
        discard the ECHO message
    else if LS[init].t < t_init
    then
        discard the ECHO message
    else if LS[init].t == t_init
    then
        LS[init].out = LS[init].out – {j}
        if LS[init].s == false
        then
            send SHORT(init, t_init, w) to init
        else
            LS[init].p = LS[init].p – 1
            if LS[init].p == 0
            then
                LS[init].s = false
                if init == i
                then
                    declare process i is not deadlocked
                    exit
                send ECHO(i, init, t_init, w/|LS[init].in|) to all process k in LS[init].in
            else
                send SHORT(init, t_init, w) to init
On receiving a SHORT message (init, t_init, w):
    if t_init < t_blocki
    then
        discard the SHORT message
    if t_init > t_blocki
    then
        discard the SHORT message
    if t_init == t_blocki
    then
        if LS[init].s == false
        then
            discard the SHORT message
        else
            wi = wi + w
            if wi == 1
            then
                declare that Pi is deadlocked
                abort
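The procedures above can be exercised with a single-threaded Python simulation. It is a sketch under stated assumptions, not a reference implementation: the WFG is static during detection, all messages pass through one FIFO queue, and weights are `fractions.Fraction` values so that the test wi == 1 is exact. The names `Record`, `Process` and `simulate` are introduced here for illustration; `p[i] == 0` marks a process that is not blocked, and the initiator is assumed to be blocked.

```python
from collections import deque
from dataclasses import dataclass, field
from fractions import Fraction


@dataclass
class Record:  # one LS[init] entry
    out: set = field(default_factory=set)
    in_: set = field(default_factory=set)
    t: int = 0
    s: bool = False
    p: int = 0


class Process:
    def __init__(self, pid, out, p, queue):
        self.pid, self.out, self.p, self.queue = pid, set(out), p, queue
        self.in_ = set()      # processes waiting on this one, set by simulate()
        self.wait = p > 0     # blocked iff it still needs replies
        self.ls, self.w, self.t_block = {}, Fraction(0), 0
        self.verdict = None

    def send(self, dest, kind, init, t, w):
        self.queue.append((dest, kind, self.pid, init, t, w))

    def initiate(self, t):
        self.t_block, self.w = t, Fraction(0)
        self.ls[self.pid] = Record(out=set(self.out), t=t, s=True, p=self.p)
        for j in self.out:
            self.send(j, "FLOOD", self.pid, t, Fraction(1, len(self.out)))

    def on_flood(self, j, init, t, w):
        rec = self.ls.setdefault(init, Record())
        if rec.t > t:
            return                              # outdated FLOOD
        if j not in self.in_:
            self.send(j, "ECHO", init, t, w)    # j is not waiting on us
        elif rec.t < t:                         # first FLOOD of this instance
            rec.t, rec.out, rec.s, rec.in_ = t, set(self.out), self.wait, {j}
            if self.wait:
                rec.p = self.p
                for k in self.out:
                    self.send(k, "FLOOD", init, t, w / len(self.out))
            else:
                rec.p = 0
                rec.in_.discard(j)
                self.send(j, "ECHO", init, t, w)
        elif not rec.s:
            self.send(j, "ECHO", init, t, w)    # already reduced
        else:
            rec.in_.add(j)
            self.send(init, "SHORT", init, t, w)

    def on_echo(self, j, init, t, w):
        rec = self.ls.get(init)
        if rec is None or rec.t != t:
            return                              # outdated ECHO
        rec.out.discard(j)
        if not rec.s:
            self.send(init, "SHORT", init, t, w)
            return
        rec.p -= 1
        if rec.p > 0:
            self.send(init, "SHORT", init, t, w)
            return
        rec.s = False                           # this node is reduced
        if init == self.pid:
            self.verdict = "not deadlocked"
            return
        for k in rec.in_:
            self.send(k, "ECHO", init, t, w / len(rec.in_))

    def on_short(self, j, init, t, w):
        rec = self.ls.get(init)
        if rec is None or t != self.t_block or not rec.s:
            return
        self.w += w
        if self.w == 1:
            self.verdict = "deadlocked"


def simulate(out, p, initiator, t=1):
    queue = deque()
    procs = {i: Process(i, out[i], p[i], queue) for i in out}
    for i, deps in out.items():
        for j in deps:
            procs[j].in_.add(i)
    procs[initiator].initiate(t)
    while queue and procs[initiator].verdict is None:
        dest, kind, sender, init, t_init, w = queue.popleft()
        getattr(procs[dest], "on_" + kind.lower())(sender, init, t_init, w)
    return procs[initiator].verdict
```

Calling `simulate({1: {2}, 2: {3}, 3: {1}}, {1: 1, 2: 1, 3: 1}, 1)` models a three-process cycle and yields "deadlocked"; replacing process 3 by an unblocked process (`out[3] = set()`, `p[3] = 0`) yields "not deadlocked".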
V. DISCUSSION
In a distributed system with no global shared memory, where all communication is done by sending and
receiving messages, designing a correct algorithm to solve the deadlock problem is a nontrivial task.
Algorithms are prone to errors because outdated messages can lead them to infer an inconsistent state graph
of the system [5].
Proof of Correctness
The large number of errors in published algorithms for distributed deadlock detection shows the need for
proofs of correctness of the algorithms. There are two important conditions for correctness [5]:
1. The algorithm must detect a deadlock if and only if one exists in the system.
2. The algorithm must not report false deadlocks.
Deadlock Resolution
A deadlock is resolved by aborting at least one process involved in the deadlock cycle [2]. Resolution faces
two problems [2]:
1. The process that detected a deadlock does not know all of the processes that are deadlocked.
2. Concurrent deadlock resolutions can lead to unexpected results. Several processes in the same cycle of the
WFG may initiate deadlock detection and resolution, which can cause multiple processes to be aborted. This is
inefficient and thus undesirable.
Some algorithms use process priorities to deal with the second problem above. Only the process with the
highest priority in the cycle detects the deadlock, and only the process with the lowest priority is
aborted. This rule offers no performance gain from the choice of the process to abort, but it solves the
second problem.
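The priority rule can be written down in a few lines of Python. This is only a sketch of the selection rule, assuming that priorities are unique and that the members of the cycle are already known; the function name `roles` and the `priority` mapping are introduced here for illustration.

```python
def roles(cycle, priority):
    """Pick the detecting process and the process to abort for one cycle.

    cycle    -- list of process ids on the deadlock cycle
    priority -- dict mapping process id to a unique priority (higher wins)
    """
    detector = max(cycle, key=priority.__getitem__)  # highest priority detects
    victim = min(cycle, key=priority.__getitem__)    # lowest priority is aborted
    return detector, victim
```

Because every process on the cycle computes the same pair, only one process acts as detector and only one is aborted, no matter how many of them initiated detection.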
Aborting a process involves the following:
1. The process is aborted and all the resources held by it are released. The released resources must be
restored to a state in which other processes can reuse them. The resource controller must ensure that these
released resources are granted to processes in the deadlocked cycle.
2. All deadlock information about the aborted process must be cleaned up so that other processes do not
assume that it still exists in the wait-for graphs.
False Deadlocks
Deadlock is a problem that cannot easily be detected and resolved correctly [2]. Many published algorithms
that claim to detect deadlocks are vulnerable to false deadlocks, that is, deadlocks that are reported by the
algorithm but do not exist in the system. A deadlock persists until it is broken, so detection alone can be
viewed as a static problem. Once deadlock resolution is applied, however, the problem becomes dynamic.
Suppose two instances of a deadlock detection algorithm, initiated by different processes in the system, run
concurrently. If they detect two cycles that share some edges and both try to resolve their cycle by breaking
some of its edges, a false deadlock is likely: after the first instance has resolved its cycle, the second
may still believe that there is a cycle in the system.
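The scenario can be made concrete with a small Python sketch. The WFG, the two cycles and the helper names `has_cycle` and `abort` are hypothetical, chosen only to illustrate two cycles that share the edge 1 -> 2.

```python
def has_cycle(wfg, cycle):
    """True if every consecutive edge of `cycle` is still present in `wfg`."""
    pairs = zip(cycle, cycle[1:] + cycle[:1])
    return all(b in wfg.get(a, set()) for a, b in pairs)


def abort(wfg, victim):
    """Return a new WFG without `victim` and without any edge touching it."""
    return {a: {b for b in deps if b != victim}
            for a, deps in wfg.items() if a != victim}


wfg = {1: {2}, 2: {3, 4}, 3: {1}, 4: {1}}
cycle_a = [1, 2, 3]  # seen by the first detector
cycle_b = [1, 2, 4]  # seen by the second detector, shares edge 1 -> 2

after = abort(wfg, 2)  # the first detector resolves its cycle by aborting 2
stale = has_cycle(after, cycle_b)  # the second detector's cycle is gone
```

After process 2 is aborted, `stale` is False: a second detector that still acts on `cycle_b` would be resolving a deadlock that no longer exists.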
Performance
It has been suggested that the authors of published algorithms do not pay enough attention to performance
analysis [2]. Measuring an algorithm by the number of messages exchanged does not reflect its real
performance. Singhal and Shivaratri [2] suggested that the performance of algorithms should be measured by
deadlock persistence time: the length of time a particular deadlock persists in the system while a deadlock
detection and resolution algorithm is running. There is a tradeoff between message traffic and deadlock
persistence time.
Besides communication overhead and deadlock persistence time, measures such as the space required at each
process and the time complexity of the algorithm at each process should also be considered. Many other
factors may affect the performance of a deadlock detection algorithm, such as the request-release pattern of
the system and the arrangement of the processes. How these factors interact with the performance of the
algorithm remains an open research topic.
VI. CONCLUSION
Distributed deadlock detection is an interesting problem in distributed systems. It involves both static and
dynamic properties, and a generalized proof of correctness is yet to be found. In this paper, we have
discussed three ways to handle deadlocks in a distributed system: deadlock prevention, deadlock avoidance and
deadlock detection. While prevention and avoidance are possible, the overhead they require is so large that
they are impractical to implement [2]. Deadlock detection is a popular research topic in this area. Detection
can be carried out by a centralized control site, by distributed processes, or by a specially arranged
hierarchy of clusters.
The algorithms we have discussed solve the problem of deadlock in two phases: 1) they construct the WFG of
the system, and 2) they search it for cycles. Because there is no globally shared memory, designing these
algorithms is difficult: sites may report the existence of a global cycle after seeing segments of the cycle
at different instants, even though the segments never all existed simultaneously.
We have discussed four types of distributed deadlock detection algorithms: path-pushing, edge-chasing,
diffusion, and global state detection algorithms. In path-pushing algorithms, wait-for paths are sent between
deadlock detectors to construct the WFG. In edge-chasing algorithms, probes are sent along the edges of the
WFG; if a probe returns to its initiator, a cycle is detected. In diffusion algorithms, QUERYs and REPLYs
are sent along the paths of the WFG; if an initiator receives REPLYs for all the QUERYs it sent, it declares
itself deadlocked. In the global state detection algorithm, processes record their local snapshot when they
receive the first FLOOD message of the initiator's snapshot. The global snapshot is reduced as processes that
are not idle reply with ECHOs. The initiator distributes a weight to its dependent set. When all the weight it
distributed has been returned, it knows that the algorithm has terminated. If it has not been reduced by that
time, it declares itself deadlocked.
While these algorithms represent some of the most innovative ways to deal with the problem, they are by no
means the only types of algorithm that have been studied. Although a large number of deadlock detection
algorithms have been published, there are areas that these papers have not yet addressed. Future research
should focus on the performance of the algorithms and on proofs of their correctness.
REFERENCES
1. Kshemkalyani, A. D., and Singhal, M., “Efficient Detection and Resolution of Generalized Distributed
Deadlocks,” IEEE Trans. on Software Engineering, January 1994.
2. Singhal, M., and Shivaratri, N. G., “Distributed Deadlock Detection,” in Advanced Concepts in
Operating Systems, McGraw-Hill, Inc., 1994, pp. 151-177.
3. Chandy, K. M., Misra, J., and Haas, L. M., “Distributed Deadlock Detection,” ACM Trans. on
Computer Systems, May 1983.
4. Obermarck, R., “Distributed Deadlock Detection Algorithm,” ACM Trans. on Database Systems, June
1982.
5. Singhal, M., “Deadlock Detection in Distributed Systems,” IEEE Computer, November 1989.
6. Choudhary, A. N., Kohler, W. H., and Stankovic, J. A., “A Modified Priority Based Probe Algorithm
for Distributed Deadlock Detection and Resolution,” IEEE Trans. on Software Engineering, January
1989.
7. Wojcik, B. E., and Wojcik, Z. M., “Sufficient Condition for a Communication Deadlock and
Distributed Deadlock Detection,” IEEE Trans. on Software Engineering, December 1989.
8. Knapp, E., “Deadlock Detection in Distributed Databases,” ACM Computing Surveys, December 1987.
9. Kshemkalyani, A. D., Raynal, M., and Singhal, M., “An Introduction to Snapshot Algorithms in Distributed
Computing,” Distributed System Engineering, 1995.