Deadlock Detection in Distributed Systems

Deadlock Model for Distributed Systems
- Systems have only reusable resources.
- Processes are allowed only exclusive access to resources.
- There is only one copy of each resource.

Simplified Deadlock Model
- Because this model has only reusable, single-unit resources, a cycle is a necessary and sufficient (iff) condition for deadlock.
- Wait-For Graph (WFG): graph edges are requests for resources, not data or control flow.
- Any WFG reduction will have the same result as any other.
- A knot in the resource allocation graph indicates deadlock.
- A knot is a nonempty set of nodes such that, from every node in the set, all nodes in the set and only nodes in the set are reachable.
- There are no terminating sink nodes in the graph or subgraph reachable from the given node.

Deadlock Handling in Distributed Systems
- More complicated in distributed systems.
- No single site has accurate knowledge of the current state of the system.
- Intersite communication involves unpredictable delays.

Deadlock Prevention
- A process acquires all needed resources simultaneously before execution.
- Preemption of a process that holds the needed resource.
- Decreases system concurrency.
- Deadlock can occur in the resource-acquiring phase.
  - The problem can be solved by forcing processes to acquire needed resources one by one. This will further reduce efficiency and concurrency.

Deadlock Avoidance
- Resources are granted to processes at various sites only if the resulting global system state is safe.
- This is not practical in distributed systems.
- It is impractical for every site to maintain information on the global state of the system: doing so requires extensive communication between sites, and that communication brings delays and inaccuracies.
- The larger the system, the more computationally intensive it is to compute the global state.

Deadlock Detection
- Look for a cycle, which is the necessary and sufficient condition for deadlock in the distributed model.
- Once a cycle is detected, it must be broken to resolve the deadlock.
- This is the most common and practical way of dealing with deadlock in distributed systems.

Control Organizations for Distributed Deadlock Detection

Centralized control
- A designated site, known as the control site, is responsible for constructing and maintaining the global WFG and searching it for cycles.
- Single point of failure.
- High message traffic to and from the control site.
- Message traffic is independent of the rate of deadlock formation!
- All sites request and release resources, including local resources, by sending request-resource and release-resource messages to the control site.
- The control site updates its WFG after receiving messages from other sites and checks for deadlock.

The Ho-Ramamoorthy Algorithm

The two-phase algorithm
- Every site maintains a status table containing the status of all the processes initiated at that site:
  - resources locked
  - resources being waited for
- Periodically, a designated site requests the status table from all sites, constructs a WFG from the information received, and searches it for cycles.
- If a cycle is detected, the designated site requests status tables from all the sites a second time and constructs a new WFG using ONLY those transactions that are common to both reports, to see whether the same cycle is detected again.
- If it is, the designated site declares the system deadlocked.
- By getting two reports, the designated site reduces the probability of getting an inconsistent state of the system and reporting a false deadlock.

The one-phase algorithm
- Each site maintains two status tables:
  - A resource status table.
  - A process status table.
- Periodically, a designated site requests both tables from every site, constructs a WFG using only those transactions for which the entry in the resource table matches the corresponding entry in the process table, and searches the WFG for cycles.
- No false deadlocks are detected.
- Comparison of the two-phase and one-phase algorithms:
  - One-phase is faster.
  - One-phase requires fewer messages.

Distributed control
- All sites cooperate to detect a cycle in the state graph, which is likely to be distributed over several sites of the system.
- Deadlock detection is initiated whenever a process is forced to wait.

Chandy-Misra-Haas’s algorithm, an edge-chasing algorithm
- Uses a special message known as the probe(i,j,k).
- probe(i,j,k) denotes that deadlock detection was initiated by process Pi and that the message is being sent by the home site of process Pj to the home site of process Pk.
- A probe message travels along the edges of the global transaction wait-for (TWF) graph.
- A deadlock is detected if a probe message returns to its initiating process.
- A boolean array, dependent_i, is maintained for each process Pi. If Pi knows that Pj is dependent on it, dependent_i(j) is set to true; otherwise it is false.

Chandy-Misra-Haas’s algorithm, a diffusion-computation-based algorithm
- A process determines whether it is deadlocked by initiating a diffusion computation.
- Two types of messages are used: query(i,j,k) and reply(i,j,k).
- A blocked process initiates deadlock detection by sending query messages to all the processes from which it is waiting to receive a message (its dependent set).
- Active processes discard query and reply messages.
- Upon receiving the first query message initiated by Pi, the blocked process Pk propagates the query to all the processes in its dependent set and sets a local variable num_k(i) to the number of query messages sent.
- If the query message is not the first one initiated by Pi that Pk has received, Pk replies to it only if it has been continuously blocked since the first query. Otherwise, it discards the query.
- Pk replies to the first query only after it has received a reply to every query message that it sent: for every reply received, Pk decrements num_k(i), and when num_k(i) = 0 it replies to that first query.
- An initiator detects a deadlock when it receives reply messages to all the query messages it had sent out.

Hierarchical control
- Sites are arranged in a hierarchy, and each site is responsible for detecting deadlocks involving only its child sites.
- The Menasce-Muntz algorithm

The Ho-Ramamoorthy algorithm
- Sites are grouped into disjoint clusters.
- Periodically, a site is chosen as the central site, which dynamically chooses a control site for each cluster.
- A control site collects status tables from all the sites in its cluster and applies the one-phase deadlock detection algorithm to detect all deadlocks involving only transactions within the cluster.
- The central site uses the information from the control sites to detect any deadlocks between the clusters.

Performance
- The number of messages exchanged may not be a true indicator of communication overhead.
- Deadlock persistence time vs. message traffic.
- Storage overhead to store deadlock information.
- Processing overhead to search for cycles and resolve deadlocks.
- False deadlocks.

Deadlock resolution
- A process that detects a deadlock does not know all the processes involved in the deadlock.
- Two or more processes may detect the same deadlock, which can result in unnecessary abortion of processes.
- Solution: assign unique priorities to processes.
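
Example: searching a WFG for a cycle. Under centralized control, the control site builds a global WFG and searches it for cycles. The sketch below shows one way that search could look, using a depth-first traversal. The function name `find_cycle`, the dictionary representation of the graph, and the process labels are assumptions made for illustration; the slides do not prescribe a particular search method.

```python
from typing import Dict, List, Optional


def find_cycle(wfg: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle of the wait-for graph as a list of processes, or None.

    wfg maps each process to the processes it is waiting for
    (hypothetical representation, chosen for this sketch).
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    colour = {p: WHITE for p in wfg}
    for targets in wfg.values():
        for t in targets:
            colour.setdefault(t, WHITE)
    path: List[str] = []

    def visit(p: str) -> Optional[List[str]]:
        colour[p] = GREY
        path.append(p)
        for q in wfg.get(p, []):
            if colour[q] == GREY:
                # q is already on the current path, so path[q..] is a cycle.
                return path[path.index(q):]
            if colour[q] == WHITE:
                cycle = visit(q)
                if cycle is not None:
                    return cycle
        path.pop()
        colour[p] = BLACK
        return None

    for p in list(colour):
        if colour[p] == WHITE:
            cycle = visit(p)
            if cycle is not None:
                return cycle
    return None


# P1 -> P2 -> P3 -> P1 is a cycle; P4 merely waits on a deadlocked process.
deadlocked = {"P1": ["P2"], "P2": ["P3"], "P3": ["P1"], "P4": ["P1"]}
# P3 is a sink (it waits for nothing), so no cycle exists.
free = {"P1": ["P2"], "P2": ["P3"], "P3": []}

print(find_cycle(deadlocked))
print(find_cycle(free))
```

In the single-unit, exclusive-access model of these slides, a cycle found this way is enough to declare deadlock. With stale or inconsistent status reports the graph itself may be wrong, which is the false-deadlock concern raised for the two-phase algorithm.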
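
Example: edge-chasing with probes. The probe mechanism of the edge-chasing algorithm can be simulated in a few lines. This is a simplified sketch, not the full algorithm: it assumes every process lives on its own site (so every wait-for edge crosses sites and carries a probe), it replaces real message passing with an in-memory queue, and the names `edge_chasing_detects_deadlock` and `waits_for` are hypothetical.

```python
from collections import deque
from typing import Deque, Dict, List, Set, Tuple


def edge_chasing_detects_deadlock(
    waits_for: Dict[int, List[int]], initiator: int
) -> bool:
    """Simulate probe(i, j, k) messages; True if a probe returns to Pi.

    waits_for[p] lists the processes that Pp is blocked on
    (an empty or missing list means Pp is active).
    """
    # dependent[k] holds every i for which dependent_k(i) is true.
    dependent: Dict[int, Set[int]] = {}
    # The initiator sends one probe along each of its outgoing edges.
    probes: Deque[Tuple[int, int, int]] = deque(
        (initiator, initiator, k) for k in waits_for.get(initiator, [])
    )
    while probes:
        i, j, k = probes.popleft()
        if not waits_for.get(k):
            continue  # Pk is active: it discards the probe
        if i in dependent.setdefault(k, set()):
            continue  # Pk has already handled a probe from this initiator
        dependent[k].add(i)
        if k == i:
            return True  # the probe came back to its initiator: deadlock
        for n in waits_for[k]:
            probes.append((i, k, n))  # forward along Pk's outgoing edges
    return False


cycle = {1: [2], 2: [3], 3: [1]}
chain = {1: [2], 2: [3], 3: []}
print(edge_chasing_detects_deadlock(cycle, initiator=1))
print(edge_chasing_detects_deadlock(chain, initiator=1))
```

In this sketch, a process that waits on a deadlocked cycle without being part of it does not see its own probe return, so only a process that lies on the cycle declares the deadlock.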