Mutual Exclusion in Distributed Systems

Mutual exclusion for distributed systems
- There is no shared memory (no shared variables), so mutual exclusion must be achieved through message passing.
- Mutual exclusion algorithms for distributed systems fall into two classes: non-token-based and token-based.

Measuring performance
- Number of messages per CS (critical section) invocation.
- Synchronization delay: the time between one site leaving the CS and the next site entering it.
- Response time: the interval between a site sending out its request messages and the end of its CS execution.
- Throughput: the rate of CS executions.
- Performance is evaluated under low and high system load, and for the best and worst cases.

Solving the distributed mutual exclusion problem: the control-site algorithm
- A single control site serializes all CS requests.
- Drawbacks: single point of failure, uneven work load, high synchronization delay, low system throughput, and high, uneven network traffic.

Non-Token-Based Algorithms
- The concept of an information structure forms the basis for unifying the different non-token-based mutual exclusion algorithms.
- The information structure defines the data structure needed at a site to record the status of other sites; a site uses this information when invoking mutual exclusion.
- The information structure at site Si consists of three sets:
  - Request set Ri
  - Inform set Ii
  - Status set Sti
- A site must obtain permission from all sites in its request set before entering the CS.
- A site must inform all sites in its inform set of its status change when it starts waiting to enter the CS or when it exits the CS.
- The status set Sti contains the ids of the sites for which Si maintains status information; if Si ∈ Ij, then Sj ∈ Sti.
- A site maintains a variable CSSTAT, holding its knowledge of the status of the CS, and a queue of REQUEST messages, ordered by timestamp, for which no GRANT message has been sent.

Correctness condition
To guarantee mutual exclusion, the information structure of the sites in the generalized algorithm must satisfy the following. If ∀i: 1 ≤ i ≤ N :: Si ∈ Ii, then these conditions are necessary and sufficient to guarantee mutual exclusion:
- ∀i: 1 ≤ i ≤ N :: Ii ⊆ Ri
- ∀i,j: 1 ≤ i,j ≤ N :: (Ii ∩ Ij ≠ ∅) ∨ (Si ∈ Rj ∧ Sj ∈ Ri)
  (for every two sites, either they request permission from each other or they request permission from a common site)

Lamport's algorithm
- Uses Lamport's logical clocks.
- Every site Si keeps a request_queue_i containing mutual exclusion requests ordered by their timestamps.
- Every site has a request set: ∀i: 1 ≤ i ≤ N :: Ri = {S1, S2, …, SN}.

Requesting the CS by Si:
- Si sends a REQUEST(tsi, i) message to all sites in Ri and places the request on request_queue_i.
- When Sj receives REQUEST(tsi, i), it sends a timestamped REPLY message back to Si and places the request on request_queue_j.

Entering the CS by Si:
- [L1] Si has received a message with a timestamp larger than (tsi, i) from every other site (ensuring that everyone has received, and replied to, its request), and
- [L2] Si's request is at the top of request_queue_i.

Releasing the CS by Si:
- Si removes its request from the top of request_queue_i and sends a timestamped RELEASE message to all sites in its request set.
- When Sj receives the RELEASE message, it removes Si's request from request_queue_j.

The proof that Lamport's algorithm achieves mutual exclusion is by contradiction.

Performance:
- Requires 3(N-1) messages per CS invocation, where N is the number of sites in the request set.
- Synchronization delay is T, where T is the average message delay.
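To make the message flow concrete, the following is a minimal, single-process Python sketch of Lamport's algorithm. The Site class, the in-memory network of inbox lists, and the round-based driver are simulation assumptions added for illustration, not part of the algorithm above; condition [L1] is checked through the REPLY messages, which (over FIFO channels) carry timestamps larger than the request they answer.

```python
import heapq

class Site:
    """One site running Lamport's mutual exclusion (simulation sketch)."""
    def __init__(self, sid, all_ids, network):
        self.sid = sid
        self.others = [i for i in all_ids if i != sid]
        self.network = network        # site id -> FIFO inbox (list of messages)
        self.clock = 0                # Lamport logical clock
        self.request_queue = []       # min-heap of (timestamp, site id)
        self.replies = set()          # sites that have replied to our request
        self.pending = None           # our own (ts, sid) request, if outstanding

    def _send(self, dest, kind, ts, origin):
        self.clock += 1               # clock tick for the send event
        self.network[dest].append((kind, ts, origin))

    def request_cs(self):
        """Timestamp a request, queue it locally, send REQUEST to all others."""
        self.clock += 1
        self.pending = (self.clock, self.sid)
        heapq.heappush(self.request_queue, self.pending)
        self.replies.clear()
        for j in self.others:
            self._send(j, "REQUEST", self.pending[0], self.sid)

    def receive(self, kind, ts, origin):
        self.clock = max(self.clock, ts) + 1
        if kind == "REQUEST":
            heapq.heappush(self.request_queue, (ts, origin))
            self._send(origin, "REPLY", self.clock, self.sid)
        elif kind == "REPLY":
            self.replies.add(origin)
        elif kind == "RELEASE":
            self.request_queue = [r for r in self.request_queue if r[1] != origin]
            heapq.heapify(self.request_queue)

    def can_enter_cs(self):
        # [L1]: a REPLY (timestamped after our request) from every other site,
        # [L2]: our own request is at the head of the local request queue.
        return (self.pending is not None
                and self.replies == set(self.others)
                and self.request_queue[0] == self.pending)

    def release_cs(self):
        """Drop our request locally and notify everyone else via RELEASE."""
        self.request_queue = [r for r in self.request_queue if r != self.pending]
        heapq.heapify(self.request_queue)
        self.pending = None
        for j in self.others:
            self._send(j, "RELEASE", self.clock, self.sid)

# Tiny driver: three sites, S1 requests the CS, messages are delivered in rounds.
ids = [1, 2, 3]
net = {i: [] for i in ids}
sites = {i: Site(i, ids, net) for i in ids}
sites[1].request_cs()
while not sites[1].can_enter_cs():
    for i in ids:
        while net[i]:
            sites[i].receive(*net[i].pop(0))
print("S1 enters the CS")
sites[1].release_cs()
```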
Ricart-Agrawala Algorithm
- An improvement on Lamport's algorithm that requires fewer messages.

Requesting the CS by Si:
- Si sends timestamped REQUEST messages to all sites in its request set.
- A receiving site Sj sends a REPLY message to Si if Sj is neither requesting nor executing the CS, or if Sj is requesting but its own request has a larger timestamp than Si's; otherwise the reply is deferred.

Executing the CS by Si:
- Si enters the CS after it has received REPLY messages from all sites in its request set.

Releasing the CS by Si:
- Si sends REPLY messages to all the deferred requests.

The proof that the Ricart-Agrawala algorithm achieves mutual exclusion is also by contradiction.

Performance:
- Requires 2(N-1) messages per CS invocation, where N is the number of sites in the request set.
- Synchronization delay is T, where T is the average message delay.

Maekawa's (square root) Algorithm
- A radically different approach to distributed mutual exclusion: a site does not request permission from every other site, only from a subset of the sites.
- A site can have only one outstanding REPLY message at any time, so it grants permission to an incoming request only if it has not already granted permission to some other site.
- Construction of the request sets:
  - ∀i ∀j: i ≠ j, 1 ≤ i, j ≤ N :: Ri ∩ Rj ≠ ∅
  - ∀i: 1 ≤ i ≤ N :: Si ∈ Ri
  - ∀i: 1 ≤ i ≤ N :: |Ri| = K
  - Any site Sj is contained in exactly K of the Ri's.
  - |Ri| = √N (since N = K(K-1)+1, K ≈ √N).

Requesting the CS by Si:
- Si sends REQUEST(i) messages to all sites in its request set Ri.
- A receiving site Sj sends a REPLY(j) message to Si if it has not sent a REPLY message to any site since it received the last RELEASE message; otherwise, it queues up the REQUEST for later consideration.

Executing the CS by Si:
- Si enters the CS after it has received REPLY messages from all sites in its request set.

Releasing the CS by Si:
- Si sends a RELEASE(i) message to all sites in Ri.
- When a site receives a RELEASE(i) message, it sends a REPLY message to the next site waiting in its queue.

The proof that Maekawa's algorithm achieves mutual exclusion is again by contradiction.

Performance:
- Requires 3√N messages per CS execution, where N is the number of sites.
- Synchronization delay is 2T, where T is the average message delay.
- Prone to deadlock: a site can be exclusively locked by other sites, and requests are not prioritized by their timestamps.

Token-Based Algorithms
- Unlike the non-token-based algorithms, token-based algorithms are free from starvation and deadlock.

Suzuki-Kasami algorithm
- The CS is entered by the site holding the token, which is passed around among the sites.
- A site that does not hold the token and wants to enter the CS broadcasts a REQUEST message for the token to all other sites.
- Outdated REQUEST(j, n) messages are distinguished from current ones using an array RNi[1..N], where RNi[j] is the largest sequence number received so far in a REQUEST message from Sj. When site Si receives a REQUEST(j, n) message, it sets RNi[j] := max(RNi[j], n).
- The token consists of a queue Q of requesting sites and an array of integers LN[1..N], where LN[j] is the sequence number of the request that site Sj executed most recently. After executing the CS, Si sets LN[i] := RNi[i] to indicate that its request has been executed.

Requesting the CS by Si (when it does not hold the token):
- Si increments RNi[i] and sends REQUEST(i, sn) messages to all sites, where sn = RNi[i].
- A receiving site Sj sets RNj[i] := max(RNj[i], sn); if Sj holds the token and does not need it, it sends the token to Si provided RNj[i] = LN[i] + 1.

Executing the CS by Si:
- Si executes the CS when it possesses the token.

Releasing the CS by Si:
- Si sets LN[i] := RNi[i].
- For every Sj whose id is not in the token queue, Si appends Sj's id to the token queue if RNi[j] = LN[j] + 1.
- If the token queue is nonempty after this update, Si deletes the top site id from the queue and sends the token to that site.

Performance:
- Very simple yet very efficient: requires only 0 or N messages per CS execution, where N is the number of sites.
- Synchronization delay is 0 or T, where T is the average message delay.
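To illustrate the RN, LN, and token-queue bookkeeping just described, here is a minimal Python sketch. Message passing is abstracted away: methods return the token instead of sending it over the network, and the names Token, SKSite, new_request, on_request, and release_cs are illustrative assumptions rather than part of the original presentation.

```python
from collections import deque

class Token:
    """The single token: a FIFO queue Q of requesting site ids plus LN."""
    def __init__(self, n):
        self.Q = deque()
        self.LN = [0] * (n + 1)   # LN[j] = seq. number of Sj's last executed request

class SKSite:
    def __init__(self, sid, n):
        self.sid = sid
        self.RN = [0] * (n + 1)   # RN[j] = largest seq. number seen from Sj
        self.token = None         # the Token object, when this site holds it

    def new_request(self):
        """Site wants the CS and lacks the token: bump its own sequence number
        and return (sid, sn), to be broadcast as REQUEST(sid, sn)."""
        self.RN[self.sid] += 1
        return self.sid, self.RN[self.sid]

    def on_request(self, j, sn):
        """Handle REQUEST(j, sn). Returns the token if it should be handed to Sj,
        i.e. the request is not outdated and this site holds the token.
        (The full guard also requires that this site is not using or waiting
        for the CS; the caller is assumed to ensure that here.)"""
        self.RN[j] = max(self.RN[j], sn)
        if self.token is not None and self.RN[j] == self.token.LN[j] + 1:
            tok, self.token = self.token, None
            return tok
        return None

    def release_cs(self):
        """Leave the CS: record the executed request in LN, enqueue every site
        with an outstanding request, and pass the token to the head of Q.
        Returns (next_site_id, token) if the token moves on, else None."""
        tok = self.token
        tok.LN[self.sid] = self.RN[self.sid]
        for j in range(1, len(self.RN)):
            if j != self.sid and j not in tok.Q and self.RN[j] == tok.LN[j] + 1:
                tok.Q.append(j)
        if tok.Q:
            nxt = tok.Q.popleft()
            self.token = None
            return nxt, tok
        return None

# Example: S2 requests the CS while S1 holds an idle token.
n = 3
s1, s2 = SKSite(1, n), SKSite(2, n)
s1.token = Token(n)
j, sn = s2.new_request()         # S2 broadcasts REQUEST(2, 1)
s2.token = s1.on_request(j, sn)  # S1 is idle, so it hands the token to S2
assert s2.token is not None      # S2 may now execute the CS, then release_cs()
```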
Raymond's tree-based algorithm
- The sites are logically arranged as a directed tree: the edges are directed toward the site that currently holds the token (the root of the tree).
- Every site has a local variable holder that points to an immediate neighbor on the directed path to the root node.
- Every site keeps a FIFO queue, request_q, which stores the requests of neighboring sites that have sent a request to this site but have not yet been sent the token.

Requesting the CS:
- When a site wants to enter the CS, it sends a REQUEST message to the node along the directed path to the root, provided it does not hold the token and its request_q is empty. It then adds its request to its request_q.
- When a site on the path receives this message, it places the REQUEST in its request_q and sends a REQUEST message along the directed path to the root, provided it has not already sent out a REQUEST message on its outgoing edge for a previously received REQUEST in its request_q.
- When the root site receives a REQUEST message, it sends the token to the site from which it received the REQUEST and sets its holder variable to point at that site.
- When a site receives the token, it deletes the top entry from its request_q, sends the token to the site indicated in that entry, and sets its holder variable to point at that site. If the request_q is nonempty at this point, the site also sends a REQUEST message to the site pointed at by its holder variable.

Executing the critical section:
- A site enters the CS when it receives the token and its own entry is at the top of its request_q. In this case, the site deletes the top entry from its request_q and enters the CS.

Releasing the critical section:
- If its request_q is nonempty, the site deletes the top entry from its request_q, sends the token to that site, and sets its holder variable to point at that site. If the request_q is still nonempty, the site also sends a REQUEST message to the site pointed at by its holder variable.

Performance:
- The average message complexity is O(log N), because the average distance between two nodes in a tree with N nodes is O(log N).
- The synchronization delay is (T log N)/2, because the average distance between two sites that successively execute the CS is (log N)/2.
- The algorithm uses a greedy strategy.
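The following is a minimal Python sketch of the holder / request_q mechanics of Raymond's algorithm. Sites invoke each other's methods directly in place of sending REQUEST and token messages along tree edges, and the names RaymondSite, on_request, on_token, and pass_token are illustrative assumptions; a real implementation would exchange asynchronous messages between neighbors.

```python
from collections import deque

class RaymondSite:
    def __init__(self, sid):
        self.sid = sid
        self.holder = self        # self when this site holds the token
        self.request_q = deque()  # FIFO queue of neighbors (or self) wanting the token
        self.in_cs = False

    def request_cs(self):
        """This site wants the CS: enqueue its own request and, if it has not
        already asked upstream, send a REQUEST toward the token holder."""
        if self.holder is self and not self.request_q:
            self.in_cs = True     # token is here and nobody else is waiting
            return
        must_forward = not self.request_q
        self.request_q.append(self)
        if must_forward and self.holder is not self:
            self.holder.on_request(self)

    def on_request(self, from_site):
        """A neighbor requests the token on behalf of its subtree."""
        must_forward = not self.request_q
        self.request_q.append(from_site)
        if self.holder is self:
            if not self.in_cs:
                self.pass_token()
        elif must_forward:        # forward only once per batch of queued requests
            self.holder.on_request(self)

    def pass_token(self):
        """Serve the head of request_q: enter the CS ourselves or send the token."""
        nxt = self.request_q.popleft()
        if nxt is self:
            self.in_cs = True
            return
        self.holder = nxt
        nxt.on_token()
        if self.request_q:        # still have waiters: ask for the token back
            self.holder.on_request(self)

    def on_token(self):
        """The token arrives along the tree edge."""
        self.holder = self
        self.pass_token()

    def release_cs(self):
        self.in_cs = False
        if self.request_q:
            self.pass_token()

# Example: a three-site chain A - B - C with the token initially at A.
a, b, c = RaymondSite(1), RaymondSite(2), RaymondSite(3)
b.holder, c.holder = a, b       # holder pointers are directed toward A
c.request_cs()                  # REQUESTs travel C -> B -> A, the token travels back
assert c.in_cs
c.release_cs()
```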