hw2sol

CS 6378, Spring 2005
Homework 2
(Due on March 24, 2005)
1. Raymond’s tree-based algorithm for distributed mutual exclusion does not try to order critical
section requests based on time. So, the algorithm may be unfair. A counter-argument to this
statement of unfairness is that the hierarchical structure of the tree built by the algorithm takes
care of request ordering in an implicit way. Explain whether Raymond’s distributed mutual
exclusion algorithm is fair.
Note: if you say the algorithm is fair, you should explain the concept that ensures ALL the
requests will be fairly treated. If you say the algorithm is unfair, you should give an example
where a request or a set of requests will be treated unfairly. Raymond’s algorithm is given at the
end of the question paper.
Answer:
Raymond’s algorithm can treat some requests unfairly. This happens when the tree is unbalanced.
For example, assume that the tree has 10 levels on the left-hand side and only 1 level on the
right-hand side. A request from a site on the right-hand side can reach the token holder earlier
than a request made by a site on the left-hand side (say, at the 10th level), even though the two
requests might have been made at the same time.
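The unbalanced-tree example can be sketched in a few lines of Python. This is only an illustration under assumptions: the site names (`T`, `R1`, `L1` to `L10`) and the `holder` dictionary are hypothetical, and the sketch counts hops only, ignoring the request queues that Raymond’s algorithm keeps at each site.

```python
def hops_to_holder(holder, site):
    """Count the edges a request from `site` crosses before reaching the token holder."""
    hops = 0
    while holder[site] != site:  # the token holder points to itself
        site = holder[site]
        hops += 1
    return hops


# Hypothetical unbalanced tree: the root "T" holds the token.
holder = {"T": "T", "R1": "T"}  # right-hand side: a single level
parent = "T"
for level in range(1, 11):      # left-hand side: a chain of 10 levels
    holder[f"L{level}"] = parent
    parent = f"L{level}"

print(hops_to_holder(holder, "R1"))   # request from the shallow side
print(hops_to_holder(holder, "L10"))  # request from the deep side
```

If both sites issue their requests at the same instant and every hop takes about the same time, the request from `R1` is queued at the token holder well before the one from `L10`.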
2. The initialization of SV (status vector) in Singhal’s algorithm is done in a “staircase pattern”:
the number of Rs in each site decreases from left to right (with sites arranged in decreasing order).
This initialization pattern ensures that a token request will always be received by a group of nodes
among which the token resides.
Will this staircase pattern be preserved during the numerous token exchanges? Why?
[Figure: Status Vector SV in each site Si, with columns for sites Sn, ..., S3, S2, S1 and entries R, N, and H.]
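A minimal Python sketch of the staircase initialization follows. It assumes the usual indexing in which site Si marks every lower-numbered site as R and all others as N, and that S1 starts out holding the token (H); the function names are hypothetical. The second function checks the property the question refers to: for any two sites, at least one of them has the other marked R.

```python
def staircase_init(n):
    """Return {site: status vector}; sv[i][j] is site i's view of site j."""
    sv = {}
    for i in range(1, n + 1):
        # Site i marks every lower-numbered site R and all others N.
        sv[i] = {j: ("R" if j < i else "N") for j in range(1, n + 1)}
    sv[1][1] = "H"  # assumption: site 1 initially holds the token
    return sv


def r_count(vector):
    """Number of R entries in one site's status vector."""
    return sum(1 for state in vector.values() if state == "R")


def pairwise_covered(sv):
    """True if, for every pair of sites, at least one has the other marked R."""
    sites = list(sv)
    return all(sv[i][j] == "R" or sv[j][i] == "R"
               for i in sites for j in sites if i != j)


sv = staircase_init(5)
print([r_count(sv[i]) for i in (5, 4, 3, 2, 1)])  # Rs per site, left to right
print(pairwise_covered(sv))
```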
Answer:
When there is no “left over work”, i.e., no pending CS requests, the system will satisfy the
staircase pattern. The number of Rs in each site still decreases from left to right, although the
order of the sites changes. For example, assume that requests are made in the order S3, S5, and
lastly S2. The updated staircase pattern is shown below: 4 Rs in S4, 3 Rs in S1, 2 Rs in S3, 1 R in
S5, and H in S2.
[Figure: updated Status Vector SV in each site Si, with columns for sites Sn, S5, S4, S1, S3, and S2.]
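The R counts quoted in the answer can be checked with a short Python sketch. It is a simplification under stated assumptions: it only verifies that the counts form the sequence n-1, ..., 1, 0 once the sites are reordered, not the individual vector entries, and the function name is hypothetical.

```python
def staircase_order(r_counts):
    """Order sites by decreasing number of Rs.

    Returns the site order if the counts are exactly n-1, ..., 1, 0,
    otherwise None.
    """
    order = sorted(r_counts, key=r_counts.get, reverse=True)
    counts = [r_counts[site] for site in order]
    expected = list(range(len(order) - 1, -1, -1))
    return order if counts == expected else None


# R counts from the answer above; S2 holds the token and has no Rs.
example = {"S4": 4, "S1": 3, "S3": 2, "S5": 1, "S2": 0}
print(staircase_order(example))
```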
3. We saw that some deadlock detection algorithms may detect false deadlocks, i.e., pronounce a
deadlock where none actually exists. Do you think the diffusion-based deadlock detection algorithm
will detect false deadlocks? Why?
Answer:
A false deadlock occurs when the algorithm reports a deadlock although none exists. This can
happen when there is a time interval between decision making and the response given, e.g., in the
Ho-Ramamoorthy approach.
In the diffusion-based approach, decision making is done in a distributed manner. A blocked
process sends a Reply in two cases: (i) the query it received is not an engaging query, or (ii) it
has collected as many replies as there are processes in its dependent set. There is no time lag
between the decision to forward a reply and the receipt of the other replies or of the
non-engaging query. This forwarding of reply messages continues in a distributed manner until the
replies reach the initiator. Hence, the algorithm does not detect false deadlocks.
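The query and reply rules described above can be sketched as a small single-threaded simulation in Python. This is an illustrative sketch, not the full algorithm: it assumes a wait-for graph that does not change during the run, a single FIFO message queue in place of a real network, and hypothetical names (`detects_deadlock`, `dependents`). A process with an empty dependent set is treated as active and discards every message.

```python
from collections import deque


def detects_deadlock(dependents, initiator):
    """Diffusion computation for the OR model on a static wait-for graph."""
    engager = {}   # process -> sender of its engaging query
    pending = {}   # process -> replies still awaited
    queue = deque()

    def engage(process, sender):
        engager[process] = sender
        pending[process] = len(dependents[process])
        for target in sorted(dependents[process]):
            queue.append(("query", process, target))

    if not dependents[initiator]:
        return False  # the initiator is not blocked
    engage(initiator, None)
    while queue:
        kind, sender, receiver = queue.popleft()
        if not dependents[receiver]:
            continue  # active processes discard queries and replies
        if kind == "query":
            if receiver not in engager:
                engage(receiver, sender)  # engaging query: diffuse further
            else:
                queue.append(("reply", receiver, sender))  # not engaging: reply at once
        else:
            pending[receiver] -= 1
            if pending[receiver] == 0:
                if receiver == initiator:
                    return True  # a reply arrived for every query sent
                queue.append(("reply", receiver, engager[receiver]))
    return False


cycle = {"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}
escape = {"P1": {"P2"}, "P2": {"P3", "P4"}, "P3": {"P1"}, "P4": set()}
print(detects_deadlock(cycle, "P1"), detects_deadlock(escape, "P1"))
```

In the `escape` graph, P2 can still be unblocked by the active process P4, so P2 never collects all of its replies and the initiator never declares a deadlock.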
4. Non-token based mutual exclusion algorithms can potentially run into a problem when one of
the participating sites crashes (or the communication link to a site is cut). Will using a timeout
(for the expected message) be sufficient? Are there other aspects that might need attention in a
non-token based algorithm if such a fault occurs? If so, what are those aspects? (For simplicity,
consider Lamport’s or the Ricart-Agrawala algorithm. Maekawa’s may be more complex).
Also, how do you think a new site (or a recovering site) can join a non-token based mutual
exclusion algorithm and get its clock synchronized?
Answer:
One main goal of mutual exclusion is to ensure that at most one process is active in the CS.
A timeout is definitely a starting point, but it may not be sufficient. For example, suppose that
S1 does not receive an expected message from S2, i.e., S1’s timer for S2 runs out. S1 can guess
that S2 is down or that its communication link has snapped. However, if only S1 has lost contact
with S2 and the other sites can still communicate with S2, then there is a possibility of multiple
sites entering the CS.
Hence, the next important step for S1 is to make sure that the other sites “agree” or “get to know”
about S2.
(Note that the question does not ask for solutions, since the solutions may be complex. For
instance, a communication failure can result in a network partition, i.e., the formation of 2
different groups of sites.)
Regarding the introduction of a new site: one possibility is to have a “control” message to help
the site join the group. The control message will update the time stamp of the new site. Another
possibility is to let the new site just send its CS request (one disadvantage is that this request
might have the lowest time stamp and hence get the highest priority, but this might be acceptable
since the new site is making its first CS request). When it gets response messages back, the new
site’s clock will be updated accordingly.
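The second possibility relies on the standard Lamport clock update rule, sketched below in Python. The class name and the reply time stamps (41, 57, 48) are made up for illustration; only the rule "take the maximum of the local and received time stamps, then add one" is the standard part.

```python
class LamportClock:
    """Logical clock of one site."""

    def __init__(self, start=0):
        self.time = start

    def tick(self):
        """Advance the clock for a local event, such as sending a CS request."""
        self.time += 1
        return self.time

    def receive(self, message_time):
        """Merge the time stamp carried by an incoming message."""
        self.time = max(self.time, message_time) + 1
        return self.time


new_site = LamportClock()       # a joining site starts with a very low clock
request_ts = new_site.tick()    # its first request carries a low time stamp
for reply_ts in (41, 57, 48):   # hypothetical time stamps on the replies
    new_site.receive(reply_ts)
print(request_ts, new_site.time)
```

The first request goes out with a low time stamp, which is the priority side effect mentioned above; after the replies arrive, the new site’s clock is ahead of every time stamp it has seen.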
5. One way of optimizing non-token based algorithms is to reduce the number of messages
required per critical section invocation. For instance, Ricart-Agrawala algorithm reduces message
exchanges to 2*(N-1) and Maekawa tries to reduce N itself (by considering a subset of N as the
request set). How do you think token-based algorithms try to optimize?
Answer:
Token-based algorithms try to optimize the way the token holder is located.
Improving on the broadcast-oriented approach used by the Suzuki-Kasami algorithm, Singhal’s
heuristic algorithm tries to classify sites into possible token holders and non-token holders,
thereby reducing the number of token request messages communicated. Raymond’s algorithm
improves on it further by designating the token holder to be the root of a tree.
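A rough per-invocation message count makes the comparison concrete. The Python sketch below rests on simplifying assumptions: light load, a single outstanding request, and a requester that does not already hold the token. The function names and the sample values (16 sites, 5 sites marked R, 4 hops) are hypothetical.

```python
def ricart_agrawala(n):
    """N-1 requests plus N-1 replies."""
    return 2 * (n - 1)


def suzuki_kasami(n):
    """Request broadcast to the other N-1 sites, plus one token message."""
    return (n - 1) + 1


def singhal_heuristic(r_marked):
    """Requests only to the sites marked R, plus one token message."""
    return r_marked + 1


def raymond(hops):
    """Request travels `hops` edges to the holder; the token travels back."""
    return 2 * hops


n = 16
print(ricart_agrawala(n), suzuki_kasami(n), singhal_heuristic(5), raymond(4))
```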
6. We know that the 3-phase commit protocol (for fault tolerance) is not resilient to multiple site
failures. The reason is that all the sites may not reach the same conclusion (i.e., the same final
state) in the presence of multiple failures. However, not all cases of multiple failures lead to
inconsistent conclusions. E.g., failure of more than 1 cohort may not lead to inconsistent
conclusions.
Explain with reasons whether the following scenario is a case of multiple failures that can lead
to inconsistent conclusions: The coordinator fails in the state P1 after sending the Prepare
message to all cohorts. One of the cohorts fails in the state Wi (just before receiving the
Prepare message from the coordinator).
Answer:
It can lead to inconsistent conclusions. On recovery, the cohort that failed in state Wi may end
up in abort, while the coordinator that failed in state P1 may end up in commit.
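The inconsistency can be stated as a tiny Python check. The table below encodes only the two recovery outcomes named in the answer; it is not the full set of failure transitions of the protocol, and the names `RECOVERY_OUTCOME` and `is_consistent` are hypothetical.

```python
# Recovery outcomes taken from the answer above; other states are left out.
RECOVERY_OUTCOME = {
    ("coordinator", "P1"): "commit",
    ("cohort", "Wi"): "abort",
}


def is_consistent(failed_sites):
    """True if every recovering site reaches the same final state."""
    outcomes = {RECOVERY_OUTCOME[site] for site in failed_sites}
    return len(outcomes) == 1


scenario = [("coordinator", "P1"), ("cohort", "Wi")]
print(is_consistent(scenario))
```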
7. Consider the following wait-for graph, spread over 3 sites and following the OR request model.
Let P1 be the initiator of the diffusion computation algorithm for detecting deadlocks. Assuming
that the communication delays among the sites vary widely, what are the possible combinations of
reply messages received by P1?
[Figure: wait-for graph with processes P1 through P9 distributed over Site 1, Site 2, and Site 3.]
Answer:
1. Reply from P2
2. Replies from P4, P8
3. Reply from P8
4. Reply from P4