Reliability Evaluation of Distributed Systems and the File Allocation

Reliability Evaluation Techniques
• Reliability refers to the degree of tolerance against errors and
component failures in a system.
• A reliable system prevents loss of information even in the event of
component failures.
• The multiplicity of storage devices and processors in a distributed
system allows multiple copies of critical information to be maintained
within the system and important computations to be executed
redundantly, protecting them against catastrophic failures.
For example, if one of the processors fails, the computation can be completed
successfully at another processor; if one of the storage devices fails, the
information can still be retrieved from another storage device.
• Distributed computing systems have a potential for higher reliability
because a few parts of the system can be down without interrupting the
jobs of users who are using other parts of the system. This is in
contrast to a centralized system, where a failure of the central
processor shuts down the entire system.
For example, if a workstation fails in a distributed computing system based on the
workstation-server model, only the user of that workstation is affected.
• Higher reliability is an important reason for using distributed
computing systems for mission-critical applications whose
failure may be disastrous.
Ways to represent reliability
There are two ways to represent reliability:
1. Network point of view
2. Application point of view
Network Point of View
Given the link and node reliabilities for a specific network topology,
determine the probability of the network becoming partitioned.
There are two popular parameters used to evaluate network reliability:
1. Source-terminal reliability (STR)
This is the probability that a path exists between a given source
node, s, and a given destination node, d.
2. Computer Network Reliability (CNR)
This is the probability that a path exists between any source
node and any destination node (i.e. the probability that any
node in the network can communicate with any other node in
the network).
Application Point of View
Given the link and processing node reliabilities, and the distribution of
programs as well as data files, determine the probability of successfully
completing a specific application.
A popular parameter used to evaluate reliability from the application
execution standpoint is:
Distributed Program Reliability (DPR)
This is the probability that a program can successfully execute in a
distributed computing system (i.e. the program can always access the
data files it needs to successfully execute).
Computing the STR
Consider the network topology given below:
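[Figure: four-node network; node 1 is the source and node 3 the destination. Links and reliabilities: X1 (1-2) = 0.85, X2 (1-4) = 0.9, X3 (2-4) = 0.8, X4 (2-3) = 0.8, X5 (4-3) = 0.9.]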
Assumptions:
1. Each link is full-duplex when working.
2. Each link is either in a working state or in a failed state.
3. The reliability of every possible link is known.
4. The nodes are perfectly reliable. (This assumption is for
mathematical convenience only; the failure of a node can be
modeled by assuming the failure of all links that connect
to that node.)
Step 1:
Determine all possible paths from source node 1 to destination node 3.
The possible paths (in increasing order of cardinality) are: X1 X4, X2 X5,
X1 X3 X5, and X2 X3 X4. These are depicted below in order:
X1 X4
X2 X5
X1 X3 X5
X2 X3 X4
Step 2:
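Generate the reliability expression by making the path terms disjoint.
For each path, take every earlier path, delete the links it shares with
the current path, simplify the sum of what remains, and apply the zeta
operator Z (defined in the next section) to it. Pi is the reliability of
link Xi and Qi = 1 - Pi.
Term 1 (path X1 X4):
No earlier path.
P1 P4
Term 2 (path X2 X5):
Earlier path: X1 X4 (neither link is present in the second path X2 X5)
Z(X1 X4) = X1’ + X1 X4’
X2 X5 (X1’ + X1 X4’)
P2 P5 (Q1 + P1 Q4)
Term 3 (path X1 X3 X5):
Remainders of the earlier paths: X4 + X2
Z(X4 + X2) = X4’ X2’
X1 X3 X5 X4’ X2’
P1 P3 P5 Q4 Q2
Term 4 (path X2 X3 X4):
Remainders of the earlier paths: X1 + X5 + X1 X5
= X1 (1 + X5) + X5
= X1 + X5
Z(X1 + X5) = X1’ X5’
X2 X3 X4 X1’ X5’
P2 P3 P4 Q1 Q5
Hence,
STR = P1 P4 + P2 P5 (Q1 + P1 Q4) + P1 P3 P5 Q4 Q2 + P2 P3 P4 Q1 Q5
STR = 0.85 * 0.8 + 0.9 * 0.9 (0.15 + 0.85 * 0.2) + 0.85 * 0.8 * 0.9 * 0.2 *
0.1 + 0.9 * 0.8 * 0.8 * 0.15 * 0.1 = 0.96008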
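The STR value can be cross-checked numerically. The following is a minimal Python sketch, not part of the original method: it enumerates all 2^5 link states and adds up the probability of every state in which at least one of the four paths is fully working, then compares the result with the disjoint-term expression. The function names are illustrative.

```python
from itertools import product

# Link reliabilities P1..P5 as used in the numeric STR expression.
P = {"X1": 0.85, "X2": 0.9, "X3": 0.8, "X4": 0.8, "X5": 0.9}

# The four paths from node 1 to node 3 listed in Step 1.
PATHS = [{"X1", "X4"}, {"X2", "X5"}, {"X1", "X3", "X5"}, {"X2", "X3", "X4"}]


def str_by_enumeration(p, paths):
    """Sum the probability of every link state in which some path is fully up."""
    links = sorted(p)
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        up = {link for link, ok in zip(links, state) if ok}
        if any(path <= up for path in paths):
            prob = 1.0
            for link, ok in zip(links, state):
                prob *= p[link] if ok else 1.0 - p[link]
            total += prob
    return total


def str_by_disjoint_terms(p):
    """Evaluate the four disjoint terms derived in Step 2."""
    q = {k: 1.0 - v for k, v in p.items()}
    return (p["X1"] * p["X4"]
            + p["X2"] * p["X5"] * (q["X1"] + p["X1"] * q["X4"])
            + p["X1"] * p["X3"] * p["X5"] * q["X4"] * q["X2"]
            + p["X2"] * p["X3"] * p["X4"] * q["X1"] * q["X5"])


print(round(str_by_enumeration(P, PATHS), 5))  # 0.96008
print(round(str_by_disjoint_terms(P), 5))      # 0.96008
```

Exhaustive enumeration grows as 2^(number of links), so it is only practical as a check on small examples such as this one.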
The Zeta Operator
X1, X2, X3 are logical variables.
Z(X1) = X1’
Z(X1 + X2) = Z(X1) Z(X2) = X1’ X2’
Z(X1 X2) = Z(X1) + X1 Z(X2) = X1’ + X1 X2’
Z(X1 + X2 + X3) = Z(X1) Z(X2) Z(X3) = X1’ X2’ X3’
Z(X1 X2 X3) = Z(X1) + X1 Z(X2) + X1 X2 Z(X3)
= X1’ + X1 X2’ + X1 X2 X3’
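The two rules above can be sketched in Python. This is an illustrative sketch only: each term is represented as a dictionary mapping a variable name to True (uncomplemented) or False (complemented), and the helper names zeta_sum, zeta_product and probability are assumptions of this example, as are the reliability values.

```python
from math import prod


def zeta_sum(variables):
    """Z(X1 + X2 + ...) = X1' X2' ...: one term, every variable complemented."""
    return [{v: False for v in variables}]


def zeta_product(variables):
    """Z(X1 X2 ...) = X1' + X1 X2' + X1 X2 X3' + ...: mutually disjoint terms."""
    terms = []
    for i, v in enumerate(variables):
        term = {u: True for u in variables[:i]}  # earlier variables stay uncomplemented
        term[v] = False                          # the i-th variable is complemented
        terms.append(term)
    return terms


def probability(terms, p):
    """Probability of a sum of disjoint terms; p maps a variable to its reliability."""
    return sum(
        prod(p[v] if up else 1.0 - p[v] for v, up in term.items())
        for term in terms
    )


p = {"X1": 0.85, "X2": 0.9, "X3": 0.8}  # illustrative reliabilities
names = ["X1", "X2", "X3"]

print(zeta_product(names))
print(round(probability(zeta_product(names), p), 3))  # 0.388 = 1 - 0.85 * 0.9 * 0.8
print(round(probability(zeta_sum(names), p), 3))      # 0.003 = 0.15 * 0.1 * 0.2
```

Because the terms produced by zeta_product are mutually exclusive, their probabilities can simply be added, which is what makes the operator useful for building disjoint reliability expressions.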
Computing the CNR
Consider the network topology given below:
[Figure: four-node network with links X1 (1-2) = 0.85, X2 (1-4) = 0.9, X3 (2-4) = 0.8, X4 (2-3) = 0.8, X5 (4-3) = 0.9.]
Step 1:
Determine all minimum spanning trees (MSTs) for the given topology.
An MST is defined here as a set of links that connects every node in the network
without forming a cycle. For example, X1 X3 X4 is an MST, but X1 X2 X3 X5 is not,
because four links on four nodes must contain a cycle.
The MSTs (each contains three links) are:
MST1 = X1 X3 X4
MST2 = X1 X3 X5
MST3 = X2 X3 X4
MST4 = X2 X3 X5
MST5 = X1 X2 X5
MST6 = X1 X2 X4
MST7 = X2 X4 X5
MST8 = X1 X4 X5
Step 2:
Make the MSTs disjoint. For each MST, take every earlier MST, delete the links
it shares with the current MST, simplify the sum of what remains, and apply Z.
Term 1 (MST1 = X1 X3 X4)
Pr[MST1] = P1 P3 P4
Term 2 (MST2 = X1 X3 X5)
X4
Z(X4) = X4’
X1 X3 X5 X4’
Pr[MST2] = P1 P3 P5 Q4
Term 3 (MST3 = X2 X3 X4)
X1 + X1 X5
X1 (1 + X5)
X1
Z(X1) = X1’
X2 X3 X4 X1’
Pr[MST3] = P2 P3 P4 Q1
Term 4 (MST4 = X2 X3 X5)
X1 X4 + X1 + X4
X1 (X4 + 1) + X4
X1 + X4
Z(X1 + X4) = X1’ X4’
X2 X3 X5 X1’ X4’
Pr[MST4] = P2 P3 P5 Q1 Q4
Term 5 (MST5 = X1 X2 X5)
X3 X4 + X3 + X3 X4 + X3
X3 (X4 + 1 + X4 + 1)
X3
Z(X3) = X3’
X1 X2 X5 X3’
Pr[MST5] = P1 P2 P5 Q3
Term 6 (MST6 = X1 X2 X4)
X3 + X3 X5 + X3 + X3 X5 + X5
X3 (1 + X5 + 1 + X5) + X5
X3 + X5
Z(X3 + X5) = X3’ X5’
X1 X2 X4 X3’ X5’
Pr[MST6] = P1 P2 P4 Q3 Q5
Term 7 (MST7 = X2 X4 X5)
X1 X3 + X1 X3 + X3 + X3 + X1 + X1
X1 (X3 + X3 + 1 + 1) + X3 + X3
X1 + X3
Z(X1 + X3) = X1’ X3’
X2 X4 X5 X1’ X3’
Pr[MST7] = P2 P4 P5 Q1 Q3
Term 8 (MST8 = X1 X4 X5)
X3 + X3 + X2 X3 + X2 X3 + X2 + X2 + X2
X3 (1 + 1 + X2 + X2) + X2 + X2 + X2
X3 + X2
Z(X3 + X2) = X3’ X2’
X1 X4 X5 X3’ X2’
Pr[MST8] = P1 P4 P5 Q2 Q3
Hence,
CNR = \sum_{j=1}^{8} Pr[MST_j]
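As a numerical check, the eight disjoint terms can be evaluated with the link reliabilities used in the STR example (P1 = 0.85, P2 = 0.9, P3 = 0.8, P4 = 0.8, P5 = 0.9). The Python sketch below also recomputes the CNR by brute force, counting every link state whose working links contain at least one of the eight MSTs. The function names are illustrative.

```python
from itertools import product
from math import prod

P = {1: 0.85, 2: 0.9, 3: 0.8, 4: 0.8, 5: 0.9}  # reliability of X1..X5

# Term 1..Term 8 as (links that must work, links that must have failed).
TERMS = [
    ((1, 3, 4), ()),
    ((1, 3, 5), (4,)),
    ((2, 3, 4), (1,)),
    ((2, 3, 5), (1, 4)),
    ((1, 2, 5), (3,)),
    ((1, 2, 4), (3, 5)),
    ((2, 4, 5), (1, 3)),
    ((1, 4, 5), (2, 3)),
]


def cnr_from_terms(p, terms):
    """Sum of the disjoint terms: product of P for up links and Q for down links."""
    return sum(
        prod(p[i] for i in up) * prod(1.0 - p[i] for i in down)
        for up, down in terms
    )


def cnr_by_enumeration(p, trees):
    """Probability that the set of working links contains some spanning tree."""
    links = sorted(p)
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        up = {link for link, ok in zip(links, state) if ok}
        if any(set(tree) <= up for tree in trees):
            total += prod(p[l] if ok else 1.0 - p[l] for l, ok in zip(links, state))
    return total


trees = [up for up, _ in TERMS]
print(round(cnr_from_terms(P, TERMS), 5))      # 0.95386
print(round(cnr_by_enumeration(P, trees), 5))  # 0.95386
```

The two results agree, which confirms that the eight terms are mutually disjoint and together cover every connected state of the network.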
Computing the DPR
Consider the topology of the DCS. The distribution of files is shown in curly
braces. Program P is executed on node 1 and the files required for its
execution are f1, f2, f3. The link reliabilities are as shown.
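[Figure: six-node DCS topology (nodes 1 to 6) with links X1 to X8, their reliabilities, and the files stored at each node. File sets shown: {f1, f2}, {f1, f5}, {f2, f5}, {f3, f4}, {f1, f4}, {f3, f6}. Link reliabilities shown: 0.85, 0.9, 0.95, 0.8, 0.8, 0.9, 0.9, 0.9.]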
Step 1:
Determine all minimum file spanning trees (MFSTs) for the given
topology. An MFST is a minimal set of links that connects the node running
the program (node 1) to nodes that together hold all the files it needs
(f1, f2, f3). The MFSTs (in increasing order of cardinality) are:
MFST1 = X1 X3
MFST2 = X1 X4
MFST3 = X3 X4
MFST4 = X3 X5 X6
MFST5 = X1 X2 X6
MFST6 = X3 X5 X8 X7
MFST7 = X1 X2 X7 X8
Step 2:
Make the MFSTs disjoint.
Term 1
Pr[MFST1] = P1 P3
Term 2
X3
Z(X3) = X3’
Pr[MFST2] = P1 P4 Q3
Term 3
X1 + X1
X1
Z(X1) = X1’
Pr[MFST3] = P3 P4 Q1
Term 4
X1 + X1 X4 + X4
X1 (1 + X4) + X4
X1 + X4
Z(X1 + X4) = X1’ X4’
Pr[MFST4] = P3 P5 P6 Q1 Q4
Term 5
X3 + X4 + X3 X4 + X3 X5
X3 (1 + X4 + X5) + X4
X3 + X4
Z(X3 + X4) = X3’ X4’
Pr[MFST5] = P1 P2 P6 Q3 Q4
Term 6
X1 + X1 X4 + X4 + X6 + X1 X2 X6
X1 (1 + X4 + X2 X6) + X4 + X6
X1 + X4 + X6
Z(X1 + X4 + X6) = X1’ X4’ X6’
Pr[MFST6] = P3 P5 P7 P8 Q1 Q4 Q6
Term 7
X3 + X4 + X3 X4 + X3 X5 X6 + X6 + X3 X5
X3 (1 + X4 + X5 X6 + X5) + X4 + X6
X3 + X4 + X6
Z(X3 + X4 + X6) = X3’ X4’ X6’
Pr[MFST7] = P1 P2 P7 P8 Q3 Q4 Q6
Hence,
DPR = \sum_{j=1}^{7} Pr[MFST_j]
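The seven disjoint terms can be checked in the same way as for the STR and CNR. The Python sketch below uses hypothetical link reliabilities for X1 to X8, because the text does not say which value in the figure belongs to which link. Only the MFST list and the complemented links of each term are taken from the derivation above. The brute-force function adds up the probability of every link state in which at least one MFST is fully working.

```python
from itertools import product
from math import prod

# Hypothetical reliabilities for X1..X8 (placeholders, not read from the figure).
P = {1: 0.9, 2: 0.85, 3: 0.8, 4: 0.9, 5: 0.95, 6: 0.8, 7: 0.9, 8: 0.9}

# MFST1..MFST7 from Step 1.
MFSTS = [(1, 3), (1, 4), (3, 4), (3, 5, 6), (1, 2, 6), (3, 5, 7, 8), (1, 2, 7, 8)]

# Links that must have failed in Term 1..Term 7 (the complemented variables).
DOWN = [(), (3,), (1,), (1, 4), (3, 4), (1, 4, 6), (3, 4, 6)]


def dpr_from_terms(p):
    """Sum of Pr[MFSTj] over the seven disjoint terms."""
    return sum(
        prod(p[i] for i in up) * prod(1.0 - p[i] for i in down)
        for up, down in zip(MFSTS, DOWN)
    )


def dpr_by_enumeration(p):
    """Probability that the working links contain at least one MFST."""
    links = sorted(p)
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        up = {link for link, ok in zip(links, state) if ok}
        if any(set(tree) <= up for tree in MFSTS):
            total += prod(p[l] if ok else 1.0 - p[l] for l, ok in zip(links, state))
    return total


print(dpr_from_terms(P))
print(dpr_by_enumeration(P))
```

The two values should agree for any choice of reliabilities, since the agreement depends only on the terms being disjoint and exhaustive, not on the numbers used.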
Applicability of CNR to network design
One of the considerations in the design of computer networks is the
reliability between any pair of nodes, subject to a maximum permissible cost.
Given the locations of the nodes (routers) of the network, the maximum
permissible cost of installing the links, and the possible positions of the
links, an algorithm by K. K. Aggarwal is used to determine an optimal
network topology, which maximizes the CNR.
The network design algorithm is as follows:
Steps:
1. Find all the MSTs.
2. Generate a cost matrix, Sc, with one row per spanning tree. In each
row, the entry for branch i is its cost if the branch is in the MST,
and zero otherwise.
3. Generate a reliability matrix, Sr, with one row per spanning tree. In
each row, the entry for branch i is its reliability if the branch is in
the MST, and one otherwise.
4. Determine the cost of each MST from matrix Sc (the sum of its row)
and generate a new matrix A.
5. Determine the reliability of each MST from matrix Sr (the product of
its row) and generate a new matrix X.
6. Determine the ratio of MST reliability to MST cost and
generate a new matrix D, i.e., D = X / A (element by element).
7. Choose the MST that has the highest ratio and satisfies the cost
constraint.
8. Remove the links that are in the MST of step 7 from the set of
candidate links.
9. For each remaining branch, compute the increment in
reliability if that branch is added. Find the ratio of the incremental
reliability, ΔR, to the cost of the link
(i.e. D = ΔR / cost of link).
10. Choose the branch with the highest ratio (i.e. highest value of D)
whose cost fits within the remaining budget.
11. Remove this link from consideration and repeat steps 9 and 10
until all links are exhausted or the cost limit is reached.
Example of the network design problem
Link    Cost    Reliability
a       2       0.9
b       5       0.7
c       3       0.8
d       6       0.6
e       4       0.9
f       3       0.8
g       4       0.7
h       3       0.8
Maximum permissible cost = 21 units
The nodes to be connected are as follows. The algorithm will determine
which links are to be maintained.
[Figure: the six nodes (1 to 6) and the eight candidate links a to h.]
Step 1: There are 30 possible MSTs: abdeh, bcdeh, acdeh, bdefh, adefh,
abdfh, abefh, bdegh, adegh, abdgh, abdeg, abegh, bcefh, bcdfh, acdfh, acefh,
bcegh, bcdeg, bcdgh, acegh, acdeg, acdgh, bdefg, adefg, abefg, abdfg, bcefg,
bcdfg, acefg, acdfg.
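Steps 2 and 4
Each row of Sc corresponds to one MST. The last column, A, is the cost of
that MST (the sum of the row).

 i   MST     a  b  c  d  e  f  g  h     A
 1   abdeh   2  5  0  6  4  0  0  3    20
 2   bcdeh   0  5  3  6  4  0  0  3    21
 3   acdeh   2  0  3  6  4  0  0  3    18
 4   bdefh   0  5  0  6  4  3  0  3    21
 5   adefh   2  0  0  6  4  3  0  3    18
 6   abdfh   2  5  0  6  0  3  0  3    19
 7   abefh   2  5  0  0  4  3  0  3    17
 8   bdegh   0  5  0  6  4  0  4  3    22
 9   adegh   2  0  0  6  4  0  4  3    19
10   abdgh   2  5  0  6  0  0  4  3    20
11   abdeg   2  5  0  6  4  0  4  0    21
12   abegh   2  5  0  0  4  0  4  3    18
13   bcefh   0  5  3  0  4  3  0  3    18
14   bcdfh   0  5  3  6  0  3  0  3    20
15   acdfh   2  0  3  6  0  3  0  3    17
16   acefh   2  0  3  0  4  3  0  3    15
17   bcegh   0  5  3  0  4  0  4  3    19
18   bcdeg   0  5  3  6  4  0  4  0    22
19   bcdgh   0  5  3  6  0  0  4  3    21
20   acegh   2  0  3  0  4  0  4  3    16
21   acdeg   2  0  3  6  4  0  4  0    19
22   acdgh   2  0  3  6  0  0  4  3    18
23   bdefg   0  5  0  6  4  3  4  0    22
24   adefg   2  0  0  6  4  3  4  0    19
25   abefg   2  5  0  0  4  3  4  0    18
26   abdfg   2  5  0  6  0  3  4  0    20
27   bcefg   0  5  3  0  4  3  4  0    19
28   bcdfg   0  5  3  6  0  3  4  0    21
29   acefg   2  0  3  0  4  3  4  0    16
30   acdfg   2  0  3  6  0  3  4  0    18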
Steps 3 and 5
Each row of Sr corresponds to one MST. The last column, X, is the
reliability of that MST (the product of the row).

 i   MST     a    b    c    d    e    f    g    h     X
 1   abdeh   0.9  0.7  1    0.6  0.9  1    1    0.8   0.27216
 2   bcdeh   1    0.7  0.8  0.6  0.9  1    1    0.8   0.24192
 3   acdeh   0.9  1    0.8  0.6  0.9  1    1    0.8   0.31104
 4   bdefh   1    0.7  1    0.6  0.9  0.8  1    0.8   0.24192
 5   adefh   0.9  1    1    0.6  0.9  0.8  1    0.8   0.31104
 6   abdfh   0.9  0.7  1    0.6  1    0.8  1    0.8   0.24192
 7   abefh   0.9  0.7  1    1    0.9  0.8  1    0.8   0.36288
 8   bdegh   1    0.7  1    0.6  0.9  1    0.7  0.8   0.21168
 9   adegh   0.9  1    1    0.6  0.9  1    0.7  0.8   0.27216
10   abdgh   0.9  0.7  1    0.6  1    1    0.7  0.8   0.21168
11   abdeg   0.9  0.7  1    0.6  0.9  1    0.7  1     0.23814
12   abegh   0.9  0.7  1    1    0.9  1    0.7  0.8   0.31752
13   bcefh   1    0.7  0.8  1    0.9  0.8  1    0.8   0.32256
14   bcdfh   1    0.7  0.8  0.6  1    0.8  1    0.8   0.21504
15   acdfh   0.9  1    0.8  0.6  1    0.8  1    0.8   0.27648
16   acefh   0.9  1    0.8  1    0.9  0.8  1    0.8   0.41472
17   bcegh   1    0.7  0.8  1    0.9  1    0.7  0.8   0.28224
18   bcdeg   1    0.7  0.8  0.6  0.9  1    0.7  1     0.21168
19   bcdgh   1    0.7  0.8  0.6  1    1    0.7  0.8   0.18816
20   acegh   0.9  1    0.8  1    0.9  1    0.7  0.8   0.36288
21   acdeg   0.9  1    0.8  0.6  0.9  1    0.7  1     0.27216
22   acdgh   0.9  1    0.8  0.6  1    1    0.7  0.8   0.24192
23   bdefg   1    0.7  1    0.6  0.9  0.8  0.7  1     0.21168
24   adefg   0.9  1    1    0.6  0.9  0.8  0.7  1     0.27216
25   abefg   0.9  0.7  1    1    0.9  0.8  0.7  1     0.31752
26   abdfg   0.9  0.7  1    0.6  1    0.8  0.7  1     0.21168
27   bcefg   1    0.7  0.8  1    0.9  0.8  0.7  1     0.28224
28   bcdfg   1    0.7  0.8  0.6  1    0.8  0.7  1     0.18816
29   acefg   0.9  1    0.8  1    0.9  0.8  0.7  1     0.36288
30   acdfg   0.9  1    0.8  0.6  1    0.8  0.7  1     0.24192
Step 6
D = X / A (element by element):

 i   D            i   D                 i   D
 1   0.01361     11   0.01134          21   0.01432
 2   0.01152     12   0.01764          22   0.01344
 3   0.01728     13   0.01792          23   0.00962
 4   0.01152     14   0.01075          24   0.01432
 5   0.01728     15   0.01626          25   0.01764
 6   0.01273     16   0.02765 (max)    26   0.01058
 7   0.02134     17   0.01485          27   0.01485
 8   0.00962     18   0.00962          28   0.00896
 9   0.01432     19   0.00896          29   0.02268
10   0.01058     20   0.02268          30   0.01344
Step 7
Max[D] = 0.02765 for i = 16
A[16] = 15 (cost)
Thus branches acefh are connected as shown.
[Figure 1: the six-node network with links a, c, e, f, and h connected.]
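Steps 2 to 7 can be reproduced with a short Python sketch. It is a sketch under stated assumptions: the list of 30 trees is read off the rows of the cost matrix Sc (five non-zero entries per row), the link data come from the table of costs and reliabilities, and the function names are illustrative.

```python
from math import prod

COST = {"a": 2, "b": 5, "c": 3, "d": 6, "e": 4, "f": 3, "g": 4, "h": 3}
REL = {"a": 0.9, "b": 0.7, "c": 0.8, "d": 0.6, "e": 0.9, "f": 0.8, "g": 0.7, "h": 0.8}
BUDGET = 21

# The 30 spanning trees, in the row order of the matrices Sc and Sr.
TREES = ("abdeh bcdeh acdeh bdefh adefh abdfh abefh bdegh adegh abdgh "
         "abdeg abegh bcefh bcdfh acdfh acefh bcegh bcdeg bcdgh acegh "
         "acdeg acdgh bdefg adefg abefg abdfg bcefg bcdfg acefg acdfg").split()


def tree_cost(tree):
    """Entry of matrix A: the sum of the link costs in the tree."""
    return sum(COST[link] for link in tree)


def tree_reliability(tree):
    """Entry of matrix X: the product of the link reliabilities in the tree."""
    return prod(REL[link] for link in tree)


def best_tree(trees, budget):
    """Step 7: the affordable tree with the highest reliability-to-cost ratio."""
    feasible = [t for t in trees if tree_cost(t) <= budget]
    return max(feasible, key=lambda t: tree_reliability(t) / tree_cost(t))


best = best_tree(TREES, BUDGET)
print(TREES.index(best) + 1, best, tree_cost(best))  # 16 acefh 15
print(round(tree_reliability(best), 5))              # 0.41472
```

The selected tree is row 16, acefh, with cost 15 and reliability 0.41472, in agreement with Step 7.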
Step 8
The branches to be considered (in increasing order of cost) are g, b, d.
We have a cost of 21-15 = 6 units to work with.
Step 9
Insert the branch g in the topology shown in figure 1. The augmented
network will have spanning trees acefh, acefg, acehg.
[Figure: the network of Figure 1 with link g added.]
The overall reliability of this network,
R = Pa Pc Pe Pf Ph + Pa Pc Pe Pf Pg Qh + Pa Pc Pe Ph Pg Qf
R = 0.55987
Therefore, the increase in reliability is
ΔR(1) = 0.55987 - 0.41472 = 0.14515
C(1) = 4 (cost increase, since cost of link g is 4)
D(1) = ΔR(1) / C(1) = 0.03628
Next, insert branch b in Figure 1. The augmented network is shown
below. It has the spanning trees acefh, abefh, bcefh.
[Figure: the network of Figure 1 with link b added.]
The overall reliability,
R = Pa Pc Pe Pf Ph + Pa Pb Pe Pf Ph Qc + Pb Pc Pe Pf Ph Qa
R = 0.51955
Therefore,
ΔR(2) = 0.51955 - 0.41472 = 0.10483
C(2) = 5 (cost increase, since cost of link b is 5)
D(2) = ΔR(2) / C(2) = 0.02096
Next, insert branch d in Figure 1. The augmented network is shown
below. It has the spanning trees acefh, adefh, acdfh, acdeh.
[Figure: the network of Figure 1 with link d added.]
The overall reliability,
R = Pa Pc Pe Pf Ph + Pa Pd Pe Pf Ph Qc + Pa Pc Pd Pf Ph Qe
+ Pa Pc Pd Pe Ph Qf
R = 0.56678
Therefore,
ΔR(3) = 0.56678 - 0.41472 = 0.15206
C(3) = 6 (cost increase, since cost of link d is 6)
D(3) = ΔR(3) / C(3) = 0.02534
Since D(1) is the greatest we add the branch g permanently to
Figure 1. The total cost used up is now 15 + 4 = 19. We are left with
resources of 21-19 = 2 to work with. This is insufficient to add either
links b or d. Hence we stop. If we had the resources we would go back
to step 8. The final topology is:
[Figure: final topology with links a, c, e, f, g, and h.]
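The augmentation in steps 8 to 11 can be sketched in Python as follows. This is an illustrative sketch, not the published algorithm: the reliability of a link set is computed by brute force as the probability that the working links contain one of the 30 spanning trees (read off the rows of Sc), and only links that fit the remaining budget are considered as candidates. The function names are assumptions of this example.

```python
from itertools import product
from math import prod

COST = {"a": 2, "b": 5, "c": 3, "d": 6, "e": 4, "f": 3, "g": 4, "h": 3}
REL = {"a": 0.9, "b": 0.7, "c": 0.8, "d": 0.6, "e": 0.9, "f": 0.8, "g": 0.7, "h": 0.8}

# The 30 spanning trees, in the row order of the matrices Sc and Sr.
TREES = ("abdeh bcdeh acdeh bdefh adefh abdfh abefh bdegh adegh abdgh "
         "abdeg abegh bcefh bcdfh acdfh acefh bcegh bcdeg bcdgh acegh "
         "acdeg acdgh bdefg adefg abefg abdfg bcefg bcdfg acefg acdfg").split()


def network_reliability(links):
    """Probability that the working subset of `links` contains a spanning tree."""
    links = sorted(links)
    usable = [set(t) for t in TREES if set(t) <= set(links)]
    total = 0.0
    for state in product([True, False], repeat=len(links)):
        up = {link for link, ok in zip(links, state) if ok}
        if any(tree <= up for tree in usable):
            total += prod(REL[l] if ok else 1.0 - REL[l] for l, ok in zip(links, state))
    return total


def augment(start_tree, budget):
    """Greedily add the affordable link with the best reliability gain per unit cost."""
    chosen = set(start_tree)
    remaining = budget - sum(COST[link] for link in chosen)
    while True:
        base = network_reliability(chosen)
        candidates = [l for l in COST if l not in chosen and COST[l] <= remaining]
        if not candidates:
            return chosen, remaining
        best = max(
            candidates,
            key=lambda l: (network_reliability(chosen | {l}) - base) / COST[l],
        )
        chosen.add(best)
        remaining -= COST[best]


links, left = augment("acefh", 21)
print("".join(sorted(links)), left)              # acefgh 2
print(round(network_reliability("acefgh"), 5))   # 0.55987
```

Starting from acefh, the sketch adds link g and then stops with 2 cost units left, in agreement with the worked example.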
Applicability of DPR to the file allocation problem
The reliability of executing various applications in a DCS depends on the
topology of the DCS, the program locations, and the allocation of files on
the DCS. If the topology is fixed, then the overall reliability of the DCS
depends mainly on the file allocation on the DCS.
The file allocation problem is formulated in terms of cost factors and
constraints, the objective being to allocate files such that the cost factor(s)
are minimized/maximized and all the constraints are satisfied. The DPR is
used to allocate files on a DCS such that the DPR is maximized.
The problem is formulated as:
Given:
DCS topology, link reliabilities, program location, files needed for
program execution.
Objective function:
Maximize DPR
Constraints:
\sum_{j=1}^{NN} x_{ij} = NF_i, for each file F_i, 1 <= i <= FN
where,
NN = total number of nodes on the DCS
FN = total number of files on the DCS
NF_i = total number of copies of file F_i allowed on the DCS
x_{ij} = 1 if file F_i is allocated to node N_j, and 0 otherwise
If there are n processing nodes and m data files in the DCS, then the total
number of possible assignments (with one copy of each file) is n^m. Thus the
optimal allocation of files on the processing nodes is a problem of
exponential complexity. Hence, various heuristic algorithms such as the
Genetic Algorithm are used to solve this problem.
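The growth of the search space can be illustrated with a small Python sketch that enumerates every allocation satisfying the copy constraint. The node count, file count and copy counts below are hypothetical, and the function names are assumptions of this example.

```python
from itertools import combinations, product
from math import comb, prod


def allocations(num_nodes, copies):
    """Yield every allocation: one tuple of node indices per file.

    copies[i] is NF_i, the number of copies of file i that must be placed.
    """
    per_file = [list(combinations(range(num_nodes), k)) for k in copies]
    yield from product(*per_file)


def to_matrix(allocation, num_nodes):
    """Build the 0/1 matrix x[i][j]: 1 if file i is stored on node j."""
    return [[1 if j in nodes else 0 for j in range(num_nodes)] for nodes in allocation]


n = 4                 # hypothetical number of nodes
copies = [1, 1, 1]    # hypothetical: three files, one copy each

all_allocations = list(allocations(n, copies))
print(len(all_allocations))  # 64 = 4**3

# Every allocation satisfies the constraint: row i of x sums to NF_i.
for allocation in all_allocations:
    x = to_matrix(allocation, n)
    assert [sum(row) for row in x] == copies
```

With several copies per file, the count becomes the product of C(n, NF_i) over all files, which still grows exponentially with the number of files. An exhaustive search would have to evaluate the DPR for every one of these allocations, which is why heuristic search is used instead.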