UML based OSCAR Cluster Availability Modeling
Hertong Song
Computer Science
College of Engineering & Science
Louisiana Tech University
Ruston, LA 71270, USA
Email: hso001@latech.edu
Abstract:
The availability of computer systems is crucial as computers reach into more and more aspects of daily
life, so it is necessary to evaluate a system’s availability at the design stage. The most common analytic
availability modeling techniques are fault trees, Markov chains, and stochastic Petri nets. System
designers, however, may not be familiar with these techniques; instead, they normally use UML to describe
system behavior and to communicate with each other. This paper proposes a modeling framework that uses
UML to model a system’s availability and helps system designers evaluate it with ease. The paper then
shows how to model the availability of an OSCAR cluster system using UML notation, how to map the UML
model to a continuous time Markov chain (CTMC), and how to obtain the result by solving the underlying
CTMC model.
Keywords: UML, High Availability, OSCAR, Continuous Time Markov Chain
1. Introduction
Computer systems are increasingly woven into society. They are found in almost every corner of daily life:
telecommunications, banking systems, automobiles, spacecraft, and even microwave ovens. A computer cluster
is a high performance computing (HPC) system that can also improve availability: when one component is
down, the other components take over its workload and keep the system functioning. Availability therefore
plays an important role in computer systems. Unexpected system failures can cause serious disruption, such
as power outages, financial loss, and even loss of human life. It is therefore extremely important to
evaluate a system’s availability at the initial design stage.
Analytical evaluation of a system’s availability relies on statistical methods such as reliability block
diagrams, fault trees, Markov chains, and stochastic Petri nets. UML, on the other hand, is becoming a
de facto standard modeling language for describing a system’s behavior and structure. Most industrial
software architects, product managers, and system designers use UML to specify systems, and they may not
be familiar with the statistical methods used to evaluate a system’s availability. Some researchers
have used UML to model the reliability of specific software [1]. Our previous study [5] addressed only
two-tier systems. We have since enhanced our modeling technique to handle multi-tier systems, including
systems whose components have the same functionality but different failure rates.
In this paper, we propose a mechanism of using UML to model a multi-tier cluster computing system. Then
the UML model will be mapped to the Continuous Time Markov Chain model. Finally, the result of the
system’s availability is calculated based on the uniformization algorithm [9].
The framework of the UML based system availability modeling is illustrated in Figure 1. In step (1), UML
notation articulates the system design and describes component-wise reliability information. In step (2),
an equivalent XMI representation is obtained from the UML model. In step (3), the model is mapped to a
continuous time Markov chain (CTMC) model. In step (4), the system’s availability is calculated from the
CTMC model.
UML Representation -> XMI Representation -> Availability Model -> Results
Figure 1 Framework of the UML modeling tool
The remainder of this paper is structured as follows: Section 2 describes the OSCAR cluster system
architecture, Section 3 shows how to use UML to model a system’s reliability, and Section 4 describes the
mapping to a continuous time Markov chain (CTMC). Section 5 addresses the calculation of the CTMC, and
Section 6 concludes.
2. System Architecture
OSCAR (Open Source Cluster Application Resources) [2] is a software package that is popular for building
High Performance Computing (HPC) systems. An OSCAR cluster consists of a head node, referred to as the
server, and a set of working nodes, referred to as the clients. The server takes job requests and
dispatches the jobs to the clients, which actually handle the requests. The server communicates with its
clients via a local network connection.
From a high availability point of view, the current OSCAR suffers from several single points of failure.
For example, if the server is down, then the whole system is down. For this reason, the High Availability
OSCAR (HA-OSCAR) is being developed at Louisiana Tech University [3]. HA-OSCAR has two servers, two
switches, and double connections. Figure 2 shows the system architecture of HA-OSCAR.
[Figure 2 components: Server 1, Server 2; switch 1, switch 2; node 1, node 2, node 3, node 4]
Figure 2 the architecture of OSCAR cluster system
3. The UML Model of the Cluster System
Initially, a UML distributed diagram was used to model the availability of cluster systems [5]. At the
current stage, we assume that (1) each component in the system works independently, so a failure of one
component does not cause the failure of another component; (2) only hardware failures of each node are
considered; (3) once the system is down, no further failures occur; and (4) the time to failure of each
component is exponentially distributed. Figure 3 shows a distributed diagram that represents a cluster
system.
Figure 3 the distributed diagram of a cluster system
Currently, the community edition of Gentleware’s Poseidon for UML [10] is used to create the UML
model. By specifying the availability information for each node as UML tagged values, we can create the
underlying Markov chain model for the cluster system. The availability of the system can then be
calculated from the Markov chain model. Table 1 lists the UML tag names and what they represent.
UML Tags        Represents
Name            Name of the component. Used for grouping
failure rate    The failure rate of the component
repair rate     The repair rate of the component
Min             The minimum components required to keep the system functioning
Multiplicity    The number of duplicates of the component
Table 1. UML Tags used for Availability Modeling
Consider a cluster system with two identical servers, two identical switches, and two identical clients.
The system is functioning if at least one server, one switch, and one client are functioning. To model
this system, we use server, switch, and node as the component names, and for each component we specify the
failure rate, the repair rate, the minimum number required to keep the system functioning, and the number
of duplicates. Figure 4 shows the UML tags for this example.
name            server 1    switch 1    node 1
failure rate    0.001       0.002       0.001
repair rate     0.05        0.01        0.02
min             1           1           1
multiplicity    2           2           2
Figure 4 an example of using UML tags.
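The tagged values of Figure 4 can be held in a small data structure before any Markov chain is built. The following is a minimal sketch, not the tool's actual implementation; the class name `ComponentGroup` and the helper functions are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentGroup:
    """One group of identical components, described by the tags of Table 1."""
    name: str            # "Name" tag, used for grouping
    failure_rate: float  # "failure rate" tag
    repair_rate: float   # "repair rate" tag
    minimum: int         # "Min" tag
    multiplicity: int    # "Multiplicity" tag


# Values taken from the example in Figure 4.
GROUPS = [
    ComponentGroup("server", 0.001, 0.05, 1, 2),
    ComponentGroup("switch", 0.002, 0.01, 1, 2),
    ComponentGroup("node", 0.001, 0.02, 1, 2),
]


def initial_state(groups):
    """State in which every component of every group is working."""
    return tuple(g.multiplicity for g in groups)


def is_operational(state, groups):
    """True if every group still has at least its minimum number working."""
    return all(working >= g.minimum for working, g in zip(state, groups))
```

Here `initial_state(GROUPS)` gives the tuple (2, 2, 2), and a state such as (0, 2, 2) is not operational because no server is left.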
4. Construction of Markov Chain
Based on the given UML notation, we partition the components into distinct groups. The system is described
by a tuple in which each entry corresponds to one group and gives the number of components in that group.
For example, the notation $(n_1, n_2, \ldots, n_s)$ means that there are $s$ groups, with $n_1$ identical
components in group 1, $n_2$ identical components in group 2, and so on.
The resulting Markov chain has a tree-like structure. The top level (root) of the tree is the initial state
of the Markov chain, denoted by $(n_1, n_2, \ldots, n_s)$. At each level below the root there are $s$
possibilities (branches), since a component in any of the groups may fail. At the second level, for
example, the states are $(n_1 - 1, n_2, \ldots, n_s)$, $(n_1, n_2 - 1, \ldots, n_s)$, ...,
$(n_1, n_2, \ldots, n_s - 1)$. The tree propagates with $s$ children per state until it reaches a leaf,
that is, a state in which some group is at the minimum number of components required to keep the system
functioning. A leaf is a state of the form $(n_1 - d_1, n_2 - d_2, \ldots, k_i, \ldots, n_s - d_s)$, where
$k_i$ denotes the minimum number of components required for group $i$ to keep the system functioning,
$d_i$ is the number of components that are down in group $i$, and $n_i - d_i \geq k_i$. Each branch in the
tree carries a failure rate $(n_i - d_i)\lambda_i$ and a repair rate $\mu_i$.
Once the tree is constructed, each duplicated state is removed by linking its parent state to the same
state in the left part of the tree, keeping the failure rate of the removed state.
The states in the tree are then numbered with single integers in breadth-first order: the root is state 0;
at the second level, from left to right, the states are $1, 2, \ldots, s$; then the third level, and so on.
The initial probability vector of the Markov chain is $(1, 0, \ldots, 0)$.
Figure 5 shows the resulting Markov chain from the previous example.
[Figure 5: Markov chain tree rooted at state (2,2,2), with edges labeled by failure and repair rates.
States shown: (2,2,2), (1,2,2), (2,1,2), (2,2,1), (0,2,2), (1,1,2), (1,2,1), (2,0,2), (2,1,1), (2,2,0),
(0,1,2), (0,2,1), (1,0,2), (1,1,1), (1,2,0), (2,0,1), (2,1,0), (0,1,1), (1,0,1), (1,1,0)]
Figure 5 a Markov chain example
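The construction can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the tool's code: states are expanded breadth-first, a failure in group $i$ occurs at rate (working components) x (failure rate of group $i$), every failure branch has a reverse repair branch at the group's repair rate, and, following assumption (3), a state in which some group has fewer working components than its minimum is not expanded further. The function name `build_ctmc` and the constants are hypothetical.

```python
from collections import deque


def build_ctmc(counts, minimums, fail, repair):
    """Return (states, Q): states in breadth-first order and the generator matrix."""
    root = tuple(counts)
    index = {root: 0}          # state tuple -> integer label
    states = [root]
    edges = []                 # (parent, child, group, working components in parent)
    queue = deque([root])
    while queue:
        state = queue.popleft()
        # Assumption (3): a down state has no further failures.
        if any(w < m for w, m in zip(state, minimums)):
            continue
        for i, working in enumerate(state):
            if working == 0:
                continue
            child = state[:i] + (working - 1,) + state[i + 1:]
            if child not in index:   # duplicates are merged, not re-created
                index[child] = len(states)
                states.append(child)
                queue.append(child)
            edges.append((index[state], index[child], i, working))
    size = len(states)
    Q = [[0.0] * size for _ in range(size)]
    for parent, child, i, working in edges:
        Q[parent][child] += working * fail[i]   # failure branch
        Q[child][parent] += repair[i]           # repair branch
    for r in range(size):
        Q[r][r] = -sum(Q[r][c] for c in range(size) if c != r)
    return states, Q


# Example of Figure 4: servers, switches, nodes.
STATES, Q = build_ctmc((2, 2, 2), (1, 1, 1),
                       (0.001, 0.002, 0.001), (0.05, 0.01, 0.02))
```

Under these assumptions the example yields 20 states, the same number of distinct state labels that appear in Figure 5, with the root as state 0 and (1,2,2), (2,1,2), (2,2,1) as states 1 to 3.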
Components at the same level with different failure or repair rates.
One way to handle this case is to add another entry to the state tuple. Suppose there are $s$ groups,
denoted by $(n_1, n_2, \ldots, n_s)$. If group 2 contains components with two different failure rates, we
can use $(n_1, n_2, n_3, \ldots, n_{s+1})$, where $n_2$ and $n_3$ denote the two subgroups of group 2. The
system is functioning if $k_1 + k_2 \geq k$ (with $k_1$ and $k_2$ counting the functioning components of
the two subgroups and $k$ the minimum required for the group). In the UML representation, we still use the
same name or level to denote the two subgroups. The diagram below shows the Markov chain for the above
example when the two servers have different failure rates. The repair rates and duplicated states are
omitted from the diagram for simplicity.
[Figure 6: Markov chain tree rooted at state (1,1,2,2). States shown: (1,1,2,2), (0,1,2,2), (0,0,2,2),
(1,0,2,2), (0,1,1,2), (0,0,1,2), (0,1,2,1), (0,1,0,2), (0,0,1,1), (0,1,1,1), (1,1,1,2), (1,0,1,2),
(0,0,2,1), (0,1,0,1), (1,0,2,1), (0,1,2,0), (0,1,1,0), (1,0,0,2), (1,0,0,1), (1,1,2,1), (1,1,0,2),
(1,0,1,1), (1,0,2,0), (1,1,1,1), (1,1,0,1), (1,1,2,0), (1,1,1,0), (1,0,1,0)]
Figure 6 a Markov chain model with two different servers
Figure 6 shows the Markov chain model for the previous example when the two servers are different.
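The subgroup idea can be illustrated with a short state enumeration. The sketch below is an assumption-laden illustration rather than the paper's algorithm: each tuple entry is a subgroup, `LEVELS` (a hypothetical name) lists which entries belong to the same level, and a level is functioning when the working components of its subgroups together reach the level's minimum. As before, states in which the system is down are not expanded further.

```python
from collections import deque


def enumerate_states(counts, levels, minimums):
    """Breadth-first enumeration of the states; returns (all states, up states)."""

    def is_up(state):
        # A level works if its subgroups together meet the level's minimum.
        return all(sum(state[i] for i in members) >= need
                   for members, need in zip(levels, minimums))

    root = tuple(counts)
    seen = {root}
    order = [root]
    queue = deque([root])
    while queue:
        state = queue.popleft()
        if not is_up(state):
            continue                       # system down: no further failures
        for i, working in enumerate(state):
            if working == 0:
                continue
            child = state[:i] + (working - 1,) + state[i + 1:]
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order, [s for s in order if is_up(s)]


# Two different servers (one per subgroup), two switches, two nodes.
COUNTS = (1, 1, 2, 2)
LEVELS = [(0, 1), (2,), (3,)]    # entries 0 and 1 both belong to the server level
MINIMUMS = (1, 1, 1)             # minimum per level
ALL_STATES, UP_STATES = enumerate_states(COUNTS, LEVELS, MINIMUMS)
```

With these inputs the enumeration produces 28 states, of which 12 are operational; 28 is also the number of state labels listed in Figure 6.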
5. Calculation of Markov Chain
Markov chain models are popular for system availability modeling and are well developed [7]. The
transient state probabilities of the corresponding continuous time Markov chain model are governed by the
Kolmogorov forward differential equation:
$$\frac{d\pi(t)}{dt} = \pi(t) Q$$
where $\pi(t)$ is the state probability vector at time $t$, and $Q$ is the generator matrix.
Let $\Omega$ denote the state space of the system described by the CTMC, $\Omega_0$ the set of
operational system states, and $\Omega_f$ the set of system down states. The system’s availability at
time $t$ is then given by:
$$A(t) = \sum_{i \in \Omega_0} \pi_i(t)$$
$\Omega_0$ is the set of all leaves represented in the tree as given in Figure 5 and Figure 6, and
$\Omega_0 = \Omega - \Omega_f$.
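As a small worked illustration of the availability sum, the sketch below adds up the probabilities of the operational states. It assumes that a state is operational when every group has at least its minimum number of working components; the function name and the probability vector `PI_T` are hypothetical and chosen only to show the arithmetic.

```python
def availability(pi, states, minimums):
    """Sum the probabilities of all states in which every group meets its minimum."""
    return sum(p for p, state in zip(pi, states)
               if all(w >= m for w, m in zip(state, minimums)))


# A single group with two components, of which one is required.
STATES = [(2,), (1,), (0,)]
MINIMUMS = (1,)
PI_0 = [1.0, 0.0, 0.0]       # initial probability vector: everything works
PI_T = [0.90, 0.09, 0.01]    # hypothetical state probabilities at some time t

A_0 = availability(PI_0, STATES, MINIMUMS)
A_T = availability(PI_T, STATES, MINIMUMS)
```

For the hypothetical vector the availability is 0.90 + 0.09 = 0.99, since only the last state is a down state.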
For the previous example shown in Figure 5, the initial probability vector is
$\pi(0) = (1, 0, 0, \ldots, 0)$, and the generator matrix is:
$$Q = \begin{pmatrix}
-2(\lambda_1 + \lambda_2 + \lambda_3) & 2\lambda_1 & 2\lambda_2 & 2\lambda_3 & 0 & \cdots \\
\mu_1 & -(\mu_1 + \lambda_1 + 2\lambda_2 + 2\lambda_3) & 0 & 0 & \cdots & \\
\mu_2 & 0 & \ddots & & & \\
\mu_3 & 0 & & \ddots & & \\
\vdots & \vdots & & & &
\end{pmatrix}$$
The general solution of the above equation is $\pi(t) = \pi(0) e^{Qt}$. However, evaluating this solution
in closed form is nearly impossible except for Markov models of certain forms [7]. For this reason, we
adopt a numerical method referred to as uniformization [9] to find the solution.
Using the uniformization method, the solution of the Kolmogorov differential equation can be written in
the form:
$$\pi(t) = \pi(0) \sum_{k=0}^{\infty} e^{-qt} \frac{(qt)^k}{k!} (Q^*)^k$$
where $Q^* = \frac{Q}{q} + I$ and $q \geq \max_i |q_{ii}|$. The infinite series in the above equation can
be truncated by choosing a truncation error tolerance $\epsilon$. The solution is then approximated by:
$$\pi(t) \approx \pi(0) \sum_{k=l}^{r} e^{-qt} \frac{(qt)^k}{k!} (Q^*)^k$$
where $l$ and $r$ are the left starting point and the right end point of the series. The detailed
computation algorithm is given in the appendix of [9].
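The truncated series can be sketched directly in Python. The version below is a simplified illustration and not the algorithm of [9]: it always starts at $l = 0$, stops at the first $r$ for which the accumulated Poisson weights exceed $1 - \epsilon$, and computes $e^{-qt}$ directly, which underflows for very large $qt$. The function name `uniformization` and the parameter `max_terms` are hypothetical.

```python
import math


def uniformization(pi0, Q, t, eps=1e-10, max_terms=100000):
    """Approximate pi(t) = pi(0) * exp(Q t) by a truncated uniformization series."""
    n = len(pi0)
    q = max(-Q[i][i] for i in range(n))          # uniformization rate
    if q == 0.0:
        return list(pi0)                         # no transitions at all
    # Q* = Q / q + I is a stochastic matrix.
    P = [[Q[i][j] / q + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    v = list(pi0)                                # holds pi(0) (Q*)^k
    weight = math.exp(-q * t)                    # Poisson weight for k = 0
    total = weight                               # accumulated Poisson weights
    result = [weight * x for x in v]
    k = 0
    while 1.0 - total > eps and k < max_terms:
        k += 1
        v = [sum(v[i] * P[i][j] for i in range(n)) for j in range(n)]
        weight *= q * t / k                      # e^{-qt} (qt)^k / k!
        total += weight
        result = [r + weight * x for r, x in zip(result, v)]
    return result


# Single repairable component: state 0 = up, state 1 = down.
LAM, MU, T = 0.001, 0.05, 100.0
PI_T = uniformization([1.0, 0.0], [[-LAM, LAM], [MU, -MU]], T)
```

For this two-state example the result can be checked against the known closed form $A(t) = \frac{\mu}{\lambda + \mu} + \frac{\lambda}{\lambda + \mu} e^{-(\lambda + \mu)t}$, which `PI_T[0]` should match to within the chosen tolerance.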
6. Conclusion
UML is becoming a standard for modeling software systems, and it has recently been studied as a means of
modeling a system’s availability. We have enhanced our UML-based availability modeling technique to handle
multi-tier cluster systems by specifying tagged values for each system component in the UML model. The
UML model is then mapped to a continuous time Markov chain model, and the system’s availability is
calculated using the uniformization method. We envision that our framework may extend the semantics of
UML and enable system architects and software designers to model a system’s reliability in UML with ease.
In future work, we will consider extending the approach to systems with dependencies among components and
with dynamic aspects.
References
[1] Zarras and V. Issarny, “UML-Based Modeling of Software Reliability,” Proc. of the 1st ICSE Workshop on Describing Software Architecture with UML, Toronto, Canada, May 2001, pp. 36-40.
[2] Michael J. Brim, Timothy G. Mattson, Stephen L. Scott, “OSCAR: Open Source Cluster Application Resources,” Ottawa Linux Symposium 2001, Ottawa, Canada, 2001.
[3] C. Leangsuksun, L. Shen, H. Song, S. L. Scott, and I. Haddad, “The Modeling and Dependability Analysis of High Availability OSCAR Cluster System,” 17th Annual International Symposium on High Performance Computing Systems and Applications (HPCS 2003), Sherbrooke, Canada, May 2003.
[4] C. Leangsuksun, L. Shen, T. Liu, H. Song, S. L. Scott, “Dependability Prediction of High Availability OSCAR Cluster Server,” The 2003 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTS’03), Las Vegas, Nevada, USA, June 23-26, 2003.
[5] C. Leangsuksun, H. Song, L. Shen, “Reliability Modeling Using UML,” The 2003 International Conference on (SePDPTS’03), Las Vegas, Nevada, USA, June 23-26, 2003.
[6] C. Leangsuksun, L. Shen, H. Song, and S. L. Scott, “Availability Prediction and Modeling of High Availability OSCAR Cluster,” Submitted to ………………
[7] K. S. Trivedi, Probability and Statistics with Reliability, Queuing, and Computer Science Applications, Prentice-Hall, 1982.
[8] K. S. Trivedi, R. Sahner, “Reliability Modeling using SHARPE,” IEEE Transactions on Reliability, Vol. R-36, No. 2, June 1987, pp. 186-193.
[9] A. Reibman and K. S. Trivedi, “Numerical transient analysis of Markov models,” Comput. and Oper. Res. 15(1), 19-36, 1988.
[10] Gentleware’s Poseidon for UML community edition. http://www.gentleware.com