State Probability of a Series-parallel Repairable System with Two-types of Failure States

Gregory Levitin
Reliability Department, Planning, Development and Technology Division, Israel Electric Corporation Ltd., P.O. Box 10, Haifa, 31000 Israel

Tieling Zhang, Min Xie
Department of Industrial and Systems Engineering, National University of Singapore, Singapore 117576

ABSTRACT
This paper presents a method for the analysis of series-parallel safety-critical systems in which the system failure states are divided into failure-safe and failure-dangerous. The method combines a Markov chain model with the universal generating function technique. The model takes into account both periodic inspection and repair (perfect and imperfect) of system elements. The system state distributions and the overall system safety function are derived from the developed model. The proposed method is applicable to the analysis of state distributions of complex systems, and it is also useful in decision-making such as determining the optimal proof-test interval or the allocation of repair resources. An illustrative example is given.

Keywords: Availability, Safety-critical system, Markov model, Universal generating function, Periodic inspection, Failure-safe, Failure-dangerous

1. Introduction

Safety is of paramount concern for large and complex systems such as nuclear power and chemical processing plants, aircraft navigation control systems, power transmission and high-speed railway networks, and so on. The complexity of large systems raises many important safety problems, to the point that it may be very difficult or even impossible to ensure that the systems will always behave as expected under all foreseeable conditions. Dangerous faults may be caused not only by random hardware failures but also by systematic faults inadvertently designed into the system.
Safety analysis or risk assessment for such a system thus becomes a complex problem that involves the study of human factors (human error), production processes, manufacturing control, on-line measurement or test and repair, diagnosis with periodic inspections, and so on. See Dominguez-Garcia et al. (2006), DeLong et al. (2005), Cowing et al. (2004), Marseguerra et al. (2004) and Burgazzi (2003) for related discussions of recent reliability research on safety-critical systems.

Safety-critical systems are used as a proactive measure to protect a process plant from dangerous events. For example, emergency shutdown controllers are widely used in the chemical processing industry. Their function is to monitor a plant process and to identify whether the process is operating within acceptable limits. If the process moves outside the acceptable operating range, the controller automatically shuts the process down in a safe manner (Bukowski, 2001). For a proper analysis of safety-critical systems, non-dangerous and dangerous failures should be distinguished; they correspond to the failure-safe and failure-dangerous states of the system, respectively.

The international standard IEC 61508 (1998) includes two frameworks: one is risk reduction with a Safety-Related System (SRS) and the other is the Overall Safety Life-cycle. Since its publication, it has been widely adopted in various safety-related studies and applications (see, e.g., Faller, 2004; Hokstad and Corneliussen, 2004; Zhang et al., 2003; Nunns, 2000; Knegtering and Brombacher, 1999). A typical SRS architecture is regarded as consisting of components with diagnosis and periodic inspection, where the failures in each component are classified as detectable or undetectable. A number of studies on safety-critical systems address specific system structures; see, e.g., recent references such as Kang and Jang (2006), Kim et al. (2005), Weber et al. 
(2005), Lee et al. (2004), Latif-Shabgahi et al. (2004) and Son and Seong (2003).

Periodic inspection is important for safety-critical systems and has been studied in reliability analysis in general (see, e.g., Cui et al., 2004; Biswas et al., 2003; Bris et al., 2003; Bukowski, 2001). In various studies of safety-critical system performance, the effects of periodic inspection have been either ignored or modeled by assigning much longer average repair times to unrecognized degraded states (Zhang et al., 2003, 2006). In practice, an unrecognized fault cannot be repaired until the next periodic inspection (proof-test), so the repair of this kind of fault is carried out at predetermined times. However, only very few studies have addressed this problem. Bukowski (2001) gives a method of incorporating periodic inspection and repair into a Markov model in which both perfect and imperfect inspection and repair can be modeled. However, the Markov model in Bukowski (2001) does not include the situation in which unrecognized and recognized degraded states exist simultaneously. As an unrecognized failure can only be found at a periodic inspection, the two kinds of faults can coexist for some period of time.

The purpose of this paper is to present a method for evaluating the probabilities of failure-safe and failure-dangerous states for arbitrarily complex series-parallel systems with imperfect diagnostics and imperfect periodic inspections and repairs of elements. Element failures of either kind, failure-safe or failure-dangerous, can be detected or undetected. The emphasis is on the exact state probability or availability of such a system. See Bowles and Dobbins (2004), Chandrasekhar et al. (2004) and Carrasco (2004) for related studies of other systems.
The remainder of this paper presents a Markov model for determining the state distribution of a single system element, the universal generating function technique for determining the state distribution of the entire system, and an illustrative example.

Acronyms & Notations

FD    failure-dangerous state
FS    failure-safe state
W    operational state
G    set of states of element (system): G = {W, FS, FD}
$\varphi$    structure function
$\varphi_{par}$    structure function for elements connected in parallel
$\varphi_{ser}$    structure function for elements connected in series
$S_j$    random discrete state variable of element j
$s_{jk}$    k-th realization of $S_j$: $s_{jk} \in G$
Fd    detected failure
Fu    undetected failure
FDd    detected failure-dangerous
FDu    undetected failure-dangerous
FSd    detected failure-safe
FSu    undetected failure-safe
pfd    probability of failure on demand
pfdD    probability of failure-dangerous on demand
pfdS    probability of failure-safe on demand
$\Lambda$    system transition rate matrix
$0_k$    zero column vector of size $k \times 1$
$1_k$    unit column vector of size $k \times 1$
$P_W(t)$, $P_{FS}(t)$, $P_{FD}(t)$    probability that a subsystem or the entire system is in state W, FS, FD at time t
$\lambda_{sd}$, $\lambda_{dd}$, $\lambda_{du}$, $\lambda_{su}$    failure rates of FSd, FDd, FDu, FSu
$\mu_{sd}$, $\mu_{dd}$, $\mu_{du}$, $\mu_{su}$    repair rates of FSd, FDd, FDu, FSu
d    fraction of detected failures that are detected correctly
$T_I$    proof-test interval

Assumptions

1. The system is composed of elements, and each element can experience two categories of failures: dangerous and non-dangerous, corresponding respectively to failure-dangerous and failure-safe events. Failure-dangerous and failure-safe events are independent.
2. Failures of both categories can be detected or undetected.
3. Detected and undetected failures constitute independent events.
4. Failure rates for both kinds of failures are constant.
5. The element is in the operational state if no failure event (detected or undetected) has occurred.
6. The element is in the failure-safe state if at least one non-dangerous failure (detected or undetected) has occurred and no dangerous failure has occurred.
7. The element is in the failure-dangerous state if at least one dangerous failure (detected or undetected) has occurred.
8. The elements are independent and can undergo periodic inspections at different times.
9. The state of any composition of elements is unambiguously defined by the states of these elements and the nature of the elements' interaction in the system.
10. The elements' interaction is represented by a series-parallel block diagram.

2. State distribution of single system element

According to IEC 61508, the typical system structure is composed of elements to which diagnosis and periodic inspection and repair are applied. Failure-safe and failure-dangerous events can occur independently. The failure category depends on the effect of the fault. For example, if a failure results in the shutdown of a properly operating process, it is of the failure-safe (FS) type. This type of failure is variously referred to as a false trip or a false alarm. If, however, a safety-critical system fails to perform an operation that is required to shut down a process, the consequences can be hazardous; an example is the failure of a monitor that controls an important process. This type of failure is generally called failure-dangerous (FD).

Both FS and FD events can be detected or undetected. A detected failure is detected instantly by diagnostic devices. An imperfect diagnosis model presumes that a fraction d of detected failures can be detected instantaneously by diagnostic devices. Whenever a failure of this kind is detected, on-line repair is initiated. Failures that cannot be detected by the diagnostic devices, or that remain undetected because of imperfect diagnosis, are considered undetected failures. These failures can be found only by the proof-test (periodic inspection) just after the end of a proof-test interval.
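The element-state rule of assumptions 5 to 7 can be written as a small lookup over one detected-failure state and one undetected-failure state. The sketch below is illustrative only: the function name `element_state` and the use of string labels are choices made here, while the labels themselves (W, FS, FD, FSd, FDd, FSu, FDu) follow the notation list.

```python
# Illustrative sketch of assumptions 5-7; function and constant names are hypothetical.
DETECTED = ("W", "FSd", "FDd")      # possible states of the detected-failure process
UNDETECTED = ("W", "FSu", "FDu")    # possible states of the undetected-failure process


def element_state(detected, undetected):
    """Return the element state (W, FS or FD) for one pair of failure states."""
    if detected not in DETECTED or undetected not in UNDETECTED:
        raise ValueError("unknown failure state")
    # Assumption 7: any dangerous failure makes the element failure-dangerous.
    if detected == "FDd" or undetected == "FDu":
        return "FD"
    # Assumption 6: otherwise, any non-dangerous failure makes it failure-safe.
    if detected == "FSd" or undetected == "FSu":
        return "FS"
    # Assumption 5: no failure event of either kind has occurred.
    return "W"


if __name__ == "__main__":
    # One row per undetected-failure state, one column per detected-failure state.
    for u in UNDETECTED:
        print(u, [element_state(d, u) for d in DETECTED])
```

Checking the dangerous case first is what gives FD priority over FS when failures of both categories are present at the same time.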
We assume that the failure rates of detected failure-safe and failure-dangerous events ($\lambda_{sd}$ and $\lambda_{dd}$, respectively) as well as of undetected failure-safe and failure-dangerous events ($\lambda_{su}$ and $\lambda_{du}$, respectively) can be calculated or elicited from tests. The state of any single element can be represented as a combination of two independent states corresponding to detected and undetected failures. Each of the two can take three different values: no failure (state W), failure of category FS and failure of category FD. According to assumptions 5-7, the state of each element is determined by the combination of these states as shown in Table 1.

Table 1. States of single element.

                      Detected failure
Undetected failure    W     FSd   FDd
W                     W     FS    FD
FSu                   FS    FS    FD
FDu                   FD    FD    FD

The state of each element j can be represented by a discrete random variable $S_j$ that takes values from the set G = {W, FS, FD}. In order to obtain the element state distribution $p_{jW} = \Pr(S_j = W)$, $p_{jFS} = \Pr(S_j = FS)$ and $p_{jFD} = \Pr(S_j = FD)$, one should sum the probabilities of all combinations of states of detected and undetected failures that result in the element states W, FS and FD, respectively.

Based on element state transition analysis, one can obtain the Markov state transition diagram presented in Fig. 1. In this diagram, each possible combination of the states of detected and undetected failures (marked inside the circles) belongs to one of three sets corresponding to the three element states defined in Table 1. In practice, no repair action is applied to an undetected failure until the next proof-test. In general, the periodic inspection and repair take a very short time compared with the proof-test interval $T_I$, and the whole system stops operating (is in a down state) during the periodic inspection and repair.
Therefore, it is reasonable to set the repair rates for undetected failures to $\mu_{du} = \mu_{su} = 0$ when analyzing the behavior of a safety-critical system within the proof-test interval (unlike the equivalent repair rates for $\mu_{du}$ and $\mu_{su}$ used in Zhang et al. (2003)).

[Fig. 1. Markov state transition diagram used for calculating state distribution of a single element. The nine nodes are the combinations of a detected-failure state (W, FSd, FDd) and an undetected-failure state (W, FSu, FDu), grouped into the sets W, FS and FD.]

According to Fig. 1, the following equation describes the element's behavior:

\[
\dot{P}_j(t) = P_j(t)\,\Lambda_j , \tag{1}
\]

where $P_j(t) = (p_{j1}(t), p_{j2}(t), \ldots, p_{j9}(t))$ is the vector of state probabilities, $\dot{P}_j(t)$ is the derivative of $P_j(t)$ with respect to t, and $\Lambda_j$ is the transition rate matrix (see Appendix). According to Table 1, state 1 in the Markov diagram corresponds to state W of the element, states 2-4 correspond to state FS of the element and states 5-9 correspond to state FD of the element. Having the solution $P_j(t)$ of Eq. (1) for any element j, one can obtain $p_{jW} = p_{j1}$, $p_{jFS} = p_{j2} + p_{j3} + p_{j4}$ and $p_{jFD} = p_{j5} + p_{j6} + p_{j7} + p_{j8} + p_{j9}$.

The solution of Eq. (1) can be expressed as

\[
P_j(t) = P_j(0)\exp(\Lambda_j t), \quad t \ge 0;
\]
\[
P_j(t) = P_j(nT_I^{+})\exp\bigl(\Lambda_j (t - nT_I)\bigr), \quad nT_I^{+} \le t \le (n+1)T_I, \quad n = 0, 1, 2, \ldots \tag{2}
\]

where the second expression applies between consecutive proof-tests. Under imperfect inspection and repair, an undetected fault is not always restored to as good as new, and some faults may remain after inspection and repair. A matrix $M_{ji}$ is used to describe this behavior. Each entry of $M_{ji}$ describes the transfer of probability from one state to another at the ith proof-test. Thus, we have

\[
P_j(T_I^{+}) = P_j(T_I)\,M_{j1} = P_j(0)\exp(\Lambda_j T_I)\,M_{j1} \tag{3}
\]
\[
P_j(2T_I^{+}) = P_j(2T_I)\,M_{j2} = P_j(0)\exp(\Lambda_j T_I)\,M_{j1}\exp(\Lambda_j T_I)\,M_{j2}
\]
\[
\begin{aligned}
P_j(nT_I^{+}) &= P_j(nT_I)\,M_{jn} \\
&= P_j((n-1)T_I^{+})\exp(\Lambda_j T_I)\,M_{jn} \\
&= P_j((n-2)T_I^{+})\exp(\Lambda_j T_I)\,M_{j(n-1)}\exp(\Lambda_j T_I)\,M_{jn} \\
&= P_j(0)\exp(\Lambda_j T_I)\,M_{j1}\exp(\Lambda_j T_I)\,M_{j2}\cdots\exp(\Lambda_j T_I)\,M_{j(n-1)}\exp(\Lambda_j T_I)\,M_{jn}, \quad n = 1, 2, 3, \ldots
\end{aligned} \tag{4}
\]

In Eq. 
(4), n represents the nth proof-test interval and $M_{ji}$ ($i = 1, 2, 3, \ldots, n$) is the matrix associated with the ith proof-test.

3. State distribution of the entire series-parallel system

To obtain the state distribution of the entire system, this paper uses a procedure based on the universal generating function (u-function) technique. This method was introduced in Ushakov (1987) and has been shown to be very effective for the reliability evaluation of different types of multi-state systems; see Levitin et al. (1998) and Lisnianski and Levitin (2003). A comprehensive description of the method and its numerous applications in reliability engineering can be found in Levitin (2005). For some recent and related applications, see, e.g., Levitin (2004, 2005) and Korczak et al. (2005).

The u-function of a discrete random variable Y is defined as a polynomial

\[
u(z) = \sum_{k=1}^{K} q_k z^{y_k}, \tag{5}
\]

where the variable Y has K possible values and $q_k$ is the probability that Y takes the value $y_k$. In our case, the polynomial u(z) can define state distributions, i.e. it represents all of the possible mutually exclusive states of the element (or any subsystem) by relating the probability of each state to the value that the random state variable of this element (subsystem) takes in that state. Note that the performance distribution of the basic element j (the probability mass function of the discrete random variable $S_j$) can now be represented as

\[
u_j(z) = \sum_{k=1}^{3} p_{jk} z^{s_{jk}}, \tag{6}
\]

where $s_{j1}$ = FD, $s_{j2}$ = FS and $s_{j3}$ = W for any j.

To obtain the u-function of a subsystem consisting of two elements, composition operators are introduced. These operators determine the u-function for two elements connected in parallel and in series, respectively, using simple algebraic operations on the individual u-functions of the basic elements. All the composition operators take the form

\[
u_j(z) \otimes_{\varphi} u_i(z) = \left(\sum_{k=1}^{3} p_{jk} z^{s_{jk}}\right) \otimes_{\varphi} \left(\sum_{h=1}^{3} p_{ih} z^{s_{ih}}\right) = \sum_{k=1}^{3}\sum_{h=1}^{3} p_{jk}\, p_{ih}\, z^{\varphi(s_{jk},\, s_{ih})}. \tag{7}
\]

The obtained u-function relates the probability of each combination of states of the independent elements (which is equal to the product of the probabilities of these states) to the value that the random state variable of the entire subsystem takes when this combination is realized. The function $\varphi(\cdot)$ in the composition operators expresses the dependence of the subsystem state on the states of both of its elements. The definition of $\varphi(\cdot)$ depends strictly on the physical nature of the system and on the nature of the interaction of the system elements.

The structure functions for pairs of elements connected in parallel and in series should be defined for each specific application based on an analysis of system functioning. For example, the widely applied conservative approach makes the following assumptions. Any subsystem consisting of two parallel elements is in the failure-dangerous state if at least one of the elements is in the failure-dangerous state; otherwise it is in the operational state if at least one of the elements is in the operational state. In the remaining cases, the subsystem is in the failure-safe state. This can be expressed by the structure function $\varphi_{par}(\cdot)$ presented in Table 2. A subsystem consisting of two elements connected in series is in the operational state if both elements are in the operational state, whereas it is in the failure-dangerous state if at least one of the elements is in the failure-dangerous state. In the remaining cases, the subsystem is in the failure-safe state. This can be expressed by the structure function $\varphi_{ser}(\cdot)$ presented in Table 3.

Table 2. Structure function for pair of elements connected in parallel.

              Element 2
Element 1     W     FS    FD
W             W     W     FD
FS            W     FS    FD
FD            FD    FD    FD

Table 3. Structure function for pair of elements connected in series.

              Element 2
Element 1     W     FS    FD
W             W     FS    FD
FS            FS    FS    FD
FD            FD    FD    FD

In the numerical realization of the composition operator in Eq. 
(7), we can encode the states W, FS and FD by the integers 3, 2 and 1, respectively, so that $s_{jk} = k$ for any j. In our case, k = 1, 2, 3. With this encoding, the functions $\varphi_{par}(\cdot)$ and $\varphi_{ser}(\cdot)$ defined above take the form

\[
\varphi_{par}(s_{jk}, s_{ih}) =
\begin{cases}
\max(s_{jk}, s_{ih}), & \text{if } \min(s_{jk}, s_{ih}) > 1, \\
1, & \text{if } \min(s_{jk}, s_{ih}) = 1,
\end{cases}
\qquad
\varphi_{ser}(s_{jk}, s_{ih}) = \min(s_{jk}, s_{ih}).
\]

Note that the nine possible combinations of element states produce only three possible states of the subsystem. The probabilities of combinations that produce the same subsystem state should be summed to obtain the probability of that state. This can be done by collecting terms with equal exponents in the u-function obtained by Eq. (7). Finally, any subsystem state distribution can be represented by a u-function taking the form of Eq. (6).

Any subsystem consisting of two elements can be further treated as a single equivalent element whose performance distribution is equal to that of the subsystem. By consecutively applying the composition operators and replacing pairs of elements with equivalent elements, one can obtain the u-function representing the performance distribution of the entire system.

The recursive algorithm

The following recursive algorithm obtains the u-function that represents the entire system state distribution:

Step 1. Obtain the state probabilities for each element j using the Markov transition diagram method presented in Section 2.
Step 2. Define the u-function $u_j(z)$ for each element j using Eq. (6).
Step 3. If the system contains a pair of elements connected in parallel or in series, replace this pair with an equivalent element whose u-function is obtained by the operator of Eq. (7) with the structure function $\varphi_{par}(\cdot)$ or $\varphi_{ser}(\cdot)$, respectively.
Step 4. If the system contains more than one element, return to Step 3. Otherwise, the algorithm stops.
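The composition operator of Eq. (7) and the recursive reduction in Steps 1-4 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the dictionary representation of a u-function, the nested-tuple description of the block diagram and the numerical probabilities are all assumptions introduced here. Only the integer encoding (W = 3, FS = 2, FD = 1) and the conservative structure functions come from the text.

```python
from itertools import product

# Integer encoding used in the text: W = 3, FS = 2, FD = 1.
W, FS, FD = 3, 2, 1


def phi_par(a, b):
    """Conservative structure function for two elements in parallel."""
    return FD if min(a, b) == FD else max(a, b)


def phi_ser(a, b):
    """Conservative structure function for two elements in series."""
    return min(a, b)


def compose(u1, u2, phi):
    """Composition operator of Eq. (7); a u-function is a dict {state: probability}.

    Terms with equal exponents (equal resulting states) are collected on the fly.
    """
    out = {W: 0.0, FS: 0.0, FD: 0.0}
    for (s1, p1), (s2, p2) in product(u1.items(), u2.items()):
        out[phi(s1, s2)] += p1 * p2
    return out


def system_u(node, elements):
    """Reduce a nested ('par' | 'ser', child, child, ...) structure to one u-function."""
    if not isinstance(node, tuple):
        return elements[node]                  # basic element (Step 2)
    kind, *children = node
    phi = phi_par if kind == "par" else phi_ser
    result = system_u(children[0], elements)
    for child in children[1:]:                 # pairwise replacement (Steps 3-4)
        result = compose(result, system_u(child, elements), phi)
    return result


if __name__ == "__main__":
    # Hypothetical element state probabilities at one time instant (not from the paper).
    a = {W: 0.90, FS: 0.06, FD: 0.04}
    b = {W: 0.95, FS: 0.03, FD: 0.02}
    elements = {1: a, 2: a, 3: b}
    layout = ("ser", ("par", 1, 2), 3)         # two parallel elements in series with a third
    u = system_u(layout, elements)
    print({name: round(u[s], 6) for name, s in (("W", W), ("FS", FS), ("FD", FD))})
```

Because each composition returns a three-term u-function, the cost of the reduction grows linearly with the number of elements rather than with the number of state combinations of the whole system.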
The coefficients of the obtained u-function are equal to the probabilities of the operational, failure-safe and failure-dangerous states of the entire system. With the state probabilities of each element expressed as functions of time, one can use the algorithm presented above to obtain the probability values corresponding to any given time. Finally, the entire system state probabilities and the overall system safety (defined as the sum of the operational state probability and the failure-safe state probability) can be obtained as functions of time. In the following section, we use an example to illustrate the procedure described here.

4. Illustrative example

Consider a combined-cycle power plant with two generating units. Each unit consists of a gas turbine block and fuel supply systems. The fuel to each turbine block can be supplied by two parallel systems. The simplified reliability block diagram of the plant is presented in Fig. 2. Each fuel supply system as well as each turbine can experience both safe and dangerous failures (detected and undetected).

[Fig. 2. Reliability block diagram of combined-cycle power plant. Fuel supply systems: elements 1-4; turbine blocks: elements 5 and 6.]

The parameters of the fuel supply systems are: $\lambda_{sd} = 2.56 \times 10^{-5}$, $\lambda_{su} = 10^{-5}$, $\lambda_{dd} = 8.9 \times 10^{-6}$, $\lambda_{du} = 1 \times 10^{-6}$; $\mu_{sd} = 0.25$, $\mu_{dd} = 0.0833$, $\mu_{su} = \mu_{du} = 0$; d = 0.99; $T_I$ = 1.5 years. The fuel supply systems are statistically identical, but the inspection times of systems 2 and 4 are shifted 0.5 year earlier relative to the inspection times of systems 1 and 3. The matrix $M_{ji}$ associated with each fuel supply system is $M_{1i}$ (i = 1, 2, 3, 4), as shown in Eq. (A2) in the Appendix.

The turbine blocks are also statistically identical. The parameters of the turbine blocks are: $\lambda_{sd} = 2.56 \times 10^{-5}$, $\lambda_{su} = 6.540 \times 10^{-6}$, $\lambda_{dd} = 7.9 \times 10^{-6}$, $\lambda_{du} = 7.8 \times 10^{-7}$; $\mu_{sd} = 0.25$, $\mu_{dd} = 0.0625$, $\mu_{su} = \mu_{du} = 0$; d = 0.99; $T_I$ = 2 years. The matrix $M_{ji}$ associated with each turbine block is $M_{2i}$ (i = 1, 2, 3), as shown in Eq. (A3) in the Appendix.
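The piecewise solution of Eqs. (2)-(4) can be evaluated numerically for one element. The following is a minimal sketch under several assumptions that are not stated in the paper: the rates are taken to be per hour and a year is taken as 8760 hours; the transition rate matrix is built from the independence of detected and undetected failures, with a state order chosen here (`STATES`); and, in place of the matrices of Eqs. (A2)-(A3), a perfect inspection that returns every state to (W, W) is assumed. All helper names are hypothetical, and the matrix exponential is a plain scaling-and-squaring Taylor series written out only to keep the sketch dependency-free.

```python
# Sketch of Eqs. (1)-(4) for one element; all names are illustrative.
D = ("W", "FSd", "FDd")                    # detected-failure process
U = ("W", "FSu", "FDu")                    # undetected-failure process
STATES = [(d, u) for d in D for u in U]    # state order chosen for this sketch


def generator(lam_sd, lam_dd, lam_su, lam_du, mu_sd, mu_dd):
    """Transition rate matrix with mu_su = mu_du = 0 inside a proof-test interval."""
    rate_d = {("W", "FSd"): lam_sd, ("W", "FDd"): lam_dd,
              ("FSd", "W"): mu_sd, ("FDd", "W"): mu_dd}
    rate_u = {("W", "FSu"): lam_su, ("W", "FDu"): lam_du}
    n = len(STATES)
    Q = [[0.0] * n for _ in range(n)]
    for a, (d1, u1) in enumerate(STATES):
        for b, (d2, u2) in enumerate(STATES):
            if a == b:
                continue
            if u1 == u2:                   # only the detected process moves
                Q[a][b] = rate_d.get((d1, d2), 0.0)
            elif d1 == d2:                 # only the undetected process moves
                Q[a][b] = rate_u.get((u1, u2), 0.0)
        Q[a][a] = -sum(Q[a])               # rows of a rate matrix sum to zero
    return Q


def mat_mul(A, B):
    cols = list(zip(*B))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in A]


def expm(A, terms=18):
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    squarings = 0
    while norm > 0.5:
        norm /= 2.0
        squarings += 1
    S = [[x / 2.0 ** squarings for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in mat_mul(term, S)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result


def vec_mat(p, A):
    return [sum(p[i] * A[i][j] for i in range(len(p))) for j in range(len(A[0]))]


def classify(d, u):
    """Element state from Table 1."""
    if d == "FDd" or u == "FDu":
        return "FD"
    if d == "FSd" or u == "FSu":
        return "FS"
    return "W"


def state_probabilities(Q, M, TI, t):
    """Element state probabilities at time t, with the same M applied at every proof-test."""
    n_tests = int(t // TI)
    step = expm([[q * TI for q in row] for row in Q])
    p = [1.0] + [0.0] * (len(Q) - 1)       # element starts in (W, W)
    for _ in range(n_tests):
        p = vec_mat(vec_mat(p, step), M)   # Eq. (4): propagate, then inspect
    rest = expm([[q * (t - n_tests * TI) for q in row] for row in Q])
    p = vec_mat(p, rest)                   # Eq. (2) within the current interval
    out = {"W": 0.0, "FS": 0.0, "FD": 0.0}
    for prob, (d, u) in zip(p, STATES):
        out[classify(d, u)] += prob
    return out


if __name__ == "__main__":
    # Fuel supply system rates from the example, assumed to be per hour.
    Q = generator(2.56e-5, 8.9e-6, 1e-5, 1e-6, 0.25, 0.0833)
    M = [[1.0] + [0.0] * 8 for _ in range(9)]   # assumed perfect inspection
    print(state_probabilities(Q, M, 1.5 * 8760.0, 10000.0))
```

When t falls exactly on a proof-test time, the function returns the post-inspection value, that is, the distribution at $nT_I^{+}$. Replacing `M` with a matrix that leaves part of the probability in the undetected-failure states would model imperfect inspection in the same way.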
The probabilities $p_{jW}(t)$, $p_{jFS}(t)$ and $p_{jFD}(t)$ for each system element, obtained by solving Eqs. (2) and (3) over a period of 65000 hours, are presented in Figs. 3-5. The probabilities $P_W(t)$, $P_{FS}(t)$ and $P_{FD}(t)$ for a single generating unit and for the entire system (with the structure functions defined in accordance with Tables 2 and 3), obtained using the algorithm given in Section 3, are also presented in Figs. 3-5. These figures show that the variations of these probabilities for a single generating unit and for the entire system are also periodic. The system safety $S(t) = P_W(t) + P_{FS}(t)$ as a function of time is presented in Fig. 6.

[Fig. 3. Probabilities of working states: $P_W$ versus t (thousands of hours) for elements 1,3; elements 2,4; elements 5,6; single unit; system.]

[Fig. 4. Probabilities of failure-safe states: $P_S$ versus t (thousands of hours) for elements 1,3; elements 2,4; elements 5,6; single unit; system.]

[Fig. 5. Probabilities of failure-dangerous states: $P_D$ versus t (thousands of hours) for elements 1,3; elements 2,4; elements 5,6; single unit; system.]

[Fig. 6. Overall system safety: S versus t (thousands of hours).]

5. Conclusions

In this paper a method is proposed for the study of series-parallel systems with imperfect diagnostics and imperfect periodic inspections and repairs of elements. Element failures can be failure-safe or failure-dangerous and can be either detected or undetected. The proposed model incorporates periodic inspection and repair (both perfect and imperfect) of system elements. The Markov model is used to determine the state distribution of a single system element, while the universal generating function technique is used to determine the state distribution of the entire system.
The presented example shows that the procedure can be easily implemented to estimate the state probabilities and the overall safety of a safety-critical system. The method presented in this paper can be applied in different fields such as power generation units, electronic devices and chips, data storage based on redundant arrays of inexpensive disks (Katz et al., 1989; Gibson and Patterson, 1993, etc.) and so on. It can be used for evaluating the safety of a fault-tolerant single-chip multiprocessor architecture (Yao et al., 2004), which represents a promising solution to partly mitigate system faults and to increase system dependability in mission-critical applications.

Acknowledgement: This research was carried out while the first author was visiting National University of Singapore supported by the research grant R-266-000-020-112 at National University of Singapore. The authors would like to thank three referees for their constructive comments.

References

Biswas, A.; Sarkar, J. and Sarkar, S. (2003). Availability of a periodically inspected system, maintained under an imperfect-repair policy. IEEE Transactions on Reliability, 52 (3), 311-318. Bowles, J.B. and Dobbins, J.G. (2004). Approximate reliability and availability models for high availability and fault-tolerant systems with repair. Quality and Reliability Engineering International, 20 (7), 679-697. Bris, R.; Chatelet, E. and Yalaoui, F. (2003). New method to minimize the preventive maintenance cost of series-parallel systems. Reliability Engineering & System Safety, 82 (3), 247-255. Bukowski, J.W. (2001). Modeling and analyzing the effects of periodic inspection on the performance of safety-critical systems. IEEE Transactions on Reliability, 50 (2), 321-329. Burgazzi, L. (2003). Reliability evaluation of passive systems through functional reliability assessment. Nuclear Technology, 144 (2), 145-151. Carrasco, J.A. (2004). 
Solving large interval availability models using a model transformation approach. Computers & Operations Research, 31 (6), 807-861. Chandrasekhar, P.; Natarajan, R. and Yadavalli, V.S.S. (2004). A study on a two unit standby system with Erlangian repair time. Asia-Pacific Journal of Operational Research, 21 (3), 271-277. Cowing, M.M.; Pate-Cornell, M.E. and Glynn, P.W. (2004). Dynamic modeling of the tradeoff between productivity and safety in critical engineering systems. Reliability Engineering & System Safety, 86 (3), 269-284. Cui, L.R.; Loh, H.T. and Xie, M. (2004). Sequential inspection strategy for multiple systems under availability requirement. European Journal of Operational Research, 155 (1), 170-177. DeLong, T.A.; Smith, D.T. and Johnson, B.W. (2005). Dependability metrics to assess safety-critical systems. IEEE Transactions on Reliability, 54, 498-505. Dominguez-Garcia, A.D.; Kassakian, J.G. and Schindall, J.E. (2006). Reliability evaluation of the power supply of an electrical power net for safety-relevant applications. Reliability Engineering & System Safety, 91, 505-514. Faller, R. (2004). Project experience with IEC 61508 and its consequences. Safety Science, 42 (5), 405-422. Gibson, G.A. and Patterson, D.A. (1993). Designing Disk Arrays for High Data Reliability. Journal of Parallel and Distributed Computing, 17, 4-27. Goble, W.M. (1998). Control Systems Safety Evaluation and Reliability, 2nd ed. ISA. Hokstad, P. and Corneliussen, J. (2004). Loss of safety assessment and the IEC 61508 standard. Reliability Engineering & System Safety, 83 (1), 111-120. IEC 61508 (1998). Functional safety of electric/electronic/programmable electronic safety-related systems, Parts 1-7, October 1998-May 2000. Inagaki, T. and Ikebe, Y. (1989). Performance analysis of a safety monitoring system under human-machine interface of safety-presentation type. Microelectronics and Reliability, 29 (2), 165-175. Kang, H.G. and Jang, S.C. (2006). 
Application of condition-based HRA method for a manual actuation of the safety features in a nuclear power plant. Reliability Engineering & System Safety, 91, 627-633. Katz, R.H.; Gibson, G.A. and Patterson, D. (1989). Disk System Architectures for High Performance Computing. Proceedings of the IEEE, 77 (12), 1842-1858. Kim, H.; Lee, H. and Lee, K. (2005). The design and analysis of AVTMR (all voting triple modular redundancy) and dual-duplex system. Reliability Engineering & System Safety, 88, 291-300. Korczak, E.; Levitin, G. and Ben-Haim, H. (2005). Survivability of series-parallel systems with multilevel protection. Reliability Engineering & System Safety, 66, 45-54. Knegtering, B. and Brombacher, A.C. (1999). Application of micro Markov models for quantitative safety assessment to determine safety integrity levels as defined by the IEC 61508 standard for functional safety. Reliability Engineering & System Safety, 66 (2), 171-175. Latif-Shabgahi, G.; Bass, J.M. and Bennett, S. (2004). Taxonomy for software voting algorithms used in safety-critical systems. IEEE Transactions on Reliability, 53 (3), 319-328. Lee, D.Y.; Han, J.B. and Lyou, J. (2004). Reliability analysis of the reactor protection system with fault diagnosis. Key Engineering Materials, 270, 1749-1754. Levitin, G. (2004). A universal generating function approach for the analysis of multi-state systems with dependent elements. Reliability Engineering & System Safety, 66, 285-292. Levitin, G. (2005). Uneven allocation of elements in linear multi-state sliding window system. European Journal of Operational Research, 163, 418-433. Levitin, G.; Lisnianski, A.; Ben-Haim, H. and Elmakis, D. (1998). Redundancy optimization for series-parallel multi-state systems. IEEE Transactions on Reliability, 47 (2), 165-172. Lisnianski, A. and Levitin, G. (2003). Multi-state System Reliability. World Scientific, Singapore. Levitin, G. (2005). 
The Universal Generating Function in Reliability Analysis and Optimisation. Springer-Verlag: Berlin, Springer Series in Reliability Engineering. Marseguerra, M.; Zio, E. and Podofillini, L. (2004). A multiobjective genetic algorithm approach to the optimization of the technical specifications of a nuclear safety system. Reliability Engineering & System Safety, 84 (1), 87-99. Nunns, S.R. (2000). Conformity assessment of safety related systems to IEC 61508 - the CASS initiative. Computing & Control Engineering Journal, 11 (1), 33-39. Olbrich, T; Richardson, A.M.D. and Bradley, D.A. (1996). Built-in self-test and diagnostic support for safety critical Microsystems, Microelectronics and Reliability, 36, 1125– 1136. Son, H.S. and Seong, P.H. (2003). Development of a safety critical software requirements verification method with combined CPN and PVS: a nuclear power plant protection system application. Reliability Engineering & System Safety, 80 (1), 19-32. Ushakov I., (1987). Optimal standby problems and a universal generating function, Soviet Journal of Computer System Science, 25, 79-82. Wang, D. and Inagaki, T. (1994).Time-dependent optimality of an alarm subsystem, Microelectronics and Reliability, 34, 1623 – 1633. Weber, W.; Tondok, H. and Bachmayer, M.B. (2005). Enhancing software safety by fault trees: experiences from an application to flight critical software. Reliability Engineering & System Safety, 89, 57-70. Yao, W.B.; Wang D.S. and Zheng W.M. (2004). A Fault-tolerant Single-chip Multiprocessor, ACSAC 2004 Proceedings of Advances in Computer Systems Architecture: 9 th Asia-Pacific Conference, Pen-Cheng Yew and Jingling Xue (eds.), Berlin: Springer, 2004, p. 137-145. Zhang, T.L.; Long, W. and Sato, Y. (2003). Availability of systems with self-diagnostic components—applying Markov model to IEC 61508-6, Reliability Engineering & System Safety, 80, 133 – 141. Zhang, T.L.; Xie, M. and Horigome, M. (2006). 
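Availability and reliability of k-out-of-(M plus N): G warm standby systems. Reliability Engineering & System Safety, 91, 381-387. Zhou, Z. (1987). Analysis of a two unit standby redundant fail-safe system. Microelectronics and Reliability, 27, 469-474.

Appendix

The transition rate matrix for one element is

\[
\Lambda_j = \begin{pmatrix}
-c & \lambda_{su} & \lambda_{sd} & \lambda_{du} & \lambda_{dd} & 0 & 0 & 0 & 0 \\
0 & -(\lambda_{sd}+\lambda_{dd}) & 0 & 0 & 0 & \lambda_{sd} & 0 & \lambda_{dd} & 0 \\
\mu_{sd} & 0 & -(\lambda_{su}+\lambda_{du}+\mu_{sd}) & 0 & 0 & \lambda_{su} & \lambda_{du} & 0 & 0 \\
0 & 0 & 0 & -(\lambda_{sd}+\lambda_{dd}) & 0 & 0 & \lambda_{sd} & 0 & \lambda_{dd} \\
\mu_{dd} & 0 & 0 & 0 & -(\lambda_{su}+\lambda_{du}+\mu_{dd}) & 0 & 0 & \lambda_{su} & \lambda_{du} \\
0 & \mu_{sd} & 0 & 0 & 0 & -\mu_{sd} & 0 & 0 & 0 \\
0 & 0 & 0 & \mu_{sd} & 0 & 0 & -\mu_{sd} & 0 & 0 \\
0 & \mu_{dd} & 0 & 0 & 0 & 0 & 0 & -\mu_{dd} & 0 \\
0 & 0 & 0 & \mu_{dd} & 0 & 0 & 0 & 0 & -\mu_{dd}
\end{pmatrix} \tag{A1}
\]

where $c = \lambda_{sd} + \lambda_{dd} + \lambda_{du} + \lambda_{su}$. The rates in (A1) imply the row and column order (W, W), (W, FSu), (FSd, W), (W, FDu), (FDd, W), (FSd, FSu), (FSd, FDu), (FDd, FSu), (FDd, FDu), written as (detected failure state, undetected failure state).

In the matrices below, the first four columns correspond to $p_1, \ldots, p_4$ and the remaining five columns, corresponding to $p_5, \ldots, p_9$, are zero column vectors $0_9$. The matrices $M_{1i}$ (i = 1, 2, 3, 4) for the fuel supply system are

\[
M_{11} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.90 & 0.10 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.80 & 0 & 0 & 0.20 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right), \quad
M_{12} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.88 & 0.12 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.776 & 0 & 0 & 0.224 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right),
\]
\[
M_{13} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.85 & 0.15 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.747 & 0 & 0 & 0.253 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right), \quad
M_{14} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.808 & 0.192 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.711 & 0 & 0 & 0.289 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right). \tag{A2}
\]

The matrices $M_{2i}$ (i = 1, 2, 3) for the turbine block are

\[
M_{21} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.92 & 0.08 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.85 & 0 & 0 & 0.15 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right), \quad
M_{22} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.804 & 0.096 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.832 & 0 & 0 & 0.168 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right),
\]
\[
M_{23} = \left(\begin{array}{cccc|c}
1 & 0 & 0 & 0 & \\
0.882 & 0.118 & 0 & 0 & \\
1 & 0 & 0 & 0 & 0_9\ 0_9\ 0_9\ 0_9\ 0_9 \\
0.810 & 0 & 0 & 0.190 & \\
1_5 & 0_5 & 0_5 & 0_5 &
\end{array}\right). \tag{A3}
\]

Gregory Levitin received a PhD degree in Industrial Automation from Moscow Research Institute of Metalworking Machines in 1989. From 1982 to 1990 he worked as software engineer and research associate in the field of industrial automation. From 1991 to 1993 he worked at the Technion (Israel Institute of Technology) as a postdoctoral fellow at the faculty of Industrial Engineering and Management. Dr. 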
Levitin is presently an engineer-expert at the Reliability Department of the Israel Electric Corporation and adjunct senior lecturer at the Technion. His current interests are in operations research and artificial intelligence applications in reliability and power engineering. In this field Dr. Levitin has published over 100 papers and two books. He is a senior member of IEEE. He serves on the editorial boards of IEEE Transactions on Reliability and Reliability Engineering and System Safety.

Tieling Zhang received a Ph.D. in engineering from Tokyo University of Mercantile Marine in 2001. He has six years of teaching experience, three years of industrial experience and a few years in research positions. Currently he is with Hitachi GST, Singapore. He has 30 articles in peer-reviewed journals and international conference proceedings. He holds a new practical patent of China. His research interests include system reliability, maintainability and safety, system optimization and vibration control.

Min Xie received his Ph.D. in Quality Technology from Linkoping University, Sweden, in 1987. Dr Xie has been active in reliability and quality related research since then. He has authored or co-authored over 100 articles in refereed journals and 6 books, including Software Reliability Modelling by World Scientific, Statistical Models and Control Charts for High Quality Processes by Kluwer Academic Publisher, and Weibull Models by John Wiley & Sons. He is a department editor of IIE Transactions, an associate editor of IEEE Trans on Reliability, and on the editorial board of several other journals. He is a fellow of IEEE.