Group Replacement Policies for Parallel Systems Whose Components Have Phase Distributed Failure Times

Elmira Popova¹
Graduate Program in Operations Research and Industrial Engineering
The University of Texas at Austin
Austin, TX 78712

John G. Wilson
Babcock Graduate School of Management
Wake Forest University
Winston-Salem, NC 27109

Abstract

Consider a system of components operating in parallel. Downtime costs are incurred when failed components are not repaired or replaced. There are also fixed, unit repair and replacement costs associated with the system. The failure times of the components are assumed to be identically distributed random variables. Results on calculating the expected cost and variance per unit time of various group replacement policies will be provided. Consideration of variance is important since, in many cases, practitioners wish not only to achieve small expected cost but also to reduce variability from cycle to cycle. Phase distributions allow for the modeling of a wide range of failure time behavior. Closed form results are derived for the three major classes of group replacement policy (m-failure, T-age, and (m, T)) when the underlying distribution is of phase type.

Keywords: Maintenance, Phase Distributions, Production Planning, Cost Variability

¹ This research has been partially supported by Grant #0003658-472 from State of Texas Advanced Technology Program, National Science Foundation Grant #DMC-8910378, and a Babcock Research Grant.

1 Introduction

Over the past few decades, the complexity of industrial systems has grown enormously. New industrial paradigms to ensure fast production, delivery, and profit have been introduced. Just-in-time systems are among the most popular. Proper planning of maintenance is important to minimize disruptions to the system.
The literature on reliability and maintainability of complex systems has evolved at a relatively slow pace due to the mathematical complexity of such problems and limited computational capabilities in the past. In this paper, a system of stochastically independent and identical components is analyzed. Group replacement policies are investigated in detail. Group replacement policies require the replacement or repair to as good as new of all components whenever any replacement or maintenance is performed. Their great advantages are that they allow for economies of scale and are straightforward to implement.

There are three main classes of group replacement policy. A T-age policy (see, e.g., Okumoto and Elsayed [11]) calls for replacement every T units of time. An m-failure policy (see, e.g., Assaf and Shanthikumar [1] and Wilson and Benmerzouga [17]) calls for replacing the system at the time of the mth failure. A policy that combines features of both of the above classes is the (m, T) policy, which calls for replacement at the time of the mth failure or at time T, whichever occurs first (see Ritchken and Wilson [13] and Nakagawa [8]).

The work referenced above assumes that the parameters of the underlying failure time distributions are known with certainty. In practice, an engineer's opinion of the failure time distributions will change as data from the actual operation of the system is obtained. There have been a number of recent results on adaptive Bayesian approaches to modeling this situation. For the case of exponential failure times, Wilson and Benmerzouga [18] analyzed a policy class that called for replacement of the system whenever the expected posterior value of the exponential parameter exceeded a certain threshold. A more general form of this policy for the case of three machines operating in parallel was considered in Wilson and Popova [19]. The case of a single machine with a Weibull failure time is considered in Mazzuchi and Soyer [7].
Group replacement policies are popular in large part due to the ease with which they can be implemented in a real production setting. There has been little work in the literature on identifying which class of policies contains the optimal policy for a given system. For a parallel system where the components have exponential i.i.d. failure times, Assaf and Shanthikumar [1] showed that the class of m-failure policies is optimal if one knows the value of the underlying exponential parameter. Wilson and Popova [20] provide optimality results for the adaptive Bayesian case where the parameter is continuously estimated from the failure time data. The case where the system consists of only one component but where the failure distribution is allowed to be any continuous distribution whose parameters are continuously updated is considered in Popova and Wilson [12].

All of the above approaches assume that one is only interested in finding a policy that minimizes the expected cost per unit time. However, many managers and engineers are often just as interested in the variability of cost from cycle to cycle. Indeed, many might prefer a policy with a slightly higher expected cost per unit time if the variability of these costs is small. In any case, knowledge of the variance associated with a given policy provides useful information. Consequently, it is somewhat surprising that most approaches in the literature ignore variance considerations. In this paper both expected cost and variance per unit time are explicitly modeled. Derivations of the quantities needed to compute the variance per unit time are provided in the Appendix. These derivations apply to any continuous failure distribution with finite first and second moments. (A summary of this material and some examples can be found in Wilson [16].)

One difficulty with much of the literature on group replacement policies is the restrictive assumptions that must often be made regarding the failure time distribution.
A goal of this paper is to analyze the situation where the failure distribution can be drawn from a very wide class. Phase distributions (see Neuts [9]) can be used to approximate most existing continuous distributions (see, e.g., Bobbio and Cumani [2], Johnson [5], Malhotra and Reibman [6]). Consequently an extensive analysis for this case is provided.

Some research on the reliability of systems with phase distributed lifetimes has recently appeared. The two unit priority redundant system with phase failure time and underlying repair time of the non-priority unit is analyzed by Gururajan and Bhat [4], where closed-form results for the reliability and availability of the system are provided. Chakravarthy [3] considers the system of two machines in series with a buffer in between. The machines have exponential failure and repair times and the processing time is phase distributed. An algorithm for obtaining the steady state probabilities and some system performance measures is presented.

Consider the class of (m, T) policies. Assume that the failure distribution is continuous, e.g. Weibull. Suppose the values for T are restricted to the set $\{T_1, T_2, \ldots, T_k\}$. Then a total of $mk$ policies must be considered. Calculating the expected cost per unit time for each of these policies involves many numerical integrations. If one also wishes to compute the appropriate variance associated with each of these policies, then even more integrations are required. However, as will be demonstrated in this paper, no integrations are required if a phase distribution is used. One need only consider operations with matrices. However, if one is analyzing a large system of components, the dimension of the matrices necessary to compute the expected cost and variance for m-failure and (m, T) policies grows exponentially with the number of components. Consequently, the size of the problem one can consider depends on the computer power currently available.
The class of phase distributions is sufficiently wide to capture most reasonable failure behavior. For instance, one can find phase distributions that approximate the Weibull distribution (see, e.g., Johnson [5], Malhotra and Reibman [6]). The applicability of group replacement models is greatly enhanced when the failure distributions are realistic, managerially important quantities such as variances are also calculated, and results are relatively easy to obtain numerically. It is demonstrated in this paper that all of these objectives are satisfied when phase distributions are used to model failure times.

Notation and basic assumptions are provided in §2. Since the focus of this paper is on replacement policy issues, most of the phase related derivations are summarized in the Appendix in an effort to reduce the algebraic complexity of the paper. §4 and §5 contain explicit results for the expected costs and variances associated with T-age and m-failure policies, respectively. §6 contains an analysis of the more algebraically complex (m, T) policies.

2 Notation and Assumptions

Assume that $n$ independent components, or machines, with identical failure time distributions are in operation. The system is working if at least one of the components is operating. Each time group maintenance is performed a fixed cost of $c_0$ is incurred. The cost of either replacing or repairing a broken component to as good as new is denoted by $c_r$. The cost of either replacing or repairing a functioning component to as good as new is denoted by $c_s$. The quantity $c_r - c_s$ is assumed to be positive and can be interpreted as the salvage value for a used but functioning machine. Each failed machine results in a downtime cost of $c_d$ per unit time until the machine is repaired or replaced. The times of group replacement/maintenance are renewal points for the system, with the renewal cycle being the time between successive group maintenance operations.
Let $L$ and $C$, respectively, denote the random variables for the length of the cycle and the total cost incurred during the cycle. The cost, $C$, incurred during the repair cycle can be written as

\[ C = c_0 + n c_s + (c_r - c_s) N + c_d D, \]

where $N$ is the total number of components that fail during the cycle and $D$ is the total downtime incurred during the cycle. The mean and variance of $L$ will be denoted by $\mu$ and $\sigma^2$, respectively. Let $C(t)$ denote the total cost incurred over all cycles between times 0 and $t$. The goal of this paper is to develop explicit closed form expressions that do not require integration for the expected cost per unit time $\lim_{t\to\infty} t^{-1}E[C(t)]$ and the asymptotic variance per unit time $\lim_{t\to\infty} t^{-1}\mathrm{Var}[C(t)]$. It can be shown from renewal theory that the following relationships hold:

\[ \lim_{t\to\infty} t^{-1}E[C(t)] = E[C]/\mu, \]
\[ \lim_{t\to\infty} t^{-1}\mathrm{Var}[C(t)] = \{E[C]\}^2\sigma^2\mu^{-3} + \mu^{-1}\mathrm{Var}[C] + 2\{E[C]\}^2\mu^{-1} - 2\mu^{-2}E[CL]\,E[C] \tag{1} \]

(see Ross [14] and Smith [15]).

Let $F(\cdot)$ and $f(\cdot)$ denote the cumulative distribution and density functions, respectively, for the time to failure of a given machine. For $1 \le i \le m$, let $f_i(x)$ denote the density function of $\tau_i$ (the time to the $i$th failure) and let $p(i, x)$ denote the probability that exactly $i$ out of $n$ machines will have failed $x$ time units into the cycle, i.e.

\[ f_i(x) = \frac{n!}{(i-1)!\,(n-i)!}\, f(x)\,[F(x)]^{i-1}\,[1-F(x)]^{n-i} \tag{2} \]

and

\[ p(i, x) = \binom{n}{i}[F(x)]^{i}[1-F(x)]^{n-i}. \tag{3} \]

Explicit results for the terms in (1) are derived in the Appendix. These results are expressed in terms of $F(\cdot)$, $f(\cdot)$, $f_i(\cdot)$ and $p(\cdot,\cdot)$. Note that these results apply to any failure time distribution whose first and second moments are finite. For the remainder of the paper it will be assumed that the failure time is a phase distributed random variable with representation $(\alpha, A)$, i.e.

\[ F(x) = 1 - \alpha e^{Ax} e, \tag{4} \]

where $e^{t} = (1, 1, \ldots, 1) \in \mathbb{R}^r$ (so that $e$ is a column vector of ones) and $A$ is an $r \times r$ stable matrix with non-negative off-diagonal entries, non-positive row sums and negative diagonal entries.
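As a rough illustration of (4), the following Python sketch evaluates $F(x) = 1 - \alpha e^{Ax}e$ for an Erlang distribution written in phase form and compares it with the familiar closed-form Erlang distribution function. The helper names (`mat_mul`, `mat_exp`, `phase_cdf`, `erlang_representation`, `erlang_cdf`) are illustrative and do not come from the paper, and the matrix exponential is a simple scaling-and-squaring Taylor approximation chosen only to keep the sketch dependency-free. The values $r = 4$ and $\lambda = 1.5$ are those of the paper's running example.

```python
import math

def mat_mul(X, Y):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(x * Y[k][j] for k, x in enumerate(row)) for j in range(len(Y[0]))]
            for row in X]

def mat_exp(M, squarings=10, terms=18):
    """Approximate exp(M) by scaling and squaring a truncated Taylor series."""
    n = len(M)
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[a + b for a, b in zip(r, t)] for r, t in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def phase_cdf(alpha, A, x):
    """F(x) = 1 - alpha exp(Ax) e, as in equation (4)."""
    E = mat_exp([[a * x for a in row] for row in A])
    return 1.0 - sum(a * sum(row) for a, row in zip(alpha, E))

def erlang_representation(r, lam):
    """(alpha, A) for an Erlang(r, lam) law: r exponential phases in series."""
    alpha = [1.0] + [0.0] * (r - 1)
    A = [[-lam if j == i else lam if j == i + 1 else 0.0 for j in range(r)]
         for i in range(r)]
    return alpha, A

def erlang_cdf(r, lam, x):
    """Closed-form Erlang distribution function, used only as a cross-check."""
    tail = sum((lam * x) ** k / math.factorial(k) for k in range(r))
    return 1.0 - math.exp(-lam * x) * tail

alpha, A = erlang_representation(4, 1.5)
print(phase_cdf(alpha, A, 2.3), erlang_cdf(4, 1.5, 2.3))
```

The two printed values should agree to many decimal places; for a production setting a library matrix exponential would normally replace the hand-rolled `mat_exp`.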
The initial probability vector is given by $(\alpha, \alpha_{r+1})$ with $\alpha e + \alpha_{r+1} = 1$. One interpretation for this distribution is that it represents the time to absorption of a Markov process defined on the states labeled $1, 2, \ldots, r+1$, where the states $1, \ldots, r$ are transient and the state $r+1$ is absorbing. (The number $r$ is the dimension of the distribution representation.) The infinitesimal generator for this process can be written as

\[ \begin{pmatrix} A & A^0 \\ 0 & 0 \end{pmatrix}, \]

where $Ae + A^0 = 0$ and the initial probability vector of the process is given by $(\alpha, \alpha_{r+1})$ where $\alpha e + \alpha_{r+1} = 1$ ($\alpha_{r+1} = 0$ in our analysis). The density function is given by

\[ f(x) = \alpha e^{Ax} A^0, \quad \text{for } x > 0, \tag{5} \]

(see Neuts [9], page 44 for details).

Let $I$ denote the $r \times r$ identity matrix, let $I_k$ denote the $r^k \times r^k$ identity matrix and let $e_k$ denote the $r^k$ column vector that consists entirely of 1's. Let $X_i$, $1 \le i \le n$, denote the i.i.d. times to failure of the $n$ components. Then $\min(X_1, X_2)$ has a phase distribution with representation $(\alpha_2, A_2)$ where $\alpha_2 \equiv \alpha \otimes \alpha$, $A_2 \equiv A \otimes I + I \otimes A$ and $\otimes$ denotes the Kronecker product (see Neuts [9] for details). Apply this recursively to see that $\min(X_1, \ldots, X_k)$ for $k \in \{1, 2, \ldots, n\}$ has a phase distribution with representation $(\alpha_k, A_k)$ where

\[ \alpha_k \equiv \begin{cases} \alpha & \text{for } k = 1 \\ \alpha_{k-1} \otimes \alpha & \text{otherwise} \end{cases} \qquad A_k \equiv \begin{cases} A & \text{for } k = 1 \\ A_{k-1} \otimes I + I_{k-1} \otimes A & \text{otherwise.} \end{cases} \]

The function $1 - [1 - F(x)]^k$ is the distribution function of $\min(X_1, \ldots, X_k)$, which is a phase distribution with representation $(\alpha_k, A_k)$. Consequently,

\[ 1 - [1 - F(x)]^k = 1 - \alpha_k e^{A_k x} e_k, \quad \text{for } k \ge 2. \tag{6} \]

Expand $F(x)^{i-1} = [1 - (1 - F(x))]^{i-1}$ in (2) and (3) with the Binomial theorem and use (4), (5) and (6) to see that $f_i(x)$ and $p(i, x)$ can be written as follows:

\[ f_i(x) = i\binom{n}{i}\,\alpha e^{Ax}A^0 \sum_{k=0}^{i-1}\binom{i-1}{k}(-1)^{i-1-k}\,\alpha_{n-k-1}e^{A_{n-k-1}x}e_{n-k-1} \tag{7} \]

\[ p(i, x) = \binom{n}{i}\sum_{k=0}^{i}\binom{i}{k}(-1)^{i-k}\,\alpha_{n-k}e^{A_{n-k}x}e_{n-k}. \tag{8} \]

Example

In order to approximate a Weibull distribution with shape parameter equal to $c$ and scale parameter equal to $b$, Malhotra and Reibman [6] suggest solving the following two equations for $r$ and $\lambda$:

\[ r\lambda^{-1} = b\,\Gamma\!\left(\frac{c+1}{c}\right), \qquad r(r+1)\lambda^{-2} = b^2\,\Gamma\!\left(\frac{c+2}{c}\right). \tag{9} \]

(If the above produces a nonintegral solution for $r$ then choose the closest integer to the solution.) Then use an Erlang distribution with parameters $r$ and $\lambda$ to approximate the given Weibull. Let $\alpha = (1, 0, 0, 0)$ and

\[ A = \begin{pmatrix} -\lambda & \lambda & 0 & 0 \\ 0 & -\lambda & \lambda & 0 \\ 0 & 0 & -\lambda & \lambda \\ 0 & 0 & 0 & -\lambda \end{pmatrix}. \]

Then $X$ has an Erlang distribution with parameters $r = 4$ and $\lambda$ and distribution function given by $F(x) = 1 - \alpha e^{Ax} e$. From (9), this distribution can be used to approximate a Weibull distribution with parameters $c = 2$ and $b = 4.51\lambda^{-1}$.

3 Preliminary results

In order to simplify the algebraic exposition, a number of results and definitions that will be needed in the rest of the paper are collected in this section. A number of identities involving $F(\cdot)$ and $f(\cdot)$ will be required and are listed below:

\[ \int_0^T [1 - F(t)]^k\,dt = \alpha_k\left[e^{A_k T} - I_k\right]A_k^{-1}e_k, \quad \text{for } k = 1, \ldots, n \tag{10} \]
\[ \int_0^\infty [1 - F(t)]^k\,dt = -\alpha_k A_k^{-1}e_k, \quad \text{for } k = 1, \ldots, n \tag{11} \]
\[ \int_0^x y f(y)\,dy = \alpha A^{-1}e^{Ax}e - x\,\alpha e^{Ax}e - \alpha A^{-1}e, \quad \text{for } x > 0 \tag{12} \]
\[ \int_0^x F(t)\,dt = x - \alpha A^{-1}e^{Ax}e + \alpha A^{-1}e, \quad \text{for } x > 0 \tag{13} \]
\[ \int_0^T (T - t)^2 f(t)\,dt = 2T\alpha A^{-1}e + 2\alpha A^{-2}e - 2\alpha A^{-2}e^{AT}e + T^2 \tag{14} \]

(see Appendix). For $i \le j \le n$, define $S_1(j, T)$, $S_2(j, T)$, $S_3(j, T)$ as follows:

\[ S_1(j, T) \equiv \int_0^T x^2 f(x)[1 - F(x)]^j\,dx \tag{15} \]
\[ S_2(j, T) \equiv \int_0^T x f(x)\left[\alpha A^{-1}e^{Ax}e\right][1 - F(x)]^j\,dx \tag{16} \]
\[ S_3(j, T) \equiv \int_0^T x f(x)[1 - F(x)]^j\,dx. \tag{17} \]

Then it is shown in the Appendix that the following identities hold:

\[ S_1(j, T) = \left\{T^2\alpha_{j+1}e^{A_{j+1}T}A_{j+1}^{-1} - 2T\alpha_{j+1}e^{A_{j+1}T}A_{j+1}^{-2} + 2\alpha_{j+1}\left[e^{A_{j+1}T} - I_{j+1}\right]A_{j+1}^{-3}\right\}\left(e_j \otimes A^0\right) \tag{18} \]
\[ S_2(j, T) = \left\{T\left(\alpha A^{-1} \otimes \alpha_{j+1}\right)e^{A_{j+2}T}A_{j+2}^{-1} - \left(\alpha A^{-1} \otimes \alpha_{j+1}\right)\left[e^{A_{j+2}T} - I_{j+2}\right]A_{j+2}^{-2}\right\}\left(e_{j+1} \otimes A^0\right) \tag{19} \]
\[ S_3(j, T) = \left\{T\alpha_{j+1}e^{A_{j+1}T}A_{j+1}^{-1} - \alpha_{j+1}\left[e^{A_{j+1}T} - I_{j+1}\right]A_{j+1}^{-2}\right\}\left(e_j \otimes A^0\right). \tag{20} \]

On letting $T$ go to infinity in (18) to (20) and noting that for any stable matrix $A$, $\lim_{x\to\infty} e^{Ax} = 0$, the following can be obtained:

\[ S_1(j, \infty) = -2\alpha_{j+1}A_{j+1}^{-3}\left(e_j \otimes A^0\right) \tag{21} \]
\[ S_2(j, \infty) = \left(\alpha A^{-1} \otimes \alpha_{j+1}\right)A_{j+2}^{-2}\left(e_{j+1} \otimes A^0\right) \tag{22} \]
\[ S_3(j, \infty) = \alpha_{j+1}A_{j+1}^{-2}\left(e_j \otimes A^0\right). \tag{23} \]

Now some identities involving $F(\cdot)$, $f(\cdot)$ and $f_i(\cdot)$ are needed. For $1 \le i \le n$ define $U_1(i, T)$ and $U_2(i, T)$ by

\[ U_1(i, T) = \int_0^T x f_i(x)\left\{\int_0^x y f(y)\,dy\right\}\{F(x)\}^{-1}\,dx \tag{24} \]

and

\[ U_2(i, T) = \int_0^T x^2 f_i(x)\,dx, \tag{25} \]

respectively, and $U_3(i, T)$ and $U_4(i, T)$ by

\[ U_3(i, T) = \int_0^T x f_i(x)K_i(x)\left\{\int_0^x y f(y)\,dy\right\}\{F(x)\}^{-1}\,dx \tag{26} \]

and

\[ U_4(i, T) = \int_0^T x^2 f_i(x)K_i(x)\,dx, \tag{27} \]

respectively, where

\[ K_i(x) \equiv [1 - F(x)]^{-(n-i)}\sum_{j=m-i}^{n-i}\binom{n-i}{j}[F(T) - F(x)]^j[1 - F(T)]^{n-i-j}. \tag{28} \]

It is shown in the Appendix that $U_1(i, T)$, $U_2(i, T)$, $U_3(i, T)$ and $U_4(i, T)$ can be written in terms of $S_1(\cdot, T)$, $S_2(\cdot, T)$ and $S_3(\cdot, T)$:

\[ U_1(i, T) = i\binom{n}{i}\Big\{\sum_{k=0}^{i-1}\binom{i-1}{k}(-1)^{i-k-1}S_1(n-k-1, T) - \sum_{k=0}^{i-2}\binom{i-2}{k}(-1)^{i-k-2}\big[S_1(n-k-2, T) - S_2(n-k-2, T) + \left(\alpha A^{-1}e\right)S_3(n-k-2, T)\big]\Big\} \tag{29} \]

\[ U_2(i, T) = i\binom{n}{i}\sum_{k=0}^{i-1}\binom{i-1}{k}(-1)^{i-1-k}S_1(n-k-1, T) \tag{30} \]

\[ U_3(i, T) = i\binom{n}{i}\sum_{k=0}^{i-2}\sum_{j=m-i}^{n-i}\sum_{l=0}^{j}\binom{i-2}{k}\binom{n-i}{j}\binom{j}{l}(-1)^{i-2-k+l}\,\alpha_{n+l-i-j}e^{A_{n+l-i-j}T}e_{n+l-i-j}\big\{S_2(i+j-k-l-2, T) - \left(\alpha A^{-1}e\right)S_3(i+j-k-l-2, T) - S_1(i+j-k-l-1, T)\big\} \tag{31} \]

\[ U_4(i, T) = i\binom{n}{i}\sum_{k=0}^{i-1}\sum_{j=m-i}^{n-i}\sum_{l=0}^{j}\binom{i-1}{k}\binom{n-i}{j}\binom{j}{l}(-1)^{i-k+l-1}\,\alpha_{n+l-i-j}e^{A_{n+l-i-j}T}e_{n+l-i-j}\,S_1(i+j-k-l-1, T). \tag{32} \]

4 Expected cost and Variance per unit time for T-age replacement policies

A T-age replacement policy calls for replacement every $T$ units of time. The expected cost per unit time equals

\[ T^{-1}\left\{c_0 + n c_s + (c_r - c_s)\,nF(T) + n c_d\int_0^T F(t)\,dt\right\} \]

(see Okumoto and Elsayed [11]). Using (4) and (13) in the above expression, the expected cost per unit time associated with a T-age replacement policy can be seen to equal

\[ T^{-1}\left\{c_0 + n c_r + n(c_s - c_r)\,\alpha e^{AT}e + n c_d\left[T + \alpha A^{-1}e - \alpha A^{-1}e^{AT}e\right]\right\}. \tag{33} \]

The asymptotic variance per unit time can be written as

\[ T^{-1}\left\{n(c_r - c_s)^2 F(T)[1 - F(T)] + 2n c_d(c_r - c_s)[1 - F(T)]\int_0^T F(t)\,dt + n c_d^2\int_0^T (T - t)^2 f(t)\,dt - n c_d^2\left[\int_0^T F(t)\,dt\right]^2\right\} \tag{34} \]

(see the Appendix).
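The closed form (33) lends itself to direct evaluation with elementary matrix operations. The sketch below is one possible way to do this in plain Python for an Erlang failure time with $r = 4$ phases and rate $\lambda = 1.5$, three components, and costs $c_0 = 70$, $c_s = 10$, $c_r = 50$, $c_d = 30$ (the values of the paper's running example). All function names are illustrative assumptions, the matrix exponential is a simple scaling-and-squaring Taylor approximation, and the term $\alpha A^{-1}e^{AT}e$ is computed as $\alpha e^{AT}(A^{-1}e)$, which is valid because $A^{-1}$ commutes with $e^{AT}$.

```python
def mat_mul(X, Y):
    """Plain-Python product of two matrices stored as lists of rows."""
    return [[sum(x * Y[k][j] for k, x in enumerate(row)) for j in range(len(Y[0]))]
            for row in X]

def mat_exp(M, squarings=10, terms=18):
    """Approximate exp(M) by scaling and squaring a truncated Taylor series."""
    n = len(M)
    scaled = [[v / 2 ** squarings for v in row] for row in M]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in mat_mul(term, scaled)]
        result = [[a + b for a, b in zip(r, t)] for r, t in zip(result, term)]
    for _ in range(squarings):
        result = mat_mul(result, result)
    return result

def solve(M, b):
    """Solve M y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(n):
            if i != c and aug[i][c] != 0.0:
                f = aug[i][c] / aug[c][c]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def t_age_cost(T, alpha, A, n, c0, cs, cr, cd):
    """Expected cost per unit time of a T-age policy, following (33)."""
    r = len(A)
    E = mat_exp([[a * T for a in row] for row in A])
    y = solve(A, [1.0] * r)                                   # y = A^{-1} e
    surv = sum(a * sum(row) for a, row in zip(alpha, E))      # alpha e^{AT} e
    a_inv_e = sum(a * v for a, v in zip(alpha, y))            # alpha A^{-1} e
    Ey = [sum(E[i][j] * y[j] for j in range(r)) for i in range(r)]
    a_inv_exp_e = sum(a * v for a, v in zip(alpha, Ey))       # alpha A^{-1} e^{AT} e
    return (c0 + n * cr + n * (cs - cr) * surv
            + n * cd * (T + a_inv_e - a_inv_exp_e)) / T

lam = 1.5  # Erlang rate used in the running example
alpha = [1.0, 0.0, 0.0, 0.0]
A = [[-lam, lam, 0.0, 0.0],
     [0.0, -lam, lam, 0.0],
     [0.0, 0.0, -lam, lam],
     [0.0, 0.0, 0.0, -lam]]
print(t_age_cost(2.3, alpha, A, n=3, c0=70, cs=10, cr=50, cd=30))
```

With these inputs the function returns approximately 80.15 at $T = 2.3$, the expected cost quoted for the 2.3-age policy in the example. Scanning `t_age_cost` over a grid of $T$ values is then a cheap way to explore the policy class.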
Use (4), (13) and (14) in (34) to obtain a result not involving integration. For T-age policies, the expressions for expected cost and variance per unit time only involve matrices of dimension $r$.

Example continued

Suppose three components are operating in parallel. Let the cost parameters $c_0$, $c_s$, $c_r$ and $c_d$ equal 70, 10, 50 and 30, respectively. Assume that $\lambda$ is equal to 1.5. Then the expected cost per unit time of a T-age policy equals

\[ e^{-1.5T}\left[-33.75T^2 + 120T^{-1} + 90\right] - 20T^{-1} + 90, \]

while the variance per unit time equals

\[ e^{-1.5T}\left[2025T^3 - 4050T^2 - 8100T - 7200\right] + e^{-3T}\left[-379.69T^5 + 2025T^3 + 2700T^2 - 2700T - 4800T^{-1} - 7200\right] + 4800T^{-1}. \]

Figure 1 contains plots of the expected cost and variance per unit time as a function of $T$. The 2.3-age replacement policy has an expected cost of 80.15, minimizes the expected cost per unit time and has an associated variance of 1367. Because calculation of expected costs and variances is now a computationally easy matter, the decision maker can also consider other approaches. For instance, the decision maker might decide that the 2% increase in expected cost in going from a 2.3-age to a 1.8-age replacement policy is worth the 19% decrease in variance.

insert Figure 1 about here

5 m-failure policies

In this section, expressions will be provided that enable computation of the expected cost per unit time and asymptotic variance associated with any given m-failure policy.

5.1 Expected cost per unit time

Use (3), (48), (53) and (11) to see that the expected length of the cycle, $\mu$, and the expected downtime incurred during the cycle, $E[D]$, can be written as follows:

\[ \mu = \sum_{i=0}^{m-1}\binom{n}{i}\sum_{k=0}^{i}(-1)^{i-k+1}\binom{i}{k}\,\alpha_{n-k}A_{n-k}^{-1}e_{n-k} \tag{35} \]

\[ E[D] = \sum_{i=1}^{m-1} i\binom{n}{i}\sum_{k=0}^{i}(-1)^{i-k+1}\binom{i}{k}\,\alpha_{n-k}A_{n-k}^{-1}e_{n-k}. \tag{36} \]

The expected cost of a cycle is given by the following:

\[ E[C] = c_0 + m c_r + (n - m)c_s + c_d E[D]. \tag{37} \]

The mean asymptotic cost $E[C]/\mu$ is obtained from (35), (36) and (37).
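Equations (35) to (37) can be sketched in a few lines once the Kronecker-sum matrices $A_k$ are available. The following plain-Python example is a minimal illustration under stated assumptions: the helper names (`kron`, `kron_sum_power`, `integral_survival_power`, `m_failure_cost`) are hypothetical, a dense Gauss-Jordan solve is used instead of a library routine, and the failure time is the Erlang distribution with $r = 4$ and $\lambda = 1.5$ from the running example with $n = 3$ components. Each integral $\int_0^\infty [1 - F(t)]^k dt$ is obtained from (11) as $-\alpha_k A_k^{-1} e_k$.

```python
from math import comb

def kron(X, Y):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

def identity(n):
    return [[float(i == j) for j in range(n)] for i in range(n)]

def solve(M, b):
    """Solve M y = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(M)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for c in range(n):
        p = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[p] = aug[p], aug[c]
        for i in range(n):
            if i != c and aug[i][c] != 0.0:
                f = aug[i][c] / aug[c][c]
                aug[i] = [v - f * w for v, w in zip(aug[i], aug[c])]
    return [aug[i][n] / aug[i][i] for i in range(n)]

def kron_sum_power(A, k):
    """A_k = A_{k-1} (x) I + I_{k-1} (x) A, starting from A_1 = A."""
    Ak = A
    for _ in range(k - 1):
        left = kron(Ak, identity(len(A)))
        right = kron(identity(len(Ak)), A)
        Ak = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(left, right)]
    return Ak

def integral_survival_power(alpha, A, k):
    """Integral over [0, inf) of [1 - F(t)]^k = -alpha_k A_k^{-1} e_k, see (11)."""
    alpha_k = alpha
    for _ in range(k - 1):
        alpha_k = [a * b for a in alpha_k for b in alpha]
    Ak = kron_sum_power(A, k)
    y = solve(Ak, [1.0] * len(Ak))
    return -sum(a * v for a, v in zip(alpha_k, y))

def m_failure_cost(m, n, alpha, A, c0, cs, cr, cd):
    """Expected cost per unit time E[C]/mu of an m-failure policy, via (35)-(37)."""
    ints = {q: integral_survival_power(alpha, A, q) for q in range(n - m + 1, n + 1)}

    def int_p(i):  # integral over [0, inf) of p(i, t), expanded as in (8)
        return comb(n, i) * sum(comb(i, k) * (-1) ** (i - k) * ints[n - k]
                                for k in range(i + 1))

    mu = sum(int_p(i) for i in range(m))                 # (35)
    ED = sum(i * int_p(i) for i in range(1, m))          # (36)
    EC = c0 + m * cr + (n - m) * cs + cd * ED            # (37)
    return EC / mu

lam = 1.5  # Erlang rate used in the running example
alpha = [1.0, 0.0, 0.0, 0.0]
A = [[-lam, lam, 0.0, 0.0],
     [0.0, -lam, lam, 0.0],
     [0.0, 0.0, -lam, lam],
     [0.0, 0.0, 0.0, -lam]]
for m in (1, 2, 3):
    print(m, m_failure_cost(m, 3, alpha, A, c0=70, cs=10, cr=50, cd=30))
```

For $m = 2$ the sketch gives roughly 81.45, in line with the value reported for the 2-failure policy in the example. Note that $A_3$ is already $64 \times 64$ here, which illustrates how quickly the matrix dimension $r^k$ grows with the number of components.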
5.2 The asymptotic variance associated with m-failure policies

Note that, for m-failure policies, $\mathrm{Var}[C] = c_d^2\left\{E[D^2] - E[D]^2\right\}$. Thus, from (1), the asymptotic variance per unit time associated with an m-failure policy can be calculated once expressions for $\sigma^2$, $E[CL]$, $E[D^2]$, $\mu$, $E[D]$ and $E[C]$ are available. Identities for $\mu$, $E[D]$ and $E[C]$ have been provided in (35), (36) and (37). Expressions for $\sigma^2$, $E[CL]$ and $E[D^2]$ will now be provided. Use (25) and (49) to obtain the following:

\[ \sigma^2 = E\left[L^2\right] - \mu^2 = U_2(m, \infty) - \mu^2. \tag{38} \]

For $m = 1$,

\[ E[D] = E[D^2] = 0, \qquad E[CL] = [c_0 + m c_r + (n - m)c_s]\,\mu, \qquad \mathrm{Var}[C] = 0. \tag{39} \]

In what follows the more difficult case where $m \ge 2$ is considered. The expression $E[CL]$ can be written as follows:

\[ E[CL] = [c_0 + m c_r + (n - m)c_s]\,\mu + (m - 1)c_d\int_0^\infty\left\{\int_0^x F(t)\,dt\right\}[F(x)]^{-1}\,x f_m(x)\,dx \]

(see Appendix, equation (52)). Use (2) and (13) to see that the integral on the right hand side can be written as

\[ \int_0^\infty x\left\{x - \alpha A^{-1}e^{Ax}e + \alpha A^{-1}e\right\} m\binom{n}{m} f(x)F(x)^{m-2}[1 - F(x)]^{n-m}\,dx \]
\[ = \int_0^\infty x\left\{x - \alpha A^{-1}e^{Ax}e + \alpha A^{-1}e\right\} m\binom{n}{m} f(x)\sum_{k=0}^{m-2}\binom{m-2}{k}(-1)^{m-2-k}[1 - F(x)]^{n-2-k}\,dx. \]

Now use (21), (22) and (23) to obtain the result

\[ E[CL] = [c_0 + m c_r + (n - m)c_s]\,\mu + c_d\,m(m - 1)\binom{n}{m}\sum_{i=0}^{m-2}\binom{m-2}{i}(-1)^{m-2-i}\left[S_1(n-2-i, \infty) - S_2(n-2-i, \infty) + \left(\alpha A^{-1}e\right)S_3(n-i-2, \infty)\right]. \tag{40} \]

From (67) in the Appendix, the following can be obtained:

\[ E[D^2] = \int_0^\infty x^2\left\{\sum_{i=1}^{m-1} f_i(x) + (m - 1)^2 f_m(x)\right\}dx + 2\int_0^\infty x\left\{\sum_{i=2}^{m-1}(i - 1)f_i(x) - (m - 1)^2 f_m(x)\right\}[F(x)]^{-1}\int_0^x y f(y)\,dy\,dx. \tag{41} \]

Recall definitions (24) and (25) and apply (29) and (30) in (41) to obtain the result:

\[ E[D^2] = \sum_{i=1}^{m-1}U_2(i, \infty) + (m - 1)^2 U_2(m, \infty) + 2\sum_{i=2}^{m-1}(i - 1)U_1(i, \infty) - 2(m - 1)^2 U_1(m, \infty). \tag{42} \]

Example continued

Suppose that $n = 3$ and the failure distribution is phase type with representation $\alpha = (1, 0, 0, 0)$ and

\[ A = \begin{pmatrix} -\lambda & \lambda & 0 & 0 \\ 0 & -\lambda & \lambda & 0 \\ 0 & 0 & -\lambda & \lambda \\ 0 & 0 & 0 & -\lambda \end{pmatrix}. \]

Then $\mu$, $E[D]$, $\sigma^2$, $E[CL]$ and $E[D^2]$ can be calculated from (35), (36), (38), (40) and (42), respectively (for $m = 1$ apply (39)).
For $n = 3$, $c_0 = 70$, $c_s = 10$, $c_r = 50$, $c_d = 30$ and $\lambda = 1.5$, Table 1 contains the expected cost and variance per unit time for 1, 2 and 3-failure policies. The 2-failure policy has the smallest expected cost per unit time (81.45) and its variance equals 1368.

insert Table 1 about here

6 (m, T)-policies

In this section assume that an (m, T)-policy (i.e. replace at the time of the mth failure or time T, whichever occurs first) is being followed. First an expression for the expected cost per unit time will be provided. Then explicit results will be provided for each of the terms in (1), which is the expression for the asymptotic variance per unit time.

6.1 Expected cost per unit time

From (3), (8) and (54):

\[ E[N] = \sum_{i=0}^{m-1} i\binom{n}{i}\sum_{k=0}^{i}(-1)^{i-k}\binom{i}{k}\,\alpha_{n-k}e^{A_{n-k}T}e_{n-k} + m\sum_{i=m}^{n}\binom{n}{i}\sum_{k=0}^{i}(-1)^{i-k}\binom{i}{k}\,\alpha_{n-k}e^{A_{n-k}T}e_{n-k}. \tag{43} \]

From (8), (10) and (48) the expected cycle length can be written as

\[ \mu = \sum_{i=0}^{m-1}\binom{n}{i}\sum_{k=0}^{i}(-1)^{i-k}\binom{i}{k}\,\alpha_{n-k}A_{n-k}^{-1}\left[e^{A_{n-k}T} - I_{n-k}\right]e_{n-k}, \tag{44} \]

while the expected downtime is given by

\[ E[D] = \sum_{j=1}^{m-1} j\binom{n}{j}\sum_{k=0}^{j}(-1)^{j-k}\binom{j}{k}\,\alpha_{n-k}A_{n-k}^{-1}\left[e^{A_{n-k}T} - I_{n-k}\right]e_{n-k}. \tag{45} \]

The expected cost for a cycle is given by:

\[ E[C] = c_0 + n c_s + (c_r - c_s)E[N] + c_d E[D]. \tag{46} \]

6.2 The asymptotic variance associated with (m, T) policies

In order to calculate (1), it is necessary to calculate $\mu$, $E[D]$, $E[C]$, $\sigma^2$, $E[CL]$ and $\mathrm{Var}[C]$. The terms $\mu$, $E[D]$ and $E[C]$ are provided in (44), (45) and (46), respectively. An expression for $\sigma^2 = E[L^2] - \mu^2$ follows from (44) and the identity:

\[ E\left[L^2\right] = \int_0^T t^2 f_m(t)\,dt + T^2\sum_{i=0}^{m-1}p(i, T) = U_2(m, T) + T^2\sum_{i=0}^{m-1}\binom{n}{i}\sum_{k=0}^{i}(-1)^{i-k}\binom{i}{k}\,\alpha_{n-k}e^{A_{n-k}T}e_{n-k}, \]

where the last equality follows from (2), (8) and (25). For $m = 1$, $E[D] = E[D^2] = 0$,

\[ E[CL] = \int_0^T (c_r - c_s)\,x f_1(x)\,dx = \int_0^T (c_r - c_s)\,x\binom{n}{1}f(x)[1 - F(x)]^{n-1}\,dx = (c_r - c_s)\binom{n}{1}S_3(n-1, T), \]

and $\mathrm{Var}[C] = (c_r - c_s)^2\left\{E[N^2] - E[N]^2\right\}$. In what follows the more difficult case where $m \ge 2$ is considered.
From (52) in the Appendix: Use ( 2), (3) and (13), apply (18), (19) and (20) in (52) to get ! mX ?1 ! m ? 1 S (n ? 1 ? k; T ) A?1 e 3 k k=0 ! ! mX ?2 n m ? 2 m ? 2?k (?1) +cd m(m ? 1) m fS1 (n ? 2 ? k; T ) k k =0 o ?S2 (n ? 2 ? k; T ) + A?1 e S3 (n ? 2 ? k; T ) ! ! i mX ?1 X n i +T (cr ? cs ) i i (?1)i k n?k eAn?k T en?k i=1 k=0 ! ! i mX ?1 X n i i ? k ? 1 (?1) +Tcd i i n?k eAn?k T en?k k i=1 k=0 n o ? 1 AT T ? A e e + A?1 e + (c0 + ncs) . E [CL] = m (cr ? cs ) mn 2 (?1)m?1?k Calculation of V ar[C ] is somewhat more complicated than the other calculations. First note that n h i o V ar[C ] = (cr ? cs)2 E N 2 ? E [N ]2 + 2cd (cr ? cs ) fE [DN ] n h i o ?E [D]E [N ]g + cd E D ? E [D] . 2 2 2 Expressions for E [N ] and E [D] are provided by (43) and (45), respectively. Condition on the number of failures at time T and use (8) to obtain E [N 2] = = mX ?1 i=0 mX ?1 i=0 i2p(i; T ) + m2 i +m2 2 mX ?1 n i ! i X n X i=m p(i; T ) ! i (?1)i?k eAn?k T e n?k n?k k=0 k ! ! i i (?1)i?k eAn?k T e . n X n?k n?k i k=0 k i=0 Use expression (61) from the Appendix to obtain E [DN ] = (Z 0 T F (t) dt ?1 n mX X ) mX ?1 i=0 mj ni + i=m j =1 i 2 ! ! ! i?1 i ? 1 n X i?k?1 [1 ? F (T )]n?k?1 i k=0 k (?1) ! j i?j ! ! X i X j i ? j k +l j k=0 l=0 (?1) k l 14 (Z T [1 ? F (t)]n?k?l 0 ) [1 ? F (T )]l Apply (8), (10) and (13) to get E [DN ] = n o T ? ?1 eAT e + A?1 e . ! ! i?1 mX ?1 X n i ? 1 i ? 1?k 2 (?1) i i n?k?1 eAn?k?1 T en?k?1 k i=0 k=0 ! ! j i?j ! ! ?1 n mX XX X n i j i ? j k +l mj i + j k=0 l=0 (?1) k l i=m j =1 i h n?k?l eAn?k?l T ? In?k?l en?k?l l eAl T el . Thus only the term E D2 remains to be calculated. Note that i h i h E [D2] = E IfmT g D2 + E Ifm>T g D2 . (47) Thus E [D2] can be computed by computing the two terms on the right hand side of (47). Use (13) and (14) in (62) from the Appendix to obtain: h E Ifm >T g D 2 mX ?1 j nj = j =1 i +2T jX ?2 k=0 ! j ?1 X k=0 A?1 e + 2 (?1)j ?2?k j?1 k (?1)j ?1?k A?1 j ?2 k ! 2 h e ? 2 ! ih h n?k?1 eAn?k?1 T en?k?1 T 2 A?1 2 eAT e mX ?1 ! 
+ j (j ? 1) nj j =1 i n?k?2 eAn?k?2 T en?k?2 i2 [T ? A?1 eAT e + A?1 e . From (67) in the Appendix: apply (29), (30), (31) and (32) to get h i E Ifm T g D2 = 2 mX ?1 i=2 U3 (i; T ) ? (m ? 1)2U1(m; T ) + mX ?1 i=1 U4 (i; T ) + (m ? 1)2U2 (m; T ). The above expressions are algebraically complex. However, for a given phase distribution, all reduce to tractable closed form expressions. Example continued For n = 3, = 1:5 and c0 = 70, cs = 10, cr = 50, cd = 30, Figure 2 and 3, respectively, contain the expected costs and variance per unit time for (1; T ), (2; T ) and (3; T ) failure 15 policies as a function of T. The (2; 2:5) policy has the smallest expected cost per unit time (79.97) and its associated variance is 1479. insert Figure 2 about here insert Figure 3 about here 7 Conclusion Group maintenance policies form an important part of the reliability literature. However the analyst has often been restricted to a very narrow (and often inappropriate) range of distributions. Also, given the computational complexity of the problems, sensitivity analyses where failure time and cost parameters can be varied have been problematic. By allowing the analyst to choose an arbitrary phase distribution, the applicability of group maintenance approaches is greatly increased. A contribution of this paper has been to provide explicit closed form results for the major policy classes when the failure time has a phase distribution. These results, which in general appear quite algebraically daunting, are computationally relatively easy for any given problem. This demonstrates once again that, as predicted by Neuts [9], use of phase distributions can be of great practical utility. Sensitivity analyses are now very easy to conduct. Unlike most of the literature, the variability associated with group maintenance policies has been explicitly modeled. 
(The results provided for calculating the asymptotic variability per unit time apply to general failure time distributions as long as the first two moments are finite.) The closed form results for the asymptotic variance allow the analyst to consider criteria other than simply that of minimizing expected cost per unit time. Indeed, in many applied situations, an analyst might be willing to tolerate an increased expected cost in order to reduce variability. In any case, even if choosing the policy that minimizes expected cost is the analyst's objective, knowledge of the associated variability provides important managerial information.

Appendix

This Appendix is divided into two parts. In §A expressions for calculating (1) are derived. §B contains the phase results listed in §3.

A Calculating Asymptotic Cost and Variance Per Unit Time

Explicit expressions for the terms in (1) required to compute the asymptotic expected cost and variance per unit time associated with (m, T) policies will now be derived. (Similar expressions for m-failure policies can be obtained by letting $T \to \infty$ in the appropriate places.) The results of this section are general and apply to any continuous failure distribution with finite first and second moments. (A summary of these results together with some examples can be found in Wilson [16].)

A.1 Calculating $\mu$, $\sigma^2$, $E[CL]$ and $E[C]$ for (m, T) policies

Expressions for $\mu$, $\sigma^2$, $E[CL]$ and $E[C]$ are provided by (48), (49), (52) and (55), respectively. Expressions for $\mu$ and $\sigma^2$ follow by noting that

\[ \mu = \int_0^T P[\min(T, \tau_m) > t]\,dt = \int_0^T\sum_{i=0}^{m-1}p(i, t)\,dt \tag{48} \]

and

\[ \sigma^2 = E\left[\{\min(T, \tau_m)\}^2\right] - \mu^2 = \int_0^T t^2 f_m(t)\,dt + T^2\sum_{i=0}^{m-1}p(i, T) - \left\{\int_0^T\sum_{i=0}^{m-1}p(i, t)\,dt\right\}^2. \tag{49} \]

In order to calculate $E[CL]$ it is first necessary to find expressions for $E[C \mid L = T]$ and $E[C \mid L = x]$, where $x < T$. Suppose that replacement occurs at $x < T$, i.e. replacement occurs at the $m$th failure.
The downtime incurred over the cycle is the sum of the downtimes for each of the first $m-1$ failures. For any $y > 0$, the expected downtime for an individual machine, given that it has failed before time $y$ and is replaced at time $y$, is given by

\[
y - E[X \mid X < y] = y - \int_0^y P[X > t \mid X < y]\,dt = [F(y)]^{-1} \int_0^y F(t)\,dt .
\]

Thus, for $x < T$,

\[
E[C \mid L = x] = c_0 + m c_r + (n-m) c_s + c_d (m-1) [F(x)]^{-1} \int_0^x F(t)\,dt, \tag{50}
\]

where the first three terms on the right hand side represent the fixed and unit costs of replacing the system with a new one, while the last term is the expected downtime cost. Now suppose that the conditioning information is that the cycle ends at time $T$. Let $N_f(t)$ denote the number of machines that have failed by time $t$. Then

\[
E[C \mid L = T] = c_0 + n c_s + (c_r - c_s)\, E[N_f(T) \mid \tau_m \ge T]
+ c_d \sum_{i=1}^{m-1} P[N_f(T) = i \mid \tau_m \ge T]\; i\, [F(T)]^{-1} \int_0^T F(t)\,dt, \tag{51}
\]

where the first three terms represent the fixed and unit costs of replacing the system with a new one, while the last term is the expected downtime cost. Use (50) and (51) and the expression given by (48) to obtain the following:

\[
\begin{aligned}
E[CL] &= \int x\, E[C \mid L = x]\, dG(x) \\
&= \int_0^T x\, E[C \mid L = x]\, f_m(x)\,dx + T\, E[C \mid L = T]\, P[\tau_m \ge T] \\
&= \int_0^T x f_m(x) \Big\{ m (c_r - c_s) + c_d (m-1) [F(x)]^{-1} \int_0^x F(t)\,dt \Big\} dx \\
&\quad + T \sum_{i=1}^{m-1} \Big\{ i (c_r - c_s) + i c_d [F(T)]^{-1} \int_0^T F(t)\,dt \Big\} p(i,T) \\
&\quad + (c_0 + n c_s) \int_0^T \sum_{i=0}^{m-1} p(i,t)\,dt .
\end{aligned} \tag{52}
\]

Let $N$ and $D$, respectively, denote the random variables corresponding to the number of failures and the downtime accumulated during a cycle. On noting that

\[
E[\min(T, \tau_j)] = \int_0^T P[\tau_j > t]\,dt = \int_0^T \sum_{i=0}^{j-1} p(i,t)\,dt,
\]

it can be seen that the expected downtime in a cycle is given by the following:

\[
E[D] = \sum_{j=1}^{m-1} j\, E[\min(T, \tau_{j+1}) - \min(T, \tau_j)] = \int_0^T \sum_{j=1}^{m-1} j\, p(j,t)\,dt. \tag{53}
\]

Condition on the number of failures at time $T$ to obtain the expected number of failures in a cycle:

\[
E[N] = \sum_{i=1}^{m-1} i\, p(i,T) + m \sum_{i=m}^{n} p(i,T). \tag{54}
\]

Using (53) and (54), the expected cost incurred during a cycle can be written as follows:

\[
\begin{aligned}
E[C] &= c_0 + n c_s + (c_r - c_s) E[N] + c_d E[D] \\
&= c_0 + n c_s + (c_r - c_s) \sum_{i=1}^{m-1} i\, p(i,T) + m (c_r - c_s) \sum_{i=m}^{n} p(i,T) + c_d \int_0^T \sum_{j=1}^{m-1} j\, p(j,t)\,dt.
\end{aligned} \tag{55}
\]

A.2 Calculating $\mathrm{Var}[C]$ for $(m, T)$ policies

Now an expression for $\mathrm{Var}[C]$ will be provided. The variance of the cost of one cycle is given by

\[
\begin{aligned}
\mathrm{Var}[C] &= \mathrm{Var}[c_0 + n c_s + (c_r - c_s) N + c_d D] \\
&= (c_r - c_s)^2 \big\{ E[N^2] - E[N]^2 \big\} + 2 c_d (c_r - c_s) \big\{ E[DN] - E[D]E[N] \big\} \\
&\quad + c_d^2 \big\{ E[I_{\{\tau_m \le T\}} D^2] + E[I_{\{\tau_m > T\}} D^2] \big\} - c_d^2 E[D]^2 .
\end{aligned} \tag{56}
\]

Expressions for $E[D]$ and $E[N]$ are provided by (53) and (54), respectively. Expressions for $E[N^2]$, $E[DN]$, $E[I_{\{\tau_m \le T\}} D^2]$ and $E[I_{\{\tau_m > T\}} D^2]$ will now be provided. These together with (53) and (54) can then be inserted into (56) for explicit evaluation of $\mathrm{Var}[C]$. Condition on the number of failures at time $T$ to obtain:

\[
E[N^2] = \sum_{i=0}^{m-1} i^2 p(i,T) + m^2 \sum_{i=m}^{n} p(i,T). \tag{57}
\]

The random variable $N_f(T)$ equals the actual number of failures if the system is replaced at time $T$. If the system is replaced before time $T$, $N_f(T)$ represents the number of failures that would have occurred up to time $T$ if the system had not been replaced. Condition on the value of this random variable to obtain

\[
E[DN] = \sum_{i=0}^{m-1} i\, E[D \mid N_f(T) = i]\, p(i,T) + \sum_{i=m}^{n} m\, E[D \mid N_f(T) = i]\, p(i,T). \tag{58}
\]

From (52), for $i < m$,

\[
E[D \mid N_f(T) = i] = i\, [F(T)]^{-1} \int_0^T F(t)\,dt. \tag{59}
\]

Now consider the case where $i \ge m$. Conditioned on $N_f(T) = i$, where $i \ge m$, one has a system of $i$ independent machines. The (conditional) distribution function of the failure time of one of these $i$ machines equals $[F(T)]^{-1} F(\cdot)$, since the only information about the machine is that it would have failed before time $T$. So, in order to compute the expected downtime conditioned on $N_f(T) = i$, one can act as if the system consists of $i$ (instead of $n$) machines with i.i.d. failure times with distribution function equal to $[F(T)]^{-1} F(\cdot)$. Thus, for this system, the probability of $j$ failures by time $t$ is given by

\[
\binom{i}{j} \big[ F(T)^{-1} F(t) \big]^j \big[ 1 - F(T)^{-1} F(t) \big]^{i-j} .
\]

Use this and proceed as in (53) to obtain:

\[
\begin{aligned}
E[D \mid N_f(T) = i] &= \sum_{j=1}^{m-1} j\, E[\min(T, \tau_{j+1}) - \min(T, \tau_j) \mid N_f(T) = i] \\
&= \int_0^T \sum_{j=1}^{m-1} j\, P[N_f(t) = j \mid N_f(T) = i]\,dt \\
&= \int_0^T \sum_{j=1}^{m-1} j \binom{i}{j} [F(T)]^{-i} [F(t)]^j [F(T) - F(t)]^{i-j}\,dt.
\end{aligned} \tag{60}
\]

Use (59) and (60) in (58) to obtain

\[
E[DN] = [F(T)]^{-1} \int_0^T F(t)\,dt \sum_{i=0}^{m-1} i^2 p(i,T)
+ m \sum_{i=m}^{n} \binom{n}{i} [1 - F(T)]^{n-i} \int_0^T \Big\{ \sum_{j=1}^{m-1} j \binom{i}{j} [F(t)]^j [F(T) - F(t)]^{i-j} \Big\} dt. \tag{61}
\]

It only remains to find expressions for $E[I_{\{\tau_m > T\}} D^2]$ and $E[I_{\{\tau_m \le T\}} D^2]$. Suppose it is known that exactly $j < m$ machines have failed by time $T$. Then, conditioned on this information, $D$ has the same distribution as $\sum_{i=1}^{j} (T - Y_i)$, where $Y_1, \ldots, Y_j$ are i.i.d. random variables with density equal to $[F(T)]^{-1} f(\cdot)$, the density of the time to failure, $Y$, of a single machine given that it fails by time $T$. Use this to obtain

\[
\begin{aligned}
E[I_{\{\tau_m > T\}} D^2] &= \sum_{j=1}^{m-1} E[D^2 \mid N_f(T) = j]\, p(j,T) \\
&= \sum_{j=1}^{m-1} E\Big[ \Big\{ \sum_{i=1}^{j} (T - Y_i) \Big\}^2 \Big] p(j,T) \\
&= \sum_{j=1}^{m-1} \Big\{ j\, E[(T - Y)^2] + j (j-1) \big( E[T - Y] \big)^2 \Big\} p(j,T),
\end{aligned}
\]

where the last equality follows since the i.i.d. nature of the $Y_i$ implies that $E[(T - Y_i)^2] = E[(T - Y)^2]$ and $E[(T - Y_i)(T - Y_k)] = \{E[T - Y]\}^2$ for $i \ne k$. Use the results that $E[(T - Y)^2] = \int_0^T (T - y)^2 F(T)^{-1} f(y)\,dy$ and $E[T - Y] = \int_0^T F(T)^{-1} F(y)\,dy$ and simplify to obtain

\[
E[I_{\{\tau_m > T\}} D^2] = \Big\{ \sum_{j=1}^{m-1} j\, p(j,T) \Big\} F(T)^{-1} \int_0^T (T - y)^2 f(y)\,dy
+ \Big\{ F(T)^{-1} \int_0^T F(t)\,dt \Big\}^2 \Big\{ \sum_{j=1}^{m-1} j (j-1)\, p(j,T) \Big\}. \tag{62}
\]

Suppose the $m$th failure occurs before time $T$; then $D = \sum_{i=1}^{m-1} (\tau_m - \tau_i)$. Use this to obtain

\[
\begin{aligned}
E[I_{\{\tau_m \le T\}} D^2] &= E\Big[ I_{\{\tau_m \le T\}} \Big\{ \sum_{i=1}^{m-1} (\tau_m - \tau_i) \Big\}^2 \Big] \\
&= E\Big[ I_{\{\tau_m \le T\}} \Big( (m-1)^2 \tau_m^2 + \sum_{i=1}^{m-1} \tau_i^2 - 2 (m-1)\, \tau_m (\tau_{m-1} + \cdots + \tau_1) \\
&\qquad\qquad + 2 \sum_{i=2}^{m-1} \tau_i (\tau_{i-1} + \cdots + \tau_1) \Big) \Big].
\end{aligned} \tag{63}
\]

For $2 \le i \le m$,

\[
E[I_{\{\tau_m \le T\}} \tau_i^2] = \int_0^T x^2\, E[I_{\{\tau_m \le T\}} \mid \tau_i = x]\, f_i(x)\,dx = \int_0^T x^2\, P[N_f(T) \ge m \mid N_f(x) = i]\, f_i(x)\,dx .
\]

Conditioned on the event $\{N_f(x) = i\}$, there are $n - i$ functioning machines at time $x$, each of whose lifetime distribution function equals $[1 - F(x)]^{-1} [F(\cdot) - F(x)]$. For the event $\{N_f(T) \ge m\}$ to occur, at least $m - i$ of these must fail by time $T$, i.e.

\[
\begin{aligned}
P[N_f(T) \ge m \mid N_f(x) = i] &= \sum_{j=m-i}^{n-i} \binom{n-i}{j} \{ P[X \le T \mid X > x] \}^j \{ P[X > T \mid X > x] \}^{n-i-j} \\
&= [1 - F(x)]^{-(n-i)} \sum_{j=m-i}^{n-i} \binom{n-i}{j} [F(T) - F(x)]^j [1 - F(T)]^{n-i-j} .
\end{aligned} \tag{64}
\]

Let $K_i(x)$ denote (64), the probability that at least $m$ machines will have failed by time $T$ given that exactly $i$ have failed by time $x$. Thus,

\[
E[I_{\{\tau_m \le T\}} \tau_i^2] = \int_0^T x^2 K_i(x) f_i(x)\,dx, \tag{65}
\]

for $2 \le i \le m$. Note that conditioned on the events $\tau_i = x$ and $\{\tau_m \le T\}$, the quantity $\tau_{i-1} + \cdots + \tau_1$ is the sum of $i - 1$ independent failure times each with density function equal to $F(x)^{-1} f(\cdot)$. Consequently,

\[
\begin{aligned}
E[I_{\{\tau_m \le T\}} \tau_i (\tau_{i-1} + \cdots + \tau_1)] &= \int_0^T x K_i(x)\, E[\tau_{i-1} + \cdots + \tau_1 \mid \tau_i = x]\, f_i(x)\,dx \\
&= \int_0^T x K_i(x) (i-1) \Big\{ \int_0^x y\, [F(x)]^{-1} f(y)\,dy \Big\} f_i(x)\,dx .
\end{aligned} \tag{66}
\]

Use (65) and (66) in (63), together with $K_m(x) = 1$ for $x \le T$, to obtain

\[
\begin{aligned}
E[I_{\{\tau_m \le T\}} D^2] &= \int_0^T x^2 \Big\{ \sum_{i=1}^{m-1} f_i(x) K_i(x) + (m-1)^2 f_m(x) \Big\} dx \\
&\quad + 2 \int_0^T x \Big\{ \sum_{i=2}^{m-1} (i-1) f_i(x) K_i(x) - (m-1)^2 f_m(x) \Big\} [F(x)]^{-1} \int_0^x y f(y)\,dy\,dx .
\end{aligned} \tag{67}
\]

B Derivation of expressions given in §3

The following properties of the Kronecker product will be useful in the sequel. Let $P$, $Q$, $U$, $V$, $W$, $Z$ be rectangular matrices such that the ordinary matrix products $PQU$ and $VWZ$ are defined; then

\[
(PQU) \otimes (VWZ) = (P \otimes V)(Q \otimes W)(U \otimes Z). \tag{68}
\]

For any square matrices $P$ and $Q$:

\[
e^{(P \otimes I_Q) x + (I_P \otimes Q) x} = e^{Px} \otimes e^{Qx}, \tag{69}
\]

where $I_Q$ and $I_P$ are identity matrices of the same dimension as $Q$ and $P$, respectively (see Neuts [10], p. 373).

Derivation of (10)-(14)

The function $1 - [1 - F(t)]^k$ is the distribution function of $\min(X_1, \ldots, X_k)$, which is a phase distribution with representation $(\alpha_k, A_k)$.
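As an illustrative numerical check of property (69) and of the representation just stated, the following Python sketch compares both sides for $k = 2$. It assumes that $\alpha_k$ is the $k$-fold Kronecker product of $\alpha$ and that $A_k$ is the $k$-fold Kronecker sum of $A$, which is how (68) and (69) are used in this appendix; the definitions themselves are given in §3. The two-phase representation `(alpha, A)` and all function names are hypothetical and chosen only for illustration.

```python
# Illustrative check of property (69) and of the phase representation of
# min(X_1, X_2). The representation (alpha, A) below is hypothetical.

def matmul(X, Y):
    """Ordinary matrix product of two list-of-lists matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def kron(X, Y):
    """Kronecker product of X and Y."""
    return [[x * y for x in rx for y in ry] for rx in X for ry in Y]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def add(X, Y):
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(X, Y)]

def kron_sum(P, Q):
    """Kronecker sum (P kron I_Q) + (I_P kron Q) of two square matrices."""
    return add(kron(P, identity(len(Q))), kron(identity(len(P)), Q))

def expm(M, x, squarings=10, terms=20):
    """exp(M x) via scaling and squaring with a truncated Taylor series."""
    scaled = [[v * x / 2 ** squarings for v in row] for row in M]
    result = identity(len(M))
    term = identity(len(M))
    for k in range(1, terms + 1):
        term = [[v / k for v in row] for row in matmul(term, scaled)]
        result = add(result, term)
    for _ in range(squarings):
        result = matmul(result, result)
    return result

def survival(alpha, A, t):
    """P[X > t] = alpha exp(A t) e for a phase representation (alpha, A)."""
    return sum(matmul(alpha, expm(A, t))[0])

def max_abs_diff(X, Y):
    return max(abs(a - b) for rx, ry in zip(X, Y) for a, b in zip(rx, ry))

alpha = [[0.6, 0.4]]               # hypothetical initial probability vector
A = [[-2.0, 1.0], [0.0, -3.0]]     # hypothetical sub-generator matrix
t = 0.7

# Property (69): exp of the Kronecker sum equals the Kronecker product of exps.
lhs = expm(kron_sum(A, A), t)
rhs = kron(expm(A, t), expm(A, t))
print("max deviation in (69):", max_abs_diff(lhs, rhs))

# Survival function of min(X_1, X_2) computed two ways.
alpha_2 = kron(alpha, alpha)
A_2 = kron_sum(A, A)
print("[1 - F(t)]^2        :", survival(alpha, A, t) ** 2)
print("alpha_2 e^{A_2 t} e_2:", survival(alpha_2, A_2, t))
```

The two printed survival values should agree up to floating-point error. The pure-Python matrix exponential is only meant for small matrices such as these; a numerical library would be the natural choice for larger representations.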
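Since $\min(X_1, \ldots, X_k)$ has representation $(\alpha_k, A_k)$,

\[
1 - [1 - F(t)]^k = 1 - \alpha_k e^{A_k t} e_k, \quad \text{for } k \ge 2, \tag{70}
\]

and $\int_0^T [1 - F(t)]^k\,dt = \int_0^T \alpha_k e^{A_k t} e_k\,dt$, from which (10) and (11) follow. Note that, for $x > 0$,

\[
\begin{aligned}
\int_0^x t A^{-1} e^{At} e\,dt &= x A^{-2} e^{Ax} e - A^{-3} \big( e^{Ax} - I \big) e, \\
\int_0^x A^{-1} e^{At} e\,dt &= A^{-2} \big( e^{Ax} - I \big) e, \\
\int_0^x e^{At} e\,dt &= A^{-1} \big( e^{Ax} - I \big) e ;
\end{aligned}
\]

simplify and use integration by parts to obtain (12), (13) and (14).

Derivation of (18)-(20)

Use (6) and (5) and apply (68) and (69) to obtain

\[
\begin{aligned}
S_1(j,T) &= \int_0^T x^2\, \alpha_j e^{A_j x} e_j\, \alpha e^{Ax} A^0\,dx \\
&= \int_0^T x^2 \big( \alpha_j e^{A_j x} e_j \big) \otimes \big( \alpha e^{Ax} A^0 \big)\,dx \\
&= \int_0^T x^2 (\alpha_j \otimes \alpha) \big( e^{A_j x} \otimes e^{Ax} \big) \big( e_j \otimes A^0 \big)\,dx,
\end{aligned}
\]

where the second equality follows since the product of scalars is, trivially, a Kronecker product. Now apply (68), (69) and the definition of $\alpha_{j+1}$ and $A_{j+1}$ to obtain

\[
S_1(j,T) = \Big\{ \int_0^T x^2 \alpha_{j+1} e^{A_{j+1} x}\,dx \Big\} \big( e_j \otimes A^0 \big),
\]

from which (18) follows on integrating by parts twice. Use (6), (5) and apply (68) and (69) to obtain

\[
\begin{aligned}
S_2(j,T) &= \int_0^T x \big( \alpha e^{Ax} A^0 \big) \big( \alpha A^{-1} e^{Ax} e \big) \big( \alpha_j e^{A_j x} e_j \big)\,dx \\
&= \int_0^T x \big( \alpha e^{Ax} A^0 \big) \big[ (\alpha A^{-1} \otimes \alpha_j)\, e^{A_{j+1} x} e_{j+1} \big]\,dx,
\end{aligned}
\]

where the last equality follows by applying (68) and using the definition of $\alpha_{j+1}$ and $A_{j+1}$. Again apply (68) and the definition of $A_{j+1}$ and $e_{j+1}$ to obtain

\[
\begin{aligned}
S_2(j,T) &= \int_0^T x \big[ (\alpha A^{-1} \otimes \alpha_j)\, e^{A_{j+1} x} e_{j+1} \big] \otimes \big[ \alpha e^{Ax} A^0 \big]\,dx \\
&= \Big\{ \int_0^T x\, (\alpha A^{-1} \otimes \alpha_j \otimes \alpha) \big( e^{A_{j+1} x} \otimes e^{Ax} \big)\,dx \Big\} \big( e_{j+1} \otimes A^0 \big) \\
&= \Big\{ \int_0^T x\, (\alpha A^{-1} \otimes \alpha_{j+1})\, e^{A_{j+2} x}\,dx \Big\} \big( e_{j+1} \otimes A^0 \big),
\end{aligned}
\]

from which (19) follows on applying integration by parts. Again, use (6), (5), (68) and (69) to obtain

\[
\begin{aligned}
S_3(j,T) &= \int_0^T x\, \alpha_j e^{A_j x} e_j\, \alpha e^{Ax} A^0\,dx \\
&= \Big\{ \int_0^T x\, \alpha_{j+1} e^{A_{j+1} x}\,dx \Big\} \big( e_j \otimes A^0 \big).
\end{aligned}
\]

Integration by parts of the above expression yields (20).

Derivation of (29)-(32)

Use the Binomial theorem to expand $[F(x)]^{i-1} = \{1 - [1 - F(x)]\}^{i-1}$ and $[F(T) - F(x)]^i = \{[1 - F(x)] - [1 - F(T)]\}^i$ in (2) and (28), respectively. Insert the resulting expressions for $f_i(x)$ and $K_i(x)$ into the definition for $U_3(i,T)$ and $U_4(i,T)$, gather terms and recall the definition of $S_1(\cdot,\cdot)$, $S_2(\cdot,\cdot)$ and $S_3(\cdot,\cdot)$ to obtain the results given in (31) and (32).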
Similarly, (29) and (30) follow by inserting $f_i(x)$ into (24) and (25), expanding $[F(x)]^{i-1}$ using the Binomial theorem and recalling the definition of $S_1(\cdot,\cdot)$, $S_2(\cdot,\cdot)$ and $S_3(\cdot,\cdot)$.

References

[1] D. Assaf and J. G. Shanthikumar, Optimal group maintenance policies with continuous and periodic inspection, Management Science 33 (1987) 1440-1452.

[2] A. Bobbio and A. Cumani, Modeling wear-out by multistate homogeneous Markov models, in: Reliability in Electrical and Electronic Components and Systems, eds. C. Lauger and J. Moltorf, North-Holland, 1982, pp. 101-106.

[3] S. Chakravarthy, Analysis of production line systems with two unreliable machines with phase type processing times and a finite storage buffer, Communications in Statistics: Stochastic Models 3 (1987) 369-391.

[4] M. Gururajan and K. S. Bhat, A complex priority redundant system with phase type distribution, Microelectronics and Reliability 30, 3 (1990) 453-455.

[5] M. A. Johnson, Selecting parameters of phase distributions: combining nonlinear programming, heuristics, and Erlang distributions, ORSA Journal on Computing 5, 1 (1993) 69-80.

[6] M. Malhotra and A. Reibman, Selecting and implementing phase approximations for semi-Markov models, Communications in Statistics: Stochastic Models 9, 4 (1993) 473-506.

[7] T. A. Mazzuchi and R. Soyer, A Bayesian perspective on some replacement strategies, Reliability Engineering and System Safety 61 (1996) 295-303.

[8] T. Nakagawa, Further results on replacement problem of a parallel system in a random environment, Journal of Applied Probability 16 (1979) 923-926.

[9] M. Neuts, Matrix-Geometric Solutions in Stochastic Models: An Algorithmic Approach, Johns Hopkins University Press, Baltimore, 1981.

[10] M. Neuts, Algorithmic Probability: A Collection of Problems, Chapman and Hall, 1995.

[11] K. Okumoto and E. A. Elsayed, An optimum group maintenance policy, Naval Research Logistics Quarterly 30 (1983) 667-674.

[12] E. Popova and J. G. Wilson, Selecting and implementing the best group replacement policy for a non Markovian system, in: Proceedings of the International Conference on Probabilistic Safety Assessment and Management, eds. C. Cacciabue et al., Springer-Verlag, 1996, pp. 58-63.

[13] P. Ritchken and J. G. Wilson, (m, T) group maintenance policies, Management Science 36, 5 (1990) 632-639.

[14] S. Ross, Stochastic Processes, John Wiley, New York, 1996.

[15] W. Smith, Renewal theory and its ramifications, Journal of the Royal Statistical Society B 20 (1958) 243-302.

[16] J. G. Wilson, A note on variance reducing group maintenance policies, Management Science 42, 3 (1996) 452-460.

[17] J. G. Wilson and A. Benmerzouga, Optimal m-failure policies with random repair time, Operations Research Letters 9 (1990) 203-209.

[18] J. G. Wilson and A. Benmerzouga, Bayesian group replacement policies, Operations Research 43, 3 (1995) 471-476.

[19] J. G. Wilson and E. Popova, Adaptive replacement policies for a system of parallel machines, in: Lifetime Data: Models in Reliability and Survival Analysis, eds. N. P. Jewell et al., Kluwer Academic Publishers, 1996, pp. 371-375.

[20] J. G. Wilson and E. Popova, Optimal Bayesian group maintenance policies, working paper, Department of Mechanical Engineering, The University of Texas at Austin, Austin, 1997.

m    Expected cost per unit time    Asymptotic variance
1    85.73                          2177
2    81.45                          1368
3    84.77                          1059

Table 1. Expected cost and asymptotic variance per unit time for the 1-, 2- and 3-failure policies with $n = 3$, $c_0 = 70$, $c_s = 10$, $c_r = 50$, $c_d = 30$, where the failure distribution is of phase type with the representation given in the Example.
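As a rough numerical illustration of the cycle quantities derived in Appendix A.1, the Python sketch below evaluates $E[D]$, $E[N]$ and $E[C]$ from (53)-(55), together with $E[L] = \int_0^T \sum_{i=0}^{m-1} p(i,t)\,dt$, for an $(m, T)$ policy. It takes $p(i,t)$ to be the binomial probability that exactly $i$ of the $n$ machines have failed by time $t$. The exponential failure distribution, its rate, the value of $T$ and all function names are assumptions made only for this example; the cost parameters are those in the caption of Table 1. Because the phase distribution of the Example is not used, the output is not expected to reproduce Table 1.

```python
import math

def p(i, t, n, F):
    """Probability that exactly i of n i.i.d. machines have failed by time t."""
    return math.comb(n, i) * F(t) ** i * (1.0 - F(t)) ** (n - i)

def simpson(g, a, b, steps=2000):
    """Composite Simpson rule for the integral of g over [a, b] (steps even)."""
    h = (b - a) / steps
    total = g(a) + g(b)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * g(a + k * h)
    return total * h / 3.0

def cycle_moments(n, m, T, F, c0, cs, cr, cd):
    """Return (E[D], E[N], E[C], E[L]) for an (m, T) policy."""
    # Expected downtime per cycle, equation (53).
    ED = simpson(lambda t: sum(j * p(j, t, n, F) for j in range(1, m)), 0.0, T)
    # Expected number of failures per cycle, equation (54).
    EN = (sum(i * p(i, T, n, F) for i in range(1, m))
          + m * sum(p(i, T, n, F) for i in range(m, n + 1)))
    # Expected cost per cycle, equation (55).
    EC = c0 + n * cs + (cr - cs) * EN + cd * ED
    # Expected cycle length E[min(T, tau_m)].
    EL = simpson(lambda t: sum(p(i, t, n, F) for i in range(m)), 0.0, T)
    return ED, EN, EC, EL

# Hypothetical exponential failure distribution with rate 0.5 and T = 2.
rate = 0.5
F_exp = lambda t: 1.0 - math.exp(-rate * t)

for m in (1, 2, 3):
    ED, EN, EC, EL = cycle_moments(3, m, 2.0, F_exp, c0=70, cs=10, cr=50, cd=30)
    # EC / EL is the long-run expected cost per unit time (renewal-reward ratio).
    print(m, round(ED, 4), round(EN, 4), round(EC, 4), round(EC / EL, 4))
```

Any distribution function can be passed as `F`, so a phase distribution evaluated through its matrix-exponential form could be substituted for the exponential used here.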