A REGRESSION APPROACH IN THE RELIABILITY OF INTERMITTENTLY USED MACHINES

Fl. Popentiu VLADICESCU
City University, Department of Electrical, Electronic & Information Engineering, Northampton Square, London EC1V, OHB, UK, E-mail: Fl.Popentiu@city.ac.uk

Grigore ALBEANU
Bucharest University, Faculty of Mathematics, Academiei 14, RO-70109, Bucharest, Romania, E-mail: albeanu@math.math.unibuc.ro

Elena UNGUREANU GRIGORE
Spiru Haret University, Faculty of Mathematics and Informatics, Bucharest, Romania, E-mail: grigore@timken.com

Abstract: A methodology based on computational statistics and simulation for analyzing failures of intermittently used machines is presented. An additive non-parametric logistic regression is used to compute the conditional probability of failure.

Key words: intermittently used machines, non-parametric logistic model, estimator, algorithm

1. Introduction

Many machines are used intermittently. For such a machine, a failure can occur either while it is on or while it is off. Because the stress received by a system while on differs from the stress received while off, a good approach is to use two models to describe the failures: one for the on case, in which the precise time of a failure is known, and one for the off case, in which a failure is detected only when the system is switched on. This paper continues the approach of Follmann [1], using an additive non-parametric logistic model to describe the off periods. Only repairable systems are considered.

2. Notations and terminology

Let T be the time to failure of a system, a random variable whose characteristics may be interpreted as components of a global reliability model. If F is the cumulative distribution function (cdf) of T,

$$F(t) = P(T < t), \quad 0 < t < \infty,$$ (1)

then the reliability function, also called the survivor function, is R(t) such that

$$R(t) = P(T \ge t), \quad 0 < t < \infty.$$
(2)

If T is (absolutely) continuous, then the probability density function (pdf) is given by $f(t) = dF(t)/dt = -dR(t)/dt$, and the instantaneous rate of failure at T = t conditional upon survival to time t, called the hazard function or simply the failure rate, is given by

$$h(t) = \frac{f(t)}{R(t)} = -\frac{d \log R(t)}{dt}.$$

For R(0) = 1 (which is a normal assumption), we get

$$R(t) = \exp\left(-\int_0^t h(u)\,du\right) \quad \text{and} \quad f(t) = h(t)\exp\left(-\int_0^t h(u)\,du\right).$$

In the discrete case, when T takes the values $x_1, x_2, \ldots$ and $f(x_i) = P(T = x_i)$, i = 1, 2, …, the reliability function is

$$R(t) = \sum_{j \mid x_j \ge t} f(x_j),$$

or, if $\lambda_j := f(x_j)/R(x_j)$, then

$$R(t) = \prod_{j \mid x_j < t} (1 - \lambda_j) \quad \text{and} \quad f(x_j) = \lambda_j \prod_{i=1}^{j-1} (1 - \lambda_i).$$

In the framework of this paper, T has both discrete and continuous components. Let $0 = t_1 < t_2 < \ldots$ denote the times at which a machine is switched on or off. We assume that an on period starts at time 0 ($t_1 = 0$). When the index j is odd, $t_j$ denotes the start of an on period (or the end of an off period). If $h_c$ denotes the hazard function for the on periods, and $\pi_j$ is the conditional probability of a failure detected at $t_{2j+1}$, then the overall reliability function is given by [2]:

$$R(t) = \exp\left(-\int_0^t h_c(u)\,du\right) \prod_{j \mid t_{2j+1} < t} (1 - \pi_j).$$ (3)

We assume that the continuous hazard $h_c$ is undefined during off periods, so the integral in (3) is taken over the on periods only. In the next section we present procedures to estimate the hazard function $h_c$ and the characteristics $\pi_j$, j = 1, 2, … .

3. On estimating the hazard function

3.1. The estimation of the $\pi_j$ by additive nonparametric logistic regression

Let y be a binary variable (y = 1 if a failure is present, else 0), and x a vector of p explanatory variables which describe the machine history: time since repair, the number of previous failures, the age of the machine, the number of on-off cycles, time since the last maintenance action etc. The model for P(x) = E(y | x) = P(y = 1 | x) is of the form:

$$\mathrm{logit}[P(x)] = \log\frac{P(x)}{1 - P(x)} = \alpha + \sum_{j=1}^{p} g_j(x_j), \quad E[g_j(x_j)] = 0 \text{ for all } j,$$ (4)

where the $g_j$ are functions to be determined. If $g_j(x_j) = \beta_j x_j$, the model used in [1] is obtained. However, the functions $g_j$ can have different forms: trigonometric functions [3], spline functions etc. Our approach uses the local scoring method, developed by Hastie and Tibshirani in [4], in order to estimate the functions $g_j$. The local scoring algorithm is a generalization of the iteratively (re)weighted least squares method.

Let us denote the observations by $(y_1, x_1), \ldots, (y_n, x_n)$, where $y_i$ = 0 or 1 and $x_i$ is the vector of p covariates with the components $(x_{i1}, x_{i2}, \ldots, x_{ip})$, i = 1, 2, …, n. The following modules are necessary to obtain $\alpha$ and the functions $g_j$:

A smoother (smoothing spline, kernel smoother etc). Let $S(y \mid x_i)$ be the value of the function obtained by smoothing the scatter plot (x, y) at the point $x_i$. In this step, scatterplot smoothing replaces simple least squares regression.

A backfitting estimator. This module provides estimates of the functions $g_j$ in the nonparametric model

$$E(y \mid x) = \alpha + \sum_{j=1}^{p} g_j(x_j), \quad E[g_j(x_j)] = 0 \text{ for all } j,$$

by the backfitting algorithm [4]:

1. start with $\hat{g}_j(x) = 0$ for all x and all j, $\hat{\alpha} = \bar{y}$.
2. repeat the steps 2.1 and 2.2
   2.1. for j := 1, 2, …, p do
        a) for i := 1 to n do $r_i := y_i - \hat{\alpha} - \sum_{k \ne j} \hat{g}_k(x_{ik})$;
        b) for i := 1 to n do $\hat{g}_j(x_{ij}) := S(r \mid x_{ij})$
   2.2. compute $RSS := \sum_{i=1}^{n} \left( y_i - \hat{\alpha} - \sum_{k=1}^{p} \hat{g}_k(x_{ik}) \right)^2$
   until $|\Delta RSS| < \varepsilon_1$

The backfitting algorithm replaces weighted multiple linear regression and repeatedly uses the scatterplot smoothers.

An LSA module. This component provides estimates of the $g_j$'s in the model (4). The main steps of the Local Scoring Algorithm for the logistic model are [4]:

1. Start with $\hat{g}_j^{(0)}(x) = 0$ for all x and all j, $\hat{\alpha}^{(0)} = \mathrm{logit}(\bar{y})$, m := 0;
2. Repeat steps 2.1, 2.2 and 2.3
   2.1. for i := 1 to n compute:
        $\hat{v}^{(m)}(x_i) := \hat{\alpha}^{(m)} + \sum_{j=1}^{p} \hat{g}_j^{(m)}(x_{ij})$;
        $\hat{p}_i := \mathrm{logit}^{-1}(\hat{v}^{(m)}(x_i))$;
        $z_i := \hat{v}^{(m)}(x_i) + (y_i - \hat{p}_i) / [\hat{p}_i (1 - \hat{p}_i)]$;
        $w_i := \hat{p}_i (1 - \hat{p}_i)$.
   2.2. Obtain $\hat{\alpha}^{(m+1)}$, $\hat{g}_j^{(m+1)}$, j = 1, ..., p, by applying the backfitting algorithm to $z_i$ with covariates $x_1, x_2, \ldots, x_p$ and observation weights $w_i$, in order to solve:
        $$\min_{\alpha, g} E\left\{ w^{(m)}(x) \left[ v^{(m)}(x) + \frac{y - p^{(m)}(x)}{p^{(m)}(x)[1 - p^{(m)}(x)]} - \alpha - \sum_{j=1}^{p} g_j(x_j) \right]^2 \right\}$$
   2.3. Compute the deviance $D(y, \hat{p}) := -2 \sum_{i=1}^{n} [y_i \log \hat{p}_i + (1 - y_i) \log(1 - \hat{p}_i)]$ and set m := m + 1
   until $|\Delta D(y, \hat{p})| < \varepsilon_2$

In the above, the numbers $\varepsilon_1$ and $\varepsilon_2$ are small positive values used in the convergence tests. The convergence of the local scoring algorithm was motivated by Hastie and Tibshirani [4]. Finally, it is easy to obtain the characteristics $\pi_j$, $j \ge 1$, to be used in computing the reliability function given by (3).

3.2. Estimating the hazard function $h_c$

Follmann, in [1], estimates the hazard function $h_c$ using a Weibull regression model. However, other procedures can be used. We chose to implement, in software, the extended hazard regression model [6]. Let $h_0(\cdot)$ be the baseline hazard function, $\theta = (\alpha_1, \alpha_2, \beta_1, \beta_2)$ be the vector of unknown parameters, and $u_1(\cdot)$, $u_2(\cdot)$, $v_1(\cdot)$, $v_2(\cdot)$ be known monotone functions equal to one when their arguments are zero. The extended hazard regression model is given by:

$$h_c(t \mid z) = u_1(\alpha_1^T z)\, v_1(\beta_1^T z)\, [u_1(\alpha_1^T z)\, t]^{v_1(\beta_1^T z) - 1} \cdot h_0\!\left([u_2(\alpha_2^T z)\, t]^{v_2(\beta_2^T z)}\right).$$

Assuming $u_1(\cdot) = u_2(\cdot)$, $v_1(\cdot) = v_2(\cdot)$ and

$$h_0(x) = \frac{x^{k-1} e^{-x}}{\Gamma(k)\,[1 - I(k; x)]} \quad \text{with} \quad I(k; x) = \frac{1}{\Gamma(k)} \int_0^x t^{k-1} e^{-t}\,dt,$$

the above model corresponds to the hazard function of a random variable with a generalized gamma distribution with three parameters, two of them depending on the covariates z. For $v_1(\cdot) = v_2(\cdot) = k = 1$, the exponential model is obtained. When k = 1, a Weibull model is obtained. For $h_0(x) = 1/(1 + x)$, we obtain the log-logistic distribution with two parameters depending on covariates. The baseline hazard function can also be a polynomial approximation or a piecewise polynomial approximation.

Let $H_c(t \mid z) = \int_0^t h_c(u \mid z)\,du$ denote the cumulative hazard, and let $t_1, t_2, \ldots$ be as above.
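The Weibull special case of the model above (k = 1, so that $h_0 \equiv 1$) and the reliability function (3) can be illustrated by a minimal Python sketch. It is not the software described in this paper. It assumes the link functions u(s) = v(s) = exp(s), which are monotone and equal to one at zero, a covariate vector z that stays fixed over time, and a hazard evaluated at calendar time but integrated over on periods only. The function names and all numeric values are hypothetical.

```python
import math


def scale_shape(z, alpha, beta):
    """Return (u, v) under the assumed links u(s) = v(s) = exp(s)."""
    u = math.exp(sum(a * zi for a, zi in zip(alpha, z)))
    v = math.exp(sum(b * zi for b, zi in zip(beta, z)))
    return u, v


def weibull_hazard(t, z, alpha, beta):
    """h_c(t|z) = u * v * (u*t)**(v-1): the extended model with h0 = 1 (k = 1)."""
    u, v = scale_shape(z, alpha, beta)
    return u * v * (u * t) ** (v - 1.0)


def cumulative_hazard(t, z, alpha, beta):
    """H_c(t|z) = (u*t)**v, the closed-form integral of weibull_hazard on [0, t]."""
    u, v = scale_shape(z, alpha, beta)
    return (u * t) ** v


def reliability(t, switch_times, pi, z, alpha, beta):
    """Evaluate (3), integrating h_c over on periods only (an assumption)."""
    # switch_times = [t_1, t_2, ...] with t_1 = 0; the on periods are
    # [t_1, t_2], [t_3, t_4], ..., so they start at even list indices.
    exposure = 0.0
    for k in range(0, len(switch_times), 2):
        start = switch_times[k]
        if start >= t:
            break
        end = switch_times[k + 1] if k + 1 < len(switch_times) else math.inf
        stop = min(end, t)
        exposure += (cumulative_hazard(stop, z, alpha, beta)
                     - cumulative_hazard(start, z, alpha, beta))
    # pi[j-1] is pi_j, the failure probability detected at t_{2j+1},
    # which is switch_times[2*j] in zero-based indexing.
    off_survival = 1.0
    for j, p in enumerate(pi, start=1):
        if 2 * j < len(switch_times) and switch_times[2 * j] < t:
            off_survival *= 1.0 - p
    return math.exp(-exposure) * off_survival


# Hypothetical example: one covariate equal to 0, so u = v = 1 (unit exponential).
switch_times = [0.0, 2.0, 3.0, 5.0]   # on [0, 2], off [2, 3], on [3, 5]
pi = [0.1]                            # pi_1, detected at t_3 = 3
r = reliability(4.0, switch_times, pi, z=[0.0], alpha=[0.0], beta=[0.0])
print(r)  # exp(-3) * 0.9: three time units on, one off period survived
```

In this sketch the closed form $(ut)^v$ replaces numerical integration. For another baseline hazard, such as the generalized gamma or log-logistic case, `cumulative_hazard` would have to be replaced by a quadrature of the corresponding $h_c$.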
Each $t_i$ has an associated covariate vector $z_i$ and an indicator variable $\delta_i$, defined by $\delta_i = 1$ if the system is in failure at time $t_i$ and $\delta_i = 0$ otherwise. The log-likelihood function is:

$$\log L = \sum_{i=1}^{n} \left( \delta_i \log[h_c(t_i \mid z_i)] - H_c(t_i \mid z_i) \right),$$

where $H_c$ is the cumulative hazard of $h_c$. Then the vector $\theta^* = (\alpha_1^*, \alpha_2^*, \beta_1^*, \beta_2^*)$, the maximum likelihood estimate of $\theta$, is obtained by solving the system of nonlinear equations $\partial \log L / \partial \theta = 0$. The classical Newton method and the recent Quasi-Gauss-Newton methods [5] are implemented and can be selected for the numerical computation.

4. Model validation and conclusions

The hazard function of the time until failure is composed of two parts: h(t) = $h_c(t)$ while the machine is on and h(t) = p(t) at the end of an off period, where t is the time index and p(t) is the conditional probability of a failure detected at switch-on. Model validation is very important if this approach is to be used in prediction. For the continuous model, the bootstrap method, a computer-intensive one, is used in the estimation process [6, 7]. Finally, the best vector of parameters (according to the RSS criterion) is selected and used in (3). The preliminary experiments based on the above approach are encouraging. Further work is necessary on code optimization.

Many machines are used intermittently. To model the failure process of such machines, it is necessary to model the on period and the off period differently. The above approach proposes some general models which can be used to compute the survivor function. The specialized software under development implements the proposed approach and will be used in the prediction process for different intermittently used machines.

References

1. Follmann, D.A.: Modeling failures of intermittently used machines. In: Applied Statistics, No. 39, 1990, p. 115-123.
2. Kalbfleisch, J.D., Prentice, R.L.: The Statistical Analysis of Failure Time Data. Wiley, New York, 1980.
3. Albeanu, G.: On the piecewise trigonometric nonlinear regression models. In: Stud. Cerc. Mat., No. 50, 1998, p. 1-4.
4. Hastie, T., Tibshirani, R.: Non-parametric logistic and proportional odds regression. In: Applied Statistics, No. 36, 1987, p. 260-276.
5. Albeanu, G.: On the convergence of the Quasi-Gauss-Newton methods for solving nonlinear systems. In: Intern. J. Computer Math., No. 66, 1998, p. 93-99.
6. Albeanu, G.: Resampling Simultaneous Confidence Bands for Nonlinear Explicit Regression Models. In: Stud. Cerc. Mat., No. 50, 1998, p. 289-295.
7. Albeanu, G., Popentiu, Fl.: On the nonlinear reliability models analysis. In: Proceedings of Safety and Reliability National Conference KONBIN'99, Sept. 22-25, 1999, Zakopane-Koscielisko, Poland, Vol. 1, p. 7-17.