A REGRESSION APPROACH IN THE RELIABILITY OF
INTERMITTENTLY USED MACHINES
Fl. Popentiu VLADICESCU
City University, Department of Electrical, Electronic & Information Engineering
Northampton Square, London EC1V 0HB, UK, E-mail: Fl.Popentiu@city.ac.uk
Grigore ALBEANU
Bucharest University, Faculty of Mathematics
Academiei 14, RO-70109, Bucharest, Romania, E-mail: albeanu@math.math.unibuc.ro
Elena UNGUREANU GRIGORE
Spiru Haret University, Faculty of Mathematics and Informatics
Bucharest, Romania, E-mail: grigore@timken.com
Abstract: A methodology based on computational statistics and simulation for analyzing failures of intermittently used machines is presented. An additive non-parametric logistic regression model is used to compute the conditional probability of failure.
Key words: intermittently used machines, non-parametric
logistic model, estimator, algorithm
1. Introduction
Many machines are used intermittently. In such a situation a failure can occur either while the machine is on or while it is off. Because the stress a system receives while on differs from the stress it receives while off, a good approach is to describe the failures with two models: one for the on case, in which the precise time of a failure is known, and one for the off case, in which a failure is detected only when the system is turned on.
This paper continues Follmann's approach [1], using an additive non-parametric logistic model to describe the off periods. Only repairable systems are considered.
2. Notations and terminology
Let T be the time to failure of a system, a random variable whose characteristics may be interpreted as components of a global reliability model. If F is the cumulative distribution function (cdf) of T,
$$F(t) = P(T < t), \qquad 0 < t < \infty, \qquad (1)$$
the reliability function, also called the survivor function, is R(t) such that
$$R(t) = P(T \ge t), \qquad 0 < t < \infty. \qquad (2)$$
If T is (absolutely) continuous, then the probability density function (pdf) is given by $f(t) = dF(t)/dt = -dR(t)/dt$, and the instantaneous rate of failure at T = t conditional upon survival to time t, called the hazard function or simply the failure rate, is given by $h(t) = f(t)/R(t) = -d \log R(t)/dt$. For R(0) = 1 (which is a normal assumption), we get
$$R(t) = \exp\left(-\int_0^t h(u)\,du\right)$$
and
$$f(t) = h(t) \exp\left(-\int_0^t h(u)\,du\right).$$
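As a numerical check of these relations, the sketch below takes a Weibull hazard with shape 2 and scale 1 (an example choice, not part of the text) and approximates the integral with the trapezoidal rule. The function names are illustrative only.

```python
import math


def weibull_hazard(t: float, shape: float = 2.0, scale: float = 1.0) -> float:
    """Example hazard h(t) = (shape/scale) * (t/scale)**(shape - 1) (Weibull)."""
    return (shape / scale) * (t / scale) ** (shape - 1)


def cumulative_hazard(h, t: float, steps: int = 1000) -> float:
    """Integral of h over [0, t] by the composite trapezoidal rule."""
    dt = t / steps
    inner = sum(h(i * dt) for i in range(1, steps))
    return dt * (0.5 * (h(0.0) + h(t)) + inner)


def reliability(h, t: float) -> float:
    """R(t) = exp(-integral_0^t h(u) du), which assumes R(0) = 1."""
    return math.exp(-cumulative_hazard(h, t))


def density(h, t: float) -> float:
    """f(t) = h(t) * exp(-integral_0^t h(u) du) = h(t) * R(t)."""
    return h(t) * reliability(h, t)


if __name__ == "__main__":
    t = 1.5
    # For shape 2 and scale 1 the closed form is R(t) = exp(-t**2).
    print(reliability(weibull_hazard, t), math.exp(-t ** 2))
    # f(t) should match -dR/dt; approximate the derivative by a central difference.
    eps = 1e-5
    slope = (reliability(weibull_hazard, t + eps) - reliability(weibull_hazard, t - eps)) / (2 * eps)
    print(density(weibull_hazard, t), -slope)
```

Because this example hazard is linear in t, the trapezoidal rule integrates it exactly. Hazards with curvature need more steps or an adaptive rule.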
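In the discrete case, when T takes the values $x_1 < x_2 < \dots$ with $f(x_i) = P(T = x_i)$, $i = 1, 2, \dots$, the reliability function is
$$R(t) = \sum_{j:\, x_j \ge t} f(x_j),$$
or, if $\lambda_j := f(x_j)/R(x_j)$, then
$$R(t) = \prod_{j:\, x_j < t} (1 - \lambda_j) \qquad\text{and}\qquad f(x_j) = \lambda_j \prod_{i=1}^{j-1} (1 - \lambda_i).$$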
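The product form can be checked numerically. Given hypothetical conditional probabilities $\lambda_j$ at ordered support points, the sketch below builds $f(x_j)$ and $R(x_j)$ and lets one confirm that $R(x_j)$ equals the tail sum of f. The function names are illustrative.

```python
def discrete_from_conditional(lambdas):
    """Given lambda_j = f(x_j) / R(x_j) for ordered support points x_1 < x_2 < ...,
    return the lists f(x_j) and R(x_j)."""
    f, R = [], []
    surv = 1.0  # R(x_1) = P(T >= x_1) = 1
    for lam in lambdas:
        R.append(surv)          # R(x_j) = prod_{i<j} (1 - lambda_i)
        f.append(lam * surv)    # f(x_j) = lambda_j * prod_{i<j} (1 - lambda_i)
        surv *= 1.0 - lam
    return f, R


def reliability_at(t, xs, lambdas):
    """R(t) = product over {j : x_j < t} of (1 - lambda_j)."""
    r = 1.0
    for x, lam in zip(xs, lambdas):
        if x < t:
            r *= 1.0 - lam
    return r


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 4.0]
    lambdas = [0.1, 0.2, 0.5, 1.0]  # hypothetical values; lambda = 1 at the last point
    f, R = discrete_from_conditional(lambdas)
    print(f, R, sum(f))
    print(reliability_at(3.0, xs, lambdas))
```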
In the framework of this paper, T has both discrete and continuous components. Let $0 = t_1 < t_2 < \dots$ denote the times at which a machine is switched on or off. We assume that the machine is switched on at time 0 ($t_1 = 0$). When the index j is odd, $t_j$ denotes the start of an on period (or the end of an off period). If $h_c$ denotes the hazard function for the on periods and $\lambda_j$ is the conditional probability of a failure detected at $t_{2j+1}$, then the overall reliability function is given by [2]:
$$R(t) = \exp\left(-\int_0^t h_c(u)\,du\right) \prod_{j:\, t_{2j+1} < t} (1 - \lambda_j). \qquad (3)$$
We assume that the continuous hazard $h_c$ is undefined during off periods. In the next section we present procedures to estimate the hazard function $h_c$ and the characteristics $\lambda_j$, $j = 1, 2, \dots$
3. On estimating the hazard function
3.1. The estimation of the $\lambda_j$ by additive nonparametric logistic regression
Let y be a binary variable (y = 1 if a failure is present, else 0), and x a vector of p explanatory variables describing the machine history (time since repair, the number of previous failures, the age of the machine, the number of on-off cycles, time since the last maintenance action, etc.).
The model for $P(x) = E(y \mid x) = P(y = 1 \mid x)$ is of the form:
$$\operatorname{logit}[P(x)] = \log\frac{P(x)}{1 - P(x)} = \alpha + \sum_{j=1}^{p} g_j(x_j), \qquad E[g_j(x_j)] = 0 \text{ for all } j, \qquad (4)$$
where the $g_j$ are functions to be determined. If $g_j(x_j) = \beta_j x_j$, the model used in [1] is obtained. However, the functions $g_j$ can take other forms: trigonometric functions [3], spline functions, etc.
Our approach uses the local scoring method developed by Hastie and Tibshirani [4] to estimate the functions $g_j$. The local scoring algorithm is a generalization of the iteratively (re)weighted least squares method.
Let us denote the observations by $(y_1, x_1), \dots, (y_n, x_n)$, where $y_i = 0$ or $1$ and $x_i = (x_{i1}, x_{i2}, \dots, x_{ip})$ is the vector of p covariates, $i = 1, 2, \dots, n$. The following modules are necessary to obtain α and the functions $g_j$:
• A smoother (smoothing spline, kernel smoother, etc.). Let $S(y \mid x_i)$ be the value at the point $x_i$ of the function obtained by smoothing the scatterplot (x, y). In this step, scatterplot smoothing replaces simple least squares regression.
• A backfitting estimator. This module provides estimates of the functions $g_j$ in the nonparametric model
$$E(y \mid x) = \alpha + \sum_{j=1}^{p} g_j(x_j), \qquad E[g_j(x_j)] = 0 \text{ for all } j,$$
by the backfitting algorithm [4]:
1. Start with $\hat g_j(x) = 0$ for all x and all j, $\hat\alpha = \bar y$.
2. Repeat steps 2.1 and 2.2
2.1. for j := 1, 2, …, p do
a) for i := 1 to n do $r_i := y_i - \hat\alpha - \sum_{k \ne j} \hat g_k(x_{ik})$;
b) for i := 1 to n do $\hat g_j(x_{ij}) := S(r \mid x_{ij})$;
2.2. compute $RSS := \sum_{i=1}^{n} \left( y_i - \hat\alpha - \sum_{k=1}^{p} \hat g_k(x_{ik}) \right)^2$
until $|\Delta RSS| < \varepsilon_1$, where $\Delta RSS$ is the change in RSS between successive iterations.
The backfitting algorithm replaces weighted multiple linear regression and repeatedly uses the scatterplot smoother.
• An LSA module. This component provides
estimates of the $g_j$'s in model (4). The main steps of the local scoring algorithm for the logistic model are [4]:
1. Start with $\hat g_j^{(0)}(x) = 0$ for all x and all j, $\hat\alpha^{(0)} = \operatorname{logit}(\bar y)$, $m := 0$;
2. Repeat steps 2.1, 2.2 and 2.3
2.1. for i := 1 to n compute:
$\hat v^{(m)}(x_i) := \hat\alpha^{(m)} + \sum_{j=1}^{p} \hat g_j^{(m)}(x_{ij})$;
$\hat p_i := \operatorname{logit}^{-1}\big(\hat v^{(m)}(x_i)\big)$;
$z_i := \hat v^{(m)}(x_i) + (y_i - \hat p_i)/[\hat p_i(1 - \hat p_i)]$;
$w_i := \hat p_i(1 - \hat p_i)$.
2.2. Obtain $\hat\alpha^{(m+1)}$ and $\hat g_j^{(m+1)}$, $j = 1, \dots, p$, by applying the backfitting algorithm to the working responses $z_i$, with covariates $x_1, x_2, \dots, x_p$ and observation weights $w_i$, so as to solve
$$\min_{\alpha, g} E\left\{ w^{(m)}(x) \left[ v^{(m)}(x) + \frac{y - p^{(m)}(x)}{p^{(m)}(x)[1 - p^{(m)}(x)]} - \alpha - \sum_{j=1}^{p} g_j(x_j) \right]^2 \right\};$$
2.3. Compute the deviance
$$D(y, \hat p) := -2 \sum_{i=1}^{n} \left[ y_i \log \hat p_i + (1 - y_i) \log(1 - \hat p_i) \right]$$
and set $m := m + 1$
until $|\Delta D(y, \hat p)| < \varepsilon_2$.
In the above, the numbers $\varepsilon_1$ and $\varepsilon_2$ are small positive values used in the convergence tests. The convergence of the local scoring algorithm was motivated by Hastie and Tibshirani [4].
Finally, it is easy to obtain the characteristics $\lambda_j$, $j \ge 1$, to be used in computing the reliability function given by (3).
3.2. Estimating the hazard function $h_c$
Follmann [1] estimates the hazard function $h_c$ using a Weibull regression model. However, other procedures can be used. We chose to implement the extended hazard regression model [6] in software.
Let $h_0(\cdot)$ be the baseline hazard function, $\theta = (\beta_1, \beta_2, \gamma_1, \gamma_2)$ the vector of unknown parameters, and $u_1(\cdot), u_2(\cdot), v_1(\cdot), v_2(\cdot)$ known monotone functions that equal one when their arguments are zero. The extended hazard regression model is given by:
$$h_c(t \mid z) = u_1(\beta_1^T z)\, v_1(\gamma_1^T z)\, [u_1(\beta_1^T z)\, t]^{v_1(\gamma_1^T z) - 1}\, h_0\!\left([u_2(\beta_2^T z)\, t]^{v_2(\gamma_2^T z)}\right).$$
Assuming $u_1(\cdot) = u_2(\cdot)$, $v_1(\cdot) = v_2(\cdot)$ and
$$h_0(x) = \frac{\Gamma^{-1}(k)\, x^{k-1} e^{-x}}{1 - I(k; x)}, \qquad I(k; x) = \Gamma^{-1}(k) \int_0^x t^{k-1} e^{-t}\, dt,$$
the above model corresponds to the hazard function of a random variable with a three-parameter generalized gamma distribution, two of whose parameters depend on the covariates z. For $v_1(\cdot) = v_2(\cdot) = k = 1$, the exponential model is obtained. When $k = 1$, a Weibull model is obtained. For $h_0(x) = 1/(1 + x)$, we obtain the log-logistic distribution with two parameters depending on covariates.
The baseline hazard function can also be a polynomial or a piecewise polynomial approximation.
Let $H_c(t \mid z) = \int_0^t h_c(u \mid z)\, du$, and let $t_1, t_2, \dots$ be as above.
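The extended hazard model can be evaluated directly. The sketch below assumes $u(\cdot) = v(\cdot) = \exp(\cdot)$, a hypothetical but admissible choice (both are monotone and equal one at zero). For $H_c$ it additionally assumes $\beta_1 = \beta_2$ and $\gamma_1 = \gamma_2$, in which case $H_c(t \mid z) = H_0([u(\beta^T z)\, t]^{v(\gamma^T z)})$ with $H_0$ the integrated baseline hazard. Only the exponential baseline ($h_0 = 1$, giving the Weibull model) and the log-logistic baseline are included. The generalized gamma baseline is omitted because it needs the regularized incomplete gamma function, which the Python standard library does not provide.

```python
import math


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def extended_hazard(t, z, beta1, gamma1, beta2, gamma2, h0, u=math.exp, v=math.exp):
    """h_c(t | z) of the extended hazard regression model (u = v = exp assumed)."""
    a1, b1 = u(_dot(beta1, z)), v(_dot(gamma1, z))
    a2, b2 = u(_dot(beta2, z)), v(_dot(gamma2, z))
    return a1 * b1 * (a1 * t) ** (b1 - 1) * h0((a2 * t) ** b2)


def cumulative_hazard(t, z, beta, gamma, H0, u=math.exp, v=math.exp):
    """H_c(t | z) = H0([u(beta^T z) t]^{v(gamma^T z)}), valid when beta1 = beta2 = beta
    and gamma1 = gamma2 = gamma; H0 is the integral of the baseline hazard h0."""
    return H0((u(_dot(beta, z)) * t) ** v(_dot(gamma, z)))


# Baselines mentioned in the text, each with its integrated hazard H0.
def exponential_h0(x):
    return 1.0  # k = 1: h_c becomes a Weibull hazard


def exponential_H0(x):
    return x


def loglogistic_h0(x):
    return 1.0 / (1.0 + x)


def loglogistic_H0(x):
    return math.log1p(x)


if __name__ == "__main__":
    z, beta, gamma = [1.0, 0.5], [0.1, -0.4], [0.2, 0.2]  # hypothetical values
    print(extended_hazard(2.0, z, beta, gamma, beta, gamma, loglogistic_h0))
    print(cumulative_hazard(2.0, z, beta, gamma, loglogistic_H0))
```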
Each $t_i$ has an associated covariate vector $z_i$ and an indicator variable $\delta_i$, defined by $\delta_i = 1$ if the system is in failure at time $t_i$ and $\delta_i = 0$ otherwise.
The log-likelihood function is:
$$\log L(\theta) = \sum_{i=1}^{n} \left[ \delta_i \log h_c(t_i \mid z_i) - H_c(t_i \mid z_i) \right].$$
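In the simplest special case ($v \equiv 1$, $k = 1$, $u = \exp$), $h_c(t \mid z) = \exp(\beta^T z)$ and $H_c(t \mid z) = \exp(\beta^T z)\, t$, so this log-likelihood becomes $\sum_i [\delta_i \beta^T z_i - \exp(\beta^T z_i)\, t_i]$. The sketch below solves its score equations with Newton's method. It is a minimal illustration of that special case, not the authors' implementation, and it does not cover the Quasi-Gauss-Newton methods of [5]. The helper names are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, n):
            factor = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= factor * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def exponential_mle(times, deltas, Z, max_iter=50, tol=1e-10):
    """Newton's method for log L(beta) = sum_i [delta_i beta^T z_i - exp(beta^T z_i) t_i]."""
    p = len(Z[0])
    beta = [0.0] * p
    for _ in range(max_iter):
        grad = [0.0] * p
        hess = [[0.0] * p for _ in range(p)]
        for t, d, z in zip(times, deltas, Z):
            H = math.exp(sum(b * zk for b, zk in zip(beta, z))) * t  # H_c(t_i | z_i)
            for a in range(p):
                grad[a] += (d - H) * z[a]
                for c in range(p):
                    hess[a][c] -= H * z[a] * z[c]
        step = solve(hess, grad)  # Newton update: beta <- beta - hess^{-1} grad
        beta = [b - s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


if __name__ == "__main__":
    # hypothetical data: intercept plus a 0/1 group indicator
    times = [1.0, 2.0, 3.0, 4.0, 1.0, 1.0, 2.0]
    deltas = [1, 0, 1, 1, 1, 1, 0]
    Z = [[1.0, 0.0]] * 4 + [[1.0, 1.0]] * 3
    print(exponential_mle(times, deltas, Z))
```

With a group indicator, the estimate can be checked by hand: $\exp(\hat\beta_0)$ is the failure count divided by the exposure in group 0 (here 3/10), and $\exp(\hat\beta_1)$ is the ratio of the two groups' rates.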
The maximum likelihood estimate $\theta^* = (\beta_1^*, \beta_2^*, \gamma_1^*, \gamma_2^*)$ of θ is then obtained by solving the system of nonlinear equations $\partial \log L / \partial \theta = 0$. The classical Newton method and the more recent Quasi-Gauss-Newton methods [5] are implemented, and either can be selected for the numerical computations.
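Returning to the estimation of the $\lambda_j$ in Section 3.1, the backfitting and local scoring loops can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' software. The smoother is a weighted Gaussian-kernel (Nadaraya-Watson) smoother with a fixed bandwidth of 0.5, which is one of the "kernel smoother" options the text allows. Inside backfitting, α is held at the weighted mean of the response. The fitted probabilities are clipped away from 0 and 1 for numerical safety. All function names are hypothetical.

```python
import math
import random


def kernel_smoother(x, r, w, bandwidth=0.5):
    """S(r | x_i) for every i: weighted Nadaraya-Watson smoother with a Gaussian kernel."""
    out = []
    for xi in x:
        num = den = 0.0
        for xk, rk, wk in zip(x, r, w):
            k = wk * math.exp(-0.5 * ((xi - xk) / bandwidth) ** 2)
            num += k * rk
            den += k
        out.append(num / den)
    return out


def backfit(y, X, w, eps1=1e-8, max_iter=100):
    """Weighted backfitting for E(y | x) = alpha + sum_j g_j(x_j).

    X holds n rows of p covariates; returns alpha and g, with g[j][i] = g_j(x_ij).
    """
    n, p = len(y), len(X[0])
    cols = [[row[j] for row in X] for j in range(p)]
    sw = sum(w)
    alpha = sum(wi * yi for wi, yi in zip(w, y)) / sw  # step 1: (weighted) mean of y
    g = [[0.0] * n for _ in range(p)]
    rss_old = math.inf
    for _ in range(max_iter):
        for j in range(p):
            # step 2.1 a): partial residuals r_i = y_i - alpha - sum_{k != j} g_k(x_ik)
            r = [y[i] - alpha - sum(g[k][i] for k in range(p) if k != j) for i in range(n)]
            # step 2.1 b): smooth the residuals against x_j, then centre so E[g_j] = 0
            s = kernel_smoother(cols[j], r, w)
            mean_s = sum(wi * si for wi, si in zip(w, s)) / sw
            g[j] = [si - mean_s for si in s]
        # step 2.2: weighted residual sum of squares; stop when it no longer changes
        rss = sum(w[i] * (y[i] - alpha - sum(g[k][i] for k in range(p))) ** 2 for i in range(n))
        if abs(rss_old - rss) < eps1:
            break
        rss_old = rss
    return alpha, g


def deviance(y, phat):
    return -2.0 * sum(yi * math.log(q) + (1 - yi) * math.log(1 - q) for yi, q in zip(y, phat))


def local_scoring(y, X, eps2=1e-8, max_iter=50):
    """Local scoring for logit P(x) = alpha + sum_j g_j(x_j); y must contain both 0s and 1s."""
    n, p = len(y), len(X[0])
    ybar = sum(y) / n
    alpha, g = math.log(ybar / (1 - ybar)), [[0.0] * n for _ in range(p)]

    def fitted(alpha, g):
        # additive predictor and its inverse logit, clipped away from 0 and 1
        v = [alpha + sum(g[j][i] for j in range(p)) for i in range(n)]
        return v, [min(max(1 / (1 + math.exp(-vi)), 1e-10), 1 - 1e-10) for vi in v]

    dev_old = math.inf
    for _ in range(max_iter):
        v, phat = fitted(alpha, g)  # step 2.1: working responses and weights
        z = [v[i] + (y[i] - phat[i]) / (phat[i] * (1 - phat[i])) for i in range(n)]
        w = [q * (1 - q) for q in phat]
        alpha, g = backfit(z, X, w)  # step 2.2: weighted backfitting
        _, phat = fitted(alpha, g)
        dev = deviance(y, phat)  # step 2.3: convergence test on the deviance
        if abs(dev_old - dev) < eps2:
            break
        dev_old = dev
    return alpha, g, phat


if __name__ == "__main__":
    rng = random.Random(1)
    n = 60
    # hypothetical covariates: x1 informative, x2 pure noise
    X = [[-2 + 4 * i / (n - 1), rng.uniform(-1, 1)] for i in range(n)]
    y = [1 if rng.random() < 1 / (1 + math.exp(-2 * row[0])) else 0 for row in X]
    alpha, g, phat = local_scoring(y, X)
    print(deviance(y, phat), deviance(y, [sum(y) / n] * n))
```

The backfitting routine is called with the working responses $z_i$ and weights $w_i$, as in step 2.2. The returned $\hat p_i$ are the fitted conditional failure probabilities from which the $\lambda_j$ are obtained.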
4. Model validation and conclusions
The hazard function of the time until failure is composed of two parts: h(t) = hc(t) while the machine is on, and h(t) = p(t) at the end of an off period, where t is the time index.
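Combining the two parts as in (3) can be sketched as follows. The sketch assumes that $h_c$ is a function of calendar time, that it is accumulated only over the on periods, and that the $\lambda_j$ have already been estimated. The helper names and the schedule in the usage example are hypothetical.

```python
import math


def on_intervals(switch_times, t):
    """Parts of the on periods [t_1, t_2), [t_3, t_4), ... lying in [0, t].

    switch_times = [t_1 = 0, t_2, t_3, ...]; a final on period with no recorded
    end is treated as still running at time t.
    """
    for k in range(0, len(switch_times), 2):
        start = switch_times[k]
        if start >= t:
            break
        end = switch_times[k + 1] if k + 1 < len(switch_times) else math.inf
        yield start, min(end, t)


def integrate(h, a, b, steps=200):
    """Composite trapezoidal rule for the integral of h over [a, b]."""
    dt = (b - a) / steps
    return dt * (0.5 * (h(a) + h(b)) + sum(h(a + i * dt) for i in range(1, steps)))


def overall_reliability(t, switch_times, hc, lambdas):
    """R(t) of (3): exp(-integral of hc over the on periods) * prod_{j: t_{2j+1} < t} (1 - lambda_j).

    lambdas[j - 1] holds lambda_j, the probability of a failure detected at t_{2j+1}.
    """
    cumulative = sum(integrate(hc, a, b) for a, b in on_intervals(switch_times, t))
    discrete = 1.0
    for j, lam in enumerate(lambdas, start=1):
        k = 2 * j  # 0-based index of t_{2j+1} in switch_times
        if k < len(switch_times) and switch_times[k] < t:
            discrete *= 1.0 - lam
    return math.exp(-cumulative) * discrete


if __name__ == "__main__":
    schedule = [0.0, 2.0, 5.0, 8.0, 10.0]  # on [0,2), off [2,5), on [5,8), off [8,10), on from 10
    print(overall_reliability(9.0, schedule, lambda u: 0.1, [0.05, 0.1]))
```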
Model validation is essential before this approach can be used for prediction. For the continuous model, the bootstrap method, a computer-intensive technique, is used in the estimation process [6, 7]. Finally, the best vector of parameters according to the RSS criterion is selected and used in (3).
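The paper does not detail its bootstrap scheme or the RSS-based selection. As a generic illustration only, the sketch below applies a case-resampling bootstrap to the simplest continuous model, a constant on-period hazard, whose MLE is the number of failures divided by the total on time. The data and names are hypothetical.

```python
import random


def rate_mle(sample):
    """MLE of a constant on-period hazard (exponential model): failures / total exposure."""
    failures = sum(d for _, d in sample)
    exposure = sum(t for t, _ in sample)
    return failures / exposure


def bootstrap(data, estimator, B=1000, seed=0):
    """Case-resampling bootstrap: re-estimate on B samples drawn with replacement."""
    rng = random.Random(seed)
    return [estimator(rng.choices(data, k=len(data))) for _ in range(B)]


def percentile_interval(replicates, level=0.95):
    """Simple percentile interval from the sorted bootstrap replicates."""
    s = sorted(replicates)
    lo = s[int((1 - level) / 2 * (len(s) - 1))]
    hi = s[int((1 + level) / 2 * (len(s) - 1))]
    return lo, hi


if __name__ == "__main__":
    # hypothetical (on-time, failure indicator) pairs
    data = [(1.0, 1), (2.0, 0), (3.0, 1), (4.0, 1), (1.5, 1), (2.5, 0)]
    reps = bootstrap(data, rate_mle, B=2000, seed=1)
    print(rate_mle(data), percentile_interval(reps))
```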
Preliminary experiments based on the above approach are encouraging. Additional work is needed on code optimization.
Many machines are used intermittently. To model the failure process of such machines, the on periods and the off periods must be modeled differently. The approach above proposes general models that can be used to compute the survivor function. The specialized software under development implements the proposed approach and will be used for prediction on different intermittently used machines.
References
1. Follmann, D.A.: Modeling failures of intermittently
used machines. In: Applied Statistics, No. 39, 1990, p.
115-123.
2. Kalbfleisch, J.D., Prentice, R.L.: The Statistical Analysis of Failure Time Data. Wiley, New York, 1980.
3. Albeanu, G.: On the piecewise trigonometric nonlinear regression models. In: Stud. Cerc. Mat., No. 50, 1998, p. 1-4.
4. Hastie, T., Tibshirani, R.: Non-parametric logistic and proportional odds regression. In: Applied Statistics, No. 36, 1987, p. 260-276.
5. Albeanu, G.: On the convergence of the Quasi-Gauss-Newton methods for solving nonlinear systems. In: Intern. J. Computer Math., No. 66, 1998, p. 93-99.
6. Albeanu, G.: Resampling Simultaneous Confidence Bands for Nonlinear Explicit Regression Models. In: Stud. Cerc. Mat., No. 50, 1998, p. 289-295.
7. Albeanu, G., Popentiu, Fl.: On the nonlinear reliability models analysis. In: Proceedings of Safety and Reliability National Conference KONBIN'99, Sept. 22-25, 1999, Zakopane-Kościelisko, Poland, Vol. 1, p. 7-17.