IEEE TRANSACTIONS ON POWER SYSTEMS, VOL. 24, NO. 1, FEBRUARY 2009

Using State Diagrams for Modeling Maintenance of Deteriorating Systems

Thomas M. Welte

Abstract—This paper discusses the use of state diagrams in maintenance modeling. Such diagrams are frequently used to illustrate deterioration, inspections, and maintenance. Mathematically, a state diagram can be represented by a Markov process. The paper discusses the properties of such a Markov process and compares them with the maintenance situation in the real world. It is shown that some of these properties make the model inconsistent with reality, especially when a maintenance policy with nonperiodic inspections is modeled. A numerical example shows that these model properties result in modeling errors. The results make clear that the common practice of using Markov processes based on state diagrams must be judged critically when such processes are used to model certain maintenance strategies.

Index Terms—Deterioration, inspection, maintenance, Markov processes, Monte Carlo methods, state diagrams.

I. INTRODUCTION

Most technical systems are subject to deterioration as a result of usage and age. Thus, most major technical systems, such as power plants, production systems, civil infrastructure, ships, or planes, are maintained according to a preventive maintenance policy to avoid failure. These policies usually include corrective maintenance after a system failure, regular inspections to reveal the system condition, and preventive maintenance to improve the system condition if the system shows serious signs of deterioration. Scheduling and optimizing maintenance require mathematical models that quantify the impact of maintenance on the lifetime and reliability of technical systems. Some authors have proposed that deterioration, inspections, and maintenance can be illustrated by a state diagram [1], [2].
Mathematically, the state diagram can be represented by a Markov process, which may be solved by standard Markov methods [1]–[4]. This paper critically discusses this practice when it is applied to modeling the maintenance of deteriorating systems.

The advantage of state diagrams is that they provide a simple graphical illustration of the maintenance strategy. Furthermore, they can be used directly as the basis for the mathematical model, i.e., the Markov process. Thus, the diagrams provide an easy and straightforward tool for building the mathematical model.

Manuscript received April 15, 2008; revised July 17, 2008. First published December 09, 2008; current version published January 21, 2009. This work was supported in part by the Norwegian Electricity Industry Association (EBL) and in part by General Electric Energy, Norway. Paper no. TPWRS-00280-2008. The author is with SINTEF Energy Research, Department of Energy Systems, Trondheim, Norway, and also with the Department of Production and Quality Engineering, Norwegian University of Science and Technology, Trondheim, Norway (e-mail: thomas.welte@sintef.no). Digital Object Identifier 10.1109/TPWRS.2008.2005711

The focus of this paper is on maintenance policies with state-dependent inspection frequencies (also called nonperiodic inspections). The models were originally introduced for analyzing maintenance strategies with periodic inspections [5]–[7]. In that case they are useful, because they give valid results. Later, however, they were generalized to nonperiodic inspections [1], [3], [4], [8] without analyzing the consequences of this generalization. This paper fills this gap and thereby contributes to a better understanding of the models, their properties, and their influence on the modeling results.
This paper discusses the properties of Markov processes that are based on state diagrams in which inspections and maintenance are incorporated directly, and it analyzes whether these properties are realistic. A numerical example investigates to what extent the model properties influence numerical results, for example, when the models are used to calculate reliability measures such as failure rates, state durations, mean time between failures, or mean time to first failure.

The remainder of this paper is organized as follows: Section II gives a short overview of the use of state diagrams in modeling the maintenance of deteriorating systems. The maintenance situation in the real world is described in Section III. Model properties and different concepts for realizing a maintenance model mathematically are discussed in Section IV. A numerical example is presented in Section V. Finally, Section VI summarizes the paper and draws conclusions.

II. STATE DIAGRAMS IN MAINTENANCE MODELING

In many applications, failures can be divided into two categories: random failures and failures arising as a consequence of deterioration (ageing) [2]. In the latter case, the deterioration process can be represented by a sequence of stages of increasing wear, finally leading to equipment failure. A state diagram representing a simple failure-repair process for this case is shown in Fig. 1(a). If no maintenance is carried out, a new system will run through all stages of deterioration, $S_1, \dots, S_n$, and will sooner or later reach the fault state, denoted F. In a simple case, the system is replaced or repaired to an “as good as new” condition after failure, that is, it is restored to state $S_1$.

In practice, many technical systems are maintained regularly to avoid failures and to intervene if the technical condition becomes critical.
Therefore, maintenance actions that improve the system condition are either carried out according to a predefined schedule, or the system is inspected regularly to decide whether maintenance is needed and what kind.

0885-8950/$25.00 © 2008 IEEE
Authorized licensed use limited to: IEEE Xplore. Downloaded on February 26, 2009 at 03:50 from IEEE Xplore. Restrictions apply.
WELTE: USING STATE DIAGRAMS FOR MODELING MAINTENANCE OF DETERIORATING SYSTEMS

Fig. 1. State diagrams for deteriorating systems (adapted from [2]). $S_1$–$S_n$: stages of deterioration, F: fault state, $M_1$–$M_n$: maintenance states. (a) Simple failure-repair process. (b) Deteriorating system including maintenance.

Fig. 2. Basic principle for many maintenance models.

According to [2], maintenance can easily be added by introducing additional states, $M_1, \dots, M_n$; see the example in Fig. 1(b). This example assumes that maintenance will, on average, improve the system condition to the previous stage of deterioration.

It is common knowledge that the state diagram turns into a Markov process if the state transitions occur with constant rates and if the future development of the system depends only on the current state. This means that the time to a transition into a following state is modeled by an exponential distribution and that the future process is independent of anything that happened in the past. If the transition times are modeled by general probability distributions, the resulting model is called a semi-Markov model [9]. The former is relatively easy to solve, and standard methods can be used to calculate performance measures such as state probabilities, visit frequencies, mean durations, and mean time between failures [5], [9]–[11]. The solution of the latter requires more sophisticated mathematical techniques (see, for example, [11]).
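As a minimal sketch of what the standard methods amount to in the Markov case: the steady-state probabilities $\pi$ solve $\pi Q = 0$ with $\sum_i \pi_i = 1$, where $Q$ is the generator matrix built from the transition rates. With $q_i$ the total rate out of state $i$, the visit frequency of state $i$ is $\pi_i q_i$, its mean duration is $1/q_i$, and the MTBF is the reciprocal of the visit frequency of F. The code applies this to the failure-repair process of Fig. 1(a) with three stages; the function names and the rates (0.2 per year per stage, repair rate 50 per year) are hypothetical and chosen only for illustration.

```python
"""Steady state of a small continuous-time Markov chain (illustrative sketch)."""


def steady_state(states, rates):
    """Solve pi Q = 0 with sum(pi) = 1; rates maps (from_state, to_state) -> rate."""
    n = len(states)
    idx = {s: k for k, s in enumerate(states)}
    q = [[0.0] * n for _ in range(n)]            # generator matrix Q
    for (a, b), r in rates.items():
        q[idx[a]][idx[b]] += r
        q[idx[a]][idx[a]] -= r
    # pi Q = 0 is A pi = 0 with A = Q^T; one equation is redundant for an
    # irreducible chain, so the last one is replaced by sum(pi) = 1.
    m = [[q[j][i] for j in range(n)] for i in range(n)]
    rhs = [0.0] * n
    m[-1] = [1.0] * n
    rhs[-1] = 1.0
    for col in range(n):                         # Gaussian elimination, partial pivoting
        piv = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[piv] = m[piv], m[col]
        rhs[col], rhs[piv] = rhs[piv], rhs[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for c in range(col, n):
                m[row][c] -= f * m[col][c]
            rhs[row] -= f * rhs[col]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):             # back substitution
        acc = sum(m[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (rhs[row] - acc) / m[row][row]
    return dict(zip(states, x))


def exit_rate(state, rates):
    """Total transition rate out of a state."""
    return sum(r for (a, _), r in rates.items() if a == state)


# Fig. 1(a) with three stages; rates per year are hypothetical.
states = ["S1", "S2", "S3", "F"]
rates = {("S1", "S2"): 0.2, ("S2", "S3"): 0.2, ("S3", "F"): 0.2, ("F", "S1"): 50.0}
pi = steady_state(states, rates)
visit_freq = {s: pi[s] * exit_rate(s, rates) for s in states}   # visits per year
mean_duration = {s: 1.0 / exit_rate(s, rates) for s in states}  # years per visit
mtbf = 1.0 / visit_freq["F"]                                    # years
print(round(mtbf, 2))  # 15.02
```

For this simple cycle the result can be checked by hand: one cycle lasts on average 5 + 5 + 5 + 0.02 = 15.02 years, which is the MTBF.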
Monte Carlo simulation or numerical methods are sometimes applied to compute reliability measures when the analytical solution is hard to derive.

This paper discusses models that follow the basic principle shown in Fig. 2: a sequence of deterioration states is followed by the fault state, and from one or several deterioration states and/or from the fault state there are transitions to additional states representing inspections, maintenance, decisions, waiting periods, etc. From there, the system returns to one of the deterioration states or to the fault state (e.g., if the maintenance action could result in a system failure). In the following, this way of formulating the maintenance model is called a “classical” state diagram. Maintenance models with this classical structure are presented in [1], [3], [4], [6]–[8], and [12]–[15].

Most of the examples found in the literature analyze a strategy where inspections and maintenance are performed with a constant rate that is independent of the condition of the system. This means that the inspection rates out of all deterioration states in Fig. 2 have the same value [6], [7], [12], [14], [15]. In some other applications [1], [3], [4], [8], however, state diagrams are used to model maintenance policies with nonperiodic inspections, where the inspection frequency increases with increasing deterioration.

Fig. 3. Example of a classical state diagram.

The inspection frequency usually depends on the stage of deterioration of the system. This is a reasonable assumption, because it is common practice to inspect technical systems more frequently if they are known to have deteriorated. The objective of this strategy is to increase the probability of detecting a critical situation toward the end of the system's life and to replace or repair the system before it fails. The time intervals between inspections and maintenance are often modeled by an exponential distribution.
As already discussed in [1] and [16], this assumption is not always realistic, and there are techniques that consider non-exponential distributions; see, e.g., [6] and [16]. Nevertheless, exponentially distributed inspection intervals are used in this paper, as is common practice in the literature [1], [3], [4], [6]–[8], [12]–[15].

A. State Diagram Used as Maintenance Model—An Example

A typical example of a classical state diagram is shown in Fig. 3. The maintenance strategy visualized by this state diagram is used as the basis for the discussion throughout this paper. The deterioration process is represented by three discrete stages, $S_1$, $S_2$, and $S_3$. If no maintenance is carried out, the last deterioration stage is followed by the fault state F. It is assumed that after failure, the system is replaced or repaired to state $S_1$. It is well known that this assumption can easily be relaxed [1].

To extend the equipment lifetime, maintenance is carried out according to a predefined strategy. Inspections (represented by the states $I_1$, $I_2$, and $I_3$) are performed, which result in the decision to
• do nothing, if the system is still in state $S_1$;
• carry out a maintenance action, denoted $M_2$, if the system is in state $S_2$. This will improve the system condition by one stage;
• carry out a maintenance action, denoted $M_3$, if the system is in state $S_3$. This will also improve the system condition by one stage.

III. MAINTENANCE SITUATION IN THE REAL WORLD

A maintenance model that mathematically describes the policy in Section II-A belongs to the category of “inspection models.” Valdez-Flores and Feldman [17] define an inspection model as follows: “The state of the system is completely unknown unless inspection is performed.
In the absence of repair or replacement actions, the system evolves as a non-decreasing stochastic process. In general, at every decision epoch there are two decisions that have to be made. One decision is to determine what maintenance action to take, whether the system should be replaced or repaired to a certain state or whether the system should be left as it is. The other decision is to determine when the next inspection should be performed.”

According to this definition, decisions about inspection frequencies and maintenance actions are based on knowledge about the deterioration stage of the equipment. This agrees with our logical understanding of the situation in the real world. There are often predefined rules that recommend a specific inspection frequency depending on the stage of deterioration. Such a rule could be, for example, that inspections should be carried out every second year if the system is in state $S_1$, and yearly if the system is in stage $S_2$ or $S_3$. This strategy implies that the inspection frequency currently in use is adjusted as soon as new knowledge about the system becomes available. New knowledge is normally provided by an inspection, a maintenance action, or a failure, that is, by an event that reveals information about the system condition.

In the classical state diagrams presented in the literature, there is usually a direct connection between the deterioration states and the maintenance and inspection states. This connection is illustrated by the arrows pointing from $S_i$ to $I_i$ (cf. Fig. 3). It is a poor model property and results in errors, as shown later in this paper.
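The example rule (inspect every second year in $S_1$, yearly in $S_2$ or $S_3$), combined with the actions of Section II-A, can be written as a small decision function. This is a hypothetical encoding: the function name, the dictionary, and the unit of years are assumptions for illustration. Following the policy of Section III-A, the next interval is chosen from the state after maintenance. The function is only called at an inspection epoch, never at a deterioration event, which is the separation of the two “processes” argued for in this section.

```python
# Hypothetical encoding of the example policy: known stage -> next inspection interval (years).
INSPECTION_INTERVAL_YEARS = {1: 2.0, 2: 1.0, 3: 1.0}


def decide_after_inspection(detected_stage):
    """Return (maintenance_done, stage_after, next_interval) for a detected stage.

    Called only at inspection epochs: deterioration itself never triggers a decision.
    """
    if detected_stage == 1:
        stage_after, maintained = 1, False                    # do nothing
    elif detected_stage in (2, 3):
        stage_after, maintained = detected_stage - 1, True    # improve by one stage
    else:
        raise ValueError(f"unknown deterioration stage: {detected_stage}")
    # The next inspection is scheduled from the condition *after* maintenance.
    return maintained, stage_after, INSPECTION_INTERVAL_YEARS[stage_after]


for stage in (1, 2, 3):
    print(stage, decide_after_inspection(stage))
```

For example, an inspection that finds $S_3$ leads to maintenance, the system returns to $S_2$, and the next inspection is planned one year later.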
A realistic model should clearly separate the deterioration process from the maintenance strategy (the inspection and maintenance “process”), because these two “processes” are independent of each other except when inspections and maintenance are carried out or when failures occur. Furthermore, it is only at these points in time that information about the system condition can be gathered.

These properties of the real maintenance situation may be illustrated as in Fig. 4, where system deterioration and the maintenance and inspection strategy are shown as two parallel “processes,” symbolized by the solid arrows. They are connected to each other only at the points in time when inspections (I) are carried out or failures (F) occur. At these points, decisions (D) are made about maintenance actions (M) and about the length $\tau$ of the next inspection interval.

Fig. 4. System deterioration and maintenance/inspection strategy illustrated as two parallel “processes.” D: decision, F: failure, I: inspection, M: maintenance, $\tau$: inspection interval.

A. Alternative Illustration of the Maintenance Strategy

As an alternative to the classical state diagram, deterioration, inspections, and maintenance may be illustrated as in Fig. 5, which incorporates the considerations discussed in the previous section. Deterioration from state $S_1$ to the fault state F is represented by a chain of states, similar to the classical state diagrams. System deterioration can be modeled, for example, by a stochastic process such as a Markov process. In contrast to a classical state diagram, however, inspections and maintenance are added using different graphical elements.

Fig. 5. Alternative illustration of deterioration, inspections, and maintenance. D: decision, F: fault state, I: inspection, S: stages of deterioration, eoi: end of inspection, eom: end of maintenance, $\lambda_I$: inspection frequency, $1/\mu_I$: mean inspection duration, $\lambda_i$: deterioration rate, $1/\mu_M$: mean maintenance duration.
There is only one inspection state (I) in the graph. The dash-dotted rectangle around states $S_1$–$S_3$ indicates that the system is inspected without knowledge of the current deterioration state. The actual inspection frequency, $\lambda_I$, depends on a decision (D) made either at the end of the last inspection (eoi: end of inspection) or at the end of the last maintenance action (eom: end of maintenance). The inspection duration is represented by its rate $\mu_I$ (mean duration $1/\mu_I$). The inspection result is not available until the inspection time has elapsed. Then, a decision can be made whether or not to perform maintenance. This is illustrated by the nodes $D_i$, where $i$ denotes the detected system state. The inspection result and the decision depend on the system state $S_i$. This dependency is illustrated by the dotted arrows between $S_i$ and $D_i$, similar to Fig. 4, where the dotted arrows connect the deterioration process with the maintenance and inspection “process.”

At the end of the maintenance action, or directly at the end of the inspection if no maintenance is carried out, a second decision is made about the next inspection time. This decision is also based on knowledge about the system condition. Note that the maintenance action may have changed the system condition; the next inspection is therefore scheduled according to the system state after maintenance. After the inspection and the maintenance action are finished, the system is immediately put back into operation. This is illustrated by the dashed arrows back to the deterioration states. Failures are assumed to be self-announcing, and a corrective maintenance action brings the system back to state $S_1$.

IV. DISCUSSION

This section discusses the realism of state diagrams in maintenance modeling.
It is shown that there are, in principle, two concepts for realizing the model mathematically, the redrawing concept (RD concept) and the no-redrawing concept (NRD concept), and that they lead to different results. It is argued that the NRD concept fully agrees with our logical understanding of reality, whereas the RD concept has some discrepancies with the maintenance situation in the real world. A Markov process built on a classical state diagram is an elegant mathematical realization of the RD concept. It follows that this kind of model is a poor representation of reality. The meaning of the last inspection rate and the computation of visit frequencies and state durations are also discussed in this section.

A. Mathematical or Numerical Model Realization

The state diagrams in Figs. 3 and 5 are only graphical representations of deterioration, inspections, and maintenance, not executable models. An executable model is obtained only when a mathematical or numerical model is built on the diagrams. As mentioned before, the classical state diagram directly represents a Markov process if all transitions are exponentially distributed. The graph in Fig. 5 cannot be used directly as the basis for a mathematical model. Nevertheless, it helps to illustrate the real situation, and it makes clear how we must think when an inspection model is realized mathematically.

At this point, we assume that Monte Carlo simulation is used to realize the model. There are, in principle, two different concepts for carrying out these Monte Carlo simulations:
• Redrawing (RD): Both the next time of system deterioration and the next inspection time are redrawn every time a deterioration state is entered, i.e., at each transition from $I_i$ to $S_i$ and from $S_i$ to $S_{i+1}$.
• No-redrawing (NRD): The next time of system deterioration is drawn only when there is a (physical) state change due to deterioration or maintenance.
Inspection times are drawn only when decisions are made after an inspection or after a maintenance action.

The two ways of carrying out Monte Carlo simulations can be regarded as fundamentally different concepts for realizing the mathematical model and computing results. The RD concept is illustrated by a classical state diagram as shown in Fig. 3, whereas the graph in Fig. 5 is an attempt to illustrate the NRD concept.

To clarify the difference between the two concepts, consider a deteriorating system that is maintained according to the policy described in Section II-A. Assume that the system is “as good as new” in state $S_1$ at time $t = 0$. If Monte Carlo methods are used to compute a realization of the model, one would start the simulation by drawing an exponentially distributed random number with rate $\lambda_1$. This number is a realization of the sojourn time in state $S_1$, denoted $T_{S_1}$. In addition, the next inspection interval, $T_I$, could be drawn as an exponentially distributed random number with rate $\lambda_{I1}$. Thus, the deterioration (transition from $S_1$ to $S_2$) occurs at time $T_{S_1}$, and the next inspection (transition from $S_1$ to $I_1$) occurs at time $T_I$.

Assume that there is an inspection before the system deteriorates to $S_2$, that is, $T_I < T_{S_1}$. This inspection would reveal that the real system is in deterioration stage $S_1$. According to the predefined strategy, no maintenance action is performed and the system is returned to operation. To simplify matters, the inspection duration is assumed to be short compared with the operating times. Thus, inspection periods can be neglected, and the system returns to $S_1$ at approximately time $T_I$.

If the RD concept is applied, the restart of system operation means that the system once again enters state $S_1$. One would then again draw a random sojourn time, denoted $T'_{S_1}$, and a random inspection interval, denoted $T'_I$. This means that the new transition time from $S_1$ to $S_2$ is $T_I + T'_{S_1}$, and the next inspection time is $T_I + T'_I$.

If the NRD concept is applied, the restart of system operation means that the next inspection interval, $T'_I$, is drawn as an exponentially distributed random number, whereas the sojourn time in $S_1$ is not redrawn. In this case, the new transition time from $S_1$ to $S_2$ is still the previous transition time, $T_{S_1}$, and the next inspection time is $T_I + T'_I$.

As can easily be seen, the two concepts differ in how the next transition times are handled. However, only one concept is correct for a given real-world system. In Section IV-B, the concepts are compared with the maintenance situation in the real world. It is shown that the RD concept violates our logical understanding of reality, whereas the NRD concept realizes a situation that agrees well with it.

In the considerations above, only the case $T_I < T_{S_1}$ was described. A similar situation occurs in the case $T_{S_1} < T_I$, that is, when the system deteriorates to the next stage before the next inspection is performed. Here, the two concepts handle the next inspection time differently. If the RD concept is used, a new inspection interval, $T'_I$, is drawn with rate $\lambda_{I2}$, so that the next inspection is at $T_{S_1} + T'_I$, whereas for the NRD concept the next inspection time remains the previously drawn time, i.e., $T_I$.

B. Comparison With the Real Maintenance Situation

Let us analyze the maintenance situation in the real world by again considering the two cases $T_I < T_{S_1}$ and $T_{S_1} < T_I$. In the former case, the time of system deterioration does not change (no redrawing) in the real world when inspections are carried out, because the system does not “know” that it was inspected.
This means that an inspection followed by the decision to do nothing has no influence on the system state. Only a maintenance action would change the system state and hence the next deterioration time. In the latter case, the next inspection time does not change (no redrawing) in the real world when the system deteriorates from $S_1$ to $S_2$. The inspection time is a decision of the operator, who does not know when the state transition (deterioration) occurs. Thus, the concept of redrawing provides an unrealistic solution, because inspections influence the physical deterioration process and system deterioration influences the next time of inspection. In other words, if the RD concept is used, the scheduling of inspections is triggered by a state transition, whereas in a real-world situation the next inspection time is a decision, which is normally triggered by information gathered through inspections or during maintenance. Solving the state diagram with techniques that apply the RD concept therefore clearly violates our logical understanding of the situation in the real world.

The use of classical state diagrams can entice the analyst to use the RD concept. The graphical representation suggests that the system jumps from state to state and that the next event is determined by the possible transitions from the current state to the connected states, whichever transition happens first. The example in Section V shows that the use of the RD concept can result in modeling errors. The NRD concept, in contrast, exactly reflects the real maintenance situation. Thus, an inspection model should be solved by mathematical methods and numerical procedures that realize the NRD concept.

C. Markov Processes Based on Classical State Diagrams

A Markov process built on the classical state diagram is an elegant mathematical realization of the RD concept.
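The update rules of Section IV-A for the two concepts can be condensed into a one-step function. This is a hypothetical helper, not the paper's implementation: starting in $S_1$, it takes the currently scheduled absolute times $T_{S_1}$ and $T_I$ plus freshly drawn durations and returns the next scheduled deterioration and inspection times. The caller is assumed to draw `new_det` with rate $\lambda_1$ (case $T_I < T_{S_1}$) or $\lambda_2$ (case $T_{S_1} < T_I$), and `new_insp` with rate $\lambda_{I1}$ or $\lambda_{I2}$, respectively; inspection durations are neglected, as in the text.

```python
def next_scheduled_times(concept, t_det, t_insp, new_det, new_insp):
    """Return the scheduled (deterioration, inspection) times after the next event.

    The system starts in S_1; t_det and t_insp are the currently scheduled absolute
    times of deterioration to S_2 and of the next inspection. new_det and new_insp
    are freshly drawn durations; unused draws are simply ignored.
    """
    if concept not in ("RD", "NRD"):
        raise ValueError(f"unknown concept: {concept}")
    now = min(t_det, t_insp)
    if t_insp < t_det:
        # Case T_I < T_S1: the inspection finds S_1 and nothing is done.
        if concept == "RD":
            return now + new_det, now + new_insp   # re-entry into S_1 restarts both clocks
        return t_det, now + new_insp               # only the inspection is rescheduled
    # Case T_S1 < T_I: physical deterioration to S_2 happens first.
    if concept == "RD":
        return now + new_det, now + new_insp       # inspection redrawn (rate lambda_I2)
    return now + new_det, t_insp                   # the planned inspection stays at T_I


# T_I = 2 < T_S1 = 5, fresh draws 4.0 and 1.5 (hypothetical numbers).
print(next_scheduled_times("RD", 5.0, 2.0, 4.0, 1.5))   # (6.0, 3.5)
print(next_scheduled_times("NRD", 5.0, 2.0, 4.0, 1.5))  # (5.0, 3.5)
```

In the first case the RD concept moves the deterioration time merely because an inspection took place; in the second case it moves the inspection time merely because the system deteriorated, which is the behavior criticized in Section IV-B.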
That a Markov process based on a classical state diagram realizes the RD concept can easily be verified by “solving” the Markov model with Monte Carlo simulation based on the RD concept instead of standard Markov methods; both give the same results. Section IV-B showed that the RD concept violates our logical understanding of reality. The logical consequence is that a Markov process based on a classical state diagram is a maintenance model with the same unrealistic properties as the RD concept.

In the case of nonperiodic inspections, the inspection frequencies ($\lambda_{I1}$, $\lambda_{I2}$, $\lambda_{I3}$ in Fig. 3) increase with the deterioration state. “Standard Markov methods” have been proposed and used for inspection models with a nonperiodic inspection strategy [1], [3], [4], [8]. As soon as there is a state transition from $S_1$ to $S_2$ in the model, the inspection frequency also changes from $\lambda_{I1}$ to $\lambda_{I2}$. Consequently, the residual time to the next inspection is no longer redrawn from an exponential distribution with the same rate as the previously drawn inspection time, but with an increased rate. This leads to erroneous results, as shown in Section V.

The Markov model provides a good, simple, and correct solution if all transitions are exponentially distributed and all inspection rates are equal (periodic inspections). In this case, the time to the next inspection and the time of system deterioration can be redrawn each time there is an arrival in $S_i$. The redrawn times can be interpreted as the residual times to system deterioration and to inspection, respectively. If all transitions have constant rates, we can utilize the memoryless property of the exponential distribution: the (redrawn) residual time to the next state transition is exponentially distributed with the same transition rate as the previously drawn transition time. This means that it does not matter how long the system has stayed in a state or how often the transition times are redrawn. It follows that the RD concept and the NRD concept are equivalent in this case.
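The memoryless argument can be checked numerically. For the exponential distribution, $P(T > s + t \mid T > s) = P(T > t)$, so redrawing a residual time from the original distribution changes nothing; for a Weibull distribution with shape parameter greater than 1 (increasing hazard) this identity fails. The sketch below compares both cases; the rate, scale, shape, and time values are hypothetical.

```python
import math
from functools import partial


def exp_survival(t, rate):
    """P(T > t) for an exponential distribution with the given rate."""
    return math.exp(-rate * t)


def weibull_survival(t, scale, shape):
    """P(T > t) for a two-parameter Weibull distribution."""
    return math.exp(-((t / scale) ** shape))


def residual_survival(survival, s, t):
    """P(T > s + t | T > s): survival of the residual time after s time units."""
    return survival(s + t) / survival(s)


s, t = 3.0, 2.0                                          # hypothetical elapsed / residual time
exp_s = partial(exp_survival, rate=0.5)                  # hypothetical constant rate
wei_s = partial(weibull_survival, scale=4.0, shape=2.0)  # hypothetical increasing hazard

# Exponential: residual and fresh survival agree, so redrawing is harmless.
print(residual_survival(exp_s, s, t), exp_s(t))
# Weibull (shape > 1): the residual time is stochastically shorter than a fresh draw.
print(residual_survival(wei_s, s, t), wei_s(t))
```

With these values the Weibull residual survival is about 0.37, whereas a fresh draw survives 2 time units with probability about 0.78. Redrawing would therefore make an aged system look younger than it is.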
A numerical example for the case of equal inspection rates is given in Section V-B. Note that the RD concept cannot be applied if the transitions have a general (non-exponential) distribution, because the memoryless property then no longer holds. This is important to understand when the model is realized by Monte Carlo simulation.

Since the graph in Fig. 5 and the NRD concept cannot be realized by standard Markov methods, they may be realized by Monte Carlo methods as described above or by other numerical procedures, for example, as suggested in [18] and [19]. For some special maintenance strategies, e.g., periodic inspections, the model may be solved analytically using renewal theory [11], [20], [21]. When deterioration can be measured by a continuous quantity, stochastic processes such as the gamma process or Brownian motion may alternatively be applied in maintenance modeling; see, e.g., [22]–[25].

Regardless of the applied concept, when Monte Carlo simulation is used to solve a model, assumptions have to be made about the appropriate probability distributions. In general, more flexible distributions (e.g., two-parameter distributions such as the Weibull or the gamma distribution) perform better than the exponential distribution, which is limited by its constant transition rate. The best way to find an appropriate probability distribution is to collect data and fit a distribution to the data set. However, data collection is not practical in many cases: it may take a long time, or the system under consideration may be unique, so that no useful data set can be obtained. Furthermore, exact sojourn times are difficult to observe, or continuous monitoring would be required to obtain good observations of them. Some of these topics are discussed in more detail in [26] and [27]. Knowledge about the deterioration process may help to define a suitable probability distribution for the sojourn times.
For example, if deterioration is caused by a series of randomly occurring shocks, the gamma distribution may be a good model. If deterioration is caused by several “competing” deterioration mechanisms, the Weibull distribution might be a good choice (weakest-link theory). For repair times and inspection intervals, it is usually easier to collect data than for sojourn times. The lognormal distribution is considered a suitable distribution for repair times [9]. Inspection intervals are closer to deterministic than to exponentially distributed. If inspection intervals are not modeled as deterministic numbers, a Weibull or gamma distribution with comparatively low variance might be an appropriate choice.

D. Meaning of the Last Inspection Rate

Consider a situation where the system is known to be in deterioration stage $S_2$. The operator will therefore schedule the next inspection after approximately $1/\lambda_{I2}$ time units. We now assume that the system deteriorates by one stage, to $S_3$, and that the next planned inspection reveals this deterioration state. According to the predefined strategy, the maintenance action is carried out and the system condition is improved to $S_2$. The next inspection is then scheduled after approximately $1/\lambda_{I2}$ time units.

The state diagram in Fig. 3 shows an inspection rate denoted $\lambda_{I3}$, which is situated between $S_3$ and $I_3$. In the real maintenance situation, $\lambda_{I3}$ does not exist. However, this rate is needed to build the state diagram with an inspection that reveals state $S_3$ and a subsequent maintenance action that improves the system by one stage. This means that there is again a mismatch between the classical state diagram and the maintenance situation in the real world. The proposed alternative in Fig. 5 gives a better representation of the real situation, because it does not require $\lambda_{I3}$.

Certainly, there are situations where an inspection reveals that the system is in state $S_3$ and the decision is then taken to do nothing more than increase the inspection frequency to $\lambda_{I3}$. In that case, $\lambda_{I3}$ has a practical meaning. However, the maintenance strategy considered here does not allow for this case. Even though this option could be included in the model, it is clearly not a standard case, because there is a high risk of equipment failure in the next time period. It would therefore only be chosen in rare situations where the equipment is indispensable for a short period of time. Another case in which maintenance can return the system to $S_3$ is when the maintenance action fails to achieve its intended purpose. Usually, one does not know that the maintenance action has failed. One would continue to operate the system under the assumption that maintenance was successful, until a future inspection revealed that it was not or until a failure occurred.

E. Visit Frequencies

Reliability measures commonly calculated with Markov processes include the visit frequency and the mean duration of states. In a Markov process, the visit frequency of state $S_i$ is the frequency of arrivals in, or equivalently departures from, state $S_i$. If we consider, for example, visits to the first deterioration state $S_1$, then each transition from $S_1$ to $S_2$ and each transition from $S_1$ to $I_1$ is counted as one departure from $S_1$ in the Markov process. In reality, however, the system does not leave state $S_1$ when there is a transition to $I_1$: according to the predefined maintenance policy, no maintenance action is carried out and the system remains in $S_1$. This means that the visit frequency of state $S_1$ calculated with standard Markov methods is the frequency of all departures from $S_1$, including the imaginary departures from $S_1$ to $I_1$. This frequency has no practical meaning and exists only in the Markov process model.
It is not the frequency of visits that we are really interested in when we calculate reliability measures. The same argument applies to the mean duration of the first deterioration state calculated with standard Markov methods: it is the mean duration of each imaginary state visit in the model. It would be more useful to compute the real visit frequency and mean duration of the first deterioration state. The real mean duration can be defined as the average length of the time interval between the commissioning of a new or maintained system and the time when the system finally leaves the first deterioration state because of a (physical) deterioration to the next deterioration stage. Depending on whether or not the system deteriorates during inspections, the inspection durations have to be included in or excluded from this time interval. Under this definition of state duration, one run through this time interval counts as one state visit. It is recommended to use methods that are capable of computing the real visit frequency and state duration. This can be done by means of a model realization based on the NRD concept (e.g., Monte Carlo simulations using the NRD concept). In the examples in Section V, numerical values are presented for the different definitions of visit frequency and mean duration in the deterioration states.

V. NUMERICAL EXAMPLE

The maintenance policy analyzed in this example is the one presented in Section II-A. The state diagram formulated in the classical way is shown in Fig. 3. The model is a simplification of the models presented in [1], [3], [4], and [8]. The transition rates between the states are constant (see Table I).

TABLE I: TRANSITION RATES

It is claimed [1], [3], [4] that standard Markov methods can be used to calculate state probabilities, visit frequencies and mean durations, that is, the model can be realized mathematically by a Markov process.
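As a point of reference for what "standard Markov methods" compute, the Python sketch below solves the steady-state equations of a continuous-time Markov chain with generator matrix Q (balance equations πQ = 0 together with Σπ = 1) and derives the usual visit frequency (π_i times the total departure rate of state i) and mean duration per visit (the reciprocal of that departure rate). It is a minimal sketch, assuming constant rates and an irreducible chain; the function names and the example rates are hypothetical and are not the values of Table I. Note that a visit frequency computed this way counts every departure, including the imaginary departures to an inspection state discussed in Section IV-E.

```python
def steady_state(Q):
    """Stationary distribution pi of an irreducible CTMC with generator Q.

    Solves the balance equations pi Q = 0 with sum(pi) = 1 by Gaussian
    elimination with partial pivoting (plain Python, no external packages).
    """
    n = len(Q)
    # Transpose so that row i reads: sum_j pi_j * Q[j][i] = 0.
    A = [[float(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # One balance equation is redundant; replace it by the normalization.
    A[-1] = [1.0] * n
    b[-1] = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution on the upper-triangular system.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        pi[r] = (b[r] - sum(A[r][c] * pi[c] for c in range(r + 1, n))) / A[r][r]
    return pi


def markov_measures(Q):
    """State probabilities, visit frequencies and mean durations per visit."""
    pi = steady_state(Q)
    out_rates = [-Q[i][i] for i in range(len(Q))]  # total departure rate of i
    freq = [p * q for p, q in zip(pi, out_rates)]  # visits per unit time
    mean_dur = [1.0 / q for q in out_rates]        # mean sojourn per visit
    return pi, freq, mean_dur


if __name__ == "__main__":
    # Hypothetical rates per year for the states D1, I1 (inspection of D1), D2.
    Q = [
        [-1.25, 1.0, 0.25],    # D1 -> I1 (inspection), D1 -> D2 (deterioration)
        [365.0, -365.0, 0.0],  # I1 -> D1 (inspection lasts about one day)
        [0.5, 0.0, -0.5],      # D2 -> D1 (hypothetical restoration)
    ]
    pi, freq, mean_dur = markov_measures(Q)
    print(pi, freq, mean_dur)
```

In this toy chain, the visit frequency reported for D1 is π_D1 × 1.25 per year: it includes the departures to I1 at 1.0 per year, although the system physically leaves D1 only at 0.25 per year.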
It can easily be shown that this Markov process generates the same results as Monte Carlo simulations based on the RD concept. The results for the steady-state solution of the Markov process are presented in Table II, where the columns with the results obtained by standard Markov methods are denoted "RD" (since the RD concept is realized). The mean time between failures (MTBF) and the mean time to first failure (MTTFF) have also been computed (Table III).

Developing analytical expressions for maintenance models such as the one described here is difficult, if not impossible. It is therefore suggested to carry out calculations based on the NRD concept by Monte Carlo simulation. The simulation follows the description in Section IV-A. Random transition times are generated; the next time of system deterioration is drawn only when there is a physical state change due to deterioration or maintenance, and inspection times are drawn only when decisions are made after an inspection or after a maintenance action.

In order to illustrate the difference between the two possible definitions of the state duration and the visit frequency in the deterioration states (as discussed in Section IV-E), two solutions are computed (see Table II). They are denoted "NRD-a" and "NRD-b": "NRD-a" means that each departure from the first deterioration state to its inspection state is counted as one visit, whereas "NRD-b" means that such imaginary departures are not counted as a state visit.

TABLE II: STATE PROBABILITIES, VISIT FREQUENCIES, AND MEAN DURATIONS
Models using the RD concept: Markov process. Models using the NRD concept: Monte Carlo simulation, using two different ways (a, b) of computing visit frequencies and mean durations.

TABLE III: MTBF AND MTTFF

If we compare the results for both models, an apparent and surprising finding is that the MTBF and the MTTFF (see Table III) differ considerably. The results for MTBF and MTTFF calculated with the Markov model are much larger than the corresponding results calculated with the Monte Carlo simulation realizing the NRD concept. The reason for this is the direct connection between the deterioration states and the inspection states in the Markov model: the inspection rate in the Markov model increases as soon as there is a state transition to a worse deterioration state, whereas in the real world the inspection frequency remains at its current value until the next inspection reveals the state transition. Thus, we can expect more frequent inspections in the Markov model, which gives a better chance to correct the critical situation by maintenance and to extend the lifetime of the equipment. This is confirmed by the results. The inspection states associated with the more advanced deterioration stages have higher visit frequencies in the Markov model than the equivalent states in the alternative model. Note that the state equivalent to such an inspection state of the Markov model is, in the alternative model, an inspection revealing that the system is in the corresponding deterioration state; this state is shown in Fig. 5. Furthermore, it is not surprising that, compared with the alternative model, the Markov model gives a higher steady-state probability for the "good" first deterioration state and lower probabilities for the "bad" states (the failure state F and the more advanced deterioration states), because the more frequent inspections in the Markov model give a higher chance to detect critical deterioration and to intervene by maintenance.

A. Influence of the First Inspection Rate

In this section, the dependency of the MTBF on the first inspection rate is analyzed. Fig. 6 shows plots of the MTBF as a function of the first inspection rate for both models.

Fig. 6. MTBF as a function of the first inspection rate, for different values of the second inspection rate. RD: Markov process/RD concept; NRD: NRD concept.

The focus is on the model behavior as the first inspection rate approaches zero, which means that the first inspection interval becomes very long. Thus, if a new system is put into operation, in practice there will be no inspection or maintenance before the system fails. In this situation, the MTBF is the sum of the expected sojourn times of the system in the deterioration states when no maintenance is carried out. A case without inspections and maintenance could obviously be represented by a simplified state diagram as shown in Fig. 1(a). This simplified model represents only the situation in which the first inspection rate is zero. However, maintenance models are also used for analyzing the relationship between the inspection rates and the MTBF or MTTFF; see, e.g., [8]. The complete (nonsimplified) model is required for such analyses, and its MTBF should converge to this sum of expected sojourn times as the first inspection rate converges to small values.

The Monte Carlo simulations realizing the NRD concept, denoted "NRD" in Fig. 6, show this convergence. The Markov process (RD concept) converges as well. The convergence limit, however, is not as expected: it depends on the choice of other model parameters, for example the second inspection rate. This dependency is clearly wrong, because the second inspection rate will never be applied once the decision is made that no further inspections are carried out. Thus, the Markov model yields incorrect results.
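The convergence argument can be reproduced qualitatively with a small Monte Carlo experiment. The Python sketch below simulates the time to first failure of a new system under two realizations: in the NRD variant, the inspection schedule depends only on the state revealed at the last inspection, whereas in the RD variant the inspection clock is redrawn with the rate of the true current state at every state change, as the classical state diagram implies. It is a minimal sketch under stated assumptions, not the simulation used for Fig. 6: three deterioration stages with exponential sojourn times, exponential inspection intervals, instantaneous inspections and maintenance, and a hypothetical policy in which an inspection revealing any stage other than the first triggers maintenance that improves the system by one stage. All rates and function names are hypothetical.

```python
import random


def time_to_failure(det_rates, insp_rates, rng, nrd=True):
    """Simulate one time to first failure of a new system.

    det_rates[k]  : rate of deterioration out of stage k; reaching stage
                    len(det_rates) means failure.
    insp_rates[k] : inspection rate used when the schedule is based on stage k.
    nrd=True      : NRD realization, the schedule follows the state revealed by
                    the last inspection (or produced by the last maintenance).
    nrd=False     : RD realization, the inspection clock is redrawn with the
                    rate of the true state at every state change.
    Inspections and maintenance are assumed to take no time.
    """
    n_stages = len(det_rates)
    t, state = 0.0, 0
    next_det = t + rng.expovariate(det_rates[state])
    next_insp = t + rng.expovariate(insp_rates[state])
    while True:
        if next_det <= next_insp:
            # Physical deterioration by one stage.
            t, state = next_det, state + 1
            if state == n_stages:
                return t
            next_det = t + rng.expovariate(det_rates[state])
            if not nrd:
                # RD: the diagram ties the inspection rate to the true state.
                next_insp = t + rng.expovariate(insp_rates[state])
        else:
            t = next_insp
            if state > 0:
                # Hypothetical policy: maintenance improves the system one stage.
                state -= 1
                next_det = t + rng.expovariate(det_rates[state])
            # A decision was made, so the next inspection is scheduled from the
            # state now known to the operator (same in both realizations).
            next_insp = t + rng.expovariate(insp_rates[state])


def mean_time_to_failure(det_rates, insp_rates, runs, seed, nrd=True):
    """Average of `runs` simulated times to first failure (seeded RNG)."""
    rng = random.Random(seed)
    total = sum(time_to_failure(det_rates, insp_rates, rng, nrd) for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    det = [1 / 4, 1 / 3, 1 / 2]  # hypothetical mean sojourn times 4, 3, 2 years
    insp = [1e-9, 2.0, 4.0]      # first inspection rate practically zero
    print("NRD:", mean_time_to_failure(det, insp, 20000, seed=1, nrd=True))
    print("RD: ", mean_time_to_failure(det, insp, 2000, seed=1, nrd=False))
```

With a practically zero first inspection rate, the NRD estimate should approach 4 + 3 + 2 = 9 years, the sum of the mean sojourn times, while the RD estimate stays far above it. Changing the second inspection rate moves the RD result but leaves the NRD result essentially unchanged, mirroring the behavior described for Fig. 6.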
TABLE IV: MTBF AND STATE PROBABILITIES FOR THE CASE WITH PERIODIC INSPECTIONS (inspection rate = 1/year)
RD: Markov process/RD concept; NRD: NRD concept.

The reason for this is again the direct connection between the deterioration states and the inspection states in the classical state diagram, which triggers a change of the inspection frequency in the model as soon as the system deteriorates to a worse state.

B. Periodic Inspections

Finally, a case with periodic inspections is considered, that is, the inspection rate is 1/year in all deterioration states. Both the Markov process (model based on the RD concept) and Monte Carlo simulations based on the NRD concept yield the same results if all transition times are exponentially distributed and all inspection rates are equal (see Table IV). Note that the RD concept and the NRD concept are interchangeable only when the inspection intervals are exponentially distributed.

VI. SUMMARY AND CONCLUSIONS

This paper has discussed the use of state diagrams in maintenance modeling. The focus has been on inspection models. It has been pointed out that the direct connection between the deterioration states and the maintenance and inspection states is a weakness of classical state diagrams. It has been argued that in reality, system deterioration and the maintenance and inspection strategy are two parallel and separate processes that are connected to each other only when new information about the condition of the system is gained by inspections or following a failure. It has been shown that there are two concepts for realizing inspection models mathematically or numerically: the RD concept and the NRD concept. The use of classical state diagrams can tempt the analyst to apply mathematical or numerical methods that realize the RD concept, for example, Markov processes based on such diagrams.
A comparison with the real maintenance situation has shown that this can lead to discrepancies between the maintenance model and the real world. An unrealistic dependency between system deterioration and inspection times may arise. The numerical example presented in this paper has illustrated that this can result in modeling errors when a maintenance strategy with nonperiodic inspections is analyzed. It has been argued that a maintenance model should realize the NRD concept. The proposed alternative graph may help to realize and understand the NRD concept, and the resulting inspection model will be in good agreement with the real-world situation.

The purpose of this paper is not a general critique of state diagrams. The diagrams and the resultant Markov processes can provide useful, simple and correct solutions for different modeling situations. There are many good examples, as well as practical applications, where state diagrams are used as a basis for further modeling steps. When a situation with nonperiodic inspections is analyzed, however, the Markov process is no longer a good representation of reality. Obviously, a model is never an exact representation of reality but a simplification, so there will always be some discrepancies between the model and the real world. However, if the model no longer represents something that is close to the real situation, the validity of the obtained results may be questioned.

ACKNOWLEDGMENT

The author would like to thank Prof. J. Vatn from the Department of Production and Quality Engineering, Norwegian University of Science and Technology, for his helpful suggestions and comments.

REFERENCES

[1] J. Endrenyi, G. Anders, and A. Leite da Silva, "Probabilistic evaluation of the effect of maintenance on reliability—An application," IEEE Trans. Power Syst., vol. 13, no. 2, pp. 576–582, May 1998.
[2] J. Endrenyi, S. Aboresheid, R. Allan, G. Anders, S. Asgarpoor, R. Billinton, N. Chowdhury, E. Dialynas, M. Fipper, R. Fletcher, C. Grigg, J. McCalley, S. Meliopoulos, T. Mielnik, P. Nitu, N. Rau, N. Reppen, A. Salvaderi, A. Schneider, and C. Singh, "The present status of maintenance strategies and the impact of maintenance on reliability," IEEE Trans. Power Syst., vol. 16, no. 4, pp. 638–646, Nov. 2001.
[3] G. J. Anders, J. Endrenyi, and C. Yung, "Risk-based planner for asset management," IEEE Comput. Appl. Power, vol. 14, no. 4, pp. 20–26, Oct. 2001.
[4] G. J. Anders and J. Endrenyi, "Using life curves in the management of equipment maintenance," in Proc. 2002 Probabilistic Methods Applied to Power Systems (PMAPS) Conf., 2002.
[5] J. Endrenyi, Reliability Modeling in Electric Power Systems. Chichester, U.K.: Wiley, 1978.
[6] G. J. Anders, J. Endrenyi, G. Ford, and G. Stone, "A probabilistic model for evaluating the remaining life of electrical insulation in rotating machines," IEEE Trans. Energy Convers., vol. 5, no. 4, pp. 761–767, Dec. 1990.
[7] G. Anders, J. Endrenyi, G. Ford, J. Lyles, H. Sedding, J. Maksymiuk, J. Stein, and D. Loberg, "Maintenance planning based on probabilistic modeling of aging in rotating machines," in Proc. 1992 CIGRE Int. Conf. Large High Voltage Electric Systems.
[8] P. Jirutitijaroen and C. Singh, "The effect of transformer maintenance parameters on reliability and cost: A probabilistic model," Elect. Power Syst. Res., vol. 72, no. 3, pp. 213–224, 2004.
[9] M. Rausand and A. Høyland, System Reliability Theory: Models, Statistical Methods, and Applications. Hoboken, NJ: Wiley-Interscience, 2004.
[10] G. J. Anders, Probability Concepts in Electric Power Systems. New York: Wiley, 1990.
[11] S. M. Ross, Stochastic Processes. New York: Wiley, 1996.
[12] S. Amari and L. McLaughlin, "Optimal design of a condition-based maintenance model," in Proc. 2004 Reliability and Maintainability Symp. (RAMS), pp. 528–533.
[13] A. Jayakumar and S. Asgarpoor, "Maintenance optimization of equipment by linear programming," Probab. Eng. Inf. Sci., vol. 20, pp. 183–193, 2006.
[14] G. Chan and S. Asgarpoor, "Optimum maintenance policy with Markov processes," Elect. Power Syst. Res., vol. 76, no. 6–7, pp. 452–456, 2006.
[15] G. Theil, "Parameter evaluation for extended Markov models applied to condition- and reliability-centered maintenance planning strategies," in Proc. 2006 Probabilistic Methods Applied to Power Systems (PMAPS) Conf.
[16] S. Sim and J. Endrenyi, "Optimal preventive maintenance with repair," IEEE Trans. Reliab., vol. 37, no. 1, pp. 92–96, Apr. 1988.
[17] C. Valdez-Flores and R. M. Feldman, "Survey of preventive maintenance models for stochastically deteriorating single-unit systems," Naval Res. Logist., vol. 36, no. 4, pp. 419–446, 1989.
[18] T. M. Welte, J. Vatn, and J. Heggset, "Markov state model for optimization of maintenance and renewal of hydro power components," in Proc. 2006 Probabilistic Methods Applied to Power Systems (PMAPS) Conf.
[19] T. Welte, "Deterioration and maintenance models for components in hydropower plants," Ph.D. dissertation, Norwegian Univ. Science Technol., Trondheim, Norway, 2008.
[20] M. J. Kallen and J. M. van Noortwijk, "Optimal periodic inspection of a deterioration process with sequential condition states," Int. J. Pressure Vessels Piping, vol. 83, pp. 249–255, 2006.
[21] M. J. Kallen, "Markov processes for maintenance optimization of civil infrastructure in The Netherlands," Ph.D. dissertation, Delft Univ. Technol., Delft, The Netherlands, 2007.
[22] J. M. van Noortwijk, "A survey of the application of gamma processes in maintenance," Reliab. Eng. Syst. Safety, vol. 94, no. 1, pp. 2–21, Jan. 2009.
[23] R. Dagg, "Optimal inspection and maintenance for stochastically deteriorating systems," Ph.D. dissertation, City Univ. London, London, U.K., 1999.
[24] A. Grall, L. Dieulle, C. Berenguer, and M. Roussignol, "Continuous-time predictive-maintenance scheduling for a deteriorating system," IEEE Trans. Reliab., vol. 51, no. 2, pp. 141–150, Jun. 2002.
[25] R. P. Nicola, R. Dekker, and J. M. van Noortwijk, "A comparison of models for measurable deterioration: An application to coatings on steel structures," Reliab. Eng. Syst. Safety, vol. 92, no. 12, pp. 1635–1650, 2007.
[26] M. J. Kallen and J. M. van Noortwijk, "Statistical inference for Markov deterioration models of bridge conditions in the Netherlands," in Proc. 3rd Int. Conf. Bridge Maintenance, Safety and Management, 2006.
[27] T. M. Welte and A. O. Eggen, "Estimation of sojourn time distribution parameters based on expert opinion and condition monitoring data," in Proc. 2008 Probabilistic Methods Applied to Power Systems (PMAPS) Conf.

Thomas M. Welte was born in Böblingen, Germany, in 1976. He received the Dipl.-Ing. degree in mechanical engineering from the University of Stuttgart, Stuttgart, Germany, in 2003 and the Ph.D. degree in safety, reliability, and maintenance from the Norwegian University of Science and Technology, Trondheim, Norway, in 2008. He is a Research Scientist at SINTEF Energy Research, Department of Energy Systems, Trondheim, Norway. He works on maintenance and deterioration modeling, maintenance optimization, and renewal strategies.