
Asset Management & Unavailability Data in Power Plants

Page 1 of 37
H.C. Wels
J.L. Brinkman
Let us assume a utility firm that operates a power plant with one or more production units to
produce electricity. The firm is busy with its core business of production: it takes care of
maintenance, tries to prevent failures from happening, tries to make sure that a failure does not
recur once components have been repaired, and in general keeps improving this process; in
short: Asset Management. The power plant may produce more than electricity alone, such as
water, steam, fly ash and bottom ash. The company may have business contracts with other
parties for delivery of goods and services for or from the plant. Such a firm should be
interested in gathering unavailability data from its assets to study the effectiveness of its
Asset Management processes.
If the unavailability of the firm's plant is sketched (see figure 1) as a function of time, the first
part of the well-known bathtub curve with teething problems may have been encountered
and survived already. Ageing is not felt to be present right now. However, a number of
questions may arise with regard to Asset Management related to this plant.
It is reasonable to assume that the forced unavailability of the power plant has been
documented. However, is a forced unavailability of 5-10 % a low value, a reasonable value
given the layout of the plant or such a high value that a project should be started for
betterment of either the technical systems and components or operation and maintenance?
Which forced unavailability is feasible in the long term? How long should the firm measure to
be sure?
Figure 1
Bottom of the bathtub curve
Regrettably, while neither ageing nor teething problems seem to be present, an event happens
on the steam turbine causing a 1 month outage. Is this High Impact failure an incident that
therefore has a low probability (a High Impact Low Probability or HILP failure), or is the
frequency of such incidents out of bounds? Which components are generally related to HILP
failures, what are the causal factors, and what can management do to influence this?
Due to an increased demand the firm decides to build a new plant. What is the amount of
teething problems that can be expected in the first part of the bathtub curve as per figure 2?
Which components will cause teething problems as a function of technology and possibly
manufacturer? What average value and what range of forced unavailability can one expect
during this period of teething troubles?
Figure 2
Teething problems
An alternative is to extend the life of the old plant. It still has a good availability record despite
its 25 years of operation. Regular investments have been made and the operation and
maintenance departments know precisely how to operate the plant. However, for which
components are ageing (increase in failure frequency) and maintainability problems (increase in
repair time) likely to occur? How large is the uncertainty in the bathtub curve of figure 3?
Figure 3
Life extension of a plant
These Asset Management questions can be answered by analysing failure data of relevant
classes of power plants. In the next paragraphs, a basic qualitative model for unavailability is
sketched. Depending on the questions to answer this model may need further quantification.
It is necessary to take uncertainties into account. The amount of detail should increase
stepwise, starting from a portfolio of plants to the plant itself with its systems and
components. While human factors are not easy to quantify they should be taken into account
at least by recognising their importance.
A simple qualitative model for the forced unavailability of a production plant is shown in the
next figure.
When the new plant is initiated, an investment is made. This investment is important since
the quality of the components as well as the complexity of the plant may result from this. An
investment can also be necessary to modify the plant when system betterments are deemed
necessary, when a layout change or operational change is necessary, spares are to be
bought or pooled, etc. Humans are involved in the design, the choice for manufacturers,
maintenance and operations. All humans make mistakes. Decisions by humans may not
necessarily be optimal in retrospect. As shown in figure 5, management provides the
boundary conditions for investment, operation and maintenance in a feedback loop as a
function of external influences. Therefore these boundary conditions may change in time.
Figure 5 Qualitative model
In most cases, the first period of operation of the plant is in base load with low wear-and-tear
on components. Depending on the state of the art of the components (their failure behavior may
have improved in time), the size of the plant and the anticipated failure behavior, redundancy
may be present within the plant. Decision making that is focused on low installation costs, for
instance with regard to the number and capacity of coal mills, may result in non-optimum
redundancy and large Life Cycle costs which will be hard to resolve once the plant is in
operation.
From a cost perspective, any component is preferably 1 * 100 %. However, if this is a critical
component and it fails, the plant is fully out of operation. Redundancy is an option in this
case. From an Asset Management point of view, the outage costs and other consequences
of failure should be weighed, in a plant portfolio perspective, against the costs of the redundancy
installed. If a 3 * 50 % or a 2 * 100 % redundant system is chosen, failure of a single component
does not result in production curtailment. Such a system has advantages with regard to
maintenance but disadvantages with regard to costs and complexity. Several configurations
are feasible, and simple Reliability Block Diagrams (RBD) provide answers to redundancy
questions using component failure rates, repair times, etc. from databases with information
derived from power plant practice (feedback from experience, "retour d'expérience").
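As an illustration of the redundancy comparison above, the following minimal sketch computes the steady-state unavailability of the three configurations mentioned (1 * 100 %, 2 * 100 %, 3 * 50 %). The failure rate and repair time are hypothetical values, not taken from any database. Components are assumed identical and independent, and common cause failures are ignored, so the results are optimistic for the redundant layouts.

```python
from math import comb

def component_unavailability(failure_rate_per_hr, mean_repair_hr):
    """Steady-state unavailability q = lambda*MTTR / (1 + lambda*MTTR)."""
    x = failure_rate_per_hr * mean_repair_hr
    return x / (1.0 + x)

def system_unavailability(n, k, q):
    """Probability that fewer than k of n identical, independent components work,
    i.e. that a k-out-of-n block cannot deliver full capacity."""
    p = 1.0 - q
    return sum(comb(n, i) * p**i * q**(n - i) for i in range(k))

# Hypothetical component data (assumption, for illustration only)
q = component_unavailability(failure_rate_per_hr=1e-3, mean_repair_hr=50.0)

# name -> (number installed n, number needed for full output k)
configs = {"1 x 100 %": (1, 1), "2 x 100 %": (2, 1), "3 x 50 %": (3, 2)}
results = {name: system_unavailability(n, k, q) for name, (n, k) in configs.items()}

for name, u in results.items():
    print(f"{name}: probability of losing full output = {u:.5f}")
```

With these inputs the 2 * 100 % layout gives the lowest probability of losing full output, the 3 * 50 % layout sits in between, and the single component is the worst case. Weighing this against installation cost is the cost-benefit step described in the text.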
After teething problems in some components have been solved, and preferably reported
as such in databases so that future plants can benefit, a stable period of operation
can be regarded as normal. The unavailability is at the bottom of the bathtub curve. A
Reliability Block Diagram (RBD) with average failure data helps to define a reference value
for the bottom part of the bathtub curve. During this phase of operation, especially if the
number of failures is low, the failure-repair process tends to be deterministic in nature:
the focus is on predicting the next failure on an engineering basis and, if it has occurred, on
solving it. Carrying out systematic Root Cause analysis for all important failures and
implementing the lessons learned makes sure that the failure does not repeat itself. This
is a tough process with many human factors involved. By establishing contacts with sister
plants and analysing those plants that have similar or even identical components, failures
can be prevented. Modifications may be necessary, either due to failures that have occurred or
due to failures that are likely to occur. However, also in this phase of the plant life, surprises
can occur. Surprises are a result of errors made in the construction phase (e.g. a welding error)
or of a design that behaves differently from what is expected (a design error or, e.g., the creep
behavior of a new material). HILP failures are generally caused by combinations of faults (design, operating,
Over the life of a plant, the operating conditions will change since more modern plants will in
time produce at lower costs. Their higher efficiency will lead to lower fuel costs. The less
modern plant is expected to cope with such lower costs. However, when it can no longer
compete in base load, a change in operating conditions such as frequent cycling (starting and
stopping), together with the finite life of e.g. a steam header, can result in a
condition change that is faster than expected or even in an unexpected failure. Component
conditions have to be investigated to find out about their damage state (inspections) to
prevent surprises. Inspections, NDE and condition monitoring form a large part of power
plant maintenance and have implications for future maintenance activities as well as for the
maintenance regimes of components.
It will be clear that the best quantitative models for unavailability should not only model the
influence of time itself, but the types of model will be a function of (life) time also. To model
the first years of operation of the plant without teething problems, simple Reliability Block
diagrams (RBD) based on generic data are sufficient for decision making and forecasting.
The number of plant specific failures is simply insufficient. After a period of operation, it will
be found that the plant is sufficiently different from other plants to warrant plant specific
failures to be taken into account. Further in the life of the plant quantitative models should be
used that model ageing of the components in the plant.
We consider it sufficiently shown from the above that both technical factors as well as human
factors influence the unavailability of a plant. The human factors can both lower and increase
the unavailability. It is relatively easy to wreck a boiler. It is not so easy to systematically
improve a power plant to become an asset delivering more value to the utility.
Capacity expansion planning
Plant reliability and availability data are inputs to determine the risk of insufficient capacity
while matching supply to demand (power, heat, desalinated water). One should be able to
determine this risk as a function of the plant portfolio for existing and future plants running as
base load, cycling load, hot and cold reserve, etc. The resulting probability of insufficient
supply should be used for medium to long term planning and decisions on scrapping, life
extension or mothballing of plant as well as building new plant. The use of plant specific
reliability and availability data in combination with generic plant data (for those components
that have not failed yet in plants) contributes to a better analysis of the risk of undersupply
with the existing asset portfolio and with candidate new plants.
Figure 6
Step 1 in capacity expansion planning
Step 1 in capacity expansion planning should be an estimate of the demand. Peak demand
can be modelled using regression analysis of historical demand as a function of such
parameters as population, GDP, end-user estimates, etc. EPRI CU-6855 [1] is a recommended
report on how to incorporate uncertainty while producing forecasts that are easy to
understand and are therefore accepted within and outside a utility. Since peak demand is
uncertain it pays to analyse the hourly demand patterns for several years. Analyses of peak
demand and energy consumption are complementary.
[1] EPRI CU-6855, "Uncertainty in Forecasting", May 1990.
Figure 7
Step 2 in capacity expansion planning
Step 2 should be a comparison between supply and demand. The simplest comparison
incorporates a reserve factor. The reserve factor should however take into account
maintenance, forced outages as well as grid connections to supply power to areas. The
reserve factor is a function of the portfolio of plants (large / small / type of plant). In Billinton
and Allan [2] very basic generation capacity models can be found that are easily programmed
in Excel. Quality control of the Excel model is simple due to the worked out examples in this
handbook. A more elaborate analysis should use analytical models or large simulation
models (e.g. PLEXOS) with "energy not delivered", "loss of load expectation" (LOLE), etc. as
input parameters. In any such model the forced unavailability of the portfolio of plants is an
important variable, especially when a utility would like to apply a low capacity reserve factor
(say 10 %) which is similar in magnitude to plant forced unavailability (say between 1 % and
10 %).
[2] Roy Billinton and Ronald N. Allan, "Reliability Evaluation of Power Systems", 1984, ISBN 0-273-08485-2.
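A basic generation capacity model of the kind referred to above can be sketched in a few lines; Python is used here instead of Excel. The sketch builds a capacity outage probability table for a small portfolio of two-state units and sums the daily loss of load probabilities into a LOLE. The unit sizes, forced outage rates and daily peak loads are hypothetical, and planned maintenance and grid support are left out.

```python
def outage_table(units):
    """Convolve two-state units into a table {capacity on outage [MW]: probability}."""
    table = {0: 1.0}
    for capacity, forced_outage_rate in units:
        new_table = {}
        for out, prob in table.items():
            # unit available
            new_table[out] = new_table.get(out, 0.0) + prob * (1.0 - forced_outage_rate)
            # unit on forced outage
            key = out + capacity
            new_table[key] = new_table.get(key, 0.0) + prob * forced_outage_rate
        table = new_table
    return table

def loss_of_load_probability(table, installed, load):
    """Probability that the available capacity is below the load."""
    return sum(p for out, p in table.items() if installed - out < load)

# Hypothetical portfolio: four 100 MW units, each with 5 % forced unavailability
units = [(100, 0.05)] * 4
installed = sum(capacity for capacity, _ in units)
table = outage_table(units)

# Hypothetical daily peak loads [MW]
daily_peaks = [250, 300, 320, 280, 310]
lole_days = sum(loss_of_load_probability(table, installed, load) for load in daily_peaks)

print(f"LOLE over {len(daily_peaks)} days: {lole_days:.4f} days")
```

Even in this toy portfolio the result is sensitive to the reserve: the days with a peak just above 300 MW dominate the LOLE, because a single unit outage is then enough to cause a shortfall.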
Figure 8
Step 3 (and further) in capacity planning
Step 3 in capacity expansion planning should define new plant. The decision should be
robust to scenarios for the future and take into account the production costs (fixed and
variable) of new and existing plants, the dispatch characteristics, forced and planned outage
characteristics of new plant, and rehabilitation and longer operation (life extension) of old plant.
Uncertainty in demand and fuel prices is to be taken into account. Long term demand
forecasts (over 30 years) do not need to be very accurate, since the time needed to realise
new plants is relatively short compared to such a long period, assuming that decision making
is an orderly process using regular demand forecasts. However, it is interesting to see that
long term optimum decisions, for instance for a situation with a nuclear portfolio, may affect
decision making in the short term. An example is choosing open cycle GTs instead of the more
expensive CCGT, anticipating that with a future nuclear portfolio such CCGT will not be
operated in base load when the NPPs are in operation.
The plant availability data are to be analysed in relation to actual and expected load on a
regular basis, taking into account the plant's daily dispatch and cost structure. For capacity
expansion planning it is recommended to analyse plant unavailability in a systematic way, on
a yearly basis at least, and preferably with data gathered on a state basis (time stamp and
coding for start, stop for maintenance, start, unplanned outage, etc.). Some systems and
components fail so often that only a few years of information is sufficient to arrive at an
acceptable estimate for reliability parameters. For other systems and components, such as
step-up transformers, one really needs generic data from other plants to arrive at an
acceptable value for forced failure rate, average repair time and unavailability. In order to
enlarge the population and make the estimates less susceptible to statistical outliers, data for
similar plants having the same operational characteristics should be used. These are present
in databases for plant unavailability such as the VGB KISSY database, ORAP and the NERC
databases. The VGB KISSY database has the large advantage that for study purposes, raw
event data including texts can be made available. When operators and maintenance
personnel supply a fair amount of detail (and many still do), these raw data are invaluable.
A plant that does not often run due to its relatively high costs would have a small forced
unavailability on a calendar time basis. However, it should basically function without mishap
when needed for opportunity price windows. Calculating the unavailability "when
needed" for a cycling or reserve plant using an analytical model is by no means simple.
Monte Carlo simulation is a better tool, since it allows differentiating between unavailable time
when the plant is not needed (low costs) and when it is needed (high costs and opportunities
missed). Furthermore one should use energy based dependability data [3] (instead of time
based data) to differentiate between plant deratings (reduced power) and full outages,
especially for large plants. For all purposes, one should analyse unavailability (failure rate,
MTBF, MTTR, etc.) of components from reasonably comparable units as a function of their
age, knowing for which plants due investments in maintenance have been made and, at the
other extreme, which plants have been subject to minimal maintenance, which usually occurs at
the end of commercial life.
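The distinction between downtime when the plant is needed and downtime when it is not can be illustrated with a very small Monte Carlo sketch. Everything in it is an assumption made for illustration: the demand window (06:00 to 22:00 on weekdays), the exponential repair time with a mean of 48 hours, and the rule that failures only show up while the plant is running. It is not a substitute for a full plant simulation model.

```python
import random

def is_needed(hour_of_week):
    """Hypothetical demand window: 06:00-22:00, Monday to Friday."""
    day, hour = divmod(int(hour_of_week) % 168, 24)
    return day < 5 and 6 <= hour < 22

def needed_repair_hours(start_hour, repair_hours, step=0.25):
    """Part of a repair window that overlaps the demand window (step integration)."""
    n_steps = int(round(repair_hours / step))
    return sum(step for i in range(n_steps) if is_needed(start_hour + i * step))

def fraction_of_downtime_when_needed(n_failures=2000, mean_repair_hours=48.0, seed=1):
    """Share of all repair hours that falls inside the demand window."""
    rng = random.Random(seed)
    total = needed = 0.0
    for _ in range(n_failures):
        # Assumption: a failure reveals itself while the plant is running
        start = rng.uniform(0.0, 168.0)
        while not is_needed(start):
            start = rng.uniform(0.0, 168.0)
        repair = rng.expovariate(1.0 / mean_repair_hours)
        total += repair
        needed += needed_repair_hours(start, repair)
    return needed / total

fraction = fraction_of_downtime_when_needed()
print(f"share of downtime inside the demand window: {fraction:.2f}")
```

Only the share of downtime that falls inside the demand window carries the high costs and missed opportunities mentioned above; the remainder could be valued at a much lower cost in a cost-benefit analysis.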
A large external grid for assistance in case of outages and for trading purposes will result in
a lower reserve factor for the same loss of load expectation (LOLE). The LOLE rises
sharply with reduced reserve factor [4]. However, a conservative reserve factor could mean
large unused investments in plants and therefore has to be determined with care.
Figure 9
Loss of load expectation as a function of reserve factor
[3] In an energy based scheme a plant is unavailable during an outage that affects the power delivered; in a time based
scheme a plant is available when it is not fully out of operation due to a failure.
[4] In figure 9 LOLE is plotted logarithmically.
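The difference between the time based and the energy based schemes can be made concrete with a short sketch. The plant capacity and the three events are hypothetical; the two functions simply follow the definitions in the note on energy based versus time based schemes.

```python
PERIOD_HOURS = 8760.0   # one calendar year
CAPACITY_MW = 400.0     # hypothetical plant capacity

# Hypothetical events: (duration [h], power lost [MW])
events = [
    (120.0, 400.0),  # full outage
    (300.0, 100.0),  # derating
    (48.0, 200.0),   # derating
]

def time_based_unavailability(events, capacity, period):
    """Only full outages count; the plant is 'available' during deratings."""
    full_outage_hours = sum(hours for hours, lost in events if lost >= capacity)
    return full_outage_hours / period

def energy_based_unavailability(events, capacity, period):
    """Lost energy relative to the energy that could have been produced."""
    lost_energy = sum(hours * lost for hours, lost in events)
    return lost_energy / (capacity * period)

u_time = time_based_unavailability(events, CAPACITY_MW, PERIOD_HOURS)
u_energy = energy_based_unavailability(events, CAPACITY_MW, PERIOD_HOURS)
print(f"time based:   {100 * u_time:.2f} %")
print(f"energy based: {100 * u_energy:.2f} %")
```

For the same event list the energy based figure is nearly twice the time based one, which is why unavailability figures from different schemes cannot be compared directly.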
Power plant concepts
The choice between concepts for an individual plant is a choice for type of plant,
configuration, fuel type, etc.
By having derived quality availability parameters such as failure rate and repair time for plant
components from databases, optimum decisions on layout can be made with large
consequences for Asset Management during the life of the plant. Detailed concepts as well
as redundancy (for instance 1, 2 or 3 series or parallel components) can be optimised using
Block Diagram type analyses in combination with (probabilistic) cost-benefit analysis.
A typical example is the choice for a flexible plant having 2 smaller but mature gas turbines
delivering steam through heat recovery boilers to a steam turbine or a process requiring
steam such as city heating, cogeneration or desalination, versus an alternative such as a
single-shaft plant with a larger, more efficient and advanced gas turbine and steam turbine.
Steam to city heating can also be provided by other means (for instance
separate boilers). The single-shaft plant may be cheaper in installation costs but is clearly less
flexible when the shaft is rigidly coupled to the steam turbine. The 2 gas turbine plant will be more
expensive with regard to installation costs and maintenance costs; however, the probability
that zero electricity or zero steam is produced is substantially smaller compared to the single-shaft plant.
Figure 10
Simple RBD for a plant with 2 gasturbines and 1 steamturbine
The probability that zero electricity or zero steam is generated is easily calculated using
Reliability Block diagrams such as shown above. For a plant with 2 or more products, the
so-called cut sets define when and how often product 1, product 2, the combination of products
etc. is not delivered. The inputs are failure rates (expected number of failures per hour) and
repair times for systems and components. Typical similar layout and concept applications are
decisions on auxiliary burners, the choice for a separate GT exhaust for open cycle
gas turbine operation during boiler outages, optimum redundancy with regard to coal milling
equipment, and the classical example of redundant feedwater and condensate pumps.
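For a small diagram the calculation can be checked by enumerating all component states instead of deriving cut sets by hand. The sketch below does this for a plant with 2 gas turbines, 2 heat recovery boilers (HRSG) and 1 steam turbine. The component unavailabilities are hypothetical, components are assumed independent, and it is assumed that each gas turbine can run in open cycle through a separate exhaust when its boiler is out.

```python
from itertools import product

# Hypothetical steady-state unavailabilities per component
q = {"GT1": 0.03, "HRSG1": 0.02, "GT2": 0.03, "HRSG2": 0.02, "ST": 0.02}

def zero_electricity(up):
    # Assumption: a gas turbine can run in open cycle when its boiler is out,
    # so electricity is only lost completely when both gas turbines are down.
    return not (up["GT1"] or up["GT2"])

def zero_steam(up):
    # Steam needs at least one complete gas turbine + boiler train.
    train1 = up["GT1"] and up["HRSG1"]
    train2 = up["GT2"] and up["HRSG2"]
    return not (train1 or train2)

def probability(event):
    """Sum the probability of all component state combinations in which event holds."""
    names = list(q)
    total = 0.0
    for states in product([True, False], repeat=len(names)):
        up = dict(zip(names, states))
        p = 1.0
        for name in names:
            p *= (1.0 - q[name]) if up[name] else q[name]
        if event(up):
            total += p
    return total

p_zero_electricity = probability(zero_electricity)
p_zero_steam = probability(zero_steam)
print(f"P(zero electricity) = {p_zero_electricity:.5f}")
print(f"P(zero steam)       = {p_zero_steam:.5f}")
```

In this layout the minimal cut set for zero electricity is {GT1, GT2}, while zero steam has four minimal cut sets (any combination of one failed component in each train), which is why its probability is higher.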
Even for components and plants that just have come from the design office, one would like to
see the availability characteristics compared to similar components or at least of those
components that operate in similar plants. This is possible to some extent since any new
item will consist of components with known behaviour while for new components engineering
judgement can be applied comparing these new components to existing components.
A Pareto analysis and resultant diagram is a very useful instrument to decide what
components are dominant with respect to unavailability. Only this limited number of dominant
components has to be analysed further, since they already define 50 % - 80 % of the plant
unavailability. For a Pareto type of analysis one needs to pinpoint the system / component
involved, analyse its contribution to forced unavailability and sort components such that the
dominant components become clear.
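The sorting step of a Pareto analysis is simple enough to sketch. The contributions below are hypothetical numbers attached to KKS style system names; they are not taken from the KISSY database. The function returns the smallest set of systems that together reach a chosen share of the total forced unavailability.

```python
# Hypothetical contributions to forced unavailability [%] per system
contributions = {
    "MB - Gas turbine plant": 1.9,
    "MA - Steam turbine plant": 0.8,
    "HA - Pressure system": 0.6,
    "MK - Generator plant": 0.3,
    "LA - Feedwater system": 0.2,
    "LB - Steam system": 0.1,
    "other": 0.1,
}

def dominant_components(contributions, threshold=0.8):
    """Rank by contribution and keep adding until the threshold share is reached."""
    total = sum(contributions.values())
    ranked = sorted(contributions.items(), key=lambda item: item[1], reverse=True)
    dominant, cumulative = [], 0.0
    for name, value in ranked:
        if cumulative / total >= threshold:
            break
        dominant.append(name)
        cumulative += value
    return dominant

dominant = dominant_components(contributions)
for name in dominant:
    print(name)
```

With these illustrative numbers three of the seven systems already cover more than 80 % of the total, so further analysis can be limited to those three.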
A data gathering system such as NERC or VGB's KISSY allows analyzing systematically
which components, over a certain time window and for a specific class of plants, define the
majority of plant unavailability. Having a data gathering system within the utility set up in a
similar way and having the same coding allows plant specific data to be easily uploaded to the
generic database as well as easy comparison with the generic database data. Bayesian
statistics are an accepted and systematic way of mixing plant specific and generic data.
When carrying out such a Bayesian analysis one should clearly understand:
- The statistical uncertainty in mean value for the generic data (usually small)
- The statistical uncertainty in mean value for plant specific data (large to small
  depending on the number of failures)
- The uncertainty in realisation of unavailability over a period, given a fixed and certain
  mean value, as a result of the statistical distribution of failure and repair times
- The uncertainty due to differences between plants and realisations of failures for the
  plants from which the generic data have been derived. Plotting failure data per plant
  is recommended to assess such differences.
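One common way to mix generic and plant specific failure data is a conjugate gamma-Poisson update of the failure rate; whether this particular form suits a given data set is a judgement call. In the sketch below the generic prior is expressed as an equivalent number of failures over an equivalent number of hours, and all numbers are hypothetical. How much weight the prior gets should reflect the plant-to-plant differences named in the last point above, not just the (usually small) statistical uncertainty of the generic mean.

```python
from math import sqrt

def gamma_posterior(prior_failures, prior_hours, plant_failures, plant_hours):
    """Gamma-Poisson update of a failure rate.

    Prior: Gamma(shape=prior_failures, rate=prior_hours).
    Data:  plant_failures observed in plant_hours of exposure.
    Returns the posterior mean and standard deviation [failures per hour].
    """
    shape = prior_failures + plant_failures
    rate = prior_hours + plant_hours
    return shape / rate, sqrt(shape) / rate

# Hypothetical generic prior: mean 1e-4 per hour, weight of 2 failures
generic_mean = 2 / 20000.0
# Hypothetical plant specific experience: 6 failures in 30000 hours
plant_mean = 6 / 30000.0

posterior_mean, posterior_std = gamma_posterior(2, 20000.0, 6, 30000.0)
print(f"generic:   {generic_mean:.2e} per hour")
print(f"plant:     {plant_mean:.2e} per hour")
print(f"posterior: {posterior_mean:.2e} +/- {posterior_std:.2e} per hour")
```

The posterior mean lies between the generic and the plant specific estimate and moves towards the plant specific value as more operating hours and failures are accumulated.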
[Chart: Analysis of Unplanned Unavailability 2004 - 2008 (not postponable), causers in all areas (KKS function keys F1: A to Z), collective: combined cycle units up to 300 MW. Shown per KKS function: unavailability incidents per block and year, and unavailability [%]. KKS functions listed: HA - Pressure system, feedwater and steam sections; HF - Bunker, feeder and pulverizing; HH - Main firing system (electric powered as well); LA - Feedwater system; LB - Steam system; MA - Steam turbine plant; MB - Gas turbine plant; MK - Generator plant; MM - Compressor plant; XA - Steam turbine plant; other (no KKS function key).]
Figure 11
Pareto analysis showing the main contributions for unavailability (energy as
well as number of incidents) derived from VGB's KISSY database
In order to model a power plant concept bottom up from its components, one needs at least
failure data defined at the three-letter KKS code level. This is a sufficient level of detail to
pinpoint systems and, in some cases, components. An analysis of some raw KISSY flue gas
ventilator data is given in figure 12. From the raw event type data, the failure rate per hour and
the average unavailable (repair) time are calculated. By analysis of partial outages and full
outages, one is able to assess common cause failures. During common cause failures, more than 1
component in a redundant system is out of operation, for example due to an operating error, a
maintenance error, effects of external conditions, etc. Please note that the fraction that can
be calculated from the figure (49 / (91 + 49) = 35 %, from cells C43 and C83) is much higher
than the usual common cause fraction of 10 % found in handbooks. One reason might be that
the plant management wants to take both components out of operation to start repair or wants
to investigate whether a problem in component 1 is also present in component 2.
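To show why the size of the common cause fraction matters, the sketch below reproduces the 35 % from the two counts quoted above and feeds it into the beta-factor approximation for a redundant pair. The beta-factor model is a standard simplification that is brought in here as an assumption, and the component unavailability of 2 % is hypothetical.

```python
def common_cause_fraction(common_cause_events, other_events):
    """Share of events in which more than 1 redundant component was out."""
    return common_cause_events / (common_cause_events + other_events)

def redundant_pair_unavailability(q, beta):
    """Beta-factor approximation for a 1-out-of-2 system:
    independent part ((1 - beta) * q)^2 plus common cause part beta * q."""
    independent = ((1.0 - beta) * q) ** 2
    common = beta * q
    return independent + common

beta_observed = common_cause_fraction(49, 91)   # counts quoted in the text
q = 0.02                                        # hypothetical component unavailability

u_handbook = redundant_pair_unavailability(q, 0.10)
u_observed = redundant_pair_unavailability(q, beta_observed)
print(f"observed common cause fraction: {100 * beta_observed:.0f} %")
print(f"pair unavailability, beta = 10 %: {u_handbook:.6f}")
print(f"pair unavailability, beta = 35 %: {u_observed:.6f}")
```

In both cases the common cause term dominates the result, so the benefit of redundancy calculated with independent failures only would be far too optimistic.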
Figure 12
Common cause failure analysis
In this area of failure data application, the ability of the plant to deliver a certain amount of
energy production may be analysed. The availability data can be sourced from
manufacturers, contractors, etc. to assess the new power plant. However, these data should
be judged for credibility as well as for definitions applied, applicability to contracts and
guarantees etc. The data can be used for decision information to choose between contracts.
While this area of application is not fully Asset Management, the data originate from plants
that are being managed and the choices made will directly influence Asset Management
during operation of the new plant.
Figure 13
Data sourced from manufacturers when contracting a new plant
Production in Asset Management is the operation process (monitoring, control, service and
maintenance) of the plant. By the start of operation of a new plant, estimates for the
probability of failures, the average duration of production curtailments, availability, etc.
preferably are known. By setting targets right away or after having dealt with teething
troubles, one is able to measure the performance of the plant against these measures.
Targets should be realistic and deviations from these targets should be analysed taking
uncertainties into account. Such uncertainties will show when analysing otherwise identical
sister plants.
Figure 14
Target for plant X
The data in figure 14 show that the target for unplanned equivalent hours is met after
a period of teething troubles. However, the target for planned hours is not met and
asset managers should investigate this further.
It is obvious that the definitions for planned and unplanned outages are important.
There is no universal agreement on what definitions to take. For example NERC and
VGB definitions differ as shown in the following figures. NERC definitions originate from a
time based approach; VGB definitions are essentially energy based. A plant is available in
NERC definitions during a power curtailment (derating), whereas in VGB definitions
the plant is unavailable in such a system state. Comparisons should therefore be made
using identical definitions as far as possible.
Figure 15
VGB definitions
Figure 16
NERC definitions
Contracts to operate plants or taking over the operation of a plant abroad or in one’s own
country are increasingly becoming usual. These contracts should contain precisely defined
indicators that allow calculating whether or not the contract is fulfilled. Availability indicators
can for example be coupled to a penalty-incentive system.
In figure 17 an example is given of uncertainty within the firm itself due to several
registration systems and to interpretations by the plant, by the dispatch department and by
corporate. Differences in calculation may also be present. In columns B to D registration is on
the basis of calendar time. The calculations by Reliability Block diagrams in column E
essentially indicate availability "when needed" under semi-base load conditions. Semi-base
load means that the time window for need of the plant is large (weekly at least, not
daily) and that effects such as a repair time longer than the (daily) need, cycling effects, etc. have
not been modelled in the RBD, except for being implicitly present due to data from plants that
were cycling.
Figure 17
Different unavailability assessments in 1 company
Weak points among a plant's components have a higher than generic average failure rate or
unavailability. Asset Management should improve these weak points on a cost-benefit basis.
By comparing for instance superheater failure data between plants, it was found that some
plants had troublesome constructions and/or were subject to minimal maintenance, which
explained the significant difference in failure rate. As is often evident from the data, in many
plants a large fraction of failures return within less than a week. The analysis of
weak points and the return probability of failures makes sure that Asset Management
budgets for betterment are spent on the dominant systems for unavailability and on the
dominant problems.
Figure 18
Weak point: superheater failures for plants no 7 and 19-21
Modification of existing concepts / plants
The basis for modification can be a large number of failures, betterment of availability or
efficiency, inability to acquire spare parts due to obsolescence, etc. The modified plant may
have an increased power output compared to the original. A trend analysis of the failure rate to
underpin the need for modification is easy to carry out when sufficient plant specific data are
gathered. The controls in the next figure show an increasing average repair time; the flue gas
fans show an increasing number of failures [5].
[5] The difference between the 2 series of data is related to separate failure data registration schemes.
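One simple way to check whether failures are becoming more frequent, not necessarily the method behind the figure, is the Laplace trend test for a time-truncated observation period. The failure times below are hypothetical. A statistic well above zero points to an increasing failure rate; about 1.645 corresponds to a one-sided significance level of 5 %.

```python
from math import sqrt

def laplace_trend_statistic(failure_times, observation_period):
    """Laplace test for a trend in failure times observed over (0, observation_period).

    Approximately standard normal when the failure rate is constant;
    positive values indicate failures clustering late in the period (ageing).
    """
    n = len(failure_times)
    mean_time = sum(failure_times) / n
    return (mean_time - observation_period / 2.0) / (observation_period * sqrt(1.0 / (12.0 * n)))

# Hypothetical failure times of one system [years since start of registration]
failure_times = [2.0, 5.0, 6.5, 7.5, 8.2, 8.8, 9.3, 9.7]
observation_period = 10.0

u = laplace_trend_statistic(failure_times, observation_period)
print(f"Laplace statistic: {u:.2f}")
print("increasing trend at 5 % level" if u > 1.645 else "no significant increase")
```

The same test applied to repair durations is not meaningful; for an increasing average repair time a regression of repair time against calendar time would be a more natural choice.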
Figure 19
Ageing of components (controls, flue gas fans)
For other critical components such as generators and step-up transformers, where one
cannot tolerate High Impact failures, one should derive bathtub curves from generic data,
taking expert judgement on the component into account. In the next figures, a bathtub curve is
shown for a reasonably sized generator, resulting from analysis of a VGB generator
database. Clearly both rotor and stator show generic ageing issues. The rotors in the
database show teething problems.
On the vertical axis in figure 20 is the relative frequency of problems, corrected as well as
possible for the number of units; on the horizontal axis is the age of the generators. The
database is populated by experts who share technical information on the problems. How
many generators had such problems and, equally important, how many generators did
not have problems (in short: what is the exposed population) needed to be estimated
separately from the VGB database.
Figure 20
Ageing of components (generator rotor and stator)
Changing the operation of a plant
Let us assume that, due to a different merit order because of changing market conditions,
fuel price, efficiency of new plants etc., it becomes necessary to start and stop a plant daily,
otherwise known as cycling. To what extent will this influence the forced unavailability and
what measures can Asset Managers take? The first step in this analysis should be a
qualitative one: carrying out an FMECA that takes plant specific expert judgement into account,
pinpointing the more sensitive components. Generic data are necessary to direct the
attention to the components that are expected to be the most susceptible to cycling. These
components can in principle be derived from an analysis of international databases such as
NERC or VGB by analysing the subset of plants in the database that cycle and preferably
comparing it with the same subset that has also been operating in base load. Given a list of such
components, the databases in combination with expert judgement certainly allow quantification.
This type of analysis is shown in figure 21. Clearly the failure rate increases with starts per
year. The increase however is not without question marks.
Figure 21
Cycling analysis using NERC data
Figure 22
A RAMP model applicable to a plant cycling
With a simple RAMP model such as that in figure 22, it can be calculated that for about 50 % of
the duration of repair times the plant is not needed when it cycles daily and is not in
operation during the weekend. Therefore this part of the repair time should not be taken into
account to the same extent as the repair time when the plant is indeed needed. Since the
average repair time and number of failures per operating hour differ per component, a
detailed analysis is not straightforward and needs further modelling. However, given an
estimated increase of the failure rate per component affected by cycling and an estimate of
the number of operator errors (learning effects should be incorporated), one is able to judge
the effects of cycling quantitatively. Of course, such efforts should be accompanied by
assessing the life consumed by cycling, e.g. for HP parts. As shown in figure 23, the
conservative creep and fatigue life calculations that are the design basis should be extended
into a failure probability calculation (Structural Reliability). Figure 23 shows for example the
difference in creep life using minimum material characteristics versus average material
characteristics.
Figure 23
Life reduction of a steam header due to cycling
The costs for spare parts should be weighed against the financial benefits. When the wrong
spares have been bought, not only does the investment not pay off, but substantial additional
costs are incurred for waiting on and implementing a different crash spare. Essential
elements in a cost-benefit analysis are therefore the need for spares (applicability and
frequency) and the difference in repair time warranting the investment. More than 1 spare
might be applicable when a large number of plants use the same spare (pooling).
Simple queuing models are able to estimate the probability that another spare is needed as a
function of inventory during the window in which the item from the failed component is being
repaired or ordered. Simulation models such as in figure 24 do a better job in calculating that
probability. Such an analysis demonstrates the benefits of using certain spares in an overhaul
in combination with keeping that spare as a strategic spare (for instance gas turbine blading, a
spare rotor, etc.). DNV-KEMA uses the RAMP software by Atkins to simulate power plants,
assess spares and optimise maintenance.
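The simple queuing estimate mentioned above can be sketched as follows. This is a minimal illustration, not the RAMP model: it assumes that demands for the spare follow a Poisson process with a constant failure rate and that the replenishment window (repair or re-order time) is fixed. The function name and all numbers are hypothetical.

```python
import math


def prob_stockout(failure_rate_per_year, window_years, n_units, spares):
    """Probability that more spares are demanded during one replenishment
    window than are held in stock (Poisson demand, an assumption)."""
    # Expected number of demands from all pooled units during the window.
    mean_demand = failure_rate_per_year * window_years * n_units
    # Probability that the demand is covered by the stock (0..spares demands).
    p_covered = sum(
        math.exp(-mean_demand) * mean_demand ** k / math.factorial(k)
        for k in range(spares + 1)
    )
    return 1.0 - p_covered


# Hypothetical pool: 4 units, 0.1 failures per unit-year, 6 month window.
for stock in range(3):
    print(stock, round(prob_stockout(0.1, 0.5, 4, stock), 4))
```

Comparing the drop in this probability per additional spare with the cost of that spare gives a first impression of whether a second or third pooled spare is worthwhile.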
Figure 24
Intermediate results of a spares analysis
Even when the number of cases where spares are needed is not precisely known, one can
estimate the fraction of failures in which spares might be needed on the basis of the text from
raw failure data as well as from the distribution of repair times. As shown in figure 25, for the HP
bypass (KKS-code = MAN), 11 % of the failures have a duration of over 24 hours and are thought
to have needed a spare. By counting the number of records in which a spare could have been
applied, on the basis of engineering judgement of the problem described, one arrives at 27
% of the failures. This gives an order of magnitude. Due to the enormously large number of
components in a power plant, in combination with the fact that the operation and maintenance
crew tries to limit failures as much as possible, one cannot do without engineering
judgement. By doing so however, Asset Managers should be able to save several million
Euros on the spares budget.
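The repair-time criterion above can be sketched in a few lines. The 24 hour threshold is taken from the text; the function name and the repair times in the example are hypothetical and do not reproduce the data behind figure 25.

```python
def fraction_long_repairs(repair_hours, threshold_hours=24.0):
    """Fraction of failure records whose repair time exceeds the threshold,
    used as a rough indicator that a spare may have been needed."""
    if not repair_hours:
        raise ValueError("no failure records given")
    long_repairs = sum(1 for hours in repair_hours if hours > threshold_hours)
    return long_repairs / len(repair_hours)


# Hypothetical repair times in hours for one system.
records = [2, 5, 30, 8, 1, 12, 48, 3, 6, 4]
print(fraction_long_repairs(records))
```

The result is only a lower or upper indication; as the text notes, it should be cross-checked against the failure descriptions using engineering judgement.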
Figure 25
Spares analysis by comparing repair times
Application of failure data to maintenance is one of the most difficult areas of application.
However, it is rewarding because of the understanding of failures that can be gained,
which can also be applied to other plants. The problem in general is that data are gathered
at a system level and only for those events where plant unavailability has occurred.
Maintenance optimisation is however carried out at a component level, mainly by estimating
the ageing part of bathtub curves. Weighing preventive costs against corrective costs shows
the interval for which maintenance on a specific component is optimum. Many maintenance
optimisations assume however that a component is as-good-as-new after maintenance, and
this assumption is equivalent to replacing the component with a new one. Trending such as in
figure 19 shows that this is seldom the case: if a component were as-good-as-new, there
would be a straight line without the typical "steps" in the cumulative number of failures.
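The weighing of preventive against corrective costs can be illustrated with the classical age-replacement model. The sketch below is an assumption-laden example, not the authors' method: it uses a two-parameter Weibull life distribution and precisely the as-good-as-new assumption that the text cautions about. All parameter values are hypothetical.

```python
import math


def weibull_reliability(t, beta, eta):
    """Survival probability R(t) for a Weibull life distribution."""
    return math.exp(-((t / eta) ** beta))


def cost_rate(interval, beta, eta, cost_preventive, cost_corrective, steps=1000):
    """Expected cost per unit time when the component is renewed preventively
    at age `interval` or correctively at failure, whichever comes first."""
    # Expected cycle length = integral of R(t) from 0 to interval (trapezoids).
    dt = interval / steps
    cycle_length = sum(
        0.5 * (weibull_reliability(i * dt, beta, eta)
               + weibull_reliability((i + 1) * dt, beta, eta)) * dt
        for i in range(steps)
    )
    r_end = weibull_reliability(interval, beta, eta)
    cycle_cost = cost_preventive * r_end + cost_corrective * (1.0 - r_end)
    return cycle_cost / cycle_length


def optimum_interval(candidates, beta, eta, cost_preventive, cost_corrective):
    """Candidate interval with the lowest expected cost rate."""
    return min(
        candidates,
        key=lambda t: cost_rate(t, beta, eta, cost_preventive, cost_corrective),
    )


# Hypothetical ageing component: beta = 3, eta = 10 years, cost ratio 1:10.
print(optimum_interval(range(1, 31), 3.0, 10.0, 1.0, 10.0))
```

With a shape parameter of 1 (no ageing) the cost rate keeps falling as the interval grows, so preventive replacement never pays off; an interior optimum only appears when the ageing part of the bathtub curve is present.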
The simplest way to apply failure data to maintenance optimisation is to analyse the raw
data from a plant unavailability data gathering scheme and to pinpoint those systems and
those components where maintenance obviously is not removing the failures to the
maximum extent possible. Whereas for Asset Management purposes one would like to have
a root cause registered for every failure, such an effort is disproportionate. Maintenance
optimisation is therefore more an art than a science, requiring such information as raw
failure data, bathtub curves (Weibull modelling) and indications of the time window between
initiation of a failure and actual occurrence of a failure (or the time the component is taken
out of operation to prevent an imminent failure).
The basis for a proper maintenance optimisation is a qualitative description of the
maintenance actions that would be effective against a failure mechanism. If, as is the case
with Reliability Centered Maintenance (RCM), these are charted by means of a Failure Mode
Effect Analysis such as in figure 26, the basic tool to estimate the contribution of
maintenance to unavailability is already present. It certainly pays to estimate the probability of
occurrence by using failure data gathering systems and damage investigations (see figure
27) in addition to the usual expert judgement. The failure rates from failure data, damage
investigations and expert judgement should be comparable.
Figure 26
FMECA correlated with failure analysis
Figure 27
Superheater degradation and failure rate from damage analysis
Figure 28
Markov model
Figure 29
Optimum inspection results using a Markov model
Markov modelling such as in figure 28 is about using discrete system states and their
transition rates. Discrete system states described as "new", "moderate", "bad" and "failed"
are easily assessed by maintenance and operation experts. The result of a Markov model
quantification is an optimised maintenance and inspection schedule without the usual fixed
intervals. Instead of such a schedule, optimum moments for inspection and other
maintenance actions are calculated. By application of this type of modelling, Asset
Management is able to save money on a risk basis by not carrying out inspections during the
stages in component life where analysis shows that degradation will be found only with a low
probability. Inputs into such an analysis are the probability that a component is
in a certain state at the start and the probability that a transition to a degraded state or a
failure state occurs, from which the probability of degraded states can easily be
calculated. The transition rates can be derived from expert judgement, damage analysis and
maintenance / failure data. Asset Management is also able to save money on a risk basis by
intensifying inspection and other maintenance actions when the probability of degradation
rises, thereby preventing failures.
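How the probability of degraded states follows from the initial state and the transitions can be sketched as below. This is a simplified discrete-time version (one step per year) of the Markov model in figure 28, without inspection or repair transitions. The four state names come from the text; the transition probabilities, the threshold and the function names are hypothetical.

```python
STATES = ("new", "moderate", "bad", "failed")

# Hypothetical yearly transition probabilities; each row sums to 1.
TRANSITIONS = [
    [0.90, 0.10, 0.00, 0.00],  # from "new"
    [0.00, 0.85, 0.15, 0.00],  # from "moderate"
    [0.00, 0.00, 0.80, 0.20],  # from "bad"
    [0.00, 0.00, 0.00, 1.00],  # "failed" is absorbing in this sketch
]


def step(distribution, matrix):
    """Propagate the state probabilities by one period."""
    n = len(distribution)
    return [
        sum(distribution[i] * matrix[i][j] for i in range(n)) for j in range(n)
    ]


def state_probabilities(initial, matrix, periods):
    """State probabilities at period 0, 1, ..., periods."""
    history = [list(initial)]
    for _ in range(periods):
        history.append(step(history[-1], matrix))
    return history


def first_period_exceeding(history, state_indices, threshold):
    """First period in which the summed probability of the given states
    exceeds the threshold, or None if it never does."""
    for period, distribution in enumerate(history):
        if sum(distribution[i] for i in state_indices) > threshold:
            return period
    return None


history = state_probabilities([1.0, 0.0, 0.0, 0.0], TRANSITIONS, 20)
# First year in which the chance of finding a "bad" component exceeds 5 %.
print(first_period_exceeding(history, (STATES.index("bad"),), 0.05))
```

Inspecting before the returned period would, under these assumed numbers, find a badly degraded component in fewer than one in twenty cases, which is the kind of argument used to skip early inspections.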
Reports are defined by their content and form and should be arrived at by a systematic way of
working in order to present availability characteristics clearly. Asset Management provides
the data for these reports.
Recording and evaluation must take place in a uniform way, taking account of, for instance,
VGB definitions. In the annex, the VGB input forms are given. Based on the results, a
uniform and regular representation in internal and external reports should be put into
operation. Preferably, time series are to be chosen with corresponding commentaries
to help interpretation. The type of representation should be oriented to the interests of the
parties addressed (technicians, authorities, customers, etc.). Two examples of KISSY VGB
reports are given. In example 1, availability is shown for supercritical units over a range of
time. In example 2, for a specific peer group of plants, the number of units having a certain
availability is given. By plotting one's own plant into this figure, a first step in benchmarking is
Figure 30
Standard KISSY availability report (report dated 3/9/2009 14:30, Verberk, Coen)
Availability numbers. Collective: Supercritical Units. Time range: 2004 - 2008.
Parameters reported: time availability, time utilization, energy availability, energy
utilization, energy unavailability (planned, unplanned postponable, unplanned not postponable).
Nominal power (gross), MW: 53,232; 53,630; 51,045; 51,423; 51,044.
© This document is protected by national and international law.
Figure 31
Standard KISSY benchmarking report (report dated 09/04/2009 11:17, Verberk, Coen)
KISSY - Power Plant Information System: numbers of power plant units over classes of energy
availability. Collective: Fossil fired Units. Time range: 2008. Vertical axis: number of units.
Source: VGB PowerTech e.V., www.vgb.org, Klinkestr. 27-31, D-45136 Essen, kissy@vgb.org
Availability information can also be used in public communication. For this use, detailed
information is not in the foreground. A careful selection of parameters is important in stating
the message. Furthermore, one should ensure that all specialist expressions used
are sufficiently explained in simple language, so that they can be understood by a
non-specialist public. Any values used must nevertheless be built from well defined and
pertinent information based on clear definitions.
Availability data allow optimum planning of the dispatch of present and future power plants
and make possible an economic optimisation of power plant blocks in combination with purchasing
additional energy fuels for the company's own production installations.
An interesting application for Asset Managers is to estimate the probability that short term
contracts are not fulfilled by the use of Markov models such as shown in the next figures.
The resulting reliability and unavailability as a function of time in figure 33 clearly show
that, given that at time (t) a plant is repaired and in operation, the risk of no delivery at time (t
+ dt), with dt not too large, is indeed smaller than the average unavailability that is for example
calculated on a yearly basis. The long term steady state of the operation-and-repair process
is valid only for such an average value. Therefore the average unavailability is larger than the
short term value.
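The difference between the short term and the long term value can be made concrete with the simplest possible Markov model: two states (operating, in repair) with a constant failure rate and a constant repair rate. This is a sketch under those assumptions and is simpler than the model of figure 32; the rates used are hypothetical.

```python
import math


def unavailability(t_hours, failure_rate, repair_rate):
    """Probability that the plant is down at time t, given that it was
    in operation at t = 0 (two-state Markov model, constant rates)."""
    total = failure_rate + repair_rate
    return failure_rate / total * (1.0 - math.exp(-total * t_hours))


def steady_state_unavailability(failure_rate, repair_rate):
    """Long term average unavailability of the same model."""
    return failure_rate / (failure_rate + repair_rate)


# Hypothetical plant: one failure per 1000 h, mean repair time 50 h.
lam, mu = 1.0 / 1000.0, 1.0 / 50.0
print(round(unavailability(24.0, lam, mu), 4))            # one day ahead
print(round(steady_state_unavailability(lam, mu), 4))     # yearly average
```

For a contract of one day, the risk of no delivery in this example is well below the long term average, which is the effect described for figure 33.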
Figure 32
Markov model for a plant using discrete system states
Figure 33
Failure probability and unavailability as a function of time
Economically optimum dispatch should use accurate values for plant failure
rate and unavailability that take the plant size, operational characteristics etc. into account. For
this economic dispatch process, one can either use commercial software such as PLEXOS
or software developed in-house. Such software will result in an optimum spinning reserve as
shown in figure 34.
Figure 34
Failure probability and unavailability as a function of spinning reserve
Benchmarking means ranking one’s own plants against similar plants to judge
the efficiency of the processes (including managerial processes) within the plant. Benchmarking
can be difficult when the layout of the plant is uncommon or when steam is delivered from a set of
plants that is not comparable to other situations. In such cases we recommend calculating
the benchmark parameter using Reliability Block Diagrams or Fault Tree Analysis based on
plant specific as well as generic data. For the plant shown in figure 35, the
probability of having no steam on the LP and IP steam headers was calculated. The calculated
value can be regarded as a benchmark value taking the configuration into account.
The results showed that maintenance of the gas turbine and boiler during operation of the
remainder of the plant would in theory result in only a minimal risk increase for the plant
under consideration.
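A Reliability Block Diagram calculation of this kind can be sketched with two helper functions. The sketch assumes statistically independent blocks, and the configuration and unavailability values are hypothetical; they do not represent the plant of figure 35.

```python
from math import prod


def series(unavailabilities):
    """Blocks in series: the function is lost if any block is down."""
    return 1.0 - prod(1.0 - u for u in unavailabilities)


def parallel(unavailabilities):
    """Redundant blocks: the function is lost only if all blocks are down."""
    return prod(unavailabilities)


# Hypothetical layout: steam comes either from a gas turbine with its
# heat recovery boiler (in series) or from an auxiliary boiler.
gt_train = series([0.03, 0.02])          # gas turbine, heat recovery boiler
no_steam = parallel([gt_train, 0.05])    # train or auxiliary boiler
print(round(no_steam, 5))
```

Setting the unavailability of one block to 1.0 represents taking it out for maintenance, so the risk increase during such maintenance can be read off by recomputing the result.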
Figure 35
Layout of a plant for steam and power production
The risks for power plant components, described by the likelihood of failure occurrence and
the severity of the consequences of damage, can be insured. Unavailability itself can also be
insured. Availability data indicate whether it is optimal to insure a risk or to carry
the risk by one’s own means. Long-term high quality availability data can be used to
reduce the insurance payments in contract negotiations. Since usually only large
components are insured because of the amount of possible damage incurred, the
elaboration for this application can be limited to the availability analysis of large components.
For the data of one’s own installation, the complete time series since the start of operation is
generally demanded. The insurer may ask for a listing of all events including descriptions
and associated investigation reports. As comparison values, statistics from, for example, the
technical and scientific reports of the VGB can be taken for supplementary information.
Questions on the effectiveness of state-of-the-art precautionary measures (especially when
these are not yet present but the insurer is pressing to have these installed) should be
answered taking the present measures and their influences into account using reliability
Availability parameters can be used to set up the budgets for plant maintenance and to
determine the amount of spare parts or modifications and the corresponding budgets. The
optimisation problem
involved is weighing direct costs for measures against indirect costs caused by unavailability
events as shown before. Optimisation preferably starts with a Failure Mode Effect &
Criticality Analysis. By detailing and confirming the expert judgement with inspection results
from the past and accurate condition assessments during an overhaul, a maintenance plan
can be drawn up for "normal" maintenance, replacements that are certain given obsolescence
etc. and replacements that are uncertain since condition based maintenance is cost-optimum. The
steps as carried out in KEMA's ALTIMA (Advanced Life Time Management) are shown in
the next figure.
Figure 36
ALTIMA = Advanced Life Time Management
In order to judge production assets with regard to concept, construction, operation,
maintenance effectiveness etc., a consistent, systematic and unified data gathering scheme
should feed Asset Managers with reliability and availability indicators for plants as a whole as
well as for components, taking into account industry specific definitions and guidelines.
We would like to express our gratitude to VGB for allowing the use of KISSY examples as
well as making KISSY reports incorporating unavailability data available.
VGB PowerTech, "Terms of utility industry", Part B, "Power and District Heat",
Booklet 3, "Fundamentals and systematics of availability determination for Thermal
Power Plants"
VGB, Technical Scientific Reports "Thermal Power Plants",
"Analysis of Unavailability of Thermal Power Plants", VGB TW 103Ae
Annex 1
Input sheet for unavailability analysis of plants
Annex 2
Input sheet for availability of energy (plant as a whole)