Intelligent Alarm Handling Lars Asker, Mats Danielson, Love Ekenberg

advertisement
From: Proceedings of the Twelfth International FLAIRS Conference. Copyright © 1999, AAAI (www.aaai.org). All rights reserved.
Intelligent Alarm Handling
Lars Asker, Mats Danielson, Love Ekenberg
Dept. of Computer and Systems Sciences
Royal Institute of Technology
Electrum 230, SE-164 40, Kista, SWEDEN
{asker,mad,lovek}@dsv.su.se
Abstract
Most of the actions taken within today's power plants are
directed by control systems, which usually are
computerised and located in a central control room within
the power plant. In normal states, the communication
between the control system and the operators is
satisfactory, with few alarms occurring infrequently.
However, when large disturbances occur, the
communication is problematical. Instead of being aided by
the messages, the operators become swamped by the
amount of information, and often have to make more or
less informed guesses of what causes the abnormal
situation. It is therefore of great importance if the control
system can discriminate between normal and abnormal
situations, as well as being less sensitive and giving
priority to alarms that must be sent to the operators. In
order for the system to make such analyses, processes for
diagnosis and decision making regarding the reliability
and importance of the information are needed. This paper
shows how machine learning algorithms can be combined
with decision theory w.r.t. vague and numerically
imprecise background information, by using classifiers.
An ensemble is a classifier created by combining the
predictions of multiple component classifiers. We present
a new method for combining classifiers into an ensemble
based on a simple estimation of each classifier's
competence. The purpose is to develop a filter for
handling complex alarm situations. Decision situations are
evaluated using fast algorithms developed particularly for
solving these kinds of problems. The presented framework
has been developed in co-operation with one of the main
actors in the Swedish power plant industry.
Introduction
Process control in today's power plants is most often
handled by a computer system, which ensures that the
process is kept within given limits and that the objects in
the plant (pumps, engines, valves, etc.) are stable w.r.t.
directions given by operators or the control system. When
deviations from the normal situation occur, the computer
system sends alarm messages to the operators, which
pursue the necessary actions, based on given recommendations or their own experience. When large deviations
occur, the safety routines in the control system stop some
or all of the subsystems in the plant.
Most of the actions taken within the plant are directed
by the control systems, which usually are computerised
and located in a central control room within the power
plant. In normal situations, the level of communication
between the control system and the operators is
satisfactory, with few alarms occurring infrequently.
However, when a large disturbance occurs, the
communication is far from satisfactory. For example,
after a short power failure, the control system may stop
pumps and engines and the position of valves may
change. In this situation, the safety routines normally start
to interfere, and a huge stream of alarm messages will
follow. Instead of being helped by the messages, the
operators become swamped by all information, and they
often have to make more or less informed guesses of what
causes the abnormal situation. It would therefore be of
great help to the operators if the control system could
discriminate between normal and abnormal situations. In
the former case, the sensitivity of the control system
would be as today, resulting in few alarms being sent
every now and then, while in the latter case, the system
would be less sensitive and give priority to some alarms
that are to be sent to the operators. In order for the system
to make such analyses, processes for diagnosis and
decision making regarding the reliability and importance
of the information are needed.
This paper describes a model which have been developed for use in a large Swedish power plant, which, by
using machine learning and reliability analysis, can evaluate the classified alarms according to credibilities that
depend on the current situation. Furthermore, when a
large disturbance has occurred, the system should be able
to sort the alarms in an order that eases the recovery of
the plant.
To efficiently address the problem space indicated
above, techniques from different AI sub-fields were
employed. From the field of machine learning, techniques
for creating models from observations were applied. ML
techniques have been used for finding rules to be used for
classifying a situation as normal or not, as well as for
finding rules that can help the control system to make
priorities among the alarms. The techniques for reliability
analysis will be highly relevant as the control system is
faced with a complex decision problem when
encountering abnormal situations. These models were
used to describe or analyse new observations, e.g.
suggesting how to act in certain situations, or to make
predictions, e.g. suggesting what to expect in certain
situations. Computational methods for decision analysis is
another important subject area. Algorithms for
computational decision analysis offer promising solutions
to the problem of combining different classifiers when the
information at hand is imprecise.
Classifiers
A popular method for creating an accurate classifier from
a set of training data is to train several different classifiers
on the training data, and then to combine the predictions
of these classifiers into a single prediction [Breiman,
1996; Drucker et al., 1994; Wolpert, 1992]. The resulting
classifier is generally referred to as an ensemble because
it is made up of component classifiers. In this paper we
present a novel method for combining the predictions of
several classifiers. For each classifier we derive an
interval prediction and a confidence interval associated
with it.
Figure 1
A number of researchers have demonstrated that
ensembles are generally more accurate than any of their
component classifiers [Breiman, 1996; Clemen, 1989;
Quinlan, 1996; Wolpert, 1992; Zhang et al., 1992]. Using
an ensemble, the class of an example is predicted by first
classifying the example with each of the component
classifiers and then combining the resulting predictions
into a single classification. To create an ensemble, a user
generally must focus on two aspects: (1) which classifiers
to use as components of the ensemble; and (2) how to
combine their individual predictions into one. Hansen and
Salamon [1990] demonstrated that under certain
assumptions, the accuracy of an ensemble increases with
the number of classifiers combined. For each example
where the average error rate is less than 50% for the
distribution of possible classifiers, they show that in the
limit the expected error on that example can be reduced to
zero. Of course, since not all patterns will necessarily
share this characteristic (e.g., outliers may be predicted at
more than 50% error), the error rate over all the patterns
cannot necessarily be reduced to zero. But if we assume a
significant percentage of the patterns are predicted with
less than 50% average error, gains in generalisation will
be achieved. A key assumption of Hansen and Salamon's
analysis is that the classifiers combined should be
independent in their production of errors. Krogh and
Vedelsby [1995] expanded on this notion to show that the
error for an ensemble is related to the generalisation error
of the classifiers plus how much disagreement there is
between the different classifiers. Thus much research on
selecting appropriate classifiers to combine has focused
on selecting classifiers that are accurate in the predictions,
but differ in where they are accurate. Methods for
approaching this problem include using different
classification methods, training on subsets of the data set,
training on different sets of input features, and using
different subsets of the training set for training the
classifiers [Breiman, 1996; Drucker et al., 1994; Hansen
and Salamon, 1990; Hashem et al., 1994; Krogh and
Vedelsby, 1995; Maclin and Shavlik, 1995].
The second aspect of creating an ensemble is the choice
of the function for combining the predictions of the
component classifiers [Kearns and Seung, 1995].
Examples of combination functions include voting
schemes [Hansen and Salamon, 1990], simple averages
[Lincoln and Skrzypek, 1989], weighted average schemes
[Perrone and Cooper, 1994; Rogova, 1994], and schemes
for training combiners [Rost and Sander, 1993; Wolpert,
1992; Zhang et al., 1992]. Clemen [1989] demonstrated
that in the absence of knowledge concerning a specific
problem, almost any reasonable method, including the
simple ones such as voting or using a weighted average,
would result in an effective ensemble.
Our approach can be thought of as being related to the
hierarchical mixture of experts approach [Jacobs et al.,
1991; Jordan and Jacobs, 1994; Nowlan and Hinton,
1990]. In a mixture of experts approach a group of
sub-classifiers is trained so that each sub-classifier will
become an expert on a different portion of the input
space. Our approach differs in that our training
mechanism is simpler (each classifier simply trains on the
entire problem) but as a result our component classifiers
may have significant overlap in their expertise. Instead of
having each classifier pointing out a specific class, or set
of classes, we allow the classifiers to distribute their belief
over classes by allowing interval beliefs and requiring the
distributed beliefs to be normalised (i.e. sum to unity).
A General Method for Combining Classifiers
Consider a scenario where a power plant operator is in
charge of running a highly complex power generating
process. The control system is partly based on a rulebased alarm system. When alarms occur, the operator,
which may be human as well a software agent, needs to
respond, either by ignoring the alarm or by some action
based on the current state of the plant. See the event chain
in Figure 2.
More formally, the operator faces a situation involving
a choice between a finite set of strategies Si (e.g. actions)
having access to a finite set of triggered alarms Ai.
In a situation modelled as above, some alarms may be
more reliable than others. The operator may have access
to assessments expressing the credibility of the different
alarms.
Figure 2
From this information, the operator is determined to
evaluate the strategies given the individual alarms and
their relative credibilities. However, for the operator to
carry out his tasks and to acquire sufficient and reliable
knowledge, it is fundamental that he is able to evaluate
information gathered from different alarms, some
unreliable and some noisy. The dynamic adaptation taking
place over time as the operator interacts with his
environment, and with the other alarms, is affected by the
means available to assess and evaluate imprecise
information. The operator may rank the credibilities of the
different classifiers as well as quantifying them in
imprecise terms. The classifiers have a similar
expressibility regarding their respective reports about the
classes under consideration. We will consider the problem
as a multi-criteria decision problem.
Aggregation of utility functions under a variety of criteria is investigated in the area of Multi Attribute Utility
Theory (MAUT) [Fishburn, 1970; Keeney and Raiffa,
1976; Keeney, 1992]. A number of techniques used in
MAUT have been implemented using computer programs
such as SMART [Edwards, 1977] and EXPERT CHOICE,
the latter which is based on the widely used AHP [Saaty,
1977; Saaty, 1980; Saaty, 1982]. AHP has been criticised
in a variety of respects [Belton and Gear, 1983; Watson
and Freeling, 1982; Watson and Freeling, 1983] and
models using geometric mean value techniques have been
suggested instead [Barzilai et al, 1987; Krovac et al,
1987]. Techniques based on the geometric mean value
have, for instance, been implemented by Lootsma and
Rog in REMBRANDT [Lootsma, 1993].
All these approaches have their advantages, but the
requirement to provide numerically precise information
sometimes seems to be unrealistic in real-life decisions
situations such as alarm classification, and a number of
models with representations allowing imprecise
statements have been suggested. For instance, [Salo and
Hämäläinen, 1984] extends the AHP-method in this
respect and also make use of structural information when
the alternatives are evaluated into overlapping intervals.
The system ARIADNE [Sage and White, 1984] also allows
the decision maker to use imprecise estimates, but does
not discriminate between alternatives when these are
evaluated into overlapping intervals. Fuzzy set theory is a
more widespread approach to relaxing the requirement of
numerically precise data by providing a more realistic
model of the vagueness in subjective estimates of
probabilities, weights, and values [Chen and Hwang,
1992; Lai and Hwang, 1994]. These approaches allow,
among other features, the decision-maker to model and
evaluate a decision situation in vague linguistic terms.
The methods we propose herein originate from earlier
work on handling probabilistic decision problems involving a number of alternatives and consequences when the
background information is vague or numerically
imprecise [Danielson and Ekenberg, 1998; Ekenberg et al,
1996; Malmnäs, 1994]. The aim of this paper is to
generalise the work into the realm of multiple criteria
decision aids, but still conform to classical statistical
theory rather than to fuzzy set theory. By doing so, we try
to avoid problems emanating from difficulties in
providing set membership functions and in defining set
operators having a satisfying intuitive correspondence.
Evaluating Information from different
Classifiers
As was mentioned above, a significant feature of the
framework is that it allows for situations where numerically imprecise or comparative sentences occur. These
sentences are represented in a numerical format and with
respect to this the situations can be evaluated using a
variety of decision rules. The further discriminating
analyses try to show which parts of the given information
are the most critical and must be given extra careful
consideration.
Representation
For instance, the credibility of a classifier is represented
by weight estimates. Such estimates can be represented by
linear constraints and we treat two classes of these:
interval sentences, and comparative sentences (cf.
[Ekenberg and Danielson, 1994]).
Interval sentences are of the form: “The weight of classifier i lies between the numbers ai and bi” and are translated into wi ∈ [ai,bi]. Comparative sentences are of the
form: “The importance of classifier i is greater than the
importance of classifier j”. Such a sentence is translated
into an inequality wi ≥ wj. Each statement is thus represented by one or more constraints. We call the
conjunction of constraints of the types above, together
with the normalisation constraint Σi≤n wi = 1, the
credibility base (S).
The situation base (K) consists of similar translations
of imprecise belief estimates. A situation base with n
classifiers and m classes is expressed in belief variables
{u11,...,u1n,...,um1,...,umn} stating the belief in the classes
of situations according to the different classifiers. The
term uij denotes the belief in situation Si with respect to
classifier cj. The collection of weight and belief
statements constitutes the information frame. It is
assumed that the variables’ respective ranges are real
numbers in the interval [0,1]. Below, we will refer to an
information frame as a structure ⟨S, K ⟩.
Example: A decision-maker gives assessments
concerning the situations for a risk policy of a company.
The objective of the diagnosis is to determine the cause of
a set of alarms. The possible situations1 (classes) are
S1:detector malfunction, S2:fire in main generator, and
S3:blown fuse in electric circuit. Assume that the
diagnosis is supposed to be given with respect to the
classifiers C1 and C2. The beliefs involved could, for
example, be given in numbers. In that case, they are
linearly transformed to real values over the interval [0,1].
For instance, the assessments with respect to classifiers
C1 could be the following:
• The belief in situation S1 is
between 0.20 and 0.50.
• The belief in situation S2 is
between 0.20 and 0.60.
• The belief in situation S3 is
between 0.40 and 0.60.
• The belief in situation S2 is at
least 0.10 stronger than that of
S1.
Similar belief assessments can be asserted with respect to
C2.
Moreover, the weights of C1 and C2 may be estimated
as numbers in the interval [0,1]. The number 0 denotes the
lowest credibility and 1 the highest. Thus, the assessments
about the classifiers could be:
• Classifier C2 is at least as
credible as C1
• The credibility of classifier C1 is
between 0.30 and 0.70
One further reason for allowing interval as well as comparative assessments is that the background information
may have different sources. For instance, intervals may
arise naturally from aggregated quantitative information
whereas qualitative analyses often result in comparisons.
Since the sources may be different, the assessments may
not necessarily be consistent with each other.
1 The possible situations in the system analysed is defined
by logic schemata which define the relationship between
different alarm scenarios and possible causes.
The belief estimates with respect to C1 are translated into
the following expressions.
u11 ∈ [0.20, 0.50] u21 ∈ [0.20, 0.60]
u31 ∈ [0.40, 0.60] u21 ≥ u11 + 0.10
The credibility of C1 and C2 are also represented as
numbers in the interval [0,1], and the translation of the
assessments above results in the following expressions.
w2 ≥ w1 w1 ∈ [0.30, 0.70]
Aggregations
One reasonable candidate for an aggregation principle
could be based on a weighted sum of the beliefs and the
following notation will be used to define this with respect
to an information frame representing n classifiers and m
situations:
Definition: Given an information frame ⟨S, K⟩, the
global belief G(Si) of a situation Si is G(Si) = Σk≤n
wk· uik, where wk and uik are variables in K and
S, respectively.
Definition: Given an information frame ⟨S, K⟩, the
difference in global belief δij between two
situations Si and Sj are δij = G(Si) – G(Sj) = Σk≤n
wk · (uik – ujk), where wk, uik, and ujk are
variables in S and K, respectively.
Definition: Given an information frame ⟨S, K⟩, let
a and b be two vectors of real numbers (a1,...,an)
and (b11,...,bmn) respectively. Then define abG(Si)
= Σk≤n ak · bik, where ak and bik are numbers
substituted for wk and uik in G(Si)). Similarly,
define abdδij = abG(Si) – adG(Sj).
With respect to these definitions, we can, for instance,
express the concept of admissibility in the sense of
[Lehmann, 1959]. The concept of admissibility is
computationally meaningful in our framework as
demonstrated in [Danielson and Ekenberg, 1998].
However, the admissibility often seems to be too weak to
form a decision rule by itself, and in [Danielson and
Ekenberg, 1998; Ekenberg et al, 1997] we introduce a
variety of discriminating principles in the case of
decisions under risk.
Furthermore, in non-trivial decision situations, when an
information frame contains numerically imprecise information, the principles suggested above are sometimes too
weak to yield a conclusive result. A way to refine the
analysis is to investigate how much the different intervals
can be contracted before an expression such as δij > 0
ceases to be consistent. This contraction avoids the
complexity inherent in combinatorial analyses, but it is
still possible to study the stability of a result by gaining a
better understanding of how important the interval
boundary points are. By co-varying the contractions of an
arbitrary set of intervals, it is possible to gain much better
insight into the influence of the structure of the
information frame on the solutions. Contrary to volume
estimates, contractions are not measures of the sizes of the
solution sets but rather of the strength of statements when
the original solution sets are modified in controlled ways.
Both the set of intervals under investigation and the scale
of individual contractions can be controlled.
Consequently, a contraction can be regarded as a focus
parameter that zooms in on central sub-intervals of the
full statement intervals.
Concluding Remarks
We have demonstrated how a set of vague and
numerically imprecise information can be used to
evaluate critical situations with respect to a set of
classifier. The approach considers this problem with
respect to the different classifiers as well as the analysis
of the different situations involved. These aspects are
modelled into information frames consisting of systems of
linear expressions stating inequalities and interval
assessments. The situations may be evaluated relative to a
variety of principles, for example generalisations of the
principle of maximising the expected utility. Contractions
are introduced as an automated sensitivity analysis. This
concept allows us to investigate critical variables and the
stability of the results. It should also be noted that the
model can be generalised to integrate other aspects such
as costs, utilities and probabilities into the framework and,
by this, taking into account a more general picture of
aspects involved in these kinds of decision situations.
References
Barzilai, J., Cook, W., and Golany, B. 1987. “Consistent
Weights for Judgements Matrices of the Relative
Importance for Alternatives,” Operations Research
Letters, vol. 6, pp. 131–134.
Belton, V., and Gear, A. E. 1983. “On a Shortcoming on
Saaty's Method of Analytical Hierarchies,” OMEGA, vol.
11, pp. 227–230.
Breiman, L. 1996. Bagging predictors. Machine Learning,
vol. 24(2) pp. 123–140.
Chen, S-J., and Hwang, C-L. 1992. Fuzzy Multiple
Attribute Decision Making, vol. 375, Springer-Verlag.
Clemen, R. 1989. Combining forecasts: A review and
annotated bibliography. International Journal of
Forecasting, vol. 5 pp. 559–583.
Danielson, M., and Ekenberg, L. 1998. “A Framework for
Analysing Decisions under Risk,” European Journal of
Operational Research, vol. 104/3.
Drucker, H., Cortes, C., Jackel, L., LeCun, Y., and
Vapnik, V. 1994. Boosting and other machine learning
algorithms. In Proceedings of the Eleventh International
Conference on Machine Learning, pages 53–61, New
Brunswick, NJ, July 1994. Morgan Kaufmann.
Edwards, W. 1977. “How to Use Multiattribute Belief
Measurement for Social Decisionmaking,” IEEE
Transactions on Systems, Man, and Cybernetic, SMC-7:5,
pp. 326–340.
Ekenberg, L., and Danielson, M. 1994. “A Support
System for Real-Life Decisions in Numerically Imprecise
Domains,” Proceedings of the International Conference
on Operations Research ’94, pp. 500–505, SpringerVerlag.
Ekenberg, L., Danielson, M., and Boman, M. 1996.
“From Local Assessments to Global Rationality,”
International Journal of Intelligent and Cooperative
Information Systems, vol. 5, nos. 2 & 3, pp. 315–331.
Ekenberg, L., Danielson, M., and Boman, M. 1997.
“Imposing Security Constraints on Agent-Based Decision
Support,” Decision Support Systems International
Journal, vol.20.
Fishburn, P. C. 1970. Belief Theory for Decision Making:
John Wiley and Sons.
Hansen, L., and Salamon, P. 1990, Neural network
ensembles. IEEE Transactions on Pattern Analysis and
Machine Intelligence, 12:993–1001.
Hashem, S., Schmeiser, B., and Yih, Y. 1994. Optimal
linear combinations of neural networks: An overview. In
Proceedings of the 1994 IEEE International Conference
on Neural Networks, Orlando, FL.
Jacobs, R., Jordan, M., Nowlan, S., and Hinton, G. 1991.
Adaptive mixtures of local experts. Neural Computation,
vol 3, pp. 79–87.
Jordan, M., and Jacobs, R. 1995. Hierarchical mixtures of
experts and the EM algorithm. In Proceedings of the
International Conference on Artificial Neural Networks,
London, 1994, vol. 18 pp. 255–276.
Kearns, M., and Seung, H. 1995. Learning from a
population of hypotheses. Machine Learning vol. 18, pp.
255-276.
Keeney, R., and Raiffa, H. 1976. Decisions with Multiple
Objectives: Preferences and Value Trade-offs: John Wiley
and Sons.
Keeney, R. L. 1992. Value-Focused Thinking: A Path to
Creative Decision Making: Harvard University Press.
Krogh, A., and Vedelsby, J. 1995. Neural network
ensembles, cross validation, and active learning. In G.
Tesauro, D. Touretzky, and T. Leen, editors, 1995,
Advances in Neural Information Processing Systems,
vol. 7, Cambridge, MA. MIT Press.
Krovak, J. 1987. “Ranking Alternatives–Comparison of
Different Methods Based on Binary Comparison
Matrices,” European Journal of Operational Research,
vol. 32, pp. 86–95.
Lai, Y-J., and Hwang, C-L. 1994. Fuzzy Multiple
Objective Decision Making, vol. 404, Springer-Verlag.
Lehmann, E. L. 1959. Testing Statistical Hypothesis, John
Wiley and Sons.
Lincoln, W., and Skrzypek, J. 1989. Synergy of clustering
multiple back propagation networks. In D. Touretzky,
editor, Advances in Neural Information Processing
Systems, vol. 2, pp. 650–659, San Mateo, CA, Morgan
Kaufmann.
Lootsma, F. A. 1993. “Scale Sensitivity in the
Multiplicative AHP and SMART,” Journal of MultiClassifiers Decision Analysis, vol. 2, pp. 87–110.
Maclin, R., and Shavlik, J. 1995. Combining the
predictions of multiple classifiers: Using competitive
learning to initialize neural networks. In Proceedings of
the Fourteenth International Joint Conference on Artificial
Intelligence, Montreal, Canada, September.
Malmnäs, P-E. 1994. “Towards a Mechanization of Real
Life Decisions,” in Logic and Philosophy of Science in
Uppsala, Prawitz and Westerståhl, Eds.: Kluwer
Academic Publishers.
Nowlan, S., and Hinton, G. 1990. Evaluation of adaptive
mixtures of competing experts. In R. Lippmann, J.
Moody, and D. Touretzky, editors, Advances in Neural
Information Processing Systems, vol. 3, San Mateo, CA,
Morgan Kaufmann.
Perrone, M., and Cooper, L. 1994. When networks
disagree: Ensemble method for neural networks. In R. J.
Mammone, editor, Artificial Neural Networks for Speech
and Vision. Chapman and Hall, London.
Quinlan, J. R. 1996. Bagging, boosting, and c4.5. In
Proceedings of the Thirteenth National Conference on
Artificial Intelligence.
Rogova, G. 1994. Combining the results of several neural
network classifiers. Neural Networks, vol. 7, pp. 777–781.
Rost, B., and Sander, C. 1993. Prediction of protein
secondary structure at better than 70% accuracy. Journal
of Molecular Biology, 232:584–599.
Saaty, T. L. 1977. “A Scaling Method for Priorities in
Hierarchical Structures,” Journal of Mathematical
Psychology, vol. 15, pp. 234–281.
Saaty, T. L. 1980. The Analytical Hierarchy Process:
McGraw-Hill.
Saaty, T. L. 1982. Decision Makers for Leaders: Van
Nostrand Reinhold.
Sage, A. P., and White, C. C. 1984. “ARIADNE: A
Knowledge-Based Interactive System for Planning and
Decision Support,” IEEE Transactions, SMC-14:1.
Salo, A. A., and Hämäläinen, R. P. 1995. “Preference
Programming through Approximate Ratio Comparisons,”
European Journal of Operational Research, vol. 82, pp.
458–475.
Watson, S. R., and Freeling, A. N. S. 1982. “Assessing
Attribute Weights,” OMEGA, vol. 10, pp. 582–583.
Watson, S. R., and Freeling, A. N. S. 1983. “Comments
on: Assessing Attribute Weights by Ratio,” OMEGA, vol.
11, pp. 13.
Wolpert, D. 1992. Stacked generalisation. Neural
Networks, vol. 5, pp. 241–259.
Zhang, X., Mesirov, J., and Waltz, D. 1992. Hybrid
system for protein secondary structure prediction. Journal
of Molecular Biology, vol. 225, pp. 1049–1063.
Download