Decision-Theoretic Planning for Anticipating and Troubleshooting Faults

From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Paul O’Rorke (ororke@ics.uci.edu)
Department of Information and Computer Science
University of California, Irvine
Irvine, CA 92717-3425 USA
Introduction
Problems associated with anticipating and troubleshooting faults arise during the design and manufacture of complex devices and systems, in preparing to deploy them, and in operations subsequent to deployment. Prior to deployment, it is often useful to anticipate faults in the components of a system, to determine their consequences, to assess the associated risks, and to propose and decide upon actions that reduce the risks. This collection of problems is called "Failure Modes and Effects Analysis" (FMEA). When malfunctions occur after deployment, it is important to assess the situation and to recommend actions that will help determine the causes and minimize negative impacts. This is called "Failure Detection, Isolation, and Recovery" (FDIR).
Decision-theoretic concepts like likelihood, value, and expected utility are frequently useful in anticipating and troubleshooting faults. These concepts break down barriers and unify topics that have been identified as separate areas in previous work. For example, FMEA and FDIR can be seen to have much in common, although they have been pursued as independent topics in previous research and development. In FMEA, anticipating potential risks involves assessing the likelihoods of faults and the costs associated with their effects. Recommending actions to reduce the risks ought to take into account the costs of the actions and the likelihood that they will succeed. In FDIR, prior probabilities of faults are useful in determining the most likely explanations of abnormal behavior, and the main goal is to generate plans for gathering more information about the fault and for fixing or working around it. The costs of information-gathering probes and tests ought to be taken into account, and the value of the information they provide ought to be weighed in terms of savings in repair or recovery costs. So, even within FDIR, problems such as diagnosis and repair planning that have been viewed as separable problems can be seen to have much in common.
Research Summary
In collaboration with students and colleagues, I have developed methods that support decision-making and planning relevant to anticipating and troubleshooting faults. The methods employ decision-theoretic techniques for evaluating situations and actions. Model-based and qualitative reasoning methods are used to predict normal behaviors and the consequences of actions and faults, and to decide trade-offs. The following summary provides a brief overview of our research, descriptions of our methods and results, and pointers to relevant papers that contain more information.
In early work [El Fattah and O’Rorke, 1991] we developed several architectures integrating model-based diagnosis and explanation-based learning (EBL). We found that precompilation, or "learning in advance," is often much more efficient than the traditional EBL technique of "learning while doing" in the context of model-based diagnosis of multiple faults. But precompilation is only feasible for relatively small devices, so we developed methods for trading accuracy for improvements in efficiency [El Fattah and O’Rorke, 1992, 1993a]. In one architecture, the user prespecifies a desired accuracy rate. The system uses (learned) associational knowledge unless the accuracy dips below the desired threshold; then performance is considered unsatisfactory, and the system switches from associational to model-based reasoning until accuracy returns to the desired range. This approach can yield dramatic improvements in efficiency with a relatively small loss in accuracy, because a small number of rules tends to cover the problems that occur frequently.
More recently, we have developed a new architecture called the Generic Troubleshooting System (GTS) that integrates diagnosis, repair planning, execution, and learning. GTS exploits information about the likelihoods of failures and the costs of probes and replacements [El Fattah and O’Rorke, 1993b,c]. Experimental results show that the system, which uses expected utility and the value of information, does well compared with systems based on previous information-theoretic and purely probabilistic methods: it is often quicker at producing less expensive information-gathering and repair plans. During diagnosis and repair planning, GTS learns prediction macros and action rules that form reentrant decision tree structures. Experimental results show that the learning techniques incorporated in GTS effectively provide speed-up and take good advantage of non-uniform distributions of faults.
We have also developed methods supporting diagnostic decision-making, including qualitative and order-of-magnitude reasoning and EBL methods. An architecture
called AbMaL (short for Abductive Macro Learner) [O’Rorke, to appear] uses abduction and explanation-based learning to acquire information, including experts’ preferences and assessments of likelihoods. Computational experiments show that a coarse qualitative reasoning system [O’Rorke et al., 1993b] that can be learned using EBL [O’Rorke et al., 1993a] covers a disproportionately large number of important decision problems. Mathematical analysis confirms and explains these experimental results; a more sophisticated method based on order-of-magnitude reasoning should make it possible to improve on them [O’Rorke and El Fattah, 1993].
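The accuracy-threshold architecture described earlier in this summary [El Fattah and O’Rorke, 1992, 1993a] can be illustrated with a minimal Python sketch. It is not the published system: the class name, the sliding-window accuracy estimate, and the assumption that the true fault becomes known after each case (for example, once a repair succeeds) are hypothetical choices made for illustration.

```python
from collections import deque
from typing import Callable, Dict, Tuple


class SwitchingDiagnoser:
    """Hypothetical sketch: use cheap learned (associational) rules while their
    recent accuracy meets a user-specified threshold; otherwise fall back to a
    slower, trusted model-based diagnoser."""

    def __init__(self, rules: Dict[str, str],
                 model_based: Callable[[str], str],
                 threshold: float, window: int = 20) -> None:
        self.rules = rules                    # learned: symptom pattern -> fault
        self.model_based = model_based        # expensive diagnoser
        self.threshold = threshold            # desired accuracy rate
        self.outcomes = deque(maxlen=window)  # recent rule hits (True) and misses (False)

    def accuracy(self) -> float:
        # With no evidence yet, trust the learned rules.
        if not self.outcomes:
            return 1.0
        return sum(self.outcomes) / len(self.outcomes)

    def diagnose(self, symptoms: str) -> Tuple[str, str]:
        guess = self.rules.get(symptoms)
        if guess is not None and self.accuracy() >= self.threshold:
            return guess, "associational"
        return self.model_based(symptoms), "model-based"

    def feedback(self, symptoms: str, true_fault: str) -> None:
        # Score the rules even in model-based mode, so the system notices
        # when their accuracy returns to the desired range.
        guess = self.rules.get(symptoms)
        if guess is not None:
            self.outcomes.append(guess == true_fault)
```

A small window makes this sketch react quickly when the rules start to fail, at the price of noisier accuracy estimates.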
In contrast to FDIR, work on AI methods for partially automating FMEA has just begun [O’Rorke, 1992; Wirth and O’Rorke, 1993]. AI methods for predictive simulation, from simple forward chaining to model-based qualitative reasoning, have been applied with some success to the problem of determining the consequences of component failures [O’Rorke, 1993]. An important problem is that the number of possible FMEAs can be very large for complex systems, especially if multiple faults are considered. I have developed a method for FMEA based on a decision-theoretic extension of David Poole’s Probabilistic Horn Abduction (PHA). The new method, which I call Decision-Theoretic Horn Abduction (DTHA) [O’Rorke, 1994], has the advantage that it focuses on the most important explanations (e.g., the ones with the smallest expected utility). This method is relevant to FMEA because FMEA is most concerned with the most likely failures and the most costly consequences.
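The contrast between ranking explanations by probability, as PHA does, and ranking them by expected utility can be shown with a small Python sketch (Python 3.8 or later, for `math.prod`). It is not DTHA: the candidate explanations are listed by hand rather than found by abduction over Horn clauses, and the hypothesis probabilities, the additive costs, and the function names are assumptions. As in PHA, the prior probability of an explanation is the product of the probabilities of its independent hypotheses.

```python
import heapq
import math
from typing import Dict, FrozenSet, Iterable, List

Explanation = FrozenSet[str]


def prior(e: Explanation, p: Dict[str, float]) -> float:
    # As in PHA: product of the probabilities of independent hypotheses.
    return math.prod(p[h] for h in e)


def expected_utility(e: Explanation, p: Dict[str, float],
                     cost: Dict[str, float]) -> float:
    # Assumption: effect costs add up, and utility is the negated cost.
    return prior(e, p) * -sum(cost[h] for h in e)


def most_probable(es: Iterable[Explanation], p: Dict[str, float],
                  k: int) -> List[Explanation]:
    """The k explanations with the highest prior probability."""
    return heapq.nlargest(k, es, key=lambda e: prior(e, p))


def most_important(es: Iterable[Explanation], p: Dict[str, float],
                   cost: Dict[str, float], k: int) -> List[Explanation]:
    """The k explanations with the smallest (most negative) expected utility."""
    return heapq.nsmallest(k, es, key=lambda e: expected_utility(e, p, cost))


# Hypothetical failure probabilities and costs, for illustration only.
P = {"pump": 0.01, "valve": 0.05, "sensor": 0.1}
COST = {"pump": 5000.0, "valve": 200.0, "sensor": 50.0}
CANDIDATES = [frozenset(s) for s in ({"pump"}, {"valve"}, {"sensor"}, {"pump", "valve"})]
```

With these made-up numbers, a sensor fault is the most probable explanation, but the rarer pump failure is the most important one for FMEA because its expected cost is ten times larger.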
Future Work
An important FMEA problem that has not yet been targeted for computerization is the problem of generating
suggested actions aimed at reducing risks. This is a good
target for AI research on decision-theoretic planning.
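As a rough illustration of what such a decision-theoretic treatment might compute, the Python sketch below scores each candidate action by the expected risk reduction it buys minus its cost, and suggests only actions with a positive net benefit. Risk is taken to be the probability of a failure mode times the cost of its effects. Everything else is an assumption made for illustration: each hypothetical action targets one failure mode and removes it with some probability of success, and actions are treated as independent, which a real planner could not assume.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FailureMode:
    probability: float  # likelihood of the failure over the period considered
    cost: float         # cost of its effects if it occurs


@dataclass
class Action:
    name: str
    cost: float         # cost of carrying out the action
    target: str         # the single failure mode it addresses (assumption)
    p_success: float    # probability that it removes that failure mode


def risk(mode: FailureMode) -> float:
    return mode.probability * mode.cost


def suggest(modes: Dict[str, FailureMode],
            actions: List[Action]) -> List[Tuple[str, float]]:
    """Actions whose expected risk reduction exceeds their cost, best first.
    Assumes actions are independent: benefits of several actions on the
    same failure mode are not combined."""
    scored = []
    for a in actions:
        net = a.p_success * risk(modes[a.target]) - a.cost
        if net > 0:
            scored.append((a.name, net))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical numbers, for illustration only.
MODES = {
    "seal_leak": FailureMode(probability=0.02, cost=100_000.0),
    "bearing_wear": FailureMode(probability=0.1, cost=3_000.0),
}
ACTIONS = [
    Action("double_seal", cost=500.0, target="seal_leak", p_success=0.9),
    Action("vibration_monitor", cost=400.0, target="bearing_wear", p_success=0.5),
    Action("lubrication_schedule", cost=100.0, target="bearing_wear", p_success=0.6),
]
```

The vibration monitor is not suggested: it would remove an expected risk of about 150 (0.5 × 300) but costs 400.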
An important problem associated with FDIR is that diagnosis has been considered in isolation, and this has led to diagnosis methods that do not work well in practice within the larger FDIR picture. For example, most AI work on diagnosis presumes that faults must be localized before repair actions can be taken. But it is not worthwhile to determine the exact location of a fault when it makes no difference to the choice among the available repair actions. (Determining exactly which gate is faulty in an integrated circuit may not be useful if the entire chip must be replaced in order to fix the fault.) Superior troubleshooting methods will be developed (and indeed already are being developed at several sites) using different forms of decision-theoretic planning.
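The point about localization can be made precise with the standard decision-theoretic notion of the expected value of perfect information (EVPI): the expected cost of the best repair chosen now, minus the expected cost if the fault location were known before choosing. A perfect localizing probe is worth running only if its cost is below the EVPI. The Python sketch below is illustrative, not the GTS procedure; the costs, probabilities, and function names are hypothetical.

```python
from typing import Dict

# costs[action][location]: total cost of taking a repair action when the
# fault is at that location, including any follow-up repair if it misses.
Costs = Dict[str, Dict[str, float]]
Belief = Dict[str, float]  # probability of each fault location


def expected_cost(action_costs: Dict[str, float], belief: Belief) -> float:
    return sum(p * action_costs[loc] for loc, p in belief.items())


def cost_without_probe(costs: Costs, belief: Belief) -> float:
    # Choose the repair with the lowest expected cost, without localizing.
    return min(expected_cost(c, belief) for c in costs.values())


def cost_with_perfect_probe(costs: Costs, belief: Belief) -> float:
    # Learn the location first, then pick the cheapest repair for it
    # (the probe's own cost is excluded here).
    return sum(p * min(c[loc] for c in costs.values()) for loc, p in belief.items())


def value_of_perfect_information(costs: Costs, belief: Belief) -> float:
    return cost_without_probe(costs, belief) - cost_with_perfect_probe(costs, belief)


def worth_probing(costs: Costs, belief: Belief, probe_cost: float) -> bool:
    return value_of_perfect_information(costs, belief) > probe_cost


# Hypothetical examples. On a chip, every gate fault is fixed the same way.
CHIP_BELIEF = {"gate1": 0.5, "gate2": 0.3, "gate3": 0.2}
CHIP_COSTS = {"replace_chip": {g: 40.0 for g in CHIP_BELIEF}}
# On a board with two replaceable parts, the right first guess matters.
BOARD_BELIEF = {"part_a": 0.6, "part_b": 0.4}
BOARD_COSTS = {
    "replace_a": {"part_a": 10.0, "part_b": 40.0},  # 10, or 10 + 30 if wrong
    "replace_b": {"part_a": 40.0, "part_b": 30.0},  # 30, or 30 + 10 if wrong
}
```

With a single repair action, as on the chip, the EVPI is zero, so no probe that pins down the faulty gate is worth paying for. On the board, knowing which part failed saves 4 cost units on average (22 versus 18), so a perfect probe costing less than 4 would be worthwhile.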
In the future, more work is also needed that integrates AI approaches to FMEA, model-based diagnosis (MBD), and planning-based approaches to repair. Most work on FDIR and MBD has been done independently of FMEA, but it seems clear that this separation will lead to wasteful duplication. An integrated approach will exploit overlap and eliminate duplication. For example, one model and one simulation system can be used for both FMEA and MBD/FDIR. One decision-theoretic planner can be used to find actions that can reduce the costs of failures before they occur (in FMEA) and after they have occurred
(in MBD and FDIR). Only the knowledge provided to the planner, for example about the actions available in these situations, needs to be different.
Relevant Papers
[El Fattah and O’Rorke, 1991] Y. El Fattah and P.
O’Rorke. Learning multiple fault diagnosis. In Proceedings of the Seventh IEEE Conference on Artificial Intelligence Applications, pages 235-239, Los
Alamitos, CA, 1991, IEEE Computer Society Press.
[El Fattah and O’Rorke, 1992] Y. El Fattah and P.
O’Rorke. Learning approximate diagnosis. In Proceedings of the Eighth IEEE Conference on Artificial Intelligence Applications, pages 150-156, Los
Alamitos, CA, 1992, IEEE Computer Society Press.
[El Fattah and O’Rorke, 1993a] Y. El Fattah and P.
O’Rorke. Explanation-based learning for diagnosis.
Machine Learning, 13(1): 35-70, 1993.
[El Fattah and O’Rorke, 1993b] Y. El Fattah and P.
O’Rorke. GTS: Improved troubleshooting via learning and utility evaluation. In The Fourth International Workshop on Principles of Diagnosis, pages
187-200, Aberystwyth, Wales, 1993.
[El Fattah and O’Rorke, 1993c] Y. El Fattah and P.
O’Rorke. Utility of action in model-based troubleshooting. In Computing in Aerospace 9, pages
406-414, Washington, DC, 1993, AIAA.
[O’Rorke, 1992] P. O’Rorke. Failure modes and effects
analysis: Opportunities for automation. 1992.
[O’Rorke, 1993] P. O’Rorke. The Wales FMEA project:
Summary and evaluation. 1993.
[O’Rorke, 1994] P. O’Rorke. Focusing on the most important explanations: Decision-theoretic Horn abduction. 1994.
[O’Rorke, to appear] P. O’Rorke. Abduction and
explanation-based learning: Case studies in diverse
domains. Computational Intelligence, 10(4).
[O’Rorke and El Fattah, 1993] P. O’Rorke and Y.
El Fattah. Qualitative and order of magnitude reasoning about diagnostic decisions. In The Fourth
International Workshop on Principles of Diagnosis,
pages 136-148, Aberystwyth, Wales, 1993.
[O’Rorke et al., 1993a] P. O’Rorke and Y. El Fattah and
M. Elliott. Explaining and generalizing diagnostic
decisions. In Machine Learning: Proceedings of the
Tenth International Conference, pages 228-235, San
Mateo, CA, 1993, Morgan Kaufmann.
[O’Rorke et al., 1993b] P. O’Rorke and Y. El Fattah and
M. Elliott. Explanation-based learning for qualitative decision-making. In N. Piera-Carreté and M.
G. Singh, Eds. Qualitative Reasoning and Decision
Technologies, pages 409-418. CIMNE, Barcelona,
1993.
[Wirth and O’Rorke, 1993] R. Wirth and P. O’Rorke.
Representing and reasoning about functions for failure modes and effects analysis. In Working Notes of
the AAAI-93 Workshop on Reasoning about Function, Washington, DC, 1993.