Reasoning with Incomplete Knowledge Using
de Finetti's Fundamental Theorem of Probability:
Background and Computational Issues*
Tracy Myers†, Robert M. Freund‡, and Gordon M. Kaufman§
Sloan WP #3990
November 1997
*This paper is based on Tracy Myers's Ph.D. thesis, Reasoning with Incomplete Probabilistic Knowledge: The RIP Algorithm for de Finetti's Fundamental Theorem of Probability, Department of Brain and Cognitive Sciences, MIT (1995).
†Schlumberger Oilfield Services, Austin Product Center.
‡Professor of Operations Research, Sloan School of Management, Massachusetts Institute of Technology.
§Professor of Operations Research and Management, Sloan School of Management, Massachusetts Institute of Technology.
Professor Kaufman's work was supported by the Air Force Office of Scientific Research, Contract #AFOSR-F49620-1-0307.
November 18, 1997
Draft
1 Introduction
What probabilistic features are shared by a nuclear power plant, the space shuttle, fly-by-wire aircraft and the DARPA net? Answer: all are very large, logically complex systems that
must operate at exceptionally high levels of reliability, but are subject to failure at uncertain
times and in uncertain ways. A principal goal of modern reliability analysis is to design
methods for appraising the reliability of such systems. Current thinking is that baseline
requirements for such an analysis are a probabilistic model of the stochastic behavior of a
complex system, augmented with numerical appraisal of at least some properties of the joint
probability law that the model entails. As White [1] points out:
"The reliability goals [of such systems] are too high to be established by natural
life testing which means the probability of system failure must be computed from
mathematical models that capture the essential elements of fault occurrence and
system fault recovery" 1
The process of stochastic modeling of a complex system often begins with a formal statement of the primitive postulates that govern the behavior of the system.
Postulates or
assumptions may embrace both the logical structure of observable events and a description
of a probability law that identifies the probability of occurrence of all logically possible system
events. If the physics of the system are properly modeled, a well-posed mathematical model emerges that may reasonably be called a "fully specified" model; i.e., the probability of any
observable event is, in principle, computable if both initial conditions (inputs) and all model
parameters are either known with certainty or assigned a priori probability distributions.
When the system is large, construction of a fully specified model may be a formidable task
and, on occasion, an impossible one. Even if a fully specified model can be constructed, it
may be too complex to efficiently compute event probabilities. The tension between realism
and computability never disappears. Global climate change models are good examples. 2
A very different approach to modeling the probabilistic behavior of a complex system is
to model the structure of the system so as to identify all logically obtainable events and to
not model a probability law that governs event occurrence. In place of assigning a specific
probability law to a sample space that contains all obtainable system events, ask system
experts to assign numerical probabilities to occurrences of some events that lie within their
domains of expertise. This recasts the problem and changes the nature and the focus of the
1White (1986) shows that system recovery can be adequately described by its first two moments when faults occur independently at a low constant rate, the system quickly recovers from all faults and fault recoveries are independent semi-Markov
processes.
2See Valverde (1997), Uncertain Inference, Estimation and Decision Making in Integrated Assessments of Global Climate Change, for example.
computational task.
For a fully specified model, the task is:
Given initial conditions, a joint probability law for events and either certain or
probabilistic knowledge of model parameters, calculate the probability of occurrence of one or more critical events.
The task for the alternative is:
Given the logical structure of obtainable events and some numerical assessments of
probabilities of these events, compute bounds on the probabilities of one or more
critical events that are not directly assessed. 3
Any method for processing complex system probability assessments should:
* Allow determination of the coherence or incoherence of assessments,
* Enable computation of coherent bounds on probabilities of events not directly assessed,
* Allow for efficient revision of bounds in light of additional information, whether in the form of further expert assessments or of observation of the occurrence of an event or its complement,
* Not contradict Bayesian conditionalization,4
* Be computationally tractable for realistic problems of moderate to large size, and
* Be based on sound assumptions about the qualitative probabilistic structure assigned to uncertain system events.
Bayesian conditionalization is the linchpin of Bayesian inference. It means that revision
of probability judgments in light of new information should be done in accord with Bayes'
Theorem. Pearl (1988) says three principles characterize Bayesian reasoning:
* Reliance on a complete probabilistic model of observable quantities,
* Willingness to incorporate subjective judgments as an expedient substitute for empirical
data, and
* The use of probabilistic conditionalization [Bayes' Theorem] for updating beliefs in the
light of new observations.
He offers the advice that:
3Of course, a hybrid of both approaches is possible. We have chosen descriptive extremes to emphasize differences between reasoning with incomplete knowledge through the lens of de Finetti's Fundamental Theorem of Probability and more traditional approaches to modeling of stochastic systems.
4Pearl (1988) defines Bayesian conditionalization as follows: "Probability theory adopts the autoepistemic phrase '...given that what I know is C' as a primitive of the language. Syntactically, this is denoted by placing C behind the conditioning bar in a statement such as P(A | C) = p. This statement combines the notions of knowledge and belief by attributing to A a degree of belief p, given the knowledge of C. C is also called the context of the belief in A, and the notation P(A | C) is called Bayes conditionalization. Thomas Bayes (1702-1761) made his main contribution to the science of probability by associating the English phrase '...given that I know C' with the now-famous formula P(A | C) = P(A, C)/P(C) [Bayes 1763], which has become a definition of conditional probabilities. It is by virtue of Bayes conditionalization that probability theory facilitates nonmonotonic reasoning, i.e., reasoning involving retraction of previous conclusions."
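The formula in this footnote can be checked with a short sketch; the joint distribution below is hypothetical, chosen only to illustrate P(A | C) = P(A, C)/P(C):

```python
from fractions import Fraction as F

# Hypothetical joint distribution over two binary events A and C,
# indexed by (a, c) with 1 = occurs, 0 = does not occur.
joint = {
    (1, 1): F(3, 10), (1, 0): F(1, 10),
    (0, 1): F(2, 10), (0, 0): F(4, 10),
}

def conditional(joint, a, c):
    """Bayes conditionalization: P(A = a | C = c) = P(A = a, C = c) / P(C = c)."""
    p_c = sum(p for (ai, ci), p in joint.items() if ci == c)
    return joint[(a, c)] / p_c

print(conditional(joint, 1, 1))  # P(A=1, C=1)/P(C=1) = (3/10)/(1/2) = 3/5
```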
To fully appreciate the role of the Bayesian approach in artificial intelligence, one
should consider situations in which we do not have complete information to form a
probability model. In such situations the Bayesian strategy encourages the agent
to complete the model by making reasonable assumptions, rather than abandoning
the useful device of conditionalization. 5
Certainly Pearl's advice is sound in principle. If a complete probabilistic model is within
reach, then the advantages are worth the modest effort required. Unfortunately, Pearl's
criteria are not feasible in all circumstances. If available knowledge is not sufficient to make
a complete probabilistic model feasible, de Finetti's Fundamental Theorem of Probability
can be the foundation for an assessment paradigm having the desired properties.
De Finetti's Fundamental Theorem of Probability [FTP] (1937, 1949, 1974) provides a
framework for computing bounds on the probability of an event in accord with the above
guidelines when this probability cannot be computed directly from assessments and when
an assessor does not have sufficient information to specify a complete probabilistic model of
observables.
The contrast between the perspectives of de Finetti and Pearl on these issues is seen
throughout the field of inductive inference. Bayesian network algorithms provide tractable
mechanisms for inference when knowledge of the probabilistic structure of a system is complete; i.e. the joint probability law governing all observable uncertain quantities is fully specified and a probability law can be assigned to model parameters that are not known with
certainty. Artificial intelligence inference networks and ampliative inference can be viewed as
attempts to develop a rational method of constructing a complete probabilistic model when
the modeler's knowledge of probabilistic structure is incomplete.
De Finetti's paradigm lies at the intersection of cognitive science, artificial intelligence
and operations research. Viewed as a method of inductive inference, the FTP falls within the
5Pearl (1988), p. 382.
domains of artificial intelligence and cognitive science. Computational mechanisms needed
to implement the FTP include linear programming, mixed-integer programming and the
representation of Boolean logic by integer programs, so the FTP is part of operations research as well. Lad, Dickey and Rahman (1990) fully exploit the connection between the FTP and linear programming and show how to use the FTP to derive bounds on the probability of a critical event that has not been directly assessed.6 In addition, they demonstrate that computation of conditional probability bounds can be done by solving a linear fractional programming problem.
The relationship of Boole's arithmetic representation of propositional logic and the probability calculus was independently discovered/re-discovered during the 20th century by several
individuals. Components of de Finetti's FTP appear in equivalent forms in the work of Boole
(1847), Good (1950), Hailperin (1965, 1976), Nilsson (1986, 1993) and Smith (1961) among
others. The work of Nilsson and Quinlan is of particular interest to us as they both focus
on computational aspects of inductive inference in the absence of a complete probabilistic
model. Nilsson (1986) independently developed a probabilistic logic that, in the context
of probabilistic entailment, is equivalent to one version of de Finetti's theorem. Quinlan's
(1983) INFERNO system can be viewed as a local approximation to probabilistic logic and
to inference based on the FTP. 7
The FTP provides some important and unexpected insights. For example, Lad, Dickey
and Rahman use the FTP to develop extensions of classical probabilistic inequalities (Kolmogorov, Bienaymé-Chebychev) and show that these inequalities possess some surprising
features, overlooked in the past. Lad (1996) makes the "bold" statement that:
6As Lad (1996) points out, Bruno and Gilio (1980) were the first to show that the simplest version of the FTP can be represented as a linear program.
7Pearl (1988), Nilsson (1993).
"Every inequality of probability theory is a special case of the fundamental theorem of prevision."8
By recasting the FTP as a linear (or linear fractional) programming problem we benefit
from the extensive theory of linear inequality systems and the many types of algorithms used to efficiently solve such systems. For instance, the simplex algorithm for linear programming (LP) will find the optimal solution or determine that the linear program is infeasible; however, the computational requirements of the simplex algorithm tend to grow linearly in the number of decision variables. The work of Lad, Dickey and Rahman is primarily of conceptual value because the FTP prescribes linear programming problems with exponentially many decision variables, which are intractable for conventional linear programming algorithms such as the simplex method. If we provide assessments of as few as 100 events, then the linear program prescribed by the FTP could have up to 2^100 decision variables (more than the number of elementary particles in the universe), and so is not directly solvable by the simplex algorithm or by any other direct method!
In Part I of this paper, we do six things. First, we begin with a discussion of operational
subjectivist terminology to set the stage for the statement of the FTP as presented by
de Finetti and by Lad, Dickey and Rahman. The language that they employ differs in
substantive ways from the language commonly employed by probabilists and statisticians.
Second, we present the FTP along with some simple examples that provide a concrete
illustration of how the FTP interacts with problem logic.
Third, we study more realistic problems and show that even fairly small examples can
quickly overwhelm both conventional linear programming algorithms and an assessor's ability
to manually construct FTP linear programs from system logic. (If an assessor provides N
assessments, the dimension of the linear programming constraint matrix is N × O(2^N).)
8Lad (1996), p. 112.
Fourth, we discuss an automated method for mapping logical relations among system
events into algebraic linear inequalities so as to fit them into a linear programming framework.
It is at least as difficult a task to construct FTP linear programs for large, complex problems
as it is to solve them because conventional methods for constructing linear programs of this
type are overwhelmed even more quickly than are the algorithms for solving them.
Fifth, we show how to construct an FTP linear program from a specification of system logic
and probability assessments, and how to solve the linear program using the RIP [Related
Integer Program] algorithm. For problems in which all events are dichotomous, the RIP
algorithm requires computational resources proportional to the number of events for which
assessments have been made rather than proportional to the number of decision variables; that is, the computational requirements grow as O(N) rather than as O(2^N). This is of fundamental
importance for practical computation.
Sixth, we end Part I with a comparison of the performance of standard linear program
algorithms and the RIP algorithm applied to examples of size up to N = 42.
Lad, Dickey and Rahman declare that the FTP " ... support[s] the process of asserting
bounds on previsions as an operationally meaningful representation of uncertain knowledge"9. In addition to possessing knowledge of logical relations among quantities and a
willingness to assess probabilities for uncertain quantities, an assessor may also possess partial knowledge of probabilistic dependence (independence) relations that govern these events
or quantities. The assertion that some set of uncertain quantities are exchangeable is an
example. Exchangeability imposes strong symmetries on the probability law of a set of uncertain quantities that are easily incorporated into the FTP framework and, in addition,
9Lad, Dickey and Rahman (1990), p. 20.
drastically reduces dimensionality by introducing equalities among elements of the FTP
linear programming vector of decision variables. 10
More general assertions about qualitative probabilistic structure lead to new problems. In
a subsequent paper we will indeed show that conditional dependence relations can lead to
systems of constraint equations composed of multi-linear forms in elements of the FTP linear
programming decision vector. The standard linear (or linear fractional) programming form
of the FTP no longer fits. Fortunately, it is possible to recast assertions about conditional
dependence relations among uncertain quantities into a form compatible with the algorithm
that we explain here.
2 The Issue of Terminology
De Finetti deliberately adopted a "terminology of prevision" that differs in important
ways from the lingua franca of statisticians and probabilists.
The motivation for novel
terminology stems from a fundamental difference between de Finetti's theory of probability
and that adopted by most probabilists. For example, the Kolmogorov axiomatization of
probability begins with a definition of a sample space and everything else is defined relative
to it. For de Finetti, operational measurement of degree of belief is the principal lever for
defining probabilistic objects. Probability begins with personal judgments about uncertain
quantities. A "sample space" emerges from operational measurement of degrees of beliefs
about uncertain quantities of interest to an agent. According to Lad, Dickey and Rahman
(1990):
"Although the terminology may appear to the mathematically trained reader as
needlessly novel, we submit that it actually provides a simpler and more accurate
10See Lad (1996), pp. 181-182, and Lad, Dickey and Rahman (1990), p. 23.
practical language than that which, admittedly, has become standard in technical
communications in statistical theory. The conceptual differences are deep, and our
language is intended faithfully to portray them. Already, the terminology of prevision has found favor among some theorists, such as Goldstein (1981, 1983), who
recognize the unification it provides for the concepts of probability and expectation
(usually treated as distinct)."
"Quantity, realm, constituents and prevision" are familiar objects with different names.
As a readers' guide, we list definitions of terms needed to understand the FTP as presented
by de Finetti.11
Quantity X = the numerical outcome of a particular operationally defined measurement.
Realm of a Quantity X = the set R(X) of all numbers that are possible results of performing an operational measurement.
Size of Realm R(X) = the number of elements in R(X).
Event E = a quantity E whose realm is R(E) = {0, 1}.
Possible Event = an event that is neither impossible nor certain.
There may be logical relations among events. Any method designed to bound the probability of an event whose probability is not directly computable from assessments must account for these relations. If there are N events, logical relations among them reduce the number of constituents from 2^N to some smaller number S(N).
Incompatible events = a set of N events is incompatible if their sum cannot exceed 1.
Exhaustive events = a set of N events is exhaustive if their sum cannot be less than 1.
Partition = a set of N events constitutes a partition if they are both exhaustive and incompatible.
11Op. cit. (1990), p. 21.
Constituents = individual events in the partition of a set of events.
S(N) = the size of the partition generated by any N events.
Notice that the word event is reserved for a quantity whose realm is {0, 1}.
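The reduction from 2^N possible joint outcomes to S(N) constituents can be made concrete with a small sketch; the three events and the logical relation below are illustrative assumptions, not an example from the paper:

```python
from itertools import product

# Hypothetical logical relation: event X3 occurs exactly when X1 and X2
# both occur (X3 = X1 AND X2).
def consistent(x1, x2, x3):
    return x3 == (x1 and x2)

# Constituents are the joint truth assignments that the logic permits.
constituents = [x for x in product((0, 1), repeat=3) if consistent(*x)]

print(len(constituents))  # S(3) = 4, reduced from 2**3 = 8
```

Each admissible pair (x1, x2) forces the value of x3, so the logical relation cuts the eight candidate assignments down to four constituents.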
The concept of prevision is central. It is the subjectivist version of expectation, a label
for the operational measurement of the value of a lottery. The prevision of an event is
the probability for that event. While all events are uncertain quantities, not all uncertain
quantities are events. Consequently, the prevision of an uncertain quantity that is not an
event may not be a probability.
Definition:12 Let XN = (X1, ..., XN)^t be any vector of quantities with bounded discrete realm R(XN). In asserting your state of knowledge about XN, and your state of utility valuation, your prevision configuration for XN is the vector of numbers P(XN) = (P(X1), ..., P(XN))^t and the number S > 0 which you specify with the understanding that you are thereby asserting your willingness to engage in any transaction that would yield you a net gain of amount sN^t [XN - P(XN)], as long as the vector of scale factors sN is one for which |sN^t [xN - P(XN)]| ≤ S for every xN ∈ R(XN). Individually, P(XN) is called your prevision for XN and S is called the scale of your maximum stake.
As Lad (1990) points out, this definition of prevision unifies probability and expectation
and, in addition, characterizes prevision as a linear operator on functions of quantities.
A prevision for a vector of quantities is coherent "as long as you do not assert by it your
indifference to some transaction that would surely yield you a loss for every possible outcome
value of the possible quantities in R(XN)." 13
Coherence, a cornerstone of de Finetti's theory of probability, is the tip of an iceberg-sized difference between de Finetti's approach to probability and a traditional mathematical theory of probability such as that based on Kolmogorov's well-known axiom system.14
12See Lad (1996), pp. 60-61, for this definition and discussion of it.
13Lad, Dickey & Rahman (1990), p. 23.
14 In de Finetti's words: "The basic differences are as follows: 1) We REJECT the idea of 'atomic events', and hence the
systematic interpretation of events as sets; we REJECT a ready-made field of events (a Procrustean bed!) which imposes
constraints on us; we REJECT any kind of restriction (such as, for example, that the events one can consider at some given
3 The FTP
The FTP provides a mechanism for computation of upper and lower bounds on the
prevision of an uncertain quantity when the prevision of that quantity cannot be directly
calculated from previsions provided by one or more assessors:
"After investigation, we interpret the 'fundamental theorem of probability' to support the process of asserting bounds on previsions as an operationally meaningful
representation of uncertain knowledge."15
The theorem accounts systematically for logical relations among events as well as for
possibly incomplete knowledge of a joint probability law for quantities. The spirit of the
operational subjectivist approach is, indeed, that a subjectivist assessment which constitutes
a complete specification of such a probability law is the exception rather than the rule. On
its face, the FTP may seem no more than a clever re-interpretation of the law of total
probability. Lad (1996) tells us what is fundamental about the FTP:16
" ... the fundamental theorem of probability characterizes the coherent implications of every prevision you have assessed for every other vector of quantities that
you have not yet directly assessed. Seen in this way, the fundamental theorem of
prevision is an extension of the famous closure result of deductive logic, described
by Hilbert and Ackermann in their text Mathematical Logic (1950, Section 1.9) as
"the systematic survey of all propositions that can be deduced from any set of axioms." The fundamental theorem of prevision is a truly comprehensive statement
of the coherent logic of uncertain knowledge."
Despite the exhaustive scope of this result, it is not widely known among probabilists and statisticians today. Indeed, de Finetti's (1974a, Chap. 3.10) designation
of his theorem as "the fundamental theorem of probability" has appeared absurd
to many who have bothered to read it without really studying his larger views.
As far as I can tell, no one else ever has designated any theorem of probability
moment, or in some given problem, should form a field); 2) We REJECT the idea of a unique [probability distribution] attached
once and for all to the field of events under consideration...; 3) Our approach deals directly with random quantities and
linear operations on them (events being included as a special case); we thus avoid the complications which arise when one
deals with the less convenient Boolean operations; 4) We REJECT countable additivity (i.e. σ-additivity); 5) We REJECT
the transformation of the theorem of compound probabilities into a definition of conditional probability, and we also REJECT
the latter being made conditional on the assumption that [the probability of the conditioning event or quantity is not equal to
0]; by virtue of the exclusions we have made in 4) and 5), the construction of a complete theory of zero probability becomes
possible."
15Lad, Dickey and Rahman (1990), p. 20.
16Lad (1996), pp. 111-112. This version of the FTP is easily extended to include computation of bounds on conditional previsions and to incorporate numerical assessments of conditional previsions. Rather than presenting a formal statement of these extensions now, we show how such computations are done in the course of discussion of the RIP algorithm.
as "the fundamental theorem." If pressed for a response today, I think that most
professionals would designate as the fundamental theorem of probability either the
law of large numbers or possibly the central limit theorem. There would be a few
other contenders mentioned, and many respondents would deny that there is any
fundamental theorem."
Here is Part I of the FTP:
Let XN+1 denote a vector of any N + 1 quantities that interest you, S(N + 1) be the size of the realm R(XN+1), and RN+1 be the (N + 1) x S(N + 1) realm matrix for XN+1. Partition RN+1 into two parts: RN (composed of the first N rows of RN+1) and rN+1 (composed of the last row of RN+1). Let 0 and 1 be S(N + 1) x 1 column vectors of 0's and 1's respectively.

FTP I: Suppose your prevision for N uncertain quantities is P(XN) = pN. Let XN+1 be any further uncertain quantity. Then your prevision for XN+1 is coherent if and only if your further assertion of P(XN+1) for XN+1 lies within the interval [l, u], where l and u are defined as solutions to the following linear programming problems:

l = min rN+1 q and u = max rN+1 q

subject to:

RN q = pN, 1^t q = 1 and q ≥ 0.

If the feasible set of solutions is empty, then P(XN) = pN is incoherent.
While a firm believer in the operational subjectivist approach may object, we shall translate the above statement of the FTP into standard probabilistic terms. This makes it immediately transparent to a reader not familiar with the operational subjectivist terminology at the
cost of failing to emphasize "deep conceptual differences" with standard versions of axiomatic
probability theory, and in particular with standard theory for finite sample spaces. Suppose
that ΩN is a finite sample space composed of S(N) atomic events Ek, k = 1, ..., S(N), and that {E1, ..., ES(N)} is a partition of ΩN. Define X to be a union of some subset of the set of atomic events of ΩN. Then X is a generic event or, more simply, an event.
FTP I': Let XN = (X1, ..., XN)^t be an N x 1 vector of events, P(XN) = (p1, ..., pN)^t = pN be a probability assessment for XN, and let XN+1 be any other event. Together with logical relations among the N events composing XN and XN+1, the event vector XN+1 = (X1, ..., XN, XN+1)^t generates a sample space ΩS(N+1) of size S(N + 1) = O(2^N). Let EN+1 = (E1, ..., ES(N+1))^t be a vector of atomic events composing ΩS(N+1). Then there is an (N + 1) x S(N + 1) matrix RN+1 such that

XN+1 = RN+1 EN+1.

Denote the first N rows of RN+1 by RN and the (N + 1)st row of RN+1 by rN+1. An extended vector of probability assessments P(XN+1) = (pN^t, pN+1)^t for XN+1 is coherent when l ≤ pN+1 ≤ u, where l and u are solutions to the linear programs:

Find the minimum l and the maximum u of rN+1 q subject to:

RN q = pN, q^t 1 = 1 and q ≥ 0.

The probability vector P(XN) = pN is coherent if and only if the feasible region for these LP problems is non-empty.
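As a concrete illustration (the numbers are hypothetical, not an example from the paper), suppose N = 2 with assessed probabilities P(A) = 0.6 and P(B) = 0.7, and let the unassessed event be A AND B. Since a linear objective over a bounded polytope attains its extremes at vertices, a small sketch can solve both FTP linear programs by enumerating basic feasible solutions with exact rational arithmetic:

```python
from fractions import Fraction as F
from itertools import combinations, product

# Columns of the realm matrix: the S = 4 constituents (a, b) of events A and B.
atoms = list(product((1, 0), repeat=2))          # (1,1), (1,0), (0,1), (0,0)
rows = [
    [a for a, b in atoms],                        # indicator row for A
    [b for a, b in atoms],                        # indicator row for B
    [1] * len(atoms),                             # the constraint 1^t q = 1
]
rhs = [F(6, 10), F(7, 10), F(1)]                  # P(A), P(B), total mass
obj = [a * b for a, b in atoms]                   # r_{N+1}: indicator of A AND B

def solve(cols):
    """Gaussian elimination on the 3x3 system restricted to the basic columns."""
    m = [[F(rows[i][j]) for j in cols] + [rhs[i]] for i in range(3)]
    for i in range(3):
        piv = next((r for r in range(i, 3) if m[r][i] != 0), None)
        if piv is None:
            return None                           # singular basis: skip it
        m[i], m[piv] = m[piv], m[i]
        m[i] = [v / m[i][i] for v in m[i]]
        for r in range(3):
            if r != i and m[r][i] != 0:
                m[r] = [v - m[r][i] * w for v, w in zip(m[r], m[i])]
    return [m[i][3] for i in range(3)]

values = []
for cols in combinations(range(len(atoms)), 3):   # candidate basic solutions
    q = solve(cols)
    if q is not None and all(v >= 0 for v in q):  # keep only feasible vertices
        values.append(sum(obj[j] * v for j, v in zip(cols, q)))

print(min(values), max(values))                   # 3/10 3/5
```

The bounds agree with the Fréchet bounds max(0, 0.6 + 0.7 - 1) = 0.3 and min(0.6, 0.7) = 0.6; any assertion of P(A AND B) outside [0.3, 0.6] would be incoherent.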
3.1 A Fault Tree Example
A fault tree is a Boolean logic diagram that specifies the failure logic of a system in a
hierarchical fashion. Because of their explicit logical structure and easy interpretation, fault
trees are useful templates for encoding probabilistic information about system failures. Fault
trees specify higher-order failures as logical functions of lower-order failure events. Typically,
the root of a fault tree is the failure event of interest. A mechanical system can be viewed
as a collection of components organized into a hierarchy of subsystems. Consider a nuclear
reactor sprinkler system designed to cool the reactor core in the event of an emergency. The
root event of the fault tree for this system is whether or not the system provides sufficient
coolant to the reactor core. The basic failure events, the leaves of the fault tree, are the
states of the individual components such as valves, pumps and pipes. The leaf events are
called basic failure events or elementary events. Intermediate nodes are logical functions
of the leaves of the tree and other intermediate failures. Intermediate nodes correspond to
subsystems of the system and the logical functions that define intermediate nodes mirror the
failure structure of the system.17
17Frankel (1988).
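The hierarchy of AND/OR gates can be sketched in a few lines; the two-pump, one-valve system below is a toy assumption for illustration, not the CSIS system:

```python
from itertools import product

# Toy fault tree: two redundant pumps feed one valve.
# Component names and the gate structure are illustrative assumptions.

def system_fails(pump_a, pump_b, valve):
    """Root event as a Boolean function of basic (leaf) failure events."""
    both_pumps_fail = pump_a and pump_b   # AND gate: redundancy defeats a single pump failure
    return valve or both_pumps_fail       # OR gate at the root

# Enumerate leaf states to list the combinations that cause the root failure.
failure_modes = [s for s in product((False, True), repeat=3) if system_fails(*s)]
print(len(failure_modes))  # 5 of the 8 leaf-state combinations fail the system
```

Note that the tree fixes only the logical dependencies; it says nothing about the joint probability law of the leaf events, which is exactly the gap the FTP framework addresses.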
In fault tree analysis, probabilities of elementary failures are presumed to be readily
quantifiable using reliability data, expert subjective judgement or a combination of both.18
Working backwards up the tree to the root event, probability assessments for elementary
events are used to compute probabilities of non-elementary events. Without further
specification of the structure of a joint probability law for events, a set of probabilities
attached to the leaves of a fault tree is not sufficient to specify completely a probability law
for all possible events. To this end, fault tree analysis proceeds by making assumptions about
the probabilistic structure of the system in order to force the joint probability distribution for
events to be complete and coherent. Mutual independence of elementary events is the most
commonly employed assumption.19 While this assumption greatly simplifies computation of the fault tree's root failure, it has serious negative implications for the accuracy of the resulting estimate. For many systems, the most likely causes of failure are common-mode failures.20 A common-mode failure is a set of elementary failures which are not stochastically
independent and are jointly sufficient to cause system failure. In one safety system for a
nuclear reactor the class of common-mode failures was more than one order of magnitude
more likely than the next most likely class of failures!21 Identifying common-mode failures
and appraising their likelihood of occurrence is a topic of major interest in reliability analysis.
Figure 3.1 shows the logic diagram of a CSIS safety system in a nuclear reactor. In a
reactor accident the system pumps solution from the storage tank and injects it into the reactor containment vessel. The solution serves to control the temperature of the containment
environment and to absorb radiation. As such it is a critical component in the accident
18Frankel, op. cit.; U.S. Nuclear Regulatory Commission (1981).
19See U.S. Nuclear Regulatory Commission (op. cit.) and Karimi, Rasmussen and Wolf (1980).
20Henley and Kumamoto (1981) and Pages and Gondran (1986).
21U.S. Nuclear Regulatory Commission (1976).
Figure 3.1: The logic diagram of the CSIS safety system in a nuclear reactor.
control procedures.
The system components are pipes, motorized pumps, valves, filters,
spray-nozzles, and associated control equipment. There are thousands of components in the
system. Ignoring passive parts such as pipes which have failure rates several orders of magnitude less than active components such as valves and pumps, there are still over a hundred
components whose failure could contribute to a serious accident. 22 These components are
mutually dependent on each other in many ways. There are logical dependencies in which
the success or failure of a component is a logical relation of some subset of components. In
the CSIS system, the nozzles which spray the solution into the containment fail if any of the
upstream components fail to provide solution in sufficient quantity at sufficient pressure to
the nozzles. There are also stochastic dependencies. Consider the failure of one of the 368
spray nozzles due to clogging. Since the remaining 367 nozzles are subject to the same environmental conditions (e.g., manufacturing process, construction materials, contaminated
solution, maintenance procedures, etc.), the failure of one of the nozzles makes the failure
of the others more likely. Specifying logical dependencies among components in the CSIS
is tedious but straightforward.
Documenting system failure logic is standard practice for
complex systems with high potential for public risk such as nuclear reactors or hazardous
materials processing facilities.
Specifying probabilistic dependencies is a difficult task. In addition to the large number of
assessments required, many of the assessments are hard to make. An expert may be required
to assess events which are unfamiliar or have extremely small probabilities.23 For instance,
passive failures in nuclear power plants have been assigned probabilities of order 10^-9 per
22 United States Nuclear Regulatory Commission (1975).
23 von Winterfeldt and Edwards (1986).
reactor per year.24 Such assessments are subject to well-known cognitive biases.25 The event
that a safety system fails is often a disjunction of constituent events and, as Tversky and
Kahneman (1974) show, assessors routinely underestimate the probability of disjunctions
and overestimate the probability of conjunctions.
To illustrate Part I of the FTP we use an example first presented by Dickey (1990).26
In what follows we shall define x as a generic realization of an uncertain event X and denote
the joint occurrence of Xi = xi, i = 1, 2, ..., I by x1 x2 ... xI. The complement of Xi will be
displayed as X̄i and the complement of xi as x̄i.
Table 3.1 defines the set of events and Figure 3.2 is a display of the probability tree for
this safety system example.

Table 3.1: The safety system example. (Adapted from Dickey, 1990.)

   Event   Description                    Relation                 Probabilities
   X1      Sprinkler Head Clogs                                    [.007, .010]
   X2      Sprinkler Head Bursts                                   .005
   X3      Insufficient Water Pressure    X2 implies X̄3            .015
   X4      The System Fails               X4 = X1 ∪ X2 ∪ X3

24 United States Nuclear Regulatory Commission (1975).
25 Kahneman, Slovic, and Tversky (1982).
26 Dickey, Presentation to NBER Seminar on Bayesian Statistics and Econometrics, Cambridge, MA, October 1990.
Figure 3.2: The probability tree for the safety system example.
There are 4 generic events, X1, ..., X4 as shown in Table 3.1. If we desire to answer any
probability question about X1, ..., X4, then we would need to make 2^4 = 16 assessments. On
the other hand, if we are only interested in the probability that the system will fail, P(X4),
then how many assessments must we make? The system will fail just in case one or more of
X1, X2 or X3 occurs. If we cannot assess P(X4) directly, then we can economically calculate
P(X4) indirectly using three assessments of the form

   P(X̄4) = P(X̄1) P(X̄2 | X̄1) P(X̄3 | X̄1 X̄2),

so that P(X4) = 1 - P(X̄4).
In this example the number of assessments needed to determine an answer to our question
is quite small. This is not true in general, and in a worst case scenario we would be forced
to assess every element in the joint distribution. Notice that it is logically possible either
to assess the joint distribution directly or to assess it indirectly via a set of conditional
probabilities. Direct assessment requires an assessor to assign probabilities to terminal nodes
(leaves) of the probability tree in Figure 3.2. Alternatively conditional probability assessment
corresponds to assigning conditional probabilities to graph edges in Figure 3.2.
Human experts are usually more comfortable making conditional probability assessments.27
Our principal interest is in the probability P(X4) that this system will fail. Suppose that
we do not assess P(X4) directly, but rely solely on assessments of X1, X2 and X3. Absent
logical relations among events, we have 2^4 = 16 possibilities so that:

27 Heckerman (1990), Pearl (1988), Neapolitan (1990).
          1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0
   R4 =   1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0          (3.1)
          1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0
          1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0

However, in this example, events X1, X2 and X3 are logically dependent. The third
column of Table 3.1 displays logical dependence relations among elements of the vector
X = (X1, X2, X3, X4) of events in algebraic form, suitable for immediate use in LP format.28
The relation X3 ≤ 1 - X2 says that at most one of X2 = 1 and X3 = 1 can obtain; the
relation X4 = 1 - (1 - X1)(1 - X2)(1 - X3) says that X4 = 1 obtains if at least one of
X1 = 1, X2 = 1 or X3 = 1 occurs. Therefore, the realm matrix becomes:
            1 1 1 0 0 0
   R4,6 =   1 0 0 1 0 0          (3.2)
            0 1 0 0 1 0
            1 1 1 1 1 0

so that the vector of possible constituent events is

   E = (E1, E2, E3, E4, E5, E6)^t
     = (x1 x2 x̄3 x4, x1 x̄2 x3 x4, x1 x̄2 x̄3 x4, x̄1 x2 x̄3 x4, x̄1 x̄2 x3 x4, x̄1 x̄2 x̄3 x̄4)^t.

The columns of R4,6 denote logically realizable joint events. Notice that the joint event
x1 x2 x3 x4 does not appear in either E or in R4,6 because it violates the logical relation
prohibiting x2 and x3 from both occurring. If we define ri to be the ith row of R4,6, then
the previsions (marginal probabilities)
28 For problems with many events and complex logical relations among them, we need an efficient algorithm for translating Boolean relations into algebraic relations. We return to this in section 4.
   P(Xi = xi) ≡ P(xi) = ri q.

Namely, via the law of total probability P(Xi) is a sum of some elements of the vector
q = (q1, q2, ..., q6)^t of probabilities of logically possible constituent events. For example,
P(X1) = q1 + q2 + q3. Given the assessments shown in the last column of Table 3.1, the FTP
provides bounds on P(X4) by solving the following two LP problems. Let ri be the ith row
of R4,6.

Find min and max r4 q subject to:

   .007 ≤ r1 q ≤ .010,
   r2 q = .005,
   r3 q = .015,
   q ≥ 0;

and, with 1^t = (1, 1, 1, 1, 1, 1),

   1^t q = 1.
Using a standard LP package, one finds that min r4 q = .02 and max r4 q = .03, so that

   .02 ≤ P(X4) ≤ .03.
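The two LPs above are small enough to check with any off-the-shelf solver. The following sketch is our illustration, not the paper's software; it assumes SciPy's linprog is available:

```python
# Sketch: reproducing the FTP bounds .02 <= P(X4) <= .03 with an
# off-the-shelf LP solver (SciPy's linprog; not the paper's code).
import numpy as np
from scipy.optimize import linprog

# Rows of the realm matrix R_{4,6} in (3.2): r1..r4 for X1..X4.
R = np.array([
    [1, 1, 1, 0, 0, 0],   # r1 (X1)
    [1, 0, 0, 1, 0, 0],   # r2 (X2)
    [0, 1, 0, 0, 1, 0],   # r3 (X3)
    [1, 1, 1, 1, 1, 0],   # r4 (X4 = X1 or X2 or X3)
])

# Equality assessments: r2 q = .005, r3 q = .015, and 1'q = 1.
A_eq = np.vstack([R[1], R[2], np.ones(6)])
b_eq = [0.005, 0.015, 1.0]
# Interval assessment .007 <= r1 q <= .010 written as A_ub q <= b_ub.
A_ub = np.vstack([R[0], -R[0]])
b_ub = [0.010, -0.007]

lo = linprog(R[3], A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
             bounds=[(0, None)] * 6)      # min r4 q
hi = linprog(-R[3], A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
             bounds=[(0, None)] * 6)      # max r4 q (negated objective)
print(round(lo.fun, 3), round(-hi.fun, 3))  # 0.02 0.03
```

The lower bound is attained by nesting X1 inside X2 ∪ X3 as far as the assessments allow; the upper bound by making X1, X2 and X3 disjoint.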
This simple example illustrates how the FTP works. As a second example, consider the
simple fault tree shown in Figure 3.3. This fault tree represents a system containing two
parallel redundant subsystems, composed of four elementary and two common-mode failure
types. The entire system fails (X 11l) if and only if both parallel subsystems fail (X 9 and
X 10 ). Common-mode Failure 1 (X 5 ) appears in both parallel subsystems and is sufficient to
cause the entire system to fail. Common-mode Failure 2 (X 6 ) also appears in both parallel
subsystems, but is not a sufficient cause for the system to fail. Common-mode Failure 2 will
21
cause the system to fail if and only if both Basic Failure 2-A and Basic Failure 2-B occur as
well.
Now suppose that we have probability assessments for events X1, ..., X6 in the fault tree
and wish to compute bounds on probabilities of composite events X7, ..., X11. It is not
necessary in this case to make any assumptions regarding probabilistic dependencies among
these eleven events in order to apply the simplest version of the FTP, FTP I. Scenario 1 in
Table 3.2 displays bounds on P(X7), ..., P(X11) dictated by the unconditional probability
assessments for X1, ..., X6 shown in the first column of the table.

Scenarios 2a), b) and c) of the table show bounds that arise when conditional probability
assessments are added to the unconditional assessments shown in the first column of the
table. A numerical assessment of a conditional probability such as P(X1|X3) = .020 leads
to a linear equality among elements of q and so fits nicely into the framework of FTP I.29
As the number of events N becomes large, solving FTP linear programs with standard
LP algorithms quickly becomes impossible, because the number of decision variables grows
exponentially in N. A different tactical approach is therefore required. In the next section we
present a column-generation algorithm that enables us to solve such LP problems efficiently
and exactly in reasonable time even when the realm matrix is huge. We couple this algorithm
with a method for automatic translation of logical relations into algebraic relations that are
compatible with an LP representation of the realm matrix.
29 Lad, Rahman and Dickey (1990) show that probability bounds for a conditional target event can be found by solving a linear fractional program. See section 4 for a discussion of the numerical example shown in Scenario 3 of Table 3.2.
Figure 3.3: A simple fault tree.
Table 3.2: Probability assessments and bounds determined by the FTP for the simple fault tree.

Unconditional assessments (all scenarios): P(x1) = .010, P(x2) = .020, P(x3) = .010,
P(x4) = .020, P(x5) = .030, P(x6) = .050.

Scenario 1. Conditional assessments: none.
   Computed bounds: P(x7) = [0.0, .020], P(x8) = [0.0, .020], P(x9) = [.030, .060],
   P(x10) = [.030, .060], P(x11) = [.030, .060].

Scenario 2(a). Conditional assessments: P(x2|x6) = .300, P(x4|x6) = .200.
   Computed bounds: P(x7) = [.015, .015], P(x8) = [.010, .010], P(x9) = [.030, .055],
   P(x10) = [.030, .050], P(x11) = [.030, .050].

Scenario 2(b). Conditional assessments: P(x1|x6) = .200, P(x3|x6) = .100.
   Computed bounds: P(x7) = [0.0, .020], P(x8) = [0.0, .020], P(x9) = [.030, .060],
   P(x10) = [.030, .060], P(x11) = [.030, .055].

Scenario 2(c). Conditional assessments: P(x1|x3) = P(x3|x1) = .020,
   P(x2|x4) = P(x4|x2) = .100, P(x1|x2) = P(x2|x1) = .200.
   Computed bounds: P(x7) = [0.0, .020], P(x8) = [0.0, .020], P(x9) = [.030, .056],
   P(x10) = [.030, .056], P(x11) = [.030, .044].

Scenario 3. Conditional assessments: P(x2|x4) = .020, P(x2|x6) = .020, P(x4|x6) = .020.
   Computed bounds: P(x7) = .001, P(x8) = .001, P(x9) = [.030, .041],
   P(x10) = [.030, .041], P(x11) = [.030, .041], P(x7|x6) = P(x8|x6) = .020,
   P(x9|x6) = P(x10|x6) = [.020, .820], P(x11|x6) = [0.0, .820].
4
The RIP Algorithm
The pumping system fault tree in Figure 5.1 is arguably not much more complicated than
the fault tree for our first simple example. However, computation of bounds for P(X18) took
more than 38 hours of CPU time and 27 megabytes of virtual memory on a Sun 4/370
using the standard LP routine in Mathematica. Using the RIP (Related Integer Program)
algorithm, this example was solved in 5.44 CPU seconds on a Sun SPARCstation 10. This
example is a graphic illustration of the advantage of developing a specialized algorithm for
solving FTP linear programs.
4.1
Probabilistic Logic and Efficient Representation of Realm Matrices
In order to work with large event vectors, we need an efficient way of translating logical
relations among events into a realm matrix. By "efficient" we mean a representation whose
construction from the event vector requires time and space O(N) rather than O(2^N), and
that supports the operations we need to perform on the realm matrix. In order to translate
logical restrictions on events into a form compatible with the FTP we need to represent
Boolean functions by integer linear inequality constraints. For example, the logical relations
that lead to X4 = 1 - (1 - X1)(1 - X2)(1 - X3) are equivalent to the pair of inequalities
X4 ≤ X1 + X2 + X3 ≤ 3X4 over the binary values of Xi, i.e., xi ∈ {0, 1}.
In the course of work on formalisms for representing uncertainty and the problem of
probabilistic inference, Nilsson (1986, 1993) re-discovers the FTP and provides a systematic method for mapping logical relations among events onto linear integer equalities and
inequalities.
Logical or deductive inference is sound in the sense that only valid conclusions can be
deduced from a set of premises. Logic is also complete in the sense that all valid conclusions
can be deduced from a set of premises using inference mechanisms.30 Nilsson undertakes a
generalization of modus ponens to probability in a fashion that does not compromise sound
inference and that makes no implicit or unacknowledged probabilistic assumptions. Nilsson's
treatment of modus ponens has become known as the probabilistic entailment problem and his
generalization of logic to encompass probability has become known as probabilistic logic. If
XN = {x1, ..., xN} is a set of propositions, then the set of well-formed formulas consisting of
the closure of XN under finite conjunctions and negations is a language L(XN). A knowledge
base in L(XN) is a set K = {(si, pi), i = 1, ..., N} where si ∈ L(XN) and pi is a probability
assessment for the logical statement si. For example, suppose that L(X2) contains {x1, x1 → x2}
and K = {(x1, .5), (x1 → x2, .95)}. Here x1 → x2 is the logical conditional "not x1 or
x2." The knowledge base K is incomplete because it does not provide a joint probability
distribution for x1 and x2. Determination of the probability of x2 from knowledge of K is
an example of Nilsson's probabilistic entailment problem. In this case K implies only that
the probability P(x2) must lie in the interval [0.45, 0.95]; i.e.

   P(x1 → x2) + P(x1) - 1 ≤ P(x2) ≤ P(x1 → x2).
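The entailment interval can be verified mechanically by an LP over the four constituents of x1 and x2. The sketch below is our own illustration (it uses SciPy's linprog; Nilsson's own computational methods differ):

```python
# Sketch: Nilsson's entailment interval for P(x2) recovered by LP over
# the four constituents of x1, x2 (our illustration, using SciPy).
import numpy as np
from scipy.optimize import linprog

# Constituents ordered: x1x2, x1~x2, ~x1x2, ~x1~x2, with probabilities q.
x2_row = np.array([1.0, 0.0, 1.0, 0.0])   # indicator of x2
A_eq = np.array([
    [1, 1, 0, 0],   # P(x1) = q1 + q2            = .5
    [1, 0, 1, 1],   # P(x1 -> x2) = P(~x1 or x2) = .95
    [1, 1, 1, 1],   # normalization              = 1
])
b_eq = [0.5, 0.95, 1.0]

lo = linprog(x2_row,  A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
hi = linprog(-x2_row, A_eq=A_eq, b_eq=b_eq, bounds=[(0, None)] * 4)
print(round(lo.fun, 2), round(-hi.fun, 2))  # 0.45 0.95
```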
Among those who have worked on formalisms for representing uncertainty, Nilsson is
unique in addressing computational issues. He explores some exact computational methods
for small problems and approximate methods for problems that are too large to solve exactly.
If we desire to transform any arbitrary Boolean function into linear integer inequality
constraints, then it is sufficient that we be able to represent disjunction and negation (cf.
Ebbinghaus, Flum, & Thomas, 1984; Mendelson, 1987). The FTP also requires the normalization
constraint.

30 As AI researchers are keenly aware, the soundness and completeness of propositional and first order logic is a theoretical tenet and not an algorithm for efficient inference. Propositional logic is decidable, which means that there is a well defined procedure to determine in finite time whether or not an argument is valid. First order logic is not decidable. Practical deduction systems compromise completeness for computational efficiency. Expert systems are incomplete by design. They trade completeness for computational efficiency and are able to draw only a limited subset of the possible conclusions from a given set of premises.

Suppose we have an event vector XN = (X1, ..., XN)^t such that if xi is
a logical relation on other events in XN then xi is one of the following:
(a) a disjunction of events in XN,
(b) a conjunction of events in XN,
(c) a negation of an event in XN,
(d) a conditional quantity on events in XN,
(e) a linear relation on events in XN, or
(f) a normalization constraint.
Let Dj denote the set of possible values of element zj of z. For now it is best to think
of Dj = {0, 1}. We want an algorithm for constructing a system of linear inequalities
Mz ≤ g from the event vector XN with the property that a vector z satisfies Mz ≤ g and
zj ∈ Dj, j = 1, ..., N, if and only if z is a column vector of the realm matrix in use. To this
end we need the following:

Proposition: Let RN,S(N) be the realm matrix in use. The pair (M, g) is a matrix
function with domain XN. For a given XN, the vector z = (z1, ..., zN)^t is a column
of RN,S(N) if and only if

   z ∈ {z | Mz ≤ g, zj ∈ Dj, j = 1, ..., N}          (4.1)

for some M and g.
We can construct a system of inequalities Mz < g that satisfies (4.1) by iterating through
the events in XN.
For each xj in XN we add the appropriate inequalities to the system
Mz < g, as follows:
Definition: If XN is an event vector, then for every Xi ∈ XN, (M, g) is constructed as
follows:

(a) Conjunction Case: Xi = xk1 ∩ ... ∩ xkn if and only if
   (i)  xk1 + ... + xkn - n·xi ≥ 0
   (ii) xk1 + ... + xkn - xi ≤ n - 1
are in (M, g).

(b) Disjunction Case: Xi = xk1 ∪ ... ∪ xkn if and only if
   (i)  xk1 + ... + xkn - xi ≥ 0
   (ii) xk1 + ... + xkn - n·xi ≤ 0
are in (M, g).

(c) Negation Case: Xi = x̄j if and only if
   xi + xj = 1
is in (M, g).

(d) Conditional Quantity Case: Xi = (xj | xk) if and only if
   (i)   xi = (1 - xk)P(xj | xk) + w
   (ii)  w ≤ xj
   (iii) w ≤ xk
   (iv)  w ≥ xj + xk - 1,  w ∈ [0, 1]
are in (M, g).31

(e) Linear Relation Case: For Xi = xi, xi ≤ a1 xk1 + ... + an xkn if and only if
   xi ≤ a1 xk1 + ... + an xkn
is in (M, g).

(f) Normalization Case: Xi is a normalization event if and only if Xi = 1 is in (M, g).

31 See Appendix B for motivation of this construction.
The above definition is the basis for a fast algorithm for representing a realm matrix
RN,S(N) by a system of linear inequalities (M, g) over domains Dj, j = 1, ..., N. For each
relation in XN, at most four inequalities are introduced into (M, g). Thus, given an event
vector of N events, the conversion algorithm outputs (M, g) with M no larger than
4N × N and g no larger than 4N.
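The constructive cases of the Definition are straightforward to code. The following is a minimal sketch of our own, covering only the conjunction, disjunction and negation cases (the paper does not exhibit the RIP implementation itself):

```python
# Minimal sketch of the (M, g) row construction for the conjunction,
# disjunction and negation cases of the Definition above.
def conjunction(i, ks, n_vars):
    """Rows encoding X_i = AND of x_k, k in ks, as M z <= g."""
    n = len(ks)
    r1 = [0.0] * n_vars   # -(x_k1 + ... + x_kn) + n*x_i <= 0
    r2 = [0.0] * n_vars   # (x_k1 + ... + x_kn) - x_i <= n - 1
    for k in ks:
        r1[k] -= 1.0
        r2[k] += 1.0
    r1[i] += n
    r2[i] -= 1.0
    return [r1, r2], [0.0, n - 1.0]

def disjunction(i, ks, n_vars):
    """Rows encoding X_i = OR of x_k, k in ks."""
    n = len(ks)
    r1 = [0.0] * n_vars   # x_i - (x_k1 + ... + x_kn) <= 0
    r2 = [0.0] * n_vars   # (x_k1 + ... + x_kn) - n*x_i <= 0
    for k in ks:
        r1[k] -= 1.0
        r2[k] += 1.0
    r1[i] += 1.0
    r2[i] -= n
    return [r1, r2], [0.0, 0.0]

def negation(i, j, n_vars):
    """Rows encoding x_i + x_j = 1 as two opposite inequalities."""
    r = [0.0] * n_vars
    r[i] = r[j] = 1.0
    return [r, [-v for v in r]], [1.0, -1.0]

# Example: rows for X4 = X1 or X2 or X3 over four binary variables.
rows, g = disjunction(3, [0, 1, 2], 4)
print(rows, g)
```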
To illustrate how realm matrix generation works, let us return to the fault tree example
presented in Section 3.1. Table 3.1 declares that events X2 and X3 must satisfy X2 + X3 ≤ 1
and that the target event X4 = 1 - (1 - X1)(1 - X2)(1 - X3). This last equality is equivalent
to the pair of linear inequalities X1 + X2 + X3 ≤ 3X4 and X1 + X2 + X3 ≥ X4. If we define

          0  1  1  0               1
   M =   -1 -1 -1  1    and  g =   0          (4.2)
          1  1  1 -3               0

and define D1 = D2 = D3 = D4 = {0, 1}, then

   {z | Mz ≤ g, zi = 0 or 1, i = 1, ..., 4}          (4.3)

characterizes columns of the realm matrix R4,6 shown in (3.2).
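A direct way to see that (4.2) and (4.3) work is to enumerate the sixteen binary vectors and keep those satisfying Mz ≤ g; a sketch of our own:

```python
# Sketch: enumerating the columns of the realm matrix as the binary
# solutions of M z <= g from (4.2) (our check, not the paper's code).
import itertools
import numpy as np

M = np.array([
    [0,  1,  1,  0],   # X2 + X3 <= 1 (mutual exclusion)
    [-1, -1, -1, 1],   # X4 <= X1 + X2 + X3
    [1,  1,  1, -3],   # X1 + X2 + X3 <= 3*X4
])
g = np.array([1, 0, 0])

columns = [z for z in itertools.product([0, 1], repeat=4)
           if np.all(M @ np.array(z) <= g)]
for z in columns:
    print(z)
print(len(columns))  # 6 logically possible constituents, as in R_{4,6}
```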
4.2
Solving the Master Linear Program
We now reconsider the LP dictated by the FTP from the vantage point of a system
of inequalities Mz ≤ g, with zj ∈ Dj, j = 1, ..., N, that represents a realm matrix RN,S(N) as
linear inequality constraints over the sets Dj, j = 1, ..., N. Add the normalization constraint
1^t q = 1 as a last row to RN,S(N), let c^t be the row in RN,S(N) corresponding to the prevision
we want to bound and define

   A = [ RN,S(N) ]      and  b = [ pN ]          (4.4)
       [   1^t   ]               [ 1  ]

Then the FTP says we have the following LP problem to solve; we call it the "Master LP
Problem" (MLP):

Find max or min c^t q subject to

   Aq = [ RN,S(N) ] q = b,   q ≥ 0,              (4.5)
        [   1^t   ]

where

   RN,S(N) = [ z1 ... zS(N) ].                   (4.6)

In (4.6), the column vectors z1, ..., zS(N) consist of the set of all vectors z that satisfy
Mz ≤ g, zj ∈ Dj, j = 1, ..., N, each of which is a column of RN,S(N); i.e. M and g are
constructed to meet these conditions. Without loss of generality, assume henceforth that
(4.5) is a minimization problem. The dual of the MLP is:

Find

   max λ^t b                                     (4.7)

subject to

   λ^t A ≤ c^t,

or equivalently, with 1^t = (1, 1, ..., 1), find

   max π^t pN + θ                                (4.8)

subject to

   π^t RN,S(N) + θ 1^t ≤ c^t.

The dual constraint imposed by the jth column of A is λ^t a(j) ≤ cj and the corresponding
reduced cost for column a(j) is ρj = cj - λ^t a(j), where λ is the vector of simplex
multipliers.
Given a basic feasible solution to (4.5), the revised simplex method consists of four steps:

(i) Calculate the reduced cost vector ρ = (ρ1, ..., ρS(N)). If ρ ≥ 0 the current basis is optimal.
(ii) Determine the column vector in A to enter the basis.
(iii) Determine the column vector to leave the current basis.
(iv) Update the basis, the feasible solution, and the simplex multipliers.
When the size of A does not permit direct execution of (i) and (ii), we need an efficient
algorithm for generating columns of A to serve as candidates to enter the basis. If the
reduced cost ρj of column a(j) of A can be represented as a linear function of elements of
a(j), then steps (i) and (ii) of the revised simplex method can be recast as a mixed-integer
programming problem. Given a basis B, steps (iii) and (iv) operate only on B and are
tractable for any N.

Suppose that the objective function coefficient cj is equal to a linear function f^t z of some
column z of RN,S(N). Then the reduced cost ρj = cj - π^t z - θ = (f - π)^t z - θ. The column of
A with the most negative reduced cost can then be found by solving the following Integer
Programming Sub-Problem (IPS):
Find

   min (f - π)^t z - θ                           (4.9)

subject to

   Mz ≤ g  and  zj ∈ Dj for j = 1, 2, ..., N,

or equivalently, z ∈ {a(1), ..., a(S(N))}.

32 The coefficient θ in the objective function corresponds to the normalization constraint. It is distinguished because it plays an important role in a subsequent stage of the RIP algorithm.
The branch-and-bound IP algorithm provides valuable information about the MLP. A
search for the optimal IP solution is initiated by solving a linear relaxation of the RIP:

Linear Relaxation: Find

   min (f^t z - π^t z - θ)                       (4.10)

subject to

   Mz ≤ g.

The solution to (4.10) yields a lower bound on the objective function value ξIP of the optimal
integer solution (cf. Bradley, Hax, & Magnanti, 1977). Let ξLP be the value of the objective
function for the optimal solution to (4.10). Then

   ξLP ≤ ξIP ≤ f^t z - π^t z - θ                 (4.11)

for all feasible z (cf. Bradley, Hax, & Magnanti, 1977). If the branch-and-bound algorithm
determines that ξIP ≥ 0, then our current solution to the master LP problem must be optimal
(i.e., there is no column of A with negative reduced cost). Moreover, the IP algorithm yields
a sequence of feasible integer solutions as it searches for the optimal integer solution. Given
a feasible integer solution z' with objective value f^t z' - π^t z' - θ, we can use ξLP to decide
whether or not to terminate the search. If

   ξLP ≤ (f^t z' - π^t z' - θ)                   (4.12)

and

   (f^t z' - π^t z' - θ) < 0,                    (4.13)

then we let z' be the column to enter the basis; otherwise we continue the branch-and-bound
algorithm.
Quite fortunately, the optimal IP solution also provides a lower bound on the master LP
problem.

Theorem: Let (π'^t, θ') be the current dual prices for the master LP and let ξLP be the
lower bound on the corresponding IP problem given in (4.10). Then the optimal objective
value for MLP is bounded below by

   b^t [π'^t, θ' + ξLP]^t.                       (4.14)

Proof. For every feasible [z^t, 1]^t and cj = f^t z as in (4.10),

   f^t z - π'^t z - θ' ≥ ξLP,

that is,

   cj ≥ π'^t z + θ' + ξLP.

Thus [π'^t, θ' + ξLP]^t is dual-feasible with objective value b^t [π'^t, θ' + ξLP]^t. []

Thus we can use the dual solution corresponding to the current primal solution to generate
bounds on the optimal objective value of the master LP problem, terminating when the
bounds are sufficiently tight.
Finally, note that Bertsimas and Tsitsiklis (1997) discuss a probability consistency problem
related to, but with considerably more structure than, our implementation of the FTP.
With the assumed additional probabilistic structure, they provide a polynomial-time algorithm
for the solution of a problem that is very much related to the FTP. The method that
drives their proof may lead to an enhancement of our algorithm.
4.3
Conditional Prevision
Expert opinion about an uncertain quantity is often more easily elicited by asking for the
prevision, say, of X conditional on knowledge about another uncertain quantity Y on which
X depends in some fashion. We shall display dependence of X on Y as X|Y. If X and Y
are events, we write P(X|Y) for the conditional prevision of X given Y and P(XY) for the
prevision of the joint occurrence of X and Y. De Finetti (1974)33 proves that:

   A necessary and sufficient condition for coherence in the evaluation of P(X), P(Y)
   and P(X|Y) is compliance with

      P(XY) = P(Y)P(X|Y),                        (4.15)

   in addition to inf(X|Y) ≤ P(X|Y) ≤ sup(X|Y) and 0 ≤ P(Y) ≤ 1. If X is an
   event the relation (4.15) is called the theorem of compound probabilities and the
   inequality for P(X|Y) reduces to 0 ≤ P(X|Y) ≤ 1.

A numerical appraisal a of the probability P(X|Y = y) of the event X given Y = y fits into
Part I of the FTP because, for a fixed number P(X|Y = y) = a, P(X and Y = y) = aP(Y), and
the probabilities P(X and Y = y) and P(Y) are each sums of probabilities of constituent
events. Consequently, the resulting constraints are linear in the elements of q.

33 de Finetti (1974), Vol. 1, p. 136.
If, however, the target is a conditional event X|Y, given an uncertain quantity Y, the
probability P(X|Y) is a ratio of sums of elements of q. Lad, Dickey and Rahman (1991) show
that the FTP can be extended to compute bounds on conditional previsions in the following
way: if XN is any vector of N uncertain quantities and XN+2|XN+1 is the target event whose
prevision we wish to bound, define XN+3 as the conjunction of XN+1 and XN+2, XN+3 =
(XN, XN+1, XN+2, XN+3)^t, RN+3,S(N+3) as the realm matrix for XN+3, RN,S(N) as the realm
matrix for XN and rN+1, rN+2, rN+3 as the final three rows of RN+3,S(N+3) corresponding to
XN+1, XN+2 and XN+3 respectively.

FTP II: Given an assessment of previsions P(XN) = pN for XN, any further
assessment of the conditional prevision P(XN+2|XN+1) coheres with P(XN) if and
only if P(XN+2|XN+1) ∈ [l, u], where l and u are the extrema of the following
linear fractional programming problems:

(a) Find the minimum l and the maximum u of

   rN+3 q / rN+1 q

subject to RN,S(N) q = pN, 1^t q = 1 and q ≥ 0.

(b) When the feasible region is non-empty, finite extreme value solutions exist
if and only if rN+1 q is strictly positive for all vectors q in the feasible region.

(c) The feasible region is non-empty if and only if P(XN) is coherent.
We illustrate this construction on the simple fault tree example of Figure 3.3. Define
numerical assessments for P(Xi), i = 1, 2, ..., 6, P(X2|X4), P(X2|X6) and P(X4|X6) as

   P(X2|X4) = p2|4, P(X2|X6) = p2|6 and P(X4|X6) = p4|6

and set

   p = (p1, ..., p6, p2|4, p2|6, p4|6)^t.

Let X12 = X11 X6, a constituent event composed of the intersection of X11 and X6, and
let Xl|m denote the event "Xl given that Xm obtains." In correspondence with the above
definition of p, recognizing that our target event is X11|6, define

   X11|6 = (X1, ..., X6, X2|4, X2|6, X4|6, X11, X12)^t = (X9^t, X11, X12)^t.

Let R(X9) denote the (9 × 64) realm matrix for X9, R(X11|6) the (11 × 64) realm matrix
for X11|6, r11 and r12 the last two rows of R(X11|6) respectively, and set

   A = [ R(X9) ]   and  b = (p^t, 1)^t.
       [  1^t  ]

The FTP says that P(X11|X6) must lie in the interval [l, u] found by finding

   min and max  r12 q / r6 q

subject to

   Aq = b and q ≥ 0.
This is called a linear fractional programming problem, and it is easily converted to and
solved as a standard linear programming problem. Given the numerical appraisals of previsions
shown in Scenario 3 of Table 3.2 for elements of X9, the solution to this problem is computed
to be 0 ≤ P(X11|X6) ≤ 0.82, a large interval. The bounds on the marginal probability P(X11)
shown in Scenarios 1, 2a), 2b) and 2c) are all intervals of smaller length. Knowledge that X6
obtains substantially lengthens the bounding interval. Bounds for intermediate probabilities
are shown in Table 3.2 as well.
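The conversion is the standard Charnes-Cooper device: substitute w = tq with t equal to the reciprocal of the denominator, add the constraint that the transformed denominator equals one, and solve an ordinary LP in (w, t). The sketch below is our own construction on the small safety-system example of Section 3 (bounding P(X1|X4) rather than the paper's P(X11|X6)), assuming SciPy is available:

```python
# Sketch: Charnes-Cooper conversion of an FTP II linear fractional
# program into an LP, illustrated by bounding P(X1|X4) = (r1 q)/(r4 q)
# for the safety-system example (our illustration, not the paper's code).
import numpy as np
from scipy.optimize import linprog

R = np.array([[1, 1, 1, 0, 0, 0],    # r1 (X1)
              [1, 0, 0, 1, 0, 0],    # r2 (X2)
              [0, 1, 0, 0, 1, 0],    # r3 (X3)
              [1, 1, 1, 1, 1, 0]])   # r4 (X4)

# Variables x = (w1..w6, t) with w = t*q and t = 1/(r4 q) > 0.
# Homogenized equalities: r2 w = .005 t, r3 w = .015 t, 1'w = t,
# plus the Charnes-Cooper normalization r4 w = 1.
A_eq = np.column_stack([np.vstack([R[1], R[2], np.ones(6), R[3]]),
                        [-0.005, -0.015, -1.0, 0.0]])
b_eq = [0.0, 0.0, 0.0, 1.0]
# Interval assessment, homogenized: .007 t <= r1 w <= .010 t.
A_ub = np.column_stack([np.vstack([R[0], -R[0]]), [-0.010, 0.007]])
b_ub = [0.0, 0.0]

c = np.append(R[0], 0.0)              # objective r1 w, i.e. P(X1|X4)
lo = linprog(c,  A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
             bounds=[(0, None)] * 7)
hi = linprog(-c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
             bounds=[(0, None)] * 7)
print(round(lo.fun, 4), round(-hi.fun, 4))  # 0.2593 0.5
```

The answer agrees with a hand check: since X1 implies X4 here, P(X1|X4) = P(X1)/P(X4), minimized at .007/.027 and maximized at .010/.020.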
5
Applying the RIP Algorithm
Comparison of computational performance of the RIP algorithm with MATHEMATICA's
off-the-shelf LP routines is something of a straw man. While the MATHEMATICA routine
is robust, it is one of many in a broad mathematical toolkit with significant time and space
resource overhead. A highly optimized dedicated LP package such as CPLEX will certainly
yield better overall computational results. For example, a well behaved 500 variable LP,
even a dense one, is well within the limits of CPLEX on a mid-range RISC workstation. On
the other hand, CPLEX is no match for an LP with more than one billion variables.
Results reported here are based on CPU times reported by the SunOS time(1) utility
and by the Timing facility in MATHEMATICA (user time only). The RIP times include
both user and system times. All computations were done on Sun SPARC workstations
running SunOS 4.1.x. Because of licensing restrictions, several different machines were
employed: MATHEMATICA ran on a Sun 4/370 workstation and CPLEX (the RIP algorithm)
on a Sun SPARCstation 10. The SPECmark rating of the SPARCstation 10 is roughly three
times that of the 4/370. Computation of solutions to the two fault tree problems discussed
thus far sets the stage for an additional, more ambitious illustration of the performance of
the RIP.
5.1
Further Examples
Figure 5.1 shows a fault tree for a pumping system composed of 12 elementary events and
10 compound events. Bounding intervals were computed using the marginal probabilities
shown in the figure. As pointed out earlier, this fault tree is arguably not much more
complicated than the simple fault tree shown in Figure 3.3, but the realm matrix is 22 × 4,096,
and computing bounds on P(X18) took more than 38.04 hours of CPU time and 27 megabytes
of virtual memory on a Sun 4/370 using the standard LP routine in Mathematica.34

The size of the realm matrix, the total number of pivots required to reach a solution and
a Phase 1 and Phase 2 breakdown for the RIP algorithm are provided in Table 5.1. The
second column in Table 5.1 displays the number of events. Statistics for simple fault tree
Scenarios 1, 2a) and 3 are shown for comparison.
Table 5.1: The results of applying the RIP algorithm to the fault trees.

   Target                                  Events   Decision Variables   Pivots           Bounds
   Pumping system fault tree, X18            20          4096            l: 130, u: 45    [0.200, 0.520]
   Simple fault tree X11, Scenario 1         12            64            l: 113, u: 17    [0.030, 0.060]
   Simple fault tree X11, Scenario 2a)       12            64            l: 114, u: 17    [0.030, 0.050]
   Simple fault tree X11|X6, Scenario 3      19            64            l: 122, u: 22    [0.000, 0.820]
A fault tree system with 30 basic failure events and 12 non-basic failure events is displayed
in Figure 5.2. The realm matrix contains more than 2^30 > 1.07 × 10^9 columns, so a standard
LP algorithm could not even be applied. Results are shown in Table 5.2. Bounds for each of
the non-basic failure events were computed using the sub-tree for each intermediate event.
In each case the terminating conditions are that the IPS solution is non-negative and the IPS
relaxation is non-negative. The second column in Table 5.2 displays the number of events in
the problem specification, the third column lists the size of the realm matrix for each non-basic
event, the fourth column shows both the total number of pivots and the breakdown of
Phase I and Phase II and, finally, the fifth column displays lower and upper FTP probability
bounds. The entire case required 270 seconds of CPU time to compute all of the bounds
shown in Table 5.2, including overhead introduced by the UNIX shell script used to run cases
sequentially. It is possible to improve this performance by configuring the RIP algorithm
to use an advanced basis so that it can be jump-started directly into Phase II. This was
not done, so the computation times shown here are based on finding the same basis twice:
once for the lower bound and once for the upper bound. Of the 270 CPU seconds, 214 CPU
seconds were spent solving for the root event X42 of the fault tree, an event with 30 basic
failure events. The next largest cases are X40 and X41, each of which has 15 basic failure
events. These cases required only 10-15 CPU seconds to complete.

34 The LP routines in the optimization toolbox for Matlab v3.0 could not solve this problem at all in the available virtual memory on the same machine (64 megabytes).
Figure 5.1: The pumping system fault tree.
Figure 5.2: Top events in the fault tree for the CSIS system.
Table 5.2: Results of applying the RIP algorithm to the CSIS fault tree.

   Target   Events   Realm           Pivots            Bounds
   X30         5           8         l: 17,  u: 7      [0.610, 0.641]
   X31         5           8         l: 17,  u: 7      [0.410, 0.500]
   X32         8          32         l: 113, u: 16     [0.753, 0.753]
   X33         8          32         l: 113, u: 18     [0.410, 0.523]
   X34        12         128         l: 135, u: 43     [0.610, 0.808]
   X35        12         128         l: 128, u: 38     [0.410, 0.639]
   X36        15        1024         l: 153, u: 66     [0.610, 0.838]
   X37        15        1024         l: 134, u: 51     [0.410, 0.680]
   X38        18        4096         l: 144, u: 94     [0.610, 0.841]
   X39        18        4096         l: 154, u: 72     [0.410, 0.704]
   X40        22       32768         l: 168, u: 181    [0.610, 0.921]
   X41        22       32768         l: 102, u: 132    [0.410, 0.751]
   X42        44    > 1.07 × 10^9    l: 696, u: 553    [0.020, 0.751]

5.2
Conclusion
The RIP algorithm is a promising start on expanding the scope of real-world probability
assessment problems that can be treated with de Finetti's Fundamental Theorem of Probability (FTP).
References
Bertsimas, D. and Tsitsiklis, J. N. (1997). Introduction to Linear Optimization. Belmont, MA:
Athena Scientific.
Boole, G. (1847). The Mathematical Analysis of Logic. Cambridge, UK: Macmillan.
Bradley, S. P., Hax, A. C. & Magnanti, T. L. (1977). Applied Mathematical Programming.
Reading, MA: Addison-Wesley.
Bruno, G. and Gilio, A. (1980). Applicazione del metodo del simplesso al teorema fondamentale per le probabilità nella concezione soggettivistica. Statistica, 40(3), 337-344.
Duda, R., Hart, P., & Sutherland, G. L. (1978).
Ebbinghaus, H. D., Flum, J., & Thomas, W. (1984). Mathematical Logic. New York: Springer-Verlag.
de Finetti, B. (1937). "Foresight: Its Logical Laws, Its Subjective Sources," Annales de l'Institut Henri Poincaré, H. Kyburg, Jr. (trans.) in Studies in Subjective Probability, H. E. Kyburg, Jr. and H. E. Smokler (eds.) (1964), New York: J. Wiley & Sons.
- - - (1974). Theory of Probability. Vol 1. A. Machi & A. Smith (trans.) New York: Wiley
Interscience.
- - - (1975). Theory of Probability. Vol 2. A. Machi & A. Smith (trans.) New York: Wiley Interscience.
Frankel, E. G. (1988). Systems Reliability and Risk Analysis. Boston: Kluwer.
Good, I. J. (1950). Probability and the Weighting of Evidence. London: Griffin.
Hailperin, T. (1965). Best possible inequalities for the probability of a logical function of events.
American Mathematical Monthly, 72, 343-359.
- - - (1986). Boole's Logic and Probability: A critical exposition from the standpoint of contemporary algebra and probability theory. Amsterdam: North-Holland.
Heckerman, D. (1991). Probabilistic Similarity Networks. Cambridge, MA: MIT Press.
Henley, E. J. & Kumamoto, H. (1981). Reliability Engineering and Risk Assessment. Englewood Cliffs, NJ: Prentice-Hall.
Kahneman, D., Slovic, P. and Tversky, A. (eds.) (1982) Judgment Under Uncertainty: Heuristics and Biases, Cambridge: Cambridge University Press.
Karimi, R., Rasmussen, N. & Wolf, L. (1980). Qualitative and quantitative reliability analysis
of safety systems. MIT Energy Laboratory Report No. MIT-EL80-015.
Lad, F., Dickey, J. & Rahman, A. (1990). The fundamental theorem of prevision. Statistica,
50(1), 19-39.
- - - (1991). Numerical application of the fundamental theorem of prevision. Journal of Statistical Computation and Simulation, 19-38.
Lad, F. (1996). Operational Subjective Statistical Methods. New York: Wiley Interscience.
Mendelson, E. (1987). Introduction to Mathematical Logic (3rd ed.). Monterey, CA: Wadsworth
& Brooks/Cole.
Myers, T. (1995). Reasoning With Incomplete Probabilistic Knowledge. Ph.D. thesis, Massachusetts Institute of Technology.
Nilsson, N. J. (1986). Probabilistic logic. Artificial Intelligence, 28, 71-87.
- - - (1993). Probabilistic logic revisited. Artificial Intelligence, 59, 39-42.
Neapolitan, R. (1990). Probabilistic Reasoning in Expert Systems: Theory and Algorithms. New York: Wiley Interscience.
Pages, A. & Gondran, M. (1986). System Reliability: Evaluation and Prediction in Engineering. London: North Oxford.
44
Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Series in Representation and Reasoning. San Mateo: Morgan-Kaufman.
Quinlan, J. R. (1983). Inferno: A cautious approach to uncertain inference. The Computer Journal, 26, 255-269.
Shafer, G. (1979). Mathematical Theory of Evidence. San Mateo, CA: Morgan-Kaufman.
Shortliffe, E. H. & Buchanan, B. G. (1975). A model of inexact reasoning in medicine. Mathematical Biosciences, 23, 351-379.
Smith, C. A. B. (1961). Consistency in statistical inference and decision. Journal of the Royal
Statistical Society, series B. 23, 1-25.
Szolovits, P. & Pauker, S. G. (1978). Categorical and probabilistic reasoning in medical diagnosis. Artificial Intelligence, 11, 115-144.
Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.
United States Nuclear Regulatory Commission (1975). "Reactor safety study - An assessment
of accident risks in U. S. commercial nuclear power plants." NUREG-75/014.
United States Nuclear Regulatory Commission (1981). Fault Tree Handbook. NUREG-0492.
Valverde, L. (1997). Uncertain Inference, Estimation and Decision-Making in Integrated Assessments of Global Climate Change. Ph.D. thesis, Massachusetts Institute of Technology,
Management and Policy Program, September 1997.
von Winterfeldt, D. & Edwards, W. (1986). Decision Analysis and Behavioral Research. Cambridge, U.K.: Cambridge University Press.
White, A. L. (1986). "Reliability Estimation for Reconfigurable Systems with Fast Recovery." Microelectronics Reliability, 26(6), 1111-1120.