Reasoning with Incomplete Knowledge Using de Finetti's Fundamental Theorem of Probability: Background and Computational Issues*

Tracy Myers†, Robert M. Freund‡ and Gordon M. Kaufman§

Sloan WP #3990, November 1997

November 18, 1997 Draft

*This paper is based on Tracy Myers's Ph.D. thesis, Reasoning with Incomplete Probabilistic Knowledge: The RIP Algorithm for de Finetti's Fundamental Theorem of Probability (Department of Brain and Cognitive Sciences, MIT, 1995).
†Schlumberger Oilfield Services, Austin Product Center.
‡Professor of Operations Research, Sloan School of Management, Massachusetts Institute of Technology.
§Professor of Operations Research and Management, Sloan School of Management, Massachusetts Institute of Technology. Professor Kaufman's work was supported by the Air Force Office of Scientific Research, Contract #AFOSR-F49620-1-0307.

1 Introduction

What probabilistic features are shared by a nuclear power plant, the space shuttle, fly-by-wire aircraft and the DARPA net? Answer: all are very large, logically complex systems that must operate at exceptionally high levels of reliability, but are subject to failure at uncertain times and in uncertain ways. A principal goal of modern reliability analysis is to design methods for appraising the reliability of such systems. Current thinking is that baseline requirements for such an analysis are a probabilistic model of the stochastic behavior of a complex system, augmented with numerical appraisal of at least some properties of the joint
probability law that the model entails. As White [1] points out: "The reliability goals [of such systems] are too high to be established by natural life testing, which means the probability of system failure must be computed from mathematical models that capture the essential elements of fault occurrence and system fault recovery."1

The process of stochastic modeling of a complex system often begins with a formal statement of the primitive postulates that govern the behavior of the system. Postulates or assumptions may embrace both the logical structure of observable events and a description of a probability law that identifies the probability of occurrence of all logically possible system events. If the physics of the system are properly modeled, a well-posed mathematical model emerges that may reasonably be called a "fully specified" model; i.e., the probability of any observable event is, in principle, computable if both initial conditions (inputs) and all model parameters are either known with certainty or assigned a priori probability distributions. When the system is large, construction of a fully specified model may be a formidable task and, on occasion, an impossible one. Even if a fully specified model can be constructed, it may be too complex to efficiently compute event probabilities. The tension between realism and computability never disappears. Global climate change models are good examples.2
A very different approach to modeling the probabilistic behavior of a complex system is to model the structure of the system so as to identify all logically obtainable events, but not to model a probability law that governs event occurrence. In place of assigning a specific probability law to a sample space that contains all obtainable system events, ask system experts to assign numerical probabilities to occurrences of some events that lie within their domains of expertise. This recasts the problem and changes the nature and the focus of the computational task.

1 White (1986) shows that system recovery can be adequately described by its first two moments when faults occur independently at a low constant rate, the system quickly recovers from all faults and fault recoveries are independent semi-Markov processes.
2 See Valverde (1997), Uncertain Inference, Estimation and Decision Making in Integrated Assessments of Global Climate Change, for example.

For a fully specified model, the task is: Given initial conditions, a joint probability law for events and either certain or probabilistic knowledge of model parameters, calculate the probability of occurrence of one or more critical events. The task for the alternative is: Given the logical structure of obtainable events and some numerical assessments of probabilities of these events, compute bounds on the probabilities of one or more critical events that are not directly assessed.3 Any method for processing complex system probability assessments should:

* Allow determination of the coherence or incoherence of assessments,
* Enable computation of coherent bounds on probabilities of events not directly assessed,
* Allow for efficient revision of bounds in light of additional information, whether in the form of further expert assessments or in the form of observation of the occurrence of an event or its complement,
* Not contradict Bayesian conditionalization,4
* Be computationally tractable for realistic problems of moderate to large size, and
* Be based on sound assumptions about the qualitative probabilistic structure assigned to uncertain system events.

Bayesian conditionalization is the linchpin of Bayesian inference. It means that revision of probability judgments in light of new information should be done in accord with Bayes' Theorem. Pearl (1988) says three principles characterize Bayesian reasoning:

* Reliance on a complete probabilistic model of observable quantities,
* Willingness to incorporate subjective judgments as an expedient substitute for empirical data, and
* The use of probabilistic conditionalization [Bayes' Theorem] for updating beliefs in the light of new observations.

He offers the advice that:

3 Of course, a hybrid of both approaches is possible. We have chosen descriptive extremes to emphasize differences between reasoning with incomplete knowledge through the lens of de Finetti's Fundamental Theorem of Probability and more traditional approaches to modeling of stochastic systems.
4 Pearl (1988) defines Bayesian conditionalization as follows: "Probability theory adopts the autoepistemic phrase '...given that what I know is C' as a primitive of the language. Syntactically, this is denoted by placing C behind the conditioning bar in a statement such as P(A | C) = p. This statement combines the notions of knowledge and belief by attributing to A a degree of belief p, given the knowledge of C. C is also called the context of the belief in A, and the notation P(A | C) is called Bayes conditionalization. Thomas Bayes (1702-1761) made his main contribution to the science of probability by associating the English phrase '...given that I know C' with the now-famous formula P(A | C) = P(A, C)/P(C) [Bayes 1763], which has become a definition of conditional probabilities.
It is by virtue of Bayes conditionalization that probability theory facilitates nonmonotonic reasoning, i.e., reasoning involving retraction of previous conclusions."

"To fully appreciate the role of the Bayesian approach in artificial intelligence, one should consider situations in which we do not have complete information to form a probability model. In such situations the Bayesian strategy encourages the agent to complete the model by making reasonable assumptions, rather than abandoning the useful device of conditionalization."5

Certainly Pearl's advice is sound in principle. If a complete probabilistic model is within reach, then the advantages are worth the modest effort required. Unfortunately, Pearl's criteria are not feasible in all circumstances. If available knowledge is not sufficient to make a complete probabilistic model feasible, de Finetti's Fundamental Theorem of Probability can be the foundation for an assessment paradigm having the desired properties. De Finetti's Fundamental Theorem of Probability [FTP] (1937, 1949, 1974) provides a framework for computing bounds on the probability of an event in accord with the above guidelines when this probability cannot be computed directly from assessments and when an assessor does not have sufficient information to specify a complete probabilistic model of observables. The contrast between the perspectives of de Finetti and Pearl on these issues is seen throughout the field of inductive inference. Bayesian network algorithms provide tractable mechanisms for inference when knowledge of the probabilistic structure of a system is complete; i.e., the joint probability law governing all observable uncertain quantities is fully specified and a probability law can be assigned to model parameters that are not known with certainty.
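When a complete joint law is in hand, Pearl's conditionalization P(A | C) = P(A, C)/P(C) is a direct computation. The following is a minimal sketch of that computation for two binary quantities; the joint probabilities are illustrative numbers of ours, not taken from any source cited here.

```python
# Minimal sketch of Bayes conditionalization P(A | C) = P(A, C) / P(C)
# from a fully specified joint law over two binary quantities A and C.
# The joint probabilities below are illustrative only.
joint = {(1, 1): 0.08, (1, 0): 0.02,   # keys are (a, c) outcome pairs
         (0, 1): 0.12, (0, 0): 0.78}

p_c = sum(p for (a, c), p in joint.items() if c == 1)  # marginal P(C)
p_a_given_c = joint[(1, 1)] / p_c                      # P(A | C)
print(p_a_given_c)
```

The point of the fully specified model is exactly this: every such conditional probability is computable. When the joint law is only partially assessed, this division is no longer available, which is where the FTP enters.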
Artificial intelligence inference networks and ampliative inference can be viewed as attempts to develop a rational method of constructing a complete probabilistic model when the modeler's knowledge of probabilistic structure is incomplete. De Finetti's paradigm lies at the intersection of cognitive science, artificial intelligence and operations research. Viewed as a method of inductive inference, the FTP falls within the domains of artificial intelligence and cognitive science. Computational mechanisms needed to implement the FTP include linear programming, mixed-integer programming and the representation of Boolean logic by integer programs, so the FTP is part of operations research as well. Lad, Dickey and Rahman (1990) fully leverage the connection between the FTP and linear programming and show how to use the FTP to derive bounds on a critical event for which a probability has not been directly assessed.6 In addition, they demonstrate that computation of conditional probability bounds can be done by solving a linear fractional programming problem. The relationship of Boole's arithmetic representation of propositional logic and the probability calculus was independently discovered and re-discovered during the 20th century by several individuals. Components of de Finetti's FTP appear in equivalent forms in the work of Boole (1847), Good (1950), Hailperin (1965, 1976), Nilsson (1986, 1993) and Smith (1961), among others. The work of Nilsson and Quinlan is of particular interest to us as they both focus on computational aspects of inductive inference in the absence of a complete probabilistic model. Nilsson (1986) independently developed a probabilistic logic that, in the context of probabilistic entailment, is equivalent to one version of de Finetti's theorem. Quinlan's (1983) INFERNO system can be viewed as a local approximation to probabilistic logic and to inference based on the FTP.7 The FTP provides some important and unexpected insights.

5 Pearl (1988), p. 382.
For example, Lad, Dickey and Rahman use the FTP to develop extensions of classical probabilistic inequalities (Kolmogorov, Bienaymé-Chebyshev) and show that these inequalities possess some surprising features, overlooked in the past. Lad (1996) makes the "bold" statement that: "Every inequality of probability theory is a special case of the fundamental theorem of prevision."8

By recasting the FTP as a linear (or linear fractional) programming problem we benefit from the extensive field of linear inequality systems and the many types of algorithms used to efficiently solve such systems. For instance, the simplex algorithm for linear programming (LP) will find the optimal solution or determine that the linear program is infeasible; however, the computational requirements of the simplex algorithm tend to grow linearly in the number of decision variables. The work of Lad, Dickey and Rahman is primarily of conceptual value because the FTP prescribes linear programming problems with exponentially many decision variables, which are intractable by conventional linear programming algorithms such as the simplex method. If we provide assessments of as few as 100 events, then the linear program prescribed by the FTP could have up to 2^100 decision variables (which is more than the number of elementary particles in the universe), and so is not directly solvable by the simplex algorithm or by any other direct method!

In Part I of this paper, we do six things. First, we begin with a discussion of operational subjectivist terminology to set the stage for the statement of the FTP as presented by de Finetti and by Lad, Dickey and Rahman. The language that they employ differs in substantive ways from the language commonly employed by probabilists and statisticians.

6 As Lad (1996) points out, Bruno and Gilio (1980) were the first to show that the simplest version of the FTP can be represented as a linear program.
7 Pearl (1988), Nilsson (1993).
Second, we present the FTP along with some simple examples that provide a concrete illustration of how the FTP interacts with problem logic. Third, we study more realistic problems and show that even fairly small examples can quickly overwhelm both conventional linear programming algorithms and an assessor's ability to manually construct FTP linear programs from system logic. (If an assessor provides N assessments, the dimension of the linear programming constraint matrix is N × O(2^N).) Fourth, we discuss an automated method for mapping logical relations among system events into algebraic linear inequalities so as to fit them into a linear programming framework. It is at least as difficult a task to construct FTP linear programs for large, complex problems as it is to solve them, because conventional methods for constructing linear programs of this type are overwhelmed even more quickly than are the algorithms for solving them. Fifth, we show how to construct an FTP linear program from a specification of system logic and probability assessments, and how to solve the linear program using the RIP [Related Integer Program] algorithm. For problems in which all events are dichotomous, the RIP algorithm requires computational resources proportional to the number of events for which assessments have been made rather than proportional to the number of decision variables; that is, the computational requirements grow as O(N) rather than as O(2^N). This is of fundamental importance for practical computation. Sixth, we end Part I with a comparison of the performance of standard linear programming algorithms and the RIP algorithm applied to examples of size up to N = 42. Lad, Dickey and Rahman declare that the FTP "... support[s] the process of asserting bounds on previsions as an operationally meaningful representation of uncertain knowledge."9

8 Lad (1996), p. 112.
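The N × O(2^N) growth of the constraint matrix is easy to see by constructing it for N logically unconstrained events; a short sketch (function name ours):

```python
# Sketch: the FTP constraint matrix for N logically unconstrained events.
# Each column corresponds to a constituent (a 0/1 outcome vector for the
# N events), so there are 2**N columns; row i indicates the constituents
# in which event i occurs.
from itertools import product

def realm_matrix(n):
    atoms = list(product((0, 1), repeat=n))          # 2**n constituents
    return [[a[i] for a in atoms] for i in range(n)]

for n in (2, 6, 12):
    print(n, len(realm_matrix(n)[0]))  # column count doubles per event
```

Logical relations among events delete columns (impossible constituents), but in the worst case the count remains exponential in N, which is why the explicit matrix cannot even be written down for large problems, let alone solved over.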
In addition to possessing knowledge of logical relations among quantities and a willingness to assess probabilities for uncertain quantities, an assessor may also possess partial knowledge of probabilistic dependence (independence) relations that govern these events or quantities. The assertion that some set of uncertain quantities is exchangeable is an example. Exchangeability imposes strong symmetries on the probability law of a set of uncertain quantities that are easily incorporated into the FTP framework and, in addition, drastically reduces dimensionality by introducing equalities among elements of the FTP linear programming vector of decision variables.10 More general assertions about qualitative probabilistic structure lead to new problems. In a subsequent paper we will show that conditional dependence relations can lead to systems of constraint equations composed of multi-linear forms in elements of the FTP linear programming decision vector. The standard linear (or linear fractional) programming form of the FTP no longer fits. Fortunately, it is possible to recast assertions about conditional dependence relations among uncertain quantities into a form compatible with the algorithm that we explain here.

9 Lad, Dickey and Rahman (1990), p. 20.

2 The Issue of Terminology

De Finetti deliberately adopted a "terminology of prevision" that differs in important ways from the lingua franca of statisticians and probabilists. The motivation for novel terminology stems from a fundamental difference between de Finetti's theory of probability and that adopted by most probabilists. For example, the Kolmogorov axiomatization of probability begins with a definition of a sample space and everything else is defined relative to it. For de Finetti, operational measurement of degree of belief is the principal lever for defining probabilistic objects. Probability begins with personal judgments about uncertain quantities.
A "sample space" emerges from operational measurement of degrees of belief about uncertain quantities of interest to an agent. According to Lad, Dickey and Rahman (1990): "Although the terminology may appear to the mathematically trained reader as needlessly novel, we submit that it actually provides a simpler and more accurate practical language than that which, admittedly, has become standard in technical communications in statistical theory. The conceptual differences are deep, and our language is intended faithfully to portray them. Already, the terminology of prevision has found favor among some theorists, such as Goldstein (1981, 1983), who recognize the unification it provides for the concepts of probability and expectation (usually treated as distinct)."11

10 See Lad (1996), pp. 181-182, and Lad, Dickey and Rahman (1990), p. 23.

"Quantity, realm, constituents and prevision" are familiar objects with different names. As a reader's guide, we list definitions of terms needed to understand the FTP as presented by de Finetti.

Quantity X = the numerical outcome of a particular operationally defined measurement.
Realm of a quantity X = the set R(X) of all numbers that are possible results of performing an operational measurement.
Size of realm R(X) = the number of elements in R(X).
Event E = a quantity E whose realm is R(E) = {0, 1}.
Possible event = an event that is neither impossible nor certain.

There may be logical relations among events. Any method designed to bound the probability of any other event whose probability is not directly computable from assessments must account for these relations. If there are N events, logical relations among events reduce the number of constituents from 2^N to some smaller number S(N).

Incompatible events = a set of N events is incompatible if their sum cannot exceed 1.
Exhaustive events = a set of N events is exhaustive if their sum cannot be less than 1.
Partition = a set of N events constitutes a partition if they are both exhaustive and incompatible.
Constituents = the individual events in a partition of a set of events.
S(N) = the size of the partition generated by any N events.

11 Op. cit. (1990), p. 21.

Notice that the word event is reserved for a quantity that has realm {0, 1}. The concept of prevision is central. It is the subjectivist version of expectation, a label for the operational measurement of the value of a lottery. The prevision of an event is the probability for that event. While all events are uncertain quantities, not all uncertain quantities are events. Consequently, the prevision of an uncertain quantity that is not an event may not be a probability.

Definition:12 Let X_N = (X_1, ..., X_N)^t be any vector of quantities with bounded discrete realm R(X_N). In asserting your state of knowledge about X_N, and your state of utility valuation, your prevision configuration for X_N is the vector of numbers P(X_N) = (P(X_1), ..., P(X_N))^t and the number S > 0 which you specify with the understanding that you are thereby asserting your willingness to engage in any transaction that would yield you a net gain of amount s_N^t [X_N - P(X_N)], as long as the vector of scale factors s_N is a vector for which s_N^t [X_N - P(X_N)] ≤ S for every X_N ∈ R(X_N). Individually, P(X_N) is called your prevision for X_N and S is called the scale of your maximum stake.

As Lad (1990) points out, this definition of prevision elides probability and expectation and, in addition, characterizes prevision as a linear operator on functions of quantities. A prevision for a vector of quantities is coherent "as long as you do not assert by it your indifference to some transaction that would surely yield you a loss for every possible outcome value of the possible quantities in R(X_N)."
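Coherence can be checked mechanically in small cases. For two events A and B with P(A), P(B) and P(AB) all assessed, the four constituent probabilities are fully determined, and the assessment is coherent exactly when all four are nonnegative. A sketch (function name and example numbers ours):

```python
# Coherence check for two events A and B when P(A), P(B) and P(AB) are
# all assessed: the four constituent probabilities q are then determined
# by inclusion-exclusion, and coherence reduces to q >= 0 everywhere.
def coherent(p_a, p_b, p_ab):
    q = {(1, 1): p_ab,                      # A and B
         (1, 0): p_a - p_ab,                # A, not B
         (0, 1): p_b - p_ab,                # B, not A
         (0, 0): 1.0 - p_a - p_b + p_ab}   # neither
    return all(v >= 0 for v in q.values())

print(coherent(0.7, 0.7, 0.5))   # True
print(coherent(0.7, 0.7, 0.2))   # False: q(0,0) = -0.2
```

In the second case P(A ∪ B) would have to be 0.7 + 0.7 - 0.2 = 1.2, so the configuration admits a transaction that yields a sure loss: precisely the incoherence of the definition above. With more events or only partial assessments, the same nonnegativity question becomes the linear programming feasibility problem of the FTP.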
Coherence, a cornerstone of de Finetti's theory of probability, is the tip of an iceberg-sized difference between de Finetti's approach to probability and a traditional mathematical theory of probability such as that based on Kolmogorov's well-known axiom system.14

12 See Lad (1996), pp. 60-61, for this definition and discussion of it.
13 Lad, Dickey & Rahman (1990), p. 23.
14 In de Finetti's words: "The basic differences are as follows: 1) We REJECT the idea of 'atomic events', and hence the systematic interpretation of events as sets; we REJECT a ready-made field of events (a Procrustean bed!) which imposes constraints on us; we REJECT any kind of restriction (such as, for example, that the events one can consider at some given moment, or in some given problem, should form a field); 2) We REJECT the idea of a unique [probability distribution] attached once and for all to the field of events under consideration...; 3) Our approach deals directly with random quantities and linear operations on them (events being included as a special case); we thus avoid the complications which arise when one deals with the less convenient Boolean operations; 4) We REJECT countable additivity (i.e. σ-additivity); 5) We REJECT the transformation of the theorem of compound probabilities into a definition of conditional probability, and we also REJECT the latter being made conditional on the assumption that [the probability of the conditioning event or quantity is not equal to 0]; by virtue of the exclusions we have made in 4) and 5), the construction of a complete theory of zero probability becomes possible."

3 The FTP

The FTP provides a mechanism for computation of upper and lower bounds on the prevision of an uncertain quantity when the prevision of that quantity cannot be directly calculated from previsions provided by one or more assessors: "After investigation, we interpret the 'fundamental theorem of probability' to support the process of asserting bounds on previsions as an operationally meaningful representation of uncertain knowledge."15 The theorem accounts systematically for logical relations among events as well as for possibly incomplete knowledge of a joint probability law for quantities. The spirit of the operational subjectivist approach is, indeed, that a subjectivist assessment which constitutes a complete specification of such a probability law is the exception rather than the rule. On its face, the FTP may seem no more than a clever re-interpretation of the law of total probability. Lad (1996) tells us what is fundamental about the FTP:16

"... the fundamental theorem of probability characterizes the coherent implications of every prevision you have assessed for every other vector of quantities that you have not yet directly assessed. Seen in this way, the fundamental theorem of prevision is an extension of the famous closure result of deductive logic, described by Hilbert and Ackermann in their text Mathematical Logic (1950, Section 1.9) as "the systematic survey of all propositions that can be deduced from any set of axioms." The fundamental theorem of prevision is a truly comprehensive statement of the coherent logic of uncertain knowledge. Despite the exhaustive scope of this result, it is not widely known among probabilists and statisticians today. Indeed, de Finetti's (1974a, Chap. 3.10) designation of his theorem as "the fundamental theorem of probability" has appeared absurd to many who have bothered to read it without really studying his larger views. As far as I can tell, no one else ever has designated any theorem of probability as "the fundamental theorem." If pressed for a response today, I think that most professionals would designate as the fundamental theorem of probability either the law of large numbers or possibly the central limit theorem. There would be a few other contenders mentioned, and many respondents would deny that there is any fundamental theorem."

15 Lad, Dickey and Rahman (1990), p. 20.
16 Lad (1996), pp. 111-112. This version of the FTP is easily extended to include computation of bounds on conditional previsions and to incorporate numerical assessments of conditional previsions. Rather than presenting a formal statement of these extensions now, we show how such computations are done in the course of discussion of the RIP algorithm.

Here is Part I of the FTP: Let X_{N+1} denote a vector of any N+1 quantities that interest you, S(N+1) be the size of the realm R(X_{N+1}) and R_{N+1} be the (N+1) × S(N+1) realm matrix for X_{N+1}. Partition R_{N+1} into two parts: R_N (composed of the first N rows of R_{N+1}) and r_{N+1} (composed of the last row of R_{N+1}). Let 0 and 1 be S(N+1) × 1 column vectors of 0's and 1's respectively.

FTP I: Suppose your prevision for N uncertain quantities is P(X_N) = p_N. Let X_{N+1} be any further uncertain quantity. Then your further assertion of P(X_{N+1}) for X_{N+1} is coherent if and only if P(X_{N+1}) lies within the interval [l, u], where l and u are defined as solutions to the following linear programming problems:

l = min r_{N+1} q and u = max r_{N+1} q, subject to: R_N q = p_N, 1^t q = 1 and q ≥ 0.

If the feasible set of solutions is empty, then P(X_N) = p_N is incoherent.

While a firm believer in the operational subjectivist approach may object, we shall translate the above statement of the FTP into standard probabilistic terms. This makes it immediately transparent to a reader not familiar with the operational subjectivist terminology, at the cost of failing to emphasize "deep conceptual differences" with standard versions of axiomatic probability theory, and in particular with standard theory for finite sample spaces. Suppose that Ω_N is a finite sample space composed of S(N) atomic events E_k, k = 1, ..., S(N), and that {E_1, ..., E_S(N)} is a partition of Ω_N.
Define X to be a union of some subset of the set of atomic events of Ω_N. Then X is a generic event or, more simply, an event.

FTP I': Let X_N = (X_1, ..., X_N)^t be an N × 1 vector of events, P(X_N) = (p_1, ..., p_N)^t = p_N be a probability assessment for X_N and let X_{N+1} be any other event. Together with logical relations among the N events composing X_N and X_{N+1}, the event vector X_{N+1} = (X_1, ..., X_N, X_{N+1})^t generates a sample space Ω_{S(N+1)} of size S(N+1) = O(2^N). Let E_{N+1} = (E_1, ..., E_{S(N+1)})^t be a vector of the atomic events composing Ω_{S(N+1)}. Then there is an (N+1) × S(N+1) matrix R_{N+1} such that X_{N+1} = R_{N+1} E_{N+1}. Denote the first N rows of R_{N+1} by R_N and the (N+1)st row of R_{N+1} by r_{N+1}. An extended vector of probability assessments P(X_{N+1}) = (p_N, p_{N+1})^t for X_{N+1} is coherent when l ≤ p_{N+1} ≤ u, where l and u are solutions to the linear programs: find the minimum l and the maximum u of r_{N+1} q subject to: R_N q = p_N, q^t 1 = 1 and q ≥ 0. The probability vector P(X_N) = p_N is coherent if and only if the feasible region for these LP problems is non-empty.

3.1 A Fault Tree Example

A fault tree is a Boolean logic diagram that specifies the failure logic of a system in a hierarchical fashion. Because of their explicit logical structure and easy interpretation, fault trees are useful templates for encoding probabilistic information about system failures. Fault trees specify higher-order failures as logical functions of lower-order failure events. Typically, the root of a fault tree is the failure event of interest. A mechanical system can be viewed as a collection of components organized into a hierarchy of subsystems. Consider a nuclear reactor sprinkler system designed to cool the reactor core in the event of an emergency. The root event of the fault tree for this system is whether or not the system provides sufficient coolant to the reactor core. The basic failure events, the leaves of the fault tree, are the states of the individual components such as valves, pumps and pipes.
The leaf events are called basic failure events or elementary events. Intermediate nodes are logical functions of the leaves of the tree and other intermediate failures. Intermediate nodes correspond to subsystems of the system, and the logical functions that define intermediate nodes mirror the failure structure of the system.17

17 Frankel (1988).

In fault tree analysis, probabilities of elementary failures are presumed to be readily quantifiable using reliability data, expert subjective judgement or a combination of both.18 Working backwards up the tree to the root event, probability assessments for elementary events are used to compute probabilities for non-elementary events. Without further specification of the structure of a joint probability law for events, a set of probabilities attached to the leaves of a fault tree is not sufficient to completely specify a probability law for all possible events. To this end, fault tree analysis proceeds by making assumptions about the probabilistic structure of the system in order to force the joint probability distribution for events to be complete and coherent. Mutual independence of elementary events is the most commonly employed assumption.19 While this assumption greatly simplifies computation of the fault tree's root failure probability, it has serious negative implications for the accuracy of the resulting estimate. For many systems, the most likely causes of failure are common-mode failures.20 A common-mode failure is a set of elementary failures which are not stochastically independent and are jointly sufficient to cause system failure. In one safety system for a nuclear reactor the class of common-mode failures was more than one order of magnitude more likely than the next most likely class of failures!21 Identifying common-mode failures and appraising their likelihood of occurrence is a topic of major interest in reliability analysis.
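Under the mutual-independence assumption just described, propagating leaf probabilities through an OR gate is a one-line computation; a sketch with illustrative leaf probabilities of ours:

```python
# Root failure probability of an OR gate under the common (and often
# unrealistic) assumption that leaf failure events are mutually
# independent: P(union) = 1 - prod(1 - p_i).
def or_gate(leaf_probs):
    p_none = 1.0
    for p in leaf_probs:
        p_none *= (1.0 - p)     # P(no leaf fails), by independence
    return 1.0 - p_none         # P(at least one leaf fails)

print(or_gate([0.010, 0.005, 0.015]))
```

It is exactly this independence assumption that common-mode failures violate: when leaf failures are positively dependent, the product formula can badly misstate the root probability, which motivates bounding it without the assumption.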
Figure 3.1 shows the logic diagram of a CSIS safety system in a nuclear reactor. In a reactor accident the system pumps solution from the storage tank and injects it into the reactor containment vessel. The solution serves to control the temperature of the containment environment and to absorb radiation. As such it is a critical component in the accident control procedures.

18 Frankel, op. cit.; U.S. Nuclear Regulatory Commission (1981).
19 See U.S. Nuclear Regulatory Commission (op. cit.) and Karimi, Rasmussen and Wolf (1980).
20 Henley and Kumamoto (1981) and Pages and Gondran (1986).
21 U.S. Nuclear Regulatory Commission (1976).

Figure 3.1: The logic diagram of the CSIS safety system in a nuclear reactor. [Diagram not reproduced; it shows two 360-degree spray headers (labeled BCS-23-153 and BCS-22-153), each with 368 equally spaced nozzles, fed from the containment system.]

The system components are pipes, motorized pumps, valves, filters, spray-nozzles, and associated control equipment. There are thousands of components in the system. Ignoring passive parts such as pipes, which have failure rates several orders of magnitude less than active components such as valves and pumps, there are still over a hundred components whose failure could contribute to a serious accident.22 These components are mutually dependent on each other in many ways. There are logical dependencies in which the success or failure of a component is a logical function of some subset of components. In the CSIS system, the nozzles which spray the solution into the containment fail if any of the upstream components fail to provide solution in sufficient quantity at sufficient pressure to the nozzles. There are also stochastic dependencies. Consider the failure of one of the 368 spray nozzles due to clogging.
Since the remaining 367 nozzles are subject to the same environmental conditions (e.g., manufacturing process, construction materials, contaminated solution, maintenance procedures, etc.), the failure of one of the nozzles makes the failure of the others more likely. Specifying logical dependencies among components in the CSIS is tedious but straightforward. Documenting system failure logic is standard practice for complex systems with high potential for public risk, such as nuclear reactors or hazardous materials processing facilities. Specifying probabilistic dependencies is a difficult task. In addition to the large number of assessments required, many of the assessments are hard to make. An expert may be required to assess events which are unfamiliar or have extremely small probabilities.23 For instance, passive failures in nuclear power plants have been assigned probabilities of order 10^-9 per reactor per year.24 Such assessments are subject to well-known cognitive biases.25 The event that a safety system fails is often a disjunction of constituent events and, as Tversky and Kahneman (1974) show, assessors routinely underestimate the probability of disjunctions and overestimate the probability of conjunctions.

22 United States Nuclear Regulatory Commission (1975)
23 von Winterfeldt and Edwards (1986)

To illustrate Part I of the FTP we use an example first presented by Dickey (1990).26 In what follows we shall define x as a generic realization of an uncertain event X and denote the joint occurrence of Xi = xi, i = 1, 2, ..., I by x1 x2 ... xI. The complement of Xi will be displayed as X̄i and the complement of xi as x̄i. Table 3.1 defines the set of events and Figure 3.2 is a display of the probability tree for this safety system example. Table 3.1: The safety system example.
(Adapted from Dickey, 1991)

Event  Description                  Relation                 Probability
X1     Sprinkler Head Clogs                                  [.007, .010]
X2     Sprinkler Head Bursts                                 .005
X3     Insufficient Water Pressure  X3 ≤ 1 - X2              .015
X4     The System Fails             X4 = X1 ∪ X2 ∪ X3

24 United States Nuclear Regulatory Commission (1975)
25 Kahneman, Slovic, and Tversky (1982)
26 Dickey, Presentation to NBER Seminar on Bayesian Statistics and Econometrics, Cambridge, MA, October (1990).

Figure 3.2: The probability tree for the safety system example.

There are 4 generic events, X1, ..., X4, as shown in Table 3.1. If we desire to answer any probability question about X1, ..., X4, then we would need to make 2^4 = 16 assessments. On the other hand, if we are only interested in the probability that the system will fail, P(X4), then how many assessments must we make? The system will fail just in case one or more of X1, X2 or X3 occurs. If we cannot assess P(X4) directly, then we can economically calculate it using the three assessments in the product P(X̄4) = P(X̄1)P(X̄2 | X̄1)P(X̄3 | X̄2X̄1). In this example the number of assessments needed to determine an answer to our question is quite small. This is not true in general, and in a worst case scenario we would be forced to assess every element of the joint distribution. Notice that it is logically possible either to assess the joint distribution directly or to assess it indirectly via a set of conditional probabilities. Direct assessment requires an assessor to assign probabilities to terminal nodes (leaves) of the probability tree in Figure 3.2. Alternatively, conditional probability assessment corresponds to assigning conditional probabilities to graph edges in Figure 3.2. Human experts are usually more comfortable making conditional probability assessments.27 Our principal interest is in the probability P(X4) that this system will fail.
Suppose that we do not assess P(X4) directly, but rely solely on assessments of X1, X2 and X3. Absent logical relations among events, we have 2^4 = 16 possibilities, so that:

         [ 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 ]
    R4 = [ 1 1 1 1 0 0 0 0 1 1 1 1 0 0 0 0 ]   (3.1)
         [ 1 1 0 0 1 1 0 0 1 1 0 0 1 1 0 0 ]
         [ 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 ]

However, in this example, events X1, X2 and X3 are logically dependent. The third column of Table 3.1 displays logical dependence relations among elements of the vector X = (X1, X2, X3, X4) of events in algebraic form, suitable for immediate use in LP format.28 The relation X3 ≤ 1 - X2 says that at most one of X2 = 1 and X3 = 1 can obtain; the relation X4 = 1 - (1 - X1)(1 - X2)(1 - X3) says that X4 = 1 obtains if at least one of X1 = 1, X2 = 1 or X3 = 1 occurs. Therefore, the realm matrix becomes:

               [ 1 1 1 0 0 0 ]
    R4,S(16) = [ 1 0 0 1 0 0 ]   (3.2)
               [ 0 1 0 0 1 0 ]
               [ 1 1 1 1 1 0 ]

so that the vector of possible constituent events is

    E = (E1, E2, E3, E4, E5, E6)^t = (x1 x2 x̄3 x4, x1 x̄2 x3 x4, x1 x̄2 x̄3 x4, x̄1 x2 x̄3 x4, x̄1 x̄2 x3 x4, x̄1 x̄2 x̄3 x̄4)^t.

The columns of R4,S(16) denote logically realizable joint events. Notice that the joint event x1 x2 x3 x4 does not appear in either E or R4,S(16) because it violates the logical relation prohibiting x2 and x3 from both occurring. If we define ri to be the ith row of R4,S(16), then the previsions (marginal probabilities) are P(Xi = xi) ≡ P(xi) = ri q. Namely, via the law of total probability, P(Xi) is a sum of some elements of the vector q = (q1, q2, ..., q6)^t of probabilities of logically possible constituent events. For example, P(X1) = q1 + q2 + q3. Given the assessments shown in the last column of Table 3.1, the FTP provides bounds on P(X4) by solving the following two LP problems.

27 Heckerman (1990), Pearl (1988), Neopolitan (1990)
28 For problems with many events and complex logical relations among them, we need an efficient algorithm for translating Boolean relations into algebraic relations. We return to this in Section 4.
Let ri be the ith row of R4,S(16). Find min and max r4 q subject to:

    .007 ≤ r1 q ≤ .010,  r2 q = .005,  r3 q = .015,  q ≥ 0;

and, with 1^t = (1, 1, 1, 1, 1, 1), 1^t q = 1. Using a standard LP package, one finds that min r4 q = .02 and max r4 q = .03, so that .02 ≤ P(X4) ≤ .03. This simple example illustrates how the FTP works.

As a second example, consider the simple fault tree shown in Figure 3.3. This fault tree represents a system containing two parallel redundant subsystems, composed of four elementary and two common-mode failure types. The entire system fails (X11) if and only if both parallel subsystems fail (X9 and X10). Common-mode Failure 1 (X5) appears in both parallel subsystems and is sufficient to cause the entire system to fail. Common-mode Failure 2 (X6) also appears in both parallel subsystems, but is not a sufficient cause for the system to fail. Common-mode Failure 2 will cause the system to fail if and only if both Basic Failure 2-A and Basic Failure 2-B occur as well. Now suppose that we have probability assessments for events X1, ..., X6 in the fault tree and wish to compute bounds on probabilities of the composite events X7, ..., X11. It is not necessary in this case to make any assumptions regarding probabilistic dependencies among these eleven events in order to apply the simplest version of the FTP, FTP I. Scenario 1 in Table 3.2 displays bounds on P(X7), ..., P(X11) dictated by the unconditional probability assessments for X1, ..., X6 shown in the first column of the table. Scenarios 2a), b) and c) of the table show bounds that arise when conditional probability assessments are added to the unconditional assessments shown in the first column of the table. A numerical assessment of a conditional probability such as P(X1 | X3) = .020 leads to a linear equality among elements of q and so fits nicely into the framework of FTP I.29
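The calculation above is small enough to verify end to end. The sketch below is ours, not the paper's code: it enumerates the realm columns directly from the logical relations and then solves both FTP linear programs by brute-force enumeration of basic feasible solutions in exact rational arithmetic (slack variables s1, s2 convert the interval assessment into equalities). This only works at toy scale, which is exactly the limitation Section 4 addresses.

```python
from fractions import Fraction as F
from itertools import combinations, product

# Columns of the realm matrix: binary (x1,x2,x3,x4) satisfying the
# logical relations X2 + X3 <= 1 and X4 = X1 or X2 or X3.
cols = [x for x in product([0, 1], repeat=4)
        if x[1] + x[2] <= 1 and x[3] == max(x[0], x[1], x[2])]
R = [[F(c[i]) for c in cols] for i in range(4)]   # rows r1..r4

# Equality system over (q1..q6, s1, s2); slacks s1, s2 turn the
# interval assessment .007 <= r1 q <= .010 into equalities.
A = [R[0] + [F(1), F(0)],        # r1 q + s1 = .010
     R[0] + [F(0), F(-1)],       # r1 q - s2 = .007
     R[1] + [F(0), F(0)],        # r2 q = .005
     R[2] + [F(0), F(0)],        # r3 q = .015
     [F(1)] * 6 + [F(0), F(0)]]  # normalization: 1'q = 1
b = [F(10, 1000), F(7, 1000), F(5, 1000), F(15, 1000), F(1)]

def solve(M, rhs):
    """Gaussian elimination over the rationals; None if singular."""
    n = len(M)
    M = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next((i for i in range(c, n) if M[i][c] != 0), None)
        if piv is None:
            return None
        M[c], M[piv] = M[piv], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                M[i] = [a - M[i][c] * d for a, d in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

# The LP extrema of r4 q are attained at basic feasible solutions
# (vertices), so enumerating them solves both FTP linear programs.
vals = []
for basis in combinations(range(8), 5):
    sol = solve([[row[j] for j in basis] for row in A], b)
    if sol is None or any(v < 0 for v in sol):
        continue
    x = [F(0)] * 8
    for j, v in zip(basis, sol):
        x[j] = v
    vals.append(sum(R[3][j] * x[j] for j in range(6)))

print(float(min(vals)), float(max(vals)))
```

Both programs return the bounds reported in the text, min r4 q = .02 and max r4 q = .03.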
As the number of events N becomes large, solving FTP linear programs with standard LP algorithms quickly becomes impossible, because the number of decision variables grows exponentially in N. A different tactical approach is therefore required. In the next section we present a column-generation algorithm that enables us to solve such LP problems efficiently and exactly in reasonable time even when the realm matrix is huge. We couple this algorithm with a method for automatic translation of logical relations into algebraic relations that are compatible with an LP representation of the realm matrix.

29 Lad, Rahman and Dickey (1990) show that probability bounds for a conditional target event can be found by solving a linear fractional program. See Section 4 for a discussion of the numerical example shown in Scenario 3 of Table 3.2.

Figure 3.3: A simple fault tree.

Table 3.2: Probability assessments and bounds determined by the FTP for the simple fault tree.

Scenario 1
  Unconditional assessments: P(x1) = .010, P(x2) = .020, P(x3) = .010, P(x4) = .020, P(x5) = .030, P(x6) = .050
  Conditional assessments:   none
  Computed bounds: P(x7) = [0.0, .020], P(x8) = [0.0, .020], P(x9) = [.030, .060], P(x10) = [.030, .060], P(x11) = [.030, .060]

Scenario 2(a)
  Unconditional assessments: as in Scenario 1
  Conditional assessments:   P(x1|x6) = .200, P(x2|x6) = .300, P(x3|x6) = .100, P(x4|x6) = .200
  Computed bounds: P(x7) = [.015, .015], P(x8) = [.010, .010], P(x9) = [.030, .055], P(x10) = [.030, .050], P(x11) = [.030, .050]

Scenario 2(b)
  Unconditional assessments: as in Scenario 1
  Conditional assessments:   P(x1|x3) = P(x3|x1) = .020, P(x2|x4) = P(x4|x2) = .100, P(x1|x2) = P(x2|x1) = .200
  Computed bounds: P(x7) = [0.0, .020], P(x8) = [0.0, .020], P(x9) = [.030, .060], P(x10) = [.030, .060], P(x11) = [.030, .055]

Scenario 2(c)
  Unconditional assessments: as in Scenario 1
  Computed bounds: P(x7) = [0.0, .020], P(x8) = [0.0, .020], P(x9) = [.030, .056], P(x10) = [.030, .056], P(x11) = [.030, .044]

Scenario 3
  Unconditional assessments: as in Scenario 1
  Conditional assessments:   P(x2|x4) = .020, P(x2|x6) = .020, P(x4|x6) = .020
  Computed bounds: P(x7) = .001, P(x8) = .001, P(x9) = [.030, .041], P(x10) = [.030, .041], P(x11) = [.030, .041];
                   P(x7|x6) = P(x8|x6) = .020, P(x9|x6) = P(x10|x6) = [.020, .820], P(x11|x6) = [0.0, .820]

4 The RIP Algorithm

The pumping system fault tree in Figure 5.1 is arguably not much more complicated than the fault tree for our first simple example. However, computation of bounds for P(X18) took more than 38 hours of CPU time and 27 megabytes of virtual memory on a Sun 4/370 using the standard LP routine in MATHEMATICA. Using the RIP (Related Integer Program) algorithm, this example was solved in 5.44 CPU seconds on a Sun SPARCstation 10. This example is a graphic illustration of the advantage of developing a specialized algorithm for solving FTP linear programs.

4.1 Probabilistic Logic and Efficient Representation of Realm Matrices

In order to work with large event vectors, we need an efficient way of translating logical relations among events into a realm matrix. By "efficient" we mean a representation whose size is O(N) rather than O(2^N), that can be constructed from the event vector in time and space proportional to N, and that supports the operations we need to perform on the realm matrix. In order to translate logical restrictions on events into a form compatible with the FTP, we need to represent Boolean functions by integer linear inequality constraints. For example, the logical relation X4 = 1 - (1 - X1)(1 - X2)(1 - X3) is equivalent to the pair of inequalities X4 ≤ X1 + X2 + X3 ≤ 3X4 over the binary values of the Xi, i.e., xi ∈ {0, 1}.
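The claimed equivalence is easy to confirm by exhausting the sixteen binary assignments; the check below is ours:

```python
from itertools import product

# Truth-table check: over binary values, X4 <= X1 + X2 + X3 <= 3*X4
# holds exactly when X4 = 1 - (1 - X1)(1 - X2)(1 - X3), i.e. when
# X4 is the disjunction of X1, X2, X3.
ok = all(
    (x4 <= x1 + x2 + x3 <= 3 * x4)
    == (x4 == 1 - (1 - x1) * (1 - x2) * (1 - x3))
    for x1, x2, x3, x4 in product([0, 1], repeat=4))
print(ok)
```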
In the course of work on formalisms for representing uncertainty and the problem of probabilistic inference, Nilsson (1986, 1993) re-discovers the FTP and provides a systematic method for mapping logical relations among events onto linear integer equalities and inequalities.

Logical or deductive inference is sound in the sense that only valid conclusions can be deduced from a set of premises. Logic is also complete in the sense that all valid conclusions can be deduced from a set of premises using its inference mechanisms.30 Nilsson undertakes a generalization of modus ponens to probability in a fashion that does not compromise sound inference and that makes no implicit or unacknowledged probabilistic assumptions. Nilsson's treatment of modus ponens has become known as the probabilistic entailment problem, and his generalization of logic to encompass probability has become known as probabilistic logic. If XN = {x1, ..., xN} is a set of propositions, then the set of well-formed formulas consisting of the closure of XN under finite conjunctions and negations is a language L(XN). A knowledge base in L(XN) is a set K = {(si, pi), i = 1, ..., N} where si ∈ L(XN) and pi is a probability assessment for the logical statement si. For example, suppose that L(X2) is generated by x1 and x2, and K = {(x1, .5), (x1 → x2, .95)}. Here x1 → x2 is the logical conditional "not x1 or x2." The knowledge base K is incomplete because it does not provide a joint probability distribution for x1 and x2. Determination of the probability of x2 from knowledge of K is an example of Nilsson's probabilistic entailment problem. In this case K implies only that the probability P(x2) must lie in the interval [0.45, 0.95]; i.e.,

    P(x1 → x2) + P(x1) - 1 ≤ P(x2) ≤ P(x1 → x2).

Among those who have worked on formalisms for representing uncertainty, Nilsson is unique in addressing computational issues.
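The entailment bounds above can be checked mechanically. The sketch below is ours, not Nilsson's: it enumerates the four constituents of L(X2), imposes the knowledge base K as exact rational linear constraints, and reads off the extrema of P(x2) over the basic feasible solutions.

```python
from fractions import Fraction as F
from itertools import combinations, product

# The four constituents of L(X2): truth assignments for (x1, x2).
worlds = list(product([0, 1], repeat=2))

s1 = lambda w: w[0]                     # x1
s2 = lambda w: int((not w[0]) or w[1])  # x1 -> x2, i.e. "not x1 or x2"
target = lambda w: w[1]                 # x2

# Equality system A q = b: the two assessments in K plus normalization.
A = [[F(s1(w)) for w in worlds],
     [F(s2(w)) for w in worlds],
     [F(1)] * 4]
b = [F(1, 2), F(19, 20), F(1)]

def solve(M, rhs):
    """Gaussian elimination over the rationals; None if singular."""
    n = len(M)
    M = [row[:] + [r] for row, r in zip(M, rhs)]
    for c in range(n):
        piv = next((i for i in range(c, n) if M[i][c] != 0), None)
        if piv is None:
            return None
        M[c], M[piv] = M[piv], M[c]
        p = M[c][c]
        M[c] = [v / p for v in M[c]]
        for i in range(n):
            if i != c and M[i][c] != 0:
                M[i] = [a - M[i][c] * d for a, d in zip(M[i], M[c])]
    return [M[i][n] for i in range(n)]

# Enumerate basic feasible solutions (vertices of the feasible region);
# the extrema of the linear target are attained at one of them.
vals = []
for basis in combinations(range(4), 3):
    sol = solve([[row[j] for j in basis] for row in A], b)
    if sol is None or any(v < 0 for v in sol):
        continue
    q = [F(0)] * 4
    for j, v in zip(basis, sol):
        q[j] = v
    vals.append(sum(F(target(w)) * q[j] for j, w in enumerate(worlds)))

print(float(min(vals)), float(max(vals)))
```

Both extrema are attained, confirming the interval [0.45, 0.95].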
He explores some exact computational methods for small problems and approximate methods for problems that are too large to solve exactly. If we desire to transform any arbitrary Boolean function into linear integer inequality constraints, then it is sufficient that we be able to represent disjunction and negation (cf. Ebbinghaus, Flum, & Thomas, 1984; Mendelson, 1987). The FTP also requires the normalization constraint. Suppose we have an event vector XN = (X1, ..., XN)^t such that if xi is a logical relation on other events in XN then xi is one of the following: (a) a disjunction of events in XN, (b) a conjunction of events in XN, (c) a negation of an event in XN, (d) a conditional quantity on events in XN, (e) a linear relation on events in XN, or (f) a normalization constraint. Let Dj denote the set of possible values of element zj of z. For now it is best to think of Dj = {0, 1}. We want an algorithm for constructing a system of linear inequalities Mz ≤ g from the event vector XN with the property that a vector z satisfies Mz ≤ g and zj ∈ Dj, j = 1, ..., N, if and only if z is a column vector of the realm matrix in use. To this end we need the following:

Proposition: Let RN,S(N) be the realm matrix in use. The pair (M, g) is a function of the event vector XN with the following property:

30 As AI researchers are keenly aware, the soundness and completeness of propositional and first-order logic is a theoretical tenet and not an algorithm for efficient inference. Propositional logic is decidable, which means that there is a well defined procedure to determine in finite time whether or not an argument is valid. First-order logic is not decidable. Practical deduction systems compromise completeness for computational efficiency. Expert systems are incomplete by design. They trade completeness for computational efficiency and are able to draw only a limited subset of the possible conclusions from a given set of premises.
For a given XN, the vector z = (z1, ..., zN)^t is a column of RN,S(N) if and only if

    z ∈ {z | Mz ≤ g, zj ∈ Dj, j = 1, ..., N}   (4.1)

for some M and g.

We can construct a system of inequalities Mz ≤ g that satisfies (4.1) by iterating through the events in XN. For each xj in XN we add the appropriate inequalities to the system Mz ≤ g, as follows:

Definition: If XN is an event vector, then for every Xi ∈ XN, (M, g) is constructed as follows:

(a) Conjunction Case: Xi = ∩_{j=1}^{n} x_{kj} if and only if
    (i)  x_{k1} + ... + x_{kn} - n·xi ≥ 0
    (ii) x_{k1} + ... + x_{kn} - xi ≤ n - 1
are in (M, g).

(b) Disjunction Case: Xi = ∪_{j=1}^{n} x_{kj} if and only if
    (i)  x_{k1} + ... + x_{kn} - xi ≥ 0
    (ii) x_{k1} + ... + x_{kn} - n·xi ≤ 0
are in (M, g).

(c) Negation Case: Xi = x̄j if and only if
    xi + xj = 1
is in (M, g).

(d) Conditional Quantity Case: Xi = (xj | xk) if and only if
    (i)   xi = (1 - xk)P(xj | xk) + w
    (ii)  w ≤ xj
    (iii) w ≤ xk
    (iv)  w ≥ xj + xk - 1, with w ∈ [0, 1]
are in (M, g).31

(e) Linear Relation Case: For Xi = xi, xi ≤ Σ_{j=1}^{n} aj x_{kj} if and only if xi ≤ Σ_{j=1}^{n} aj x_{kj} is in (M, g).

(f) Normalization Case: Xi is a normalization event if and only if xi = 1 is in (M, g).

31 See Appendix B for motivation of this construction.

The above definition is the basis for a fast algorithm for representing a realm matrix RN,S(N) by a system of linear inequalities (M, g) over domains Dj, j = 1, ..., N. For each relation in XN, at most four inequalities are introduced into (M, g). Thus, given an event vector of N events, the conversion algorithm outputs (M, g) with M no larger than 4N × N and g no larger than 4N. To illustrate how realm matrix generation works, let us return to the fault tree example presented in Section 3.1. Table 3.1 declares that events X2 and X3 must satisfy X2 + X3 ≤ 1 and that the target event X4 = 1 - (1 - X1)(1 - X2)(1 - X3). This last equality is equivalent to the pair of linear inequalities X1 + X2 + X3 ≤ 3X4 and X1 + X2 + X3 ≥ X4.
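Among the cases above, the conditional-quantity encoding (d) is the least transparent: the auxiliary variable w linearizes the product xj·xk. Here is a small check of ours that, over binary xj and xk, the inequalities pin w to that product, so xi reproduces the de Finetti conditional quantity: xj when xk obtains, and the assessed value P(xj | xk) otherwise. The value .95 for P(xj | xk) is illustrative only.

```python
from fractions import Fraction as F
from itertools import product

# Case (d): for fixed p = P(xj | xk), the constraints
#   w <= xj,  w <= xk,  w >= xj + xk - 1,  0 <= w <= 1
# force w = xj*xk when xj, xk are binary, so that
#   xi = w + (1 - xk)*p
# equals xj when xk = 1 and p when xk = 0.
p = F(95, 100)
results = {}
for xj, xk in product([0, 1], repeat=2):
    # admissible w values on a rational grid satisfying the constraints
    ws = [F(t, 100) for t in range(101)
          if F(t, 100) <= xj and F(t, 100) <= xk
          and F(t, 100) >= xj + xk - 1]
    assert ws == [F(xj * xk)]          # w is pinned to the product
    results[(xj, xk)] = ws[0] + (1 - xk) * p

print(results)
```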
If we define

        [  0   1   1   0 ]         [ 1 ]
    M = [ -1  -1  -1   1 ]     g = [ 0 ]   (4.2)
        [  1   1   1  -3 ],        [ 0 ],

and define D1 = D2 = D3 = D4 = {0, 1}, then

    {z | Mz ≤ g, zi = 0 or 1, i = 1, ..., 4}   (4.3)

characterizes the columns of the realm matrix R4,S(16) shown in (3.2).

4.2 Solving the Master Linear Program

We now reconsider the LP dictated by the FTP from the vantage point of a system of inequalities Mz ≤ g, with zj ∈ Dj, j = 1, ..., N, that represents a realm matrix RN,S(N) as linear inequality constraints over the sets Dj, j = 1, ..., N. Add the normalization constraint 1^t q = 1 as a last row to RN,S(N), let c^t be the row of RN,S(N) corresponding to the prevision we want to bound, and define

    A = [ RN,S(N) ]          b = [ p ]
        [   1^t   ]   and        [ 1 ],   (4.4)

where p denotes the vector of assessed previsions. Then the FTP says we have the following LP problem to solve; we call it the "Master LP Problem" (MLP):

    Find max or min c^t q subject to Aq = b, q ≥ 0,   (4.5)

where

    RN,S(N) = [ z1 ... zS(N) ].   (4.6)

In (4.6), the column vectors z1, ..., zS(N) consist of the set of all vectors z that satisfy Mz ≤ g, zj ∈ Dj, j = 1, ..., N, each of which is a column of RN,S(N); i.e., M and g are constructed to meet these conditions. Without loss of generality, assume henceforth that (4.5) is a minimization problem. The dual of the MLP is:

    Find max λ^t b subject to λ^t A ≤ c^t,   (4.7)

or equivalently, with 1^t = (1, 1, ..., 1) and λ = (π^t, θ)^t, find

    max π^t p + θ   (4.8)

subject to π^t RN,S(N) + θ 1^t ≤ c^t. The dual constraint imposed by the jth column of A is λ^t a^(j) ≤ cj, and the corresponding reduced cost for column a^(j) is ρj = cj - λ^t a^(j), where λ is the vector of simplex multipliers. Given a basic feasible solution to (4.5), the revised simplex method consists of four steps: (i) Calculate the reduced cost vector ρ = (ρ1, ..., ρS(N)). If ρ ≥ 0 the current basis is optimal. (ii) Determine the column vector of A to enter the basis. (iii) Determine the column vector to leave the current basis. (iv) Update the basis, the feasible solution, and the simplex multipliers λ.
When the size of A does not permit direct execution of (i) and (ii), we need an efficient algorithm for generating columns of A to serve as candidates to enter the basis. If the reduced cost ρj of column a^(j) of A can be represented as a linear function of the elements of a^(j), then steps (i) and (ii) of the revised simplex method can be recast as a mixed-integer programming problem. Given a basis B, steps (iii) and (iv) operate only on B and are tractable for any N. Suppose that the objective function coefficient cj is equal to a linear function f^t z of some column z of RN,S(N). Then the reduced cost is ρj = cj - π^t z - θ = (f - π)^t z - θ. The column of A with the most negative reduced cost can then be found by solving the following Integer Programming Sub-Problem (IPS):32

    min (f - π)^t z - θ   (4.9)

subject to Mz ≤ g and zj ∈ Dj, j = 1, 2, ..., N; or equivalently, z ∈ {z1, ..., zS(N)}.

32 The coefficient θ in the objective function corresponds to the normalization constraint. It is distinguished because it plays an important role in a subsequent stage of the RIP algorithm.

The branch-and-bound IP algorithm provides valuable information about the MLP. A search for the optimal IP solution is initiated by solving a linear relaxation of the IPS:

    Linear Relaxation: Find min f^t z - π^t z - θ   (4.10)

subject to Mz ≤ g. The solution to (4.10) yields a lower bound on the objective function value of the optimal integer solution (cf. Bradley, Hax, & Magnanti, 1977). Let ζLP be the value of the objective function for the optimal solution to (4.10), and let ζIP be the optimal objective value of the IPS. Then

    ζLP ≤ ζIP ≤ f^t z - π^t z - θ   (4.11)

for all feasible z (cf. Bradley, Hax, & Magnanti, 1977). If the branch-and-bound algorithm determines that ζIP ≥ 0, then our current solution to the master LP problem must be optimal (i.e., there is no column of A with negative reduced cost). Moreover, the IP algorithm yields a sequence of feasible integer solutions as it searches for the optimal integer solution.
Given a feasible integer solution z' with objective value f^t z' - π^t z' - θ, we can use ζLP to decide whether or not to terminate the search. If

    ζLP ≤ f^t z' - π^t z' - θ   (4.12)

and

    f^t z' - π^t z' - θ < 0,   (4.13)

then we let z' be the column to enter the basis; otherwise we continue the branch-and-bound algorithm. Quite fortunately, the optimal IP solution also provides a lower bound on the master LP problem.

Theorem: Let (π'^t, θ') be the current dual prices for the master LP and let ζLP be the lower bound on the corresponding IP problem given in (4.10). Then the optimal objective value of the MLP is bounded below by

    b^t (π'^t, θ' + ζLP)^t.   (4.14)

Proof. For every feasible column [z^t, 1]^t with cost cj = f^t z as in (4.10), f^t z - π'^t z - θ' ≥ ζLP, that is, cj ≥ π'^t z + θ' + ζLP. Thus (π'^t, θ' + ζLP)^t is dual-feasible, with objective value b^t (π'^t, θ' + ζLP)^t. []

Thus we can use the dual solution corresponding to the current primal solution to generate bounds on the optimal objective value of the master LP problem, terminating when the bounds are sufficiently tight. Finally, note that Bertsimas and Tsitsiklis (1997) discuss a probability consistency problem related to, but with considerably more structure than, our implementation of the FTP. With the assumed additional probabilistic structure, they provide a polynomial-time algorithm for the solution of a problem that is very much related to the FTP. The method that drives their proof may lead to an enhancement of our algorithm.

4.3 Conditional Prevision

Expert opinion about an uncertain quantity is often more easily elicited by asking for the prevision, say, of X conditional on knowledge about another uncertain quantity Y on which X depends in some fashion. We shall display dependence of X on Y as X|Y. If X and Y are events, we write P(X|Y) for the conditional prevision of X given Y and P(XY) for the prevision of the joint occurrence of X and Y.
De Finetti (1974)33 proves that: A necessary and sufficient condition for coherence in the evaluation of P(X), P(Y) and P(X|Y) is compliance with

    P(XY) = P(Y)P(X|Y),   (4.15)

in addition to inf(X|Y) ≤ P(X|Y) ≤ sup(X|Y) and 0 ≤ P(Y) ≤ 1. If X is an event, the relation (4.15) is called the theorem of compound probabilities and the inequality for P(X|Y) reduces to 0 ≤ P(X|Y) ≤ 1. A numerical appraisal of the probability P(X|Y) of the event X given Y = y fits into Part I of the FTP because, for a fixed number P(X|Y = y) = q̄, P(X and Y = y) = q̄ P(Y = y), and the probabilities P(X and Y = y) and P(Y = y) are each sums of probabilities of constituent events. Consequently, the resulting constraints are linear in the elements of q. If, however, the target is a conditional event X|Y, given an uncertain quantity Y, the probability P(X|Y) is a ratio of sums of elements of q.

33 de Finetti (1974), Vol. 1, p. 136.

Lad, Dickey and Rahman (1991) show that the FTP can be extended to compute bounds on conditional previsions in the following way: if XN is any vector of N uncertain quantities and XN+2|XN+1 is the target event whose prevision we wish to bound, define XN+3 as the conjunction of XN+1 and XN+2, X̃N+3 = (XN^t, XN+1, XN+2, XN+3)^t, RN+3,S(N+3) as the realm matrix for X̃N+3, RN,S(N) as the realm matrix for XN, and rN+1, rN+2, rN+3 as the final three rows of RN+3,S(N+3), corresponding to XN+1, XN+2 and XN+3 respectively.

FTP II: Given an assessment of previsions P(XN) = pN for XN, any further assessment of the conditional prevision P(XN+2|XN+1) coheres with P(XN) if and only if P(XN+2|XN+1) ∈ [l, u], where l and u are the extrema of the following linear fractional programming problem:

(a) Find the minimum l and the maximum u of rN+3 q / rN+1 q subject to RN,S(N) q = pN, 1^t q = 1 and q ≥ 0.

(b) When the feasible region is non-empty, finite extreme value solutions exist if and only if rN+1 q is strictly positive for all vectors q in the feasible region.
(c) The feasible region is non-empty if and only if P(XN) is coherent.

We illustrate this construction on the simple fault tree example of Figure 3.3. Define numerical assessments for P(Xi), i = 1, 2, ..., 6, P(X2|X4), P(X2|X6) and P(X4|X6) as P(X2|X4) = p2|4, P(X2|X6) = p2|6 and P(X4|X6) = p4|6, and set p = (p1, ..., p6, p2|4, p2|6, p4|6)^t. Let X12 = X11X6, a constituent event composed of the intersection of X11 and X6, and let Xl|m denote the event "Xl given that Xm obtains." In correspondence with the above definition of p, recognizing that our target event is X11|6, define X̃11|6 = (X1, ..., X6, X2|4, X2|6, X4|6, X11, X12)^t = (X9^t, X11, X12)^t. Let R(X9) denote the (9 x 64) realm matrix for X9, R(X̃11|6) the (11 x 64) realm matrix for X̃11|6, r11 and r12 the last two rows of R(X̃11|6) respectively, and set A = [R(X9); 1^t] and b = (p^t, 1)^t. The FTP says that P(X11|X6) must lie in the interval [l, u] found by finding

    min and max r12 q / r6 q subject to Aq = b and q ≥ 0.

This is a linear fractional programming problem; the change of variables y = q/(r6 q), t = 1/(r6 q) (the Charnes-Cooper transformation) converts it to a standard linear program, which is then easily solved. Given the numerical appraisals of previsions shown in Scenario 3 of Table 3.2 for elements of X9, the solution to this problem is computed to be 0 ≤ P(X11|X6) ≤ 0.82, a large interval. The marginal probability P(X11) shown in Scenarios 1, 2a), 2b) and 2c) lies in an interval of smaller length in every case. Knowledge that X6 obtains substantially lengthens the bounding interval. Bounds for intermediate probabilities are shown in Table 3.2 as well.

5 Applying the RIP Algorithm

Comparison of the computational performance of the RIP algorithm with MATHEMATICA's off-the-shelf LP routines is something of a straw man. While the MATHEMATICA routine is robust, it is one of many in a broad mathematical toolkit with significant time and space resource overhead.
A highly optimized dedicated LP package such as CPLEX will certainly yield better overall computational results. For example, a well behaved 500-variable LP, even a dense one, is well within the limits of CPLEX on a mid-range RISC workstation. On the other hand, CPLEX is no match for an LP with more than one billion variables. Results reported here are based on CPU times reported by the SunOS time(1) utility and by the Timing facility in MATHEMATICA (user time only). The RIP times include both user and system times. All computations were done on Sun SPARC workstations running SunOS 4.1.x. Because of licensing restrictions, several different machines were employed: MATHEMATICA ran on a Sun 4/370 workstation and CPLEX (the RIP algorithm) on a Sun SPARCstation 10. The SPECmark rating of the SPARCstation 10 is roughly three times that of the 4/370. Computation of solutions to the two fault tree problems discussed thus far sets the stage for an additional, more ambitious illustration of the performance of the RIP.

5.1 Further Examples

Figure 5.1 shows a fault tree for a pumping system composed of 12 elementary events and 10 compound events. Bounding intervals were computed using the marginal probabilities shown in the figure. As pointed out earlier, this fault tree is arguably not much more complicated than the simple fault tree shown in Figure 3.3, but the realm matrix is 22 x 4,096 and computing bounds on P(X18) took more than 38.04 hours of CPU time and 27 megabytes of virtual memory on a Sun 4/370 using the standard LP routine in MATHEMATICA.34 The size of the realm matrix, the total number of pivots required to reach a solution and a Phase I and Phase II breakdown for the RIP algorithm are provided in Table 5.1. Statistics for simple fault tree Scenarios 1, 2a) and 3 are shown for comparison.

Table 5.1: The results of applying the RIP algorithm to the fault trees.
Problem                          Target Event  Events  Decision Variables  Pivots (Phase I / II)  Bounds
Pumping system fault tree        X18           20      4096                130 / 45               [0.200, 0.520]
Simple fault tree, Scenario 1    X11           12      64                  113 / 17               [0.030, 0.060]
Simple fault tree, Scenario 2a)  X11           12      64                  114 / 17               [0.030, 0.050]
Simple fault tree, Scenario 3    X11 | X6      19      64                  122 / 22               [0.000, 0.820]

A fault tree system with 30 basic failure events and 12 non-basic failure events is displayed in Figure 5.2. The realm matrix contains more than 2^30 > 1.07 x 10^9 columns, so a standard LP algorithm could not even be applied. Results are shown in Table 5.2. Bounds for each of the non-basic failure events were computed using the sub-tree for each intermediate event. In each case the terminating conditions are that the IPS solution is non-negative and the IPS relaxation is non-negative. The second column in Table 5.2 displays the number of events in the problem specification, the third column lists the size of the realm matrix for each non-basic event, the fourth column shows the total number of pivots with the breakdown of Phase I and Phase II and, finally, the fifth column displays lower and upper FTP probability bounds. The entire case required 270 seconds of CPU time to compute all of the bounds shown in Table 5.2, including overhead introduced by the UNIX shell script used to run the cases sequentially. It is possible to improve this performance by configuring the RIP algorithm to use an advanced basis so that it can be jump-started directly into Phase II. This was not done, so the computation times shown here are based on finding the same basis twice, once for the lower bound and once for the upper bound. Of the 270 CPU seconds, 214 CPU seconds were spent solving for the root event X42 of the fault tree, an event with 30 basic failure events.

34 The LP routines in the optimization toolbox for Matlab v3.0 could not solve this problem at all in the available virtual memory on the same machine (64 megabytes).
The next largest cases are X40 and X41, each of which has 15 basic failure events. These cases required only 10-15 CPU seconds to complete.

Figure 5.1: The pumping system fault tree. (Bracketed marginal probability assessments appear at the leaves.)

Figure 5.2: Top events in the fault tree for the CSIS system. (Bracketed marginal probability assessments appear at the nodes.)

Table 5.2

Target  Events  Realm Columns  Pivots (Phase I / II)  Bounds
X30     5       8              17 / 7                 [0.610, 0.641]
X31     5       8              17 / 7                 [0.410, 0.500]
X32     8       32             113 / 16               [0.610, 0.753]
X33     8       32             113 / 18               [0.410, 0.523]
X34     12      128            135 / 43               [0.610, 0.808]
X35     12      128            128 / 38               [0.410, 0.639]
X36     15      1024           153 / 66               [0.610, 0.838]
X37     15      1024           134 / 51               [0.410, 0.680]
X38     18      4096           144 / 94               [0.610, 0.841]
X39     18      4096           154 / 72               [0.410, 0.704]
X40     22      32768          168 / 181              [0.610, 0.921]
X41     22      32768          1102 / 132             [0.410, 0.751]
X42     42      > 1.07 x 10^9  1696 / 553             [0.020, 0.751]

5.2 Conclusion

The RIP algorithm is a promising start on expanding the scope of real-world probability assessment problems that can be treated with de Finetti's Fundamental Theorem of Probability (FTP).

References

Bertsimas, D. and Tsitsiklis, J. N. (1997). Introduction to Linear Optimization. Belmont, MA: Athena Scientific.

Boole, G. (1847). The Mathematical Analysis of Logic. Cambridge, UK: Macmillan.

Bradley, S. P., Hax, A. C. & Magnanti, T. L. (1977). Applied Mathematical Programming. Reading, MA: Addison-Wesley.

Bruno, G. and Gilio, A. (1980). Applicazione del metodo del simplesso al teorema fondamentale per le probabilità nella concezione soggettivistica. Statistica, 40(3), 337-344.
Duda, R., Hart, P., & Sutherland, G. L. (1978).

Ebbinghaus, H. D., Flum, J., & Thomas, W. (1984). Mathematical Logic. New York: Springer-Verlag.

de Finetti, B. (1937). "Foresight: Its Logical Laws, Its Subjective Sources." Annales de l'Institut Henri Poincaré. H. E. Kyburg, Jr. (trans.), in Studies in Subjective Probability, H. E. Kyburg, Jr. & H. E. Smokler (eds.) (1964). New York: J. Wiley & Sons.

- - - (1974). Theory of Probability, Vol. 1. A. Machi & A. Smith (trans.). New York: Wiley Interscience.

- - - (1975). Theory of Probability, Vol. 2.

Frankel, E. G. (1988). Systems Reliability and Risk Analysis. Boston: Kluwer.

Good, I. J. (1950). Probability and the Weighting of Evidence. London: Griffin.

Hailperin, T. (1965). Best possible inequalities for the probability of a logical function of events. American Mathematical Monthly, 72, 343-359.

- - - (1986). Boole's Logic and Probability: A Critical Exposition from the Standpoint of Contemporary Algebra and Probability Theory. Amsterdam: North-Holland.

Heckerman, D. (1991). Probabilistic Similarity Networks. Cambridge, MA: MIT Press.

Henley, E. J. & Kumamoto, H. (1981). Reliability Engineering and Risk Assessment. Englewood Cliffs, NJ: Prentice-Hall.

Kahneman, D., Slovic, P. & Tversky, A. (eds.) (1982). Judgment Under Uncertainty: Heuristics and Biases. Cambridge: Cambridge University Press.

Karimi, R., Rasmussen, N. & Wolf, L. (1980). Qualitative and quantitative reliability analysis of safety systems. MIT Energy Laboratory Report No. MIT-EL80-015.

Lad, F., Dickey, J. & Rahman, A. (1990). The fundamental theorem of prevision. Statistica, 50(1), 19-39.

- - - (1991). Numerical application of the fundamental theorem of prevision. Journal of Statistical Computation and Simulation, 19-38.

Lad, F. (1996). Operational Subjective Statistical Methods. New York: Wiley Interscience.

Mendelson, E. (1987). Introduction to Mathematical Logic (3rd ed.). Monterey, CA: Wadsworth & Brooks/Cole.

Myers, T. (1995).
Reasoning With Incomplete Probabilistic Knowledge. Ph.D. thesis, Massachusetts Institute of Technology.

Nilsson, N. J. (1986). Probabilistic logic. Artificial Intelligence, 28, 71-87.

- - - (1993). Probabilistic logic revisited. Artificial Intelligence, 59, 39-42.

Neapolitan, R. (1990). Probabilistic Reasoning in Expert Systems: Theory and Algorithms. New York: Wiley Interscience.

Pages, A. & Gondran, M. (1986). System Reliability: Evaluation and Prediction in Engineering. London: North Oxford.

Pearl, J. (1988). Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Series in Representation and Reasoning. San Mateo: Morgan-Kaufman.

Quinlan, J. R. (1983). Inferno: A cautious approach to uncertain inference. The Computer Journal, 26, 255-269.

Shafer, G. (1979). A Mathematical Theory of Evidence. San Mateo, CA: Morgan-Kaufman.

Shortliffe, E. H. & Buchanan, B. G. (1975). A model of inexact reasoning in medicine. Mathematical Biosciences, 23, 351-379.

Smith, C. A. B. (1961). Consistency in statistical inference and decision. Journal of the Royal Statistical Society, Series B, 23, 1-25.

Szolovits, P. & Pauker, S. G. (1978). Categorical and probabilistic reasoning in medical diagnosis. Artificial Intelligence, 11, 115-144.

Tversky, A. & Kahneman, D. (1974). Judgment under uncertainty: Heuristics and biases. Science, 185, 1124-1131.

United States Nuclear Regulatory Commission (1975). "Reactor safety study - An assessment of accident risks in U.S. commercial nuclear power plants." NUREG-75/014.

United States Nuclear Regulatory Commission (1981). Fault Tree Handbook. NUREG-0492.

Valverde, L. (1997). Uncertain Inference, Estimation and Decision-Making in Integrated Assessments of Global Climate Change. Ph.D. thesis, Massachusetts Institute of Technology, Management and Policy Program, September 1997.

von Winterfeldt, D. & Edwards, W. (1986). Decision Analysis and Behavioral Research. Cambridge, UK: Cambridge University Press.

White, A. L. (1986).
"Reliability Estimation for Reconfigurable Systems with Fast Recovery." Microelectron Reliab., 26, No. 6, pp. 1111-1120. 45