From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.

CONSTRUCTING BELIEF NETWORKS TO EVALUATE PLANS

Christopher Elsaesser
AI Technical Center
The MITRE Corporation
7525 Colshire Drive
McLean, VA 22102
chris@starbase.mitre.org

Paul E. Lehner
Systems Eng. Dept. & C3I Center
George Mason University
Fairfax, VA 22030
& The MITRE Corporation
plehner@masonl.gmu.edu

Scott A. Musman
AI Technical Center
The MITRE Corporation
7525 Colshire Drive
McLean, VA 22102
musman@starbase.mitre.org

ABSTRACT

This paper examines the problem of constructing belief networks to evaluate plans produced by a knowledge-based planner. Techniques are presented for handling various types of complicating plan features. These include plans with context-dependent consequences, indirect consequences, actions with preconditions that must be true during the execution of an action, contingencies, multiple levels of abstraction, multiple execution agents with partially-ordered and temporally overlapping actions, and plans which reference specific times and time durations.

Content areas: planning, probabilistic reasoning

1. INTRODUCTION

Uncertainty is ubiquitous in planning problems. Despite this, few knowledge-based planning systems have been developed that can reason explicitly about uncertainty. Instead, most knowledge-based planning systems are based solely on symbolic reasoning (Allen et al., 1990). Although these systems may employ techniques that adapt a plan to unanticipated events, they cannot generate quantitative uncertainty estimates of possible future states. Consequently, the best these planners can do is react. They cannot generate plans that are deduced to be robust against probable futures.

Recently a number of researchers have recognized the importance of uncertainty in automated planning and are developing approaches to address it (e.g., Dean and Wellman, 1991; Hanks, 1990; Kushmerick et al., 1993).
Common to many of these approaches is the use of belief networks to represent and reason about uncertainties in plans. To date, however, research in the use of belief networks to reason about uncertainty in planning has been restricted to limited types of plans. Most, for instance, assume a single execution agent, a single level of abstraction and no contingencies.

If belief networks are to provide a foundation for probabilistic planning, then we need to examine the extent to which different plan features can be represented in belief networks. This paper examines this issue. In particular we show how to develop belief networks that can handle a variety of plan features.

All of the capabilities described below are being implemented as part of the AP planning system. The AP system is designed for adversarial planning problems where each planning agent may have multiple execution agents that execute coordinated activities (Elsaesser and MacMillan, 1991).

Figure 1. Example of Action and Persistence Models.

2. BASIC APPROACH

The basic idea behind using belief networks for plan evaluation is to construct a belief network from a knowledge base of probabilistic action models and probabilistic persistence models (Wellman, 1990). A probabilistic action model specifies probability distributions on a set of consequence predicates conditioned on the state of a set of predecessor predicates. For example, the probabilistic action model depicted in Figure 1 asserts that the location of the object referenced by obj, written (Loc obj), in the situation after the action moving obj from loc1 to loc2, written (Move obj loc1 loc2), is completed (situation Si+1) is a probabilistic function of the location of obj in the prior situation. Similarly, the probabilistic persistence model for (Loc obj) is a probabilistic function of the state of (Loc obj) in the previous state.

Consider the two step plan (Move A L1 L2) --> (Move L3 L1).
To build a belief network to evaluate this plan, one can begin by sequentially pasting onto the belief network the probabilistic action model for each action (Figure 2a). When pasting onto the belief network, conflicting information already in the network is replaced. Note that the network in Figure 2a is incomplete, since there are nodes in future states which are not connected to the current state. To complete this network, it is necessary to work backwards through the network, sequentially pasting into the network the necessary persistence models (Figure 2b). When pasting into the network, current entries in the network are not changed, but previously unspecified nodes and probability assessments may be entered.

We refer to a belief network, such as shown in Figure 2b, as a plan evaluation network or PE-net. Once constructed, existing algorithms can be applied to the PE-net to calculate the marginal probability of any node in the PE-net as a function of information about the initial or future state.

Figure 2. Constructing a PE-net. (2a) PE-network after action models are pasted on. (2b) PE-network after persistence models are pasted in.

3. PLAN FEATURES THAT COMPLICATE PE-NET CONSTRUCTION

PE-net construction is straightforward for simple linear plans such as the one mentioned above. However, as plans get more complex, the process of constructing a PE-net becomes correspondingly more complex. Below we show how to handle a number of these complexities.

3.1 PARTIAL MODELS

The PE-net approach to plan evaluation assumes a knowledge base of action and persistence models, each of which is a small, partially-specified belief network. Given the number of actions and predicates that may be mentioned in the knowledge base, it is unlikely that all of the conditional probabilities mentioned in all of these networks will be specified. It is more likely that the probabilities for consequence predicates will only be specified for a subset of the predecessor states.

We handle this as follows. Wherever the action model is underspecified, we paste into the PE-net the persistence models for the consequent predicates. Wherever the persistence model is underspecified, we paste into the PE-net a default persistence model. For our applications the default persistence model asserts that no change will take place. Using this technique all the conditional probabilities in the network will be specified. All that remains is to specify the unconditional probabilities for the initial state.

3.2 DERIVED EFFECTS

In developing a PE-net, it is important to separate causal effects from derived effects. Causal effects are links that go from a predicate node in one situation to a predicate node in a later situation. Derived effects are defined by links between two predicate nodes in the same situation.

To illustrate the kind of problem that may be encountered, consider the simple PE-net in Figure 3. This PE-net is for a single Move action. It also includes (At L1) nodes which indicate what object is at location L1. Clearly, if (At L1)=X then (Loc X)=L1. Consequently, the status of (Loc X) can sometimes be derived from the status of (At L1) in the same situation. It seems natural therefore to paste onto the PE-net an arc from (At L1) to (Loc X), where the conditional probability P((Loc X)=L1 | (At L1)=X & anything else)=1 is specified. This is an example of a derived effect.

Figure 3. Problematic PE-net with derived effects.

Now, assume that the move action is completely reliable, all persistence models are the no-change default model, and X is initially at L1. Given these assumptions, we would expect (Loc X)=L2 in S1 with certainty. However, the PE-net in Figure 3 implies (Loc X)=L1 in S1 with certainty! The addition of the derived effect unexpectedly resulted in the persistence model for (At L1) overriding the action model.
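The failure just described can be reproduced by brute-force enumeration over a very small network. The sketch below is only an assumed reading of Figure 3: the variable names (`loc_s0`, `at_s1`, and so on), the `EMPTY` value for an unoccupied location, and the function `marginal_loc_s1` are hypothetical and do not come from the AP system. It encodes a completely reliable move, no-change persistence for (At L1), and the derived arc from (At L1) to (Loc X) in S1.

```python
from itertools import product

LOCS = ("L1", "L2")          # possible values of (Loc X)
OCCUPANTS = ("X", "EMPTY")   # possible values of (At L1); EMPTY is an assumed value

# Unconditional probabilities for the initial situation S0: X starts at L1.
prior_loc = {"L1": 1.0, "L2": 0.0}
prior_at = {"X": 1.0, "EMPTY": 0.0}


def persist(prev, cur):
    """Default persistence model: no change between situations."""
    return 1.0 if prev == cur else 0.0


def move_action(loc_s0, loc_s1):
    """Completely reliable (Move X L1 L2): X always ends up at L2."""
    return 1.0 if loc_s1 == "L2" else 0.0


def loc_s1_cpt(loc_s1, loc_s0, at_s1):
    """CPT of (Loc X) in S1 once the derived arc from (At L1) is pasted on."""
    if at_s1 == "X":
        # Derived effect: P((Loc X)=L1 | (At L1)=X & anything else) = 1
        return 1.0 if loc_s1 == "L1" else 0.0
    return move_action(loc_s0, loc_s1)


def marginal_loc_s1():
    """Sum the joint distribution over all other nodes to get P((Loc X) in S1)."""
    result = {loc: 0.0 for loc in LOCS}
    for loc_s0, at_s0, at_s1, loc_s1 in product(LOCS, OCCUPANTS, OCCUPANTS, LOCS):
        joint = (prior_loc[loc_s0] * prior_at[at_s0]
                 * persist(at_s0, at_s1)
                 * loc_s1_cpt(loc_s1, loc_s0, at_s1))
        result[loc_s1] += joint
    return result


print(marginal_loc_s1())  # {'L1': 1.0, 'L2': 0.0}
```

Under these assumptions all probability mass lands on (Loc X)=L1 in S1, even though the move is completely reliable: the persisted value of (At L1) reaches (Loc X) through the derived arc and overrides the action model.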
In general, this type of problem occurs because derived effects serve to complete an incomplete causal model. In theory, it is possible to do away with derived effects altogether. If causality is temporal, then a complete causal model going from Si to Si+1 would account for all interactions within a situation. In Figure 3, for instance, a complete action model would have both (Loc X) and (At L1) as consequence predicates. This would remove the need to directly connect (Loc X) and (At L1). Unfortunately, the knowledge engineering effort required to develop a complete causal model is prohibitive, since it would require the specification of conditional probabilities for all direct and indirect consequences of an action.

Our approach to derived effects is a compromise between complete causal modeling and the liberal use of derived effects. All the PE-nets constructed by our system have the structure depicted in Figure 4. Predicates are split into two levels. Primitive predicates do not have interconnections within a situation. It is assumed that they are only conditioned on the state of the nodes in the previous situation. It is up to the knowledge engineer of the action and persistence models to ensure that the models are causally complete with respect to primitive predicates. Predicates at the derived level can only be conditioned on other nodes in the same situation. The predicates at the derived level change from situation to situation. Only relevant derived-level predicates are included. Enforcing this structure removes the problems with derived effects.

Figure 4. PE-net with primitive and derived predicates.

This approach requires that any predicate mentioned as a consequence in an action model must be a primitive predicate. In Figure 3, therefore, (Loc X) would need to be a primitive node, (At L1) a derived node, and the arcs would go from (Loc X) to (At L1). This change would repair the problem in Figure 3.
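The two-level structure can be checked mechanically. The following is a minimal sketch, not the AP implementation: the `Node` class, the `structure_violations` function, and the convention of indexing an action node by its start situation are all assumptions made for illustration. It flags any arc into a primitive node that does not come from the previous situation, and any arc into a derived node that does not come from the same situation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    name: str
    situation: int  # index i of situation Si; an action is indexed by its start situation (assumed)
    level: str      # "primitive", "derived", or "action"


def structure_violations(arcs):
    """Describe every (parent, child) arc that breaks the two-level rule."""
    problems = []
    for parent, child in arcs:
        if child.level == "primitive" and parent.situation != child.situation - 1:
            problems.append(f"{parent.name} -> {child.name}: "
                            "primitive node needs parents from the previous situation")
        elif child.level == "derived" and parent.situation != child.situation:
            problems.append(f"{parent.name} -> {child.name}: "
                            "derived node needs parents from the same situation")
    return problems


# Hypothetical encoding of Figure 3: (At L1) persists like a primitive predicate
# and also feeds (Loc X) within S1.
move = Node("Move X L1 L2", 0, "action")
loc0, loc1 = Node("Loc X @S0", 0, "primitive"), Node("Loc X @S1", 1, "primitive")
at0, at1 = Node("At L1 @S0", 0, "primitive"), Node("At L1 @S1", 1, "primitive")
problematic = [(loc0, loc1), (move, loc1), (at0, at1), (at1, loc1)]

# Repaired version: (At L1) becomes a derived node fed by (Loc X) in the same situation.
d_at0, d_at1 = Node("At L1 @S0", 0, "derived"), Node("At L1 @S1", 1, "derived")
repaired = [(loc0, loc1), (move, loc1), (loc0, d_at0), (loc1, d_at1)]

print(structure_violations(problematic))  # one violation: the arc from (At L1) to (Loc X) in S1
print(structure_violations(repaired))     # []
```

In this encoding the only offending arc in the Figure 3 network is the same-situation arc from (At L1) to (Loc X); reversing it and making (At L1) a derived node leaves no violations.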
3.3 CONTEXT-DEPENDENT EFFECTS

Many planners have action models where the consequences of an action are functions of the situation in which the action was executed (Wilkins, 1988). In a PE-net this can be handled by invoking these same functions to determine possible node states in situation Si+1 as a function of the possible node states in situation Si. Iterating through the states in this way will enumerate all possible states for each node.

3.4 PLANNED CONTINGENCIES

A plan contains contingent actions when the decision to execute an action (or which action) is contingent on the situation. In a PE-net, contingent actions can be handled by combining actions into a single node, and then conditioning the merged action node on the nodes which determine which action will be executed. Figure 5 depicts a network with contingent actions.

Figure 5. PE-net for plan with contingent actions.

There are two things to note here. First, actions can be made contingent on whether or not previous actions were executed. Consequently, it is straightforward to represent a contingent action sequence (i.e., a contingency plan). Second, there is no requirement that action selection be a deterministic function of the situation. It could be probabilistic, to reflect possible uncertainties about the agent's ability to detect the true status of a situation. Alternatively, one could make the action contingent on a sensor report and make the sensor report a probabilistic function of the situation.

3.5 MULTIPLE LEVELS OF ABSTRACTION

Many planners use operators at varying levels of abstraction. As a result, there may be plans that are only partially detailed. In order to build PE-nets for such plans, it is necessary to have probabilistic action models for operators at each level of abstraction. High level actions can be pasted onto the network in exactly the same manner as less abstract actions.

Figure 6. Example hierarchical plan. B, C, D are high level actions. C2 is an alternative subplan, which can be selected instead of C1. C1a, C1b, and C1c are executable.

When an abstract operator is expanded, the PE-subnet for that expansion should be pasted onto the overall PE-net. There are two things to note about the PE-subnet. First, not only should it contain the actions that are selected to be part of the plan, but it should also contain the actions that were enumerated but not selected. To illustrate, consider the plan in Figure 6. B, C and D are abstract actions, each capable of expansion. After expanding C, it turns out that there are two possible approaches to achieving C, namely C1 and C2. C1 is selected for inclusion in the plan and is further expanded to the sequence of actions C1a, C1b and C1c. To construct the PE-net, a subnet that combines C1 and C2 into a contingent action node is constructed and pasted onto the PE-net. This requires enumerating the conditions under which the alternative action will be selected. After this, the subnet for the C1a, C1b, C1c sequence is constructed and pasted onto the PE-net. The second thing to note is that when a PE-subnet for an expanded subplan is pasted onto a PE-net, it doesn't necessarily override everything in the more abstract action model. There may be consequence predicates of the higher level action model that are not mentioned in the lower level action models.

One advantage of using PE-nets to evaluate hierarchical plans is that the PE-net can be processed to estimate both the probability that the current plan will succeed and the probability that the current plan will lead to success (i.e., the probability that the plan can be successfully modified during execution).
The probability that the current plan will succeed is the joint probability that the goal conditions (represented as specific states on specified predicates) will be true in the final situation and that the (most detailed) steps in the current plan will be executed, while the probability that the plan will lead to success is just the probability that the target conditions will be true in the final situation.

3.6 OVERLAPPING ACTIONS, DURING CONDITIONS AND EFFECTS

In AP, a planning agent may plan the coordinated activity of multiple execution agents. Although the plan for each execution agent is linear, the overall plan will contain multiple simultaneous actions with interlocking start and end situations. To relate the effects of overlapping actions, AP action models use during conditions and during effects. A during condition is a proposition that must be true during execution of an action in order for some effect to occur. Similarly, some effects occur during the execution of an action, rather than in the end situation of that action.

If the probabilistic action and persistence models do not mention specific times (see below), then PE-net construction for plans with overlapping actions proceeds by arbitrarily selecting a linear ordering on the situations that is consistent with the interlock constraints, and then pasting onto the PE-net any during conditions and effects of an action for the nodes in the situations between the start and end situation of that action.

The probability estimates derived from such a PE-net have two useful characteristics. First, they are minimum estimates. This is because the planning agent can choose to further constrain the plan so that the execution agents will execute the actions in a way that satisfies the linear ordering on the situations. Second, in practical applications the probability estimates of the goal conditions are not likely to change substantially if a different linear ordering is selected.
This is because nonlinear planners (such as AP) are specifically designed to impose order constraints whenever the current constraints leave attainment of the goal conditions in doubt. Consequently, while it is certainly possible for a nonlinear planner to miss an important order constraint, a planner that does this often is unlikely to transition to practical applications.

3.7 REFERENCES TO SPECIFIC TIMES AND DURATIONS

One can easily introduce time into situations by adding a predicate for clock time and having action models that assign a probability distribution over the clock time in the end situation conditioned on the clock time in the start situation. If clock times are included, then the probabilistic persistence models can use time elapsed since the previous situation as a conditioning variable.

Figure 7. Structure of PE-net for plan that lacks a clear temporal order on situations.

This approach works well for linear plans, where the sequence of situations is necessarily in temporal order no matter what the distribution of situation clock times. Unfortunately, this does not always hold for plans with overlapping actions. To illustrate the problem that may result from overlapping actions, consider the plan in Figure 7a. In this plan actions A1 and A2 begin together. A1 takes either 2 or 4 minutes to complete, A2 takes 1 or 6 minutes. As a result, the clock time for S1 is either 2 or 4, and for S2 it is either 1 or 6. If the PE-net for this plan orders the situations S0 --> S1 --> S2 --> S3, then there are possible states for Clock-time in S1 that come after some states for Clock-time in S2. As a result, the persistence models must condition the probability distribution over the other nodes in S1 as a function of negative elapsed times. This is obviously intolerable.

A solution to this problem is to split situations so that the temporal ordering of the situations is guaranteed. For instance, as shown in Figure 7b, S2 can be split into S2a and S2b. A new node, Relative-end-time, is added.
The probabilistic action model for A2 is pasted onto S2a whenever Relative-end-time is negative. Otherwise it is pasted onto S2b. This solution guarantees that the situations are in temporal order, even though the clock times for the situations may overlap.

4. DISCUSSION

Our work to date suggests that automated procedures can be developed for constructing PE-nets for plans that contain a variety of complicating features. Belief networks do seem to provide an adequate formal foundation for probabilistic evaluation of plans, and automated construction of these nets is feasible.

Clearly, a great concern is computational complexity. Our work to date suggests that for linear plans the number of nodes in a PE-net grows linearly with the length of a plan. However, unless care is taken, the number of node states will increase exponentially with the length of the plan. This will occur whenever multiple node states are generated for each node state in a previous situation. This problem can be mitigated somewhat by defining an "OTHER" node state, which combines into a single node state a set of node states that seem to have little relevance to evaluating the plan. In general, if the action and persistence models are carefully engineered, then we anticipate that the number of node states will increase linearly with the length of a linear plan.

Nonlinear plans are more problematic. If relative end time nodes are inserted then, as the example in Section 3.7 indicates, the number of situations will increase rapidly, where every situation will contain most of the primitive predicates mentioned in any of the action models. The rate of increase is not exponential, but it is substantial.

Finally, the cost of exact processing of a belief net increases exponentially with the size of the network (Cooper, 1990). This suggests that approximate (e.g., Monte Carlo) algorithms should be used to process large PE-nets.

References

Allen, J., Hendler, J. and Tate, A. (eds.) (1990) Readings in Planning. San Mateo, CA: Morgan Kaufmann.

Cooper, G.F. (1990) The computational complexity of probabilistic inference using Bayesian belief networks, Artificial Intelligence, 42, 393-405.

Dean, T. and Wellman, M. (1991) Planning and Control. San Mateo, CA: Morgan Kaufmann.

Elsaesser, C. and MacMillan, T.R. (1991) Representation and Algorithms for Multiagent Adversarial Planning, Technical Report MTR-91W000207, MITRE Corporation, December 1991.

Hanks, S. (1990) Projecting Plans for Uncertain Worlds. Technical Report 756, Yale University, Dept. of Computer Science.

Kushmerick, N., Hanks, S. and Weld, D. (1993) An Algorithm for Probabilistic Planning, Technical Report 93-06-03, Dept. of Computer Science and Engineering, Univ. of Washington.

Wellman, M. P. (1990) The STRIPS assumption for planning under uncertainty. In Proceedings AAAI-90, Menlo Park, CA: AAAI Press, 198-203.

Wilkins, D. (1988) Practical Planning: Extending the Classical AI Planning Paradigm. San Mateo, CA: Morgan Kaufmann.