From: AAAI Technical Report WS-97-08. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

Handling Contingency Selection Using Goal Values

Martha E. Pollack
Department of Computer Science and Intelligent Systems Program
University of Pittsburgh
Pittsburgh, PA 15260
pollack@cs.pitt.edu

Nilufer Onder
Department of Computer Science
University of Pittsburgh
Pittsburgh, PA 15260
nilufer@cs.pitt.edu

Abstract

A key question in conditional planning is: how many, and which, of the possible execution failures should be planned for? One cannot, in general, plan for all the failures that can be anticipated: there are simply too many. But neither can one ignore all the possible failures, or one will fail to produce sufficiently flexible plans. We describe a planning system that attempts to identify the contingencies that contribute the most to a plan's overall value. Plan generation proceeds by extending the plan to include actions that will be taken in case the identified contingencies fail, iterating until either a given expected value threshold is reached or planning time is exhausted.

Introduction

Classical AI plan generation systems assume static environments and omniscient agents, and thus ignore the possibility that events may occur in unexpected ways, that is, that contingencies might arise during plan execution. A problem with classical planners is, of course, that things do not always go "according to plan." In contrast, universal planning systems and more recent MDP-based systems make no such assumption. They produce "plans" or "policies" that are functions from states to actions. However, the state space can be enormous, making it difficult or impossible to generate complete policies or truly universal plans.¹ Conditional planners take the middle road. They allow for conditional actions with multiple possible outcomes and for sensing actions that allow agents to determine the current state (Blythe & Veloso 1997; Draper, Hanks, & Weld 1994; Etzioni et al.
1992; Goldman & Boddy 1994; Peot & Smith 1992; Pryor & Collins 1993). A key question in conditional planning is: how many, and which, of the possible execution failures should be planned for?

¹Thus Dean et al. (1995) describe an algorithm that constructs an initial policy for a restricted set of states and incrementally increases the set of states covered.

In this paper, we describe Mahinur, a probabilistic partial-order planner that supports conditional planning with contingency selection based on probability of failure and goal values. We present an iterative refinement planning algorithm that attempts to identify the influence each contingency would have on the outcome of plan execution, and then gives priority to the contingencies whose failure would have the greatest negative impact. The algorithm first finds a skeletal plan to achieve the goals, and then during each iteration selects a contingency whose failure will have maximal disutility. It then extends the plan to include actions to take in case the selected contingency fails. Iterations proceed until the expected value of the plan exceeds some specified threshold or planning time is exhausted.

Planning with Contingency Selection

We start with a basic idea: a plan is composed of many steps that produce effects to support the goals and subgoals, but not all of the effects contribute equally to the overall success of the plan. Of course, if plan success is binary (plans either succeed or fail), then in some sense all the effects contribute equally, since the failure to achieve any one results in the failure of the whole plan. However, as has been noted in the literature on decision-theoretic planning, plan success is not binary. In this paper, we focus on the fact that the goals that a plan is intended to achieve may be decomposable into subgoals, each of which has some associated value.
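This additive view of plan value can be made concrete in a few lines. The sketch below is ours, with goal names and values invented for illustration; it simply sums the values of the subgoals an execution actually achieves, rather than scoring the plan all-or-nothing.

```python
# Sketch (ours): graded plan success over valued subgoals.
# Goal names and values are invented for illustration.
GOAL_VALUES = {"brakes_repaired": 100, "reflector_installed": 10}

def plan_value(achieved_goals):
    """Value earned by an execution that achieved the given subgoals."""
    return sum(v for g, v in GOAL_VALUES.items() if g in achieved_goals)

print(plan_value({"brakes_repaired"}))                          # partial success
print(plan_value({"brakes_repaired", "reflector_installed"}))   # full success
```

Under a binary success measure both executions above would score identically or not at all; the graded measure distinguishes them.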
For example, consider a car in a service station and a plan involving two goals: installing the fallen front reflector and repairing the brakes. The value of achieving the former goal may be significantly less than the value of achieving the latter. Consequently, effects that support only the former goal (e.g., getting the reflector from the supplier) contribute less to the overall success of the plan than effects that support only the latter (e.g., getting the calipers). Effects that support both goals (e.g., knowing the phone number of the supplier) will have the greatest importance. We will say that an effect e supports a goal g if there is some path through the plan starting at e and ending at g.

1. Skeletal plan: Construct a skeletal plan.
2. Plan refinement: While the plan is below the given expected value threshold:
   2a. Select the contingency with the highest expected disutility.
   2b. Extend the plan to include actions that will be taken in case the contingency fails.

Figure 1: The planning algorithm.

PAINT: {<{-PR}, 0.95, {PA, -BL}>, <{-PR}, 0.05, {}>, <{PR}, 1, {}>}
INSPECT: {<\-BL\, {}, 1, {}, /-FL/>, <\BL\, {}, 0.1, {}, /-FL/>, <\BL\, {}, 0.9, {}, /FL/>}

Figure 2: Two example actions.

The high-level specification of a planning algorithm based on this idea is shown in Fig. 1. It first builds a skeletal plan, which is one in which every subgoal is supported minimally, i.e., by exactly one causal or observation link. In our current implementation of the Mahinur system, the skeletal plan is constructed using Buridan with the restriction that each subgoal is supported by exactly one link.

This algorithm depends crucially on the notions of contingencies and their expected values. We define contingencies to be subsets of outcomes of probabilistic actions. For instance, if we repair the brakes, we may or may not succeed in having the brakes functioning properly (perhaps the new calipers are defective).
Having the brakes functioning correctly is one contingent outcome of the repair action; not having them function is another. A deterministic action is simply one with a single (contingent) outcome. The importance of planning for the failure of any particular contingency can then be seen to be a factor of two things: the probability that the contingency will fail, and the degree to which the failure will affect the overall success of the plan. To compute the former, one needs to know both the probability that the action generating the particular contingency will be executed, and the conditional probability of the contingency given the action's occurrence. Computing the latter is more difficult. In this paper, however, we make two strong simplifying assumptions. First, we assume that the top-level goals for any planning problem can be decomposed into utility-independent subgoals having fixed scalar values.² Second, we assume that all failures are equally difficult to recover from at execution time. With these two assumptions, we can compute the expected disutility of a contingency's failure: it is the probability of its failure, multiplied by the sum of the values of all the top-level goals it supports. Planning for the failure of a contingency c means constructing a plan that does not depend on the effects produced by c, i.e., a plan that will succeed even if c fails to occur.

²The utility of one subgoal does not depend on the degree to which other goals are satisfied (Haddawy & Hanks 1993).

Action Representation

To formalize these notions, we build on the representation for probabilistic actions that was developed in (Kushmerick, Hanks, & Weld 1995, p. 247). Causal actions are those that alter the state of the world. For instance, the PAINT action depicted both graphically and textually in Fig. 2 has the effects of having the part painted (PA) and blemishes removed (BL) 95% of the time when the part has not been processed (PR).
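The PAINT action's textual specification in Fig. 2 can be held in a small data structure. The layout below is our sketch (triples of trigger, probability, effects, with a leading "-" marking negation), not Mahinur's actual encoding:

```python
from collections import defaultdict

# Sketch (ours) of PAINT from Fig. 2: each causal consequence is a
# (trigger, probability, effects) triple; "-X" denotes the negation of X.
PAINT = [
    ({"-PR"}, 0.95, {"PA", "-BL"}),  # unprocessed part: painted, blemish gone
    ({"-PR"}, 0.05, set()),          # unprocessed part: painting fails
    ({"PR"},  1.0,  set()),          # already-processed part: no effect
]

# Sanity check: for each trigger, the outcome probabilities must sum to 1,
# since the triggers are mutually exclusive and exhaustive.
totals = defaultdict(float)
for trigger, p, _ in PAINT:
    totals[frozenset(trigger)] += p
assert all(abs(t - 1.0) < 1e-9 for t in totals.values())
```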
Definition 1 A causal action N is defined by a set of causal consequences:
N: {<t1, p1,1, e1,1>, ..., <t1, p1,m, e1,m>, ..., <tn, pn,1, en,1>, ..., <tn, pn,m, en,m>}.
For each i, j: ti is an expression called the consequence's trigger, and ei,j is a set of literals called the effects. pi,j denotes the probability that the effects in ei,j will be produced given that the literals in ti hold. The triggers are mutually exclusive and exhaustive.

For each effect e of an action, RESULT(e, st) is a new state that has the negated propositions in e deleted from, and the non-negated propositions added to, st.

Conditional plans also require observation actions so that the executing agent will know which contingencies have occurred and, hence, which actions to perform. Observational consequences differ from causal consequences in that they record two additional pieces of information: the label (in / /) shows which proposition's value is being reported on, and the subject (in \ \) shows what the sensor looks at. The subject is needed when an aspect of the world is not directly observable and the sensor instead observes another (correlated) aspect of the world. For instance, a robot INSPECTing a part might look at whether it is blemished (BL) to report whether it is flawed (/FL/) (Fig. 2). Note that the label and subject are not arbitrary strings; they are truth-functional propositions.

Figure 3: Multiple outcomes.

Figure 4: Observation links.

Definition 2 An observation action N is a set of observational consequences:
N: {<sbj1, t1, p1,1, e1,1, l1,1>, ..., <sbj1, t1, p1,m, e1,m, l1,m>, ..., <sbjn, tn, pn,1, en,1, ln,1>, ..., <sbjn, tn, pn,m, en,m, ln,m>}.
For each i, j: sbji is the subject of the sensor reading, ti is the trigger, ei,j is the set of any effects the action has, and li,j is the label that shows the sensor report. pi,j is the probability of obtaining the effects and the sensor report. The subjects and triggers are mutually exclusive and exhaustive.
The subject and trigger propositions cannot be identical, although the subject and label propositions can be. This observation action representation is similar to C-Buridan's (Draper, Hanks, & Weld 1994). However, C-Buridan's labels are arbitrary strings that do not relate to propositions, and it does not distinguish the subject of the sensor reading.

Contingencies

An important issue in the generation of probabilistic plans is the identification of equivalent consequences. Suppose that an action A is inserted to support a proposition p. All of the outcomes of A that produce p as an effect can then be treated as an equivalent contingent outcome, or contingency. The contingencies of a step are relative to the step's intended use. If the operator in Fig. 3 is used to support p, then {a, b} is its single contingency, but if it is used to support q, then {a} and {b} are alternative contingencies.

The Plan Structure

We define a planning problem in the usual way, as a set of initial conditions encoded as a start step (START: {<{}, p1, e1>, ..., <{}, pn, en>}), a goal (GOAL: {<{g1, ..., gn}, 1, {}>}), and an action library. We assume that each top-level goal gi has value val(gi). A plan in our framework has two sets of steps and links (causal and observational), along with binding and ordering constraints.

Definition 3 A partially ordered plan is a 6-tuple <Tc, To, O, Lc, Lo, S>, where Tc and To are causal and observation steps, O is a set of temporal ordering constraints, Lc and Lo are sets of causal and observation links, and S is a set of open subgoals.

Causal links record the causal connections among actions, while observation links specify which steps will be taken depending on the report provided by an observation step. It is important to note that both types of links emanate from contingencies rather than individual outcomes. For instance, in Fig. 4, SHIP will be executed if INSPECT reports /-FL/, and REJECT will be executed otherwise.
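Returning to the Contingencies subsection above: the relativity of a step's contingencies to its intended use can be stated operationally. The sketch below is ours, using the consequence layout of Definition 1; the Fig. 3 operator's effect sets are invented to match the text's description of outcomes a and b.

```python
# Sketch (ours): group an action's outcomes into contingencies relative to
# the proposition the step is being used to support. All outcomes whose
# effects contain that proposition form one contingency; the rest form
# the alternative contingency.
def contingencies(consequences, supported):
    producing = frozenset(i for i, (_, _, eff) in enumerate(consequences)
                          if supported in eff)
    rest = frozenset(range(len(consequences))) - producing
    return producing, rest

# A Fig. 3-style operator: outcome a (index 0) yields {p, q};
# outcome b (index 1) yields {p} only.
OP = [({"t"}, 0.5, {"p", "q"}), ({"t"}, 0.5, {"p"})]

contingencies(OP, "p")  # single contingency {a, b}, empty alternative
contingencies(OP, "q")  # alternative contingencies {a} and {b}
```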
Definition 4 A causal link is a 5-tuple <Si, c, p, ck, Sj>. Contingency c of step Si is the link's producer, contingency ck of step Sj is the link's consumer, and p is the supported proposition (Si, Sj ∈ Tc ∪ To).

Definition 5 An observation link is a 4-tuple <Si, c, l, Sj>. Contingency c is the collection of consequences of Si ∈ To that report l. An observation link means that Sj ∈ Tc ∪ To will be executed only when the sensor report provided by Si is l.

Often the planner needs to reason about all the steps that are subsequent to a particular observation. We therefore define a composite step, which is a portion of the plan graph rooted at an observation action O. Each composite step is factored into branches, one branch for each contingency associated with O. For example, the composite step (O: <{-BL, 1}, {BL, 0.1}> <S1, S2>, <{BL, 0.9}> <S3>) indicates that after some observation action O, steps S1 and S2 will be executed subsequent to one observation (which always occurs when BL is false, and occurs 10% of the time when BL is true), while step S3 will be executed subsequent to the alternative observation (which occurs 90% of the time when BL is true).

Definition 6 A composite step includes an observation action and the entire plan graph below it: (O: <{t1,1, p1,1}, ..., {t1,l, p1,l}> <S1,1, ..., S1,m>, ...). <Si,1, ..., Si,m>, called branchi, includes the steps that will be executed if contingency i of the observation step is realized. <{ti,1, pi,1}, ..., {ti,l, pi,l}> denotes the condition for contingency i.

Although composite steps are an important part of our overall framework, for brevity in this paper we provide only base definitions, not involving composite steps. (See (Onder & Pollack 1996) for the general definitions.)

Expected Value

The main idea in computing the expected value of a contingency c is to find out which top-level goal(s) will fail if the step fails to produce c.
For example, suppose that c supports top-level goal g1 in two ways: along one path with probability 0.7, and along another with probability 0.8. If c fails, the most that will be lost is the support for g1 with probability 0.8, and thus the expected value of c is 0.8 × val(g1). Therefore, while propagating the values to a contingency, we take the maximum support it provides for each top-level goal.

Definition 7 Expected value of a contingency: Suppose that contingency c has outgoing causal links to consumer contingencies c1, ..., cn and possibly an outgoing observation link to step S. Assume that ki is the probability that ci supports goal g, and kn+1 is the probability that S supports g. Then, the expected value of c with respect to g is:

evg(c, g) = val(g), if c directly supports g;
evg(c, g) = max(k1, ..., kn+1) × val(g), otherwise.

For z top-level goals, the expected value of c is

EV(c) = Σ i=1..z evg(c, gi).

We next consider the computation of the expected disutility of a contingency's failure. To begin, we need to compute the probability that a given state will occur after the execution of a sequence of actions.

Definition 8 The probability of a state after a (possibly null) sequence of steps is executed:

P[st' | st, <>] = 1, if st' = st; 0, otherwise.
P[st' | st, <S>] = Σ <ti, pi,j, ei,j> ∈ S  pi,j × P[ti | st] × P[st' | RESULT(ei,j, st), <>].
P[st' | st, <S1, ..., Sn>] = Σ u  P[u | st, <S1>] × P[st' | u, <S2, ..., Sn>],

where S is a causal step.³ The action sequence is a total ordering consistent with the partially ordered plan.

Figure 5: Inserting a new observation action.

The expected disutility of a contingency c is the product of the probability that the action generating c will be executed, the conditional probability of c given the action's occurrence, and the value of c.

Definition 9 Expected disutility of the failure of a contingency: Let S be a step and c be the ith contingency of S. If the condition for c is <{ti,1, pi,1}, ..., {ti,j, pi,j}>, then the expected disutility of c is defined as

P[S is executed] × (1 − Σ k=1..j pi,k × P[ti,k | <S1, ..., Si-1>]) × EV(c).
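Definition 8's recursion can be sketched directly. Below, states are frozensets of literals and steps are consequence lists as in Definition 1; this encoding, and the handling of complementary literals inside RESULT, is our simplification for illustration, not the paper's implementation.

```python
# Sketch (ours) of Definition 8. A state is a frozenset of literals; a step
# is a list of (trigger, prob, effects) consequences; "-X" negates X.

def negate(lit):
    return lit[1:] if lit.startswith("-") else "-" + lit

def result(effects, state):
    """RESULT(e, st): drop each effect's complement, then add the effect."""
    st = set(state)
    for lit in effects:
        st.discard(negate(lit))
        st.add(lit)
    return frozenset(st)

def state_prob(target, state, steps):
    """P[target | state, <S1,...,Sn>] by the recursion of Definition 8."""
    if not steps:
        return 1.0 if target == state else 0.0
    total = 0.0
    for trigger, p, effects in steps[0]:
        if trigger <= state:  # P[t | st] = 1 iff t is a subset of st (footnote 3)
            total += p * state_prob(target, result(effects, state), steps[1:])
    return total

# The PAINT action of Fig. 2 in this layout:
PAINT = [({"-PR"}, 0.95, {"PA", "-BL"}),
         ({"-PR"}, 0.05, set()),
         ({"PR"},  1.0,  set())]

start = frozenset({"-PR", "-PA", "BL"})
good = result({"PA", "-BL"}, start)
state_prob(good, start, [PAINT])  # 0.95: the successful painting outcome
```

Because the triggers of a step are mutually exclusive, at most one trigger holds in any state, so the branches of the sum never double-count.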
The Planning Algorithm

We can now see how the algorithm in Fig. 1 works. After forming a skeletal plan, it selects a contingency c whose failure has maximal expected disutility. Suppose that c has an outgoing causal link <Si, c, p, ck, Sj> (Fig. 5, left). Because the proposition supported by c is p, the planner selects an observation action, So, that reports the status of p and inserts it, ordered to come after Si. Two contingencies for So are formed: c1 contains all of So's outcomes that produce the label /p/, and c2 contains all the other outcomes. The causal link <Si, c, p, ck, Sj> is removed and an observation link <So, c1, /p/, Sj> is inserted to denote that Sj will be executed when the sensor report is /p/. In addition, a causal link <Si, c, sbj, c1, So> is inserted (Fig. 5, right). This link prevents the insertion of an action that would perturb the correlation between the propositions coded in the subject and the label. Conditional planning then proceeds in the CNLP style: the goal is duplicated and labeled so that it cannot receive support from the steps that depend on contingency c1 of the observation step. Plan iteration stops when a given expected value threshold is reached or planning time is exhausted.

³The probability of an expression e with respect to a state st is defined as: P[e | st] = 1, if e ⊆ st; 0, otherwise.

Definition 10 The expected value of a totally ordered plan <S1, ..., Sn> is the sum, over the z goal propositions, of the product of the probability that the goal proposition will be true after Sn is executed and the value of the goal:

Σ i=1..z Σ u  P[gi | u] × P[u | <S1, ..., Sn>] × val(gi),

where u ranges over all possible states. We currently use an arbitrary total ordering of the plan for this computation.

Example

The following example is based on one solved by C-Buridan. The goal is to process (PR) and paint (PA) parts.
Initially, 30% of the parts are flawed (FL) and blemished (BL); the rest are in good condition. The "correct" way of processing a part is to reject it if it is flawed, and to ship it otherwise. We assign 100 as the value of processing a part, and 560 as the value of painting. The action library contains the SHIP, REJECT, INSPECT and PAINT actions of Draper et al. (1994).

The Mahinur planning system starts by constructing a skeletal plan, and nondeterministically chooses the SHIP action to process the part. Two contingencies of SHIP are constructed (shown in Fig. 6 in dashed boxes): the first one produces PR, and the second does not. A causal link that emanates from the first contingency is established for PR. The triggers for the first contingency are adopted as subgoals (-PR, -FL), and both are supported by START. For START, two different sets of contingencies are constructed: one with respect to -PR, and one with respect to -FL. When the PAINT action is inserted to support the PA goal, and the -PR trigger of the first contingency is supported by START, the skeletal plan is complete.

The skeletal plan contains three contingencies whose disutilities of failure must be calculated (the contingency for -PR of START has no alternatives). The expected disutilities for the first contingencies of START, PAINT, and SHIP are 0.3 × 100 = 30 (for -FL), 0.05 × 560 = 28 (for PA), and 0.3 × 100 = 30 (for PR), respectively. Mahinur chooses to handle a contingency with the highest expected disutility, namely the first contingency of START. Because -FL is the proposition supported by that contingency, an action that inspects the value of FL is inserted before SHIP (Fig. 7). The causal link supporting -FL is removed and an observation link from the /-FL/ report is inserted. The INSPECT step looks at BL to report about FL.
Therefore, the first contingency of START is linked to INSPECT to ensure that an action that alters the value of BL cannot be inserted between START and INSPECT (BL and FL are correlated).

Figure 6: The skeletal plan.

The new branch is then completed. For the duplicated goal, a new REJECT action is inserted to support PR, and the existing PAINT step is used to support PA, yielding the plan shown in Fig. 7. If the success threshold has not been met, Mahinur will plan for additional contingencies. Note that existing conditional planners can solve this problem, but the plan they generate will be the same regardless of the values assigned to the top-level goals. Within our framework, the planner is sensitive to the values assigned to each goal and is able to focus on different parts of the plan based on the disutility of contingencies.

Figure 7: The complete plan.

Mahinur was implemented using Common Lisp on a PC running Linux. We conducted preliminary experiments on two sets of problems. In the first set, we used synthetic problems which yielded plans containing 1, 2, or 3 possible sources of failure. Table 1 shows the total run time for solving each problem and the amount of time used for disutility calculation. As expected, run time increases exponentially with the size of the problem⁴; disutility calculation time, however, is insignificant.

⁴The problems marked with a * terminated with a Lisp memory error.

Iter.   1 poss. fail.     2 poss. fail.      3 poss. fail.
        Run     Disu.     Run       Disu.    Run       Disu.
0       0.01    0.01      0.03      0.01     0.08      0.01
1       0.14    0.01      0.56      0.01     3.92      0.02
2       0.42    0.01      11.28     0.02     335.21    0.03
3       1.67    0.02      229.22    0.03     4368.43   0.05
4       9.11    0.03      3558.87   0.04     *         *

Table 1: Planning time and disutility calculation time for iterations on three problems.

For the second set of experiments, we implemented the C-Buridan algorithm. C-Buridan constructs branches for alternative contingencies in a somewhat indirect fashion. When it discovers that there are two incompatible actions, say A1 and A2, each of which can achieve some condition C, it introduces an observation action O with two contingent outcomes, and then "splits" the plan into two branches, associating each outcome with one of the actions. We used the same support functions for conditional planning (context propagation, checking context compatibility), the same ranking function, and the same base planning system. To demonstrate the advantage of directly planning for contingencies, we used a domain which contains a single PAINT operator. Painting a part repeatedly increases the probability of success: after n repetitions it is 1 − 0.5^n. It is an error to paint an already painted part; thus the plan has to contain an observe action to check whether the part has been painted, and to repeat the PAINT step only if it has not been successful previously.

Probability   C-Buridan   Mahinur
0.75          22          27
0.875         1317        106
0.9375        *           336
0.96875       *           566

Table 2: Number of nodes visited as the probability threshold is increased.

Table 2 compares C-Buridan and Mahinur as the probability threshold is increased. Mahinur expands a significantly smaller number of nodes, because the planning process concentrates on planning for contingencies, as opposed to improving the support for the (sub)goals.

Summary and Related Work

We presented an approach to decision-theoretic plan refinement that directly reasons about the value of planning for the failure of alternative contingencies. To do this, we adopted a different strategy towards conditional planning than that taken in some earlier systems, notably C-Buridan, which does not reason about whether contingent outcomes are worth planning for. The PLINTH system (Goldman & Boddy 1994) and the Weaver system (Blythe & Veloso 1997) skip certain contexts during the construction of probabilistic plans. Our approach differs from theirs in two respects. First, we focus on partial ordering, where they use a total-order approach. Second, we consider the impact of failure in addition to the probability of failure. Other relevant work includes the recent decision-theoretic planning literature, e.g., the PYRRHUS system (Williamson & Hanks 1994), which prunes the search space by using domain-specific heuristic knowledge, and the DRIPS system (Haddawy & Suwandi 1994), which uses an abstraction hierarchy. Kambhampati (1994) describes planning algorithms with multi-contributor causal links. Our algorithm maintains multiple contributors at the action-outcome level in a probabilistic setting.

Acknowledgments

This work has been supported by a scholarship from the Scientific and Technical Research Council of Turkey, by the Air Force Office of Scientific Research (Contract F49620-92-J-0422), by Rome Laboratory and the Defense Advanced Research Projects Agency (Contract F30602-93-C-0038), and by an NSF Young Investigator's Award (IRI-9258392).

References

Blythe, J., and Veloso, M. 1997. Analogical replay for efficient conditional planning. In Proceedings of the 14th National Conference on Artificial Intelligence.

Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1995. Planning under time constraints in stochastic domains. Artificial Intelligence 76:35-74.

Draper, D.; Hanks, S.; and Weld, D. 1994. Probabilistic planning with information gathering and contingent execution. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems, 31-36.

Etzioni, O.; Hanks, S.; Weld, D.; Draper, D.; Lesh, N.; and Williamson, M. 1992. An approach to planning with incomplete information. In Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 115-125.

Goldman, R. P., and Boddy, M. S. 1994. Epsilon-safe planning. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, 253-261.

Haddawy, P., and Hanks, S. 1993. Utility models for goal-directed decision-theoretic planners.
Technical Report 93-06-04, Department of Computer Science and Engineering, University of Washington.

Haddawy, P., and Suwandi, M. 1994. Decision-theoretic refinement planning using inheritance abstraction. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems, 266-271.

Kambhampati, S. 1994. Multi-contributor causal structures for planning: a formalization and evaluation. Artificial Intelligence 69:235-278.

Kushmerick, N.; Hanks, S.; and Weld, D. S. 1995. An algorithm for probabilistic planning. Artificial Intelligence 76:239-286.

Onder, N., and Pollack, M. E. 1996. Contingency selection in plan generation. In 1996 AAAI Fall Symposium on Plan Execution: Problems and Issues, 102-108.

Peot, M. A., and Smith, D. E. 1992. Conditional nonlinear planning. In Proceedings of the 1st International Conference on Artificial Intelligence Planning Systems, 189-197.

Pryor, L., and Collins, G. 1993. Cassandra: Planning for contingencies. Technical Report 41, The Institute for the Learning Sciences, Northwestern University.

Williamson, M., and Hanks, S. 1994. Optimal planning with a goal-directed utility model. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems, 176-181.