Defeasible Planning

John L. Pollock
Department of Philosophy
University of Arizona
Tucson, Arizona 85721
(e-mail: pollock@arizona.edu)

From: AAAI Technical Report WS-98-02. Compilation copyright © 1998, AAAI (www.aaai.org). All rights reserved.

Abstract

Planning theory has traditionally made the assumption that the planner begins with all relevant knowledge for solving the problem. Planning agents operating in a complex and dynamic environment cannot make that assumption. They are simultaneously planning agents and epistemic agents, and the pursuit of knowledge is driven by the planning. In particular, the search for a plan initiates the search for threats. Because the epistemic investigation can be arbitrarily complex and may be non-terminating, this has the consequence that the planner cannot wait for the epistemic investigation to terminate before continuing the plan search. This in turn has the consequence that the planning cannot be done by a traditional algorithmic planner. It is argued that the planning must instead be done defeasibly, making the default assumption that there are no threats and then modifying plans as threats are discovered. This paper sketches how to build such a planner based upon the OSCAR defeasible reasoner. The resulting planner performs essentially the same search as UCPOP, but does it by reasoning defeasibly about plans rather than running conventional plan search algorithms. A beneficial side-effect of such defeasible planning is that updating plans in the face of changes in the agent's beliefs (often reflecting changes in the world) can be done more efficiently than by replanning.

1. Planning with Variable Knowledge

Most applications of AI planning theory assume that the planner comes to the planning problem equipped with the knowledge needed to solve the planning problem. In a complex and variable environment, that assumption can fail in three different ways. First, the beliefs of the planning agent will typically be fallible. They will be based on the best evidence currently available to it, but as more information is acquired, the agent may have to change its mind about some of its beliefs. Put another way, the agent's epistemic reasoning will be defeasible. Planning based upon such defeasibly held beliefs must also be defeasible. If a plan assumes a belief that is subsequently retracted, then the plan must be retracted as well. Second, as the world changes, the agent's beliefs about its current situation must be updated to keep track of the world. A plan predicated on the world's being a certain way must be retracted if the world changes so as to falsify the assumptions of the plan. Third, and most important for the present paper, the agent's knowledge of the world, even if accurate, will typically be incomplete. An agent cannot be assumed to know everything there is to know about a complex environment. Instead, it must be equipped with cognitive facilities enabling it to search for and acquire further knowledge as it needs it. The search for additional knowledge can take two forms. Sometimes the requisite knowledge can be obtained by reasoning from what the agent already knows. But sometimes reasoning by itself will not provide the needed knowledge. The agent may have to undertake empirical investigations. The simplest empirical investigations will consist of the agent's perceiving its surroundings. But most empirical investigations require the agent to first take actions and then perceive the results.
For example, in order to find out what time it is, the agent may first have to go into the next room and position itself before the clock, and then perceive the display on the clock face.

It is useful to follow philosophers in distinguishing between epistemic cognition and practical cognition. Epistemic cognition is cognition about what to believe, and practical cognition is cognition about what to do. In the current context, practical cognition is just planning. Planning presupposes beliefs about the agent's environment, and those beliefs are produced by epistemic cognition. The above observations can be summarized by saying that (1) epistemic cognition is defeasible, and a planning agent must be prepared to revise its plans as its defeasibly held beliefs change, and (2) in order to solve a planning problem the agent may have to acquire more information, both through reasoning and through empirical investigation.

2. Algorithmic Goal-Regression Planning

The purpose of this paper is to investigate the extent to which existing goal-regression planners can accommodate the observations of section one. I will argue that these observations require important changes to current AI planning technology. Not only must the adoption of plans be defeasible; in a sense to be explained below, the plan search itself must be done defeasibly. Typical goal-regression planners are SNLP (McAllester and Rosenblitt 1991), UCPOP (Penberthy and Weld 1992, Weld 1994), and PRODIGY (Carbonell, Knoblock, and Minton 1991). Goal-regression planning is based upon two kinds of information about the world. First, there is information about the agent's current situation, typically symbolized as a set of literals. Second, there is information to the effect that if an action is performed under certain circumstances, it will have a certain result. I will represent this using a planning-conditional, of the form "(subgoal & action) ⇒ goal". Goal-regression planning works backwards from goals to subgoals, using planning-conditionals. Given an interest in finding a plan for achieving goal and a planning-conditional "(subgoal & action) ⇒ goal", a goal-regression planner tries to find a subplan for achieving subgoal, and given such a subplan it constructs a plan for achieving goal by adding action to the end of the subplan. A crucial complication arises when a goal or subgoal is a conjunction, because the planner will not usually have planning-conditionals whose consequents are the desired conjunctions. Goal-regression planning handles this by planning separately for the individual conjuncts, and then merging the separate subplans into a single plan for the conjunction. Unfortunately, merging the separate subplans into a single plan may create destructive interference, wherein executing the steps of one subplan may have results that will prevent another subplan from working. Goal-regression planners handle this by checking for destructive interference and attempting to repair it before proposing the merged plan as a plan for the conjunctive goal. To illustrate, when UCPOP merges two plans, it checks to see whether (1) a step s1 of one plan is intended to achieve a subgoal g whose satisfaction is presupposed by a later step s2, (2) there is a step s of the other plan that prescribes an action A for which there is a planning-conditional of the form (precondition & A) ⇒ ~g, and (3) it is consistent with the ordering constraints of the merged plan to execute s between s1 and s2.
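A minimal Python sketch of this threat test follows. It is purely illustrative (the names PlanningConditional, CausalLink, and threatens are assumptions of this sketch, not UCPOP's or OSCAR's code); it represents planning-conditionals and causal links and checks the three conditions just listed.

from dataclasses import dataclass

@dataclass(frozen=True)
class PlanningConditional:
    """Represents "(precondition & action) => effect"."""
    precondition: str
    action: str
    effect: str                 # a literal; a negation is written "~g"

@dataclass(frozen=True)
class CausalLink:
    """Records that step s1 achieves subgoal g, whose truth step s2 presupposes."""
    producer: str               # s1
    subgoal: str                # g
    consumer: str               # s2

def negate(literal: str) -> str:
    return literal[1:] if literal.startswith("~") else "~" + literal

def consistently_between(step, link, orderings):
    """Condition (3): the ordering constraints do not force `step` outside (s1, s2).
    `orderings` is a collection of (earlier, later) pairs; a real planner would use
    the transitive closure, but this sketch checks only the direct constraints."""
    return (step, link.producer) not in orderings and \
           (link.consumer, step) not in orderings

def threatens(step, action, link, conditionals, orderings):
    """The three-part test described in the text: the causal link records condition (1);
    (2) some conditional (C & A) => ~g exists for the action the step prescribes; and
    (3) the step can consistently be executed between s1 and s2."""
    if not consistently_between(step, link, orderings):
        return False
    return any(c.action == action and c.effect == negate(link.subgoal)
               for c in conditionals)

On this representation, threatens returns True exactly when the merged plan contains the kind of destructive interference described above.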
When these three conditions hold, s is said to threaten the causal-link between s1 and s2, and UCPOP attempts to resolve threats either by adding ordering-constraints that preclude s's being executed between s1 and s2 (promotion or demotion) or by adding steps to the plan that will make precondition false at the appropriate time (confrontation). The merged plan is not proposed as a plan for the conjunction until either (1) it is verified that there are no threats, or (2) threats are repaired by promotion, demotion, or confrontation. Such a planner can be proven sound and complete (Penberthy and Weld 1992).

Conventional goal-regression planners are algorithmic planners, in the sense that in solving a planning problem, such a planner executes an effective computation; i.e., the set of pairs (problem, solution) that characterizes the planner is recursively enumerable. This is because each step of the plan search is dictated by the planning algorithm, the search for threats is performed by simply searching a precompiled finite list of planning-conditionals, and the repair of threats is done mechanically.

3. Problems for Algorithmic Goal-Regression Planning

In a complex and changing environment, the beliefs upon which goal-regression planning is based will be held only defeasibly. If a crucial belief is retracted, then a plan based upon it must also be retracted and an attempt made either to repair it or to replace it. I will refer to this as plan-updating. Plan-updating need not necessitate that the planning itself be done differently than by using conventional planning algorithms. If crucial beliefs change, the algorithm can be run over again with the new beliefs. On the other hand, as we will see below, there may be a more efficient way to handle plan-updating.

We have also seen that in a complex environment, an agent's knowledge will be both incomplete and potentially completable. The sense in which the agent's knowledge is potentially completable is not that the agent can ever know it all, but rather that when a crucial piece of knowledge is missing, it is always in principle possible for the agent to acquire that particular missing piece of knowledge, either through reasoning from extant beliefs or by empirical investigation. The potential completability of the agent's knowledge creates major difficulties for algorithmic goal-regression planning. The problem arises from the fact that it cannot be assumed that a planning agent operating in a complex environment has exactly the knowledge it needs to solve a planning problem. Such an agent must build its own knowledge base. The system designer can get things started by providing background knowledge, but the agent must be provided with cognitive machinery enabling its knowledge base to grow and evolve as it gains experience of its environment, senses its immediate surroundings, and reasons about the consequences of beliefs it already holds. The more complex the environment, the more the agent will have to be self-sufficient for knowledge acquisition.

I have distinguished between practical cognition and epistemic cognition. The principal function of epistemic cognition in an autonomous agent is to provide the information needed for practical cognition. As such, the course of epistemic cognition is driven by practical interests. Rather than coming to the planning problem equipped with all the knowledge required for its solution, the planning problem itself directs epistemic cognition, focusing epistemic endeavors on the pursuit of information that will be helpful in solving current planning problems.
Paramount among this information is knowledge about what will happen if certain actions are taken under certain circumstances, i.e., planning-conditionals. Sometimes the agent already knows what will happen, but often it has to figure it out. At the very least this will require reasoning from current knowledge. In many cases it will require the empirical acquisition of new knowledge that cannot be obtained just by reasoning from what is already known. To use the example given above, in order to construct a plan the planning agent may have to find out what time it is, and it may be able to do that only by examining the world in some way (e.g., it may have to go into the next room and look at the clock). In general, such empirical investigations are carried out by performing actions (not just by reasoning). Figuring out what actions to perform is a matter of engaging in further planning. The agent acquires the epistemic goal of acquiring certain information, and then plans for how to accomplish that. So planning drives epistemic investigation, which may in turn drive further planning. It follows that an essential characteristic of planning agents is that planning and epistemic cognition are interleaved. It is accordingly impossible to require of a planning agent capable of functioning in realistically complex environments that it acquire all the requisite knowledge before beginning the plan search.

Now let us apply this to the question whether a planning agent can perform its planning by implementing planning algorithms. That is only possible if destructive interference is computable, which in turn requires that the consequences of actions be computable. As we have seen, autonomous planning agents cannot rely on precompiled knowledge. They must engage in genuine reasoning about the consequences of actions, and we should not expect that reasoning to be any simpler than general epistemic reasoning. Realistically, epistemic reasoning must be defeasible, which makes the set of conclusions at best Δ₂.¹ This means that when the planning algorithm computes plans for the conjuncts of a conjunctive goal and then considers whether they can be merged without destructive interference, the reasoning required to find any particular destructive interference may take indefinitely long, and if there is no destructive interference, there will be no point at which the planner can draw the conclusion that there is none simply on the grounds that none has been found. Thus a planning algorithm that waits for such assurance before going on will bog down at this point and will never be able to produce the merged plan for the conjunctive goal.

¹ But even if we could construct an agent that did only first-order deductive reasoning, the set of conclusions is not effectively computable; it is recursively enumerable. Even for such an unrealistically oversimplified planner, destructive interference will not be computable; the set of destructive interferences will be only r.e. For a discussion of this, see Pollock (1995), chapter three.

If destructive interference is not computable, how can a planner get away with dividing conjunctive goals into separate conjuncts and planning for each conjunct separately? The key to this problem emerges from considering how human beings solve it. Humans assume defeasibly that the separate plans do not destructively interfere with one another, and so infer defeasibly that the merged plan is a good plan for the conjunctive goal. Having made this defeasible inference, human planners then look for destructive interference that would defeat it, but they do not regard it as essential to establish that there is no destructive interference before they make the inference. And if, at the time plan execution is to begin, no destructive interference has been discovered, then we humans go ahead and execute the plan despite the fact that we have not proven conclusively that there is no destructive interference.
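This strategy can be sketched in a few lines of Python (illustrative only; DefeasibleConclusion and merge_defeasibly are assumed names, not OSCAR code). The merged plan is adopted together with a recorded interest in defeaters, rather than only after a verification of no-interference that might never terminate.

from dataclasses import dataclass

@dataclass
class DefeasibleConclusion:
    claim: str
    defeater_interests: list          # discoveries that would defeat the conclusion
    defeated: bool = False

def merge_defeasibly(plan1, plan2, goal1, goal2):
    """Defeasibly conclude that the merged plan achieves (goal1 & goal2).
    No attempt is made to prove the absence of destructive interference;
    instead an interest in finding such interference is recorded."""
    merged = plan1 + plan2            # plans represented here simply as lists of steps
    conclusion = DefeasibleConclusion(
        claim=f"merged plan achieves ({goal1} & {goal2})",
        defeater_interests=["destructive interference among the merged steps"])
    return merged, conclusion

def ready_to_execute(conclusion):
    """At execution time, act on the plan provided no defeater has been found."""
    return not conclusion.defeated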
One may be tempted to suppose that human beings are making an unreasonable leap of faith here, and that a more rational agent would postpone plan execution until it has been established that there is no destructive interference. However, the logic of the epistemic search for destructive interference makes that logically impossible. Given a logically complex knowledge base, there will not, in general, be a point at which an agent can conclude with certainty that there is no destructive interference within a plan, so an agent that required such certainty would be unable to plan for conjunctive goals.

The upshot of this is that a rational agent operating in a realistically complex environment must make defeasible assumptions in the course of its planning, and then be prepared to change its planning decisions later if subsequent epistemic reasoning defeats those defeasible assumptions. In other words, the reasoning involved in planning must be a species of defeasible reasoning. Planning in autonomous agents cannot be done algorithmically. The general way goal-regression planning must work is by using planning-conditionals to reason backwards from goals to subgoals, splitting conjunctive goals into their conjuncts and planning for them separately, and then merging the plans for the individual conjuncts into a combined plan for the conjunctive goal. The planning agent will infer defeasibly that the merged plan is a solution to the planning problem. A defeater for this defeasible inference consists of discovering that the plan contains destructive interference. Whenever a defeasible reasoner makes a defeasible inference, it must adopt interest in finding defeaters, so in this case the agent will adopt interest in finding destructive interference.² Finding such interference should lead the agent to try various ways of repairing the plan to eliminate the interference, and then lead to a defeasible inference that the repaired plan is a solution to the planning problem.

² For more about the dynamics of defeasible reasoning, see chapter four of Pollock (1995).

4. Rules for Defeasible Planning

The OSCAR system of defeasible reasoning³ was constructed with epistemic reasoning in mind, but by providing it with appropriate inference-schemes, it can also implement the kind of defeasible planning described above. The simplest way to do this is to regard a solution to a planning problem as an epistemic conclusion to the effect that a certain plan will achieve the goal. Defeasible planning is then defeasible reasoning in support of this epistemic conclusion. In this section I will give a brief description of how this reasoning can be performed.

³ The theory behind the OSCAR defeasible reasoner is presented in Pollock (1995). The details of the implementation are described in The OSCAR Manual, which can be downloaded from http://www.u.arizona.edu/~pollock/.

The simplest case of means-end reasoning is the degenerate case in which the goal to be achieved is already true, and hence nothing needs to be done to achieve it. A null-plan for the goal goal is a plan with no plan-steps.
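Before stating the inference-schemes, it may help to fix a concrete plan representation. The following Python sketch is purely illustrative (the Plan and PlanStep classes are assumptions of this sketch, not OSCAR's data structures); it also anticipates the plan1 + plan2 operation used below.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class PlanStep:
    ident: int
    action: str                                      # the action the step prescribes

@dataclass
class Plan:
    steps: list = field(default_factory=list)        # [PlanStep, ...]
    orderings: list = field(default_factory=list)    # [(earlier_id, later_id), ...]
    links: list = field(default_factory=list)        # [(producer_id, subgoal, consumer_id), ...]
    goal: str = ""

    def __add__(self, other):
        """plan1 + plan2: combine the plan-steps and ordering-constraints of each."""
        return Plan(self.steps + other.steps,
                    self.orderings + other.orderings,
                    self.links + other.links,
                    f"({self.goal} & {other.goal})")

def null_plan(goal):
    """A null-plan for `goal`: a plan with no plan-steps."""
    return Plan(goal=goal)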
The degenerate case of means-end reasoning can then be regarded as proceeding in accordance with the following inference-scheme:

PROPOSE-NULL-PLAN
Given an interest in finding a plan for achieving goal, if goal is already true, take that as a conclusive (nondefeasible) reason for concluding that a null-plan will achieve goal.

The core of goal-regression planning consists of using planning-conditionals to reason backwards from goals to subgoals. This can be formulated as follows:

GOAL-REGRESSION
Given an interest in finding a plan for achieving G, adopt interest in finding planning-conditionals (C & A) ⇒ G having G as their consequent. Given such a conditional, adopt an interest in finding a plan for achieving C. If a plan subplan is proposed for achieving C, construct a plan by (1) adding a new step to the end of subplan, where the new step prescribes the action A, and (2) ordering the new step after all steps of subplan. Infer nondefeasibly that the new plan will achieve G.

Given two plans plan1 and plan2, let plan1 + plan2 be the plan that results from combining the plan-steps and ordering-constraints of each. We can plan for conjunctive goals by using the following defeasible inference-scheme:

SPLIT-CONJUNCTIVE-GOAL
Given an interest in finding a plan for achieving a conjunctive goal (G1 & G2), adopt interest in finding plans plan1 for G1 and plan2 for G2. If such plans are proposed, infer defeasibly that plan1 + plan2 will achieve (G1 & G2).

SPLIT-CONJUNCTIVE-GOAL is a rule of defeasible reasoning. Any such inference-scheme must be supplemented with an account of what the defeaters are for reasoning in accordance with it. We can follow the lead of UCPOP and adopt the following defeater:

FIND-THREATS
Given an inference in accordance with SPLIT-CONJUNCTIVE-GOAL to the conclusion that plan& will achieve (G1 & G2), for each plan-step s2 of plan&, if s2 assumes that some earlier step s1 achieves a subgoal g,⁴ then for each plan-step s of plan&, if it is consistent with the ordering-constraints of plan& that s occur between s1 and s2, and the action prescribed by s is A, adopt interest in finding a planning-conditional of the form (C & A) ⇒ ~g. Take the inference to the conclusion that plan& will achieve (G1 & G2) to be defeated by the discovery of such a planning-conditional.

⁴ Such assumptions are recorded in the standard way, using causal-links.

The resolution of threats can be handled by a pair of defeasible inference-schemes:

ADD-ORDERING-CONSTRAINT
Given an interest in finding a plan for achieving a conjunctive goal (g1 & g2), and plans plan1 for g1 and plan2 for g2, if plan& is a putative plan for (g1 & g2) constructed by merging plans plan1 and plan2 (and possibly other plans), but a plan-step s of plan& threatens a causal-link between steps s1 and s2 of plan&, construct a plan plan+ by adding the ordering-constraint that s not occur between s1 and s2 (if this can be done consistently) and infer defeasibly that plan+ will achieve (g1 & g2).
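Before turning to the second scheme of the pair, here is an illustrative Python sketch of ADD-ORDERING-CONSTRAINT (again with hypothetical names, building on the Plan class sketched above rather than on OSCAR's implementation). Demotion corresponds to ordering the threatening step before s1, promotion to ordering it after s2.

def consistent(orderings):
    """Crude acyclicity test over ordering constraints (pairs of step ids):
    repeatedly discard steps with no predecessors; leftovers indicate a cycle."""
    pairs = set(orderings)
    nodes = {x for pair in pairs for x in pair}
    changed = True
    while changed and nodes:
        changed = False
        for n in list(nodes):
            if not any(later == n for (_earlier, later) in pairs):
                pairs = {p for p in pairs if n not in p}
                nodes.discard(n)
                changed = True
    return not nodes

def add_ordering_constraint(plan, threat_step, link):
    """ADD-ORDERING-CONSTRAINT, sketched: constrain threat_step not to occur
    between the link's producer s1 and consumer s2, trying demotion
    (threat_step before s1) and then promotion (threat_step after s2)."""
    s1, _subgoal, s2 = link
    for new_pair in [(threat_step, s1), (s2, threat_step)]:
        new_orderings = plan.orderings + [new_pair]
        if consistent(new_orderings):
            # Infer defeasibly that the repaired plan achieves the conjunctive goal;
            # FIND-THREATS applies to this new inference as well.
            return Plan(plan.steps, new_orderings, plan.links, plan.goal)
    return None                       # neither ordering can be added consistently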
CONFRONTATION
Given an interest in finding a plan for achieving a conjunctive goal (g1 & g2), and plans plan1 for g1 and plan2 for g2, if plan& is a putative plan for (g1 & g2) constructed by merging plans plan1 and plan2 (and possibly other plans), but a plan-step s of plan& threatens a causal-link between steps s1 and s2 of plan& by way of a planning-conditional (C & A) ⇒ ~g, adopt interest in finding a plan for achieving ~C (or, if C is a conjunction, for each conjunct of C, adopt interest in finding a plan for achieving its negation). If a plan repair-plan is proposed for achieving ~C or the negation of one of its conjuncts, construct a new plan plan+ by merging repair-plan with plan&, and ordering the final step of repair-plan before the step that dictates A and threatens the causal-link. If this ordering can be done consistently, infer defeasibly that plan+ will achieve (g1 & g2).

The repaired plan produced by ADD-ORDERING-CONSTRAINT or CONFRONTATION resolves just one threat. Other threats may remain, so these two inference-schemes must also be regarded as defeated by finding threats. This requires us to generalize FIND-THREATS to apply to inferences made in accordance with these inference-schemes as well as SPLIT-CONJUNCTIVE-GOAL. ADD-ORDERING-CONSTRAINT and CONFRONTATION will automatically lead to attempts to resolve such additional threats.

An experimental planner that implements reasoning of the general sort just described has been constructed within the OSCAR defeasible reasoner, and can be downloaded from my website.⁵

⁵ The URL is http://www.u.arizona.edu/~pollock/. Two technical reports describing the planner in further detail can also be downloaded. "Reasoning defeasibly about plans" describes the construction of the planner, and "The logical foundations of goal-regression planning" formulates a general theory of goal-regression planning.

The need for defeasible planning is driven by the fact that a planning agent's knowledge of the consequences of actions can be expected to be incomplete but potentially completable. A second source of defeasibility lies in the fact that the beliefs upon which planning is based are themselves held only defeasibly, and if they are retracted the plan must be retracted as well. Plan-updating is the process of repairing or replacing plans in the face of such belief-updating. One way of updating plans is to perform the planning all over again with the new beliefs. However, that may be quite inefficient, because large parts of the planning process may be unchanged, and it would be desirable to avoid repeating them. This inefficiency is automatically avoided when plans are constructed, as above, by reasoning about them defeasibly. This reasoning is just more epistemic reasoning, and as such is integrated with the same reasoning that leads to the retraction of beliefs. When beliefs used in the planning are retracted (defeated), the specific parts of the plan-reasoning that depend upon those beliefs will also be defeated. But other parts of the plan-reasoning will not be defeated, and so the conclusions drawn in that part of the reasoning can be used without further ado in the process of trying to repair the defeated plan. The reasoning need not be done over again.

5. Evaluating a Defeasible Planner

An algorithmic planner is evaluated by asking whether it is sound and complete. It is sound if every solution it proposes is a correct solution, and it is complete if it finds a solution whenever one exists. But how can we evaluate a defeasible planner? It will inevitably find unsound solutions. Hopefully, it will retract them later.
A distinction can be made between the conclusions that a defeasible reasoner is justified in holding, at any given stage of its reasoning, and the warranted conclusions that it will be justified in holding at the limit, when all possible relevant reasoning has been performed.⁶ What we want of a defeasible planner is that it will eventually draw warranted conclusions that constitute solutions to planning problems. Let us call the plans endorsed by warranted conclusions warranted plans. We might require that warranted plans are always solutions, and that whenever there is a solution there will be a correct warranted solution. However, as a criterion of adequacy for the planning reasoning this is too strong. The reasoner might be warranted in taking an unsound plan to be a solution simply because the reasoner is unable to draw the conclusion that some relevant fact about the world is a fact, or that some relevant consequence of an action is a consequence of that action. For the same reason it may be unable to find some correct solution. We can usefully separate the plan-reasoning from the reasoning aimed at finding factual knowledge of use in the planning. The reason-schemes used in planning may be beyond reproach, but the reasoner may still find incorrect plans and fail to find correct ones because its factual reasoning is inadequate. This separation can be achieved by simply giving the reasoner all the factual knowledge (including planning-conditionals) that is relevant to solving the problem, and then asking whether under those circumstances all its warranted plans are correct and whether it is always able to find a warranted solution when there is a solution. Let us understand soundness and completeness for defeasible planners in this way.

⁶ This is made more precise in chapter three of Pollock (1995).

Giving the reasoner all the relevant factual knowledge has the effect of turning the defeasible planner into an algorithmic planner. Search for defeaters for any particular inference will terminate after a single step, so for each plan there will be a determinate point at which there is no more relevant reasoning to be done. We can take the planner to "return" the plan iff at that point it is justified in concluding that the plan is a solution to the planning problem. Because all the relevant reasoning has been done, the plan will be warranted iff the reasoner is justified in drawing that conclusion at that point. Looking at the OSCAR planner in this way, it is equivalent to UCPOP, and sound and complete for the same class of planning problems, with the exception that it does not handle universally and existentially quantified goals.

6. Conclusions

A planning agent operating in a complex environment cannot be provided from the outset with exactly the information it needs to solve, without further reasoning, all the planning problems it may encounter. It may have to engage in arbitrarily complex epistemic reasoning, both about what is true in the start-state and about what the consequences of actions may be under various circumstances. This has the result that the set of threats is not recursive, and that has the consequence that planning cannot proceed algorithmically.
Instead, a planning agent must assume defeasibly that there are no unresolved threats until one is found. An implementation of such a defeasible planning agent was described.

Acknowledgments

This work was supported by NSF grant no. IRI-9634106.

References

Carbonell, J. G.; Knoblock, C. A.; and Minton, S. 1991. PRODIGY: An integrated architecture for planning and learning. In K. VanLehn (ed.), Architectures for Intelligence, 241-278. Hillsdale, NJ: Lawrence Erlbaum Associates.

McAllester, D., and Rosenblitt, D. 1991. Systematic nonlinear planning. In Proceedings of AAAI-91, 634-639.

Penberthy, J. S., and Weld, D. 1992. UCPOP: a sound, complete, partial order planner for ADL. In Proceedings of the 3rd International Conference on Principles of Knowledge Representation and Reasoning, 103-114.

Pollock, J. 1995. Cognitive Carpentry. Cambridge, MA: MIT Press.

Weld, D. 1994. An introduction to least commitment planning. AI Magazine 15: 27-62.