A Representation for Efficient Planning in ... Jim Biythe

From: AAAI Technical Report WS-96-07. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved. A Representation for Efficient Planning in DynamicDomains with External Events Jim Biythe School of ComputerScience Carnegie Mellon University Pittsburgh. PA15213 jblythe@cs.cmu.edu Abstract the occurrence of events at specific times and deal with concurrent events. I divide event types into twosets: those underthe control of the planningagent (actions) and those beyondthe agent’s control (external events). This division introduces a notion of controllability into the action languagewhichis crucial for reasoning about the plans of agents. I use a scoped circumscription policy as described in (Miller & Shanahan 1994) to reason about plans in the presence of external events. The planner does not explicitly use circumscription to reasonaboutplans, but it creates plans that are soundin the sense that goal satisfaction can be provenfrom the domain model,initial state and actions using ciro~mscription. In the next section I present a reviewof Miller and Shanahan’s modelfor narratives in the situation calculus. In the following section I describe modifications that modelthe language for external events used by Weaver. The semantics for the events werechosenboth for their representational powerand to allow for efficient planning algorithms, which Is achieved by makingaspects of structure explicit and by allowing convenient reasoning for backwardchaining planners. After this I describe the planner and also sketch proofs I present a languagefor specifying planningproblemsin whichthere are external events, that is, events beyondthe direct control of the plannerwhichmaytake place while a plan is being executed. The languageis based on Miller andShanahan’s representationof narrativesin the situation calculus.I describean implemented plannerandprovethat it produces correctplansfor a subsetof this language.I discuss howsomeaspectsof the languageaffect the efficiencyof the planner. Introduction Animportant direction for AI planning systems is to handle dynamic domsins -- domains where changes take place that are not the direct result of the actions of the agent, and wherethe world describes a state trajectory in the absence of any actions from the planning agent. In manyreal-world planning problems, for example, change to the world takes place due to other agents, or due to forces of nature. The planning agent mayhave limited control over these external events, and limited knowledgeabout both events that have occurred and events that maypossibly occur in the furore. The aimof this paper is twofold: to give an accountof an action modelfor dynamicdomainsthat is used in an implementedplanner, and to showhowissues of representational powerand computational efficiency jointly influence the model. A model for dynamic domains for planners should haveat least the followingthree features. First, the planner should be able to take advantageof events beyondits direct control, incorporating theminto a plan as appropriate. Second, the modelshould allow the planner to reason about a set of different possible futures, each being consistent with the agent’s knowledgeof events. Third, the planner should be able to efficiently determinethe relevant events that mayeffect candidate partial plans, and obtain informationthat may allow a modified plan to be madesafe from these events. In this paper I focus on a modelfor representing external changethat is faithful to the Weaverplanner (Blythe 1994; 1996). I discuss the extent to whichit satisfies the criteria above. The model is based on narratives in the situation calculus as described by Miller and Shanahan (Miller Shanahan1994). This language was chosen because it provides an extension to the situation calculus that can describe thatWeaver is sound withrespect to thismodel. Inthefinalsection I describe techniques for reasoning about plans within the modelthat help satisfy the third criterion of efficiently finding the relevant external events. Thefinal section contains a brief discussion. A reviewof narrativesin the situation calculus Miller and Shanahanprovide sorts for situations, action types and fluents. The term Result(a, s) represents the situation resulting from performingaction a in situation s. The formula Holds(f, s) represents that fluent f holds in situation s. For example, consider a laundry domainwhich contains two action types, Washand Hang-out, and is described by three finents, Clean, Dry and Hanging.Actions of type WashmakeClean true and Dry false, while actions of type Hang-out makeHangingtrue. This is represented by the following formulae: Holds(Clean,Result(Wash,8)) -~Holds(DryJ~esult(Wash,8)) Holds(Hanging,Result(Hang-out,8)) wheres is universally quantified over situations. In this 45 case neither action has any preconditions.Whenthere are actionpreconditionsor context-dependent effects, theywill be represented as implications. FollowingBaker (Baker 1991), Miller andShanahaninclude the frame axiom [Holds(f,Result(a,s) ) +--Holds(f,s)] +-- -,Ab(a,f, s) andfour existenceof situation axioms: Holds(And(g l g2), s)+-~ [Hol ds(g l, s) AHolds(g2, s)] Holds(Not(f), ’9 ~ ~Holds(f, s) Holds(#,Sit(g)) +-- ~Absit(g) Sit(gl) = Sit(g2) gl= g2 Facts aboutresulting states maybe provedusinga circumscription policythat miniraises Absitat a higherpriority than Ab whileallowingthe initial state SOand the Result function to vary. For examplethis can be used to prove Holds(Clean,Result(Hang-out.Result(Wash,SO))) in laundry domain. Miller andShanahan extendthe representationto describe narrativeswhichdeal with the occurrenceof actual events. Theyaddto the language a newsort for times, interpretedby the non-negativereals, a newpredicateHappens(a, r, 7-’), representingthat an actionof typea happensin the timeinterval between7- and7-’, anda newtermState(7-) denoting the situationat time7-. Aneventhistoryis thenrepresented by a conjunctionof Happenspredicates, and two newaxiomslink the State termto the eventhistoryin the intuitive way(see (Miller&Shanahan 1994)for details). Thefirst iomsaysthat the situationat timet is Result(a1, State(t1) if actiona 1 completes at timet 1 andno otheractionhappens betweentl and t: (State(t) = Result(al,State(tl))) +-- [Happens( a l, tO, tl) AtO < tl Atl < t ~3a2,t2, t3[Happens(a2, t2, t3) [al¢ a2Vtl ¢ t2]Atl _< t2At2 < tl]] Thesecondaxiomdescribesthe initial situation: State(t) = So +-- -,3al, tl, t2[Happens(al, tl, t2)Atl < t] Thecircumscription policy is amended to mlniraise Absit at a high priority andthen to minimiseAband Happens in parallel, allowingResult andState to vary. Thisintuitively representsthe assumption that no eventstake place that are not explicitly representedin the domain.If welet T stand for the conjunctionof the domaindefinition axioms(Holds predicates), the frameaxiomsandnecessaryuniqueness-ofnamesaxioms,andlet Nstand for the narrative predicates, thenthis policycanbe stated as CIRC(T A N; Absit;Ab, Happens, Result,Holds,So, State) ACIRC( T AN ; Ab;Happens, Result, Holds,So, State) WhereCIRC(T;rr, 0; ~b) denotes the parallel circumscription of 7r and0 in a theoryT with~ and~b allowedto vary. Miller and Shanahanshowthat the circumscription 46 policy they introduceentails the circumscriptionpolicy of Baker(Baker1991). whichallows useful deductionsto madeabout states of the world. Theproof of this assumes that the narrativedescriptionis a finite conjunction of atomic Happenspredicates. For example,if weadd to the laundry domainthe followingnarrativepredicates: Happens(Wash, 1, 4) Happens(Hang-out, 5, 6) wecan proveHolds’(Clean, State(8)). Overlapping actions Finally, predicatesare addedto specifythe effects of action occurrenceswhichoverlapin time. In mostcases wemight expectthat twoactionoccurrences that overlapwill not alter eachother’s effects, andthat is the default casein Miller and Shanahan’slanguage. Howeverthe predicate Cancels is includedto specifywhenthe effects of actionsthat take place concurrentlywill be altered, andan axiomis added to derive the fluents that holdafter concurrentactions are taken. This language allowsa muchricher set of interactions than is supportedin Weaver.and I describe axiomsbelow that restrict this model.Forthis reasonthe full modelis not presentedhere. Details can be foundin (Miller &Shanahan 1994). A newfunction symbolDurationmapsactions to numbers, denotingthe durationof an actiontype, whichmustbe positive. Twooperators are used to combineaction types. al; c~2 represents the action of al immediatelyfollowed by a2 and a l&a2represents the actions performedconcurrently. Twofunction symbolsrepresent waysto decompose operatorsin time. Head(a,6) represents the first ~ time units of actiontype a andTail(a, 6) representsthe remainder of a. Anysuchaction derivedfroma will be termeda "horizontalcomponent" of a, andthe predicateSlice (a 1, a) is definedto meanthat a I is a horizontalcomponent of a. Thepredicate Part(al, a) meansthat al is derivedfroma by taking somesequenceof horizontal and vertical components. Anequivalenceoperator~ is introducedto represent that twodifferent waysof writingan action are identical. Thesesymbolsare definedby the followingaxioms: Result(a l ; a2,s) =Result(a2, Result(a s) al ~ a2 ~ Result(al, s) = Result(a2, s) a ~ Head(a,d); Tail(a, d) Part(al, a2) ~ [Slice(al, a2)V3a3,a4[Slice(al, a3)Aa2~ a3&a4]] ThepredicateCancels(a2, a 1, f, s ) representsthat action a2 stops a 1 fromhavingits usual effect on fluent f when the twoactions are performed concurrentlyin situation s. Axiomsare included so that in this case any equivalent action to a2 will also cancel al, as will any horizontal component. Theadditionalaxiomfor definingeffects of actions is holds(f, Result(a,s)) +-[a ~ al&a2A Holds(f, Result(al, s))A ~3a3[Part(a3, a2) A Cancels(a3,al, f, s)]] In order to reason about narratives with overlapping events,axiomsare addedthat assert that the occurrence of an actionis equivalentto the occurrence of all its components. horizontal or vertical. Thepredicate ~ is ciro~mscribed at the highest priority, then Absit andthen Cancels,and finally Ab and Happensin parallel. Onemorepredicate. t (c~, rl, r2), is introducedto representthat action Happens c~ takes placein the time periodfromrl to r2 andno other actiontakes placein this timeperiod,definedby the following axiom: t (a, t 1, t2) Happens [Happens(a, t 1, t2)A --,3a1, t2, t 4[Happens( a 1, t3, t4) Atl < t3 At4 < t2 A~Part(al, a)]] Finallythe axi-omrelating the State andResultfunctions is alteredas follows: State(t)=Result(a 1, State(1)) +[Happenst(al,tl, t2) At _>t2A -~qa3,t3, t4[Happens(a3, t3, t4) At2 < t4 At4 < t]] Modifications to Miller and Shanahan’s Model A small numberof additions and restrictions are madeto the languageof narratives describedin (Miller &Shanahan 1994)in order to providea modelfor the Weaverplanner, andtheyare describedhere.First, a predicateis definedthat capturesthe notionof the controlthat the agenthas over its domain.Second,limits are placed on the wayconcurrent eventscan interact to simplifythe planner’stask. Third, a notionof conditionalnarrativesis introduced,to represent conditionalplans in such a waythat Baker’scircumsciptive policystill workscorrectly. External events Myaim is to describe a planner for domainsin whichexternal eventsmaytake placethat are beyondthe control and prior knowledge of the planningagent, althoughI assume the agentknowsthe typesof external eventsthat mayoccur, andtheir consequences on the world.For instance,a robotic agentmayknowthat it shares its domainwith other robotic agents and with humans.Theother agents can movearound. pick things up andleave doorsopen,but the planning agent does not knowexactly whichof the possible events might occur, or when. In order to modeldomainswith external events I make use of an extensionMiller andShanahan introduceto their languagein (Miller &Shanahan1994):the minimisation the Happens predicate in the circumscriptionpolicy is replacedby a Happens* predicate, that only capturesa subset of the events.In this case I aminterestedin whetherevents are underthe controlof the planningagent. I introducea newpredicateAgent(c~)to representthat the action type c~ is underthe control of the planning agent. Whenan action type is under the control of the planning agent, one can reasonablyassumethat the action does not take place withoutthe agent’s knowledge. This is not true 47 for events e for which~ Agent(e)is true. Thepredicate Agent(f) is also usedto denotethat the fluent f is only altered by actions c~ for whichAgent(a). In whatfollows I will refer to eventtypesa for whichAgent(a)is true as action types andthose for whichit is not true as external eventtypes. Specifically.weaddthe sentences Happens(a, t 1, t2) AAgent(a)~ Happens*(a, t 1, t2) (-~Vs[Holds(f , Result(a,s) ) ¢, Holds(f,s)]) --+ -~Agent(f) V Agent(a) and followa circumscriptionpolicy of minimisingAbsit at a high priority andthen minimising Aband Happens* in parallel, allowingHappens, Result, SOandState to vary. For example,supposethe laundry domainis augmented with a neweventtype Wind-blows, with the followingcharacterisation: Holds(Windy,Result(Windblows,s ) Holds(Dry,Result(Wind-blows, s)) +-- Holds(Hanging. andsupposethe followingcontrollability predicates are added: Agent(Wash) Agent(Hang-out) ~Agent(Wind-blows) Agent(Clean) Thengiventhe previousnarrativedescription,it is still possibleto proveHolds(Clean, State(8)) but it is no longer possibleto prove~Holds(Dry,State(8)) since the circumscription policy admitsmodelsin whichone or moreevents of type Wind-blows take place. Concurrent events and Weaver Theproofof Holds(Clean, State(8)) in the previoussection relies on minimisingthe Cancelspredicate so that the effects of Wind-blowsare assumedto be independentfrom the effects of the action types, shouldthey overlap. Interesting external eventswill not alwaysbe independent in their effects fromthe actionsavailableto the agent, andso in general somemechanism is required to makethe effects of these events precise. Weaverdoesnot currently support the explicit specificationof Cancelspredicatesin the domain,instead it requires that no twoeventsthat alter the samefluent can overlap.This is expressedas a limitation on the Happens predicate,rather thanthe Cancelspredicate, becauseit is moreconvenientto reasonabout the times of completionusing Happens.In the laundry example,there are no altered fluents in common betweenthe Wind-blows eventandanyof the actions, so no restrictions needto be added. Themotivationfor allowingmultipleexternaleventsthat maynot interact with eachother is the potential for efficiencyin planningalgorithmsrather than for representational power.In this model,the externaleventscanbe split up into smallerunits, andthe disjointnessrestriction enforcesa degreeof structure. Thefact that differentsets of events maybe combined in different states enablesa more parsimonious representation. The ways in which a planner mayexploit this structure are discussed in the next section. Balanced conditional narratives Whenexternal events are present in the domain,the planner has incomplete knowledgeabout the initial state or there are non-deterministicaction effects, a plan with conditional steps may succeed where any non-branching plan would fail. Such a plan has steps that are only executed when certain facts are true of the world at the time of executing the plan. Since plans in Weaverare translated into narratives in this model,conditional plans are represented by conditional narratives. For example: Happens(Wash, 1,2) Happens(Hang-out, 2, 3) 6-- Holds(Windy, State(2)) Happens(Spin-dry, 2, 3) 6-- ~Holds(Windy, State(2)) Here the domainof section has been extended with the action Spin-dry, underagent control, whichdries the laundry without the wind. Note that there is no modelof the domain and this narrative in whichboth Hang-outand Spin-dry take place. In order to showHolds(Dry,State(3)) the proof that circumscription of narratives entails Baker’s circumscription policy must be extendedto conditional narratives, but this is simple to do. Miller and Shanahan’sproof takes an arbitrary model M of CIRC[TA N; Ab, Happens; Result, SO, State] (1) where T is a domaindescription and N is a narrative description, or a conjunction of atomic statements of the form Happens(a, 7-1, 7-2). They show that Mmust also be modelof Baker’s system, C I RC[T; Ab; Result, SO] (2) The proof works by deriving a contradiction: since Mis a modelof T, if it is not a modelof 2, then there is a model AT/that is strictly preferred to M. From~/a modelthat is strictly preferred to Maccording to 1 can be found, which violates the original assumption. This proof can easily be generalised to allow N to contain a finite numberof conditional occurrencesof the form Happens(s,7-1, 7-2) ~ Holds(f, r0), where 7-0 _< 7-1, by observing that any model Mof (1) is a model of some unconditional narrative formed from N according to the actions which occur in the M’s interpretation of Happens. Nowthe original proof can be re-used to show that Mis also a modelof (2). Minimisationof Happens*with arbitrary conditional narratives, however,can producenon-intuitive results. This is because a minimalmodelcould include extra actions, either external events or those underthe agent’s control, in order to ensure that the preconditions of the conditional occurrences do not hold. Considera subset of the previous narrative: Happens(Wash,1, 2) Happens(Hang-out, 2, 3) 6-- Holds(Windy, State(2)) 48 From this narrative it is possible to prove -~Holds(Windy,State(2))[ This is because in a minimal model for happens*, no Wind-blows events take place, so the precondition will not be satisfied and only the Washaction takes place. Despite its appeal to the paranoid, modelsin whichexternal events dependon the agent’s conditional future plans are undesirable. For this I introducethe concept of balanced conditional narratives, whichhave the property that wheneverthe narrative includes a conditional occurrenceoftheformHappens(A,tl, t2) 6-- P where P is a combinationof statementsof the formHolds(f, State( t 1) using disjunction, conjunctionand negation, then the set of all such statementsin the narrative that refer to State(t 1) must form a partition of the possible situations that hold at that state. In other words, whateverthe combinationof fluents that hold at time t 1, there mustbe exactly one action that will be taken. The narrative above with three occurrence statements is dearly a balancedconditional narrative. For the remainderof this paper, all conditional narratives are assumedto be balanced. The restriction to balanced conditional narratives does not reduce the expressibility of the language if we include the action type Void in the domain, whichis not mentioned in the domaintheory. Thenconditional occurrences of the Void action can be addedto the narrative simply to achieve balance. Planning Having introduced the language used to modelthe class of problems for which the Weaverplanner is designed, I will describe the planning algorithm and sketch a proof that the planner is sound, i.e. that plans producedby the planner in responseto an initial state and goal description will achieve the goal fromthe initial state. Weaveris based on Prodigy4.0, with extensions to create conditional plans. In the following description of Prodigy 4.01 follow (Fink &Veloso1995). Prodigyis given an initial state description SOand a goal G, a first-order sentence in the fluents of the domain.Oncompletion,Prodigy returns a plan, a totally-ordered sequence of actions (ai[l < i < n) such that Holds(G, Result(an,Result(an- 1, . . . Result(a 1, SO) (3) At any point in its execution, the state of Prodigyis described by its current incomplete plan. An incompleteplan has two parts, a head-plan and a tall-plan, depicted in Figure 1. The head-planis a totally ordered sequenceof action types, representing the prefix of Prodigy’scurrent preferred plan. Thetall-plan is a partially orderedset of actions, built by a backward-chainingalgorithm. Prodigy begins execution with both the head-plan and tall-plan empty. At each point in its execution, Prodigy applies an execution simulation algorithm to its head-plan, resulting in a simulated state Current. If Holds(G,Current) is true. Prodigy terminates and returns the head-planas its plan. Otherwise, one of two steps is non-deterministically chosen: (1) pick an action type from the tall-plan that has <- -" head-plan - - [<- gap-~ <- - tail-plan - ~" Figure1: Representationof an incompleteplan. no childrenandmoveit to the endof the head-plan,or (2) adda childless actiontype a to the tail-plan as the child of an existing step. Theaction a mustbe suchthat Agent(a)is true. Thesesteps determineProdigy’ssearch space, along witha specificalgorithmto determine the set of actiontypes considered to addto a tall-plan in step (2). In the absenceof externalevents,the potential completeness of the searchspaceis easy to see: with a simplealgorithmin step (2) that only addsaction types to the empty tall-plan andconsidersall possibleactiontypes,all possible sequencesof actions of increasinglength can be generated in the head-plan.In practice, a back-chaining algorithmwill be usedso that the tall-plan is a goaltree andback-jumping techniqueswill further reducethe search space. For more details, see (Fink&Veloso1995). In orderto provethe soundness of the algorithm°i.e. that whena plan is returnedonecan proveequation3, it is not necessaryto knowanythingabout the back-chainingalgorithmusedto generatethe tall-plan, or the details of the searchstrategy. All that is importantis that the execution simulationapplied to the head-planmakescorrect predictions aboutthe state resulting froman actual executionof the head-plan. Executionsimulation maintainsa modelof the state by applyingthe changesprescribedby the domain modelfor the action types in the head-plan,andno others, to a modelof the initial state. Thereforea lemmaused by Kartha(Kartha1995)to provethat Baker’scircumscription policy andPednault’sADL languageare equivalentfor a class of domainscan be adaptedto showthat the plans producedby Prodigyare correct with respect to the action languagedevelopedhere, for the sameclass of domainsand in the absenceof externalevents. Thusin the absenceof external events, if a narrativeis expressedas Basedon this analysis, Weaverthen re-directs Prodigyto producea newplan, makingtwodifferent kindsof modification. First, newactionsmaybe addedto the tall-plan whose effects are chosento stop external eventsfromhavingeffects that interferewiththe plan(rather thanonlyto achieve the preconditionsof other actions). Second,the plannercan be appliedto twoor morecasesof the executionsimulation, accountingfor different occurrences of externalevents, and combiningto producea conditionalplan. Weaver’soverall algorithmis summarisedin Figure 2. Thetechnical details of choosingactions and managingback-chainingfor conditional plans are not coveredhere: they can be found in (Blythe1994)and(Blythe1996). Pickaninitial state Createa plan with Prodigy Buildanexecution graph Planok? Return plan Modifyplanningstate Figure 2: Summary of Weaver’salgorithm n A Happens(ai,t~, t~) i=1 whereti < t~ _< ti+l for 1 < i < n and wherethe plan (aill < i < n) is returned by Prodigyto achieve goal frominitial state So, thenwecan prove Vt >_t~n [Holds(G,State(Q)] Planning with external events Weavermakesuse of Prodigyto produceplans ignoring external events,then determines externaleventsrelevantto the planandconsidersthe different outcomes that canarise from the plan basedon the possibleoutcomes of external events. 49 In this sectionI sketcha proofthat the plansproduced by Weaver are correct whenthe initial state is certain andthere are no actions with nondeterministiceffects, t In order to showthis it is necessaryto describeWeaver’s plan evaluation modulein moredetail. Weaverevaluates a plan by constructing an execution graph. The execution graph is a directed graph each of whosenodesis labelled with a fluent anda time, suchthat no twonodesshare the samefluent and time. Thearcs are labelled with either the symbolpersistence or the symbol |Weaver is correctwithouttheseassumptions, butthe proofis morecomplicated. effect. Assumlng that the plan is representedby the narrative N : A Happens(ai, ti_ h ti) i=1 whereto < ta < ... < tn-1 < tn, then the possible times in the node labels are the ti. First, the plan consisting of just these actions is simulated by Prodigy. Thenthe graph is constructed as follows: 1. For eachfluent g~ in the goal statement,the node(g~, t,~) is added. 2. At each time t~ wherei > 0, for each fluent f such that there exists a node(f, ti), (a) if the value of f does not change betweenState(h_l) and State(ti), a node (f, ti-1) is added and an arc (labelled persistence) is addedbetween(f, ti-1) and (f, ti). (b) for each predicate the form Holds(f, Result(a~, s) ) 4- intheplanning domalnT, for each fluent pj mentionedin P. a node (pj,ti-1) is addedto the graph, and an arc (labelled effect is addedfrom(pj, t~-l) to (f, tl), regardless of whetherf changesfrom State(ti_ 1) to State(ti Conditionalplans are dealt with by creating an execution graph for each branch of the plan. Weaveruses the execution graph to search for potential flaws in a candidate plan by examluingeach link. ff a link [(f, ti) --+ (f, ti+l)] is labelled persistence then Weaverlooks for external events e such that Holds(~f, Result(e, s)) 6-- is a s tatement in T and P is consistent with the fluent values in the nodes for time period tp given by tp = max{t~lt~ < ti - duration(e)}. In order to showthat the event maycause the plan to fall, Weaverexamines the preconditions P to see if they are actually applicable in the simulated state, or if somechain of events can makethem so. Whileit is myultimate aim to showthat this technique is correct, and that Weaverworks for nondetermlnisticactions as well, in this paper I provea weakerresult: Lemma I Let the domainT, consisting of action types and external events, contain no non-deterministic effects and a completely specified initial state SO. Let the narrative N be such that Weaverdiscovers no persistence-threatening events in the execution graphfor goal G, then G is entailed by the circumscription policy of(l). Proof sketch: Supposethere is a model Mof(l) with N and T defined for this domainand narrative such that the goal statement G is not true in State(t ,~ ). Thenthere mustbe an earliest timet k for whichthe valueof at least onefluent in the nodes(f, t k is different from that in the simulated narrative, t~ > to since the initial state is fixed. Considerthe links to (f, tk) from nodeswith time label tk- 1. If there is no persistence link, then there is an effect link for ak correspondingto an expression Holds(f, Result(ak, s) ) +- P in such that th e 5O preconditions P are satisfied in State(tk_ 1) but the result does not hold in State(tk). This contradicts that Mis a model of T A N. Therefore there must be a persistence link (f, tk-1) ( f , t k ) suchthat .f hasthe samevaluein State( t h_l ) in Mas in a modelin whichno external events occur, but a different value in State(tk). Thus Mmust include some external event occurrenceHappens(e,t’, t) with tk-i < t < tk that changes the value of f. But then Weaverwould have found this event type in its examinationof the executiongraphfor N. ¯ The purpose of the execution graph is to focus Weaver’s attention on the external events that are relevant to the plan. If a modelMdiffers from the modelwith no external events only in events that do not affect any of the persistence links in the executiongraph, then the goal will be satisfied in M. This graphical structure allows someof the independenceof events and of action outcomesthat mayexist in the domain to be exploited for efficient planning. If Weaverdoes discover persistence-threateningevents, it is still possibleto provein someciro_~mstances that the plan satisfies the goal. For each such event, if the preconditions P of the event are consistent with the state whenthe event begins, any extra fluents from P are addedto the execution graph at the time point for this state. All event sequences maybe considered that take place prior to this time stage and maymakethe preconditions of the event be satisfied. If no sequence of events is found that makesat least on persistence-threatening event possible, the plan is correct, by an extension of the previous proof. Such a search may be prohibitively expensivein general, but Weaveris able to use the effects of the eventsto restrict its search at each time point for events that mayalter the state for relevant fluents only. Note that one of the desired features mentionedin the introduction is satisfied: since external eventsare modelled in the sameform as the actions, a planner can reason about their effects and causes in order to makeuse of them to achieve somegoal. Althoughthe planner cannot directly cause an event e to take place, it maybe possible to produce a state s in which e wouldhave a desired effect if it did take place. Sucha plan could not be provedto satisfy the goal in the languageof the previous section, but there may be a non-emptyset of modelsin which the plan satisfies the goal. Waysto makeuse of plans that sometimessatisfy their goals are mentionedin section. Plans using external events can be created in Prodigy by relaxing the constraint that Agent(a)be true for all actions a addedto the tall-plan. I note in passing that the conceptof a goal specification must be refined for planners in domainswith external events. Twopossible refinements for the notion of reaching a goal state are (1) reaching a state s such that the goal condition is true in s and all subsequent states, or (2) reaching state s wherethe goal condition is true, regardless of its value in subsequent states. Weaveruses the second, weaker definition but the argumentsabout efficiency madein this paper hold for either definition. An extended example In order to talk about planningwith external events. I introduce an example planning domain concerned with reading a book in peace in myhouse. This requires being in a different roomfrom mycat, whichdemandsattention. I allow function terms to represent fluents and actions and represent the domainas follows. First the actions Getbook, Go, Close and Catmovesare specified: Happens(Move(Diningroom),1, 2) 6-- Holds(Cat(Ifitehen), Figure 3 showsthe execution graph created by Weaverfor this plan, whichit uses to search for relevant events. Since it finds that events of type Catmovesare relevant, shownin the annotated execution graph in figure 4. lemma1 cannot be used to showthat the plan achieves the goals. The reason the plan does achieve its goals is that wheneverthese events can take place, the conditions under whichthe event changes the value of Cat(Diningroom) are not satisfied. Weaverhas addedarcs labelled stop-event to showwhythese conditions do not hold. Holds( Havebook,Result( Getbook,s Holds(In(z), Result(Go(x), s) ) +-Holds(Cat(x), Result(Catmoves(z), s)) +-Room(x) A Open(x) A In(x) -,Holds(Open(z), Result(Close(x), s)) 6-- Room(z) Efficiency issues for the planner A backward-chainingplanner, whether partial-order or otherwise, must be able to evaluate a candidate plan for some goal efficiently. Whethergoals can be satisfied transiently or mustbe satisfied permanently,the planner needsto effectively computethe effects of a plan across manybranches of a tree of future states. In this section I describe three techniquesthat can improvethe efficiency of this process. Ftrst, given a particular plan the independenceimplicit in the multiple events modelcan be used to ignore those events that cannot affect the outcomeof the plan with respect to goal satisfaction. This approacheffectively collapses manymodels of the plan’s effects by ignoringunimportant differences. Second,a planner mayattach information to outcomesof actions and events specifying whichit believes are morelikely than others. These can be combinedaccording to laws of probability or Dempster-Shaferevidence combination, for example, and can often allow a planner to have a high confidence in plan success while examininga relatively small fraction of the possible futures. Third, efficiency gains can comefrom a careful managementof the time spent searching for candidate plans versus the time spent evaluating them. It can often be the case that a candidate plan can be ruled out before even all the relevant external events have been identified. I describe the trade-offs involved in choosing whereto spend computational effort below. Next the domainobjects: Room(x) ¢:~ [z = Kitchen V z = Diningroom] Third,the initial state: ~Holds( H avebook,So) Holds(Cat(Kitchen), Holds( In (Kitchen), Fourth, information about controllable actions and action durations: Agent( Getbook) A Agent(Go(x)) A Agent(Close(x)) Duration( Getbook ) = 1 A Duration(Close) = Duration( Catmoves) = 1 A Duration(Go) and finally the goal is specified: Goal ¢~ 3sx[noom(x) A Holds( ln(x), s) A~Holds(Cat(x), s) A Holds(Havebook,s)] Exploiting independence in multiple events The number of models for a given candidate plan is exponential in the plan’s length, as the numberof external events that maytake place increases. Whileall the actions chosenby the planner are presumablyrelevant to the plan, manyof the external events that can take place will have no relevance, becausethe fluents they alter do not affect the actions in the plan or the final goal. If these events can be ignored, the size of the tree can be exponentially reduced. The newset of modelsconsidered will not uniquely specify states that arise while the plan is executed,but will capture enoughdetails to ensurethe plan is correct. The execution graph filters out someof the events that cannot affect the plan, and achieves this to a limited degree. In fact Weaverexaminesevents that might threaten a persistence link further by checkingif its preconditionsare completelysatisfied in its starting state, allowingfor other Similarly to the examplefrom the laundry domain, the two-step plan represented by the narrative Happens(Getbook, O, 1) Happens(Move(Diningroom),1, 2) does not achieve the goal in every model of the domain. This is because there exists a model that includes the event occurrence Happens(Catmoves(Dinmgroom), 1, 2). However. the three-step plan represented by Happens(Close(Diningroom),0, 1) Happens( Getbook, 1, 2) Happens(Move(Diningroom),2, 3) does solve the plan. Weaveris able to find this plan, by modifying the plan above, and also a conditional plan that can be proved to satisfy the goal. This conditional plan is Happens(Getbook, 0, 1) 51 1) 0 Close(Diningroom) 2 Getbook Move(Diningroom) 3 ! I I ln(Diningroom) nav ~book persistence I I I I I Hav~book I I I I I iningroom)persistence:Cat(Di],ingroom) persistence=Cat(Di~ingroom) Cat(Dilingroom) persistence~, Cat(D I I I I Figure3: Thefirst stage of creation of the executiongraph,before external events are found. Thegraphcontains no arcs labelledeffect becausethere are noconditionsnecessaryfor the actionsto achievethe effects in the graph. model.Thusnon-deterministicplanners that makeno use of probabilities, suchas Cassandra(Pryor&Collins1993), couldbe extendedto use the sameaction model. eventsthat mighttake place. Provingthat this morestrict filter for eventsis correctrequiresaa extensionto the proof presentedhere. I havealso investigateda meansto filter out irrelevant eventsbasedon a static analysisthat canbe made fromthe domainbefore planningbegins(Blythe 1996). Thepowerof this approachdependson the use of small, relatively independent events, whichmaytake place concurrenfly. This representationchoiceis thereforeimportantto the modelsatisfyingthe third criterion mentioned in the introduction,allowingrelevanteventsto be foundefficiently. Whilethe representationdoesn’tguaranteethis, it doesallowit. Interleaving planning and plan evaluation Using a probabilistic model of the non-deterministic outcomes In modelsof real-worlddomains,someof the possibleoutcomesof actions or events mayoften be considerablymore likely than others, and in events with manypossible outcomes,a small numberof themmayoccur a high proportion of the time. This informationcan be usedby a planner to both evaluate a plan, to withinsomeerror margin,more quickly, andto create plans morequicklyby focussingon the moreimportantscenariosin a candidateplan. Thelikelihoods of individual eventor action outcomescan be combined to providea likelihood that somepropertywill hold after executinga plan, includinggoalsatisfaction. Knowledge about independenceof events and actions can again be used to find moreefficient techniquesthan summing branchprobabilities over all possible branchesof the event tree. Eventhoughan event is relevant to the plan, it maynot be relevant to everystep, andin fact the goal structure of the planleads directly to an independence structure for the events andactions that can be used to computethe probability of goal successin a Bayesiannet (Kuslamerick,Hanks,&Weld1993). Indeed, the execution graph described in section is implementedas a Bayesian net in Weaver.Withinthe modelof events describedhere, the use of probabilitiesis viewedas a techniquefor efficient planningrather than as a fundamental propertyof the action 52 Since computationalworkis required to evaluate the effects of a plan, there is a spectrumof possible planning approachesbasedon the proportionof time spent searching for plans andthe proportionof time spentevaluatingplans. At one end of the spectrum, a planner mightattempt to create a plan by backward chaining on actions andevents, andignoringthe effects of externaleventsuntil a complete planis produced. Atthis point the relevanteventsare determined,usingthe techniquesof section, anda reducedevent tree or Bayesiannet mightbe usedto find the probability that the plansucceeds.(It will probablybe low.)I refer sucha planneras a "greedyplanner". At the other end of the spectrum,a plannermightreason aboutthe external events that mayaffect each action whenit is chosen. Since the earlier actions chosenby a backward chaining plannerappearlater in the plan, these eventstructureswouldnot be identicalto the final eventtree corresponding to the appropriatepoint in the plan since the possible states are unknown.Howeversomeelements of the tree can be determinedaheadof time, and these might allowthe plannerto pruneuselesspartial plansmuchearlier than the greedy-planner would.I refer to this planneras a "greedyevaluator". Atruly efficient plannerfor dynamic domains will operate somewhere in the middleof this spectrum.Theexact point dependson the number andthe impactof the external events in the domain°In a domainwith a large numberof events whichfrequently turn out to be irrelevant, a moregreedy planner will be appropriate. In a domainwhereextexnal events are closely coupledwith actions (for exampletheir preconditions are satisfied bythe effects of the action) and frequentlyinterfere withplans, a moregreedyevaluatorwill be appropriate. 0 , Close(Diningroom) Getbook 2 Move(Diningroom) I n(Diningroom) I 3 ’, I I = ln(Diningroom):l’ persistence Ha~~.book sto~nt ~ Cat(Dlmygroom)persistence =, Ha,)lebook Open(Di~ingroom)_ persistence Open(Di~ingroom)__ Cat(Dir~ngroom) ~ ~r~ , persistence # (" Catmoves(Dinin0room) P~stence ~ Cataaini l"~°°m ’ Calmoves(Diningreom) Cat(Dini~gr°°m)% Calmoves(Diningroom) Figure 4: Theexecutiongraph, after events have beenexamined.Theroundedlines showthe events whosepreconditions wereaddedto the graph.Theselines are not part of the graphitself. Discussion I havedescribed a modelfor actions in dynamicdomains, wherethe effects of eventsbeyondthe agent’sdirect control can be modelledas well as those of its actions. Themodel satisfies someusefulcriteria for the construction of efficient plannersin these domains.Themodelof external eventsallowsthemto be incorporatedin plans by backward chaining planners.It also allowsefficient reasoningaboutthe events relevant to a particular plan by exploiting their independencestructure. Themodelcan be describedby a modified versionof narrativesin the situation calculus,describedin (Miller & Shanahan1994), Althoughit does not makefull use of Miller and Shanahan’sdescriptions of simultaneous eventsandis less expressivethanothersthat addresssimilar issues, for example (Shanahan 1995),(Baral 1995),it is of only a few currently used in an implementedplanner. Themodelof non-deterministicaction used here is similar to that of the Buridanplanner(Kushmerick, Hanks,&Weld 1993). Thereis aneedboth for moreanalysisof modelslike these andof appropriateplanningalgorithmsandtechniques,and for moreempirical studies of such domalnsand planners. Weaveris currently a greedyplannerthat iteratively improvesits plan. Thatis, it constructsa completeplanwithout consideringextemalevents, andthen evaluatesthe plan withrespectto the externalevents.It thenattemptsto repair its plan by either addingsteps before events to makethem inapplicableor addingsteps after events to recoverform their effects. In future workI will use the Weaver architecture as a platformto investigate the tradeoff betweentime spent planningand time spent evaluatingplans in dynamic domains. 53 This short paper has ignoredmanyinteresting aspectsof planningin dyn0mic,non-deterministicdomains.In particular I haveassumedcompleteobservability and I assumed the plannercreates entire plans beforeexecutingthem.The modelof time used here is very simpleandthere has been no accountof temporalgoal satisfaction (whereconstraints are placedon the timeinterval overwhichthe goalis satisfied). However the languageserves to illustrate someissues for planners and for action languageswhichwill remain importantin languagesthat handlethese features. References Baker,A. B. 1991.Nonmonotonic reasoningin the frameworkof the situationcalculus.Artificial Intelligence49:5. Baral, C. 1995. Reasoning about actions: Nondeterministiceffects, constraints, andqualification. In Proc.14thInternationalJoint Conference on Artificial Intelligence, 2017-2024.Montrb,al, Quebec:Morgan Kaufmann. Blythe,J. 1994.Planningwith external events.In de Mantarns, R. L., andPoole, D., eds., Proc.TenthConference on Uncertainty in Artificial Intelligence, 94-101.Seatde, WA:MorganKaufmann. Blythe, J. 1996. Decompositionsof markovchain~ for reasoningabout external changein planners. In Drabble, B., ed., Proc.ThirdInternationalConference on Artificial Intelligence PlanningSystems. Universityof Edinburgh: AAAIPress. Fink, E., and Veloso,M. 1995. Formalizingthe prodigy planningalgorithm~In Ghallab,M., andMilanl,A., eds., NewDirections in AI Planning, 261-272. Assissi, Italy: IOSPress. Kartha, G.N. 1994. Two counterexamples related to baker’s approachto the t~ameproblem.Artificial Intelligence 61:379-391. Kartha, G.N. 1995. A Mathematical Investigation of ReasoningAbout Actions. Ph.D. Dissertation, University of Texasat Austin. Knoblock, C.A. 1993. Generating Abstraction Hierarchies: An Automated Approach to Reducing Search in Planning. Boston: Kluwer. Kushmerick, N.; Hanks, S.; and Weld, D. 1993. An algorithm for probabilisticplanning. Technical Report 9306-03, Department of ComputerScience and Engineering, University of Washington. Miller, R., and Shanahan, M. 1994. Narratives in the situation calculus. Journal of Logic and Computation4(5). Pryor, L., and Collins, G. 1993. Cassandra: Planning for contingencies. Technical Report 41, The Institute for the Learning Sciences. Shanahan,M. 1995. A cirolm.~criptive calculus of events. Artificial Intelligence 75(2). 54

A Representation for Efficient Planning in ... Jim Biythe

Related documents

Products

Support

A Representation for Efficient Planning in ... Jim Biythe

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib