A Representation for Efficient Planning in ... Jim Biythe

From: AAAI Technical Report WS-96-07. Compilation copyright © 1996, AAAI (www.aaai.org). All rights reserved.
A Representation for Efficient
Planning in DynamicDomains with External Events
Jim Biythe
School of ComputerScience
Carnegie Mellon University
Pittsburgh. PA15213
jblythe@cs.cmu.edu
Abstract
the occurrence of events at specific times and deal with
concurrent events.
I divide event types into twosets: those underthe control
of the planningagent (actions) and those beyondthe agent’s
control (external events). This division introduces a notion
of controllability into the action languagewhichis crucial
for reasoning about the plans of agents. I use a scoped
circumscription policy as described in (Miller & Shanahan
1994) to reason about plans in the presence of external
events. The planner does not explicitly use circumscription
to reasonaboutplans, but it creates plans that are soundin the
sense that goal satisfaction can be provenfrom the domain
model,initial state and actions using ciro~mscription.
In the next section I present a reviewof Miller and Shanahan’s modelfor narratives in the situation calculus. In the
following section I describe modifications that modelthe
language for external events used by Weaver. The semantics for the events werechosenboth for their representational
powerand to allow for efficient planning algorithms, which
Is achieved by makingaspects of structure explicit and by
allowing convenient reasoning for backwardchaining planners. After this I describe the planner and also sketch proofs
I present a languagefor specifying planningproblemsin
whichthere are external events, that is, events beyondthe
direct control of the plannerwhichmaytake place while a
plan is being executed. The languageis based on Miller
andShanahan’s
representationof narrativesin the situation
calculus.I describean implemented
plannerandprovethat it
produces
correctplansfor a subsetof this language.I discuss
howsomeaspectsof the languageaffect the efficiencyof the
planner.
Introduction
Animportant direction for AI planning systems is to handle
dynamic domsins -- domains where changes take place
that are not the direct result of the actions of the agent, and
wherethe world describes a state trajectory in the absence
of any actions from the planning agent. In manyreal-world
planning problems, for example, change to the world takes
place due to other agents, or due to forces of nature. The
planning agent mayhave limited control over these external
events, and limited knowledgeabout both events that have
occurred and events that maypossibly occur in the furore.
The aimof this paper is twofold: to give an accountof an
action modelfor dynamicdomainsthat is used in an implementedplanner, and to showhowissues of representational
powerand computational efficiency jointly influence the
model. A model for dynamic domains for planners should
haveat least the followingthree features. First, the planner
should be able to take advantageof events beyondits direct
control, incorporating theminto a plan as appropriate. Second, the modelshould allow the planner to reason about a set
of different possible futures, each being consistent with the
agent’s knowledgeof events. Third, the planner should be
able to efficiently determinethe relevant events that mayeffect candidate partial plans, and obtain informationthat may
allow a modified plan to be madesafe from these events.
In this paper I focus on a modelfor representing external
changethat is faithful to the Weaverplanner (Blythe 1994;
1996). I discuss the extent to whichit satisfies the criteria
above. The model is based on narratives in the situation
calculus as described by Miller and Shanahan (Miller
Shanahan1994). This language was chosen because it provides an extension to the situation calculus that can describe
thatWeaver
is sound
withrespect
to thismodel.
Inthefinalsection I describe techniques for reasoning about plans
within the modelthat help satisfy the third criterion of efficiently finding the relevant external events. Thefinal section
contains a brief discussion.
A reviewof narrativesin the situation calculus
Miller and Shanahanprovide sorts for situations, action
types and fluents. The term Result(a, s) represents the situation resulting from performingaction a in situation s.
The formula Holds(f, s) represents that fluent f holds in
situation s. For example, consider a laundry domainwhich
contains two action types, Washand Hang-out, and is described by three finents, Clean, Dry and Hanging.Actions
of type WashmakeClean true and Dry false, while actions
of type Hang-out makeHangingtrue. This is represented
by the following formulae:
Holds(Clean,Result(Wash,8))
-~Holds(DryJ~esult(Wash,8))
Holds(Hanging,Result(Hang-out,8))
wheres is universally quantified over situations. In this
45
case neither action has any preconditions.Whenthere are
actionpreconditionsor context-dependent
effects, theywill
be represented as implications. FollowingBaker (Baker
1991), Miller andShanahaninclude the frame axiom
[Holds(f,Result(a,s) ) +--Holds(f,s)] +-- -,Ab(a,f, s)
andfour existenceof situation axioms:
Holds(And(g l g2), s)+-~
[Hol
ds(g l, s) AHolds(g2, s)]
Holds(Not(f), ’9 ~ ~Holds(f, s)
Holds(#,Sit(g)) +-- ~Absit(g)
Sit(gl) = Sit(g2) gl= g2
Facts aboutresulting states maybe provedusinga circumscription
policythat miniraises Absitat a higherpriority
than Ab whileallowingthe initial state SOand the Result
function to vary. For examplethis can be used to prove
Holds(Clean,Result(Hang-out.Result(Wash,SO))) in
laundry domain.
Miller andShanahan
extendthe representationto describe
narrativeswhichdeal with the occurrenceof actual events.
Theyaddto the language
a newsort for times, interpretedby
the non-negativereals, a newpredicateHappens(a,
r, 7-’),
representingthat an actionof typea happensin the timeinterval between7- and7-’, anda newtermState(7-) denoting
the situationat time7-. Aneventhistoryis thenrepresented
by a conjunctionof Happenspredicates, and two newaxiomslink the State termto the eventhistoryin the intuitive
way(see (Miller&Shanahan
1994)for details). Thefirst
iomsaysthat the situationat timet is Result(a1, State(t1)
if actiona 1 completes
at timet 1 andno otheractionhappens
betweentl and t:
(State(t) = Result(al,State(tl)))
+-- [Happens(
a l, tO, tl) AtO < tl Atl < t
~3a2,t2, t3[Happens(a2,
t2, t3)
[al¢ a2Vtl ¢ t2]Atl _< t2At2 < tl]]
Thesecondaxiomdescribesthe initial situation:
State(t) = So +-- -,3al, tl, t2[Happens(al,
tl, t2)Atl < t]
Thecircumscription
policy is amended
to mlniraise Absit
at a high priority andthen to minimiseAband Happens
in
parallel, allowingResult andState to vary. Thisintuitively
representsthe assumption
that no eventstake place that are
not explicitly representedin the domain.If welet T stand
for the conjunctionof the domaindefinition axioms(Holds
predicates), the frameaxiomsandnecessaryuniqueness-ofnamesaxioms,andlet Nstand for the narrative predicates,
thenthis policycanbe stated as
CIRC(T
A N; Absit;Ab, Happens,
Result,Holds,So, State)
ACIRC(
T AN ; Ab;Happens,
Result, Holds,So, State)
WhereCIRC(T;rr, 0; ~b) denotes the parallel circumscription of 7r and0 in a theoryT with~ and~b allowedto
vary. Miller and Shanahanshowthat the circumscription
46
policy they introduceentails the circumscriptionpolicy of
Baker(Baker1991). whichallows useful deductionsto
madeabout states of the world. Theproof of this assumes
that the narrativedescriptionis a finite conjunction
of atomic
Happenspredicates. For example,if weadd to the laundry
domainthe followingnarrativepredicates:
Happens(Wash,
1, 4)
Happens(Hang-out,
5, 6)
wecan proveHolds’(Clean,
State(8)).
Overlapping actions
Finally, predicatesare addedto specifythe effects of action
occurrenceswhichoverlapin time. In mostcases wemight
expectthat twoactionoccurrences
that overlapwill not alter
eachother’s effects, andthat is the default casein Miller
and Shanahan’slanguage. Howeverthe predicate Cancels
is includedto specifywhenthe effects of actionsthat take
place concurrentlywill be altered, andan axiomis added
to derive the fluents that holdafter concurrentactions are
taken. This language
allowsa muchricher set of interactions
than is supportedin Weaver.and I describe axiomsbelow
that restrict this model.Forthis reasonthe full modelis not
presentedhere. Details can be foundin (Miller &Shanahan
1994).
A newfunction symbolDurationmapsactions to numbers, denotingthe durationof an actiontype, whichmustbe
positive. Twooperators are used to combineaction types.
al; c~2 represents the action of al immediatelyfollowed
by a2 and a l&a2represents the actions performedconcurrently. Twofunction symbolsrepresent waysto decompose
operatorsin time. Head(a,6) represents the first ~ time
units of actiontype a andTail(a, 6) representsthe remainder of a. Anysuchaction derivedfroma will be termeda
"horizontalcomponent"
of a, andthe predicateSlice (a 1, a)
is definedto meanthat a I is a horizontalcomponent
of a.
Thepredicate Part(al, a) meansthat al is derivedfroma
by taking somesequenceof horizontal and vertical components. Anequivalenceoperator~ is introducedto represent
that twodifferent waysof writingan action are identical.
Thesesymbolsare definedby the followingaxioms:
Result(a l ; a2,s) =Result(a2,
Result(a s)
al ~ a2 ~ Result(al, s) = Result(a2, s)
a ~ Head(a,d); Tail(a, d)
Part(al, a2) ~ [Slice(al, a2)V3a3,a4[Slice(al, a3)Aa2~ a3&a4]]
ThepredicateCancels(a2, a 1, f, s ) representsthat action
a2 stops a 1 fromhavingits usual effect on fluent f when
the twoactions are performed
concurrentlyin situation s.
Axiomsare included so that in this case any equivalent
action to a2 will also cancel al, as will any horizontal
component.
Theadditionalaxiomfor definingeffects of actions is
holds(f, Result(a,s)) +-[a ~ al&a2A Holds(f, Result(al, s))A
~3a3[Part(a3,
a2) A Cancels(a3,al, f, s)]]
In order to reason about narratives with overlapping
events,axiomsare addedthat assert that the occurrence
of an
actionis equivalentto the occurrence
of all its components.
horizontal or vertical. Thepredicate ~ is ciro~mscribed
at the highest priority, then Absit andthen Cancels,and
finally Ab and Happensin parallel. Onemorepredicate.
t (c~, rl, r2), is introducedto representthat action
Happens
c~ takes placein the time periodfromrl to r2 andno other
actiontakes placein this timeperiod,definedby the following axiom:
t (a, t 1, t2)
Happens
[Happens(a,
t 1, t2)A
--,3a1, t2, t 4[Happens(
a 1, t3, t4)
Atl < t3 At4 < t2 A~Part(al, a)]]
Finallythe axi-omrelating the State andResultfunctions
is alteredas follows:
State(t)=Result(a 1, State(1)) +[Happenst(al,tl, t2) At _>t2A
-~qa3,t3, t4[Happens(a3,
t3, t4) At2 < t4 At4 < t]]
Modifications to Miller and Shanahan’s
Model
A small numberof additions and restrictions are madeto
the languageof narratives describedin (Miller &Shanahan
1994)in order to providea modelfor the Weaverplanner,
andtheyare describedhere.First, a predicateis definedthat
capturesthe notionof the controlthat the agenthas over its
domain.Second,limits are placed on the wayconcurrent
eventscan interact to simplifythe planner’stask. Third, a
notionof conditionalnarrativesis introduced,to represent
conditionalplans in such a waythat Baker’scircumsciptive
policystill workscorrectly.
External events
Myaim is to describe a planner for domainsin whichexternal eventsmaytake placethat are beyondthe control and
prior knowledge
of the planningagent, althoughI assume
the agentknowsthe typesof external eventsthat mayoccur,
andtheir consequences
on the world.For instance,a robotic
agentmayknowthat it shares its domainwith other robotic
agents and with humans.Theother agents can movearound.
pick things up andleave doorsopen,but the planning agent
does not knowexactly whichof the possible events might
occur, or when.
In order to modeldomainswith external events I make
use of an extensionMiller andShanahan
introduceto their
languagein (Miller &Shanahan1994):the minimisation
the Happens
predicate in the circumscriptionpolicy is replacedby a Happens*
predicate, that only capturesa subset
of the events.In this case I aminterestedin whetherevents
are underthe controlof the planningagent.
I introducea newpredicateAgent(c~)to representthat the
action type c~ is underthe control of the planning agent.
Whenan action type is under the control of the planning
agent, one can reasonablyassumethat the action does not
take place withoutthe agent’s knowledge.
This is not true
47
for events e for which~ Agent(e)is true. Thepredicate
Agent(f) is also usedto denotethat the fluent f is only
altered by actions c~ for whichAgent(a). In whatfollows
I will refer to eventtypesa for whichAgent(a)is true as
action types andthose for whichit is not true as external
eventtypes.
Specifically.weaddthe sentences
Happens(a,
t 1, t2) AAgent(a)~ Happens*(a,
t 1, t2)
(-~Vs[Holds(f
, Result(a,s) ) ¢, Holds(f,s)]) --+
-~Agent(f) V Agent(a)
and followa circumscriptionpolicy of minimisingAbsit
at a high priority andthen minimising Aband Happens*
in
parallel, allowingHappens,
Result, SOandState to vary.
For example,supposethe laundry domainis augmented
with a neweventtype Wind-blows,
with the followingcharacterisation:
Holds(Windy,Result(Windblows,s )
Holds(Dry,Result(Wind-blows,
s)) +-- Holds(Hanging.
andsupposethe followingcontrollability predicates are
added:
Agent(Wash)
Agent(Hang-out)
~Agent(Wind-blows)
Agent(Clean)
Thengiventhe previousnarrativedescription,it is still
possibleto proveHolds(Clean,
State(8)) but it is no longer
possibleto prove~Holds(Dry,State(8)) since the circumscription policy admitsmodelsin whichone or moreevents
of type Wind-blows
take place.
Concurrent events and Weaver
Theproofof Holds(Clean,
State(8)) in the previoussection
relies on minimisingthe Cancelspredicate so that the effects of Wind-blowsare assumedto be independentfrom
the effects of the action types, shouldthey overlap. Interesting external eventswill not alwaysbe independent
in
their effects fromthe actionsavailableto the agent, andso
in general somemechanism
is required to makethe effects
of these events precise. Weaverdoesnot currently support
the explicit specificationof Cancelspredicatesin the domain,instead it requires that no twoeventsthat alter the
samefluent can overlap.This is expressedas a limitation
on the Happens
predicate,rather thanthe Cancelspredicate,
becauseit is moreconvenientto reasonabout the times of
completionusing Happens.In the laundry example,there
are no altered fluents in common
betweenthe Wind-blows
eventandanyof the actions, so no restrictions needto be
added.
Themotivationfor allowingmultipleexternaleventsthat
maynot interact with eachother is the potential for efficiencyin planningalgorithmsrather than for representational power.In this model,the externaleventscanbe split
up into smallerunits, andthe disjointnessrestriction enforcesa degreeof structure. Thefact that differentsets of
events maybe combined
in different states enablesa more
parsimonious representation. The ways in which a planner
mayexploit this structure are discussed in the next section.
Balanced conditional narratives
Whenexternal events are present in the domain,the planner
has incomplete knowledgeabout the initial state or there
are non-deterministicaction effects, a plan with conditional
steps may succeed where any non-branching plan would
fail. Such a plan has steps that are only executed when
certain facts are true of the world at the time of executing
the plan.
Since plans in Weaverare translated into narratives in
this model,conditional plans are represented by conditional
narratives. For example:
Happens(Wash, 1,2)
Happens(Hang-out, 2, 3) 6-- Holds(Windy, State(2))
Happens(Spin-dry, 2, 3) 6-- ~Holds(Windy, State(2))
Here the domainof section has been extended with the
action Spin-dry, underagent control, whichdries the laundry
without the wind. Note that there is no modelof the domain
and this narrative in whichboth Hang-outand Spin-dry take
place.
In order to showHolds(Dry,State(3)) the proof that circumscription of narratives entails Baker’s circumscription
policy must be extendedto conditional narratives, but this is
simple to do. Miller and Shanahan’sproof takes an arbitrary
model M of
CIRC[TA N; Ab, Happens; Result, SO, State] (1)
where T is a domaindescription and N is a narrative description, or a conjunction of atomic statements of the form
Happens(a, 7-1, 7-2). They show that Mmust also be
modelof Baker’s system,
C I RC[T; Ab; Result, SO]
(2)
The proof works by deriving a contradiction: since Mis a
modelof T, if it is not a modelof 2, then there is a model
AT/that is strictly preferred to M. From~/a modelthat is
strictly preferred to Maccording to 1 can be found, which
violates the original assumption.
This proof can easily be generalised to allow N to contain a finite numberof conditional occurrencesof the form
Happens(s,7-1, 7-2) ~ Holds(f, r0), where 7-0 _< 7-1, by
observing that any model Mof (1) is a model of some
unconditional narrative formed from N according to the
actions which occur in the M’s interpretation of Happens.
Nowthe original proof can be re-used to show that Mis
also a modelof (2).
Minimisationof Happens*with arbitrary conditional narratives, however,can producenon-intuitive results. This is
because a minimalmodelcould include extra actions, either
external events or those underthe agent’s control, in order to
ensure that the preconditions of the conditional occurrences
do not hold. Considera subset of the previous narrative:
Happens(Wash,1, 2)
Happens(Hang-out, 2, 3) 6-- Holds(Windy, State(2))
48
From
this
narrative
it
is
possible
to prove -~Holds(Windy,State(2))[ This is because in a
minimal model for happens*, no Wind-blows events take
place, so the precondition will not be satisfied and only the
Washaction takes place. Despite its appeal to the paranoid,
modelsin whichexternal events dependon the agent’s conditional future plans are undesirable. For this I introducethe
concept of balanced conditional narratives, whichhave the
property that wheneverthe narrative includes a conditional
occurrenceoftheformHappens(A,tl, t2) 6-- P where P is
a combinationof statementsof the formHolds(f, State( t 1)
using disjunction, conjunctionand negation, then the set of
all such statementsin the narrative that refer to State(t 1)
must form a partition of the possible situations that hold
at that state. In other words, whateverthe combinationof
fluents that hold at time t 1, there mustbe exactly one action
that will be taken. The narrative above with three occurrence statements is dearly a balancedconditional narrative.
For the remainderof this paper, all conditional narratives
are assumedto be balanced.
The restriction to balanced conditional narratives does
not reduce the expressibility of the language if we include
the action type Void in the domain, whichis not mentioned
in the domaintheory. Thenconditional occurrences of the
Void action can be addedto the narrative simply to achieve
balance.
Planning
Having introduced the language used to modelthe class of
problems for which the Weaverplanner is designed, I will
describe the planning algorithm and sketch a proof that the
planner is sound, i.e. that plans producedby the planner in
responseto an initial state and goal description will achieve
the goal fromthe initial state.
Weaveris based on Prodigy4.0, with extensions to create
conditional plans. In the following description of Prodigy
4.01 follow (Fink &Veloso1995). Prodigyis given an initial
state description SOand a goal G, a first-order sentence in
the fluents of the domain.Oncompletion,Prodigy returns a
plan, a totally-ordered sequence of actions (ai[l < i < n)
such that
Holds(G,
Result(an,Result(an- 1, . . . Result(a 1, SO)
(3)
At any point in its execution, the state of Prodigyis described by its current incomplete plan. An incompleteplan
has two parts, a head-plan and a tall-plan, depicted in Figure 1. The head-planis a totally ordered sequenceof action
types, representing the prefix of Prodigy’scurrent preferred
plan. Thetall-plan is a partially orderedset of actions, built
by a backward-chainingalgorithm.
Prodigy begins execution with both the head-plan and
tall-plan empty. At each point in its execution, Prodigy
applies an execution simulation algorithm to its head-plan,
resulting in a simulated state Current. If Holds(G,Current)
is true. Prodigy terminates and returns the head-planas its
plan. Otherwise, one of two steps is non-deterministically
chosen: (1) pick an action type from the tall-plan that has
<- -" head-plan - -
[<- gap-~
<- - tail-plan - ~"
Figure1: Representationof an incompleteplan.
no childrenandmoveit to the endof the head-plan,or (2)
adda childless actiontype a to the tail-plan as the child of
an existing step. Theaction a mustbe suchthat Agent(a)is
true. Thesesteps determineProdigy’ssearch space, along
witha specificalgorithmto determine
the set of actiontypes
considered
to addto a tall-plan in step (2).
In the absenceof externalevents,the potential completeness of the searchspaceis easy to see: with a simplealgorithmin step (2) that only addsaction types to the empty
tall-plan andconsidersall possibleactiontypes,all possible
sequencesof actions of increasinglength can be generated
in the head-plan.In practice, a back-chaining
algorithmwill
be usedso that the tall-plan is a goaltree andback-jumping
techniqueswill further reducethe search space. For more
details, see (Fink&Veloso1995).
In orderto provethe soundness
of the algorithm°i.e. that
whena plan is returnedonecan proveequation3, it is not
necessaryto knowanythingabout the back-chainingalgorithmusedto generatethe tall-plan, or the details of the
searchstrategy. All that is importantis that the execution
simulationapplied to the head-planmakescorrect predictions aboutthe state resulting froman actual executionof
the head-plan. Executionsimulation maintainsa modelof
the state by applyingthe changesprescribedby the domain
modelfor the action types in the head-plan,andno others,
to a modelof the initial state. Thereforea lemmaused
by Kartha(Kartha1995)to provethat Baker’scircumscription policy andPednault’sADL
languageare equivalentfor
a class of domainscan be adaptedto showthat the plans
producedby Prodigyare correct with respect to the action
languagedevelopedhere, for the sameclass of domainsand
in the absenceof externalevents.
Thusin the absenceof external events, if a narrativeis
expressedas
Basedon this analysis, Weaverthen re-directs Prodigyto
producea newplan, makingtwodifferent kindsof modification. First, newactionsmaybe addedto the tall-plan whose
effects are chosento stop external eventsfromhavingeffects that interferewiththe plan(rather thanonlyto achieve
the preconditionsof other actions). Second,the plannercan
be appliedto twoor morecasesof the executionsimulation,
accountingfor different occurrences
of externalevents, and
combiningto producea conditionalplan. Weaver’soverall
algorithmis summarisedin Figure 2. Thetechnical details of choosingactions and managingback-chainingfor
conditional plans are not coveredhere: they can be found
in (Blythe1994)and(Blythe1996).
Pickaninitial state
Createa plan with Prodigy
Buildanexecution
graph
Planok?
Return
plan
Modifyplanningstate
Figure 2: Summary
of Weaver’salgorithm
n
A Happens(ai,t~, t~)
i=1
whereti < t~ _< ti+l for 1 < i < n and wherethe plan
(aill < i < n) is returned by Prodigyto achieve goal
frominitial state So, thenwecan prove
Vt >_t~n [Holds(G,State(Q)]
Planning with external events
Weavermakesuse of Prodigyto produceplans ignoring external events,then determines
externaleventsrelevantto the
planandconsidersthe different outcomes
that canarise from
the plan basedon the possibleoutcomes
of external events.
49
In this sectionI sketcha proofthat the plansproduced
by
Weaver
are correct whenthe initial state is certain andthere
are no actions with nondeterministiceffects, t In order to
showthis it is necessaryto describeWeaver’s
plan evaluation
modulein moredetail.
Weaverevaluates a plan by constructing an execution
graph. The execution graph is a directed graph each of
whosenodesis labelled with a fluent anda time, suchthat
no twonodesshare the samefluent and time. Thearcs are
labelled with either the symbolpersistence or the symbol
|Weaver
is correctwithouttheseassumptions,
butthe proofis
morecomplicated.
effect. Assumlng
that the plan is representedby the narrative
N : A Happens(ai, ti_
h ti)
i=1
whereto < ta < ... < tn-1 < tn, then the possible times
in the node labels are the ti. First, the plan consisting of
just these actions is simulated by Prodigy. Thenthe graph
is constructed as follows:
1. For eachfluent g~ in the goal statement,the node(g~, t,~)
is added.
2. At each time t~ wherei > 0, for each fluent f such that
there exists a node(f, ti),
(a) if the value of f does not change betweenState(h_l)
and State(ti), a node (f, ti-1) is added and an arc
(labelled persistence) is addedbetween(f, ti-1) and
(f, ti).
(b) for
each
predicate
the form Holds(f, Result(a~, s) ) 4- intheplanning
domalnT, for each fluent pj mentionedin P. a node
(pj,ti-1) is addedto the graph, and an arc (labelled
effect is addedfrom(pj, t~-l) to (f, tl), regardless of
whetherf changesfrom State(ti_ 1) to State(ti
Conditionalplans are dealt with by creating an execution
graph for each branch of the plan. Weaveruses the execution graph to search for potential flaws in a candidate plan
by examluingeach link. ff a link [(f, ti) --+ (f, ti+l)] is
labelled persistence then Weaverlooks for external events
e such that Holds(~f, Result(e, s)) 6-- is a s tatement in
T and P is consistent with the fluent values in the nodes for
time period tp given by
tp = max{t~lt~ < ti - duration(e)}.
In order to showthat the event maycause the plan to fall,
Weaverexamines the preconditions P to see if they are
actually applicable in the simulated state, or if somechain
of events can makethem so. Whileit is myultimate aim to
showthat this technique is correct, and that Weaverworks
for nondetermlnisticactions as well, in this paper I provea
weakerresult:
Lemma
I Let the domainT, consisting of action types and
external events, contain no non-deterministic effects and a
completely specified initial state SO. Let the narrative N
be such that Weaverdiscovers no persistence-threatening
events in the execution graphfor goal G, then G is entailed
by the circumscription policy of(l).
Proof sketch:
Supposethere is a model Mof(l) with N and T defined
for this domainand narrative such that the goal statement G
is not true in State(t ,~ ). Thenthere mustbe an earliest timet k
for whichthe valueof at least onefluent in the nodes(f, t k
is different from that in the simulated narrative, t~ > to
since the initial state is fixed. Considerthe links to (f, tk)
from nodeswith time label tk- 1. If there is no persistence
link, then there is an effect link for ak correspondingto an
expression Holds(f, Result(ak, s) ) +- P in such that th e
5O
preconditions P are satisfied in State(tk_ 1) but the result
does not hold in State(tk). This contradicts that Mis a
model of T A N.
Therefore there must be a persistence link (f, tk-1)
( f , t k ) suchthat .f hasthe samevaluein State( t h_l ) in Mas
in a modelin whichno external events occur, but a different
value in State(tk). Thus Mmust include some external
event occurrenceHappens(e,t’, t) with tk-i < t < tk that
changes the value of f. But then Weaverwould have found
this event type in its examinationof the executiongraphfor
N.
¯
The purpose of the execution graph is to focus Weaver’s
attention on the external events that are relevant to the plan.
If a modelMdiffers from the modelwith no external events
only in events that do not affect any of the persistence links
in the executiongraph, then the goal will be satisfied in M.
This graphical structure allows someof the independenceof
events and of action outcomesthat mayexist in the domain
to be exploited for efficient planning.
If Weaverdoes discover persistence-threateningevents, it
is still possibleto provein someciro_~mstances
that the plan
satisfies the goal. For each such event, if the preconditions
P of the event are consistent with the state whenthe event
begins, any extra fluents from P are addedto the execution
graph at the time point for this state. All event sequences
maybe considered that take place prior to this time stage
and maymakethe preconditions of the event be satisfied.
If no sequence of events is found that makesat least on
persistence-threatening event possible, the plan is correct,
by an extension of the previous proof. Such a search may
be prohibitively expensivein general, but Weaveris able to
use the effects of the eventsto restrict its search at each time
point for events that mayalter the state for relevant fluents
only.
Note that one of the desired features mentionedin the
introduction is satisfied: since external eventsare modelled
in the sameform as the actions, a planner can reason about
their effects and causes in order to makeuse of them to
achieve somegoal. Althoughthe planner cannot directly
cause an event e to take place, it maybe possible to produce
a state s in which e wouldhave a desired effect if it did
take place. Sucha plan could not be provedto satisfy the
goal in the languageof the previous section, but there may
be a non-emptyset of modelsin which the plan satisfies
the goal. Waysto makeuse of plans that sometimessatisfy
their goals are mentionedin section. Plans using external
events can be created in Prodigy by relaxing the constraint
that Agent(a)be true for all actions a addedto the tall-plan.
I note in passing that the conceptof a goal specification
must be refined for planners in domainswith external events.
Twopossible refinements for the notion of reaching a goal
state are (1) reaching a state s such that the goal condition
is true in s and all subsequent states, or (2) reaching
state s wherethe goal condition is true, regardless of its
value in subsequent states. Weaveruses the second, weaker
definition but the argumentsabout efficiency madein this
paper hold for either definition.
An extended example
In order to talk about planningwith external events. I introduce an example planning domain concerned with reading
a book in peace in myhouse. This requires being in a different roomfrom mycat, whichdemandsattention. I allow
function terms to represent fluents and actions and represent
the domainas follows.
First the actions Getbook, Go, Close and Catmovesare
specified:
Happens(Move(Diningroom),1, 2) 6-- Holds(Cat(Ifitehen),
Figure 3 showsthe execution graph created by Weaverfor
this plan, whichit uses to search for relevant events. Since
it finds that events of type Catmovesare relevant, shownin
the annotated execution graph in figure 4. lemma1 cannot
be used to showthat the plan achieves the goals. The reason
the plan does achieve its goals is that wheneverthese events
can take place, the conditions under whichthe event changes
the value of Cat(Diningroom) are not satisfied. Weaverhas
addedarcs labelled stop-event to showwhythese conditions
do not hold.
Holds( Havebook,Result( Getbook,s
Holds(In(z), Result(Go(x), s) ) +-Holds(Cat(x), Result(Catmoves(z), s)) +-Room(x) A Open(x) A In(x)
-,Holds(Open(z), Result(Close(x), s)) 6-- Room(z)
Efficiency issues for the planner
A backward-chainingplanner, whether partial-order or otherwise, must be able to evaluate a candidate plan for some
goal efficiently. Whethergoals can be satisfied transiently
or mustbe satisfied permanently,the planner needsto effectively computethe effects of a plan across manybranches of
a tree of future states. In this section I describe three techniquesthat can improvethe efficiency of this process. Ftrst,
given a particular plan the independenceimplicit in the multiple events modelcan be used to ignore those events that
cannot affect the outcomeof the plan with respect to goal
satisfaction. This approacheffectively collapses manymodels of the plan’s effects by ignoringunimportant
differences.
Second,a planner mayattach information to outcomesof actions and events specifying whichit believes are morelikely
than others. These can be combinedaccording to laws of
probability or Dempster-Shaferevidence combination, for
example, and can often allow a planner to have a high confidence in plan success while examininga relatively small
fraction of the possible futures. Third, efficiency gains can
comefrom a careful managementof the time spent searching
for candidate plans versus the time spent evaluating them.
It can often be the case that a candidate plan can be ruled
out before even all the relevant external events have been
identified. I describe the trade-offs involved in choosing
whereto spend computational effort below.
Next the domainobjects:
Room(x) ¢:~ [z = Kitchen V z = Diningroom]
Third,the initial state:
~Holds( H avebook,So)
Holds(Cat(Kitchen),
Holds( In (Kitchen),
Fourth, information about controllable actions and action
durations:
Agent( Getbook) A Agent(Go(x)) A Agent(Close(x))
Duration( Getbook ) = 1 A Duration(Close) =
Duration( Catmoves) = 1 A Duration(Go)
and finally the goal is specified:
Goal ¢~ 3sx[noom(x) A Holds( ln(x), s)
A~Holds(Cat(x), s) A Holds(Havebook,s)]
Exploiting independence in multiple events
The number of models for a given candidate plan is exponential in the plan’s length, as the numberof external
events that maytake place increases. Whileall the actions
chosenby the planner are presumablyrelevant to the plan,
manyof the external events that can take place will have
no relevance, becausethe fluents they alter do not affect the
actions in the plan or the final goal. If these events can be
ignored, the size of the tree can be exponentially reduced.
The newset of modelsconsidered will not uniquely specify
states that arise while the plan is executed,but will capture
enoughdetails to ensurethe plan is correct.
The execution graph filters out someof the events that
cannot affect the plan, and achieves this to a limited degree. In fact Weaverexaminesevents that might threaten a
persistence link further by checkingif its preconditionsare
completelysatisfied in its starting state, allowingfor other
Similarly to the examplefrom the laundry domain, the
two-step
plan represented by the narrative Happens(Getbook, O, 1)
Happens(Move(Diningroom),1, 2) does not achieve the
goal in every model of the domain. This is because
there exists a model that includes the event occurrence
Happens(Catmoves(Dinmgroom), 1, 2). However. the
three-step
plan represented by Happens(Close(Diningroom),0, 1)
Happens( Getbook, 1, 2)
Happens(Move(Diningroom),2, 3) does solve the plan.
Weaveris able to find this plan, by modifying the plan
above, and also a conditional plan that can be proved to
satisfy the goal. This conditional plan is
Happens(Getbook, 0, 1)
51
1)
0
Close(Diningroom)
2
Getbook
Move(Diningroom) 3
!
I
I
ln(Diningroom)
nav
~book
persistence
I
I
I
I
I
Hav~book
I
I
I
I
I
iningroom)persistence:Cat(Di],ingroom) persistence=Cat(Di~ingroom)
Cat(Dilingroom)
persistence~, Cat(D
I
I
I
I
Figure3: Thefirst stage of creation of the executiongraph,before external events are found. Thegraphcontains no arcs
labelledeffect becausethere are noconditionsnecessaryfor the actionsto achievethe effects in the graph.
model.Thusnon-deterministicplanners that makeno use
of probabilities, suchas Cassandra(Pryor&Collins1993),
couldbe extendedto use the sameaction model.
eventsthat mighttake place. Provingthat this morestrict
filter for eventsis correctrequiresaa extensionto the proof
presentedhere. I havealso investigateda meansto filter out
irrelevant eventsbasedon a static analysisthat canbe made
fromthe domainbefore planningbegins(Blythe 1996).
Thepowerof this approachdependson the use of small,
relatively independent
events, whichmaytake place concurrenfly. This representationchoiceis thereforeimportantto
the modelsatisfyingthe third criterion mentioned
in the introduction,allowingrelevanteventsto be foundefficiently.
Whilethe representationdoesn’tguaranteethis, it doesallowit.
Interleaving planning and plan evaluation
Using a probabilistic model of the
non-deterministic outcomes
In modelsof real-worlddomains,someof the possibleoutcomesof actions or events mayoften be considerablymore
likely than others, and in events with manypossible outcomes,a small numberof themmayoccur a high proportion of the time. This informationcan be usedby a planner
to both evaluate a plan, to withinsomeerror margin,more
quickly, andto create plans morequicklyby focussingon
the moreimportantscenariosin a candidateplan. Thelikelihoods of individual eventor action outcomescan be combined to providea likelihood that somepropertywill hold
after executinga plan, includinggoalsatisfaction.
Knowledge
about independenceof events and actions
can again be used to find moreefficient techniquesthan
summing
branchprobabilities over all possible branchesof
the event tree. Eventhoughan event is relevant to the
plan, it maynot be relevant to everystep, andin fact the
goal structure of the planleads directly to an independence
structure for the events andactions that can be used to
computethe probability of goal successin a Bayesiannet
(Kuslamerick,Hanks,&Weld1993). Indeed, the execution
graph described in section is implementedas a Bayesian
net in Weaver.Withinthe modelof events describedhere,
the use of probabilitiesis viewedas a techniquefor efficient
planningrather than as a fundamental
propertyof the action
52
Since computationalworkis required to evaluate the effects of a plan, there is a spectrumof possible planning
approachesbasedon the proportionof time spent searching
for plans andthe proportionof time spentevaluatingplans.
At one end of the spectrum, a planner mightattempt to
create a plan by backward
chaining on actions andevents,
andignoringthe effects of externaleventsuntil a complete
planis produced.
Atthis point the relevanteventsare determined,usingthe techniquesof section, anda reducedevent
tree or Bayesiannet mightbe usedto find the probability
that the plansucceeds.(It will probablybe low.)I refer
sucha planneras a "greedyplanner".
At the other end of the spectrum,a plannermightreason aboutthe external events that mayaffect each action
whenit is chosen. Since the earlier actions chosenby a
backward
chaining plannerappearlater in the plan, these
eventstructureswouldnot be identicalto the final eventtree
corresponding
to the appropriatepoint in the plan since the
possible states are unknown.Howeversomeelements of
the tree can be determinedaheadof time, and these might
allowthe plannerto pruneuselesspartial plansmuchearlier
than the greedy-planner
would.I refer to this planneras a
"greedyevaluator".
Atruly efficient plannerfor dynamic
domains
will operate
somewhere
in the middleof this spectrum.Theexact point
dependson the number
andthe impactof the external events
in the domain°In a domainwith a large numberof events
whichfrequently turn out to be irrelevant, a moregreedy
planner will be appropriate. In a domainwhereextexnal
events are closely coupledwith actions (for exampletheir
preconditions
are satisfied bythe effects of the action) and
frequentlyinterfere withplans, a moregreedyevaluatorwill
be appropriate.
0
,
Close(Diningroom)
Getbook
2
Move(Diningroom)
I
n(Diningroom)
I
3
’,
I
I
=
ln(Diningroom):l’
persistence
Ha~~.book
sto~nt
~
Cat(Dlmygroom)persistence
=, Ha,)lebook
Open(Di~ingroom)_ persistence Open(Di~ingroom)__
Cat(Dir~ngroom)
~ ~r~
,
persistence
#
("
Catmoves(Dinin0room)
P~stence ~ Cataaini l"~°°m ’
Calmoves(Diningreom)
Cat(Dini~gr°°m)%
Calmoves(Diningroom)
Figure 4: Theexecutiongraph, after events have beenexamined.Theroundedlines showthe events whosepreconditions
wereaddedto the graph.Theselines are not part of the graphitself.
Discussion
I havedescribed a modelfor actions in dynamicdomains,
wherethe effects of eventsbeyondthe agent’sdirect control
can be modelledas well as those of its actions. Themodel
satisfies someusefulcriteria for the construction
of efficient
plannersin these domains.Themodelof external eventsallowsthemto be incorporatedin plans by backward
chaining
planners.It also allowsefficient reasoningaboutthe events
relevant to a particular plan by exploiting their independencestructure. Themodelcan be describedby a modified
versionof narrativesin the situation calculus,describedin
(Miller & Shanahan1994), Althoughit does not makefull
use of Miller and Shanahan’sdescriptions of simultaneous
eventsandis less expressivethanothersthat addresssimilar
issues, for example
(Shanahan
1995),(Baral 1995),it is
of only a few currently used in an implementedplanner.
Themodelof non-deterministicaction used here is similar
to that of the Buridanplanner(Kushmerick,
Hanks,&Weld
1993).
Thereis aneedboth for moreanalysisof modelslike these
andof appropriateplanningalgorithmsandtechniques,and
for moreempirical studies of such domalnsand planners.
Weaveris currently a greedyplannerthat iteratively improvesits plan. Thatis, it constructsa completeplanwithout consideringextemalevents, andthen evaluatesthe plan
withrespectto the externalevents.It thenattemptsto repair
its plan by either addingsteps before events to makethem
inapplicableor addingsteps after events to recoverform
their effects. In future workI will use the Weaver
architecture as a platformto investigate the tradeoff betweentime
spent planningand time spent evaluatingplans in dynamic
domains.
53
This short paper has ignoredmanyinteresting aspectsof
planningin dyn0mic,non-deterministicdomains.In particular I haveassumedcompleteobservability and I assumed
the plannercreates entire plans beforeexecutingthem.The
modelof time used here is very simpleandthere has been
no accountof temporalgoal satisfaction (whereconstraints
are placedon the timeinterval overwhichthe goalis satisfied). However
the languageserves to illustrate someissues
for planners and for action languageswhichwill remain
importantin languagesthat handlethese features.
References
Baker,A. B. 1991.Nonmonotonic
reasoningin the frameworkof the situationcalculus.Artificial Intelligence49:5.
Baral, C. 1995. Reasoning about actions: Nondeterministiceffects, constraints, andqualification. In
Proc.14thInternationalJoint Conference
on Artificial Intelligence, 2017-2024.Montrb,al, Quebec:Morgan
Kaufmann.
Blythe,J. 1994.Planningwith external events.In de Mantarns, R. L., andPoole, D., eds., Proc.TenthConference
on Uncertainty
in Artificial Intelligence, 94-101.Seatde,
WA:MorganKaufmann.
Blythe, J. 1996. Decompositionsof markovchain~ for
reasoningabout external changein planners. In Drabble,
B., ed., Proc.ThirdInternationalConference
on Artificial
Intelligence PlanningSystems. Universityof Edinburgh:
AAAIPress.
Fink, E., and Veloso,M. 1995. Formalizingthe prodigy
planningalgorithm~In Ghallab,M., andMilanl,A., eds.,
NewDirections in AI Planning, 261-272. Assissi, Italy:
IOSPress.
Kartha, G.N. 1994. Two counterexamples related to
baker’s approachto the t~ameproblem.Artificial Intelligence 61:379-391.
Kartha, G.N. 1995. A Mathematical Investigation of
ReasoningAbout Actions. Ph.D. Dissertation, University
of Texasat Austin.
Knoblock, C.A. 1993. Generating Abstraction Hierarchies: An Automated Approach to Reducing Search in
Planning. Boston: Kluwer.
Kushmerick, N.; Hanks, S.; and Weld, D. 1993. An
algorithm for probabilisticplanning. Technical Report 9306-03, Department of ComputerScience and Engineering,
University of Washington.
Miller, R., and Shanahan, M. 1994. Narratives in the
situation calculus. Journal of Logic and Computation4(5).
Pryor, L., and Collins, G. 1993. Cassandra: Planning for
contingencies. Technical Report 41, The Institute for the
Learning Sciences.
Shanahan,M. 1995. A cirolm.~criptive calculus of events.
Artificial Intelligence 75(2).
54