A Decision Theoretic Planning Assistant *

From: AAAI Technical Report SS-94-06. Compilation copyright © 1994, AAAI (www.aaai.org). All rights reserved.
Nathaniel G. Martin and James F. Allen
Department of Computer Science
University of Rochester 14627-0226
{martin, james}@cs.rochester.edu
* This material is based on work supported by the U.S. Air Force -- Rome Laboratory research contract no. F30602-91-C-0010.

Abstract

We discuss three difficulties in applying decision theory directly to planning: the computational complexity of reasoning about probability, the difficulty of gathering knowledge of probability, and the difficulty of specifying behavior using utilities. We then describe a decision theoretic planning assistant that avoids some of these problems. In particular, instead of generating plans it assists a traditional planner, thereby avoiding the complexity of generating and evaluating plans. It reasons about probability based on its experience, so it does not require precise probabilities supplied in advance, and its behavior is specified through goals.

1 Introduction

Traditional planners operate in restricted domains in which knowledge is certain and a single goal is specified. In realistic situations, planners must be able to balance achieving several goals of lower priority against achieving a single high priority goal. Without multiple prioritized goals, a planner will produce the best plan even if that plan is terrible. For example, if a planner knows only about trains, it will build a plan to ship oranges to Hawaii by rail. Also, unless it attends to circumstances unrelated to its goals, it may achieve its goals at an unacceptable cost. For example, a robot railroad engineer will speed unless it knows that losing an engine is worse than missing a schedule. Because traditional planners are driven only by their goals, they cannot decide to forgo a goal that can be achieved only at unacceptable cost.

Decision theory computes a preference order over competing goals of differing priorities and probabilities and so promises a significant increase in the expressiveness of planning languages. It allows for a unified treatment of multiple prioritized goals and uncertainty. Unfortunately, this increased expressiveness is costly.

1.1 Constraints on Decision Theory in Planning

Though decision theory allows one to encode information about uncertainty using probability and to encode information about multiple goals using utilities, its use in planning systems is complicated by three factors:

1. reasoning about probability is computationally complex,
2. necessary probabilities are often unknown, and
3. the relationship between utility and behavior is not well understood.

The computational complexity of reasoning about probability is high. Though Bayes nets have limited expressiveness (they can only encode propositional knowledge), inferring probabilities using them is NP-hard [Cooper, 1987]. Because first-order logics of probability are more expressive, the complexity of inferring probabilities using them is worse than undecidable [Abadi and Halpern, 1989].
Decision theoretic planners can reduce the complexity of their task by encoding the STRIPS assumption in independence assumptions [Wellman, 1990]. Such planners can use Bayes nets [Pearl, 1988] to provide a graphical method of representing these assumptions, allowing one to encode reasonable independence assumptions for probabilistic systems. Unfortunately, these techniques only reduce the increased cost of decision theory; they do not eliminate it.
The precise knowledge of probability required for decision theory also constrains its use. Probability can be understood as a representation of human uncertainty. In areas like medicine where articulate experts have access to statistical data, it can represent these experts’ uncertainty. Unfortunately, such subjective probabilities are impractical in domains lacking articulate experts. As an alternative to asking experts, probabilities can be estimated from observations. For example, no one knows the probability that customers leave things under tables in fast food restaurants, even though robots designed to clean such restaurants might benefit from this information. The robot cleaning the floor does have the opportunity to gather information about the likelihood of things being left on the floor. From these observations, it could estimate the desired probability. It can also use information about the precision of the estimates it has computed.
Though recent work [Haddawy and Hanks, 1990; Wellman and Doyle, 1992] explores the relationship between goals and utilities, specifying a utility function that will cause a planner to behave properly is still poorly understood. When we specify a goal to a traditional planner we know something about the resulting state of the world in which we execute a successful plan, namely that the goal will hold. We know nothing about the state of the world after the execution of a decision theoretic planner’s plan. We may want our planner to achieve many things, but a successful decision theoretic plan may do nothing because the risk of action is too great. This uncertainty about the state of the world after a decision theoretic planner’s plan is executed makes it hard to design the utilities so that a planner will achieve something concrete. It also makes it hard to evaluate the plan produced. Indeed, the only reasonable evaluation is performing the same analysis the planner did.
We have built a decision theoretic planning assistant that collects statistics and suggests the actions that are most likely to achieve effects given assumptions about the state of the world. In doing so, it avoids the computational complexity of generating and evaluating plans and it requires no prior knowledge of probability. It also treats each event mentioned in the plan it executes as a single goal, allowing a simple utility function.

2 A Decision Theoretic Planning Assistant

The Decision Theoretic Planning Assistant (DTPA) was constructed using an extension of Rhet [Martin, 1993]. This planning assistant provides three services to a traditional planner:
1. it advises the planner about the likelihood of events;
2. it executes plans by choosing the actions most likely to accomplish the planner’s goals; and,
3. it monitors the actions to ensure that they have the effects the traditional planning module hypothesized and notifies the planner when a low probability event causes the plan to go awry.
The planning assistant’s knowledge of probability is based on the number of instances of event types it has seen. The event types are arranged in a hierarchy such that all event instances of children are event instances of the parent. Therefore, the planning assistant has more accurate knowledge of the probability of event types higher in the hierarchy than of event types lower in the hierarchy.

2.1 The Language

Rhet [Allen and Miller, 1991] is a knowledge representation system that provides, among other things, inference like Prolog’s, a type system, and an equality reasoning system that allows function terms. To build the DTPA, Rhet was extended to represent probability, utility and the number of occurrences of events efficiently.
The probability function is the confidence interval for the number of occurrences of a reference event type that are also occurrences of a target event type. For example, if an engineer has been asked to load a car one hundred times, but has taken the appropriate action only fifty times, the probability at the .05 confidence level would be [.4 .6] (i.e., the probability is somewhere between .4 and .6).
Utilities are numbers, and expected utilities are calculated by multiplying the lower bound of the confidence interval and the upper bound of the confidence interval by the utility. Therefore, like probabilities, expected utilities are intervals.
The language provides a function best-event-type that takes a confidence level, a set of possible action event types and a goal event type as parameters. It returns the set of the possible action event types among which it cannot choose.
It also provides a function execute that takes a plan (i.e., a set of event types and constraints on these event types) as an argument. This function sends requests to the agents who are most likely to be able to cause the events specified in the plan.
The DTPA represents causation by the predicate might-cause. This predicate focuses the DTPA’s attention on only those actions that the programmer has explicitly stated might cause the goal. Because the goal is possible after any action, the DTPA would consider too many actions if it relied only on probability.

2.2 Example
As a running example, consider a computerized planner trying to get some oranges from Avon to Bath. There is a rail line between Avon and Bath. In Avon, there is a warehouse with oranges, an engine, E1, and a boxcar, B1. The warehouse is managed by a manager, W1. One plan that achieves this goal is: ask W1 to load the oranges into B1, then ask E1 to couple to B1, drive to Bath and unload the car. We will concentrate on the first step of the plan, loading B1 with oranges.
This example comes from the TRAINS domain [Allen and Schubert, 1991], a transportation domain containing engineers, managers of factories, warehouses and production centers. The DTPA was tested in this domain and this example was derived from one of those tests.
In the TRAINS domain, the DTPA does not itself perform physical actions. Instead, all of its actions are requests to appropriate agents to perform actions. For example, when the planning assistant chooses W1-Load to try to get oranges loaded into a railroad car, it will actually send a radio message (i.e., a Lisp expression through a TCP/IP connection) to the agent requesting that the agent load the car. Agents send reports when they receive a request, when they initiate a requested action and when they complete the requested action. Actions are programs the agents execute; they may or may not have their intended effects. Moreover, any one of these reports may fail to reach the DTPA. If the planning assistant does not receive a report, either the action could have failed or the report could have been lost.
2.3 Advising a Planner
All planners must make assumptions as they plan. Some early planning systems build their assumptions into their knowledge representation language. For example, STRIPS [Fikes and Nilsson, 1971] cannot represent events it does not cause. First-order languages can represent more complex interactions between actions but this increased expressiveness is bought at the cost of making inference difficult.
STRIPS can generate plans because it assumes that its knowledge is complete and accurate. Planners built with more expressive languages cannot make that assumption, so it is difficult to prove that the result of the plan will be the goal. For example, suppose the planner asks W1 to load oranges into B1, then asks E1 to take B1 to Bath. The oranges will get to Bath only if the oranges are in the car when E1 leaves and remain in the car until E1 arrives at Bath. Two possible problems arise. E1 might not wait until the oranges are loaded, and E1 might not wait until it gets to Bath to unload the oranges. In the first case, the oranges remain in Avon; in the second, they end up on the tracks between Avon and Bath.¹
Allen et al. [1991] describe a planner that generates plans by proving that the planner’s goals will hold if it executes its plan. The planner generates plans by searching through the space of plans, which, here, is a space of assumptions about the future. The planner can make two types of assumptions: that it can execute any action at any time and that the effects of the actions persist until they are needed. If the planner can prove a contradiction using an assumption, it abandons that assumption and tries another. When it finds a set of assumptions that is consistent with the current state of the world and that allows it to prove its goal will hold, it accepts the assumed actions under the assumed constraints as its plan.
The DTPA helps the planner choose appropriate assumptions. To do so, the planner supplies the planning assistant with a confidence level, a set of candidate event types, an event type that it wants to cause and a temporal interval. The planning assistant returns the set of event types most likely to cause the desired event type. This set contains the event types among which the planning assistant cannot choose, which indicates that these assumptions are equally valid as far as the planning assistant knows.
¹ Robots are stupid. We forgot to tell one of our simulated engineers to wait until it got to the station to unload its oranges and the oranges ended up on the track. It took quite a while to find them too.

Example:
In constructing its simple plan, the planner can choose W1 or E1 to load the oranges into B1. Unless the planner has explicit knowledge that one agent is preferable to the other, it cannot choose. The planning assistant can use its knowledge of previous occurrences of requests to load to make the choice.
Suppose we have three event types named:
1. E1-Load
2. W1-Load
3. Loaded
The first is the set of event instances that include a request to E1 to load a car, the second the set of event instances that include a request to W1 to load a car and the third the set of event instances in which a car is loaded. The planner and the planning assistant know that either request might result in the car being loaded.
[might-cause E1-Load Loaded]
[might-cause W1-Load Loaded]
Which agent should the planner ask to load the car?
The planner asks the DTPA for its suggestion using best-event-type as follows:
(best-event-type .95 Loaded
 E1-Load W1-Load)
Depending on the statistics and asserted probabilities, this function call will return either E1-Load, W1-Load, or both.
Given the statistics below, best-event-type returns W1-Load.
[occ so-far 100 E1-Load]
[occ so-far 35 E1-Load Loaded]
[occ so-far 100 W1-Load]
[occ so-far 65 W1-Load Loaded]
On the other hand, if the numbers of occurrences are different, calling best-event-type returns (W1-Load E1-Load).
[occ so-far 100 E1-Load]
[occ so-far 50 E1-Load Loaded]
[occ so-far 100 W1-Load]
[occ so-far 60 W1-Load Loaded]
In the second case the DTPA has no useful information.
The DTPA can answer questions about probability quickly because the planner constrains the question. It gives the DTPA the target event type and a set of reference event types. It is easy for the DTPA to make decisions in such constrained circumstances.
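
The two best-event-type calls in this example can be imitated with a short Python sketch. It is not the Rhet implementation. The paper does not say which confidence interval the DTPA computes, so the sketch assumes a Wilson score interval, and the names wilson_interval, expected_utility_interval, best_event_type and the occ dictionary layout are all hypothetical. Under that assumption, 50 successes in 100 trials gives roughly [.4 .6], and the two sets of statistics above give the answers reported in the text.

```python
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a proportion (an assumed interval method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def expected_utility_interval(interval, utility):
    """Scale both bounds of a probability interval by a non-negative utility."""
    low, high = interval
    return low * utility, high * utility


def best_event_type(confidence, goal, candidates, occ):
    """Return the candidates that the statistics cannot rule out.

    occ maps (action,) to the number of requests seen so far and
    (action, goal) to how many of those requests were followed by the goal.
    A candidate is kept when its interval overlaps the interval with the
    highest lower bound.
    """
    intervals = {
        action: wilson_interval(occ[(action, goal)], occ[(action,)], confidence)
        for action in candidates
    }
    best_lower = max(low for low, _ in intervals.values())
    return {action for action, (_, high) in intervals.items() if high >= best_lower}


first_stats = {
    ("E1-Load",): 100, ("E1-Load", "Loaded"): 35,
    ("W1-Load",): 100, ("W1-Load", "Loaded"): 65,
}
second_stats = {
    ("E1-Load",): 100, ("E1-Load", "Loaded"): 50,
    ("W1-Load",): 100, ("W1-Load", "Loaded"): 60,
}
actions = ["E1-Load", "W1-Load"]
print(sorted(best_event_type(0.95, "Loaded", actions, first_stats)))   # ['W1-Load']
print(sorted(best_event_type(0.95, "Loaded", actions, second_stats)))  # ['E1-Load', 'W1-Load']
```

With the first statistics the two intervals are disjoint, so a single event type is returned. With the second they overlap, so both come back and the planner learns nothing from the call.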
2.3.1 Persistence assumptions
Persistence assumptions (i.e., assumptions that properties the planner knows to be true now will continue to be true into the future) are common in planning because, even if the planner knows it caused an event that made a property true, it does not know that the property remains true until that property is needed. For example, assuming that oranges remain in the railroad car until they get to their destination is a persistence assumption.
One can describe event types in which the event instances of the event types are separated by another event instance. In the previous example, we described an event type whose event instances were constructed from one event type following another. We could similarly describe an event type consisting of three event instances: the two we are interested in and a third that separates the other two.
Example:
Suppose the planning assistant’s knowledge base is the same as in the previous example. In addition, the planner has two additional event types: Load-Couple, in which loading the oranges is followed by coupling the loaded car to the train, and Load-Event-Couple, in which some unspecified event occurs between the loading and the coupling.
The planner describes these event types using the following Rhet axioms, which are Horn clauses where the head is separated from the rest of the clause by the symbol "<". The first states that an instance of the Load-Couple action event type occurs when an instance of the Load action event type occurs before an instance of a Couple event type. The second states that an instance of the Load-Event-Couple type occurs when an instance of the Load event type occurs, then any event instance occurs and finally an instance of the Couple event type occurs.
[[element-of ?ei_3 Load-Couple] <
 [element-of ?ei_1 Load]
 [element-of ?ei_2 Couple]
 [Time-Before [time ?ei_1] [time ?ei_2]]]
[[element-of ?ei_4 Load-Event-Couple] <
 [element-of ?ei_1 Load]
 [element-of ?ei_2 Event]
 [element-of ?ei_3 Couple]
 [Time-Before [time ?ei_1] [time ?ei_2]]
 [Time-Before [time ?ei_2] [time ?ei_3]]]
The planner also knows that either of these event types might move the oranges.
[might-cause Load-Couple Move]
[might-cause Load-Event-Couple Move]
The plan would then call best-event-type as follows.
(best-event-type .95 Move
 Load-Couple Load-Event-Couple)
Depending on the statistics, this function call will return either Load-Couple, Load-Event-Couple, or both. Here, if the confidence intervals overlap, the planner knows that the planning assistant had no information that contradicts the necessary persistence assumptions. Because the confidence intervals are incomparable, the planning assistant has no information indicating that an event occurring between the loading and the coupling changes the probability of getting the commodity moved. Therefore, as far as the planning assistant knows, the properties necessary to move the commodities successfully persist through the occurrence of any event.
Suppose the numbers of occurrences are as below.
[occ so-far 100 Load-Event-Couple]
[occ so-far 35 Load-Event-Couple Move]
[occ so-far 100 Load-Couple]
[occ so-far 65 Load-Couple Move]
Here, calling best-event-type returns the event type Load-Couple. The planner would know that it is important whether or not an event instance occurs between the loading of the oranges and the coupling of the engine.
Suppose, on the other hand, that the numbers of occurrences are different.
[occ so-far 100 Load-Event-Couple]
[occ so-far 50 Load-Event-Couple Move]
[occ so-far 100 Load-Couple]
[occ so-far 60 Load-Couple Move]
Calling best-event-type, as above, returns the set (Load-Couple Load-Event-Couple). The planner can reason that it has insufficient information at the .95 confidence level to assume that an event instance separating the loading of the oranges and the coupling of the engine makes the action fail to achieve its goal. The planner may choose to assume that the necessary properties for the event to hold continue to hold through any event instance it can describe.

2.4 Executing a Plan
The DTPA executes a plan by assuming that there are no interactions between the event types specified in the plan. It can make this assumption because the planner and the planning assistant work in the same domain of discourse. The planner will have considered all known interactions between the events; if there are other interactions, the DTPA will also be unaware of them. The assumption allows the planning assistant to deal with each event in the plan as a separate goal.
The DTPA treats the execution of a plan as a sequence of forced choices. It first tries to make the choice by comparing confidence intervals. If this fails, it abstracts the choice by removing constraints on the actions. If it cannot make a choice at an acceptable level of abstraction, it tries to make a choice using the maximum likelihood estimate. If this fails it chooses at random.
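
The sequence of forced choices in Section 2.4 (confidence intervals, then abstraction, then maximum likelihood, then a random pick) can be sketched in Python as follows. This is an illustration under stated assumptions, not the DTPA's code: the Wilson score interval is an assumed interval method, the plain ratio of successes to trials stands in for the maximum likelihood formula (which the paper gives only in a footnote), and the names wilson_interval, undominated, choose and the (successes, trials) layout are hypothetical. The counts are the ones used in the worked example of Section 2.4.2.

```python
import random
from math import sqrt
from statistics import NormalDist


def wilson_interval(successes, trials, confidence=0.95):
    """Wilson score interval for a proportion (an assumed interval method)."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    p = successes / trials
    denom = 1 + z * z / trials
    centre = (p + z * z / (2 * trials)) / denom
    half = z * sqrt(p * (1 - p) / trials + z * z / (4 * trials * trials)) / denom
    return centre - half, centre + half


def undominated(stats, confidence):
    """Event types whose interval overlaps the one with the highest lower bound."""
    intervals = {e: wilson_interval(s, n, confidence) for e, (s, n) in stats.items()}
    best_lower = max(low for low, _ in intervals.values())
    return [e for e, (_, high) in intervals.items() if high >= best_lower]


def choose(levels, confidence=0.95, rng=random):
    """Pick an event type; levels lists {event_type: (successes, trials)}
    dictionaries for the acceptable abstraction levels, most specific first.
    Returns the chosen event type and the step that decided."""
    # Steps 1 and 2: compare confidence intervals, abstracting when undecided.
    for depth, stats in enumerate(levels):
        candidates = undominated(stats, confidence)
        if len(candidates) == 1:
            return candidates[0], "interval" if depth == 0 else "abstraction"
    # Step 3: maximum likelihood estimate on the most specific set
    # (plain successes / trials, an assumed stand-in for the paper's formula).
    stats = levels[0]
    best = max(s / n for s, n in stats.values())
    top = [e for e, (s, n) in stats.items() if s / n == best]
    if len(top) == 1:
        return top[0], "mle"
    # Step 4: nothing separates the remaining candidates, so choose at random.
    return rng.choice(top), "random"


levels = [
    {"E1-Load(Here(Stuff), Here(Car))": (15, 50),
     "W1-Load(Here(Stuff), Here(Car))": (25, 50)},
    {"E1-Load(Here(Car))": (10, 25), "E1-Load(Here(Stuff))": (5, 25),
     "W1-Load(Here(Car))": (15, 25), "W1-Load(Here(Stuff))": (10, 25)},
    {"E1-Load": (35, 100), "W1-Load": (65, 100)},
]
print(choose(levels))
```

With these counts the intervals overlap at the two more specific levels and separate only at the most abstract one, so the sketch settles on W1-Load through abstraction, as the worked example does.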
2.4.1 Abstraction Hierarchy
The abstraction hierarchy has the set of all event instances at its root. Below the root are more specific sets of events. For example, all actions are event types that have an agent associated with them.
The DTPA chooses events to execute from the set of actions it can cause. This is a specialization of action events. The actions the DTPA can cause are further specialized by adding constraints onto them. For example, loading a car in Avon is a specialization of loading a car anywhere.
Defining event types is one of the most difficult tasks in writing programs for the DTPA. The task is difficult because describing even simple causal models requires many event types. The task is more difficult when one tries to arrange event types into a useful hierarchy.
The task of generating event hierarchies is difficult because the programmer controls the DTPA’s learning process by defining event types. The DTPA can only learn probabilities about the event types, so if its programmer has not added a particular event type, the DTPA will not waste time considering it. This attribute is vitally important because the effectiveness of the DTPA’s choices is constrained by the number of event types it must consider. If it must consider many event types, it will need a great deal of experience before it can say with certainty that one is better than all the others. On the other hand, if it chooses from only a few choices, a few examples may allow it to make a clear choice.
The DTPA uses the event type hierarchy by looking for the most specific event type that might cause the event type requested by the planner. If the plan specifies an event type in the planning assistant’s event hierarchy, it searches for the action event type most likely to cause that event type directly.
The situation is more complicated when the planning assistant’s event hierarchy does not match the planner’s. Here, the planning assistant chooses a random element of the planner’s event type and searches for the most specific event type in its hierarchy that contains the random element. In doing this, the planning assistant assumes that the plan the planner has specified depends on any instance of the event type specified. The random element of the event type chosen will be such an instance and any element of an event type containing the random element will also be such an instance. The properties that characterize the event type specified will also characterize the event type the planner eventually chooses.
Once the planner has found appropriate event types, it assigns a constant utility to each of the event types so selected. It then uses the DTPA’s search capabilities to look for the event type that it can cause that is most likely to cause that event type.
Example:
The following shows a part of the abstraction hierarchy for loading cars. All actions are events and all loadings are actions. At the next level the hierarchy splits between a manager loading and an engineer loading. The event instances in which a manager loads are further split into loading by the different managers.
Event
  Action
    Load
      Manager-Load
        W1-Load
      Engineer-Load
        E1-Load
The abstraction hierarchy is further elaborated by selectively removing preconditions on the execution of the program associated with the action. The hierarchy below shows that W1 can load a car when the car and the stuff to load are available, or it can ignore one or both of the preconditions. Every execution in which the precondition holds is also an instance of an execution in which the precondition may or may not hold.
W1-Load
  W1-Load(Here(Car))
    W1-Load(Here(Stuff), Here(Car))
  W1-Load(Here(Stuff))
    W1-Load(Here(Stuff), Here(Car))
This type of abstraction is problematic. Because event types may appear more than once in the hierarchy (e.g., W1-Load(Here(Stuff), Here(Car))), there may be more event types at the more abstract level than at the more concrete levels. If this occurs, abstracting makes choices harder rather than easier.
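
The shape of this precondition hierarchy can be reproduced with a small Python sketch. The functions abstraction_levels and parents are hypothetical names introduced for illustration. The sketch only enumerates which preconditions are kept at each level; it does not model Rhet event types.

```python
from itertools import combinations


def abstraction_levels(action, preconditions):
    """List event type names level by level, from the most concrete
    (all preconditions kept) to the most abstract (none kept)."""
    levels = []
    for kept in range(len(preconditions), -1, -1):
        level = []
        for subset in combinations(preconditions, kept):
            args = ", ".join(f"Here({p})" for p in subset)
            level.append(f"{action}({args})" if subset else action)
        levels.append(level)
    return levels


def parents(kept_preconditions):
    """Precondition sets one step more abstract: drop exactly one precondition."""
    return [
        tuple(p for p in kept_preconditions if p != dropped)
        for dropped in kept_preconditions
    ]


levels = abstraction_levels("W1-Load", ["Stuff", "Car"])
for level in levels:
    print(level)
print(parents(("Stuff", "Car")))
```

For two preconditions the level sizes are 1, 2 and 1. The fully constrained event type has two parents, which is why it appears twice in the tree above and why one step of abstraction here spreads the same experience over more event types.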
2.4.2 Choosing an Event
The planning assistant causes each event specified in the plan by searching for the action event most likely to cause the specified event and executing the action associated with that event. It first selects the set of event types that might-cause the event specified in the plan. This set can be further restricted by the programmer. From the restricted set, it selects those events that it can cause (i.e., requests for action), then chooses among this set based on the conditional probability of the specified event given those event types by choosing the event type with the highest such conditional probability.
The DTPA first chooses at the .95 confidence level and, if a choice can be made at that level, it is returned. If the choice function returns a set of candidates among which it cannot choose, the planning assistant abstracts the choice by removing some of the preconditions of the choice. This tack might be successful because the DTPA’s set of observations may be spread among a smaller set of event types, making the confidence intervals narrower.
If abstraction does not help it choose based on confidence intervals, it chooses from the most specific set using the maximum likelihood estimate.² If the maximum likelihood estimates for the necessary conditional probabilities are equal, choice is impossible, so the planning assistant chooses randomly.³

² The planning assistant computes a maximum likelihood estimate using a formula in y and n, where y is the number of times the event type under consideration has caused the desired event type and n is the number of times the event type under consideration has been observed. It uses this formula because it gives the lowest score to the event that has failed most often even if none has succeeded.
³ It would be interesting to try choosing using the maximum likelihood estimate at different levels of abstraction before choosing randomly. At present it does not.

Example:
If the DTPA has the following statistics, it cannot decide at the most concrete level of abstraction. The probability intervals it computes are [0.19 0.44] for the concrete action involving E1 and [0.37 0.63] for the action involving W1.
[occ so-far 50 E1-Load(Here(Stuff), Here(Car))]
[occ so-far 15 E1-Load(Here(Stuff), Here(Car)) Loaded]
[occ so-far 50 W1-Load(Here(Stuff), Here(Car))]
[occ so-far 25 W1-Load(Here(Stuff), Here(Car)) Loaded]
At the next level of abstraction the situation becomes worse because the DTPA’s experience is spread among a larger number of event types. The planning assistant has exactly the same experience with E1-Load(Here(Car)) and W1-Load(Here(Stuff)), so here even the maximum likelihood estimate fails to give a clear choice.
[occ so-far 25 E1-Load(Here(Car))]
[occ so-far 10 E1-Load(Here(Car)) Loaded]
[occ so-far 25 E1-Load(Here(Stuff))]
[occ so-far 5 E1-Load(Here(Stuff)) Loaded]
[occ so-far 25 W1-Load(Here(Car))]
[occ so-far 15 W1-Load(Here(Car)) Loaded]
[occ so-far 25 W1-Load(Here(Stuff))]
[occ so-far 10 W1-Load(Here(Stuff)) Loaded]
At the third level of abstraction, we are back to the situation described previously, so we can make a choice. The planning assistant chooses to have the manager load the oranges.
[occ so-far 100 E1-Load]
[occ so-far 35 E1-Load Loaded]
[occ so-far 100 W1-Load]
[occ so-far 65 W1-Load Loaded]

2.4.3 Causing an Event
The DTPA always chooses an action event type that it can cause. Action event types are event types that occur because some agent runs a program. Action event instances the planning assistant can execute include the program that makes them occur. Other action instances represent other agents running programs. For example, the planning assistant itself cannot load oranges, so the planning assistant’s representation of such action event instances has no program associated with it.
Once the planning assistant has chosen an appropriate action event type, it generates a hypothetical random instance of that event type and constrains this newly generated event instance so that all its parameters match the plan. For example, it may specify that this random instance should occur at the time the planner specified. The DTPA often adds further constraints to the event type it chose when its choice was made at an abstract level. Even when a concrete event type is chosen, the DTPA’s event types rarely specify the time the program should start, so it uses the planner’s temporal specification.
The DTPA generates a random instance of an action event type, which, because it is an action event instance the DTPA can cause, has a program associated with it. Because it knows that executing the program associated with the event type causes an instance of that event type, it immediately updates the statistics on that event type. It has observed its own creation of an event instance.
The current DTPA can only execute requests in which it sends a message to an agent. As usual, the event instances generated by requests are collected into event types to gather statistics.
The planning assistant hopes that the right agent will receive the request and act on it appropriately but it does not know this. If the agent does act on the request appropriately, other event instances occur. The planning assistant hopes that one of the event instances that will occur is one of the events specified in the plan. It also hopes that some of the events that will occur are observable so that it can tell that the goal has actually been achieved. The planning assistant maximizes its expectation that its goals will be fulfilled by maximizing the probability that the request will result in the goal event types. It monitors observable event instances to see if the goal does occur.
Once the planning assistant has performed all the tasks associated with one event in the plan, it performs the same process on the next event. It continues processing the events specified in the plan until none are left.

2.5 Monitoring the Execution of a Plan
The planning assistant does not make observations automatically. It can monitor events in one of two ways: it can cause an instance of an event type called an anticipation, or it can track an event type. When the planning assistant chooses an action, it reasons about events that are likely to result from this action. If the possible event is observable, it anticipates that event. It then assumes that when an anticipated event is observed, that event is the result of its actions. Alternatively, the planning assistant can track all occurrences of an event type by performing appropriate actions whenever sufficient evidence for the event type occurs.
The planning assistant anticipates by generating a hypothetical event instance and waiting until it sees a set of conditions that would lead it to assume that such an event instance has occurred. Usually, it cannot directly observe the event instances that represent the success of its actions. When it cannot observe success directly, it collects the set of possible consequences of success and anticipates the subset of these events that are observable. When it makes an observation that matches the observations that would lead it to conclude that an anticipated event instance has occurred, it changes the hypothetical event instance it anticipated into an actual event instance of the same type. It then removes the event instance from the list of anticipations and updates the statistics for the event types of which the event instance is a member. By anticipating, the planning assistant can update the statistics on the event type in which it tries an action and the event type in which the action succeeds.
Once the planning assistant has chosen an appropriate request and has executed the action associated with that request, it collects the set of events that the request might-cause. It then chooses from this set the subset that is observable.
Example:
The Load actions mentioned above are actually
requests to the appropriate agent to load oranges into a railroad car. Therefore, the DTPA
expects three reports from the agent. A subset
of the DTPA’sknowledge about these reports
appears below. The planing assistant needs
new event types to represent the reports being
generated and being received.
Load-Rec
Load-Started
Loaded
Loaded-Rep
Loaded-Rep-Rec
[might-cause W1-Load Load-Rec]
[might-cause Load-Rec Load-Started]
[might-cause Load-Started Loaded]
[might-cause Loaded Loaded-Rep]
[might-cause Loaded-Rep Loaded-Rep-Rec]
[observable Loaded-Rep-Rec]
[Loaded-Rep-Rec(ei) > Loaded(ei)]
Using this knowledge, the planning assistant selects all of the event types that might occur
given an event instance of the type it just
caused, W1-Load. These events are: Load-Rec,
Load-Started, Loaded, Loaded-Rep, and Loaded-Rep-Rec.
It knows only Loaded-Rep-Rec is observable,
so it only anticipates an instance of this event
type. The planning assistant knows that
if it receives a report stating that loading
has been completed, then an instance of
Loaded-Rep-Rec has occurred. From this it can infer
that the loading was successfully completed.
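The selection in this example amounts to following the might-cause links from the action and keeping the observable event types. The sketch below illustrates that reading under stated assumptions: the dictionary encoding and the function names (possible_consequences, observable_consequences) are hypothetical, and might-cause is assumed to be followed transitively.

```python
from collections import deque

# Hypothetical encoding of the [might-cause A B] facts from the example.
MIGHT_CAUSE = {
    "W1-Load": ["Load-Rec"],
    "Load-Rec": ["Load-Started"],
    "Load-Started": ["Loaded"],
    "Loaded": ["Loaded-Rep"],
    "Loaded-Rep": ["Loaded-Rep-Rec"],
}
# Hypothetical encoding of the [observable X] facts.
OBSERVABLE = {"Loaded-Rep-Rec"}


def possible_consequences(event_type, might_cause):
    """Return every event type reachable through might-cause links,
    in the order in which it is discovered (breadth first)."""
    seen, order = set(), []
    queue = deque([event_type])
    while queue:
        current = queue.popleft()
        for effect in might_cause.get(current, []):
            if effect not in seen:
                seen.add(effect)
                order.append(effect)
                queue.append(effect)
    return order


def observable_consequences(event_type, might_cause, observable):
    """Keep only the consequences that the assistant can observe."""
    return [e for e in possible_consequences(event_type, might_cause)
            if e in observable]


consequences = possible_consequences("W1-Load", MIGHT_CAUSE)
to_anticipate = observable_consequences("W1-Load", MIGHT_CAUSE, OBSERVABLE)
```

Here to_anticipate contains only Loaded-Rep-Rec, the single event type the assistant would wait for.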
Once the planning assistant has selected a set of observable events that the agent’s action might cause, it
then chooses the event type that makes such an observation most likely. Each observable event type has an
anticipation action event type. The DTPA chooses the
anticipation event type that is most likely to cause the
observation the same way that it chooses the actions that
cause the events specified directly in the plan. That is,
the planning assistant chooses an appropriate anticipation and executes the program associated with the chosen anticipation event type.
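As a rough illustration of choosing the anticipation event type most likely to cause the observation, the sketch below picks the candidate with the highest observed success frequency. This is an assumption made for the example: the DTPA's actual selection procedure is not reproduced here, the counts are invented, and Anticipate-By-Polling is a hypothetical alternative that does not appear in the paper.

```python
def estimated_success(tries, successes):
    """Observed success frequency; 0.0 when there is no experience yet."""
    return successes / tries if tries else 0.0


def choose_anticipation(candidates):
    """Pick the anticipation event type with the highest estimated success.

    candidates maps an anticipation event type to (tries, successes).
    """
    return max(candidates, key=lambda name: estimated_success(*candidates[name]))


# Invented statistics, for illustration only.
candidates = {
    "Anticipate-By-Polling": (20, 12),    # hypothetical alternative
    "Anticipate-Load-Rep-Rec": (40, 36),
}
best = choose_anticipation(candidates)
```

With these counts the estimates are 0.6 and 0.9, so best is Anticipate-Load-Rep-Rec.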
2.5.1 Observations
The planning assistant monitors the effects of its actions to update its statistics so that it will be able to
make better predictions in the future. Because monitoring the execution of the plan is not specified in the
plan itself, the planning assistant must assign utility to
monitoring. The planning assistant assumes that the
planner will use its services for many subsequent similar
activities and that the best possible answers are desired.
Unfortunately, the planning assistant cannot compute
this utility without knowledge of the future. Instead, it
uses a fixed utility.
The planning assistant also runs a process that monitors its sensory inputs. Each time a sensory input occurs, the DTPA checks each of the event instances it is
anticipating. If the sensory input matches the input expected from an anticipated event instance, it updates the
statistics on all the event types that contain this event
instance and all event types implied by the occurrence of
this event instance. All observable event instances
that were anticipated as a result of the planning assistant's action are then removed from the set of event instances
that are anticipated. In this way the planning assistant
avoids updating the statistics twice for an event instance
that caused more than one observable event instance.
Example:
In the previous example, the planning assistant would have chosen the action event
type Anticipate-Load-Rep-Rec.
The program associated with all instances of this event
type puts a pair consisting of the observation
expected (a report from the agent performing
the action) and an instance of the event type
Loaded-Rep-Rec on the list of anticipations. Using the axiom from the previous example, the planning assistant infers
that a Loaded-Rep-Rec occurred and therefore
that an instance of the Loaded event type occurred. It updates the number of successes for
the W1-Load action event type because an instance of that event type led to the loading.
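The monitoring process and its guard against double counting can be sketched as follows. The names (SensoryMonitor, anticipate, on_input), the string encoding of sensory inputs, and the second observable Car-Full-Seen are all hypothetical; the sketch assumes each anticipation is tagged with the action instance that gave rise to it.

```python
from collections import defaultdict


class SensoryMonitor:
    """Hypothetical monitor that matches sensory inputs to anticipations."""

    def __init__(self, implies):
        self.implies = implies          # event type -> event types it implies
        self.counts = defaultdict(int)  # event type -> observed occurrences
        self.anticipated = []           # (action_id, event_type, expected_input)

    def anticipate(self, action_id, event_type, expected_input):
        self.anticipated.append((action_id, event_type, expected_input))

    def _with_implied(self, event_type):
        # The matched event type plus everything it implies, transitively.
        result, stack = [], [event_type]
        while stack:
            current = stack.pop()
            if current not in result:
                result.append(current)
                stack.extend(self.implies.get(current, []))
        return result

    def on_input(self, sensory_input):
        match = next((a for a in self.anticipated if a[2] == sensory_input), None)
        if match is None:
            return False
        action_id, event_type, _ = match
        for implied in self._with_implied(event_type):
            self.counts[implied] += 1
        # Drop every anticipation stemming from the same action so that a
        # later observation of the same success is not counted again.
        self.anticipated = [a for a in self.anticipated if a[0] != action_id]
        return True


monitor = SensoryMonitor({"Loaded-Rep-Rec": ["Loaded"]})
monitor.anticipate("load-1", "Loaded-Rep-Rec", "report: loading complete")
monitor.anticipate("load-1", "Car-Full-Seen", "camera: car full")  # hypothetical
first = monitor.on_input("report: loading complete")
second = monitor.on_input("camera: car full")
```

The first input updates Loaded-Rep-Rec and the implied Loaded; the second input finds nothing left to match, so the load is counted once.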
2.5.2 Tracking Events
The planning assistant tracks event types by noting
that a certain set of conditions is sufficient cause to assume that an instance of an event type has occurred.
When it sees such a set of conditions, it generates a
random instance of the event type it is tracking and
updates the statistics on the event type. By tracking
event types, the planning assistant can collect information about event types when it does not expect a particular instance of that type.
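Tracking differs from anticipation in that no particular instance is expected in advance. A minimal sketch, assuming that sufficient evidence is a set of conditions that must all hold: the Tracker class, the condition names, and the Train-Wreck event type are hypothetical, and a freshly numbered name stands in for the generated instance.

```python
import itertools
from collections import defaultdict


class Tracker:
    """Hypothetical tracker that counts event types from sufficient evidence."""

    def __init__(self):
        self.sufficient = {}            # event type -> conditions that suffice
        self.counts = defaultdict(int)  # event type -> instances generated
        self.instances = []
        self._ids = itertools.count(1)

    def track(self, event_type, conditions):
        self.sufficient[event_type] = frozenset(conditions)

    def observe(self, conditions):
        # Generate a fresh instance for every tracked event type whose
        # sufficient conditions are all present in the observation.
        seen = set(conditions)
        created = []
        for event_type, needed in self.sufficient.items():
            if needed <= seen:
                instance = f"{event_type}-{next(self._ids)}"
                self.instances.append(instance)
                self.counts[event_type] += 1
                created.append(instance)
        return created


tracker = Tracker()
tracker.track("Train-Wreck", {"derailment-reported", "track-blocked"})
partial = tracker.observe({"derailment-reported"})
full = tracker.observe({"derailment-reported", "track-blocked", "raining"})
```

Only the second observation contains all of the required conditions, so only it produces an instance and updates the count.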
Tracking could also be used for other purposes. For example, the planning assistant could reason that a likely observation resulting
from requesting a particular inept engineer to traverse a
segment of track is that the train will wreck. The planning assistant could anticipate an accident and notify
the planner if it observes the accident. To be able to use
this kind of information, the planner using the DTPA
would have to be able to accept reports that its plan is
in trouble.
3 Conclusion
The planning assistant described above avoids the three
pitfalls of decision theoretic planners. It avoids the first
by relying on the planner to ensure that its choices are
independent. It also arranges the probabilities for which
it has information into a hierarchy allowing it to find
appropriate information quickly.
The planning assistant does not require the person
programming it to know the probability of effects given
actions. Instead, it infers these probabilities from observed effects of the actions. It gathers these observations by monitoring the plans it executes.
Finally, because it assists a planner, it can assume that
the choices it makes are also independent in terms of utility. This allows it to treat each event it tries to achieve
as a simple goal. As Haddawy and Hanks [Haddawy and
Hanks, 1990] show, such simple goals can be captured
by constant utilities.
These solutions are not a panacea. Each one introduces new problems into using decision theory for planning.
Because probabilities are inferred from statistics, the
DTPA places an even higher premium on knowledge of
probability. Only when it has simple choices, few alternatives, and extensive experience
can it choose correctly. It attempts to deal with the lack
of information by arranging information in a hierarchy
and using it as an abstraction hierarchy. A better use
of the statistical information it has would improve its
performance, but it will still need to focus its attention
before applying probability.
Another limitation on the planning assistant's behavior is its reliance on the simple utility structure. The
ability to trade off likely success on a low probability goal
against likely failure on a high probability goal would
greatly improve its performance. The current DTPA
cannot make such tradeoffs because it concentrates on
one plan event at a time. It focuses its knowledge this
way, but it cannot switch tasks if one proves too difficult.
It cannot shift to another part of the plan because presumably all of the steps in the plan are necessary for
completion. One solution to this problem would be to
have the planning assistant give up if it cannot find a
reasonable way to cause an event specified in the plan
and have the planner generate a new plan that does not
contain this event.
References
[Abadi and Halpern, 1989] M. Abadi and J. Y. Halpern,
"Decidability and expressiveness of first-order logics
of probability," Computer Science Technical Report
RJ 7220 (67987), IBM Research, Almaden Research
Center, 1989.
[Allen and Miller, 1991] James Allen and Bradford
Miller, "The RHET System: A Sequence of Self-Guided Tutorials," Computer Science 325, University
of Rochester, 1991.
[Allen et al., 1991] James F. Allen, Henry A. Kautz,
Richard N. Pelavin, and Josh D. Tenenberg, Reasoning About Plans, Morgan Kaufmann Publishing Co.,
San Mateo, CA, 1991.
[Allen and Schubert, 1991] James F. Allen
and Lenhart K. Schubert, "The TRAINS Project,"
Computer Science 91-1, University of Rochester, 1991,
TRAINS Technical Note.
[Cooper, 1987] Gregory F. Cooper, "Probabilistic
Inference Using Belief Networks is NP-Hard," Technical Report KSL-87-27, Stanford University, Stanford,
California 94305, May 1987.
[Fikes and Nilsson, 1971] R. E. Fikes and N. J. Nilsson,
"STRIPS: A new approach to the application of theorem proving to problem solving," Artificial Intelligence, 2:189-205, 1971.
[Haddawy and Hanks, 1990] Peter Haddawy and Steve
Hanks, "Issues in Decision-Theoretic Planning: Symbolic Goals and Numeric Utilities," In Proceedings of
the DARPA Workshop on Innovative Approaches to
Planning, Scheduling and Control, pages 48-58, 1990.
[Martin, 1993] Nathaniel G. Martin, Using Statistical
Inference to Plan Under Uncertainty, PhD thesis,
University of Rochester, Computer Science Department, Rochester, NY 14627, 1993.
[Pearl, 1988] Judea Pearl, Probabilistic Reasoning In
Intelligent Systems: Networks of Plausible Inference,
Morgan Kaufmann Publishers, Inc, San Mateo, CA,
1988.
[Wellman, 1990] Michael P. Wellman, "The STRIPS assumption for planning under uncertainty," In American Association of Artificial Intelligence National
Conference 1990, pages 198-203, 1990.
[Wellman and Doyle, 1992] Michael P. Wellman and
Jon Doyle, "Modular Utility Representation for
Decision-Theoretic Planning," In Proceedings of the
First International Conference on AI Planning Systems, pages 236-242, 1992.