From: AAAI Technical Report WS-97-08. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.
Handling Contingency Selection Using Goal Values

Martha E. Pollack
Department of Computer Science and Intelligent Systems Program
University of Pittsburgh
Pittsburgh, PA 15260
pollack@cs.pitt.edu

Nilufer Onder
Department of Computer Science
University of Pittsburgh
Pittsburgh, PA 15260
nilufer@cs.pitt.edu
Abstract

A key question in conditional planning is: how many, and which, of the possible execution failures should be planned for? One cannot, in general, plan for all the failures that can be anticipated: there are simply too many. But neither can one ignore all the possible failures, or one will fail to produce sufficiently flexible plans. We describe a planning system that attempts to identify the contingencies that contribute the most to a plan's overall value. Plan generation proceeds by extending the plan to include actions that will be taken in case the identified contingencies fail, iterating until either a given expected value threshold is reached or planning time is exhausted.
Introduction

Classical AI plan generation systems assume static environments and omniscient agents, and thus ignore the possibility that events may occur in unexpected ways, i.e., that contingencies might arise during plan execution. A problem with classical planners is, of course, that things do not always go "according to plan." In contrast, universal planning systems and more recent MDP-based systems make no such assumption. They produce "plans" or "policies" that are functions from states to actions. However, the state space can be enormous, making it difficult or impossible to generate complete policies or truly universal plans.1

Conditional planners take the middle road. They allow for conditional actions with multiple possible outcomes and for sensing actions that allow agents to determine the current state (Blythe & Veloso 1997; Draper, Hanks, & Weld 1994; Etzioni et al. 1992; Goldman & Boddy 1994; Peot & Smith 1992; Pryor & Collins 1993). A key question in conditional planning is: how many, and which, of the possible execution failures should be planned for?

1 Thus Dean et al. (1995) describe an algorithm to construct an initial policy for a restricted set of states, and incrementally increase the set of states covered.
In this paper, we describe Mahinur, a probabilistic partial-order planner that supports conditional planning with contingency selection based on probability of failure and goal values. We present an iterative refinement planning algorithm that attempts to identify the influence each contingency would have on the outcome of plan execution, and then gives priority to the contingencies whose failure would have the greatest negative impact. The algorithm first finds a skeletal plan to achieve the goals, and then during each iteration selects a contingency whose failure will have maximal disutility. It then extends the plan to include actions to take in case the selected contingency fails. Iterations proceed until the expected value of the plan exceeds some specified threshold or planning time is exhausted.
Planning with Contingency Selection
We start with a basic idea: a plan is composed of many steps that produce effects to support the goals and subgoals, but not all of the effects contribute equally to the overall success of the plan. Of course, if plan success is binary (plans either succeed or fail), then in some sense all the effects contribute equally, since the failure to achieve any one results in the failure of the whole plan. However, as has been noted in the literature on decision-theoretic planning, plan success is not binary. In this paper, we focus on the fact that the goals that a plan is intended to achieve may be decomposable into subgoals, each of which has some associated value.

For example, consider a car in a service station and a plan involving two goals: installing the fallen front reflector and repairing the brakes. The value of achieving the former goal may be significantly less than the value of achieving the latter. Consequently, effects that support only the former goal (e.g., getting the reflector from the supplier) contribute less to the overall success of the plan than the effects that support only the latter (e.g., getting the calipers). Effects that support both
Figure 1: The planning algorithm.
1. Skeletal plan: Construct a skeletal plan.
2. Plan refinement: While the plan is below the given expected value threshold:
   2a. Select the contingency with the highest expected disutility.
   2b. Extend the plan to include actions that will be taken in case the contingency fails.

Figure 2: Two example actions.
PAINT: {<{-PR}, 0.95, {PA, -BL}>, <{-PR}, 0.05, {}>, <{PR}, 1, {}>}
INSPECT: {<\BL\, {-BL}, 1, {}, /-FL/>, <\BL\, {BL}, 0.1, {}, /-FL/>, <\BL\, {BL}, 0.9, {}, /FL/>}
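The loop in Fig. 1 can be sketched as follows. This is a sketch, not Mahinur's actual interface: `skeletal_plan`, `contingencies`, `expected_disutility`, `add_branch`, and `expected_value` are hypothetical hook functions standing in for the planner's machinery.

```python
import time

def refine(problem, threshold, deadline,
           skeletal_plan, contingencies, expected_disutility,
           add_branch, expected_value):
    """Iterative refinement over contingencies (hypothetical hooks)."""
    plan = skeletal_plan(problem)                 # step 1: skeletal plan
    while expected_value(plan) < threshold and time.time() < deadline:
        pending = contingencies(plan)             # contingencies not yet planned for
        if not pending:
            break
        # step 2a: pick the contingency whose failure hurts the most
        worst = max(pending, key=lambda c: expected_disutility(plan, c))
        plan = add_branch(plan, worst)            # step 2b: extend for its failure
    return plan
```

A toy run with a plan modeled as the set of handled contingencies shows the highest-disutility contingency being handled first.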
goals (e.g., knowing the phone number of the supplier) will have the greatest importance. We will say that an effect e supports a goal g if there is some path through the plan starting at e and ending at g.

The high-level specification of a planning algorithm based on this idea is shown in Fig. 1. It first builds a skeletal plan, which is one in which every subgoal is supported minimally, i.e., by exactly one causal or observation link. In our current implementation of the Mahinur system, the skeletal plan is constructed using Buridan with the restriction that each subgoal is supported by exactly one link.
This algorithm depends crucially on the notions of contingencies and their expected values. We define contingencies to be subsets of outcomes of probabilistic actions. For instance, if we repair the brakes, we may or may not succeed in having the brakes functioning properly (perhaps the new calipers are defective). Having the brakes functioning correctly is one contingent outcome of the repair action; not having them function is another. A deterministic action is simply one with a single (contingent) outcome.

The importance of planning for the failure of any particular contingency can then be seen to be a factor of two things: the probability that the contingency will fail, and the degree to which the failure will affect the overall success of the plan. To compute the former, one needs to know both the probability that the action generating the particular contingency will be executed, and the conditional probability of the contingency given the action's occurrence. Computing the latter is more difficult. In this paper, however, we make two strong simplifying assumptions. First, we assume that the top-level goals for any planning problem can be decomposed into utility-independent2 subgoals having fixed scalar values. Second, we assume that all failures are equally difficult to recover from at execution time.

2 The utility of one subgoal does not depend on the degree to which other goals are satisfied (Haddawy & Hanks 1993).

With these two assumptions, we can compute the
expected disutility of a contingency's failure: it is the probability of its failure, multiplied by the sum of the values of all the top-level goals it supports. Planning for the failure of a contingency c means constructing a plan that does not depend on the effects produced by c, i.e., a plan that will succeed even if c fails to occur.
Action Representation

To formalize these notions, we build on the representation for probabilistic actions developed in (Kushmerick, Hanks, & Weld 1995, p. 247). Causal actions are those that alter the state of the world. For instance, the PAINT action depicted both graphically and textually in Fig. 2 has the effects of having the part painted (PA) and blemishes removed (-BL) 95% of the time when the part has not been processed (-PR).

Definition 1 A causal action N is defined by a set of causal consequences:

N: {<t1, p1,1, e1,1>, ..., <t1, p1,m, e1,m>, ..., <tn, pn,1, en,1>, ..., <tn, pn,m, en,m>}.

For each i, j: ti is an expression called the consequence's trigger, and ei,j is a set of literals called the effects. pi,j denotes the probability that the effects in ei,j will be produced given that the literals in ti hold. The triggers are mutually exclusive and exhaustive. For each effect e of an action, RESULT(e, st) is a new state that has the negated propositions in e deleted from, and the non-negated propositions added to, st.
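Definition 1 and the RESULT function can be illustrated with a small sketch. The data layout and the `result` helper are our own illustration, not Mahinur's data structures; the PAINT consequences follow Fig. 2.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Consequence:
    trigger: frozenset    # literals t_i that must hold (triggers are mutually exclusive)
    prob: float           # p_{i,j}
    effects: frozenset    # literals e_{i,j} produced

# PAINT as in Fig. 2: when -PR holds, paints and removes blemishes 95% of the time.
PAINT = [
    Consequence(frozenset({"-PR"}), 0.95, frozenset({"PA", "-BL"})),
    Consequence(frozenset({"-PR"}), 0.05, frozenset()),
    Consequence(frozenset({"PR"}),  1.0,  frozenset()),
]

def result(effects, state):
    """RESULT(e, st): delete the negated propositions, add the rest."""
    new = set(state)
    for lit in effects:
        if lit.startswith("-"):
            new.discard(lit[1:])
        else:
            new.add(lit)
    return frozenset(new)
```

Applying PAINT's first consequence to a blemished, flawed part removes BL and adds PA while leaving FL untouched.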
Conditional plans also require observation actions so that the executing agent will know which contingencies have occurred, and hence, which actions to perform. Observational consequences differ from causal consequences in that they record two additional pieces of information: the label (in //) shows which proposition's value is being reported on, and the subject (in \\) shows what the sensor looks at. The subject is needed when an aspect of the world is not directly observable and the sensor instead observes another (correlated) aspect of the world. For instance, a robot INSPECTing a part might look at whether it is blemished (BL) to report whether it is flawed (/FL/) (Fig. 2).

Figure 3: Multiple outcomes.

Note that the label and subject are not arbitrary strings; they are truth-functional propositions.
Definition 2 An observation action N is a set of observational consequences:

N: {<sbj1, t1, p1,1, e1,1, l1,1>, ..., <sbj1, t1, p1,m, e1,m, l1,m>, ..., <sbjn, tn, pn,1, en,1, ln,1>, ..., <sbjn, tn, pn,m, en,m, ln,m>}.

For each i, j: sbji is the subject of the sensor reading, ti is the trigger, ei,j is the set of any effects the action has, and li,j is the label that shows the sensor report. pi,j is the probability of obtaining the effects and the sensor report. The subjects and triggers are mutually exclusive and exhaustive. The subject and trigger propositions cannot be identical, although the subject and label propositions can be.

Figure 4: Observation links.

This observation action representation is similar to C-Buridan's (Draper, Hanks, & Weld 1994). However, C-Buridan labels are arbitrary strings that do not relate to propositions, and it does not distinguish the subject of the sensor reading.
Contingencies
An important issue in the generation of probabilistic
plans is the identification of equivalent consequences.
Suppose that an action A is inserted to support a
proposition p. All of the outcomes of A that produce p
as an effect can then be treated as an equivalent contingent outcome, or contingency. The contingencies of
a step are relative to the step’s intended use. If the
operator in Fig. 3 is used to support p, then {a,b} is
its single contingency, but if it is used to support q,
then {a} and {b} are alternative contingencies.
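The use-relative grouping of outcomes into contingencies can be sketched as follows; the function and the outcome encoding are our own illustration, not the paper's implementation. Outcomes that produce the supported proposition merge into a single contingency, while the remaining outcomes stay separate alternatives.

```python
def contingencies_for(outcomes, proposition):
    """Group an action's outcomes into contingencies relative to the
    proposition the action is being used to support (a sketch).
    Each outcome is a dict with an 'effects' set of propositions."""
    producing = [o for o in outcomes if proposition in o["effects"]]
    others = [[o] for o in outcomes if proposition not in o["effects"]]
    return ([producing] if producing else []) + others
```

For the operator of Fig. 3, where outcomes a and b both produce p but only a produces q, supporting p yields one contingency {a, b}, while supporting q yields the alternatives {a} and {b}.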
The Plan Structure

We define a planning problem in the usual way, as a set of initial conditions encoded as a start step (START: {<{}, p1, e1>, ..., <{}, pn, en>}), a goal (GOAL: {<{g1, ..., gz}, 1, {}>}), and an action library. We assume that each top-level goal gi has value val(gi). A plan in our framework has two sets of steps and links (causal and observational), along with binding and ordering constraints.

Definition 3 A partially ordered plan is a 6-tuple <Tc, To, O, Lc, Lo, S>, where Tc and To are causal and observation steps, O is a set of temporal ordering constraints, Lc and Lo are sets of causal and observation links, and S is a set of open subgoals.
Causal links record the causal connections among actions, while observational links specify which steps will be taken depending on the report provided by an observation step. It is important to note that both types of links emanate from contingencies rather than individual outcomes. For instance, in Fig. 4, SHIP will be executed if INSPECT reports /-FL/, and REJECT will be executed otherwise.

Definition 4 A causal link is a 5-tuple <Si, c, p, ck, Sj>. Contingency c of step Si is the link's producer, contingency ck of step Sj is the link's consumer, and p is the supported proposition (Si, Sj ∈ Tc ∪ To).

Definition 5 An observation link is a 4-tuple <Si, c, l, Sj>. Contingency c is the collection of consequences of Si ∈ To that report l. An observation link means that Sj ∈ Tc ∪ To will be executed only when the sensor report provided by Si is l.
Often the planner needs to reason about all the steps that are subsequent to a particular observation. We therefore define a composite step, which is a portion of the plan graph rooted at an observation action O. Each composite step is factored into branches, one branch for each contingency associated with O. For example, the composite step (O: <{-BL, 1}, {BL, 0.1}> <S1, S2>, <{BL, 0.9}> <S3>) indicates that after observation action O, steps S1 and S2 will be executed subsequent to one observation (which always occurs when BL is false, and occurs 10% of the time when BL is true), while step S3 will be executed subsequent to the alternative observation (which occurs 90% of the time when BL is true).

Definition 6 A composite step includes an observation action and the entire plan graph below it:

(O: <{t1,1, p1,1}, ..., {t1,l, p1,l}> <S1,1, ..., S1,m>, ..., <{tn,1, pn,1}, ..., {tn,l, pn,l}> <Sn,1, ..., Sn,m>).

<Si,1, ..., Si,m>, called branch i, includes the steps that will be executed if contingency i of the observation step is realized. <{ti,1, pi,1}, ..., {ti,l, pi,l}> denotes the condition for contingency i.

Although composite steps are an important part of our overall framework, for brevity in this paper we provide only base definitions, not involving composite steps. (See (Onder & Pollack 1996) for the general definitions.)
Expected Value
The main idea in computing the expected value of a contingency c is to find out which top-level goal(s) will fail if the step fails to produce c. For example, suppose that c supports top-level goal g1 in two ways: along one path with probability 0.7, and along another with probability 0.8. If c fails, the most that will be lost is the support for g1 with probability 0.8, and thus the expected value of c is 0.8 × val(g1). Therefore, while propagating the values to a contingency, we take the maximum support it provides for each top-level goal.
Definition 7 Expected value of a contingency: Suppose that contingency c has outgoing causal links to consumer contingencies c1, ..., cn and possibly an outgoing observation link to step S. Assume that ki is the probability that ci supports goal g, and kn+1 is the probability that S supports g. Then, the expected value of c with respect to g is:

evg(c, g) = val(g), if c directly supports g; max(k1, ..., kn+1) × val(g), otherwise.

For z top-level goals, the expected value of c is

EV(c) = Σi=1..z evg(c, gi).
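Definition 7 can be written out directly. The accessor functions here (`val`, `direct_support`, `consumer_probs`) are hypothetical stand-ins for the planner's bookkeeping, not Mahinur's API.

```python
def evg(c, g, val, direct_support, consumer_probs):
    """Expected value of contingency c w.r.t. goal g (Definition 7).
    consumer_probs(c, g) returns the probabilities k_1..k_{n+1} with
    which c's consumer contingencies (and any observation successor)
    support g."""
    if direct_support(c, g):
        return val(g)
    ks = consumer_probs(c, g)
    return max(ks, default=0.0) * val(g)

def ev(c, goals, val, direct_support, consumer_probs):
    """EV(c): sum of evg(c, g) over the top-level goals."""
    return sum(evg(c, g, val, direct_support, consumer_probs) for g in goals)
```

With the paper's example (c supports g1 along two paths, with probabilities 0.7 and 0.8, and val(g1) = 100), the maximum is taken and EV(c) comes out to 0.8 × 100 = 80.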
We next consider the computation of the expected disutility of a contingency's failure. To begin, we need to compute the probability that a given state will occur after the execution of a sequence of actions.

Definition 8 The probability of a state after a (possibly null) sequence of steps is executed:

P[st'|st, <>] = 1, if st' = st; 0, otherwise.

P[st'|st, <S>] = Σ<ti, pi,j, ei,j> ∈ S pi,j P[ti|st] P[st'|RESULT(ei,j, st)]

P[st'|st, <S1, ..., Sn>] = Σu P[u|st, <S1>] P[st'|u, <S2, ..., Sn>],

where u ranges over all possible states and S is a causal step.3 The action sequence is a total ordering consistent with the partially ordered plan.

Figure 5: Inserting a new observation action.
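The recursion of Definition 8 can be sketched directly; the encoding of steps as lists of (trigger, probability, effects) triples and the helper names are our own illustration. P[t|st] is 1 iff the trigger holds in the state, matching footnote 3.

```python
def apply_effects(effects, state):
    """RESULT(e, st): delete negated propositions, add the rest."""
    new = set(state)
    for lit in effects:
        (new.discard if lit.startswith("-") else new.add)(lit.lstrip("-"))
    return frozenset(new)

def state_prob(target, state, steps):
    """P[st' | st, <S1,...,Sn>] from Definition 8. Each step is a list
    of (trigger, prob, effects) consequences."""
    if not steps:
        return 1.0 if target == state else 0.0
    first, rest = steps[0], steps[1:]
    total = 0.0
    for trigger, prob, effects in first:
        if trigger <= state:                 # P[t_i | st] = 1 iff t_i ⊆ st
            nxt = apply_effects(effects, state)
            total += prob * state_prob(target, nxt, rest)
    return total
```

For a PAINT-like step that produces PA with probability 0.95, one application reaches {PA} with probability 0.95 and two applications with probability 0.95 + 0.05 × 0.95 = 0.9975.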
The expected disutility of a contingency c is the product of the probability that the action generating c will be executed, the conditional probability of c given the action's occurrence, and the value of c.

Definition 9 Expected disutility of the failure of a contingency: Let S be a step and c be the ith contingency of S. If the condition for c is <{ti,1, pi,1}, ..., {ti,j, pi,j}>, then the expected disutility of c is defined as

P[S is executed] × (1 − Σk=1..j pi,k P[ti,k | <S1, ..., Si−1>]) × EV(c).
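Definition 9 is then a one-line computation. The argument names are hypothetical stand-ins: `p_exec` for P[S is executed], `condition` for the trigger/probability pairs of the contingency, and `trigger_probs` for the state probabilities obtained via Definition 8.

```python
def expected_disutility(p_exec, condition, trigger_probs, ev_c):
    """Expected disutility of a contingency's failure (Definition 9).
    condition is a list of (t_{i,k}, p_{i,k}) pairs; trigger_probs(t)
    gives P[t | <S1,...,S_{i-1}>]; ev_c is EV(c) from Definition 7."""
    p_occurs = sum(p * trigger_probs(t) for t, p in condition)
    return p_exec * (1.0 - p_occurs) * ev_c
```

For a contingency that occurs with probability 0.7 whenever its (always-satisfied) trigger holds, supporting goals worth 100, the expected disutility is 1 × (1 − 0.7) × 100 = 30, matching the 0.3 × 100 figures in the example section.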
The Planning Algorithm

We can now see how the algorithm in Fig. 1 works. After forming a skeletal plan, it selects a contingency c whose failure has maximal expected disutility. Suppose that c has an outgoing causal link <Si, c, p, ck, Sj> (Fig. 5, left). Because the proposition supported by c is p, the planner selects an observation action, So, that reports the status of p and inserts it ordered to come after Si. Two contingencies for So are formed: c1 contains all of So's outcomes that produce the label /p/, and c2 contains all the other outcomes. The causal link <Si, c, p, ck, Sj> is removed and an observation link <So, c1, /p/, Sj> is inserted to denote that Sj will be executed when the sensor report is /p/. In addition, a causal link <Si, c, sbj, c1, So> is inserted (Fig. 5, right). This link prevents the insertion of any action that would perturb the correlation between the propositions coded in the subject and the label.

Conditional planning then proceeds in the CNLP style: the goal is duplicated and labeled so that it cannot receive support from the steps that depend on contingency c1 of the observation step.

Plan iteration stops when a given expected value threshold is reached or planning time is exhausted.
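The link surgery described above can be sketched as follows. This is our own illustration under simplifying assumptions, not Mahinur's code: a plan is a dict of link lists, `make_observer` is a hypothetical factory producing an observation step that reports p while looking at some subject, and only the /p/-reporting contingency c1 is wired up.

```python
def insert_observation(plan, link, make_observer):
    """Replace causal link <Si, c, p, ck, Sj> with an observation step
    So: Sj now runs on report /p/, and a new causal link from Si to So
    protects the subject/label correlation."""
    si, c, p, ck, sj = link
    so = make_observer(p)                      # reports /p/, looks at so["subject"]
    causal = [l for l in plan["causal"] if l != link]
    causal.append((si, c, so["subject"], "c1", so["name"]))
    obs = plan["obs"] + [(so["name"], "c1", p, sj)]
    return {"causal": causal, "obs": obs}
```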
3 The probability of an expression e with respect to a state st is defined as: P[e|st] = 1, if e ⊆ st; 0, otherwise.
Definition 10 The expected value of a totally ordered plan <S1, ..., Sn> is the sum, over the z goal propositions, of the probability that each goal proposition will be true after Sn is executed, multiplied by the value of the goal:

Σi=1..z Σu P[gi|u] × P[u | <S1, ..., Sn>] × val(gi),

where u ranges over all possible states. We currently use an arbitrary total ordering of the plan for this computation.
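Definition 10 can be sketched as a sum over final states; the tabulated `state_probs` input (mapping each reachable final state to P[u | <S1,...,Sn>], as computed by Definition 8) is our own simplification, and P[g|u] is taken to be 1 iff g holds in u.

```python
def plan_expected_value(state_probs, goals, val):
    """Expected value of a totally ordered plan (Definition 10 sketch).
    state_probs: {final_state: P[u | <S1,...,Sn>]}."""
    return sum(
        p_u * val(g)
        for u, p_u in state_probs.items()
        for g in goals
        if g in u
    )
```

With the example section's values (PR worth 100, PA worth 560) and an illustrative final-state distribution of 0.6 for {PR, PA}, 0.2 for {PR}, and 0.2 for {}, the plan's expected value is 0.6 × 660 + 0.2 × 100 = 416.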
Example

The following example is based on one solved by C-Buridan. The goal is to process (PR) and paint (PA) parts. Initially, 30% of the parts are flawed (FL) and blemished (BL); the rest are in good condition. The "correct" way of processing a part is to reject it if it is flawed, and to ship it otherwise. We assign 100 as the value of processing a part, and 560 as the value of painting. The action library contains the SHIP, REJECT, INSPECT, and PAINT actions of Draper et al. (1994).

The Mahinur planning system starts by constructing a skeletal plan, and nondeterministically chooses the SHIP action to process the part. Two contingencies of SHIP are constructed (shown in Fig. 6 in dashed boxes): the first one produces PR, and the second does not. A causal link that emanates from the first contingency is established for PR.

The triggers for the first contingency are adopted as subgoals (-PR, -FL), and both are supported by START. For START, two different sets of contingencies are constructed: one with respect to -PR, and one with respect to -FL. When the PAINT action is inserted to support the PA goal, and the -PR trigger of its first contingency is supported by START, the skeletal plan is complete.

The skeletal plan contains three contingencies whose disutilities of failure must be calculated (the contingency for -PR of START has no alternatives). The expected disutilities for the first contingencies of START, PAINT, and SHIP are 0.3 × 100 = 30 (for -FL), 0.05 × 560 = 28 (for PA), and 0.3 × 100 = 30 (for PR), respectively. Mahinur chooses to handle the first contingency with the highest expected disutility, namely the first contingency of START. Because -FL is the proposition supported by that contingency, an action that inspects the value of FL is inserted before SHIP (Fig. 7). The causal link supporting -FL is removed and an observation link from the /-FL/ report is inserted. The INSPECT step looks at BL to report about FL. Therefore, the first contingency of START is linked to INSPECT to ensure that an action that alters the value of BL cannot be inserted between START and INSPECT (BL and FL are correlated).
Figure 6: The skeletal plan.
The new branch is then completed. For the duplicated goal, a new REJECT action is inserted to support PR, and the existing PAINT step is used to support PA, yielding the plan shown in Fig. 7. If the success threshold has not been met, Mahinur will plan for additional contingencies. Note that existing conditional planners can solve this problem, but the plan generated will be the same regardless of the values assigned to the top-level goals. Within our framework, the planner is sensitive to the values assigned to each goal and is able to focus on different parts of the plan based on the disutility of contingencies.
Mahinur was implemented in Common Lisp on a PC running Linux. We conducted preliminary experiments on two sets of problems. In the first set, we used synthetic problems which yielded plans containing 1, 2, or 3 possible sources of failure. Table 1 shows the total run time for solving each problem and the amount of time used for disutility calculation. As expected, run time increases exponentially with the size of the problem;4 disutility calculation, however, is insignificant.

For the second set of experiments, we implemented the C-Buridan algorithm. C-Buridan constructs branches for alternative contingencies in a somewhat indirect fashion. When it discovers that there are two incompatible actions, say, A1 and A2, each of which can achieve some condition C, it introduces an observation action O with two contingent outcomes, and then "splits" the plan into two branches, associating each outcome with one of the actions. We used the same support functions for conditional planning (context propagation, checking context compatibility), the
same ranking function, and the same base planning system. To demonstrate the advantage of directly planning for contingencies, we used a domain which contains a single PAINT operator. Painting a part repeatedly increases the probability of success by 0.5^n, where n is the number of repetitions. It is an error to paint an already painted part; thus the plan has to contain an observe action to check whether the part has been painted, and repeat the PAINT step only if it has not been successful previously. Table 2 compares the run times for C-Buridan and Mahinur as the probability threshold is increased. Mahinur expands a significantly smaller number of nodes, because the planning process concentrates on planning for contingencies, as opposed to improving the support for the (sub)goals.

Figure 7: The complete plan.

Table 1: Planning time and disutility calculation time for iterations on three problems.

#   1 poss. fail.      2 poss. fail.      3 poss. fail.
    Run      Disu.     Run       Disu.    Run       Disu.
0   0.01     0.01      0.03      0.01     0.08      0.01
1   0.14     0.01      0.56      0.01     3.92      0.02
2   0.42     0.01      11.28     0.02     335.21    0.03
3   1.67     0.02      229.22    0.03     4368.43   0.05
4   9.11     0.03      3558.87   0.04

Table 2: Number of nodes visited as the probability threshold is increased.

Probability   C-Buridan   Mahinur
0.75          22          27
0.875         1317        106
0.9375        *           336
0.96875       *           566

4 The problems marked with a * terminated with a LISP memory error.

Summary and Related Work

We presented an approach to decision-theoretic plan refinement that directly reasons about the value of planning for the failure of alternative contingencies. To do this, we adopted a different strategy toward conditional planning than that taken in some earlier systems, notably C-Buridan, which does not reason about whether contingent outcomes are worth planning for. The PLINTH system (Goldman & Boddy 1994) and the Weaver system (Blythe & Veloso 1997) skip certain contexts during the construction of probabilistic plans. Our approach differs from theirs in two respects. First, we focus on partial-order planning, where they use a total-order approach. Second, we consider the impact of failure in addition to the probability of failure.

Other relevant work includes the recent decision-theoretic planning literature, e.g., the PYRRHUS system (Williamson & Hanks 1994), which prunes the search space by using domain-specific heuristic knowledge, and the DRIPS system (Haddawy & Suwandi 1994), which uses an abstraction hierarchy. Kambhampati (1994) describes planning algorithms with multi-contributor causal links. Our algorithm maintains multiple contributors at the action-outcome level in a probabilistic setting.
Acknowledgments

This work has been supported by a scholarship from the Scientific and Technical Research Council of Turkey, by the Air Force Office of Scientific Research (Contract F49620-92-J-0422), by Rome Laboratory and the Defense Advanced Research Projects Agency (Contract F30602-93-C-0038), and by an NSF Young Investigator's Award (IRI-9258392).
References

Blythe, J., and Veloso, M. 1997. Analogical replay for efficient conditional planning. In Proceedings of the 14th National Conference on Artificial Intelligence.

Dean, T.; Kaelbling, L. P.; Kirman, J.; and Nicholson, A. 1995. Planning under time constraints in stochastic domains. Artificial Intelligence 76:35-74.

Draper, D.; Hanks, S.; and Weld, D. 1994. Probabilistic planning with information gathering and contingent execution. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems, 31-36.

Etzioni, O.; Hanks, S.; Weld, D.; Draper, D.; Lesh, N.; and Williamson, M. 1992. An approach to planning with incomplete information. In Proceedings of the Third International Conference on Principles of Knowledge Representation and Reasoning, 115-125.

Goldman, R. P., and Boddy, M. S. 1994. Epsilon-safe planning. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, 253-261.

Haddawy, P., and Hanks, S. 1993. Utility models for goal-directed decision-theoretic planners. Technical Report 93-06-04, Department of Computer Science and Engineering, University of Washington.

Haddawy, P., and Suwandi, M. 1994. Decision-theoretic refinement planning using inheritance abstraction. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems, 266-271.

Kambhampati, S. 1994. Multi-contributor causal structures for planning: a formalization and evaluation. Artificial Intelligence 69:235-278.

Kushmerick, N.; Hanks, S.; and Weld, D. S. 1995. An algorithm for probabilistic planning. Artificial Intelligence 76:239-286.

Onder, N., and Pollack, M. E. 1996. Contingency selection in plan generation. In 1996 AAAI Fall Symposium on Plan Execution: Problems and Issues, 102-108.

Peot, M. A., and Smith, D. E. 1992. Conditional nonlinear planning. In Proceedings of the 1st International Conference on Artificial Intelligence Planning Systems, 189-197.

Pryor, L., and Collins, G. 1993. Cassandra: Planning for contingencies. Technical Report 41, The Institute for the Learning Sciences, Northwestern University.

Williamson, M., and Hanks, S. 1994. Optimal planning with a goal-directed utility model. In Proceedings of the 2nd International Conference on Artificial Intelligence Planning Systems, 176-181.