Execution Cost Optimization for Hierarchical
Planning in the Now
by
Dylan Hadfield-Menell
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2013
© Massachusetts Institute of Technology 2013. All rights reserved.
Author
Department of Electrical Engineering and Computer Science
May 23, 2013
Certified by
Leslie Pack Kaelbling
Professor
Thesis Supervisor
Certified by
Tomás Lozano-Pérez
Professor
Thesis Supervisor
Accepted by
Dennis M. Freeman
Chairman, Department Committee on Graduate Theses
Execution Cost Optimization for Hierarchical Planning in
the Now
by
Dylan Hadfield-Menell
Submitted to the Department of Electrical Engineering and Computer Science
on May 23, 2013, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Engineering
Abstract
For robots to effectively interact with the real world, they will need to perform complex tasks over long time horizons. This is a daunting challenge, but human ability to
routinely solve these problems leads us to believe that there is underlying structure
we can leverage to find solutions. Recent advances using hierarchical planning [19]
have been able to solve these problems by breaking a single long-horizon problem into
several short-horizon problems. While this approach is able to effectively solve real
world robotics planning problems, it makes no effort to account for the execution cost
of an abstract plan and often arrives at poor quality plans. In this thesis, we analyze
situations that lead to execution cost inefficiencies in hierarchical planners. We argue
that standard optimization techniques from flat planning or search are likely to be
ineffective in addressing these issues. We outline an algorithm, RCHPN, that improves
a hierarchical plan by considering peephole optimizations during execution. We frame
the underlying question as one of evaluating the resource needs of an abstract operator and propose a general way to approach estimating them. We introduce the
marsupial logistics domain to study the effectiveness of this approach. We present
experiments on large problem instances from marsupial logistics and observe up to a
30% reduction in execution cost when compared with a standard hierarchical planner.
Thesis Supervisor: Leslie Pack Kaelbling
Title: Professor
Thesis Supervisor: Tomás Lozano-Pérez
Title: Professor
Acknowledgments
First and foremost, I would like to thank my advisors, Leslie Kaelbling and Tomás
Lozano-Pérez. Taking 6.01 with them in my freshman year spring inspired me to
choose computer science as my major and their insights, encouragement, and prodding
have been indispensable in writing this thesis. I would like to thank the members of
the LIS lab for helping me get started with research and providing a great environment
to learn how to present. Finally, I'd like to acknowledge my parents who have been
helpful and supportive throughout this process and thank my friends for providing
welcome distractions when they were needed.
Contents
1 Introduction
2 Background
2.1 Domain representation
2.2 Abstraction in Planning
2.3 Hierarchical Planning in the Now
3 Optimizing Hierarchical Planning
3.1 Execution cost inefficiencies in hierarchical planning
3.1.1 Incorrect Ordering of Abstract Operators
3.1.2 Missed Parallel Structure
3.2 Optimization in the now
3.2.1 Context-sensitive ordering
3.2.2 Leveraging pairwise ordering information
3.3 Ordering-preference heuristics
4 Related Work
4.1 Symbolic planning
4.2 Hierarchical planning
4.3 Partial orders and planning
5 Evaluation & Experiments
5.1 Transportation domain with marsupial robots
5.1.1 Fluent specification
5.1.2 Operator specification
5.2 Experiments and results
5.3 Learning ordering and combining rules
5.3.1 Learning Experiments
6 Conclusion and Future Directions
6.1 Avenues for future research
A PDDL for Marsupial Logistics
A.1 Domain
A.2 Problem instance
A.3 FF output
List of Figures
3-1 Caricature of situations in which incorrectly ordering abstract tasks results in poor execution cost.
3-2 Illustration of situation where combining subgoals can reduce execution cost.
5-1 Visualization of the Marsupial Logistics Domain.
5-2 Example planning tree for marsupial logistics.
5-3 Average percent decrease in plan cost vs. problem size for RCHPN vs. HPN.
5-4 Plot of percent decrease in execution cost vs. percent increase in planning time.
Chapter 1
Introduction
A longstanding goal of robotics research is the development of machines that can
accomplish complex tasks in unstructured real-world settings. This is inspired by a
desire to build robots that can perform household tasks, assist in hospitals, or take
part in search and rescue operations.
Since the 1960s there have been significant advances in many of the component
modules for these robots. The release of the Kinect RGBD sensor has enabled cheap
high-quality perception. Hardware advances embodied in the Willow Garage PR2
robot, combined with improved motion planning methods, are beginning to enable
complex and interesting manipulation [4]. State estimation techniques and probabilistic
methods have given us tools to reason about uncertainty in the world [6].
Symbolic planning has made great strides through the discovery of effective domain
independent heuristics [18, 15, 26]. Forty years of processor improvements according
to Moore's Law have given us the computing power to leverage these techniques.
While these improvements have led to dramatic advances in the ability of robots
to perform primitive actions, such as picking up a plate, the ability to combine these
actions to perform complex and novel tasks, such as clearing a table, remains beyond
the current state of the art. This is not without reason: planning problems faced
by a household robot are characterized by long horizons, partial observability, and
continuous variables. Planning is PSPACE-complete in the discrete, fully-observable
case, so the difficulty in applying it to real world settings is not surprising. Inspired
by human ability to routinely solve these seemingly intractable problems, we believe
that there is some underlying structure or simplicity in these problems that provides
a mechanism for reducing complexity in typical problem instances.
One way to reduce complexity, for certain classes of long-horizon problems, is to
use temporal hierarchy to decompose a problem into multiple short-horizon problems.
A method that has been shown to be effective in robotic mobile-manipulation problems is the Hierarchical Planning in the Now (HPN) architecture [19]. HPN makes use
of an aggressive hierarchical strategy. It commits to an abstract plan and interleaves planning and execution to obviate the need to reason about all of the ways
an abstract action can be executed. It has been shown to be correct and
complete for a class of hierarchical system specifications that is suitable for modeling
household robotic tasks and mobile manipulation problems.
While HPN is able to find solutions to many large planning problems, it makes no
claims about the quality of the behavior it produces, even when an optimizing algorithm (e.g. A*) is used to solve the individual subproblems. The resulting behavior
can be short-sighted, with the robot achieving one subgoal, only to have to undo it,
fix something else, and then re-achieve the original subgoal.
The fundamental difficulty is that, at the upper levels of the hierarchical planning
process, the models used do not account for the cost of taking abstract actions. From
the point of view of the abstract planner, all actions will take the same amount of time
to execute. This is clearly not the case, as different subtasks will result in different
sequences of primitive operations. However, specifying this cost can be difficult: it
may be highly variable and depend on details of the situation in which the operator
is executed. Determining this cost is generally as difficult as finding a fully grounded
plan.
For example, consider delivering a package to some destination in a distributed
robotic transportation system. We can consider operations of forklifts for loading,
unloading, and arranging packages within a truck or airplane, as well as operations
that drive and fly the transportation vehicles. The cost of delivering that package
depends on the initial locations of trucks and planes, the arrangements of other packages
currently in their cargo holds, and the package of interest's current location.
Furthermore, these values depend on the initial state and on abstract operations that
are executed before the operator whose cost we are evaluating. As a result, two plans
that look similar at the abstract level (i.e., the execution order for two abstract operations is swapped) may result in large differences in the quality of the behavior that
the system can generate.
We propose a strategy for tackling the problem of optimization in hierarchical
planning that addresses plan quality by dynamically reordering and grouping the
subgoals in an abstract plan. Our approach lets us frame the cost estimation problem as one in which, given two subgoals $G_1$ and $G_2$, we must estimate which of
the following strategies will be most efficient: planning for and executing $G_1$ first,
planning for and executing $G_2$ first, or planning for them jointly and interleaving
their execution. Given the ability to answer that query, we will be able to perform
"peephole optimization" of the plan at execution time, taking advantage of immediate
knowledge of the current state of the world to select the best next action to take.
We propose general principles, based on concepts of shared and constrained resource use, for the design of heuristics to answer the ordering-preference queries.
The overall utility of this approach is demonstrated in very large instances of a multirobot transportation problem that cannot be solved through classical non-hierarchical
methods. We show up to 30% improvement in plan quality over the non-optimizing
version of HPN. Furthermore, on problems with little room for optimization, we find
that our approach results in only a negligible increase in planning time.
The remainder of this thesis is organized as follows. Chapter 2 provides an introduction to the representations and formalisms we use for planning, the general
use of abstraction in planning, and the HPN architecture. Chapter 3 describes the
execution cost optimization problem for hierarchical planning in detail and presents
our peephole optimization solution. Chapter 4 summarizes the related work in this
area. Chapter 5 describes the experiments performed to evaluate the effectiveness of
our system using hand-coded heuristics. Chapter 6 summarizes the contributions of
this thesis and concludes with a discussion of directions for future research.
Chapter 2
Background
This chapter provides an introduction to the problem of task planning, the representations used, and the solution techniques this work relies on. The first section
describes the planning problem. It uses an example of a package delivery problem to
illustrate the aspects of planning problems we would like to model. The second section
describes the use of abstraction in planning and motivates its need. The final section
describes the Hierarchical Planning in the Now (HPN) framework and describes its
advantages and disadvantages over other planning setups.
2.1 Domain representation
We use a relatively standard symbolic representation for planning operators, derived
from STRIPS [12] but embedded in Python to allow more freedom in specifying
preconditions and effects. In the domains considered in this paper, the geometric
aspects of loading trucks are discretized; it would be possible to use real continuous
representations of object and robot poses instead [19].
A domain is characterized by:
• Entities: names of individual objects in the domain; for example, trucks, packages, planes, forklifts, etc.
• Fluents: logical relationships between entities in the world that can change
over time; for example, In(package1, truck3).
• Initial state: a conjunction of logical fluents known to be true initially.
• Goal: a conjunction of logical fluents specifying a set of desired world states.
• Operators: actions that are parameterized by objects (e.g. PickUp(package2)).
Each operator, op, is characterized by:
- preconditions: pre(op), a conjunction of fluents that describes when this
operator is applicable
- result: res(op, s), a conjunction of fluents whose value changes as a result
of applying op in state s
- choose: a list of variable names and the values they can take on
- cost: a real-valued cost of applying op.
A planning problem, $\Pi$, is a tuple $\Pi = (F, O, I, G)$, where $F$ is a set of fluents, $O$
is a set of operators, $I$ is an initial state, and $G$ is a goal. A solution to $\Pi$ is a sequence
of operators $p = \{op_1, op_2, \ldots, op_n\}$. A plan is feasible if the preconditions for $op_1$
are satisfied in the initial state and the preconditions for each other operator, $op_i$,
are satisfied in the state that results from applying $op_{i-1}$: $I \subseteq pre(op_1)$ and
$s_i = res(op_i, s_{i-1}) \subseteq pre(op_{i+1})$. A plan achieves a goal if the goal formula is satisfied in the final state:
$s_n \in G$. The cost of a plan, $p$, is the sum of the costs of its operators:
$cost(p) = \sum_{i=1}^{n} cost(op_i)$.
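The feasibility and cost definitions above can be illustrated with a minimal Python sketch. It is not the representation used in this thesis: as a simplifying assumption, states are sets of fluent strings and the result of an operator is modeled with hypothetical `add` and `delete` sets, and the operator names and costs are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Operator:
    """STRIPS-like operator (hypothetical add/delete encoding of res)."""
    name: str
    pre: frozenset      # fluents that must hold before applying
    add: frozenset      # fluents made true by the operator
    delete: frozenset   # fluents made false by the operator
    cost: float


def apply_op(op, state):
    """Return the state that results from applying op in state."""
    return (state - op.delete) | op.add


def is_feasible(plan, initial_state):
    """Check that each operator's preconditions hold when it is applied."""
    state = initial_state
    for op in plan:
        if not op.pre <= state:
            return False
        state = apply_op(op, state)
    return True


def final_state(plan, initial_state):
    """Apply every operator in order and return the last state."""
    state = initial_state
    for op in plan:
        state = apply_op(op, state)
    return state


def plan_cost(plan):
    """Cost of a plan: the sum of the costs of its operators."""
    return sum(op.cost for op in plan)


pick_up = Operator("PickUp(package1)",
                   pre=frozenset({"At(package1, dock)"}),
                   add=frozenset({"Holding(package1)"}),
                   delete=frozenset({"At(package1, dock)"}),
                   cost=1.0)
load = Operator("Load(package1, truck3)",
                pre=frozenset({"Holding(package1)"}),
                add=frozenset({"In(package1, truck3)"}),
                delete=frozenset({"Holding(package1)"}),
                cost=2.0)

initial = frozenset({"At(package1, dock)"})
goal = frozenset({"In(package1, truck3)"})
plan = [pick_up, load]
```

Under these assumptions, `is_feasible(plan, initial)` holds, the goal is a subset of `final_state(plan, initial)`, and reversing the two operators yields an infeasible plan because `Holding(package1)` is not true initially.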
In realistic domains, specifying a truth value for every possible fluent is usually
difficult and, as is the case for continuous domains, can even be impossible. We
address this problem by performing backward search from the goal set and computing
preimages of subgoals under operators until we reach a subgoal that contains the initial
state. This method of chaining preimages is known as goal regression. This approach
allows us to avoid representing the initial state completely in the language of fluents
and instead provide a function for each fluent that allows its truth to be tested in the
initial state.
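As a rough illustration of goal regression, the following Python sketch chains preimages backward from the goal and tests fluents against the initial state through a function rather than an explicit state description. It assumes a textbook STRIPS-style encoding with add and delete sets, which is simpler than the Python-embedded operators used in this work; the operator tuples and the `holds_initially` test are hypothetical.

```python
from collections import deque


def regress(goal, pre, add, delete):
    """Preimage of goal under an operator, or None if the operator
    achieves nothing in the goal or destroys part of it."""
    if not (add & goal) or (delete & goal):
        return None
    return (goal - add) | pre


def regression_plan(goal, operators, holds_initially, max_depth=10):
    """Breadth-first backward search over subgoals.

    operators is a list of (name, pre, add, delete) tuples of frozensets.
    holds_initially(fluent) tests a single fluent in the initial state.
    """
    start = frozenset(goal)
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        subgoal, plan = frontier.popleft()
        # Stop when every fluent of the subgoal is true in the initial state.
        if all(holds_initially(f) for f in subgoal):
            return plan
        if len(plan) >= max_depth:
            continue
        for name, pre, add, delete in operators:
            pre_image = regress(subgoal, pre, add, delete)
            if pre_image is not None and pre_image not in seen:
                seen.add(pre_image)
                # The operator is prepended because we search backward.
                frontier.append((pre_image, [name] + plan))
    return None


operators = [
    ("PickUp(package1)",
     frozenset({"At(package1, dock)"}),
     frozenset({"Holding(package1)"}),
     frozenset({"At(package1, dock)"})),
    ("Load(package1, truck3)",
     frozenset({"Holding(package1)"}),
     frozenset({"In(package1, truck3)"}),
     frozenset({"Holding(package1)"})),
]


def holds_initially(fluent):
    # Hypothetical test function standing in for a query on the real world.
    return fluent == "At(package1, dock)"


found = regression_plan({"In(package1, truck3)"}, operators, holds_initially)
```

Here the search regresses the goal through `Load` to obtain the subgoal `Holding(package1)`, then through `PickUp` to obtain a subgoal that is true initially.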
The primary difference between the formalism presented here and standard formalisms
is the choose attribute of operators. When the possible values for a variable are fully
enumerated, this is analogous to including extra parameters for an operator. However,
when planning in large or continuous state spaces, we can, and do, generate a small
number of potential bindings based on the current state. This is similar to the effect
applicator modules from semantic attachment approaches to planning [11].
2.2 Abstraction in Planning
Algorithms that find solutions to planning problems in the
domain representation we have described have running time exponential in the length of the solution [5]. In many real-world problems, such as planning for an entire day's worth of
actions, this may be hundreds or thousands of actions and will require unacceptable
amounts of computation time. One way to mitigate this is through the use of abstract
planning.
An abstraction method is a function, $f : (F, O, I, G) \rightarrow (F', O', I', G')$, that maps
a planning problem into a simplified version that is easier to solve. In this work, we
will focus on temporal abstractions, where the goal is to map problems into abstract
versions that have shorter solutions. The central concept is to use a solution to the
abstract problem to help find a solution to the original, concrete, problem. This
process of converting an abstract plan into a concrete one is known as refinement.
An abstraction method can be applied recursively in order to define a hierarchy of
abstraction spaces [27].
There are many strategies for constructing abstractions.
We will demonstrate
optimization methods in the context of temporal abstraction hierarchies of the type
used in HPN, but the techniques are general and could be applied to other types of
hierarchies.
We construct a hierarchy of temporal abstractions by assigning a criticality in the
form of an integer to each precondition of an operator, op. If the largest criticality
in $op$ is $n$, then we have $n$ abstract operators, denoted $abs(op, i)$, $0 < i \le n$. The
preconditions of $abs(op, i)$ are the preconditions of $op$ which have criticality $k \ge i$.
This defines a hierarchy of abstractions for a particular operator, as more abstract
versions ignore more preconditions. An abstraction level for the whole space is a
mapping $\alpha : O \rightarrow \{1, \ldots, n\}$, which specifies the abstraction level for each operator.
Note that this depends on the particular way an operator's variables are bound to
entities. Place(package1, truck1) could map to a different abstraction level than
Place(package2, truck2).
2.3 Hierarchical Planning in the Now
Most hierarchical planning methods construct an entire plan at the concrete level,
prior to execution, using the hierarchy to control the search process. The HPN method,
in contrast, performs an online interleaving of planning and execution. This allows it
to be robust to uncertainty: it avoids planning for subgoals in the far future at a fine
level of detail because it is likely that those details may change. In addition, it can
choose to delay detailed planning because the information necessary to support that
planning has not yet been acquired.
Algorithm 1 The HPN planning and execution algorithm
procedure HPN(s, γ, α, world)
    p = Plan(s, γ, α)
    for (op_i, g_i) in p do
        if IsConcrete(op_i) then
            world.execute(op_i, s)
        else
            HPN(s, g_i, NextLevel(α, op_i), world)
        end if
    end for
end procedure
The HPN algorithm is shown in Algorithm 1. It takes as inputs the current state
of the environment, $s$; the goal to be achieved, $\gamma$; the current abstraction level, $\alpha$; and
the world, which is generally an interface to a real or simulated robotic actuation and
perception system. Initially $\alpha$ is set to the most abstract version of every operator.
HPN starts by calling the regression-based Plan procedure, which returns a plan
at the specified level of abstraction,
$$p = ((-, g_0), (op_1, g_1), \ldots, (op_n, g_n)),$$
where the $op_i$ are operator instances, $g_n = \gamma$, $g_i$ is the preimage of $g_{i+1}$ under $op_i$, and
$s \subseteq g_0$. The preimages, $g_i$, will serve as the goals for the planning problems at the
next level down in the hierarchy.
HPN executes the plan steps, starting with action $op_1$, side-effecting $s$ so that the
resulting state will be available when control is returned to the calling instance of
HPN. If an action is a primitive, then it is executed in the world, which causes $s$ to be
changed; if not, HPN is called recursively, with a more concrete abstraction level for
that step. The procedure NextLevel takes a level of abstraction $\alpha$ and an operator
$op$, and returns a new level of abstraction, $\beta$, that is more concrete than $\alpha$.
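The recursion in Algorithm 1 can be sketched in a few lines of Python. This is a toy illustration under strong assumptions, not the HPN implementation: the abstraction level is a single integer rather than a per-operator mapping, and `plan`, `is_concrete`, `next_level`, `ToyWorld`, and the operator names are hypothetical stand-ins for the regression-based planner and the robot interface.

```python
# Hypothetical effects of the two primitive operators in the toy domain.
PRIMITIVE_EFFECTS = {"pick": "Holding(box)", "place": "ObjLoc(box, goal)"}


class ToyWorld:
    """Records executed primitives and side-effects the shared state."""

    def __init__(self):
        self.log = []

    def execute(self, op, state):
        self.log.append(op)
        state.add(PRIMITIVE_EFFECTS[op])


def plan(state, goal, level):
    """Return (operator, subgoal) pairs that achieve goal at this level."""
    if goal in state:
        return []
    if level == 0:
        # Most abstract level: one abstract step that ignores grasping.
        return [("abstract_place", goal)]
    # More concrete level: the Holding precondition is now considered.
    return [("pick", "Holding(box)"), ("place", goal)]


def is_concrete(op):
    return op in PRIMITIVE_EFFECTS


def next_level(level, op):
    """Return a more concrete abstraction level for this step."""
    return level + 1


def hpn(state, goal, level, world):
    """Plan at the current level, then execute or refine each step in order."""
    for op, subgoal in plan(state, goal, level):
        if is_concrete(op):
            world.execute(op, state)
        else:
            hpn(state, subgoal, next_level(level, op), world)


world = ToyWorld()
state = set()
hpn(state, "ObjLoc(box, goal)", 0, world)
```

The point of the sketch is the control flow: the abstract step is committed to before any concrete plan exists, and its subgoal becomes the goal of the recursive call.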
The strategy of committing to the plan at the abstract level and beginning to
execute it before finding a full concrete plan is potentially dangerous. If it is not, in
fact, possible to make a plan for a subgoal at a more concrete level of the hierarchy,
then the entire process will fail.
In order to be complete, a completely general
hierarchical planning algorithm must be capable of backtracking across abstraction
levels if planning fails on a subgoal. An alternative, which we adopt in this work, is to
require hierarchical structures that have the downward refinement property (DRP),
which requires that any abstract plan that reaches a goal has a valid refinement that
reaches that goal. Bacchus and Yang [2] describe several conditions under which this
assumption holds.
Chapter 3
Optimizing Hierarchical Planning
A fundamental difficulty of hierarchical planning with downward refinement is that
the costs of abstract actions are not available when planning at the high level, so that
even if completeness is guaranteed, the resulting trajectories through the space can
be very inefficient. In this chapter, we analyze common cases where execution cost
inefficiencies arise in hierarchical planning. We argue that standard cost-sensitive
search is unlikely to effectively address these concerns. We present RCHPN to cope
with these issues via online peephole optimization and characterize the domain specific
information it requires.
3.1 Execution cost inefficiencies in hierarchical planning
To illustrate execution cost issues for hierarchical planning, consider a domain with a
single robot and multiple boxes. This robot can carry up to two boxes at a time and
must transport them to goal locations, avoiding obstacles. Our examples will make
heavy use of the Place operator so we give a full specification of its operator schema.
In the following, the function IK stands for inverse-kinematics and, given a location
for an object and a grasp, computes the corresponding robot pose to hold that object
at that location with that grasp.
Place(obj, loc, grasp):
    res: ObjLoc(obj, loc) ∧ ¬Holding(obj, grasp)
    pre:
        1. LegalLoc(IK(loc, grasp)), LegalLoc(loc)
        2. ClearPath(CurrentLoc(obj), loc)
        3. Holding(obj, grasp)
        4. RobotLoc(IK(loc, grasp))
    cost: 10
Remember that, at each level, our precondition formula is the conjunction of fluents at
that level and the levels above it. For example, the precondition for Place(obj, loc, grasp)
at abstraction level 2 is
$$LegalLoc(IK(loc, grasp)) \wedge LegalLoc(loc) \wedge ClearPath(CurrentLoc(obj), loc)$$
At the highest level, we require that the target location and a configuration from
which to place that object are collision free. In planning at the next level, we require
that there exist a path that will enable the robot to transfer the object from its current
location to its goal. At the next level we need to be holding the object. At the most
concrete level we require that the robot also be in the correct position to place the
object. This is a reasonable precondition hierarchy and can be used to find solutions
to mobile-manipulation problems. It mirrors hierarchies used in [19]. We now present
two example problems and explore the behavior this hierarchical specification elicits.
The execution cost issues we observe are indicative of broad classes of inefficiencies
we observe in hierarchical planning.
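The level-by-level precondition formula for Place can be written down directly. The sketch below is only an illustration of the rule stated above (the formula at a level is the conjunction of the fluents at that level and the levels above it); the list encoding and the function name are assumptions made for this example.

```python
# (level, fluent) pairs copied from the Place schema; level 1 is most abstract.
PLACE_PRECONDITIONS = [
    (1, "LegalLoc(IK(loc, grasp))"),
    (1, "LegalLoc(loc)"),
    (2, "ClearPath(CurrentLoc(obj), loc)"),
    (3, "Holding(obj, grasp)"),
    (4, "RobotLoc(IK(loc, grasp))"),
]


def preconditions_at_level(preconditions, level):
    """Fluents introduced at this level or at any more abstract level."""
    return [fluent for introduced_at, fluent in preconditions
            if introduced_at <= level]


level_2 = preconditions_at_level(PLACE_PRECONDITIONS, 2)
most_concrete = preconditions_at_level(PLACE_PRECONDITIONS, 4)
```

At level 2 this returns the two LegalLoc fluents and the ClearPath fluent, matching the formula given in the text, while level 4 returns all five fluents.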
3.1.1 Incorrect Ordering of Abstract Operators
A common failure mode of temporal hierarchy, with respect to execution cost, comes
from the incorrect ordering of abstract tasks. Consider a robot tasked with placing 3
boxes, call them $box_1$, $box_2$, and $box_3$, in an enclosed space, where placing one of the
boxes blocks entry to that space. This is a common characteristic of manipulation
problems because of the conservative estimates used to avoid collisions. An example
scenario is depicted in Fig. 3-1.

[Figure 3-1: two panels, Initial State and Goal State.]
Figure 3-1: Caricature of situations in which incorrectly ordering abstract tasks results in poor execution cost. The task is to transport the boxes from their initial
locations on the left to their goal locations on the right. Our issue arises because all
orderings of the operators $\{Place(box_1), Place(box_2), Place(box_3)\}$ are valid abstract
plans yet placing $box_3$ first will require us to do extra work. If the abstract plan has
the operator sequence $[Place(box_3), Place(box_2), Place(box_1)]$, we will have to move
$box_3$ away from its goal in order to accomplish each additional subgoal and replace it
after. By comparison, if our planner orders the operators correctly it avoids this work
and roughly halves the resulting execution cost. It is hard to address this issue by
specifying cost estimates for abstract actions or by directly modifying the hierarchy
to preclude this behavior.
Using the precondition hierarchy described above, the realizable abstract operator
sequences are permutations of
$$Place(box_1, goal_1, grasp_1),\ Place(box_2, goal_2, grasp_2),\ Place(box_3, goal_3, grasp_3),$$
where $goal_i$ is the goal location of $box_i$ and $grasp_i$ is a feasible grasp for placing $box_i$
in $goal_i$. Our issue arises because the execution cost of the corresponding concrete
plans exhibits a large variation over this set.
At the two different extremes of execution cost are the following two plans, shown
here with only ObjLoc fluents from subgoals to conserve space.
$$
\begin{aligned}
\big(&(Place(box_1, goal_1, grasp_1),\ ObjLoc(box_1, goal_1)),\\
&(Place(box_2, goal_2, grasp_2),\ ObjLoc(box_1, goal_1) \wedge ObjLoc(box_2, goal_2)),\\
&(Place(box_3, goal_3, grasp_3),\ ObjLoc(box_1, goal_1) \wedge ObjLoc(box_2, goal_2) \wedge ObjLoc(box_3, goal_3))\big)
\end{aligned}
\tag{Plan 3.1}
$$
$$
\begin{aligned}
\big(&(Place(box_3, goal_3, grasp_3),\ ObjLoc(box_3, goal_3)),\\
&(Place(box_2, goal_2, grasp_2),\ ObjLoc(box_3, goal_3) \wedge ObjLoc(box_2, goal_2)),\\
&(Place(box_1, goal_1, grasp_1),\ ObjLoc(box_3, goal_3) \wedge ObjLoc(box_2, goal_2) \wedge ObjLoc(box_1, goal_1))\big)
\end{aligned}
\tag{Plan 3.2}
$$
The abstract plans we generate specify serializations of our goal: we will accomplish
each fluent in the goal sequentially and attempt to keep achieved fluents true [23].
Plan 3.1 will have lower execution cost than Plan 3.2 because there is no concrete
plan in which $ObjLoc(box_3, goal_3)$ becomes true before the other goal fluents and
remains true until the full goal is achieved. Placing $box_3$ in its goal location blocks
access to the goal locations for the other boxes. As a result, achieving the second
and third subgoals in Plan 3.2 will consist of moving $box_3$ out of the way, placing the
appropriate box, and replacing $box_3$. This increases the number of place operations
required from 3 to 7 and essentially doubles the execution cost of Plan 3.2 over Plan
3.1, depending on the cost of other required operators.
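The count of place operations for each serialization can be reproduced with a small Python sketch. The model is a deliberate simplification assumed for illustration: once the blocking box is at its goal, every other placement costs three place operations (move the blocker away, place the box, replace the blocker), and all other operators are ignored. The function and box names are hypothetical.

```python
from itertools import permutations


def count_place_ops(order, blocker="box3"):
    """Number of Place operations needed to achieve subgoals in this order."""
    placed = set()
    ops = 0
    for box in order:
        if blocker in placed and box != blocker:
            # Move the blocker away, place this box, replace the blocker.
            ops += 3
        else:
            ops += 1
        placed.add(box)
    return ops


boxes = ["box1", "box2", "box3"]
costs = {order: count_place_ops(order) for order in permutations(boxes)}
cheapest = min(costs, key=costs.get)
```

Under this model the ordering of Plan 3.1 needs 3 place operations and the ordering of Plan 3.2 needs 7, in line with the counts given above. All six orderings are valid abstract plans, which is why the abstract planner has no basis for preferring the cheap ones.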
We ultimately achieve the goal, but at a much higher cost than is necessary.
A context-sensitive cost would enable us to avoid this issue and select the cheaper
option. Unfortunately, although we have costs for primitive actions, it is difficult
to determine the cost for an abstract operator at planning time. This cost depends
on solving many subsequent planning problems and is not purely a function of the
operator, parameters and abstraction level. For example, Plan 3.1 and Plan 3.2 use
the same operators with the same parameters at the same level of abstraction but
have very different execution costs. We cannot simply include this value as a part of
the domain description, as we do for primitive operators.
Computing cost estimates during execution is difficult to do without incurring a
large increase in planning time. In our example, the cost for $Place(box_1, goal_1, grasp_1)$
depends on the locations of the other boxes, the location of the robot, and the clear
paths from $box_1$'s location to $goal_1$. Furthermore, it relies on these values at the time
we plan for and execute that particular operator as opposed to their values when we
search for an abstract plan. The values we need depend on the results of the other
operators in the abstract plan and are not known at planning time. Even oracle
access to these values leaves much to be desired as the computations required can be
prohibitively expensive to evaluate at each node expansion during our search. Unless
careful attention is paid, this approach can result in worse performance than simply
solving the problem without hierarchy.
An alternative approach is to alter the hierarchy such that only plans which place
$box_3$ last, in this problem instance, are valid. An example solution of this type might
combine the preconditions for Place from levels 1 and 2 so that our first plan must
consider the ClearPath fluent when placing. This approach does not scale well.
Other ordering issues will require adding different preconditions to more abstract
levels. The likely outcome is the reintroduction of most, if not all, preconditions
at the most abstract level and our agent is faced with the original, long-horizon,
intractable planning problem. There is a fundamental contradiction in this strategy:
abstract planning is efficient because it ignores details; adding those details to reduce
execution cost will increase planning time and negate the advantages of the hierarchy.
3.1.2 Missed Parallel Structure
In the process of refining and executing an abstract plan, each subgoal is achieved
sequentially. This is an important feature of hierarchical planning, as it keeps the
planning horizon for subproblems short. It also prevents a hierarchical system from
leveraging parallelism in subtasks to reduce execution cost. Flexibility in serializing
subgoals can enable a hierarchical planner to find shorter plans. This section will provide an example of the execution cost savings this enables and discuss the difficulties
in leveraging these savings while maintaining efficiency. We consider a simplification
of the example from Section 3.1.1 that ignores $box_3$, but is otherwise identical, to
illustrate these concerns.

[Figure 3-2: two planning trees. (a) Plan with Subgoal Serialization. (b) Plan without Subgoal Serialization.]
Figure 3-2: Illustration of situation where combining subgoals can reduce execution cost.
The roots of two different planning trees to accomplish the goal $ObjLoc(box_1, goal_1) \wedge
ObjLoc(box_2, goal_2)$. 3-2(a) represents the types of plans that HPN can find for this
goal. Because we serialize every subgoal in the abstract plan we will always plan
for placing the two boxes independently and will not be able to take advantage of
similar structure in the plans. 3-2(b) illustrates a solution which does not serialize
these subgoals. The resulting subproblem has a short horizon so we can still solve
it efficiently. Combining subgoals would enable an agent to avoid traveling extra
distance while incurring a small computational cost, in this scenario. Introducing
preconditions such that the first solution found includes picking as well as placing will
increase the planning horizon to a point where only simple problems can be solved.
Augmenting the original planning problem with joint operators to enable this behavior
will increase the branching factor and detrimentally affect performance. Note that
these plans omit the level of planning that introduces the ClearPath fluent to improve
clarity.

Even for this simple scenario, our hierarchical planner will perform substantially
worse than optimal. There are two realizable abstract plans, one that places $box_1$,
then $box_2$ and one that places $box_2$, then $box_1$. Suppose we get the first option as our
abstract plan. In executing this plan the robot will travel to $box_1$'s initial location,
pick up $box_1$, travel to $goal_1$, place $box_1$ at $goal_1$, then repeat this process for $box_2$. Recall that the robot in this example is capable of holding two boxes at once and the
initial locations of the boxes are close to each other. There exists a plan with less
execution cost that achieves this goal by picking up both boxes before transporting
both of them to their corresponding goal locations. While finding the optimal plan is
likely impossible while preserving efficiency, we should be able to take advantage of
this parallel structure in subproblems to find better plans. Fig. 3-2 depicts example
planning trees for this problem.
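To make the gap concrete, the following sketch compares the travel distance of the serialized plan with that of the interleaved plan. The coordinates, the Manhattan distance metric, and the helper names are assumptions chosen for illustration; they are not taken from the experiments in this thesis.

```python
def manhattan(a, b):
    """Grid distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def travel_cost(start, waypoints):
    """Total distance of visiting the waypoints in order, starting at start."""
    cost, here = 0, start
    for point in waypoints:
        cost += manhattan(here, point)
        here = point
    return cost


# Hypothetical layout: the boxes start next to each other, far from their goals.
robot = (0, 0)
init1, init2 = (10, 0), (10, 1)
goal1, goal2 = (0, 10), (1, 10)

# Serialized: pick box 1, place box 1, then pick box 2, place box 2.
serialized = travel_cost(robot, [init1, goal1, init2, goal2])
# Interleaved: pick both boxes, then place both.
interleaved = travel_cost(robot, [init1, init2, goal1, goal2])

print(serialized, interleaved)
```

Under these assumed positions the interleaved route is less than half as long as the serialized one, because the long trip between the initial locations and the goals is made once instead of three times.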
In order to take advantage of this parallelism we need an abstract plan that
considers picking and placing for both objects at the same time: this lets us
interleave the Pick and Place operators. Perhaps the simplest way to enable this is to
include preconditions that relate to this structure at the highest level. In this example,
that amounts to including the Holding precondition in the most abstract space so
that plans will include Pick operators. This creates similar issues to modifying our
hierarchy to deal with reordering. We collapse the hierarchy and make all but the
simplest problems intractable; imagine planning for picking and placing 10 objects as
a single planning problem.
Another option is to augment the planning problem with 'joint' operators: operators which represent the application of several operators at the same time. These
would enable us to plan jointly for these operators at more concrete levels of the hierarchy. On the surface, this is a reasonable approach; it avoids increasing the planning
horizon. However, this solution runs into two issues. The first is that we are increasing
the branching factor of the planning problem exponentially. If we want to consider
doing $j$ of $n$ operators at the same time, we need to add $O(n^j)$ joint operators. This
will certainly have a negative impact on planning time.
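As a rough illustration of this growth, the sketch below counts the joint operators needed to apply $j$ of $n$ operators simultaneously. It assumes that every $j$-subset of operators is a legal joint operator, which is a worst-case assumption, and the function name is hypothetical.

```python
from math import comb


def joint_operator_count(n, j):
    """Number of joint operators that apply exactly j of n operators at once."""
    return comb(n, j)


n = 20
for j in (1, 2, 3, 4):
    # comb(n, j) is at most n**j / j!, consistent with the O(n^j) bound.
    print(j, joint_operator_count(n, j))
```

With 20 operators, allowing groups of four already adds several thousand joint operators to the branching factor.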
Furthermore, simply adding these operators is not enough; we need to enable our
planner to intelligently select when it is appropriate to use a joint operator instead
of the corresponding sequential operators. The standard way to do this is to include
cost estimates for our new operators.
We have already argued that finding cost
estimates for abstract operators in planning is difficult; the problem compounds with
joint operators, as the corresponding subproblems are more complicated. In order
to leverage shared structure in subtasks to find cost savings without reducing our
capacity to solve hard problems we will need a different approach.
3.2 Optimization in the now
In this section, we outline the central contribution of this thesis: a novel refinement
strategy that enables execution cost optimization but retains the efficiency of aggressive hierarchical planning. It offers the opportunity to arrange or combine subgoals
such that planning for and executing them sequentially will result in shorter plans
without significantly increasing planning time.
The ordering problems discussed in the previous section arise from the fact that
there are many orders of an abstract plan that are equivalent with respect to the
abstract preconditions but not with respect to the ensuing execution cost. We argued
that an abstract planner is ill-equipped to select the correct ordering without incurring
unacceptable computational cost. Yet, the low cost options make use of the same operators found in each abstract plan. With this in mind, we can draw inspiration from
motion planning, where cost-sensitive planning frequently proceeds by first finding a
solution and then improving on that solution in a later process [17].
We propose to use information from the current state of the world at plan execution time to perform peephole optimization. We find an initial plan using the same
abstract planning process as before. We modify the refinement process to heuristically select the next subgoal to achieve from the unachieved subgoals in our plan. We
restrict the subgoals considered to be subgoals for which the corresponding preconditions are true in the world. We also ensure that there is a valid plan, with respect
to the abstract preconditions, that executes this subgoal followed by some ordering
of the remaining unachieved subgoals.
Because we select the next subgoals from a small set of options at execution time,
this process can take advantage of more complex properties and details of the domain.
It can also perform more expensive computation because we do not need to do cost
estimation at each node expansion along our search. This will enable our planner to
consider cost optimization, with respect to the reordering of operators in our abstract
plan, without dramatically increasing computation time.
Similar analysis applies to the problem of deciding when to jointly achieve subgoals. An abstract planner does not generally have enough information to determine
whether groups of subtasks should be addressed jointly at a lower level of abstraction,
but good solutions can usually be found by considering combinations of subgoals in
the original plan. Treating these options in a post-processing step fits naturally into
our refinement strategy. In addition to reordering an abstract plan, our refinement
process considers achieving some of the subgoals jointly. We ensure correctness in
combining operators A and B by finding a valid plan in which A and B are planned
for sequentially. Then we use res(A, res(B, s)) as the goal for our next subproblem, where s is the current state. This increases the computational difficulty of the
subsequent planning problem in the hope that it will generate a better quality plan.
Algorithm 2 shows an extension of HPN, called RCHPN, that implements these modifications. RCHPN relies on two functions to make ordering or combining decisions:
SelectGap, which heuristically selects the next subtask to plan for, and SelectParallelOps, which will combine subgoals that should be considered jointly to expose
parallel structure. Both rely on a context-sensitive function, arrange, to find situations in which reordering or combining would be beneficial. Before describing these
procedures, we describe the ordering preference information they rely on.
3.2.1 Context-sensitive ordering
RCHPN depends on the specification of a context-sensitive comparison function, arrange($g_1$, $g_2$, $s$),
which takes as arguments two subgoals and an initial state. It returns 0 if $g_1$ should
Algorithm 2 Reordering and combining hierarchical planning and execution algorithm
procedure RCHPN(s, γ, α, world)
    p = Plan(s, γ, α)
    while p ≠ ∅ do
        (op, g) = SelectGap(s, p, α)
        if IsConcrete(op) then
            world.execute(op, s)
        else
            sg = SelectParallelOps(s, op, g, p, α)
            cur_index = p.index(g)
            sg_index = p.index(sg)
            for (op', g') in p[cur_index : sg_index] do
                α = NextLevel(α, op')
            end for
            RCHPN(s, sg, NextLevel(α, op), world)
        end if
    end while
end procedure
be serialized before $g_2$, returns 1 if $g_2$ should be serialized before $g_1$, and returns 2 if
they should be combined into a single subgoal and solved jointly. These correspond
to the subgoal sequences ($g_1$; $g_1 \wedge g_2$), ($g_2$; $g_1 \wedge g_2$), and ($g_1 \wedge g_2$), respectively. It might
seem that in order to be effective, arrange will have to perform some sort of cost
estimation for an abstract task; so, why do we believe that it will be easier to specify
than a traditional cost function?
1. Evaluation takes place "in the now": the algorithm knows the current world
state and does not need to consider the many ways preconditions for an operator
could have been realized.
2. The task is simply to determine an ordering, not to estimate the actual costs,
which would generally be much more difficult to do accurately.
3. We only have to compute ordering preferences for the operators that actually
appear in the plan, rather than computing a cost for every operator that is
considered during the search.
The first property arises because our refinement procedure interleaves optimization
with planning. Thinking back to our boxes example, we argued that estimating the
abstract cost for placing a box was hard, in part, because we did not know the initial
location of the robot or the boxes when we do our cost evaluation. By considering
our options in a post-processing step, we can interleave re-ordering with planning and
give arrange direct access to these values.
The second property stems from the fact that we know what the alternative options are. In evaluating costs during a general search we do not know what other
operators we will need to compare to. Thus, we need a common criterion to compare this choice with any alternatives. This forces us to find an actual cost estimate
because specifying pairwise orders with all other options is infeasible. In contrast,
arrange knows what the different alternatives are; we do not need the results of this
computation to apply beyond the comparison of these two subgoals.
Our final property is due to the restriction of our final abstract plan to plans that
contain operators from the initial solution. This enables us to do more complex and
costly computation for each call to arrange without unacceptably increasing the total
amount of computation. For example, determining if placing a box at its goal will
block all paths to place another is too costly to do at each node expansion. However,
performing that computation once for placements we are committed to performing is
computationally reasonable.
Of course, the risk remains that the particular plan chosen has no room for improvement, but there is an alternative plan with different subgoals that is much better. We know of no way to do cost optimization for large instances of such problems
effectively.
3.2.2 Leveraging pairwise ordering information
Assuming the existence of the arrange function, we now describe the peephole optimizations in RCHPN.
SelectGap, shown in Algorithm 3, takes a greedy approach to plan reordering. To
select the plan step to execute, it finds the preimage, $g_i$, with the highest index $i$ such
that $s \in g_i$. This is the plan step that is closest to the end of the plan such that,
were we to begin plan execution from that step, a state satisfying the goal condition
would hold. This strategy is similar to the idea of executing the "highest true kernel"
from the STRIPS system [12]. SelectGap iterates through the rest of the plan calling
arrange($g_i$, $g_j$, $s$) for $j$ ranging from $i + 1$ to $n$. If it returns 1, then we attempt to
move $g_j$ to be directly before $g_i$. If the resulting plan is valid with respect to the
abstract operators' preconditions, we accept the move and repeat this process with
$g_j$ as the new "first" subgoal. Otherwise, we undo the change and continue as
before. This process terminates when we have checked all the way through the plan
without moving any operators. As long as arrange does not have cycles, the process
will terminate. In the worst case, we have to do $O(n^2)$ checks but this is negligible
when compared to the complexity of planning, which is exponential in $n$.
Algorithm 3 Reordering an Abstract Plan
procedure SELECTGAP(s, p, α)
    next_subgoal = HighestApplicableSubgoal(p, s)
    next_index = p.index(next_subgoal)
    highest_checked = next_index
    while highest_checked < len(p) do
        for (op, sg) in p[next_index :] do
            if arrange(next_subgoal, sg, s) = 1 then
                new_p = p.move((op, sg), next_index)
                if IsValid(new_p) then
                    p = new_p
                    next_subgoal = sg
                    highest_checked = next_index
                    break
                end if
            end if
            highest_checked = p.index(sg)
        end for
    end while
    return sg
end procedure
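The greedy reordering described for SelectGap can be sketched in Python as follows. This is a minimal illustration and not the thesis implementation: the plan is assumed to be a list of (operator, subgoal) pairs, the caller supplies start_index in place of HighestApplicableSubgoal, and arrange and is_valid are passed in as functions. The example heuristic nearer_first and its data are hypothetical.

```python
def select_gap(plan, state, arrange, is_valid, start_index):
    """Greedily move a preferred subgoal to the front of the unachieved plan.

    plan        -- list of (operator, subgoal) pairs in abstract-plan order
    start_index -- index of the first step to consider (assumed to be given)
    arrange     -- arrange(g1, g2, state) -> 0, 1, or 2
    is_valid    -- is_valid(plan) -> bool, checks the abstract preconditions
    Returns the reordered plan and the step chosen to refine next.
    """
    plan = list(plan)  # work on a copy
    moved = True
    while moved:
        moved = False
        _, first_goal = plan[start_index]
        for j in range(start_index + 1, len(plan)):
            _, candidate = plan[j]
            if arrange(first_goal, candidate, state) != 1:
                continue
            # Try moving step j to sit directly before the current first step.
            new_plan = (plan[:start_index] + [plan[j]]
                        + plan[start_index:j] + plan[j + 1:])
            if is_valid(new_plan):
                plan = new_plan
                moved = True
                break  # rescan with the new first subgoal
    return plan, plan[start_index]


# Hypothetical heuristic: prefer the subgoal whose location is nearer the robot.
def nearer_first(g1, g2, state):
    robot = state["robot"]
    return 1 if abs(g2[1] - robot) < abs(g1[1] - robot) else 0


plan = [("Place", ("box1", 9)), ("Place", ("box2", 2)), ("Place", ("box3", 5))]
new_plan, step = select_gap(plan, {"robot": 0}, nearer_first, lambda p: True, 0)
print(new_plan, step)
```

As in the text, termination relies on arrange having no cycles; the strict comparison in nearer_first satisfies that assumption.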
SelectParallelOps, shown in Algorithm 4, proceeds in a similar fashion. It maintains the next subgoal we will plan for, sg, which is initialized to be the result of
SelectGap. It iterates through the rest of the plan, calling arrange(sg, $g_i$, $s$) for $i$
ranging from the index of sg to $n$. If arrange returns 2, then we attempt to move
$g_i$ to be directly after sg in the plan. If the result is a valid plan we combine $g_i$
with sg and set sg to be the result. To ensure that planning problems considered at
the next level are not so large that we cannot solve them, we terminate this process
when we have checked through all subgoals or reach a complexity limit on sg. This
represents the trade-off between the complexity of planning and quality of the solutions we can hope to achieve. At the moment we do this by placing a cap on the
number of tasks we can jointly plan; we determined this value empirically for our
experiments. Exploring better ways to make this trade-off is an interesting avenue
for further research.
Algorithm 4 Combining Subgoals of an Abstract Plan
procedure SELECTPARALLELOPS(s, op, sg, p, α)
    next_sg = sg
    next_index = p.index(next_sg) + 1
    for (op, sg) in p[next_index :] do
        if arrange(next_sg, sg, s) = 2 then
            new_p = p.move((op, sg), next_index)
            if IsValid(new_p) then
                p = new_p
                next_sg = CombineGoals(next_sg, sg)
                next_index = next_index + 1
                if MaxComplexity(next_sg) then
                    return next_sg
                end if
            end if
        end if
    end for
    return next_sg
end procedure
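A Python sketch of the combining step might look like the following. It is an illustration under stated assumptions rather than the thesis code: subgoals are represented as frozensets of fluent strings so that combining two goals is a set union, the complexity limit is modeled as a cap max_goals on the number of merged subgoals, and the example heuristic shares_robot is hypothetical.

```python
def select_parallel_ops(plan, state, arrange, is_valid, index, max_goals=2):
    """Merge later subgoals into the step at `index` when arrange returns 2.

    plan      -- list of (operator, subgoal) pairs; subgoals are frozensets
    max_goals -- assumed complexity cap on how many subgoals may be merged
    Returns the reordered plan and the combined subgoal.
    """
    plan = list(plan)  # work on a copy
    combined = plan[index][1]
    merged = 1
    insert_at = index + 1
    j = insert_at
    while j < len(plan) and merged < max_goals:
        _, candidate = plan[j]
        if arrange(combined, candidate, state) == 2:
            # Try moving step j to sit directly after the steps merged so far.
            new_plan = (plan[:insert_at] + [plan[j]]
                        + plan[insert_at:j] + plan[j + 1:])
            if is_valid(new_plan):
                plan = new_plan
                combined = combined | candidate  # conjunction of the goals
                merged += 1
                insert_at += 1
        j += 1
    return plan, combined


# Hypothetical heuristic: box1 and box2 can share the robot, box3 cannot.
def shares_robot(g1, g2, state):
    return 2 if all("box3" not in fluent for fluent in g1 | g2) else 0


g_box1 = frozenset({"ObjLoc(box1, goal1)"})
g_box2 = frozenset({"ObjLoc(box2, goal2)"})
g_box3 = frozenset({"ObjLoc(box3, goal3)"})
plan = [("Place", g_box1), ("Place", g_box3), ("Place", g_box2)]
new_plan, joint_goal = select_parallel_ops(plan, {}, shares_robot, lambda p: True, 0)
print([sorted(g) for _, g in new_plan], sorted(joint_goal))
```

Raising max_goals trades longer planning at the next level for the chance of exposing more parallel structure, which mirrors the trade-off described above.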
3.3 Ordering-preference heuristics
Now we consider some principles that can guide the specification of the arrange
function for particular domains. We can frame this task in terms of shared resource
consumption. Recall the robot that must put several boxes in a room. In this example,
we can treat free space as the important resource. Placing each box uses the space
in the entry to that enclosed region. Our difficulty arises because placing box 3 does
not free up the resource when this task is complete, but rather consumes the resource
in perpetuity. The only way to enable subsequent subtasks to use this resource is
to undo that subgoal, which forces us to re-achieve it later. Combining tasks can be
viewed in a similar light: moving each box needs to use the robot resource. In this
case, the resource in question is shareable so combining these subgoals allows us to
take advantage of parallel structure in the sub-plans.
Generalizing from these examples, we can divide the resource use associated with
achieving a subgoal into three categories: shareable, contained, and continual. A
resource's use is shareable with respect to a goal if, while it is being used to accomplish
that goal, it does not become unavailable. A resource's use is contained with respect to
a goal if it becomes unavailable during the course of achieving that goal but becomes
available again after the goal has been achieved. Finally, a resource's use is continual
with respect to a goal if, so long as that goal is true, that resource will be unavailable.
This reduces arrange to two steps: computing an estimate of the resources consumed by achieving each subgoal and classifying the overlapping resource use as
shareable, contained, or continual. After this classification is done, determining the
correct output from arrange is simple. If two subgoals need the same resource and it
is shareable, then they should be combined with the hope that this shared resource
will result in parallel structure in the plans and the opportunity for cost savings. If
a common resource's use is continual for one goal and contained for the other, the
one with the contained use should be ordered first. If tasks have contained use of
all shared resources, then any serialization is acceptable. Note that we should never
arrive at a situation where two subgoals require continual use of the same resource as
this implies that there is no refinement of this plan and that our hierarchy does not
possess the DRP.
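The decision rules above can be sketched as a small Python function. The names Use and arrange_from_resources are hypothetical, and the sketch makes two assumptions that the text does not state: an ordering constraint takes priority over a combining opportunity, and any other mix of resource uses falls back to keeping the existing order.

```python
from enum import Enum


class Use(Enum):
    SHAREABLE = "shareable"
    CONTAINED = "contained"
    CONTINUAL = "continual"


def arrange_from_resources(use1, use2):
    """Return 0 (g1 first), 1 (g2 first), or 2 (combine).

    use1 and use2 map resource names to a Use value for subgoals g1 and g2.
    """
    shared = sorted(use1.keys() & use2.keys())
    pairs = [(use1[r], use2[r]) for r in shared]
    if (Use.CONTINUAL, Use.CONTINUAL) in pairs:
        raise ValueError("both subgoals hold a resource continually")
    if (Use.CONTINUAL, Use.CONTAINED) in pairs:
        return 1  # g2 only needs the resource temporarily, so it goes first
    if (Use.CONTAINED, Use.CONTINUAL) in pairs:
        return 0  # g1 goes first for the same reason
    if (Use.SHAREABLE, Use.SHAREABLE) in pairs:
        return 2  # candidate for joint planning
    return 0  # assumed default: any serialization is acceptable


# Placing box 3 blocks the doorway for good; placing box 1 only passes through.
place_box3 = {"doorway": Use.CONTINUAL}
place_box1 = {"doorway": Use.CONTAINED}
print(arrange_from_resources(place_box3, place_box1))
```

In the boxes example this returns 1, meaning the box that only passes through the doorway should be placed before the box that blocks it.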
There are several strategies for estimating the resources consumed by an operator
at abstraction level $i$. The first is simply to use the resources required by the associated concrete operator. We will refer to this as the 0th-order estimate. In many
situations this may be enough. If we wish to make a more informed estimate, we
can include the resources required by the hidden preconditions. We can compute a
preimage of the preconditions for level $i - 1$ and keep track of the operators used in
that computation. We add to our resource estimate by including 0th-order
estimates of resources consumed by those operators. We will consider this a 1st-order
estimate.
We can extend this by going further back at level $i - 1$ and by considering preconditions at level $i - 2$. Thus, a 2nd-order estimate would use a 0th-order estimate for a
preimage of preconditions at level $i - 2$ and a 1st- and 0th-order estimate for operators
in the first and second preimages, respectively, of preconditions at level $i - 1$. Note
that in calculating these estimates we are doing a limited search for a plan. Trying
to compute increasingly complex preimages eventually boils down to solving the full
planning problem and will negate any computational savings from hierarchy. We
found that 2nd-order estimates were sufficient for our purposes.
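The idea of increasingly detailed estimates can be sketched recursively. The version below is a simplification and not the procedure used in the thesis: it assumes that the operators needed for the hidden preconditions of each operator are already known (the hidden_ops table), and it only follows one preimage per level rather than going further back at the same level. All names and data are hypothetical.

```python
def resource_estimate(op, order, own_resources, hidden_ops):
    """Approximate the resources an abstract operator will consume.

    own_resources -- dict: operator -> set of resources its concrete form uses
    hidden_ops    -- dict: operator -> operators needed to achieve the
                     preconditions hidden at the next level down (assumed given)
    order         -- 0 uses only the operator's own resources; each extra
                     order looks one level further into hidden preconditions
    """
    estimate = set(own_resources[op])
    if order > 0:
        for child in hidden_ops.get(op, ()):
            estimate |= resource_estimate(child, order - 1,
                                          own_resources, hidden_ops)
    return estimate


own = {"Place": {"goal_region"}, "Pick": {"robot_hand"}, "Move": {"path"}}
hidden = {"Place": ["Pick"], "Pick": ["Move"]}
for k in (0, 1, 2):
    print(k, sorted(resource_estimate("Place", k, own, hidden)))
```

Each additional order widens the estimate, which is why very high orders approach the cost of solving the full planning problem.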
Chapter 4
Related Work
This chapter provides a brief overview of the related work from the planning literature.
4.1 Symbolic planning
The notion of serializable subgoals is due to Korf [23]. He analyzed planning as a
knowledge-guided search problem and explored the utility of subgoals in the planning
process. He defines serializable subgoals: subgoals that can be planned for sequentially without undoing previous subgoals. In Korf's terminology our refinement procedure is trying to find orders of operators such that we can serialize the corresponding
subgoals. Barrett and Weld explore this issue further and introduce the concept of
laboriously serializable subgoals: subgoals for which a non-trivial number of orderings
do not serialize [3]. This characterization shares many similarities with the situations
in which abstract planning can be inefficient.
Beginning in 2001, the discovery of high-quality domain-independent heuristics
for symbolic planning has led to a rapid increase in the abilities of symbolic planners to
solve classic benchmark problems. Hoffmann's Fast-Forward system, and corresponding heuristic, was the first heuristic planner to show reasonable performance across a
wide number of problems [18]. FF makes use of a relaxed planning graph, a planning
problem in which no fluents become false as the result of an operator and multiple
operators can be applied in parallel, to estimate distances to a goal for forward search.
The forward search algorithm used is a greedy hill-climbing algorithm. To maintain
completeness they resort to a more standard backtracking search if that fails.
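The relaxation behind this kind of heuristic can be illustrated with a short Python sketch. It builds the layers of a relaxed planning graph by ignoring delete effects and applying all applicable operators in parallel, then reports how many layers are needed before the goal holds. This layer count is a simpler quantity than the estimate FF itself uses, and the operator encoding and example domain are assumptions made for illustration.

```python
def relaxed_layers(state, goal, operators):
    """Count relaxed-planning-graph layers needed before the goal holds.

    state, goal -- sets of fluents
    operators   -- list of (preconditions, add_effects) pairs of sets;
                   delete effects are ignored, which is the relaxation
    Returns the layer count, or None if the goal is unreachable.
    """
    reached = set(state)
    layers = 0
    while not goal <= reached:
        # Apply every applicable operator in parallel.
        new = set()
        for pre, add in operators:
            if pre <= reached:
                new |= add
        if new <= reached:
            return None  # fixed point without reaching the goal
        reached |= new
        layers += 1
    return layers


# Hypothetical pick-and-place domain.
operators = [
    ({"at_start"}, {"at_box"}),            # move to the box
    ({"at_box"}, {"holding"}),             # pick the box
    ({"at_box"}, {"at_goal"}),             # move to the goal
    ({"holding", "at_goal"}, {"placed"}),  # place the box
]
print(relaxed_layers({"at_start"}, {"placed"}, operators))
```

Because fluents never become false, the robot counts as being at the box and at the goal simultaneously, so the relaxed problem is much easier than the real one.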
Fast-Downward improved on the state of heuristic search by doing small searches
in an abstract space to create heuristic estimates [15]. It uses a causal graph heuristic
to automatically generate abstractions which are not accurate enough for direct hierarchical search, but provide estimates which serve as good heuristics. This system
shares the use of abstraction to reduce search with HPN approaches, but we use
abstraction for search control rather than using it to get heuristic estimates. It would
be interesting to see if the Fast-Downward heuristic, using the already existing hierarchy, could be used to speed up planning at a particular level of abstraction within
HPN.
The most recent advance in symbolic planning comes in the form of the LAMA
planner and is due to Richter and Westphal [26]. LAMA makes use of ordered landmarks, formulas which must become true at some point along any solution to a planning problem, to define a pseudo-heuristic. The pseudo-heuristic counts the number
of landmarks that have not been achieved on this plan and is not a true heuristic because it depends on the search path as well as the state being evaluated. LAMA also
integrates cost-optimization into its search. An initial solution is found through
greedy hill climbing. Then, a series of weighted A* searches, which find solutions
with increasing optimality guarantees, are run until a set time limit expires. LAMA
introduces multi-queue heuristic search to use heuristic information from multiple
heuristic functions to guide search. While these algorithms have proved quite effective on IPC (International Planning Competition) benchmarks, they do not scale up
to the long-horizon problems faced by a robotic agent.
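The path dependence of a landmark-counting estimate can be seen in a small Python sketch. It treats each landmark as a set of fluents and ignores landmark orderings, so it is a simplified stand-in rather than LAMA's actual computation; the function name and example data are hypothetical.

```python
def landmark_count(landmarks, path_states):
    """Number of landmarks not yet achieved along the path searched so far.

    landmarks   -- dict: landmark name -> set of fluents that must all hold
    path_states -- list of states (sets of fluents) visited on this path
    """
    achieved = set()
    for state in path_states:
        for name, fluents in landmarks.items():
            if fluents <= state:
                achieved.add(name)
    return len(landmarks) - len(achieved)


landmarks = {"holding": {"holding(box)"}, "placed": {"at(box, goal)"}}

# Two paths that end in the same state but have different histories.
path_a = [{"at(robot, start)"}, {"holding(box)"}, {"at(robot, start)"}]
path_b = [{"at(robot, start)"}]
print(landmark_count(landmarks, path_a), landmark_count(landmarks, path_b))
```

The two paths end in the same state yet receive different values, which is why the count is described as a pseudo-heuristic.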
Dornhege et al. attempt to extend classical planning to more complicated domains by using external modules called semantic attachments [11]. These semantic
attachments allow designers to specify arbitrary code to test whether a fluent is true
in a world state or compute the effects of an action. The effect applicator modules
are analogous to our choose functions in operators, except that ours is used for regression planning and theirs for forward chaining. Semantic attachments enable the
planners they consider to avoid fully enumerating complicated effects or fluents for
each possible world state. They use these modules to consider a variant of the logistics domain that, similar to the domain used in our experiments, accounts for the
geometry of packages. This domain differs from ours in that they do not consider
the task of actually placing objects in vehicles and instead only check that there is a
feasible packing for the objects being considered.
4.2 Hierarchical planning
Precondition-dropping abstractions in hierarchical planning were first studied by Sacerdoti in his system, ABSTRIPS [27]. Preconditions with lower criticalities were considered details and dropped from initial planning problems. Sacerdoti's criterion for
determining which preconditions were details was the ability to find a short plan to
achieve them without violating preconditions from higher levels. ABSTRIPS differs
from RCHPN in that it finds a full plan at each level before refining and does not
consider reordering or combining subgoals.
Knoblock provided a more formal definition and analysis of refinement as well as
a system, ALPINE, to automatically derive hierarchies [22]. His definition requires
that ordering relations between operators must be preserved when refining a plan. He
defines an ordered monotonic (OM) refinement as one where new operators do not
change any fluents used in the abstract plan. He argues that hierarchies for which
all refinements are OM will be effective in problem solving and describes a system
which can find OM hierarchies. The drawback of this approach is that, while OM
hierarchies are effective, this property can be overly restrictive and many problems
may not admit an OM hierarchy.
Bacchus and Yang modeled a hierarchical planner as a branching probabilistic
process and analyzed the expected amount of computation as a function of the probability that a
particular subproblem could be refined [2]. Their model predicts that abstract planning should be efficient if all subproblems can be refined (i.e. no backtracking across
levels of the hierarchy) or if the probability of refinement is very small, as bad plans
are quickly ruled out. They define hierarchies with the downward refinement property as hierarchies where every abstract plan can be refined. They define conditions
under which this can be achieved and present a system which uses these conditions
to improve on ALPINE hierarchies.
Nau et al. use a hierarchical task network (HTN) to hierarchically solve planning
problems [25].
In their setting, the goals are tasks, which have preconditions and
effects, but also specify the possible refinements. The components of these refinements
can themselves be abstract tasks. Nau et al. attempt to deal with optimality in several
ways. The most prevalent of these does a branch and bound search through the space
of task refinements. However, the costs used must be fully specified beforehand, which
requires a large amount of work on the part of the system designer. They attempt to
interleave abstract tasks, but do so in a blind, non-deterministic way.
Marthi et al. suggest a view of abstract actions centered around upper and lower
bounds on reachable sets of states [25]. They use angelic nondeterminism in addition
to upper and lower bounds on costs to find optimal plans. They do this both in
offline and realtime settings, providing hierarchical versions of A* and LRTA*. These
searches amount to heuristic search through the possible refinements of a high level
action. Their most effective algorithm, Hierarchical Satisficing Search, is similar to
the approach taken in HPN in that it commits to the best high level action which
can provably reach the goal within a cost bound. This is beneficial in that execution
will only begin if there is a proof that the task can be accomplished within the
bound. However, if the abstract level is ambiguous between several plans (i.e. different
orderings of the same HLAs), then they may miss an opportunity to reduce cost.
Factored planning generalizes hierarchical planning to decompose a planning problem into several factors. Factors are solved on their own, treating the problems solvable by other factors as abstract actions.
A solution for a problem is frequently
computed in a bottom-up manner, with factors computing preconditions and effects
that they publicize to other factors [1]. These planners exhibit local optimality in
that plans within a factor are optimal with respect to that factor but do not make
any attempt at global optimality. Furthermore, they have not been shown to scale
up to problems of the size necessary for a real robotics problem.
Srivastava and Kambhampati [29] decompose planning into causal reasoning and
resource scheduling. They plan initially in an abstract space where similar entities are
treated as the same and are scheduled in a later phase. This decomposition enables
them to scale standard planning domains and take advantage of similar objects in
a domain (e.g. two different robot hands) without increasing planning time. These
approaches are similar to ours in that our heuristics use a similar decomposition.
However, our system uses the decomposition to do online execution cost optimization
while their system uses this knowledge in order to scale up or optimize a classical
planner.
4.3 Partial orders and planning
The use of partial orders in planning is an old idea and dates back to Sacerdoti's
NOAH system [28]. Most uses of partial-order planning can be viewed as alternative,
non-hierarchical, planning algorithms where the goal is simply to find another plan.
The partial order planner that shares the most with our solution is the final version of
Prodigy [31]. Prodigy searches by maintaining a totally ordered 'head' and partially
ordered 'tail' for a plan. The state that results from executing the head of the plan is
the 'current' state. Planning proceeds by adding an operator to the tail or by adding
an operator from the tail, whose preconditions are satisfied in the current state, to the
head plan. This is similar to interleaving planning with execution because, although
it is simulated, the current state can be used to guide planning for the tail.
However, Prodigy solves problems in a single planning step and falls prey to the same
types of issues as other non-hierarchical planners.
Bäckström studied the problem of de-ordering or re-ordering a plan [8]. He considers modifying plans to find solutions with fewer constraints or to reduce parallel
execution time. He proposes several definitions for an optimal re-ordering and shows
that only the simplest of these is tractable to achieve. However, he finds a class of
plans for which determining an optimal de-ordering is efficient. Our work implicitly
relies on the de-ordered plan, but does not explicitly compute it. We do re-ordering,
but our goal is to minimize the execution cost of a hierarchical planner, which is not
a case Bäckström considers.
The closest use of partial orders in planning to RCHPN is due to Hoffmann, Porteous, and Sebastia [16]. They use partial orders between landmarks to guide search.
Hoffmann et al. introduce landmarks and provide techniques for automatically finding
landmarks for a planning problem using a planning graph. They define several types
of ordering relations between landmarks, one of which, reasonable orders, deals with
landmarks that have to be undone and redone if achieved out of order. Subgoals in an
abstract plan become landmarks when we consider search at the next level. One way
to view the ordering issues we see in hierarchical planning is as violations of reasonable orders. Hoffmann et al. treat the landmarks as a partially ordered abstract plan
and greedily plan for the closest unachieved landmark. This form of search control is
similar to ours, but we use heuristics to select a good subgoal to plan for next.
Chapter 5
Evaluation & Experiments
This chapter defines the marsupial logistics domain and lays out a candidate hierarchical decomposition of this domain. It presents experiments to evaluate the usefulness
of RCHPN in marsupial logistics. It concludes with a discussion of the issues associated with learning ordering rules for marsupial logistics and presents some results for
learning in a simple context.
5.1 Transportation domain with marsupial robots
We tested the RCHPN approach in a complex transportation domain, which is an
extension of a classical abstract logistics domain [32]. The goal is to transport several
packages to destination locations. The locations are grouped into cities: trucks can
move among locations within a city. Some locations in a city are airports: planes
can move among airports. Each truck has a geometrically constrained cargo area and
carries a "marsupial" robot. This robot can be thought of as an idealized forklift that
can move packages within the cargo area and onto and off of the truck. A plan for
transporting a package to a goal location will typically consist of transporting it (in a
truck) to an airport, flying it to the correct city, and then transporting it to the goal
location. Each time a package is loaded onto or removed from a truck, there will be a
detailed motion plan for a forklift. Fig. 5-1 depicts a graphical representation of this
domain.
Figure 5-1: Visualization of the Marsupial Logistics Domain. Circles are locations and
pink circles are airports. The additional windows represent the loading and storage
areas of the vehicles. The red squares represent a marsupial robot which takes care
of storing packages for transit. In order for vehicles to move, all packages, as well as
the loader, must be on one of the beige squares. Package 2 is about to be unloaded
at airport-1 so it can be flown to a destination.
[Figure 5-2: planning-tree diagram. Its nodes are Load and Unload operators for packages 0 and 1 at abstraction levels A0 through A2, for example A0:Unload(package: 1, truck, loc3) and A0:Load(package: 1, truck, loc2), with two Reorder steps marked.]
Figure 5-2: The root of a planning tree for a simple problem in the marsupial logistics
domain that involves transporting two packages to another location within the same
city. At the high level, the Unload operators are recognized as overlapping on a
shareable resource (truck) and are combined. In refining Plan 3, the Load operator is
determined to overlap with the Unload operator on both the shareable resource of the
truck and the contained resource of the truck's location. It is reordered to be before
the first Unload because it is estimated, greedily, as being easier to achieve from the
current state. If there was not enough space in the truck, then the truck would not
be considered shareable and the ordering would remain unchanged.
The HPN framework supports using real robot kinematics and continuous geometry
for managing objects inside the trucks. For efficiency in these experiments, however,
we use a simplified version of the geometry in which the cargo hold is discretized
into a grid of locations; the robot occupies one grid location and can move in the
four cardinal directions. Each "package" takes up multiple cells and is shaped like
a Tetris piece. This model retains the critical aspects of reasoning about the details
and order of operations within the truck (even determining whether a set of objects
can be packed into a truck is, in general, NP-complete [10]). We can also see it as an
instance of the navigation among movable obstacles (NAMO) problem in a discrete
space [30].
To load a package onto a truck, for example, it might be necessary to
move, or even unload and reload other packages that are currently in the truck.
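The discretized cargo hold described above can be illustrated with a small sketch. This is not the thesis implementation: the shape table, the function names, and the (column, row) cell convention are assumptions made for illustration only.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

Cell = Tuple[int, int]  # (column, row) index into the cargo grid (assumed convention)

# Hypothetical Tetris-like shapes, stored as cell offsets from an anchor cell.
SHAPES: Dict[str, FrozenSet[Cell]] = {
    "I": frozenset({(0, 0), (0, 1), (0, 2), (0, 3)}),
    "O": frozenset({(0, 0), (1, 0), (0, 1), (1, 1)}),
    "L": frozenset({(0, 0), (0, 1), (0, 2), (1, 0)}),
}


def footprint(shape: str, anchor: Cell) -> Set[Cell]:
    """Cells covered by `shape` when its anchor sits at `anchor`."""
    ax, ay = anchor
    return {(ax + dx, ay + dy) for dx, dy in SHAPES[shape]}


def in_bounds(cell: Cell, width: int, height: int) -> bool:
    """True if the cell lies inside a width x height grid."""
    return 0 <= cell[0] < width and 0 <= cell[1] < height


def fits(shape: str, anchor: Cell, occupied: Set[Cell], width: int, height: int) -> bool:
    """True if the package lies inside the grid and overlaps no occupied cell."""
    cells = footprint(shape, anchor)
    return all(in_bounds(c, width, height) for c in cells) and not (cells & occupied)


def loader_moves(loader: Cell, occupied: Set[Cell], width: int, height: int) -> List[Cell]:
    """Free neighboring cells the loader can reach with one cardinal move."""
    x, y = loader
    steps = [(x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)]
    return [c for c in steps if in_bounds(c, width, height) and c not in occupied]
```

Because a package covers several cells, a single misplaced package can block both the load location and the loader's path, which is why loading may require moving other packages first.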
5.1.1 Fluent specification
This section provides a formal description of the fluents used in marsupial logistics.
Each fluent specifies a test function which will enable us to determine its truth value
in a given world model. SweptVolume is a function that takes a path, package, and
grasp as arguments and computes the region that must be clear for a loader to traverse
that path holding that package with that grasp.
* In(package, vehicle)
test: package ∈ vehicle.objects
* At(vehicle, location)
test: vehicle.location = location
* PkgLoc(package, vehicle, gridLoc)
test: vehicle.objLoc[package] = gridLoc
* LoaderLoc(vehicle, gridLoc)
test: vehicle.loaderLoc = gridLoc
* LoaderHolding(vehicle, package, grasp)
test: vehicle.heldObject = package ∧ vehicle.loaderGrasp = grasp
* ClearPath(path, grasp, package, vehicle)
test: ∀ gridLoc ∈ sweptVolume(path, grasp, package), ¬blocked(gridLoc, vehicle)
* SameCity(package, vehicle)
test: ∃ {loc_i} s.t. loc_0 = package.location, connected(loc_i, loc_{i+1}), loc_n = vehicle.location
* Packed(vehicle)
test: ∀ p ∈ vehicle.objects, vehicle.objLoc[p] ∈ vehicle.storageRegion
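As a rough illustration of how such test functions might look in code, the sketch below evaluates a few of the fluents against a toy world model. The `Vehicle` class, its field names, and the simplification that each package is located by a single grid cell are assumptions for this example, not the data structures used in the thesis.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set, Tuple

Cell = Tuple[int, int]


@dataclass
class Vehicle:
    """Toy world-model entry for one vehicle (hypothetical field names)."""
    location: str
    objects: Set[str] = field(default_factory=set)
    obj_loc: Dict[str, Cell] = field(default_factory=dict)
    loader_loc: Cell = (0, 0)
    held_object: Optional[str] = None
    loader_grasp: Optional[str] = None
    storage_region: Set[Cell] = field(default_factory=set)


def in_vehicle(package: str, vehicle: Vehicle) -> bool:
    """In(package, vehicle)"""
    return package in vehicle.objects


def at(vehicle: Vehicle, location: str) -> bool:
    """At(vehicle, location)"""
    return vehicle.location == location


def pkg_loc(package: str, vehicle: Vehicle, grid_loc: Cell) -> bool:
    """PkgLoc(package, vehicle, gridLoc)"""
    return vehicle.obj_loc.get(package) == grid_loc


def loader_holding(vehicle: Vehicle, package: Optional[str], grasp: Optional[str]) -> bool:
    """LoaderHolding(vehicle, package, grasp)"""
    return vehicle.held_object == package and vehicle.loader_grasp == grasp


def packed(vehicle: Vehicle) -> bool:
    """Packed(vehicle): every carried package sits inside the storage region."""
    return all(vehicle.obj_loc[p] in vehicle.storage_region for p in vehicle.objects)
```

Each function is a pure test on the world model, so the planner can evaluate goal fluents without side effects.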
5.1.2 Operator specification
This section formalizes the operator schemas used for marsupial logistics. Operators
are divided into 3 categories: logistics operators, marsupial operators, and inference
operators. Logistics operators describe actions for loading and unloading packages
into vehicles, as well as moving vehicles between locations.
Marsupial operators
describe actions for manipulating packages within a vehicle. Inference operators enumerate preconditions for derived predicates; e.g., locations for objects such that a
vehicle is packed. They enable our regression-based planner to create subgoals for derived predicates.
Logistics operators have cost 10, marsupial operators
have cost 1, and inference operators have cost 0. We list the resources that primitive operators consume.
This listing does not classify resource use as contained,
shareable, or continual because those classifications are done with respect to abstract
operators and are left up to the arrange function. Operator schemas also include the
precondition criticalities that define the hierarchy we used for this domain.
* Load(package, vehicle, location):
res: In(package, vehicle),
PkgLoc(package, vehicle, vehicle.loadLoc)
pre:
1. Reachable(location, vehicle), At(package, location)
2. At(vehicle, location)
3. Clear(vehicle, loadRegion)
cost: 10
consumes: vehicle, vehicle.loadRegion, vehicle.location
* Unload(package, vehicle, location):
res: At(package, location)
pre:
1. Reachable(location, vehicle)
2. SameCity(package, vehicle)
3. In(package, vehicle)
4. At(vehicle, location)
5. PkgLoc(package, vehicle, vehicle.loadLoc)
cost: 10
consumes: vehicle, vehicle.loadRegion, vehicle.location
* Travel(vehicle, startLoc, resultLoc):
res: At(vehicle, resultLoc)
pre:
1. At(vehicle, startLoc), Connected(startLoc, resultLoc, vehicle)
2. Packed(vehicle)
cost: 10
consumes: vehicle, vehicle.location
* LoaderGrasp(vehicle, package, grasp, gridLoc):
res: LoaderHolding(vehicle, package, grasp)
choose: loaderLoc ∈ GraspLocations(gridLoc, grasp),
pickPath ∈ Paths(vehicle.loaderHome, targetLoc)
pre:
1. LegalGrasp(package, grasp, gridLoc, vehicle),
ClearPath(pickPath, grasp, package, vehicle),
PkgLoc(package, vehicle, gridLoc)
2. LoaderHolding(vehicle, None, None)
3. LoaderLoc(vehicle, loaderLoc)
cost: 1
consumes: vehicle.loader, loaderLoc, pickPath
* LoaderPlace(vehicle, package, gridLoc, grasp):
res: PkgLoc(package, vehicle, gridLoc)
choose: loaderLoc ∈ GraspLocations(gridLoc, grasp),
placePath ∈ Paths(vehicle.loaderHome, targetLoc)
pre:
1. LegalGrasp(package, grasp, gridLoc, vehicle), In(package, vehicle)
2. ClearPath(placePath, gridLoc, grasp, package, vehicle)
3. LoaderHolding(vehicle, package, grasp)
4. LoaderLoc(vehicle, loaderLoc)
cost: 1
consumes: vehicle.loader, gridLoc, loaderLoc, placePath
* LoaderMove(vehicle, targetLoc, package, grasp):
res: LoaderLoc(vehicle, targetLoc)
choose: path ∈ Paths(vehicle.loaderHome, targetLoc)
pre:
1. LegalGrasp(package, grasp, targetLoc, vehicle)
2. ClearPath(path, grasp, package, vehicle)
3. LoaderHolding(vehicle, package, grasp)
consumes: vehicle.loader, p
* SameCity(package, vehicle):
res: SameCity(package, vehicle)
choose: loc ∈ ReachableLocs(vehicle)
pre:
1. ∅
2. At(package, loc)
cost: 0
consumes: vehicle, package
* Pack(vehicle):
res: Packed(vehicle)
choose: loc[pkg] ∈ vehicle.storageRegion ∀ pkg s.t. In(pkg, vehicle),
loaderLoc ∈ vehicle.storageRegion
pre:
1. ∅
2. PkgLoc(pkg, vehicle, loc[pkg]) ∀ pkg s.t. In(pkg, vehicle)
3. LoaderLoc(vehicle, loaderLoc)
cost: 0
consumes: vehicle, vehicle.storageRegion
* ClearPath(path, grasp, package, vehicle):
res: ClearPath(path, grasp, package, vehicle)
choose: loc[pkg] ∈ vehicle.storageRegion ∀ pkg s.t. overlaps(pkg, path)
pre:
1. ∅
2. PkgLoc(pkg, vehicle, loc[pkg]) ∀ pkg s.t. In(pkg, vehicle)
cost: 0
consumes: vehicle.loader
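One way to picture a schema with precondition criticalities is as a record whose preconditions are stored in numbered groups. The sketch below encodes the Unload operator from the listing above. The class, its method names, and in particular the rule that abstract version A_k keeps the first k + 1 groups are assumptions made for illustration; the thesis text in this section does not spell out that mapping.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class OperatorSchema:
    """Hypothetical container for one operator schema."""
    name: str
    result: Tuple[str, ...]
    # preconditions[i] holds the fluents listed under criticality i + 1.
    preconditions: Tuple[Tuple[str, ...], ...]
    cost: int
    consumes: Tuple[str, ...]

    def abstract_preconditions(self, level: int) -> List[str]:
        """Fluents kept by abstract version A_level.

        Assumption: A_level keeps the first level + 1 criticality groups,
        so higher levels are more concrete.
        """
        kept = self.preconditions[: level + 1]
        return [fluent for group in kept for fluent in group]

    def num_levels(self) -> int:
        """Number of abstraction levels, one per criticality group."""
        return len(self.preconditions)


UNLOAD = OperatorSchema(
    name="Unload(package, vehicle, location)",
    result=("At(package, location)",),
    preconditions=(
        ("Reachable(location, vehicle)",),
        ("SameCity(package, vehicle)",),
        ("In(package, vehicle)",),
        ("At(vehicle, location)",),
        ("PkgLoc(package, vehicle, vehicle.loadLoc)",),
    ),
    cost=10,
    consumes=("vehicle", "vehicle.loadRegion", "vehicle.location"),
)
```

Under this reading, planning with `UNLOAD.abstract_preconditions(0)` ignores where the package sits inside the truck, and those details are only considered once the plan is refined.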
5.2 Experiments and results
We designed experiments to compare a classical non-hierarchical planner called FF [18],
HPN, and RCHPN. FF is a fast, easy-to-use classical planning algorithm. However,
even small instances of the marsupial transportation domain are intractable for FF.
To demonstrate this, we ran FF on an instance with 8 locations, 2 of which were
airports; a single truck per airport; one plane; and a single package which occupied a
single location on the grid. The package needed to be transported from a location to
the airport it was not connected to. Even on this problem, FF took slightly less than
7.5 hours to find a solution of length 62. The PDDL domain and problem files, as well
as the solution FF found, are shown in Appendix A. There have been improvements
in this class of planners [26], but they cannot ultimately address the fundamental
problem that we need to search over a long horizon with a large branching factor to
solve even the simplest problems in this domain.
We altered the basic HPN algorithm so that it solves easy problems more quickly
at the cost of a small increase in computation time on other problems.
Given a
conjunctive goal, we first check for the existence of a plan for a random serialization
of the fluents; this will succeed very quickly in problems with many goals that are
independent at the current level of abstraction and usually fails quickly otherwise. If
it fails, we search for a monotonic plan (one that never causes a goal fluent that is
already true to be made false). Should we fail to find a monotonic plan, we execute a
standard backward search. These are standard modifications to backchaining planners
and do not affect the overall correctness of the algorithm [12].
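The staged strategy above amounts to a chain of fallbacks. The sketch below shows only that control flow; the three planner callables are hypothetical stand-ins supplied by the caller, each returning a plan or `None`, and are not functions from the HPN code base.

```python
import random
from typing import Callable, List, Optional, Sequence

Plan = List[str]
Planner = Callable[[Sequence[str]], Optional[Plan]]


def plan_conjunctive_goal(
    goal: Sequence[str],
    plan_in_order: Planner,   # hypothetical: plans for the fluents in the given order
    plan_monotonic: Planner,  # hypothetical: never falsifies an already-true goal fluent
    plan_backward: Planner,   # hypothetical: standard backward search
    rng: Optional[random.Random] = None,
) -> Optional[Plan]:
    """Try the cheap strategies first and fall back to full search."""
    rng = rng if rng is not None else random.Random()

    # Stage 1: a random serialization of the goal fluents.
    ordering = list(goal)
    rng.shuffle(ordering)
    plan = plan_in_order(ordering)
    if plan is not None:
        return plan

    # Stage 2: search restricted to monotonic plans.
    plan = plan_monotonic(goal)
    if plan is not None:
        return plan

    # Stage 3: unrestricted backward search.
    return plan_backward(goal)
```

The later stages only run when the earlier ones fail, which is where the small extra cost on harder problems comes from.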
At the lowest levels of abstraction, we use a motion planner to determine detailed
placements of packages and motions of the robot. The motion planner could be something like an RRT in the continuous configuration space of robot and packages [24];
in this work, it is an implementation of A* in the discretized geometry of a truck.
The ability to elegantly use a specialized planner to solve sub-problems is a benefit
of the HPN approach and is key to its ability to tractably solve problems in this
domain. Work on integrating modules into standard symbolic architectures attempts
to enable similar benefits for standard symbolic planners, but places restrictions on
the types of planners that can be used [11].
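For readers unfamiliar with grid search, a minimal A* over a 4-connected grid is sketched below. It is a generic textbook version with unit step costs and a Manhattan-distance heuristic, offered as an illustration of the kind of low-level planner mentioned above, not as the planner used in the experiments; all names are illustrative.

```python
import heapq
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]


def astar(start: Cell, goal: Cell, blocked: Set[Cell],
          width: int, height: int) -> Optional[List[Cell]]:
    """Shortest 4-connected path from start to goal, or None if unreachable."""

    def heuristic(cell: Cell) -> int:
        # Manhattan distance is admissible for unit-cost cardinal moves.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(heuristic(start), 0, start)]  # entries are (f, g, cell)
    came_from: Dict[Cell, Optional[Cell]] = {start: None}
    best_g: Dict[Cell, int] = {start: 0}

    while frontier:
        _, g, current = heapq.heappop(frontier)
        if current == goal:
            # Walk the parent pointers back to the start.
            path = []
            node: Optional[Cell] = current
            while node is not None:
                path.append(node)
                node = came_from[node]
            return path[::-1]
        if g > best_g[current]:
            continue  # stale queue entry; a cheaper route was already found
        x, y = current
        for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if not (0 <= nxt[0] < width and 0 <= nxt[1] < height) or nxt in blocked:
                continue
            new_g = g + 1
            if new_g < best_g.get(nxt, float("inf")):
                best_g[nxt] = new_g
                came_from[nxt] = current
                heapq.heappush(frontier, (new_g + heuristic(nxt), new_g, nxt))
    return None
```

In the loader setting, `blocked` would hold the cells covered by packages, and the swept volume of a held package would have to be checked as well; that extension is omitted here.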
In designing the arrange function, we must determine the resources used by abstract versions of operators and categorize those resources as shareable, contained,
or continual. Our implementation considers rearranging abstract versions of Load,
Unload, and SameCity. Other operators only appear lower in the hierarchy and the
plans they appeared in were frequently quite constrained; the computational effort to
reorder them is not worth it.
We estimated the abstract resource use of Load with a 0th-order estimate. For
SameCity, we used a 1st-order estimate. This allowed us to expose the resources
used to unload a package in this particular city. We used a 2nd-order estimate for
abstract Unloads. We estimated the resource use to include the implicit SameCity
precondition. At lower levels in the hierarchy, we consider the free space resource
used in placing a package in the load region. We do not consider free space earlier
because, unless we know where a package is within the vehicle, it is hard to make
any useful assessment of this resource. This illustrates the utility of optimizing in the
now; we can postpone optimization as well as planning.
We adopted the convention that a vehicle resource was shareable for two goals if
there was an arrangement of packages, including packages mentioned in the goal, that
fit in the vehicle. We estimated this with a greedy method that iteratively placed
packages as far towards the back of the vehicle as possible, preferring placements towards the sides as a tiebreaker. An example execution of a simple plan with reordering
and combining of subgoals is illustrated in Fig. 5-2.
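The greedy shareability estimate can be sketched as follows. The grid convention (a larger row index is further back), the shape encoding, and the exact tiebreak are assumptions for this example; the thesis only states that packages are placed as far back as possible with a preference for the sides.

```python
from typing import List, Optional, Sequence, Set, Tuple

Cell = Tuple[int, int]  # (column, row); a larger row index is assumed to be further back


def greedy_pack(shapes: Sequence[Sequence[Cell]],
                width: int, depth: int) -> Optional[List[Cell]]:
    """Place each shape in turn; return the chosen anchors, or None if one does not fit."""
    occupied: Set[Cell] = set()
    anchors: List[Cell] = []
    for shape in shapes:
        best = None  # (sort key, anchor, covered cells)
        for ax in range(width):
            for ay in range(depth):
                cells = {(ax + dx, ay + dy) for dx, dy in shape}
                if any(not (0 <= x < width and 0 <= y < depth) for x, y in cells):
                    continue  # sticks out of the cargo area
                if cells & occupied:
                    continue  # collides with an earlier placement
                backness = min(y for _, y in cells)
                side_gap = min(min(x, width - 1 - x) for x, _ in cells)
                key = (-backness, side_gap)  # furthest back first, then nearest a side
                if best is None or key < best[0]:
                    best = (key, (ax, ay), cells)
        if best is None:
            return None  # the greedy estimate judges the vehicle not shareable
        occupied |= best[2]
        anchors.append(best[1])
    return anchors
```

Since the procedure is greedy, a `None` result does not prove that no packing exists; it only means this estimate would treat the vehicle as not shareable.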
[Figure 5-3 plot: "Decrease in Plan Cost vs Problem Size". Horizontal axis: Number of Packages (3 to 10). Series: Multiple Origin/Multiple Destination, Single Origin/Multiple Destination, Multiple Origin/Few Destination, Single Origin/Single Destination.]
Figure 5-3: Average percent decrease in plan cost vs. problem size for RCHPN vs.
HPN. This figure depicts results across four experimental conditions. 'Multiple Origin/Multiple Destination' shows the execution cost savings when there is little room
for improvement. As expected, it shows very little difference between HPN and RCHPN.
'Single Origin/Single Destination' depicts execution cost reductions when there is a
large potential for execution cost savings. In this setting we see up to 30% improvements. The 'Single Origin/Multiple Destination' and 'Multiple Origin/Few Destination' conditions are intermediate between the other two conditions. As expected,
they also show an intermediate reduction in cost.
We defined a distribution over planning problems within this domain and tested
on samples from that distribution. Each instance had 5 airports, with 4 locations
connected to each airport. The layout of each airport and connected locations was
randomly selected from a class of layouts: circular (roads between locations are connected in a circle), radial (each location is directly connected to the airport, but not
to other locations), linear (the same as circular with one connection dropped), and
connected (each location was connected to each other location). There was a single
truck to do the routing within each city and a single plane to route between the
airports. For the vehicles, the cargo area for packages was randomly selected to be
either small (3x6) or large (4x8). Packages were randomly selected from a set of 6
shapes.
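A sampler for this problem distribution might look like the sketch below. Several details are assumptions because the text leaves them open: the airport is treated as one node of the ring in the circular and linear layouts, the dropped connection in the linear layout is the one closing the ring, the connected layout includes the airport, and all identifiers are invented for the example.

```python
import random
from itertools import combinations
from typing import Dict, List, Set, Tuple

Road = Tuple[str, str]
LAYOUTS = ("circular", "radial", "linear", "connected")
CARGO_SIZES = ((3, 6), (4, 8))  # small and large cargo areas


def city_roads(airport: str, locations: List[str], layout: str) -> Set[Road]:
    """Undirected roads for one city under the given layout."""
    if layout == "radial":
        return {(airport, loc) for loc in locations}
    if layout == "connected":
        return set(combinations([airport] + locations, 2))
    if layout not in ("circular", "linear"):
        raise ValueError(f"unknown layout: {layout}")
    ring = [airport] + locations  # assumption: the airport sits on the ring
    roads = {(ring[i], ring[(i + 1) % len(ring)]) for i in range(len(ring))}
    if layout == "linear":
        roads.discard((ring[-1], ring[0]))  # assumption: drop the closing road
    return roads


def sample_instance(rng: random.Random, n_airports: int = 5,
                    n_locations: int = 4) -> Dict[str, dict]:
    """Sample layouts and truck cargo sizes for every city."""
    cities = {}
    for a in range(n_airports):
        airport = f"airport-{a}"
        locations = [f"loc-{a}-{i}" for i in range(n_locations)]
        layout = rng.choice(LAYOUTS)
        cities[airport] = {
            "layout": layout,
            "roads": city_roads(airport, locations, layout),
            "truck_cargo": rng.choice(CARGO_SIZES),
        }
    return cities
```

Package origins, destinations, and shapes would be sampled on top of this road network according to the experimental regime.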
We ran experiments in four different regimes: maximal parallel structure among
tasks, parallel structure in the destination of packages only, parallel structure in the
origin of packages only, and finally little to no parallel structure. This enables us to
evaluate the performance of RCHPN in situations where there is a large opportunity for
execution cost savings and in situations where there is little to no room for improvement. To do this, we varied the number of potential start locations and destinations
for packages from a single option to a uniform selection from all locations in the domain. We collected data for tasks with 3 to 10 packages and averaged results across
10 trials. For a particular problem we ran both HPN and RCHPN and computed the
ratio of the costs (the sum of the costs of all of the primitive operators executed
during the run) and averaged these ratios across 10 independent runs.
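The reported metric can be computed as in the sketch below. The direction of the ratio (RCHPN cost divided by HPN cost) and the function name are assumptions; the text only says that a ratio of costs was averaged across runs.

```python
from typing import Sequence


def mean_percent_decrease(hpn_costs: Sequence[float],
                          rchpn_costs: Sequence[float]) -> float:
    """Average per-problem cost ratio, expressed as a percent decrease."""
    if len(hpn_costs) != len(rchpn_costs) or not hpn_costs:
        raise ValueError("need one RCHPN cost per HPN cost")
    ratios = [r / h for h, r in zip(hpn_costs, rchpn_costs)]
    mean_ratio = sum(ratios) / len(ratios)
    return 100.0 * (1.0 - mean_ratio)
```

Averaging ratios rather than raw costs keeps a single expensive instance from dominating the comparison.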
For problems with a single origin and destination, RCHPN achieved an average
of 30% improvement, roughly independent of problem size. When either the origin
or destination was dispersed, the average improvement dropped to about 15%. In
this case, smaller problems typically saw less improvement than larger ones. This
is because the more packages, the more likely it is that there is some structure the
planner will be able to take advantage of.
Fig. 5-3 depicts average decrease in
execution cost vs. problem size for our testing regimes.
Our heuristics only apply to packages going from and to similar locations, so when
both package origins and package destinations are distributed widely we expect to
see little improvement. This was borne out in our results, as the multiple origin,
multiple destination experiments saw 5% improvement. However, while we saw little
to no improvement in execution cost on those runs, we also saw little to no increase
in planning time. This illustrates the utility of doing peephole optimization outside
of the main planning loop. Our solution will spend a small amount of time at the
abstract level looking for parallel structure, but if none is found, it proceeds with
planning as normal and incurs a modest overhead. Fig. 5-4 shows this relationship
in detail.
[Figure 5-4 plot: "Increase in Planning Time vs Execution Cost Reduction". Horizontal axis: % Decrease in Execution Cost.]
Figure 5-4: Plot of percent decrease in execution cost vs. percent increase in planning time. Points are color coded to correspond to the data series from Fig. 5-3.
The positive correlation between the two highlights a useful property of optimizing
in a post-processing step: on problems where there is little parallel structure, our
modifications have little to no effect on the planning time. As more parallel structure
is introduced, more planning time is spent utilizing that structure. In some cases
planning time decreased slightly. This is a result of non-determinism in the planner.
5.3 Learning ordering and combining rules
One of the upsides of RCHPN is that the heuristics used to perform the optimization,
though domain specific, are usually quite simple. This creates a hope that it will be
possible to learn these rules from experience. This amounts to learning the arrange
function, as actual control decisions can be deduced from that. This task is made
difficult by the need to learn rules at multiple levels in the hierarchy and by issues with
effectively representing world state.
Interaction between rules at different levels in our hierarchy is one of the key
challenges in approaching this learning problem. We would like to learn rules independently or at least sequentially, learning first at either the highest or lowest levels
and progressing appropriately. Unfortunately, this decomposition is likely to lead to
issues. Many control rules at the high level will only result in good performance if
the appropriate ordering rules can be applied at lower levels. For example, the rule
'unload packages going to the same destination' is only a good idea if lower levels
in the hierarchy know to load both packages before unloading them. However, the
problem distribution we train with at more concrete levels is defined by the control
rules at more abstract levels. We only know to train lower levels to deal with this
type of rule if we already have that rule at higher levels. A possible way to approach
this is through use of coordinate ascent, alternating learning at multiple levels.
Our other key issue is one of representation. The underlying world models for HPN
can be quite complex and large. Representing this compactly while capturing enough
information to enable learning is a difficult task. One approach is to make use of the
logical formalism used for planning. There is some work done on performing statistical
learning with logical representations and we already need to create this representation
for planning [13].
However, these learning techniques usually assume that we are
always learning in the same logical world, so objects in the world are the same across
training instances. We would like to be able to learn control rules for a domain, where
the problem instance can vary according to an arbitrary probability distribution.
Enabling this in general requires finding correspondences between objects in different,
although similar, logical worlds. There are some kernels, such as the pyramid match
kernel, which may be able to cope with this [14].
5.3.1 Learning Experiments
We did some initial experiments with learning for this setting. We focused on input
representation issues and restricted ourselves to learning control rules at the highest
level of the hierarchy, using hand-coded rules for other abstraction levels. We kept the
road networks and locations constant across training instances, with 2 airports and
3 locations per airport, but allowed the number and types of packages to vary. We
re-named packages in descending order of distance (in the road network) to the goal
to avoid the issue of determining object correspondences. We used the truth values of
logical fluents as our basic features, and augmented our features with a binary matrix
that contained contradiction information about each pair of features.
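A feature vector of this kind could be assembled as in the following sketch. The `contradicts` predicate is a hypothetical callable supplied by the user, since the text does not define how contradiction between two features was determined, and the flattened row-major layout of the matrix is likewise an assumption.

```python
from typing import Callable, List, Sequence, Set


def feature_vector(fluents: Sequence[str], state: Set[str],
                   contradicts: Callable[[str, str], bool]) -> List[int]:
    """Truth values of the fluents followed by a flattened pairwise matrix."""
    # Basic features: 1 if the fluent holds in the current state, else 0.
    truth = [1 if f in state else 0 for f in fluents]
    # Pairwise features: 1 if the two fluents contradict each other (row-major order).
    matrix = [1 if contradicts(a, b) else 0 for a in fluents for b in fluents]
    return truth + matrix
```

For n fluents this gives n + n² binary features, so the representation grows quickly with the number of fluents.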
We employed a two-tiered classification strategy with a dual SVM as the underlying classification method and used the libSVM software package to run our experiments [7, 9]. We first trained a classifier to determine when combining subgoals was
beneficial, and trained a different classifier to make subsequent ordering decisions.
We trained on 108 different problem instances. We found that a polynomial kernel of
degree 3 gave the best performance.
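For reference, libSVM's polynomial kernel has the form (gamma * u·v + coef0)^degree. The sketch below writes that kernel out in plain Python and shows one possible reading of the two-tiered decision, in which the ordering classifier is consulted only when the combining classifier declines. That reading, the label strings, and the classifier callables are assumptions, not details given in the text.

```python
from typing import Callable, Sequence


def polynomial_kernel(u: Sequence[float], v: Sequence[float],
                      gamma: float = 1.0, coef0: float = 0.0,
                      degree: int = 3) -> float:
    """Polynomial kernel (gamma * <u, v> + coef0) ** degree."""
    dot = sum(a * b for a, b in zip(u, v))
    return (gamma * dot + coef0) ** degree


def control_decision(features: Sequence[float],
                     combine_clf: Callable[[Sequence[float]], bool],
                     order_clf: Callable[[Sequence[float]], bool]) -> str:
    """Two-tiered decision: first whether to combine, then whether to reorder."""
    if combine_clf(features):
        return "combine"
    return "reorder" if order_clf(features) else "keep"
```

The parameter values shown are placeholders; the text reports only that degree 3 performed best.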
We evaluated our learned rules through leave-one-out cross-validation and compared them to our hand-coded heuristics and to an optimal decision strategy that plans for
each valid ordering or combination separately and selects the option that minimizes
the execution cost. This strategy is computationally infeasible for any reasonably
sized task but gives insight into the best we could hope to do. Overall, the results
were positive. Across the test set, selecting the best option every time results in a
total execution cost of 20173 units. Our hand-coded heuristics incur an additional
572 units of cost over the optimal solution: a 2.8% increase. Our learned control
function saw an increase of 247 units compared to the optimal solution. Although it
remains to be seen if these results generalize to more complicated scenarios, this is
a 56% improvement in the additional cost incurred and is a promising result. Note
that we are able to get this close to 'optimal' because we are only comparing control
strategies at the highest level of abstraction; all of the solutions being compared use
the same planning rules and heuristics at more concrete levels.
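The percentages quoted above follow directly from the three reported costs. The short sketch below reproduces the arithmetic; the function and key names are illustrative.

```python
from typing import Dict


def summarize(optimal_cost: float, hand_extra: float, learned_extra: float) -> Dict[str, float]:
    """Relative overheads of the two strategies and the share of the gap closed."""
    return {
        "hand_coded_pct_over_optimal": 100.0 * hand_extra / optimal_cost,
        "learned_pct_over_optimal": 100.0 * learned_extra / optimal_cost,
        "pct_of_gap_closed": 100.0 * (hand_extra - learned_extra) / hand_extra,
    }


# Costs as reported in the text: optimal total, hand-coded excess, learned excess.
stats = summarize(20173, 572, 247)
```

With these inputs the hand-coded overhead is about 2.8%, the learned overhead about 1.2%, and the learned rules remove roughly 56.8% of the hand-coded excess (325 of 572 units), which the text reports as 56%.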
Chapter 6
Conclusion and Future Directions
If our goal is to build agents that can execute tasks in real-world long-horizon settings, we need a way to efficiently select actions despite the large problem size and
PSPACE-complete nature of this problem. Aggressive hierarchical planning proposes
that we solve this by breaking up a single large problem into many small problems
that can be solved sequentially.
Hierarchical Planning in the Now (HPN) provides a way to use temporal hierarchy
for large planning problems. It interleaves planning with execution in order to reduce
the complexity associated with estimating the results of an abstract action. This
property also lets HPN extend naturally to the partially-observable case [20]. Interleaved planning and execution is done by committing to an initial abstract plan and
incrementally refining it, executing the first step in the abstract plan before refining
the next.
While this strategy is effective at finding solutions, it makes no claims about the
quality of solutions produced. There is no cost model at abstract planning levels, so
abstract solutions can commit HPN-style planners to executing plans with very poor
execution cost. A common failure mode comes from incorrectly ordering subgoals so
that they do not serialize: subgoals are achieved, only to be undone and re-achieved
in planning and executing subsequent subgoals.
Additionally, HPN-style planners
commit to executing each subgoal in an abstract plan sequentially. This keeps the
planning horizon short, but can also hide potential cost savings if there is parallel
structure in subtasks. Considering subtasks jointly, in certain scenarios, enables a
hierarchical planner to find shorter solutions and leverage this structure.
A hierarchical planning process does not generally have enough information to
make these ordering and combining decisions. Building cost estimation into the planning loop will likely result in unacceptable slowdown. We presented RCHPN, a modification of HPN that considers re-ordering and combining subtasks in a post-processing
step, to address these concerns and perform execution cost optimization. RCHPN uses
a pairwise comparison function to make ordering decisions 'in the now', interleaving
optimization with planning and execution. This lets optimization procedures utilize
more details about the current world state and perform more expensive computation
when compared with optimization done purely at planning time.
We provided guidelines for writing heuristics to control optimization. We frame
the underlying issue as one of shared resource use and classify the resource use of
an abstract operator into one of three categories: shareable, contained, or continual.
From this classification it is straightforward to make the corresponding combining or
re-ordering decision.
To evaluate RCHPN we introduced the marsupial logistics domain: a modification
of the IPC logistics domain that introduces operators for manipulating and packing packages. This modification results in a large increase in the state space and
branching factor for similar problems and renders even simple problems intractable
for state-of-the-art symbolic planning algorithms. We presented a precondition-dropping hierarchy that lets us solve large problems in this domain.
We implemented re-ordering and combining heuristics for this domain and compared the performance of RCHPN and HPN with respect to planning time and execution
cost across a wide variety of large marsupial logistics problems. We found that, when
the opportunity is there, RCHPN is able to leverage parallel structure in subtasks and
make ordering decisions to reduce execution cost by up to 30%. When there is little
opportunity for cost savings, we find that the running time of RCHPN stays very close
to the running time of non-optimizing HPN.
We examined the problem of learning control rules for RCHPN. We found that two
key obstacles in this problem are dependence between learning tasks at different levels of the hierarchy and input representation. We performed experiments training an
SVM to make control decisions at the highest level of abstraction for small problems
and were able to outperform hand-coded heuristics. However, as these experiments
did not deal with learning at multiple levels and had a fairly simple problem distribution, it is still an open problem to determine if learning control rules is tractable
in this setting. In addition to applying learning, there are several interesting research
directions this work points toward. We conclude with a brief discussion of some of
these.
6.1 Avenues for future research
One interesting direction for further research is investigating the task of deriving ordering or combining relations automatically. The inspiration for this line of research
comes from the similarity between reasonable orderings for landmarks and the ordering failure modes we see in hierarchical planners. There are automated mechanisms
to extract landmarks and corresponding orderings from planning problems. One of
the main tasks here is in determining the relationship between abstract subgoals and
landmarks. Once we commit to abstract subgoals, they will become true at some
point in any plan we can find, so the concepts are likely related. If we could apply
automated techniques to determine orderings between subgoals we would obviate the
need for learning, or hand-coding, heuristics.
In order to apply landmark extraction techniques from the literature we would
need to build a compact purely symbolic abstraction of our subproblems. This is not
a simple task and is an interesting research avenue in itself. Being able to build these
symbolic abstractions for complex and large domains would enable more flexibility
in the planner used to solve subproblems within HPN as well. This would enable
HPN-style planners to benefit from positive results in forward heuristic search, embodied in planners like FF, Fast-Downward, and LAMA. Such a representation does
exist for any problem HPN can solve; simply write down all actions considered by the
back-chaining planner. The challenge is to determine this without first solving the
planning problem. This is analogous to the approaches taken in sampling-based motion planning, where a small, discrete, representation, which is sufficient for planning,
is extracted from a large continuous domain [24, 21].
A final direction for future research is the extension of the class of plans we
consider in our optimization. In addition to combining and reordering subgoals, there
are potential cost savings to be found in rebinding variables in an abstract plan. An
example application might be jointly optimizing the placement of two objects within
a region subject to maintaining the correctness of the corresponding plan. Similar
arguments to those made about re-ordering and combining apply to the case of finding
a good binding of a variable with respect to execution cost. Adding this capability to
hierarchical planners makes the job of a system designer easier; the variable binding
that occurs during planning need only concern itself with finding a solution and can
pay less attention to the execution cost ramifications of that selection.
Appendix A
PDDL for Marsupial Logistics
The appendix contains the PDDL domain and problem files used to test FF's performance on the marsupial logistics problem. It also includes FF's output from running
on these files.
A.1 Domain
marsupial-logistics domain
Dylan Hadfield-Menell 9/2012
(define (domain marsupial-logistics)
(:requirements :strips :typing)
(:types PACKAGE TRUCK LOCATION AIRPLANE AIRPORT PLACE DIR)
(:constants N E W S - DIR
;; grid places
p00 p10 p20 p30 p40
p01 p11 p21 p31 p41
p02 p12 p22 p32 p42
p03 p13 p23 p33 p43
p04 p14 p24 p34 p44
p05 p15 p25 p35 p45
p06 p16 p26 p36 p46
p07 p17 p27 p37 p47
p08 p18 p28 p38 p48
p09 p19 p29 p39 p49 - place)
(:predicates
(at ?obj ?loc)
(in ?obj ?vehicle)
(exists-road ?loc1 ?loc2)
(at-grid ?x ?v ?y)
(loader-at ?p ?v)
(adj ?d ?x ?y ?v) ; ?y is to the ?dir of ?x in ?v
(holding ?x ?v)
(clear ?x ?v)
(free ?v)
)
;;; Marsupial actions
(:action
move
:parameters (?d - dir ?rp1 ?rp2 - place ?truck - (either TRUCK AIRPLANE))
:precondition
(and (loader-at ?rp1 ?truck)
(adj ?d ?rp1 ?rp2 ?truck)
(free ?truck)
(clear ?rp2 ?truck)
)
:effect (and (loader-at ?rp2 ?truck) (not (loader-at ?rp1 ?truck))
(not (clear ?rp2 ?truck)) (clear ?rp1 ?truck))
)
(:action
grasp
:parameters (?h - PACKAGE ?d - dir ?rp ?hp - place
?truck - (either TRUCK AIRPLANE))
:precondition
(and
(free ?truck)
(in ?h ?truck)
(at-grid ?h ?hp ?truck)
(loader-at ?rp ?truck)
(adj ?d ?rp ?hp ?truck))
:effect (and (holding ?h ?truck) (not (free ?truck)))
)
(:action
ungrasp
:parameters (?h - PACKAGE ?truck - (either TRUCK AIRPLANE))
:precondition
(holding ?h ?truck)
:effect (and (not (holding ?h ?truck))
(free ?truck))
)
(:action
move_1_H
:parameters (?d - dir ?h - PACKAGE ?rp1 ?rp2 ?hp1 ?hp2 - place
?truck - (either TRUCK AIRPLANE))
:precondition
(and (holding ?h ?truck)
(loader-at ?rp1 ?truck)
(adj ?d ?rp1 ?rp2 ?truck)
(at-grid ?h ?hp1 ?truck)
(adj ?d ?hp1 ?hp2 ?truck)
(clear ?rp2 ?truck)
(clear ?hp2 ?truck)
)
:effect (and (loader-at ?rp2 ?truck) (not (loader-at ?rp1 ?truck))
(at-grid ?h ?hp2 ?truck) (not (at-grid ?h ?hp1 ?truck))
(clear ?hp1 ?truck) (not (clear ?hp2 ?truck))
(clear ?rp1 ?truck) (not (clear ?rp2 ?truck)))
)
;; Move when ?hp1 = ?rp2 (e.g. move N when grasp N)
;; Pushing the grasped block ahead of the robot
(:action
move_2_H
:parameters (?d - dir ?h - PACKAGE ?rp1 ?rp2 ?hp2 - place
?truck - (either TRUCK AIRPLANE))
:precondition
(and (holding ?h ?truck)
(loader-at ?rp1 ?truck)
(adj ?d ?rp1 ?rp2 ?truck)
(at-grid ?h ?rp2 ?truck)
(adj ?d ?rp2 ?hp2 ?truck)
(clear ?hp2 ?truck)
)
:effect (and (loader-at ?rp2 ?truck) (not (loader-at ?rp1 ?truck))
(at-grid ?h ?hp2 ?truck) (not (at-grid ?h ?rp2 ?truck))
(clear ?rp1 ?truck) (not (clear ?hp2 ?truck)))
)
;;;;Logistics actions
(:action LOAD-TRUCK
:parameters
(?obj - PACKAGE
?truck - TRUCK
?loc - (either LOCATION AIRPORT))
:precondition
(and (at ?truck ?loc)
(at ?obj ?loc)
(clear p00 ?truck))
:effect
(and
(not (at ?obj ?loc))
(in ?obj ?truck)
(at-grid ?obj p00 ?truck)
(not (clear p00 ?truck))))
(:action LOAD-AIRPLANE
:parameters
(?obj - PACKAGE
?airplane - AIRPLANE
?loc - AIRPORT)
:precondition
(and
(at ?obj ?loc)
(at ?airplane ?loc)
(clear p00 ?airplane))
:effect
(and (not (at ?obj ?loc))
(in ?obj ?airplane)
(at-grid ?obj p00 ?airplane)
(not (clear p00 ?airplane))))
(:action UNLOAD-TRUCK
:parameters
(?obj - PACKAGE
?truck - TRUCK
?loc - (either LOCATION AIRPORT))
:precondition
(and (at ?truck ?loc)
(in ?obj ?truck)
(at-grid ?obj p00 ?truck)
(not (holding ?obj ?truck)))
:effect
(and (not (in ?obj ?truck))
(at ?obj ?loc)
(clear p00 ?truck)
(not (at-grid ?obj p00 ?truck))))
(:action UNLOAD-AIRPLANE
:parameters
(?obj - PACKAGE
?airplane - AIRPLANE
?loc - AIRPORT)
:precondition
(and (in ?obj ?airplane)
(at ?airplane ?loc)
(at-grid ?obj p00 ?airplane)
(not (holding ?obj ?airplane)))
:effect
(and
(not (in ?obj ?airplane))
(at ?obj ?loc)
(clear p00 ?airplane)
(not (at-grid ?obj p00 ?airplane))))
(:action DRIVE-TRUCK
:parameters
(?truck - TRUCK
?loc-from - (either LOCATION AIRPORT)
?loc-to - (either LOCATION AIRPORT))
:precondition
(and (at ?truck ?loc-from)
(or (exists-road ?loc-from ?loc-to) (exists-road ?loc-to ?loc-from))
(clear p00, ?truck)(clear p10, ?truck)(clear p20, ?truck)
(clear p01, ?truck)(clear p11, ?truck)(clear p21, ?truck)
(clear p02, ?truck)(clear p12, ?truck)(clear p22, ?truck)
(clear p03, ?truck)(clear p13, ?truck)(clear p23, ?truck)
(clear p04, ?truck)(clear p14, ?truck)(clear p24, ?truck))
:effect
(and (not (at ?truck ?loc-from))
(at ?truck ?loc-to)))
(:action FLY-AIRPLANE
:parameters
(?airplane - AIRPLANE
?loc-from - AIRPORT
?loc-to - AIRPORT)
:precondition
(and (at ?airplane ?loc-from)
(clear p00, ?airplane)(clear p10, ?airplane)(clear p20, ?airplane)
(clear p01, ?airplane)(clear p11, ?airplane)(clear p21, ?airplane)
(clear p02, ?airplane)(clear p12, ?airplane)(clear p22, ?airplane)
(clear p03, ?airplane)(clear p13, ?airplane)(clear p23, ?airplane)
(clear p04, ?airplane)(clear p14, ?airplane)(clear p24, ?airplane))
:effect
(and (not (at ?airplane ?loc-from))
(at ?airplane ?loc-to)))
)
A.2 Problem instance
(define (problem augmented-strips)
(:domain marsupial-logistics)
(:objects truck1 - TRUCK
boston-loc0 - LOCATION
boston-loc1 - LOCATION
boston-loc2 - LOCATION
boston-loc3 - LOCATION
boston-loc4 - LOCATION
boston-loc5 - LOCATION
boston-arpt - AIRPORT
boston-truck - TRUCK
sf-loc5 - LOCATION
sf-truck - TRUCK
sf-arpt - AIRPORT
plane - AIRPLANE
pkg - PACKAGE
)
(:init
(at boston-truck boston-arpt)
(at pkg boston-loc5)
(at plane boston-arpt)
(loader-at p00 boston-truck)(clear p10 boston-truck)(clear p20 boston-truck)
(clear p01 boston-truck)(clear p11 boston-truck)(clear p21 boston-truck)
(clear p02 boston-truck)(clear p12 boston-truck)(clear p22 boston-truck)
(clear p03 boston-truck)(clear p13 boston-truck)(clear p23 boston-truck)
(clear p04 boston-truck)(clear p14 boston-truck)(clear p24 boston-truck)
(clear p05 boston-truck)(clear p15 boston-truck)(clear p25 boston-truck)
(clear p06 boston-truck)(clear p16 boston-truck)(clear p26 boston-truck)
(clear p07 boston-truck)(clear p17 boston-truck)(clear p27 boston-truck)
(clear p08 boston-truck)(clear p18 boston-truck)(clear p28 boston-truck)
(adj E p00 p10 boston-truck)(adj N p00 p01 boston-truck);p00
(adj E p01 p11 boston-truck)(adj N p01 p02 boston-truck)
(adj S p01 p00 boston-truck);p01
(adj E p02 p12 boston-truck)(adj N p02 p03 boston-truck)
(adj S p02 p01 boston-truck);p02
(adj E p03 p13 boston-truck)(adj N p03 p04 boston-truck)
(adj S p03 p02 boston-truck);p03
(adj E p04 p14 boston-truck)(adj N p04 p05 boston-truck)
(adj S p04 p03 boston-truck);p04
(adj E p05 p15 boston-truck)(adj N p05 p06 boston-truck)
(adj S p05 p04 boston-truck);p05
(adj E p06 p16 boston-truck)(adj N p06 p07 boston-truck)
(adj S p06 p05 boston-truck);p06
(adj E p07 p17 boston-truck)(adj N p07 p08 boston-truck)
(adj S p07 p06 boston-truck);p07
(adj E p08 p18 boston-truck)(adj S p08 p07 boston-truck);p08
(adj E p10 p20 boston-truck)(adj W p10 p00 boston-truck)
(adj N p10 p11 boston-truck);p10
(adj E p11 p21 boston-truck)(adj W p11 p01 boston-truck)
(adj N p11 p12 boston-truck)(adj S p11 p10 boston-truck);p11
(adj E p12 p22 boston-truck)(adj W p12 p02 boston-truck)
(adj N p12 p13 boston-truck)(adj S p12 p11 boston-truck);p12
(adj E p13 p23 boston-truck)(adj W p13 p03 boston-truck)
(adj N p13 p14 boston-truck)(adj S p13 p12 boston-truck);p13
(adj E p14 p24 boston-truck)(adj W p14 p04 boston-truck)
(adj N p14 p15 boston-truck)(adj S p14 p13 boston-truck);p14
(adj E p15 p25 boston-truck)(adj W p15 p05 boston-truck)
(adj N p15 p16 boston-truck)(adj S p15 p14 boston-truck);p15
(adj E p16 p26 boston-truck)(adj W p16 p06 boston-truck)
(adj N p16 p17 boston-truck)(adj S p16 p15 boston-truck);p16
(adj E p17 p27 boston-truck)(adj W p17 p07 boston-truck)
(adj N p17 p18 boston-truck)(adj S p17 p16 boston-truck);p17
(adj E p18 p28 boston-truck)(adj W p18 p08 boston-truck)
(adj S p18 p17 boston-truck);p18
(adj W p20 p10 boston-truck)(adj N p20 p21 boston-truck);p20
(adj W p21 p11 boston-truck)(adj N p21 p22 boston-truck)
(adj S p21 p20 boston-truck);p21
(adj W p22 p12 boston-truck)(adj N p22 p23 boston-truck)
(adj S p22 p21 boston-truck);p22
(adj W p23 p13 boston-truck)(adj N p23 p24 boston-truck)
(adj S p23 p22 boston-truck);p23
(adj W p24 p14 boston-truck)(adj N p24 p25 boston-truck)
(adj S p24 p23 boston-truck);p24
(adj W p25 p15 boston-truck)(adj N p25 p26 boston-truck)
(adj S p25 p24 boston-truck);p25
(adj W p26 p16 boston-truck)(adj N p26 p27 boston-truck)
(adj S p26 p25 boston-truck);p26
(adj W p27 p17 boston-truck)(adj N p27 p28 boston-truck)
(adj S p27 p26 boston-truck);p27
(adj W p28 p18 boston-truck)(adj S p28 p27 boston-truck);p28
(exists-road boston-loc0 boston-loc1)
(exists-road boston-loc1 boston-loc2)
(exists-road boston-loc2 boston-loc3)
(exists-road boston-loc3 boston-loc4)
(exists-road boston-loc4 boston-loc5)
(exists-road boston-loc5 boston-arpt)
(free boston-truck)
(at sf-truck sf-arpt)
(loader-at p00 sf-truck)(clear p10 sf-truck)(clear p20 sf-truck)
(clear p01 sf-truck)(clear p11 sf-truck)(clear p21 sf-truck)
(clear p02 sf-truck)(clear p12 sf-truck)(clear p22 sf-truck)
(clear p03 sf-truck)(clear p13 sf-truck)(clear p23 sf-truck)
(clear p04 sf-truck)(clear p14 sf-truck)(clear p24 sf-truck)
(clear p05 sf-truck)(clear p15 sf-truck)(clear p25 sf-truck)
(clear p06 sf-truck)(clear p16 sf-truck)(clear p26 sf-truck)
(clear p07 sf-truck)(clear p17 sf-truck)(clear p27 sf-truck)
(clear p08 sf-truck)(clear p18 sf-truck)(clear p28 sf-truck)
(adj E p00 p10 sf-truck)(adj N p00 p01 sf-truck);p00
(adj E p01 p11 sf-truck)(adj N p01 p02 sf-truck)
(adj S p01 p00 sf-truck);p01
(adj E p02 p12 sf-truck)(adj N p02 p03 sf-truck)
(adj S p02 p01 sf-truck);p02
(adj E p03 p13 sf-truck)(adj N p03 p04 sf-truck)
(adj S p03 p02 sf-truck);p03
(adj E p04 p14 sf-truck)(adj N p04 p05 sf-truck)
(adj S p04 p03 sf-truck);p04
(adj E p05 p15 sf-truck)(adj N p05 p06 sf-truck)
(adj S p05 p04 sf-truck);p05
(adj E p06 p16 sf-truck)(adj N p06 p07 sf-truck)
(adj S p06 p05 sf-truck);p06
(adj E p07 p17 sf-truck)(adj N p07 p08 sf-truck)
(adj S p07 p06 sf-truck);p07
(adj E p08 p18 sf-truck)(adj S p08 p07 sf-truck);p08
(adj E p10 p20 sf-truck)(adj W p10 p00 sf-truck)
(adj N p10 p11 sf-truck);p10
(adj E p11 p21 sf-truck)(adj W p11 p01 sf-truck)
(adj N p11 p12 sf-truck)(adj S p11 p10 sf-truck);p11
(adj E p12 p22 sf-truck)(adj W p12 p02 sf-truck)
(adj N p12 p13 sf-truck)(adj S p12 p11 sf-truck);p12
(adj E p13 p23 sf-truck)(adj W p13 p03 sf-truck)
(adj N p13 p14 sf-truck)(adj S p13 p12 sf-truck);p13
(adj E p14 p24 sf-truck)(adj W p14 p04 sf-truck)
(adj N p14 p15 sf-truck)(adj S p14 p13 sf-truck);p14
(adj E p15 p25 sf-truck)(adj W p15 p05 sf-truck)
(adj N p15 p16 sf-truck)(adj S p15 p14 sf-truck);p15
(adj E p16 p26 sf-truck)(adj W p16 p06 sf-truck)
(adj N p16 p17 sf-truck)(adj S p16 p15 sf-truck);p16
(adj E p17 p27 sf-truck)(adj W p17 p07 sf-truck)
(adj N p17 p18 sf-truck)(adj S p17 p16 sf-truck);p17
(adj E p18 p28 sf-truck)(adj W p18 p08 sf-truck)
(adj S p18 p17 sf-truck);p18
(adj W p20 p10 sf-truck)(adj N p20 p21 sf-truck);p20
(adj W p21 p11 sf-truck)(adj N p21 p22 sf-truck)
(adj S p21 p20 sf-truck);p21
(adj W p22 p12 sf-truck)(adj N p22 p23 sf-truck)
(adj S p22 p21 sf-truck);p22
(adj W p23 p13 sf-truck)(adj N p23 p24 sf-truck)
(adj S p23 p22 sf-truck);p23
(adj W p24 p14 sf-truck)(adj N p24 p25 sf-truck)
(adj S p24 p23 sf-truck);p24
(adj W p25 p15 sf-truck)(adj N p25 p26 sf-truck)
(adj S p25 p24 sf-truck);p25
(adj W p26 p16 sf-truck)(adj N p26 p27 sf-truck)
(adj S p26 p25 sf-truck);p26
(adj W p27 p17 sf-truck)(adj N p27 p28 sf-truck)
(adj S p27 p26 sf-truck);p27
(adj W p28 p18 sf-truck)(adj S p28 p27 sf-truck);p28
(free sf-truck)
(loader-at p00 plane)(clear p10 plane)(clear p20 plane)(clear p30 plane)
(clear p01 plane)(clear p11 plane)(clear p21 plane)(clear p31 plane)
(clear p02 plane)(clear p12 plane)(clear p22 plane)(clear p32 plane)
(clear p03 plane)(clear p13 plane)(clear p23 plane)(clear p33 plane)
(clear p04 plane)(clear p14 plane)(clear p24 plane)(clear p34 plane)
(clear p05 plane)(clear p15 plane)(clear p25 plane)(clear p35 plane)
(clear p06 plane)(clear p16 plane)(clear p26 plane)(clear p36 plane)
(clear p07 plane)(clear p17 plane)(clear p27 plane)(clear p37 plane)
(clear p08 plane)(clear p18 plane)(clear p28 plane)(clear p38 plane)
(clear p09 plane)(clear p19 plane)(clear p29 plane)(clear p39 plane)
(free plane)
(adj E p00 p10 plane)(adj N p00 p01 plane);p00
(adj E p01 p11 plane)(adj N p01 p02 plane)
(adj S p01 p00 plane);p01
(adj E p02 p12 plane)(adj N p02 p03 plane)
(adj S p02 p01 plane);p02
(adj E p03 p13 plane)(adj N p03 p04 plane)
(adj S p03 p02 plane);p03
(adj E p04 p14 plane)(adj N p04 p05 plane)
(adj S p04 p03 plane);p04
(adj E p05 p15 plane)(adj N p05 p06 plane)
(adj S p05 p04 plane);p05
(adj E p06 p16 plane)(adj N p06 p07 plane)
(adj S p06 p05 plane);p06
(adj E p07 p17 plane)(adj N p07 p08 plane)
(adj S p07 p06 plane);p07
(adj E p08 p18 plane)(adj S p08 p07 plane);p08
(adj E p10 p20 plane)(adj W p10 p00 plane)
(adj N p10 p11 plane);p10
(adj E p11 p21 plane)(adj W p11 p01 plane)
(adj N p11 p12 plane)(adj S p11 p10 plane);p11
(adj E p12 p22 plane)(adj W p12 p02 plane)
(adj N p12 p13 plane)(adj S p12 p11 plane);p12
(adj E p13 p23 plane)(adj W p13 p03 plane)
(adj N p13 p14 plane)(adj S p13 p12 plane);p13
(adj E p14 p24 plane)(adj W p14 p04 plane)
(adj N p14 p15 plane)(adj S p14 p13 plane);p14
(adj E p15 p25 plane)(adj W p15 p05 plane)
(adj N p15 p16 plane)(adj S p15 p14 plane);p15
(adj E p16 p26 plane)(adj W p16 p06 plane)
(adj N p16 p17 plane)(adj S p16 p15 plane);p16
(adj E p17 p27 plane)(adj W p17 p07 plane)
(adj N p17 p18 plane)(adj S p17 p16 plane);p17
(adj E p18 p28 plane)(adj W p18 p08 plane)
(adj S p18 p17 plane);p18
(adj W p20 p10 plane)(adj N p20 p21 plane);p20
(adj W p21 p11 plane)(adj N p21 p22 plane)
(adj S p21 p20 plane);p21
(adj W p22 p12 plane)(adj N p22 p23 plane)
(adj S p22 p21 plane);p22
(adj W p23 p13 plane)(adj N p23 p24 plane)
(adj S p23 p22 plane);p23
(adj W p24 p14 plane)(adj N p24 p25 plane)
(adj S p24 p23 plane);p24
(adj W p25 p15 plane)(adj N p25 p26 plane)
(adj S p25 p24 plane);p25
(adj W p26 p16 plane)(adj N p26 p27 plane)
(adj S p26 p25 plane);p26
(adj W p27 p17 plane)(adj N p27 p28 plane)
(adj S p27 p26 plane);p27
(adj W p28 p18 plane)(adj S p28 p27 plane);p28
)
(:goal (at pkg sf-arpt))
)
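The loader-at, clear, and adj facts in the instance above follow a regular grid pattern, so they can be generated rather than typed by hand. The following Python sketch is one way to do that; it is not the script used for the thesis. The function names (cell, occupancy_facts, adj_facts) are hypothetical, and the sketch assumes that in a cell name pXY the first digit is the column (E increases it) and the second is the row (N increases it), which is how the listed facts read.

```python
from typing import List

# Unit offsets (d_col, d_row) per direction. Assumption: in a cell name pXY,
# X is the column (E increases it) and Y is the row (N increases it).
DIRECTIONS = {"E": (1, 0), "W": (-1, 0), "N": (0, 1), "S": (0, -1)}


def cell(col: int, row: int) -> str:
    """Name of the grid cell at (col, row); valid for single-digit indices."""
    return f"p{col}{row}"


def occupancy_facts(vehicle: str, cols: int = 3, rows: int = 9) -> List[str]:
    """The loader starts at p00; every other cell of the grid is clear."""
    facts = [f"(loader-at {cell(0, 0)} {vehicle})"]
    for col in range(cols):
        for row in range(rows):
            if (col, row) != (0, 0):
                facts.append(f"(clear {cell(col, row)} {vehicle})")
    return facts


def adj_facts(vehicle: str, cols: int = 3, rows: int = 9) -> List[str]:
    """One (adj DIR from to vehicle) fact for each in-bounds neighbour."""
    facts = []
    for col in range(cols):
        for row in range(rows):
            for name, (dc, dr) in DIRECTIONS.items():
                ncol, nrow = col + dc, row + dr
                if 0 <= ncol < cols and 0 <= nrow < rows:
                    facts.append(
                        f"(adj {name} {cell(col, row)} {cell(ncol, nrow)} {vehicle})"
                    )
    return facts


if __name__ == "__main__":
    # Print the grid facts for one vehicle, ready to paste into (:init ...).
    for fact in occupancy_facts("boston-truck") + adj_facts("boston-truck"):
        print(fact)
```

With the default 3 x 9 grid this yields 27 occupancy facts and 84 adjacency facts per vehicle. The order of facts differs from the listing, which does not matter inside a PDDL :init block.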
A.3 FF output
step    0: MOVE N P00 P01 PLANE
        1: MOVE N P01 P02 PLANE
        2: MOVE N P02 P03 PLANE
        3: MOVE N P03 P04 PLANE
        4: MOVE N P04 P05 PLANE
        5: MOVE N P00 P01 BOSTON-TRUCK
        6: MOVE N P01 P02 BOSTON-TRUCK
        7: MOVE N P02 P03 BOSTON-TRUCK
        8: MOVE N P03 P04 BOSTON-TRUCK
        9: MOVE N P04 P05 BOSTON-TRUCK
       10: MOVE E P05 P15 BOSTON-TRUCK
       11: DRIVE-TRUCK BOSTON-TRUCK BOSTON-ARPT BOSTON-LOC5
       12: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC5 BOSTON-LOC4
       13: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC4 BOSTON-LOC3
       14: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC3 BOSTON-LOC2
       15: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC2 BOSTON-LOC1
       16: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC1 BOSTON-LOC0
       17: LOAD-TRUCK PKG BOSTON-TRUCK BOSTON-LOC0
       18: MOVE S P15 P14 BOSTON-TRUCK
       19: MOVE S P14 P13 BOSTON-TRUCK
       20: MOVE S P13 P12 BOSTON-TRUCK
       21: MOVE S P12 P11 BOSTON-TRUCK
       22: MOVE S P11 P10 BOSTON-TRUCK
       23: GRASP PKG W P10 P00 BOSTON-TRUCK
       24: MOVE_1_H N PKG P10 P11 P00 P01 BOSTON-TRUCK
       25: MOVE_1_H N PKG P11 P12 P01 P02 BOSTON-TRUCK
       26: MOVE_1_H N PKG P12 P13 P02 P03 BOSTON-TRUCK
       27: MOVE_1_H N PKG P13 P14 P03 P04 BOSTON-TRUCK
       28: MOVE_1_H N PKG P14 P15 P04 P05 BOSTON-TRUCK
       29: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC0 BOSTON-LOC1
       30: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC1 BOSTON-LOC2
       31: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC2 BOSTON-LOC3
       32: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC3 BOSTON-LOC4
       33: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC4 BOSTON-LOC5
       34: DRIVE-TRUCK BOSTON-TRUCK BOSTON-LOC5 BOSTON-ARPT
       35: MOVE_1_H S PKG P15 P14 P05 P04 BOSTON-TRUCK
       36: MOVE_1_H S PKG P14 P13 P04 P03 BOSTON-TRUCK
       37: MOVE_1_H S PKG P13 P12 P03 P02 BOSTON-TRUCK
       38: MOVE_1_H S PKG P12 P11 P02 P01 BOSTON-TRUCK
       39: MOVE_1_H S PKG P11 P10 P01 P00 BOSTON-TRUCK
       40: UNGRASP PKG BOSTON-TRUCK
       41: UNLOAD-TRUCK PKG BOSTON-TRUCK BOSTON-ARPT
       42: MOVE E P05 P15 PLANE
       43: LOAD-AIRPLANE PKG PLANE BOSTON-ARPT
       44: MOVE S P15 P14 PLANE
       45: MOVE S P14 P13 PLANE
       46: MOVE S P13 P12 PLANE
       47: MOVE S P12 P11 PLANE
       48: MOVE S P11 P10 PLANE
       49: GRASP PKG W P10 P00 PLANE
       50: MOVE_1_H N PKG P10 P11 P00 P01 PLANE
       51: MOVE_1_H N PKG P11 P12 P01 P02 PLANE
       52: MOVE_1_H N PKG P12 P13 P02 P03 PLANE
       53: MOVE_1_H N PKG P13 P14 P03 P04 PLANE
       54: MOVE_1_H N PKG P14 P15 P04 P05 PLANE
       55: FLY-AIRPLANE PLANE BOSTON-ARPT SF-ARPT
       56: MOVE_1_H S PKG P15 P14 P05 P04 PLANE
       57: MOVE_1_H S PKG P14 P13 P04 P03 PLANE
       58: MOVE_1_H S PKG P13 P12 P03 P02 PLANE
       59: MOVE_1_H S PKG P12 P11 P02 P01 PLANE
       60: MOVE_1_H S PKG P11 P10 P01 P00 PLANE
       61: UNGRASP PKG PLANE
       62: UNLOAD-AIRPLANE PKG PLANE SF-ARPT
0.04 seconds instantiating 6144 easy, 42 hard action templates
0.18 seconds reachability analysis, yielding 4879 facts and 4177 actions
0.00 seconds creating final representation with 243 relevant facts
0.01 seconds building connectivity graph
26839.73 seconds searching, evaluating 3621458 states, to a max depth of 8
26839.96 seconds total time
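To see where the plan's length comes from, the FF output can be tallied by operator. The Python sketch below is a minimal illustration, not part of the thesis tooling. It assumes each plan line has the form "N: OPERATOR ARG ...", optionally prefixed by "step", and the names parse_plan, operator_counts, and EXCERPT are hypothetical.

```python
import re
from collections import Counter
from typing import List, Tuple

# Matches "step    0: MOVE N P00 P01 PLANE" and continuation lines such as
# "       12: DRIVE-TRUCK ...". Timing lines do not match, because their
# leading number is not immediately followed by a colon.
STEP_RE = re.compile(r"^\s*(?:step\s+)?(\d+):\s+(\S+)\s*(.*)$")

Step = Tuple[int, str, List[str]]


def parse_plan(text: str) -> List[Step]:
    """Extract (index, operator, arguments) for every plan line in text."""
    steps = []
    for line in text.splitlines():
        match = STEP_RE.match(line)
        if match:
            index, operator, args = match.groups()
            steps.append((int(index), operator, args.split()))
    return steps


def operator_counts(steps: List[Step]) -> Counter:
    """Number of plan steps per operator name."""
    return Counter(operator for _, operator, _ in steps)


# A short excerpt of the output above, used only to exercise the parser.
EXCERPT = """\
step    0: MOVE N P00 P01 PLANE
       10: MOVE E P05 P15 BOSTON-TRUCK
       11: DRIVE-TRUCK BOSTON-TRUCK BOSTON-ARPT BOSTON-LOC5
       24: MOVE_1_H N PKG P10 P11 P00 P01 BOSTON-TRUCK
26839.96 seconds total time
"""

if __name__ == "__main__":
    for operator, count in operator_counts(parse_plan(EXCERPT)).most_common():
        print(operator, count)
```

Counting the full 63-step plan listed above by operator gives 22 MOVE and 20 MOVE_1_H steps, so 42 of the 63 actions are grid moves inside a vehicle.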
Bibliography
[1] Eyal Amir and Barbara Engelhardt. Factored planning. In Proceedings of IJCAI
2003, pages 929-935, 2003.
[2] Fahiem Bacchus and Qiang Yang. Downward refinement and the efficiency of
hierarchical problem solving. Artificial Intelligence, 71(1):43 - 100, 1994.
[3] Anthony Barrett and Daniel S. Weld. Characterizing subgoal interactions for
planning. In Proceedings of IJCAI 1993, 1993.
[4] Mario Bollini, Jennifer Barry, and Daniela Rus. Bakebot: Baking cookies with
the pr2. In The PR2 Workshop: Results, Challenges and Lessons Learned in
Advancing Robots with a Common Platform, IROS, 2011.
[5] Tom Bylander. The computational complexity of propositional strips planning.
Artificial Intelligence, 69:165 - 204, 1994.
[6] Anthony R Cassandra, Leslie Pack Kaelbling, and Michael L Littman. Acting optimally in partially observable stochastic domains. In Proceedings of the
National Conference on Artificial Intelligence, pages 1023-1023, 1995.
[7] Chih-Chung Chang and Chih-Jen Lin. LIBSVM: A library for support vector
machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
[8] Christer Bäckström. Computational aspects of reordering plans. Journal of Artificial Intelligence Research, 9:99-137, 1998.
[9] Corinna Cortes and Vladimir Vapnik. Support-vector networks. In Machine
Learning, pages 273-297, 1995.
[10] Erik D. Demaine, Susan Hohenberger, and David Liben-Nowell. Tetris is hard,
even to approximate. CoRR, cs.CC/0210020, 2002.
[11] Christian Dornhege, Patrick Eyerich, Thomas Keller, Sebastian Trüg, Michael
Brenner, and Bernhard Nebel. Semantic attachments for domain-independent
planning systems. In Proceedings of ICAPS, 2009.
[12] Richard E. Fikes and Nils J. Nilsson. STRIPS: A new approach to the application
of theorem proving to problem solving. Artificial Intelligence, pages 189-208,
1971.
[13] Paolo Frasconi and Andrea Passerini. Learning with kernels and logical representations. In Probabilistic inductive logic programming, pages 56-91. Springer,
2008.
[14] Kristen Grauman and Trevor Darrell. The pyramid match kernel: Discriminative
classification with sets of image features. In Proceedings of the Tenth IEEE
International Conference on Computer Vision - Volume 2, ICCV '05, pages 1458-1465, 2005.
[15] Malte Helmert. The fast downward planning system. Journal of Artificial Intelligence Research, 26:191-246, 2006.
[16] Jörg Hoffmann, Julie Porteous, and Laura Sebastia. Ordered landmarks in planning. Journal of Artificial Intelligence Research, 22:215-278, 2004.
[17] Jean-Paul Laumond and Paul E. Jacobs and Michel Taix and Richard M. Murray.
A motion planner for car-like robots based on a global/local approach. In IEEE
Transactions on Robotics and Automation, volume 10, October 1994.
[18] Jörg Hoffmann. FF: The fast-forward planning system. AI Magazine, 22:57-62,
2001.
[19] Leslie Kaelbling and Tomas Lozano-Perez. Hierarchical planning in the now.
IEEE Conference on Robotics and Automation, 2011.
[20] Leslie Pack Kaelbling and Tomas Lozano-Perez. Unifying perception, estimation
and action for mobile manipulation via belief space planning. In IEEE Conference
on Robotics and Automation (ICRA), 2012.
[21] Lydia E Kavraki, Petr Svestka, J-C Latombe, and Mark H Overmars. Probabilistic roadmaps for path planning in high-dimensional configuration spaces.
Robotics and Automation, IEEE Transactions on, 12(4):566-580, 1996.
[22] Craig A Knoblock. Automatically generating abstractions for planning. Artificial
Intelligence, 68(2):243 - 302, 1994.
[23] Richard E. Korf. Planning as search: A quantitative approach. Artificial Intelligence, 33(1):65 - 88, 1987.
[24] Steven M LaValle and James J Kuffner Jr. Rapidly-exploring random trees:
Progress and prospects. 2000.
[25] Dana Nau, Tsz-Chiu Au, Okhtay Ilghami, Ugur Kuter, J. William Murdock, Dan
Wu, and Fusun Yaman. SHOP2: An HTN planning system. JAIR, 20:379-404,
2003.
[26] Silvia Richter and Matthias Westphal. The LAMA planner: Guiding cost-based
anytime planning with landmarks. Journal of Artificial Intelligence Research,
39:127-177, 2010.
[27] Earl D. Sacerdoti. Planning in a hierarchy of abstraction spaces. Artificial
Intelligence, 5(2):115 - 135, 1974.
[28] Earl D Sacerdoti. The nonlinear nature of plans. Technical report, DTIC Document, 1975.
[29] Biplav Srivastava and Subbarao Kambhampati. Scaling up Planning by Teasing
Out Resource Scheduling, volume 1809 of Lecture Notes in Computer Science,
pages 172-186. Springer Berlin / Heidelberg, 2000.
[30] Mike Stilman and James J. Kuffner. Planning among movable obstacles with
artificial constraints. In Proceedings of WAFR 2006, 2006.
[31] Manuela Veloso, Jaime Carbonell, Alicia Perez, Daniel Borrajo, Eugene Fink,
and Jim Blythe. Integrating planning and learning: The prodigy architecture.
Journal of Experimental & Theoretical Artificial Intelligence, 7(1):81-120, 1995.
[32] Manuela M. Veloso. Planning and Learning by Analogical Reasoning. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 1994.