Heuristic Search for Manipulation Planning
by
Caelan Reed Garrett
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Electrical Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
May 21, 2015
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Tomás Lozano-Pérez
Professor
Thesis Supervisor
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Leslie Pack Kaelbling
Professor
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
Heuristic Search for Manipulation Planning
by
Caelan Reed Garrett
Submitted to the Department of Electrical Engineering and Computer Science
on May 21, 2015, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Electrical Engineering
Abstract
Manipulation problems involving many objects present substantial challenges for
planning algorithms due to the high dimensionality and multi-modality of the search
space. Symbolic task planners can efficiently construct plans involving many entities but cannot incorporate the constraints from geometry and kinematics. Existing
approaches to integrated task and motion planning as well as manipulation planning remain prohibitively slow to plan in these high-dimensional hybrid configuration
spaces involving many objects. We present the FFRob algorithm for task and motion
planning and the hybrid heuristic backward-forward (HHBF) planning algorithm for
general manipulation planning. Both algorithms adapt heuristic ideas from one of
the most successful symbolic planners in recent years, the FastForward (FF) planner,
to continuous robotic planning domains. FFRob discretizes task and motion planning problems using a multi-query roadmap structure that can be conditionalized to
model different placements of movable objects. This structure enables the planner to
efficiently compute the FFRob heuristic which incorporates geometric and kinematic
planning constraints to give a tight estimate of the distance to the goal. The resulting
tightly integrated planner is simple and performs efficiently in a collection of tasks
involving manipulation of many objects. HHBF generalizes this idea to planning
with arbitrary manipulation primitives. It dynamically searches forward from the
initial state towards the goal but uses a backward search from the goal, based on
a simplified representation of the actions, to bias the sampling of the infinite action
space towards actions that are likely to be useful in reaching the goal. As a result, it
can construct long manipulation plans in a large class of manipulation domains even
more effectively than FFRob. For both algorithms, we empirically demonstrate their
effectiveness on complex manipulation tasks.
Thesis Supervisor: Tomás Lozano-Pérez
Title: Professor
Thesis Supervisor: Leslie Pack Kaelbling
Title: Professor
Acknowledgments
I would like to thank my research advisors: Professors Tomás Lozano-Pérez and
Leslie Kaelbling. Tomás first excited me about robotics and mentored me for four
years, causing me to grow substantially not only as a roboticist but also as a scientist and
an engineer. Along the way, he also taught me a few jokes as well as the vital skill
of always fully erasing a whiteboard and beginning in its top left corner. Leslie
has always been an amazing resource for academic advice, planning brainstorming,
machine learning wisdom, and technical writing magic. I am excited to continue
learning from them while I pursue my Ph.D.
I have been fortunate to learn from many great teachers and mentors at MIT
and Thomas Jefferson High School for Science and Technology that inspired me to
find my passions and work hard to make them a reality. I would like to thank The
Chorallaries of MIT for being my family at MIT for the past four years. I also want
to thank Dr. Mark Livingston at the Naval Research Laboratory for taking me on as
a research intern in high school and introducing me to the thrill of finding and solving
new problems. Finally, I would like to thank my family for giving me the resources,
support, and love so I can follow my dreams.
Contents

1 Introduction
  1.1 Background
  1.2 Related Work
      1.2.1 Manipulation Planning
      1.2.2 Discrete Task Planning
      1.2.3 Integrated Task and Motion Planning

2 FFRob
  2.1 Problem Formulation
  2.2 Conditional Reachability Graph
      2.2.1 Constructing the crg
      2.2.2 Querying the crg
  2.3 Planning Algorithms
      2.3.1 Enforced Hill-Climbing
      2.3.2 Computing the Relaxed Plan Graph
      2.3.3 The FFRob heuristic
      2.3.4 Geometric biases
  2.4 Results

3 HHBF
  3.1 Problem Formulation
  3.2 Planning Algorithm
      3.2.1 Top-level Forward Search
      3.2.2 Generating Useful Actions
      3.2.3 Search Heuristics
  3.3 Completeness
  3.4 Results

4 Conclusion
  4.1 Future Work
List of Figures

2-1 Pick and place operator descriptions.
2-2 Procedure for constructing the crg.
2-3 Procedure for querying the crg.
2-4 The primary search control procedure.
2-5 Method for creating the Relaxed Plan Graph.
2-6 The initial and final state in tasks 0, 1, 2 in the experiments.
2-7 The initial and final state in tasks 3, 4, 5 in the experiments.
3-1 Generic action template format.
3-2 Move and MoveHolding action templates.
3-3 Pick and Place action templates.
3-4 Top-level forward search.
3-5 Backward search to construct relaxed plan graph.
3-6 The initial and final state for each problem.

List of Tables

2.1 Experiment results.
3.1 Manipulation experiment results.
Chapter 1
Introduction
1.1 Background
A long-standing goal in robotics is to develop robots that can operate autonomously
in unstructured human environments. Recent hardware innovations have made mobile manipulator robots increasingly affordable, and sensing innovations, especially
RGBD sensors, provide unprecedented sensory bandwidth and accuracy. Progress in algorithms for navigation and motion planning has enabled some basic forms of mobile manipulation: combined actuation of a robot's base and end-effectors. But this is primarily restricted to picking and placing on relatively uncluttered surfaces; planning for complex mobile manipulation problems in cluttered environments still presents substantial challenges.
Researchers in artificial intelligence planning have been tackling problems that
require long sequences of actions and large discrete state spaces and have had some
notable success in recent years. However, these symbolic “task-level” planners do
not naturally encompass the detailed geometric and kinematic considerations that
motion planning requires. The original Shakey/strips robot system [14, 29], from
which many of these symbolic planners evolved, managed to plan for an actual robot
by working in a domain where all legal symbolic plans were effectively executable.
This required the ability to represent symbolically a sufficient set of conditions to
guarantee the success of the steps in the plan. This is not generally possible in
realistic manipulation domains because the geometrical and kinematic constraints
are significant and difficult to articulate symbolically.
Consider a simple table-top manipulation domain where a variety of objects are
placed on a table and the robot’s task is to collect some subset of the objects and
pack them in a box, or use them to make a meal, or put them away in their storage
bins. The basic robot operations are to pick up an object and place it somewhere
else; in addition, the robot can move its base in order to reach a distant object.
Note that, in general, to reach some object, we will have to move other objects out
of the way. Which objects need moving depends on their shapes, the shape of the
robot, where the robot’s base is placed and what path it follows to the object. When
an object is moved, the choice of where to place it requires similar considerations.
The key observation is that constructing a valid symbolic plan requires access to a
characterization of the connectivity of the underlying free configuration space (for the
robot and all the movable objects). We cannot efficiently maintain this connectivity
with a set of static assertions updated by strips operators; determining how the
connectivity of the underlying free space changes requires geometric computation.
Additionally, traditional motion planning algorithms that find paths between fully
specified configurations cannot address problems in which the configuration space of
interest is not just that of the robot but the configuration space of a kitchen, for
example, and the goal is to make dinner and clean the kitchen. We almost certainly
do not want to choose whether to get the frying pan or the steak next by sampling
configurations of the robot and kitchen and testing for paths between them.
Whereas classic robot motion planning requires a search in the robot configuration
space, manipulation planning requires a search in the combined configuration space
of the robot and all the movable objects in the world. Achieving a manipulation
goal requires choosing which object to move when, which grasps and intermediate
placements to use, etc. Existing manipulation planning algorithms [33, 7, 16, 18, 10,
3, 34, 15] take substantial time to plan operations involving relatively few objects.
The fact that manipulation planning remains challenging is not surprising. Working
in a high-dimensional space and making a long sequence of intertwined decisions is
notoriously difficult. Constructing such plans generally requires some methods for
partitioning the problem and for effective heuristic search guidance.
We present two algorithms for solving mobile manipulation problems. The first
algorithm is for solving task and motion planning problems in which the robot must
simultaneously perform classical AI planning while reasoning about its own motions.
The FFRob algorithm is a fully integrated task and motion planner that searches with
a heuristic that takes geometric information into account [15]. We show an extension
of the heuristic used in the FastForward (FF) [22] planning system that integrates
reachability in the robot configuration space with reachability in the symbolic state
space; we call this new heuristic FFRob. Both the search and the computation of the
FFRob heuristic exploit a discrete, pre-sampled roadmap [24] data structure that
allows multiple motion-planning queries on the closely related problems that arise
during the search to be solved efficiently.
The second algorithm extends the ideas of FFRob to a more general manipulation planning framework with arbitrary, continuous actions. The result is the hybrid
heuristic backward-forward (HHBF) planning algorithm that searches forward from
the initial state towards the goal but which uses a backward search from the goal,
based on a simplified representation of the actions, to bias the sampling of the infinite
action space towards actions that are likely to be useful in reaching the goal. This
process also allows us to compute an effective heuristic for focusing the search. The
HHBF heuristic is derived from the structure of the operators used in planning and
does not require any domain-specific heuristic to be specified by the user.
We perform experiments on challenging manipulation problems and explore the
effect of various planner configurations on their performance. We show that HHBF
is probabilistically complete. Finally, we demonstrate that HHBF can solve a broad
class of hybrid planning problems, i.e., planning problems with mixed discrete and continuous components, including manipulation planning.
1.2 Related Work
This work draws from existing approaches to manipulation planning and integrated
task and motion planning as well as ideas from the artificial intelligence task-planning
literature. Our focus will be on showing how ideas originally developed for discrete
task planning can be adapted to continuous-action domains to more efficiently solve
manipulation planning problems [16, 18, 3].
1.2.1 Manipulation Planning
In manipulation planning, the goal is not just to move the robot without collision,
but also to operate on the objects in the world. This problem was addressed from the
earliest days of algorithmic motion planning, for example by Lozano-Pérez et al. [28,
27] and Wilfong [39]. The modern treatment of this problem, involving continuous
grasps as well as continuous placements, was pioneered by Alami et al. [2, 1] who
introduced the manipulation graph. This graph breaks the problem of one robot
moving one object in a potentially complex environment into several problems of
moving between connected components of the combined configuration space where
grasps can be changed. A solution is an alternating sequence of transit paths, in
which the robot is not grasping an object, and transfer paths, in which it is.
Siméon et al. [33] expanded this work to more realistic settings by using probabilistic roadmaps. Specifically, they looked at manipulations necessitating multiple
regrasps. Their approach uses the manipulation graph to identify a high-level sequence of transit and transfer paths and then performs the motion planning required to
achieve them. This work assumes that only a single movable object exists in the
domain.
Stilman et al. [36, 37] address a version of manipulation planning called navigation
among movable obstacles (NAMO), where the robot must reach a specified location
among a field of movable obstacles. They plan backwards from the final goal and
use swept volumes to determine, recursively, which additional objects must be moved
and to constrain the system from placing other objects into those volumes. van den
Berg et al. [38] develop a probabilistically complete algorithm for NAMO. However,
this algorithm assumes that one can fully characterize the connected components of
the configuration space of the robot at each planning step; this is computationally
prohibitive for configuration spaces with more than two or three dimensions.
Hauser [16, 18] identified a generalization of manipulation planning as hybrid planning, that is, planning for systems with multiple (possibly infinitely many) modes,
representing different constraint sub-manifolds of the configuration space, for example, characterized by a grasp on a particular object or the placement of a movable
object. The key insight is that, as in the manipulation graph, one can conceptualize the planning process as alternating between moving in a single mode, where the
constraints are constant (e.g., moving through free space with a grasped object), and
switching between modes (e.g. grasping a new object). So, solving these problems
requires being able to plan within a single mode (this is the realm of classical motion
planning) and identifying (sampling) configurations where modes can change, which
is generally specific to the task. Hauser provided a probabilistically complete algorithmic framework that solves problems of this type assuming that effective single-mode
planners and mode-transition samplers are available. However, a pure uni-directional
sampling-based algorithm has trouble solving high-dimensional problems. Hauser
provided search guidance by limiting the set of mode transitions attempted and by
providing pre-computed “utility” values to guide the choice of samples, although this
pre-computation required substantial time. He reports [18] that without using utility,
the planner would run for more than 30 minutes (on a task to push one object on a
table, requiring the robot to move around the table), whereas using utility lowers the
running time to “approximately four minutes”.
Barry et al. [3] defined an RRT-style search of the combined configuration space,
including a bi-directional version of the RRT. Importantly, the individual moves of
this algorithm consider complete plans to reach the goal, ignoring obstacles. This
ensures that suggested motions have some chance of being on the path to the goal.
They also investigated a two-layer hierarchy, where the higher level plans only for the
manipulated object (without the robot), with the objective of identifying relevant
mode transitions to guide the full planner. This planner was able to solve some
challenging manipulation problems, but was limited to domains with one movable
object and had running times on the order of minutes.
Our position is that constructing complex manipulation plans requires effective
heuristic search guidance. This has been a major focus of research in the artificial
intelligence planning community.
1.2.2 Discrete Task Planning
The AI-planning research community has developed many overall approaches but has
largely adopted heuristic state-space search methods. Bonet and Geffner [5, 6] popularized this idea by constructing domain-independent heuristics that quickly solve
relaxations of the actual planning task to approximate the cost to the goal from an
arbitrary state. The breakthrough was that these heuristics could be derived automatically by a domain-independent planner, based only on the symbolic description
of the domain using rules with pre-conditions and effects. The key idea is to define a
relaxed version of the planning problem where the operators do not have any “delete”
effects, that is, no previously achieved result becomes false. This is an easier problem to solve than the original problem and the solution can be used to provide an
approximation to the actual cost of solving a problem. One can define two heuristics, h_add and h_max, that compute their estimates on a relaxed version of the plan
graph [4] in polynomial time by adding or taking the max of the costs of achieving
individual terms in a conjunctive goal. The h_max heuristic is admissible, while h_add
is inadmissible but more informed (it tends to be closer to the true cost and provides
more effective guidance in practice).
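To make the delete relaxation concrete, here is a minimal sketch (a standard construction from the planning literature, not code from this thesis) that computes the h_add and h_max estimates by fixed-point cost propagation, assuming unit action costs:

```python
from math import inf

def fact_costs(facts, actions, state, combine):
    """Propagate fact-achievement costs under the delete relaxation.
    actions is a list of (preconditions, add_effects) pairs with unit costs;
    combine is sum for h_add and max for h_max."""
    cost = {f: (0 if f in state else inf) for f in facts}
    changed = True
    while changed:
        changed = False
        for pre, adds in actions:
            pre_costs = [cost[p] for p in pre]
            if inf in pre_costs:
                continue  # some precondition not yet relaxed-reachable
            c = combine(pre_costs) if pre_costs else 0
            for f in adds:
                if c + 1 < cost[f]:
                    cost[f] = c + 1
                    changed = True
    return cost

def h_add(facts, actions, state, goal):
    cost = fact_costs(facts, actions, state, sum)
    return sum(cost[g] for g in goal)

def h_max(facts, actions, state, goal):
    cost = fact_costs(facts, actions, state, max)
    return max((cost[g] for g in goal), default=0)
```

On a chain of actions achieving a, then b from a, then c from b, h_max of goal {c} is 2, while h_add of goal {b, c} is 3 even though one two-step plan achieves both, illustrating how h_add over-counts shared structure and is therefore inadmissible.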
The FastForward (FF) planning system of Hoffman and Nebel [22] extended this
idea by extracting a plan from the relaxed plan graph and using its cost as a tighter
estimate of the cost to reach the goal from the initial state. We will refer to this
heuristic as h_FF. Importantly, the resulting relaxed plans can also be used to suggest useful actions to consider at the current state; this idea has proved crucial to
subsequent planning systems, including our planning algorithms.
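The relaxed-plan extraction behind h_FF can be sketched as follows (again a textbook-style construction under unit costs, not the thesis's implementation): a forward pass labels each fact with the first relaxed layer at which it becomes true, and a backward pass picks one achiever for each needed goal or precondition fact.

```python
from math import inf

def h_ff(facts, actions, state, goal):
    """Sketch of the FF heuristic on a delete-relaxed STRIPS problem.
    actions: list of (preconditions, add_effects) pairs with unit costs.
    Returns the size of an extracted relaxed plan, or inf if unreachable."""
    # Forward pass: first layer at which each fact becomes true.
    level = {f: (0 if f in state else inf) for f in facts}
    changed = True
    while changed:
        changed = False
        for pre, adds in actions:
            c = max((level[p] for p in pre), default=0)
            if c == inf:
                continue
            for f in adds:
                if c + 1 < level[f]:
                    level[f] = c + 1
                    changed = True
    if any(level[g] == inf for g in goal):
        return inf
    # Backward pass: regress from the goal, choosing one achiever per fact.
    plan = set()
    needed = {g for g in goal if level[g] > 0}
    while needed:
        f = max(needed, key=lambda x: level[x])  # deepest fact first
        needed.discard(f)
        # An achiever whose preconditions all appear before f's layer
        # exists by construction of the layers.
        pre, adds = next(a for a in actions
                         if f in a[1]
                         and max((level[p] for p in a[0]), default=0) < level[f])
        plan.add((frozenset(pre), frozenset(adds)))
        needed |= {p for p in pre if level[p] > 0}
    return len(plan)
```

The extracted plan (not just its length) is what suggests "helpful actions": actions in the relaxed plan applicable in the current state are tried first.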
1.2.3 Integrated Task and Motion Planning
There have been a number of approaches to integrating discrete task planning and
continuous motion planning in recent years.
The pioneering Asymov system of Cambon et al. [7] conducts an interleaved search
at the symbolic and geometric levels. They carefully consider the consequences of
using non-terminating probabilistic algorithms for the geometric planning, allocating
computation time among the multiple geometric planning problems that are generated
by the symbolic planner. The process can be viewed as using the task planner as a
heuristic to guide the motion planning search. However, since the task-level planner
is ignoring geometry, its value as a heuristic is quite limited. The work of Plaku and
Hager [32] is similar in approach.
A natural extension to the classic symbolic planning paradigm is to introduce
“computed predicates” (also known as “semantic attachments”); that is, predicates
whose truth value is established not via assertion but by calling an external program
that operates on a geometric representation of the state. A motion planner can serve
to implement such a predicate, determining the reachability of one configuration
from another. This approach is currently being pursued, for example, by Dornhege et
al. [11, 12], as a way of combining symbolic task-level planners with motion planners to
get a planner that can exploit the abstraction strengths of the first and the geometric
strengths of the second. A difficulty with this approach, however, is that calling a
motion planner is generally expensive. This leads to a desire to minimize the set
of object placements considered, and, very importantly, to avoid calling the motion
planner during heuristic evaluation. Considering only a sparse set of placements may
limit the generality of the planner, while avoiding calling the motion planner in the
heuristic leads to a heuristic that is uninformed about geometric considerations and
may result in considerable inefficiency due to backtracking during the search for a
plan.
The work of Erdem et al. [13], is similar in approach to Dornhege et al. [11],
augmenting a task planner that is based on explicit causal reasoning with the ability
to check for the existence of paths for the robot.
Lagriffoul et al. [25] also integrate the symbolic and geometric search. They generate a set of approximate linear constraints imposed by the program under consideration, e.g., from grasp and placement choices, and use linear programming to compute
a valid assignment or determine one does not exist. This method is particularly successful in domains such as stacking objects in which constraints from many steps of
the plan affect geometric choices.
Pandey et al. [30] and de Silva et al. [8] use HTNs instead of generative task
planning. Their system can backtrack over choices made by the geometric module,
allowing more freedom to the geometric planning than in the approach of Dornhege et
al. [11]. In addition, they use a cascaded approach to computing difficult applicability
conditions: they first test quick-to-evaluate approximations of accessibility predicates,
so that the planning is only attempted in situations in which it might plausibly
succeed.
In the hpn approach of Kaelbling and Lozano-Pérez [23], a regression-based symbolic planner uses generators, which perform fast approximate motion planning, to
select geometric parameters, such as configurations and paths, for the actions. Reasoning backward using regression allows the goal to significantly bias the actions that
are considered. This type of backward chaining to identify relevant actions is also
present in the work on navigation among movable obstacles of Stilman et al. [36, 37], described above, which plans backwards from the final goal and uses swept volumes to determine, recursively, which additional objects must be moved and to constrain the system from placing other objects into those volumes.
Srivastava et al. [35, 34] offer a novel control structure that avoids computing
expensive precondition values in many cases by assuming a favorable default valuation
of the precondition elements; if those default valuations prove to be erroneous, the error is discovered in the process of performing geometric planning to instantiate the
associated geometric operator. In that case, symbolic planning is repeated. This
approach requires the ability to diagnose why a motion plan is not possible in a given
state, which can be challenging, in general. Empirically, their approach is the only one
of which we are aware whose performance is competitive with FFRob and HHBF.
All of these approaches, although they have varying degrees of integration of the
symbolic and geometric planning, generally lack a true integrated heuristic that allows
the geometric details to affect the focus of the symbolic planning. Both FFRob and
HHBF develop such a heuristic, provide methods for computing it efficiently, and
show that it results in a significant computational savings.
Chapter 2
FFRob
FFRob is our first algorithm that uses heuristic search ideas from AI task planning
to efficiently solve robot planning problems. The algorithm is designed to solve integrated task and motion planning problems with a focus on pick-and-place problems.
In particular, FFRob is able to efficiently solve task and motion planning problems
with long horizons by leveraging geometric constraints in its heuristic to strongly
guide the search.
2.1 Problem Formulation
When we seek to apply the techniques of symbolic planning to domains that involve
robot motions, object poses and grasps, we are confronted with a series of technical
problems. In this section, we begin by discussing those problems and our solutions
to them, and end with a formal problem specification.
We might naturally wish to encode robot operations that pick up and place objects
in the style of traditional AI planning operator descriptions such as in Figure 2-1.
In these operations, the C, P, and G variables range over robot configurations, object poses, and grasps, respectively. These are high-dimensional continuous quantities, which means that there are infinitely many possible instantiations of each of these operators. We address this problem by sampling finitely many values for each of these variable domains during a pre-processing phase. The sampling is problem-driven, but may turn out to be inadequate to support a solution. If this happens, it is possible to add samples and re-attempt planning, although that was not done in the empirical results reported in this paper.

Pick(C1, O, G, P, C2):
  pre: HandEmpty, Pose(O, P), RobotConf(C1),
       CanGrasp(O, P, G, C2), Reachable(C1, C2)
  add: Holding(O, G), RobotConf(C2)
  delete: HandEmpty, RobotConf(C1)

Place(C1, O, G, P, C2):
  pre: Holding(O, G), RobotConf(C1),
       CanGrasp(O, P, G, C2), Reachable(C1, C2)
  add: HandEmpty, Pose(O, P), RobotConf(C2)
  delete: Holding(O, G), RobotConf(C1)

Figure 2-1: Pick and place operator descriptions.
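Concretely, grounding over the sampled finite domains amounts to a cross product with feasibility filtering. The sketch below uses hypothetical names (ground_pick_instances, can_grasp) to illustrate the idea; only the kinematic CanGrasp condition can be filtered statically, while Reachable depends on the world state and is left to be tested during search:

```python
from itertools import product

def ground_pick_instances(objects, poses, grasps, configs, can_grasp):
    """Enumerate ground Pick operators over pre-sampled finite domains.
    can_grasp(o, p, g, c2) is a hypothetical kinematic feasibility check;
    the Reachable(c1, c2) precondition is left symbolic, to be evaluated
    lazily against the current placements during search."""
    for o, p, g, c1, c2 in product(objects, poses, grasps, configs, configs):
        if can_grasp(o, p, g, c2):
            yield ('Pick', c1, o, g, p, c2)
```

Even with modest sample counts the product is large, which is one reason the heuristic guidance developed later in this chapter matters.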
Even with finite domains for all the variables, there is a difficulty with explicitly
listing all of the positive and negative effects of each operation. The operations of
picking up or placing an object may affect a large number of Reachable literals: picking
up an object changes the “shape” of the robot and therefore what configurations it may
move between; placing an object changes the free configuration space of the robot.
Even more significant, which Reachable literals are affected can depend on the poses
of all the other objects (for example, removing any one or two of three obstacles may
not render a configuration beyond the obstacles reachable). Encoding this conditional
effect structure in typical form in the preconditions of the operators would essentially
require us to write one operator description for each possible configuration of movable
objects.
We address this problem by maintaining a state representation that consists of
both a list of true literals and a data structure, called details, that captures the
geometric state in a way that allows the truth value of any of those literals to be
computed on demand. This is a version of the semantic attachments strategy [11].
The last difficulty is in computing the answers to queries in the details, especially
about reachability, which requires finding free paths between robot configurations in
the context of many different configurations of the objects. We address this problem
by using a conditional roadmap data structure called a conditional reachability graph,
related to a PRM [24], for answering all reachability queries, and lazily computing
answers on demand and caching results to speed future queries.
More formally, a state is a tuple ⟨L, D⟩, where L is a set of literals and D is a domain-dependent detailed representation. A literal is a predicate applied to arguments; it may optionally have an attached test, which maps the arguments and state to a Boolean value. A literal holds in a state if it is explicitly represented in the state's literal set, or its test evaluates to true in the state:

holds(l, s) ≡ l ∈ s.L or l.test(s).

A goal is a set of literals; a state satisfies a goal if all of the literals in the goal hold in the state:

satisfies(s, Γ) ≡ ∀l ∈ Γ. holds(l, s).

An operator is a tuple βŸ¨Ο†, e_pos, e_neg, f⟩ where Ο† is a set of literals representing a conjunctive precondition, e_pos is a set of literals to be added to the resulting state, e_neg is a set of literals to be deleted from the resulting state, and f is a function that maps the detailed state from before the operator is executed to the detailed state afterwards. Thus, the successor of state s under operator a is defined

successor(s, a) ≡ ⟨s.L ∪ a.e_pos βˆ– a.e_neg, a.f(s)⟩.

An operator is applicable in a state if all of its preconditions hold in that state:

applicable(a, s) ≡ ∀l ∈ a.Ο†. holds(l, s).

An operator schema is an operator with typed variables, standing for the set of operators arising from all instantiations of the variables over the appropriate type domains.
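A direct, hypothetical rendering of these definitions in code might look like the following (the class names and layout are illustrative, not taken from an actual FFRob implementation); note that literal equality ignores the attached test, so tested literals can live in ordinary sets:

```python
class Literal:
    """A predicate applied to arguments, with an optional test(state) -> bool."""
    def __init__(self, predicate, args=(), test=None):
        self.predicate, self.args, self.test = predicate, tuple(args), test
    def __eq__(self, other):
        return (self.predicate, self.args) == (other.predicate, other.args)
    def __hash__(self):
        return hash((self.predicate, self.args))

class State:
    def __init__(self, L, D=None):
        self.L = frozenset(L)  # explicitly asserted literals
        self.D = D             # domain-dependent details (e.g., geometric state)

def holds(l, s):
    # holds(l, s) = l in s.L or l.test(s)
    return l in s.L or (l.test is not None and l.test(s))

def satisfies(s, goal):
    return all(holds(l, s) for l in goal)

class Operator:
    def __init__(self, phi, e_pos, e_neg, f=lambda s: None):
        self.phi = set(phi)      # conjunctive precondition
        self.e_pos = set(e_pos)  # add effects
        self.e_neg = set(e_neg)  # delete effects
        self.f = f               # maps the pre-state to the new details

def applicable(a, s):
    return all(holds(l, s) for l in a.phi)

def successor(s, a):
    # successor(s, a) = <s.L + a.e_pos - a.e_neg, a.f(s)>
    return State((s.L | a.e_pos) - a.e_neg, a.f(s))
```

For example, a ground Pick operator would assert Holding and delete HandEmpty, with f updating the geometric details to attach the object to the gripper.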
Our general formulation has broader applicability, but in this paper we restrict
our attention to a concrete domain in which a mobile-manipulation robot can move,
grasp rigid objects, and place them on a surface. To formalize this domain, we use
literals of the following forms:
β€’ RobotConf(C): the robot is in configuration C, where C is a specification of the pose of the base as well as joint angles of the arm;

β€’ Pose(O, P): object O is at pose P, where P is a four-dimensional pose (x, y, z, ΞΈ), assuming that the object is resting on a stable face on a horizontal surface;

β€’ Holding(O, G): the robot is holding object O with grasp G, where G specifies a transform between the robot's hand and the object;

β€’ HandEmpty: the robot is not holding any object;

β€’ In(O, R): the object O is placed in such a way that it is completely contained in a region of space R; and

β€’ Reachable(C1, C2): there is a collision-free path between robot configurations C1 and C2, considering the positions of all fixed and movable objects as well as any object the robot might be holding and the grasp in which it is held.
The details of a state consist of the configuration of the robot, the poses of all the
objects, and what object is being held in what grasp.
Two of these literals have tests. The first, In, has a simple geometric test, to see
if object 𝑂, at the pose specified in this state, is completely contained in region 𝑅.
The test for Reachable is more difficult to compute; it will be the subject of the next
section.
2.2
Conditional Reachability Graph
In the mobile manipulation domain, the details contain a conditional reachability
graph (crg), which is a partial representation of the connectivity of the space of
sampled configurations, conditioned on the placements of movable objects as well as
on what is in the robot’s hand. It is similar in spirit to the roadmaps of Leven and
Hutchinson [26] in that it is designed to support solving multiple motion-planning
queries in closely related environments. The crg has three components:
βˆ™ Poses: For each object π‘œ, a set of possible stable poses.
βˆ™ Nodes: A set of robot configurations, 𝑐𝑖 , each annotated with a (possibly
empty) set {βŸ¨π‘”, π‘œ, π‘βŸ©} where 𝑔 is a grasp, π‘œ an object, and 𝑝 a pose, meaning
that if the robot is at the configuration 𝑐𝑖 , and object π‘œ is at pose 𝑝, then the
robot’s hand will be related to the object by the transform given by grasp 𝑔.
βˆ™ Edges: A set of pairs of nodes, with configurations 𝑐1 and 𝑐2 , annotated with
an initially empty set of validation conditions of the form βŸ¨β„Ž, 𝑔, π‘œ, 𝑝, π‘βŸ©, where 𝑏
is a Boolean value that is True if the robot moving from 𝑐1 to 𝑐2 along a simple
path (using linear interpolation or some other fixed interpolator) while holding
object β„Ž in grasp 𝑔 will not collide with object π‘œ if it is placed at pose 𝑝, and
False otherwise.
The validation conditions on the edges are not pre-computed; they will be computed
lazily, on demand, and cached in this data structure. Note that some of the collision-checking to compute the annotations can be shared, e.g., the same robot base location
may be used for multiple configurations and grasps.
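The lazy validation scheme can be sketched as follows; the Edge class, its is_valid method, and the collides predicate are hypothetical names standing in for the real geometric machinery.

```python
class Edge:
    """A crg edge between two configurations, with a lazily filled
    cache of validation conditions ⟨h, g, o, p, b⟩."""

    def __init__(self, n1, n2):
        self.n1, self.n2 = n1, n2
        self.valid = {}  # (held, grasp, obj, pose) -> bool, computed on demand

    def is_valid(self, held, grasp, obj, pose, collides):
        key = (held, grasp, obj, pose)
        if key not in self.valid:
            # Compute the condition once and cache it for later queries.
            self.valid[key] = not collides(self.n1, self.n2,
                                           held, grasp, obj, pose)
        return self.valid[key]
```

Because the cache persists across queries, repeated reachability tests against the same object placements never redo a collision check.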
2.2.1
Constructing the crg
The crg is initialized in a pre-processing phase, which concentrates on obtaining a
useful set of sampled object poses and robot configurations. Object poses are useful
if they are initial poses, or satisfy a goal condition, or provide places to put objects
out of the way. Robot configurations are useful if they allow objects, when placed
in useful poses, to be grasped (and thus either picked from or placed at those poses)
or if they enable connections to other useful poses via direct paths. We assume that
the following components are specified: a workspace π‘Š , which is a volume of space
ConstructCRG(π‘Š, 𝑇, 𝑠, Γ, π’ͺ𝑓 , π’ͺπ‘š , πœƒ) :
1 𝑁 = {𝑠.details.robotConf } ∪ {robot configuration in Γ}
2 for π‘œ ∈ π’ͺπ‘š :
3
π‘ƒπ‘œ = {𝑠.details.pose(π‘œ)} ∪ {pose of π‘œ in Γ}
4
for 𝑖 ∈ {1, . . . , πœƒ.𝑛𝑝 }:
5
π‘ƒπ‘œ .add (sampleObjPose(π‘œ.shape, 𝑇 ))
6
for 𝑔 ∈ π‘œ.grasps:
7
for 𝑗 ∈ {1, . . . , πœƒ.π‘›π‘–π‘˜ }: 𝑁.add (sampleIK(𝑔, π‘œ, 𝑝), (𝑔, π‘œ, 𝑝))
8
for 𝑗 ∈ {1, . . . , πœƒ.𝑛𝑛 }: 𝑁.add (sampleConfNear(𝑔, ( )))
9 𝐸={ }
10 for 𝑛1 ∈ 𝑁 :
11
for 𝑛2 ∈ NearestNeighbors(𝑛1 , π‘˜, 𝑁 ):
12
if CFreePath(𝑛1 .𝑐, 𝑛2 .𝑐, π’ͺ𝑓 ): 𝐸.add (𝑛1 , 𝑛2 )
13 𝑁, 𝐸 = connectTrees(𝑁, 𝐸, π‘Š, πœƒ.𝑛𝑐 )
14 return βŸ¨π‘ƒ, 𝑁, 𝐸⟩
Figure 2-2: Procedure for constructing the crg.
that the robot must remain inside; a placement region 𝑇 , which is a set of static
planar surfaces upon which objects may be placed (such as tables and floor, but not
(for now) the tops of other objects); a set π’ͺ𝑓 of fixed (immovable) objects; a set
π’ͺπ‘š of movable objects; and a vector of parameters πœƒ that specify the size of the
crg. It depends, in addition, on the start state 𝑠 and goal Γ. We assume that each
object π‘œ ∈ π’ͺπ‘š has been annotated with a set of feasible grasps. The parameter
vector consists of a number 𝑛𝑝 of desired sample poses per object (type); a number
π‘›π‘–π‘˜ of grasp configurations per grasp; a number 𝑛𝑛 of configurations near each grasp
configuration; a number 𝑛𝑐 of RRT iterations for connecting configurations, and a
number π‘˜ specifying a desired degree of connectivity.
The ConstructCRG procedure is outlined in Figure 2-2. We begin by initializing the set of nodes 𝑁 to contain the initial robot configuration and the configuration
specified in the goal, if any. Then, for each object, we generate a set of sample poses,
including its initial pose and goal pose, if any, as well as poses sampled on the object
placement surfaces. For each object pose and possible grasp of the object, we use
the sampleIK procedure to sample one or more robot configurations that satisfy
ReachableTest(𝑐1 , 𝑐2 , 𝐷, crg):
1  for (π‘œ, 𝑝) ∈ 𝐷.objects:
2      for 𝑒 ∈ crg.𝐸:
3          if not ⟨𝐷.heldObj , 𝐷.grasp, π‘œ, 𝑝, *⟩ ∈ 𝑒.valid :
4              path = CFreePath(𝑒.𝑛1 .𝑐, 𝑒.𝑛2 .𝑐, π‘œ@𝑝, 𝐷.heldObj , 𝐷.grasp)
5              𝑒.valid .add(⟨𝐷.heldObj , 𝐷.grasp, π‘œ, 𝑝, (path != None)⟩)
6  𝐺 = {𝑒 ∈ crg.𝐸 | ∀(π‘œ, 𝑝) ∈ 𝐷.objects. ⟨𝐷.heldObj , 𝐷.grasp, π‘œ, 𝑝, True⟩ ∈ 𝑒.valid }
7  return ReachableInGraph(𝑐1 , 𝑐2 , 𝐺)
Figure 2-3: Procedure for querying the crg.
the kinematic constraints that the object be grasped. We sample additional configurations with the hand near the grasp configuration to aid maneuvering among the
objects. We then add edges between the π‘˜ nearest neighbors of each configuration, if
a path generated by linear interpolation or another simple fixed interpolator is free
of collisions with fixed objects. At this point we generally have a forest of trees of
configurations. Finally, we attempt to connect the trees using an RRT algorithm as
in the sampling-based roadmap of trees [31].
To test whether this set of poses and configurations is plausible, we use it to
compute a heuristic value of the starting state, as described in section 2.3.2. If it
is infinite, meaning that the goal is unreachable even under extremely optimistic
assumptions, then we return to this procedure and draw a new set of samples.
2.2.2
Querying the crg
Now that we have a crg we can use it to compute the test for the Reachable literal,
as shown in ReachableTest below.
The main part of the test is in lines 6–7: we construct a subgraph of the crg that
consists only of the edges that are valid given the object that the robot is holding
and the current placements of the movable objects and search in that graph to see
if configuration 𝑐2 is reachable from 𝑐1 . Lines 1–5 check that the relevant
validity conditions have been computed and compute them if they have not. The
procedure CFreePath(𝑐1 , 𝑐2 , obst, π‘œ, 𝑔) performs collision checking on a straight-line,
or other simply interpolated path, between configurations 𝑐1 and 𝑐2 , with a single
obstacle obst and object π‘œ held in grasp 𝑔. Many of these tests can be done very
efficiently using simple bounding box checks. We also exploit the fact that we are
only interested in the connected component of the crg that includes the current
robot configuration, which further increases efficiency.
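A simplified version of this query, assuming each edge carries a per-object validity predicate rather than the full cached condition sets, might look like:

```python
from collections import deque

def reachable(c1, c2, objects, edges):
    """Is c2 reachable from c1 given the current object placements?

    edges: list of (conf_a, conf_b, valid_fn) where valid_fn(obj, pose)
    says whether the edge survives that object at that pose (a stand-in
    for the cached validation conditions)."""
    # Keep only edges valid for every (object, pose) in the current state.
    adj = {}
    for a, b, valid_fn in edges:
        if all(valid_fn(o, p) for (o, p) in objects):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    # Breadth-first search restricted to the component containing c1.
    frontier, seen = deque([c1]), {c1}
    while frontier:
        c = frontier.popleft()
        if c == c2:
            return True
        for nxt in adj.get(c, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

Restricting the search to c1's connected component is exactly the efficiency observation made above: configurations outside that component are never expanded.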
In addition, the crg is used to implement applicableOps(𝑠, Ω, crg), which
efficiently determines which operator schema instances in Ω are applicable in a given
state 𝑠. For each schema, we begin by binding variables that have preconditions
specifying the robot configuration, object poses, the currently grasped object and/or
the grasp to their values in state 𝑠. We consider all bindings of variables referring to
objects that are not being grasped. For a pick operation, 𝑃 is specified in the current
state, so we consider all bindings of 𝐺 and 𝐢2 such that (𝐢2 , (𝐺, 𝑂, 𝑃 )) ∈ crg.𝑁 .
For a place operation, 𝐺 is specified in the current state, so we consider all bindings
of 𝑃 and 𝐢2 such that (𝐢2 , (𝐺, 𝑂, 𝑃 )) ∈ crg.𝑁 .
2.3
Planning Algorithms
A planning problem, Π, is specified by βŸ¨π‘ , Γ, π’ͺ, 𝑇, π‘Š, β„¦βŸ©, where 𝑠 is the initial state,
including literals and details, Γ is the goal, π’ͺ is a set of objects, 𝑇 is a set of placement
surfaces, π‘Š is the workspace volume, and Ω is a set of operator schemas.
Plan, shown below, is a generic heuristic search procedure. Depending on the
behavior of the extract procedure, it can implement any standard search control
structure, including depth-first, breadth-first, uniform-cost, best-first, 𝐴* , and hill-climbing. Critical to many of these strategies is a heuristic function, which maps a
state in the search to an estimate of the cost to reach a goal state from that state.
Many modern domain-independent search heuristics are based on a relaxed plan graph
(rpg). In the following section, we show how to use the crg to compute the relaxed
plan graph efficiently.
Plan(Π, extract, heuristic, πœƒ):
 1  βŸ¨π‘ , Γ, π’ͺ, 𝑇, π‘Š, β„¦βŸ© = Π
 2  crg = ConstructCRG(π‘Š, 𝑇, 𝑠, Γ, π’ͺ, πœƒ)
 3  def H(𝑠): return heuristic(RPG(𝑠, Γ, crg, Ω))
 4  π‘ž = Queue(SearchNode(𝑠, 0, H(𝑠), None))
 5  while not π‘ž.empty():
 6      𝑛 = extract(π‘ž)
 7      if satisfies(𝑛.𝑠, Γ): return 𝑛.path
 8      for π‘Ž ∈ applicableOps(𝑛.𝑠, Ω, crg):
 9          𝑠′ = successor(𝑛.𝑠, π‘Ž)
10          π‘ž.push(SearchNode(𝑠′ , 𝑛.cost + 1, H(𝑠′ ), 𝑛))
Figure 2-4: The primary search control procedure.
2.3.1
Enforced Hill-Climbing
The primary search mechanism we use in our experiments, like in FF, is enforced
hill climbing. This search commits to the nearest state that has a strictly lower
heuristic value. When one is not immediately available, it performs a breadth-first
search (BFS) until it finds a state with a strictly lower heuristic value. If the BFS
fails by running out of new states to explore, it terminates. The original enforced
hill-climbing algorithm restricts the BFS to only progress the plan forward from the
current hill-climbing state. This technique is incomplete because the BFS could begin
on a branch that is a dead end. But it is efficient in practice because it resets the
search agenda each time a lower heuristic value is achieved. Note that enforced hill-climbing can be made complete by storing the state-space search tree and allowing
the BFS to undo actions on the current branch ensuring that the BFS can always
return to the start state. Another advantage of the original enforced hill-climbing
for robotic planning is that it allows planning and execution to be performed online
together. Because the search commits to an action, the robot can execute the action
while the planner finishes its computation.
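A compact sketch of this control structure, with hypothetical h, succ, and is_goal callables, is given below; as discussed, the inner BFS makes the search incomplete in general.

```python
from collections import deque

def enforced_hill_climbing(start, h, succ, is_goal):
    """Repeatedly commit to the first state found (via BFS) whose
    heuristic value is strictly lower than the current state's.
    succ(s) yields (action, next_state) pairs. Returns the action
    sequence, or None if the BFS exhausts new states (incomplete)."""
    state, plan = start, []
    while not is_goal(state):
        frontier, seen = deque([(state, [])]), {state}
        best, found = h(state), None
        while frontier:
            s, path = frontier.popleft()
            if path and (is_goal(s) or h(s) < best):
                found = (s, path)       # first strictly improving state
                break
            for a, s2 in succ(s):
                if s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, path + [a]))
        if found is None:
            return None                 # BFS ran out of new states
        state, plan = found[0], plan + found[1]
    return plan
```

Because the algorithm commits to found[1] before continuing, a robot could begin executing those actions while the planner keeps searching from the committed state.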
2.3.2
Computing the Relaxed Plan Graph
In classical symbolic planning, a plan graph is a sequence of alternating layers of
literals and actions. The first layer consists of all literals that are true in the starting
state. Action layer 𝑖 contains all operators whose preconditions are present and
simultaneously achievable in literal layer 𝑖. Literal layer 𝑖 + 1 contains all literals that
are possibly achievable after 𝑖 actions, together with a network of mutual exclusion
relations that indicates in which combinations those literals might possibly be true.
This graph is the basis for GraphPlan [4] and related planning algorithms.
The relaxed plan graph is a simplified plan graph, without mutual exclusion conditions; it is constructed by ignoring the negative effects of the actions. From the
rpg, many heuristics can be computed. For example, the β„Žπ‘Žπ‘‘π‘‘ heuristic [6] returns
the sum of the levels at which each of the literals in the goal appears. It is optimistic,
in the sense that if the mutual exclusion conditions were taken into account, it might
take more steps to achieve each individual goal from the starting state; it is also pessimistic, in the sense that the actions necessary to achieve multiple goal fluents might
be “shared.” An admissible heuristic, β„Žπ‘šπ‘Žπ‘₯ [6], is obtained by taking the maximum of
the levels of the goal literals, rather than the sum; but it is found in practice to offer
weaker guidance. An alternative is the FF heuristic [22], which performs an efficient
backward-chaining pass in the plan graph to determine how many actions, if they
could be performed in parallel without deletions, would be necessary to achieve the
goal and uses that as the heuristic value. An important advantage of the FF heuristic
is that it does not over-count actions if one action achieves multiple effects, and it
enables additional heuristic strategies that are based on helpful actions. We use a
version of the helpful actions strategy that reduces the choice of the next action to
those that are in the first level of the relaxed plan, and find that it improves search
performance.
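The level-based heuristics can be computed directly from the layered literal sets; literal_levels, h_add, and h_max below are illustrative helpers, not the planner's code.

```python
def literal_levels(layers):
    """layers[i] is the set of literals first possibly achievable after
    i actions; return the lowest layer index for each literal."""
    level = {}
    for i, layer in enumerate(layers):
        for l in layer:
            level.setdefault(l, i)
    return level

def h_add(goal, level):
    # Sum of the first levels of the goal literals (inadmissible).
    return sum(level.get(l, float('inf')) for l in goal)

def h_max(goal, level):
    # Maximum of the first levels of the goal literals (admissible).
    return max((level.get(l, float('inf')) for l in goal), default=0)
```

A literal absent from every layer gets level infinity, which is how an infinite heuristic value at the initial state signals an implausible sample set.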
In order to use heuristics derived from the rpg we have to show how it can be
efficiently computed when the add lists of the operators are incomplete and the truth
values of some literals are computed from the crg in the details. We present a
method for computing the rpg that is specialized for mobile manipulation problems.
It constitutes a further relaxation of the rpg which allows literals to appear earlier in
the structure than they would in an rpg for a traditional symbolic domain. This is
necessary, because the highly conditional effects of actions on Reachable literals makes
them intractable to compute exactly. The consequence of the further relaxation is
that the β„Žπ‘Žπ‘‘π‘‘ and β„Žπ‘šπ‘Žπ‘₯ heuristics computed from this structure have less heuristic
force. However, in section 2.3.3 we describe a method for computing a version of β„Žπ‘“ 𝑓
that recovers the effectiveness of the original.
The intuition behind this computation is that, as we move forward in computing
the plan graph, we consider the positive results of all possible actions to be available.
In terms of reachability, we are removing geometric constraints from the details; we
do so by removing an object from the universe when it is first picked up and never
putting it back, and by assuming the hand remains empty (if it was not already)
after the first place action. Recall that, in applicable and satisfies, the holds
procedure is used to see if a literal is true in a state. It first tests to see if it is
contained in the literal set of the state; this set becomes increasingly larger as the
rpg is computed. If the literal is not there, then it is tested with respect to the crg
in the details, which becomes increasingly less constrained as objects are removed.
Importantly, since the geometric tests on the crg are cached, the worst-case
number of geometric tests for planning with and without the heuristic is the same. In
practice, computing the rpg for the heuristic is quite fast, and using it substantially
reduces the number of states that need to be explored.
RelaxedPlanGraph, shown below, outlines the algorithm in more detail. In
the second part of line 1, in a standard implementation we would generate all possible
instantiations of all actions. However, because of the special properties of reachability,
we are able to abstract away from the particular configuration the robot is in when an
action occurs; thus, we consider all possible bindings of the non-configuration variables
in each operator, but we only consider binding the starting configuration variable to
the actual current starting configuration and leave the resulting configuration variable
free. In line 2, we initialize hState, which is a pseudo-state containing all literals that
RelaxedPlanGraph(𝑠, Γ, crg, Ω):
 1  𝐷 = 𝑠.𝐷; ops = allNonConfBindings(Ω)
 2  literals = [ ]; actions = [ ]; hState = 𝑠
 3  while True:
 4      layerActions = { }; layerLiterals = { }
 5      for op ∈ ops:
 6          if applicable(op, hState):
 7              layerActions.add(op)
 8              layerLiterals.union(op.𝑒pos )
 9              ops.remove(op)
10              if op.type = pick : 𝐷.objects.remove(op.obj )
11              if op.type = place: 𝐷.heldObj = None
12      literals.append(layerLiterals)
13      actions.append(layerActions)
14      hState = βŸ¨β‹ƒπ‘– literals 𝑖 , 𝐷⟩
15      if satisfies(hState, Γ): return (literals, actions)
16      if layerActions = { }: return None
Figure 2-5: Method for creating the relaxed plan graph.
are possibly true at the layer we are operating on, and a set of details that specifies
which objects remain as constraints on the robot’s motion at this layer. In line 6, we
ask whether an operator schema with all but the resulting configuration variable bound
is applicable in the heuristic state. We only seek a single resulting configuration that
satisfies the preconditions of op in hState; even though many such configurations
might exist, each of them will ultimately affect the resulting hState in the same way.
Lines 7–9 constitute the standard computation of the rpg. In lines 10–11 we perform
domain-specific updates to the detailed world model: if there is any way to pick up
an object, then we assume it is completely removed from the domain for the rest
of the computation of the rpg; if there is any way to put down the currently held
object, then we assume that there is no object in the hand, when doing any further
computations of reachability in the crg. Line 14 creates a new hState, which consists
of all literals possibly achievable up to this level and the details with possibly more
objects removed.
There is one last consideration: the strategy shown above does not make the dependencies of Reachable literals at level 𝑖 on actions from level 𝑖 − 1 explicit; the truth
of those literals is encoded implicitly in the details of the hState. We employ a simple
bookkeeping strategy to maintain a causal connection between actions and literals,
which will enable a modified version of the FF heuristic to perform the backward
pass to find a parallel plan. We observe that, in the relaxed plan, once an object is
pick ed, it is effectively removed from the domain. So, we add an extra positive effect
literal, Picked (π‘œ) to the positive effects set of the pick action, just when it is used in
the heuristic computation.
2.3.3
The FFRob heuristic
The FF heuristic operates by extracting a relaxed plan from the rpg and returning
the number of actions it contains. A relaxed plan 𝒫 constructed for starting state 𝑠
and set of goal literals 𝐺 consists of a set of actions that has the following properties:
(1) For each literal 𝑙 ∈ 𝐺 there is an action π‘Ž ∈ 𝒫 such that 𝑙 ∈ π‘Ž.𝑒pos and (2) For
each action π‘Ž ∈ 𝒫 and each literal 𝑙 ∈ π‘Ž.πœ‘, either 𝑙 ∈ 𝑠 or there exists an action π‘Ž′ ∈ 𝒫
such that 𝑙 ∈ π‘Ž.𝑒pos .
That is, the set of actions in the relaxed plan collectively achieve the goal as
well as all of the preconditions of the actions in the set that are not satisfied in the
initial state. It would be ideal to find the shortest linear plan that satisfies these
conditions; however, doing so is NP-hard [22]. Instead, the plan extraction procedure
works backwards, starting with the set of literals in the goal 𝐺. For each literal
𝑙 ∈ 𝐺, it seeks the “cheapest” action π‘Ž* that can achieve it; that is,
π‘Ž* = arg
min
{π‘Ž|𝑙∈π‘Ž.𝑒pos }
∑︁
β„’(𝑙) ,
𝑙∈π‘Ž.πœ‘
where β„’(𝑙) is the index of the lowest layer containing 𝑙 (which is itself a quick estimate
of the difficulty of achieving 𝑙.)
The minimizing π‘Ž* is added to the relaxed plan, 𝑙 and any other literals achieved
by π‘Ž* are removed from the goal set, and the preconditions π‘Ž* .πœ‘ are added to the goal
set unless they are contained in 𝑠. This process continues until the goal set is empty.
The rpg computed as in section 2.3.2 does not immediately support this computation, because the Picked fluents that are positive results of Pick actions do not match
the Reachable fluents that appear in preconditions. In general, there may be many
ways to render a robot configuration reachable, by removing different combinations
of obstacles. Determining the smallest such set is known as the minimum constraint
removal problem [17]. Hauser shows it is NP-hard in the discrete case and provides
a greedy algorithm that is optimal if obstacles must not be entered more than once.
We have extended this method to handle the case in which objects are weighted; in
our case, by the level in the rpg at which they can be picked. The weighted MCR
algorithm attempts to find a set of obstacles with a minimal sum of weights that
makes a configuration reachable.
So, any action precondition of the form Reachable(𝑐) is replaced by the set of
preconditions Picked (π‘œ) for all objects π‘œ in the solution to the weighted MCR problem
for configuration 𝑐. This represents the (approximately) least cost way to make 𝑐
accessible. Having carried out this step, we can use the standard FF method for
extracting a relaxed plan. The FFRob heuristic returns the number of actions in
this relaxed plan.
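One way to approximate the weighted MCR query is a Dijkstra-style search whose path cost is the summed weight of the distinct obstacles blocking the traversed edges, counting each obstacle once. The sketch below uses a hypothetical interface and, like the greedy MCR algorithm it mirrors, is not guaranteed optimal in general.

```python
import heapq
from itertools import count

def weighted_mcr(start, goal, adjacency, blockers, weight):
    """Return an (approximately) minimum-weight obstacle set whose
    removal makes goal reachable from start, or None if none exists.
    blockers(c1, c2) gives the obstacles blocking that edge."""
    tie = count()            # tiebreaker so the heap never compares sets
    best = {}
    heap = [(0, next(tie), start, frozenset())]
    while heap:
        cost, _, c, removed = heapq.heappop(heap)
        if c == goal:
            return removed
        if best.get(c, float('inf')) <= cost:
            continue         # a cheaper label for c was already expanded
        best[c] = cost
        for c2 in adjacency.get(c, ()):
            r2 = removed | blockers(c, c2)
            cost2 = sum(weight[o] for o in r2)  # each obstacle counted once
            heapq.heappush(heap, (cost2, next(tie), c2, r2))
    return None
```

Keeping only one label per configuration is the greedy simplification: two different removed-sets of equal cost at the same configuration are not both pursued.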
2.3.4
Geometric biases
It frequently happens that multiple states have the same heuristic value; in such
cases, we break ties using geometric biases. Intuitively, the idea is to select actions
that maximize the reachability of configurations in the domain from the current state.
This information is inexpensively obtained from the CRG which already supports a
similar reachability computation.
βˆ™ Choose actions that leave the largest number of configurations corresponding
to placements of objects in their goal poses or regions available. This captures
the idea that blocking goal regions should be avoided if possible. This is useful
because, although the heuristic will report when a placement is immediately
bad, i.e., already blocking future goals, it will not convey that a placement out
in the open may prevent two necessary placements later in the search. This is
because the relaxed plan assumes that a free placement exists, despite objects
being placed there, since it does not model the negative effects of actions.
βˆ™ Choose actions that leave the largest total number of configurations corresponding to placements reachable; this ensures that all placements are as tight as
possible against the edge of the reachable space.
βˆ™ If neither of the previous biases breaks the tie, then select actions that maximize
the total number of reachable configurations.
Additional tie-breaking when using β„Žπ‘“ 𝑓 can come from β„Žπ‘Žπ‘‘π‘‘ and β„Žπ‘šπ‘Žπ‘₯ . The relaxed
plan graph already encodes both these heuristics which can vary in value even for
states that have the same β„Žπ‘“ 𝑓 . For example, a relaxed plan given by β„Žπ‘“ 𝑓 may have
its goals concentrated closer to the current state despite the same total length.
These biases experimentally prove to be helpful in giving the search additional
guidance in this domain to reduce the amount of backtracking to undo bad decisions.
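The tie-breaking scheme amounts to comparing states lexicographically on the heuristic value followed by the (negated) geometric measures, so equal-heuristic states fall through to the biases in order. A sketch with hypothetical measure callables:

```python
def search_key(state, h, goal_free, placements_reachable, reachable):
    """Lexicographic ordering: lower heuristic first, then more free goal
    placements, then more reachable placement configurations, then more
    reachable configurations overall (hence the negations)."""
    return (h(state),
            -goal_free(state),
            -placements_reachable(state),
            -reachable(state))
```

Any comparison-based search queue can then order states by this key without changing the underlying heuristic.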
2.4
Results
We have experimented with various versions of this algorithm, differing in the definition of the heuristic, on a variety of tasks; we report the results in this section.
As previously mentioned, the search strategy in all of our experiments is enforced
hill-climbing [22]. The parameters governing the creation of the crg are: 𝑛𝑝 ∈ [25, 50] (the number of placements for each object, which varies with the size of the placement regions); π‘›π‘–π‘˜ = 1 (the number of robot configurations for each grasp); 𝑛𝑛 = 1 (the number of additional robot configurations near each grasp); 𝑛𝑐 = 250 (the number of RRT iterations); and π‘˜ = 4 (the number of nearest neighbors).
In our experiments, we generate an initial crg using these parameters during
pre-processing and then test whether the value of the heuristic at the initial state is
finite. If it is not, we discard it and try again, with the same parameters. Very few
Figure 2-6: The initial and final states of tasks 0, 1, and 2 in the experiments. (a) Median 8 actions; (b) median 10 actions; (c) median 16 actions.
Figure 2-7: The initial and final states of tasks 3, 4, and 5 in the experiments. (a) Median 18 actions; (b) median 20 actions; (c) median 32 actions.
retries were necessary to find a crg with finite heuristic value. This condition was
effective: in every case in our experiments, the crg contained a valid plan.
The following versions of the planner are compared in the experiments:
1. No β„Ž: The heuristic always returns 0.
2. β„Žπ‘“ 𝑓 : This is the original heuristic in FF, based only on the symbolic literals,
completely ignoring the reachability conditions when computing the heuristic.
Helpful actions are not used.
3. β„Žπ‘Žπ‘‘π‘‘π‘Ÿ : This is a version of the original β„Žπ‘Žπ‘‘π‘‘ heuristic that returns the sum of
the levels of the rpg at which the goal literals are first found. This makes use
of the crg to reason about reachability. It does not build a relaxed plan and,
therefore, does not have helpful actions.
4. β„Žπ‘“ 𝑓 π‘Ÿ,β„Žπ‘Ž : This computes the rpg, does a backward scan to find a relaxed plan
and computes helpful actions based on that plan.
5. β„Žπ‘“ 𝑓 π‘Ÿπ‘ : Like β„Žπ‘“ 𝑓 π‘Ÿ but using geometric biases to break ties and without using
helpful actions.
6. β„Žπ‘“ 𝑓 π‘Ÿπ‘,β„Žπ‘Ž : Like β„Žπ‘“ 𝑓 π‘Ÿ but using geometric biases to break ties and using helpful
actions.
We tested our algorithm on 6 different tasks, in which the goals were conjunctions
of In(𝑂𝑖 , 𝑅𝑗 ) for some subset of the objects (the ones not colored red). Other objects
were moved as necessary to achieve these goals. The last three tasks are shown in
Figure 2-7; the first three tasks are simpler variations on task 3 (Figure 2-7(a)).
Table 2.1 shows the results of running the algorithms in each of the tasks.
Each entry in the table reports the median time (t), the median absolute deviation (MAD) of the times (m), and the number of states expanded (s). Each task also incurs a
pre-processing time for building the roadmap; this is reported (in seconds) in the Pre
column of the table. The median-based robust statistics are used instead of the usual
mean and standard deviation since the data has outliers. Entries with a median time
T  Pre |      No β„Ž         |      β„Žπ‘“ 𝑓         |     β„Žπ‘Žπ‘‘π‘‘π‘Ÿ
       |   t    m     s    |   t    m     s    |   t    m     s
0  21  |  265   35  48719  |  102   72   6123  |   41   19    536
1  25  |  300    0  63407  |  283   17  14300  |  162   55   2042
2  29  |  300    0  50903  |  300    0   8947  |  300    0   3052
3  23  |  300    0  39509  |  300    0   4849  |  300    0   1767
4  30  |  300    0  23920  |  300    0   1574  |  300    0   1028
5  51  |  300    0   9422  |  300    0   1533  |  300    0    592

T |    β„Žπ‘“ 𝑓 π‘Ÿ,β„Žπ‘Ž      |     β„Žπ‘“ 𝑓 π‘Ÿπ‘       |   β„Žπ‘“ 𝑓 π‘Ÿπ‘,β„Žπ‘Ž
  |   t    m     s   |   t    m     s   |   t    m    s
0 |    6    5    78  |    7    5    87  |    2    0   23
1 |    3    0     8  |   16   11   153  |    4    1   49
2 |    5    1    12  |   17   13   114  |    7    2   32
3 |   83   19   464  |   99   43   523  |   13    1   69
4 |  300    0  1274  |   18    3    20  |   16    3   20
5 |  300    1   272  |  106   17    32  |   99   14   32

Table 2.1: Experiment results.
of 300 and MAD of 0 did not successfully complete any of the simulations. There were
20 simulations per task for the first two heuristics and 120 simulations per task for the
others. Running times are from a Python implementation running on a 2.6GHz Intel
Core i7. Performance optimization, switching to a more efficient language, and using
state-of-the-art libraries for low-level geometric processing such as collision-checking
would further improve these runtimes.
As can be clearly seen, especially in the number of expanded states, exploiting geometric information in the heuristic produces substantial improvements. Introducing geometric biases to settle ties helps in the most cluttered of the examples. These empirical results suggest that strong heuristic guidance through reasoning about geometric constraints is vital for efficient task and motion planning algorithms.
Chapter 3
HHBF
FFRob was able to solve complicated pick-and-place problems for a PR2 robot with
10 degrees of freedom in seconds. However, FFRob, as part of a pre-processing step,
constructed an initial set of sampled robot configurations to serve as the basis of its
geometric search space. These configurations included grasping and placing configurations as well as free-space samples connecting them. This led to an important limitation: FFRob cannot reliably plan for tasks that require sampling configurations that
depend on previous results, such as tasks requiring stacking or regrasping. One can
attempt to alleviate this problem by dense sampling, but this substantially impacts
performance, since FFRob carries out an unguided forward search over this densely
sampled set of configurations to build the relaxed plan graph. The HHBF planner
does all its sampling in a dynamic fashion, eliminating the cost of pre-processing and
enabling it to do tasks that involve stacking and regrasping.
3.1
Problem Formulation
We consider a different approach to modeling high-dimensional hybrid configuration
spaces than FFRob. A state of such a system is characterized by a finite set of
configuration variables βŸ¨π‘ž1 , . . . , π‘žπ‘› ⟩, which may be discrete (such as which object a
robot is holding or whether the light is turned on) or continuous (such as the joint-space configuration of a robot or the pose of an object).
Without making any assumptions about the nature of the configuration space and
the transition dynamics, planning in such a space is quite difficult. We will adopt
a problem representation that can reveal useful underlying structure in the domain
that will be exploited by our method. There are three important kinds of leverage:
βˆ™ Factoring and sparsity: by representing the state space as the product of the
spaces of a set of state variables, we are able to assert that each action of
the robot affects only a small subset of the state variables, allowing individual
actions to be contemplated in state spaces that are effectively much smaller.
βˆ™ Continuous modes: there are some continuous subspaces of the whole space that
have continuous dynamics, which allows us to use classic sample-based robot
motion planning techniques to move within those subspaces.
βˆ™ Heuristic estimates: by constructing relaxed versions of a planning problem, we
can efficiently obtain estimates of the cost to reach a goal state and use these
estimates to make the search for a solution much more efficient.
We might describe a basic robot manipulation domain containing one robot and
π‘š rigid objects using the state description βŸ¨π‘Ÿ, π‘œ1 , . . . , π‘œπ‘š , β„Ž, π‘”βŸ©, where π‘Ÿ is a full configuration of the robot, π‘œπ‘– is the pose of object 𝑖, β„Ž is a discrete variable describing
which object the robot is currently holding or none, and 𝑔 is a grasp transformation
describing the relation between the currently held object (if there is one) and the
robot’s end effector frame. Much more complicated domains could include other discrete and continuous variables (such as whether the light is on, mass or temperature
of the objects, etc.).
We will characterize the state-transition dynamics using action templates that
expose the sparse structure of dependence among the values of state variables and
enable the construction of an efficiently solvable relaxed version of the problem. An
action template has the form shown in Figure 3-1.
It specifies a generally infinite set of actions, one for each possible binding of the
parameters πœƒπ‘– , which may be drawn from infinite domains, such as robot configurations. It contains a set of preconditions, each of which requires a state variable to
Action(πœƒ1 , . . . , πœƒπ‘˜ ):
pre: π‘žπ‘– ∈ 𝐢𝑖 (πœƒ)
...
eff: π‘žπ‘— = 𝑣𝑗 (πœƒ)
...
Figure 3-1: Generic action template format.
be within a subset of its domain. Not all dimensions of the state must be mentioned
in the preconditions: if one is not mentioned then it is implicitly allowed to take on
any value. It also contains a set of effects, each of which stipulates that a particular
state variable will take on a particular value. Again, not every state variable must be
mentioned in the effects; any variable that is not mentioned is assumed to retain the
value it had before the action was executed. In the following, we will slightly extend
the action-template notation to allow constraints to be placed not just separately on
the values of individual state variables, but on collections of them as well.
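The template semantics can be made concrete with a small Python sketch of a ground action over a dictionary-valued state; GroundAction and its field names are illustrative, not from the planner's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class GroundAction:
    """A ground instance of an action template: preconditions constrain a
    subset of the state variables; effects set a subset; any variable not
    mentioned is unconstrained (pre) or unchanged (eff)."""
    pre: Dict[str, Callable[[object], bool]] = field(default_factory=dict)
    eff: Dict[str, object] = field(default_factory=dict)

    def applicable(self, state):
        # Only the mentioned variables are tested.
        return all(test(state[v]) for v, test in self.pre.items())

    def apply(self, state):
        new = dict(state)       # unmentioned variables keep their values
        new.update(self.eff)
        return new
```

A Pick-like instance, for example, requires the hand variable to be none and the robot variable to equal a particular configuration, and sets the held-object and grasp variables.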
Basic "pick-and-place" robot manipulation actions can be described using the following action templates.
The robot may move from configuration q to configuration q′ if it is not holding any object and there is a collision-free simple (e.g., straight-line) path from q to q′ given the poses of all the objects.
Move(q, q′):
  pre: r = q
       h = none
       (q, q′, o_1, ..., o_m) ∈ cFreePath(q, q′; o_1, ..., o_m)
  eff: r = q′

MoveHolding(q, q′, i, γ):
  pre: r = q
       h = i
       g = γ
       (q, q′, h, g, o_1, ..., o_m) ∈ cFreePathH(q, q′, i, γ; o_1, ..., o_m)
  eff: r = q′
       o_i = Pose(q′, γ)

Figure 3-2: Move and MoveHolding action templates.
cFreePath is a constraint on all of its arguments, representing the set of assignments of q, q′, o_1, ..., o_m for which there is a collision-free path from q to q′. If the robot is holding object i in grasp γ, it may move from configuration q to q′; the free-path constraint is extended to include the held object and the effects are extended to
Pick(q, i, γ):
  pre: r = q
       h = none
       o_i = Pose(q, γ)
  eff: h = i
       g = γ

Place(q, i, γ):
  pre: r = q
       h = i
       g = γ
  eff: h = none

Figure 3-3: Pick and Place action templates.
include the fact that the pose of the held object changes as the robot moves. The values of r and o_i change in such a way that g is held constant.
The Pick action template characterizes the state change between the robot holding nothing and the robot holding (being rigidly attached to) an object. It is parameterized by a robot configuration q, an object to be picked i, and a grasp transformation γ. If the robot is in the appropriate configuration, the hand is empty, and the object is at the pose obtained by applying transformation γ to the robot's end-effector pose when it is in configuration q, then the grasp action will succeed, resulting in object i being held in grasp γ. No other variables (including the robot configuration or the object pose) are changed.
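A possible grounding of Pick in Python, with poses simplified to planar translations so that transform composition is just addition; the helper names and the callback for end-effector kinematics are our own assumptions:

```python
def compose(t1, t2):
    """Compose two planar translations (stand-ins for rigid transforms)."""
    return (t1[0] + t2[0], t1[1] + t2[1])

def ground_pick(q, i, gamma, end_effector_pose):
    """Bind Pick(q, i, gamma): applicable when the robot is at q with an
    empty hand and object i sits exactly where grasp gamma, applied to the
    end-effector pose at q, says it should. r and o_i are left unchanged."""
    def applicable(state):
        return (state["r"] == q and state["h"] is None
                and state["o"][i] == compose(end_effector_pose(q), gamma))
    def apply(state):
        new_state = dict(state)   # the frame assumption: copy, then overwrite
        new_state["h"], new_state["g"] = i, gamma
        return new_state
    return applicable, apply
```

Place would be ground analogously, with the held-object preconditions and h = none as the effect.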
There is an important connection between these action templates and the idea of modes in multi-modal planning. When the result variables are continuous, the constraints on any other variables mentioned in the preconditions constitute a mode: they are conditions defining a subspace in which the result variables can be changed via multiple primitive steps, using techniques of sample-based planning in continuous spaces.
3.2 Planning Algorithm
In this section, we describe the hybrid heuristic backward-forward (HHBF) search
algorithm. It is designed to exploit the structure of sparsity and continuous modes in
the domain dynamics and to use goal-directed heuristic guidance to search efficiently
for solutions in high-dimensional hybrid spaces.
The algorithm takes as input a description of the domain dynamics, encoded in
action templates, an initial state, and a goal set (described in the same formalism as
the preconditions in an action template). It also takes a set of mode-specific sampling
and planning modules that exploit the local smooth dynamics of individual continuous
modes, to provide strong guidance to the overall search process.
3.2.1 Top-level Forward Search
HHBF executes a search in the space of complete configurations of the domain, starting at the initial state, applying instances of the action templates, computing a resulting state, and so on. Two major difficulties arise immediately: first, because there are infinitely many possible instantiations of the action templates, the branching factor is infinite; second, there is no heuristic guidance to help select among partially constructed solutions. We address these problems in detail in Sections 3.2.2 and 3.2.3.
We will focus on trying to find a feasible (rather than optimal) solution, and
attempt to do so as quickly as possible using an enforced-hill-climbing procedure that
takes advantage of the availability of strong heuristic guidance. We modify the typical
control-structure of an agenda-based search to combine enforced hill-climbing with
strategies for handling an infinite branching factor.
The arguments to the top-level procedure (shown in Figure 3-4) are an initial state s_0, a goal set Γ, and some mode-specific samplers and planners Π. The algorithm begins by initializing h_min, which tracks the best (least) heuristic value seen so far, and by setting a time-out value τ to 1.
To expand a state, in line 6, we call procedure ReachabilityGraph, which builds a graph structure containing a set of complete state configurations, including n.s (the current state) as well as states in Γ, connected by action-template instances. This graph is relaxed in the sense that many of the edges might not be legally traversable, but it is constructed in such a way that the actions it contains are good suggestions of how to reach a goal. This procedure may involve significant computational effort and is governed by a time-out τ. The procedure applicable extracts from the reachability graph G a set of actions that are applicable in state n.s and that are likely to be present on paths to states in the goal set. If the set of actions returned is
search(s_0, Γ, Π):
 1  queue = Queue(SearchNode(s_0, None, None, ∞))
 2  h_min = ∞; τ = 1
 3  while not queue.empty():
 4      n = queue.pop(); reset = False
 5      if satisfies(n.s, Γ): return RetracePlan(n)
 6      G = ReachabilityGraph(n.s, Γ, Π, τ)
 7      A = applicable(n.s, Γ, G)
 8      if A = { }: τ = τ * 2
 9      for a ∈ A:
10          s′ = successor(n.s, a)
11          h = heuristic(s′, Γ, G)
12          if h < h_min:
13              queue = Queue(SearchNode(s_0, None, None, ∞))
14              h_min = h; reset = True
15          queue.push(SearchNode(s′, a, n, h))
16          if reset: break
17      if not reset: queue.push(n)
18  return None

Figure 3-4: Top-level forward search.
empty, then we double the time-out, and we see in line 17 that node n is added back to the queue, so it will be considered again with a longer time-out. For each applicable action that is produced, the state s′ that results from applying it to n.s is computed using the action-template descriptions, and its heuristic value h is calculated. If h represents a new minimum, then the enforced-hill-climbing aspect of this search causes us to delete the whole queue (the elements of which necessarily have heuristic values greater than h) and re-initialize it to contain just the starting state and s′. If a state does not produce a successor with a lower heuristic value, it is added back into the queue to maintain completeness, so that we will re-encounter that state and attempt to find additional successors through the process of generating a more complete reachability graph.
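The queue-reset control of the search can be sketched on an ordinary discrete graph; this drops the reachability-graph construction and timeout doubling for brevity, and the `successors`, `h`, and `is_goal` callbacks are caller-supplied assumptions:

```python
from collections import deque

def enforced_hill_climbing(start, successors, h, is_goal):
    """Sketch of the enforced-hill-climbing control: whenever a successor
    improves on the best heuristic value seen so far, discard the entire
    queue and commit to that successor; otherwise keep expanding
    breadth-first."""
    queue = deque([(start, [start])])
    h_min = float("inf")
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        reset = False
        for s2 in successors(state):
            if h(s2) < h_min:
                h_min = h(s2)
                queue.clear()   # delete the queue: all entries are worse
                reset = True
            queue.append((s2, path + [s2]))
            if reset:
                break
        # (the full algorithm would also re-queue `state` here so it can be
        # expanded again later with a larger reachability-graph timeout)
    return None
```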
3.2.2 Generating Useful Actions
Our goal is to find a small subset of the infinitely many applicable actions that is likely to contain action instances on a path to a goal state. We search backward from the goal to construct a reachability graph that is made up of vertices (points in the complete configuration space of the system) and edges (action-template instances connecting two vertices in the set). As in relaxed-planning methods from the AI literature [5, 6, 22], we weaken the transition requirements by allowing the preconditions for an action to be achieved independently: so, for example, an action with preconditions P and Q might be enabled after taking actions that individually achieve P and achieve Q, even though those actions could not make P and Q true simultaneously, which is the actual precondition of the action. In addition, we take advantage of continuous-space planning knowledge to generate instances of actions that are effective for moving within a continuous subspace (such as a path of robot configurations that are collision-free with respect to the poses of other objects).
The construction of the reachability graph also respects the multi-modal structure of the hybrid search space [16]. The idea is that the system can move within and among a generally infinite set of modes. A mode is a subspace or lower-dimensional sub-manifold of the original complete configuration space, in which some state variables can be changed, subject to a constraint being maintained on all the state variables. So, for example, a robot moving while holding an object can change the robot configuration and the pose of the object, subject to the grasp transformation being maintained and to not colliding with fixed objects. In this case, the mode is specified by constant required values for the grasp and other object poses. We will take special steps to generate action instances that move within modes as well as those that allow us to move between desirable modes (such as by picking up or placing an object).
Pseudocode for ReachabilityGraph is shown in Figure 3-5. The procedure takes as input an initial state and goal, a set of samplers and planners Π = (Λ, Φ, Ψ), and a timeout τ. Elements in the queue are triples (q_i ∈ C, μ, a): a desired set of values
for a state variable, q_i ∈ C; a mode μ in which that value should be achieved; and the action a that this setting of q_i is intended to enable. The mode is expressed as a constraint on the values of another set of state variables; we will denote it as μ.
In line 4 we insert into the queue all of the individual constraints on the state variables that are in the goal; these can be achieved under no mode constraints. Then, until either the queue is exhausted or the timeout is reached, we pop and process a queue element. There are several calls to sample-based procedures (Λ, Φ, and Ψ), and so there are many possible control structures, including running some aspects of the procedure in parallel. For simplicity, assume each is called with a time bound derived from τ, or asked for a fixed number of samples in a way that depends on τ. The overall completeness condition on the HHBF algorithm depends on these procedures eventually being called to generate arbitrarily many samples.
Procedure Φ is a condition sampler. It takes as input a constraint on one or more state variables and generates samples of those variables that satisfy the constraint. So, in line 7, the variable σ will take on specific values for the variables mentioned in the mode μ that satisfy the mode constraints. Similarly, in line 9, the variable ξ will take on specific values for the target variable q_i. Line 10 performs bookkeeping to associate the values ξ and σ with action a.
A complete target sample v is constructed from ξ and σ by arbitrarily sampling values of the other state variables; in general, the modes will be chosen in such a way that they specify all the relevant constraints on changing q_i; samples can be completed so that they are easier to connect up to elements of V and Γ.
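A toy condition sampler in this style might, for a constraint restricting a planar base position to a rectangle, simply draw uniform samples of the constrained variable; the region encoding and function name here are our own illustration:

```python
import random

def rectangle_condition_sampler(region, n=10, seed=0):
    """A toy instance of a condition sampler (the Phi of the text): given
    the constraint 'base position lies in an axis-aligned rectangle', yield
    n satisfying values of the constrained variable, drawn uniformly."""
    (xlo, ylo), (xhi, yhi) = region
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(n):
        yield (rng.uniform(xlo, xhi), rng.uniform(ylo, yhi))
```

Samplers for constraints without a direct parameterization (e.g., stable placements on a cluttered surface) would instead use rejection sampling against the constraint test.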
Procedure Ψ(σ, V) is a mode sampler. It takes the current mode values σ and tries to find action instances u that will enter mode σ from some other arbitrary mode. It is biased to try to connect from modes that are present in elements of V.
Procedure Λ(σ, ξ, V) is a mode planner. It takes the current mode values σ and generates complete samples that hold σ constant but vary the target variables, attempting to connect to the target value ξ as well as other vertices in V. One familiar type of mode planner is an RRT, which proposes moves in the space of robot configurations, subject to the constraints of the mode. V is the union of complete
configurations, including the targets satisfying C and μ as well as configurations in which the mode can be changed to satisfy μ.
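Since the RRT is the familiar example of a mode planner, here is a minimal 2D sketch, assuming a unit-square configuration space and folding all of the mode's collision constraints into a caller-supplied `is_free` test:

```python
import math
import random

def rrt(start, goal, is_free, steps=2000, eps=0.2, seed=1):
    """Minimal 2D RRT sketch of a mode planner: grow a tree of free
    configurations toward random samples (with a 10% goal bias), holding
    the mode fixed; returns a path of configurations or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(steps):
        target = goal if rng.random() < 0.1 else (rng.random(), rng.random())
        # extend the nearest tree node one step of length eps toward target
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], target))
        x, y = nodes[i]
        d = math.dist((x, y), target)
        if d == 0.0:
            continue
        step = min(eps, d)
        new = (x + step * (target[0] - x) / d, y + step * (target[1] - y) / d)
        if not is_free(new):
            continue  # the mode's collision constraint rejects this sample
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < eps:  # close enough: walk back up the tree
            path, j = [goal], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None
```

In the full system the analogous planner runs in the robot's joint space, and `is_free` checks collisions against the fixed object poses of the mode.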
Once all of this sampling is complete, we consider the action instances associated with the edges that move within this mode and move into this mode. For each of the action instances, we add each of its preconditions, independently, into the queue. For each of these preconditions q′_k ∈ C_k, we must determine the mode conditions under which it should be achieved. Conservatively, we might let the mode be all the preconditions involving all of the other variables q_j for j ≠ k. This would mean, for example, considering moving the robot subject to the positions of all the objects. Because we are attempting to quickly generate a set of useful actions, we may choose to relax the mode requirements; the procedure Σ determines which of the other preconditions to include in the mode, for each condition we add.
We finish the loop by pushing the queue element we began with back into the
queue, so that, as the procedure runs for more time, we may go back and generate
more actions that are helpful for achieving these conditions. Ultimately, when the
timeout is reached, we return the union of all the vertices and edges generated during
the execution.
Here is a very informal sketch of how one iteration of the algorithm might work. Assume we pop the queue item (r ∈ livingRoom, (h = 6, g ∈ upright), a), which seeks to change the robot's configuration to be a member of the set of configurations in the livingRoom, subject to the mode constraints of the held object being object 6 and the grasp being among the set in which the object is upright in the robot's hand.
In line 7 of the code, we would make the mode concrete by sampling some concrete values γ for g that satisfy the constraint; so σ might now be h = 6, g = γ. In line 9, we would make the target value concrete by sampling some robot configurations that are in the livingRoom set; so ξ might be such a robot configuration. We would then generate some complete samples that agree with σ and ξ and add them to V_Φ.
The next step is to consider ways of achieving these mode constraints. Instances of the Pick action can do this; we have already fixed i = 6 and γ, but they also require a robot configuration. The specific sampler Ψ might generate values of r that
ReachabilityGraph(s, Γ, Π, τ):
 1  (Λ, Φ, Ψ) = Π
 2  V = {s}; E = { }
 3  queue = Queue()
 4  for (q_i, C) ∈ Γ: queue.push((q_i ∈ C, { }, none))
 5  while not queue.empty() and t < τ:
 6      (q_i ∈ C, μ, a) = queue.pop()
 7      for σ ∈ Φ(μ):
 8          V_Φ = { }
 9          for ξ ∈ Φ(C):
10              if a ≠ None: Connect((q_i, ξ, σ), a)
11              V_Φ = V_Φ ∪ {complete(ξ, σ)}
12          V_Ψ, E_Ψ = Ψ(σ, V_Φ ∪ V)
13          V_Λ, E_Λ = Λ(σ, ξ, V_Ψ ∪ V_Φ ∪ V)
14          for a′ ∈ E_Λ ∪ E_Ψ:
15              for (q′_k ∈ C_k, Σ(q′_k)) ∈ pre(a′):
16                  queue.push(((q′_k ∈ C_k), Σ(q′_k), a′))
17          V = V_Λ ∪ V_Ψ ∪ V_Φ ∪ V
18          E = E_Λ ∪ E_Ψ ∪ E
19      queue.push((q_i ∈ C, μ, a))
20  return (V, E)

Figure 3-5: Backward search to construct a relaxed plan graph.
correspond to picking the object up from some canonical locations, including its pose in the initial state or in other vertices in V. Finally, we consider ways of moving within the mode: using an RRT, for example, we can generate new robot configurations that are connected to existing vertices via instances of the Move action template, subject to the mode constraints. Note that we leave out the poses of other moveable objects when we generate relaxed mode constraints, because an action that is not currently performable may be applicable with a different configuration of objects.
Now, for any of the actions we have sampled so far, we would independently push each of their preconditions onto the queue; the mode for each of those preconditions would be the assignments of the other variables in the precondition.
(a) Problem 1: Moving blue block, clearing green block, and stacking purple cylinder
(b) Problem 2: Regrasping of blue and green blocks
(c) Problem 3: Sorting blue and green blocks task similar to [15]
Figure 3-6: The initial and final state for each problem.
3.2.3 Search Heuristics
Our reachability graph is a type of relaxed plan graph that contains some optimistic action sequences for achieving conditions in the domain. An estimate of how many steps are required to achieve one particular condition from the starting state is simply the length of the shortest path in G from a vertex in which that condition holds to the starting state. A goal is made up of a conjunction of conditions. The simplest heuristic, h_add, simply adds up the heuristic estimates for each of the conditions and uses that as the overall estimate. This may result in a significant overestimation of the cost in the case that one action makes several conditions true simultaneously. A more robust estimate can be made by searching within G for a set of actions that constitute paths from the starting state to a set of states that individually satisfy the conditions; the heuristic h_FF returns a value that is the size of that set.
Finally, note that a finite heuristic cost is a necessary condition for the existence of a valid plan from the current state. An infinite cost indicates that at least one goal condition is not currently satisfiable even under the relaxed planning assumption. Thus, the search control need not expand states until they have a finite heuristic cost, because only then could the current state possibly be on a satisfying plan. This additionally helps determine how long the reachability graph should be grown. In our experiments, the reachability graph often contains the first step of a feasible plan as soon as it achieves a finite heuristic cost.
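Over an explicit reachability graph, h_add reduces to summing shortest-path lengths, one per goal condition; this is a sketch with a plain adjacency-dict graph, where the set of vertices satisfying each condition is given directly:

```python
from collections import deque

def h_add(graph, start, goal_vertex_sets):
    """Sketch of the h_add estimate on a reachability graph: BFS distance
    from the start to the nearest vertex satisfying each goal condition,
    summed over conditions (so actions shared between conditions are
    double-counted, which is why h_add can overestimate)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = 0
    for satisfying in goal_vertex_sets:   # one vertex set per condition
        ds = [dist[v] for v in satisfying if v in dist]
        if not ds:
            return float("inf")  # condition unreachable even when relaxed
        total += min(ds)
    return total
```

The infinite return value is exactly the pruning signal described above: such a state cannot lie on any satisfying plan of the current relaxed graph.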
3.3 Completeness
We sketch an argument that HHBF is probabilistically complete if the mode samplers and planners it is provided are themselves effective.
Definition 1. A manipulation problem ⟨s_0, Γ, Λ, Φ, Ψ⟩ is robustly feasible if and only if the following conditions hold:
∙ There exists a valid plan composed of a sequence of action instances that transform s_0 into a state that satisfies all the conditions in Γ.
∙ There exists some ε such that every perturbation to the configuration and mode action instance arguments in the feasible plan of total distance less than ε is also a valid plan.
∙ For each movement through the configuration space under a given mode, the corresponding mode planner is probabilistically complete.
∙ For each mode parameter and condition, there is an ε-ball of satisfying parameters and conditions.
∙ Given an ε-ball of satisfying modes, the mode samplers will generate an action that causes a satisfying mode with nonzero probability.
∙ Given an ε-ball of satisfying condition values, the condition samplers will generate a satisfying sample with nonzero probability.
Definition 2. A manipulation planning algorithm is probabilistically complete if and
only if for every robustly feasible manipulation problem, the probability of finding a
solution approaches one in the limit as the number of samples approaches infinity.
Theorem 1. Given any robustly feasible manipulation planning problem, the ReachabilityGraph procedure called from the initial state will eventually generate an immediate successor action that lies on a feasible solution with probability one as the
number of samples approaches infinity.
Proof Sketch: First note that a sampler with a fixed, nonzero probability of sampling a satisfying value will generate a valid sample with probability one in the limit.
Let P be a robustly feasible solution to the manipulation planning problem. Plan P can be distilled down to the sequence of action instances that first satisfy some goal condition needed later in the plan. This forms a reachability plan that is a sub-sequence of the original plan. Inductively apply this argument starting at the end of the reachability plan. For the sequence of action instances in the current mode, the condition samplers have nonzero probability of constructing a goal sample. This applies to the mode samplers as well. Finally, the mode planner is assumed to be
probabilistically complete so it will find a satisfying robust plan that connects the
goal sample and mode sample. Thus, these action instances have nonzero probability
of being generated in the limit. When applied backwards to the beginning of the
reachability plan, it follows that a first action instance on a feasible plan is generated.
Theorem 2. HHBF is probabilistically complete.
Proof Sketch: Consider any robustly feasible manipulation problem. Suppose, without loss of generality, that the algorithm functions non-deterministically and can consider all successor branches at once. Starting with the initial state, the previous theorem indicates that the ReachabilityGraph grown on the first state will generate a first action on some feasible solution to the manipulation problem with probability one in the limit. The search control then creates a branch with a new state resulting from applying this action. The manipulation problem formed by replacing the initial state with this new state must also be robustly feasible by transitivity. This argument can be applied inductively for the finite number of steps corresponding to the length of the plan. The resulting sequence of action instances is a valid plan because it generates the state sequence. Because each action is generated with probability one in the limit, the probability of generating the plan is one in the limit by addition of limits.
3.4 Results
We applied HHBF to three different manipulation problems to characterize its performance. Solving these problems requires stacking, regrasping, and long-horizon
manipulation. The planner and PR2 robot manipulation simulations were written
in Python using OpenRAVE [9]. In each problem, red objects represent moveable
objects that have no particular goal condition. However, they impose geometric constraints on the problem and must, in many cases, be manipulated in order to produce
a satisfying plan. For example, in problem 1, a stacked red block prevents the green
block from being clear, so the planner first plans to unstack it before moving the
         No h            h_max           h_Add           h_FF          h_FF, HA
 P     t   m    s     t   m     s     t    m    s     t   m   s     t   m   s
 1    29   8  168    22   5    45    15    3   19    16   3  19    14   2  20
 2    24   8   71    21   4    20    20    4   21    19   4  19    22   5  18
 3   300   0  828   300   0  1083   167  115   92    99  33  96    78  20  77

Table 3.1: Manipulation experiment results (t = median runtime in seconds, m = median absolute deviation of the runtimes, s = median number of states expanded).
green block. Problem 3 is designed to be comparable to trial 6 of the FFRob planning system [15].
Each problem uses the same set of samplers and planners Π = (Λ, Φ, Ψ). These constitute a set of skills for the robot that are only invoked when determined useful by the reachability graph. Condition samplers Φ sample goal placements within a region or on an object, or sample grasps. The placement sampling is done by Monte Carlo sampling of poses on a polygonal surface. The grasp sampling chooses from a finite set of grasps for polyhedral objects and from an infinite set of grasps for cylindrical objects. Mode samplers Ψ(σ, V) sample action instances such as Pick, Place, Unstack, and Stack that transition between modes given existing, inverse reachability, and inverse kinematics samples. Finally, mode planners Λ(σ, ξ, V) sample action trajectories through the robot's configuration space, both when the robot's hand is empty and when it is holding an object. This is done by chaining bidirectional RRTs in separate configuration spaces of the robot's base and arm.
We compared the performance of the following heuristics on each problem: no h, h_max, h_Add, and h_FF. We also compared h_FF, HA, which, in addition to using h_FF, includes helpful-action pruning as in the FF planning system [22]. Helpful-action pruning is a method used in task planning to further reduce the candidate set of successor actions by limiting successors to branches on the relaxed plan. Namely, only actions that achieve values on FF's computed relaxed plan are considered. This means that the search control considers only a subset of the set of useful actions computed by the reachability graph when progressing the search. Although this technique is not complete, it is well known to reduce search effort.
We enforce a five-minute timeout on all experiments. In practice, restarting the search after a fixed amount of time, as in many sample-based robotic planning algorithms, may improve performance. There were 20 trials per problem, all conducted on a 2.6 GHz Intel Core i7. Each entry in Table 3.1 reports the median runtime (t), median absolute deviation (MAD) of the runtimes (m), and median number of states (s) expanded in the forward search. The median-based statistics are used to be robust against outliers. The statistics for trials that failed to find a solution are included in the entries; thus, table entries with a runtime of 300 and a MAD of 0 did not find a plan for any trial. The runtimes displayed encompass the full planning time, since the algorithm requires no pre-processing. The low number of states searched indicates that the heuristic is able to frequently guide the search almost directly to a solution. The runtimes are improvements over runtimes on comparable problems reported by Srivastava et al. [34] and over the problems posed to FFRob. The experiments show that the dynamic and directed nature of the algorithm allows it to avoid oversampling while solving a larger number of problems involving stacking and regrasping.
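The median/MAD statistics reported in Table 3.1 can be computed in a couple of lines; note how a single timed-out trial barely moves either number:

```python
import statistics

def median_and_mad(xs):
    """Median and median absolute deviation: robust location and spread
    estimates that are not dragged upward by a few timed-out trials."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    return med, mad
```

For runtimes [14, 15, 16, 17, 300] (one timeout), the median is 16 and the MAD is 1, whereas the mean would exceed 72.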
Chapter 4
Conclusion
We have shown how to use search and heuristic ideas from AI task planning to create new algorithms that efficiently solve robotic planning problems. The FFRob planning algorithm uses data structures from multi-query motion planning algorithms to discretize integrated task and motion planning problems. These data structures allow quick computation of the FFRob heuristic, which incorporates geometric reachability constraints into its cost estimate. This produces a deeply integrated task and motion planning system that was empirically shown to efficiently solve long pick-and-place problems on a PR2 robot.
Using the geometric heuristic insights from FFRob, HHBF generalizes these ideas to a general manipulation planning framework with diverse action types, including stacking and regrasping. A key difference from FFRob is that HHBF dynamically generates action samples useful for the global forward search using backward reasoning on the reachability graph. This enables the algorithm to generate fewer expensive action samples, because it only generates samples when they are likely to satisfy a current goal. We also gave a template for defining additional action primitives that could include non-prehensile manipulation skills. This allows HHBF to solve a broad class of manipulation problems, not just pick-and-place task and motion planning problems. We also gave conditions under which HHBF is probabilistically complete. Finally, empirical results showed that it was even more efficient than FFRob on a higher-fidelity simulator and that it seemed to outperform
the previous state-of-the-art planner by Srivastava et al. [34].
4.1 Future Work
In the future, we will look at applying HHBF to manipulation problems that involve non-prehensile actions such as pushing. Then, we will perform experiments on a real PR2 robot to plan and execute tasks such as preparing meals and cleaning rooms in real time. After standard performance engineering, HHBF will likely be able to solve these tasks in several seconds. Additional work will involve incorporating uncertainty into these planning problems to better model the real world, using replanning to adapt to unexpected changes in the state, and exploiting hierarchical and factored planning to obtain further algorithmic improvements.
Another direction for future research is to further generalize HHBF to solve arbitrary planning problems over continuous variables in infinite state spaces. Although we focused on robotics applications, HHBF supports any action primitives that can be modeled by its action templates. There are potential applications in AI numerical planning, resource allocation, and control of other industrial systems [21]. These applications motivate more theoretical study into the decidability of these general problems and the completeness of planning algorithms for this action-template representation. Finally, our work focused on using heuristic search ideas from the FF planning system because of its simplicity, but other planning systems, such as Fast Downward and the context-enhanced additive heuristic, have subsequently had even greater success [19, 20]. Ideas from these algorithms may lead to even better planning algorithms for hybrid systems.
Bibliography
[1] R. Alami, J.-P. Laumond, and T. Siméon. Two manipulation planning algorithms. In WAFR, 1994.
[2] Rachid Alami, Thierry Siméon, and Jean-Paul Laumond. A geometrical approach to planning manipulation tasks: the case of discrete placements and grasps. In ISRR, 1990.
[3] Jennifer Barry, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. A hierarchical approach to manipulation with diverse actions. In IEEE International Conference on Robotics and Automation (ICRA), pages 1799–1806. IEEE, 2013.
[4] Avrim L. Blum and Merrick L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90(1):281–300, 1997.
[5] Blai Bonet and Héctor Geffner. Planning as heuristic search: New results. In Proc. of 5th European Conf. on Planning (ECP), pages 360–372, 1999.
[6] Blai Bonet and Héctor Geffner. Planning as heuristic search. Artificial Intelligence, 129(1):5–33, 2001.
[7] Stéphane Cambon, Rachid Alami, and Fabien Gravot. A hybrid approach to intricate motion, manipulation and task planning. International Journal of Robotics Research, 28, 2009.
[8] Lavindra de Silva, Amit Kumar Pandey, Mamoun Gharbi, and Rachid Alami. Towards combining HTN planning and geometric task planning. In RSS Workshop on Combined Robot Motion Planning and AI Planning for Practical Applications, 2013.
[9] Rosen Diankov and James Kuffner. OpenRAVE: A planning architecture for autonomous robotics. Technical Report CMU-RI-TR-08-34, Robotics Institute, Carnegie Mellon University, 2008.
[10] Mehmet Remzi Dogar and Siddhartha S. Srinivasa. A planning framework for non-prehensile manipulation under clutter and uncertainty. Autonomous Robots, 33(3):217–236, 2012.
[11] Christian Dornhege, Patrick Eyerich, Thomas Keller, Sebastian Trüg, Michael
Brenner, and Bernhard Nebel. Semantic attachments for domain-independent
planning systems. In International Conference on Automated Planning and
Scheduling (ICAPS), pages 114–121. AAAI Press, 2009.
[12] Christian Dornhege, Andreas Hertle, and Bernhard Nebel. Lazy evaluation and
subsumption caching for search-based integrated task and motion planning. In
IROS workshop on AI-based robotics, 2013.
[13] Esra Erdem, Kadir Haspalamutgil, Can Palaz, Volkan Patoglu, and Tansel Uras.
Combining high-level causal reasoning with low-level geometric reasoning and
motion planning for robotic manipulation. In IEEE International Conference on
Robotics and Automation (ICRA), 2011.
[14] Richard E. Fikes and Nils J. Nilsson. STRIPS: A new approach to the application
of theorem proving to problem solving. Artificial Intelligence, 2:189–208, 1971.
[15] Caelan Reed Garrett, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Ffrob:
An efficient heuristic for task and motion planning. In International Workshop
on the Algorithmic Foundations of Robotics (WAFR), 2014.
[16] K. Hauser and J.C. Latombe. Integrating task and prm motion planning: Dealing with many infeasible motion planning queries. In ICAPS09 Workshop on
Bridging the Gap between Task and Motion Planning, 2009.
[17] Kris Hauser. The minimum constraint removal problem with three robotics
applications. I. J. Robotic Res., 33(1):5–17, 2014.
[18] Kris Hauser and Victor Ng-Thow-Hing. Randomized multi-modal motion planning for a humanoid robot manipulation task. IJRR, 30(6):676–698, 2011.
[19] Malte Helmert. The Fast Downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006.
[20] Malte Helmert and Héctor Geffner. Unifying the causal graph and additive
heuristics. In ICAPS, pages 140–147, 2008.
[21] Jörg Hoffmann. The Metric-FF planning system: Translating "ignoring delete
lists" to numeric state variables. Journal of Artificial Intelligence Research (JAIR), 20:291–341, 2003.
[22] Jörg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research (JAIR),
14:253–302, 2001.
[23] Leslie Pack Kaelbling and Tomás Lozano-Pérez. Hierarchical planning in the
now. In IEEE International Conference on Robotics and Automation (ICRA), 2011.
[24] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars. Probabilistic
roadmaps for path planning in high-dimensional configuration spaces. IEEE
Transactions on Robotics and Automation, 12(4):566–580, 1996.
[25] Fabien Lagriffoul, Dimitar Dimitrov, Alessandro Saffiotti, and Lars Karlsson.
Constraint propagation on interval bounds for dealing with geometric backtracking. In IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), 2012.
[26] Peter Leven and Seth Hutchinson. A framework for real-time path planning in
changing environments. International Journal of Robotics Research (IJRR), 21(12):999–1030, 2002.
[27] T. Lozano-Pérez, J. L. Jones, E. Mazer, P. A. O’Donnell, W. E. L. Grimson,
P. Tournassoud, and A. Lanusse. Handey: A robot system that recognizes,
plans, and manipulates. In IEEE International Conference on Robotics and
Automation, 1987.
[28] Tomás Lozano-Pérez. Automatic planning of manipulator transfer movements.
IEEE Transactions on Systems, Man, and Cybernetics, 11:681–698, 1981.
[29] Nils J. Nilsson. Shakey the robot. Technical Report 323, Artificial Intelligence
Center, SRI International, Menlo Park, California, 1984.
[30] Amit Kumar Pandey, Jean-Philippe Saut, Daniel Sidobre, and Rachid Alami.
Towards planning human-robot interactive manipulation tasks: Task dependent and human oriented autonomous selection of grasp and placement. In
RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics, 2012.
[31] Erion Plaku, Kostas E. Bekris, Brian Y. Chen, Andrew M. Ladd, and Lydia E.
Kavraki. Sampling-based roadmap of trees for parallel motion planning. IEEE
Transactions on Robotics, 21(4):597–608, 2005.
[32] Erion Plaku and Gregory Hager. Sampling-based motion planning with symbolic,
geometric, and differential constraints. In IEEE International Conference on
Robotics and Automation (ICRA), 2010.
[33] Thierry Siméon, Jean-Paul Laumond, Juan Cortés, and Anis Sahbani. Manipulation planning with probabilistic roadmaps. IJRR, 23:729–746, 2004.
[34] Siddharth Srivastava, Eugene Fang, Lorenzo Riano, Rohan Chitnis, Stuart Russell, and Pieter Abbeel. Combined task and motion planning through an extensible planner-independent interface layer. In IEEE Conference on Robotics and
Automation (ICRA), 2014.
[35] Siddharth Srivastava, Lorenzo Riano, Stuart Russell, and Pieter Abbeel. Using
classical planners for tasks with continuous operators in robotics. In ICAPS
Workshop on Planning and Robotics (PlanRob), 2013.
[36] Mike Stilman and James J. Kuffner. Planning among movable obstacles with artificial constraints. In Workshop on Algorithmic Foundations of Robotics (WAFR),
2006.
[37] Mike Stilman, Jan-Ulrich Schamburek, James J. Kuffner, and Tamim Asfour.
Manipulation planning among movable obstacles. In IEEE International Conference on Robotics and Automation (ICRA), 2007.
[38] Jur Van Den Berg, Mike Stilman, James Kuffner, Ming Lin, and Dinesh
Manocha. Path planning among movable obstacles: a probabilistically complete
approach. In Algorithmic Foundation of Robotics VIII, pages 599–614. Springer,
2009.
[39] Gordon T. Wilfong. Motion planning in the presence of movable obstacles. In
Symposium on Computational Geometry, pages 279–288, 1988.