Heuristic Search for Manipulation Planning
by
Caelan Reed Garrett
Submitted to the Department of Electrical Engineering and Computer
Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Computer Science and Electrical Engineering
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2015
© Massachusetts Institute of Technology 2015. All rights reserved.
Author . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Department of Electrical Engineering and Computer Science
May 21, 2015
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Tomás Lozano-Pérez
Professor
Thesis Supervisor
Certified by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Leslie Pack Kaelbling
Professor
Thesis Supervisor
Accepted by . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Prof. Albert R. Meyer
Chairman, Masters of Engineering Thesis Committee
Heuristic Search for Manipulation Planning
by
Caelan Reed Garrett
Submitted to the Department of Electrical Engineering and Computer Science
on May 21, 2015, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Computer Science and Electrical Engineering
Abstract
Manipulation problems involving many objects present substantial challenges for
planning algorithms due to the high dimensionality and multi-modality of the search
space. Symbolic task planners can efficiently construct plans involving many entities but cannot incorporate the constraints from geometry and kinematics. Existing
approaches to integrated task and motion planning as well as manipulation planning remain prohibitively slow to plan in these high-dimensional hybrid configuration
spaces involving many objects. We present the FFRob algorithm for task and motion
planning and the hybrid heuristic backward-forward (HHBF) planning algorithm for
general manipulation planning. Both algorithms adapt heuristic ideas from one of
the most successful symbolic planners in recent years, the FastForward (FF) planner,
to continuous robotic planning domains. FFRob discretizes task and motion planning problems using a multi-query roadmap structure that can be conditionalized to
model different placements of movable objects. This structure enables the planner to
efficiently compute the FFRob heuristic which incorporates geometric and kinematic
planning constraints to give a tight estimate of the distance to the goal. The resulting
tightly integrated planner is simple and performs efficiently in a collection of tasks
involving manipulation of many objects. HHBF generalizes this idea to planning
with arbitrary manipulation primitives. It dynamically searches forward from the
initial state towards the goal but uses a backward search from the goal, based on
a simplified representation of the actions, to bias the sampling of the infinite action
space towards actions that are likely to be useful in reaching the goal. As a result, it
can construct long manipulation plans in a large class of manipulation domains even
more effectively than FFRob. For both algorithms, we empirically demonstrate their
effectiveness on complex manipulation tasks.
Thesis Supervisor: Tomás Lozano-Pérez
Title: Professor
Thesis Supervisor: Leslie Pack Kaelbling
Title: Professor
Acknowledgments
I would like to thank my research advisors: Professors Tomás Lozano-Pérez and
Leslie Kaelbling. Tomás first excited me about robotics and mentored me for four
years, causing me to grow substantially not only as a roboticist but also as a scientist and
an engineer. Along the way, he also taught me a few jokes as well as the vital skill
of always fully erasing a whiteboard and beginning in its top left corner. Leslie
has always been an amazing resource for academic advice, planning brainstorming,
machine learning wisdom, and technical writing magic. I am excited to continue
learning from them while I pursue my Ph.D.
I have been fortunate to learn from many great teachers and mentors at MIT
and Thomas Jefferson High School for Science and Technology that inspired me to
find my passions and work hard to make them a reality. I would like to thank The
Chorallaries of MIT for being my family at MIT for the past four years. I also want
to thank Dr. Mark Livingston at the Naval Research Laboratory for taking me on as
a research intern in high school and introducing me to the thrill of finding and solving
new problems. Finally, I would like to thank my family for giving me the resources,
support, and love so I can follow my dreams.
Contents

1 Introduction
  1.1 Background
  1.2 Related Work
      1.2.1 Manipulation Planning
      1.2.2 Discrete Task Planning
      1.2.3 Integrated Task and Motion Planning

2 FFRob
  2.1 Problem Formulation
  2.2 Conditional Reachability Graph
      2.2.1 Constructing the crg
      2.2.2 Querying the crg
  2.3 Planning Algorithms
      2.3.1 Enforced Hill-Climbing
      2.3.2 Computing the Relaxed Plan Graph
      2.3.3 The FFRob heuristic
      2.3.4 Geometric biases
  2.4 Results

3 HHBF
  3.1 Problem Formulation
  3.2 Planning Algorithm
      3.2.1 Top-level Forward Search
      3.2.2 Generating Useful Actions
      3.2.3 Search Heuristics
  3.3 Completeness
  3.4 Results

4 Conclusion
  4.1 Future Work
List of Figures

2-1 Pick and place operator descriptions.
2-2 Procedure for constructing the crg.
2-3 Procedure for querying the crg.
2-4 The primary search control procedure.
2-5 Method for creating the Relaxed Plan Graph.
2-6 The initial and final state in tasks 0, 1, 2 in the experiments.
2-7 The initial and final state in tasks 3, 4, 5 in the experiments.
3-1 Generic action template format.
3-2 Move and MoveHolding action templates.
3-3 Pick and Place action templates.
3-4 Top-level forward search.
3-5 Backward search to construct relaxed plan graph.
3-6 The initial and final state for each problem.

List of Tables

2.1 Experiment results.
3.1 Manipulation experiment results.
Chapter 1
Introduction
1.1 Background
A long-standing goal in robotics is to develop robots that can operate autonomously
in unstructured human environments. Recent hardware innovations have made mobile manipulator robots increasingly affordable, and sensing innovations, especially
RGBD sensors, provide unprecedented sensory bandwidth and accuracy. Progress in algorithms for navigation and motion planning has enabled some basic forms of mobile manipulation: combined actuation of a robot's base and end-effectors. But this is primarily restricted to picking and placing on relatively uncluttered surfaces; planning for complex mobile manipulation problems in cluttered environments still presents substantial challenges.
Researchers in artificial intelligence planning have been tackling problems that
require long sequences of actions and large discrete state spaces and have had some
notable success in recent years. However, these symbolic “task-level” planners do
not naturally encompass the detailed geometric and kinematic considerations that
motion planning requires. The original Shakey/strips robot system [14, 29], from
which many of these symbolic planners evolved, managed to plan for an actual robot
by working in a domain where all legal symbolic plans were effectively executable.
This required the ability to represent symbolically a sufficient set of conditions to
guarantee the success of the steps in the plan. This is not generally possible in
realistic manipulation domains because the geometrical and kinematic constraints
are significant and difficult to articulate symbolically.
Consider a simple table-top manipulation domain where a variety of objects are
placed on a table and the robot’s task is to collect some subset of the objects and
pack them in a box, or use them to make a meal, or put them away in their storage
bins. The basic robot operations are to pick up an object and place it somewhere
else; in addition, the robot can move its base in order to reach a distant object.
Note that, in general, to reach some object, we will have to move other objects out
of the way. Which objects need moving depends on their shapes, the shape of the
robot, where the robot’s base is placed and what path it follows to the object. When
an object is moved, the choice of where to place it requires similar considerations.
The key observation is that constructing a valid symbolic plan requires access to a
characterization of the connectivity of the underlying free configuration space (for the
robot and all the movable objects). We cannot efficiently maintain this connectivity
with a set of static assertions updated by strips operators; determining how the
connectivity of the underlying free space changes requires geometric computation.
Additionally, traditional motion planning algorithms that find paths between fully
specified configurations cannot address problems in which the configuration space of
interest is not just that of the robot but the configuration space of a kitchen, for
example, and the goal is to make dinner and clean the kitchen. We almost certainly
do not want to choose whether to get the frying pan or the steak next by sampling
configurations of the robot and kitchen and testing for paths between them.
Whereas classic robot motion planning requires a search in the robot configuration
space, manipulation planning requires a search in the combined configuration space
of the robot and all the movable objects in the world. Achieving a manipulation
goal requires choosing which object to move when, which grasps and intermediate
placements to use, etc. Existing manipulation planning algorithms [33, 7, 16, 18, 10,
3, 34, 15] take substantial time to plan operations involving relatively few objects.
The fact that manipulation planning remains challenging is not surprising. Working
in a high-dimensional space and making a long sequence of intertwined decisions is
notoriously difficult. Constructing such plans generally requires some methods for
partitioning the problem and for effective heuristic search guidance.
We present two algorithms for solving mobile manipulation problems. The first
algorithm is for solving task and motion planning problems in which the robot must
simultaneously perform classical AI planning while reasoning about its own motions.
The FFRob algorithm is a fully integrated task and motion planner that searches with
a heuristic that takes geometric information into account [15]. We show an extension
of the heuristic used in the FastForward (FF) [22] planning system that integrates
reachability in the robot configuration space with reachability in the symbolic state
space; we call this new heuristic FFRob. Both the search and the computation of the
FFRob heuristic exploit a discrete, pre-sampled roadmap [24] data structure that
allows multiple motion-planning queries on the closely related problems that arise
during the search to be solved efficiently.
The second algorithm extends the ideas of FFRob to a more general manipulation planning framework with arbitrary, continuous actions. The result is the hybrid
heuristic backward-forward (HHBF) planning algorithm that searches forward from
the initial state towards the goal but which uses a backward search from the goal,
based on a simplified representation of the actions, to bias the sampling of the infinite
action space towards actions that are likely to be useful in reaching the goal. This
process also allows us to compute an effective heuristic for focusing the search. The
HHBF heuristic is derived from the structure of the operators used in planning and
does not require any domain-specific heuristic to be specified by the user.
We perform experiments on challenging manipulation problems and explore the
effect of various planner configurations on their performance. We show that HHBF
is probabilistically complete. Finally, we demonstrate that HHBF can solve a broad
class of hybrid planning problems, i.e., planning problems with mixed discrete and continuous components, including manipulation planning.
1.2 Related Work
This work draws from existing approaches to manipulation planning and integrated
task and motion planning as well as ideas from the artificial intelligence task-planning
literature. Our focus will be on showing how ideas originally developed for discrete
task planning can be adapted to continuous-action domains to more efficiently solve
manipulation planning problems [16, 18, 3].
1.2.1 Manipulation Planning
In manipulation planning, the goal is not just to move the robot without collision,
but also to operate on the objects in the world. This problem was addressed from the
earliest days of algorithmic motion planning, for example by Lozano-Pérez et al. [28,
27] and Wilfong [39]. The modern treatment of this problem, involving continuous
grasps as well as continuous placements, was pioneered by Alami et al. [2, 1] who
introduced the manipulation graph. This graph breaks the problem of one robot
moving one object in a potentially complex environment into several problems of
moving between connected components of the combined configuration space where
grasps can be changed. A solution is an alternating sequence of transit paths, in
which the robot is not grasping an object, and transfer paths, in which it is.
Siméon et al. [33] expanded this work to more realistic settings by using probabilistic roadmaps. Specifically, they looked at manipulations necessitating multiple
regrasps. Their approach uses the manipulation graph to identify a high-level sequence of transit and transfer paths and then performs the motion planning required to
achieve them. This work assumes that only a single movable object exists in the
domain.
Stilman et al. [36, 37] address a version of manipulation planning called navigation
among movable obstacles (NAMO), where the robot must reach a specified location
among a field of movable obstacles. They plan backwards from the final goal and
use swept volumes to determine, recursively, which additional objects must be moved
and to constrain the system from placing other objects into those volumes. van den
Berg et al. [38] develop a probabilistically complete algorithm for NAMO. However,
this algorithm assumes that one can fully characterize the connected components of
the configuration space of the robot at each planning step; this is computationally
prohibitive for configuration spaces with more than two or three dimensions.
Hauser [16, 18] identified a generalization of manipulation planning as hybrid planning, that is, planning for systems with multiple (possibly infinitely many) modes,
representing different constraint sub-manifolds of the configuration space, for example, characterized by a grasp on a particular object or the placement of a movable
object. The key insight is that, as in the manipulation graph, one can conceptualize the planning process as alternating between moving in a single mode, where the
constraints are constant (e.g., moving through free space with a grasped object), and
switching between modes (e.g. grasping a new object). So, solving these problems
requires being able to plan within a single mode (this is the realm of classical motion
planning) and identifying (sampling) configurations where modes can change, which
is generally specific to the task. Hauser provided a probabilistically complete algorithmic framework that solves problems of this type assuming that effective single-mode
planners and mode-transition samplers are available. However, a pure uni-directional
sampling-based algorithm has trouble solving high-dimensional problems. Hauser
provided search guidance by limiting the set of mode transitions attempted and by
providing pre-computed “utility” values to guide the choice of samples, although this
pre-computation required substantial time. He reports [18] that without using utility,
the planner would run for more than 30 minutes (on a task to push one object on a
table, requiring the robot to move around the table), whereas using utility lowers the
running time to “approximately four minutes”.
Barry et al. [3] defined an RRT-style search of the combined configuration space,
including a bi-directional version of the RRT. Importantly, the individual moves of
this algorithm consider complete plans to reach the goal, ignoring obstacles. This
ensures that suggested motions have some chance of being on the path to the goal.
They also investigated a two-layer hierarchy, where the higher level plans only for the
manipulated object (without the robot), with the objective of identifying relevant
mode transitions to guide the full planner. This planner was able to solve some
challenging manipulation problems, but was limited to domains with one movable
object and had running times on the order of minutes.
Our position is that constructing complex manipulation plans requires effective
heuristic search guidance. This has been a major focus of research in the artificial
intelligence planning community.
1.2.2 Discrete Task Planning
The AI-planning research community has developed many overall approaches but has
largely adopted heuristic state-space search methods. Bonet and Geffner [5, 6] popularized this idea by constructing domain-independent heuristics that quickly solve
relaxations of the actual planning task to approximate the cost to the goal from an
arbitrary state. The breakthrough was that these heuristics could be derived automatically by a domain-independent planner, based only on the symbolic description
of the domain using rules with pre-conditions and effects. The key idea is to define a
relaxed version of the planning problem where the operators do not have any “delete”
effects, that is, no previously achieved result becomes false. This is an easier problem to solve than the original problem and the solution can be used to provide an
approximation to the actual cost of solving a problem. One can define two heuristics, h_add and h_max, that compute their estimates on a relaxed version of the plan
graph [4] in polynomial time by adding or taking the max of the costs of achieving
individual terms in a conjunctive goal. The h_max heuristic is admissible, while h_add
is inadmissible but more informed (it tends to be closer to the true cost and provides
more effective guidance in practice).
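To make the delete relaxation concrete, here is a minimal sketch (a standard construction from the planning literature, not code from this thesis) that computes the h_add and h_max estimates by fixed-point cost propagation, assuming unit action costs:

```python
from math import inf

def fact_costs(facts, actions, state, combine):
    """Propagate fact-achievement costs under the delete relaxation.
    actions is a list of (preconditions, add_effects) pairs with unit costs;
    combine is sum for h_add and max for h_max."""
    cost = {f: (0 if f in state else inf) for f in facts}
    changed = True
    while changed:
        changed = False
        for pre, adds in actions:
            pre_costs = [cost[p] for p in pre]
            if inf in pre_costs:
                continue  # some precondition not yet relaxed-reachable
            c = combine(pre_costs) if pre_costs else 0
            for f in adds:
                if c + 1 < cost[f]:
                    cost[f] = c + 1
                    changed = True
    return cost

def h_add(facts, actions, state, goal):
    cost = fact_costs(facts, actions, state, sum)
    return sum(cost[g] for g in goal)

def h_max(facts, actions, state, goal):
    cost = fact_costs(facts, actions, state, max)
    return max((cost[g] for g in goal), default=0)
```

On a chain of actions achieving a, then b from a, then c from b, h_max of goal {c} is 2, while h_add of goal {b, c} is 3 even though one two-step plan achieves both, illustrating how h_add over-counts shared structure and is therefore inadmissible.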
The FastForward (FF) planning system of Hoffman and Nebel [22] extended this
idea by extracting a plan from the relaxed plan graph and using its cost as a tighter
estimate of the cost to reach the goal from the initial state. We will refer to this
heuristic as h_FF. Importantly, the resulting relaxed plans can also be used to suggest useful actions to consider at the current state; this idea has proved crucial to
subsequent planning systems, including our planning algorithms.
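The relaxed-plan extraction behind h_FF can be sketched as follows (again a textbook-style construction under unit costs, not the thesis's implementation): a forward pass labels each fact with the first relaxed layer at which it becomes true, and a backward pass picks one achiever for each needed goal or precondition fact.

```python
from math import inf

def h_ff(facts, actions, state, goal):
    """Sketch of the FF heuristic on a delete-relaxed STRIPS problem.
    actions: list of (preconditions, add_effects) pairs with unit costs.
    Returns the size of an extracted relaxed plan, or inf if unreachable."""
    # Forward pass: first layer at which each fact becomes true.
    level = {f: (0 if f in state else inf) for f in facts}
    changed = True
    while changed:
        changed = False
        for pre, adds in actions:
            c = max((level[p] for p in pre), default=0)
            if c == inf:
                continue
            for f in adds:
                if c + 1 < level[f]:
                    level[f] = c + 1
                    changed = True
    if any(level[g] == inf for g in goal):
        return inf
    # Backward pass: regress from the goal, choosing one achiever per fact.
    plan = set()
    needed = {g for g in goal if level[g] > 0}
    while needed:
        f = max(needed, key=lambda x: level[x])  # deepest fact first
        needed.discard(f)
        # An achiever whose preconditions all appear before f's layer
        # exists by construction of the layers.
        pre, adds = next(a for a in actions
                         if f in a[1]
                         and max((level[p] for p in a[0]), default=0) < level[f])
        plan.add((frozenset(pre), frozenset(adds)))
        needed |= {p for p in pre if level[p] > 0}
    return len(plan)
```

The extracted plan (not just its length) is what suggests "helpful actions": actions in the relaxed plan applicable in the current state are tried first.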
1.2.3 Integrated Task and Motion Planning
There have been a number of approaches to integrating discrete task planning and
continuous motion planning in recent years.
The pioneering Asymov system of Cambon et al. [7] conducts an interleaved search
at the symbolic and geometric levels. They carefully consider the consequences of
using non-terminating probabilistic algorithms for the geometric planning, allocating
computation time among the multiple geometric planning problems that are generated
by the symbolic planner. The process can be viewed as using the task planner as a
heuristic to guide the motion planning search. However, since the task-level planner
is ignoring geometry, its value as a heuristic is quite limited. The work of Plaku and
Hager [32] is similar in approach.
A natural extension to the classic symbolic planning paradigm is to introduce
“computed predicates” (also known as “semantic attachments”); that is, predicates
whose truth value is established not via assertion but by calling an external program
that operates on a geometric representation of the state. A motion planner can serve
to implement such a predicate, determining the reachability of one configuration
from another. This approach is currently being pursued, for example, by Dornhege et
al. [11, 12], as a way of combining symbolic task-level planners with motion planners to
get a planner that can exploit the abstraction strengths of the first and the geometric
strengths of the second. A difficulty with this approach, however, is that calling a
motion planner is generally expensive. This leads to a desire to minimize the set
of object placements considered, and, very importantly, to avoid calling the motion
planner during heuristic evaluation. Considering only a sparse set of placements may
limit the generality of the planner, while avoiding calling the motion planner in the
heuristic leads to a heuristic that is uninformed about geometric considerations and
may result in considerable inefficiency due to backtracking during the search for a
plan.
The work of Erdem et al. [13], is similar in approach to Dornhege et al. [11],
augmenting a task planner that is based on explicit causal reasoning with the ability
to check for the existence of paths for the robot.
Lagriffoul et al. [25] also integrate the symbolic and geometric search. They generate a set of approximate linear constraints imposed by the program under consideration, e.g., from grasp and placement choices, and use linear programming to compute
a valid assignment or determine one does not exist. This method is particularly successful in domains such as stacking objects in which constraints from many steps of
the plan affect geometric choices.
Pandey et al. [30] and de Silva et al. [8] use HTNs instead of generative task
planning. Their system can backtrack over choices made by the geometric module,
allowing more freedom to the geometric planning than in the approach of Dornhege et
al. [11]. In addition, they use a cascaded approach to computing difficult applicability
conditions: they first test quick-to-evaluate approximations of accessibility predicates,
so that the planning is only attempted in situations in which it might plausibly
succeed.
In the hpn approach of Kaelbling and Lozano-Pérez [23], a regression-based symbolic planner uses generators, which perform fast approximate motion planning, to
select geometric parameters, such as configurations and paths, for the actions. Reasoning backward using regression allows the goal to significantly bias the actions that
are considered. This type of backward chaining to identify relevant actions is also
present in the work on navigation among movable obstacles of Stilman et al. [36, 37], described above, which plans backwards from the final goal and uses swept volumes to determine, recursively, which additional objects must be moved and to constrain the system from placing other objects into those volumes.
Srivastava et al. [35, 34] offer a novel control structure that avoids computing
expensive precondition values in many cases by assuming a favorable default valuation
of the precondition elements; if those default valuations prove to be erroneous, the error is discovered in the process of performing geometric planning to instantiate the
associated geometric operator. In that case, symbolic planning is repeated. This
approach requires the ability to diagnose why a motion plan is not possible in a given
state, which can be challenging, in general. Empirically, their approach is the only one
of which we are aware whose performance is competitive with FFRob and HHBF.
All of these approaches, although they have varying degrees of integration of the
symbolic and geometric planning, generally lack a true integrated heuristic that allows
the geometric details to affect the focus of the symbolic planning. Both FFRob and
HHBF develop such a heuristic, provide methods for computing it efficiently, and
show that it results in a significant computational savings.
Chapter 2
FFRob
FFRob is our first algorithm that uses heuristic search ideas from AI task planning
to efficiently solve robot planning problems. The algorithm is designed to solve integrated task and motion planning problems with a focus on pick-and-place problems.
In particular, FFRob is able to efficiently solve task and motion planning problems
with long horizons by leveraging geometric constraints in its heuristic to strongly
guide the search.
2.1 Problem Formulation
When we seek to apply the techniques of symbolic planning to domains that involve
robot motions, object poses and grasps, we are confronted with a series of technical
problems. In this section, we begin by discussing those problems and our solutions
to them, and end with a formal problem specification.
We might naturally wish to encode robot operations that pick up and place objects
in the style of traditional AI planning operator descriptions such as in Figure 2-1.
In these operations, the C, P, and G variables range over robot configurations, object poses, and grasps, respectively. These are high-dimensional continuous quantities, which means that there are infinitely many possible instantiations of each of these operators. We address this problem by sampling finitely many values for each of these variable domains during a pre-processing phase. The sampling is problem-driven, but may turn out to be inadequate to support a solution. If this happens, it is possible to add samples and re-attempt planning, although that was not done in the empirical results reported in this paper.

Pick(C1, O, G, P, C2):
  pre: HandEmpty, Pose(O, P), RobotConf(C1),
       CanGrasp(O, P, G, C2), Reachable(C1, C2)
  add: Holding(O, G), RobotConf(C2)
  delete: HandEmpty, RobotConf(C1)

Place(C1, O, G, P, C2):
  pre: Holding(O, G), RobotConf(C1),
       CanGrasp(O, P, G, C2), Reachable(C1, C2)
  add: HandEmpty, Pose(O, P), RobotConf(C2)
  delete: Holding(O, G), RobotConf(C1)

Figure 2-1: Pick and place operator descriptions.
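Concretely, grounding over the sampled finite domains amounts to a cross product with feasibility filtering. The sketch below uses hypothetical names (ground_pick_instances, can_grasp) to illustrate the idea; only the kinematic CanGrasp condition can be filtered statically, while Reachable depends on the world state and is left to be tested during search:

```python
from itertools import product

def ground_pick_instances(objects, poses, grasps, configs, can_grasp):
    """Enumerate ground Pick operators over pre-sampled finite domains.
    can_grasp(o, p, g, c2) is a hypothetical kinematic feasibility check;
    the Reachable(c1, c2) precondition is left symbolic, to be evaluated
    lazily against the current placements during search."""
    for o, p, g, c1, c2 in product(objects, poses, grasps, configs, configs):
        if can_grasp(o, p, g, c2):
            yield ('Pick', c1, o, g, p, c2)
```

Even with modest sample counts the product is large, which is one reason the heuristic guidance developed later in this chapter matters.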
Even with finite domains for all the variables, there is a difficulty with explicitly
listing all of the positive and negative effects of each operation. The operations of
picking up or placing an object may affect a large number of Reachable literals: picking
up an object changes the “shape” of the robot and therefore what configurations it may
move between; placing an object changes the free configuration space of the robot.
Even more significant, which Reachable literals are affected can depend on the poses
of all the other objects (for example, removing any one or two of three obstacles may
not render a configuration beyond the obstacles reachable). Encoding this conditional
effect structure in typical form in the preconditions of the operators would essentially
require us to write one operator description for each possible configuration of movable
objects.
We address this problem by maintaining a state representation that consists of
both a list of true literals and a data structure, called details, that captures the
geometric state in a way that allows the truth value of any of those literals to be
computed on demand. This is a version of the semantic attachments strategy [11].
The last difficulty is in computing the answers to queries in the details, especially
about reachability, which requires finding free paths between robot configurations in
the context of many different configurations of the objects. We address this problem
by using a conditional roadmap data structure called a conditional reachability graph,
related to a PRM [24], for answering all reachability queries, and lazily computing
answers on demand and caching results to speed future queries.
More formally, a state is a tuple ⟨L, D⟩, where L is a set of literals and D is a domain-dependent detailed representation. A literal is a predicate applied to arguments; it may optionally have an attached test, which maps the arguments and state to a Boolean value. A literal holds in a state if it is explicitly represented in the state's literal set, or its test evaluates to true in the state:

holds(l, s) ≡ l ∈ s.L or l.test(s).

A goal is a set of literals; a state satisfies a goal if all of the literals in the goal hold in the state:

satisfies(s, Γ) ≡ ∀l ∈ Γ. holds(l, s).

An operator is a tuple βŸ¨Ο†, e_pos, e_neg, f⟩ where Ο† is a set of literals representing a conjunctive precondition, e_pos is a set of literals to be added to the resulting state, e_neg is a set of literals to be deleted from the resulting state, and f is a function that maps the detailed state from before the operator is executed to the detailed state afterwards. Thus, the successor of state s under operator a is defined

successor(s, a) ≡ ⟨s.L ∪ a.e_pos βˆ– a.e_neg, a.f(s)⟩.

An operator is applicable in a state if all of its preconditions hold in that state:

applicable(a, s) ≡ ∀l ∈ a.Ο†. holds(l, s).

An operator schema is an operator with typed variables, standing for the set of operators arising from all instantiations of the variables over the appropriate type domains.
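A direct, hypothetical rendering of these definitions in code might look like the following (the class names and layout are illustrative, not taken from an actual FFRob implementation); note that literal equality ignores the attached test, so tested literals can live in ordinary sets:

```python
class Literal:
    """A predicate applied to arguments, with an optional test(state) -> bool."""
    def __init__(self, predicate, args=(), test=None):
        self.predicate, self.args, self.test = predicate, tuple(args), test
    def __eq__(self, other):
        return (self.predicate, self.args) == (other.predicate, other.args)
    def __hash__(self):
        return hash((self.predicate, self.args))

class State:
    def __init__(self, L, D=None):
        self.L = frozenset(L)  # explicitly asserted literals
        self.D = D             # domain-dependent details (e.g., geometric state)

def holds(l, s):
    # holds(l, s) = l in s.L or l.test(s)
    return l in s.L or (l.test is not None and l.test(s))

def satisfies(s, goal):
    return all(holds(l, s) for l in goal)

class Operator:
    def __init__(self, phi, e_pos, e_neg, f=lambda s: None):
        self.phi = set(phi)      # conjunctive precondition
        self.e_pos = set(e_pos)  # add effects
        self.e_neg = set(e_neg)  # delete effects
        self.f = f               # maps the pre-state to the new details

def applicable(a, s):
    return all(holds(l, s) for l in a.phi)

def successor(s, a):
    # successor(s, a) = <s.L + a.e_pos - a.e_neg, a.f(s)>
    return State((s.L | a.e_pos) - a.e_neg, a.f(s))
```

For example, a ground Pick operator would assert Holding and delete HandEmpty, with f updating the geometric details to attach the object to the gripper.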
Our general formulation has broader applicability, but in this paper we restrict
our attention to a concrete domain in which a mobile-manipulation robot can move,
grasp rigid objects, and place them on a surface. To formalize this domain, we use
literals of the following forms:
β€’ RobotConf(C): the robot is in configuration C, where C is a specification of the pose of the base as well as joint angles of the arm;

β€’ Pose(O, P): object O is at pose P, where P is a four-dimensional pose (x, y, z, ΞΈ), assuming that the object is resting on a stable face on a horizontal surface;

β€’ Holding(O, G): the robot is holding object O with grasp G, where G specifies a transform between the robot's hand and the object;

β€’ HandEmpty: the robot is not holding any object;

β€’ In(O, R): the object O is placed in such a way that it is completely contained in a region of space R; and

β€’ Reachable(C1, C2): there is a collision-free path between robot configurations C1 and C2, considering the positions of all fixed and movable objects as well as any object the robot might be holding and the grasp in which it is held.
The details of a state consist of the configuration of the robot, the poses of all the
objects, and what object is being held in what grasp.
Two of these literals have tests. The first, In, has a simple geometric test, to see
if object 𝑂, at the pose specified in this state, is completely contained in region 𝑅.
The test for Reachable is more difficult to compute; it will be the subject of the next
section.
2.2
Conditional Reachability Graph
In the mobile manipulation domain, the details contain a conditional reachability
graph (crg), which is a partial representation of the connectivity of the space of
sampled configurations, conditioned on the placements of movable objects as well as
on what is in the robot’s hand. It is similar in spirit to the roadmaps of Leven and
Hutchinson [26] in that it is designed to support solving multiple motion-planning
queries in closely related environments. The crg has three components:
βˆ™ Poses: For each object π‘œ, a set of possible stable poses.
βˆ™ Nodes: A set of robot configurations, 𝑐𝑖 , each annotated with a (possibly
empty) set {βŸ¨π‘”, π‘œ, π‘βŸ©} where 𝑔 is a grasp, π‘œ an object, and 𝑝 a pose, meaning
that if the robot is at the configuration 𝑐𝑖 , and object π‘œ is at pose 𝑝, then the
robot’s hand will be related to the object by the transform given by grasp 𝑔.
βˆ™ Edges: A set of pairs of nodes, with configurations 𝑐1 and 𝑐2 , annotated with
an initially empty set of validation conditions of the form βŸ¨β„Ž, 𝑔, π‘œ, 𝑝, π‘βŸ©, where 𝑏
is a Boolean value that is True if the robot moving from 𝑐1 to 𝑐2 along a simple
path (using linear interpolation or some other fixed interpolator) while holding
object β„Ž in grasp 𝑔 will not collide with object π‘œ if it is placed at pose 𝑝, and
False otherwise.
The validation conditions on the edges are not pre-computed; they will be computed
lazily, on demand, and cached in this data structure. Note that some of the collision-checking to compute the annotations can be shared, e.g., the same robot base location
may be used for multiple configurations and grasps.
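The lazy validation scheme can be sketched as follows; the Edge class, its is_valid method, and the collides predicate are hypothetical names standing in for the real geometric machinery.

```python
class Edge:
    """A crg edge between two configurations, with a lazily filled
    cache of validation conditions ⟨h, g, o, p, b⟩."""

    def __init__(self, n1, n2):
        self.n1, self.n2 = n1, n2
        self.valid = {}  # (held, grasp, obj, pose) -> bool, computed on demand

    def is_valid(self, held, grasp, obj, pose, collides):
        key = (held, grasp, obj, pose)
        if key not in self.valid:
            # Compute the condition once and cache it for later queries.
            self.valid[key] = not collides(self.n1, self.n2,
                                           held, grasp, obj, pose)
        return self.valid[key]
```

Because the cache persists across queries, repeated reachability tests against the same object placements never redo a collision check.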
2.2.1
Constructing the crg
The crg is initialized in a pre-processing phase, which concentrates on obtaining a
useful set of sampled object poses and robot configurations. Object poses are useful
if they are initial poses, or satisfy a goal condition, or provide places to put objects
out of the way. Robot configurations are useful if they allow objects, when placed
in useful poses, to be grasped (and thus either picked from or placed at those poses)
or if they enable connections to other useful poses via direct paths. We assume that
the following components are specified: a workspace π‘Š , which is a volume of space
ConstructCRG(π‘Š, 𝑇, 𝑠, Γ, π’ͺ𝑓 , π’ͺπ‘š , πœƒ) :
1 𝑁 = {𝑠.details.robotConf } ∪ {robot configuration in Γ}
2 for π‘œ ∈ π’ͺπ‘š :
3
π‘ƒπ‘œ = {𝑠.details.pose(π‘œ)} ∪ {pose of π‘œ in Γ}
4
for 𝑖 ∈ {1, . . . , πœƒ.𝑛𝑝 }:
5
π‘ƒπ‘œ .add (sampleObjPose(π‘œ.shape, 𝑇 ))
6
for 𝑔 ∈ π‘œ.grasps:
7
for 𝑗 ∈ {1, . . . , πœƒ.π‘›π‘–π‘˜ }: 𝑁.add (sampleIK(𝑔, π‘œ, 𝑝), (𝑔, π‘œ, 𝑝))
8
for 𝑗 ∈ {1, . . . , πœƒ.𝑛𝑛 }: 𝑁.add (sampleConfNear(𝑔, ( )))
9 𝐸={ }
10 for 𝑛1 ∈ 𝑁 :
11
for 𝑛2 ∈ NearestNeighbors(𝑛1 , π‘˜, 𝑁 ):
12
if CFreePath(𝑛1 .𝑐, 𝑛2 .𝑐, π’ͺ𝑓 ): 𝐸.add (𝑛1 , 𝑛2 )
13 𝑁, 𝐸 = connectTrees(𝑁, 𝐸, π‘Š, πœƒ.𝑛𝑐 )
14 return βŸ¨π‘ƒ, 𝑁, 𝐸⟩
Figure 2-2: Procedure for constructing the crg.
that the robot must remain inside; a placement region 𝑇 , which is a set of static
planar surfaces upon which objects may be placed (such as tables and floor, but not
(for now) the tops of other objects); a set π’ͺ𝑓 of fixed (immovable) objects; a set
π’ͺπ‘š of movable objects; and a vector of parameters πœƒ that specify the size of the
crg. It depends, in addition, on the start state 𝑠 and goal Γ. We assume that each
object π‘œ ∈ π’ͺπ‘š has been annotated with a set of feasible grasps. The parameter
vector consists of a number 𝑛𝑝 of desired sample poses per object (type); a number
π‘›π‘–π‘˜ of grasp configurations per grasp; a number 𝑛𝑛 of configurations near each grasp
configuration; a number 𝑛𝑐 of RRT iterations for connecting configurations, and a
number π‘˜ specifying a desired degree of connectivity.
The ConstructCRG procedure is outlined in Figure 2-2. We begin by initializing the set of nodes 𝑁 to contain the initial robot configuration and the configuration
specified in the goal, if any. Then, for each object, we generate a set of sample poses,
including its initial pose and goal pose, if any, as well as poses sampled on the object
placement surfaces. For each object pose and possible grasp of the object, we use
the sampleIK procedure to sample one or more robot configurations that satisfy
ReachableTest(𝑐1 , 𝑐2 , 𝐷, crg):
1  for (π‘œ, 𝑝) ∈ 𝐷.objects:
2      for 𝑒 ∈ crg.𝐸:
3          if not ⟨𝐷.heldObj , 𝐷.grasp, π‘œ, 𝑝, *⟩ ∈ 𝑒.valid :
4              path = CFreePath(𝑒.𝑛1 .𝑐, 𝑒.𝑛2 .𝑐, π‘œ@𝑝, 𝐷.heldObj , 𝐷.grasp)
5              𝑒.valid .add(⟨𝐷.heldObj , 𝐷.grasp, π‘œ, 𝑝, (path != None)⟩)
6  𝐺 = {𝑒 ∈ crg.𝐸 | ∀(π‘œ, 𝑝) ∈ 𝐷.objects. ⟨𝐷.heldObj , 𝐷.grasp, π‘œ, 𝑝, True⟩ ∈ 𝑒.valid }
7  return ReachableInGraph(𝑐1 , 𝑐2 , 𝐺)
Figure 2-3: Procedure for querying the crg.
the kinematic constraints that the object be grasped. We sample additional configurations with the hand near the grasp configuration to aid maneuvering among the
objects. We then add edges between the π‘˜ nearest neighbors of each configuration, if
a path generated by linear interpolation or another simple fixed interpolator is free
of collisions with fixed objects. At this point we generally have a forest of trees of
configurations. Finally, we attempt to connect the trees using an RRT algorithm as
in the sampling-based roadmap of trees [31].
To test whether this set of poses and configurations is plausible, we use it to
compute a heuristic value of the starting state, as described in section 2.3.2. If it
is infinite, meaning that the goal is unreachable even under extremely optimistic
assumptions, then we return to this procedure and draw a new set of samples.
2.2.2
Querying the crg
Now that we have a crg we can use it to compute the test for the Reachable literal,
as shown in ReachableTest below.
The main part of the test is in lines 6–7: we construct a subgraph of the crg that
consists only of the edges that are valid given the object that the robot is holding
and the current placements of the movable objects and search in that graph to see
if configuration 𝑐2 is reachable from 𝑐1 . Lines 1–5 check that the relevant
validity conditions have been computed and compute them if they have not. The
procedure CFreePath(𝑐1 , 𝑐2 , obst, π‘œ, 𝑔) performs collision checking on a straight-line,
or other simply interpolated path, between configurations 𝑐1 and 𝑐2 , with a single
obstacle obst and object π‘œ held in grasp 𝑔. Many of these tests can be done very
efficiently using simple bounding box checks. We also exploit the fact that we are
only interested in the connected component of the crg that includes the current
robot configuration, which further increases efficiency.
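A simplified version of this query, assuming each edge carries a per-object validity predicate rather than the full cached condition sets, might look like:

```python
from collections import deque

def reachable(c1, c2, objects, edges):
    """Is c2 reachable from c1 given the current object placements?

    edges: list of (conf_a, conf_b, valid_fn) where valid_fn(obj, pose)
    says whether the edge survives that object at that pose (a stand-in
    for the cached validation conditions)."""
    # Keep only edges valid for every (object, pose) in the current state.
    adj = {}
    for a, b, valid_fn in edges:
        if all(valid_fn(o, p) for (o, p) in objects):
            adj.setdefault(a, set()).add(b)
            adj.setdefault(b, set()).add(a)
    # Breadth-first search restricted to the component containing c1.
    frontier, seen = deque([c1]), {c1}
    while frontier:
        c = frontier.popleft()
        if c == c2:
            return True
        for nxt in adj.get(c, ()):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return False
```

Restricting the search to c1's connected component is exactly the efficiency observation made above: configurations outside that component are never expanded.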
In addition, the crg is used to implement applicableOps(𝑠, Ω, crg), which
efficiently determines which operator schema instances in Ω are applicable in a given
state 𝑠. For each schema, we begin by binding variables that have preconditions
specifying the robot configuration, object poses, the currently grasped object and/or
the grasp to their values in state 𝑠. We consider all bindings of variables referring to
objects that are not being grasped. For a pick operation, 𝑃 is specified in the current
state, so we consider all bindings of 𝐺 and 𝐢2 such that (𝐢2 , (𝐺, 𝑂, 𝑃 )) ∈ crg.𝑁 .
For a place operation, 𝐺 is specified in the current state, so we consider all bindings
of 𝑃 and 𝐢2 such that (𝐢2 , (𝐺, 𝑂, 𝑃 )) ∈ crg.𝑁 .
2.3
Planning Algorithms
A planning problem, Π, is specified by βŸ¨π‘ , Γ, π’ͺ, 𝑇, π‘Š, β„¦βŸ©, where 𝑠 is the initial state,
including literals and details, Γ is the goal, π’ͺ is a set of objects, 𝑇 is a set of placement
surfaces, π‘Š is the workspace volume, and Ω is a set of operator schemas.
Plan, shown below, is a generic heuristic search procedure. Depending on the
behavior of the extract procedure, it can implement any standard search control
structure, including depth-first, breadth-first, uniform-cost, best-first, 𝐴* , and hill-climbing. Critical to many of these strategies is a heuristic function, which maps a
state in the search to an estimate of the cost to reach a goal state from that state.
Many modern domain-independent search heuristics are based on a relaxed plan graph
(rpg). In the following section, we show how to use the crg to compute the relaxed
plan graph efficiently.
Plan(Π, extract, heuristic, πœƒ):
 1  βŸ¨π‘ , Γ, π’ͺ, 𝑇, π‘Š, β„¦βŸ© = Π
 2  crg = ConstructCRG(π‘Š, 𝑇, 𝑠, Γ, π’ͺ, πœƒ)
 3  def H(𝑠): return heuristic(RPG(𝑠, Γ, crg, Ω))
 4  π‘ž = Queue(SearchNode(𝑠, 0, H(𝑠), None))
 5  while not π‘ž.empty():
 6      𝑛 = extract(π‘ž)
 7      if satisfies(𝑛.𝑠, Γ): return 𝑛.path
 8      for π‘Ž ∈ applicableOps(𝑛.𝑠, Ω, crg):
 9          𝑠′ = successor(𝑛.𝑠, π‘Ž)
10          π‘ž.push(SearchNode(𝑠′ , 𝑛.cost + 1, H(𝑠′ ), 𝑛))
Figure 2-4: The primary search control procedure.
2.3.1
Enforced Hill-Climbing
The primary search mechanism we use in our experiments, like in FF, is enforced
hill climbing. This search commits to the nearest state that has a strictly lower
heuristic value. When one is not immediately available, it performs a breadth-first
search (BFS) until it finds a state with a strictly lower heuristic value. If the BFS
fails by running out of new states to explore, it terminates. The original enforced
hill-climbing algorithm restricts the BFS to only progress the plan forward from the
current hill-climbing state. This technique is incomplete because the BFS could begin
on a branch that is a dead end. But it is efficient in practice because it resets the
search agenda each time a lower heuristic value is achieved. Note that enforced hill-climbing can be made complete by storing the state-space search tree and allowing
the BFS to undo actions on the current branch ensuring that the BFS can always
return to the start state. Another advantage of the original enforced hill-climbing
for robotic planning is that it allows planning and execution to be performed online
together. Because the search commits to an action, the robot can execute the action
while the planner finishes its computation.
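A compact sketch of this control structure, with hypothetical h, succ, and is_goal callables, is given below; as discussed, the inner BFS makes the search incomplete in general.

```python
from collections import deque

def enforced_hill_climbing(start, h, succ, is_goal):
    """Repeatedly commit to the first state found (via BFS) whose
    heuristic value is strictly lower than the current state's.
    succ(s) yields (action, next_state) pairs. Returns the action
    sequence, or None if the BFS exhausts new states (incomplete)."""
    state, plan = start, []
    while not is_goal(state):
        frontier, seen = deque([(state, [])]), {state}
        best, found = h(state), None
        while frontier:
            s, path = frontier.popleft()
            if path and (is_goal(s) or h(s) < best):
                found = (s, path)       # first strictly improving state
                break
            for a, s2 in succ(s):
                if s2 not in seen:
                    seen.add(s2)
                    frontier.append((s2, path + [a]))
        if found is None:
            return None                 # BFS ran out of new states
        state, plan = found[0], plan + found[1]
    return plan
```

Because the algorithm commits to found[1] before continuing, a robot could begin executing those actions while the planner keeps searching from the committed state.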
2.3.2
Computing the Relaxed Plan Graph
In classical symbolic planning, a plan graph is a sequence of alternating layers of
literals and actions. The first layer consists of all literals that are true in the starting
state. Action layer 𝑖 contains all operators whose preconditions are present and
simultaneously achievable in literal layer 𝑖. Literal layer 𝑖 + 1 contains all literals that
are possibly achievable after 𝑖 actions, together with a network of mutual exclusion
relations that indicates in which combinations those literals might possibly be true.
This graph is the basis for GraphPlan [4] and related planning algorithms.
The relaxed plan graph is a simplified plan graph, without mutual exclusion conditions; it is constructed by ignoring the negative effects of the actions. From the
rpg, many heuristics can be computed. For example, the β„Žπ‘Žπ‘‘π‘‘ heuristic [6] returns
the sum of the levels at which each of the literals in the goal appears. It is optimistic,
in the sense that if the mutual exclusion conditions were taken into account, it might
take more steps to achieve each individual goal from the starting state; it is also pessimistic, in the sense that the actions necessary to achieve multiple goal fluents might
be “shared.” An admissible heuristic, β„Žπ‘šπ‘Žπ‘₯ [6], is obtained by taking the maximum of
the levels of the goal literals, rather than the sum; but it is found in practice to offer
weaker guidance. An alternative is the FF heuristic [22], which performs an efficient
backward-chaining pass in the plan graph to determine how many actions, if they
could be performed in parallel without deletions, would be necessary to achieve the
goal and uses that as the heuristic value. An important advantage of the FF heuristic
is that it does not over-count actions if one action achieves multiple effects, and it
enables additional heuristic strategies that are based on helpful actions. We use a
version of the helpful actions strategy that reduces the choice of the next action to
those that are in the first level of the relaxed plan, and find that it improves search
performance.
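The level-based heuristics can be computed directly from the layered literal sets; literal_levels, h_add, and h_max below are illustrative helpers, not the planner's code.

```python
def literal_levels(layers):
    """layers[i] is the set of literals first possibly achievable after
    i actions; return the lowest layer index for each literal."""
    level = {}
    for i, layer in enumerate(layers):
        for l in layer:
            level.setdefault(l, i)
    return level

def h_add(goal, level):
    # Sum of the first levels of the goal literals (inadmissible).
    return sum(level.get(l, float('inf')) for l in goal)

def h_max(goal, level):
    # Maximum of the first levels of the goal literals (admissible).
    return max((level.get(l, float('inf')) for l in goal), default=0)
```

A literal absent from every layer gets level infinity, which is how an infinite heuristic value at the initial state signals an implausible sample set.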
In order to use heuristics derived from the rpg we have to show how it can be
efficiently computed when the add lists of the operators are incomplete and the truth
values of some literals are computed from the crg in the details. We present a
method for computing the rpg that is specialized for mobile manipulation problems.
It constitutes a further relaxation of the rpg which allows literals to appear earlier in
the structure than they would in an rpg for a traditional symbolic domain. This is
necessary, because the highly conditional effects of actions on Reachable literals makes
them intractable to compute exactly. The consequence of the further relaxation is
that the β„Žπ‘Žπ‘‘π‘‘ and β„Žπ‘šπ‘Žπ‘₯ heuristics computed from this structure have less heuristic
force. However, in section 2.3.3 we describe a method for computing a version of β„Žπ‘“ 𝑓
that recovers the effectiveness of the original.
The intuition behind this computation is that, as we move forward in computing
the plan graph, we consider the positive results of all possible actions to be available.
In terms of reachability, we are removing geometric constraints from the details; we
do so by removing an object from the universe when it is first picked up and never
putting it back, and by assuming the hand remains empty (if it was not already)
after the first place action. Recall that, in applicable and satisfies, the holds
procedure is used to see if a literal is true in a state. It first tests to see if it is
contained in the literal set of the state; this set becomes increasingly larger as the
rpg is computed. If the literal is not there, then it is tested with respect to the crg
in the details, which becomes increasingly less constrained as objects are removed.
Importantly, since the geometric tests on the crg are cached, the worst-case
number of geometric tests for planning with and without the heuristic is the same. In
practice, computing the rpg for the heuristic is quite fast, and using it substantially
reduces the number of states that need to be explored.
RelaxedPlanGraph, shown below, outlines the algorithm in more detail. In
the second part of line 1, in a standard implementation we would generate all possible
instantiations of all actions. However, because of the special properties of reachability,
we are able to abstract away from the particular configuration the robot is in when an
action occurs; thus, we consider all possible bindings of the non-configuration variables
in each operator, but we only consider binding the starting configuration variable to
the actual current starting configuration and leave the resulting configuration variable
free. In line 2, we initialize hState, which is a pseudo-state containing all literals that
RelaxedPlanGraph(𝑠, Γ, crg, Ω):
 1  𝐷 = 𝑠.𝐷; ops = allNonConfBindings(Ω)
 2  literals = [ ]; actions = [ ]; hState = 𝑠
 3  while True:
 4      layerActions = { }; layerLiterals = { }
 5      for op ∈ ops:
 6          if applicable(op, hState):
 7              layerActions.add(op)
 8              layerLiterals.union(op.𝑒pos )
 9              ops.remove(op)
10              if op.type = pick : 𝐷.objects.remove(op.obj )
11              if op.type = place: 𝐷.heldObj = None
12      literals.append(layerLiterals)
13      actions.append(layerActions)
14      hState = βŸ¨β‹ƒπ‘– literals 𝑖 , 𝐷⟩
15      if satisfies(hState, Γ): return (literals, actions)
16      if layerActions = { }: return None
Figure 2-5: Method for creating the relaxed plan graph.
are possibly true at the layer we are operating on, and a set of details that specifies
which objects remain as constraints on the robot’s motion at this layer. In line 6, we
ask whether an operator schema with all but the resulting configuration variable bound
is applicable in the heuristic state. We only seek a single resulting configuration that
satisfies the preconditions of op in hState; even though many such configurations
might exist, each of them will ultimately affect the resulting hState in the same way.
Lines 7–9 constitute the standard computation of the rpg. In lines 10–11 we perform
domain-specific updates to the detailed world model: if there is any way to pick up
an object, then we assume it is completely removed from the domain for the rest
of the computation of the rpg; if there is any way to put down the currently held
object, then we assume that there is no object in the hand, when doing any further
computations of reachability in the crg. Line 14 creates a new hState, which consists
of all literals possibly achievable up to this level and the details with possibly more
objects removed.
There is one last consideration: the strategy shown above does not make the dependencies of Reachable literals at level 𝑖 on actions from level 𝑖 − 1 explicit; the truth
of those literals is encoded implicitly in the details of the hState. We employ a simple
bookkeeping strategy to maintain a causal connection between actions and literals,
which will enable a modified version of the FF heuristic to perform the backward
pass to find a parallel plan. We observe that, in the relaxed plan, once an object is
pick ed, it is effectively removed from the domain. So, we add an extra positive effect
literal, Picked (π‘œ) to the positive effects set of the pick action, just when it is used in
the heuristic computation.
2.3.3
The FFRob heuristic
The FF heuristic operates by extracting a relaxed plan from the rpg and returning
the number of actions it contains. A relaxed plan 𝒫 constructed for starting state 𝑠
and set of goal literals 𝐺 consists of a set of actions that has the following properties:
(1) For each literal 𝑙 ∈ 𝐺 there is an action π‘Ž ∈ 𝒫 such that 𝑙 ∈ π‘Ž.𝑒pos and (2) For
each action π‘Ž ∈ 𝒫 and each literal 𝑙 ∈ π‘Ž.πœ‘, either 𝑙 ∈ 𝑠 or there exists an action π‘Ž′ ∈ 𝒫
such that 𝑙 ∈ π‘Ž.𝑒pos .
That is, the set of actions in the relaxed plan collectively achieve the goal as
well as all of the preconditions of the actions in the set that are not satisfied in the
initial state. It would be ideal to find the shortest linear plan that satisfies these
conditions; however, doing so is NP-hard [22]. Instead, the plan extraction procedure
works backwards, starting with the set of literals in the goal 𝐺. For each literal
𝑙 ∈ 𝐺, it seeks the “cheapest” action π‘Ž* that can achieve it; that is,
π‘Ž* = arg
min
{π‘Ž|𝑙∈π‘Ž.𝑒pos }
∑︁
β„’(𝑙) ,
𝑙∈π‘Ž.πœ‘
where β„’(𝑙) is the index of the lowest layer containing 𝑙 (which is itself a quick estimate
of the difficulty of achieving 𝑙.)
The minimizing π‘Ž* is added to the relaxed plan, 𝑙 and any other literals achieved
by π‘Ž* are removed from the goal set, and the preconditions π‘Ž* .πœ‘ are added to the goal
set unless they are contained in 𝑠. This process continues until the goal set is empty.
The rpg computed as in section 2.3.2 does not immediately support this computation, because the Picked fluents that are positive results of Pick actions do not match
the Reachable fluents that appear in preconditions. In general, there may be many
ways to render a robot configuration reachable, by removing different combinations
of obstacles. Determining the smallest such set is known as the minimum constraint
removal problem [17]. Hauser shows it is NP-hard in the discrete case and provides
a greedy algorithm that is optimal if obstacles must not be entered more than once.
We have extended this method to handle the case in which objects are weighted; in
our case, by the level in the rpg at which they can be picked. The weighted MCR
algorithm attempts to find a set of obstacles with a minimal sum of weights that
makes a configuration reachable.
So, any action precondition of the form Reachable(𝑐) is replaced by the set of
preconditions Picked (π‘œ) for all objects π‘œ in the solution to the weighted MCR problem
for configuration 𝑐. This represents the (approximately) least cost way to make 𝑐
accessible. Having carried out this step, we can use the standard FF method for
extracting a relaxed plan. The FFRob heuristic returns the number of actions in
this relaxed plan.
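One way to approximate the weighted MCR query is a Dijkstra-style search whose path cost is the summed weight of the distinct obstacles blocking the traversed edges, counting each obstacle once. The sketch below uses a hypothetical interface and, like the greedy MCR algorithm it mirrors, is not guaranteed optimal in general.

```python
import heapq
from itertools import count

def weighted_mcr(start, goal, adjacency, blockers, weight):
    """Return an (approximately) minimum-weight obstacle set whose
    removal makes goal reachable from start, or None if none exists.
    blockers(c1, c2) gives the obstacles blocking that edge."""
    tie = count()            # tiebreaker so the heap never compares sets
    best = {}
    heap = [(0, next(tie), start, frozenset())]
    while heap:
        cost, _, c, removed = heapq.heappop(heap)
        if c == goal:
            return removed
        if best.get(c, float('inf')) <= cost:
            continue         # a cheaper label for c was already expanded
        best[c] = cost
        for c2 in adjacency.get(c, ()):
            r2 = removed | blockers(c, c2)
            cost2 = sum(weight[o] for o in r2)  # each obstacle counted once
            heapq.heappush(heap, (cost2, next(tie), c2, r2))
    return None
```

Keeping only one label per configuration is the greedy simplification: two different removed-sets of equal cost at the same configuration are not both pursued.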
2.3.4
Geometric biases
It frequently happens that multiple states have the same heuristic value; in such
cases, we break ties using geometric biases. Intuitively, the idea is to select actions
that maximize the reachability of configurations in the domain from the current state.
This information is inexpensively obtained from the CRG which already supports a
similar reachability computation.
βˆ™ Choose actions that leave the largest number of configurations corresponding
to placements of objects in their goal poses or regions available. This captures
the idea that blocking goal regions should be avoided if possible. This is useful
because, although the heuristic will report when a placement is immediately
bad, i.e., already blocking future goals, it will not convey that a placement out
in the open may prevent two necessary placements later in the search. This is
because the relaxed plan assumes that a free placement exists, despite objects
being placed there, since it does not model the negative effects of actions.
βˆ™ Choose actions that leave the largest total number of configurations corresponding to placements reachable; this ensures that all placements are as tight as
possible against the edge of the reachable space.
βˆ™ If neither of the previous biases breaks the tie, then select actions that maximize
the total number of reachable configurations.
Additional tie-breaking when using β„Žπ‘“ 𝑓 can come from β„Žπ‘Žπ‘‘π‘‘ and β„Žπ‘šπ‘Žπ‘₯ . The relaxed
plan graph already encodes both these heuristics which can vary in value even for
states that have the same β„Žπ‘“ 𝑓 . For example, a relaxed plan given by β„Žπ‘“ 𝑓 may have
its goals concentrated closer to the current state despite the same total length.
These biases experimentally prove to be helpful in giving the search additional
guidance in this domain to reduce the amount of backtracking to undo bad decisions.
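The tie-breaking scheme amounts to comparing states lexicographically on the heuristic value followed by the (negated) geometric measures, so equal-heuristic states fall through to the biases in order. A sketch with hypothetical measure callables:

```python
def search_key(state, h, goal_free, placements_reachable, reachable):
    """Lexicographic ordering: lower heuristic first, then more free goal
    placements, then more reachable placement configurations, then more
    reachable configurations overall (hence the negations)."""
    return (h(state),
            -goal_free(state),
            -placements_reachable(state),
            -reachable(state))
```

Any comparison-based search queue can then order states by this key without changing the underlying heuristic.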
2.4
Results
We have experimented with various versions of this algorithm, differing in the definition of the heuristic, on a variety of tasks; we report the results in this section.
As previously mentioned, the search strategy in all of our experiments is enforced
hill-climbing [22]. The parameters governing the creation of the crg are: 𝑛𝑝 ∈ [25, 50] (the number of placements for each object, which varies with the size of the placement regions); π‘›π‘–π‘˜ = 1 (the number of robot configurations for each grasp); 𝑛𝑛 = 1 (the number of additional robot configurations near each grasp); 𝑛𝑐 = 250 (the number of RRT iterations); and π‘˜ = 4 (the number of nearest neighbors).
In our experiments, we generate an initial crg using these parameters during
pre-processing and then test whether the value of the heuristic at the initial state is
finite. If it is not, we discard it and try again, with the same parameters. Very few
Figure 2-6: The initial and final states of tasks 0, 1, and 2 in the experiments. (a) Median 8 actions; (b) median 10 actions; (c) median 16 actions.
Figure 2-7: The initial and final states of tasks 3, 4, and 5 in the experiments. (a) Median 18 actions; (b) median 20 actions; (c) median 32 actions.
retries were necessary to find a crg with finite heuristic value. This condition was
effective: in every case in our experiments, the crg contained a valid plan.
The following versions of the planner are compared in the experiments:
1. No β„Ž: The heuristic always returns 0.
2. β„Žπ‘“ 𝑓 : This is the original heuristic in FF, based only on the symbolic literals,
completely ignoring the reachability conditions when computing the heuristic.
Helpful actions are not used.
3. β„Žπ‘Žπ‘‘π‘‘π‘Ÿ : This is a version of the original β„Žπ‘Žπ‘‘π‘‘ heuristic that returns the sum of
the levels of the rpg at which the goal literals are first found. This makes use
of the crg to reason about reachability. It does not build a relaxed plan and,
therefore, does not have helpful actions.
4. β„Žπ‘“ 𝑓 π‘Ÿ,β„Žπ‘Ž : This computes the rpg, does a backward scan to find a relaxed plan
and computes helpful actions based on that plan.
5. β„Žπ‘“ 𝑓 π‘Ÿπ‘ : Like β„Žπ‘“ 𝑓 π‘Ÿ but using geometric biases to break ties and without using
helpful actions.
6. β„Žπ‘“ 𝑓 π‘Ÿπ‘,β„Žπ‘Ž : Like β„Žπ‘“ 𝑓 π‘Ÿ but using geometric biases to break ties and using helpful
actions.
We tested our algorithm on 6 different tasks, in which the goals were conjunctions
of In(𝑂𝑖 , 𝑅𝑗 ) for some subset of the objects (the ones not colored red). Other objects
were moved as necessary to achieve these goals. The last three tasks are shown in
Figure 2-7; the first three tasks are simpler variations on task 3 (Figure 2-7(a)).
Table 2.1 shows the results of running the algorithms in each of the tasks.
Each entry in the table reports the median time (t), the median absolute deviation (MAD) of the times (m), and the number of states expanded (s). Each task also incurs a
pre-processing time for building the roadmap; this is reported (in seconds) in the Pre
column of the table. The median-based robust statistics are used instead of the usual
mean and standard deviation since the data has outliers. Entries with a median time
T  Pre |      No β„Ž         |      β„Žπ‘“ 𝑓         |     β„Žπ‘Žπ‘‘π‘‘π‘Ÿ
       |   t    m     s    |   t    m     s    |   t    m     s
0  21  |  265   35  48719  |  102   72   6123  |   41   19    536
1  25  |  300    0  63407  |  283   17  14300  |  162   55   2042
2  29  |  300    0  50903  |  300    0   8947  |  300    0   3052
3  23  |  300    0  39509  |  300    0   4849  |  300    0   1767
4  30  |  300    0  23920  |  300    0   1574  |  300    0   1028
5  51  |  300    0   9422  |  300    0   1533  |  300    0    592

T |    β„Žπ‘“ 𝑓 π‘Ÿ,β„Žπ‘Ž      |     β„Žπ‘“ 𝑓 π‘Ÿπ‘       |   β„Žπ‘“ 𝑓 π‘Ÿπ‘,β„Žπ‘Ž
  |   t    m     s   |   t    m     s   |   t    m    s
0 |    6    5    78  |    7    5    87  |    2    0   23
1 |    3    0     8  |   16   11   153  |    4    1   49
2 |    5    1    12  |   17   13   114  |    7    2   32
3 |   83   19   464  |   99   43   523  |   13    1   69
4 |  300    0  1274  |   18    3    20  |   16    3   20
5 |  300    1   272  |  106   17    32  |   99   14   32

Table 2.1: Experiment results.
of 300 and MAD of 0 did not successfully complete any of the simulations. There were
20 simulations per task for the first two heuristics and 120 simulations per task for the
others. Running times are from a Python implementation running on a 2.6GHz Intel
Core i7. Performance optimization, switching to a more efficient language, and using
state-of-the-art libraries for low-level geometric processing such as collision-checking
would further improve these runtimes.
As can be clearly seen, especially in the number of expanded states, exploiting geometric information in the heuristic produces substantial improvements. Introducing geometric biases to settle ties helps in the most cluttered of the examples. These empirical results suggest that strong heuristic guidance through reasoning about geometric constraints is vital for efficient task and motion planning algorithms.
Chapter 3
HHBF
FFRob was able to solve complicated pick-and-place problems for a PR2 robot with
10 degrees of freedom in seconds. However, FFRob, as part of a pre-processing step,
constructed an initial set of sampled robot configurations to serve as the basis of its
geometric search space. These configurations included grasping and placing configurations as well as free-space samples connecting them. This led to an important limitation: FFRob cannot reliably plan for tasks that require sampling configurations that
depend on previous results, such as tasks requiring stacking or regrasping. One can
attempt to alleviate this problem by dense sampling, but this substantially impacts
performance, since FFRob carries out an unguided forward search over this densely
sampled set of configurations to build the relaxed plan graph. The HHBF planner
does all its sampling in a dynamic fashion, eliminating the cost of pre-processing and
enabling it to do tasks that involve stacking and regrasping.
3.1
Problem Formulation
We consider a different approach to modeling high-dimensional hybrid configuration
spaces than FFRob. A state of such a system is characterized by a finite set of
configuration variables βŸ¨π‘ž1 , . . . , π‘žπ‘› ⟩, which may be discrete (such as which object a
robot is holding or whether the light is turned on) or continuous (such as the joint-space configuration of a robot or the pose of an object).
Without making any assumptions about the nature of the configuration space and
the transition dynamics, planning in such a space is quite difficult. We will adopt
a problem representation that can reveal useful underlying structure in the domain
that will be exploited by our method. There are three important kinds of leverage:
βˆ™ Factoring and sparsity: by representing the state space as the product of the
spaces of a set of state variables, we are able to assert that each action of
the robot affects only a small subset of the state variables, allowing individual
actions to be contemplated in state spaces that are effectively much smaller.
βˆ™ Continuous modes: there are some continuous subspaces of the whole space that
have continuous dynamics, which allows us to use classic sample-based robot
motion planning techniques to move within those subspaces.
βˆ™ Heuristic estimates: by constructing relaxed versions of a planning problem, we
can efficiently obtain estimates of the cost to reach a goal state and use these
estimates to make the search for a solution much more efficient.
We might describe a basic robot manipulation domain containing one robot and
π‘š rigid objects using the state description βŸ¨π‘Ÿ, π‘œ1 , . . . , π‘œπ‘š , β„Ž, π‘”βŸ©, where π‘Ÿ is a full configuration of the robot, π‘œπ‘– is the pose of object 𝑖, β„Ž is a discrete variable describing
which object the robot is currently holding or none, and 𝑔 is a grasp transformation
describing the relation between the currently held object (if there is one) and the
robot’s end effector frame. Much more complicated domains could include other discrete and continuous variables (such as whether the light is on, mass or temperature
of the objects, etc.).
We will characterize the state-transition dynamics using action templates that
expose the sparse structure of dependence among the values of state variables and
enable the construction of an efficiently solvable relaxed version of the problem. An
action template has the form shown in Figure 3-1.
It specifies a generally infinite set of actions, one for each possible binding of the
parameters πœƒπ‘– , which may be drawn from infinite domains, such as robot configurations. It contains a set of preconditions, each of which requires a state variable to
Action(πœƒ1 , . . . , πœƒπ‘˜ ):
pre: π‘žπ‘– ∈ 𝐢𝑖 (πœƒ)
...
eff: π‘žπ‘— = 𝑣𝑗 (πœƒ)
...
Figure 3-1: Generic action template format.
be within a subset of its domain. Not all dimensions of the state must be mentioned
in the preconditions: if one is not mentioned then it is implicitly allowed to take on
any value. It also contains a set of effects, each of which stipulates that a particular
state variable will take on a particular value. Again, not every state variable must be
mentioned in the effects; any variable that is not mentioned is assumed to retain the
value it had before the action was executed. In the following, we will slightly extend
the action-template notation to allow constraints to be placed not just separately on
the values of individual state variables, but on collections of them as well.
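The template semantics can be made concrete with a small Python sketch of a ground action over a dictionary-valued state; GroundAction and its field names are illustrative, not from the planner's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class GroundAction:
    """A ground instance of an action template: preconditions constrain a
    subset of the state variables; effects set a subset; any variable not
    mentioned is unconstrained (pre) or unchanged (eff)."""
    pre: Dict[str, Callable[[object], bool]] = field(default_factory=dict)
    eff: Dict[str, object] = field(default_factory=dict)

    def applicable(self, state):
        # Only the mentioned variables are tested.
        return all(test(state[v]) for v, test in self.pre.items())

    def apply(self, state):
        new = dict(state)       # unmentioned variables keep their values
        new.update(self.eff)
        return new
```

A Pick-like instance, for example, requires the hand variable to be none and the robot variable to equal a particular configuration, and sets the held-object and grasp variables.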
Basic "pick-and-place" robot manipulation actions can be described using the following action templates.
The robot may move from configuration q to configuration q′ if it is not holding any object and there is a collision-free simple (e.g., straight-line) path from q to q′ given the poses of all the objects.
Move(q, q′):
  pre: r = q
       h = none
       (q, q′, o_1, ..., o_m) ∈ cFreePath(q, q′; o_1, ..., o_m)
  eff: r = q′

MoveHolding(q, q′, i, γ):
  pre: r = q
       h = i
       g = γ
       (q, q′, h, g, o_1, ..., o_m) ∈ cFreePathH(q, q′, i, γ; o_1, ..., o_m)
  eff: r = q′
       o_i = Pose(q′, γ)

Figure 3-2: Move and MoveHolding action templates.
cFreePath is a constraint on all of its arguments, representing the set of assignments of q, q′, o_1, ..., o_m for which there is a collision-free path from q to q′. If the robot is holding object i in grasp γ, it may move from configuration q to q′; the free-path constraint is extended to include the held object and the effects are extended to
Pick(q, i, γ):
  pre: r = q
       h = none
       o_i = Pose(q, γ)
  eff: h = i
       g = γ

Place(q, i, γ):
  pre: r = q
       h = i
       g = γ
  eff: h = none

Figure 3-3: Pick and Place action templates.
include the fact that the pose of the held object changes as the robot moves. The values of r and o_i change in such a way that g is held constant.
The Pick action template characterizes the state change between the robot holding nothing and the robot holding (being rigidly attached to) an object. It is parameterized by a robot configuration q, an object to be picked i, and a grasp transformation γ. If the robot is in the appropriate configuration, the hand is empty, and the object is at the pose obtained by applying transformation γ to the robot's end-effector pose when it is in configuration q, then the grasp action will succeed, resulting in object i being held in grasp γ. No other variables (including the robot configuration or the object pose) are changed.
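A possible grounding of Pick in Python, with poses simplified to planar translations so that transform composition is just addition; the helper names and the callback for end-effector kinematics are our own assumptions:

```python
def compose(t1, t2):
    """Compose two planar translations (stand-ins for rigid transforms)."""
    return (t1[0] + t2[0], t1[1] + t2[1])

def ground_pick(q, i, gamma, end_effector_pose):
    """Bind Pick(q, i, gamma): applicable when the robot is at q with an
    empty hand and object i sits exactly where grasp gamma, applied to the
    end-effector pose at q, says it should. r and o_i are left unchanged."""
    def applicable(state):
        return (state["r"] == q and state["h"] is None
                and state["o"][i] == compose(end_effector_pose(q), gamma))
    def apply(state):
        new_state = dict(state)   # the frame assumption: copy, then overwrite
        new_state["h"], new_state["g"] = i, gamma
        return new_state
    return applicable, apply
```

Place would be ground analogously, with the held-object preconditions and h = none as the effect.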
There is an important connection between these action templates and the idea of modes in multi-modal planning. When the result variables are continuous, the constraints on any other variables mentioned in the preconditions constitute a mode: they are conditions defining a subspace in which the result variables can be changed via multiple primitive steps, using techniques of sample-based planning in continuous spaces.
3.2 Planning Algorithm
In this section, we describe the hybrid heuristic backward-forward (HHBF) search
algorithm. It is designed to exploit the structure of sparsity and continuous modes in
the domain dynamics and to use goal-directed heuristic guidance to search efficiently
for solutions in high-dimensional hybrid spaces.
The algorithm takes as input a description of the domain dynamics, encoded in
action templates, an initial state, and a goal set (described in the same formalism as
the preconditions in an action template). It also takes a set of mode-specific sampling
and planning modules that exploit the local smooth dynamics of individual continuous
modes, to provide strong guidance to the overall search process.
3.2.1 Top-level Forward Search
HHBF executes a search in the space of complete configurations of the domain, starting at the initial state, applying instances of the action templates, computing a resulting state, and so on. Two major difficulties arise immediately: first, because there are infinitely many possible instantiations of the action templates, the branching factor is infinite; second, there is no heuristic guidance to help select among partially constructed solutions. We address these problems in detail in Sections 3.2.2 and 3.2.3.
We will focus on trying to find a feasible (rather than optimal) solution, and
attempt to do so as quickly as possible using an enforced-hill-climbing procedure that
takes advantage of the availability of strong heuristic guidance. We modify the typical
control-structure of an agenda-based search to combine enforced hill-climbing with
strategies for handling an infinite branching factor.
The arguments to the top-level procedure (shown in Figure 3-4) are an initial state s_0, a goal set Γ, and some mode-specific samplers and planners Π. The algorithm begins by initializing h_min, which tracks the best (least) heuristic value seen so far, and by setting a time-out value τ to 1.
To expand a state, in line 6, we call procedure ReachabilityGraph, which builds a graph structure containing a set of complete state configurations, including n.s (the current state) as well as states in Γ, connected by action-template instances. This graph is relaxed in the sense that many of the edges might not be legally traversable, but it is constructed in such a way that the actions it contains are good suggestions of how to reach a goal. This procedure may involve significant computational effort and is governed by a time-out τ. The procedure applicable extracts from the reachability graph G a set of actions that are applicable in state n.s and that are likely to be present on paths to states in the goal set. If the set of actions returned is
search(s_0, Γ, Π):
 1  queue = Queue(SearchNode(s_0, None, None, ∞))
 2  h_min = ∞; τ = 1
 3  while not queue.empty():
 4      n = queue.pop(); reset = False
 5      if satisfies(n.s, Γ): return RetracePlan(n)
 6      G = ReachabilityGraph(n.s, Γ, Π, τ)
 7      A = applicable(n.s, Γ, G)
 8      if A = { }: τ = τ * 2
 9      for a ∈ A:
10          s′ = successor(n.s, a)
11          h = heuristic(s′, Γ, G)
12          if h < h_min:
13              queue = Queue(SearchNode(s_0, None, None, ∞))
14              h_min = h; reset = True
15          queue.push(SearchNode(s′, a, n, h))
16          if reset: break
17      if not reset: queue.push(n)
18  return None

Figure 3-4: Top-level forward search.
empty, then we double the time-out, and we see in line 17 that node n is added back to the queue, so it will be considered again with a longer time-out. For each applicable action that is produced, the state s′ that results from applying it to n.s is computed using the action-template descriptions, and its heuristic value h is calculated. If h represents a new minimum, then the enforced-hill-climbing aspect of this search causes us to delete the whole queue (the elements of which necessarily have heuristic values greater than h) and re-initialize it to contain just the starting state and s′. If a state does not produce a successor with a lower heuristic value, it is added back into the queue to maintain completeness, so that we will re-encounter that state and attempt to find additional successors through the process of generating a more complete reachability graph.
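The queue-reset control of the search can be sketched on an ordinary discrete graph; this drops the reachability-graph construction and timeout doubling for brevity, and the `successors`, `h`, and `is_goal` callbacks are caller-supplied assumptions:

```python
from collections import deque

def enforced_hill_climbing(start, successors, h, is_goal):
    """Sketch of the enforced-hill-climbing control: whenever a successor
    improves on the best heuristic value seen so far, discard the entire
    queue and commit to that successor; otherwise keep expanding
    breadth-first."""
    queue = deque([(start, [start])])
    h_min = float("inf")
    while queue:
        state, path = queue.popleft()
        if is_goal(state):
            return path
        reset = False
        for s2 in successors(state):
            if h(s2) < h_min:
                h_min = h(s2)
                queue.clear()   # delete the queue: all entries are worse
                reset = True
            queue.append((s2, path + [s2]))
            if reset:
                break
        # (the full algorithm would also re-queue `state` here so it can be
        # expanded again later with a larger reachability-graph timeout)
    return None
```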
3.2.2 Generating Useful Actions
Our goal is to find a small subset of the infinitely many applicable actions that is likely to contain action instances on a path to a goal state. We search backward from the goal to construct a reachability graph that is made up of vertices (points in the complete configuration space of the system) and edges (action-template instances connecting two vertices in the set). As in relaxed-planning methods from the AI literature [5, 6, 22], we weaken the transition requirements by allowing the preconditions for an action to be achieved independently: so, for example, an action with preconditions P and Q might be enabled after taking actions that individually achieve P and achieve Q, even though those actions could not make P and Q true simultaneously, which is the actual precondition of the action. In addition, we take advantage of continuous-space planning knowledge to generate instances of actions that are effective for moving within a continuous subspace (such as a path of robot configurations that are collision-free with respect to the poses of other objects).
The construction of the reachability graph also respects the multi-modal structure of the hybrid search space [16]. The idea is that the system can move within and among a generally infinite set of modes. A mode is a subspace or lower-dimensional sub-manifold of the original complete configuration space, in which some state variables can be changed, subject to a constraint being maintained on all the state variables. So, for example, a robot moving while holding an object can change the robot configuration and the pose of the object, subject to the grasp transformation being maintained and to not colliding with fixed objects. In this case, the mode is specified by constant required values for the grasp and other object poses. We will take special steps to generate action instances that move within modes as well as those that allow us to move between desirable modes (such as by picking up or placing an object).
Pseudocode for ReachabilityGraph is shown in Figure 3-5. The procedure takes as input an initial state and goal, a set of samplers and planners Π = (Λ, Φ, Ψ), and a timeout τ. Elements in the queue are triples (q_i ∈ C, μ, a): a desired set of values
for a state variable, q_i ∈ C; a mode μ in which that value should be achieved; and the action a that this setting of q_i is intended to enable. The mode is expressed as a constraint on the values of another set of state variables; we will denote it as μ.
In line 4 we insert into the queue all of the individual constraints on the state variables that are in the goal; these can be achieved under no mode constraints. Then, until either the queue is exhausted or the timeout is reached, we pop and process a queue element. There are several calls to sample-based procedures (Λ, Φ, and Ψ), and so there are many possible control structures, including running some aspects of the procedure in parallel. For simplicity, assume each is called with a time bound derived from τ, or asked for a fixed number of samples in a way that depends on τ. The overall completeness condition on the HHBF algorithm depends on these procedures eventually being called to generate arbitrarily many samples.
Procedure Φ is a condition sampler. It takes as input a constraint on one or more state variables and generates samples of those variables that satisfy the constraint. So, in line 7, the variable σ will take on specific values for the variables mentioned in the mode μ that satisfy the mode constraints. Similarly, in line 9, the variable ξ will take on specific values for the target variable q_i. Line 10 performs bookkeeping to associate the values ξ and σ with action a.
A complete target sample v is constructed from ξ and σ by arbitrarily sampling values of the other state variables; in general, the modes will be chosen in such a way that they specify all the relevant constraints on changing q_i; samples can be completed so that they are easier to connect up to elements of V and Γ.
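A toy condition sampler in this style might, for a constraint restricting a planar base position to a rectangle, simply draw uniform samples of the constrained variable; the region encoding and function name here are our own illustration:

```python
import random

def rectangle_condition_sampler(region, n=10, seed=0):
    """A toy instance of a condition sampler (the Phi of the text): given
    the constraint 'base position lies in an axis-aligned rectangle', yield
    n satisfying values of the constrained variable, drawn uniformly."""
    (xlo, ylo), (xhi, yhi) = region
    rng = random.Random(seed)  # seeded for reproducibility
    for _ in range(n):
        yield (rng.uniform(xlo, xhi), rng.uniform(ylo, yhi))
```

Samplers for constraints without a direct parameterization (e.g., stable placements on a cluttered surface) would instead use rejection sampling against the constraint test.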
Procedure Ψ(σ, V) is a mode sampler. It takes the current mode values σ and tries to find action instances u that will enter mode σ from some other arbitrary mode. It is biased to try to connect from modes that are present in elements of V.
Procedure Λ(σ, ξ, V) is a mode planner. It takes the current mode values σ and generates complete samples that hold σ constant but vary the target variables, attempting to connect to the target value ξ as well as other vertices in V. One familiar type of mode planner is an RRT, which proposes moves in the space of robot configurations, subject to the constraints of the mode. V is the union of complete
configurations, including the targets satisfying C and μ as well as configurations in which the mode can be changed to satisfy μ.
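Since the RRT is the familiar example of a mode planner, here is a minimal 2D sketch, assuming a unit-square configuration space and folding all of the mode's collision constraints into a caller-supplied `is_free` test:

```python
import math
import random

def rrt(start, goal, is_free, steps=2000, eps=0.2, seed=1):
    """Minimal 2D RRT sketch of a mode planner: grow a tree of free
    configurations toward random samples (with a 10% goal bias), holding
    the mode fixed; returns a path of configurations or None."""
    rng = random.Random(seed)
    nodes = [start]
    parent = {0: None}
    for _ in range(steps):
        target = goal if rng.random() < 0.1 else (rng.random(), rng.random())
        # extend the nearest tree node one step of length eps toward target
        i = min(range(len(nodes)), key=lambda j: math.dist(nodes[j], target))
        x, y = nodes[i]
        d = math.dist((x, y), target)
        if d == 0.0:
            continue
        step = min(eps, d)
        new = (x + step * (target[0] - x) / d, y + step * (target[1] - y) / d)
        if not is_free(new):
            continue  # the mode's collision constraint rejects this sample
        parent[len(nodes)] = i
        nodes.append(new)
        if math.dist(new, goal) < eps:  # close enough: walk back up the tree
            path, j = [goal], len(nodes) - 1
            while j is not None:
                path.append(nodes[j])
                j = parent[j]
            return path[::-1]
    return None
```

In the full system the analogous planner runs in the robot's joint space, and `is_free` checks collisions against the fixed object poses of the mode.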
Once all of this sampling is complete, we consider the action instances associated with the edges that move within this mode and move into this mode. For each of the action instances, we add each of its preconditions, independently, into the queue. For each of these preconditions q′_k ∈ C_k, we must determine the mode conditions under which it should be achieved. Conservatively, we might let the mode be all the preconditions involving all of the other variables q_j for j ≠ k. This would mean, for example, considering moving the robot subject to the positions of all the objects. Because we are attempting to quickly generate a set of useful actions, we may choose to relax the mode requirements; the procedure Σ determines which of the other preconditions to include in the mode, for each condition we add.
We finish the loop by pushing the queue element we began with back into the
queue, so that, as the procedure runs for more time, we may go back and generate
more actions that are helpful for achieving these conditions. Ultimately, when the
timeout is reached, we return the union of all the vertices and edges generated during
the execution.
Here is a very informal sketch of how one iteration of the algorithm might work. Assume we pop the queue item (r ∈ livingRoom, (h = 6, g ∈ upright), a), which seeks to change the robot's configuration to be a member of the set of configurations in the livingRoom, subject to the mode constraints of the held object being object 6 and the grasp being among the set in which the object is upright in the robot's hand.
In line 7 of the code, we would make the mode concrete by sampling some concrete values γ for g that satisfy the constraint; so σ might now be h = 6, g = γ. In line 9, we would make the target value concrete by sampling some robot configurations that are in the livingRoom set; so ξ might be such a robot configuration. We would then generate some complete samples that agree with σ and ξ and add them to V_Φ.
The next step is to consider ways of achieving these mode constraints. Instances of the Pick action can do this; we have already fixed i = 6 and γ, but they also require a robot configuration. The specific sampler Ψ might generate values of r that
ReachabilityGraph(s, Γ, Π, τ):
 1  (Λ, Φ, Ψ) = Π
 2  V = {s}; E = { }
 3  queue = Queue()
 4  for (q_i, C) ∈ Γ: queue.push((q_i ∈ C, { }, none))
 5  while not queue.empty() and t < τ:
 6      (q_i ∈ C, μ, a) = queue.pop()
 7      for σ ∈ Φ(μ):
 8          V_Φ = { }
 9          for ξ ∈ Φ(C):
10              if a ≠ None: Connect((q_i, ξ, σ), a)
11              V_Φ = V_Φ ∪ {complete(ξ, σ)}
12          V_Ψ, E_Ψ = Ψ(σ, V_Φ ∪ V)
13          V_Λ, E_Λ = Λ(σ, ξ, V_Ψ ∪ V_Φ ∪ V)
14          for a′ ∈ E_Λ ∪ E_Ψ:
15              for (q′_k ∈ C_k, Σ(q′_k)) ∈ pre(a′):
16                  queue.push(((q′_k ∈ C_k), Σ(q′_k), a′))
17          V = V_Λ ∪ V_Ψ ∪ V_Φ ∪ V
18          E = E_Λ ∪ E_Ψ ∪ E
19      queue.push((q_i ∈ C, μ, a))
20  return (V, E)

Figure 3-5: Backward search to construct a relaxed plan graph.
correspond to picking the object up from some canonical locations, including its pose in the initial state or in other vertices in V. Finally, we consider ways of moving within the mode: using an RRT, for example, we can generate new robot configurations that are connected to existing vertices via instances of the Move action template, subject to the mode constraints. Note that we leave out the poses of other moveable objects when we generate relaxed mode constraints, because an action that is not currently performable may be applicable with a different configuration of objects.
Now, for any of the actions we have sampled so far, we would independently push each of their preconditions onto the queue; the mode for each of those preconditions would be the assignments of the other variables in the precondition.
(a) Problem 1: Moving blue block, clearing green block, and stacking purple cylinder
(b) Problem 2: Regrasping of blue and green blocks
(c) Problem 3: Sorting blue and green blocks task similar to [15]
Figure 3-6: The initial and final state for each problem.
3.2.3 Search Heuristics
Our reachability graph is a type of relaxed plan graph that contains some optimistic action sequences for achieving conditions in the domain. An estimate of how many steps are required to achieve one particular condition from the starting state is simply the length of the shortest path in G from a vertex in which that condition holds to the starting state. A goal is made up of a conjunction of conditions. The simplest heuristic, h_add, simply adds up the heuristic estimates for each of the conditions and uses that as the overall estimate. This may result in a significant overestimation of the cost in the case that one action makes several conditions true simultaneously. A more robust estimate can be made by searching within G for a set of actions that constitute paths from the starting state to a set of states that individually satisfy the conditions; the heuristic h_FF returns a value that is the size of that set.
Finally, note that a finite heuristic cost is a necessary condition for the existence of a valid plan from the current state. An infinite cost indicates that at least one goal condition is not currently satisfiable even under the relaxed planning assumption. Thus, the search control need not expand states until they have a finite heuristic cost, because only then could the current state possibly be on a satisfying plan. This additionally helps determine how long the reachability graph should be grown. In our experiments, the reachability graph often contains the first step of a feasible plan as soon as it achieves a finite heuristic cost.
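Over an explicit reachability graph, h_add reduces to summing shortest-path lengths, one per goal condition; this is a sketch with a plain adjacency-dict graph, where the set of vertices satisfying each condition is given directly:

```python
from collections import deque

def h_add(graph, start, goal_vertex_sets):
    """Sketch of the h_add estimate on a reachability graph: BFS distance
    from the start to the nearest vertex satisfying each goal condition,
    summed over conditions (so actions shared between conditions are
    double-counted, which is why h_add can overestimate)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for w in graph.get(v, ()):
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = 0
    for satisfying in goal_vertex_sets:   # one vertex set per condition
        ds = [dist[v] for v in satisfying if v in dist]
        if not ds:
            return float("inf")  # condition unreachable even when relaxed
        total += min(ds)
    return total
```

The infinite return value is exactly the pruning signal described above: such a state cannot lie on any satisfying plan of the current relaxed graph.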
3.3 Completeness
We sketch an argument that HHBF is probabilistically complete if the mode samplers and planners it is provided are themselves effective.
Definition 1. A manipulation problem ⟨s_0, Γ, Λ, Φ, Ψ⟩ is robustly feasible if and only if the following conditions hold:
∙ There exists a valid plan composed of a sequence of action instances that transform s_0 into a state that satisfies all the conditions in Γ.
∙ There exists some ε such that every perturbation to the configuration and mode action instance arguments in the feasible plan of total distance less than ε is also a valid plan.
∙ For each movement through the configuration space under a given mode, the corresponding mode planner is probabilistically complete.
∙ For each mode parameter and condition, there is an ε-ball of satisfying parameters and conditions.
∙ Given an ε-ball of satisfying modes, the mode samplers will generate an action that causes a satisfying mode with nonzero probability.
∙ Given an ε-ball of satisfying condition values, the condition samplers will generate a satisfying sample with nonzero probability.
Definition 2. A manipulation planning algorithm is probabilistically complete if and
only if for every robustly feasible manipulation problem, the probability of finding a
solution approaches one in the limit as the number of samples approaches infinity.
Theorem 1. Given any robustly feasible manipulation planning problem, the ReachabilityGraph procedure called from the initial state will eventually generate an immediate successor action that lies on a feasible solution with probability one as the
number of samples approaches infinity.
Proof Sketch: First note that a sampler with a fixed, nonzero probability of sampling a satisfying value will generate a valid sample with probability one in the limit.
Let P be a robustly feasible solution to the manipulation planning problem. Plan P can be distilled down to the sequence of action instances that first satisfy some goal condition needed later in the plan. This forms a reachability plan that is a sub-sequence of the original plan. Inductively apply this argument starting at the end of the reachability plan. For the sequence of action instances in the current mode, the condition samplers have nonzero probability of constructing a goal sample. This applies to the mode samplers as well. Finally, the mode planner is assumed to be
probabilistically complete so it will find a satisfying robust plan that connects the
goal sample and mode sample. Thus, these action instances have nonzero probability
of being generated in the limit. When applied backwards to the beginning of the
reachability plan, it follows that a first action instance on a feasible plan is generated.
Theorem 2. HHBF is probabilistically complete.
Proof Sketch: Consider any robustly feasible manipulation problem. Suppose, without loss of generality, that the algorithm functions non-deterministically and can consider all successor branches at once. Starting with the initial state, the previous theorem indicates that the ReachabilityGraph grown on the first state will generate a first action on some feasible solution to the manipulation problem with probability one in the limit. The search control then creates a branch with a new state resulting from applying this action. The manipulation problem formed by replacing the initial state with this new state must also be robustly feasible by transitivity. This argument can be applied inductively for the finite number of steps corresponding to the length of the plan. The resulting sequence of action instances is a valid plan because it generates the state sequence. Because each action is generated with probability one in the limit, the probability of generating the plan is one in the limit by addition of limits.
3.4 Results
We applied HHBF to three different manipulation problems to characterize its performance. Solving these problems requires stacking, regrasping, and long-horizon
manipulation. The planner and PR2 robot manipulation simulations were written
in Python using OpenRAVE [9]. In each problem, red objects represent moveable
objects that have no particular goal condition. However, they impose geometric constraints on the problem and must, in many cases, be manipulated in order to produce
a satisfying plan. For example, in problem 1, a stacked red block prevents the green
block from being clear, so the planner first plans to unstack it before moving the
         No h            h_max           h_Add           h_FF          h_FF, HA
 P     t   m    s     t   m     s     t    m    s     t   m   s     t   m   s
 1    29   8  168    22   5    45    15    3   19    16   3  19    14   2  20
 2    24   8   71    21   4    20    20    4   21    19   4  19    22   5  18
 3   300   0  828   300   0  1083   167  115   92    99  33  96    78  20  77

Table 3.1: Manipulation experiment results (t = median runtime in seconds, m = median absolute deviation of the runtimes, s = median number of states expanded).
green block. Problem 3 is designed to be comparable to trial 6 of the FFRob planning system [15].
Each problem uses the same set of samplers and planners Π = (Λ, Φ, Ψ). These constitute a set of skills for the robot that are only invoked when determined useful by the reachability graph. Condition samplers Φ sample goal placements within a region or on an object, or sample grasps. The placement sampling is done by Monte Carlo sampling of poses on a polygonal surface. The grasp sampling chooses from a finite set of grasps for polyhedral objects and from an infinite set of grasps for cylindrical objects. Mode samplers Ψ(σ, V) sample action instances such as Pick, Place, Unstack, and Stack that transition between modes given existing, inverse reachability, and inverse kinematics samples. Finally, mode planners Λ(σ, ξ, V) sample action trajectories through the robot's configuration space, both when the robot's hand is empty and when it is holding an object. This is done by chaining bidirectional RRTs in separate configuration spaces of the robot's base and arm.
We compared the performance of the following heuristics on each problem: no h, h_max, h_Add, and h_FF. We also compared h_FF, HA, which, in addition to using h_FF, includes helpful-action pruning as in the FF planning system [22]. Helpful-action pruning is a method used in task planning to further reduce the candidate set of successor actions by limiting successors to branches on the relaxed plan. Namely, only actions that achieve values on FF's computed relaxed plan are considered. This means that the search control considers only a subset of the set of useful actions computed by the reachability graph when progressing the search. Although this technique is not complete, it is well known to reduce search effort.
We enforce a five-minute timeout on all experiments. In practice, restarting the search after a fixed amount of time, as in many sample-based robotic planning algorithms, may improve performance. There were 20 trials per problem, all conducted on a 2.6 GHz Intel Core i7. Each entry in Table 3.1 reports the median runtime (t), median absolute deviation (MAD) of the runtimes (m), and median number of states (s) expanded in the forward search. The median-based statistics are used to be robust against outliers. The statistics for trials that failed to find a solution are included in the entries; thus, table entries with a runtime of 300 and a MAD of 0 did not find a plan for any trial. The runtimes displayed encompass the full planning time, since the algorithm requires no pre-processing. The low number of states searched indicates that the heuristic is able to frequently guide the search almost directly to a solution. The runtimes are improvements over runtimes on comparable problems reported by Srivastava et al. [34] and over the problems posed to FFRob. The experiments show that the dynamic and directed nature of the algorithm allows it to avoid oversampling while solving a larger number of problems involving stacking and regrasping.
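The median/MAD statistics reported in Table 3.1 can be computed in a couple of lines; note how a single timed-out trial barely moves either number:

```python
import statistics

def median_and_mad(xs):
    """Median and median absolute deviation: robust location and spread
    estimates that are not dragged upward by a few timed-out trials."""
    med = statistics.median(xs)
    mad = statistics.median(abs(x - med) for x in xs)
    return med, mad
```

For runtimes [14, 15, 16, 17, 300] (one timeout), the median is 16 and the MAD is 1, whereas the mean would exceed 72.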
Chapter 4
Conclusion
We have shown how to use search and heuristic ideas from AI task planning to create new algorithms that efficiently solve robotic planning problems. The FFRob planning algorithm uses data structures from multi-query motion planning algorithms to discretize integrated task and motion planning problems. These data structures allow quick computation of the FFRob heuristic, which incorporates geometric reachability constraints into its cost estimate. This produces a deeply integrated task and motion planning system that was empirically shown to efficiently solve long pick-and-place problems on a PR2 robot.
Using the geometric heuristic insights from FFRob, HHBF generalizes these ideas to a general manipulation planning framework with diverse action types, including stacking and regrasping. A key difference from FFRob is that HHBF dynamically generates action samples useful for the global forward search using backward reasoning on the reachability graph. This enables the algorithm to generate fewer expensive action samples, because it only generates samples when they are likely to satisfy a current goal. We also gave a template for defining additional action primitives that could include non-prehensile manipulation skills. This allows HHBF to solve a broad class of manipulation problems, not just pick-and-place task and motion planning problems. We also gave conditions under which HHBF is probabilistically complete. Finally, empirical results showed that it was even more efficient than FFRob on a higher-fidelity simulator and that it seemed to outperform
the previous state-of-the-art planner by Srivastava et al. [34].
4.1 Future Work
In the future, we will look at applying HHBF to manipulation problems that involve non-prehensile actions such as pushing. Then, we will perform experiments on a real PR2 robot to plan and execute tasks such as preparing meals and cleaning rooms in real time. After standard performance engineering, HHBF will likely be able to solve these tasks in several seconds. Additional work will involve incorporating uncertainty into these planning problems to better model the real world, using replanning to adapt to unexpected changes in the state, and exploiting hierarchical and factored planning to obtain further algorithmic improvements.
Another direction for future research is to further generalize HHBF to solve arbitrary planning problems over continuous variables in infinite state spaces. Although we focused on robotics applications, HHBF supports any action primitives that can be modeled by its action templates. There are potential applications in AI numerical planning, resource allocation, and control of other industrial systems [21]. These applications motivate more theoretical study into the decidability of these general problems and the completeness of planning algorithms for this action-template representation. Finally, our work focused on using heuristic search ideas from the FF planning system because of its simplicity, but other planning systems, such as Fast Downward and the context-enhanced additive heuristic, have subsequently had even greater success [19, 20]. Ideas from these algorithms may lead to even better planning algorithms for hybrid systems.
Bibliography
[1] R. Alami, J.-P. Laumond, and T. Siméon. Two manipulation planning algorithms. In WAFR, 1994.
[2] Rachid Alami, Thierry Siméon, and Jean-Paul Laumond. A geometrical approach to planning manipulation tasks: the case of discrete placements and grasps. In ISRR, 1990.
[3] Jennifer Barry, Leslie Pack Kaelbling, and Tomás Lozano-Pérez. A hierarchical approach to manipulation with diverse actions. In IEEE International Conference on Robotics and Automation (ICRA), pages 1799–1806. IEEE, 2013.
[4] Avrim L. Blum and Merrick L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90(1):281–300, 1997.
[5] Blai Bonet and Héctor Geffner. Planning as heuristic search: New results. In Proc. of 5th European Conf. on Planning (ECP), pages 360–372, 1999.
[6] Blai Bonet and Héctor Geffner. Planning as heuristic search. Artificial Intelligence, 129(1):5–33, 2001.
[7] Stéphane Cambon, Rachid Alami, and Fabien Gravot. A hybrid approach to intricate motion, manipulation and task planning. International Journal of Robotics Research, 28, 2009.
[8] Lavindra de Silva, Amit Kumar Pandey, Mamoun Gharbi, and Rachid Alami. Towards combining HTN planning and geometric task planning. In RSS Workshop on Combined Robot Motion Planning and AI Planning for Practical Applications, 2013.
[9] Rosen Diankov and James Kuffner. OpenRAVE: A planning architecture for autonomous robotics. Technical Report CMU-RI-TR-08-34, Robotics Institute, Carnegie Mellon University, 2008.
[10] Mehmet Remzi Dogar and Siddhartha S. Srinivasa. A planning framework for non-prehensile manipulation under clutter and uncertainty. Autonomous Robots, 33(3):217–236, 2012.
[11] Christian Dornhege, Patrick Eyerich, Thomas Keller, Sebastian Trüg, Michael
Brenner, and Bernhard Nebel. Semantic attachments for domain-independent
planning systems. In International Conference on Automated Planning and
Scheduling (ICAPS), pages 114–121. AAAI Press, 2009.
[12] Christian Dornhege, Andreas Hertle, and Bernhard Nebel. Lazy evaluation and
subsumption caching for search-based integrated task and motion planning. In
IROS workshop on AI-based robotics, 2013.
[13] Esra Erdem, Kadir Haspalamutgil, Can Palaz, Volkan Patoglu, and Tansel Uras.
Combining high-level causal reasoning with low-level geometric reasoning and
motion planning for robotic manipulation. In IEEE International Conference on
Robotics and Automation (ICRA), 2011.
[14] Richard E. Fikes and Nils J. Nilsson. STRIPS: A new approach to the application
of theorem proving to problem solving. Artificial Intelligence, 2:189–208, 1971.
[15] Caelan Reed Garrett, Tomás Lozano-Pérez, and Leslie Pack Kaelbling. Ffrob:
An efficient heuristic for task and motion planning. In International Workshop
on the Algorithmic Foundations of Robotics (WAFR), 2014.
[16] K. Hauser and J.C. Latombe. Integrating task and prm motion planning: Dealing with many infeasible motion planning queries. In ICAPS09 Workshop on
Bridging the Gap between Task and Motion Planning, 2009.
[17] Kris Hauser. The minimum constraint removal problem with three robotics
applications. I. J. Robotic Res., 33(1):5–17, 2014.
[18] Kris Hauser and Victor Ng-Thow-Hing. Randomized multi-modal motion planning for a humanoid robot manipulation task. IJRR, 30(6):676–698, 2011.
[19] Malte Helmert. The Fast Downward planning system. Journal of Artificial Intelligence Research, 26:191–246, 2006.
[20] Malte Helmert and Héctor Geffner. Unifying the causal graph and additive
heuristics. In ICAPS, pages 140–147, 2008.
[21] Jörg Hoffmann. The Metric-FF planning system: Translating "ignoring delete
lists" to numeric state variables. Journal of Artificial Intelligence Research (JAIR), 20:291–341, 2003.
[22] Jörg Hoffmann and Bernhard Nebel. The FF planning system: Fast plan generation through heuristic search. Journal of Artificial Intelligence Research (JAIR),
14:253–302, 2001.
[23] Leslie Pack Kaelbling and Tomás Lozano-Pérez. Hierarchical planning in the
now. In IEEE International Conference on Robotics and Automation (ICRA), 2011.
[24] L. E. Kavraki, P. Svestka, J.-C. Latombe, and M. H. Overmars. Probabilistic
roadmaps for path planning in high-dimensional configuration spaces. IEEE
Transactions on Robotics and Automation, 12(4):566–580, 1996.
[25] Fabien Lagriffoul, Dimitar Dimitrov, Alessandro Saffiotti, and Lars Karlsson.
Constraint propagation on interval bounds for dealing with geometric backtracking. In IEEE/RSJ International Conference on Intelligent Robots and Systems
(IROS), 2012.
[26] Peter Leven and Seth Hutchinson. A framework for real-time path planning in
changing environments. International Journal of Robotics Research (IJRR), 21(12):999–1030, 2002.
[27] T. Lozano-Pérez, J. L. Jones, E. Mazer, P. A. O’Donnell, W. E. L. Grimson,
P. Tournassoud, and A. Lanusse. Handey: A robot system that recognizes,
plans, and manipulates. In IEEE International Conference on Robotics and
Automation, 1987.
[28] Tomás Lozano-Pérez. Automatic planning of manipulator transfer movements.
IEEE Transactions on Systems, Man, and Cybernetics, 11:681–698, 1981.
[29] Nils J. Nilsson. Shakey the robot. Technical Report 323, Artificial Intelligence
Center, SRI International, Menlo Park, California, 1984.
[30] Amit Kumar Pandey, Jean-Philippe Saut, Daniel Sidobre, and Rachid Alami.
Towards planning human-robot interactive manipulation tasks: Task dependent and human oriented autonomous selection of grasp and placement. In
RAS/EMBS International Conference on Biomedical Robotics and Biomechatronics, 2012.
[31] Erion Plaku, Kostas E. Bekris, Brian Y. Chen, Andrew M. Ladd, and Lydia E.
Kavraki. Sampling-based roadmap of trees for parallel motion planning. IEEE
Transactions on Robotics, 21(4):597–608, 2005.
[32] Erion Plaku and Gregory Hager. Sampling-based motion planning with symbolic,
geometric, and differential constraints. In IEEE International Conference on
Robotics and Automation (ICRA), 2010.
[33] Thierry Siméon, Jean-Paul Laumond, Juan Cortés, and Anis Sahbani. Manipulation planning with probabilistic roadmaps. IJRR, 23:729–746, 2004.
[34] Siddharth Srivastava, Eugene Fang, Lorenzo Riano, Rohan Chitnis, Stuart Russell, and Pieter Abbeel. Combined task and motion planning through an extensible planner-independent interface layer. In IEEE Conference on Robotics and
Automation (ICRA), 2014.
[35] Siddharth Srivastava, Lorenzo Riano, Stuart Russell, and Pieter Abbeel. Using
classical planners for tasks with continuous operators in robotics. In ICAPS
Workshop on Planning and Robotics (PlanRob), 2013.
[36] Mike Stilman and James J. Kuffner. Planning among movable obstacles with artificial constraints. In Workshop on Algorithmic Foundations of Robotics (WAFR),
2006.
[37] Mike Stilman, Jan-Ulrich Schamburek, James J. Kuffner, and Tamim Asfour.
Manipulation planning among movable obstacles. In IEEE International Conference on Robotics and Automation (ICRA), 2007.
[38] Jur Van Den Berg, Mike Stilman, James Kuffner, Ming Lin, and Dinesh
Manocha. Path planning among movable obstacles: a probabilistically complete
approach. In Algorithmic Foundation of Robotics VIII, pages 599–614. Springer,
2009.
[39] Gordon T. Wilfong. Motion planning in the presence of movable obstacles. In
Symposium on Computational Geometry, pages 279–288, 1988.