Using State Space Splitting to Compute Heuristic in Planning

Yacine Zemali, Patrick Fabiani, Malik Ghallab

Supaéro (ENSAE), 10 avenue Edouard Belin, BP 4032, 31055 Toulouse cedex, FRANCE
ONERA / DCSD, Centre de Toulouse, 2 avenue Edouard Belin, BP 4025, 31055 Toulouse cedex 4, FRANCE
LAAS-CNRS / RIA, 7 avenue du Colonel Roche, 31077 Toulouse Cedex 4, FRANCE
{zemali, fabiani}@onera.fr, malik@laas.fr

Abstract

This position paper presents a new approach to computing an admissible and informative heuristic for use in an informed search algorithm. To compute our heuristic, we use the TokenPlan planner developed in our group. This planner is based on Petri nets and token propagation. It develops a planning graph quite similar to Graphplan's, but with a very useful additional feature: each node of the graph (proposition or action) is assigned a class and a color. Just as FF uses Graphplan's graph to compute its heuristic, we want to use the features of TokenPlan's graph to compute a heuristic. The gain is that our heuristic will be admissible and informative, because it is computed by taking into account some of the negative interactions between actions.

Introduction

The planning community has seen many approaches to solving planning problems. Two techniques have distinguished themselves by their good performance. The first one relies on disjunctive planning, represented by Graphplan [BF97] and its successors. The most recent Graphplan-based planner is STAN [LF99]. It is an optimal planner. Its main enhancement over Graphplan is that it performs a number of preprocessing analyses on the domain before planning.

The second approach is based on state space search algorithms guided by heuristics. In most heuristic planners, the heuristic is computed automatically by considering a relaxed problem.
For example, HSP [BG98] computes a heuristic by assuming that subgoals are independent: it takes into account neither positive interactions nor negative ones.

These two state-of-the-art planning methods are often seen as orthogonal [Wel99], but several planners try to take advantage of both to be more efficient. In particular, the main problem when computing a heuristic is to take into account as many of the interactions between subgoals as possible. In other words, the relaxed problem should not make overly strong assumptions about the independence of subgoals. This explains the emergence of planners like FF [Hof00, Hof01], which takes into account all positive interactions by using Graphplan's planning graph. In the same way, AltAlt [NNK00, NK01] uses the planning graph produced by STAN to compute its heuristic. These planners can be seen as "hybrid" methods, between disjunctive planning and heuristic planning.

We propose a heuristic calculation method which takes into account some of the negative interactions. Our proposal is also a hybrid approach: we want to use our disjunctive planner to compute a heuristic, and then use that heuristic with an informed search algorithm.

In this paper, we present our hybrid method. We first present some recent developments in heuristic planning. In the second section, we present our disjunctive planner. In the last section, we explain how we compute our heuristic.

Recent developments in heuristic planning

HSPr [BG99] is an extension of HSP which takes into account some negative interactions while computing its heuristic. This planner uses a WA* algorithm performing a backward search from the goal to find a valid plan.

HSPr* [HG00] computes an admissible heuristic using a shortest path algorithm. It finds an optimal solution using an IDA* algorithm to search the regression space.

HSP 2.0 [BG01] is a planner that allows switching between several heuristics.
It uses a WA* search algorithm in which the value of the weight parameter can be chosen. HSP 2.0 also gives the choice between forward and backward search.

FF uses the Graphplan algorithm to compute the length of a solution of a relaxed problem (actions have no delete list). This length is then used as a heuristic for a fast search algorithm: enforced hill-climbing. As this algorithm is not complete, FF switches to a WA* algorithm if no solution is found. One main advantage of FF is that it takes into account positive interactions between actions when computing its heuristic. Of course, also taking into account the negative effects would give a much better heuristic (in fact, the real length of the valid plan found by Graphplan), but computing such a heuristic with this method is equivalent to solving the problem with Graphplan.

The AltAlt planner extends this approach. It can take into account the negative effects of the actions to compute an admissible heuristic, by using the STAN planner. The calculation of this heuristic is itself computationally costly: the algorithm must evaluate some mutex relations. This planner uses HSPr's regression search algorithm from the goal to find a valid plan.

TokenPlan

Our planner, TokenPlan [MF01, MF00], is designed to take advantage of state space splitting. The notion of "splitting" was first presented in [Kam97]. The splitting strategy is controlled by the user, who gives some simple splitting rules in the PDDL [MC98] domain description.

To explain our notion of splitting, consider the following example. An FSS (Forward State Space) search approach performs a full splitting. Indeed, as soon as an action is introduced in a plan prefix (narrowing down the current set of potential plans), the resulting set is pushed in a new branch of the search tree.
On the contrary, a (disjunctive) Graphplan-like approach does no splitting at all: all the possible actions are introduced together, and the set of all the potential plans is considered as a whole when planning continues. The classes introduced in TokenPlan make it possible to achieve an intermediate splitting, between Graphplan and FSS, and to adapt it to each problem.

TokenPlan transforms a planning domain written in PDDL into an interpreted Petri net. What matters is not the Petri net itself, but the fact that a Petri net works with places, transitions and tokens. These tokens are what drives the splitting process: they carry the information (classes and colors) that we use to construct the planning graph. There are action tokens and proposition tokens, equivalent to Graphplan's action and proposition nodes. When a place contains one or more tokens, it is said to be marked. When a transition is triggered, one token from each of its input places is "consumed", and all of its output places are marked. This new marking can allow other transitions to be triggered, and so on.

Planning proceeds as follows: the marking of the initial state is introduced in the net (one token per proposition). The next marking is obtained by triggering every possible transition from the initial state. We get the following marking by propagating tokens from this new marking in exactly the same way, and so on. By memorizing the positions of the tokens (the marking) and their moves step after step, we obtain a leveled graph very similar to Graphplan's. Some of the mutex relations are easily encoded by the colors of tokens. Tokens can also belong to one or several classes. Each transition may modify this class. This mechanism makes it possible to split the search space into classes. The solution plan is then extracted using CSP tools.

Intermediate splitting is particularly valuable for optimization [FM00].
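The level-by-level token propagation described above can be illustrated with a small Python sketch. It is not TokenPlan's implementation: colors and classes are omitted, a marking is reduced to the set of marked places, and, as a simplifying assumption borrowed from Graphplan's no-op actions, marked places stay marked instead of having their tokens consumed. The names `propagate` and `first_level_with` and the toy domain are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A transition (action) is a pair: (input places, output places).
Transition = Tuple[FrozenSet[str], FrozenSet[str]]


def propagate(initial: FrozenSet[str],
              transitions: Dict[str, Transition],
              max_levels: int = 50) -> List[FrozenSet[str]]:
    """Return the successive markings, one per level, until nothing changes."""
    levels = [initial]
    for _ in range(max_levels):
        marking = levels[-1]
        new_marking = set(marking)        # assumption: marked places stay marked
        for inputs, outputs in transitions.values():
            if inputs <= marking:         # every input place is marked: enabled
                new_marking |= outputs    # mark all output places
        if new_marking == marking:        # fixpoint: the graph has leveled off
            break
        levels.append(frozenset(new_marking))
    return levels


def first_level_with(goals: FrozenSet[str],
                     levels: List[FrozenSet[str]]) -> int:
    """Index of the first level whose marking contains all goals, or -1."""
    for index, marking in enumerate(levels):
        if goals <= marking:
            return index
    return -1


# Hypothetical toy domain: move from a to b, then pick something up at b.
toy_transitions = {
    "move_a_b": (frozenset({"at_a"}), frozenset({"at_b"})),
    "pick_at_b": (frozenset({"at_b"}), frozenset({"holding"})),
}
toy_levels = propagate(frozenset({"at_a"}), toy_transitions)
print(first_level_with(frozenset({"holding"}), toy_levels))
```

Memorizing the list of markings is what yields the leveled structure; the classes and colors carried by TokenPlan's tokens would refine each marking beyond what this sketch records.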
Indeed, such splitting makes it possible to group propositions according to the value they have when together. It structures the search space into sets of states with an equivalent value with respect to what is to be optimized. One application of this is the optimization of a utility or a cost: we can group states having the same utility in the same class, and it is then possible to extract an optimal plan (with the best utility) within a disjunctive approach.

Space splitting has more applications than optimization: it could be used for handling conditional effects or uncertainty, and it can also be used to simplify the computation of some mutual exclusions. The issue addressed here, however, is its use to compute a heuristic.

Our approach: using state space splitting to compute heuristics

We propose to use our splitting approach to compute a heuristic and then use it with an informed search algorithm such as A*, IDA* or hill-climbing. Our proposal is not to take into account all the mutex relations, but just those which can be encoded simply by the colors and classes in the token propagation process. Typically, classes and colors implicitly take into account many permanent mutexes, such as the fact that a given object cannot be in several different places at the same time. However, other splitting strategies will be studied.

Figure 1: estimation of a heuristic with Graphplan. The heuristic provided by the length (in number of levels) of Graphplan's planning graph is not very informative.

Figure 2: estimation of a heuristic with TokenPlan. The heuristic computed by TokenPlan is simply the number of levels in the planning graph.

Indeed, thanks to state space splitting, TokenPlan builds a deeper search graph, developing more levels than Graphplan, before backtracking for the first time in search of a solution. TokenPlan assesses the interactions between actions more precisely. We plan
to use the obtained number of levels, which is an admissible heuristic since the first solution will have at least that length. On the other hand, developing more levels requires more computation time. Yet the use of colors and classes filters the number of applicable actions per level, thus reducing the amount of work per level. Furthermore, it is the plan extraction phase of TokenPlan, not the graph building phase, that remains the most time-consuming phase of the planning process. Lastly, we will apply it to relaxed problems, on which the graph building phase is very quick. Thus, we expect that for a reasonable amount of work (to be assessed), TokenPlan, applied to a relaxed problem, will output a more informative admissible heuristic than the one given by the length of the parallel plan found with Graphplan for the same problem (see Figures 1 and 2).

For example, if we adopt a full splitting approach (all possible states are then present in the planning graph, so all the interactions between actions are taken into account), the planning process will not backtrack, because it evaluates all the nodes until it finds an exact solution. The depth of such a planning graph (in the full splitting case, a tree) directly gives the length of a solution plan. Of course, a full splitting approach is computationally very costly and is useless for computing a heuristic: it requires as much work as finding the solution.

Our approach is intermediate. Our idea, in order to compute our heuristic, is to partially ignore the delete list and to propagate some mutex relations (using the colored tokens and some classes). The depth of the resulting planning graph would give an admissible heuristic. This heuristic is more informative than the depth of a graph produced by Graphplan for the same relaxed problem. The computed heuristic can be used with optimal search methods such as the A* algorithm.

The main difficulty is to find a proper splitting strategy.
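To make the intermediate approach concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not TokenPlan's algorithm: the search space is split on a single variable (a robot's location, standing in for a class), delete lists are ignored for every other fact, and the heuristic is the first level at which one class contains all the goals. The `Action` type, both functions and the toy domain are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    at: str                    # location required to apply the action
    pre: FrozenSet[str]        # non-location preconditions
    add: FrozenSet[str]        # non-location add effects (deletes ignored)
    to: Optional[str] = None   # new location, or None if unchanged


def levels_without_splitting(start: str, actions: List[Action],
                             goals: FrozenSet[str],
                             max_levels: int = 100) -> Optional[int]:
    """Full delete relaxation: location facts are pooled with all others."""
    facts = {"at_" + start}
    for level in range(max_levels + 1):
        if goals <= facts:
            return level
        new = set(facts)
        for a in actions:
            if "at_" + a.at in facts and a.pre <= facts:
                new |= a.add
                if a.to:
                    new.add("at_" + a.to)
        if new == facts:       # fixpoint reached without the goals
            return None
        facts = new
    return None


def levels_with_splitting(start: str, actions: List[Action],
                          goals: FrozenSet[str],
                          max_levels: int = 100) -> Optional[int]:
    """One class per location; facts are relaxed only inside each class."""
    classes: Dict[str, FrozenSet[str]] = {start: frozenset()}
    for level in range(max_levels + 1):
        if any(goals <= facts for facts in classes.values()):
            return level
        new = dict(classes)
        for a in actions:
            if a.at in classes and a.pre <= classes[a.at]:
                dest = a.to or a.at
                # Facts of the source class travel with the robot.
                new[dest] = new.get(dest, frozenset()) | classes[a.at] | a.add
        if new == classes:     # fixpoint reached without the goals
            return None
        classes = new
    return None


# Hypothetical toy domain: take a photo at x and a photo at y, starting at home.
locations = ["home", "x", "y"]
actions = [Action(f"move_{a}_{b}", a, frozenset(), frozenset(), b)
           for a in locations for b in locations if a != b]
actions += [Action(f"photo_{p}", p, frozenset(), frozenset({f"photo_{p}"}))
            for p in ("x", "y")]
goals = frozenset({"photo_x", "photo_y"})

print(levels_without_splitting("home", actions, goals))
print(levels_with_splitting("home", actions, goals))
```

In this toy domain the shortest real plan has four steps (move, photo, move, photo). Pooling all facts lets the robot be at x and y simultaneously, so the goals appear after two levels, whereas splitting on location forbids that and delays them to the third level. Both values remain lower bounds on the real plan length, and the split one is the tighter of the two.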
So far, we have used splitting rules adapted for optimization problems. In order to compute our admissible heuristic, we need to split the search space in a way which maximizes the depth of the search graph when it reaches all the goals (the first backtrack point in the original TokenPlan), while keeping the computation time acceptable and without overestimating the plan length. We have to find a compromise.

Perspectives

The planning graph structure used by TokenPlan could provide a means for computing admissible heuristics. More precisely, we propose to study the benefits of using token propagation to avoid the calculation of a certain number of mutex relations while still taking some negative interactions into account. This means that our planning graph would take into account all positive interactions between the actions, and also part of the negative interactions. The length of the obtained planning graph would provide an informative, admissible heuristic.

We must now begin an experimental stage to find out how efficient this approach is. We have to study the different ways of using search space splitting to encode some mutex relations. We must also study the approach on various classical planning domains, to compare domains with strong interactions between subgoals and domains without.

References

[BF97] A.L. Blum and M.L. Furst. Fast planning through planning graph analysis. Artificial Intelligence, 90(1–2):281–300, 1997.
[BG98] B. Bonet and H. Geffner. HSP: Heuristic search planner. In Planning Competition of the 4th International Conference on Artificial Intelligence Planning and Scheduling (AIPS-98), 1998.
[BG99] B. Bonet and H. Geffner. Planning as heuristic search: New results. In ECP-99, pages 360–372, 1999.
[BG01] B. Bonet and H. Geffner. Heuristic search planner 2.0. AI Magazine, 22(3):77–80, 2001.
[FM00] P. Fabiani and Y. Meiller. Planning with tokens. In ECAI-workshop on New Results in Planning, Scheduling and Design (PuK2000), 2000.
[HG00] P. Haslum and H. Geffner.
Admissible heuristics for optimal planning. In Artificial Intelligence Planning Systems, pages 140–149, 2000.
[Hof00] J. Hoffmann. A heuristic for domain independent planning and its use in an enforced hill-climbing algorithm. In ISMIS-00, pages 216–227, 2000.
[Hof01] J. Hoffmann. FF: The fast-forward planning system. AI Magazine, 22(3):57–62, 2001.
[Kam97] S. Kambhampati. Challenges in bridging plan-synthesis paradigms. In IJCAI-97, 1997.
[LF99] D. Long and M. Fox. Efficient implementation of the plan graph in STAN. Journal of Artificial Intelligence Research, 10:87–115, 1999.
[MC98] D. McDermott and AIPS-98 Planning Competition Committee. PDDL - The Planning Domain Definition Language, Version 1.2, 1998.
[MF00] Y. Meiller and P. Fabiani. Planning with Petri nets. In RJCIA-00, Lyon, September 2000.
[MF01] Y. Meiller and P. Fabiani. TokenPlan: a planner for both satisfaction and optimization problems. AI Magazine, 22(3):85–87, 2001.
[NK01] X.L. Nguyen and S. Kambhampati. Reviving partial order planning. In IJCAI-01, 2001.
[NNK00] R.S. Nigenda, X.L. Nguyen, and S. Kambhampati. AltAlt: Combining the advantages of Graphplan and heuristic state search. In International Conference on Knowledge-based Computer Systems, 2000.
[Wel99] D.S. Weld. Recent advances in AI planning. AI Magazine, 20(2):93–123, 1999.