Using State Space Splitting to Compute Heuristic in Planning

Yacine Zemali
Patrick Fabiani
Malik Ghallab
Supaéro (ENSAE)
10 avenue Edouard Belin, BP 4032, 31055 Toulouse cedex - FRANCE
ONERA / DCSD - Centre de Toulouse
2 avenue Edouard Belin, BP 4025, 31055 Toulouse cedex 4 - FRANCE
LAAS-CNRS / RIA
7 avenue du Colonel Roche, 31077 Toulouse Cedex 4 - FRANCE
{zemali, fabiani}@onera.fr, malik@laas.fr
Abstract
This position paper presents a new approach to computing an admissible and
informative heuristic for use in an informed search algorithm. To compute this
heuristic, we use the TokenPlan planner developed in our group. This planner is
based on Petri nets and token propagation. It develops a planning graph quite
similar to Graphplan's, but with a very useful additional feature: each node of
the graph (proposition or action) is assigned a class and a color. Just as FF
uses Graphplan's graph to compute its heuristic, we want to use the features of
TokenPlan's graph to compute ours. The expected gain is a heuristic that is both
admissible and informative, because it takes into account some of the negative
interactions between actions.
Introduction
Many approaches to solving planning problems have been proposed, and two
techniques stand out for their good performance.
The first relies on disjunctive planning, represented by Graphplan [BF97] and its
successors. The most recent Graphplan-based planner is STAN [LF99], an optimal
planner. Its main improvement over Graphplan is that it performs a number of
preprocessing analyses on the domain before planning.
The second approach is based on state space search algorithms guided by heuristics.
In most heuristic planners, the heuristic is computed automatically by considering
a relaxed problem. For example, HSP [BG98] computes a heuristic by assuming that
subgoals are independent: it takes into account neither positive nor negative
interactions.
These two state-of-the-art planning methods are often seen as orthogonal [Wel99], but
several planners try to take advantage of both to be more efficient. In particular,
the main problem when computing a heuristic is to take into account as many
interactions between subgoals as possible. In other words, the relaxed problem should
not make overly strong assumptions about the independence of subgoals.
This explains the emergence of planners like FF [Hof00, Hof01], which takes
into account all positive interactions by using Graphplan's planning graph. In the same
way, AltAlt [NNK00, NK01] uses the planning graph produced by STAN to compute
its heuristic.
These planners can be seen as “hybrid” methods between disjunctive planning
and heuristic planning.
We propose a heuristic computation method that takes into account some of the negative
interactions. Our proposal is also a hybrid approach: we want to use our disjunctive
planner to compute a heuristic, and then use that heuristic with an informed search algorithm.
In this paper, we present our hybrid method. We first review some recent developments
in heuristic planning. In the second section, we present our disjunctive planner, TokenPlan.
In the last section, we explain how we compute our heuristic.
Recent developments in heuristic planning
HSPr [BG99] is an extension of HSP that takes into account some negative interactions
when computing its heuristic. This planner uses a WA* (weighted A*) algorithm performing
a backward search from the goal to find a valid plan. HSPr* [HG00] computes an
admissible heuristic using a shortest-path algorithm. It finds an optimal solution using
an IDA* algorithm to search the regression space. HSP 2.0 [BG01] is a planner that
can switch between several heuristics. It uses a WA* search algorithm in which
the value of the weight parameter can be chosen. HSP 2.0 also offers the choice between
forward and backward search.
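To make the role of the weight parameter concrete, here is a minimal, generic sketch of weighted A* in Python. It is not HSP 2.0's code: the function name `weighted_astar`, its parameters and the toy line graph are illustrative assumptions. Nodes are ordered by f(s) = g(s) + w·h(s), so w = 1 gives plain A*, and a larger w trusts the heuristic more.

```python
import heapq
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

State = Hashable


def weighted_astar(
    start: State,
    is_goal: Callable[[State], bool],
    successors: Callable[[State], Iterable[Tuple[State, float]]],
    h: Callable[[State], float],
    w: float = 1.0,
) -> Optional[List[State]]:
    """Best-first search ordered by f(s) = g(s) + w * h(s).

    w = 1 is plain A*; w > 1 puts more weight on the heuristic, which
    usually speeds up the search but gives up the optimality guarantee.
    """
    tie = 0  # tie-breaker so that heapq never has to compare states
    frontier = [(w * h(start), tie, start)]
    g: Dict[State, float] = {start: 0.0}
    parent: Dict[State, Optional[State]] = {start: None}
    closed = set()
    while frontier:
        _, _, s = heapq.heappop(frontier)
        if s in closed:
            continue  # stale queue entry
        if is_goal(s):
            path: List[State] = []
            node: Optional[State] = s
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        closed.add(s)
        for t, cost in successors(s):
            new_g = g[s] + cost
            if t not in closed and new_g < g.get(t, float("inf")):
                g[t] = new_g
                parent[t] = s
                tie += 1
                heapq.heappush(frontier, (new_g + w * h(t), tie, t))
    return None


# Usage on a toy line graph 0-1-2-3-4-5 with unit costs.
def line_successors(s: int):
    return [(t, 1.0) for t in (s - 1, s + 1) if 0 <= t <= 5]


print(weighted_astar(0, lambda s: s == 5, line_successors,
                     lambda s: abs(5 - s), w=2.0))  # [0, 1, 2, 3, 4, 5]
```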
FF uses the Graphplan algorithm to compute the length of a solution to a relaxed problem
in which actions have no delete lists. This length is then used as a heuristic for a fast
search algorithm: enforced hill-climbing. As this algorithm is not complete, FF switches
to a WA* algorithm if no solution is found.
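The relaxed-plan estimate described above can be sketched as follows. This is a simplified illustration, not FF's implementation (FF adds refinements in how achievers are selected); the `Action` tuple, the function name `relaxed_plan_length` and the toy rooms domain are hypothetical. A forward phase builds the planning graph with delete lists ignored, and a backward phase picks one achiever per (sub)goal and counts the chosen actions.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Set


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    dele: FrozenSet[str]  # delete list ("del" is a Python keyword); ignored below


def relaxed_plan_length(init: FrozenSet[str], goal: FrozenSet[str],
                        actions: List[Action]) -> Optional[int]:
    """Simplified FF-style estimate on the delete-relaxed problem.

    Returns the number of actions in a relaxed plan, or None if the goal
    is unreachable even when delete lists are ignored."""
    # Forward phase: first level of each fact, first layer of each action.
    fact_level: Dict[str, int] = {p: 0 for p in init}
    action_level: Dict[str, int] = {}
    level = 0
    while not all(g in fact_level for g in goal):
        new_facts: Dict[str, int] = {}
        for a in actions:
            if a.name not in action_level and all(p in fact_level for p in a.pre):
                action_level[a.name] = level
                for p in a.add:
                    if p not in fact_level:
                        new_facts[p] = level + 1
        if not new_facts:
            return None  # fixpoint reached without the goal
        fact_level.update(new_facts)
        level += 1

    # Backward phase: support each (sub)goal with an action from the layer
    # just below the level where that goal first appears.
    by_name = {a.name: a for a in actions}
    goals_at: List[Set[str]] = [set() for _ in range(level + 1)]
    for g in goal:
        goals_at[fact_level[g]].add(g)
    chosen: Set[str] = set()
    for i in range(level, 0, -1):
        for g in goals_at[i]:
            a = next(by_name[n] for n, l in action_level.items()
                     if l == i - 1 and g in by_name[n].add)
            chosen.add(a.name)
            for p in a.pre:
                goals_at[fact_level[p]].add(p)  # earlier level, handled later
    return len(chosen)


def move(x: str, y: str) -> Action:
    """Hypothetical toy domain: a robot moves between adjacent rooms."""
    return Action(f"move_{x}_{y}", frozenset({f"at_{x}"}),
                  frozenset({f"at_{y}", f"visited_{y}"}), frozenset({f"at_{x}"}))


ACTS = [move("a", "b"), move("b", "a"), move("b", "c"), move("c", "b")]
INIT = frozenset({"at_a", "visited_a"})
print(relaxed_plan_length(INIT, frozenset({"at_a", "visited_c"}), ACTS))  # 2
```

The real problem needs four moves (a→b→c→b→a); because the relaxation never removes at_a, the estimate is only 2.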
One main advantage of FF is that its heuristic takes into account positive interactions
between actions.
Taking negative effects into account as well would, of course, give a much better
heuristic (in fact, the actual length of the valid plan found by Graphplan), but computing
such a heuristic with this method is equivalent to solving the problem with Graphplan.
The AltAlt planner extends this approach. It can take into account the negative
effects of actions to compute an admissible heuristic, by using the STAN planner.
The computation of this heuristic is itself costly, because the algorithm must
evaluate some mutex relations. AltAlt uses HSPr's regression search algorithm from
the goal to find a valid plan.
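To illustrate why mutex evaluation is costly, here is a sketch of one Graphplan expansion step using the classical mutex rules of [BF97]: two actions are mutex if one deletes a precondition or add effect of the other (interference) or if their preconditions are mutex (competing needs), and two propositions are mutex if every pair of actions achieving them is mutex. This is neither STAN's nor AltAlt's code; the function `expand` and the toy domain are hypothetical. Note the pairwise loop over each action layer, which is quadratic in its size.

```python
from itertools import combinations
from typing import FrozenSet, List, NamedTuple, Set, Tuple


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    dele: FrozenSet[str]


Pair = FrozenSet[str]  # an unordered mutex pair


def expand(props: Set[str], prop_mutex: Set[Pair], actions: List[Action]
           ) -> Tuple[List[Action], Set[Pair], Set[str], Set[Pair]]:
    """One Graphplan expansion step with the classical mutex rules."""
    def pm(p: str, q: str) -> bool:
        return frozenset((p, q)) in prop_mutex

    # Actions whose preconditions are present and pairwise non-mutex,
    # plus one no-op per proposition so that facts can persist.
    layer = [a for a in actions
             if a.pre <= props and not any(pm(p, q) for p, q in combinations(a.pre, 2))]
    layer += [Action(f"noop_{p}", frozenset({p}), frozenset({p}), frozenset())
              for p in sorted(props)]

    act_mutex: Set[Pair] = set()
    for a, b in combinations(layer, 2):  # quadratic in the layer size
        interference = a.dele & (b.pre | b.add) or b.dele & (a.pre | a.add)
        competing_needs = any(pm(p, q) for p in a.pre for q in b.pre)
        if interference or competing_needs:
            act_mutex.add(frozenset((a.name, b.name)))

    next_props: Set[str] = set().union(*(a.add for a in layer))
    achievers = {p: [a.name for a in layer if p in a.add] for p in next_props}
    next_mutex: Set[Pair] = set()
    for p, q in combinations(sorted(next_props), 2):
        # Mutex if every pair of achievers is mutex (a shared achiever breaks it).
        if all(x != y and frozenset((x, y)) in act_mutex
               for x in achievers[p] for y in achievers[q]):
            next_mutex.add(frozenset((p, q)))
    return layer, act_mutex, next_props, next_mutex


def move(x: str, y: str) -> Action:
    """Hypothetical toy domain: a robot moves between adjacent rooms."""
    return Action(f"move_{x}_{y}", frozenset({f"at_{x}"}),
                  frozenset({f"at_{y}", f"visited_{y}"}), frozenset({f"at_{x}"}))


ACTS = [move("a", "b"), move("b", "a"), move("b", "c"), move("c", "b")]
layer1, amutex1, props1, pmutex1 = expand({"at_a", "visited_a"}, set(), ACTS)
print(sorted(sorted(m) for m in pmutex1))  # [['at_a', 'at_b'], ['at_a', 'visited_b']]
```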
TokenPlan
Our planner, TokenPlan [MF01, MF00], is designed to take advantage of state space
splitting. The notion of “splitting” was first presented in [Kam97].
The splitting strategy is controlled by the user, who provides some simple splitting
rules in the PDDL [MC98] domain description. To explain our notion of splitting,
consider the following example:
A Forward State Space (FSS) search performs full splitting: as soon as an action is
introduced into a plan prefix (narrowing down the current set of potential plans), the
resulting set is pushed into a new branch of the search tree. Conversely, a (disjunctive)
Graphplan-like approach does no splitting at all: all possible actions are introduced
together, and the set of all potential plans is considered as a whole as planning continues.
The classes introduced in TokenPlan make it possible to achieve an intermediate splitting, between Graphplan and FSS, adapted to each problem.
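The contrast between full splitting, no splitting and class-based splitting can be illustrated with a toy grouping function. This is a hypothetical illustration of the idea, not how TokenPlan represents its search space: a splitting key (our own assumption) decides which states share a node, and each node is summarized by the union of its propositions, as in a Graphplan level.

```python
from typing import Callable, Dict, FrozenSet, Hashable, List, Set

State = FrozenSet[str]


def split(states: List[State], key: Callable[[State], Hashable]) -> List[FrozenSet[str]]:
    """Group candidate states into search nodes by a splitting key.

    Each node is summarized, Graphplan-style, by the union of the
    propositions of the states it groups together."""
    groups: Dict[Hashable, Set[str]] = {}
    for s in states:
        groups.setdefault(key(s), set()).update(s)
    return [frozenset(g) for g in groups.values()]


# Three states a plan prefix might lead to (hypothetical example).
STATES = [
    frozenset({"robot_at_a", "light_on"}),
    frozenset({"robot_at_a"}),
    frozenset({"robot_at_c"}),
]

full = split(STATES, key=lambda s: s)          # FSS: one node per state
no_split = split(STATES, key=lambda s: None)   # Graphplan-like: a single node
by_location = split(                           # intermediate: one class per location
    STATES, key=lambda s: frozenset(p for p in s if p.startswith("robot_at_")))
print(len(full), len(no_split), len(by_location))  # 3 1 2
```

With no splitting, robot_at_a and robot_at_c end up in the same node even though they are mutually exclusive; grouping by the robot's location keeps them apart while still merging the two states in room a.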
TokenPlan transforms a planning domain written in PDDL into an interpreted Petri net.
What matters is not the Petri net itself, but the fact that a Petri net works with
places, transitions and tokens. It is precisely these tokens that are essential to
the splitting process: they carry the information (classes and colors) that we use to
construct the planning graph. There are action tokens and proposition tokens, equivalent
to Graphplan's action and proposition nodes.
When a place contains one or more tokens, it is said to be marked. When a transition
is triggered, one token from each of its input places is “consumed”, and all of its output
places are marked. This new marking may allow other transitions to be triggered, and
so on. Planning proceeds as follows: the marking of the initial state is introduced into
the net (one token per proposition). The next marking is obtained by triggering every
possible transition from the initial state. The following marking is obtained by
propagating tokens from this new marking in exactly the same way, and so on. By recording
the positions of the tokens (the marking) and their moves step after step, we obtain a
leveled graph very similar to Graphplan's. Some of the mutex relations are easily
encoded by the colors of tokens. Tokens can also belong to one or several classes, and each
transition may modify a token's class. This mechanism makes it possible to split the
search space into classes. The solution plan is then extracted using CSP tools.
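The level-by-level token propagation can be sketched as below. This is a minimal sketch under our own assumptions, not TokenPlan's Petri-net interpreter: a token is a (place, class) pair, a transition fires in class c when every input place holds a class-c token, a transition may give a new class to the tokens it produces, and tokens persist between levels like Graphplan's no-ops. The names `Transition`, `step` and `build_levels` are hypothetical.

```python
from typing import FrozenSet, List, NamedTuple, Optional, Tuple

Token = Tuple[str, str]       # (place, class)
Marking = FrozenSet[Token]


class Transition(NamedTuple):
    name: str
    inputs: FrozenSet[str]            # input places
    outputs: FrozenSet[str]           # output places
    new_class: Optional[str] = None   # class of produced tokens (None: keep input class)


def step(marking: Marking, transitions: List[Transition]) -> Marking:
    """Trigger every enabled transition once and return the next marking.

    Assumptions: a transition is enabled in class c when each input place
    holds a token of class c, and tokens persist from one level to the
    next (like Graphplan's no-ops), so markings only grow."""
    produced = set()
    classes = {c for _, c in marking}
    for t in transitions:
        for c in classes:
            if all((p, c) in marking for p in t.inputs):
                out_class = t.new_class if t.new_class is not None else c
                produced |= {(p, out_class) for p in t.outputs}
    return marking | frozenset(produced)


def build_levels(initial: Marking, transitions: List[Transition],
                 max_levels: int = 50) -> List[Marking]:
    """Record successive markings until a fixpoint: a leveled graph."""
    levels = [initial]
    for _ in range(max_levels):
        nxt = step(levels[-1], transitions)
        if nxt == levels[-1]:
            break
        levels.append(nxt)
    return levels


# Hypothetical net: t1 and t2 both use the single token in p, so q and r
# are alternatives; t3 needs q and r together.
NET = [
    Transition("t1", frozenset({"p"}), frozenset({"q"}), new_class="left"),
    Transition("t2", frozenset({"p"}), frozenset({"r"}), new_class="right"),
    Transition("t3", frozenset({"q", "r"}), frozenset({"goal"})),
]
INIT: Marking = frozenset({("p", "root")})

with_classes = build_levels(INIT, NET)
without_classes = build_levels(INIT, [t._replace(new_class=None) for t in NET])
```

With classes, q and r are produced in different classes, so t3 never fires and the goal is never reached; if every transition keeps the input class, q and r coexist and the goal wrongly appears at level 2. The classes thus encode the mutex between q and r without computing it explicitly.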
Intermediate splitting is particularly valuable for optimization [FM00]. Indeed, it makes
it possible to group propositions according to the value they take together, and thus to
structure the search space into sets of states with an equivalent value with respect to
the criterion being optimized. One application is the optimization of a utility or a cost:
by grouping states with the same utility into the same class, it becomes possible to extract
an optimal plan (the one with the best utility) within a disjunctive approach.
Space splitting has applications beyond optimization: it could be used to handle
conditional effects or uncertainty, and to simplify the computation of some mutual
exclusions. The issue addressed here, however, is its use in computing a heuristic.
Our approach: using state space splitting to compute heuristics
We propose to use our splitting approach to compute a heuristic and then use it with an
informed search algorithm such as A*, IDA* or hill-climbing.
Our proposal is not to take into account all mutex relations, but only those that can
be simply encoded by colors and classes in the token propagation process. Typically,
classes and colors implicitly capture many permanent mutexes, such as the fact that a
given object cannot be in several places at once. Other splitting strategies will,
however, be studied.
Indeed, thanks to state space splitting, TokenPlan builds a deeper search graph than
Graphplan, developing more levels before backtracking for the first time in search of a
solution. TokenPlan assesses the interactions between actions more precisely. We plan
to use the number of levels obtained as a heuristic; it is admissible, since the first
solution will have at least that length.
Figure 1: Estimation of a heuristic with Graphplan. The heuristic provided by the length
(in number of levels) of Graphplan's planning graph is not very informative.
Figure 2: Estimation of a heuristic with TokenPlan. The heuristic computed by TokenPlan is
simply the number of levels in the planning graph.
On the other hand, developing more levels requires more computation time. Yet the
use of colors and classes makes it possible to filter the applicable actions at each level,
thus reducing the amount of work per level. Furthermore, it is the plan extraction phase of
TokenPlan, not the graph building phase, that remains the most time-consuming phase
of the planning process. Lastly, we will apply it to relaxed problems, for which the
graph building phase is very quick.
Thus, we expect that for a reasonable amount of work (to be assessed), TokenPlan, applied
to a relaxed problem, will output a more informative admissible heuristic than the
one given by the length of the parallel plan found by Graphplan for the same problem
(see Figures 1 and 2).
For example, if we adopt a full splitting approach (all possible states are then
present in the planning graph, so all interactions between actions are taken into
account), the planning process will not backtrack, because it evaluates all nodes until
it finds an exact solution. The depth of such a planning graph (in the full splitting case,
a tree) directly gives the length of a solution plan. Of course, a full splitting approach
is computationally very costly and is useless for computing a heuristic: it requires as
much work as finding the solution. Our approach is intermediate.
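Full splitting corresponds to a breadth-first forward state-space search in which each state is its own node. The sketch below (hypothetical names, same toy rooms domain as elsewhere in this illustration) returns the depth at which the goal is first reached, i.e., the length of a shortest plan, at the cost of exploring the state space exactly.

```python
from collections import deque
from typing import FrozenSet, List, NamedTuple, Optional


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    dele: FrozenSet[str]


def full_splitting_depth(init: FrozenSet[str], goal: FrozenSet[str],
                         actions: List[Action]) -> Optional[int]:
    """Breadth-first forward search where every state is its own node.

    Deletes are applied exactly, so the depth at which the goal is first
    reached is the length of a shortest sequential plan (None if none)."""
    frontier = deque([(init, 0)])
    seen = {init}
    while frontier:
        state, depth = frontier.popleft()
        if goal <= state:
            return depth
        for a in actions:
            if a.pre <= state:
                nxt = (state - a.dele) | a.add
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, depth + 1))
    return None


def move(x: str, y: str) -> Action:
    """Hypothetical toy domain: a robot moves between adjacent rooms."""
    return Action(f"move_{x}_{y}", frozenset({f"at_{x}"}),
                  frozenset({f"at_{y}", f"visited_{y}"}), frozenset({f"at_{x}"}))


ACTS = [move("a", "b"), move("b", "a"), move("b", "c"), move("c", "b")]
INIT = frozenset({"at_a", "visited_a"})
GOAL = frozenset({"at_a", "visited_c"})
print(full_splitting_depth(INIT, GOAL, ACTS))  # 4
```

For the same problem, a relaxed planning graph that ignores delete lists already contains both at_a and visited_c at level 2.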
To compute our heuristic, our idea is to partially ignore the delete lists and to
propagate some mutex relations (using colored tokens and some classes). The depth of
the resulting planning graph would then give an admissible heuristic, one that is more
informative than the depth of the graph produced by Graphplan for the same relaxed
problem. The computed heuristic can be used with optimal search methods such
as the A* algorithm.
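One possible reading of “partially ignoring the delete lists” can be sketched as follows. This is our own illustrative construction, not TokenPlan's algorithm: a class is the exact set of true propositions drawn from a user-chosen set `split_props` (here, the robot's location, the kind of permanent mutex mentioned above), whose deletes are honored by moving to another class, while all other propositions accumulate within a class with their deletes ignored. Each class over-approximates the non-split facts of every real state with the same split part reachable within that many steps, so the returned level never exceeds the optimal plan length.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Optional, Set


class Action(NamedTuple):
    name: str
    pre: FrozenSet[str]
    add: FrozenSet[str]
    dele: FrozenSet[str]


def split_levels(init: FrozenSet[str], goal: FrozenSet[str], actions: List[Action],
                 split_props: FrozenSet[str], max_levels: int = 100) -> Optional[int]:
    """Number of levels until some class contains every goal (hypothetical).

    An empty split_props gives the plain relaxed level count; splitting on
    every proposition amounts to breadth-first full splitting."""
    layer: Dict[FrozenSet[str], FrozenSet[str]] = {
        frozenset(init & split_props): frozenset(init - split_props)}
    for level in range(max_levels + 1):
        if any(goal <= (c | facts) for c, facts in layer.items()):
            return level
        nxt: Dict[FrozenSet[str], Set[str]] = {c: set(f) for c, f in layer.items()}
        for c, facts in layer.items():
            for a in actions:
                if a.pre <= (c | facts):
                    # Deletes are applied only to split propositions (class change).
                    c2 = (c - a.dele) | (a.add & split_props)
                    nxt.setdefault(c2, set()).update(facts | (a.add - split_props))
        frozen = {c: frozenset(f) for c, f in nxt.items()}
        if frozen == layer:
            return None  # fixpoint reached without the goal
        layer = frozen
    return None


def move(x: str, y: str) -> Action:
    """Hypothetical toy domain: a robot moves between adjacent rooms."""
    return Action(f"move_{x}_{y}", frozenset({f"at_{x}"}),
                  frozenset({f"at_{y}", f"visited_{y}"}), frozenset({f"at_{x}"}))


ACTS = [move("a", "b"), move("b", "a"), move("b", "c"), move("c", "b")]
INIT = frozenset({"at_a", "visited_a"})
GOAL = frozenset({"at_a", "visited_c"})
LOCATIONS = frozenset({"at_a", "at_b", "at_c"})
EVERYTHING = LOCATIONS | {"visited_a", "visited_b", "visited_c"}
print([split_levels(INIT, GOAL, ACTS, s) for s in (frozenset(), LOCATIONS, EVERYTHING)])
```

On the toy domain, an empty `split_props` reproduces the relaxed estimate of 2, splitting on the robot's location already yields the exact value 4, and splitting on every proposition amounts to full splitting.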
The main difficulty is to find a proper splitting strategy. So far, we have used splitting
rules adapted to optimization problems. To compute our admissible heuristic,
we need to split the search space in a way that maximizes the depth of the search
graph when it reaches all the goals (the first backtracking point in the original TokenPlan),
while keeping computation time reasonable and without overestimating the plan
length. A compromise must be found.
Perspectives
The planning graph structure used by TokenPlan could provide a means of computing
admissible heuristics. More precisely, we propose to study the benefits of using token
propagation to avoid computing a certain number of mutex relations while still taking
some negative interactions into account.
This means that our planning graph would take into account all positive interactions
between actions, and also part of the negative interactions. The length of the resulting
planning graph would provide an informative, admissible heuristic.
We must now begin an experimental stage to determine how efficient this approach is. We
have to study the different ways of using search space splitting to encode some mutex
relations. We must also study it on various classical planning domains, to compare
domains with strong interactions between subgoals and domains without.
References
[BF97] A.L. Blum and M.L. Furst. Fast planning through planning graph analysis.
Artificial Intelligence, 90(1–2):281–300, 1997.
[BG98] B. Bonet and H. Geffner. HSP: Heuristic search planner. In Planning Competition of
the 4th International Conference on Artificial Intelligence Planning and Scheduling (AIPS-98), 1998.
[BG99] B. Bonet and H. Geffner. Planning as heuristic search: New results. In
ECP-99, pages 360–372, 1999.
[BG01] B. Bonet and H. Geffner. Heuristic search planner 2.0. AI Magazine,
22(3):77–80, 2001.
[FM00] P. Fabiani and Y. Meiller. Planning with tokens. In ECAI Workshop on New
Results in Planning, Scheduling and Design (PuK2000), 2000.
[HG00] P. Haslum and H. Geffner. Admissible heuristics for optimal planning. In
Artificial Intelligence Planning Systems, pages 140–149, 2000.
[Hof00] J. Hoffmann. A heuristic for domain independent planning and its use in an
enforced hill-climbing algorithm. In ISMIS-00, pages 216–227, 2000.
[Hof01] J. Hoffmann. FF: The fast-forward planning system. AI Magazine,
22(3):57–62, 2001.
[Kam97] S. Kambhampati. Challenges in bridging plan-synthesis paradigms. In
IJCAI-97, 1997.
[LF99] D. Long and M. Fox. Efficient implementation of the plan graph in STAN.
Journal of Artificial Intelligence Research, 10:87–115, 1999.
[MC98] D. McDermott and AIPS-98 Planning Competition Committee. PDDL: The
Planning Domain Definition Language, Version 1.2, 1998.
[MF00] Y. Meiller and P. Fabiani. Planning with Petri nets. In RJCIA-00, Lyon,
September 2000.
[MF01] Y. Meiller and P. Fabiani. TokenPlan: A planner for both satisfaction and
optimization problems. AI Magazine, 22(3):85–87, 2001.
[NK01] X.L. Nguyen and S. Kambhampati. Reviving partial order planning. In
IJCAI-01, 2001.
[NNK00] R.S. Nigenda, X.L. Nguyen, and S. Kambhampati. AltAlt: Combining the
advantages of Graphplan and heuristic state search. In International Conference on
Knowledge-based Computer Systems, 2000.
[Wel99] D.S. Weld. Recent advances in AI planning. AI Magazine, 20(2):93–123,
1999.