Proceedings of the Eighth International Symposium on Combinatorial Search (SoCS-2015)

No One SATPlan Encoding To Rule Them All

Tomáš Balyo, Karlsruhe Institute of Technology, Karlsruhe, Germany, tomas.balyo@kit.edu
Roman Barták, Charles University in Prague, Faculty of Mathematics and Physics, Prague, Czech Republic, bartak@ktiml.mff.cuni.cz

Abstract

Solving planning problems via translation to propositional satisfiability (SAT) is one of the most successful approaches to automated planning. An important aspect of this approach is the encoding, i.e., the construction of a propositional formula from a given planning problem instance. Numerous encoding schemes have been proposed in recent years, each aiming to outperform the previous encodings on the majority of the benchmark problems. In this paper we take a different approach. Instead of trying to develop a new encoding that is better for all kinds of benchmarks, we take recently developed specialized encoding schemes and design a method to automatically select the proper encoding for a given planning problem instance. In the paper we also examine ranking heuristics for the Relaxed Relaxed Exists-Step encoding, which plays an important role in our algorithm. Experiments show that our new approach significantly outperforms the state-of-the-art encoding schemes when compared on the benchmarks of the 2011 International Planning Competition.

Introduction

One of the most successful approaches to automated planning is encoding the planning problem into a series of satisfiability (SAT) formulas and then using a SAT solver to solve them. The method was first introduced by Kautz and Selman (Kautz and Selman 1992) and is still very popular and competitive. This is partly due to the power of SAT solvers, which are getting more efficient year by year. Since then many improvements have been made to the method, such as new compact and efficient encodings (Huang et al. 2010; Rintanen et al. 2006; Robinson et al. 2009; Balyo 2013), better ways of scheduling the SAT solvers (Rintanen et al. 2006), and modifications of the SAT solver's heuristics that make it more suitable for solving planning problems (Rintanen 2012).

In this paper we focus on the problem of encoding, i.e., the construction of propositional formulas from planning problem instances. The traditional approach is to design a universal encoding that works well for the largest subset of the benchmark problems and then use that encoding for each problem. In contrast, our approach is to select a set of diverse encoding schemes and design a rule that can choose the best encoding from this set for a given problem instance. The idea is closely related to the approach of sequential planning portfolios (Seipp et al. 2012). We also revisit action ranking, which is a crucial aspect of the Relaxed Relaxed Exists-Step encoding used in our set of encodings.

In the experimental section of the paper we compare our new method to state-of-the-art encodings on the complete set of benchmark problems from the 2011 International Planning Competition (IPC) (Coles et al. 2012). The experimental results show that our new approach is a significant improvement over the state-of-the-art encodings with respect to both the number of solved problems and the time required to solve them.

Preliminaries

In this section we give the formal definitions of automated planning. We will use the multivalued SAS+ formalism (Bäckström and Nebel 1995) instead of the classical STRIPS formalism (Fikes and Nilsson 1971) based on propositional logic. A planning task Π in the SAS+ formalism is defined as a tuple Π = (X, O, sI, sG) where

• X = {x1, ..., xn} is a set of multivalued variables with finite domains dom(xi).
• O is a set of actions (or operators). Each action a ∈ O is a tuple (pre(a), eff(a)) where pre(a) is the set of preconditions of a and eff(a) is the set of effects of a.
Both preconditions and effects are of the form xi = v where v ∈ dom(xi).
• A state is a set of assignments to the state variables. Each state variable has exactly one value assigned from its respective domain. We denote by S the set of all states. sI ∈ S is the initial state. sG is a partial assignment of the state variables (not all variables have assigned values), and a state s ∈ S is a goal state if sG ⊆ s.

An action a is applicable in a given state s if pre(a) ⊆ s. By s′ = apply(a, s) we denote the state after executing the action a in the state s, where a is applicable in s. All the assignments in s′ are the same as in s except for the assignments in eff(a), which replace the corresponding (same variable) assignments in s. If P = [a1, ..., ak] is a sequence of actions, then apply(P, s) = apply(ak, apply(ak−1, ... apply(a2, apply(a1, s)) ...)). A sequential plan P of length k for a given planning task Π is a sequence of k actions P such that sG ⊆ apply(P, sI).

Copyright © 2015, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Parallel Plans

A parallel plan P with makespan k for a given planning task Π is a sequence of sets of actions (called parallel steps) P = [A1, ..., Ak] such that ord(A1) ⊕ · · · ⊕ ord(Ak) is a sequential plan for Π, where ord is an ordering function that transforms a set of actions Ai into a sequence of actions ord(Ai), and ⊕ denotes the concatenation of sequences. Let us denote by sj the world state between the parallel steps Aj and Aj+1, which is obtained by applying the sequence ord(Aj) to sj−1, i.e., sj = apply(ord(Aj), sj−1), with s0 = sI.

Which actions can be together in a parallel step is defined by four parallel planning semantics: "for all" (∀) (Kautz and Selman 1996), "exists" (∃) (Rintanen et al. 2006), "relaxed exists" (R∃) (Wehrle and Rintanen 2007), and "relaxed relaxed exists" (R²∃) (Balyo 2013). The requirements of the semantics are summarized in the following table.

Requirement                                                       ∀   ∃   R∃   R²∃
All possible orderings of the sets Aj make valid plans            ✓
Each action a ∈ Aj is applicable in the state sj−1                ✓   ✓
The effects of all actions a ∈ Aj are applied in sj               ✓   ✓   ✓
There exists an ordering of the sets Aj that makes a valid plan   ✓   ✓   ✓    ✓

The following example demonstrates the differences between the four semantics and how the more relaxed semantics can allow plans with shorter makespans.

Example 1. We will model a simple package delivery scenario with a truck that needs to deliver two packages to location C from locations A and B. In the beginning the truck is located in A. The locations A, B, and C are all connected by direct roads. We will model the planning task using the following variables:

• Truck location T, dom(T) = {A, B, C}
• Package locations P1 and P2, dom(P1) = dom(P2) = {A, B, C, L} (Pi = L represents that package i is on the truck).

Now, having defined the variables X = {T, P1, P2} and their respective domains, we can define the initial state sI and the goal conditions sG:

sI = {T = A, P1 = A, P2 = B}
sG = {P1 = C, P2 = C}

The action templates (operators) are described in the following table. To get the actual actions we need to substitute l, l1, l2 with A, B, and C, and i with 1 and 2.

Action          Preconditions     Effects
move(l1, l2)    T = l1            T = l2
loadPi(l)       T = l, Pi = l     Pi = L
unloadPi(l)     T = l, Pi = L     Pi = l

An optimal sequential plan for this planning problem is the following: P* = [loadP1(A), move(A, B), loadP2(B), move(B, C), unloadP1(C), unloadP2(C)]. The four semantics allow four different parallel plans:

• The R²∃-Step semantics allows the single-step plan [{loadP1(A), move(A, B), loadP2(B), move(B, C), unloadP1(C), unloadP2(C)}].
• The Relaxed ∃-Step semantics forbids having both move actions inside one step, since the effect of the first move action is not applied after the step. Therefore the shortest possible Relaxed ∃-Step plan is the following two-step parallel plan: [{loadP1(A), move(A, B), loadP2(B)}, {move(B, C), unloadP1(C), unloadP2(C)}].
• The ∃-Step semantics does not allow the second load action to be inside the first step, since its precondition is not satisfied in the initial state. Similarly, the unload actions cannot be in the same step as the second move action. Therefore the plan needs to have three parallel steps: [{loadP1(A), move(A, B)}, {loadP2(B), move(B, C)}, {unloadP1(C), unloadP2(C)}].
• To satisfy the ∀-Step semantics all actions, except for the unload actions, must be in separate parallel steps. The following five-step plan is possible: [{loadP1(A)}, {move(A, B)}, {loadP2(B)}, {move(B, C)}, {unloadP1(C), unloadP2(C)}].

Finding Plans using SAT

The basic idea of solving planning as SAT is the following (Kautz and Selman 1992). We construct (by encoding the planning task) a series of SAT formulas F1, F2, ... such that if Fi is satisfiable then there is a parallel plan of makespan ≤ i. Then we solve them one by one, starting from F1, until we reach the first satisfiable formula Fk. From the satisfying assignment of Fk we can extract a plan of makespan k. There exist more advanced algorithms for scheduling the solving of the Fi formulas (Rintanen et al. 2006), but since our goal is only to compare different encoding methods we will use the basic sequential scheduling algorithm.

Action Ranking for the R²∃-Step Encoding

The recently introduced (Balyo 2013) Relaxed Relaxed Exists-Step encoding, which approximates the R²∃-Step parallel planning semantics, has a very important aspect that was not addressed adequately in the original publication. This aspect is the ranking of actions, and since it is an important part of our proposed method, we take a more detailed look at it in this paper. The Relaxed Relaxed Exists-Step encoding only approximates the R²∃-Step semantics because it works with a fixed ordering of actions.
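To illustrate the role of such a fixed ordering, the following minimal Python sketch (illustrative only; the names and data layout are our assumptions, not the planner's actual code) linearizes a parallel plan by sorting each parallel step according to a given action ranking and checking that the resulting sequence is executable:

```python
# Illustrative sketch, not the paper's implementation: SAS+ states are
# dicts mapping variables to values; an action is a dict with a name,
# preconditions, and effects (both partial assignments).

def applicable(action, state):
    """An action is applicable if all its preconditions hold in the state."""
    return all(state.get(var) == val for var, val in action["pre"].items())

def apply_action(action, state):
    """Return the successor state: copy the state, overwrite effect variables."""
    successor = dict(state)
    successor.update(action["eff"])
    return successor

def linearize(parallel_plan, rank, state):
    """Order every parallel step by the ranking and execute the actions.

    Returns the sequential plan (as action names) if the chosen ranking
    yields an executable sequence, otherwise None.
    """
    sequence = []
    for step in parallel_plan:
        for action in sorted(step, key=lambda a: rank[a["name"]]):
            if not applicable(action, state):
                return None  # this particular ranking does not linearize the plan
            state = apply_action(action, state)
            sequence.append(action["name"])
    return sequence
```

For the single-step R²∃-Step plan of Example 1, this linearization succeeds exactly when the ranking orders the six actions as in the sequential plan P*, which is why the choice of ranking matters.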
This ordering is also used when the sequential plan is constructed from the action sets of the parallel plan obtained from the satisfying assignment of the formula Fk. To truly represent the R²∃-Step parallel planning semantics the encoding would have to consider all possible rankings (orderings) of actions. Since we work with just one particular ordering, great care must be given to its selection.

The ordering of actions is defined using action ranks. A ranking of actions is an assignment of a unique integer rank r(a) to each action a ∈ O. For the correctness of the encoding the ranking can be arbitrary; however, the choice of ranking affects the performance of the planning process dramatically. Intuitively, the actions should be ranked according to their order in a valid plan for the given planning task. The problem is, of course, that we do not know the plan in advance. Therefore, we should try to guess the order in which the given actions could appear in a plan.

In the original R²∃-Step paper (Balyo 2013) only one ranking heuristic was described and used: the Topological Ranking. In this paper we describe new ranking methods and compare them experimentally. Altogether we examine the following ranking heuristics.

• Random Ranking. The ranks are assigned randomly to actions; each action has a unique rank. The purpose of this method is to serve as a baseline for comparison with the other methods.
• Input Ranking. The ranks are assigned from 1 to n = |O| to the actions in the order in which they are listed in the input, i.e., in the definition of the planning task. The intuition behind this method is that the author of the problem may have defined the actions in the order they are expected to appear in the plans.
• Inverted Input Ranking. First we take the Input Ranking, then we invert the ranks: r(a) := n − r(a), where n is the number of actions. If the Input Ranking is a 'good' ranking, then this ranking is supposed to be bad.
• Topological Ranking. The original ranking method of (Balyo 2013), based on topologically ordering the action enabling graph while ignoring cycles.
• Inverted Topological Ranking. We invert the Topological Ranking in the same way we inverted the Input Ranking.

The weakness of all these ranking methods is that they consider only the actions of a planning task. For example, the initial state and the goal conditions are not used at all. Designing better ranking heuristics is a matter of future work.

Ranking Comparison

We ran the basic SAT planning algorithm with the R²∃-Step encoding using the five ranking methods described above, with a time limit of five minutes for SAT solving (time for ranking and encoding is not counted). The details of the experimental environment are given in the Experiments section below. The number of solved problems (out of 20 in each domain) and the total makespan of the found plans are presented in Table 1. Note that the makespan represents the number of propositional formulas that were solved to obtain the parallel plan, not the length of the final sequential plans.

Table 1: Problems solved (#P) and their total makespan (Mks).

Domain        TSort     TSort−1   Input     Input−1   Random
              #P/Mks    #P/Mks    #P/Mks    #P/Mks    #P/Mks
barman        4/36      2/29      4/60      2/28      1/11
elevators     20/85     20/99     20/106    20/79     20/75
floortile     17/158    18/185    16/149    18/178    18/167
nomystery     3/14      4/20      3/13      6/33      3/14
openstacks    12/75     13/66     20/59     5/43      10/57
parcprinter   20/30     20/249    20/88     20/186    20/140
parking       0/0       0/0       0/0       0/0       0/0
pegsol        19/158    18/155    12/147    16/142    18/152
scanalyzer    6/11      9/16      7/12      6/13      6/12
sokoban       1/17      1/19      1/18      1/17      1/19
tidybot       1/1       1/1       1/1       1/1       1/1
transport     5/20      6/40      8/44      9/57      4/19
visitall      20/34     12/113    9/55      9/49      12/80
woodworking   20/33     20/57     20/58     20/30     20/40
Total         148       144       141       133       134

The Topological Ranking (TSort) method solved the highest number of problems, while the Inverted Input and Random methods were the least successful. Nevertheless, there are five domains where TSort was outperformed. Most notable among these is the openstacks domain, where the Input ranking solved almost twice as many problems while its total makespan remained low. Focusing on the makespans, we can observe that in several cases TSort maintains the smallest total makespan while solving as many or more problems than the other methods. This is most apparent for the parcprinter and visitall domains. Overall, we can conclude that the action ranking affects the performance of planning very significantly and that different ranking methods work well for different domains. Nevertheless, TSort appears to be the best choice in general, while Input provides great performance on some domains; therefore we will use these two in our algorithm.

Encoding Selection

As we described in the introduction, our algorithm consists of a set of encodings E and a selection rule. The task of the selection rule is to select an encoding from E that should be used to solve a given planning task, based on the planning task itself and other available information (for example the makespan parameter of the formula that we want to construct). For our algorithm we use two encodings: the Reinforced encoding (Balyo et al. 2015), which represents the ∀-Step parallel planning semantics, and the R²∃-Step encoding (Balyo 2013) with two ranking methods, Topological and Input Ranking.

The selection rule is based on our experimental observations, which suggest that the R²∃-Step encoding performs best if the number of transitions is low. A transition (Huang et al. 2010) is a change of a state variable, and the set of transitions of a planning problem instance is defined by its actions. The selection rule decides based on the average number of transitions, i.e., the total number of transitions (|∆|) divided by the number of state variables (|X|) in the given planning task. Based on our experiments we decided to use the threshold value 10. Therefore the first part of the rule is: if |∆|/|X| > 10 then use the Reinforced encoding, otherwise use the R²∃-Step encoding. Furthermore, when the R²∃-Step encoding is used, we alternate between two action ranking heuristics: for even makespans we use the Topological Ranking method and for odd makespans we use the Input Ranking method (ranking the actions in the order they are defined in the input problem).

Experiments

The experiments were run on a computer with an Intel i7 920 CPU @ 2.67 GHz and 6 GB of memory. The benchmark problems are from the optimal track of the 2011 International Planning Competition (IPC) (Coles et al. 2012), translated from PDDL to SAS+ using Fast Downward (Helmert 2006). To solve the instances we used the SAT solver Lingeling (Biere 2014) (version ats). We ran experiments with a 30-minute time limit for SAT solving using the following six encodings. We used the Reinforced Encoding (Reinf) (Balyo et al. 2015) and the R²∃-Step Encoding (R²∃), which are the two components of the Selective Encoding (Sel). We also used Rintanen's ∀-Step (R∀) and ∃-Step (R∃) encodings (Rintanen et al. 2006) from his popular state-of-the-art SAT-based planning system Madagascar (Rintanen 2013). Best of Rintanen (R*) is a virtual best encoding obtained by selecting the better result for each instance from the results of R∀ and R∃.

Table 2: Problems solved (out of 20) in 30 minutes.

Domain        Reinf   R²∃   Sel   R∀    R∃    R*
barman        4       8     9     8     4     8
elevators     20      20    20    20    20    20
floortile     18      18    18    16    20    20
nomystery     20      6     20    20    20    20
openstacks    0       15    20    0     0     0
parcprinter   20      20    20    20    20    20
parking       0       0     0     0     0     0
pegsol        10      19    19    11    12    12
scanalyzer    15      9     15    17    18    18
sokoban       2       2     2     6     6     6
tidybot       2       2     2     13    15    15
transport     18      13    19    18    18    18
visitall      10      20    20    11    11    11
woodworking   20      20    20    20    20    20
Total         159     172   204   180   184   188

The number of solved instances is presented in Table 2. We can observe that the openstacks domain is very difficult for all but the R²∃-Step encoding (and Selective, which also uses it). The Selective encoding solves more than the R²∃-Step encoding thanks to using two kinds of action ranking instead of just one. The sokoban and tidybot domains are very hard for all of our encodings, while the encodings of Rintanen handle them much better. If we compare the encodings, we can observe that the best results are obtained by the Selective encoding, followed by the Rintanen encodings. The Selective encoding very successfully combines the Reinforced and R²∃-Step encodings and is never worse on any domain than either of its components. The strength of the Selective encoding is that it can properly select the Reinforced encoding for the domains and problems where the R²∃-Step encoding is not efficient. Although R∀ and R∃ are the best individual encodings, their combination (R*) does not bring any significant improvement. This suggests that they are not as diverse in the sense of problem coverage.

In Figure 1 we provide a so-called cactus plot that visually compares the six tested encodings. The plot shows how many problems can be solved within a given time limit. It is interesting that the lines for the R²∃-Step and the Rintanen ∀ encodings cross each other several times, which means that for certain time limits R²∃-Step would solve more instances than the Rintanen ∀ encoding. The cactus plot confirms that the Selective method significantly outperforms all the other methods; furthermore, this holds for any reasonable time limit.

Figure 1: Coverage per time limit. [Cactus plot: x-axis "Solved Problems" (0 to 250), y-axis "Time in seconds" (0 to 1800); one line per encoding: Reinforced, R²∃, Rint. Forall, Rint. Exists, Rint. Best, Selective.]

Conclusion

In this paper we demonstrated that the idea of combining encodings, by designing a simple rule that selects the more suitable encoding for a given planning problem instance, is a promising research direction for improving the performance of SAT-based planning algorithms. Our experiments also revealed that simply combining the best state-of-the-art encodings (of Rintanen) does not lead to the best results. The presented rule is very simple and the set of encodings used for the selection is relatively small. Therefore we believe further improvements could be achieved by extending the set of encodings with novel or existing encodings and by finding more sophisticated selection rules. Another promising topic for future work is finding better ranking heuristics for the Relaxed Relaxed Exists-Step encoding. The current ones are very simplistic and do not use all the available information about the problem instance (such as the initial state or the goal conditions).

Acknowledgments

This research was partially supported by DFG project SA 933/11-1. Roman Barták is supported by the Czech Science Foundation under the project P10315-19877S.

References

Bäckström, C., and Nebel, B. 1995. Complexity results for SAS+ planning. Computational Intelligence 11:625–656.
Balyo, T.; Barták, R.; and Trunda, O. 2015. Reinforced encoding for planning as SAT. In Acta Polytechnica CTU Proceedings.
Balyo, T. 2013. Relaxing the relaxed exist-step parallel planning semantics. In ICTAI, 865–871. IEEE.
Biere, A. 2014. Lingeling and Plingeling home page. http://fmv.jku.at/lingeling/.
Coles, A. J.; Coles, A.; Olaya, A. G.; Celorrio, S. J.; López, C. L.; Sanner, S.; and Yoon, S. 2012. A survey of the seventh international planning competition. AI Magazine 33(1).
Fikes, R., and Nilsson, N. J. 1971. STRIPS: A new approach to the application of theorem proving to problem solving.
Artif. Intell. 2(3/4):189–208.
Helmert, M. 2006. The Fast Downward planning system. Journal of Artificial Intelligence Research (JAIR) 26:191–246.
Huang, R.; Chen, Y.; and Zhang, W. 2010. A novel transition based encoding scheme for planning as satisfiability. In Fox, M., and Poole, D., eds., AAAI. AAAI Press.
Kautz, H. A., and Selman, B. 1992. Planning as satisfiability. In ECAI, 359–363.
Kautz, H., and Selman, B. 1996. Pushing the envelope: Planning, propositional logic, and stochastic search. In Proceedings of the Thirteenth National Conference on Artificial Intelligence - Volume 2, AAAI'96, 1194–1201. AAAI Press.
Rintanen, J.; Heljanko, K.; and Niemelä, I. 2006. Planning as satisfiability: parallel plans and algorithms for plan search. Artif. Intell. 170(12-13):1031–1080.
Rintanen, J. 2012. Planning as satisfiability: Heuristics. Artif. Intell. 193:45–86.
Rintanen, J. 2013. Planning as satisfiability: state of the art. http://users.cecs.anu.edu.au/~jussi/satplan.html.
Robinson, N.; Gretton, C.; Pham, D. N.; and Sattar, A. 2009. SAT-based parallel planning using a split representation of actions. In Gerevini, A.; Howe, A. E.; Cesta, A.; and Refanidis, I., eds., ICAPS. AAAI.
Seipp, J.; Braun, M.; Garimort, J.; and Helmert, M. 2012. Learning portfolios of automatically tuned planners. In McCluskey, L.; Williams, B.; Silva, J. R.; and Bonet, B., eds., Proceedings of the Twenty-Second International Conference on Automated Planning and Scheduling, ICAPS 2012, Atibaia, São Paulo, Brazil, June 25-29, 2012. AAAI.
Wehrle, M., and Rintanen, J. 2007. Planning as satisfiability with relaxed exist-step plans. In Orgun, M. A., and Thornton, J., eds., Australian Conference on Artificial Intelligence, volume 4830 of Lecture Notes in Computer Science, 244–253. Springer.