Proceedings of the Twentieth International Conference on Automated Planning and Scheduling (ICAPS 2010)

The Joy of Forgetting: Faster Anytime Search via Restarting

Silvia Richter
IIIS, Griffith University, Australia and NICTA
silvia.richter@nicta.com.au

Jordan T. Thayer and Wheeler Ruml
Department of Computer Science, University of New Hampshire
{jtd7, ruml} at cs.unh.edu

Copyright © 2010, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Anytime search algorithms solve optimisation problems by quickly finding a (usually suboptimal) first solution and then finding improved solutions when given additional time. To deliver an initial solution quickly, they are typically greedy with respect to the heuristic cost-to-go estimate h. In this paper, we show that this low-h bias can cause poor performance if the greedy search makes early mistakes. Building on this observation, we present a new anytime approach that restarts the search from the initial state every time a new solution is found.
We demonstrate the utility of our method via experiments in PDDL planning as well as other domains, and show that it is particularly useful for problems where the heuristic has systematic errors.

Introduction

Heuristic search is a widely used framework for solving shortest-path problems. In particular, given a consistent heuristic and sufficient resources, the A* algorithm (Hart et al. 1968) can be used to find an optimal solution with maximum efficiency (Dechter and Pearl 1988). However, in large problems we may not be able to afford the resources necessary for calculating an optimal solution. A popular approach in this circumstance is complete anytime search, in which a (typically suboptimal) solution is found quickly, followed over time by a sequence of progressively better solutions, until the search is terminated or solution optimality has been proven.

A* can be modified to trade solution quality for speed by weighting the heuristic by a factor w > 1 (Pohl 1970). The resulting algorithm, called weighted A* or WA* for short, searches more greedily the larger w is. Assuming an admissible heuristic, the cost of the solution it finds differs from the optimal by no more than the factor w. This ability to balance speed against quality while providing bounds on the suboptimality of solutions makes WA* an attractive basis for anytime algorithms (Hansen et al. 1997; Likhachev et al. 2004; Hansen and Zhou 2007; Likhachev et al. 2008). For example, WA* can be run first with a very high weight, resulting in a greedy best-first search that tries to find a solution as quickly as possible. Then the search can be continued past the first solution to find better ones. By reducing the weight over time, the search can be made progressively less greedy and more focused on quality.

Complete anytime algorithms that are not based on WA* have also been proposed (Zhang 1998; Zhou and Hansen 2005; Aine et al. 2007). In these cases, an underlying global search algorithm is usually made more greedy and local by initially restricting the set of states that can be expanded, thus giving a depth-first emphasis to the search which can lead to finding a first solution quickly. Whenever a solution is found, the restrictions are relaxed, allowing for higher-quality solutions to be found.

We start by noting that the greediness employed by anytime algorithms to quickly find a first solution can also create problems for them. If the heuristic goal distance estimates are inaccurate, for example, the search can be led into an area of the search space that contains no goal or only poor goals. Anytime algorithms typically continue their search after finding the first solution, rather than starting a new search. This seems reasonable as it avoids duplicate effort. However, a key observation that we will discuss in detail below is that continued search may perform badly in problems where the initial solution is far from optimal and finding a significantly better solution requires searching a different part of the search space. In such cases restarting the search can be beneficial, as it allows changing bad decisions that were made near the start state more quickly. Reconsidering these early decisions means the quality of the best-known solution may improve faster with restarts than if we continued to explore the area of the search space around the previous solution.

Building on this observation that forgetfulness can be bliss, we propose an anytime algorithm that incorporates helpful restarts while retaining most of the knowledge discovered in previous search phases. Our approach differs substantially from existing anytime approaches, as previous work on anytime algorithms has concentrated explicitly on avoiding duplicate work (Likhachev et al. 2004; 2008). We show that the opposite can be useful, namely discarding information that may be biased due to greediness. We show that our method outperforms existing anytime algorithms in PDDL planning and that it is competitive in other benchmark domains. Using an artificial search space, we elucidate conditions under which our approach performs well.

Previous Approaches

We first present some existing approaches and discuss the circumstances under which their effectiveness may suffer.

A* and WA*

Several anytime algorithms are based on Weighted A* (WA*), which is in turn based on A*. The A* algorithm uses two data structures: Open and Closed. In the beginning, Open contains only the start state and Closed is empty.
Iteratively one of the states from Open is selected, moved from Open to Closed, and expanded, and its successor states are inserted into Open. The state expanded at each step is one that minimises the function f (s) = g(s) + h(s), where g(s) is the cost of the best path currently known from the start state to state s, and h(s) is the heuristic value of s, i. e., an estimate of the cost of the best path from s to a goal. WA* is a variant of A* using the selection function f (s) = g(s) + w · h(s), with w > 1. Even if the heuristic function h is admissible, i. e., never overestimates the true goal distance, the weighting by w introduces inadmissibility. However, the solution is guaranteed to be w-admissible, i. e. the ratio between its cost and the cost of an optimal solution is bounded by w. Anytime Algorithms Based on WA* Anytime algorithms based on WA* run WA* but do not stop upon finding a solution. Rather, they report the solution and continue the search, possibly adjusting their data structures and parameters. A straightforward adjustment, for example, is to use the cost of the best known solution for pruning: after we have found a solution of cost c, we do not need to consider any states that are guaranteed to lead to solutions of cost c or more. This is the approach taken by Hansen and Zhou in their Anytime WA* algorithm (2007). Updating the cost bound is the only adjustment in Anytime WA*; the weight w is never decreased. (While the authors discuss this option, they did not find it to improve results on their benchmarks.) The Anytime Repairing A* (ARA*) algorithm by Likhachev et al. (2004; 2008) assumes an admissible heuristic and reduces the weight w each time it proves wadmissibility of the incumbent solution. This implicitly prunes the search space as no state is ever expanded whose f -value is larger than that of the incumbent solution. When reducing w, ARA* updates the f -values of all states in Open according to the new weight. Furthermore, ARA* avoids re-expanding states within search phases (where we use “phase” to denote the part of search between two weight changes). Whenever a shorter path to a state is discovered, and that state has already been expanded in the current search phase, the state is not re-expanded immediately. Instead it is suspended in a separate list, which is inserted into Open only at the beginning of the next phase. The rationale behind this is that even without re-expanding states the next solution found is guaranteed to be within the current sub-optimality bound (Likhachev et al. 2004; 2008). Of course, it may be that this solution is worse than it would have been had we re-expanded. The Effect of Low-h Bias Anytime WA* and ARA* keep the Open list between search phases (possibly re-ordering it). After finding a goal state sg , Open will usually contain many states that are close to sg in the search space because the ancestors of sg have been expanded; furthermore, those states are likely to have low heuristic values because of their proximity to sg . Hence, even if we update Open with new weights, we are likely to expand most of the states around sg before considering states that are close to the start state. This low-h bias can be a critical mistake if early decisions strongly influence final solution quality. Consider the small gridworld example in Fig. 1. The task is to reach a goal state (G1 or G2) from the start state S, where the agent can move with cost 1 to each of the 8 neighbours of a cell if they are not blocked. 
The heuristic has a systematic bias, underestimating goal distances in the left part of the grid. We start with weight 2 in Fig. 1a. Because the heuristic values to the left of S are lower than to the right of S, the search expands the states to the left and finds goal G1 with cost 6. The grey cells are generated, but not expanded in this search phase, i.e., they are in Open. After finding a solution with a cost of 6, in Fig. 1b the search continues with a reduced weight of 1.5. A solution with cost 5 consists in turning right from S and going to G2. However, the search will first expand all states in Open that have an f-value smaller than 7, including many nodes near the original path. After expanding a substantial number of states, the second solution it finds is a suboptimal path that shares several states with the first solution, again with cost 6. If we instead restart with an empty Open list after finding the first solution (Fig. 1c), far fewer states are expanded. The critical state to the right of S is expanded quickly and the optimal path is found. We will see this phenomenon again below in the experimental section of this paper.

[Figure 1 grid panels omitted: (a) initial search, w = 2; (b) continued search, w = 1.5; (c) restarted search, w = 1.5.]
Figure 1: The effect of low-h bias. For all grid states generated by the search, h-values are shown above f-values. (a) Initial WA* search finds a solution of cost 6. (b) Continued search expands many states around the previous Open list (grey cells), finding another sub-optimal solution of cost 6. (c) Restarted search quickly finds the optimal solution of cost 5.

Other Anytime Algorithms

Recent anytime algorithms that are not based on WA* include the A*-based Anytime Window A* algorithm (Aine et al. 2007) and beam-stack search (Zhou and Hansen 2005), which is based on breadth-first search. In both cases, the underlying global search algorithm is made more greedy by initially restricting the set of states that can be expanded. In Window A*, this is done by using a "window" that slides downwards in a depth-first fashion: whenever a state with a larger g-value (level) than previously expanded is expanded, the window slides down to that level of the search space. From then on, only states in that level and the k levels above can be expanded, where k is the height of the window. Initially k is zero and it is increased by 1 every time a new solution is found. Anytime Window A* can suffer from its strong depth-first focus if the heuristic estimates are inaccurate. In beam-stack search, only the b most promising nodes in each level of the search space are expanded, where the beam width b is a user-supplied parameter. The algorithm remembers which nodes have not yet been expanded and returns to them later. However, beam-stack search explores the entire search space below the chosen states before it backtracks on its decision. This may be inefficient if the heuristic is (locally) quite inaccurate and a wrong set of successor states is chosen, e.g. states from which no near-optimal solution can be reached.
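To make the beam-width mechanism concrete, the following is a minimal Python sketch of the pruning step just described: each search layer keeps only its b best nodes according to the evaluation function f. This is only an illustration of the layer pruning, not Zhou and Hansen's full beam-stack search (which additionally records pruned nodes so that it can backtrack to them later); the function names are ours.

```python
import heapq

def prune_layer(layer, b, f):
    """Keep only the b most promising nodes of one search layer, ranked by f.

    Beam-stack search would also record the pruned nodes so it can return to
    them when it backtracks; that bookkeeping is omitted in this sketch.
    """
    return heapq.nsmallest(b, layer, key=f)

def beam_step(layer, successors, b, f):
    """Expand one (already pruned) layer and return the pruned next layer."""
    next_layer = [child for state in layer for child in successors(state)]
    return prune_layer(next_layer, b, f)
```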
Restarting WA* (RWA*)

To overcome low-h bias, we propose Restarting WA*, or RWA* for short. It runs iterated WA* with decreasing weight, always re-expanding states when it encounters a cheaper path. RWA* differs from ARA* and Anytime WA* by not keeping the Open list between phases. Whenever a better solution is found, the search empties Open and restarts from the initial state. It does, however, re-use search effort in the following way: besides Open and Closed, we keep a third data structure "Seen". When the new search phase begins, the states from the old Closed list are moved to Seen. When RWA* generates a state in the new search, there are three possibilities:

Case 1: The state has never been generated before (it is neither in Open nor Closed nor Seen). In this case, RWA* behaves like WA*, calculates the heuristic value of the state and inserts it into Open.

Case 2: The state has been encountered in previous search phases but not in the current phase (it is in Seen). In this case, RWA* can look up the heuristic value of the state rather than having to calculate it. In addition, RWA* checks whether it has found a cheaper path to the state or whether the previously found path is better, and keeps the better one. The state is removed from Seen and put into Open.

Case 3: The state has already been encountered in this search phase (it is in Open or Closed). In this case, RWA* again behaves like WA*, re-inserting the state into Open only if it has found a shorter path to the state.

Complete pseudo-code is shown in Fig. 2. This strategy can be implemented in an efficient way by maintaining a boolean value "seen" in each state of the Open/Closed list, rather than keeping the seen states in a separate list. In short, previous search effort is re-used by not calculating the heuristic value of a state more than once and by making use of the best path to a state found so far. Compared to ARA* and Anytime A*, our method may have to re-expand many states that were already expanded in previous phases. However, in cases where the calculation of the heuristic accounts for the largest part of computation time (e.g., in our planning experiments it is 80%), the duplicated expansions do not have much effect on runtime. Thus, RWA* re-uses most of the previous search effort, but its restarts allow for more flexibility in discovering different solutions.

Restarts in Other Search Paradigms

Restarts are a well-known and successful technique in combinatorial search and optimisation, e.g. in the areas of propositional satisfiability and constraint-satisfaction problems. Together with randomisation, they are used in systematic search (Gomes et al. 1998) as well as local search (Selman et al. 1992) to escape from barren areas of the search space. A restart is typically executed if no solution has been found after a certain number of steps. In most of these approaches restarting would be useless without randomisation, as the algorithm would behave exactly the same each time. By contrast, our algorithm is deterministic, and its restarts serve the purpose of revisiting nodes near the start of the search tree to promote exploration when the node evaluation function has changed.

This is perhaps most closely related to the motivation behind limited-discrepancy search (LDS) (Harvey and Ginsberg 1995; Furcy and Koenig 2005). LDS attempts to overcome the tendency of depth-first search to visit solutions that differ only in the last few steps. It backtracks whenever the current solution path exceeds a given limit on the number of discrepancies from the greedy path, restarting from the root each time this limit is increased. Like random restarts and the backtracking in limited-discrepancy search, our restarts mitigate the effect of bad heuristic recommendations early in the search. Our use of restarts differs from limited-discrepancy backtracking in that we restart upon finding a solution, rather than after expanding all nodes with a given number of discrepancies from the greedy path. As opposed to random restarts, we restart upon success (when finding a new solution) rather than upon failure (when no solution could be found). A depth-first approach like LDS is however not competitive for shortest-path problems like planning. Lastly, incomplete methods can be made complete by restarting with increasingly relaxed heuristic restrictions (Zhang 1998; Aine et al. 2007). By contrast, our restarts occur within an algorithm that is already complete.

To our knowledge, restarts have not been used in a complete best-first search before. Our use of restarts in an A*-type algorithm is remarkable because at first glance, it would seem unnecessary: A* keeps a queue of all generated but yet unexpanded nodes, and is thus not thought to suffer from premature commitment. Our insight is that weighted A* effectively makes such commitments due to its low-h bias.
RWA*(w0, φ)
 1  bound ← ∞, w ← w0, Seen ← ∅
 2  while not interrupted and not failed
 3    do Closed ← ∅, Open ← {start state}
 4       while not interrupted and Open not empty
 5         do remove s with minimum f(s) from Open
 6            for s′ ∈ SUCCESSORS(s)
 7              do if (heuristic admissible and f(s′) ≥ bound)
 8                    or g(s′) ≥ bound
 9                  then continue
10                 cur_g ← g(s) + TRANSITION-COST(s, s′)
11                 switch
12                   case s′ ∉ (Open ∪ Seen ∪ Closed):
13                     g(s′) ← cur_g, pred(s′) ← s
14                     calculate h(s′) and f(s′)
15                     insert s′ into Open
16                   case s′ ∈ Seen:
17                     if cur_g < g(s′)
18                       then g(s′) ← cur_g, pred(s′) ← s
19                            move s′ from Seen to Open
20                   case cur_g < g(s′):
21                     g(s′) ← cur_g, pred(s′) ← s
22                     if s′ ∈ Open
23                       then update s′ in Open
24                       else move s′ from Closed to Open
25                 if IS-GOAL(s′)
26                   then break
27       if new solution s* was found (line 25)
28         then report s*
29              bound ← g(s*)
30              w ← max(1, w × φ)
31              Seen ← Seen ∪ Open ∪ Closed
32         else return failure

Figure 2: Pseudo-code for the RWA* algorithm.
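For readers who prefer runnable code, the following is a compact Python sketch of the same procedure. It is our own simplification, not the authors' implementation: the goal test happens at expansion rather than generation time, only the g-based pruning against the incumbent bound is kept, and the problem is supplied through generic successors/heuristic/is_goal callbacks.

```python
import heapq
from itertools import count

def rwa_star(start, successors, heuristic, is_goal,
             w0=5.0, phi=0.5, max_expansions=200000):
    """Sketch of Restarting Weighted A* (RWA*), mirroring Fig. 2.

    successors(s) yields (successor, step_cost) pairs, heuristic(s) returns h(s),
    and is_goal(s) is a predicate.  Yields (cost, path) for each improving solution.
    """
    bound = float("inf")       # cost of the incumbent solution
    w = w0
    hval = {}                  # cached heuristic values, never recomputed
    pred = {start: None}       # best known predecessor of each state, kept across phases
    seen = {}                  # state -> best g-value known from earlier phases
    tie = count()              # tie-breaker so the heap never compares states directly
    expansions = 0

    def extract_path(s):
        path = [s]
        while pred[s] is not None:
            s = pred[s]
            path.append(s)
        return path[::-1]

    while expansions < max_expansions:
        # Restart: Open and Closed are emptied, Seen and cached data are kept.
        g, closed = {start: 0.0}, set()
        hval.setdefault(start, heuristic(start))
        open_heap = [(w * hval[start], next(tie), start)]
        improved = False

        while open_heap and expansions < max_expansions:
            _, _, s = heapq.heappop(open_heap)
            if s in closed or g[s] >= bound:
                continue                           # stale heap entry or pruned by the bound
            if is_goal(s):                         # goal test at expansion (simplification)
                bound = g[s]
                yield bound, extract_path(s)
                w = max(1.0, w * phi)              # decrease the weight for the next phase
                seen.update(g)                     # remember this phase's g-values
                improved = True
                break
            closed.add(s)
            expansions += 1
            for s2, cost in successors(s):
                cur_g = g[s] + cost
                if cur_g >= bound:                 # cannot improve on the incumbent
                    continue
                if s2 in g:                        # case 3: already met in this phase
                    if cur_g >= g[s2]:
                        continue
                    g[s2], pred[s2] = cur_g, s     # cheaper path found
                    closed.discard(s2)             # re-open the state if it was expanded
                elif s2 in seen:                   # case 2: known from an earlier phase
                    old_g = seen.pop(s2)
                    if cur_g < old_g:
                        g[s2], pred[s2] = cur_g, s
                    else:
                        g[s2] = old_g              # keep the better old path; pred is unchanged
                else:                              # case 1: never generated before
                    hval[s2] = heuristic(s2)
                    g[s2], pred[s2] = cur_g, s
                heapq.heappush(open_heap, (g[s2] + w * hval[s2], next(tie), s2))

        if not improved:
            return                                 # Open exhausted: the incumbent is optimal
```

Used as a generator, `for cost, plan in rwa_star(...)` reports one solution per phase, each no worse than the previous one.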
Empirical Evaluation

We compare the performance of RWA* with several existing complete anytime approaches: Anytime A* (Hansen and Zhou 2007), ARA* (Likhachev et al. 2004), Anytime Window A* (Aine et al. 2007), and beam-stack search (Zhou and Hansen 2005). For beam-stack search, we implemented the regular version rather than the memory-conserving divide-and-conquer variant (memory-conserving techniques could be applied to all the algorithms here). All parameters of the competitor algorithms were carefully tuned for best performance. The algorithms based on WA* (RWA*, Anytime A* and ARA*) all share the same code base, as they differ only in few details; and they perform best for the same starting weights. For beam-stack search, we experimented with beam width values between 5 and 10,000 and plot the best of those runs. We also conducted experiments with iteratively broadening beam widths, but did not obtain better results than with the best fixed value. Beam-stack search expects an initial upper bound on the solution cost. After consulting its authors, we chose a safe but large number as bound, as it is not clear how to come up with a reasonable upper bound for our experiments up-front.

In addition we compare against an alternative version of Anytime A*. For this algorithm, which originally does not decrease its weight between search phases, we experienced improved performance if we do decrease the weight. Thus, we also report results for a variant dubbed "Anytime A* WS" (WS for weight schedule), where the weight is decreased as in RWA* or ARA*.

Classical PDDL Planning

Planning is a notoriously hard problem where optimal solutions can be prohibitively expensive to obtain even under very optimistic assumptions (Helmert and Röger 2008). To efficiently find a (typically suboptimal) solution, inadmissible heuristics are employed. In our experiment, we use the popular FF heuristic and all classical planning tasks from the International Planning Competitions between 1998 and 2006.

Experimental Setup

We implemented each of the search algorithms within the Fast Downward planning framework (Helmert 2006) used by the winners of the satisficing track in the International Planning Competitions in 2004 and 2008. This framework originally uses greedy best-first search, which we replaced with the respective search algorithms. All other settings of Fast Downward were held fixed to isolate the effect of changing the search mechanism.

Fast Downward has two options built in that can enhance the search for a solution. Since our goal is to achieve best possible performance for planning, we make use of these search enhancements. The first is delayed heuristic evaluation, which means that states are stored in Open with their parent's heuristic value rather than their own. This technique is helpful if the computation of the heuristic is very expensive compared to the other components of the search (like expansions of states). If many more states are generated than expanded, delayed evaluation leads to a great reduction in the number of times we need to calculate heuristic estimates, albeit at a loss in accuracy. The second search enhancement in Fast Downward is the use of preferred operators (Helmert 2006). Preferred operators denote which successors of a state are deemed to lie on a promising path to the goal. They can be used as a second source of search information, complementing the heuristic. In particular, when run with the preferred-operator option, Fast Downward uses two Open lists, one with all operators and one with only preferred operators, and selects the next state from the two lists in an alternating fashion. For the A*-based algorithms, the preferred-operator mechanism can be used as defined and it improves results notably; it was thus incorporated. Beam-stack search is the one algorithm that does not use preferred operators as it is not obvious how to incorporate them.

For the WA*-based algorithms, we experimented with several starting weights and corresponding weight sequences, yet found that the relative performance did not change much. The weight sequence which gave overall best results and is used in the results below is 5, 3, 2, 1.5, 1. ARA* was adapted to use a weight schedule like RWA* and Anytime A* WS, since the original mechanism for decreasing weights in ARA* depends on the heuristic being admissible (ARA* originally reduces its weight by a very small amount whenever it proves that the incumbent solution is w-admissible for the current weight w). For beam-stack search, a beam width of 500 resulted in best anytime performance. The time and memory limits were 30 minutes and 3 GB respectively for each task, running on a 3 GHz Intel Xeon X5450 CPU.

In order to compare on meaningful benchmark tasks only, we select a subset of them as follows: Firstly, we exclude tasks that none of the algorithms could solve. Secondly, since we are interested in showing the improvement of solutions over time, we exclude tasks where all algorithms found only one solution (and hence no improvement occurred). Thirdly, we exclude "trivial" tasks where all algorithms find the same best solution within less than one second. This leaves 1096 of the original 1612 tasks.
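To illustrate the alternation between the two Open lists mentioned above, here is a small, generic Python sketch. It is not Fast Downward's actual implementation; the class and method names are ours, and details of the real planner (such as queue boosting) are omitted.

```python
import heapq
from itertools import count

class DualOpenList:
    """Sketch of alternating between a regular and a preferred-operator open list.

    Every generated state is pushed onto the regular queue; states reached via a
    preferred operator are additionally pushed onto the preferred queue.  The two
    queues may hold duplicate entries; the search's Closed list filters those out.
    """
    def __init__(self):
        self.queues = [[], []]          # 0: all successors, 1: preferred successors only
        self.turn = 0
        self.tie = count()

    def push(self, f_value, state, preferred):
        entry = (f_value, next(self.tie), state)
        heapq.heappush(self.queues[0], entry)
        if preferred:
            heapq.heappush(self.queues[1], entry)

    def pop(self):
        # Alternate between the two queues, falling back to the non-empty one.
        for _ in range(2):
            queue = self.queues[self.turn]
            self.turn = 1 - self.turn
            if queue:
                return heapq.heappop(queue)[2]
        raise IndexError("both open lists are empty")
```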
[Figure 3 plots omitted: normalised quality over time (left and centre panels) and the percentage of tasks on which a given solution quality is achieved (right panel).]
Figure 3: Anytime performance in PDDL planning.

Results

Fig. 3 summarises the results. We plot average normalised quality scores following the definition used in the most recent planning competition (Helmert et al. 2008): Q*/Q, where Q is the cost of the algorithm's current solution and Q* is the cost of the overall best solution found by any of the algorithms. Hence, the score ranges between 0 (task not solved yet) and 1 (the algorithm has found the best solution quality Q*). These scores are then averaged over all tasks, and error bars in the plots give 95% confidence intervals on the mean. It is intentional that unsolved tasks negatively influence the score of an algorithm. The alternative would be to compare only on the tasks solved by all algorithms. However, that would give an advantage to methods that are less greedy, i.e., those that solve fewer problems but find solutions of better quality. Hence that would not reward good anytime performance (for example, breadth-first search would appear superior). Legends in our plots are sorted in decreasing order of final score.
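As a concrete reading of this scoring rule, the following sketch (our own simplification; the official IPC scoring script may differ in details) computes the per-task scores that are then averaged over all tasks:

```python
def quality_scores(costs_by_algorithm):
    """Normalised quality Q*/Q for one task, as described above.

    costs_by_algorithm maps an algorithm name to its current solution cost for
    this task, or None if the task is unsolved.  Unsolved tasks score 0.
    """
    solved = [c for c in costs_by_algorithm.values() if c is not None]
    if not solved:
        return {name: 0.0 for name in costs_by_algorithm}
    best = min(solved)                              # Q*: best cost found by any algorithm
    return {name: (best / cost if cost is not None else 0.0)
            for name, cost in costs_by_algorithm.items()}

# Example for a single task; averaging such scores over all tasks gives the curves in Fig. 3.
print(quality_scores({"RWA*": 125, "ARA*": 161, "Anytime A*": None}))
```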
As the left and centre panels of Fig. 3 show, RWA* performs best, outperforming the other algorithms by a substantial margin. In the left panel, we compare against the other WA*-based algorithms. The weight-decreasing variant of Anytime A* (Anytime A* WS) is slightly better than ARA*, whereas the original Anytime A* algorithm performs worse. It is notable that while all four WA*-based algorithms are very closely related, differing only in a few lines of code, the simple addition of the restarts leads to a significant improvement in performance. "Base WA*" depicts a WA* search that stops after finding the first solution and is provided for reference. Although the confidence intervals for all four anytime algorithms overlap to some extent in the left-most panel of Fig. 3, a paired Student's t-test reveals that the difference between each pair is statistically significant, having p-values ranging between 10^-5 in the most discernible case and 10^-3 in the least discernible case.

The centre panel shows the results for Window A* and beam-stack search. Beam-stack search and Window A* perform significantly worse than the WA*-based algorithms. This is mainly due to solving fewer problems, as a significant determiner in gross performance is the number of problems solved by a certain time. By the time cut-off, all WA*-based algorithms solve (essentially the same) 84% of the tasks, while beam-stack search and Window A* solve only 78% and 63%, respectively. 3% (3%) of the tasks are solved by beam-stack search (Window A*) but not by the WA* algorithms. The opposite is true for 9% (24%) of the tasks.

The right panel of Fig. 3 shows for the best version of each algorithm how often it achieved (at least) a certain solution quality. As can be seen, RWA* achieves high-quality solutions significantly more often than the other algorithms. The best quality is achieved by RWA* in 65% of the tasks, while all others achieve it in at most 51% of the tasks. The average number of solutions found was 3 for the weight-decreasing WA* methods and beam-stack search, 5 for non-decreasing Anytime A* and 1 for Window A*.

When examining the distribution of RWA*'s performance over the different domains, we found that RWA* outperforms the other WA* algorithms in 40% of the domains, while being on par in the remaining 60%. In no domain did RWA* perform notably worse than any of the other WA* methods. Beam-stack search and Window A* show different strengths and weaknesses compared to the WA*-based algorithms. In the Optical Telegraph domain, beam-stack search performs well whereas all other algorithms perform badly. Beam-stack search and Window A* perform worst on large problems, e.g. in the Schedule and Logistics 1998 domains.

To summarise, we find that the WA*-based methods perform significantly better than Window A* and beam-stack search. We believe that this is due to the global WA*-based algorithms being better able to deal with inadmissible and locally highly varying heuristic values, whereas Window A* and beam-stack search commit early to certain areas of the search space. Exhausting a state-space area is particularly difficult here as inadmissible heuristic values cannot be used for pruning. Furthermore, beam-stack search cannot make use of the planning-specific preferred operators enhancement.

We also performed experiments without preferred operators and delayed evaluation for all algorithms. There, the performance of all algorithms becomes drastically worse (e.g., the WA*-based methods leave twice as many problems unsolved and in addition show worse anytime performance on the solved problems). RWA* still performs better than all other algorithms, though by a much smaller margin than with the search enhancements. With fewer solved tasks and fewer solutions per task, RWA* cannot improve on the other WA* algorithms as much. Also, while the search enhancements are very helpful in finding a solution, they make the search slightly less informed and more greedy, contributing to low-h bias and thus making restarts more effective.

We furthermore conducted experiments with other planning heuristics (not shown), including the recent landmark heuristic (Richter et al. 2008) and the context-enhanced additive heuristic (Helmert and Geffner 2008). Compared to the experiments described above, which use the FF heuristic, the performance of all algorithms was worse with these other heuristics. However, the relative performance of the algorithms was similar, with RWA* outperforming the other methods in all cases, though by smaller margins.

A Detailed Example

We argue that RWA* shows such good results because restarts encourage changes in the beginning of a plan rather than the end. The largest task in the gripper domain provides an illustration. Gripper tasks consist in transporting balls (here 42) from one room to another with the help of a two-armed robot. The initial WA* search phase finds a plan in which all balls are transported separately, whereas in the optimal plan the robot always carries two balls at a time. Window A* does not solve this problem, while beam-stack search finds only one solution (the optimal) after 2.5 seconds. The WA* algorithms find three improving solutions, with the first solution found after less than one second, see Table 1. All these plans are found very quickly, in less than 15 seconds. However, the last plan found by RWA* is optimal, whereas the other WA* algorithms do not find any improved solutions during the remaining 29 minutes.

The noteworthy aspect in this example is the indices of change for the plans, i.e. the steps in which subsequent plans first differ from their predecessors. For ARA* and the two Anytime A* variants these change indices are fairly high (≥ 145), denoting the fact that their subsequent plans only differ in the last 20 actions from the first plan found. For RWA*, the third solution has a change index of 1, i.e. it differs in the first action from the previous plans. This suggests that the big jump in solution quality for RWA* indeed results from further exploration near the start state, whereas the other algorithms unsuccessfully spend their effort in deeper areas of the search space.

                 1st solution             2nd solution             3rd (final) solution
                 len  c.i.  exp    t      len  c.i.  exp    t      len  c.i.  exp      t
RWA*             165  -     1142   0.07   163  153   2370   0.08   125  1     16919    0.84
ARA*             165  -     1142   0.06   163  153   1220   0.07   161  145   5743     0.28
Anytime A* WS    165  -     1142   0.07   163  153   1231   0.08   161  145   5629     13.25
Anytime A*       165  -     1142   0.07   163  153   1255   0.07   161  145   206480   0.28

Table 1: Solution sequences for the largest task in the gripper domain. The plan length is denoted by len. The first step in which a solution deviates from the previous is denoted by c.i. (change index); exp is the number of states expanded until the solution is found, and t the runtime in seconds.
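The change index used in Table 1 is easy to compute; the sketch below (our own helper, with illustrative action names) returns the first step at which a new plan deviates from its predecessor.

```python
def change_index(previous_plan, new_plan):
    """First step (1-based) at which new_plan deviates from previous_plan,
    i.e. the 'c.i.' column of Table 1."""
    for i, (a, b) in enumerate(zip(previous_plan, new_plan), start=1):
        if a != b:
            return i
    # One plan is a prefix of the other: they first differ after the shorter one ends.
    return min(len(previous_plan), len(new_plan)) + 1

# Plans that only differ near the end have a high change index.
print(change_index(["pick", "move", "drop"], ["pick", "move", "drop-other"]))  # -> 3
```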
Other Benchmarks

We also ran experiments in two other benchmark domains: the robotic arm domain and the sliding-tile puzzle, which we chose because of their previous use as anytime benchmarks (robotic arm), or their standing as traditional benchmarks in the search literature (sliding-tile puzzle).

Robot Motion Planning

This domain concerns motion planning for a simulated 20-degree-of-freedom robotic arm (Likhachev et al. 2004). The base of the arm is fixed, and the task is to move its end-effector to the goal while navigating around obstacles. An action is defined as a change of the global angle at a particular joint. The workspace is discretised by overlaying it with a grid of 50 × 50 cells, and the heuristic of a state is the distance from the current location of the robot's end-effector to the goal, taking obstacles into account. The size of the state space is over 10^26, and in most instances it is infeasible to find an optimal solution. As before, we eliminate trivial instances, leaving 17 of the 22 tasks kindly made available to us by Maxim Likhachev.

For ARA*, we use the settings suggested by Likhachev et al., starting with a weight of 3 and decreasing it by 0.02 each time; the other WA*-based algorithms also start with weight 3. Note that because the heuristic is admissible, ARA* can prove suboptimality bounds and make informed decisions on when to reduce its weight (namely, whenever a new bound is proven). For RWA*, on the other hand, a decrease in weight incurs a substantial overhead each time due to the restart. This is why larger and less frequent weight decreases work better for it. We use a factor of 0.84 to decay the weight for RWA*, resulting in a similar weight schedule as in the planning experiment, reducing the weight whenever an improved solution is found.

The results are shown in the left panel of Fig. 4. RWA* outperforms ARA* and Anytime A* for all timeouts above 1 second and shows overall very good anytime performance, including reaching the best final quality. Window A* does well in this domain and shows best results in the early stages of search. Anytime A* performs notably worse than ARA* here, though we found that its weight-decreasing variant (not shown) achieves similar performance as ARA*. Beam-stack search, here with a beam width of 100, takes longer than most of the other methods to achieve high performance, but then comes up steeply and achieves almost the same final quality as RWA*. (Error bars are not shown here due to the high variance across these hand-designed instances.)
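One simple way to realise an obstacle-aware goal-distance heuristic of the kind described above is to precompute breadth-first distances from the goal cell over the workspace grid. The sketch below is our own illustration (assuming 4-connected cells); the benchmark's actual heuristic code may differ.

```python
from collections import deque

def goal_distance_grid(goal, blocked, size=50):
    """Breadth-first distances from the goal cell over a size x size workspace grid,
    skipping blocked cells.  A state's heuristic value is then the distance of the
    cell containing the arm's end-effector; unreachable cells are absent (treat as
    infinity)."""
    dist = {goal: 0}
    queue = deque([goal])
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if (0 <= nx < size and 0 <= ny < size
                    and (nx, ny) not in blocked and (nx, ny) not in dist):
                dist[(nx, ny)] = dist[(x, y)] + 1
                queue.append((nx, ny))
    return dist
```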
[Figure 4 plots omitted: normalised quality over time for each of the three domains.]
Figure 4: Anytime performance for robot motion planning (left), the 15-puzzle (middle), and an artificial search space (right).

Sliding-Tile Puzzle

In the sliding-tile puzzle domain, we tested on the 100 15-puzzle instances from Korf (1985) using the Manhattan distance heuristic. The starting weight for the WA* algorithms was 3, the weight sequence for RWA* being 3, 2, 1.5, 1.25, 1 while ARA* uses small decrements of 0.2. The beam width in beam-stack search was 500. The results are shown in the middle panel of Fig. 4. There are comparatively few different solutions, leading to similar performance of all WA*-based algorithms that decrease weight. Window A* performs best, with RWA* second and ARA* third. With smaller beam widths, the score of beam-stack search rises faster in the beginning but it takes longer to achieve perfect quality; with larger beam widths the reverse holds.

Summary

While in the robot motion domain and the sliding-tile puzzle RWA* does not dominate its competitors as notably as in the PDDL planning experiment, we note that its performance is continually very good. Compared to its most similar competitors, the other WA* algorithms, RWA* is always better or on par. This is also the case in a third domain (gridworld path-planning), as we demonstrated previously (Richter et al. 2009). This may suggest that even in cases where the restarts are not helpful, they do little harm in terms of computation time. The methods that are not based on WA* (Window A* and beam-stack search) perform well in some domains but badly in others. By contrast, RWA* shows robustly good performance over all domains.

An Artificial Search Space

To get a clearer picture of when restarts are helpful, we conducted experiments on a manually-designed search tree. We fixed a branching factor of 25 and a typical solution depth of 500. Nodes are characterised by their approximate goal distance (agd). The root has an agd-value of 500 and nodes with a value of 0 are labelled as goals. Edge costs are chosen randomly (with uniform probabilities) between 0 and 10, and for a given edge of cost c the agd-value of the reached child varies randomly by up to c from the parent's value. This is analogous to the costs of walking in a gridworld, where a move of c steps may take an agent c steps closer to the goal, c steps further away or anything in between, depending on the direction the agent walks.

The heuristic values were chosen to enforce systematic biases. The heuristic underestimates the agd-values of nodes by up to a certain percentage. The h-value of the root is chosen randomly with this constraint, and the resulting error factor (the ratio of h-value to agd-value) is recorded. For all other nodes, their h-values are correlated with their parent's h-values such that a child's h-value differs by no more than 1 from the value that would result when using the same error factor as the parent. This is achieved by first computing the "parent-induced h-value" that would result if a node had the same error factor as its parent. Then we determine a random value for the child within the constraint of heuristic accuracy. The actual h-value for the child is obtained by moving the parent-induced value by 1 into the direction of the random value.

Averaged results for 100 runs with different random seeds for a heuristic that underestimates by up to 20% are shown in the right panel of Fig. 4, where RWA* and ARA* were run with a weight schedule of 2, 1.5, 1.25, 1.125, 1. The relative results are similar for heuristics of different accuracy. Note that the weight-decreasing variant of Anytime A* is equivalent to ARA* here, as there are no cycles in the search space. RWA* has a clear advantage over the other algorithms, and we found the same effect as in planning, that the change indices of solutions for RWA* are substantially lower than for the other algorithms.

When weakening the correlation between the h-values of parents and children, the advantage of RWA* diminishes. For example, if all h-values are chosen completely at random, RWA* has no advantage over ARA*/Anytime A* unless a very accurate heuristic is used. This is due to the fact that without systematic errors in the heuristic, all algorithms explore more nodes in the top of the search tree, rather than committing quickly to a bad path. This is in line with observations by Zahavi et al. (2007), who note that inconsistent heuristics can be beneficial because they make it less likely that the search gets stuck in a region of bad heuristic estimates. In that case, the greedy search makes fewer early mistakes and restarts consequently provide less benefit. Additionally, we found that RWA* gains more advantage if the heuristic underestimates nodes near the root by a stronger factor than nodes that are lower in the tree. In none of the scenarios we looked at did RWA* perform notably worse than the other algorithms.
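The construction of the correlated heuristic can be sketched in a few lines. The code below follows the description above but is our own reconstruction: exact distributions, rounding and the handling of agd-values near zero are assumptions, and the function names are ours.

```python
import random

def make_root(depth=500, max_underestimate=0.2):
    """Root of the artificial tree: agd = depth, h underestimates agd by up to 20%."""
    agd = float(depth)
    h = random.uniform((1 - max_underestimate) * agd, agd)
    return agd, h, h / agd          # agd-value, h-value, recorded error factor

def make_child(parent_agd, parent_error, max_underestimate=0.2):
    """Generate one child edge: returns (edge_cost, child_agd, child_h, child_error)."""
    cost = random.uniform(0, 10)
    # The child's approximate goal distance moves by at most the edge cost.
    child_agd = max(0.0, parent_agd + random.uniform(-cost, cost))
    # h-value induced by keeping the parent's error factor (ratio h / agd).
    induced_h = parent_error * child_agd
    # A random h-value within the accuracy constraint (underestimate by at most 20%).
    random_h = random.uniform((1 - max_underestimate) * child_agd, child_agd)
    # Move the induced value by at most 1 towards the random value.
    child_h = induced_h + max(-1.0, min(1.0, random_h - induced_h))
    child_error = child_h / child_agd if child_agd > 0 else 1.0
    return cost, child_agd, child_h, child_error
```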
Discussion

As we saw in the example tasks from gridworld and the gripper domain, restarting anytime search overcomes the low-h bias of greedy anytime search that tends to expand nodes near a known goal. The desire to revisit nodes near the start state stems from the fact that a heuristic evaluation function is often less informed near the root of the search tree than near a goal and that early mistakes can be important to correct (Harvey and Ginsberg 1995; Furcy and Koenig 2005). For example, in problems with multiple goal states the heuristic may misjudge the relative distance of the goals and lead a greedy search into a suboptimal part of the search space. Similarly, domains with inaccurate heuristics that have systematic errors can lead to early mistakes and thus benefit from restarts.

Conclusion

We have demonstrated an important dysfunction in conventional approaches to anytime search: by trying to re-use previous search effort from a greedy search, traditional anytime methods suffer from low-h bias and tend to improve the incumbent solution starting from the end of the solution path. As we showed in this paper, this can be a poor strategy when the heuristic makes early mistakes. The counterintuitive approach of clearing the Open list, as in the Restarting Weighted A* algorithm, can lead to much better performance in such domains while doing little harm in others.

Acknowledgements

We thank Malte Helmert, Charles Gretton, and Patrik Haslum for helpful input and Maxim Likhachev for making his code available. NICTA is funded by the Australian Government, as represented by the Department of Broadband, Communications and the Digital Economy, and the Australian Research Council, through the ICT Centre of Excellence program. We also gratefully acknowledge support from NSF grant IIS-0812141 and the DARPA CSSG program.
References

Sandip Aine, P. P. Chakrabarti, and Rajeev Kumar. AWA* – A window constrained anytime heuristic search algorithm. In Proc. IJCAI 2007, pages 2250–2255, 2007.

Rina Dechter and Judea Pearl. The optimality of A*. In Laveen Kanal and Vipin Kumar, editors, Search in Artificial Intelligence, pages 166–199. Springer-Verlag, 1988.

David Furcy and Sven Koenig. Limited discrepancy beam search. In Proc. IJCAI 2005, pages 125–131, 2005.

Carla P. Gomes, Bart Selman, and Henry A. Kautz. Boosting combinatorial search through randomization. In Proc. AAAI 1998, pages 431–437, 1998.

Eric A. Hansen and Rong Zhou. Anytime heuristic search. JAIR, 28:267–297, 2007.

Eric A. Hansen, Shlomo Zilberstein, and Victor A. Danilchenko. Anytime heuristic search: First results. Technical Report CMPSCI 97-50, University of Massachusetts, Amherst, September 1997.

Peter E. Hart, Nils J. Nilsson, and Bertram Raphael. A formal basis for the heuristic determination of minimum cost paths. IEEE Transactions on Systems Science and Cybernetics, 4(2):100–107, 1968.

William D. Harvey and Matthew L. Ginsberg. Limited discrepancy search. In Proc. IJCAI 1995, pages 607–615, 1995.

Malte Helmert and Héctor Geffner. Unifying the causal graph and additive heuristics. In Proc. ICAPS 2008, pages 140–147, 2008.

Malte Helmert and Gabriele Röger. How good is almost perfect? In Proc. AAAI 2008, pages 944–949, 2008.

Malte Helmert, Minh Do, and Ioannis Refanidis. IPC 2008, deterministic part. Web site, http://ipc.informatik.uni-freiburg.de, 2008.

Malte Helmert. The Fast Downward planning system. JAIR, 26:191–246, 2006.

Richard E. Korf. Depth-first iterative-deepening: An optimal admissible tree search. AIJ, 27(1):97–109, 1985.

Maxim Likhachev, Geoffrey J. Gordon, and Sebastian Thrun. ARA*: Anytime A* with provable bounds on sub-optimality. In Proc. NIPS 2003, 2004.

Maxim Likhachev, Dave Ferguson, Geoffrey J. Gordon, Anthony Stentz, and Sebastian Thrun. Anytime search in dynamic graphs. AIJ, 172(14):1613–1643, 2008.

Ira Pohl. Heuristic search viewed as path finding in a graph. AIJ, 1:193–204, 1970.

Silvia Richter, Malte Helmert, and Matthias Westphal. Landmarks revisited. In Proc. AAAI 2008, pages 975–982, 2008.

Silvia Richter, Jordan T. Thayer, and Wheeler Ruml. The joy of forgetting: Faster anytime search via restarting. In International Symposium on Combinatorial Search (SoCS 2009), 2009.

Bart Selman, Hector J. Levesque, and David G. Mitchell. A new method for solving hard satisfiability problems. In Proc. AAAI 1992, pages 440–446, 1992.

Uzi Zahavi, Ariel Felner, Jonathan Schaeffer, and Nathan R. Sturtevant. Inconsistent heuristics. In Proc. AAAI 2007, pages 1211–1216, 2007.

Weixiong Zhang. Complete anytime beam search. In Proc. AAAI 1998, pages 425–430, 1998.

Rong Zhou and Eric A. Hansen. Beam-stack search: Integrating backtracking with beam search. In Proc. ICAPS 2005, pages 90–98, 2005.