From: AAAI Technical Report WS-97-10. Compilation copyright © 1997, AAAI (www.aaai.org). All rights reserved.

Commencing Execution

Richard Goodwin
rgoodwin@watson.ibm.com
IBM T.J. Watson Research Center, Route 134, Yorktown Heights, New York, 10598
(914) 945-2651

Abstract

Real-time search algorithms interleave search with execution in order to accomplish a task efficiently. Deciding when to begin execution of the current best action is crucial for creating a high-performance real-time search algorithm. This paper explores the question of when to commence execution for an on-line search performed with limited computation. We examine Good's idealized algorithm for deciding when to begin execution and illustrate some of its limitations. We then present a revised, idealized algorithm that removes some of these limitations. The revised algorithm has been used as the basis for an algorithm that makes on-the-fly decisions for a simplified robot courier task. Previous empirical results show that this new algorithm outperforms previously used algorithms. In this paper, we show that the difference in performance is bounded by a ratio of two to one. We discuss the situations where this limit is approached and suggest that overlapping search with execution in these situations is most beneficial.

Introduction

Real-time search methods (Korf 1987), situated action (Suchman 1987) and anytime algorithms (Dean and Boddy 1988) all share the notion that computation and execution need to be interleaved in order to achieve good performance. All three recognized that, faced with a complicated task, some initial search can significantly increase the likelihood of success and increase efficiency. Conversely, sitting on one's hands, considering all possible consequences of all possible actions, does not accomplish anything. At some point, action is called for. But when is this point reached?

Deciding when to begin execution is complex. There is no way to know how much a solution will improve with a given amount of additional search. The best that can be achieved is to have some type of statistical model, as is done in the anytime planning framework (Dean and Boddy 1988). In addition, the decision should take into account the fact that it may be possible to execute one part of the plan while planning another.

Approaches

The classical AI planning approach to deciding when to begin execution is to run the planner until it produces a plan that satisfies its goals and then to execute the plan. This approach recognizes that finding the plan with the highest expected utility may take too long and that using the first plan that satisfies all the constraints may produce the best results (Simon and Kadane 1974). It also relies on the heuristic that, in searching for a plan, a planner tends to produce simple plans first, and simple plans tend to be more efficient and robust than unnecessarily complex plans. Execution is delayed until a complete plan has been generated. Only when there is a complete plan can there be a guarantee that an action will actually be needed and will not lead to a situation where the goals are no longer achievable.

In contrast to the classical AI approach, the reactive approach is to pre-compile all plans and to always perform the action suggested by the stimulus-response rules applicable in the current situation. The answer to the question of when to begin acting is always "now" (Brooks 1986; Schoppers 1987).
This approach is applicable when it is possible to pre-compile the plans and in dynamic environments where quick responses are required.

A third approach is to explicitly reason about when to start execution while planning is taking place (Russell and Wefald 1991). Using information about the task, the current state of the plan and the expected performance of the planner, the decision to execute the current best action is made on-line. This allows execution to begin before a complete plan is created and facilitates overlapping of planning and execution. This approach also allows execution to be delayed in favour of optimizing the plan, if the expected improvement so warrants.

Idealized Algorithms

In this section, we present Good's idealized algorithm for deciding when to begin execution and our revised version of the algorithm that accounts for overlapping planning and execution. Both algorithms are idealized because they assume that the decision maker knows the effect and duration of each of the possible computations and actions. Since this information is generally not available, we need to approximate it in order to create an operational system. In the next section we operationalize both algorithms and compare the results empirically.

Making on-line decisions about when to begin execution can be modeled as a control problem where a meta-level controller allocates resources and time to the planner and decides when to send plans to the execution controller for execution. The control problem is to select actions and computations that maximize overall expected utility.

Good's idealized algorithm, which has been suggested by a number of researchers, selects the option from the set {a, C_1, ..., C_k} that has the highest expected utility (Good 1971; Russell and Wefald 1991). In this list, a is the current best execution action and the C_i's represent possible computational actions. The utility of each computational action must take into account the duration of the computation, by which any act it recommends would be delayed. Of course, this ideal algorithm is non-operational since the improvement that each computation will produce and the duration of the computation may not be known. A practical implementation will require these values to be estimated.

Revised Idealized Algorithm

Good's algorithm ignores the fact that many agents can act and compute concurrently. A better control question to ask is not whether the agent should act or compute, but whether the agent should act, compute, or do both. The corresponding control problem is to select from the list {(a, C_1), ..., (a, C_k), (φ, C_1), ..., (φ, C_k)}, where each ordered pair represents an action to perform and a computation to do. φ represents the null action, corresponding to only computing. There is no "act only" pair since acting typically involves (non-planning) computation (e.g. reactive control) that requires some or all of the computational resources.

The advantage of considering pairs of actions and computations is that it allows the controller to accept the current action and forgo some expected improvement in favour of using the computational resources to improve the rest of the plan. For example, I may be able to improve the path I am taking to get to my next meeting in a way that saves me more time than the path planning takes. However, I might be better off if I go with the current plan and think about what I am going to say when I get to the meeting instead.
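To make the two selection rules concrete, the following is a minimal Python sketch of each. It is only illustrative: the oracles eu and eu_pair stand in for the idealized knowledge of computation effects and durations that the paper assumes, and all of the names (goods_choice, revised_choice, null_action) are ours, not part of the original work.

```python
def goods_choice(current_best_action, computations, eu):
    """Good's idealized rule: choose the single option -- act now, or run one
    of the candidate computations -- with the highest expected utility.
    `eu` is an (idealized) oracle returning the expected utility of an option."""
    return max([current_best_action, *computations], key=eu)


def revised_choice(current_best_action, computations, eu_pair, null_action=None):
    """Revised rule: choose an (action, computation) pair.  The action slot is
    either the current best action or the null action (compute only); there is
    no 'act only' pair.  `eu_pair` is an oracle over (action, computation) pairs."""
    pairs = [(act, comp)
             for act in (current_best_action, null_action)
             for comp in computations]
    return max(pairs, key=eu_pair)
```

The only structural difference is the search space: Good's rule picks one thing to do next, while the revised rule picks a pair, which is what allows planning and execution to overlap.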
Robot-Courier Tour Planner

In this section, we show how to operationalize both Good's algorithm and our revised algorithm for use by a simple robot courier. The task of the courier robot is to visit a set of locations and return to the initial location as quickly as possible. We assume that the Euclidean distance between locations is an accurate estimate of the travel time. Any ordering of locations is a valid tour, but some tours require more travel than others. The robot can use a simple edge-exchange algorithm (two-opt) to improve its tour, but this computation takes time. The decision to commence execution involves a tradeoff between reducing the length of a tour through more computation and delaying the start of execution.

We begin by characterizing the performance of the two-opt algorithm using a performance curve, as is done in anytime planning (Dean and Boddy 1988). We fit a curve to the empirical results of running the planner on multiple problems and use the curve to predict the results for new problems. This curve is used to supply the information needed to make each of the idealized algorithms operational. The expected improvement in the tour length, as a function of computation time for the two-opt algorithm, is accurately described by equation 1. In the equation, n is the number of locations and t is the elapsed time spent improving the tour.

    I(t, n) = n A (1 - e^{-Bt})                                  (1)

The parameters A and B are determined empirically and depend on the implementation and the computer used. A is a scaling factor that depends on the units used to measure distance: measuring distances in meters rather than kilometers would increase A by a factor of 1000. B depends on the relative speed of the computer: doubling the processing speed would double B.

We can differentiate this equation to get the rate of improvement as a function of the elapsed time:

    İ(t, n) = n A B e^{-Bt}                                      (2)

The original idealized algorithm suggests either computing or acting, whichever one produces the best expected result at a given point in time. This corresponds to an execution control strategy that commences execution when the rate of tour improvement İ(t, n) falls below the rate at which the robot can execute the tour (R_exe):

    İ(t, n) = n A B e^{-Bt} = R_exe                              (3)

If the rate of execution (R_exe) and the parameters of the curve (A and B) are known, then the start time is given by:

    t_start = (1/B) ln(n A B / R_exe)                            (4)

That is, the robot courier should optimize until t_start and then begin execution. We will call this the anytime method for deciding when to begin execution, since it is based solely on the performance curve of the two-opt algorithm, an anytime algorithm.
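The anytime method can be operationalized roughly as follows. This is a sketch under stated assumptions rather than the paper's implementation: the distance function, the helper names, and the time-based loop are ours, and A and B are assumed to have already been fit to empirical runs of the planner (equation 1).

```python
import math
import time


def t_start(n, A, B, r_exe):
    """Equation 4: optimize until the expected improvement rate nAB e^{-Bt}
    drops to the execution rate R_exe."""
    return max(0.0, math.log(n * A * B / r_exe) / B)


def two_opt_pass(tour, dist):
    """One pass of two-opt edge exchange: reverse a tour segment whenever the
    exchange shortens the tour.  Returns True if any improvement was made."""
    n = len(tour)
    improved = False
    for i in range(n - 1):
        # Skip the adjacent wrap-around pair when i == 0.
        for j in range(i + 2, n - (1 if i == 0 else 0)):
            a, b = tour[i], tour[i + 1]
            c, d = tour[j], tour[(j + 1) % n]
            if dist(a, c) + dist(b, d) < dist(a, b) + dist(c, d):
                tour[i + 1:j + 1] = reversed(tour[i + 1:j + 1])
                improved = True
    return improved


def anytime_optimize(tour, dist, A, B, r_exe):
    """Anytime method: run two-opt until t_start (checked between passes),
    then hand the tour over for execution."""
    deadline = time.monotonic() + t_start(len(tour), A, B, r_exe)
    while time.monotonic() < deadline and two_opt_pass(tour, dist):
        pass
    return tour
```

A usage example would supply a Euclidean dist such as `dist = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])` and a list of (x, y) locations as the tour.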
Step-Choice Algorithm

The decision algorithm based on the original idealized algorithm ignores the possible overlap of planning and execution. Taking this overlap into account gives a different decision procedure that can improve performance under some circumstances. When taking the overlap into account, our algorithm makes a choice about whether to begin execution of the next step or delay execution until the planner has had more time to improve the tour. We call our algorithm the step-choice algorithm because it makes discrete choices at each step rather than calculating a t_start before beginning execution. In this section, we give the mathematical basis for the step-choice algorithm. We begin by outlining some assumptions that we make in order to simplify the problem. We then give the formula for deciding when to begin execution of the next action. This formula cannot be solved in closed form, but it can be approximated. We conclude this section with a discussion of the implications of the step-choice algorithm.

In order to simplify the decision problem for the robot courier, we introduce three assumptions. We assume that computation is continuous and interruptible and that its expected future behaviour is predicted by the performance curve given in equation 1. This assumption is also implicitly used by the anytime method given in the previous section. In addition, we assume that execution actions are atomic and that, once started, they must be completed. The reason for using this assumption is to avoid complications associated with interrupting an action in order to start another. Typically, interrupting an action incurs some communications overhead and some effort to put things in a safe state before abandoning the action to begin another. By treating actions as atomic, we avoid the problem of modeling the cost of interrupting an action. We also avoid considering optimizations that involve interrupting the action. Computing the value of such an optimization may depend on how much of the action has been accomplished, which may introduce more communications overhead between the planner and the executer. For example, the distance from the current location to another location is constantly changing as the robot courier executes part of a tour. As a result, the value of interrupting the current action and heading towards a new location is also constantly changing, and the robot would have to know where it was going to be when an action was interrupted in order to evaluate the usefulness of interrupting the action. Finally, we assume that execution does not affect the amount of computation available for tour improvement. This is realistic if tour improvement is carried out on a separate processor from the one used to monitor execution. If the same processor is used for both tour improvement and execution monitoring, then the assumption is almost true if the monitoring overhead is low. If this assumption does not hold, then our algorithm can easily be modified to account for the degradation in planner performance while executing. We make this same assumption when running the anytime algorithm and comparing it to the step-choice algorithm in the next section.

Our revised, idealized algorithm suggests that the robot should begin execution when the expected rate of improvement due to tour planning alone is less than the expected rate of improvement due to planning and executing. As with the anytime algorithm, we characterize the expected improvement in the tour using a performance profile. The rate of tour improvement for computing alone is given in equation 5:

    Compute-only rate = İ(t, n) = n A B e^{-Bt}                              (5)

When the robot chooses to act and compute, the robot is committing to the choice for the duration of the action, given our assumptions about atomic actions. The appropriate rate to use when evaluating this option is not the instantaneous rate but the average rate over the duration of the action. Equation 6 gives the average expected rate of accomplishment for acting and computing. In this equation, Δt is the duration of the first action in the tour.

    Compute-and-act rate = R_exe + (1/Δt) ∫[t, t+Δt] İ(τ, n-1) dτ            (6)

Equating 5 and 6 and solving for Δt gives an expression for the time duration of a move the robot would be willing to execute, as a function of the time spent computing. Unfortunately, the resulting expression has no closed-form solution.¹

¹ The result involves solving the omega function.
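Since equating (5) and (6) has no closed-form solution, the acceptable step duration can instead be found numerically. The sketch below does this by bisection; it is only illustrative, the function names and the bisection bounds are ours, and it relies on the same fitted A, B parameters assumed above.

```python
import math


def compute_only_rate(t, n, A, B):
    """Equation 5: instantaneous improvement rate from planning alone."""
    return n * A * B * math.exp(-B * t)


def compute_and_act_rate(t, n, A, B, r_exe, dt):
    """Equation 6: execution progress plus the average planning improvement
    on the remaining n-1 locations over the action's duration dt."""
    avg_improvement = (n - 1) * A * (math.exp(-B * t) - math.exp(-B * (t + dt))) / dt
    return r_exe + avg_improvement


def max_step_duration(t, n, A, B, r_exe, upper=1e6, tol=1e-6):
    """Largest dt for which acting-and-computing is at least as good as
    computing alone.  The margin is decreasing in dt, so bisection applies."""
    def margin(dt):
        return compute_and_act_rate(t, n, A, B, r_exe, dt) - compute_only_rate(t, n, A, B)

    if margin(upper) >= 0:
        return math.inf        # planning has slowed: execute any remaining step
    if margin(tol) < 0:
        return 0.0             # planning is still too productive: delay execution
    lo, hi = tol, upper
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if margin(mid) >= 0 else (lo, mid)
    return lo
```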
If the integral in equation 6 is replaced by a linear approximation, an expression for Δt can be found (equation 7):

    Δt = -t - (1/B) ln[ ((n+1) A B e^{-Bt} - 2 R_exe) / ((n-1) A B) ]        (7)

Using the linear approximation over-estimates the rate of computational improvement of the move-and-compute option and biases the robot towards moving. On the other hand, equation 6 only approximates the rate of optimization after taking a step: it under-estimates the effect of reducing the size of the optimization problem. Since two-opt is an O(n²) algorithm per exchange, reducing the problem size by one has more than a linear effect, as equation 1 suggests. Since the estimates err in opposite directions, they tend to cancel each other out.

Examining the form of equation 7, it is clear that Δt is infinite if the argument to the ln() function reaches zero. This corresponds to an expected rate of tour improvement of about twice the rate of path execution (equation 8). At this point, the marginal rate of only computing is zero and the robot should execute any size step remaining in the plan. This suggests that t_start should be chosen so that the anytime strategy waits only until the rate of improvement falls below twice the rate of execution:

    İ(t, n) = n A B e^{-Bt} = 2 R_exe                                        (8)

We call this the "Anytime-2" strategy.

In the range where Δt is defined, increasing the rate of execution (R_exe) increases the size of an acceptable step. Similarly, increasing the rate of computation (B) decreases the size of the acceptable action. In the limit, the acceptable step correctly favours either always computing or always acting, as suggested by the relative speed of computing and execution. In between the extremes, the robot courier will take an action whenever the time to perform it is less than Δt, and it does not necessarily delay execution.

Does it make sense to perform an action even when the planner is currently improving the plan more than twice as fast as the robot can execute it? It does if the action is small enough. The amount of potential optimization lost by taking an action is limited by the size of the action: at best, an action could be reduced to zero time (or cost) with further planning. The opportunity cost of forgoing more planning by executing an action is thus limited by the size of the action. Performing short actions at the beginning of the tour removes them from the optimization process and has the advantage that the optimizer is faced with a problem with fewer locations, but nearly the same tour length and nearly the same opportunity for improvement.
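An on-line step-choice decision based on the closed-form approximation (7) and the Anytime-2 threshold (8) might look like the following sketch. As before, the function and parameter names are ours, the formulas are transcribed from equations 7 and 8 as reconstructed above, and an operational system would obtain A and B from the fitted performance curve.

```python
import math


def step_choice_dt(t, n, A, B, r_exe):
    """Equation 7: approximate acceptable step duration after t seconds of
    planning.  Returns math.inf once the improvement rate has fallen to about
    2 * R_exe (equation 8), i.e. execute any remaining step."""
    arg = ((n + 1) * A * B * math.exp(-B * t) - 2 * r_exe) / ((n - 1) * A * B)
    if arg <= 0:
        return math.inf
    return max(0.0, -t - math.log(arg) / B)


def should_start_step(step_duration, t, n, A, B, r_exe):
    """Step-choice rule: begin the next action if it is short enough."""
    return step_duration <= step_choice_dt(t, n, A, B, r_exe)


def anytime2_t_start(n, A, B, r_exe):
    """'Anytime-2' variant: optimize only until the improvement rate
    nAB e^{-Bt} falls below twice the execution rate."""
    return max(0.0, math.log(n * A * B / (2 * r_exe)) / B)
```

In use, the planner would call should_start_step with the duration of the first leg of the current best tour each time it finishes an improvement step, committing to that leg as soon as the test succeeds.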
Results and Limits

Using an on-line decision strategy that executes actions that take less than Δt allows the robot to be opportunistic. If at any point in the tour improvement process the current next move is smaller than Δt, then the agent can begin moving immediately. In this way, the agent takes advantage of the planner's performance on the given problem rather than relying only on its expected performance. If the robot gets lucky and the initial random tour contains short actions at the start of the tour, then the robot can immediately begin execution. This suggests that the robot might be better off generating an initial tour using a greedy algorithm and then applying two-opt, rather than applying two-opt to an initial random tour. In our empirical comparison, we ignored this potential optimization to avoid biasing the results in favour of the step-choice algorithm.

In previously reported results, we have shown that using the step-choice algorithm produces better performance than meta-level control schemes based on Good's algorithm (Goodwin 1994). In a test using 100 examples, each consisting of 200 randomly generated locations, the step-choice algorithm gave a 4.5% performance gain when compared to the anytime algorithm approach, despite incurring a 2.5% overhead for performing on-line meta-level control.

The empirical results from the robot-courier tour planner show a slight improvement due to taking the overlapping of planning and execution into account when deciding when to begin execution. The question we consider in this section is how much of an improvement is possible. We present an argument that the improvement is limited to doubling performance, that is, reducing the total time by a factor of two.

[Figure 1; x-axis: Time]

Consider a task where the planner is improving a plan at a rate just slightly faster than the rate at which the plan can be executed (İ(t, n) = R_exe + ε_1), as illustrated in figure 1. Good's original algorithm would continue to optimize the plan but not execute any part of it. The step-choice algorithm would attempt to both plan and execute. Its rate of task achievement is İ(t, n-1) + R_exe = 2 R_exe + ε_1 - ε_2, where ε_2 = İ(t, n) - İ(t, n-1). The ratio of the rates of task accomplishment approaches 2 as ε_1 and ε_2 approach zero. If the rate of plan improvement remains relatively constant over the duration of the task, then the revised algorithm performs the task almost twice as quickly as the original algorithm. It is only when the two rates, improvement and execution, are almost equal that considering overlapping has a significant effect. If either rate is much higher than the other, then the relative improvement is not as significant.
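To make the bound concrete, here is a small numeric check of the ratio argument. The values of R_exe and the epsilons are illustrative only; the point is simply that the ratio tends to 2 as both epsilons shrink.

```python
def rate_ratio(r_exe, eps1, eps2):
    """Ratio of the overlapping (step-choice) rate, 2*R_exe + eps1 - eps2, to
    the compute-only rate of the original strategy, I_dot(t, n) = R_exe + eps1."""
    return (2 * r_exe + eps1 - eps2) / (r_exe + eps1)


# Illustrative values only: improvement rate barely above the execution rate.
for eps in (0.5, 0.1, 0.01):
    print(eps, round(rate_ratio(r_exe=1.0, eps1=eps, eps2=eps / 2), 3))
# Prints ratios approaching 2.0 as eps shrinks (about 1.5, 1.86, 1.985).
```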
Discussion

The fact that overlapping execution with optimization can double performance is not surprising, since we are basically splitting a task into two parts and working on each one separately. What is surprising is that the straightforward strategy of waiting until the rate of optimization falls below the rate of execution can perform only half as well as the optimal strategy. This illustrates that we must be careful about the assumptions we make when designing meta-level control algorithms. We conclude with a discussion of some of the factors that should be considered when deciding when to commence execution, beyond the obvious factors that we have already considered. We also consider some of the extensions needed to deal with less benign and probabilistic domains.

The step-choice meta-level control algorithm works for the simplified robot courier because the cost of undoing an action is the same as the cost of performing the action. In fact, because of the triangle inequality, it is rare that an action has to be fully undone: the robot courier would only retrace its path if the locations were co-linear and the new location was in the opposite direction, since the shortest route proceeds directly to the next location. At the other end of the spectrum are domains such as chess, where it is impossible to undo a move.² In between, there are many domains with a distribution of recovery costs. Consider, as an example, leaving for the office in the morning. Suppose that it is your first day at work, so there is no pre-compiled, optimized plan yet. The first step in your plan is to walk out the door and then to walk to the car. Suppose you do this and, while you are walking, your planner discovers that it would be better to bring your briefcase. The cost of recovery is relatively cheap: just walk back into the house and get it. On the other hand, if you forgot your keys, you may now be locked out of the house. In this sort of environment, using only the cost to perform an action may not be the best strategy. The decision to begin execution must consider not only the time and resources needed to execute an action but also the time and resources needed to undo the action and the probability that the action will need to be undone.

² You may be able to move a piece back to its previous location, but only after your opponent has had an opportunity to move, so the board will not be in the original configuration unless your opponent also undoes his move.

In probabilistic domains, we face additional complications since we cannot necessarily predict the outcome of an action. One of the reasons for executing actions in a probabilistic environment is to gather information about the resulting state, which allows us to limit the number of states that needs to be considered. However, the ability to predict the likely results of an action is useful for scheduling computation during execution: computation should be focused on finding the best actions for the most likely next states. In (Goodwin 1994), we give an example where the expected performance increases when a robot operating in a probabilistic environment begins taking a step that may fail and continues planning as if the step had succeeded. If the action succeeds, the robot continues with its plan. If the action fails, the robot throws away the plan from the anticipated state and plans from its current state. Again, this works because no failures are catastrophic and recovery costs are small.

References

Rodney Brooks. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation, RA-2:14-23, April 1986.

Thomas Dean and Mark Boddy. An analysis of time-dependent planning. In Proceedings of the Seventh National Conference on Artificial Intelligence (AAAI-88), pages 49-54. AAAI, 1988.

I. J. Good. Twenty-seven principles of rationality. In Proceedings of the Symposium on the Foundations of Statistical Inference. Rene Descartes Foundation, March 1971.

Richard Goodwin. Reasoning about when to start acting. In K. Hammond, editor, Proceedings of the Second International Conference on Artificial Intelligence Planning Systems (AIPS-94), pages 86-91, Chicago, Illinois, June 1994.

Richard E. Korf. Real-time heuristic search: First results. In Proceedings of the Sixth National Conference on Artificial Intelligence (AAAI-87), pages 133-138, 1987.

Stuart Russell and Eric Wefald. Do the Right Thing. MIT Press, 1991.

Marcel Schoppers. Universal plans for reactive robots in unpredictable environments. In Proceedings of the Tenth International Joint Conference on Artificial Intelligence (IJCAI-87), Milan, Italy, August 1987. IJCAI.

Herbert Simon and Joseph Kadane. Optimal problem-solving search: All-or-nothing solutions. Technical Report CMU-CS-74-41, Carnegie Mellon University, Pittsburgh, PA, USA, December 1974.

Lucy Suchman. Plans and Situated Actions: The Problem of Human-Machine Communication. Cambridge University Press, 1987.