From: AAAI Technical Report SS-95-07. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.

How to Avoid "Thinking on Your Feet"

Lloyd Greenwald*
Department of Computer Science
Brown University, Box 1910, Providence, RI 02912

The common notion of "thinking on your feet" refers to the ability to quickly make decisions in response to changing situations. Implied in this concept is that reasoning about taking action takes place at (or very near) the time of action. In this note we argue that such a skill is neither practical nor necessary in general. A key research issue for approaches to taking time-critical action in dynamic, uncertain environments is to determine when it is most effective to reason about taking action.

Over the last decade there has been a shift away from traditional AI reasoning and representation in systems that must take time-critical actions in dynamic, uncertain environments. Many approaches are converging toward a theory of action that adopts feedback-based paradigms in order to achieve purposeful reactive behavior. One of the primary difficulties distinguishing alternative approaches is the tension between anticipating all possible contingencies and deliberating quickly in time-critical situations. Some have argued that what is at issue is a classic space-time tradeoff. This implies a constant level of resource expenditure that can be divided between pre-compiling and storing actions prior to execution and reasoning about actions during execution. We argue here that the real issue is more subtle. The issue is not only how much time to spend reasoning about actions but also when to spend that time and what information to bring to bear on the problem. Strategic use of time can lead not only to a shifting of resources between execution-time computation and storage but to an efficient optimization of overall resource requirements.
The focus of our work is to take advantage of structure in the problem domain and to make use of off-line meta-reasoning. The structure we require is: a model that can be used for probabilistic prediction of how the environment will evolve over time, in response to both system actions and exogenous processes; and a model that can be used to estimate the quality of actions that decision procedures available to the system will attain for a given amount of computational resources and period for processing. We make use of this information to optimize the allocation of limited on-line resources in order to combine reactive response with adaptable on-line deliberation. By doing so we may design systems that achieve the appearance of thinking on their feet without the burden of performing all reasoning about action under severe time constraints.

*This work was supported in part by a National Science Foundation Presidential Young Investigator Award IRI-8957601 and by the Air Force and the Advanced Research Projects Agency of the Department of Defense under Contract No. F30602-91-C-0041.

When to Reason about Action

Time-critical action in dynamic, uncertain worlds is hard. There is no reason to believe that any system can act optimally in all possible situations it may encounter, whether it acts in response to situations or tries to anticipate situations ahead of time. Formal arguments along these lines have been presented by Ginsberg (Ginsberg 1989b). Nevertheless, there are important differences between alternative approaches to the logistics of reasoning about action. These differences can directly affect system performance. Two opposing approaches are to anticipate all possible contingencies prior to taking action (off-line) or to do no anticipation at all and perform all reasoning at execution time (on-line). These represent extremes along a continuum of approaches available to the system designer.
Ginsberg (Ginsberg 1989a) poses the following related questions that systems that take time-critical action in dynamic, uncertain domains should address:

1. When should a system devote resources to the generation of plans to be cached, and which plans should be generated?
2. When should a plan be cached?
3. When is a cached plan relevant to a particular action decision?
4. Can cached plans be debugged in some way that extends their usefulness at run time to situations outside those for which they were initially designed?

Pre-compilation of Actions

One of the earliest uses of pre-compilation of actions is Nilsson's (Fikes et al. 1972, Nilsson 1985) work on triangle tables. In this work, traditional plans are converted into sets of state-operator pairs that may be accessed in any order (independent of the original plan sequence). Schoppers (Schoppers 1987) generalizes this to the notion of universal plans, in which planning continues until actions have been computed for all possible distinguishable situations. A universal plan is constructed off-line for a particular domain with a given goal and then executed without modification. The resulting reactions specify primitive physical robot manipulations. Schoppers (Schoppers 1989) argues that production systems are similarly constructed and provide actions that are simply reactions to the contents of working memory. Similarly, work has been done by Barto, Bradtke and Singh (Barto et al. 1993) on dynamic programming approaches that compile planning results into reactive strategies for real-time control. The primary argument for this approach (Schoppers 1987) is that by performing all reasoning prior to execution, time-critical reaction can be achieved even in the most complex, uncertain situations. The primary argument against this approach (Ginsberg 1989b) is the enormous cost of anticipating all possible situations. Some have argued that pre-compilation of actions is appropriate in certain domains.
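The table-lookup character of a universal plan can be made concrete with a small sketch. The corridor domain, function names, and action labels below are illustrative assumptions rather than Schoppers' actual formulation; the point is only that every distinguishable situation is assigned an action off-line, so execution requires no run-time reasoning.

```python
def compile_universal_plan(n_cells, goal):
    """Off-line phase: compute an action for every possible situation
    in a toy 1-D corridor domain (hypothetical example)."""
    plan = {}
    for cell in range(n_cells):
        if cell < goal:
            plan[cell] = "right"
        elif cell > goal:
            plan[cell] = "left"
        else:
            plan[cell] = "stop"
    return plan

def execute(plan, sensed_cell):
    """On-line phase: pure reaction -- a constant-time table lookup."""
    return plan[sensed_cell]

plan = compile_universal_plan(n_cells=10, goal=7)

# Wherever the robot finds itself -- even after an exogenous event
# displaces it -- the plan prescribes an action immediately.
execute(plan, 2)   # "right"
execute(plan, 9)   # "left"
```

The cost Ginsberg objects to is visible here as well: the table grows with the number of distinguishable situations, which in realistic domains is enormous.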
Agre and Chapman's (Agre and Chapman 1987) research explores some domains that are effectively solved by precompiling behaviors into pure reaction.

Pre-compilation of Actions with Run-Time Parameters

Nilsson (Nilsson 1994), in his teleo-reactive programs, introduces another approach to pre-compilation of actions in which he allows for run-time binding of parameters and run-time compilation of action circuits. He argues that the advantage of run-time compilation is that it reduces the amount of circuitry that must be produced. In a sense, run-time compilation is just another way to ensure that the system performs pre-compilation of actions as temporally close as possible to when those actions may be needed, without actually performing reasoning during execution. All information known about the problem just prior to execution time can be used to compile action circuits for the problem. The effect is that only the circuits most relevant to the current problem are instantiated, where relevance is defined in terms of initial conditions and not in terms of any assumptions about the environment itself.

On-Demand Reasoning about Action

When discussing systems that provide computationally complex reasoning in response to on-line situations, we are only interested in those that do so for time-critical problems in dynamic, uncertain environments. Purely reactive systems are attractive in these environments because the hard real-time requirement may be arbitrarily small. In general, the amount of time available for reasoning may vary or may be characterized by soft deadlines. Several approaches have been proposed to vary the amount of reasoning performed to fit the response requirements. Among these are anytime algorithms (Dean and Boddy 1988, Boddy and Dean 1989), flexible computations (Horvitz 1987), imprecise computations (Shih et al. 1989), and design-to-time scheduling (Garvey and Lesser 1993).
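The anytime-algorithm idea can be sketched with any iterative procedure that returns its best answer so far when interrupted. The square-root example below is a hypothetical stand-in for a planner's refinement loop; the step budget plays the role of a soft deadline.

```python
def anytime_sqrt(x, budget_steps):
    """Newton iteration for sqrt(x): interruptible after any step,
    with answer quality improving monotonically in the time granted."""
    estimate = x if x > 1 else 1.0
    for _ in range(budget_steps):   # stop whenever the deadline arrives
        estimate = 0.5 * (estimate + x / estimate)
    return estimate

def error(x, steps):
    return abs(anytime_sqrt(x, steps) - x ** 0.5)

# A tighter deadline yields a rougher answer; a looser one, a better one.
rough = error(2.0, 2)
better = error(2.0, 5)
```

Graceful degradation of this kind is what distinguishes soft-deadline approaches from purely reactive systems, whose response time is fixed but whose answers cannot improve when extra time happens to be available.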
Musliner (Musliner 1993) argues that anytime algorithms are insufficient for hard real-time response requirements. His methods guarantee a level of response subject to hard real-time constraints. Georgeff and Lansky's procedural reasoning system (PRS) (Georgeff and Lansky 1987) controls on-line computation using hand-coded procedures and on-line information. PRS is similar to Nilsson's (Nilsson 1994) teleo-reactive programs in that it attempts to avoid overcommitment by reducing the amount of advance planning. A related approach is the work of Lyons and Hendriks (Lyons and Hendriks 1994). Their approach is to construct reactive plans incrementally on-line in response to unanticipated situations. Hammond's (Hammond 1989) case-based planning and opportunistic memory perform similarly. In that work a planner reasons on-line about combining plans in response to execution-time opportunities. It then caches the results for future use (both in planning and execution).

Combining Deliberation and Reaction

Chapman (Chapman 1989) argues that combining a classical planner with a reactive system seems to combine the worst features of both. Nevertheless, many systems have been proposed that combine deliberation and reaction, particularly for mobile robots. Examples of these systems include Gat's (Gat 1991) and Simmons' (Simmons 1991). In these systems a deliberative reasoning component is combined with a reactive control component. They differ in how the coordination between the components is managed. Pryor and Collins (Pryor and Collins 1992) argue that opportunistic on-demand reasoning is the key to combining deliberation and reaction.

What Information to Employ when Reasoning about Action

In conjunction with determining when to reason about action is the problem of determining what information to bring to bear on the problem.
One of the drawbacks of universal plans is that they do not take advantage of information about the environment and must, therefore, anticipate all possible situations that could ever occur during any run of the system. Three sample forms of information are: models of completeness, models of the environment and models of computation.

Models of Completeness

A complete (or universal) reaction plan has a pre-compiled action for every possible situation. However, this is not strictly necessary for completeness. Schoppers (Schoppers 1987), for example, uses conditions (propositions) to partition situations into classes for which the effects of actions are equivalent. This is, in effect, what is done by traditional AI systems that define actions in terms of conditions rather than situations. Schoppers (Schoppers 1989) argues that the computational complexity of universal plans is reduced because predicates are used instead of actual states. Many other systems for reasoning about actions do not attempt to provide completeness. Brooks (Brooks 1986), for example, relies upon the tight feedback in his circuits to respond to all situations, but he does not guarantee that the programmer in his Behavior Language has anticipated all situations. PRS (Georgeff and Lansky 1987) does not guarantee completeness. Dean, Kaelbling, Kirman and Nicholson (Dean et al. 1993) handle large state spaces by restricting attention to a subset of possible situations and provide a default reaction for those that are neglected. Musliner (Musliner 1993) effectively restricts attention to a subset of possible situations and guarantees that the system stays within that subset until the AI subsystem provides a new set of situation/reaction pairs. Similarly, Ramadge and Wonham (Ramadge and Wonham 1989) describe supervisory control of discrete event systems (DES) in which failure states are guaranteed unreachable.
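A minimal sketch of these two completeness devices, with invented predicates and actions: reactions are indexed by predicate classes rather than concrete states (in the spirit of Schoppers), and any state no rule anticipates falls through to a default reaction (in the spirit of Dean et al.).

```python
# Each rule pairs a predicate (a class of situations) with a reaction.
RULES = [
    (lambda s: s["obstacle_ahead"],     "swerve"),
    (lambda s: s["battery"] < 0.2,      "return_to_base"),
    (lambda s: s["at_goal"],            "stop"),
]
DEFAULT = "advance"   # reaction for every state the rules neglect

def react(state):
    for predicate, action in RULES:
        if predicate(state):
            return action
    return DEFAULT

s = {"obstacle_ahead": False, "battery": 0.9, "at_goal": False}
react(s)                          # no rule fires: the default applies
react({**s, "battery": 0.1})      # "return_to_base"
```

A handful of predicates covers an arbitrarily large space of concrete states, which is Schoppers' complexity argument in miniature; the default reaction buys completeness without enumerating the neglected states.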
Abstraction and aggregation techniques may be used to reduce the computational resources needed to construct or execute a reaction plan. Boutilier and Dearden (Boutilier and Dearden 1994) build abstract world views for planning in time-constrained domains. In addition to affecting completeness, abstraction may affect the obtainable value of the associated reaction plan.

Models of the Environment

Many systems that are designed to reason about taking action in dynamic, uncertain environments do not take advantage of any a priori knowledge of the environment. In some cases this leads to losses of computational efficiency. In others it leads to reductions in performance. Some reasons to model the environment are to predict distributions of reachable states based on available information, to account for the effects of exogenous events, or to deduce patterns in the dynamics of the environment. Both Nilsson (Nilsson 1994) and Schoppers (Schoppers 1987) assume deterministic effects when planning actions for nondeterministic environments. The penalty for this is that they must provide actions for all possible situations in order to gain completeness. Furthermore, these assumptions may lead to unintended interactions between actions. Drummond, Bresina and Swanson (Drummond et al. 1994) use a stochastic model of action duration in order to construct contingent schedules off-line. In cases in which they successfully anticipate schedule breaks, they avoid the computational costs of on-demand reasoning during time-critical on-line execution periods. Both Dean, Kaelbling, Kirman and Nicholson (Dean et al. 1993) and Musliner (Musliner 1993) use knowledge of environmental dynamics to restrict attention to subsets of the space of situations at any given time. However, Musliner does not explicitly deal with models of uncertainty.
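The use of a stochastic environment model to restrict attention can be sketched as follows. The small Markov chain and the probability threshold are illustrative assumptions, not the mechanism of any of the cited systems; the point is that prediction prunes the set of situations for which reactions must be prepared.

```python
# transitions[s] = list of (successor state, probability) pairs
TRANSITIONS = {
    "calm":  [("calm", 0.9), ("windy", 0.1)],
    "windy": [("windy", 0.6), ("storm", 0.3), ("calm", 0.1)],
    "storm": [("storm", 0.8), ("windy", 0.2)],
}

def likely_states(start, horizon, threshold=0.05):
    """Propagate the state distribution forward and keep only states
    whose probability ever exceeds the threshold."""
    dist = {start: 1.0}
    keep = {start}
    for _ in range(horizon):
        nxt = {}
        for state, p in dist.items():
            for succ, q in TRANSITIONS[state]:
                nxt[succ] = nxt.get(succ, 0.0) + p * q
        dist = nxt
        keep |= {s for s, p in dist.items() if p > threshold}
    return keep

# Starting calm with a 2-step horizon, "storm" stays below threshold,
# so no reactions need be prepared for it yet.
likely_states("calm", horizon=2)   # {"calm", "windy"}
```

With a longer horizon the storm state crosses the threshold and re-enters the deliberation envelope, which is the sense in which the restricted subset shifts with the dynamics.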
Drummond and Bresina (Drummond and Bresina 1990) take into account a stochastic model of the environment in simulating the possible effects of actions and sampling the space of situations. Kushmerick, Hanks and Weld (Kushmerick et al. 1993) also take advantage of models in order to synthesize plans in stochastic domains. Lansky (Lansky 1988) has developed planning systems for deterministic domains that exploit structural properties of the state space to expedite planning.

Models of Computation

There are various ways that a system may take advantage of models of its own computation. Models that may be used for meta-reasoning include profiles of computational performance, models of guaranteed response time, and models of worst-case behavior. These models may be combined with models of the environment to provide significant computation and performance benefits. Kaelbling's (Kaelbling 1988) Gapps and Rex systems compile behavior into circuits with guaranteed constant computation cycles. Musliner (Musliner 1993) guarantees worst-case computation cycles. Georgeff and Lansky's (Georgeff and Lansky 1987) PRS does meta-level reasoning on-line using models of the procedures in its library as well as the current beliefs, goals and intentions of the system. Dean and Boddy (Dean and Boddy 1988, Boddy and Dean 1989) and Zilberstein (Zilberstein 1993) capture quality-time performance tradeoffs of anytime algorithms using performance profiles. Meta-level deliberation scheduling is then employed to allocate limited computational resources. Conditional performance profiles are used to represent the dependence of the performance of a deliberation procedure on various input parameters. Simon and Kadane (Simon and Kadane 1975) consider the case of allocating time to search in which deliberation time is quantized into chunks of fixed, though not necessarily constant, size.
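A deliberation-scheduling step in this style can be sketched by allocating quantized time slices greedily against assumed performance profiles. The quality model, step size, and difficulty numbers below are invented for illustration; they are not the cited authors' profiles.

```python
def quality(difficulty, seconds):
    """Assumed performance profile: diminishing returns in time, with
    harder instances accruing quality more slowly."""
    return 1.0 - 0.5 ** (seconds / difficulty)

def allocate(difficulties, budget, step=0.1):
    """Hand out fixed-size slices (cf. Simon and Kadane's chunks) to
    whichever instance gains the most quality from the next slice."""
    alloc = [0.0] * len(difficulties)
    for _ in range(round(budget / step)):
        gains = [
            quality(d, a + step) - quality(d, a)
            for d, a in zip(difficulties, alloc)
        ]
        alloc[gains.index(max(gains))] += step
    return alloc

# Two easy instances and one hard one under a 3-second budget: the hard
# instance is starved, because its profile says time buys it too little.
shares = allocate([1.0, 1.0, 4.0], budget=3.0)   # roughly [1.5, 1.5, 0.0]
```

That the scheduler may decline to deliberate at all on an unpromising instance is exactly the quality-time tradeoff the performance profile makes explicit.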
Etzioni (Etzioni 1991) considers a similar model in which the deliberative chunks correspond to different problem-solving methods. Russell and Wefald (Russell and Wefald 1989, Russell and Wefald 1991) describe a general approach in which deliberation is considered as a sequence of inference steps leading to the performance of an action, where the value of an action is time-dependent. Horvitz (Horvitz 1990) constructs time-dependent utility models and models dynamically changing computation costs to determine on-line when to halt computation and take action. He also uses off-line analysis to indicate when on-line meta-reasoning is valuable. Garvey and Lesser (Garvey and Lesser 1993) model the task structure of solution methods and their dependence, duration and quality. They use this structure to find execution methods that optimize quality for the available time.

Response Planning Approach

In the work of Greenwald and Dean on response planning (Greenwald and Dean 1995, Greenwald and Dean 1994) we propose an approach to reasoning about taking action that combines reactive response with adaptable on-line deliberation in solving time-critical planning and scheduling problems. We exploit known structure and models of the environment in order to optimize off-line the allocation of scarce on-line processing resources. In this work we consider on-line decision-making for production, inventory and distribution (logistics) problems. Our primary tool is to compute off-line both pre-compiled default reactions and strategies that specify the on-line dispatching and control of decision procedures. Compilation of strategies takes into account models of the anticipated computational demands of decision procedures under on-line conditions. Ginsberg (Ginsberg 1989b) proposes that a defining difference between cognitive and noncognitive behavior is the ability to expend additional mental (computational) resources when necessary.
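Ginsberg's criterion of expending additional computational resources only when necessary is essentially the halting decision that Horvitz and that Russell and Wefald formalize. A minimal sketch with invented utility curves (both functions below are assumptions, not any published model):

```python
def expected_quality(steps):
    """Assumed performance profile: diminishing returns."""
    return 1.0 - 0.5 ** steps

def delay_cost(steps, rate=0.03):
    """Assumed time-dependent cost of acting later."""
    return rate * steps

def deliberate():
    """Think only while the next step's expected quality gain exceeds
    its marginal delay cost; then act."""
    steps = 0
    while expected_quality(steps + 1) - expected_quality(steps) > 0.03:
        steps += 1
    return steps, expected_quality(steps) - delay_cost(steps)

steps, net_utility = deliberate()   # halts after 5 steps
```

The halting point moves with the cost rate: as deadlines tighten (a larger rate), the rule commits to action earlier with a rougher answer.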
The response planning approach combines the reactivity of off-line compilation of universal plans with the adaptability of on-line decision-making and on-demand processing. On-demand processing is used on predicted situations rather than in response to the current situation. Careful prediction allows us to restrict our reasoning to a small subset of the state space. We use stochastic models of the underlying domain to predict reachable states, and models of the computational properties of available decision procedures to determine how best to allocate on-line resources. Our optimizations involve taking advantage of variabilities in anticipated computational demand across time. We address Ginsberg's complaints about reactive systems by allowing flexibility in both the amount of thinking we do and when we do the thinking. We have developed the planning and execution architecture of Figure 1, which explicitly considers the problem of dynamically shifting deliberation focus. This architecture allows for concurrent planning and execution that is tailored to the dynamic nature of the environment. The planning and execution component is divided into several on-line deliberative components and an on-line reactive component that executes the non-stationary policies generated by the deliberative components. The policies are generated on-line by deliberative decision procedures f. Due to the combinatorics involved in deliberative processing there is typically a delay Δ between the time a state triggers the deliberative process and the time the resulting policy is available for execution. The choice of decision procedure and appropriate calling parameters is determined by the strategy table. A stochastic model of the environment is used both during on-line deliberation to predict future states and during strategy table construction to discern patterns of shifting dynamics.
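The interaction between prediction and the deliberation delay Δ can be sketched as a control loop. The drift model, decision procedure, tick counts, and names below are invented stand-ins for the components just described: deliberation is triggered on the state predicted for its completion time, while a pre-compiled default reaction covers the interim.

```python
DELAY = 3   # ticks between triggering deliberation and policy delivery

def predict(state, ticks):
    """Deterministic stand-in for the stochastic model: assumed demand
    drifts upward one unit per tick."""
    return state + ticks

def decision_procedure(state):
    """Expensive deliberation, tailoring a policy to the given state."""
    return f"policy_for_demand_{state}"

DEFAULT_REACTION = "hold_inventory"   # pre-compiled fallback

def control_loop(initial_state, horizon):
    state, policy, pending, ready_at = initial_state, None, None, None
    log = []
    for t in range(horizon):
        if ready_at is None:            # trigger deliberation on the
            pending = decision_procedure(predict(state, DELAY))
            ready_at = t + DELAY        # ...state predicted at delivery
        elif t >= ready_at:             # delayed policy arrives
            policy, ready_at = pending, None
        log.append(policy if policy else DEFAULT_REACTION)
        state += 1                      # environment evolves meanwhile
    return log

# The default reaction covers the first DELAY ticks; the policy that
# then arrives was computed for the state holding at its arrival.
log = control_loop(initial_state=0, horizon=5)
```

Deliberating about the predicted rather than the current state is what keeps the delayed policy relevant when it finally becomes executable.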
[Figure 1: Planning and Execution Architecture. Components: Environment g(x(t), u(t)); Model of Environment; Off-Line Strategy Table; On-Line Deliberation f(...); On-Line Reactivity; Current State x(t); Predicted States; Action Policy.]

This approach is particularly suited to problems that can be modelled by sequences of problem instances with predictable, time-varying problem difficulty, in which the existence of exogenous processes and real-time requirements makes computational delay costly. Our approach provides substantial improvements over fixed allocations of resources in these problems. Off-line processing is used to alleviate the need for expensive on-line meta-reasoning while maintaining the adaptability of on-line decision-making in the face of uncertain information.

We argue that we have addressed three of Ginsberg's four questions. We use deliberation scheduling and on-line strategy tables to specify when a system should devote resources to the generation of plans to be cached and which plans should be generated. We use prediction based on models of the environment to determine when a plan should be cached. And we use situation-specific reaction to determine when a cached plan is relevant to a particular action decision. Furthermore, we are exploring abstraction issues to help extend cached plans to situations outside those for which they were initially designed.

Acknowledgements

Thanks to Mike Williamson and Thomas Dean for their comments on this paper.

References

Agre, Philip E. and Chapman, David 1987. Pengi: An implementation of a theory of activity. In Proceedings AAAI-87. AAAI. 268-272.

Barto, Andrew G.; Bradtke, Steven J.; and Singh, Satinder P. 1993. Learning to act using real-time dynamic programming. Submitted to AIJ Special Issue on Computational Theories of Interaction and Agency.

Boddy, Mark and Dean, Thomas 1989. Solving time-dependent planning problems. In Proceedings IJCAI 11. IJCAII. 979-984.

Boutilier, C. and Dearden, R.
1994. Using abstractions for decision theoretic planning with time constraints. In Proceedings AAAI-94. AAAI.

Brooks, Rodney A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 2:14-23.

Chapman, David 1989. Penguins can make cake. AI Magazine 10(4):45-50.

Dean, Thomas and Boddy, Mark 1988. An analysis of time-dependent planning. In Proceedings AAAI-88. AAAI. 49-54.

Dean, Thomas; Kaelbling, Leslie; Kirman, Jak; and Nicholson, Ann 1993. Planning with deadlines in stochastic domains. In Proceedings AAAI-93. AAAI. 574-579.

Drummond, Mark and Bresina, John 1990. Anytime synthetic projection: Maximizing the probability of goal satisfaction. In Proceedings AAAI-90. AAAI. 138-144.

Drummond, Mark; Bresina, John; and Swanson, Keith 1994. Just-in-case scheduling. In Proceedings AAAI-94. AAAI.

Etzioni, Oren 1991. Embedding decision-analytic control in a learning architecture. Artificial Intelligence 49(1-3):129-160.

Fikes, Richard E.; Hart, Peter E.; and Nilsson, Nils J. 1972. Learning and executing generalized robot plans. Artificial Intelligence 3:251-288.

Garvey, Alan and Lesser, Victor 1993. A survey of research in deliberative real-time artificial intelligence. Technical Report 93-84, University of Massachusetts Department of Computer Science, Amherst, MA.

Gat, Erann 1991. Integrating reaction and planning in a heterogeneous asynchronous architecture for mobile robot navigation. SIGART Bulletin 2(4).

Georgeff, Michael P. and Lansky, Amy L. 1987. Reactive reasoning and planning. In Proceedings AAAI-87. AAAI. 677-682.

Ginsberg, Matthew L. 1989a. Ginsberg replies to Chapman and Schoppers: Universal planning research: A good or bad idea? AI Magazine 10(4):61-62.

Ginsberg, Matthew L. 1989b. Universal planning: An (almost) universally bad idea. AI Magazine 10(4):40-44.

Greenwald, Lloyd and Dean, Thomas 1994. Solving time-critical decision-making problems with predictable computational demands.
In Second International Conference on AI Planning Systems.

Greenwald, Lloyd and Dean, Thomas 1995. Anticipating computational demands when solving time-critical decision-making problems. In Goldberg, K.; Halperin, D.; Latombe, J.C.; and Wilson, R., editors 1995, The Algorithmic Foundations of Robotics. A. K. Peters, Boston, MA. Proceedings from the workshop held in February, 1994.

Hammond, Kristian J. 1989. Opportunistic memory. In Proceedings IJCAI 11. IJCAII.

Horvitz, Eric J. 1987. Reasoning about beliefs and actions under computational resource constraints. In Proceedings of the 1987 Workshop on Uncertainty in Artificial Intelligence.

Horvitz, Eric J. 1990. Computation and action under bounded resources. Ph.D. Thesis, Stanford University Departments of Computer Science and Medicine, Stanford, California.

Kaelbling, Leslie Pack 1988. Goals as parallel program specifications. In Proceedings AAAI-88. AAAI. 60-65.

Kushmerick, N.; Hanks, S.; and Weld, D. 1993. An Algorithm for Probabilistic Planning. Technical Report 93-06-03, University of Washington Department of Computer Science and Engineering. To appear in Artificial Intelligence.

Lansky, Amy L. 1988. Localized event-based reasoning for multiagent domains. Computational Intelligence 4(4).

Lyons, D. M. and Hendriks, A. J. 1994. Testing incremental adaptation. In Second International Conference on AI Planning Systems.

Musliner, David John 1993. CIRCA: The Cooperative Intelligent Real-Time Control Architecture. Ph.D. Dissertation, The University of Michigan, Ann Arbor, MI.

Nilsson, Nils 1985. Triangle tables: a proposal for a robot programming language. Technical report, SRI International.

Nilsson, Nils 1994. Teleo-reactive programs for agent control. Journal of Artificial Intelligence Research 1:139-158.

Pryor, Louise and Collins, Gregg 1992. Reference features as guides to reasoning about opportunities. In Proceedings Fourteenth Annual Conference of the Cognitive Science Society. Cognitive Science Society.
230-235.

Ramadge, Peter and Wonham, Murray 1989. The control of discrete event systems. Proceedings of the IEEE 77(1):81-98.

Russell, Stuart J. and Wefald, Eric H. 1989. Principles of metareasoning. In Brachman, Ronald J.; Levesque, Hector J.; and Reiter, Raymond, editors 1989, Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning. Morgan-Kaufmann, Los Altos, California. 400-411.

Russell, Stuart J. and Wefald, Eric H. 1991. Do the Right Thing: Studies in Limited Rationality. MIT Press, Cambridge, MA.

Schoppers, Marcel J. 1987. Universal plans for reactive robots in unpredictable environments. In Proceedings IJCAI 10. IJCAII. 1039-1046.

Schoppers, Marcel J. 1989. In defense of reaction plans as caches. AI Magazine 10(4):51-60.

Shih, W-K.; Liu, J. W. S.; and Chung, J-Y. 1989. Fast algorithms for scheduling imprecise computations. In Proceedings of the Real-Time Systems Symposium. IEEE. 12-19.

Simmons, Reid 1991. Coordinating planning, perception, and action for mobile robots. SIGART Bulletin 2(4).

Simon, Herbert A. and Kadane, Joseph B. 1975. Optimal problem-solving search: All-or-none solutions. Artificial Intelligence 6:235-247.

Zilberstein, Shlomo 1993. Operational Rationality through Compilation of Anytime Algorithms. Ph.D. Dissertation, University of California at Berkeley.