How to Avoid "Thinking on Your Feet"

From: AAAI Technical Report SS-95-07. Compilation copyright © 1995, AAAI (www.aaai.org). All rights reserved.
Lloyd Greenwald*
Department of Computer Science
Brown University, Box 1910, Providence, RI 02912
The common notion of "thinking on your feet" refers to the ability to make decisions quickly in response to changing situations. Implicit in this concept is that reasoning about taking action takes place at (or very near) the time of action. In this note we argue that such a skill is, in general, neither practical nor necessary. A key research issue for approaches to taking time-critical action in dynamic, uncertain environments is determining when it is most effective to reason about taking action.
Over the last decade there has been a shift away from traditional AI reasoning and representation when dealing with systems that must take time-critical actions in dynamic, uncertain environments. Many approaches are converging toward a theory of action that adopts feedback-based paradigms in order to achieve purposeful reactive behavior. One of the primary difficulties distinguishing alternative approaches is the tension between anticipating all possible contingencies and deliberating quickly in time-critical situations.
Some have argued that what is at issue is a classic space-time tradeoff. This implies a constant level of resource expenditure that can be divided between pre-compiling and storing actions prior to execution and reasoning about actions during execution. We argue here that the real issue is more subtle. The issue is not only how much time to spend reasoning about actions but also when to spend that time and what information to bring to bear on the problem. Strategic use of time can lead not only to a shifting of resources between execution-time computation and storage but to an efficient optimization of overall resource requirements.
The focus of our work is to take advantage of structure in the problem domain and to make use of off-line meta-reasoning. The structure we require is: a model that can be used for probabilistic prediction of how the environment will evolve over time, in response to both system actions and exogenous processes; and a model that can be used to estimate the quality of actions that decision procedures available to the system will attain for a given amount of computational resources and period for processing. We make use of this information to optimize the allocation of limited on-line resources in order to combine reactive response with adaptable on-line deliberation. By doing so we may design systems that achieve the appearance of thinking on their feet without the burden of performing all reasoning about action under severe time constraints.

*This work was supported in part by a National Science Foundation Presidential Young Investigator Award IRI-8957601 and by the Air Force and the Advanced Research Projects Agency of the Department of Defense under Contract No. F30602-91-C-0041.
When to Reason about Action
Time-critical action in dynamic, uncertain worlds is hard. There is no reason to believe that any system can act optimally in all possible situations it may encounter, whether it acts in response to situations or tries to anticipate situations ahead of time. Formal arguments along these lines have been presented by Ginsberg (Ginsberg 1989b).
Nevertheless, there are important differences between alternative approaches to the logistics of reasoning about action. These differences can directly affect
system performance. Two opposing approaches are to
anticipate all possible contingencies prior to taking action (off-line) or to do no anticipation at all and perform all reasoning at execution-time (on-line). These
represent extremes along a continuum of approaches
available to the system designer.
Ginsberg (Ginsberg 1989a) poses the following related questions that systems that take time-critical action in dynamic, uncertain domains should address:
1. When should a system devote resources to the generation of plans to be cached, and which plans should be generated?
2. When should a plan be cached?
3. When is a cached plan relevant to a particular action decision?
4. Can cached plans be debugged in some way that extends their usefulness at run time to situations outside those for which they were initially designed?
Pre-compilation of Actions One of the earliest uses of pre-compilation of actions is Nilsson's (Fikes et al. 1972, Nilsson 1985) work on triangle tables. In this work, traditional plans are converted into sets of state-operator pairs that may be accessed in any order (independent of the original plan sequence). Schoppers (Schoppers 1987) generalizes this to the notion of universal plans, in which planning continues until actions have been computed for all possible distinguishable situations. A universal plan is constructed off-line for a particular domain with a given goal and then executed without modification. The resulting reactions specify primitive physical robot manipulations. Schoppers (Schoppers 1989) argues that production systems are similarly constructed and provide actions that are simply reactions to the contents of working memory. Similarly, work has been done by Barto, Bradtke and Singh (Barto et al. 1993) on dynamic programming approaches that compile planning results into reactive strategies for real-time control.
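To make the pre-compilation idea concrete, the following is a minimal sketch of compiling a complete state-to-action lookup table off-line via value iteration, in the spirit of the dynamic programming approaches cited above. The one-dimensional corridor domain, unit step costs, and all function names are our own illustrative inventions, not any cited author's formalism.

```python
# Sketch: compile a reaction plan off-line so that acting at
# execution time is a constant-time table lookup.

def compile_universal_plan(states, actions, step, goal, iters=100):
    """Return a state -> action table covering every non-goal state."""
    value = {s: (0.0 if s == goal else float("inf")) for s in states}
    for _ in range(iters):  # value iteration with unit step costs
        for s in states:
            if s == goal:
                continue
            value[s] = min(1.0 + value[step(s, a)] for a in actions)
    # Cache the greedy action for every distinguishable situation.
    return {s: min(actions, key=lambda a: value[step(s, a)])
            for s in states if s != goal}

# A tiny corridor: states 0..4, goal at 4, actions move left/right.
def step(s, a):
    return max(0, min(4, s + a))

plan = compile_universal_plan(range(5), [-1, +1], step, goal=4)
print(plan[0])  # -> 1 (move right toward the goal)
```

At execution time no reasoning remains: every situation has a pre-compiled reaction, at the cost of having enumerated all of them off-line.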
The primary argument for this approach (Schoppers 1987) is that by performing all reasoning prior to execution, time-critical reaction can be achieved even in the most complex, uncertain situations. The primary argument against this approach (Ginsberg 1989b) is the enormous cost of anticipating all possible situations. Some have argued that pre-compilation of actions is appropriate in certain domains. Agre and Chapman's (Agre and Chapman 1987) research explores some domains that are effectively solved by pre-compiling behaviors into pure reaction.
Pre-compilation of Actions with Run-Time Parameters Nilsson (Nilsson 1994), in his teleo-reactive programs, introduces another approach to pre-compilation of actions in which he allows for run-time binding of parameters and run-time compilation of action circuits. He argues that the advantage of run-time compilation is that it reduces the amount of circuitry that must be produced. In a sense, run-time compilation is just another way to ensure that the system performs pre-compilation of actions as temporally close as possible to when those actions may be needed, without actually performing reasoning during execution. All information known about the problem just prior to execution time can be used to compile action circuits for the problem. The effect is that only the circuits most relevant to the current problem are instantiated, where relevance is defined in terms of initial conditions and not in terms of any assumptions about the environment itself.
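The distinction between off-line schema construction and run-time binding can be sketched as follows. The pick-and-place schema, its parameters, and the action vocabulary are illustrative inventions; this is not Nilsson's actual teleo-reactive circuit formalism.

```python
# Sketch of run-time parameter binding: a plan schema is prepared
# off-line with free parameters, and only the instance relevant to
# the current problem is instantiated just prior to execution.

def compile_schema():
    """Off-line: build a parameterized plan, leaving bindings open."""
    def instantiate(obj, dest):
        # Run-time: bind parameters; only this "circuit" is produced.
        return [("grasp", obj), ("move", dest), ("release", obj)]
    return instantiate

schema = compile_schema()          # done once, before execution
plan = schema("block-7", "bin-A")  # bound just prior to execution
print(plan[0])  # -> ('grasp', 'block-7')
```

Only the instance matching the initial conditions is ever built, rather than one circuit per possible object-destination pair.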
On-Demand Reasoning about Action

When discussing systems that provide computationally complex reasoning in response to on-line situations, we are only interested in those that do so for time-critical problems in dynamic, uncertain environments. Purely reactive systems are attractive in these environments because the hard real-time requirement may be arbitrarily small. In general, the amount of time available for reasoning may vary or may be characterized by soft deadlines.
Several approaches have been proposed to vary the amount of reasoning performed to fit the response requirements. Among these are anytime algorithms (Dean and Boddy 1988, Boddy and Dean 1989), flexible computations (Horvitz 1987), imprecise computations (Shih et al. 1989), and design-to-time scheduling (Garvey and Lesser 1993). Musliner (Musliner 1993) argues that anytime algorithms are insufficient for hard real-time response requirements. His methods guarantee a level of response subject to hard real-time constraints.
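The defining property of an anytime algorithm is that it can be interrupted at any point and return its best answer so far, with quality improving as more time is allowed. A minimal sketch, using a toy root-finding task of our own invention as the deliberation:

```python
# Sketch of an anytime algorithm: each loop iteration is one unit of
# deliberation; stopping early yields a usable but coarser answer.

def anytime_sqrt(x, budget):
    """Bisection estimate of sqrt(x); `budget` bounds the iterations."""
    lo, hi = 0.0, max(1.0, x)
    for _ in range(budget):      # interruptible: stop when time is up
        mid = (lo + hi) / 2.0
        if mid * mid < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0       # best estimate found so far

# More deliberation time yields a monotonically better answer.
rough = anytime_sqrt(2.0, budget=4)
fine = anytime_sqrt(2.0, budget=40)
print(abs(fine - 2.0 ** 0.5) < abs(rough - 2.0 ** 0.5))  # -> True
```

Musliner's objection, in these terms, is that graceful degradation is not the same as a guaranteed response by a hard deadline.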
Georgeff and Lansky's procedural reasoning system (PRS) (Georgeff and Lansky 1987) controls on-line computation using hand-coded procedures and on-line information. PRS is similar to Nilsson's (Nilsson 1994) teleo-reactive programs in that it attempts to avoid overcommitment by reducing the amount of advance planning. A related approach is the work of Lyons and Hendriks (Lyons and Hendriks 1994). Their approach is to construct reactive plans incrementally on-line in response to unanticipated situations. Hammond's (Hammond 1989) case-based planning and opportunistic memory perform similarly. In that work a planner reasons on-line about combining plans in response to execution-time opportunities. It then caches the results for future use (both in planning and execution).
Combining Deliberation and Reaction Chapman (Chapman 1989) argues that combining a classical planner with a reactive system seems to combine the worst features of both. Nevertheless, many systems have been proposed that combine deliberation and reaction, particularly for mobile robots. Examples of these systems include Gat's (Gat 1991) and Simmons' (Simmons 1991). In these systems a deliberative reasoning component is combined with a reactive control component. They differ in how the coordination between the components is managed. Pryor and Collins (Pryor and Collins 1992) argue that opportunistic on-demand reasoning is the key to combining deliberation and reaction.
What Information to Employ when Reasoning about Action
In conjunction with determining when to reason about action is the problem of determining what information to bring to bear on the problem. One of the drawbacks of universal plans is that they do not take advantage of information about the environment and must, therefore, anticipate all possible situations that could ever occur during any run of the system. Three sample forms of information are: models of completeness, models of the environment, and models of computation.
Models of Completeness A complete (or universal) reaction plan has a pre-compiled action for every
possible situation. However, this is not strictly necessary for completeness. Schoppers (Schoppers 1987),
for example, uses conditions (propositions) to partition situations into classes for which the effects of actions are equivalent. This is, in effect, what is done by
traditional AI systems that define actions in terms of
conditions rather than situations. Schoppers (Schoppers 1989) argues that the computational complexity
of universal plans is reduced because predicates are
used instead of actual states.
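Keying reactions on predicates rather than on concrete states can be sketched as follows. The robot-flavored predicates and action names are our own illustrative inventions, not Schoppers' actual encoding.

```python
# Sketch of predicate-based partitioning: one cached rule per
# equivalence class of situations, not one per concrete state.

def predicates(state):
    """Abstract a concrete state into a tuple of propositions."""
    return (state["battery"] < 20, state["obstacle_ahead"])

RULES = {
    (True, True): "stop",       # low battery AND blocked
    (True, False): "recharge",
    (False, True): "turn",
    (False, False): "advance",
}

def react(state):
    return RULES[predicates(state)]

# Many distinct concrete states map to the same cached reaction.
print(react({"battery": 5, "obstacle_ahead": False}))   # -> recharge
print(react({"battery": 19, "obstacle_ahead": False}))  # -> recharge
```

Four rules here cover an unbounded set of concrete battery levels, which is the complexity reduction Schoppers claims for predicates over actual states.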
Many other systems for reasoning about actions do not attempt to provide completeness. Brooks (Brooks 1986), for example, relies upon the tight feedback in his circuits to respond to all situations. But he does not guarantee that the programmer in his Behavior Language has anticipated all situations. PRS (Georgeff and Lansky 1987) does not guarantee completeness. Dean, Kaelbling, Kirman and Nicholson (Dean et al. 1993) handle large state spaces by restricting attention to a subset of possible situations and provide a default reaction for those that are neglected. Musliner (Musliner 1993) effectively restricts attention to a subset of possible situations and guarantees that the system stays within that subset until the AI subsystem provides a new set of situation/reaction pairs. Similarly, Ramadge and Wonham (Ramadge and Wonham 1989) describe supervisory control of discrete event systems (DES) in which failure states are guaranteed unreachable.
Abstraction and aggregation techniques may be used to reduce the computational resources needed to construct or execute a reaction plan. Boutilier and Dearden (Boutilier and Dearden 1994) build abstract world views for planning in time-constrained domains. In addition to affecting completeness, abstraction may affect the obtainable value of the associated reaction plan.
Models of the Environment Many systems that are designed to reason about taking action in dynamic, uncertain environments do not take advantage of any a priori knowledge of the environment. In some cases this leads to losses of computational efficiency. In others it leads to reductions in performance. Some reasons to model the environment are to predict distributions of reachable states based on available information, to account for the effects of exogenous events, or to deduce patterns in the dynamics of the environment.
Both Nilsson (Nilsson 1994) and Schoppers (Schoppers 1987) assume deterministic effects when planning actions for nondeterministic environments. The penalty for this is that they must provide actions for all possible situations in order to gain completeness. Furthermore, these assumptions may lead to unintended interactions between actions.
Drummond, Bresina and Swanson (Drummond et al. 1994) use a stochastic model of action duration in order to construct contingent schedules off-line. In cases in which they successfully anticipate schedule breaks, they avoid the computational costs of on-demand reasoning during time-critical on-line execution periods. Both Dean, Kaelbling, Kirman and Nicholson (Dean et al. 1993) and Musliner (Musliner 1993) use knowledge of environmental dynamics to restrict attention to subsets of the space of situations at any given time. However, Musliner does not explicitly deal with models of uncertainty. Drummond and Bresina (Drummond and Bresina 1990) take into account a stochastic model of the environment in simulating the possible effects of actions and sampling the space of situations. Kushmerick, Hanks and Weld (Kushmerick et al. 1993) also take advantage of models in order to synthesize plans in stochastic domains. Lansky (Lansky 1988) has developed planning systems for deterministic domains that exploit structural properties of the state space to expedite planning.
Models of Computation There are various ways
that a system may take advantage of models of its
own computation.
Models that may be used for
meta-reasoning include profiles of computational performance, models of guaranteed response time, and
models of worst-case behavior. These models may be
combined with models of the environment to provide
significant computation and performance benefits.
Kaelbling’s (Kaelbling 1988) Gapps and Rex systems
compile behavior into circuits with guaranteed constant computation cycles. Musliner (Musliner 1993)
guarantees worst-case computation cycles. Georgeff
and Lansky’s (Georgeff and Lansky 1987) PRS does
meta-level reasoning on-line using models of the procedures in its library as well as the current beliefs, goals
and intentions of the system.
Dean and Boddy (Dean and Boddy 1988, Boddy and Dean 1989) and Zilberstein (Zilberstein 1993) capture quality-time performance tradeoffs of anytime algorithms using performance profiles. Meta-level deliberation scheduling is then employed to allocate limited computational resources. Conditional performance profiles are used to represent the dependence of the performance of a deliberation procedure on various input parameters.
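Deliberation scheduling over performance profiles can be sketched as a greedy allocation of time slices to whichever procedure promises the largest marginal quality gain. The two tasks and their profiles below are illustrative inventions; the greedy rule is optimal only when profiles exhibit diminishing returns.

```python
# Sketch of deliberation scheduling: given each anytime procedure's
# quality-vs-time profile, allocate discrete time slices greedily by
# marginal quality gain.

def schedule(profiles, slices):
    """profiles: name -> list of cumulative quality per slice used."""
    alloc = {name: 0 for name in profiles}
    for _ in range(slices):
        def gain(name):
            used, prof = alloc[name], profiles[name]
            if used >= len(prof):
                return 0.0           # profile exhausted: no more gain
            prev = prof[used - 1] if used else 0.0
            return prof[used] - prev
        alloc[max(profiles, key=gain)] += 1
    return alloc

profiles = {
    "route": [5.0, 8.0, 9.0],      # steep start, diminishing returns
    "inventory": [2.0, 4.0, 6.0],  # steady linear returns
}
print(schedule(profiles, slices=4))  # -> {'route': 2, 'inventory': 2}
```

The first two slices go to the steep "route" profile; once its marginal gain drops below "inventory"'s, the remaining slices shift over.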
Simon and Kadane (Simon and Kadane 1975) consider the case of allocating time to search in which deliberation time is quantized into chunks of fixed, though not necessarily constant, size. Etzioni (Etzioni 1991) considers a similar model in which the deliberative chunks correspond to different problem-solving methods. Russell and Wefald (Russell and Wefald 1989, Russell and Wefald 1991) describe a general approach in which deliberation is considered as a sequence of inference steps leading to the performance of an action, where the value of an action is time-dependent. Horvitz (Horvitz 1990) constructs time-dependent utility models and models dynamically changing computation costs to determine on-line when to halt computation and take action. He also uses off-line analysis to indicate when on-line meta-reasoning is valuable.
Garvey and Lesser (Garvey and Lesser 1993) model the task structure of solution methods and their dependence, duration and quality. They use this structure to find execution methods that optimize quality for the available time.
Response Planning Approach
In the work of Greenwald and Dean on response planning (Greenwald and Dean 1995, Greenwald and Dean 1994) we propose an approach to reasoning about taking action that combines reactive response with adaptable on-line deliberation in solving time-critical planning and scheduling problems. We exploit known structure and models of the environment in order to optimize off-line the allocation of scarce on-line processing resources. In this work we consider the production, inventory and distribution (logistics) of on-line decision-making. Our primary tool is to compute off-line both pre-compiled default reactions and strategies that specify the on-line dispatching and control of decision procedures. Compilation of strategies takes into account models of the anticipated computational demands of decision procedures under on-line conditions.
Ginsberg (Ginsberg 1989b) proposes that one definition of the difference between cognitive and non-cognitive behavior is the ability to expend additional mental (computational) resources when necessary. The response planning approach combines the reactivity of off-line compilation of universal plans with the adaptability of on-line decision-making and on-demand processing. On-demand processing is applied to predicted situations rather than in response to the current situation. Careful prediction allows us to restrict our reasoning to a small subset of the state space. We use stochastic models of the underlying domain to predict reachable states, and models of the computational properties of available decision procedures to determine how to maximize the value of allocated on-line resources. Our maximizations involve taking advantage of variabilities in anticipated computational demand across time. We address Ginsberg's complaints about reactive systems by allowing flexibility in both the amount of thinking we do and when we do the thinking.
We have developed the planning and execution architecture of Figure 1, which explicitly considers the problem of dynamically shifting deliberation focus. This architecture allows for concurrent planning and execution that is tailored to the dynamic nature of the environment. The planning and execution component is divided into several on-line deliberative components and an on-line reactive component that executes the non-stationary policies generated by the deliberative components. The policies are generated on-line by deliberative decision procedures f. Due to the combinatorics involved in deliberative processing, there is typically a delay Δ between the time a state triggers the deliberative process and the time the resulting policy is available for execution. The choice of decision procedure and appropriate calling parameters is determined by the strategy table. A stochastic model of the environment is used both during on-line deliberation to predict future states and during strategy table construction to discern patterns of shifting dynamics.
[Figure 1: Planning and Execution Architecture. The diagram shows the environment g(x(t), u(t)), a model of the environment, an off-line strategy table, on-line deliberation f(...) driven by the current state x(t) and predicted states, and an on-line reactive component executing the action policy, with deliberation results arriving at time t + Δ.]
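The execution loop implied by Figure 1 can be sketched as follows: the reactive layer always has a pre-compiled default action available, while deliberation, triggered on a predicted state, delivers a better policy only after the latency Δ. The two-state domain, the action tables, and the value of Δ are illustrative inventions, not the actual response planning system.

```python
# Sketch of the Figure 1 loop: default reactions cover the gap while
# a delayed deliberative procedure computes a better policy.

DELTA = 2                                   # deliberation latency Δ
DEFAULT = {"calm": "wait", "surge": "wait"}  # pre-compiled reactions
DELIBERATED = {"surge": "reorder"}           # policy computed on-line

def run(trajectory):
    """Act along a state trajectory, folding in the deliberated
    policy once it becomes available at trigger time + DELTA."""
    pending, actions = [], []
    for t, state in enumerate(trajectory):
        if state == "surge" and not pending:
            pending.append(t + DELTA)        # trigger deliberation
        ready = pending and pending[0] <= t  # has the policy arrived?
        policy = DELIBERATED if ready else {}
        actions.append(policy.get(state, DEFAULT[state]))
    return actions

print(run(["calm", "surge", "surge", "surge"]))
# -> ['wait', 'wait', 'wait', 'reorder']
```

The system never blocks waiting for deliberation: the default reaction bridges the Δ-step gap, which is the sense in which reactivity and adaptable deliberation are combined.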
This approach is particularly suited to problems that can be modelled by sequences of problem instances with predictable, time-varying problem difficulty, in which the existence of exogenous processes and real-time requirements makes computational delay costly. Our approach provides substantial improvements over fixed allocations of resources in these problems. Off-line processing is used to alleviate the need for expensive on-line meta-reasoning while maintaining the adaptability of on-line decision-making in the face of uncertain information.
We argue that we have addressed three of Ginsberg's four questions. We use deliberation scheduling and on-line strategy tables to specify when a system should devote resources to the generation of plans to be cached and which plans should be generated. We use prediction based on models of the environment to determine when a plan should be cached. And we use situation-specific reaction to determine when a cached plan is relevant to a particular action decision. Furthermore, we are exploring abstraction issues to help extend cached plans to situations outside those for which they were initially designed.
Acknowledgements
Thanks to Mike Williamson and Thomas Dean for their comments on this paper.
References
Agre, Philip E. and Chapman, David 1987. Pengi: An implementation of a theory of activity. In Proceedings AAAI-87. AAAI. 268-272.
Barto, Andrew G.; Bradtke, Steven J.; and Singh,
Satinder P. 1993. Learning to act using real-time dynamic programming. Submitted to AIJ Special Issue on Computational Theories of Interaction and
Agency.
Boddy, Mark and Dean, Thomas 1989. Solving time-dependent planning problems. In Proceedings IJCAI 11. IJCAII. 979-984.
Boutilier, C. and Dearden, R. 1994. Using abstractions for decision theoretic planning with time constraints.
In Proceedings AAAI-94. AAAI.
Brooks, Rodney A. 1986. A robust layered control system for a mobile robot. IEEE Journal of Robotics and Automation 2:14-23.
Chapman, David 1989. Penguins can make cake. AI
Magazine 10(4):45-50.
Dean, Thomas and Boddy, Mark 1988. An analysis
of time-dependent planning. In Proceedings AAAI-88.
AAAI. 49-54.
Dean, Thomas; Kaelbling, Leslie; Kirman, Jak; and
Nicholson, Ann 1993. Planning with deadlines in
stochastic domains. In Proceedings AAAI-93. AAAI.
574-579.
Drummond, Mark and Bresina, John 1990. Anytime
synthetic projection: Maximizing the probability of
goal satisfaction.
In Proceedings AAAI-90. AAAI.
138-144.
Drummond, Mark; Bresina, John; and Swanson,
Keith 1994. Just-in-case scheduling. In Proceedings
AAAI-94. AAAI.
Etzioni, Oren 1991. Embedding decision-analytic control in a learning architecture. Artificial Intelligence 49(1-3):129-160.
Fikes, Richard E.; Hart, Peter E.; and Nilsson, Nils J.
1972. Learning and executing generalized robot plans.
Artificial Intelligence 3:251-288.
Garvey, Alan and Lesser, Victor 1993. A survey of research in deliberative real-time artificial intelligence.
Technical Report 93-84, University of Massachusetts
Department of Computer Science, Amherst, MA.
Gat, Erann 1991. Integrating reaction and planning in a heterogeneous asynchronous architecture for mobile robot navigation. SIGART Bulletin 2(4).
Georgeff, Michael P. and Lansky, Amy L. 1987. Reactive reasoning and planning. In Proceedings AAAI-87. AAAI. 677-682.
Ginsberg, Matthew L. 1989a. Ginsberg replies to Chapman and Schoppers: Universal planning research: A good or bad idea? AI Magazine 10(4):61-62.
Ginsberg, Matthew L. 1989b. Universal planning: An (almost) universally bad idea. AI Magazine 10(4):40-44.
Greenwald, Lloyd and Dean, Thomas 1994. Solving time-critical decision-making problems with predictable computational demands. In Second International Conference on AI Planning Systems.
Greenwald, Lloyd and Dean, Thomas 1995. Anticipating computational demands when solving time-critical decision-making problems. In Goldberg, K.; Halperin, D.; Latombe, J.C.; and Wilson, R., editors 1995, The Algorithmic Foundations of Robotics. A. K. Peters, Boston, MA. Proceedings from the workshop held in February, 1994.
Hammond, Kristian J. 1989. Opportunistic memory. In Proceedings IJCAI 11. IJCAII.
Horvitz, Eric J. 1987. Reasoning about beliefs and
actions under computational resource constraints. In
Proceedings of the 1987 Workshop on Uncertainty in
Artificial Intelligence.
Horvitz, Eric J. 1990. Computation and action under bounded resources. Ph.D. Thesis, Stanford University Departments of Computer Science and Medicine, Stanford, California.
Kaelbling, Leslie Pack 1988. Goals as parallel program specifications.
In Proceedings AAAI-88. AAAI.
60-65.
Kushmerick, N.; Hanks, S.; and Weld, D. 1993. An
Algorithm for Probabilistic Planning. Technical Report 93-06-03, University of Washington Department
of Computer Science and Engineering. To appear in
Artificial Intelligence.
Lansky, Amy L. 1988. Localized event-based reasoning for multiagent domains. Computational Intelligence 4(4).
Lyons, D. M. and Hendriks, A. J. 1994. Testing incremental adaptation. In Second International Conference on AI Planning Systems.
Musliner, David John 1993. CIRCA: The Cooperative Intelligent Real-Time Control Architecture. Ph.D. Dissertation, The University of Michigan, Ann Arbor, MI.
Nilsson, Nils 1985. Triangle tables: a proposal for a
robot programming language. Technical report, SRI
International.
Nilsson, Nils 1994. Teleo-reactive programs for agent
control. Journal of Artificial Intelligence Research
1:139-158.
Pryor, Louise and Collins, Gregg 1992. Reference
features as guides to reasoning about opportunities.
In Proceedings Fourteenth Annual Conference of the
Cognitive Science Society. Cognitive Science Society.
230-235.
Ramadge, Peter and Wonham, Murray 1989. The
control of discrete event systems. Proceedings of the
IEEE 77(1):81-98.
Russell, Stuart J. and Wefald, Eric H. 1989. Principles of metareasoning. In Brachman, Ronald J.; Levesque, Hector J.; and Reiter, Raymond, editors 1989, Proceedings of the First International Conference on Principles of Knowledge Representation and Reasoning. Morgan-Kaufmann, Los Altos, California. 400-411.
Russell, Stuart J. and Wefald, Eric H. 1991. Do the Right Thing: Studies in Limited Rationality. MIT Press, Cambridge, MA.
Schoppers, Marcel J. 1987. Universal plans for reactive robots in unpredictable environments. In Proceedings IJCAI 10. IJCAII. 1039-1046.
Schoppers, Marcel J. 1989. In defense of reaction
plans as caches. AI Magazine 10(4):51-60.
Shih, W-K.; Liu, J. W. S.; and Chung, J-Y. 1989.
Fast algorithms for scheduling imprecise computations. In Proceedings of the Real-Time Systems Symposium. IEEE. 12-19.
Simmons, Reid 1991. Coordinating planning, perception, and action for mobile robots. SIGART Bulletin 2(4).
Simon, Herbert A. and Kadane, Joseph B. 1975. Optimal problem-solving search: All-or-none solutions.
Artificial Intelligence 6:235-247.
Zilberstein, Shlomo 1993. Operational Rationality through Compilation of Anytime Algorithms. Ph.D. Dissertation, University of California at Berkeley.