Coordinating Control Modes with Control Situations

From: AAAI Technical Report SS-93-03. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved.
Barbara Hayes-Roth
Knowledge Systems Laboratory
Stanford University
701 Welch Road, Bldg. C
Palo Alto, CA 94304
e-mail: bhr@ksl.stanford.edu

An intelligent agent has multiple goals, limited
resources, and a dynamic real-time task environment.
Because it has many predictable and unpredictable
opportunities to act, the agent must "control" its
actions: it must decide which of many possible actions
to execute at each opportunity over a period of time. In
a given situation, the agent’s control decisions
determine whether it achieves its goals, what resources
it consumes, and what side effects it produces. More
generally, the agent’s approach to control determines
the range of situations it can handle, its flexibility in
responding to unanticipated conditions, and its
coherence and comprehensibility to observers. How
should an agent control its actions in order to achieve a
high global utility?

Under the well-known "planning model" [4], an
agent plans its actions ahead of time. Given its goals,
potential actions, and current and expected states of the
environment, the agent constructs a logical sequence of
specific executable actions, perhaps with some
conditionality, whose execution will achieve the goals.
Then it executes the planned actions. To guard against
failures, the agent monitors plan execution and, if
necessary, replans. Although there are different
methods for constructing plans, all operationalize plans
as executable programs of action. For present purposes,
they are equivalent. The planning model has several
strengths. An agent can construct plans that have
desirable global properties regarding goals, resources,
and side effects, as well as coherence and
comprehensibility. While following a plan, it can
conserve resources that would otherwise be spent
perceiving the environment, deciding among
alternative actions, or instantiating specific executable
actions; as a result it can execute planned actions
promptly at run time. The agent can make different
plans for different situations and change plans when
necessary. Weaknesses of the planning model include
the high computational cost of planning [3] and the
inflexibility of planned behavior under unanticipated
run-time conditions [1]. In the worst case, an agent
encounters a critical condition and undertakes a
computationally expensive replanning task as time
runs out.

Note: This paper is an excerpt from "Opportunistic control
of action," to appear in IEEE Systems, Man, and
Cybernetics, 1993, in press. The work was supported by
NASA contract NAG2-581 under DARPA Order 6822
and by subcontract No. 71715-1 from Cimflex
Teknowledge under DARPA contract No. DAAA2192-C-0028,
and AFOSR grant AFOSR-91-0131.

Emphasizing the planning model’s weaknesses and
ignoring its strengths, some researchers firmly reject it
in favor of the "reactive model" [1, 5, 6]. Instead of
planning, an agent reacts to environmental events as
they unfold. Given its goals, potential actions, and
relevant possible states of the environment, the agent
(or more often, a programmer or knowledge engineer
working with an automatic programmer) constructs
a set of goal-specific perception-action rules and stores
them in a computationally efficient form. On each
iteration of a perceive-act cycle, the agent perceives the
current environmental state, triggers the associated
rule, and performs its action. Thus, the agent can
respond flexibly to many run-time conditions, in some
implementations with bounded cycle time.
Weaknesses include the infeasibility of anticipating,
distinguishing, and storing all run-time contingencies,
precipitous failure when unanticipated conditions do
occur, the infeasibility of perceiving the complete relevant
environmental state in bounded time, and the uncertain
global effectiveness, efficiency, and coherence of locally
determined behavior. In the worst case, an agent
encounters a critical condition and thrashes about
executing locally appropriate, but globally
uncoordinated actions, as time runs out.

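The perceive-act cycle described above can be sketched in a few lines. This is only an illustrative sketch, not any of the cited systems: the rule table, the state names, and the `reactive_step` function are hypothetical, and a dictionary lookup merely stands in for the "computationally efficient form" in which rules are stored.

```python
from typing import Callable, Dict, Optional

# Hypothetical rule table: each anticipated state maps to one prepared action.
Rules = Dict[str, str]


def reactive_step(rules: Rules, perceive: Callable[[], str]) -> Optional[str]:
    """Run one perceive-act iteration and return the triggered action."""
    state = perceive()
    # A missing rule returns None, modeling precipitous failure
    # when an unanticipated condition occurs.
    return rules.get(state)


# Illustrative (assumed) rules and percepts for a skiing agent.
rules = {"gate_left": "turn_left", "gate_right": "turn_right", "ice": "slow_down"}
percepts = iter(["gate_left", "ice", "fallen_tree"])
actions = [reactive_step(rules, lambda: next(percepts)) for _ in range(3)]
```

The third percept has no stored rule, so the agent has nothing to do: the sketch makes the cost of an unanticipated contingency explicit.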
To make sense of this debate, we must distinguish
between control modes and control mechanisms. Control
modes are behaviors that differentially emphasize two
orthogonal qualities: sensitivity to run-time events and
commitment to specific actions. Control mechanisms
are data structures and processes used by an agent to
realize particular control modes. We argue that an
intelligent agent should be able to operate in different
control modes in different situations and, therefore,
needs a control mechanism that can support different
modes and smooth transitions among them.

[Figure 1. Different control modes for different control situations.
Dimensions of control modes: commitment to specific actions (top axis,
minimal to maximal) and sensitivity to run-time events (right axis,
minimal to maximal). Dimensions of control situations: uncertainty of
the environment (left axis, minimal to maximal) and constraint imposed
by goals (bottom axis, minimal to maximal). Extreme modes: reflex,
goal-specific reaction, dead reckoning, and open-loop plan execution;
the interior region holds the strategic modes. Skiing examples shown:
extreme skiing (trail highly uncertain), impromptu downhill (trail
unfamiliar), recreational skiing (trail somewhat familiar), practice
turns (trail very familiar), and Olympic downhill (trail memorized).]

Figure 1 characterizes theoretical spaces of control
situations and control modes, superimposed on one
another.

First consider the theoretical space of control
situations. The left vertical dimension represents the
uncertainty of the task environment, including both
exogenously produced events and events influenced by
the agent’s own actions. At one extreme, the
environment is minimally uncertain, allows only
predictable relevant events to occur, and demands
minimal run-time monitoring. At the other extreme, the
environment is maximally uncertain, allows many
unpredictable relevant events to occur, and demands
maximum run-time monitoring. The bottom horizontal
dimension represents the degree of constraint imposed
by the agent’s goals. At one extreme, the goals are
minimally constraining, allow many alternative
successful courses of action, and therefore demand
minimum commitment to specific actions. At the other
extreme, the goals are maximally constraining, allow
only a single successful course of action, and therefore
demand maximum commitment to specific actions. We
assume continuous gradation of both dimensions of the
situation space.

Now consider the theoretical space of control
modes. The right vertical dimension represents
sensitivity to run-time events. At one extreme, the
agent is minimally sensitive and executes prepared
actions open-loop. At the other extreme, the agent is
maximally sensitive and conditionalizes every action
on immediate environmental conditions. The upper
horizontal dimension represents the agent’s
commitment to specific actions. At one extreme, the
agent’s commitment is minimal and it is willing to
perform any of a very large class of actions during the
run-time interval. At the other extreme, the agent’s
commitment is maximal and it is willing to perform
exactly one action at each point during the run-time
interval. We assume continuous gradation of both
dimensions of the mode space.

The two spaces are superimposed to indicate that
control modes in particular regions of the mode space
produce the best performance in corresponding regions
of the situation space. We assume that adequate
monitoring and preparation resources are available and
that, other things equal, an agent prefers to spend
resources on preparation rather than on monitoring in
order to achieve the best possible run-time
performance. Alternatively, if we assume variations in
availability or cost of monitoring and preparation
resources, the superimposed spaces show how run-time
performance may be degraded in order to
conserve those resources. We use the domain of skiing
to illustrate the proposed mapping between control
situations and modes for the four theoretical extreme
modes: open-loop plan execution, goal-specific reaction, dead
reckoning, and reflex. However, because intelligent
agents rarely encounter the extreme situations
represented along the borders of the situation space
and rarely operate in the associated extreme control
modes, our skiing examples merely approximate these
extreme modes and, therefore, are positioned well
inside the borders of the space. We use the term
strategic mode to refer to the large interior region of
more practical control modes that modulate the balance
of sensitivity and commitment for situations
embodying intermediate degrees of uncertainty and
constraint.

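The superimposition of the two spaces can be read as a mapping from a situation to a preferred mode. The sketch below is one hedged way to write that down, under assumptions that are not in the paper: both spaces are normalized to the unit square, the mapping is the identity (sensitivity tracks uncertainty, commitment tracks constraint), and a hypothetical `margin` parameter decides what counts as "extreme". All names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Situation:
    uncertainty: float  # 0.0 = minimally uncertain, 1.0 = maximally uncertain
    constraint: float   # 0.0 = minimally constraining goals, 1.0 = maximally


@dataclass(frozen=True)
class Mode:
    sensitivity: float  # sensitivity to run-time events
    commitment: float   # commitment to specific actions


def preferred_mode(s: Situation) -> Mode:
    """Assumed identity mapping between the superimposed spaces."""
    return Mode(sensitivity=s.uncertainty, commitment=s.constraint)


def label(m: Mode, margin: float = 0.1) -> str:
    """Name the region of the mode space; `margin` is an assumed threshold."""
    def low(x: float) -> bool:
        return x <= margin

    def high(x: float) -> bool:
        return x >= 1.0 - margin

    if low(m.sensitivity) and high(m.commitment):
        return "open-loop plan execution"
    if high(m.sensitivity) and high(m.commitment):
        return "goal-specific reaction"
    if low(m.sensitivity) and low(m.commitment):
        return "dead reckoning"
    if high(m.sensitivity) and low(m.commitment):
        return "reflex"
    on_border = any(low(x) or high(x) for x in (m.sensitivity, m.commitment))
    return "intermediate border mode" if on_border else "strategic"
```

For instance, `label(preferred_mode(Situation(uncertainty=0.5, constraint=0.5)))` falls in the interior and is labeled "strategic".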
When an agent has maximally constraining goals to
achieve in a minimally uncertain environment, open-loop
plan execution produces the best run-time
performance. The agent can prepare a detailed, globally
coordinated plan in advance and execute it reliably and
efficiently with minimal run-time monitoring. For
example, consider an Olympic downhill skier:

"In the days before the race, after walking the course and
skiing two time trials, you will have run the race in your
head dozens if not hundreds of times. Memorizing the
course is crucial since at speeds averaging 70 miles an hour
there is little opportunity for adjustment and you need
every precious instant to react to the unknown and
unforeseeable, primarily imperfections in the skiing surface."
[2]

The skier achieves maximum race speed by severely
restricting run-time sensitivity to the environment and
conditionality of actions.

When an agent has maximally constraining goals to
achieve in a maximally uncertain environment, goal-directed
reaction produces the best run-time
performance. The agent can commit in advance to
perform specific actions under specific conditions, but
must determine which of these conditions hold and,
therefore, which of these actions to execute based on
run-time monitoring. For example, consider the same
Olympic skier participating in an impromptu race on
an unfamiliar trail. He or she prepares specific racing
actions for adjusting speed and course based on
relevant gross features of the environment, as well as
subtler adjustments to surface imperfections. At run
time, the skier focuses on monitoring just the relevant
features of the environment in order to decide which of
his or her small set of race-relevant actions to execute and
when to execute them. Presumably the skier will not
achieve Olympic speeds because of the time required
for additional run-time monitoring and adjustment of
actions and because of possible weaknesses in the
global properties of the action sequence.

Notice that in both planning and reaction modes,
which correspond to the planning and reactive models
in the literature, an agent commits to specific
executable actions in advance in order to achieve highly
constraining goals. Planning mode exploits
predictability in the environment to eliminate most
monitoring actions, to restrict action to a single globally
coordinated sequence, and thus to streamline run-time
performance. Without this predictability, reaction mode
prepares a larger number of actions for a larger number
of contingencies. In intermediate modes along the
planning-reaction border, the agent modulates the
amount of run-time monitoring it performs and the
associated conditionality of its actions. In all modes on
this border, however, the agent expends substantial
computational resources in advance to determine the
set of specific conditions it will monitor and the set of
specific actions it is willing to perform. The agent is
maximally committed and unlikely to respond to an
unanticipated event or to perform an unanticipated
action.

But what about situations in which the agent has
more general goals and does not need to constrain its
actions so severely? What about situations in which the
agent wishes to behave more flexibly at run time or to
conserve resources spent on preparation?

When an agent has minimally constraining goals to
achieve in a minimally uncertain environment, dead
reckoning mode produces the best run-time
performance at less preparation cost than planning
mode. The agent can make a very general plan and, at
run time, execute any sequence of actions that
constitutes a valid instance of the plan. For example,
consider our skier when practicing basic skills on a
very familiar trail. He or she might decide
to practice tight turns down the fall line at the top of the
trail, which is known to be narrow and steep, and big
carving turns at the bottom, which is known to be
broad and shallow. The skier does not bother to
specify the exact course or speed, the number of turns
of each type, or when the transition should occur. He or
she confidently instantiates this general plan at run
time in any of many equally valid sequences of specific
turns, adjustments, etc., with little need to monitor the
environment.

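To make "a valid instance of the plan" concrete, here is a small sketch under stated assumptions. The general plan is encoded, hypothetically, as an ordered list of trail segments, each committed only to a class of turns; `instantiate` picks an arbitrary number of turns per segment without consulting the environment, and `is_valid_instance` checks that a concrete sequence respects the plan's order. None of these names or encodings come from the paper.

```python
import random
from typing import List, Tuple

# Hypothetical general plan: (trail segment, class of turns to practice there).
Plan = List[Tuple[str, str]]
GENERAL_PLAN: Plan = [("top", "tight_turn"), ("bottom", "carving_turn")]


def instantiate(plan: Plan, rng: random.Random, max_turns: int = 4) -> List[str]:
    """Choose some number of turns per segment, with no run-time monitoring."""
    actions: List[str] = []
    for _segment, turn_class in plan:
        actions.extend([turn_class] * rng.randint(1, max_turns))
    return actions


def is_valid_instance(plan: Plan, actions: List[str]) -> bool:
    """True if actions form non-empty runs of each planned class, in plan order.

    Assumes adjacent plan segments use different turn classes.
    """
    i = 0
    for _segment, turn_class in plan:
        start = i
        while i < len(actions) and actions[i] == turn_class:
            i += 1
        if i == start:  # the planned class never occurred in its segment
            return False
    return i == len(actions)
```

Different random seeds yield different but equally valid action sequences, which is the sense in which the agent commits to classes of actions rather than to a single sequence.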
When an agent has minimally constraining goals to
achieve in a maximally uncertain environment, reflex
mode produces the best run-time performance at less
preparation cost than reactive mode. The agent is
sensitive to many classes of run-time conditions and is
willing to perform any actions the actual conditions
elicit. For example, consider our skier when
performing "extreme skiing" on a treacherous
untracked trail never before encountered. He or she
will be happy to reach the bottom of the trail without
serious mishap. It is essential that the skier remain
sensitive to a very broad range of possible conditions
and be willing to perform an unusually broad range of
actions, including for example stopping, walking,
climbing, and radioing for help. At every point on the
trail, the skier executes whatever actions are necessary
to survive the immediate situation.

Notice that in both dead reckoning and reflex
modes, the agent is positioned implicitly to perform a
much larger number of specific actions and action
sequences, compared to planning and reactive modes,
respectively. Dead reckoning mode exploits
predictability in the environment to commit to large
classes of actions, which can be instantiated top down
as alternative acceptable courses of action at run time.
Without this predictability, reflex mode relies heavily
on run-time monitoring to detect conditions that elicit
actions. In the intermediate modes along the dead
reckoning-reflex border, the agent modulates the
amount of run-time monitoring it performs and the
balance of top-down versus bottom-up control of
actions it executes. In all modes on this border,
however, the agent expends minimal computational
resources in advance by identifying arbitrarily large
classes of events to monitor or actions to perform. It
does not commit to monitor any specific events or to
perform any specific actions. In that sense, and in
contrast to planning and reaction modes, the agent is
always responding to "unanticipated" events or
performing "unanticipated" actions.

As mentioned above, we refer to modes in the large
interior region of the mode space as strategic modes.
Depending on the uncertainty of its environment, the
degree of constraint imposed by its goals, and the
availability of resources for preparation and
monitoring, the agent makes a plan of intermediate
specificity that more or less constrains the actions it
intends to execute. The agent works toward its goals by
executing actions that happen to be triggered at run
time and happen to match its plan. For example, when
skiing fast down a somewhat familiar trail, our skier
makes a general plan to perform tight turns down the
fall line, but monitors the trail at run time to decide
exactly which turns to make where, whether any
unplanned classes of actions are required, etc.

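The idea of executing actions that are both triggered at run time and consistent with the plan can be sketched as a simple filter. This is an assumption-laden illustration, not the opportunistic control mechanism of the longer paper: the `Action` record, its `urgent` flag (standing in for "unplanned classes of actions" that turn out to be required), and `select_actions` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Action:
    name: str
    action_class: str
    urgent: bool = False  # hypothetical marker for a required unplanned action


def select_actions(plan_classes: Set[str], triggered: List[Action]) -> List[str]:
    """Keep triggered actions that match the plan, plus any urgent exceptions."""
    return [a.name for a in triggered if a.action_class in plan_classes or a.urgent]


# The strategic plan commits only to a class of actions: tight turns.
plan_classes = {"tight_turn"}
# Actions triggered bottom-up by run-time events (illustrative values).
triggered = [
    Action("tight_turn_left", "tight_turn"),
    Action("wide_carve_right", "carving_turn"),
    Action("stop_for_fallen_skier", "stop", urgent=True),
]
chosen = select_actions(plan_classes, triggered)
```

Here the plan supplies top-down structure while run-time events supply the specific candidates, so the wide carve is ignored and the urgent stop is still admitted.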
Notice that, in strategic mode, the agent does not
waste resources making plans that are unrealistically or
unnecessarily specific, but remains open to the
possibility of executing alternative actions and action
sequences that only become apparent at run time. On
the other hand, it is not unduly sensitive to the
environment, but organizes its monitoring and actions
within the general structure of its strategic plan.
Strategic mode produces good run-time behavior in a
broader range of more realistic situations than any of
the extreme modes along the borders of the space.

Given the situation-specific effectiveness of
alternative control modes and their differential
demands for advance and run-time resources, we need
a control mechanism that will allow an intelligent agent
to operate in any mode throughout the two-dimensional
space, at its own discretion. The
mechanisms of planning and reactive models reported
in the literature are mode-specific; they basically
support control modes only in the correspondingly
named extreme regions of the two-dimensional mode
space. While these mechanisms might be useful for
special-purpose applications, they do not provide the
range of control modes required of intelligent agents.
On the other hand, both mechanisms contain elements
of a comprehensive control architecture. Integrating
and extending these component processes, we propose
an opportunistic control model, which is described in the
longer paper from which this analysis is excerpted.

References
1. P. E. Agre and D. Chapman. "Pengi: An
implementation of a theory of activity." In Proceedings
of the National Conference on Artificial Intelligence,
San Mateo, CA: Morgan Kaufmann, 1987.
2. J. Bowermaster. "Uphill Racer." The New York
Times Sunday Magazine, February 2, 1992.
3. D. Chapman. "Planning for conjunctive goals."
Artificial Intelligence, 32, 333-378, 1987.
4. R. E. Fikes and N. J. Nilsson. "STRIPS: A new
approach to the application of theorem proving to
problem solving." Artificial Intelligence, 2, 189-208, 1971.
5. S. J. Rosenschein and L. P. Kaelbling. "The synthesis
of digital machines with provable epistemic
properties." In Proceedings of the Conference on
Theoretical Aspects of Reasoning about Knowledge,
Morgan Kaufmann, pp. 83-98, 1986.
6. M. Schoppers. "Universal plans for reactive robots
in unpredictable environments." In Proceedings of the
International Joint Conference on Artificial
Intelligence, San Mateo, CA: Morgan Kaufmann, 1987.