From: AAAI Technical Report SS-93-03. Compilation copyright © 1993, AAAI (www.aaai.org). All rights reserved. ICoordinating Control Modeswith Control Situations Barbara Hayes-Roth KnowledgeSystems Laboratory Stanford University 701 WelchRoad, B.dg C Palo Alto, CA94304 e-mail: bhr@ksl.stanford.edu Anintelligent agent has multiple goals, limited resources, and a dynamicreal-time task environment. Becauseit has manypredictable and unpredictable opportunities to act, the agent must"control" its actions-it must decide whichof manypossible actions to execute at each opportunity over a period of time. In a given situation, the agent’s control decisions determine whetherit achieves its goals, what resources it consumes,and what side effects it produces. More generally, the agent’s approachto control determines the rangeof situations it can handle,its flexibility in respondingto unanticipated conditions, and its coherence and comprehensibility to observers. How should an agent control its actions in order to achieve a high global utility? Under the well-known "planning model" [4], an agent plans its actions aheadof time. Givenits goals, potential actions, and current and expectedstates of the environment,the agent constructs a logical sequenceof specific executable actions, perhaps with some conditionality, whoseexecution will achieve the goals. Then it executes the planned actions. To guard against failures, the agent monitorsplan execution and, if necessary, replans. Althoughthere are different methodsfor constructing plans, all operationalize plans as executable programsof action. For present purposes, they are equivalent. The planning modelhas several strengths. Anagent can construct plans that have desirable global properties regarding goals, resources, and side effects, as well as coherenceand comprehensibility. Whilefollowing a plan, it can conserve resources that would otherwise be spent perceiving the environment, deciding among alternative actions, or instantiating specific executable actions; as a result it can execute plannedactions promptly at run time. The agent can makedifferent 1 This paper is an excerpt from: "Opportunistic control of action" to appear in IEEESystems, Man,and Cybernetics, 1993, in press. The work was supported by NASAcontract NAG2-581 under DARPA Order 6822 and by subcontract No. 71715-1 from Cimflex Teknowledge under DARPAcontract No. DAAA2192-C-0028, and AFOSRgrant AFOSR-91-0131. 44 plans for different situations and change plans when necessary. Weaknessesof the planning model include the high computationalcost of planning [3] and the inflexibility of plannedbehavior under unanticipated run-time conditions [1]. In the worst case, an agent encounters a critical condition and undertakes a computationally expensive replanning task as time runs out. Emphasizing the planning model’s weaknesses and ignoring its strengths, someresearchers firmly reject it in favor of the "reactive model"[1, 5, 6]. Instead of planning, an agent reacts to environmentalevents as they unfold. Givenits goals, potential actions, and relevant possible states of the environment,the agent (or more often, a programmeror knowledgeengineer working with an automatic programmer)constructs set of goal-specific perception-actionrules and stores them in a computationaUyefficient form. On each iteration of a perceive-act cycle, the agent perceives the current environmentalstate, triggers the associated rule, and performsits action. Thus, the agent can respond flexibly to manyrun-time conditions, in some implementations with boundedcycle time. Weaknesses include the infeasibility of anticipating, distinguishing, and storing all run-time contingencies, precipitous failure whenunanticipated conditions do occur, the infeasibility of perceiving completerelevant environmental state in boundedtime, and the uncertain global effectiveness, efficiency, and coherenceof locally determined behavior. In the worst case, an agent encounters a critical condition and thrashes about executing locally appropriate, but globally uncoordinatedactions, as time runs out. To makesense of this debate, we must distinguish betweencontrol modesand control mechanisms.Control modesare behaviors that differentially emphasizetwo orthogonal qualities: sensitivity to run-time events and commitmentto specific actions. Control mechanisms are data structures and processes used by an agent to realize particular control modes.Weargue that an intelligent agent should be able to operate in different control modesin different situations--and, therefore, needs a control mechanismthat can support different modesand smooth transitions amongthem. Dimensionsof Control Modes Commitment to Specific Actions Minimal Maximal Maximal Reflex Goal-Specific Reaction Maximal t ExlremeSkiing Trail Highly Uncertain ImpromptuDownhill Trail Unfamiliar Strategic Modes Uncertainty of the Environment Sensitivityto Run-TimeEvents RecreationalSkiing Trail Somewhat Familiar Practice Turns Trail VeryFamiliar OlympiaDownhill Trail Memorized Minimal t DeadReckoning Open-Loop Plan Execution Minimal Minimal Maximal Constraint Imposedby Goals Dimensionsof ControlSituations Figure 1. Different control modesfor different control situations. Figure I characterizes theoretical spaces of control situations and control modes, superimposed on one another. First consider the theoretical space of control situations. The left vertical dimension represents the uncertainty of the task environment, including both exogenously produced events and events influenced by the agent’s own actions. At one extreme, the environment is minimally uncertain, allows only predictable relevant events to occur, and demands minimal run-time monitoring. At the other extreme, the environment is maximally uncertain, allows many unpredictable relevant events to occur, and demands maximumrun-time monitoring. The bottom horizontal dimension represents the degree of constraint imposed by the agent’s goals. At one extreme, the goals are minimally constraining, allow many alternative successful courses of action, and therefore demand minimumcommitment to specific actions. At the other extreme, the goals are maximally constraining, allow only a single successful course of action, and therefore demand maximumcommitment to specific actions. We assume continuous gradation of both dimensions of the situation space. Nowconsider the theoretical space of control modes. The right vertical dimension represents sensitivity to run-time events. At one extreme, the 45 agent is minimally sensitive and executes prepared actions open-loop. At the other extreme, the agent is maximally sensitive and conditionalizes every action on immediate environmental conditions. The upper horizontal dimension represents the agent’s commitmentto specific actions. At one extreme, the agent’s commitmentis minimal and it is willing to perform any of a very large class of actions during the run-time interval. At the other extreme, the agent’s commitment is maximal and it is willing to perform exactly one action at each point during the run-time interval. Weassume continuous gradation of both dimensions of the mode space. The two spaces are superimposed to indicate that control modes in particular regions of the mode space produce the best performance in corresponding regions of the situation space. Weassume that adequate monitoring and preparation resources are available and that, other things equal, an agent prefers to spend resources on preparation rather than on monitoring in order to achieve the best possible run-time performance. Alternatively, if we assume variations in availability or cost of monitoring and preparation resources, the superimposed spaces show how runtime performance may be degraded in order to conserve those resources. Weuse the domain of skiing to illustrate the proposed mapping between control situations and modes for the four theoretical extreme modes:open-loopplan execution,goal-specificreaction, dead reckoning,and reflex. However,becauseintelligent agents rarely encounter the extreme situations represented along the borders of the situation space and rarely operate in the associated extreme control modes, our skiing examples merely approximate these extreme modesand, therefore, are positioned well inside the borders of the space. Weuse the term strategic modeto refer to the large interior region of more practical control modesthat modulate the balance of sensitivity and commitment for situations embodyingintermediate degrees of uncertainty and constraint. Whenan agent has maximallyconstraining goals to achieve in a minimallyuncertain environment, openloop plan execution producesthe best run-time performance. The agent can prepare a detailed, globally coordinated plan in advanceand execute it reliably and efficiently with minimalrun-time monitoring. For example, consider an Olympicdownhill skiier: In the daysbeforethe race, after walkingthe courseand skiing twotimetrials, youwill haverun the race in your headdozensif not hundredsof times. Memorizing the courseis crucial since at speedsaveraging70 miles an hour there is little opportunityfor adjustmentandyouneed every preciousinstant to react to the unknown and unforseeable,primarilyimperfectionsin the skiing surface. [2] The skiier achieves maximum race speed by severely restricting run-time sensitivity to the environmentand conditionality of actions. Whenan agent has maximallyconstraining goals to achieve in a maximallyuncertain environment, goaldirected reaction producesthe best run-time performance. The agent can commitin advance to performspecific actions under specific conditions, but must determine which of these conditions hold and, therefore, whichof these actions to execute based on run-time monitoring. For example, consider the same Olympicskiier participating in an imprompturace on an unfamiliar trail. Heor she prepares specific racing actions for adjusting speed and course based on relevant gross features of the environment,as well as subtler adjustments to surface imperfections. At run time, the skiier focuses on monitoringjust the relevant features of the environmentin order to decide whichof its small set of race-relevant actions to execute and whento execute them. Presumablythe skiier will not achieve Olympicspeeds because of the time required for additional run-time monitoring and adjustment of actions and because of possible weaknessesin the global properties of the action sequence. Notice that in both planning and reaction modes-which correspond to the planning and reactive models in the literature-an agent commitsto specific executable actions in advancein order to achieve highly constraining goals. Planning modeexploits predictability in the environmentto eliminate most monitoringactions, to restrict action to a single globally coordinated sequence, and thus to streamline run-time performance. Withoutthis predictability, reaction mode prepares a larger numberof actions for a larger number of contingencies. In intermediate modesalong the planning-reaction border, the agent modulatesthe amountof run-time monitoring it performs and and the associated conditionality of its actions. In all modeson this border, however, the agent expendssubstantial computational resources in advance to determine the set of specific conditionsit will monitorandthe set of specific actions it is willing to perform.Theagent is maximallycommittedand unlikely to respond to an unanticipated event or to perform an unanticipated action. But what about situations in whichthe agent has more general goals and does not need to constrain its actions so severely? Whatabout situations in whichthe agent wishes to behavemoreflexibly at run time or to conserve resources spent on preparation? Whenan agent has minimally constraining goals to achieve in a minimally uncertain environment, dead reckoning modeproduces the best run-time performanceat less preparation cost than planning mode. The agent can makea very general plan and, at run time, execute any sequenceof actions that constitutes a valid instance of the plan. For example, consider our skiier whenpracticing basic skills on a very familiar trail. For example,he or she mightdecide to practice tight turns downthe fall line at the top of the trail, whichis knownto be narrow and steep, and big carving turns at the bottom, whichis knownto be broad and shallow. The skiier does not bother to specify the exact course or speed, the numberof turns of each type, or whenthe transition should occur. He or she confidently instantiates this general plan at run time in any of manyequally valid sequencesof specific turns, adjustments,etc. with little need to monitorthe environment. Whenan agent has minimally constraining goals to achieve in a maximallyuncertain environment, reflex modeproduces the best run-time performanceat less preparation cost than reactive mode.The agent is sensitive to manyclasses of run-time conditions and is willing to performany actions the actual conditions elicit. For example,consider our skiier when performing "extreme skiing" on a treacherous untracked trail never before encountered. He or she will be happyto reach the bottomof the trail without serious mishap.It is essential that the skiier remain sensitive to a very broad range of possible conditions and be willing to perform an unusually broad range of actions, including for examplestopping, walking, climbing, and radioing for help. At every point on the trail, the skiier executes whateveractions are necessary to survive the immediatesituation. Notice that in both dead reckoning and reflex modes,the agent is positioned implicitly to performa muchlarger numberof specific actions and action sequences, comparedto planning and reactive modes, respectively. Deadreckoning modeexploits predictability in the environmentto committo large classes of actions, whichcan be instantiated top down as alternative acceptablecourses of action at run time. Withoutthis predictability, reflex moderelies heavily on run-time monitoringto detect conditions that elicit actions. In the intermediate modesalong the dead reckoning-reflex border, the agent modulatesthe amountof run-time monitoring it performs and the balance of top-downversus bottom-upcontrol of actions it executes. In all modeson this border, however, the agent expends minimal computational resources in advanceby identifying arbitrarily large classes of events to monitoror actions to perform. It does not committo monitor any specific events or to performany specific actions. In that sense, and in contrast to planning and reaction modes,the agent is always responding to "unanticipated" events or performing "unanticipated" actions. As mentionedabove, we refer to modesin the large interior region of the modespace as strategic modes. Dependingon the uncertainty of its environment, the degree of constraint imposedby its goals, and the availability of resources for preparation and monitoring, the agent makesa plan of intermediate specificity that moreor less constrains the actions it intends to execute. The agent workstoward its goals by executing actions that happento be triggered at runtime and happen to match its plan. For example, when skiing fast downa somewhat familiar trail, our skiier makesa general plan to perform tight turns downthe fall line, but monitorsthe trail at run time to decide exactly which turns to makewhere, whether any unplannedclasses of actions are required, etc. Notice that, in strategic mode,the agent does not waste resources makingplans that are unrealistically or unnecessarily specific, but remainsopen to the possibility of executingalternative actions and action sequences that only becomeapparent at run time. On the other hand, it is not undulysensitive to the environment, but organizes its monitoring and actions within the general structure of its strategic plan. 47 Strategic modeproduces good run-time behavior in a broader range of morerealistic situations than any of the extreme modesalong the borders of the space. Giventhe situation-specific effectiveness of alternative control modesand their differential demandsfor advance and run-time resources, we need a control mechanismthat will allow an intelligent agent to operate in any modethroughout the twodimensionalspace, at its owndiscretion. The mechanismsof planning and reactive models reported in the literature are mode-specific;they basically support control modesonly in the correspondingly namedextreme regions of the two-dimensional mode space. Whilethese mechanismsmight be useful for special-purpose applications, they do not provide the range of control modesrequired of intelligent agents. On the other hand, both mechanismscontain elements of a comprehensivecontrol architecture. Integrating and extending these componentprocesses, we propose an opportunisticcontrol model,whichis describedin the longer paper from whichthis analysis is excerpted. References 1. P. E. Agre and D. Chapman."Pengi: An implementationof a theory of activity." In Proceedings of the National Conferenceon Artificial Intelligence, San Matao, Ca: MorganKaufmann,1987. 2. J. Bowermaster."Uphill Racer." The NewYork Times Sunday Magazine, February 2, 1992. 3. D. Chapman."Planning for conjunctive goals." Artificial Intelligence. 32, 333-378,1987. 4. R. E. Fikes and N. J. Nilsson. "STRIPS:A new approach to the application of theoremproving to problemsolving." 2, 198-208, 1971. 5. S. J. Rosenscheinand L. P. Kaelbling. "The synthesis of digital machineswith provable epistemic properties." In Proceedings of the Conferenceon Theoretical Aspects of Reasoning about Knowledge, MorganKaufman,pp. 83-98, 1986. 6. M. Schoppers. "Universal plans for reactive robots in unpredictable environments." In Proceedings of the International |tint Conferenceon Artificial Intelligence, San Matao, Ca: MorganKaufmann,1987.