Layered Cognitive Architectures: Where Cognitive Science Meets Robotics

Pete Bonasso*, Mike Freed**

*NASA Johnson Space Center, TRACLabs, 1012 Hercules, Houston, TX 77058, USA. r.p.bonasso@jsc.nasa.gov
**NASA Ames Research Center, Cognition Lab, Moffett Field, CA 94035, USA. mfreed@mail.arc.nasa.gov

Abstract

Although overlooked in recent years as tools for research, cognitive software architectures are designed to bring computational models of reasoning to bear on real-world physical systems. This paper makes a case for using the executives in these architectures as research tools to explore the connection between cognitive science and intelligent robotics.

Cognitive Architectures

Three-layer intelligent control architectures (Gat, 1998) are by now well known and are taught in many graduate-level AI courses (e.g., (Murphy, 2000)). Although usually presented as frameworks for organizing software and reported as engineering tools for bringing AI to bear on robotic applications (e.g., (Bonasso et al., 2003; Bonasso et al., 1997b; Freed et al., 2004)), an important aspect of these architectures is that they were motivated by the need to integrate computational models of reasoning with real-world control of physical devices. In particular, the executive layer bridges the syntactic and semantic gap between the representations of the top layer and those of the continuous low-level control. Our claim is that these executives, in combination with representative application simulations, provide a common proving ground for cognitive science and AI robotics research.

The executive's place in the organization of a three-layer cognitive architecture is shown in Figure 1. Typically, deep reasoning in the way of generative planning and scheduling executes at the top tier, predicting the tasks required to achieve control objectives given an initial situation and assigning available resources to them. The lowest tier transforms primitive actions into continuous control and monitoring activity for the hardware. The executive tier takes high-level goals from the top tier (or from a human agent), decomposes them using a library of reactive plans, and ultimately transforms them into primitives for the bottom tier. Control processing is closed loop at all tiers, permitting execution monitoring and reactive re-planning and reconfiguration in response to changes in the environment or the robotic hardware.

[Figure 1: Three-layer architecture. Top tier: deliberative (planning/scheduling); event driven, computationally intensive, large time constants. Middle tier: executive (task-level sequencing); converts goals to sequences of continuous activities, interprets sensing as events, intermediate time constants. Bottom tier: low-level control; continuous sense/act processes with small time constants, tied directly to the hardware.]

Gluing Cognition to Sensing and Action

The three layers correspond to what Newell termed the neural band, the cognitive band, and the rational band (Newell, 1990). In Newell's theory, the cognitive band comprises multiple levels of knowledge processing, from symbol access to goal attainment. And indeed, the executives in layered cognitive architectures must perform this range of cognitive processing in order to translate the form and intent of the top level (or of human requests) into the continuous activity of the bottom level, as well as to interpret sensory events from the bottom level in the context of the goals of the top level.
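To make the division of labor concrete, the following is a minimal sketch of how the three tiers and their differing time constants might be wired together. It is our own illustration in Python, not code from any of the cited systems, and all class, function, and task names are invented for the example.

class Deliberator:
    """Top tier: event-driven planning/scheduling; large time constants."""
    def plan(self, goal):
        # A real system would run a generative planner here; we simply
        # hand the goal down as a single abstract task.
        return [goal]

class Executive:
    """Middle tier: decomposes tasks via a library of reactive plans."""
    def __init__(self, library):
        self.library = library  # task name -> list of primitive actions
    def decompose(self, task):
        return self.library.get(task, [task])

class SkillLayer:
    """Bottom tier: turns primitives into continuous sense/act activity."""
    def execute(self, primitive, sense):
        # Closed loop: monitor sensing while acting; report success/failure.
        print("executing", primitive, "with sensor reading", sense())
        return True

def run(goal, deliberator, executive, skills, sense):
    # Failures propagate upward so the executive or planner can
    # reconfigure in response to changes in the world or the hardware.
    for task in deliberator.plan(goal):
        for primitive in executive.decompose(task):
            if not skills.execute(primitive, sense):
                return False  # trigger reactive re-planning
    return True

library = {"approach-human": ["scan-for-human", "track-human", "move-to"]}
ok = run("approach-human", Deliberator(), Executive(library), SkillLayer(),
         sense=lambda: {"camera": "ok"})
print("goal achieved" if ok else "replanning needed")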
One of the first of these executives was the RAPs system (Firby, 1989; Firby, 1999). A typical RAP procedure, shown below, defines behavior for using a camera to visually locate and then move to a human. The applicability of the behavior depends on whether the human's location is already known, in which case the camera is not required, and on the availability of the camera resource if it is needed. Here we see the tie from a state-based goal to the sensing and acting functions of the robot, together with a first-order conjunctive query.

(define-rap (move-to ?agent)
  (succeed (found ?agent))
  (method use-camera
    (context (and (available ?sensor)
                  (is-a ?sensor camera)
                  (class-of ?agent human)))
    (task-net
      (sequence
        (t1 (scan-for-human ?agent ?sensor => ?loc))
        (t2 (track-human ?loc))
        (t3 (move ?loc)
            (wait-for (at-loc ?loc ?epsilon) :succeed (found ?agent))
            (wait-for (lost-track) :fail)))))
  (method human-located
    ...))

RAPs is now embodied in a larger framework that includes a concept memory and a language parser for use in dialogue management (Fitzgerald and Firby, 1998). Thus, it is not only a control framework but also a human-robot interaction framework.

A newer executive, called Apex, inspired by RAPs, focuses on achieving human-level task management (Freed et al., 2003) in order to support the evaluation and design of human-machine systems and the analysis of how newly introduced technologies will affect human operators. Equally, it is used to provide human-like task management and control competence to robotic agents. An example procedure in the Apex Procedure Description Language (PDL) is shown below:

(procedure
  (index (shift manual-transmission to ?gear using ?hand))
  (profile ?hand)
  (step s1 (grasp stick with ?hand))
  (step s2 (determine-target-gear-position ?gear => ?position))
  (step s3 (move ?hand to ?position) (waitfor ?s1 ?s2))
  (step s4 (terminate) (waitfor ?s3)))

PDL syntax is similar to that of RAPs (the waitfors are preconditions rather than post-conditions), but Apex has built-in algorithms for assigning and scheduling resources, such as hands and eye gaze. As Apex executes, tasks continuously compete for resources depending on their precedence (rank), importance, urgency, and cost of interruption. In this way Apex can model, for example, how a human driver shifts from a standard eye scan to focusing on an accident up ahead and then back to standard scanning.
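This style of resource competition can be sketched as follows. The sketch is our own simplified stand-in, not Apex's actual arbitration algorithm; the attributes (rank, importance, urgency, cost of interruption) follow the description above, but the weighting and all names are invented for illustration.

from dataclasses import dataclass

@dataclass
class Task:
    name: str
    rank: int              # precedence class; a lower number outranks a higher one
    importance: float
    urgency: float
    interrupt_cost: float  # cost of interrupting this task once it holds the resource

def priority(task):
    # Rank dominates; importance and urgency break ties within a rank class.
    return (-task.rank, task.importance + task.urgency)

def contest(incumbent, challenger):
    """Decide which task holds a resource (e.g., eye gaze) when two compete."""
    inc, chal = priority(incumbent), priority(challenger)
    if chal[0] != inc[0]:  # different rank classes: rank alone decides
        return challenger if chal[0] > inc[0] else incumbent
    # Same rank class: the challenger must also overcome the incumbent's
    # cost of interruption to seize the resource.
    return challenger if chal[1] - incumbent.interrupt_cost > inc[1] else incumbent

scan = Task("standard-eye-scan", rank=2, importance=0.4, urgency=0.3,
            interrupt_cost=0.1)
accident = Task("fixate-on-accident", rank=1, importance=0.9, urgency=0.9,
                interrupt_cost=0.5)

print(contest(incumbent=scan, challenger=accident).name)  # the accident captures gaze
# When the fixation task completes, the scan task wins the gaze back,
# modeling the driver's return to standard scanning.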
The power of using these frameworks for execution lies in the fact that execution is a problem area where looking at humans is likely to shed light on how to do AI robotics better, and at the same time an area where success on the AI robotics side should translate into a better understanding of human cognition.

More Correct Descriptions of Human Cognition

There are a number of reasons we feel that solutions to AI problems at the executive level are likely to be more correct descriptions of human cognition than solutions obtained with more deliberative AI mechanisms. The first is that the executives we have described rely on a large body of procedural knowledge and process it in fairly straightforward ways (this kind of processing is sketched at the end of this section). In contrast, deliberative AI mechanisms make assumptions about the kind of knowledge employed, usually highly granular, purpose-independent models, and about the way it is processed, e.g., computationally demanding search. We contend that these are not very plausible as human cognitive models.

Then too, the functionality requirements of these executives are driven by such factors as the need to exploit regularities in the task environment; the need to cope with uncertainty about current state, past state, future state, action outcomes, task resource requirements, task duration, and so on; and the time constraints imposed by the task and/or the environment. These seem likely to be prime shaping factors in the development of human cognition. In contrast, deliberative functionality requirements are generally driven by the need to solve well-defined but difficult, highly coupled problems under fairly strong assumptions of certainty and under relatively little time pressure. It is not clear that such capabilities are substantial drivers in the development of human cognition, nor, for that matter, that people are very proficient at this kind of cognition.

On a practical level, the kinds of tasks executives were designed and used for correspond to what in humans would be seen as commonplace and natural behaviors (for an early reference to this notion see (Agre, 1988)). This makes an executive a good basis for studying and modeling natural decision-making and behavior. In contrast, deliberative algorithms are designed for the kinds of tasks people do infrequently and often need help with, e.g., playing chess or deciding utilities among alternatives.

Furthermore, the kinds of tasks dealt with by these executives are always grounded in the real world, so the agent must reason about real-world phenomena such as geometry, action and reaction, and the translation of signals into meaning at the cognitive level. We contend that human cognition is itself grounded in and informed by sensory experience, something more deliberative models of cognition tend to ignore.

This latter point is brought out in the use of the 3T layered control architecture, which employs the RAPs executive (Bonasso et al., 1997a). In over a decade of using 3T in robotic and life-support applications, only twice was the deliberative layer called into play (Schreckenghost et al., 2002; Bonasso et al., 2003). This was because in most applications a domain theory complete enough to make use of generative planning or sophisticated scheduling was not available. However, the set of procedures provided by users, together with the sensing and acting skills for the hardware, was usually sufficient to achieve successful intelligent control.
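As promised above, here is a minimal sketch of how an executive processes its procedural knowledge: each method of a procedure carries context conditions that are matched against the current world state, and the first applicable method's task net is expanded. The real RAPs system answers first-order conjunctive queries with variable binding; for brevity our world state here is a set of ground facts, and all identifiers are invented.

# World state as a set of ground facts.
WORLD = {
    ("available", "camera-1"),
    ("is-a", "camera-1", "camera"),
    ("class-of", "agent-7", "human"),
}

# Methods for a move-to procedure, echoing the RAP shown earlier: each
# method pairs context conditions with a task net to expand if they hold.
MOVE_TO_METHODS = [
    {"name": "human-located",
     "context": {("location-known", "agent-7")},
     "task-net": [("move", "agent-7")]},
    {"name": "use-camera",
     "context": {("available", "camera-1"),
                 ("is-a", "camera-1", "camera"),
                 ("class-of", "agent-7", "human")},
     "task-net": [("scan-for-human", "agent-7", "camera-1"),
                  ("track-human",),
                  ("move",)]},
]

def select_method(methods, world):
    # Conjunctive context query: every condition must hold in the world state.
    for method in methods:
        if method["context"] <= world:  # subset test over ground facts
            return method
    return None

chosen = select_method(MOVE_TO_METHODS, WORLD)
print(chosen["name"], "->", chosen["task-net"])  # use-camera is applicable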
Reference Problems

With both Apex and RAPs, simulations of reference problems are available, either from application domains or, in the case of Apex, provided as part of the software system (http://human-factors.arc.nasa.gov/apex). We have used these simulations to develop a dialogue management system (Bonasso, 2002) and to study human interaction with a variety of systems ranging from air traffic control workstations (Freed and Remington, 1998) to automatic teller machines (Freed et al., 2003). As well, researchers can develop simulations to support their own research and then test cognitive theories against those simulations using the execution framework.

In research we are doing on advanced monitoring, we are positing a representative problem domain, comprising a robotic platform, an environment, and a task scenario, that we believe will force us to examine key issues in the monitoring area. In particular, we are defining a two-armed, stereo-vision underwater rover designed to search for deep-ocean thermal vents (black smokers; see Figure 2), take chemical samples and temperature profiles, and report findings to the surface. Using Apex tools, we are building a simulation of this task scenario in which to test Apex models of advanced monitoring capabilities for the next generation of intelligent robot control.

[Figure 2: A remotely operated vehicle exploring a black smoker.]

We recommend that both cognitive scientists and AI roboticists investigate this research paradigm in order to more closely tie together the results of the two fields.

References

Agre, P. 1988. The Dynamic Structure of Everyday Life. Ph.D. dissertation, MIT Artificial Intelligence Laboratory, Cambridge, MA.

Bonasso, R. P. 2002. Using Discourse to Modify Control Procedures. In Papers from the AAAI Fall Symposium on Human-Robot Interaction, Tech Report FS-02-03, 9-14. AAAI Press.

Bonasso, R. P., Firby, J. R., Gat, E., Kortenkamp, D., Miller, D. P., and Slack, M. G. 1997a. Experiences with an Architecture for Intelligent, Reactive Agents. Journal of Experimental and Theoretical Artificial Intelligence 9: 237-256.

Bonasso, R. P., Kortenkamp, D., and Thronesbery, C. 2003. Intelligent Control of a Water Recovery System: Three Years in the Trenches. AI Magazine 24 (1): 19-44.

Bonasso, R. P., Kortenkamp, D., and Whitney, T. 1997b. Using a Robot Control Architecture to Automate Shuttle Procedures. In Proceedings of the Ninth Conference on Innovative Applications of Artificial Intelligence (IAAI), 949-956. Providence, RI.

Firby, J. R. 1989. Adaptive Execution in Complex Dynamic Worlds. Ph.D. dissertation, Department of Computer Science, Yale University.

Firby, J. R. 1999. The RAPs Language Manual. Neodesic, Inc., Chicago.

Fitzgerald, W. and Firby, J. R. 1998. The Dynamic Predictive Memory Architecture: Integrating Language with Task Execution. In Proceedings of the IEEE Symposia on Intelligence and Systems. Washington, DC.

Freed, M., Harris, R., and Shafto, M. 2004. Human Interaction Challenges in UAV-Based Autonomous Surveillance. In Proceedings of the AAAI Spring Symposium. Stanford, CA.

Freed, M., Matessa, M., Remington, R., and Vera, A. 2003. How Apex Automates CPM-GOMS. In Proceedings of the 5th International Conference on Cognitive Modeling. Bamberg, Germany.

Freed, M. and Remington, R. 1998. A Conceptual Framework for Predicting Errors in Complex Human-Machine Environments. In Proceedings of the 1998 Meeting of the Cognitive Science Society. Madison, WI.

Gat, E. 1998. Three-Layer Architectures. In Kortenkamp, D., Bonasso, R. P., and Murphy, R., eds., Artificial Intelligence and Mobile Robots. Menlo Park, CA: AAAI Press, 195-210.

Murphy, R. 2000. Introduction to AI Robotics. Cambridge, MA: MIT Press.

Newell, A. 1990. Unified Theories of Cognition. Cambridge, MA: Harvard University Press.

Schreckenghost, D., Thronesbery, C., Bonasso, R. P., Kortenkamp, D., and Martin, C. E. 2002. Intelligent Control of Life Support for Space Missions. IEEE Intelligent Systems 17 (5): 24-31.