Integrating Perception, Language and Problem Solving in a Cognitive Agent for a Mobile Robot*

D. Paul Benjamin, Computer Science Department, Pace University
Deryle Lonsdale, Department of Linguistics, Brigham Young University
Damian Lyons, Computer Science Department, Fordham University

*This project is partially supported by a grant from the Department of Energy.

Abstract

We are implementing a unified cognitive architecture for a mobile robot. Our goal is to endow a robot agent with the full range of cognitive abilities, including perception, use of natural language, learning and the ability to solve complex problems. The perspective of this work is that an architecture based on a unified theory of robot cognition has the best chance of attaining human-level performance. This agent architecture is based on an integration of three theories: a theory of cognition embodied in the Soar system, the RS formal model of sensorimotor activity, and an algebraic theory of decomposition and reformulation. These three theories share a hierarchical structure that is the basis of their integration. The three component theories have been implemented and tested separately, and their integration is currently underway. This paper describes these components and the overall architecture.

Introduction

The current generation of behavior-based robots is programmed directly for each task. The robot agents are written in a way that uses as few built-in cognitive assumptions as possible, and as much sensory information as possible, in a manner similar to Brooks [26], who proposed in the mid-1980s an approach that emphasized fast, reactive actions and the absence of explicit models. The lack of cognitive assumptions gives such robots a certain robustness and generality in dealing with unstructured environments. However, it is proving a challenge to extend the competence of such systems beyond navigation and some simple tasks [67]. Complex tasks that involve reasoning about spatial and temporal relationships require robots to possess more advanced mechanisms for planning, reasoning, learning and representation.

One approach to this issue is to equip a robot agent with a behavior-based "reactive" module coupled to a "deliberative" module that engages in cognitive reasoning; these are called hybrid reactive/deliberative systems [5]. Many such systems use a symbolic planner, or a planner and scheduler, as the deliberative component. The planner formulates long-term and more abstract tasks, while the behavior-based system carries out the steps of the task in a robust fashion [5].

For example, Lyons et al. [55,56] describe a hybrid system based on the concept of iterative and forced relaxation of assumptions about the environment. The deliberative module generates and maintains a behavior-based system that is as robust as possible, given the tradeoff between the time available to the planner and any previously sensed assumption failures. The reactive system is implemented using the port-automata based RS model [57], and ITL [2] is used for the planning component. Action schemas in ITL are linked to prepackaged RS automata networks. The planner reasons about instantiating and connecting networks to achieve its goals. As a plan is created, the planner incrementally transmits the instructions to the reactive component, which instantiates automata networks and begins executing the plan concurrently with the plan's ongoing elaboration. These networks include monitoring automata whose job is to report back to the planner when components of the plan finish execution, or when sensed conditions change to such an extent that the plan is no longer valid. The system can then respond to the violation of assumptions about the environment by replanning, again concurrently with the ongoing automata execution, and upgrade components of the plan in a safe fashion. This hybrid approach endows a robot with more sophisticated planning ability than either deliberative planning or reactive behavior provides alone; however, it does not address either learning or representation issues.
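The control flow just described can be made concrete with a minimal sketch. This is our illustration, not the authors' code: the names (ReactiveNet, Monitor, run_hybrid, replan) are hypothetical stand-ins for the RS/ITL machinery, and execution is sequential here where the real system runs the planner and the automata concurrently.

```python
# Minimal sketch of the hybrid deliberative/reactive loop: the planner emits
# plan steps as reactive networks; monitoring automata report back when a
# watched condition fails, triggering a repair of the remaining plan.

from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Monitor:
    condition: Callable[[dict], bool]   # must stay true while the step runs
    on_violation: str                   # tag reported back to the planner

@dataclass
class ReactiveNet:
    name: str
    step: Callable[[dict], bool]        # returns True when the step is done
    monitors: List[Monitor] = field(default_factory=list)

def run_hybrid(plan_steps, world, replan):
    """Execute plan steps while watching monitors; replan on violation."""
    queue = list(plan_steps)
    while queue:
        net = queue.pop(0)
        done = False
        while not done:
            violated = [m for m in net.monitors if not m.condition(world)]
            if violated:
                # A monitoring automaton reports back; the planner repairs
                # the remaining plan (concurrently, in the real system).
                queue = replan(violated[0].on_violation, world, queue)
                break
            done = net.step(world)
```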
Our objective is to take this approach an important step further, toward true cognitive behavior. A planner can only consider plans built from the action and sensing repertoire it possesses; in this sense it is as dependent on its designer for robust behavior as a behavior-based system is. Furthermore, a second limitation is that the planner is constrained by the formulation of the environment built in by its designer (or learned by the system). Each formulation makes certain results easy to obtain and others difficult. As a result of these two limitations, robots can usually do a few things very well and everything else poorly. This limitation is typical of planning systems and has led to research on systems that can autonomously expand or adapt their repertoire of planning components. Two such approaches are to add a learning capability, so that the system can learn new plan components, and to add an ability to change formulations, so that the system can transform a task into a much more easily solved form. Much research has added a learning module or a change-of-representation module to an existing planner in an attempt to construct a planner of greater power and generality; typical examples of this line of research are [3,30,31,36,47,80].

Our approach is instead to add these capabilities to a planner by integrating it with a system with more general cognitive abilities. We are developing a deliberative module that is capable of carrying out experimental attempts to solve a task, with the objective of discovering the task's underlying structure. This structure consists of local symmetries and invariants that are present in the percepts and motions of the robot as well as in the environment. The agent uses this structure both to create new high-level sensors [35] and to reformulate its knowledge so that the task becomes easier to solve.
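As a toy illustration of structure discovery (our sketch, not the paper's algorithm), one very simple kind of invariant can be read off logged transitions: the state features that an action never changes. Such invariant features are candidates for new high-level sensors and for reformulating the task.

```python
# Discover trivially invariant features from (state, action, next_state)
# logs: for each action, report the features it never changes.

from collections import defaultdict

def invariant_features(transitions):
    """transitions: iterable of (state_dict, action, next_state_dict)."""
    changed = defaultdict(set)          # action -> features it ever changes
    features, actions = set(), set()
    for s, a, s2 in transitions:
        actions.add(a)
        features.update(s)
        for f in s:
            if s[f] != s2.get(f):
                changed[a].add(f)
    return {a: features - changed[a] for a in actions}

# Example from the box-pushing domain described later in the paper:
# pushing the small box never changes the column of the big box.
log = [({'big': 1, 'small': 1}, 'X', {'big': 1, 'small': 2}),
       ({'big': 1, 'small': 2}, 'X', {'big': 1, 'small': 3})]
print(invariant_features(log))          # {'X': {'big'}}
```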
Our approach is to build a complete cognitive robotic architecture by merging RS [57,58], which provides a model for building and reasoning about sensory-motor schemas, with Soar [38], a cognitive architecture under development at a number of universities. RS possesses a sophisticated formal language for reasoning about networks of port automata and has been successfully applied to robot planning [56]. Soar is a unified cognitive architecture [66] that has been successfully applied to a wide range of tasks, including tactical air warfare [71]. This cognitive robotic agent is currently being implemented and tested on Pioneer P2 robots (http://robots.activmedia.com/) in the Pace University Robotics Lab (http://csis.pace.edu/robotlab).

Operation

Our cognitive robot architecture operates in a top-down, goal-directed fashion, both for motor planning and for sensory processing. A human gives the robot a task via a spoken natural language utterance, which is processed by the speech recognition system and subsequent linguistic processing. If necessary, a clarification dialogue unfolds between the robot and the human. The deliberative planning component generates an abstract plan, which it refines both before and during execution. Plans are represented as a hierarchy of schemas composed of sensory and motor components. Sensory processing is also goal-directed: it operates to detect the preconditions of plan components, monitor their invariants, and check their postconditions. The vision is "active" in the sense of [23].

To support this approach, the representation of sensory data is likewise organized as a hierarchy whose levels are of different granularities, with the coarsest granularity at the top. For example, the visual input is organized so that the top level contains only the largest segments and lower levels add smaller segments, in a manner similar to [83]. The robot generates this hierarchy beginning with the coarsest level and adds detail over time; this permits the robot to make decisions and to communicate with humans and other robots in an "anytime" fashion, using whatever information has been generated instead of waiting for the entire image to be processed. Active perception guides this process by focusing computation on those portions of the image where detail is needed to check sensory goals and eliminate hypotheses.

Top-down processing is predictive in nature: it generates hypotheses about the world and checks them. The advantage of this approach is that it can be extremely fast, because only a few aspects of the world need to be checked and much low-level data can be ignored. To be effective, the approach requires that the set of hypotheses be small, which in turn requires that the environment be decomposable into small units. For effective problem solving, these small units should be functionally related, which permits the problem solver to recognize the situation quickly in terms of what should be done. It is known that humans can solve hard problems by organizing their perceptions this way [27]. From this perspective, one of the main goals of the problem solver is to represent the world in terms of such small functional units. Our theory of decomposition is well suited to this purpose.
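A minimal coarse-to-fine sketch of this "anytime" hierarchy follows. It is our illustration (assuming numpy; function names are hypothetical): the coarsest level is produced first, so a decision can be made at any point, and full-resolution detail is computed only for regions that active perception selects.

```python
import numpy as np

def pyramid(image, levels=4):
    """Yield coarse-to-fine versions of a 2-D array by block averaging."""
    h, w = image.shape
    for k in range(levels - 1, -1, -1):
        f = 2 ** k                       # coarsest (largest f) comes first
        hh, ww = h // f * f, w // f * f
        coarse = image[:hh, :ww].reshape(hh // f, f, ww // f, f).mean((1, 3))
        yield f, coarse                  # the caller may stop here, "anytime"

def refine_region(image, top, left, size):
    """Full-resolution detail only for a region a sensory goal asks about."""
    return image[top:top + size, left:left + size]

img = np.random.rand(64, 64)
for scale, level in pyramid(img):
    pass                                 # e.g., stop once a decision is possible
```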
Decomposition and Reformulation

The complexity of perception and action arises in part from the need for a robot to perceive and act simultaneously on several space and time scales, ranging from millimeters and milliseconds to kilometers and hours. Each of these scales can possess a complex structure, and each can potentially constrain any other; yet a robot must perceive and act in real time. This topic has long been a central focus of work in artificial intelligence and robotics.

Often, research into the analysis and synthesis of intelligent behavior has been hindered by the belief that complex behavior requires complex mechanisms that explicitly encode the behaviors. Recent work in dynamical systems theory suggests otherwise. The study of dynamical systems is a comparatively new field of mathematics with very wide applicability throughout science. Very simply formulated dynamical systems can exhibit a wide range of complex behaviors that are not evident from their formulations: even a simple driven pendulum turns out to possess chaotic behavior. This relationship between simple systems and complex behaviors poses a difficult hurdle for analysis, as sophisticated mathematics may be required to understand the behaviors of even very simply formulated nonlinear systems. Artificial intelligence and robotics, however, face the inverse problem: how to formulate a system that can express a given range of complex behaviors. For AI and robotics this relationship is therefore attractive, as it holds the potential to greatly reduce the complexity of robotic systems by finding methods for building simply formulated systems that generate complex, intelligent behavior in a complex environment.

Our decomposition algorithm finds a task's hierarchy of local neighborhoods and reformulates the task in terms of local coordinate systems for these neighborhoods. The relevant properties for system formulation are those that hold locally everywhere within the neighborhoods. Simple programs can measure these local properties and maneuver within local neighborhoods. Under certain conditions, the local neighborhood structure becomes a substitution tiling [75]. In these cases, self-similarity of the neighborhoods permits the simple programs to be effective at several scales, creating a hierarchical structure that is simple and local at each scale but capable of generating very complex global behaviors. Fortunately, substitution tilings appear to be ubiquitous in the physical world [75].

Local coordinate systems define the perceptual spaces for the robot. The axes of each coordinate system correspond to a set of features for describing the state space. Associated with the local symmetries of the local coordinate system are invariant subspaces of states (states on which particular combinations of features are invariant). Motion in the state space decomposes into motion within invariant subspaces and motion normal to those subspaces. This decomposition reduces the complexity of motion planning, and it also creates the goal for the perceptual system: the robot must perceive the visual symmetries and monitor the invariants of the motions. The features determined in this way are the semantic components that form the basis of communication. Thus, this approach integrates perception, action and communication in a principled way by connecting their local symmetries and invariants.

Our previous work on this structure has shown that it exists in motion, perception, communication and planning. For example, the motions of a robot on a small hexagonal grid can be computed and analyzed, and then copies of the small grid can be composed in such a way that the properties scale up [19,20]. The structure of local neighborhoods replicates at larger scales, with features of the basic neighborhood appearing in larger neighborhoods at larger scales. By planning locally within neighborhoods of different scales, the robot can follow a very complex path without performing any complex planning; the complexity of the path reflects the complexity of the environment rather than the complexity of the planner [8].

Perception exhibits this self-similar structure, too. Barnsley [6] has shown that this self-similar structure exists in images and has developed fractal compression methods that are commercially viable competitors in image processing and communications. A number of research and commercial efforts are underway in this area, including those at the University of Waterloo, INRIA and Ecole Polytechnique de l'Universite de Montreal.
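The decomposition of motion into within-subspace and normal components is ordinary linear algebra; the following sketch (our example, assuming numpy, with an invariant subspace given directly as an orthonormal basis) shows the split.

```python
import numpy as np

def decompose_motion(v, B):
    """Split a velocity v into (within-subspace, normal) components.

    B: (n, k) matrix whose columns form an orthonormal basis of the
       invariant subspace.
    """
    within = B @ (B.T @ v)              # orthogonal projection onto the subspace
    normal = v - within                 # remainder is normal to the subspace
    return within, normal

# Example: a 2-D state space where feature 0 is invariant, so the subspace
# is spanned by the axis of feature 1.
B = np.array([[0.0], [1.0]])
v = np.array([0.2, 1.0])
within, normal = decompose_motion(v, B)  # within=[0, 1], normal=[0.2, 0]
```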
Natural language is another example of the interaction of top-down processing (overall discourse meaning, major syntactic and semantic structures) and bottom-up processing (word meaning, speech recognition). The algebraic basis of natural language and the use of local symmetries in its analysis are well understood [29,39,68].

Constraint satisfaction problems can also exhibit this self-similar structure [10]. One such problem is the tiling of the plane using Penrose tiles. Solutions of the Penrose tile placement problem are typically found by combinatorial search, but Steinhardt [78] has shown that nine copies of a small local pattern (a) can be replicated and overlapped to give a larger pattern (c) that is self-similar to the original under the substitution (b). Steinhardt shows that all Penrose tilings arise in this way, and thus Penrose tilings are locally everywhere isomorphic to the pattern in (a), at many scales. The implications of this structure for planning and problem solving are clear: instead of using combinatorial search in the space of possible solutions, a robot can search for a self-similar local neighborhood structure in the constraints of the task and the environment. When such a structure exists, the robot can exploit it to solve large, complex planning problems in an efficient hierarchical manner.

The Cognitive Architecture

Our robot cognitive architecture exhibits a hierarchy of sensory-motor modules in all facets of its operation, from abstract deliberative planning to real-time scheduling and reaction. As described above, all these modules are formulated in terms of symmetries and invariants. The model is implemented by integrating two existing systems: RS and Soar.

RS is a formal model of robot computation based on the semantics of networks of port automata. A port automaton (PA) is a finite-state automaton equipped with a set of synchronous communication ports [58]. Formally, we can write a port automaton P as P = (Q, L, X, δ, π), where

- Q is the set of states;
- L is the set of ports;
- X = (X_i | i ∈ L) gives the event alphabet X_i for each port i;
- δ : Q × X_L → 2^Q is the transition function, where X_L = {(i, x) | i ∈ L, x ∈ X_i} is the disjoint union of the port alphabets;
- π = (π_i | i ∈ L), where π_i : Q → X_i is the output map for port i;

and I ∈ 2^Q is the set of start states.

RS introduces a vocabulary for specifying networks of port automata. A network of processes is typically built to capture a specific robot sensorimotor skill or behavior: sensory processes linked to motor processes by data communication channels and sequenced using process composition operations. RS process composition operations are similar to those of the well-known CSP algebraic process model [74]. Unlike in CSP, however, the RS notation can be seen simply as a shortcut for specifying automata: a process is a port automaton, and a process composition operation is two automata connected in a specific way. Composition operations include sequential, conditional and disabling compositions [57,58].
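The definition above can be rendered compactly in code. This is our sketch with hypothetical names, not RS itself: the tuple components map directly onto fields, and a step function shows how an event on a port drives the transition relation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Set

State, Port, Event = str, str, str

@dataclass(frozen=True)
class PortAutomaton:
    states: FrozenSet[State]                           # Q
    ports: FrozenSet[Port]                             # L
    alphabet: Dict[Port, FrozenSet[Event]]             # X_i for each i in L
    delta: Callable[[State, Port, Event], Set[State]]  # Q x X_L -> 2^Q
    output: Dict[Port, Callable[[State], Event]]       # pi_i : Q -> X_i
    start: FrozenSet[State]                            # I, a set of states

    def step(self, current: Set[State], port: Port, event: Event) -> Set[State]:
        """All states reachable when `event` arrives on `port`."""
        assert event in self.alphabet[port]
        nxt: Set[State] = set()
        for q in current:
            nxt |= self.delta(q, port, event)
        return nxt
```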
To analyze a network of processes, it is necessary to calculate how the network changes as time progresses and processes terminate and/or are created. This is the process-level equivalent of the PA transition function, combined with the axioms that define port-to-port communication. This Process Transition function can be used to analyze the behavior of RS networks [59].

Soar is a cognitive architecture originally developed at CMU and undergoing continuing development at a number of locations, including the University of Michigan and the Information Sciences Institute. Knowledge in Soar is represented as operators, which are organized into problem spaces. Each problem space contains the operators relevant to some aspect of the system's environment. In our system, some problem spaces contain operators describing the actions of the robot for particular tasks and subtasks; different coordinate systems (representations) reside in different problem spaces. Other problem spaces contain operators governing the vision system, including operators for controlling the camera, operators for selecting software to process the visual data, and operators for creating and modifying local coordinate systems in the visual data. The reformulation method is itself represented as operators in a problem space. Newly created coordinate systems are placed in new problem spaces.

The basic problem-solving mechanism in Soar is universal subgoaling: every time there is a choice among two or more operators, Soar creates a subgoal of deciding which to select, and brings the entire knowledge of the system to bear on this subgoal by selecting a problem space and beginning to search. This search can itself encounter situations in which two or more operators can fire, which causes further subgoals to be created, and so on. When an operator is successfully chosen, the corresponding subgoal has been solved, and the entire solution process is summarized in a single rule, called a chunk, which contains the general conditions necessary for that operator to be chosen. This rule is added to the system's rule set, so that in similar future situations the search can be avoided. In this way, Soar learns. The Soar publications extensively document how this learning method speeds up the system's response time in a manner that accurately models the speedup of human subjects on the same tasks.

Universal subgoaling reflects the self-similarity of problem solving. This is the main reason for choosing Soar as the software for the deliberative component of the architecture: the self-similar structure of Soar's problem solving matches the self-similar structure of the environment. In addition, the Soar research community is implementing a wide range of aspects of cognition in Soar, including natural language [53,73], concept learning [62,82] and emotion [61].

The strengths of RS include its formal mechanism for combining sensing and motion, its ability to reason about time, its combination of deliberative planning and reactive behavior, and its real-time parallel scheduling and execution environment. Its weaknesses are the lack of a mechanism for autonomous formation of sensors or actuators, and the lack of a model for implementing cognitive abilities. Soar provides an integrated cognitive model with a full range of cognitive abilities, including perception, deliberative planning, reaction, natural language, learning and emotion. On the other hand, Soar lacks parallelism and a temporal mechanism, which severely hampers its usefulness in robotics.
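A toy sketch of universal subgoaling and chunking (ours, greatly simplified relative to Soar): when several operators match, an impasse creates a subgoal; the decision that resolves it is cached as a "chunk" so the same situation is later recognized rather than re-searched.

```python
chunks = {}                                 # learned: situation -> operator

def choose_operator(situation, candidates, evaluate):
    key = (situation, frozenset(candidates))
    if key in chunks:                       # recognition: a chunk fires
        return chunks[key]
    if len(candidates) == 1:
        return next(iter(candidates))
    # Impasse: two or more operators match. Create a subgoal and search;
    # here the "search" is a flat evaluation, where real Soar would select
    # a problem space and possibly subgoal recursively.
    best = max(candidates, key=lambda op: evaluate(situation, op))
    chunks[key] = best                      # summarize the search as a chunk
    return best
```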
By integrating RS and Soar, we are creating an architecture that will:

- process a wide range of modes of perceptual input,
- provide a complete range of cognitive abilities, including language and learning,
- reason about time and resources,
- implement parallel, reactive behaviors, and
- learn new high-level sensors and behaviors.

The reactive component is specified and implemented in RS, and provides an execution environment for automata networks. The basic sensory system is always on, and is hierarchical, shallow and bottom-up. The deliberative module can instantiate automata networks on demand, and can add networks to the sensory system that are task-specific, deep, and top-down. The deliberative component is implemented in Soar. It receives information sent by the executing reactive network, and can instantiate or stop networks. Soar reasons about the automata networks used to represent the environment and the task, and generates abstract plans that are refined into automata networks, creating new networks as necessary. Soar's learning mechanism can form new automata networks by chunking, as well as learn new plan components and new search control knowledge to improve planning and reaction time.

[Figure: the architecture. The deliberative system (Soar: symbolic operators searching state spaces) adds sensory and motor networks that are task-specific, deep, and top-down, and modifies and removes networks on demand; basic cognitive primitives connect it to the reactive system (RS: networks of port automata, parallel, with temporal properties). Sensory schemas send perceptual data to Soar; the sensory system is bottom-up, hierarchical, shallow, and always on. Motor schemas form the motor system, which schedules and executes port automata networks.]

Communication between humans and the robot will be provided by a natural language system implemented within the Soar cognitive modeling framework [43,44]. The system supports spoken human language input via an interface with Sphinx, a popular speech recognition engine [28,41]. Textual inputs representing the system's best-guess transcriptions will be pipelined as whole utterances into the main natural language component. The NL component processes each word individually and performs the following operations in order to understand the input text:

- lexical access, which retrieves morphological, syntactic, and semantic information for each word from the lexicon [50,52];
- syntactic model construction, linking together pieces of an X-bar parse tree [46];
- semantic model construction, fusing together pieces of a lexical-conceptual structure [51,73];
- discourse model construction, extracting global coherence from individual utterances [34,35].

Each of these functions is performed either deliberately (by subgoaling and via several types of lower-level operators) or by recognition (if chunks and pre-existing operators have already been acquired via the system's learning capability). Three types of structure resulting from this processing will be important for subsequent work: the X-bar model of syntax, the LCS model of semantics, and the discourse model. The depth and breadth of the interface between these structures and the robot's incoming percepts and outgoing actions will be explored during the project. The top-level language operators mentioned above can be sequenced in an agent's behavior with each other and with any non-language task operators, providing a highly reactive, interruptible, interleavable real-time language capability [64]. In agent/human dialogues these properties are crucial [1,24].
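The word-by-word character of this pipeline can be sketched as follows. This is our schematic illustration, not NL-Soar: the stage functions are stand-ins, and the lexicon entries are invented for the example.

```python
def lexical_access(word, lexicon):
    return lexicon.get(word, {"cat": "unknown"})

def extend_syntax(tree, entry):          # link a piece of an X-bar tree
    return tree + [entry["cat"]]

def extend_semantics(lcs, entry):        # fuse a piece of an LCS structure
    return lcs + [entry.get("sense", entry["cat"])]

def extend_discourse(model, lcs):        # update global coherence
    return {"utterance_so_far": lcs}

def comprehend(words, lexicon):
    tree, lcs, discourse = [], [], {}
    for word in words:                   # incremental, hence interruptible
        entry = lexical_access(word, lexicon)
        tree = extend_syntax(tree, entry)
        lcs = extend_semantics(lcs, entry)
        discourse = extend_discourse(discourse, lcs)
    return tree, lcs, discourse

lexicon = {"close": {"cat": "V", "sense": "CAUSE(GO(door, CLOSED))"},
           "the": {"cat": "Det"}, "door": {"cat": "N", "sense": "door"}}
print(comprehend(["close", "the", "door"], lexicon))
```

Because each word is fully processed before the next arrives, language operators can be interleaved with task operators between any two words, which is what makes the capability interruptible.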
As is typical for human/robot interaction, our system uses a dialogue-based discourse interface between the robot and the NL component. The system's discourse processing involves aspects of input text comprehension (including reference to the prior results of syntax and semantics where necessary) and generation (i.e., the production of linguistic utterances). Both applications of discourse processing will involve planning and plan recognition, linguistic principles, real-world knowledge, and interaction management.

The robotics domain will require a limited command vocabulary of some 1500 words initially, and utterances will be comparatively straightforward. This will also improve the recognition rate of the speech engine and support more diverse interaction environments. Possible communication modalities include [54]: (1) a command mode, or task specification, in which the robot is capable of understanding imperative sentences and acting upon them (e.g. "Turn ninety degrees to the left." "Close the door."). Other, more difficult types of utterances include (2) execution monitoring, where the robot takes instructions as part of a team; (3) explanation/error recovery, to help the robot adapt and cooperate with changing plans; and (4) updating the environment representation, to allow a user to help the robot maintain a world model beyond its own perceptions. To begin with, the robot will understand imperative utterances; other types of comprehension capabilities, as well as language generation, will be added incrementally.

NL-Soar implements a discourse recipe-based model (DRM) for dialogue comprehension and generation [33]. It learns discourse recipes, which are generalizations of an agent's discourse plans, as a side effect of dialogue planning. In this way, plans can be used for both comprehension and generation. If no recipe can be matched, the system resorts to dialogue plans. This allows both a top-down and a bottom-up approach to dialogue modeling. It also supports elements of BDI/DME functionality, such as maintaining common ground with information about shared background knowledge and a conversational record.

Problem Definition

We select an important and flexible class of mobile robot applications as our example domain: the "shepherding" class. In this application, one or more mobile robots have the objective of moving one or more objects so that they are grouped according to a given constraint. One example from this domain is for a single mobile platform to push an object from its start position, over intermediate terrain of undefined characteristics, into an enclosure. Another is to push a set of objects of various shapes and sizes into the enclosure. A more complex example is to group objects so that only objects of a particular color are in the enclosure.

This class of tasks is attractive for two reasons. The first is that it includes the full range of problems for the robot to solve, from abstract task planning to real-time scheduling of motions, including perception, navigation and grasping of objects. In addition, the robot must learn how to push one or more objects properly. This range of demands is ideal for our purposes, because it creates a situation in which complex hierarchies of features and constraints arise. The tasks can also be made dynamic by including objects that move by themselves, e.g. balls or other robots.

The second reason is that many other problems can be embedded in this domain. For example, we can create an isomorph of the Towers of Hanoi task by having three narrow enclosures and adding the constraint that no object can be in front of a shorter object (so that all objects are always visible). The three enclosures act as the three pegs, and the constraint is isomorphic to the constraint that no disk can be placed on a smaller disk. The robot can thus be presented with a Towers of Hanoi problem in a real setting with perceptual and navigational difficulties, rather than just as an abstract task. Similarly, bin-packing problems can be embedded as enclosure-packing problems, as can sorting and block-stacking problems. In this way, many existing puzzles and tasks can be embedded in a realistic setting and posed to the robot.
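The isomorphism can be checked mechanically. In the sketch below (our illustration, not the authors' code), the rule that no box may stand in front of a shorter box becomes: a box may move only if no smaller box shares its enclosure, and may not move to an enclosure already holding a smaller box, which is exactly the Hanoi rule.

```python
def movable(state, box):
    """state maps box size (1 = smallest) to its enclosure (1..3)."""
    return all(state[other] != state[box] for other in state if other < box)

def move(state, box, enclosure):
    """Move `box` to `enclosure` if neither end violates the constraint."""
    if not movable(state, box):
        return None                       # a smaller box stands in front of it
    if any(state[b] == enclosure for b in state if b < box):
        return None                       # a smaller box is already there
    new = dict(state)
    new[box] = enclosure
    return new

start = {1: 1, 2: 1, 3: 1}                # all boxes in enclosure 1
assert movable(start, 1) and not movable(start, 2)
assert move(start, 1, 2) == {1: 2, 2: 1, 3: 1}
```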
A Simple Example

Here is an example of how a change in representation can make a problem easier to solve.

[Figure: three columns, numbered 1 to 3, viewed from the front by an observer; the robot (shown in gray) and three boxes of different sizes occupy the leftmost column.]

The robot must move the three boxes from the leftmost column to the rightmost. It can push or lift one box at a time. A box can never be in front of a smaller box, as seen from the observer. The larger a box is, the taller it is. The front area must be kept clear so the robot can move; no boxes may be left in this area. This task is isomorphic to the three-disk Towers of Hanoi puzzle, and the enclosure can be made larger and more boxes added to create larger versions of the task.

There are many ways to represent the robot's possible actions in this task. Let us examine three formulations of the two-box version of the task and then examine how they scale up to larger versions.

Formulation F1: Let the nine states of the task be {(b, s) : 1 ≤ b, s ≤ 3}, where b and s are the numbers of the columns containing the big and small boxes, respectively. Let the two actions be denoted X and Y, where X moves the small box right one column (wrapping around from column 3 to column 1), and Y moves the large box one column to the left (wrapping around from column 1 to column 3). X is always executable, but Y can be executed in only three states.

Formulation F2: Let the states be the same as in F1, and let the six possible actions be:

X1 = move the front box from column 1 to column 2
Y1 = move the front box from column 1 to column 3
X2 = move the front box from column 2 to column 3
Y2 = move the front box from column 2 to column 1
X3 = move the front box from column 3 to column 1
Y3 = move the front box from column 3 to column 2

Each of these actions is executable in four states.

Formulation F3: Let the states be the same as in F1, and let the two possible actions be denoted X and Z. X is the same as in F1, and Z is a macro-action that moves both boxes one column to the left, wrapping around as before.

Each of these formulations can be implemented in terms of the others. For example, Z in F3 is implemented as a disjunction of sequences of actions from F1, Z = {XXY, XYX, YXX}, or from F2, Z = {X1Y1X2, X2Y2X3, X3Y3X1}. The three formulations differ in that the first indexes the moves by the box moved, whereas the second indexes the moves by the column moved from (or moved to, for the inverse moves). The third formulation eliminates subgoal interference, which is present in the first two formulations.
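F1 and the macro-action Z are small enough to verify exhaustively. The following code (ours) implements F1 with columns 1..3, where Y is legal only when the small box occupies neither the big box's column nor its destination, and confirms the disjunction Z = {XXY, XYX, YXX}.

```python
def right(c): return c % 3 + 1            # one column right, wrapping 3 -> 1
def left(c): return (c - 2) % 3 + 1       # one column left, wrapping 1 -> 3

def X(state):
    b, s = state
    return (b, right(s))                  # X is always executable

def Y(state):
    b, s = state
    return None if s in (b, left(b)) else (left(b), s)

def apply_seq(state, seq):
    for act in seq:
        if state is None:
            return None                   # sequence blocked earlier
        state = {'X': X, 'Y': Y}[act](state)
    return state

# Z moves both boxes one column left. For each of the nine states, exactly
# one of the three sequences is fully executable, and it implements Z.
for b in (1, 2, 3):
    for s in (1, 2, 3):
        goal = (left(b), left(s))
        results = [apply_seq((b, s), seq) for seq in ("XXY", "XYX", "YXX")]
        assert results.count(goal) == 1 and results.count(None) == 2
```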
One advantage of the first and third formulations is clear: they scale up to more boxes, because sequences of actions in F1 and F3 behave in the same way when new moves are added for the new boxes. The actions in the second formulation, however, must be redefined as more boxes are added, so that behaviors learned in this formulation cannot be reused; this formulation does not scale up. The result is that analysis and synthesis of the two-box problem performed using the first or third formulation can be reused when larger problems are attempted, but analysis and synthesis from the second formulation must be discarded. Furthermore, the properties of the first and third representations scale up to all of these problems; e.g., the family of representations based on F3 has no subgoal interference. The third representation is clearly the best for efficient planning in this domain.

The local symmetries of these formulations are formally examined in [11,14]; space considerations prevent reproducing the analysis here. Informally, the local symmetries of each formulation determine how it can be used to create larger versions of the task. Consider formulation F1. In this case, three distinct copies of the formulation are created and "glued" together by sharing their local symmetries. The intersection of these three two-box tasks yields a set of relations for the three-box task. For example, consider moving the middle box one column to the right while putting the smallest box back where it started. Viewing the middle box as the larger box in the task consisting of the two smaller boxes, this move is xyxxy (in a state where the boxes are in the same column), yxxyx (when the smallest box is to the right of the middle box), or xxyxxyxx. Viewing the same box as the smaller box in the task consisting of the two larger boxes, this move is simply x. This means that we add the relation xyxxy = yxxyx = xxyxxyxx to the formulation. Adding all such relations transforms F1 into F3, as shown in detail in [11]. This reformulation makes the task easier to solve by removing all subgoal interference, producing independent controls. This method of composing larger tasks from smaller ones guarantees that the decomposition generalizes to any such task with n boxes, so that by induction the properties of the two-box task determine the properties of all these tasks.
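Algebraically, the gluing can be viewed as imposing relations on the monoid generated by F1's actions; the following display is our notational sketch of that construction, not a reproduction of the formal development in [11].

```latex
% Our sketch: the glued relations turn the monoid generated by F1's actions
% into a quotient whose presentation corresponds to F3.
\[
M_{F_1} = \langle x, y \rangle
\quad\longrightarrow\quad
M_{F_3} \;\cong\; \langle x, y \mid xyxxy = yxxyx = xxyxxyxx,\ \ldots \rangle
\]
```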
Summary: Status and Goals

Soar has been connected to Saphira (the robot's C++-based control software), and successful subgoaling and autonomous learning of navigation have been demonstrated using the reformulation method of analyzing local symmetries and invariants. This work so far uses only the sonars, not vision. An initial integration of RS with Soar is partly completed. NL-Soar is a mature project dating back more than a decade; the current work involves integration with the robot's speech recognition and speech production software, and is mostly complete. A more flexible discourse model for the robot is also partly completed. One of the major goals of this project is to integrate visual understanding into the architecture; this portion of the project is just beginning. Currently, we are giving the architecture the ability to predict the consequences of its actions by simulating itself and its environment. We are connecting the NeoEngine gaming software platform to Soar so that the robot can create virtual copies of the real environment and simulate its actions.

Over the next year, our goals are to finish the integration of RS and Soar and test its effectiveness on simple shepherding tasks without vision or language, and to continue work on NL-Soar's discourse model and begin to integrate it into the architecture. The long-range goals of this project are to integrate the vision and language components and demonstrate the agent's use of cross-area pattern recognition.

References

[1] G. Aist, J. Dowding, B. A. Hockey, M. Rayner, J. Hieronymus, D. Bohus, B. Boven, N. Blaylock, E. Campana, S. Early, G. Gorrell, and S. Phan, "Talking through procedures: An intelligent Space Station procedure assistant," in Proceedings of the 10th Conference of the European Chapter of the Association for Computational Linguistics (EACL'03), Budapest, Hungary, 2003.
[2] Allen, J. F., "An interval-based representation of temporal knowledge," in Hayes, P. J., editor, IJCAI, pp. 221-226, Los Altos, CA, 1981.
[3] Amarel, Saul, "On Representations of Problems of Reasoning about Actions," in Michie (ed.), Machine Intelligence, chapter 10, pp. 131-171, Edinburgh University Press, 1968.
[4] Ambite, J. L., Knoblock, C. A., and Minton, S., "Learning Plan Rewriting Rules," paper presented at the Fifth International Conference on Artificial Intelligence Planning and Scheduling, Breckenridge, Colorado, April 14-17, 2000.
[5] Arkin, R., Behavior-Based Robotics, MIT Press, Cambridge, MA, 1998.
[6] Barnsley, Michael F., "Fractal Image Compression," Notices of the American Mathematical Society, Vol. 43, No. 6, pp. 657-662, June 1996.
[8] Benjamin, D. Paul, "On the Emergence of Intelligent Global Behaviors from Simple Local Actions," Journal of Systems Science, special issue: Emergent Properties of Complex Systems, Vol. 31, No. 7, pp. 861-872, 2000.
[10] Benjamin, D. Paul, "A Decomposition Approach to Solving Distributed Constraint Satisfaction Problems," Proceedings of the IEEE Seventh Annual Dual-Use Technologies & Applications Conference, IEEE Computer Society Press, 1997.
[11] Benjamin, D. Paul, "Transforming System Formulations in Robotics for Efficient Perception and Planning," Proceedings of the IEEE International Symposia on Intelligence and Systems, Washington, D.C., IEEE Computer Society Press, 1996.
[14] Benjamin, D. Paul, "Behavior-preserving Transformations of System Formulations," Proceedings of the AAAI Spring Symposium on Learning Dynamical Systems, Stanford University, March 1996.
[19] Benjamin, D. Paul, "Formulating Patterns in Problem Solving," Annals of Mathematics and AI, 10, pp. 1-23, 1994.
[20] Benjamin, D. Paul, "Reformulating Path Planning Problems by Task-preserving Abstraction," Journal of Robotics and Autonomous Systems, 9, pp. 1-9, 1992.
[23] A. Blake and A. Yuille, eds., Active Vision, MIT Press, Cambridge, MA, 1992.
[24] Nate Blaylock, James Allen, and George Ferguson, "Synchronization in an asynchronous agent-based architecture for dialogue systems," in Proceedings of the 3rd SIGdial Workshop on Discourse and Dialog, Philadelphia, 2002.
[25] Brady, M., J. Hollerbach, T. Johnson, T. Lozano-Perez, and M. T. Mason (eds.), Robot Motion: Planning and Control, MIT Press, Cambridge, MA, 1983.
[26] Brooks, R. A., "A Robust Layered Control System for a Mobile Robot," IEEE Journal of Robotics and Automation, Vol. 2, No. 1, pp. 14-23, March 1986.
[27] Charness, Neil, Eyal M. Reingold, Marc Pomplun, and Dave M. Stampe, "The Perceptual Aspect of Skilled Performance in Chess: Evidence from Eye Movements," Memory & Cognition, 29(8), pp. 1146-1152, 2001.
[28] L. Chase, R. Rosenfeld, A. Hauptmann, M. Ravishankar, E. Thayer, P. Placeway, R. Weide, and C. Lu, "Improvements in language, lexical, and phonetic modeling in Sphinx-II," in Proc. Spoken Language Systems Technology Workshop, Morgan Kaufmann Publishers, 1995.
[29] Eilenberg, Samuel, Automata, Languages, and Machines, Volumes A and B, Academic Press, 1976.
[30] Estlin, T. A., and Mooney, R. J., "Multi-Strategy Learning of Search Control for Partial-Order Planning," in Proceedings of the Thirteenth National Conference on Artificial Intelligence, pp. 843-848, Menlo Park, Calif.: American Association for Artificial Intelligence, 1996.
[31] Garcia-Martinez, R., and Borrajo, D., "An Integrated Approach of Learning, Planning, and Execution," Journal of Intelligent and Robotic Systems, 29(1), pp. 47-78, 2000.
[32] Jonathan Gratch and Randall W. Hill, Jr., "Continuous Planning and Collaboration for Command and Control in Joint Synthetic Battlespaces," in Proceedings of the Eighth Conference on Computer Generated Forces and Behavioral Representation, 1999.
[33] Green, N., and Lehman, J. F., "An Integrated Discourse Recipe-Based Model for Task-Oriented Dialogue," Discourse Processes, 33(2), 2002.
[34] Green, N., and Lehman, J. F., "Compiling knowledge for dialogue generation and interpretation," Tech. rep. CMU-CS-96-175, School of Computer Science, Carnegie Mellon University, 1996.
[35] T. Henderson and E. Shilcrat, "Logical sensor systems," Journal of Robotic Systems, 1(2), pp. 169-193, 1984.
[36] Huang, Y., Kautz, H., and Selman, B., "Learning Declarative Control Rules for Constraint-Based Planning," paper presented at the Seventeenth International Conference on Machine Learning, 29 June-2 July, Stanford, California, 2000.
[37] Laird, John E., Rosenbloom, Paul S., and Newell, Allen, "Towards Chunking as a General Learning Mechanism," AAAI-84, 1984.
[38] Laird, J. E., Newell, A., and Rosenbloom, P. S., "Soar: An Architecture for General Intelligence," Artificial Intelligence, 33, pp. 1-64, 1987.
[39] Lallement, Gerard, Semigroups and Combinatorial Applications, Wiley & Sons, 1979.
[40] Larsson, S., Ljunglöf, P., Cooper, R., Engdahl, E., and Ericsson, S., "GoDiS—an Accommodating Dialogue System," in Proceedings of the ANLP/NAACL-2000 Workshop on Conversational Systems, Seattle, 2000.
[41] K. F. Lee, Automatic Speech Recognition: The Development of the SPHINX System, Kluwer Academic Publishers, Boston, 1989.
[42] Lehman, J., VanDyke, J., and Rubinoff, R., "Natural language processing for IFORs: comprehension and generation in the air combat domain," in Proceedings of the Fifth Conference on Computer Generated Forces and Behavioral Representation, 1995.
[43] Lehman, J., VanDyke, J., Lonsdale, D., Green, N., and Smith, M., "A Building Block Approach to a Unified Language Capability," unpublished manuscript, 1997.
[44] Lehman, J. F., Laird, J. E., and Rosenbloom, P. S., "A Gentle Introduction to Soar, an Architecture for Human Cognition," in Invitation to Cognitive Science, Vol. 4, Sternberg & Scarborough (eds.), MIT Press, 1995.
[45] Lehman, J. F., Newell, A., Polk, T., and Lewis, R. L., "The role of language in cognition," in Conceptions of the Human Mind, Lawrence Erlbaum Associates, Inc., 1993.
[46] Lewis, Richard L., "An Architecturally-based Theory of Human Sentence Comprehension," Ph.D. thesis, Carnegie Mellon University, 1993.
[47] D. Long, M. Fox, and M. Hamdi, "Reformulation in Planning," SARA 2002 (Springer-Verlag LNCS series volume), 2002.
[48] Lonsdale, Deryle, "Modeling Cognition in SI: Methodological Issues," International Journal of Research and Practice in Interpreting, Vol. 2, No. 1/2, pp. 91-117, John Benjamins Publishing Company, Amsterdam, Netherlands, 1997.
[49] Lonsdale, Deryle, "Leveraging Analysis Operators in Incremental Generation," in Analysis for Generation: Proceedings of a Workshop at the First International Natural Language Generation Conference, pp. 9-13, Association for Computational Linguistics, New Brunswick, NJ, 2000.
[50] Lonsdale, Deryle, "An Operator-based Integration of Comprehension and Production," LACUS Forum XXVII, pp. 123-132, Linguistic Association of Canada and the United States, 2001.
[51] Lonsdale, Deryle, and C. Anton Rytting, "An Operator-based Account of Semantic Processing," in The Acquisition and Representation of Word Meaning, pp. 84-92, European Summer School for Logic, Language and Information, Helsinki, 2001.
[52] Lonsdale, Deryle, and C. Anton Rytting, "Integrating WordNet with NL-Soar," in WordNet and Other Lexical Resources: Applications, Extensions, and Customizations, Proceedings of NAACL-2001, Association for Computational Linguistics, 2001.
[53] Lonsdale, Deryle, "Modeling cognition in SI: Methodological issues," International Journal of Research and Practice in Interpreting, 2(1/2), pp. 91-117, 1997.
[54] Lueth, T. C., Laengle, T., Herzog, G., Stopp, E., and Rembold, U., "KANTRA: Human-Machine Interaction for Intelligent Robots Using Natural Language," in Proceedings of the 3rd International Workshop on Robot and Human Communication, Nagoya, Japan, 1994.
[55] Lyons, D. M., and Hendriks, A., "Planning as Incremental Adaptation of a Reactive System," Journal of Robotics & Autonomous Systems, 14, pp. 255-288, 1995.
[56] Lyons, D. M., and Hendriks, A., "Exploiting Patterns of Interaction to Select Reactions," Special Issue on Computational Theories of Interaction, Artificial Intelligence, 73, pp. 117-148, 1995.
[57] Lyons, D. M., "Representing and Analysing Action Plans as Networks of Concurrent Processes," IEEE Transactions on Robotics and Automation, June 1993.
[58] Lyons, D. M., and Arbib, M. A., "A Formal Model of Computation for Sensory-based Robotics," IEEE Transactions on Robotics and Automation, 5(3), June 1989.
[59] Lyons, D., and Arkin, R. C., "Towards Performance Guarantees for Emergent Behavior," (submitted) IEEE Int. Conf. on Robotics and Automation, New Orleans, LA, April 2004.
[60] D. M. Lyons and A. J. Hendriks, "Reactive Planning," Encyclopedia of Artificial Intelligence, 2nd Edition, Wiley & Sons, December 1991.
[61] Stacy Marsella, Jonathan Gratch, and Jeff Rickel, "Expressive Behaviors for Virtual Worlds," in Life-like Characters: Tools, Affective Functions and Applications, Helmut Prendinger and Mitsuru Ishizuka (eds.), Springer Cognitive Technologies Series, 2003.
[62] Miller, C. S., "Modeling Concept Acquisition in the Context of a Unified Theory of Cognition," EECS, University of Michigan, Ann Arbor, 1993.
[63] Murphy, T., and Lyons, D., "Combining Direct and Model-Based Perceptual Information Through Schema Theory," CIRA97, July 1997.
[64] Nelson, G., Lehman, J. F., and John, B. E., "Experiences in interruptible language processing," in Proceedings of the AAAI Spring Symposium on Active Natural Language Processing, Stanford, CA, 21-23 March 1994.
[65] Nelson, G., Lehman, J. F., and John, B. E., "Integrating cognitive capabilities in a real-time task," in Proceedings of the Sixteenth Annual Conference of the Cognitive Science Society, Atlanta, GA, August 1994.
[66] Newell, Allen, Unified Theories of Cognition, Harvard University Press, Cambridge, Massachusetts, 1990.
[67] M. Nicolescu and M. Mataric, "Extending Behavior-based System Capabilities Using an Abstract Behavior Representation," Working Notes of the AAAI Fall Symposium on Parallel Cognition, pp. 27-34, North Falmouth, MA, November 3-5, 2000.
[68] Petrich, Mario, Inverse Semigroups, John Wiley & Sons, Inc., New York, 1984.
[69] Rees, Rebecca, "Investigating Dialogue Managers: Building and Comparing FSA Models to BDI Architectures, and the Advantages to Modeling Human Cognition in Dialogue," Honors Thesis, BYU Physics Department, 2002.
[70] P. S. Rosenbloom, J. E. Laird, and A. Newell, The Soar Papers, MIT Press, 1993.
[71] Rosenbloom, P. S., Johnson, W. L., Jones, R. M., Koss, F., Laird, J. E., Lehman, J. F., Rubinoff, R., Schwamb, K. B., and Tambe, M., "Intelligent Automated Agents for Tactical Air Simulation: A Progress Report," Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation, pp. 69-78, 1994.
[72] Rubinoff, R., and Lehman, J. F., "Natural Language Processing in an IFOR Pilot," Proceedings of the Fourth Conference on Computer Generated Forces and Behavioral Representation, Orlando, FL, 1994.
[73] C. Anton Rytting and Deryle Lonsdale, "Integrating WordNet with NL-Soar," in WordNet and Other Lexical Resources: Applications, Extensions, and Customizations, pp. 162-164, North American Association for Computational Linguistics, 2001.
[74] S. Schneider, Concurrent and Real-time Systems: The CSP Approach, Wiley, 1999.
[75] M. Senechal, Quasicrystals and Geometry, Cambridge University Press, 1995.
[76] Spiliotopoulos, D., Androutsopoulos, I., and Spyropoulos, C. D., "Human-Robot Interaction Based on Spoken Natural Language Dialogue," in Proceedings of the European Workshop on Service and Humanoid Robots (ServiceRob 2001), Santorini, Greece, 25-27 June 2001.
[77] Martha Steenstrup, Michael A. Arbib, and Ernest G. Manes, "Port Automata and the Algebra of Concurrent Processes," JCSS, 27(1), pp. 29-50, 1983.
[78] Steinhardt, Paul J., "New Perspectives on Forbidden Symmetries, Quasicrystals, and Penrose Tilings," Proceedings of the National Academy of Sciences USA, Vol. 93, pp. 14267-14270, December 1996.
[79] Van Dyke, J., and Lehman, J. F., "An architectural account of errors in foreign language learning," in Proceedings of the Nineteenth Annual Conference of the Cognitive Science Society, 1997.
[80] Manuela M. Veloso, Jaime Carbonell, M. Alicia Perez, Daniel Borrajo, Eugene Fink, and Jim Blythe, "Integrating planning and learning: The Prodigy architecture," Journal of Experimental and Theoretical Artificial Intelligence, 7(1), pp. 81-120, 1995.
[81] D. Vrakas, G. Tsoumakas, N. Bassiliades, and I. Vlahavas, "Learning Rules for Adaptive Planning," in Proceedings of the 13th International Conference on Automated Planning and Scheduling (ICAPS 03), 2003.
[82] Wray, R. E., and Chong, R. S., "Explorations of quantitative category learning with Symbolic Concept Acquisition," 5th International Conference on Cognitive Modeling (ICCM), Bamberg, Germany, 2003.
[83] Y. Xu, P. Duygulu, E. Saber, M. Tekalp, and F. T. Yarman Vural, "Object Based Image Retrieval Based On Multi-Level Segmentation," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2000), June 2000.