Integrating Perception, Language and Problem Solving in a
Cognitive Agent for a Mobile Robot*
D. Paul Benjamin
Computer Science Department
Pace University
Deryle Lonsdale
Department of Linguistics
Brigham Young University
Damian Lyons
Computer Science Department
Fordham University
Abstract
We are implementing a unified cognitive
architecture for a mobile robot. Our goal is to endow a
robot agent with the full range of cognitive abilities,
including perception, use of natural language, learning
and the ability to solve complex problems. The
perspective of this work is that an architecture based on
a unified theory of robot cognition has the best chance of
attaining human-level performance.
This agent architecture is based on an
integration of three theories: a theory of cognition
embodied in the Soar system, the RS formal model of
sensorimotor activity and an algebraic theory of
decomposition and reformulation. These three theories
share a hierarchical structure that is the basis of their
integration.
The three component theories of this
architecture have been implemented and tested
separately and their integration is currently underway.
This paper describes these components and the overall
architecture.
Introduction
The current generation of behavior-based
robots is programmed directly for each task. The robot
agents are written in a way that uses as few built-in
cognitive assumptions as possible, and as much sensory
information as possible, in a manner similar to Brooks,
who proposed in the mid-1980s an approach that
emphasized fast, reactive actions and the absence of
explicit models. The lack of cognitive assumptions gives
them a certain robustness and generality in dealing with
unstructured environments. However, it is proving a
challenge to extend the competence of such systems
beyond navigation and some simple tasks [67]. Complex
tasks that involve reasoning about spatial and temporal
relationships require robots to possess more advanced
mechanisms for planning, reasoning, learning and
representation.

*This project is partially supported by a grant from the Department of
Energy.
Version of July 30, 2003
One approach to this issue is to equip a robot
agent with a behavior-based “reactive” module coupled
to a “deliberative” module that engages in cognitive
reasoning – these are called hybrid reactive/deliberative
systems [5]. Many such systems consist of a symbolic
planner, or planner and scheduler, as the deliberative
component. The planner formulates long-term and more
abstract tasks, while the behavior-based system carries
out the steps in the task in a robust fashion [5]. For
example, Lyons et al. [55,56] describe a hybrid system
based on the concept of iterative and forced relaxation of
assumptions about the environment. The deliberative
module generates and maintains a behavior-based
system that is as robust as possible, given the tradeoff
between the time available to the planner and any
previously sensed assumption failures. The reactive
system is implemented using the port-automata based
RS model [57] and ITL [2] is used for the planning
component. Action schemas in ITL are linked to prepackaged RS automata networks. The planner reasons
about instantiating and connecting networks to achieve
its goals. As a plan is created, the planner incrementally
transmits the instructions to the reactive component,
which instantiates automata networks and begins the
execution of the plan concurrently with the ongoing
elaboration of the plan. These networks include
monitoring automata whose job is to report back to the
planner when components of the plan finish execution,
or when sensed conditions change to such an extent that
the plan is no longer valid. The system could then
respond to the violation of assumptions about the
environment by replanning, again concurrent with the
ongoing automata execution, and upgrade components
of the plan in a safe fashion. This hybrid approach
endows a robot with a more sophisticated planning
ability than either deliberative planning or reactive
behavior provides alone; however, it does not address
either learning or representation issues.
Our objective is to take this approach an
important step further towards true cognitive behavior. A
planner can only consider plans based on the action and
sensing repertoire it possesses. In this sense, it is as
dependent for robust behavior on its designer as is a
behavior-based system. A second limitation is that
the planner is constrained by the formulation of the
environment built in by its designer (or that it has
learned). Each formulation makes certain results easy to
obtain and others difficult. As a result of these two
limitations, robots can usually do a few things very well,
and everything else poorly.
This limitation is typical of planning systems
and has led to research on systems that can
autonomously expand or adapt their repertoire of
planning components. Two such approaches are to add a
learning capability so that the system can learn new plan
components, or to add an ability to change formulations
so that the system can transform a task into a much more
easily solved form. Much research has added a learning
module or a change of representation module to an
existing planner in an attempt to construct a planner of
greater power and generality. Typical examples of this
line of research are [3,30,31,36,47,80]. Our approach is
to add these capabilities to a planner by integrating it
with a system with more general cognitive abilities.
We are engaged in the development of a
deliberative module that is capable of carrying out
experimental attempts to solve a task with the objective
of discovering the underlying structure of the task. This
structure consists of local symmetries and invariants that
are present in the percepts and motions of the robot as
well as in the environment. The agent uses this structure
both to create new high-level sensors [35] and to
reformulate knowledge so that the task becomes easier to
solve.
Our approach is to build a complete cognitive
robotic architecture by merging RS [57,58], which
provides a model for building and reasoning about
sensory-motor schemas, with Soar [38], a cognitive
architecture that is under development at a number of
universities. RS possesses a sophisticated formal
language for reasoning about networks of port automata
and has been successfully applied to robot planning [56].
Soar is a unified cognitive architecture [66] that has been
successfully applied to a wide range of tasks including
tactical air warfare [71].
This cognitive robotic agent is currently being
implemented and tested on Pioneer P2 robots
(http://robots.activmedia.com/) in the Pace University
Robotics Lab (http://csis.pace.edu/robotlab).
Operation
Our cognitive robot architecture operates in a
top-down, goal-directed fashion, both for motor planning
and sensory processing. A human gives a task to the
robot via a spoken natural language utterance which is
processed by the speech recognition system and
subsequent linguistic processing. If necessary, a
clarification dialogue unfolds between the robot and the
human. The deliberative planning component generates
an abstract plan, which it refines both before and during
execution. Plans are represented as a hierarchy of
schemas composed of sensory and motor components.
Sensory processing is also goal-directed: it operates
to detect the preconditions of plan components,
monitor their invariants, and check their
postconditions. The vision system is “active” in the sense of
[23].
To support this approach, the representation of
sensory data is also organized as a hierarchy in which
the levels are of different granularities, with the coarsest
granularity at the top of the hierarchy. For example, the
visual input is organized in this way with the top level
containing only the largest segments and lower levels
adding smaller segments, in a manner similar to [83].
The robot generates this hierarchy beginning with the
coarsest level and adds detail over time; this permits the
robot to make decisions and to communicate with
humans and other robots in an “anytime” fashion, using
whatever information has been generated instead of
waiting for the entire image to be completely processed.
Active perception guides this process by focusing
computation on certain portions of the image, to add
detail that is necessary to check sensory goals and
eliminate hypotheses.
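The coarse-to-fine, anytime organization described above can be pictured as an image pyramid built by repeated block averaging. The sketch below is only an illustration under our own assumptions (the actual system organizes image segments, not raw pixels, and all names here are invented):

```python
# Sketch: a coarse-to-fine "anytime" hierarchy built by 2x2 block averaging.
# Illustrative only; the system described in the text works on segments.

def downsample(grid):
    """Halve resolution by averaging each 2x2 block of a square grid."""
    n = len(grid)
    return [[(grid[2*r][2*c] + grid[2*r][2*c+1] +
              grid[2*r+1][2*c] + grid[2*r+1][2*c+1]) / 4.0
             for c in range(n // 2)]
            for r in range(n // 2)]

def build_pyramid(grid):
    """Return levels from finest to coarsest; the robot reads them in reverse,
    acting on whatever level is available instead of waiting for full detail."""
    levels = [grid]
    while len(levels[-1]) > 1:
        levels.append(downsample(levels[-1]))
    return levels

# A 4x4 "image": a bright region in the top-left quadrant.
image = [[9, 9, 0, 0],
         [9, 9, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]

pyramid = build_pyramid(image)
print(pyramid[-1])   # coarsest 1x1 summary, available first: [[2.25]]
print(pyramid[1])    # next level localizes the region: [[9.0, 0.0], [0.0, 0.0]]
```

The coarsest level already supports a decision ("something bright is present"); finer levels add detail only where active perception directs computation.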
Top-down processing is predictive in nature: it
generates hypotheses about the world and checks them.
The advantage of this approach is that it can be
extremely fast because only a few aspects of the world
need to be checked and much low-level data can be
ignored. To be effective, this approach requires that the
set of hypotheses be small, which further requires that
the environment be decomposable into small units. For
effective problem solving, these small units should be
functionally related, which permits the problem solver to
recognize the situation quickly in terms of what should
be done. It is known that humans can solve hard
problems by organizing their perceptions this way [27].
From this perspective, one of the main goals of
the problem solver is to represent the world in terms of
such small functional units. Our theory of decomposition
is well suited to this purpose.
Decomposition and Reformulation
The complexity of perception and action arises
in part from the need for a robot to perceive and act
simultaneously on several space and time scales, ranging
from small scales of millimeters and milliseconds to
large scales of kilometers and hours. Each of these scales
seems to possess a complex structure, and each can
potentially constrain any other. Yet a robot must
perceive and act in real time. This topic has long been a
central focus of work in artificial intelligence and
robotics. Often, research into the analysis and synthesis
of intelligent behavior has been hindered by the belief
that complex behavior requires complex mechanisms
that explicitly encode the behaviors. Recent work in
dynamical systems theory suggests otherwise.
The study of dynamical systems is a new field
of mathematics with a very wide applicability
throughout science. Very simply formulated dynamical
systems can exhibit a wide range of complex behaviors
that are not evident from their formulations. Even the
simple pendulum turns out to possess chaotic behavior!
This relationship between simple systems and complex
behaviors poses a difficult hurdle, as sophisticated
mathematics may be required to understand the
behaviors of even very simply formulated nonlinear
systems. However, artificial intelligence and robotics
focus on the inverse problem: how to formulate a system
that can express a given range of complex behaviors.
Thus, for AI and robotics this relationship between
simple systems and complex behaviors is attractive, as it
holds the potential to greatly reduce the complexity of
robotic systems by finding methods for building simply
formulated systems that generate complex, intelligent
behavior in a complex environment.
Our decomposition algorithm finds a task’s
hierarchy of local neighborhoods, and reformulates the
task in terms of local coordinate systems for these
neighborhoods. The relevant properties for system
formulation are those properties that hold locally
everywhere within the neighborhoods. Simple programs
can measure these local properties and maneuver within
local neighborhoods. Under certain conditions, the local
neighborhood structure becomes a substitution tiling
[75]. In these cases, self-similarity of the neighborhoods
permits the simple programs to be effective at several
scales, creating a hierarchical structure that is simple and
local at each scale, but capable of generating very
complex global behaviors. Fortunately, substitution
tilings appear to be ubiquitous in the physical world
[75].
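The self-similarity that substitution systems provide can be illustrated in one dimension by a substitution rule applied repeatedly: a simple local rewrite generates structure at every scale. This sketch (the Fibonacci substitution, our choice of example, not one of the tilings cited in the text) shows the idea:

```python
# Sketch: a one-dimensional substitution system (Fibonacci substitution).
# Illustration only; the tilings discussed in the text are two-dimensional.

RULES = {"a": "ab", "b": "a"}

def substitute(word):
    """Rewrite every symbol by its substitution rule."""
    return "".join(RULES[s] for s in word)

def iterate(word, n):
    for _ in range(n):
        word = substitute(word)
    return word

w = iterate("a", 5)
print(w)  # abaababaabaab
# Self-similarity: each level contains the previous level as a prefix,
# so a program that works on one scale works on all of them.
assert w.startswith(iterate("a", 4))
```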
Local coordinate systems define the perceptual
spaces for the robot. The axes in each coordinate system
correspond to a set of features for describing the state
space. Associated with the local symmetries of the local
coordinate system are invariant subspaces of states
(states on which particular combinations of features are
invariant.) Motion in the state space decomposes into
motion within invariant subspaces and motion normal to
those subspaces. This decomposition reduces the
complexity of motion planning, and also creates the goal
for the perceptual system: the robot must perceive the
visual symmetries and monitor the invariants of the
motions. The features determined in this way are the
semantic components that form the basis of
communication.
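Concretely, the decomposition of motion can be pictured as orthogonal projection: a displacement splits into a component lying within an invariant subspace and a component normal to it. A minimal sketch under our own simplifications (plane vectors, a one-dimensional subspace, invented names):

```python
# Sketch: decompose a motion vector into its component within a subspace
# (spanned here by a single unit axis) and the component normal to it.
# The 2-D setting and all names are illustrative only.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def scale(u, k):
    return [a * k for a in u]

def decompose(motion, axis):
    """Split `motion` into (within-subspace, normal) parts w.r.t. unit `axis`."""
    within = scale(axis, dot(motion, axis))
    normal = [m - w for m, w in zip(motion, within)]
    return within, normal

motion = [3.0, 4.0]
axis = [1.0, 0.0]                 # the invariant subspace: the x-axis
within, normal = decompose(motion, axis)
print(within, normal)             # [3.0, 0.0] [0.0, 4.0]
```

Planning the two components separately is what reduces the complexity of motion planning; monitoring that the `normal` component stays zero during a within-subspace motion is the corresponding perceptual goal.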
Thus, this approach integrates perception,
action and communication in a principled way by
connecting their local symmetries and invariants. Our
previous work into this structure has shown that it exists
in motion, perception, communication and planning. For
example, the motions of a robot on a small hexagonal
grid can be computed and analyzed, and then copies of
the small grid can be composed in such a way that the
properties scale up [19, 20]. The structure of local
neighborhoods replicates on larger scales. Features scale
up, with features of the basic neighborhood appearing in
larger neighborhoods at larger scales. By planning
locally within neighborhoods of different scales, the
robot can follow a very complex path without
performing any complex planning; the complexity of the
path reflects the complexity of the environment rather
than the complexity of the planner [8].
Perception exhibits this self-similar structure,
too. Barnsley [6] has shown that this self-similar
structure exists in images and has developed fractal
compression methods that are commercially viable
competitors in the areas of image processing and
communications. A number of research and commercial
efforts are underway in this area, including those at the
University of Waterloo, INRIA and Ecole Polytechnique
de l'Universite de Montreal.
Natural language is another example of the
interaction of top-down processing (overall discourse
meaning, major syntactic and semantic structures) and
bottom-up processing (word meaning, speech
recognition). The algebraic basis of natural language and
the use of local symmetries in its analysis are well
understood [29,39,68].
Constraint satisfaction problems can also
exhibit this self-similar structure [10]. One such problem
is the tiling of the plane using Penrose tiles. The
solutions of the Penrose tiling placement problem are
typically found by combinatorial search, but Steinhardt
[78] has shown that nine copies of the local pattern
shown at left in (a) below can be replicated and
overlapped to give the pattern in (c) at right. This larger
pattern is self-similar to the original pattern using the
substitution given in (b). Steinhardt shows that all
Penrose tilings arise in this way, and thus Penrose tilings
are locally everywhere isomorphic to the pattern in (a),
at many scales.
The implications of this structure for planning
and problem solving are clear: instead of using
combinatorial search in the space of possible solutions, a
robot can search for a self-similar local neighborhood
structure in the constraints of the task and the
environment. When such a structure exists, the robot can
exploit it to solve large, complex planning problems in
an efficient hierarchical manner.
The Cognitive Architecture
Our robot cognitive architecture exhibits a
hierarchy of sensory-motor modules in all facets of its
operation, from abstract deliberative planning to real-time scheduling and reaction. As described above, all
these modules are formulated in terms of symmetries
and invariants. The implementation of this model is
accomplished by integrating two existing systems: RS
and Soar.
RS is a formal model of robot computation that
is based on the semantics of networks of port automata.
A port automaton (PA) is a finite-state automaton
equipped with a set of synchronous communication ports
[58]. Formally we can write a port automaton P as:
P = ( Q, L, X , ,  ) where
Q
is the set of states
L
is the set of ports
X = ( Xi | i L ) is the event alphabet for each
port
Let XL = { (i, Xi) | i L } i.e., a disjoint union
of L and X
 : Q XL 2Q is the transition function
Version of July 30, 2003
4
 = (i | i L) i : Q  Xi is the output map for
port i
  2Q is the set of start states
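As a deliberately minimal sketch, the tuple above can be represented as a small data structure. The toggle example and every name below are ours, not RS schemas:

```python
# Sketch: a port automaton P = (Q, L, X, delta, beta, Q0) as plain data.
# The two-state toggle example is illustrative, not an RS schema.

class PortAutomaton:
    def __init__(self, states, ports, alphabets, delta, beta, starts):
        self.Q = states        # set of states
        self.L = ports         # set of ports
        self.X = alphabets     # port -> event alphabet for that port
        self.delta = delta     # (state, (port, event)) -> set of next states
        self.beta = beta       # port -> (state -> output event)
        self.Q0 = starts       # set of start states

    def step(self, state, port, event):
        """Nondeterministic transition: the set of possible next states."""
        return self.delta.get((state, (port, event)), set())

# A trivial toggle automaton with one port "p" and events {0, 1}.
pa = PortAutomaton(
    states={"off", "on"},
    ports={"p"},
    alphabets={"p": {0, 1}},
    delta={("off", ("p", 1)): {"on"}, ("on", ("p", 0)): {"off"}},
    beta={"p": lambda q: 1 if q == "on" else 0},
    starts={"off"},
)
print(pa.step("off", "p", 1))   # {'on'}
print(pa.beta["p"]("on"))       # 1
```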
RS introduces a vocabulary to specify networks
of port automata. A network of processes is typically
built to capture a specific robot sensorimotor skill or
behavior: sensory processes linked to motor processes
by data communication channels and sequenced using
process composition operations.
RS process composition operations are similar
to the well-known CSP algebraic process model of [74].
However, unlike CSP, in RS the notation can be seen as
simply a shortcut for specifying automata; a process is a
port automaton, and a process composition operation is
two automata connected in a specific way. Composition
operations include sequential, conditional and disabling
compositions [57,58]. To analyze a network of
processes, it is necessary to calculate how that network
changes as time progresses and processes terminate
and/or are created. This is the process-level equivalent of
the PA transition function, combined with the axioms
that define port-to-port communication. This Process
Transition function can be used to analyze the behavior
of RS networks [59].
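To give the flavor of composition (in the spirit of, but not the notation of, RS), sequential composition can be sketched as one automaton handing off to the next when it reaches a terminating state. Everything here is our drastic simplification of the actual composition operations:

```python
# Sketch: sequential composition A;B of two tiny automata -- run A to a
# final state, then feed the remaining events to B. Our simplification
# of RS composition, with invented automata.

def run(transitions, start, finals, events):
    """Drive a deterministic automaton; return (last state, unused events)."""
    state = start
    for i, ev in enumerate(events):
        if state in finals:
            return state, events[i:]
        state = transitions[(state, ev)]
    return state, []

def run_sequential(auto_a, auto_b, events):
    """Run A until it terminates, then run B on the leftover events."""
    state_a, rest = run(*auto_a, events)
    state_b, _ = run(*auto_b, rest)
    return state_a, state_b

# A consumes one "tick" and terminates; B toggles forever on "tick".
A = ({("a0", "tick"): "a1"}, "a0", {"a1"})
B = ({("b0", "tick"): "b1", ("b1", "tick"): "b0"}, "b0", set())
print(run_sequential(A, B, ["tick", "tick", "tick"]))  # ('a1', 'b0')
```

Conditional and disabling composition would differ only in when control passes between the two automata; in RS proper, the composite is itself a port automaton.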
Soar is a cognitive architecture originally
developed at CMU and undergoing continuing
development at a number of locations, including the
University of Michigan and the Information Sciences
Institute. Knowledge in Soar is represented as operators,
which are organized into problem spaces. Each problem
space contains the operators relevant to some aspect of
the system's environment. In our system, some problem
spaces contain operators describing the actions of the
robot for particular tasks and subtasks; different
coordinate systems (representations) are in different
problem spaces. Other problem spaces contain operators
governing the vision system, including operators about
how to control the camera, operators for selecting
software to process the visual data, and operators for the
creation and modification of local coordinate systems in
the visual data. The reformulation method is itself
represented as operators in a problem space. Newly
created coordinate systems are placed in new problem
spaces.
The basic problem-solving mechanism in Soar
is universal subgoaling: every time there is a choice of two
or more operators, Soar creates a subgoal of deciding
which to select, and brings the entire knowledge of the
system to bear on solving this subgoal by selecting a
problem space and beginning to search. This search can
encounter situations in which two or more operators can
fire, which in turn causes subgoals to be created, etc.
When an operator is successfully chosen, the
corresponding subgoal has been solved and the entire
solution process is summarized in a single rule, called a
chunk, which contains the general conditions necessary
for that operator to be chosen. This rule is added to the
system's rule set, so that in similar future situations the
search can be avoided. In this way, Soar learns.
The Soar publications extensively document
how this learning method speeds up the system's
response time in a manner that accurately models the
speedup of human subjects on the same tasks.
Universal subgoaling reflects the self-similarity
of problem solving. This is the main reason for the
choice of Soar as the software for the deliberative
component of the cognitive architecture: the self-similar
structure of Soar's problem solving matches the self-similar structure of the environment. In addition, the
Soar research community is implementing a wide range
of aspects of cognition in Soar, including natural
language [53,73], concept learning [62,82] and emotion
[61].
The strengths of RS include its formal
mechanism for combining sensing and motion, its ability
to reason about time, its combination of deliberative
planning and reactive behavior, and its real-time parallel
scheduling and execution environment.
Its weaknesses are the lack of a mechanism for
autonomous formation of sensors or actuators, and the
lack of a model for implementation of cognitive abilities.
Soar provides an integrated cognitive model
with a full range of cognitive abilities, including
perception, deliberative planning, reaction, natural
language, learning, and emotion. On the other hand, Soar
lacks parallelism and a temporal mechanism, which
severely hampers its usefulness in robotics.
By integrating RS and Soar, we are creating an
architecture that will:
- process a wide range of modes of perceptual input,
- provide a complete range of cognitive abilities, including language and learning,
- reason about time and resources,
- implement parallel, reactive behaviors, and
- learn new high-level sensors and behaviors.
The reactive component is specified and
implemented in RS, and provides an execution
environment for automata networks. The basic sensory
system is always on and is hierarchical, shallow and
bottom-up. The deliberative module can instantiate
automata networks on demand, and can add networks to
the sensory system that are task-specific, deep, and top-down.
The deliberative component is implemented in
Soar. It receives information sent by the executing
reactive network, and can instantiate or stop networks.
Soar reasons about the automata networks that are used
to represent the environment and the task, and generates
abstract plans that are refined into automata networks,
creating new networks as necessary. Soar’s learning
mechanism can form new automata networks by
chunking, as well as learn new plan components and
learn search control knowledge to improve planning and
reaction time.
[Figure: Architecture overview. The deliberative system (Soar) searches a state space with symbolic operators; it adds, modifies, and removes sensory and motor networks that are task-specific, deep, and top-down. Basic cognitive primitives connect it to the reactive system (RS), built from networks of port automata that are parallel and have temporal properties. Sensory schemas (bottom-up, hierarchical, shallow, always on) send perceptual data to Soar; motor schemas schedule and execute port automata networks.]
Communication between humans and the robot
will be assured by a natural language system
implemented within the Soar cognitive modeling
framework [43,44]. The system supports spoken human
language input via an interface with Sphinx, a popular
speech recognition engine [28,41]. Textual inputs
representing the engine's best-guess transcriptions
are pipelined as whole utterances into the main natural
language component.
The NL component processes each word
individually and performs the following operations in
order to understand the input text:
- lexical access (which retrieves morphological, syntactic, and semantic information for each word from its lexicon) [50,52]
- syntactic model construction (linking together pieces of an X-bar parse tree) [46]
- semantic model construction (fusing together pieces of a lexical-conceptual structure) [51,73]
- discourse model construction (extracting global coherence from individual utterances) [34,35]
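A toy rendition of the four stages might look like the pipeline below. The three-word lexicon and all structures are invented for illustration; none of this matches NL-Soar's internals, which process words incrementally through operators:

```python
# Sketch: the four NL stages as a toy pipeline over "close the door".
# Lexicon, structures, and names are invented for illustration.

LEXICON = {
    "close": {"cat": "V",   "sem": "CAUSE(GO(door, shut))"},
    "the":   {"cat": "Det", "sem": "DEF"},
    "door":  {"cat": "N",   "sem": "door"},
}

def lexical_access(word):
    """Retrieve the (toy) lexical entry for a word."""
    return dict(LEXICON[word], word=word)

def syntax(entries):
    """Crude stand-in for X-bar construction: group V + (Det N) into a VP."""
    cats = [e["cat"] for e in entries]
    if cats == ["V", "Det", "N"]:
        return ("VP", entries[0]["word"],
                ("NP", entries[1]["word"], entries[2]["word"]))
    return ("fragment", cats)

def semantics(tree):
    """Crude stand-in for lexical-conceptual structure building."""
    return {"action": tree[1], "object": tree[2][2]} if tree[0] == "VP" else None

def discourse(sem):
    """Crude stand-in for discourse modeling: classify the speech act."""
    return {"speech_act": "command", **sem}

entries = [lexical_access(w) for w in "close the door".split()]
print(discourse(semantics(syntax(entries))))
# {'speech_act': 'command', 'action': 'close', 'object': 'door'}
```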
Each of these functions is performed either
deliberately (by subgoaling and via the implementation
of several types of lower-level operators) or by
recognition (if chunks and pre-existing operators have
already been acquired via the system's learning
capability). Three types of structure resulting from this
processing will be important for subsequent stages:
the X-bar model of syntax, the LCS model of semantics,
and the discourse model. The depth and breadth of the
interface between these structures and the robot's
incoming percepts and outgoing actions will be explored
during the project.
The top-level language operators mentioned
above can be sequenced in an agent's behavior with each
other and with any other non-language task operators,
providing a highly reactive, interruptible, interleavable
real-time language capability [64]. In agent/human
dialogues these properties are crucial [1,24].
As is typical for human/robot
interaction, our system uses a dialogue-based discourse
interface between the human and the NL component. The
system's discourse processing involves aspects of input
text comprehension (including referring to the prior
results of syntax and semantics where necessary) and
generation (i.e. the production of linguistic utterances).
Both applications of discourse processing will involve
planning and plan recognition, linguistic principles, real-world knowledge, and interaction management. The
robotics domain will require a limited command
vocabulary size of some 1500 words initially, and
utterances will be comparatively straightforward. This
will also improve the recognition rate of the speech
engine and support more diverse interaction
environments.
Possible communication modalities include
[54]: (1) a command mode, or task specification, where
the robot is capable of understanding imperative
sentences and acting upon them (e.g. “Turn ninety degrees to the left.” “Close the door.”). Other, more
difficult types of utterances include (2) execution
monitoring, where the robot takes instructions as part of
a team, (3) explanation/error recovery to help the robot
adapt and cooperate with changing plans, and (4)
updating the environment representation to allow a user
to aid the robot in maintaining a world model beyond its
own perceptions. To begin with, the robot will
understand imperative utterances, but other types of
comprehension capabilities, as well as language
generation, will be incrementally added.
NL-Soar implements a discourse recipe-based
model (DRM) for dialogue comprehension and
generation [33]. It learns the discourse recipes, which are
generalizations of an agent’s discourse plans, as a side
effect of dialogue planning. This way, plans can be used
for comprehension and generation. If no recipe can be
matched, the system resorts to dialogue plans. This
allows both a top-down and bottom-up approach to
dialogue modeling. It also supports elements of
BDI/DME functionality such as maintaining a common
ground with information about shared background
knowledge and a conversational record.
Problem definition
We select an important and flexible class of
mobile robot applications as our example domain: the
“shepherding” class. In this application, one or more
mobile robots have the objective of moving one or more
objects so that they are grouped according to a given
constraint. An example from this domain is for a single
mobile platform to push an object from its start position
over intermediate terrain of undefined characteristics
into an enclosure. Another example is for it to push a set
of objects of various shapes and sizes all into the
enclosure. A more complex example is to group objects
so that only objects of a particular color are in the
enclosure.
This class of tasks is attractive for two reasons.
The first is that it includes the full range of problems for
the robot to solve, from abstract task planning to real-time scheduling of motions, and including perception,
navigation and grasping of objects. In addition, the robot
must learn how to push one or more objects properly.
This range of demands is ideal for our purposes, because
it creates a situation in which complex hierarchies of
features and constraints arise. Also, the tasks can be
made dynamic by including objects that can move by
themselves, e.g. balls or other robots.
The second reason is that many other problems can
be embedded in it. For example, we can create an
isomorph of the Towers of Hanoi task by having three
narrow enclosures and adding the constraint that no
object can be in front of a shorter object (so that all
objects are always visible). The three enclosures act as
the three pegs in the Towers of Hanoi, and the constraint
is isomorphic to the constraint that no disk can be on a
smaller disk in the Towers of Hanoi. This creates a
situation in which the robot can be presented with a
Towers of Hanoi problem in a real setting with
perceptual and navigational difficulties, rather than just
as an abstract task.
Similarly, we can embed bin-packing problems
here, by making them enclosure-packing problems.
Also, we can embed sorting and block-stacking problems. In this way, many existing puzzles
and tasks can be embedded in a realistic setting and
posed to the robot.
A Simple Example
This is an example of how a change in
representation can make a problem easier to solve.
[Figure: three columns, numbered 1 to 3, with an observer positioned in front of the enclosure.]
The robot (gray) must move the three boxes
from the leftmost column to the rightmost. It
can push or lift one box at a time. A box can
never be in front of a smaller box, as seen from
the observer. The larger a box is, the taller it is.
The front area must be kept clear so the robot
can move; no boxes may be left in this
area.
This task is isomorphic to the three-disk
Towers of Hanoi puzzle. The enclosure can be made
larger and more boxes added to create larger versions of this
task. There are many ways for the robot to represent the
possible actions of the robot in this task. Let us examine
three different formulations of the two-disk version of
the task and then examine how they scale up to larger
versions of the task.
Formulation F1: Let the nine states of the task
be {(b, s), 0 < b, s < 4}, where b and s are the numbers
of the columns the big and small boxes are in,
respectively. Let the two actions be denoted X and Y,
where X moves the small box right one column
(wrapping around from column 3 to column 1), and Y
moves the large box one column to the left (wrapping
around from column 1 to column 3). X is always
executable, but Y can be executed only in three states.
Formulation F2: Let the states be the same as
F1, and let the six possible actions be:
X1 = move the front box from 1 to 2
Y1 = move the front box from 1 to 3
X2 = move the front box from 2 to 3
Y2 = move the front box from 2 to 1
X3 = move the front box from 3 to 1
Y3 = move the front box from 3 to 2
Each of these actions is executable in four
states.
Formulation F3: Let the states be the same as
F1, and let the two possible actions be denoted by X and
Z. X is the same as in F1, and Z is a macro-action that
moves both boxes one column to the left, wrapping around
as before.
Each of these formulations can be implemented
in terms of every other. For example, Z in F3 is
implemented as a disjunction of sequences of actions
from F1: Z = {XXY, XYX, YXX} or from F2: Z =
{X1Y1X2, X2Y2X3, X3Y3X1}. The three formulations
given above differ in that the first indexes the moves
according to the box moved, whereas the second indexes
the moves by the column moved from (or to, for the
inverse moves.) The third formulation eliminates
subgoal interference, which is present in the first two
formulations. One advantage of the first and third
formulations is clear: they scale up to more boxes,
because sequences of actions in F1 and F3 behave in the
same way as new moves are added for the new boxes.
However, the actions in the second formulation must be
redefined as more boxes are added, so that learned
behaviors in this formulation cannot be reused; the
formulation does not scale up.
The result is that analysis and synthesis of the
two-box problem performed using the first or third
formulation can be reused when larger problems are
attempted, but analysis and synthesis from the second
formulation must be discarded when larger problems are
attempted. Furthermore, the properties of these two
representations scale up to all of these problems, e.g., the
family of representations based on F3 has no subgoal
interference. The third representation is clearly the best
for efficient planning in this domain.
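The disjunctive implementation of Z given above can be checked exhaustively with a short simulation. The excerpt does not restate which three states admit Y, so the sketch below assumes Y is legal exactly when the two boxes share a column; this assumption is our own, chosen because it reproduces the stated fact that Y is executable in only three of the nine states.

```python
# Minimal sketch of formulations F1 and F3, assuming:
#   - a state is (small_col, large_col) with columns numbered 1..3;
#   - Y is executable only when both boxes occupy the same column
#     (an assumption consistent with "Y can be executed only in
#      three states", not stated explicitly in this excerpt).

def right(c):
    """One column to the right, wrapping 3 -> 1."""
    return c % 3 + 1

def left(c):
    """One column to the left, wrapping 1 -> 3."""
    return 3 if c == 1 else c - 1

def X(state):
    """F1: move the small box one column right; always executable."""
    small, large = state
    return (right(small), large)

def Y(state):
    """F1: move the large box one column left; executable in 3 states."""
    small, large = state
    if small != large:
        return None                 # Y is not executable here
    return (small, left(large))

def run(state, seq):
    """Apply a sequence of actions; None if any step is inexecutable."""
    for act in seq:
        if state is None:
            return None
        state = act(state)
    return state

# F3's macro-action Z moves both boxes one column left.  In every one
# of the nine states, exactly one member of the disjunction
# {XXY, XYX, YXX} is executable, and it produces Z's effect.
for small in (1, 2, 3):
    for large in (1, 2, 3):
        outcomes = [run((small, large), seq)
                    for seq in ([X, X, Y], [X, Y, X], [Y, X, X])]
        executable = [o for o in outcomes if o is not None]
        assert executable == [(left(small), left(large))]
print("Z = {XXY, XYX, YXX} verified in all 9 states")
```

Note that in each state exactly one of the three sequences is executable, which is why Z must be implemented as a disjunction rather than a single fixed sequence.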
The local symmetries of these formulations are
formally examined in [11,14]. Space considerations
prevent reproducing this analysis here. Informally, the
local symmetries of each formulation determine how it
can be used to create larger versions of the task. For
example, let us consider formulation F1. In this case,
three distinct copies of this formulation are created and
“glued” together by sharing their local symmetries. The
intersection of these three two-box tasks yields a set of
relations for the three-box task. For example, consider
moving the middle box one column to the right while
putting the smaller box back where it started. When
considering the middle box as the larger box in the task
consisting of the smaller two boxes, this move is xyxxy
(in the state where all boxes are in the same column), yxxyx (when
the smallest box is to the right of the middle box), or
xxyxxyxx. On the other hand, when considering this box
as the smaller box in the task consisting of the larger two
boxes, this move is x. This means that we add the
relation xyxxy = yxxyx = xxyxxyxx to the formulation.
Adding all such relations transforms F1 into F3, as is
shown in detail in [11]. This reformulation makes the
task easier to solve by removing all subgoal interference
to produce independent controls.
This method of composing larger tasks from
smaller ones guarantees that this decomposition
generalizes to any such task with n boxes, so that by
induction the properties of the two-box task determine
the properties of all these tasks.
Summary: Status and Goals
Soar has been connected to Saphira (the robot’s
C++-based control software), and successful subgoaling
and autonomous learning of navigation have been
demonstrated using the reformulation method of
analyzing local symmetries and invariants. This work
uses only the robot’s sonars, not its vision system. An initial
integration of RS with Soar is partly completed.
NL-Soar is a mature project dating back more
than a decade. The current work involves integration
with the speech-recognition software and speech
production software of the robot, and is mostly
complete. In addition, a more flexible discourse model
for the robot is also partly completed.
One of the major goals of this project is to
integrate visual understanding into this architecture. This
portion of the project is just beginning.
Currently, we are giving the architecture the
ability to predict the consequences of its actions by
simulating itself and its environment. We are connecting
the NeoEngine gaming software platform to Soar so that
the robot can create virtual copies of the real
environment and simulate its actions.
Over the next year, our goals are to finish the
integration of RS and Soar and test its effectiveness on
simple shepherding tasks without vision or language,
and to continue work on NL-Soar’s discourse model and
begin to integrate it into the architecture. The long-range
goals of this project are to integrate the vision and
language components and demonstrate the use of cross-area pattern recognition by the agent.
References
[1] G. Aist, J. Dowding, B. A. Hockey, M. Rayner, J.
Hieronymus, D. Bohus, B. Boven, N. Blaylock, E.
Campana, S. Early, G. Gorrell, and S. Phan. “Talking
through procedures: An intelligent Space Station
procedure assistant,” In Proceedings of the 10th
Conference of the European Chapter of the Association
for Computational Linguistics (EACL'03), Budapest,
Hungary, 2003.
[2] Allen, J.F., “An interval-based representation of temporal
knowledge”, in Hayes, P. J., editor, IJCAI, pp. 221-226,
Los Altos, CA., 1981.
[3] Amarel, Saul, (1968). On Representations of Problems of
Reasoning about Actions, in Michie (ed.) Machine
Intelligence, chapter 10, pp. 131-171, Edinburgh
University Press.
[4] Ambite, J. L.; Knoblock, C. A.; and Minton, S., “Learning
Plan Rewriting Rules,” Paper presented at the Fifth
International Conference on Artificial Intelligence
Planning and Scheduling, Breckenridge, Colorado, April
14-17, 2000.
[5] Arkin, R., Behavior-Based Robotics, MIT Press,
Cambridge, MA, 1998.
[6] Michael F. Barnsley, “Fractal Image Compression”,
Notices of the American Mathematical Society, Volume
43, Number 6, pp. 657-662, June, 1996.
[8] Benjamin, D. Paul, “On the Emergence of Intelligent
Global Behaviors from Simple Local Actions”, Journal of
Systems Science, special issue: Emergent Properties of
Complex Systems, Vol. 31, No. 7, 2000, 861-872.
[10] Benjamin, D. Paul, “A Decomposition Approach to
Solving Distributed Constraint Satisfaction Problems”,
Proceedings of the IEEE Seventh Annual Dual-use
Technologies & Applications Conference, IEEE
Computer Society Press, 1997.
[11] Benjamin, D. Paul, “Transforming System Formulations
in Robotics for Efficient Perception and Planning”,
Proceedings of the IEEE International Symposia on
Intelligence and Systems, Washington, D.C., IEEE
Computer Society Press, 1996.
[14] Benjamin, D. Paul, “Behavior-preserving Transformations
of System Formulations”, Proceedings of the AAAI
Spring Symposium on Learning Dynamical Systems,
Stanford University, March, 1996.
[19] Benjamin, D. Paul, “Formulating Patterns in Problem
Solving”, Annals of Mathematics and AI, 10, pp.1-23,
1994.
[20] Benjamin, D. Paul, “Reformulating Path Planning
Problems by Task-preserving Abstraction”, Journal of
Robotics and Autonomous Systems, 9, pp.1-9, 1992.
[23] A. Blake and A. Yuille, eds, Active Vision, MIT Press,
Cambridge, MA, 1992.
[24] Nate Blaylock, James Allen, and George Ferguson,
“Synchronization in an asynchronous agent-based
architecture for dialogue systems,” In Proceedings of the
3rd SIGdial Workshop on Discourse and Dialog,
Philadelphia, 2002.
[25] Brady, M., J. Hollerbach, T. Johnson, T. Lozano-Perez,
and M.T. Mason (eds.), Robot Motion: Planning and
Control, M.I.T. Press, Cambridge, MA, 1983.
[26] Brooks, R. A. “A Robust Layered Control System for a
Mobile Robot”, IEEE Journal of Robotics and
Automation, Vol. 2, No. 1, pp. 14–23, March 1986.
[27] Charness, Neil, Eyal M. Reingold, Marc Pomplun, and
Dave M. Stampe, “The Perceptual Aspect of Skilled
Performance in Chess: Evidence from Eye Movements”,
Memory & Cognition, 29(8), pp.1146-1152, 2001.
[28] L. Chase, R. Rosenfeld, A. Hauptmann, M. Ravishankar,
E. Thayer, P. Placeway, R. Weide, and C. Lu.
“Improvements in language, lexical, and phonetic
modeling in Sphinx-II,” In Proc. Spoken Language
Systems Technology Workshop. Morgan Kaufmann
Publishers, 1995.
[29] Eilenberg, Samuel, Automata, Languages, and Machines,
Volumes A and B, Academic Press, 1976.
[30] Estlin, T. A., and Mooney, R.J., “Multi-Strategy Learning
of Search Control for Partial-Order Planning,” In
Proceedings of the Thirteenth National Conference on
Artificial Intelligence, 843-848. Menlo Park, Calif.:
American Association for Artificial Intelligence, 1996.
[31] Garcia-Martinez, R., and Borrajo, D., “An Integrated
Approach of Learning, Planning, and Execution,” Journal
of Intelligent and Robotic Systems 29(1): 47-78, 2000.
[32] Jonathan Gratch and Randall W. Hill, Jr., “Continuous
Planning and Collaboration for Command and Control in
Joint Synthetic Battlespaces”, in Proceedings of the
Eighth Conference on Computer Generated Forces and
Behavioral Representation, 1999.
[33] Green, N. and Lehman, J.F., “An Integrated Discourse
Recipe-Based Model for Task-Oriented Dialogue.”
Discourse Processes, 33(2), 2002.
[34] Green, N., and Lehman, J.F., “Compiling knowledge for
dialogue generation and interpretation.” Tech report
CMU-CS-96-175, School of Computer Science, Carnegie
Mellon University, 1996.
[35] T. Henderson, E. Shilcrat, Logical sensor systems, Journal
of Robotic Systems, 1(2), pp.169-193, 1984.
[36] Huang, Y.; Kautz, H.; and Selman, B., “Learning
Declarative Control Rules for Constraint-Based
Planning,” Paper presented at the Seventeenth
International Conference on Machine Learning, 29 June-2
July, Stanford, California, 2000.
[37] Laird, John E., Rosenbloom, Paul S., and Newell, Allen,
“Towards Chunking as a General Learning Mechanism,”
AAAI-84, 1984.
[38] Laird, J.E., Newell, A. and Rosenbloom, P.S., “Soar: An
Architecture for General Intelligence”, Artificial
Intelligence 33, pp.1-64, 1987.
[39] Lallement, Gerard, Semigroups and Combinatorial
Applications, Wiley & Sons, 1979.
[40] Larsson, S., Ljunglöf, P., Cooper, R., Engdahl, E., and
Ericsson, S., “GoDiS—an Accommodating Dialogue
System”, In Proceedings of ANLP/NAACL-2000
Workshop on Conversational Systems, Seattle, 2000.
[41] K. F. Lee, “Automatic Speech Recognition: The
Development of the SPHINX SYSTEM,” Kluwer
Academic Publishers, Boston, 1989.
[42] Lehman, J., VanDyke, J., and Rubinoff, R., “Natural
language processing for IFORs: comprehension and
generation in the air combat domain”, In Proceedings of
the Fifth Conference on Computer Generated Forces and
Behavioral Representation, 1995.
[43] Lehman, J., VanDyke, J. Lonsdale, D., Green, N., and
Smith, M., “A Building Block Approach to a Unified
Language Capability,” Unpublished manuscript, 1997.
[44] Lehman, J.F., Laird, J.E., and Rosenbloom, P.S., “A
Gentle Introduction to Soar, an Architecture for Human
Cognition,” Invitation to Cognitive Science, vol. 4,
Sternberg & Scarborough (eds.), MIT Press. 1995.
[45] Lehman, J.F., Newell, A.N., Polk, T., and Lewis, R.L.,
“The role of language in cognition,” In Conceptions of the
Human Mind, Lawrence Erlbaum Associates, Inc., 1993.
[46] Lewis, Richard L., “An Architecturally-based Theory of
Human Sentence Comprehension,” Ph.D. thesis, Carnegie
Mellon University, 1993.
[47] D. Long, M. Fox and M. Hamdi, “Reformulation in
Planning,” SARA 2002 (Springer Verlag LNCS series
volume), 2002.
[48] Lonsdale, Deryle, “Modeling Cognition in SI:
Methodological Issues,” International Journal of Research
and Practice in Interpreting, Vol. 2, No. 1/2, pages
91-117; John Benjamins Publishing Company, Amsterdam,
Netherlands, 1997.
[49] Lonsdale, Deryle, “Leveraging Analysis Operators in
Incremental Generation,” Analysis for Generation:
Proceedings of a Workshop at the First International
Natural Language Generation Conference, pp. 9-13;
Association for Computational Linguistics; New
Brunswick, NJ, 2000.
[50] Lonsdale, Deryle, “An Operator-based Integration of
Comprehension and Production,” LACUS Forum XXVII,
pages 123–132, Linguistic Association of Canada and the
United States, 2001.
[51] Lonsdale, Deryle and C. Anton Rytting, “An Operator-based Account of Semantic Processing,” The Acquisition
and Representation of Word Meaning; pp. 84-92;
European Summer School for Logic, Language and
Information, Helsinki, 2001.
[52] Lonsdale, Deryle and C. Anton Rytting, “Integrating WordNet
with NL-Soar,” WordNet and other lexical resources:
Applications, extensions, and customizations; Proceedings
of NAACL-2001; Association for Computational
Linguistics, 2001.
[53] Deryle Lonsdale, “Modeling cognition in SI:
Methodological issues. International Journal of Research
and Practice in Interpreting”, 2(1/2): 91–117, 1997.
[54] Lueth, T.C., Laengle, T., Herzog, G., Stopp, E., and
Rembold, U., “KANTRA: Human-Machine Interaction
for Intelligent Robots Using Natural Language,” In
Proceedings of the 3rd International Workshop on Robot
and Human Communication. Nagoya, Japan, 1994.
[55] Lyons, D.M. and Hendriks, A., “Planning as Incremental
Adaptation of a Reactive System”, Journal of Robotics &
Autonomous Systems 14, 1995, pp.255-288.
[56] Lyons, D.M. and Hendriks, A., “Exploiting Patterns of
Interaction to Select Reactions”, Special Issue on
Computational Theories of Interaction, Artificial
Intelligence 73, 1995, pp.117-148.
[57] Lyons, D.M., “Representing and Analysing Action Plans
as Networks of Concurrent Processes”, IEEE Transactions
on Robotics and Automation, June 1993.
[58] Lyons, D.M. and Arbib, M.A., “A Formal Model of
Computation for Sensory-based Robotics”, IEEE
Transactions on Robotics and Automation 5(3), Jun.
1989.
[59] Lyons, D., and Arkin, R.C., “Towards Performance
Guarantees for Emergent Behavior”, (Submitted) IEEE
Int. Conf. on Robotics and Automation, New Orleans LA,
April 2004.
[60] D.M. Lyons, and A.J. Hendriks, Reactive Planning.
Encyclopedia of Artificial Intelligence, 2nd Edition, Wiley
& Sons, December, 1991.
[61] Stacy Marsella, Jonathan Gratch and Jeff Rickel,
"Expressive Behaviors for Virtual Worlds," Life-like
Characters Tools, Affective Functions and Applications,
Helmut Prendinger and Mitsuru Ishizuka (Editors),
Springer Cognitive Technologies Series, 2003.
[62] Miller, C. S. (1993), “Modeling Concept Acquisition in
the Context of a Unified Theory of Cognition”, EECS,
Ann Arbor, University of Michigan.
[63] Murphy, T., and Lyons, D., “Combining Direct and
Model-Based Perceptual Information Through Schema
Theory”, CIRA97 , July 1997.
[64] Nelson, G., Lehman, J.F., and John, B.E., “Experiences in
interruptible language processing,” Proceedings of the
AAAI Spring Symposium on Active Natural Language
Processing, Stanford, CA, 21-23 March, 1994.
[65] Nelson, G., Lehman, J.F., and John, B.E., “Integrating
cognitive capabilities in a real-time task,” In Proceedings
of the Sixteenth Annual Conference of the Cognitive
Science Society. Atlanta, GA, August 1994.
[66] Newell, Allen, Unified Theories of Cognition, Harvard
University Press, Cambridge, Massachusetts, 1990.
[67] M. Nicolescu and M. Mataric, “Extending Behavior-based
System Capabilities Using an Abstract Behavior
Representation”, Working Notes of the AAAI Fall
Symposium on Parallel Cognition, pages 27-34, North
Falmouth, MA, November 3-5, 2000.
[68] Petrich, Mario, Inverse Semigroups, John Wiley & Sons,
Inc., New York, 1984.
[69] Rees, Rebecca, “Investigating Dialogue Managers:
Building and Comparing FSA Models to BDI
Architectures, and the Advantages to Modeling Human
Cognition in Dialogue,” Honors Thesis, BYU Physics
Department, 2002.
[70] P. S. Rosenbloom, J. E. Laird, and A. Newell, The SOAR
Papers. MIT Press, 1993.
[71] Rosenbloom, P.S., Johnson, W.L., Jones, R.M., Koss, F.,
Laird, J.E., Lehman, J.F., Rubinoff, R., Schwamb, K.B.,
and Tambe, M., “Intelligent Automated Agents for
Tactical Air Simulation: A Progress Report”, Proceedings
of the Fourth Conference on Computer Generated Forces
and Behavioral Representation, pp.69-78, 1994.
[72] Rubinoff, R. and Lehman, J. F., “Natural Language
Processing in an IFOR Pilot,” Proceedings of the Fourth
Conference on Computer Generated Forces and
Behavioral Representation, Orlando, FL., 1994.
[73] C. Anton Rytting and Deryle Lonsdale, “Integrating
WordNet with NL-Soar,” In WordNet and other
lexical resources: Applications, extensions, and
customizations, pages 162–164. North American
Association for Computational Linguistics, 2001.
[74] S. Schneider, Concurrent and Real-time Systems: The
CSP Approach. Wiley, 1999.
[75] M. Senechal, Quasicrystals and Geometry, Cambridge
University Press, 1995.
[76] Spiliotopoulos, D., Androutsopoulos, I., and Spyropoulos,
C. D., “Human-Robot Interaction Based on Spoken
Natural Language Dialogue,” In Proceedings of the
European Workshop on Service and Humanoid Robots
(ServiceRob ‘2001), Santorini, Greece, 25-27 June 2001.
[77] Martha Steenstrup, Michael A. Arbib, Ernest G. Manes,
Port Automata and the Algebra of Concurrent Processes.
JCSS 27(1): 29-50, 1983.
[78] Steinhardt, Paul J., New Perspectives on Forbidden
Symmetries, Quasicrystals, and Penrose Tilings,
Proceedings of the National Academy of Science USA,
Vol. 93, pp. 14267-14270, December 1996.
[79] Van Dyke, J. and Lehman, J.F., “An architectural account
of errors in foreign language learning,” In Proceedings of
the Nineteenth Annual Conference of the Cognitive
Science Society, 1997.
[80] Manuela M. Veloso, Jaime Carbonell, M. Alicia Perez,
Daniel Borrajo, Eugene Fink, and Jim Blythe,
“Integrating planning and learning: The Prodigy
architecture,” Journal of Experimental and Theoretical
Artificial Intelligence, 7(1):81--120, 1995.
[81] D. Vrakas, G. Tsoumakas, N. Bassiliades and I. Vlahavas,
"Learning Rules for Adaptive Planning ", In Proceedings
of the 13th International Conference on Automated
Planning and Scheduling, ICAPS 03, 2003.
[82] Wray, R. E., and Chong, R. S., Explorations of
quantitative category learning with Symbolic Concept
Acquisition. 5th International Conference on Cognitive
Modeling (ICCM), Bamberg, Germany, 2003.
[83] Y.Xu, P.Duygulu, E.Saber, M.Tekalp, F.T. Yarman Vural,
“Object Based Image Retrieval Based On Multi-Level
Segmentation”, IEEE International Conference on
Acoustics, Speech, and Signal Processing (ICASSP2000),
June 2000.