Machine Consciousness in CiceRobot, a Museum Guide Robot

Irene Macaluso and Antonio Chella
Dipartimento di Ingegneria Informatica, Università di Palermo
Viale delle Scienze, 90128, Palermo, Italy
Abstract
The paper discusses a model of robot perceptual awareness based on a comparison process between the robot's actual sensory input and the expected sensory input generated by a 3D robot/environment simulator. The paper contributes to the machine consciousness research field by testing the added value of robot perceptual awareness on an effective robot architecture implemented on an autonomous robot RWI B21 offering guided tours at the Archaeological Museum of Agrigento, Italy.
Introduction
An autonomous robot operating in real, unstructured environments interacts with a dynamic world populated with objects, people and, in general, other agents: people and agents may change their position and identity over time, while objects may be moved or dropped. In order to work properly, the robot should be able to pay attention to relevant entities in the environment, to choose its own goals and motivations, and to decide how to reach them. We claim that the robot, in order to properly move and act in complex and dynamic environments, should have some form of perceptual awareness of the surrounding environment.
Taking into account several results from neuroscience, psychology and philosophy, summarized in the next section, we hypothesize that at the basis of robot perceptual awareness there is a continuous comparison process between the expectation of the perceived scene, obtained by a projection of the 3D reconstruction of the scene, and the effective scene coming from the sensory input. The paper contributes to the machine consciousness research field (Chella & Manzotti 2007) by implementing and testing the proposed robot perceptual awareness model on an effective robot architecture implemented on an autonomous robot RWI B21 offering guided tours at the Archaeological Museum of Agrigento (Fig. 1).
Figure 1: CiceRobot at the Archaeological Museum of Agrigento.
Copyright © 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Theoretical remarks

Analyzing perceptual awareness from an evolutionary point of view, (Humphrey 1992) makes a distinction between sensations and perceptions. Sensations are active responses generated by the body in reaction to external stimuli. They refer to the subject: they are about what is happening to me. Perceptions are mental representations related to something outside the subject: they are about what is happening out there. Sensations and perceptions are two separate channels; a possible interaction between the two channels is that the perception channel may be recoded in terms of sensations and compared with the effective stimuli from the outside, in order to catch and avoid perceptual errors. This process is similar to the echoing-back-to-source strategy for error detection and correction.

(Gärdenfors 2004b) discusses the role of simulators in relation to sensations and perceptions. He claims that sensations are immediate sensory impressions, while perceptions are built by simulators of the external world. A simulator receives as input the sensations coming from the external world; it fills the gaps and may also add new information in order to generate perceptions. The perception of an object is therefore richer and more expressive than the corresponding sensation. In Gärdenfors' terms, perceptions are sensations that are reinforced with simulations.

The role of simulators in motor control has been extensively analyzed from the neuroscience point of view; see (Wolpert, Doya, & Kawato 2003) for a review. In this line,
(Grush 2004) proposes several cognitive architectures based on simulators (emulators in Grush's terms). The basic architecture is made up of a feedback loop connecting the controller, the plant to be controlled, and a simulator of the plant. The loop is pseudo-closed in the sense that the feedback signal is not directly generated by the plant, but by the simulator of the plant, which parallels the plant and receives as input the efferent copy of the control signal sent to the plant.
A more advanced architecture proposed by Grush, inspired by the work of (Gerdes & Happee 1994), takes into account the basic schema of the Kalman filter (Haykin 2001). In this case, the residual correction generated by the comparison between the effective plant output and the simulator output is sent back to the simulator via the Kalman gain. In turn, the simulator sends its inner variables as feedback to the controller. According to (Gärdenfors 2004a), the simulator's inner variables are more expressive than raw plant outputs and may also contain information not directly perceived by the system, such as the forces occurring in the perceived scene, object-centred parameters, or the variables employed in causal reasoning.
(Grush 1995) also discusses the adoption of neural networks to learn the operations of the simulators, while (Oztop, Wolpert, & Kawato 2005) propose more sophisticated
learning techniques of simulators based on inferences of the
theory of mind of others.
The hypothesis that the content of perceptual awareness
is the output of a comparator system is in line with the Behavioural Inhibition System (BIS) discussed by (Gray 1995)
starting from neuropsychological analysis. From a neurological point of view, (Llinas 2001) hypothesizes that the
CNS is a reality-emulating system and the role of sensory
input is to characterize the parameters of the emulation. He
also discusses (Llinas & Pare 1991) the role of this loop during dreaming activity.
An early implementation of a robot architecture based on
simulators is due to (Mel 1986). He proposed a simulated
robot moving in an environment populated with simple 3D
objects. The robot is controlled by a neural network that learns the aspects of the objects and their relationships with the corresponding motor commands. It becomes able to simulate and generate expectations about object views according to the motor commands; i.e., the robot
learns to generate expectations of the external environment.
A successive system is MURPHY (Mel 1990) in which a
neural network controls a robot arm. The system is able to
perform off-line planning of the movements by means of a
learned internal simulator of the environment.
Other early implementations of robots operating with internal simulators of the external environment are MetaToto (Stein 1991) and the internalized plan architecture (Payton 1990). In both systems, a robot builds an inner model of the environment, reactively explored by simulated sensorimotor actions, in order to generate action plans.
An effective robot able to build an internal model of the environment has been proposed by (Holland & Goodman 2003). The system is based on a neural network that controls a Khepera minirobot; it is able to build a model of the environment and to simulate perceptual activities in a simplified environment.

Figure 2: The robot architecture.

Following the same principles, (Holland, Knight, & Newcombe 2007) describe the successive
robot CRONOS, a complex anthropomimetic robot whose
operations are controlled by SIMNOS, a 3D simulator of the
robot itself and its environment.
Robot architecture
The robot architecture proposed in this paper is inspired by
the work of Grush previously presented and it is based on
an internal 3D simulator of the robot and the environment
world (Fig. 2). The Robot block is the robot itself, equipped with motors and a video camera. It is modelled as a block that receives as input the motor commands M and outputs the robot sensor data S, i.e., the scene acquired by the robot video camera.
The Controller block controls the actuators of the robot and sends the motor commands M to the robot. The robot moves according to M, and its output is the 2D pixel matrix S corresponding to the scene image acquired by the robot video camera.
At the same time, an efferent copy of the motor commands is sent to the 3D Robot/Environment simulator. The simulator is a 3D reconstruction of the robot environment including the robot itself. It is an object-centred representation of the world in the sense of (Marr 1982).
The simulator receives as input the controller motor command M and simulates the corresponding motion of the robot in the 3D simulated environment by generating a cloud of possible robot positions {xm} according to a suitable probability distribution that takes into account the motor command, the noise, the faults of the controllers, the slippage of the wheels, and so on. For each possible position xm, a 2D image Sm is generated as a projection of the simulated scene acquired by the robot in the hypothesized position. Therefore, the output of the simulator is the set {Sm}
of expected 2D images. Both S and Sm are viewer-centred
representations in Marr’s terms.
The acquired and the expected image scenes are then compared by the comparator block c, and the resulting error ε is sent back to the simulator to align the simulated robot with the real robot (see below). At the same time, the simulator sends back all the relevant 3D information P about the robot position and its environment to the controller, in order to adjust the motor plans.
We claim that the perceptual awareness of the robot is the
process based on the acquisition of the sensory image S, the
generation of hypotheses {Sm } and their comparison via the
comparator block in order to build the vector P that contains
all the relevant 3D information of what the robot perceives
at a given instant. In this sense, P is the full interpretation
of robot sensory data by means of the 3D simulator.
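To make the loop concrete, the cycle just described can be sketched in Python as a one-dimensional toy, where the "image" of a pose is simply its coordinate. The robot_step and simulate interfaces are illustrative assumptions on our part, not the actual implementation:

```python
import numpy as np

def awareness_cycle(M, robot_step, simulate, n_hyp=20,
                    rng=np.random.default_rng(1)):
    """One cycle of the architecture in Fig. 2, as a 1-D toy.

    M          : motor command (here, a target coordinate)
    robot_step : robot_step(M) -> acquired sensory datum S (hypothetical)
    simulate   : simulate(x) -> expected S_m for a hypothesized position x
    """
    S = robot_step(M)                            # effective sensory input S
    xs = M + rng.normal(scale=0.1, size=n_hyp)   # cloud of positions {x_m}
    Sm = np.array([simulate(x) for x in xs])     # expected inputs {S_m}
    eps = np.abs(Sm - S)                         # comparator errors
    P = xs[np.argmin(eps)]                       # aligned interpretation P
    return P, eps.min()
```

For example, with a robot whose true pose is slightly offset from the commanded one, the hypothesis closest to the acquired datum wins and P recovers the true pose up to the cloud resolution.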
Planning by expectations
The proposed framework for robot perceptual awareness may be employed to allow the robot to imagine its own sequences of actions (Gärdenfors 2007). In this perspective, planning may be performed by taking advantage of the representations in the 3D robot/environment simulator. Note that we are not claiming that all kinds of planning must be performed within a simulator, but the forms of planning that are most directly related to perceptual information can take great advantage of perception in the described framework.
As previously stated, P is the perception of a situation of the world out there at time t. The simulator, by means of its simulation engine based on expectations (see below), is able to generate expectations of P at time t + 1, i.e., it is able to simulate the robot action related to the motor command M generated by the controller and the relationship of the action with the external world.
It should be noticed that in the described framework, the
preconditions of an action can be simply verified by geometric inspections of P at time t, while in the STRIPS-like
planners (Fikes & Nilsson 1971) the preconditions are verified by means of logical inferences on symbolic assertions.
Also the effects of an action are not described by adding or
deleting symbolic assertions, as in STRIPS, but they can be
easily described by the situation resulting from the expectations of the execution of the action itself in the simulator,
i.e., by considering the expected perception P at time t + 1.
We take into account two main sources of expectations. On the one side, expectations are generated on the basis of the structural information stored in a symbolic KB which is part of the simulator. We call such expectations linguistic. As soon as a situation is perceived which is the precondition of a certain action, the symbolic description elicits the expectation of the effect situation, i.e., it generates the expected perception P at time t + 1.
On the other side, expectations may also be generated by a purely Hebbian association between situations. Suppose that the robot has learnt that when it sees somebody pointing to the right, it must turn in that direction. The system learns to associate these situations and to perform the related action. We call this kind of expectation associative.
In order to explain the planning-by-expectations mechanism, let us suppose that the robot has perceived the current situation P0, e.g., it is in a certain position of a room. Let us suppose that the robot knows that its goal g is to be in a certain position of another room with a certain orientation. A set of expected perceptions P1, P2, . . . of situations is generated by means of the interaction of both the linguistic and the associative modalities described above. Each Pi in this set can be recognized to be the effect of some action related to a motor command Mj in a set of possible motor
commands M1, M2, . . ., where each action (and the corresponding motor command) in the set is compatible with the perception P0 of the current situation.

Figure 3: The map of the Sala Giove of the Museum.
The robot chooses a motor command Mj according to some criteria; e.g., it may choose the action whose expected effect has the minimum Euclidean distance from the goal, or the action that maximizes the utility value of the expected effect. Once the action to be performed has been chosen, the robot can imagine executing it by simulating its effects in the 3D simulator; then it may update the situation and restart the mechanism of generation of expectations until the plan is complete and ready to be executed.
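The greedy selection loop just described can be sketched as follows. The expect function stands in for the combined linguistic/associative expectation mechanism of the simulator (a hypothetical interface), and the minimum-Euclidean-distance criterion mentioned above is used to pick each action:

```python
import numpy as np

def plan_by_expectations(p0, goal, actions, expect, max_steps=50, tol=0.5):
    """Greedy planning by expectations.

    p0, goal : perceived and desired situations as coordinate vectors
    actions  : candidate motor commands M1, M2, ...
    expect   : expect(P, M) -> expected perception at time t+1
    Returns the list of chosen motor commands (the imagined plan).
    """
    plan, p = [], np.asarray(p0, dtype=float)
    goal = np.asarray(goal, dtype=float)
    for _ in range(max_steps):
        if np.linalg.norm(p - goal) <= tol:       # goal reached in imagination
            return plan
        # imagine the effect of every compatible action
        outcomes = [(np.linalg.norm(expect(p, m) - goal), m) for m in actions]
        dist, best = min(outcomes, key=lambda t: t[0])
        plan.append(best)
        p = expect(p, best)                       # update imagined situation
    return plan
```

With a toy expect where each action is a unit displacement on a grid, the robot imagines a five-step path from (0, 0) to (3, 2) before executing anything.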
Linguistic expectations are the main source of deliberative robot plans: the imagination of the effect of an action is
driven by the description of the action in the simulator KB.
This mechanism is similar to the selection of actions in deliberative forward planners. Associative expectations are at
the basis of a more reactive form of planning: in this latter case, perceived situations can immediately recall some
expected effect of an action.
Both modalities contribute to the full plan that is imagined by the robot when it simulates the plan by means of the
simulator. When the robot terminates the generation of the
plan and of its actions, it can generate judgments about its
actions and, if necessary, imagine alternative possibilities.
Robot at work
The presented framework has been implemented in CiceRobot, an autonomous robot RWI B21 equipped with sonar, a laser rangefinder and a video camera mounted on a pan-tilt unit. The robot has been employed as a museum tour guide at the Archaeological Museum of Agrigento, Italy, offering guided tours in the Sala Giove of the museum (Fig. 3). A first session of experimentation, based on a previous version of the architecture, was carried out from January to June 2005; the results are described in (Chella, Frixione, & Gaglio 2005) and in (Macaluso et al. 2005). The second session, based on the architecture described in this paper, started in March and ended in July 2006.
The task of museum guide is considered a significant case study (Burgard et al. 1999) because it concerns perception, self-perception, planning and human-robot interaction. The task is therefore relevant as a test bed for robot perceptual
Figure 4: The object-centred view from the 3D robot/environment simulator.

Figure 5: The viewer-centred image from the robot point of view.

Figure 6: The 2D image output of the robot video camera (left) and the corresponding image generated by the simulator (right).
Figure 7: The operation of the particle filter. The initial distribution of expected robot positions (left), and the cluster of winning expected positions (right).
awareness. The task can be divided into many subtasks operating in parallel, each performed at the best of the robot's capabilities.
Fig. 4 shows the object-centred view from the 3D robot/environment simulator. As previously described, the task of this block is to generate the expectations of the interactions between the robot and the environment that are at the basis of robot perceptual awareness. It should be noted that the robot also simulates itself and its relationships with its environment. Fig. 5 shows a 2D image generated by the 3D simulator from the robot point of view.
In order to keep the simulator aligned with the external environment, the simulator engine is equipped with a particle filter algorithm (Thrun, Burgard, & Fox 2005). As discussed in the previous section, the simulator hypothesizes a cloud {xm} of expected possible positions of the robot. For each expected position, the corresponding expected image scene Sm is generated, as in Fig. 6 (right). The comparator then generates the error measure ε between each of the expected images and the effective image scene S, as in Fig. 6 (left). The error ε weights the expected position under consideration; in subsequent steps, only the winning expected positions that received the highest weights are kept, while the other ones are dropped.
Fig. 7 (left) shows the initial distribution of expected robot positions and Fig. 7 (right) shows the small cluster of winning positions. The simulator then receives a new motor command M related to the chosen action and, starting from the winning hypotheses, it generates a new set of hypothesized robot positions.
To compute the importance weight ε of each hypothesis,
we developed a measurement model that incorporates the
3D representation provided by the simulator. The images
Sm , generated by the simulator in correspondence to each
hypothesized robot position, and the image S, acquired by
the robot camera, are both processed to obtain the VE images, in which vertical edges are outlined. For each VE
image, a vector s is computed:

s_i = \sum_j VE_{ij}    (1)
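With the VE image held as a 2D array, Eq. 1 is a one-line accumulation; treating i as the column index (one accumulated edge count per image column) is our assumption, since the paper does not fix the indexing convention:

```python
import numpy as np

def edge_signature(ve):
    """Collapse a vertical-edge image VE into the vector s of Eq. 1:
    s_i = sum_j VE_ij. Summing down the rows yields one accumulated
    edge count per column (the indexing convention is an assumption)."""
    return np.asarray(ve).sum(axis=0)
```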
Each vector s can be considered as a set of samples drawn
from an unknown distribution. To estimate such a probability density function, we adopted the Parzen-window formula:
p_k(x) = \frac{1}{|s|} \sum_i \frac{1}{h}\, \delta\!\left(\frac{x - s_i}{h}\right)    (2)
where the window function δ is the normal distribution.
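Eq. 2 with the normal distribution as window function can be sketched as follows; the bandwidth h is a free parameter whose value the paper does not report:

```python
import numpy as np

def parzen_density(x, samples, h=1.0):
    """Parzen-window estimate of Eq. 2 at point x:
    p(x) = (1/|s|) * sum_i (1/h) * delta((x - s_i) / h),
    with delta the standard normal pdf."""
    s = np.asarray(samples, dtype=float)
    u = (x - s) / h                                     # (x - s_i) / h
    window = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi)   # normal window delta
    return window.sum() / (len(s) * h)
```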
Then we adopted the Kullback-Leibler distance:
d(p, p_s) = -\int p(x) \ln \frac{p_s(x)}{p(x)}\, dx    (3)
as a measure of the dissimilarity between the distribution p
corresponding to the image S acquired by the robot camera
and each of the distributions ps corresponding to the images
Sm generated by the simulator.
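Evaluating Eq. 3 on densities sampled over a common grid gives a discrete approximation; the normalization and the small regularizer are our assumptions, and the inverse-distance weight matches the definition given below for the importance weight:

```python
import numpy as np

def kl_distance(p, ps, eps=1e-12):
    """Discrete Kullback-Leibler distance of Eq. 3:
    d(p, ps) = -sum p(x) ln(ps(x)/p(x)), for densities evaluated on a
    common grid and normalized to sum to one."""
    p = np.asarray(p, dtype=float)
    ps = np.asarray(ps, dtype=float)
    p, ps = p / p.sum(), ps / ps.sum()
    return float(-np.sum(p * np.log((ps + eps) / (p + eps))))

def importance_weight(p_cam, p_sim):
    """Weight of a hypothesized position: the inverse of the KL distance
    between the camera-image and simulated-image distributions."""
    return 1.0 / (kl_distance(p_cam, p_sim) + 1e-9)
```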
Figure 8: (a) Image acquired by the robot camera. (b) Vertical Edge image. (c) Distribution of the random variable representing vertical edges.

Figure 9: (a) The simulated image corresponding to the hypothesis with the highest weight. (b) A simulated image corresponding to a hypothesis with a lesser weight. (c)-(d) Vertical Edge maps of the images shown in (a) and (b). (e)-(f) Distributions of the random variables representing vertical edges.
The importance weight ε of the hypothesized robot position xm is the inverse of the Kullback-Leibler distance between the image Sm generated in simulation and the real image S acquired by the robot.
Fig. 8 (a) shows the image S obtained by the robot video camera, (b) the corresponding VE image, and (c) the distribution of the random variable representing the vertical edges, computed according to Eq. 2. Fig. 9 shows two example images generated by the simulator. Fig. 9 (a), (c) and (e) correspond to the hypothesis with the highest weight, i.e., the least Kullback-Leibler distance with respect to the camera-generated distribution. Fig. 9 (b), (d) and (f) correspond to a hypothesis with a lesser weight.
Fig. 10 shows the operation of the robot during the guided tour. The figure shows the map built by state-of-the-art laser-based robotic algorithms. By comparing this figure with the real map of the Sala Giove in Fig. 3, it should be noticed that the museum armchairs are not visible to the laser. However, thanks to the perceptual process, the robot is able to integrate laser data and video camera sensory data into a stable inner model of the environment (Franklin 2005) and to correctly move and act in the museum environment (Fig. 10).
Figure 10: The operation of the robot equipped with the architecture.
Discussions and conclusions

The described robot perceptual process is an active process, since it is based on a reconstruction of the inner percepts in egocentric coordinates, but it is also driven by the external flow of information. It is the place in which a global consistency is checked between the internal model and the visual data coming from the sensors. Any discrepancy calls for a readjustment of the internal model.

The robot perceptual awareness is therefore based on a stage in which two flows of information, the internal and the external, compete for a consistent match. There is a strong analogy with the phenomenology of human perception: when one perceives the objects of a scene, he actually experiences only the surfaces that are in front of him, but at the same time he builds an interpretation of the objects in their whole shape.

We maintain that the proposed system is a good starting point to investigate robot phenomenology. As described in the paper, it should be remarked that a robot equipped with perceptual awareness performs complex tasks, such as museum tours, because of its inner stable perception of itself and of its environment.

References

Burgard, W.; Cremers, A.; Fox, D.; Hähnel, D.; Lakemeyer, G.; Schulz, D.; Steiner, W.; and Thrun, S. 1999. Experiences with an interactive museum tour-guide robot. Artif. Intell. 114:3–55.
Chella, A., and Manzotti, R., eds. 2007. Artificial Consciousness. Exeter, UK: Imprint Academic.
Chella, A.; Frixione, M.; and Gaglio, S. 2005. Planning by imagination in CiceRobot, a robot for museum tours. In Proceedings of the AISB 2005 Symposium on Next Generation Approaches to Machine Consciousness: Imagination, Development, Intersubjectivity, and Embodiment, 40–49.
Fikes, R., and Nilsson, N. 1971. STRIPS: a new approach to the application of theorem proving to problem solving. Artif. Intell. 2(3-4):189–208.
Franklin, S. 2005. Evolutionary pressures and a stable world for animals and robots: A commentary on Merker. Consciousness and Cognition 14:115–118.
Gärdenfors, P. 2004a. Emulators as sources of hidden cognitive variables. Behavioral and Brain Sciences 27(3):403.
Gärdenfors, P. 2004b. How Homo Became Sapiens: On the Evolution of Thinking. Oxford, UK: Oxford University Press.
Gärdenfors, P. 2007. Mind-reading as control theory. European Review 15(2):223–240.
Gerdes, V., and Happee, R. 1994. The use of an internal representation in fast goal-directed movements: a modelling approach. Biological Cybernetics 70(6):513–524.
Gray, J. 1995. The contents of consciousness: A neuropsychological conjecture. Behavioral and Brain Sciences 18(4):659–722.
Grush, R. 1995. Emulation and Cognition. Ph.D. Dissertation, University of California.
Grush, R. 2004. The emulation theory of representation: motor control, imagery and perception. Behavioral and Brain Sciences 27:377–442.
Haykin, S., ed. 2001. Kalman Filtering and Neural Networks. New York: J. Wiley and Sons.
Holland, O., and Goodman, R. 2003. Robots with internal models – a route to machine consciousness? Journal of Consciousness Studies 10(4-5):77–109.
Holland, O.; Knight, R.; and Newcombe, R. 2007. A robot-based approach to machine consciousness. In Chella, A., and Manzotti, R., eds., Artificial Consciousness. Exeter, UK: Imprint Academic. 156–173.
Humphrey, N. 1992. A History of the Mind. New York, NY: Springer-Verlag.
Llinas, R., and Pare, D. 1991. Of dreaming and wakefulness. Neuroscience 44(3):521–535.
Llinas, R. 2001. I of the Vortex: From Neurons to Self. Cambridge, MA: MIT Press.
Macaluso, I.; Ardizzone, E.; Chella, A.; Cossentino, M.; Gentile, A.; Gradino, R.; Infantino, I.; Liotta, M.; Rizzo, R.; and Scardino, G. 2005. Experiences with CiceRobot, a museum guide cognitive robot. In Bandini, S., and Manzoni, S., eds., AI*IA 2005, volume 3673 of Lecture Notes in Artificial Intelligence, 474–482. Berlin Heidelberg: Springer-Verlag.
Marr, D. 1982. Vision. New York: W.H. Freeman and Co.
Mel, B. 1986. A connectionist learning model for 3-D mental rotation, zoom, and pan. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 562–571.
Mel, B. 1990. Connectionist Robot Motion Planning: A Neurally-Inspired Approach to Visually-Guided Reaching. Cambridge, MA: Academic Press.
Oztop, E.; Wolpert, D.; and Kawato, M. 2005. Mental state inference using visual control parameters. Cognitive Brain Research 22:129–151.
Payton, D. 1990. Internalized plans: A representation for action resources. Robotics and Autonomous Systems 6(1):89–103.
Stein, L. 1991. Imagination and situated cognition. Technical Report AIM-1277, MIT AI Lab.
Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic Robotics. Cambridge, MA: MIT Press.
Wolpert, D.; Doya, K.; and Kawato, M. 2003. A unifying computational framework for motor control and social interaction. Phil. Trans. Roy. Soc. B 358:593–602.