Machine Consciousness in CiceRobot, a Museum Guide Robot

Irene Macaluso and Antonio Chella
Dipartimento di Ingegneria Informatica, Università di Palermo
Viale delle Scienze, 90128, Palermo, Italy

Abstract

The paper discusses a model of robot perceptual awareness based on a comparison process between the effective and the expected robot input sensory data, the latter generated by a 3D robot/environment simulator. The paper contributes to the machine consciousness research field by testing the added value of robot perceptual awareness on an effective robot architecture implemented on an operating autonomous robot RWI B21 offering guided tours at the Archaeological Museum of Agrigento, Italy.

Introduction

An autonomous robot operating in real and unstructured environments interacts with a dynamic world populated with objects, people and, in general, other agents: people and agents may change their position and identity over time, while objects may be moved or dropped. In order to work properly, the robot should be able to pay attention to relevant entities in the environment, to choose its own goals and motivations, and to decide how to reach them. We claim that the robot, in order to properly move and act in complex and dynamic environments, should have some form of perceptual awareness of the surrounding environment. Taking into account several results from neuroscience, psychology and philosophy, summarized in the next section, we hypothesize that at the basis of robot perceptual awareness there is a continuous comparison process between the expectation of the perceived scene, obtained by a projection of the 3D reconstruction of the scene, and the effective scene coming from the sensory input.
The paper contributes to the machine consciousness research field (Chella & Manzotti 2007) by implementing and testing the proposed robot perceptual awareness model on an effective robot architecture implemented on an operating autonomous robot RWI B21 offering guided tours at the Archaeological Museum of Agrigento (Fig. 1).

Figure 1: CiceRobot at the Archaeological Museum of Agrigento.

Copyright 2007, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Theoretical remarks

Analyzing perceptual awareness from an evolutionary point of view, (Humphrey 1992) makes a distinction between sensations and perceptions. Sensations are active responses generated by the body in reaction to external stimuli. They refer to the subject: they are about what is happening to me. Perceptions are mental representations related to something outside the subject. They are about what is happening out there. Sensations and perceptions are two separate channels; a possible interaction between the two channels is that the perception channel may be recoded in terms of sensations and compared with the effective stimuli from the outside, in order to catch and avoid perceptual errors. This process is similar to the echoing back to source strategy for error detection and correction.

(Gärdenfors 2004b) discusses the role of simulators related to sensations and perceptions. He claims that sensations are immediate sensory impressions, while perceptions are built by simulators of the external world. A simulator receives as input the sensations coming from the external world; it fills the gaps and it may also add new information in order to generate perceptions. The perception of an object is therefore richer and more expressive than the corresponding sensation. In Gärdenfors' terms, perceptions are sensations that are reinforced with simulations.

The role of simulators in motor control has been extensively analyzed from the neuroscience point of view; see (Wolpert, Doya, & Kawato 2003) for a review. In this line, (Grush 2004) proposes several cognitive architectures based on simulators (emulators in Grush's terms). The basic architecture is made up of a feedback loop connecting the controller, the plant to be controlled and a simulator of the plant. The loop is pseudo-closed in the sense that the feedback signal is not directly generated by the plant, but by the simulator of the plant, which parallels the plant and receives as input the efferent copy of the control signal sent to the plant. A more advanced architecture proposed by Grush, inspired by the work of (Gerdes & Happee 1994), takes into account the basic schema of the Kalman filter (Haykin 2001). In this case, the residual correction generated by the comparison between the effective plant output and the simulator output is sent again to the simulator via the Kalman gain. In turn, the simulator sends its inner variables as feedback to the controller. According to (Gärdenfors 2004a), the simulator inner variables are more expressive than raw plant outputs and they may also contain information not directly perceived by the system, such as the forces occurring in the perceived scene, object-centred parameters, or the variables employed in causal reasoning. (Grush 1995) also discusses the adoption of neural networks to learn the operations of the simulators, while (Oztop, Wolpert, & Kawato 2005) propose more sophisticated learning techniques of simulators based on inferences of the theory of mind of others. The hypothesis that the content of perceptual awareness is the output of a comparator system is in line with the Behavioural Inhibition System (BIS) discussed by (Gray 1995) starting from neuropsychological analysis.
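The Kalman-filter-style emulator loop described by Grush can be illustrated with a minimal one-dimensional sketch. The scalar state, the noise variances q and r and the update rule below are illustrative assumptions, not details taken from the cited works:

```python
def emulator_step(x_est, p_est, u, z, q=0.04, r=0.25):
    """One cycle of a 1-D Kalman-style emulator loop.

    x_est, p_est: emulated state and its variance
    u: efferent copy of the motor command
    z: effective plant output (sensor reading)
    q, r: assumed process- and measurement-noise variances
    """
    # Predict: the emulator advances its state from the efferent copy of u.
    x_pred = x_est + u
    p_pred = p_est + q
    # Correct: the residual between the plant output and the emulator
    # prediction is fed back to the emulator through the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new
```

In Grush's pseudo-closed loop, it is the emulated state x_est, rather than the raw reading z, that the controller consumes as feedback.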
From a neurological point of view, (Llinas 2001) hypothesizes that the CNS is a reality-emulating system and that the role of sensory input is to characterize the parameters of the emulation. He also discusses (Llinas & Pare 1991) the role of this loop during dreaming activity.

An early implementation of a robot architecture based on simulators is due to (Mel 1986). He proposed a simulated robot moving in an environment populated with simple 3D objects. The robot is controlled by a neural network that learns the aspects of the objects and their relationships with the corresponding motor commands. It becomes able to simulate and to generate expectations about the expected object views according to the motor commands; i.e., the robot learns to generate expectations of the external environment. A successive system is MURPHY (Mel 1990), in which a neural network controls a robot arm. The system is able to perform off-line planning of the movements by means of a learned internal simulator of the environment. Other early implementations of robots operating with internal simulators of the external environment are MetaToto (Stein 1991) and the internalized plan architecture (Payton 1990). In both systems, a robot builds an inner model of the environment, reactively explored by simulated sensorimotor actions, in order to generate action plans. An effective robot able to build an internal model of the environment has been proposed by (Holland & Goodman 2003). The system is based on a neural network that controls a Khepera minirobot; it is able to build a model of the environment and to simulate perceptual activities in a simplified environment. Following the same principles, (Holland, Knight, & Newcombe 2007) describe the successive robot CRONOS, a complex anthropomimetic robot whose operations are controlled by SIMNOS, a 3D simulator of the robot itself and its environment.

Figure 2: The robot architecture.
Robot architecture

The robot architecture proposed in this paper is inspired by the work of Grush previously presented and is based on an internal 3D simulator of the robot and the environment (Fig. 2). The Robot block is the robot itself, equipped with motors and a video camera. It is modelled as a block that receives in input the motor commands M and sends in output the robot sensor data S, i.e., the scene acquired by the robot video camera. The Controller block controls the actuators of the robot and sends the motor commands M to the robot. The robot moves according to M and its output is the 2D pixel matrix S corresponding to the scene image acquired by the robot video camera. At the same time, an efferent copy of the motor commands is sent to the 3D Robot/Environment simulator. The simulator is a 3D reconstruction of the robot environment together with the robot itself. It is an object-centred representation of the world in the sense of (Marr 1982). The simulator receives in input the controller motor command M and simulates the corresponding motion of the robot in the 3D simulated environment by generating a cloud of possible robot positions {x_m} according to a suitable probability distribution that takes into account the motor command, the noise, the faults of the controllers, the slippage of the wheels, and so on. For each possible position x_m, a 2D image S_m is generated as a projection of the simulated scene acquired by the robot in the hypothesized position. Therefore, the output of the simulator is the set {S_m} of expected 2D images. Both S and S_m are viewer-centred representations in Marr's terms. The acquired and the expected image scenes are then compared by the comparator block c and the resulting error ε is sent back to the simulator to align the simulated robot with the real robot (see Sect. ).
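As a concrete illustration, the generation of the position cloud {x_m} and the comparator c can be sketched as follows. The one-dimensional "images", the Gaussian actuation-noise model and the absolute-difference comparison are stand-ins of our own for the real 3D renderer and 2D pixel matrices:

```python
import random

def simulate_positions(x, m, n=200, sigma=0.05):
    """Generate the cloud {x_m} of hypothesized robot positions after
    motor command m, under an assumed Gaussian actuation-noise model."""
    return [x + m + random.gauss(0.0, sigma) for _ in range(n)]

def render(x_m):
    """Stand-in for projecting the simulated 3D scene to a 2D image S_m:
    here the 'image' is simply the position itself."""
    return x_m

def comparator(s, s_m):
    """Comparator block c: error epsilon between the acquired and the
    expected 'images' (absolute difference in this toy setting)."""
    return abs(s - s_m)

def perception_errors(s, x, m):
    """One perception cycle: hypothesize {x_m}, render each expected
    image and score it against the acquired image S."""
    return [(x_m, comparator(s, render(x_m))) for x_m in simulate_positions(x, m)]
```

The hypothesis with the smallest error is the one used to re-align the simulated robot with the real one.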
At the same time, the simulator sends back all the relevant 3D information P about the robot position and its environment to the controller, in order to adjust the motor plans. We claim that the perceptual awareness of the robot is the process based on the acquisition of the sensory image S, the generation of the hypotheses {S_m} and their comparison via the comparator block, in order to build the vector P that contains all the relevant 3D information of what the robot perceives at a given instant. In this sense, P is the full interpretation of the robot sensory data by means of the 3D simulator.

Planning by expectations

The proposed framework for robot perceptual awareness may be employed to allow the robot to imagine its own sequences of actions (Gärdenfors 2007). In this perspective, planning may be performed by taking advantage of the representations in the 3D robot/environment simulator. Note that we are not claiming that all kinds of planning must be performed within a simulator, but the forms of planning that are more directly related to perceptual information can take great advantage from perception in the described framework. As previously stated, P is the perception of a situation of the world out there at time t. The simulator, by means of its simulation engine based on expectations (see below), is able to generate expectations of P at time t + 1, i.e., it is able to simulate the robot action related with the motor command M generated by the controller and the relationship of the action with the external world. It should be noticed that in the described framework the preconditions of an action can be simply verified by geometric inspections of P at time t, while in STRIPS-like planners (Fikes & Nilsson 1971) the preconditions are verified by means of logical inferences on symbolic assertions.
Also the effects of an action are not described by adding or deleting symbolic assertions, as in STRIPS, but they can be easily described by the situation resulting from the expectations of the execution of the action itself in the simulator, i.e., by considering the expected perception P at time t + 1.

We take into account two main sources of expectations. On the one side, expectations are generated on the basis of the structural information stored in a symbolic KB which is part of the simulator. We call such expectations linguistic. As soon as a situation is perceived which is the precondition of a certain action, the symbolic description elicits the expectation of the effect situation, i.e., it generates the expected perception P at time t + 1. On the other side, expectations could also be generated by a purely Hebbian association between situations. Suppose that the robot has learnt that when it sees somebody pointing to the right, it must turn in that direction. The system learns to associate these situations and to perform the related action. We call this kind of expectations associative.

In order to explain the planning by expectations mechanism, let us suppose that the robot has perceived the current situation P_0, e.g., it is in a certain position of a room. Let us suppose that the robot knows that its goal g is to be in a certain position of another room with a certain orientation. A set of expected perceptions P_1, P_2, ... of situations is generated by means of the interaction of both the linguistic and the associative modalities described above. Each P_i in this set can be recognized to be the effect of some action related with a motor command M_j in a set of possible motor commands M_1, M_2, ..., where each action (and the corresponding motor command) in the set is compatible with the perception P_0 of the current situation.

Figure 3: The map of the Sala Giove of the Museum.
The robot chooses a motor command M_j according to some criteria; e.g., it may be the action whose expected effect has the minimum Euclidean distance from the goal, or it may be the action that maximizes the utility value of the expected effect. Once the action to be performed has been chosen, the robot can imagine executing it by simulating its effects in the 3D simulator; it may then update the situation and restart the mechanism of generation of expectations until the plan is complete and ready to be executed. Linguistic expectations are the main source of deliberative robot plans: the imagination of the effect of an action is driven by the description of the action in the simulator KB. This mechanism is similar to the selection of actions in deliberative forward planners. Associative expectations are at the basis of a more reactive form of planning: in this latter case, perceived situations can immediately recall some expected effect of an action. Both modalities contribute to the full plan that is imagined by the robot when it simulates the plan by means of the simulator. When the robot terminates the generation of the plan and of its actions, it can generate judgments about its actions and, if necessary, imagine alternative possibilities.

Robot at work

The presented framework has been implemented in CiceRobot, an autonomous robot RWI B21 equipped with sonar, a laser rangefinder and a video camera mounted on a pan-tilt unit. The robot has been employed as a museum tour guide operating at the Archaeological Museum of Agrigento, Italy, offering guided tours in the Sala Giove of the museum (Fig. 3). A first session of experimentations, based on a previous version of the architecture, was carried out from January to June 2005 and the results are described in (Chella, Frixione, & Gaglio 2005) and in (Macaluso et al. 2005). The second session, based on the architecture described in this paper, started in March and ended in July 2006.
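In simplified form, the expectation-based planning loop of the previous section (generate expected perceptions, pick the action whose expected effect is nearest to the goal, imagine its execution, repeat) can be sketched as follows. The 2-D positions, the candidate moves and the `expect` function are illustrative assumptions standing in for the simulator's linguistic and associative expectation machinery:

```python
import math

def expect(p, m):
    """Expected perception at time t+1 after motor command m (stand-in
    for the 3D simulator's expectation generation)."""
    return (p[0] + m[0], p[1] + m[1])

def dist(a, b):
    """Euclidean distance between two expected perceptions."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

def plan_by_expectations(p0, goal, moves, eps=0.5, max_steps=100):
    """Greedily choose the motor command whose expected effect has the
    minimum Euclidean distance from the goal, imagine executing it, and
    repeat until the imagined perception is close enough to the goal."""
    steps, p = [], p0
    for _ in range(max_steps):
        if dist(p, goal) <= eps:
            break
        m = min(moves, key=lambda mv: dist(expect(p, mv), goal))
        steps.append(m)
        p = expect(p, m)   # imagined execution in the simulator
    return steps, p
```

With unit moves such as [(1, 0), (-1, 0), (0, 1), (0, -1)], the returned list of motor commands is the plan the robot has imagined before executing it.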
The task of museum guide is considered a significant case study (Burgard et al. 1999) because it concerns perception, self-perception, planning and human-robot interactions. The task is therefore relevant as a test bed for robot perceptual awareness. It can be divided into many subtasks operating in parallel, each carried out at the best of the robot's capabilities.

Fig. 4 shows the object-centred view from the 3D robot/environment simulator. As previously described, the task of the block is to generate the expectations of the interactions between the robot and the environment at the basis of robot perceptual awareness. It should be noted that the robot also simulates itself and its relationships with its environment. Fig. 5 shows a 2D image generated from the 3D simulator from the robot point of view.

Figure 4: The object-centred view from the 3D robot/environment simulator.

Figure 5: The viewer-centred image from the robot point of view.

In order to keep the simulator aligned with the external environment, the simulator engine is equipped with a particle filter algorithm (Thrun, Burgard, & Fox 2005). As discussed in the previous Sect., the simulator hypothesizes a cloud {x_m} of expected possible positions of the robot. For each expected position, the corresponding expected image scene S_m is generated, as in Fig. 6 (left). The comparator thus generates the error measure ε between each of the expected images and the effective image scene S, as in Fig. 6 (right). The error ε weights the expected position under consideration; in subsequent steps, only the winning expected positions that received the higher weights are kept, while the other ones are dropped. Fig. 7 (left) shows the initial distribution of expected robot positions and Fig. 7 (right) shows the small cluster of winning positions. Now the simulator receives a new motor command M related with the chosen action and, starting from the winning hypotheses, it generates a new set of hypothesized robot positions.

Figure 6: The 2D image output of the robot video camera (left) and the corresponding image generated by the simulator (right).

Figure 7: The operation of the particle filter: the initial distribution of expected robot positions (left) and the cluster of winning expected positions (right).

To compute the importance weight ε of each hypothesis, we developed a measurement model that incorporates the 3D representation provided by the simulator. The images S_m, generated by the simulator in correspondence to each hypothesized robot position, and the image S, acquired by the robot camera, are both processed to obtain the VE images, in which vertical edges are outlined. For each VE image a vector s is computed:

    s_i = Σ_j VE_ij    (1)

Each vector s can be considered as a set of samples drawn from an unknown distribution. To estimate such a probability density function, we adopted the Parzen-window formula:

    p_k(x) = (1/|s|) Σ_i (1/h) δ((x − s_i)/h)    (2)

where the window function δ is the normal distribution. Then we adopted the Kullback-Leibler distance:

    d(p, p_s) = − ∫ p(x) ln( p_s(x) / p(x) ) dx    (3)

as a measure of the dissimilarity between the distribution p corresponding to the image S acquired by the robot camera and each of the distributions p_s corresponding to the images S_m generated by the simulator.

Figure 8: (a) Image acquired by the robot camera. (b) Vertical Edge image. (c) Distribution of the random variable representing vertical edges.

Figure 9: (a) The simulated image corresponding to the hypothesis with the highest weight. (b) A simulated image corresponding to a hypothesis with a lesser weight. (c)-(d) Vertical Edge maps of the images shown in (a) and (b). (e)-(f) Distributions of the random variables representing vertical edges.

The importance weight ε of the hypothesized robot position x_m is the inverse of the Kullback-Leibler distance between the image S_m generated in simulation and the real image S acquired by the robot.
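Eqs. 1-3 can be sketched directly. The grid-based numerical integration, the bandwidth h and the toy edge vectors used below are our own assumptions; the paper does not give these details:

```python
import math

def edge_profile(ve):
    """Eq. 1: s_i = sum_j VE_ij, the column profile of a vertical-edge map
    (ve is a list of rows)."""
    return [sum(col) for col in zip(*ve)]

def parzen(samples, h=1.0):
    """Eq. 2: Parzen-window density estimate with a normal window function."""
    norm = 1.0 / (len(samples) * h * math.sqrt(2.0 * math.pi))
    def p(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)
    return p

def kl_distance(p, p_s, grid):
    """Eq. 3: d(p, p_s) = -integral of p(x) ln(p_s(x)/p(x)) dx,
    approximated by a Riemann sum over an evenly spaced grid."""
    dx = grid[1] - grid[0]
    return -sum(p(x) * math.log(p_s(x) / p(x)) * dx for x in grid if p(x) > 0.0)

def importance_weight(s_real, s_sim, grid, h=1.0):
    """Weight of a hypothesized position x_m: the inverse of the
    Kullback-Leibler distance between the two edge distributions."""
    return 1.0 / kl_distance(parzen(s_real, h), parzen(s_sim, h), grid)
```

A hypothesis whose simulated edge profile is close to the camera's receives a small Kullback-Leibler distance and hence a large weight.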
Fig. 8 (a) shows the image S obtained by the robot video camera, (b) the corresponding VE image, and (c) the distribution of the random variable representing the vertical edges computed according to Eq. 2. Fig. 9 shows two example images generated by the simulator. Fig. 9 (a), (c) and (e) correspond to the hypothesis with the highest weight, i.e., the least Kullback-Leibler distance with respect to the camera-generated distribution. Fig. 9 (b), (d) and (f) correspond to a hypothesis with a lesser weight.

Fig. 10 shows the operation of the robot during the guided tour. The figure shows the map built up by state-of-the-art laser-based robotic algorithms. By comparing this figure with the real map of the Sala Giove in Fig. 3, it should be noticed that the museum armchairs are not visible to the laser. However, thanks to the perceptual process, the robot is able to integrate laser data and video camera sensory data in a stable inner model of the environment (Franklin 2005) and to correctly move and act in the museum environment (Fig. 10).

Figure 10: The operation of the robot equipped with the architecture.

Discussions and conclusions

The described robot perceptual process is an active process, since it is based on a reconstruction of the inner percepts in egocentric coordinates, but it is also driven by the external flow of information. It is the place in which a global consistency is checked between the internal model and the visual data coming from the sensors. Any discrepancy asks for a readjustment of the internal model. The robot perceptual awareness is therefore based on a stage in which two flows of information, the internal and the external, compete for a consistent match. There is a strong analogy with the phenomenology of human perception: when one perceives the objects of a scene, he actually experiences only the surfaces that are in front of him, but at the same time he builds an interpretation of the objects in their whole shape. We maintain that the proposed system is a good starting point to investigate robot phenomenology. As described in the paper, it should be remarked that a robot equipped with perceptual awareness performs complex tasks such as museum tours because of its inner stable perception of itself and of its environment.

References

Burgard, W.; Cremers, A.; Fox, D.; Hähnel, D.; Lakemeyer, G.; Schulz, D.; Steiner, W.; and Thrun, S. 1999. Experiences with an interactive museum tour-guide robot. Artif. Intell. 114:3-55.
Chella, A., and Manzotti, R., eds. 2007. Artificial Consciousness. Exeter, UK: Imprint Academic.
Chella, A.; Frixione, M.; and Gaglio, S. 2005. Planning by imagination in CiceRobot, a robot for museum tours. In Proceedings of the AISB 2005 Symposium on Next Generation Approaches to Machine Consciousness: Imagination, Development, Intersubjectivity, and Embodiment, 40-49.
Fikes, R., and Nilsson, N. 1971. STRIPS: a new approach to the application of theorem proving to problem solving. Artif. Intell. 2(3-4):189-208.
Franklin, S. 2005. Evolutionary pressures and a stable world for animals and robots: A commentary on Merker. Consciousness and Cognition 14:115-118.
Gärdenfors, P. 2004a. Emulators as sources of hidden cognitive variables. Behavioral and Brain Sciences 27(3):403.
Gärdenfors, P. 2004b. How Homo Became Sapiens: On the Evolution of Thinking. Oxford, UK: Oxford University Press.
Gärdenfors, P. 2007. Mind-reading as control theory. European Review 15(2):223-240.
Gerdes, V., and Happee, R. 1994. The use of an internal representation in fast goal-directed movements: a modelling approach. Biological Cybernetics 70(6):513-524.
Gray, J. 1995. The contents of consciousness: A neuropsychological conjecture. Behavioral and Brain Sciences 18(4):659-722.
Grush, R. 1995. Emulation and Cognition. Ph.D. Dissertation, University of California.
Grush, R. 2004. The emulator theory of representation: motor control, imagery and perception. Behavioral and Brain Sciences 27:377-442.
Haykin, S., ed. 2001. Kalman Filtering and Neural Networks. New York: J. Wiley and Sons.
Holland, O., and Goodman, R. 2003. Robots with internal models - a route to machine consciousness? Journal of Consciousness Studies 10(4-5):77-109.
Holland, O.; Knight, R.; and Newcombe, R. 2007. A robot-based approach to machine consciousness. In Chella, A., and Manzotti, R., eds., Artificial Consciousness. Exeter, UK: Imprint Academic. 156-173.
Humphrey, N. 1992. A history of mind. New York, NY: Springer-Verlag.
Llinas, R., and Pare, D. 1991. Of dreaming and wakefulness. Neuroscience 44(3):521-535.
Llinas, R. 2001. I of the Vortex: From Neurons to Self. Cambridge, MA: MIT Press.
Macaluso, I.; Ardizzone, E.; Chella, A.; Cossentino, M.; Gentile, A.; Gradino, R.; Infantino, I.; Liotta, M.; Rizzo, R.; and Scardino, G. 2005. Experiences with CiceRobot, a museum guide cognitive robot. In Bandini, S., and Manzoni, S., eds., AI*IA 2005, volume 3673 of Lecture Notes in Artificial Intelligence, 474-482. Berlin Heidelberg: Springer-Verlag.
Marr, D. 1982. Vision. New York: W.H. Freeman and Co.
Mel, B. 1986. A connectionist learning model for 3-D mental rotation, zoom, and pan. In Proceedings of the Eighth Annual Conference of the Cognitive Science Society, 562-571.
Mel, B. 1990. Connectionist Robot Motion Planning: A Neurally-Inspired Approach to Visually-Guided Reaching. Cambridge, MA: Academic Press.
Oztop, E.; Wolpert, D.; and Kawato, M. 2005. Mental state inference using visual control parameters. Cognitive Brain Research 22:129-151.
Payton, D. 1990. Internalized plans: A representation for action resources. Robotics and Autonomous Systems 6(1):89-103.
Stein, L. 1991. Imagination and situated cognition. Technical Report AIM-1277, MIT AI Lab.
Thrun, S.; Burgard, W.; and Fox, D. 2005. Probabilistic Robotics. Cambridge, MA: MIT Press.
Wolpert, D.; Doya, K.; and Kawato, M. 2003. A unifying computational framework for motor control and social interaction. Phil. Trans. Roy. Soc. B 358:593-602.