Proceedings of the Twenty-First International FLAIRS Conference (2008)

Humanoid Agents as Hosts, Advisors, Companions and Jesters

Candace L. Sidner
Advanced Information Technology, BAE Systems
6 New England Executive Drive, Burlington, MA 01803
candy.sidner@alum.mit.edu

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Humanoid robots are valuable to study not only for how they can be used, but also as a research platform for understanding a variety of issues in human interaction: we draw inspiration from human-to-human interaction and study human-robot interaction for its comparisons and contrasts with it. Humanoid agents can serve in a variety of activities, ranging from hosts, to advisors and coaches, to teachers and to humorous companions. This paper explores recent research in humanoid robots and embodied conversational agents to understand how both of these types of humanoid agents might play a role in our lives.

Introduction

In the 1970s, AI leaders believed we would soon see all manner of intelligent agents: agents that would easily beat grand masters at chess, speak natural language fluently and continually learn new things. These problems turned out to be much harder than they thought. Subfields formed within AI to pursue specialized research in robotics, machine vision, natural language and so on. Speech recognition was joined with natural language to work towards spoken language systems. Happily, in the past 25 years, these subfields have made significant progress, to the point that there are now commercially available robots, speech recognition engines, and vision systems that can actually find and sometimes correctly identify faces, gaze, arms, and simple objects.

There are also embodied agents that live in the screen world. So-called embodied conversational agents (ECAs) have animated faces and bodies, and can be combined with speech and vision systems to serve as a screen-based animated intelligent partner.

What are the capabilities of these various agents for interaction? In this paper I explore the new types of humanoid robots being investigated in research groups around the world. While only a sample of the work being done, this survey will indicate the breadth of possibilities in this work. I then turn to ECAs and survey them briefly, as well as applications of their use and how these might be applied to robots.

Humanoid Robots: Capabilities and Limitations

Humanoid robots run the gamut from robots built using robot bases with laptops displaying animated faces, as in the Naval Research Laboratory robot (Kennedy et al. 2007) in Figure 1, to robotic faces and robotic arms, as in the JAST robot (Rickert et al. 2007) in Figure 2, to specially made animal-like robots such as Leonardo (Thomaz & Breazeal 2007) in Figure 3, and special-purpose efforts such as Kidd's robot head for coaching weight loss (Kidd & Breazeal 2007) in Figure 4. While these robots are very diverse in form, they bear many similarities in their ability to produce communication styles and uses of limbs that mimic those of human communicators.

Figure 1: George, a Naval Research Labs robot

One-of-a-kind humanoids now share the research world with commercially produced humanoids, such as Melvin, shown in Figure 5, a humanoid robot developed by Robomotio (www.robomotio.com) and Mitsubishi Electric Research Labs with 15 degrees of freedom, speech input and output, mounted on a mobile base.

Figure 5: Melvin, developed by Robomotio and MERL

Beyond humanoids, it is important to mention work on "android" robots, which attempt to imitate humans very realistically. David Hanson (Hanson Robotics) and Hiroshi Ishiguro (Nishio et al. 2007) are exploring either face-only androids or full-body humanoids (see Figure 6 for Ishiguro's "geminoid" android).
Full-body androids cannot walk in any fashion, because robotics has not progressed to walking in the manner of humans. However, when seated, Ishiguro's geminoid can participate in a meeting as a surrogate for its creator while being controlled by him.

Figure 2: The EU JAST project robot

Figure 3: Leonardo from the MIT Media Lab

Figure 6: Ishiguro's geminoid robot with Ishiguro

Robots that display human or human-like physical characteristics offer the possibility of being partners with human interaction styles. However, unless they are capable of speaking and understanding, and of using some sensory means of being aware of their partner, humanoid robots remain of limited use for interactions with people. All the robots shown here make use of various sensory inputs. Speech recognition tools make it possible to utter sentences and phrases to them. However, unrestricted natural conversation pushes most speech recognition engines beyond their limits, because conversational speech is highly unpredictable, varied and disfluent. Good models of dialogue, of which there are too many to review here, have been shown to improve speech recognition engines (Lemon et al. 2002). Yet care must still be taken to limit the human partner's expectations for what can be said, in order to reduce error rates to a useful level.

Figure 4: Kidd's weight loss coach robot

Vision research has progressed significantly in recent years, making possible the use of vision algorithms for face tracking (Viola and Jones 2001), face recognition, gaze recognition (Morency et al. 2006), tracking of limbs (e.g. Demirdjian 2004) and, in a limited way, recognition of objects (e.g. Torralba et al. 2004; Leibe et al. 2007 for pointers into a vast literature). However, all of this technology is still in relative infancy. Furthermore, a moving robot will have a great deal more trouble tracking and recognizing a face than one that is stationary at the time it wishes to use its vision technology.
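As a concrete illustration of why the Viola-Jones style of face detection is fast enough for interactive robots: it arranges many cheap tests into an attentional cascade, so that the vast majority of non-face image windows are rejected after only a stage or two of work. The sketch below is a toy of that control flow only; the stage functions and thresholds are invented for the example, not trained detector values.

```python
# Toy illustration of the attentional-cascade control flow behind
# Viola-Jones-style detectors: each stage is a cheap score-and-threshold
# test, and a candidate window is rejected as soon as any stage fails.
# Stage functions and thresholds here are invented for the example.

def cascade_classify(stages, window):
    """Run a candidate window through cascade stages.

    stages: list of (score_fn, threshold) pairs, cheapest first.
    Returns True only if the window passes every stage.
    """
    for score_fn, threshold in stages:
        if score_fn(window) < threshold:
            return False      # early rejection: most windows exit here
    return True               # survived all stages: candidate detection

# Hypothetical stages over a "window" represented as a list of
# pixel-like values (real stages use Haar-like rectangle features).
stages = [
    (lambda w: sum(w), 3),    # cheap first test
    (lambda w: max(w), 2),    # slightly stricter second test
]
```

A real detector slides such a cascade over image windows at many positions and scales; trained implementations of this detector are available in open-source vision libraries such as OpenCV.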
Integration of vision technology, along with mobile movement, is a considerable undertaking in humanoid robots. How can all this technology be used? There are two general categories of answer, one based on work with humanoid robots, and the other on ECAs, which can have highly natural facial and body movements. Let us envision what robots might do, first by looking at research that demonstrates what ECAs are now doing in research labs to interact in intelligent and useful ways with their human partners. I will then turn to research, including my own, on using humanoid robots both for research and for useful applications.

Embodied Conversational Agents

Embodied conversational agents simplify the physical hardware problem inherent in humanoid robots by replacing it with animation. Animation, while a challenge, can be used at varying levels of sophistication to produce a character that is human-looking, both in form and movement. Bickmore's Laura ECA (Bickmore and Picard 2005) and Pelachaud's Greta (Pelachaud 2005), shown in Figures 7 and 8, illustrate some of the range of animation forms.

Figure 7: Bickmore's Laura ECA

Pelachaud's Greta includes a human face as well as a body with significant facial motion, including lip movement, and has been used to study expression of emotion and communication style during talking (Pelachaud 2005). Greta is also in use as a research tool by other groups (including E. Andre's group at the University of Augsburg) to support their own research efforts in emotion and dialogue.

Figure 8: Pelachaud's Greta ECA

Work at the Institute for Creative Technologies, not just with a single agent but with an immersive full-wall screen world, includes many ECAs (Rickel et al. 2002).

ECAs, facial or full-body, also support research in social behavior, including that of children.
Cassell's research on Sam the Castlemate, shown in Figure 9, provides children with the opportunity to improve their preliteracy skills by combining the ECA with physical figurines to "share" with Sam (Ryokai et al. 2002; Cassell 2004). This work illustrates not only the research platform side of ECAs but their practical application for serving human activities, in this case, playing while at the same time improving literacy skills.

Figure 9: Sam the Castlemate and child

The ICT immersive technology, shown in Figure 10, including full-body ECAs, is designed to help train soldiers in tasks needed in modern military in-country duty. Training is a significant and useful role for ECAs. While research remains to be done to determine how and why this technology is more effective than non-ECA efforts, its potential will be greatly enhanced by tools that allow researchers and engineers to focus on what must be learned rather than on building the ECA and environment.

Figure 10: Peacekeeping scenario from ICT

Bickmore's work on the Laura ECA is instructive for two principal reasons: it is both a research platform for studying social dialogue, and a basis for creating devices that offer assistance to users seeking to change their habits in diet and exercise. As a research platform, Bickmore was able to show that people interacting with Laura over a several-week span for exercise advice responded socially to Laura when she talked with them not only about the task of exercise but also about social matters (Bickmore and Picard 2005). Bickmore's more recent research has focused on the use of ECAs, which he calls relational agents, in dialogues with users to support their behavior change activities. In a pilot study, Bickmore showed that participants recruited from the Geriatric Ambulatory Practice at Boston Medical Center who used a relational agent exercise coach daily for two months performed significantly more walking than a non-intervention control group drawn from the same practice (Bickmore et al. 2005).

The projects associated with these three examples illustrate a basic point of this paper: ECAs are being used in significant ways to train people, to assist them in their lives, and to serve as research platforms for learning about communication, emotion, and social behavior.

Humanoid Robots in Research and Applications

In this section, we will focus on the humanoid robot as a research platform and as a means of supporting humans in many ways. In our own recent work, Lee and I (Sidner, Lee et al. 2005; Sidner, Lee et al. 2006) developed dialogue and collaboration capabilities for two humanoids, one a penguin with a face wearing glasses, a body and wings (see Figure 11), and the other using Melvin (see Figure 5).

Figure 11: Mel, a quasi-humanoid robot

This research concerned attentional behaviors, especially the nature of engagement between the robot and its human interlocutor. Engagement is the process by which participants start, maintain and end their perceived connection to one another. Our robots could hold extended dialogues with their human counterparts while tracking the humans' faces and, when dialogue-appropriate, viewing other objects relevant to the conversation. They could also interpret nods during dialogue.

In user studies, we found that as long as the robot conversed, whether it tracked its human interlocutor or not, human participants looked back at the robot when they took their turn (as people often do in human dialogues). However, people preferentially looked back at the robot more when it tracked them. They also considered the robot more natural when moving and tracking, and they nodded more at the robot when it indicated through its own nods that it was interpreting their nods. Our robots could converse about themselves, engage in collaborative demonstrations of devices, as well as locate a person in an environment and begin an interaction with them.
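The start-maintain-end structure of engagement can be caricatured as a small state machine driven by observed cues such as a visible face or partner speech. The sketch below is illustrative only, with invented states, cues, and timeout; it is not the model actually used in the Mel and Melvin systems.

```python
# Toy sketch of engagement tracking as a state machine, loosely
# following the start/maintain/end phases of engagement. States,
# cues, and timeout are illustrative assumptions, not the rules
# from the Mel/Melvin systems.

class EngagementTracker:
    def __init__(self, disengage_after=3):
        self.state = "idle"            # idle -> engaged -> idle
        self.misses = 0                # consecutive cycles without a cue
        self.disengage_after = disengage_after

    def update(self, face_visible, partner_spoke):
        """Advance one sensing cycle; return the current state."""
        cue = face_visible or partner_spoke
        if self.state == "idle" and cue:
            self.state = "engaged"     # connection started
            self.misses = 0
        elif self.state == "engaged":
            if cue:
                self.misses = 0        # connection maintained
            else:
                self.misses += 1
                if self.misses >= self.disengage_after:
                    self.state = "idle"    # perceived connection ended
        return self.state
```

For example, `tracker.update(face_visible=True, partner_spoke=False)` moves an idle tracker to "engaged", while several consecutive cue-free updates return it to "idle". A real system would fold in many more cues, such as gaze direction and nods.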
Our robot research program has been directed towards a robot as a host in an environment: a host that can direct a human or groups of humans around the environment, tell them about it, and interact with them about devices in that environment. Hosting is a form of companionship aimed at information sharing, as opposed to building social relationships.

Humanoid robots, drawing on work in ECAs, can provide social interaction. However, because the benefit of physical presence and locomotion comes at the cost of hardware and its maintenance challenges, humanoid robots must be useful in their very physicality. Research so far, on both robots and ECAs, suggests this will be so. Robots and ECAs are now being explored for similar tasks, for example in health behavior change, where Kidd's recent research on a robotic weight loss coach (Kidd and Breazeal 2007) reprises some of the effects shown in Bickmore's work. Other research (see Wainer et al. 2006 for one such study) suggests that there are differential effects in social awareness, believability and trust with a co-present robot as opposed to a virtual (ECA) one. However, a better option at this point in time is to be inspired by the applications of ECAs when looking for uses of humanoids.

Obviously, as robots become able to use their arms and hands safely, many capabilities will manifest themselves. In the meantime, as advisors and teachers, humanoid robots can peer into our environment as we perform tasks, including repairs, or learn new physical skills or seek to perfect them. As social companions, they may be more engaging than simply "wiring" an environment to notice when we forget something or, in the case of elders, fall and become injured. Evidence already suggests that we humans feel attuned to and engaged by the physical presence of a humanoid robot.

Humanoid robots have had one direct analogue of the Cassell work in recent years: providing an interactive partner for autistic children (Feil-Seifer & Matarić 2005). Matarić and colleagues have identified socially assistive uses of robots as significant, that is, robots that assist in social interaction for a purpose other than the interaction itself. Robots that help autistic children are essentially teachers, helping them to understand social behaviors they find impossible to learn on their own. Likewise, on their view, robots that serve as tutors or assistants in doing physical tasks are socially assistive robots.

Social interaction means having something to be social about. Advising on activities and tasks, teaching skills, and hosting in environments are useful steps in that direction. However, as a field, we have much to learn about matters not discussed thus far in achieving social interaction. These matters include emotional connection, extended dialogue and, yes, even humor as part of our interactions. We are all aware of the role of these behaviors in our human-to-human social interaction. It remains to be seen whether these behaviors, when possible with robots, will be beneficial in our interactions with them.

Progress on many fronts, not only on social matters but also on better robotic and visual technology, will offer humanoid agents, embodied in the computer display and in the physical world, who will be worthwhile additions to our lives.

References

Bickmore, T., Caruso, L., Clough-Gorr, K., and Heeren, T. 2005. "It's just like you talk to a friend": Relational Agents for Older Adults. Interacting with Computers 17(6): 711-735.

Bickmore, T. and Picard, R. 2005. Establishing and Maintaining Long-Term Human-Computer Relationships. ACM Transactions on Computer-Human Interaction (ToCHI) 59(1): 21-30.

Cassell, J. 2004. Towards a Model of Technology and Literacy Development: Story Listening Systems. Journal of Applied Developmental Psychology 25(1): 75-105.

Demirdjian, D. 2004. Combining Geometric- and View-Based Approaches for Articulated Pose Estimation. In Proceedings of ECCV '04, Prague, Czech Republic, May.

Feil-Seifer, D. J. and Matarić, M. J. 2005. Defining Socially Assistive Robotics. Poster paper, International Conference on Rehabilitation Robotics, 465-468, Chicago, IL.

Kennedy, W. G., Bugajska, M., Marge, M., Adams, W., Fransen, B. R., Perzanowski, D., Schultz, A. C., and Trafton, J. G. 2007. Spatial Representation and Reasoning for Human-Robot Collaboration. In Proceedings of the Twenty-Second Conference on Artificial Intelligence, 1554-1559. Vancouver, Canada: AAAI Press.

Kidd, C. D. and Breazeal, C. 2007. A Robotic Weight Loss Coach. In Proceedings of the Twenty-Second Conference on Artificial Intelligence. Vancouver, British Columbia, Canada: AAAI Press.

Leibe, B., Cornelis, N., Cornelis, K. and Van Gool, L. 2007. Dynamic 3D Scene Analysis from a Moving Vehicle. In Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

Lemon, O., Gruenstein, E. and Peters, S. 2002. Collaborative activities and multi-tasking in dialogue systems. Traitement Automatique des Langues (TAL), Special Issue on Dialogue, 43(2): 131-154.

Morency, L.-P., Christoudias, C. M., and Darrell, T. 2006. Recognizing Gaze Aversion Gestures in Embodied Conversational Discourse. In Proceedings of the International Conference on Multimodal Interfaces, Banff, Canada.

Nishio, S., Ishiguro, H. and Hagita, N. 2007. Can a teleoperated android represent personal presence? A case study with children. Psychologia 50(4): 330-343.

Pelachaud, C. 2005. Multimodal expressive embodied conversational agents. In ACM Multimedia, Brave New Topics session, Singapore, November.

Rickel, J., Marsella, S., Gratch, J., Hill, R., Traum, D. and Swartout, W. 2002. Towards a New Generation of Virtual Humans for Interactive Experiences. IEEE Intelligent Systems 17(4): 32-38.

Rickert, M., Foster, M. E., Giuliani, M., By, T., Panin, G. and Knoll, A. 2007. Integrating language, vision and action for human robot dialog systems. In Proceedings of HCI International 2007, Beijing, 987-995, July.

Ryokai, K., Vaucelle, C., and Cassell, J. 2002. Literacy Learning by Storytelling with a Virtual Peer. In Proceedings of Computer Support for Collaborative Learning.

Sidner, C., Lee, C., Kidd, C., Lesh, N. and Rich, C. 2005. Explorations in Engagement for Humans and Robots. Artificial Intelligence 166(1-2): 140-164, August.

Sidner, C., Lee, C., Morency, L.-P. and Forlines, C. 2006. The Effect of Head-Nod Recognition in Human-Robot Conversation. In Proceedings of the 2006 ACM Conference on Human-Robot Interaction, 290-296.

Thomaz, A. L. and Breazeal, C. 2007. Robot Learning via Socially Guided Exploration. In Proceedings of the 6th International Conference on Development and Learning (ICDL), Imperial College, London.

Torralba, A., Murphy, K. P. and Freeman, W. T. 2004. Sharing features: efficient boosting procedures for multiclass object detection. In Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), 762-769.

Viola, P. and Jones, M. 2001. Rapid object detection using a boosted cascade of simple features. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Hawaii, 905-910.

Wainer, J., Feil-Seifer, D. J., Shell, D. A. and Matarić, M. J. 2006. The role of physical embodiment in human-robot interaction. In Proceedings of the IEEE International Workshop on Robot and Human Interactive Communication (RO-MAN), 117-122, Hatfield, United Kingdom.