Mechanisms of Social Learning for Robots that Interact with People

Socially Intelligent Robots
Cynthia Breazeal
MIT Media Lab
Robotic Life Group
Robots turn 85 years old
Posted May 31st 2006 11:38PM by Ryan Block
Filed under: Robots
Dear Robots,
We're very sorry. It appears we missed your 85th
birthday two days ago -- the anniversary of which is
marked by the date Czech writer Karel Capek debuted
his play R.U.R. (Rossum's Universal Robots) to its first
audience in Prague. Yes, we know the concept of the
automaton dates back much further, but we think it's well
agreed upon that Capek's play marks the robot's entry
into mass consciousness (as well as marking the first
use of the word "robot"). No matter, we're just saying
happy birthday, robots -- not because we fear you'll one
day subsume us in some dystopian nightmare of
artificial intelligence gone terribly wrong, but because
from Asimov to AIBO, from Roomba to Ri-Man, from
QRIO to ASIMO, we just love ya. So happy birthday,
happy birthday, happy birthday, robots, and when the
day of reckoning comes, please remember: Engadget
and its readers are your friends.
All our love,
Engadget
Robots have…
explored ocean depths,
mapped subterranean mines,
rescued natural disaster victims,
assisted surgeons with operations,
driven autonomously across the desert,
And even been to Mars…
What’s Next?
The next big frontier…society at large
Everyday Life with
People and Robots
…and its implication for design
People and Robots
Robots are not
perceived as pure
tools or appliances,
but often as social
actors -- over a wide
range of morphologies
and behaviors
Robots Evoke Human Social
Responses
“The Kismet Effect”
New Scientist, 2005
Newsmaker: My friend,
the robot
CNET news.com, May 24, 2006
The PackBots have almost become members of military units,
Angle said, recalling an incident when a U.S. soldier begged
iRobot to repair his unit's robot, which they had dubbed Scooby
Doo. "Please fix Scooby Doo because he saved my life," was the
soldier's plea, Angle told the Future in Review conference last
week in Coronado, Calif. For many reasons, people bond with
robots in a way they don't bond with their lawn mowers,
televisions or regular vacuum cleaners. At some point, this could
help solve the looming health care problem caused by an
enormous generation of aging people. Not only could robots
make sure they take their medicine and watch for early warning
signs of distress, but they could also provide a companion for
lonely people and extend their independence.
Social Robots
Professional Service Robots -- "social as interface"
Interactive Toys -- "social as entertainment"
Socio-emotive Factors -- "social as relationship"
Future applications require robots to address the socio-emotive and
psychological aspects of people, in long-term relationships.
BANDAI “elder toys”, NEC “babysitters”, OMRON “pets”
HRI, An Emerging Discipline
An important goal of Human-Robot Interaction (HRI) is synergy
of the human-robot system. Robots bring their own abilities that
complement human strengths. The goal is not equivalence
(replacement), but compatibility with a typical human partner.
Four Cornerstones of Social
Robotics in HRI
User Studies,
Psychology &
Social Development
Lasting Relationship
Perspective Taking
Social Intelligence
Cognitive
Compatibility
Teamwork
Transparent
Communication
Social Learning
Interdependence
Today’s Focus
Robots, like humans, should leverage the social
and environmental constraints in the real world to
foster learning new skills and knowledge from
anyone.
Personalization agents, Adaptive user interfaces
{Lashkari, Metral, Maes, Collaborative Interface Agents, AAAI 1994}
{E. Horvitz et al., The Lumiere project, UAI 1998}
Active Learning, Learning with Queries
{Cohn, Ghahramani, Jordan, Active learning with statistical models, 1995}
{Cohn et al., Semi-supervised clustering with user feedback, 2003}
Learning by Demonstration, Programming by
Example
{Voyles, Khosla, Programming robotic agents by demonstration, 1998}
{Lieberman, Your Wish is my Command, 2001}
{A. Billard, Special Issue of RAS on Robot Programming by Demonstration, 2006}
Learning by Imitation
{S. Schaal review in TICS 1999}
{K. Dautenhahn & C. Nehaniv, Imitation in Animals and Artifacts, 2002}
Animal training techniques
{Stern, Frank, Resner, Virtual Petz, Agents 1998}
{Blumberg et al., Integrated learning for interactive characters, SIGGRAPH 2002}
{Kaplan et al., Robot clicker training, RAS 2002}
Reinforcement Learning with humans
{Isbell et al. Cobot: a social reinforcement learning agent, UAI 1998}
{Evans, Varieties of Learning, AI Game Programming Wisdom, 2002}
{Clouse, Utgoff, Teaching a Reinforcement Learner, ICML 1992}
… and many more
How Do Ordinary People
Teach an RL Agent?
Most people don't have experience with machine
learning techniques, but they have a lifetime of
experience with social learning interactions that they
bring to the table.
We emphasize the need to consider, and design to
support, the ways that people naturally approach
teaching, and then to design algorithms and systems
that take better advantage of this.
Experiments in
Sophie’s Kitchen
A “computer game”: players teach a virtual robot
to bake a cake by sending various messages with a
mouse interface.
Sophie learns via Q-Learning:
30 steps
~10,000 states
2-7 actions/state
Allows us to run many subjects on-line.
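The slides give only the Q-learning setup (30 steps, ~10,000 states, 2-7 actions per state). A minimal tabular sketch of how a teacher's feedback messages can serve as the reward signal might look like the following; the class name, hyperparameter values, and the `actions_for` interface are illustrative assumptions, not the published Sophie implementation.

```python
import random
from collections import defaultdict

# Illustrative sketch only: hyperparameters and interfaces are assumptions.
class InteractiveQLearner:
    def __init__(self, actions_for, alpha=0.3, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)        # (state, action) -> estimated value
        self.actions_for = actions_for     # state -> list of legal actions (2-7 here)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, state):
        actions = self.actions_for(state)
        if random.random() < self.epsilon:                     # occasionally explore
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])  # otherwise exploit

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup; `reward` is whatever feedback the human
        # teacher sent during this step (0 if none arrived).
        next_actions = self.actions_for(next_state)
        best_next = max(self.q[(next_state, a)] for a in next_actions) if next_actions else 0.0
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The only non-standard element is the reward source: instead of an environment-defined reward function, the scalar comes from the human's feedback messages.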
Experiments in
Sophie’s Kitchen
An object-specific reward is about a
particular part of the world.
Initial Experiment
 18 people trained Sophie
Thomaz & Breazeal
RO-MAN 2006
 They are given a description of the cake task
and told they cannot perform actions themselves,
but can help Sophie by sending FEEDBACK
messages with the mouse
 The system logs the time of state changes, agent
actions, and any human feedback; we analyze
game logs to understand people's teaching
behavior
Findings: Guidance
People tried to use the object
specific rewards as FUTURE directed
guidance.
Many object rewards were not about the last object used.
[Chart: each player's percentage of object rewards about the last object,
from 0% (never about most recent object) to 100% (always about most recent object)]
Almost everyone gave rewards to
the bowl or tray sitting empty on the
shelf...a guidance reward.
[Chart: number of people who gave zero rewards vs. at least one reward to the empty bowl]
Findings: People Adapt
Teaching to their Mental
Model of Sophie
People gave more rewards after realizing their
feedback made a difference
Interpreted Sophie’s behavior as being a
“staged” learner
Adapted their teaching strategy accordingly
[Chart: ratio of human rewards to agent actions, per individual (average)]
Roadmap: Initial Experiment, Guidance, Transparency, Asymmetry
Using Guidance in
Sophie’s Kitchen
Interactive Q-Learning algorithm, baseline system,
with a slight delay to animate the action
and receive the human reward.
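One way to realize the "slight delay to animate the action and receive human reward" is an interaction loop that pauses after each step before applying the Q-update. This is a hypothetical sketch: `env`, `learner`, `feedback_queue`, and the delay value are assumed interfaces, not the actual system.

```python
import time

# Hypothetical interaction loop: after each action the agent pauses so the
# animation can play and the teacher can respond; any feedback received in
# that window becomes the reward for the Q-update.
def run_episode(env, learner, feedback_queue, delay=1.5):
    state = env.reset()
    while not env.done():
        action = learner.choose_action(state)
        next_state = env.step(action)    # animate the chosen action
        time.sleep(delay)                # slight delay: window for human reward
        reward = feedback_queue.pop_latest(default=0.0)
        learner.update(state, action, reward, next_state)
        state = next_state
```

The design point is credit assignment: the delay window determines which action a late-arriving human reward is attributed to.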
Using Guidance in
Sophie’s Kitchen
Guidance
Experiment
Thomaz & Breazeal, AAAI
2006
Hypothesis: non-expert teachers can use guidance to
improve the agent's performance.
27 subjects trained Sophie in two groups:
Using feedback only
Using both feedback and guidance
Again, the system logs game play, and the logs are
analyzed to understand teaching behavior
Effects of Guidance
guidance + feedback >> feedback only
One-tailed t-tests show that logs in the guidance condition are
significantly better than in the non-guidance condition:

Metric                            feedback only   guidance + feedback   effect size
Number of Trials                  28.5            14.6                  49%
Number of Actions                 816.4           368                   55%
Number of Failures                18.89           11.8                  38%
Number of Fails before 1st Goal   18.7            11                    41%
Number of Unique States Visited   124.44          62.7                  50%
Roadmap: Initial Experiment, Guidance, Transparency, Asymmetry
Transparency
Teachers structure the environment and the
task to help a learner succeed.
Learners contribute by revealing internal state,
helping the teacher maintain a mental model that
makes guidance more appropriate.
 How can machine learners be transparent?
Sophie’s Gaze Behavior
Interactive Q-Learning
Algorithm modified to
incorporate Guidance
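One simple way to incorporate guidance into the baseline action selection, sketched under the assumption that a guidance message names an object and that exploration is narrowed to actions involving it. This is an illustrative policy, not the published modification; the action-naming convention is hypothetical.

```python
import random

# Hypothetical guidance-biased action selection: if the teacher has pointed
# at an object, prefer actions that involve that object.
def choose_action(q, state, actions, guided_object=None, epsilon=0.1):
    if guided_object is not None:
        # Assumption: action identifiers name the object they act on,
        # e.g. "pick-bowl". Narrow exploration to guided actions if any match.
        guided = [a for a in actions if guided_object in a]
        if guided:
            actions = guided
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

The key property is that guidance constrains exploration before learning converges, which is consistent with the observed reductions in failures and states visited.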
Transparency
Experiment
Thomaz et al., ICDL
2006
52 subjects trained Sophie in an online version:
 Feedback and guidance, no gaze
 Feedback and guidance, Sophie gazing
Hypothesis:
Learners can help shape their learning environment by
communicating aspects of the internal process -- gaze will
improve the human’s guidance instruction
Sophie’s Gaze Behavior
Results: Sophie's gaze significantly
improves the guidance received: more
guidance when uncertainty is high and less
when uncertainty is low.
[Chart: guidance received in gaze vs. no-gaze conditions, split by
uncertainty (high: 3 or more action choices; low: fewer than 3)]
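One plausible mechanism behind this result, sketched as an assumption rather than the published algorithm: treat the number of near-best actions as the uncertainty measure (the slides define high uncertainty as 3 or more choices) and modulate gaze accordingly. The tolerance value and the return format are illustrative.

```python
# Hypothetical transparency sketch: express action-selection uncertainty
# through gaze, inviting guidance exactly when it is most useful.
def candidate_actions(q, state, actions, tolerance=0.1):
    """Actions whose Q-values are within `tolerance` of the best one."""
    best = max(q.get((state, a), 0.0) for a in actions)
    return [a for a in actions if best - q.get((state, a), 0.0) <= tolerance]

def gaze_behavior(q, state, actions):
    cands = candidate_actions(q, state, actions)
    if len(cands) >= 3:
        # Uncertain: scan between several candidate objects, signaling
        # to the teacher that guidance would help.
        return {"mode": "scan", "targets": cands}
    # Confident: brief glance at the object of the intended action.
    return {"mode": "glance", "targets": cands[:1]}
```

Read this way, gaze is a cheap communication channel: it exposes the size of the agent's candidate set without exposing the Q-table itself.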
Lessons
People bring their own teaching and learning
experience to the task.
Social factors of guidance and transparency:
A collaborative process between teacher and
learner improves performance.
The agent can use transparency cues to improve
its own learning environment by helping the
teacher form a better mental model.
Adding gaze significantly improves the human's
guidance.
Summary