Mechanisms of Social Learning for Robots that Interact with People

Socially Intelligent Robots
Cynthia Breazeal
MIT Media Lab
Robotic Life Group
Robots turn 85 years old
Posted May 31st 2006 11:38PM by Ryan Block
Filed under: Robots
Dear Robots,
We're very sorry. It appears we missed your 85th
birthday two days ago -- the anniversary of which is
marked by the date Czech writer Karel Capek debuted
his play R.U.R. (Rossum's Universal Robots) to its first
audience in Prague. Yes, we know the concept of the
automaton dates back much further, but we think it's well
agreed upon that Capek's play marks the robot's entry
into mass consciousness (as well as marking the first
use of the word "robot"). No matter, we're just saying
happy birthday, robots -- not because we fear you'll one
day subsume us in some dystopian nightmare of
artificial intelligence gone terribly wrong, but because
from Asimov to AIBO, from Roomba to Ri-Man, from
QRIO to ASIMO, we just love ya. So happy birthday,
happy birthday, happy birthday, robots, and when the
day of reckoning comes, please remember: Engadget
and its readers are your friends.
All our love,
Engadget
Robots have…
explored ocean depths,
mapped subterranean mines,
rescued natural disaster victims,
assisted surgeons with operations,
driven autonomously across the desert,
And even been to Mars…
What’s Next?
The next big frontier…society at large
Everyday Life with
People and Robots
…and its implication for design
People and Robots
Robots are not
perceived as pure
tools or appliances,
but often as social
actors -- over a wide
range of morphologies
and behaviors
Robots Evoke Human Social
Responses
“The Kismet Effect”
New Scientist, 2005
Newsmaker: My friend,
the robot
CNET news.com, May 24, 2006
The PackBots have almost become members of military units,
Angle said, recalling an incident when a U.S. soldier begged
iRobot to repair his unit's robot, which they had dubbed Scooby
Doo. "Please fix Scooby Doo because he saved my life," was the
soldier's plea, Angle told the Future in Review conference last
week in Coronado, Calif. For many reasons, people bond with
robots in a way they don't bond with their lawn mowers,
televisions or regular vacuum cleaners. At some point, this could
help solve the looming health care problem caused by an
enormous generation of aging people. Not only could robots
make sure they take their medicine and watch for early warning
signs of distress, but they could also provide a companion for
lonely people and extend their independence.
Social Robots
Professional Service Robots -- "social as interface"
Interactive Toys -- "social as entertainment"
Socio-emotive Factors -- "social as relationship"
Future applications require robots to address the socio-emotive and
psychological aspects of people, in long-term relationships.
BANDAI “elder toys”, NEC “babysitters”, OMRON “pets”
HRI, An Emerging Discipline
An important goal of Human-Robot Interaction (HRI) is synergy
of the human-robot system. Robots bring their own abilities that
complement human strengths. The goal is not equivalence
(replacement), but compatibility with a typical human partner.
Four Cornerstones of Social
Robotics in HRI
User Studies,
Psychology &
Social Development
Lasting Relationship
Perspective Taking
Social Intelligence
Cognitive
Compatibility
Teamwork
Transparent
Communication
Social Learning
Interdependence
Today’s Focus
Robots, like humans, should leverage the social
and environmental constraints in the real world to
foster learning new skills and knowledge from
anyone.
Personalization agents, Adaptive user interfaces
{Lashkari, Metral, Maes, Collaborative Interface Agents, AAAI 1994}
{E. Horvitz et al., The Lumiere project, UAI 1998}
Active Learning, Learning with Queries
{Cohn, Ghahramani, Jordan, Active learning with statistical models, 1995}
{Cohn et al., Semi-supervised clustering with user feedback, 2003}
Learning by Demonstration, Programming by
Example
{Voyles, Khosla, Programming robotic agents by demonstration, 1998}
{Lieberman, Your Wish is my Command, 2001}
{A. Billard, Special Issue of RAS on Robot Programming by Demonstration, 2006}
Learning by Imitation
{S. Schaal review in TICS 1999}
{K. Dautenhahn & C. Nehaniv, Imitation in Animals and Artifacts, 2002}
Animal training techniques
{Stern, Frank, Resner, Virtual Petz, Agents 1998}
{Blumberg et al., Integrated learning for interactive characters, SIGGRAPH 2002}
{Kaplan et al., Robot clicker training, RAS 2002}
Reinforcement Learning with humans
{Isbell et al. Cobot: a social reinforcement learning agent, UAI 1998}
{Evans, Varieties of Learning, AI Game Programming Wisdom, 2002}
{Clouse, Utgoff, Teaching a Reinforcement Learner, ICML 1992}
… and many more
How Do Ordinary People
Teach an RL Agent?
Most people don't have experience with machine
learning techniques, but they have a lifetime of
experience with social learning interactions that they
bring to the table.
We emphasize the need to consider, and design to
support, the ways that people naturally approach
teaching, and then to design algorithms and systems
that take better advantage of this.
Experiments in
Sophie’s Kitchen
A “computer game”: players teach a virtual robot
to bake a cake by sending various messages with a
mouse interface.
Sophie learns via Q-Learning:
30 steps
~10,000 states
2-7 actions/state
Allows us to run many subjects on-line.
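The slides give only the Q-learning setup (30 steps, ~10,000 states, 2-7 actions per state). A minimal tabular sketch of how a teacher's feedback messages can serve as the reward signal might look like the following; the class name, hyperparameter values, and the `actions_for` interface are illustrative assumptions, not the published Sophie implementation.

```python
import random
from collections import defaultdict

# Illustrative sketch only: hyperparameters and interfaces are assumptions.
class InteractiveQLearner:
    def __init__(self, actions_for, alpha=0.3, gamma=0.9, epsilon=0.1):
        self.q = defaultdict(float)        # (state, action) -> estimated value
        self.actions_for = actions_for     # state -> list of legal actions (2-7 here)
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose_action(self, state):
        actions = self.actions_for(state)
        if random.random() < self.epsilon:                     # occasionally explore
            return random.choice(actions)
        return max(actions, key=lambda a: self.q[(state, a)])  # otherwise exploit

    def update(self, state, action, reward, next_state):
        # Standard Q-learning backup; `reward` is whatever feedback the human
        # teacher sent during this step (0 if none arrived).
        next_actions = self.actions_for(next_state)
        best_next = max(self.q[(next_state, a)] for a in next_actions) if next_actions else 0.0
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

The only non-standard element is the reward source: instead of an environment-defined reward function, the scalar comes from the human's feedback messages.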
Experiments in
Sophie’s Kitchen
An object-specific reward is about a
particular part of the world.
Initial Experiment
 18 people trained Sophie
Thomaz & Breazeal
RO-MAN 2006
 They are given a description of the cake task
and told they cannot perform actions themselves,
but can help Sophie by sending FEEDBACK
messages with the mouse
 The system logs the time of state changes, agent
actions, and any human feedback; we analyze
game logs to understand people's teaching
behavior
Findings: Guidance
People tried to use the object
specific rewards as FUTURE directed
guidance.
Many object rewards were not about the last object used.
[Chart: each player's percentage of object rewards about the last object,
from 0% (never about most recent object) to 100% (always about most recent object)]
Almost everyone gave rewards to
the bowl or tray sitting empty on the
shelf...a guidance reward.
[Chart: number of people who gave zero rewards vs. at least one reward to the empty bowl]
Findings: People Adapt
Teaching to their Mental
Model of Sophie
People gave more rewards after realizing their
feedback made a difference
Interpreted Sophie’s behavior as being a
“staged” learner
Adapted their teaching strategy accordingly
[Chart: ratio of human rewards to agent actions, per individual (average)]
Roadmap: Initial Experiment, Guidance, Transparency, Asymmetry
Using Guidance in
Sophie’s Kitchen
Interactive Q-Learning algorithm, baseline system,
with a slight delay to animate the action
and receive the human reward.
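One way to realize the "slight delay to animate the action and receive human reward" is an interaction loop that pauses after each step before applying the Q-update. This is a hypothetical sketch: `env`, `learner`, `feedback_queue`, and the delay value are assumed interfaces, not the actual system.

```python
import time

# Hypothetical interaction loop: after each action the agent pauses so the
# animation can play and the teacher can respond; any feedback received in
# that window becomes the reward for the Q-update.
def run_episode(env, learner, feedback_queue, delay=1.5):
    state = env.reset()
    while not env.done():
        action = learner.choose_action(state)
        next_state = env.step(action)    # animate the chosen action
        time.sleep(delay)                # slight delay: window for human reward
        reward = feedback_queue.pop_latest(default=0.0)
        learner.update(state, action, reward, next_state)
        state = next_state
```

The design point is credit assignment: the delay window determines which action a late-arriving human reward is attributed to.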
Using Guidance in
Sophie’s Kitchen
Guidance
Experiment
Thomaz & Breazeal, AAAI
2006
Hypothesis: non-expert teachers can use guidance to
improve the agent's performance.
27 subjects trained Sophie in two groups:
Using feedback only
Using both feedback and guidance
Again, the system logs game play, and the logs are
analyzed to understand teaching behavior
Effects of Guidance
guidance + feedback >> feedback only
One-tailed t-tests show that logs in the guidance condition are
significantly better than in the non-guidance condition:

Metric                            feedback only   guidance + feedback   effect size
Number of Trials                  28.5            14.6                  49%
Number of Actions                 816.4           368                   55%
Number of Failures                18.89           11.8                  38%
Number of Fails before 1st Goal   18.7            11                    41%
Number of Unique States Visited   124.44          62.7                  50%
Roadmap: Initial Experiment, Guidance, Transparency, Asymmetry
Transparency
Teachers structure the environment and the
task to help a learner succeed.
Learners contribute by revealing internal state,
helping the teacher maintain a mental model that
makes guidance more appropriate.
 How can machine learners be transparent?
Sophie’s Gaze Behavior
Interactive Q-Learning
Algorithm modified to
incorporate Guidance
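One simple way to incorporate guidance into the baseline action selection, sketched under the assumption that a guidance message names an object and that exploration is narrowed to actions involving it. This is an illustrative policy, not the published modification; the action-naming convention is hypothetical.

```python
import random

# Hypothetical guidance-biased action selection: if the teacher has pointed
# at an object, prefer actions that involve that object.
def choose_action(q, state, actions, guided_object=None, epsilon=0.1):
    if guided_object is not None:
        # Assumption: action identifiers name the object they act on,
        # e.g. "pick-bowl". Narrow exploration to guided actions if any match.
        guided = [a for a in actions if guided_object in a]
        if guided:
            actions = guided
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))
```

The key property is that guidance constrains exploration before learning converges, which is consistent with the observed reductions in failures and states visited.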
Transparency
Experiment
Thomaz et al., ICDL
2006
52 subjects trained Sophie in an online version:
 Feedback and guidance, no gaze
 Feedback and guidance, Sophie gazing
Hypothesis:
Learners can help shape their learning environment by
communicating aspects of the internal process -- gaze will
improve the human’s guidance instruction
Sophie’s Gaze Behavior
Results: Sophie's gaze significantly
improves the guidance received: more
guidance when uncertainty is high and less
when uncertainty is low.
[Chart: guidance received in gaze vs. no-gaze conditions, split by
uncertainty (high: 3 or more action choices; low: fewer than 3)]
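One plausible mechanism behind this result, sketched as an assumption rather than the published algorithm: treat the number of near-best actions as the uncertainty measure (the slides define high uncertainty as 3 or more choices) and modulate gaze accordingly. The tolerance value and the return format are illustrative.

```python
# Hypothetical transparency sketch: express action-selection uncertainty
# through gaze, inviting guidance exactly when it is most useful.
def candidate_actions(q, state, actions, tolerance=0.1):
    """Actions whose Q-values are within `tolerance` of the best one."""
    best = max(q.get((state, a), 0.0) for a in actions)
    return [a for a in actions if best - q.get((state, a), 0.0) <= tolerance]

def gaze_behavior(q, state, actions):
    cands = candidate_actions(q, state, actions)
    if len(cands) >= 3:
        # Uncertain: scan between several candidate objects, signaling
        # to the teacher that guidance would help.
        return {"mode": "scan", "targets": cands}
    # Confident: brief glance at the object of the intended action.
    return {"mode": "glance", "targets": cands[:1]}
```

Read this way, gaze is a cheap communication channel: it exposes the size of the agent's candidate set without exposing the Q-table itself.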
Lessons
People bring their own teaching and learning
experience to the task.
Social factors of guidance and transparency:
A collaborative process between teacher and
learner improves performance.
The agent can use transparency cues to improve
its own learning environment by helping the
teacher form a better mental model.
Adding gaze significantly improves the human's
guidance.
Summary