Imitation Learning for Robots - Yale School of Engineering

Course SHS
Program in Cognitive Psychology
Spring 2007
Human-Robot Interaction
Social learning and skill acquisition
via teaching and imitation
Aude G. Billard
Learning Algorithms and Systems Laboratory - LASA
EPFL, Swiss Federal Institute of Technology
Lausanne, Switzerland
aude.billard@epfl.ch
A.G. Billard - SHS Program in Cognitive Psychology - Spring 2007
http://lasa.epfl.ch
Calinon, S. and Billard, A. (2007) Incremental Learning of Gestures by Imitation in a Humanoid Robot. In Proceedings of the ACM/IEEE International Conference on Human-Robot Interaction (HRI).
Gesture Recognition
How are actions perceived?
How is information parsed?
Imitation
Level of granularity: What is copied?
Should it copy the intention,
goal or dynamics of movement?
Motor Learning
How is information transferred
across multiple modalities?
Visuo-motor, auditory-motor
Gesture Recognition
Biological
Inspiration
Learning by Imitation
Robotic
Implementation
Motor Learning
BIOLOGICAL INSPIRATION
Prior to building any capability in robots, we
might want to understand how the equivalent
capability works in humans and other
animals
Imitation Learning
Imitation Capabilities in Animals
Which species exhibit imitation is still a major area of discussion and debate.
One differentiates “true” imitation from copying (flocking, schooling, following), stimulus enhancement, contagion, or emulation.
Imitation Learning
Imitation Capabilities in Animals
• Copying and Mimicry: Rats, Monkeys
• Observer rats watch companion actor rats perform spatial tasks that differ according to the experimental requirements. After the observational training, the observers undergo surgical ablation to block any further learning.
• Leggio et al, Brain Res. Protocols, 2003
• Heyes, Trends in Cog. Sciences, 2001
Imitation Learning
Imitation Capabilities in Animals
• The observer rats displayed exploration abilities that closely matched the previously observed behaviors.
• Leggio et al, Brain Res. Protocols, 2003
• Heyes, Trends in Cog. Sciences, 2001
Imitation Learning
Imitation Capabilities in Monkeys
Subjects who saw the Lever demonstrations tended to use a levering movement to pop open the lid, whereas subjects who viewed Poke, as well as the controls, did not display this behavior at all.
• Whiten et al, Journal of Comparative Psychology, 1996
Imitation Learning
Imitation Capabilities in Animals
• “True” imitation: Ability to learn new actions that are not part of the usual repertoire
• The preserve of humans only, and possibly great apes
• Whiten & Ham, Advances in the Study of Behaviour, 1992
• Savage & Rumbaugh, Child Devel, 1993
Imitation Learning
Imitation Capabilities in Animals
• Complex imitation capabilities in Dolphins & Parrots. Large repertoire of imitation capabilities, demonstrating flexibility and generalization in different contexts.
• Moore, Behaviour, 1999.
• Herman, Imitation in Animals & Artifacts, MIT Press, 2002
Imitation Learning
Developmental Stages of Imitation
• Innate Facial Imitation (newborns to 3 months)
Tongue and lip protrusion, mouth opening, head movements, cheek and brow motion, eye blinking
• Delayed imitation up to 24 hours
→ Imitation is mediated by a stored representation
Meltzoff & Moore, Early Development and Parenting, 1997
Meltzoff & Moore, Developmental Psychology, 1989
Imitation Learning
Developmental Stages of Imitation
• Deferred and delayed imitation: 18 months (Piaget), 9-12 months (Meltzoff)
• Deferred imitation of novel behavior
67% of the infants who saw the display reproduced the act after the week's delay, as compared to 0% of the control infants who had not seen the novel display.
• Piaget, Play, Dreams and Imitation in Childhood, 1962
• Meltzoff, Body and the self, 1995
Imitation Learning
Goals and Intentions
• Infants aged 14 months.
• Children imitate a new action to achieve the same goal only if they consider it to be the most rational alternative.
• Gergely, Bekkering, Király, Nature 415, 755, 2002
Imitation Learning
Goals and Intentions
• 18-month-old infants
• Differentiate between human and machine demonstration
→ Attribute intentions only to the human
• Learn from unsuccessful examples
• Meltzoff, Dev. Psychol. 31, 1995.
Imitation Learning
Goals and Intentions
• Imitation is hierarchical and goal-directed
• Single-hand motions: accurate ipsilateral imitation, 48% substitution for crosslateral imitation
• Two-hand motions: only 10% substitution for crosslateral imitation.
• Two-phase motion eliminates mistakes
• Adding constraints of hand gestures increases mistakes
• Bekkering, Wohlschläger & Gattis, Quart. J. of Exp. Psych, 2000
Imitation Learning
Imitation in adults
• Reaches highest level of complexity
• Is present in all activities:
Social influence in establishing group norms; collective frame of reference; transmission of phobias
Imitation Learning
Imitation Capabilities in Adults
Movement observation influences movement execution.
The priming process occurs involuntarily and is not under the actor’s control.
• Brass, Bekkering, Prinz, Acta Psychologica, 2001
Imitation Learning
Neural Correlates
• Mirror Neuron System – Area F5 of the monkey premotor cortex
• Gallese et al, Brain, 1996; Rizzolatti et al, Cog. Brain Res., 1996
Imitation Learning
Neural Correlates
• Mirror Neuron System – locus of visuo-motor transformation (STS, PM, Broca)
• Iacoboni et al, Science 1999
• Arbib, Billard, Iacoboni, Oztop, Neural Networks, 2000.
Imitation Learning in Animals
Take-Home Message
• Range of imitative behaviors in animals
→ Increasing in complexity across species
• Stages of development in children's imitation
→ innate facial imitation
→ inferring goals
→ hierarchy of goals driving imitation (hand motion takes precedence over arm gesture and location in space)
• Imitation in adulthood is influenced by movement observation, handedness, and orientation of the demonstrator
• The underlying neural mechanisms are not yet completely deciphered
→ A better understanding of those would help shed light on the different levels of imitation in animal behavior
Imitation Learning in Animals
Take-Home Message
Advantages: When is imitation useful?
• It is a powerful means of transferring skills
• It speeds up the learning process by showing possible solutions or, conversely, by showing bad solutions
Imitation Learning in Animals
Take-Home Message
Disadvantages: When is imitation not useful?
• Not appropriate: When a good solution for the teacher is not a possible solution for the learner
• Disadvantageous: When a bad teacher leads you into error (e.g. phobia of spiders)
The Transfer Problem
[Figure: a demonstrator arm and an imitator arm, each described by joint angles $\theta_1, \dots, \theta_7$ and a hand position $\vec{x} = (x_1, x_2, x_3)$. The open question ("?") is how the demonstrator's joint angles and hand position map onto the imitator's.]
What to imitate?
• $\vec{x} = \vec{x}'$: same object, same target location
• $\vec{d} = \vec{d}'$: same direction of motion
• $\vec{v} = \vec{v}'$: same speed, same force
• $\vec{\theta} = \vec{\theta}'$: same posture
[Figure: demonstrator and imitator arms annotated with joint angles $\theta_1, \dots, \theta_7$, hand position $\vec{x} = (x_1, x_2, x_3)$, direction of motion $\vec{d}$, and velocity $\vec{v}$.]
How to Imitate?
The correspondence problem
Demonstration
Imitation
No solution (smaller range of motion)
→ Find the closest solution according to a metric
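As an illustration of "closest solution according to a metric", the sketch below assumes the simplest possible case: the imitator has box-shaped joint limits and the metric is a weighted squared Euclidean distance in joint space. Under those assumptions the closest feasible posture is found by clipping each joint independently. The function names, limits, and weights are hypothetical and are not taken from the lecture.

```python
import numpy as np

def closest_feasible_posture(theta_demo, lower, upper):
    """Project demonstrated joint angles onto the imitator's joint limits.

    Assumption: limits are independent per joint (a box) and the metric is a
    diagonal-weighted Euclidean distance, so per-joint clipping is optimal.
    """
    return np.clip(theta_demo, lower, upper)

def imitation_error(theta_demo, theta_robot, weights):
    """Weighted squared distance between demonstrated and reproduced posture."""
    diff = np.asarray(theta_demo, dtype=float) - np.asarray(theta_robot, dtype=float)
    return float(np.sum(weights * diff ** 2))

# Hypothetical 3-joint example: two demonstrated angles exceed the robot's range.
theta_demo = np.array([0.2, 1.9, -2.5])
lower = np.array([-1.5, -1.5, -1.5])
upper = np.array([1.5, 1.5, 1.5])

theta_robot = closest_feasible_posture(theta_demo, lower, upper)
error = imitation_error(theta_demo, theta_robot, weights=np.ones(3))
print(theta_robot, error)
```

A different metric (for example one defined on hand position rather than joint angles) would in general select a different "closest" posture, which is why the choice of metric matters.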
Imitation Learning
Following – an imitation mechanism
• While following the teacher, the learner robot learns to associate a word with a meaning in terms of sensory inputs
• Billard et al, ESANN’1997
• Billard & Dautenhahn, Robotics & Autonomous Systems, 1998
• Billard & Hayes, 1999, 2000
Imitation Learning
Following – an imitation mechanism
• Teaching a path in a maze
Demiris & Hayes, 1994, 1996
• Teaching how to climb a hill
Dautenhahn, Robotics & Autonomous Systems, 1995
• Teaching a path in the environment
Billard & Hayes, Adaptive Behavior, 1999
Moga, Gaussier, Applied Artificial Intelligence, 2000
Kaiser et al, Robotics & Autonomous Systems, 2002
Nicolescu & Mataric, AGENTS’ 2003
• Teaching a vocabulary
Billard 1997, 1998, 1999
Vogt & Steels, ECAL, 1999
Imitation Learning
One-Shot Learning Methods
• Segmentation of demonstration into primitives
• Classification of gestures into predefined states (e.g. grasp, collision)
• Built-in controller for producing sequences of states
• Kuniyoshi et al, IEEE Trans. on Robotics and Automation, 1994.
• Dillmann et al, Robotics & Autonomous Systems, 2001.
• Ritter et al, Rev Neuroscience, 2003
• Aleotti et al, Robotics & Autonomous Systems, 2004.
Robot Programming by Demonstration
One-Shot Learning Methods
Sensors: Data Gloves, Fixed cameras, Speech processing
Actuators: Mobile robot, 7 DOF arm, 2 fingers Gripper
R. Dillmann, Robotics & Autonomous Systems 47:2-3, 109-116, 2004
Imitation Learning
One-Shot Learning Methods
Explicit teaching/learning:
- Reasoning about tasks
- Verbal instructions
Gesture Recognition:
For each sensor a context-dependent model based on background knowledge is provided: ‘opening the refrigerator door’, ‘extracting the bottle’ and ‘closing the door’
Task Reproduction:
Store action sequences in a tree-like structure of macro-operators
R. Dillmann, Robotics & Autonomous Systems 47:2-3, 109-116, 2004
Imitation Learning
Robot Programming by Demonstration: Grasping
Because of the large range of possible shapes, generalizing preprogrammed grasps to new and general objects is a rather hard task:
• Orientation of the hand
• Positioning of the fingers (correspondence problem!)
• Tactile forces, stable object contact
Steil et al, Robotics & Autonomous Systems 47:2-3, 129-141, 2004
Imitation Learning
Robot Programming by Demonstration: Grasping
(i) a ‘naïve’ imitation strategy, in which the observed joint angle trajectories (after their transformation into the three-finger geometry) were directly applied to control the fingers of the TUM hand during the grasp, until complete closure around the object
(ii) a strategy in which the visually observed hand posture is matched to the initial conditions of a power grip, a precision grip, a three-finger and two-finger grip, respectively, in order to identify the grip type.
Steil et al, Robotics & Autonomous Systems 47:2-3, 129-141, 2004
Robot Programming by Demonstration
Other related works are, e.g.:
• Kuniyoshi et al, ICRA, 1994
• Aleotti et al, Robotics & Autonomous Systems, 47:2-3, 153-167, 2004
• Zhang & Roessler, Robotics & Autonomous Systems 47:2-3, 117-127, 2004
Imitation Learning
Learning of Dynamical Systems
• Learning the optimal controller
• Model of physical system (pendulum)
• Reinforcement and locally weighted learning
• Atkeson & Schaal, ICML, 1997.
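To make "locally weighted learning" concrete, here is a minimal sketch of locally weighted linear regression in one dimension. It is a generic textbook form, not the specific model of the cited work: a Gaussian kernel (with an assumed `bandwidth` parameter) weights the training points around each query, and a local line is fitted by weighted least squares.

```python
import numpy as np

def lwr_predict(x_query, X, y, bandwidth=0.3):
    """Predict y at x_query with a locally weighted linear fit.

    Assumptions for this sketch: scalar input, Gaussian kernel, and a local
    model of the form y = b0 + b1 * x refitted for every query point.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Kernel weights: points near the query dominate the fit.
    w = np.exp(-0.5 * ((X - x_query) / bandwidth) ** 2)
    A = np.column_stack([np.ones_like(X), X])  # design matrix [1, x]
    WA = A * w[:, None]
    # Weighted normal equations: (A^T W A) beta = A^T W y
    beta = np.linalg.solve(A.T @ WA, WA.T @ y)
    return float(beta[0] + beta[1] * x_query)

# Hypothetical data: samples of a smooth nonlinear function.
X_train = np.linspace(0.0, 1.0, 21)
y_train = np.sin(2.0 * np.pi * X_train)
print(lwr_predict(0.25, X_train, y_train, bandwidth=0.1))
```

Because the model is refitted around each query, it can follow nonlinear data while every individual fit stays a simple linear problem.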
Imitation Learning
Learning of Dynamical Systems
• Locally weighted learning
• Learning primitives of the system
• Ijspeert, Nakanishi, Schaal, ICRA’01, NIPS’02
Imitation Learning
Learning of Dynamical Systems
The learned trajectory is not sufficient to control the actual robot’s walking pattern.
Phase resetting using foot contact information is necessary.
→ On-line adjustment of the phase of the CPG by sensory feedback from the environment is essential to achieve successful locomotion
Nakanishi et al, Robotics & Autonomous Systems, 47:2-3, 79-91, 2004.
Imitation Learning in Robots
Granularity
Cognition
How to imitate?
Level 3: Learning primitives of motion
Level 2: Exact reproduction of trajectories
Level 1: One-shot learning
Level 0: Following – an implicit imitation mechanism
IMITATION LEARNING VERSUS MOTOR LEARNING
Imitation learning – Programming by Demonstration:
• A way to speed up learning, to reduce the search space
• A way to share the same vocabulary of motor skills with the robot
Self Motor Learning - Reinforcement Learning:
• To adapt to novel situations
• To adapt the demonstrated motions to the robot’s body → Correspondence problem (Nehaniv & Dautenhahn 1999)
Learning What to imitate
The robot should learn that the important feature in
this task is that the queen should be moved 2 steps
forward vertically
Learning How to Imitate
Once the robot has learned the rule of motion for the queen,
it can apply this rule for moving the queen from
locations not seen during the demonstrations
Transmitting human skills and knowledge to robots
Learning a Packaging Task
From Recognizing to Reproducing Gestures
GMM/HMM Encoding: Recovers generalized
signal by regression
Mixture of k Gaussians
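The regression step can be sketched as follows, assuming the mixture of k Gaussians is defined over the joint variable (t, x), with time t as input and the signal x as output. The function below computes the conditional expectation E[x | t] (Gaussian Mixture Regression). The model parameters in the example are hand-picked for illustration only; in practice they would be estimated from the demonstrations, for instance by EM.

```python
import numpy as np

def gmr(t_query, priors, means, covs):
    """Gaussian Mixture Regression: E[x | t] for a GMM over (t, x).

    priors: (K,), means: (K, 1+D), covs: (K, 1+D, 1+D).
    Dimension 0 is time t; the remaining D dimensions are the signal x.
    """
    t_query = np.atleast_1d(np.asarray(t_query, dtype=float))
    priors = np.asarray(priors, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)
    mu_t = means[:, 0]
    var_t = covs[:, 0, 0]
    # Responsibility of each component for each query time.
    lik = np.exp(-0.5 * (t_query[:, None] - mu_t) ** 2 / var_t)
    lik /= np.sqrt(2.0 * np.pi * var_t)
    h = priors * lik
    h /= h.sum(axis=1, keepdims=True)
    # Conditional mean of each component: mu_x + S_xt / S_tt * (t - mu_t)
    slope = covs[:, 1:, 0] / var_t[:, None]
    dt = t_query[:, None, None] - mu_t[None, :, None]
    cond = means[None, :, 1:] + slope[None, :, :] * dt
    # Blend the component-wise predictions by responsibility.
    return np.einsum("nk,nkd->nd", h, cond)

# Illustrative 2-component model over (t, x) with a 1-D signal.
priors = [0.5, 0.5]
means = [[0.25, 0.0], [0.75, 1.0]]
covs = [[[0.02, 0.01], [0.01, 0.02]],
        [[0.02, 0.01], [0.01, 0.02]]]
generalized = gmr(np.linspace(0.0, 1.0, 5), priors, means, covs)
print(generalized.ravel())
```

The output is a single smooth trajectory that summarizes the demonstrations, which is what "generalized signal" refers to on the slide.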
Learning What to imitate?
Tracking of object – stereovision
Tracking of joint trajectories
GMM over 26 demonstrations
3D velocities of the end effector
Calinon, S., Guenter, F. and Billard, A. (2007) On Learning, Representing and Generalizing a Task in a Humanoid Robot. IEEE Transactions on Systems, Man and Cybernetics, 37:2. Part B. Special issue on robot learning by observation, demonstration and imitation.
Learning What to imitate?
Correlations in the latent space of
the two hands
Hands-Bucket Correlations
Computing the Metric of Imitation Performance
Hands’ Paths
Joints Trajectories
Adapting the Demonstration to Fit the Robot’s Body
Minimizing H
Under the kinematics constraint:
By Lagrange, we compute
the optimal solution:
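The equations of this slide did not survive extraction. As a hedged illustration of the kind of computation described, the sketch below assumes a quadratic metric H that penalizes deviation from the demonstrated joint velocities and from the demonstrated hand velocity, H = (q̇ - q̇ᵈ)ᵀ W_q (q̇ - q̇ᵈ) + (ẋ - ẋᵈ)ᵀ W_x (ẋ - ẋᵈ), under the kinematic constraint ẋ = J q̇. Setting the derivatives of the Lagrangian to zero and eliminating the multiplier gives a linear system in q̇. The exact form of H, the weight matrices, and all names are assumptions of this sketch, not the lecture's formulas.

```python
import numpy as np

def optimal_joint_velocity(J, W_q, W_x, dq_demo, dx_demo):
    """Minimize the assumed quadratic metric H subject to dx = J @ dq.

    Stationarity of the Lagrangian yields
        (W_q + J^T W_x J) dq = W_q dq_demo + J^T W_x dx_demo
    """
    A = W_q + J.T @ W_x @ J
    b = W_q @ dq_demo + J.T @ W_x @ dx_demo
    return np.linalg.solve(A, b)

# Hypothetical 2-joint arm with a 1-D task variable.
J = np.array([[1.0, 1.0]])          # assumed Jacobian at the current posture
W_q = np.eye(2)                     # weight on joint-space similarity
W_x = np.array([[1.0]])             # weight on task-space similarity
dq_demo = np.array([0.0, 0.0])      # demonstrated joint velocities
dx_demo = np.array([1.0])           # demonstrated hand velocity

dq = optimal_joint_velocity(J, W_q, W_x, dq_demo, dx_demo)
print(dq, J @ dq)
```

Raising W_x relative to W_q makes the robot track the hand path more closely at the cost of deviating from the demonstrated joint trajectories, and vice versa.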
1st Limitation of the System
Misses the bucket
Dynamic Adaptation of Gesture Reproduction
Dynamical system modulation to be robust to perturbations
(novel context, obstacles, etc)
Dynamical System Modulation
Different initial conditions
Adaptation to sudden target
displacement
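Why a dynamical system copes with different initial conditions and sudden target displacement can be seen in a minimal sketch. It assumes a plain critically damped spring-damper attractor toward the target, integrated with Euler steps. This is a generic stand-in, not the modulated system of the lecture; the function name, `stiffness`, and `dt` are illustrative choices.

```python
import numpy as np

def reach(x0, targets, stiffness=100.0, dt=0.01):
    """Integrate a point attractor x'' = k (g - x) - d x' toward the target.

    `targets` holds the target position at every time step, so the target
    may jump during the motion. Damping is set for critical damping.
    """
    damping = 2.0 * np.sqrt(stiffness)
    x = np.array(x0, dtype=float)
    v = np.zeros_like(x)
    path = [x.copy()]
    for goal in targets:
        acc = stiffness * (np.asarray(goal, dtype=float) - x) - damping * v
        v = v + dt * acc
        x = x + dt * v
        path.append(x.copy())
    return np.array(path)

# The target jumps from (1, 0) to (1, 0.5) halfway through the motion.
targets = [[1.0, 0.0]] * 150 + [[1.0, 0.5]] * 150
path_a = reach([0.0, 0.0], targets)     # one initial condition
path_b = reach([-0.5, 0.8], targets)    # a different initial condition
print(path_a[-1], path_b[-1])
```

Because the motion is regenerated at every step from the current state and the current target, no replanning is needed when either one changes.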
Dynamic Adaptation of Gesture Reproduction
Adaptation to different
contexts
Online adaptation to changes
in the context
Hersch, M., Guenter, F., Calinon, S. and Billard, A. (2006) Learning Dynamical System Modulation for Constrained Reaching Tasks. In Proceedings of the IEEE-RAS International Conference on Humanoid Robots.
2nd Limitation of the System
If the novel situation differs significantly from the demonstrated one, then adapting the demonstrated trajectory is no longer sufficient to satisfy the task.
→ Need to relearn the task: Reinforcement Learning
→ Need to define a new metric: the reward
RL - To Adapt to Novel Situations
Reinforcement learning with the episodic Natural Actor-Critic (NAC) algorithm is applied to learn a new trajectory that overcomes the obstacle.
Gaussian Stochastic Policy
RL - To Adapt to Novel Situations
The robot is rewarded for reaching the target, as well as for staying close to the original demonstrated trajectory.
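A reward with these two ingredients can be sketched as below. The functional form, the weights `c_track` and `c_goal`, and the name `target` for the goal position are assumptions made for illustration; the lecture does not give the exact reward.

```python
import numpy as np

def episode_reward(trajectory, demonstration, target, c_track=1.0, c_goal=5.0):
    """Reward for one reproduction attempt (higher is better, maximum is 0).

    Two penalties are combined: the summed deviation from the demonstrated
    trajectory, and the final distance to the goal position.
    """
    trajectory = np.asarray(trajectory, dtype=float)
    demonstration = np.asarray(demonstration, dtype=float)
    tracking = np.sum(np.linalg.norm(trajectory - demonstration, axis=1))
    goal_distance = np.linalg.norm(trajectory[-1] - np.asarray(target, dtype=float))
    return float(-c_track * tracking - c_goal * goal_distance)

# Hypothetical 2-D example: a straight demonstrated path and a detour
# that bends away (e.g. around an obstacle) but ends at the same point.
demo = np.column_stack([np.linspace(0.0, 1.0, 11), np.zeros(11)])
detour = demo.copy()
detour[:, 1] = 0.2 * np.sin(np.pi * demo[:, 0])
print(episode_reward(demo, demo, demo[-1]), episode_reward(detour, demo, demo[-1]))
```

The relative size of the two weights sets the trade-off between fidelity to the demonstration and task completion.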
RL - To Adapt to Novel Situations
In each case, at least 1 run fails to find a solution
→ Need to run the algorithm several times
→ Time consuming
INCREMENTAL LEARNING
The robot records the position of the objects, the position of the teacher's hands, the joint angles of the teacher's upper-body motion (motion sensors) and/or the joint angles of the robot's upper-body motion (kinesthetics).
6 demonstrations of moving the white Knight to catch the black King
Trajectories of the hand with respect to the first and second object.
Left: superimposed trajectories of each of the 6 demonstrations. Middle: Gaussian Mixture Model. Right: Gaussian Mixture Regression.
SUMMARY
Learning new tasks relies on various means of teaching the robots.
→ Imitation learning is useful insofar as it gives hints as to the optimal solution
→ The robot must, however, rely on generic skills of its own to adapt the demonstration to its own body and to the context
→ Learning of complex skills is overall relatively slow and must proceed incrementally