Learning from Observation
Using Primitives
Darrin Bentivegna
Outline
 Motivation
 Test environments
 Learning from observation
 Learning from practice
 Contributions
 Future directions
Motivation
 Reduce the learning time needed by robots.
 Quickly learn skills from observing others.
 Improve performance through practice.
 Adapt to environment changes.
 Create robots that can interact with and learn
from humans in a human-like way.
Real World Marble Maze
Real World Air Hockey
Research Strategy
 Domain knowledge: library of primitives.
 Manually defining primitives is a natural way to
specify domain knowledge.
 The focus of this research is on how to use a fixed library of primitives.
Marble Maze Primitives
 Roll To Corner
 Roll Off Wall
 Roll From Wall
 Leave Corner
 Guide
Primitives in Air Hockey
 Right Bank Shot
 Left Bank Shot
 Straight Shot
 Static Shot
 Defend Goal
 Idle
Take-Home Message
 Learning with primitives greatly speeds up learning and allows robots to perform more complex tasks.
 Memory-based learning makes learning from observation easy.
 I created a way to do memory-based reinforcement learning.
 The problem is that there is no fixed set of parameters to adjust; learning is done by adjusting the distance function.
 I present algorithms that learn from both observation and practice.
Observe Critical Events in Marble Maze
Raw Data
Wall Contact Inferred
Observe Critical Events in Air Hockey
[Plots: paddle X/Y and puck X/Y positions over time, showing the human paddle movement, the puck movement, and the shots made by the human.]
Learning From Observation
 Memory-based learner: Learn by storing
experiences.
 Primitive selection: K-nearest neighbor.
 Sub-goal generation: Kernel regression
(distance weighted averaging) based on
remembered primitives of the appropriate type.
 Action generation: Learned or fixed policy.
Three Level Structure
Primitive Selection → Sub-goal Generation → Action Generation
Learning from Observation Framework
[Framework diagram: the Learning from Observation module feeds the Primitive Selection, Sub-goal Generation, and Action Generation modules.]
Observe Primitives Performed by a Human
[Scatter plot of observed primitives on the maze board: ◊ Guide, ○ Roll To Corner, □ Roll Off Wall, * Roll From Wall, X Leave Corner.]
Primitive Database
 Create a data point for each observed primitive (see the sketch below).
 The primitive type performed: TYPE.
 State of the environment at the start of the primitive performance: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$.
 State of the environment at the end of the primitive performance: $(EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.
 Each data point pairs the start state $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$ with $(TYPE, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.
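To make the layout of a stored data point concrete, here is a minimal sketch in Python; the class and field names are hypothetical, not taken from the thesis code.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical representation of one observed primitive.
# "start" is the environment state when the primitive began
# (marble position, marble velocity, board angles), and "end"
# is the corresponding state when the primitive finished.
@dataclass
class PrimitiveDataPoint:
    ptype: str                # e.g. "ROLL_TO_CORNER"
    start: Tuple[float, ...]  # (Mx, My, Mx_dot, My_dot, Bx, By)
    end: Tuple[float, ...]    # (EMx, EMy, EMx_dot, EMy_dot, EBx, EBy)

# The primitive database is simply the list of every observed data point.
database: List[PrimitiveDataPoint] = []

def add_observation(ptype, start_state, end_state):
    """Create a data point for one observed primitive performance."""
    database.append(PrimitiveDataPoint(ptype, tuple(start_state), tuple(end_state)))
```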
Marble Maze Example
Primitive Type Selection
 Lookup using the environment state as the query: $q = (M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$.
 Weighted nearest neighbor: $d(x, q) = \sqrt{\sum_j w_j (x_j - q_j)^2}$ over the stored data points $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y) \rightarrow (TYPE, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.
 Many ways to select a primitive type (see the sketch below):
 Use the closest point.
 Use the n nearest points to vote, by highest frequency or weighted by distance from the query point.
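As an illustration of the selection step, a distance-weighted nearest-neighbor vote over the stored data points might look like the sketch below (Python); the helper names and the default k are assumptions, and it reuses the hypothetical PrimitiveDataPoint structure from the earlier sketch.

```python
import math

def weighted_distance(x, q, w):
    """d(x, q) = sqrt(sum_j w_j * (x_j - q_j)^2)."""
    return math.sqrt(sum(wj * (xj - qj) ** 2 for wj, xj, qj in zip(w, x, q)))

def select_primitive_type(database, q, w, k=5):
    """Pick a primitive type by letting the k nearest data points vote,
    each vote weighted by the inverse of its distance to the query."""
    neighbors = sorted(database, key=lambda p: weighted_distance(p.start, q, w))[:k]
    votes = {}
    for p in neighbors:
        d = weighted_distance(p.start, q, w)
        votes[p.ptype] = votes.get(p.ptype, 0.0) + 1.0 / (d + 1e-6)
    return max(votes, key=votes.get)
```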
Sub-goal Generation
 Locally weighted average over nearby primitives (data points) of the same type: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y) \rightarrow (TYPE, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.
 Use a kernel function to control the influence of nearby data points: $K(d) = e^{-d^2}$.
 The predicted sub-goal is the weighted average of the stored end states (see the sketch below):
$\hat{y}(q) = \dfrac{\sum_{i=1}^{n} y_i \, K(d(x_i, q))}{\sum_{i=1}^{n} K(d(x_i, q))}$, where each $y_i$ is an end state $(EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.
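A minimal kernel-regression sketch of this step is shown below (Python). It reuses the hypothetical database and weighted_distance helper from the earlier sketches and assumes at least one stored primitive of the selected type.

```python
import math

def generate_subgoal(database, q, w, ptype):
    """Kernel regression (distance-weighted averaging) over the end states
    of all stored data points of the selected primitive type."""
    points = [p for p in database if p.ptype == ptype]
    if not points:
        return None                      # no data of this type observed yet
    dims = len(points[0].end)
    num = [0.0] * dims
    den = 0.0
    for p in points:
        k = math.exp(-weighted_distance(p.start, q, w) ** 2)   # K(d) = e^{-d^2}
        den += k
        for i, yi in enumerate(p.end):
            num[i] += yi * k
    return [n / den for n in num]        # predicted end state (the sub-goal)
```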
Action Generation
 Provides the action (motor
command) to perform at each
time step.
 LWR, neural networks, physical
model, etc.
Creating an Action Generation
Module (Roll to Corner)
 Record at each time step from the beginning to the end of the primitive:
 Environment state: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y)$.
 Actions taken: $(B_x, B_y)$.
 End state: $(EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$.
[Figure: observed environment states along the marble path from the start to the end of the Roll To Corner primitive, with the incoming velocity vector marked.]
 Each recorded sample pairs $((M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y), (EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y))$ with the action $(B_x, B_y)$.
Transform to a Local Coordinate
Frame
 Global information:
$((M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y), (EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)) \rightarrow (B_x, B_y)$
 Primitive-specific local information (an example transform is sketched below):
$((X, \dot{M}_x, B_x, B_y), (E\dot{M}_x, EB_x, EB_y)) \rightarrow (B_x, B_y)$
[Figure: primitive-specific local coordinate frame with its reference point, +X and +Y axes, and X measuring the distance to the end of the primitive.]
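Purely as an illustration of such a transform, the sketch below (Python) expresses the marble position and velocity in a hypothetical wall-aligned frame; the exact frame definition used in the thesis may differ.

```python
import math

def to_local_frame(marble_xy, marble_vel, reference_point, wall_angle):
    """Express the marble position and velocity in a primitive-specific frame:
    the origin is the reference point and +X points along the wall.
    Returns the signed distance to the reference point along the wall and the
    velocity component in that direction."""
    c, s = math.cos(-wall_angle), math.sin(-wall_angle)
    dx = marble_xy[0] - reference_point[0]
    dy = marble_xy[1] - reference_point[1]
    local_x = c * dx - s * dy                       # distance to the end along the wall
    local_vx = c * marble_vel[0] - s * marble_vel[1]
    return local_x, local_vx
```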
Learning the Maze Task from Only
Observation
Related Research: Primitive
Recognition
 Survey of research in human motion analysis and recognizing human activities from image sequences: Aggarwal and Cai.
 Recognize over time: HMMs (Brand, Oliver, and Pentland); template matching (Davis and Bobick).
 Discover primitives: Fod, Mataric, and Jenkins.
Related Research: Primitive
Selection
 Predefined sequence: virtual characters (Hodgins et al., Faloutsos et al., and Mataric et al.); mobile robots (Balch et al. and Arkin et al.).
 Learn from observation: assembly (Kuniyoshi, Inaba, Inoue, and Kang).
 Use a planning system: assembly (Thomas and Wahl); RL (Ryan and Reid).
Related Research: Primitive
Execution
 Predefined execution policy: virtual characters (Mataric et al. and Hodgins et al.); mobile robots (Brooks et al. and Arkin).
 Learn while operating in the environment: mobile robots (Mahadevan and Connell); RL (Kaelbling, Dietterich, and Sutton et al.).
 Learn from observation: mobile robots (Larson and Voyles, Hugues and Drogoul, Grudic and Lawrence); high-DOF robots (Aboaf et al., Atkeson, and Schaal).
Review
Using Only Observed Data
 Tries to mimic the teacher.
 Cannot always perform primitives as well as the teacher.
 Sometimes selects the wrong primitive type for the observed state.
 Does not know what to do in states it has not observed.
 Has no way to know it should try something different.
 Solution: learning from practice.
Improving Primitive Selection and Sub-goal Generation from Practice
[Framework diagram: Learning from Observation and Learning from Practice modules connected to the Primitive Selection, Sub-goal Generation, and Action Generation modules.]
Improving Primitive Selection and
Sub-goal Generation Through Practice
 Need task specification information to create a
reward function.
 Learn by adjusting the distance to the query: scale the distance function by the value of using a data point (see the sketch below).
$d(x, q) = \sqrt{\sum_j w_j (x_j - q_j)^2}$
$d'(x, q) = d(x, q) \cdot f(x, q)$
 $f$(data point location, query location) is related to the Q value: $1/Q$ or $\exp(-Q)$.
 Associate a scale value with each data point.
 The scale values must be stored and learned.
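One possible realization is sketched below (Python), using the exp(-Q) form mentioned above; storing the per-point values in a plain dictionary is a simplification of the look-up table and LWPR options discussed on the next slide, and it reuses the weighted_distance helper from the earlier sketch.

```python
import math

def scaled_distance(point, q, w, q_values):
    """d'(x, q) = d(x, q) * f(x, q): data points that earned high Q values
    are pulled closer to the query, so they are more likely to be reused."""
    d = weighted_distance(point.start, q, w)
    q_value = q_values.get(id(point), 0.0)   # learned value of using this data point
    return d * math.exp(-q_value)            # f = exp(-Q); 1/Q is the other option
```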
Store Values in Function Approximator
 Look-up table: fixed size.
 Locally Weighted Projection Regression (LWPR), Schaal et al.
 Create a model for each data point, indexed by the difference between the query point and the data point's state (the delta-state).
 Data points: $(M_x, M_y, \dot{M}_x, \dot{M}_y, B_x, B_y) \rightarrow (TYPE, EM_x, EM_y, E\dot{M}_x, E\dot{M}_y, EB_x, EB_y)$
Learn Values Using a Reinforcement
Learning Strategy
$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t) \right]$
 State: the delta-state.
 Action: using this data point.
 Reward assignment (a compact sketch of the update and reward follows below):
 Positive: making progress through the maze.
 Negative: falling into a hole, going backwards through the maze, or taking time performing the primitive.
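A compact sketch of this update and a placeholder reward function are shown below (Python); the learning-rate, discount, and reward constants are illustrative, not the thesis values.

```python
def update_q(q_values, point, reward, max_next_q, alpha=0.1, gamma=0.9):
    """One-step Q-learning update for the value of using a data point:
    Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    key = id(point)
    old = q_values.get(key, 0.0)
    q_values[key] = old + alpha * (reward + gamma * max_next_q - old)

# Example reward assignment (placeholder constants): progress through the
# maze is rewarded, falling into a hole or going backwards is penalized,
# and each time step spent performing the primitive costs a little.
def maze_reward(progress, fell_in_hole, went_backwards, steps):
    r = 1.0 * progress - 0.01 * steps
    if fell_in_hole:
        r -= 10.0
    if went_backwards:
        r -= 1.0
    return r
```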
Learning the Value of Choosing a Data Point (Simulation)
[Figure: computed scale values over the testing area when the LWPR model associated with an observed Roll Off Wall primitive is queried, for two marble positions with the incoming velocity shown; values range from BAD to GOOD.]
Maze Learning from Practice
[Plots (real world and simulation): cumulative failures per meter (0 to 200) versus trial number (0 to 300) for the observation-only (Obs. Only), look-up table (Table), and LWPR learners.]
Learning New Strategies
Human
Learning From Observation
After Learning From Practice
Learning Action Generation from
Practice
Improving Action Generation Through
Practice
 Environment changes over time.
 Need to compensate for structural modeling
error.
 Cannot learn everything from only observing others.
Knowledge for Making a Hit
[Figure: geometry of a hit, showing the path of the incoming puck, the hit line, the hit location, the target line, the target location, and the absolute post-hit velocity.]
 After the hit location has been determined, making the shot requires knowledge of:
 Puck movement.
 Puck-paddle collision.
 Paddle placement.
 Paddle movement timing.
[Diagram: the target location is mapped through the puck motion, impact, and robot models to the outgoing puck velocity, the incoming paddle velocity, and the robot trajectory.]
Results of Learning Straight Shots
(Simulation)
 Observed 44 straight shots made by the human.
 Practiced 100 shots.
 Too much noise in hardware sensing.
[Plot: target error (meters) versus number of shots taken, shown as a running average of 5 shots.]
Robot Model (Real World)
Obtaining Proper Robot Movement
 Six preset robot configurations; interpolate between the four surrounding configurations to compute the joint angles for a paddle position P(x, y).
 Paddle command:
 Desired end location and time of the trajectory, (x, y, t).
 Follows a fifth-order polynomial with zero start and end velocity and acceleration (see the sketch below).
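For reference, the unique fifth-order polynomial that starts and ends at rest has a simple closed form; the sketch below (Python) evaluates such a profile for a paddle command, under the stated assumption of zero velocity and acceleration at both endpoints.

```python
def quintic_point(p0, pf, T, t):
    """Position at time t on a fifth-order polynomial from p0 to pf in time T,
    with zero velocity and acceleration at both ends (minimum-jerk profile)."""
    s = min(max(t / T, 0.0), 1.0)
    blend = 10 * s**3 - 15 * s**4 + 6 * s**5
    return p0 + (pf - p0) * blend

def paddle_trajectory(start_xy, end_xy, T, dt=0.01):
    """Sample the commanded paddle path for an (x, y, t) command."""
    steps = int(T / dt) + 1
    return [(quintic_point(start_xy[0], end_xy[0], T, i * dt),
             quintic_point(start_xy[1], end_xy[1], T, i * dt))
            for i in range(steps)]
```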
Robot Model
[Diagram: the desired state of the puck at hit time is converted to a movement command (x, y, t), subject to a pre-set time delay, and passed to the trajectory generator, which produces the robot trajectory.]
[Plot: commanded paddle trajectory in the x-y plane (meters), from the starting location.]
Robot Movement Errors
[Plot: desired trajectory and observed path of the paddle in the x-y plane (meters), from the starting location, marking the desired hit location and the location of highest paddle velocity.]
 Movement accuracy is determined by many factors:
 Speed of the movement.
 Friction between the paddle and the board.
 Hydraulic pressure applied to the robot.
 Operating within the designed performance parameters.
Robot Model
 Learn to properly place the paddle.
 Learn the timing of the paddle.
 Observe its own actions:
 Actual hit point (the point of highest paddle velocity).
 Time from when the command is given to when the paddle is observed at the hit position (a simplified sketch follows below).
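As a simplified stand-in for the LWPR timing model described on the next slide, the sketch below (Python) keeps a running estimate of the command-to-hit delay from the robot's own observations; the class and its constants are hypothetical and ignore any dependence on the commanded (x, y, t).

```python
class DelayEstimator:
    """Running estimate of the delay between issuing a paddle command and
    the paddle actually reaching the hit position (point of highest velocity)."""
    def __init__(self, initial_delay):
        self.delay = initial_delay      # start from the pre-set time delay

    def observe(self, command_time, observed_hit_time, rate=0.2):
        """Blend each newly measured delay into the running estimate."""
        measured = observed_hit_time - command_time
        self.delay += rate * (measured - self.delay)

    def command_time_for_hit(self, desired_hit_time):
        """Issue the command early enough to arrive at the desired hit time."""
        return desired_hit_time - self.delay
```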
Improving the Robot Model
[Diagram: the desired state of the puck at hit time is converted to a movement command (x, y, t); a Robot Movement LWPR model corrects the command, and timing information from the LWPR model replaces the pre-set time delay before the robot trajectory is generated.]
Using the Improved Robot Model
[Plot: desired trajectory and observed path of the paddle from the starting location, marking the desired hit location and the location of highest paddle velocity.]
Real-World Air Hockey
Major Contributions
 A framework has been created as a tool in which to perform research in learning from observation using primitives.
 Its flexible structure allows for the use of various learning algorithms.
 It can also learn from practice.
 Presented learning methods that learn quickly from observed information and also have the ability to increase performance through practice.
 Created a unique algorithm that gives a robot the ability to learn the effectiveness of data points in a database and then use that information to change its behavior as it operates in the environment.
 Presented a method of breaking the learning problem into small learning modules.
 Individual modules have more opportunities to learn and generalize.
Some Future Directions
 Automatically defining primitive types.
 Explore how to represent learned information so it can
be used in other tasks/environments.
 Can robots learn about the world from playing these games?
 Explore other ways to select primitives and sub-goals.
 Use the observed information to create a planner.
 Investigate methods of exploration at primitive
selection and sub-goal generation.
 Simultaneously learn primitive selection and action
generation.