torrey.ecml06.ppt

Skill Acquisition via Transfer Learning and Advice Taking
Lisa Torrey, Jude Shavlik, Trevor Walker
University of Wisconsin-Madison, USA
Richard Maclin
University of Minnesota-Duluth, USA
Transfer Learning
Agent learns Task A (the source)
Agent encounters related Task B (the target)
Agent discovers how tasks are related (so far, the user provides this info to the agent)
Agent uses knowledge from Task A to learn Task B faster
Transfer Learning
The goal for the target task:
[Figure: performance vs. training, with transfer and without transfer]
Reinforcement Learning Overview
Observe world state (described by a set of features)
Take an action
Receive a reward
Use the rewards to estimate the Q-values of actions in states
Policy: choose the action with the highest Q-value in the current state
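The observe/act/reward/estimate cycle above can be sketched with one-step tabular Q-learning. This is an assumption for illustration only: the slides describe states by feature vectors, and the Q-functions in this work are learned with KBKR, not stored in a lookup table. The names `q_update` and `greedy_action` and the parameters `alpha` (step size) and `gamma` (discount) are hypothetical.

```python
from collections import defaultdict

def greedy_action(q, state, actions):
    """Policy: choose the action with the highest Q-value in the current state."""
    return max(actions, key=lambda a: q[(state, a)])

def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Use the reward to move Q(state, action) toward reward + gamma * max Q(next_state, .)."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])

# Hypothetical two-action example: unseen (state, action) pairs start at Q = 0.
q = defaultdict(float)
actions = ["hold", "pass"]
q_update(q, "s0", "pass", reward=1.0, next_state="s1", actions=actions)
print(q[("s0", "pass")])                  # 0.1
print(greedy_action(q, "s0", actions))    # pass
```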
Transfer in Reinforcement Learning

What knowledge will we transfer from the source?
  Q-functions (Taylor & Stone 2005)
  Policies (Torrey et al. 2005)
  Skills (this work)

How will we extract that knowledge from the source?
  From Q-functions (Torrey et al. 2005)
  From observed behavior (this work)

How will we apply that knowledge in the target?
  Model reuse (Taylor & Stone 2005)
  Advice taking (Torrey et al. 2005, this work)
Advice Taking

Advice: instructions for the learner
  IF: condition (in these states)
  THEN: prefer action (Q_action1 > Q_action2)

Apply advice as soft constraints (KBKR, 2005)
  For each action, find the Q-function that minimizes:
  Complexity of Q-function + Error on training data + Disagreement with advice
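The three-term objective above can be made concrete for a linear Q-function Q(s) = w·s + b. This is a schematic evaluation, not KBKR itself: KBKR optimizes over whole advice regions, whereas this sketch only scores given weights against a few sampled states that satisfy the advice condition. The L1 complexity term, the absolute training error, the hinge-style disagreement penalty, and the weights `lam` and `mu` are all assumptions.

```python
def dot(w, s):
    return sum(wi * si for wi, si in zip(w, s))

def soft_advice_objective(w, b, data, advice_states, q_other, lam=1.0, mu=1.0):
    """Schematic objective for one action's linear Q-function Q(s) = w.s + b.

    data:          list of (state_features, target_q) training pairs
    advice_states: sampled states where the advice's IF condition holds
    q_other:       function giving the competing action's Q-value in a state
    """
    complexity = sum(abs(wi) for wi in w)                      # complexity of Q-function
    error = sum(abs(dot(w, s) + b - y) for s, y in data)        # error on training data
    # Advice asks for Q_this > Q_other; penalize (softly) by how much it is violated.
    disagreement = sum(max(0.0, q_other(s) - (dot(w, s) + b)) for s in advice_states)
    return complexity + lam * error + mu * disagreement

def q_other(s):
    return 3.0                      # hypothetical competing action's Q-value

w, b = [1.0], 0.0
data = [([2.0], 2.5)]               # one (state features, target Q) pair
advice_states = [[1.0]]             # one state where the IF part holds
print(soft_advice_objective(w, b, data, advice_states, q_other))  # 3.5
```

Here the total is 1.0 (complexity) + 0.5 (training error) + 2.0 (advice violated by 3.0 - 1.0); raising `mu` makes the learner weigh the advice more heavily relative to the data.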
Experimental Domain: RoboCup
KeepAway (KA/MKA): keep the ball (Stone & Sutton 2001)
MoveDownfield (MD): cross the line (Torrey et al. 2006)
BreakAway (BA): score a goal (Maclin et al. 2005)
Different objectives, but a transferable skill: passing to teammates
A Challenge for Skill Transfer

Shared skills are not exactly the same
Skills have general and specific aspects
Aspects of the pass skill in RoboCup:
  General: teammate must be open
  Game-specific: where teammate should be located
  Player-specific: whether teammate is nearest or furthest
Teammate callouts:
  “I’m open and far from you. Pass to me!”
  “I’m open and near the goal. Pass to me!”
Addressing the Challenge

We focus on learning general skill aspects
  These should transfer better
We learn skills that apply to multiple players
  This generalizes over player-specific aspects
We allow humans to provide information
  They can point out game-specific aspects
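Human-Provided Information

User provides a mapping to show task similarities
May also provide user advice about task differences
[Figure: mapped source skills (Pass, Ø, Ø) beside user advice (Pass towards goal, Move towards goal, Shoot at goal); Ø marks a target action with no mapped source skill]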
Our Transfer Algorithm
Observe source-task games to learn skills
Translate learned skills into transfer advice
Create advice for the target task
  If there is user advice, add it in
Learn target task with KBKR
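The middle steps of the algorithm can be outlined as plain data transformations. In this sketch, learned skills are hypothetical entries mapping a source action to its rule-body conditions, the user-provided mapping names the target action for each source skill, and the final KBKR step is left out; the function names and data layout are illustrative assumptions.

```python
def skills_to_advice(skills, mapping):
    """Translate learned source skills into target-task advice via the user's mapping.

    skills:  dict source_action -> list of rule-body conditions
    mapping: dict source_action -> target_action
    """
    advice = []
    for source_action, conditions in skills.items():
        target_action = mapping.get(source_action)
        if target_action is None:      # skill has no counterpart in the target task
            continue
        advice.append((list(conditions), target_action))
    return advice

def create_target_advice(skills, mapping, user_advice=()):
    """Combine transfer advice with any user advice; the result would be given to KBKR."""
    advice = skills_to_advice(skills, mapping)
    advice.extend(user_advice)         # if there is user advice, add it in
    return advice

skills = {"pass(Teammate)": ["distBetween(me, Teammate) > 14"]}
mapping = {"pass(Teammate)": "pass(Teammate)"}
user_advice = [(["distBetween(me,goal) > 10"], "move_ahead")]
for conditions, action in create_target_advice(skills, mapping, user_advice):
    print("IF:", " AND ".join(conditions), "THEN: prefer", action)
```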
Learning Skills By Observation
Source-task games are sequences of (state, action) pairs
Learning skills is like learning to classify states by their correct actions
We use Inductive Logic Programming (ILP) to learn classifiers
Example, State 1:
  distBetween(me,teammate2) = 15
  distBetween(me,teammate1) = 10
  distBetween(me,opponent1) = 5
  ...
  action = pass(teammate2)
  outcome = caught(teammate2)
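One way to hand an observed state to an ILP system is as ground facts keyed by a state identifier. The fact layout below (state id as the first argument, feature value as the last) and the `state_to_facts` helper are assumptions for illustration, not the format used in this work; the values are those of State 1 above.

```python
def state_to_facts(state_id, features, action, outcome):
    """Write one observed (state, action) pair as Prolog-style ground facts."""
    facts = []
    for (predicate, args), value in features.items():
        facts.append(f"{predicate}({state_id},{','.join(args)},{value}).")
    facts.append(f"action({state_id},{action}).")
    facts.append(f"outcome({state_id},{outcome}).")
    return facts

state1 = {
    ("distBetween", ("me", "teammate2")): 15,
    ("distBetween", ("me", "teammate1")): 10,
    ("distBetween", ("me", "opponent1")): 5,
}
for fact in state_to_facts("state1", state1, "pass(teammate2)", "caught(teammate2)"):
    print(fact)
```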
Advantages of ILP

Can produce first-order rules for skills
  pass(Teammate) vs. pass(teammate1), …, pass(teammateN)
Capture only the essential aspects of the skill
  We expect these aspects to transfer better
Can incorporate background knowledge
Preparing Datasets for ILP
action = pass(Teammate)?
  yes: outcome = caught(Teammate)? Q(pass) is high? Q(pass) is highest?
    all yes → positive example for pass(Teammate)
    any no → reject example
  no: Q(other) is high? Q(pass) is lower?
    both yes → negative example for pass(Teammate)
    any no → reject example
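The labeling procedure can be written as a small function. What counts as a "high" Q-value is not specified on the slide, so the fixed `high` threshold is an assumption, as is the dictionary representation of Q-values and the hypothetical outcome name `intercepted` used in the tests.

```python
def label_pass_example(action, outcome, q, teammate, high=0.5):
    """Label one state as a positive, negative, or rejected example for pass(Teammate).

    q:    dict mapping each action name to its Q-value in this state
    high: hypothetical threshold for a "high" Q-value
    """
    pass_action = f"pass({teammate})"
    if action == pass_action:
        caught = outcome == f"caught({teammate})"
        if caught and q[pass_action] >= high and q[pass_action] == max(q.values()):
            return "positive"
        return "reject"
    # Another action was taken: negative only if it looked good and pass looked worse.
    if q[action] >= high and q[pass_action] < q[action]:
        return "negative"
    return "reject"

print(label_pass_example("pass(teammate2)", "caught(teammate2)",
                         {"pass(teammate2)": 0.8, "hold": 0.3}, "teammate2"))  # positive
print(label_pass_example("hold", None,
                         {"pass(teammate2)": 0.2, "hold": 0.7}, "teammate2"))  # negative
```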
Example of a Skill Learned
pass(Teammate) :- distBetween(me, Teammate) > 14,
                  passAngle(Teammate) > 30,
                  passAngle(Teammate) < 150,
                  distBetween(me, Opponent) < 7.
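Because the rule is first-order, one clause covers every teammate: it can be checked against each candidate in turn, with `Opponent` satisfied if any opponent is close enough. The state layout and the `passAngle` values below are hypothetical (State 1 on the earlier slide gives only distances).

```python
def pass_rule(state, teammate, opponents):
    """pass(Teammate) :- distBetween(me, Teammate) > 14,
    passAngle(Teammate) > 30, passAngle(Teammate) < 150,
    distBetween(me, Opponent) < 7.   (some Opponent must satisfy the last test)"""
    dist, angle = state["distBetween"], state["passAngle"]
    return (dist[("me", teammate)] > 14
            and 30 < angle[teammate] < 150
            and any(dist[("me", o)] < 7 for o in opponents))

state = {
    "distBetween": {("me", "teammate1"): 10, ("me", "teammate2"): 15,
                    ("me", "opponent1"): 5},
    "passAngle": {"teammate1": 90, "teammate2": 60},   # hypothetical values
}
print([t for t in ("teammate1", "teammate2")
       if pass_rule(state, t, ["opponent1"])])        # ['teammate2']
```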
Technical Challenges

KBKR requires propositional advice
  We instantiate each rule head
Variables in rule bodies create disjunctions
  We use tile features to translate them
Variables can appear multiple times
  We create new features to translate them
Two Experimental Scenarios
4-on-3 MKA → 3-on-2 BA: mapped source skills Pass, Ø, Ø
3-on-2 MD → 3-on-2 BA: mapped source skills Pass, MoveAhead, Ø
User advice: Pass towards goal, Move towards goal, Shoot at goal
3-on-2 BA actions: Pass, MoveAhead, Shoot at goal
Skill Transfer Results
[Figure: probability of goal (0 to 0.6) vs. games played (0 to 5000); curves: From MKA, From MD, Without transfer]
Breakdown of MKA Results
[Figure: probability of goal (0 to 0.6) vs. games played (0 to 5000); curves: all advice, transfer advice only, user advice only, no advice]
What if User Advice is Bad?
[Figure: probability of goal (0 to 0.6) vs. games played (0 to 5000); curves: Transfer with good advice, Transfer with bad advice, Bad advice only, No advice]
Related Work

Q-function transfer in RoboCup
  Taylor & Stone (AAMAS 2005, AAAI 2005)
Transfer via policy reuse
  Fernandez & Veloso (AAMAS 2006, ICML workshop 2006)
  Madden & Howley (AI Review 2004)
Transfer via relational RL
  Driessens et al. (ICML workshop 2006)
Summary of Contributions

Transfer of shared skills in high-level logic
  Despite differences in shared skills
Demonstration of the value of user guidance
  Easy to give and beneficial
Effective transfer in the RoboCup domain
  Challenging and dissimilar tasks
Future Work

Learn more general skills by combining multiple source tasks
Compare several transfer methods on RoboCup scenarios of varying difficulty
Reach similar levels of transfer with less user input
Acknowledgements

DARPA Grant HR0011-04-1-0007
US Naval Research Laboratory Grant N00173-06-1-G002
Thank You
User Advice
IF: distBetween(me,goal) < 10 AND angle(goal, me, goalie) > 40
THEN: prefer shoot
IF: distBetween(me,goal) > 10
THEN: prefer move_ahead
IF: [transferred conditions] AND distBetween(Teammate,goal) < distBetween(me,goal)
THEN: prefer pass(Teammate)
([transferred conditions] is the part that came from transfer)
Feature Tiling
[Figure: an original feature's range, from min value to max value, covered by Tilings #1 through #11, with tile counts of 16 tiles and 8 tiles labeled]
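A single tiling can be sketched as equal-width intervals over a feature's range, each producing a Boolean tile feature. The equal-width scheme, the half-open interval convention, the 0 to 28 range, and the choice of 4 tiles are assumptions for illustration; the figure only shows that several tilings with different tile counts cover the range from min value to max value.

```python
def tile_features(name, value, lo, hi, n_tiles):
    """One tiling: split [lo, hi] into n_tiles equal-width tiles.

    Returns {"name[a,b]": 0 or 1}; a value inside [lo, hi] sets exactly one tile to 1.
    Tiles are half-open [a, b), except the last, which also includes hi.
    """
    width = (hi - lo) / n_tiles
    features = {}
    for i in range(n_tiles):
        a, b = lo + i * width, lo + (i + 1) * width
        inside = a <= value < b or (i == n_tiles - 1 and value == hi)
        features[f"{name}[{a:g},{b:g}]"] = int(inside)
    return features

print(tile_features("distBetween(me,opponent1)", 5, 0, 28, 4))
```

Combining several tilings with different tile counts, as the figure shows, gives a wider choice of tile boundaries, so a threshold such as "< 7" is more likely to line up with some tile.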
Propositionalizing Rules

Step 1: rule head
pass(Teammate) :- distBetween(me, Teammate) > 14, …
becomes
pass(teammate1) :- distBetween(me, teammate1) > 14, …
…
pass(teammateN) :- distBetween(me, teammateN) > 14, …
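Instantiating the rule head amounts to substituting each player constant for the head variable throughout the clause. The regex-based `instantiate_head` below is a hypothetical text-level sketch; a real system would substitute over parsed terms rather than strings.

```python
import re

def instantiate_head(rule, variable, constants):
    """Return one ground copy of `rule` per constant, replacing every whole-word `variable`."""
    pattern = re.compile(rf"\b{re.escape(variable)}\b")
    return [pattern.sub(constant, rule) for constant in constants]

rule = "pass(Teammate) :- distBetween(me, Teammate) > 14"
for ground in instantiate_head(rule, "Teammate", ["teammate1", "teammate2"]):
    print(ground)
```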
Propositionalizing Rules

Step 2: single-variable disjunctions
distBetween(me, Opponent) < 7
becomes
distBetween(me,opponent1) < 7 OR … OR distBetween(me,opponentN) < 7
which, with tile features, becomes
distBetween(me,opponent1)[0,7] + … + distBetween(me,opponentN)[0,7] ≥ 1
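The rewrite works because each tile feature is 0 or 1, so a sum of tile features is at least 1 exactly when at least one disjunct holds. The check below assumes the [0,7] tile means 0 ≤ d < 7 (distances are non-negative, so this matches d < 7); that boundary convention and the helper names are assumptions.

```python
def tile_0_7(d):
    """Tile feature distBetween(me, opponentI)[0,7]: 1 if 0 <= d < 7, else 0."""
    return 1 if 0 <= d < 7 else 0

def close_opponent_disjunction(dists):
    """distBetween(me,opponent1) < 7 OR ... OR distBetween(me,opponentN) < 7."""
    return any(d < 7 for d in dists)

def close_opponent_tile_sum(dists):
    """distBetween(me,opponent1)[0,7] + ... + distBetween(me,opponentN)[0,7] >= 1."""
    return sum(tile_0_7(d) for d in dists) >= 1

for dists in ([12, 5, 20], [12, 9], [3, 4]):
    assert close_opponent_disjunction(dists) == close_opponent_tile_sum(dists)
    print(dists, close_opponent_tile_sum(dists))
```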
Propositionalizing Rules

Step 3: linked-variable disjunctions
distBetween(me, Player) > 14, distBetween(Player, goal) < 10
becomes
newFeature(player1) + … + newFeature(playerN) ≥ 1
Add to target task feature space:
newFeature(Player) :- Dist1 is distBetween(me, Player),
                      Dist2 is distBetween(Player, goal),
                      Dist1 > 14, Dist2 < 10.
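The new feature is needed because separate tile sums would lose the link between the two conditions: one player could be far from me while a different player is near the goal. The sketch contrasts the two encodings on such a case; the distances and helper names are hypothetical.

```python
def new_feature(dist_me_player, dist_player_goal):
    """newFeature(Player): 1 if distBetween(me, Player) > 14 and distBetween(Player, goal) < 10."""
    return int(dist_me_player > 14 and dist_player_goal < 10)

def linked_encoding(players):
    """newFeature(player1) + ... + newFeature(playerN) >= 1 (same Player in both conditions)."""
    return sum(new_feature(d1, d2) for d1, d2 in players.values()) >= 1

def unlinked_encoding(players):
    """Incorrect: two independent disjunctions, which may be satisfied by different players."""
    far = sum(int(d1 > 14) for d1, _ in players.values()) >= 1
    near_goal = sum(int(d2 < 10) for _, d2 in players.values()) >= 1
    return far and near_goal

# player1 is far from me but also far from the goal; player2 is near the goal but close to me.
players = {"player1": (20, 30), "player2": (8, 5)}
print(linked_encoding(players), unlinked_encoding(players))   # False True
```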