torrey.ecml06.ppt

Skill Acquisition via Transfer Learning and Advice Taking
Lisa Torrey, Jude Shavlik, Trevor Walker (University of Wisconsin-Madison, USA)
Richard Maclin (University of Minnesota-Duluth, USA)

Transfer Learning
  Agent learns Task A (the source).
  Agent encounters a related Task B (the target).
  Agent discovers how the tasks are related. So far, the user provides this information to the agent.
  Agent uses knowledge from Task A to learn Task B faster.

Transfer Learning Performance
  [Figure: the goal for the target task; performance vs. training, with transfer vs. without transfer]

Reinforcement Learning Overview
  Observe the world state, described by a set of features.
  Take an action.
  Receive a reward.
  Use the rewards to estimate the Q-values of actions in states.
  Policy: choose the action with the highest Q-value in the current state.

Transfer in Reinforcement Learning
  What knowledge will we transfer from the source?
    Q-functions (Taylor & Stone 2005)
    Policies (Torrey et al. 2005)
    Skills (this work)
  How will we extract that knowledge from the source?
    From Q-functions (Torrey et al. 2005)
    From observed behavior (this work)
  How will we apply that knowledge in the target?
    Model reuse (Taylor & Stone 2005)
    Advice taking (Torrey et al. 2005, this work)

Advice Taking
  Advice: instructions for the learner
    IF: condition (in these states)
    THEN: prefer action (Q(action1) > Q(action2))
  Apply advice as soft constraints (KBKR, 2005).
  For each action, find the Q-function that minimizes:
    complexity of Q-function + error on training data + disagreement with advice

Experimental Domain: RoboCup
  KeepAway (KA/MKA): keep the ball (Stone & Sutton 2001)
  MoveDownfield (MD): cross the line (Torrey et al. 2006)
  BreakAway (BA): score a goal (Maclin et al. 2005)
  Different objectives, but a transferable skill: passing to teammates

A Challenge for Skill Transfer
  Shared skills are not exactly the same.
  Skills have general and specific aspects.
  Aspects of the pass skill in RoboCup:
    General: the teammate must be open.
    Game-specific: where the teammate should be located.
    Player-specific: whether the teammate is nearest or furthest.
  [Figure: two teammates calling for the ball: "I'm open and far from you. Pass to me!" and "I'm open and near the goal. Pass to me!"]

Addressing the Challenge
  We focus on learning general skill aspects; these should transfer better.
  We learn skills that apply to multiple players; this generalizes over player-specific aspects.
  We allow humans to provide information; they can point out game-specific aspects.

Human-Provided Information
  The user provides a mapping to show task similarities.
  The user may also provide advice about task differences.
  [Figure: source-task actions Pass, Ø, Ø mapped to the target task, with user advice: pass towards goal, move towards goal, shoot at goal]

Our Transfer Algorithm
  Observe source-task games to learn skills.
  Create advice for the target task:
    Translate learned skills into transfer advice.
    If there is user advice, add it in.
  Learn the target task with KBKR.

Learning Skills by Observation
  Source-task games are sequences of (state, action) pairs. Example:
    State 1:
      distBetween(me,teammate2) = 15
      distBetween(me,teammate1) = 10
      distBetween(me,opponent1) = 5
      ...
      action = pass(teammate2)
      outcome = caught(teammate2)
  Learning skills is like learning to classify states by their correct actions.
  We use Inductive Logic Programming (ILP) to learn the classifiers.

Advantages of ILP
  Can produce first-order rules for skills: one rule pass(Teammate) vs. separate rules pass(teammate1) . . . pass(teammateN).
  Captures only the essential aspects of the skill; we expect these aspects to transfer better.
  Can incorporate background knowledge.

Preparing Datasets for ILP
  [Flowchart: labeling a state for pass(Teammate)]
  If action = pass(Teammate): when outcome = caught(Teammate), Q(pass) is high, and Q(pass) is highest, the state is a positive example for pass(Teammate); otherwise it is rejected.
  Otherwise: when Q(other) is high and Q(pass) is lower, the state is a negative example for pass(Teammate); otherwise it is rejected.

Example of a Skill Learned
  pass(Teammate) :-
      distBetween(me, Teammate) > 14,
      passAngle(Teammate) > 30,
      passAngle(Teammate) < 150,
      distBetween(me, Opponent) < 7.

Technical Challenges
  KBKR requires propositional advice: we instantiate each rule head.
  Variables in rule bodies create disjunctions: we use tile features to translate them.
  Variables can appear multiple times: we create new features to translate them.

Two Experimental Scenarios
  [Figure: 4-on-3 MKA (Pass, Ø, Ø) → 3-on-2 BA; 3-on-2 MD (Pass, MoveAhead, Ø) → 3-on-2 BA (Pass, MoveAhead, Shoot at goal); user advice: pass towards goal, move towards goal, shoot at goal]

Skill Transfer Results
  [Figure: probability of goal (0 to 0.6) vs. games played (0 to 5000); curves: from MKA, from MD, without transfer]

Breakdown of MKA Results
  [Figure: probability of goal vs. games played; curves: all advice, transfer advice only, user advice only, no advice]

What if User Advice is Bad?
  [Figure: probability of goal vs. games played; curves: transfer with good advice, transfer with bad advice, bad advice only, no advice]

Related Work
  Q-function transfer in RoboCup: Taylor & Stone (AAMAS 2005, AAAI 2005)
  Transfer via policy reuse: Fernandez & Veloso (AAMAS 2006, ICML workshop 2006); Madden & Howley (AI Review 2004)
  Transfer via relational RL: Driessens et al. (ICML workshop 2006)

Summary of Contributions
  Transfer of shared skills in high-level logic, despite differences in shared skills
  Demonstration of the value of user guidance: easy to give and beneficial
  Effective transfer in the RoboCup domain, on challenging and dissimilar tasks

Future Work
  Learn more general skills by combining multiple source tasks.
  Compare several transfer methods on RoboCup scenarios of varying difficulty.
  Reach similar levels of transfer with less user input.

Acknowledgements
  DARPA Grant HR0011-04-1-0007
  US Naval Research Laboratory Grant N00173-06-1-G002

Thank You

User Advice
  IF: distBetween(me,goal) < 10 AND angle(goal, me, goalie) > 40
  THEN: prefer shoot
  IF: distBetween(me,goal) > 10
  THEN: prefer move_ahead
  IF: [transferred conditions] AND distBetween(Teammate,goal) < distBetween(me,goal)
  THEN: prefer pass(Teammate)
  (The [transferred conditions] part came from transfer.)

Feature Tiling
  [Figure: an original feature's range (min value to max value) covered by Tiling #1 through Tiling #11; tile counts shown include 16 tiles and 8 tiles]

Propositionalizing Rules, Step 1: rule head
  pass(Teammate) :- distBetween(me, Teammate) > 14, …
  becomes
  pass(teammate1) :- distBetween(me, teammate1) > 14, …
  …
  pass(teammateN) :- distBetween(me, teammateN) > 14, …

Propositionalizing Rules, Step 2: single-variable disjunctions
  distBetween(me, Opponent) < 7
  becomes
  distBetween(me,opponent1) < 7 OR … OR distBetween(me,opponentN) < 7
  which, using tile features, is expressed as
  distBetween(me,opponent1)[0,7] + … + distBetween(me,opponentN)[0,7] ≥ 1

Propositionalizing Rules, Step 3: linked-variable disjunctions
  distBetween(me, Player) > 14, distBetween(Player, goal) < 10
  becomes
  newFeature(player1) + … + newFeature(playerN) ≥ 1
  where this new feature is added to the target-task feature space:
  newFeature(Player) :-
      Dist1 is distBetween(me, Player),
      Dist2 is distBetween(Player, goal),
      Dist1 > 14,
      Dist2 < 10.
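The Reinforcement Learning Overview slide describes the observe/act/reward loop and a greedy policy over Q-values. A minimal sketch of one common way to "use the rewards to estimate the Q-values" is tabular one-step Q-learning. This is a swapped-in, generic technique, not the paper's learner (the slides say the target task is learned with KBKR), and the action names, state names, `alpha`, and `gamma` below are hypothetical.

```python
from collections import defaultdict

# Hypothetical action set; the slides do not list the learner's actions here.
ACTIONS = ["pass", "move_ahead", "shoot"]


def greedy_action(q, state):
    """Policy from the slide: pick the action with the highest Q-value in this state."""
    return max(ACTIONS, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """One-step Q-learning: nudge Q(s, a) toward r + gamma * max over a' of Q(s', a')."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    target = reward + gamma * best_next
    q[(state, action)] += alpha * (target - q[(state, action)])


q = defaultdict(float)  # unseen (state, action) pairs start at Q = 0
q_update(q, "near_goal", "shoot", reward=1.0, next_state="end")
print(greedy_action(q, "near_goal"))  # shoot
```

A table keyed by (state, action) only works for tiny, discrete state spaces; RoboCup states are described by continuous features, which is why the talk fits a Q-function per action instead.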
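The Advice Taking slide says that, for each action, the learner finds the Q-function minimizing complexity + training error + disagreement with advice, with advice treated as soft constraints. The sketch below only *evaluates* that three-term objective for one action's linear Q-function. It is a simplified, hypothetical stand-in, not the actual KBKR formulation: advice is checked on a few sampled states where the IF-condition holds, the competing action's Q-values are held fixed, and the weights `C` and `mu` are assumptions.

```python
def dot(w, s):
    """Linear Q-value estimate: w . s."""
    return sum(wi * si for wi, si in zip(w, s))


def advice_objective(w, states, q_targets, advice_states, q_other, C=1.0, mu=1.0):
    """Three-term objective for ONE action's linear Q-function Q(s) = w . s.

    states, q_targets: training states and estimated Q-values for this action
    advice_states: sample states where the advice IF-condition holds
    q_other: Q-values of the competing action in those states (held fixed)
    C, mu: hypothetical weights trading off the three terms
    """
    complexity = sum(abs(wi) for wi in w)  # L1 norm of the weights
    error = sum(abs(dot(w, s) - y) for s, y in zip(states, q_targets))
    # Soft constraint: advice says this action's Q should be at least Q_other here.
    # Only the amount of violation is penalized, so data can outweigh bad advice.
    disagreement = sum(max(0.0, qo - dot(w, s)) for s, qo in zip(advice_states, q_other))
    return complexity + C * error + mu * disagreement


w = [0.5, -0.2]
value = advice_objective(
    w,
    states=[[1.0, 0.0], [0.0, 1.0]],
    q_targets=[0.5, 0.0],
    advice_states=[[1.0, 1.0]],
    q_other=[0.6],
)
print(value)  # about 1.2: complexity 0.7 + error 0.2 + disagreement 0.3
```

Because violations are penalized rather than forbidden, the learner can override advice that conflicts with experience, which fits the slides' "What if User Advice is Bad?" experiment.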
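The Preparing Datasets for ILP flowchart labels each observed source-task state as a positive example, a negative example, or a rejected example for pass(Teammate). A minimal sketch, assuming the reading of the flowchart given on that slide and treating each teammate separately: the `Step` record, its field names, and the `high` threshold are hypothetical, since the slides do not say what counts as a "high" Q-value.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One observed source-task step (field names are illustrative)."""
    action: str   # e.g. "pass(teammate2)"
    outcome: str  # e.g. "caught(teammate2)"
    q: dict       # learned Q-value for each available action in this state


def label_for_pass(step, teammate, high):
    """Label a state as 'positive', 'negative', or 'reject' for pass(teammate).

    `high` is a hypothetical threshold for "Q is high"; the slides do not give one.
    """
    pass_action = f"pass({teammate})"
    q_pass = step.q[pass_action]
    if step.action == pass_action:
        caught = step.outcome == f"caught({teammate})"
        highest = q_pass >= max(step.q.values())
        return "positive" if caught and q_pass > high and highest else "reject"
    q_other = step.q[step.action]  # Q-value of the action actually taken
    return "negative" if q_other > high and q_pass < q_other else "reject"


step = Step("pass(teammate2)", "caught(teammate2)",
            {"pass(teammate1)": 0.2, "pass(teammate2)": 0.8, "move_ahead": 0.4})
print(label_for_pass(step, "teammate2", high=0.5))  # positive
print(label_for_pass(step, "teammate1", high=0.5))  # negative
```

Rejecting ambiguous states (a failed pass, or a taken action whose Q-value was not high) keeps the ILP training set to states where the source-task learner's behavior is a trustworthy signal.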
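The Example of a Skill Learned slide gives a first-order rule for pass(Teammate), and Step 1 of Propositionalizing Rules instantiates its head once per teammate. A minimal sketch of both, assuming a state stored as a dictionary keyed by feature tuples (a hypothetical representation). The distances come from State 1 on the Learning Skills by Observation slide; the pass angles are hypothetical because the slides do not give them.

```python
def pass_rule(state, teammate, opponents):
    """Learned rule from the slides, checked for one binding of Teammate.

    pass(Teammate) :- distBetween(me, Teammate) > 14,
                      passAngle(Teammate) > 30, passAngle(Teammate) < 150,
                      distBetween(me, Opponent) < 7.
    """
    return (
        state[("distBetween", "me", teammate)] > 14
        and 30 < state[("passAngle", teammate)] < 150
        # Opponent appears only in the body, so it is existential: any opponent will do.
        and any(state[("distBetween", "me", o)] < 7 for o in opponents)
    )


def instantiate_rule_heads(state, teammates, opponents):
    """Step 1: replace the head variable with each concrete teammate."""
    return {f"pass({t})": pass_rule(state, t, opponents) for t in teammates}


state1 = {
    ("distBetween", "me", "teammate2"): 15,
    ("distBetween", "me", "teammate1"): 10,
    ("distBetween", "me", "opponent1"): 5,
    ("passAngle", "teammate1"): 45,  # hypothetical: the slides do not give pass angles
    ("passAngle", "teammate2"): 90,  # hypothetical
}
print(instantiate_rule_heads(state1, ["teammate1", "teammate2"], ["opponent1"]))
# {'pass(teammate1)': False, 'pass(teammate2)': True}
```

The `any(...)` over opponents is exactly the single-variable disjunction that Step 2 must still remove before KBKR can use the rule.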
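The User Advice slide lists three IF/THEN rules for BreakAway. A minimal sketch that reports which of those preferences are active in a given state. The state dictionary layout and the `transferred` hook (standing in for the [transferred conditions] learned from the source task) are hypothetical; the default hook accepts every teammate.

```python
def preferred_actions(state, teammates, transferred=lambda s, t: True):
    """Return the actions the User Advice slide prefers in this state.

    `transferred` is a hypothetical hook for the [transferred conditions]
    learned from the source task; the default placeholder accepts every teammate.
    """
    prefs = []
    d_goal = state[("distBetween", "me", "goal")]
    if d_goal < 10 and state[("angle", "goal", "me", "goalie")] > 40:
        prefs.append("shoot")
    if d_goal > 10:
        prefs.append("move_ahead")
    for t in teammates:
        # Pass only to teammates closer to the goal than me (game-specific advice).
        if transferred(state, t) and state[("distBetween", t, "goal")] < d_goal:
            prefs.append(f"pass({t})")
    return prefs


state = {
    ("distBetween", "me", "goal"): 15,
    ("angle", "goal", "me", "goalie"): 30,
    ("distBetween", "teammate1", "goal"): 8,
    ("distBetween", "teammate2", "goal"): 20,
}
print(preferred_actions(state, ["teammate1", "teammate2"]))
# ['move_ahead', 'pass(teammate1)']
```

In the approach described by the slides, each active rule becomes a soft preference constraint for KBKR rather than a hard choice; this function only shows which rules fire.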
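The Feature Tiling slide shows one original numeric feature, spanning a min value to a max value, replaced by several tilings of different resolutions (the figure mentions 16-tile and 8-tile tilings). A minimal sketch, assuming equal-width tiles over [lo, hi] with no offsets; the slides do not give exact tile boundaries, so `lo`, `hi`, and the tile counts below are illustrative.

```python
def tiling(value, lo, hi, n_tiles):
    """One tiling: n_tiles equal-width bins over [lo, hi]; exactly one bin is 1."""
    width = (hi - lo) / n_tiles
    idx = int((value - lo) / width)
    idx = min(max(idx, 0), n_tiles - 1)  # clamp out-of-range values (and value == hi)
    return [1 if i == idx else 0 for i in range(n_tiles)]


def tile_features(value, lo, hi, tile_counts):
    """Replace one numeric feature by the concatenation of several tilings."""
    features = []
    for n in tile_counts:
        features.extend(tiling(value, lo, hi, n))
    return features


feats = tile_features(5.0, lo=0.0, hi=20.0, tile_counts=[16, 8])
print(len(feats), sum(feats))  # 24 2
```

Each tile is a 0/1 feature, so a threshold condition such as distBetween(me, Opponent) < 7 can be written as a linear expression over tiles, which is what a linear learner like KBKR needs.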
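Steps 2 and 3 of Propositionalizing Rules turn disjunctions over a variable into sums of 0/1 features that must be at least 1. A minimal sketch of both, using the same hypothetical dictionary state layout. The slides write the tile feature as [0,7]; here it is half-open, [0, 7), to match the strict condition < 7. The player names and distances are hypothetical.

```python
def in_range(value, lo, hi):
    """Binary tile-style feature, e.g. distBetween(me,opponent1)[0,7]; half-open here."""
    return 1 if lo <= value < hi else 0


def step2_holds(state, opponents):
    # distBetween(me,opponent1)[0,7] + ... + distBetween(me,opponentN)[0,7] >= 1
    return sum(in_range(state[("distBetween", "me", o)], 0, 7) for o in opponents) >= 1


def new_feature(state, player):
    # newFeature(Player) :- Dist1 > 14, Dist2 < 10, both for the SAME Player
    dist1 = state[("distBetween", "me", player)]
    dist2 = state[("distBetween", player, "goal")]
    return 1 if dist1 > 14 and dist2 < 10 else 0


def step3_holds(state, players):
    # newFeature(player1) + ... + newFeature(playerN) >= 1
    return sum(new_feature(state, p) for p in players) >= 1


state = {
    ("distBetween", "me", "opponent1"): 5,
    ("distBetween", "me", "teammate1"): 20, ("distBetween", "teammate1", "goal"): 15,
    ("distBetween", "me", "teammate2"): 8, ("distBetween", "teammate2", "goal"): 5,
}
players = ["teammate1", "teammate2"]
print(step2_holds(state, ["opponent1"]))  # True
print(step3_holds(state, players))        # False
```

This state shows why Step 3 needs a new feature: teammate1 satisfies Dist1 > 14 and teammate2 satisfies Dist2 < 10, so two independent sums would both be satisfied, yet no single player meets both conditions of the linked rule.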
