Learning through Interactive Behavior Specifications

Tolga Konik, CSLI, Stanford University
Douglas Pearson, Three Penny Software
John Laird, University of Michigan

Goal
- Automatically generate cognitive agents
- Reduce the cost of agent development
- Reduce the expertise required to develop agents

Domains
- Autonomous cognitive agents
- Dynamic virtual worlds
- Real-time decisions based on knowledge and sensed data
- Soar agent architecture

Learning by Observation
Approach:
- Observe expert behavior
- Learn to replicate it
Why?
- We may want human-like agents
- In complex domains, imitating humans may be easier than learning from scratch

Bottleneck in Pure Learning by Observation
PROBLEM: You cannot observe the internal reasoning of the expert
SOLUTION:
- Ask the expert for additional information
  - Goal annotations
- Use additional knowledge sources
  - Task and domain knowledge

Learning by Observation
[Diagram labels: Expert, Goal annotations, Interface, Actions, Environment, Percepts, Learner, Agent, Additional Task Knowledge]

Learning by Observation
[Diagram labels: Agent, Interface, Environment]
ILP 2004; Machine Learning Journal (forthcoming)

Learning by Observation
Critic mode
[Diagram labels: Interface, Agent, Environment, Critic, Learner, Expert]

One Body, Two Minds
[Diagram labels: Interface, Agent, Expert, Environment]
- How and when to switch control?
- How do the expert and the agent program communicate?

Diagrammatic Behavior Specification
[Diagram labels: Expert, Environment, Redux, Agent, Learner]

Redux
- Visual rule editing
- Diagrammatic behavior specification

Goal Hierarchy
[Map labels: rooms r1, r2, r3, r4; doors d1 to d6; items i3, i4]
Goals shown: Get-item(Item), Get-item-in-room(Item), Get-item-different-room(Item), Goto-next-room, Go-to-door(D), Go-to(Door), Go-through(Door)
Task-performance knowledge is represented with a hierarchy of durative goals.
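
The goal hierarchy named on the Goal Hierarchy slide can be pictured as a small tree of parameterized goals. The following is a minimal Python sketch, not the authors' Soar representation. The parent/child nesting is an assumption inferred from the order of the slide labels, and `Goal`, `instantiate`, and `active_stack` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A parameterized goal with optional subgoals (hypothetical structure)."""
    name: str
    params: tuple = ()
    subgoals: list = field(default_factory=list)

    def instantiate(self, bindings):
        """Render the goal with variables replaced by bound constants."""
        if not self.params:
            return self.name
        args = ",".join(bindings.get(p, p) for p in self.params)
        return f"{self.name}({args})"


# ASSUMPTION: this nesting is inferred from the slide labels, not stated in the source.
go_to = Goal("Go-to", ("Door",))
go_through = Goal("Go-through", ("Door",))
in_room = Goal("Get-item-in-room", ("Item",))
diff_room = Goal("Get-item-different-room", ("Item",), [go_to, go_through])
get_item = Goal("Get-item", ("Item",), [in_room, diff_room])


def active_stack(path, bindings):
    """Return the instantiated goals along a root-to-leaf path.

    Durative goals stay active while their subgoals are pursued, so the
    whole path is active at once, not just the leaf.
    """
    for parent, child in zip(path, path[1:]):
        if child not in parent.subgoals:
            raise ValueError(f"{child.name} is not a subgoal of {parent.name}")
    return [g.instantiate(bindings) for g in path]


stack = active_stack([get_item, diff_room, go_to], {"Item": "i3", "Door": "d1"})
print(stack)
```

With the bindings Item=i3 and Door=d1 that appear on the later Goal Hierarchy slides, the active stack reads Get-item(i3), Get-item-different-room(i3), Go-to(d1).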

Goal Hierarchy
[Map labels: rooms r1, r2, r3, r4; doors d1 to d6; items i3, i4]
Binding: Item=i3
Goals shown: Get-item(i3), Get-item-in-room(Item), Get-item-in-room(i3), Get-item-different-room(Item), Goto-next-room, Go-to-door(D), Go-to(Door), Go-through(Door)

Goal Hierarchy
[Map labels: rooms r1, r2, r3, r4; doors d1 to d6; items i3, i4]
Bindings: Item=i3, Door=d1
Goals shown: Get-item(i3), Get-item-in-room(Item), Get-item-different-room(Item), Get-item-different-room(i3), Go-to(Door), Go-to(d1), Go-through(Door)

Goal Hierarchy
[Map labels: rooms r1, r2, r3, r4; doors d1 to d6; items i3, i4]
Binding: Door=d1
Goals shown: Get-item(i3), Get-item-in-room(Item), Goto-next-room, Get-item-different-room(i3), Go-to-door(D), Go-to(Door), Go-through(d1)

Behavior Specification
[Diagram labels: Expert, Agent]
- Expert draws the initial abstract situation
- Create a scenario by selecting actions

Goal Specification
[Diagram labels: Expert, Agent]
- Goals are explicitly selected
- The agent contributes based on the current situation, the current goal, and its knowledge

Goal Hierarchy
Learning perspective (learning by observation):
- Unobservable mental reasoning of the expert
- Biases the hypothesis space
- The "learn agent" problem is reduced to "learn goal selection and termination"
Mixed-initiative (MI) perspective:
- Information exchange between the expert and the agent

Relevant Knowledge Specification
[Diagram labels: Expert, Agent; example goal: Prepare food]
- Expert can mark important objects in a decision

Rich Behavior Trace
[Example goal: Watch TV]
- Expert-specified undesired actions and goals
- Expert-rejected actions and goals of the approximately learned agent program

Rich Behavior Trace
- Hypothetical actions and goals
- Situation history: a tree structure of possible behaviors

Relational Learning by Observation
Input:
- Relational situations
- Goal and action selections and rejections
- Additional annotations (i.e. important objects)
- Background knowledge
Output:
- Rule-based agent program
Approach:
- Learn goal/action selection and termination by generalizing over multiple examples
- Inductive Logic Programming to combine rich knowledge structures

Relational Learning by Observation
- Find the common structures in the decision examples

Relational Learning by Observation
- Learn relations between what the agent wants, perceives, and knows
- Example: "Select a door in the current room, which leads to a room that contains the item the agent wants to get"

Summary
- Diagrammatic behavior specification approach:
  - To extract rich behavior knowledge
- Interactive behavior specification
  - Communication medium between the agents (explicit goals and assumed situation)
- Relational learning by observation approach to combine multiple complex knowledge sources

Future Work
- Improve mixed-initiative interaction of the interface
- Explore domain-independent diagrammatic interface features
- Allow the expert to enter context-sensitive knowledge

Mixed-Initiative Perspective
- Interactive behavior specification
- Diagrammatic representation of behavior
  - Communication medium between the agents
  - Explicit goals and desired behavior
  - Facilitates interaction between the agents
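
The door-selection rule quoted on the Relational Learning by Observation slides relates what the agent wants, perceives, and knows. As a rough illustration, the sketch below evaluates that one rule over a relational situation in plain Python. It does not reproduce the ILP learner or the authors' rule syntax. The relation names (`current_room`, `door_in`, `contains`, `wants`), the function `select_doors`, and the map topology are all assumptions made for the example; the actual room layout from the slide figure is not recoverable.

```python
# Hypothetical relational situation: each relation is a set of tuples.
# ASSUMPTION: the topology below is invented for illustration only.
situation = {
    "current_room": {("r1",)},
    # door_in(Door, Room): the door is on the boundary of the room
    "door_in": {("d1", "r1"), ("d1", "r3"), ("d2", "r1"), ("d2", "r2")},
    # contains(Room, Item)
    "contains": {("r3", "i3"), ("r2", "i4")},
    # wants(Item): the item bound by the active Get-item goal
    "wants": {("i3",)},
}


def select_doors(s):
    """Doors in the current room that lead to a room containing a wanted item.

    Written as a conjunctive query, roughly:
      select(Door) :- current_room(R), door_in(Door, R),
                      door_in(Door, R2), R2 != R,
                      contains(R2, Item), wants(Item).
    """
    matches = {
        door
        for (room,) in s["current_room"]
        for (door, r) in s["door_in"] if r == room
        for (door2, other) in s["door_in"] if door2 == door and other != room
        for (r2, item) in s["contains"] if r2 == other
        if (item,) in s["wants"]
    }
    return sorted(matches)


print(select_doors(situation))
```

Under these assumed facts the query selects d1, because d1 is the only door of r1 that leads to the room holding i3. A learner in the style described on the slides would have to induce such a conjunction from multiple annotated decision examples rather than have it written by hand.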