Option and Constraint Generation using Work Domain Analysis
Presenter: Guliz Tokadli
Dr. Karen Feigh

Outline: Introduction; Interactive Machine Learning; Reinforcement Learning; Work Domain Analysis; Micro-world of Pac-Man

Introduction
Goal: To develop a method that exploits experienced humans’ knowledge to improve machine learning algorithms, using work domain analysis techniques from cognitive engineering.
Current practice: Using primitives, options, and constraints that are derived by the programmer or automatically, or inferring them from human coaching or demonstration.
How our approach is new: It provides a systematic way to mine a human’s knowledge about a domain and to translate it into a hierarchical goal structure.
How we will know we have succeeded: We will be able to generate better machine learning algorithms than current practice, in a shorter period of time and with less expert intervention.

Interactive Machine Learning
Our Objective: Robots and other machines that can learn from people unfamiliar with machine learning algorithms.
Reinforcement Learning is a method to generate policies for an agent tasked with making decisions. The agent learns an action policy, a mapping from states to actions, that maximizes the reward it receives.
(Figure: From / To)

Reinforcement Learning – Option & Constraint
GOAL is the ultimate purpose to achieve.
CONSTRAINTS are the set of all state-action pairs that the agent should not take.
- Don’t do X.
- Don’t move onto an unscared ghost.
OPTIONS generalize primitive actions to include temporally extended courses of action. An option is a triple O = <I, π, β>, where I is the initiation set (the states in which the option can start), π is the option’s policy, and β is its termination condition.
PRIMITIVES are the set of fundamental actions the agent can take, such as Up, Right, Left, and Down.
(5) A. J. Irani, J. A. Rosalia, K. Subramanian, C. L. Isbell Jr., and A. L. Thomaz, “Using both positive and negative instructions from humans to decompose reinforcement learning problems,” Georgia Institute of Technology, Tech. Rep., 2014.
(6) R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction.
The MIT Press, 2012.

Work Domain Analysis
Method: Abstraction Hierarchy (AH)
The AH is a 5-level functional decomposition used for modeling complex sociotechnical systems. The system is described at different levels of abstraction linked by a “Why-What-How” relationship: for any level (What), the level above explains why it exists and the level below explains how it is achieved.
Levels of the AH, their meaning for Pac-Man, and examples:
- Functional Purposes (FP): the goals of Pac-Man. Example: Staying Alive.
- Abstract Functions (AF): the criteria required to judge whether Pac-Man is achieving its purposes. Example: Avoid Ghosts.
- Generalized Functions (GF): the functions required to accomplish Pac-Man’s abstract functions. Example: Maneuver Pac-Man.
- Physical Functions (PF): the limitations and capabilities of the system. Example: Distance to Nearest Ghost.
- Physical Objects (PO): the objects and characters in the maze. Example: Ghosts, Pac-Man, Walls.
(1) N. Naikar, R. Hopcroft, and A. Moylan, “Work domain analysis: Theoretical concepts and methodology,” pp. 5–35, 2005.
(2) N. Naikar, Work Domain Analysis: Concepts, Guidelines, and Cases. CRC Press, 2013.
(3) J. Rasmussen, Information Processing and Human-Machine Interaction: An Approach to Cognitive Engineering. Elsevier Science, 1986.
(4) N. A. Stanton, P. M. Salmon, G. H. Walker, C. Baber, and D. P. Jenkins, Human Factors Methods, Ch. 4: Cognitive Task Analysis Methods, 2005.

Micro-World of Pac-Man
Pac-Man is a classic arcade game, released in 1980.
Why we chose Pac-Man:
- It helps to understand the research problem.
- It helps to build the AH map linking the goals.
- It provides insight into RL policies.
- It has been used in many ML and RL studies, which allows comparison of the results.
- It makes it quick and convenient to find what is needed.
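To make the Why-What-How relationship concrete, here is a minimal Python sketch that stores an abstraction hierarchy as statements joined by means-end links between adjacent levels. It is an illustration only, not the authors’ tooling: the class `AHNode` and the functions `link`, `why`, and `how` are hypothetical names, and the example chain assumes one Pac-Man example per level, with “Maneuver Pac-Man” as the Generalized Functions example and “Ghosts” standing in for the Physical Objects level.

```python
from dataclasses import dataclass, field
from typing import List

# The five AH levels, ordered from most abstract (index 0) to most concrete (4).
LEVELS = [
    "Functional Purposes",
    "Abstract Functions",
    "Generalized Functions",
    "Physical Functions",
    "Physical Objects",
]


@dataclass(eq=False)
class AHNode:
    """One statement in a hypothetical abstraction hierarchy encoding."""
    label: str
    level: int  # index into LEVELS
    # repr=False avoids infinite recursion, since links point both ways.
    means: List["AHNode"] = field(default_factory=list, repr=False)  # below: How
    ends: List["AHNode"] = field(default_factory=list, repr=False)   # above: Why


def link(end: AHNode, mean: AHNode) -> None:
    """Add a means-end link; such links only join adjacent levels."""
    if mean.level != end.level + 1:
        raise ValueError("means-end links must join adjacent AH levels")
    end.means.append(mean)
    mean.ends.append(end)


def why(node: AHNode) -> List[str]:
    """'Why does this exist?' is answered one level up."""
    return [n.label for n in node.ends]


def how(node: AHNode) -> List[str]:
    """'How is this achieved?' is answered one level down."""
    return [n.label for n in node.means]


# Example chain built from the Pac-Man examples, one statement per level.
labels = ["Staying Alive", "Avoid Ghosts", "Maneuver Pac-Man",
          "Distance to Nearest Ghost", "Ghosts"]
chain = [AHNode(text, level) for level, text in enumerate(labels)]
for upper, lower in zip(chain, chain[1:]):
    link(upper, lower)

avoid = chain[1]
print(LEVELS[avoid.level], "| why:", why(avoid), "| how:", how(avoid))
```

Running it prints the level of “Avoid Ghosts” (Abstract Functions) together with its Why neighbour (Staying Alive) and its How neighbour (Maneuver Pac-Man). A real AH has many statements per level and many-to-many links; the lists in `means` and `ends` already allow that.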
Results of Pac-Man Study
Phases: Familiarization & Interview Phase; Modeling Phase; Option & Constraint Set Generation Phase

Pac-Man Study - Method
- Familiarization Phase: 16 participants played Pac-Man for 10 minutes.
- Interview Phase: researchers interviewed the participants using a structured interview script designed to generate abstraction hierarchies.
- Modeling Phase: researchers created an abstraction hierarchy for each player, and composite abstraction hierarchies for high-performing and low-performing players.
- Option & Constraint Set Generation Phase: researchers translated the composite abstraction hierarchies into sets of options and constraints.

Participant Performance
(Figure: High Performers vs. Low Performers)

Modeling Phase - Procedure
1. Creation of an individual AH for each player
2. Harmonization of the statements in the AHs
3. Combination of the AHs of high performers and of low performers into performance-based AHs

WDA: Performance-Based AH
Aggregated AH: the low-performer and high-performer AHs are represented in a single AH.
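The bookkeeping behind the three Modeling Phase steps and the aggregated AH can be sketched in Python as below. In the study these steps were carried out by the researchers; the code only illustrates one possible representation, assuming each player’s AH is a mapping from a level abbreviation to a set of statements. The synonym table `SYNONYMS`, the functions `harmonize`, `combine`, and `aggregate`, and the tiny player AHs are all hypothetical illustrations, not study data.

```python
from collections import Counter
from typing import Dict, List, Set

AH = Dict[str, Set[str]]  # level abbreviation (e.g. "FP") -> statements

# Hypothetical harmonization table: a player's own wording -> canonical statement.
SYNONYMS = {
    "don't die": "Staying Alive",
    "run away from ghosts": "Avoid Ghosts",
    "stay away from ghosts": "Avoid Ghosts",
}


def harmonize(ah: AH) -> AH:
    """Step 2: rewrite each statement of one player's AH into canonical wording."""
    return {level: {SYNONYMS.get(s.lower(), s) for s in stmts}
            for level, stmts in ah.items()}


def combine(ahs: List[AH]) -> Dict[str, Counter]:
    """Step 3: merge one performance group's AHs into a composite AH,
    keeping how many players gave each statement."""
    composite: Dict[str, Counter] = {}
    for ah in ahs:
        for level, stmts in ah.items():
            composite.setdefault(level, Counter()).update(stmts)
    return composite


def aggregate(groups: Dict[str, Dict[str, Counter]]) -> Dict[str, Dict[str, Set[str]]]:
    """Represent several composite AHs as a single AH, tagging each
    statement with the performance groups that mention it."""
    merged: Dict[str, Dict[str, Set[str]]] = {}
    for group, composite in groups.items():
        for level, counts in composite.items():
            for stmt in counts:
                merged.setdefault(level, {}).setdefault(stmt, set()).add(group)
    return merged


# Step 1, with hypothetical inputs: one AH per player.
high_players = [
    {"FP": {"Don't die"}, "AF": {"Run away from ghosts"}},
    {"FP": {"Staying Alive"}, "AF": {"Stay away from ghosts"}},
]
low_players = [{"FP": {"Staying Alive"}, "AF": {"Avoid Ghosts"}}]

high = combine([harmonize(ah) for ah in high_players])
low = combine([harmonize(ah) for ah in low_players])
single_ah = aggregate({"high": high, "low": low})

print(high["AF"])                                 # Counter({'Avoid Ghosts': 2})
print(sorted(single_ah["AF"]["Avoid Ghosts"]))    # ['high', 'low']
```

Keeping per-statement counts in the composite AH makes it easy to see which statements most players in a group share, while the aggregated view shows which statements are unique to one group.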
WDA: Performance-Based AH
High Performers:
- Player perspective: ‘Competitor’, aiming for maximum score in minimum time
- Both defensive and offensive actions
- A dual level of protection for Pac-Man, through proactive and reactive strategies
- ‘Clearing current quadrant’ used as a tactic
- Quick observation so as to act within a minimum time interval
Low Performers:
- Player perspective: ‘Exploratory’, learning tricks
- Only defensive actions
- ‘Clearing current quadrant’ used as a strategy
- Observation of the quadrant used for ‘learning tricks’

Generation of Option and Constraint Sets
(Figure: High Performer-Defined OC Set; Low Performer-Defined OC Set)

High Performer-Defined OC Set Creation

Low Performer-Defined OC Set Creation

Conclusion
Our Goal: to develop a method to improve machine learning algorithms based on human experience and knowledge, using Work Domain Analysis.
Implemented: option and constraint sets were implemented separately on RL algorithms, and the evaluation was able to show the differences between performers.
Now: we are able to generate option and constraint sets for different levels of performance.
Will work: to auto-generate option and constraint sets using state-of-the-art methods.
(Figure: Low Performers vs. High Performers)

Thank you! Questions and Comments?
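The option and constraint definitions from the Reinforcement Learning slide, which the generated OC sets are built from, can be illustrated with a minimal Python sketch on a grid version of Pac-Man. Only the four primitives, the constraint “Don’t move onto unscared ghost”, and the option form O = <I, π, β> come from the slides. The `State` layout, the Euclidean distances that ignore walls, the danger radius `RADIUS = 3`, and the initiation, policy, and termination of the “Avoid Ghosts” option are hypothetical choices, not the option and constraint sets the study generated.

```python
import math
from dataclasses import dataclass
from typing import Callable, FrozenSet, List, Tuple

Pos = Tuple[int, int]  # (x, y) grid cell; y grows downward

# PRIMITIVES: the four fundamental actions and their grid displacements.
PRIMITIVES = {"Up": (0, -1), "Down": (0, 1), "Left": (-1, 0), "Right": (1, 0)}


@dataclass(frozen=True)
class State:
    pacman: Pos
    ghosts: FrozenSet[Pos]  # assumed non-empty
    scared: bool            # True while the ghosts are edible
    walls: FrozenSet[Pos] = frozenset()


def step(pos: Pos, action: str) -> Pos:
    dx, dy = PRIMITIVES[action]
    return (pos[0] + dx, pos[1] + dy)


# CONSTRAINT: a predicate marking (state, action) pairs the agent must not take.
Constraint = Callable[[State, str], bool]


def onto_unscared_ghost(s: State, a: str) -> bool:
    """Don't move onto an unscared ghost."""
    return not s.scared and step(s.pacman, a) in s.ghosts


def allowed(s: State, constraints: List[Constraint]) -> List[str]:
    """Primitives that avoid walls and violate none of the constraints."""
    return [a for a in PRIMITIVES
            if step(s.pacman, a) not in s.walls
            and not any(c(s, a) for c in constraints)]


def nearest_ghost(p: Pos, ghosts: FrozenSet[Pos]) -> float:
    """Straight-line distance to the closest ghost (walls ignored)."""
    return min(math.dist(p, g) for g in ghosts)


@dataclass(frozen=True)
class Option:
    """An option O = <I, pi, beta>."""
    name: str
    initiation: Callable[[State], bool]    # I: may the option start here?
    policy: Callable[[State], str]         # pi: which primitive to take
    termination: Callable[[State], float]  # beta: probability of stopping


RADIUS = 3.0  # hypothetical danger radius


def flee(s: State) -> str:
    """Pick the allowed primitive that ends farthest from the nearest ghost.
    Assumes at least one primitive is allowed."""
    moves = allowed(s, [onto_unscared_ghost])
    return max(moves, key=lambda a: nearest_ghost(step(s.pacman, a), s.ghosts))


avoid_ghosts = Option(
    name="Avoid Ghosts",
    initiation=lambda s: nearest_ghost(s.pacman, s.ghosts) <= RADIUS,
    policy=flee,
    termination=lambda s: 1.0 if nearest_ghost(s.pacman, s.ghosts) > RADIUS else 0.0,
)

s = State(pacman=(2, 2), ghosts=frozenset({(3, 2)}), scared=False)
print(allowed(s, [onto_unscared_ghost]))  # ['Up', 'Down', 'Left']
print(avoid_ghosts.initiation(s), avoid_ghosts.policy(s))  # True Left
```

With a ghost directly to the right, the constraint removes “Right”, and the option’s policy moves Pac-Man directly away. A high performer-defined set could sit an offensive option (for example, one that chases scared ghosts) next to this defensive one, echoing the slides’ observation that high performers used both kinds of actions; such an option would be equally hypothetical.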