Beyond Chunking: Learning in Soar
March 22, 2003
John E. Laird
Shelley Nason, Andrew Nuxoll and a cast of many others
University of Michigan

Research Methodology in Cognitive Architecture
1. Pick basic principles to guide development
2. Pick desired behavioral capabilities
3. Make design decisions consistent with the above
4. Build/modify the architecture
5. Implement tasks
6. Evaluate performance

Soar Basic Principle: Knowledge vs. Problem Search
• Knowledge search
  • Finds knowledge relevant to the current situation
  • Architectural: not subject to change with new knowledge
  • Not combinatorial or generative
• Problem search
  • Controlled by knowledge; arises from a lack of knowledge
  • Subject to improvement with additional knowledge
  • Generative: combinatorial

Desired Behavioral Capabilities
• Interact with a complex world through limited, uncertain sensing
• Respond quickly to changes in the world
• Use extensive knowledge
• Use methods appropriate for tasks
• Goal-driven
• Meta-level reasoning and planning
• Generate human-like behavior
• Coordinate behavior and communicate with others
• Learn from experience
• Integrate the above capabilities across tasks
• Behavior generated with low computational expense

Example Tasks
• R1-Soar
• NL-Soar ("The horse raced past the barn fell")
• Amber
• EPIC-Soar
• TacAir-Soar & RWA-Soar
• Soar Hauntbot
• Soar Quakebot
• Soar MOUTbot

Soar 101
Input → Propose Operator → Compare Operators → Select Operator → Apply Operator → Output
(Rules in production memory match against working memory.)
• Propose operator: If the cell in direction <d> is not a wall --> propose operator move <d>
• Compare operators:
  • If operator <o1> will move to an empty cell --> operator <o1> <
  • If operator <o1> will move to a bonus food and operator <o2> will move to a normal food --> operator <o1> > <o2>
• Resulting preferences (diagram): North > East, South > East, North = South
• Select operator: move-direction North
• Apply operator: If an operator is selected to move <d> --> create output move-direction <d>

Soar 102: Subgoals
Input → Propose Operator → Compare Operators → Tie Impasse (East, North, South) → Select Operator → Apply Operator → Output
• The tie impasse creates a subgoal that evaluates each candidate by look-ahead:
  • Evaluate-operator (North) = 10
  • Evaluate-operator (South) = 10
  • Evaluate-operator (East) = 5
• The evaluations yield North > East, South > East, North = South; North is selected
• Chunking creates rules that create preferences, based on what was tested in the subgoal
• Chunking creates a rule that applies evaluate-operator

Learning Results [plot: Score vs. Decisions; curves: random, look-ahead no chunk, look-ahead during chunking, look-ahead after chunking]

Soar 102: Dynamic Task Decomposition
• Example path through the operator hierarchy: Execute Mission → Intercept → Employ Weapons → Launch Missile → Lock IR
• Other operators in the diagram: Fly-Wing, Fly-route, Ground Attack, Execute Tactic, Achieve Proximity, Search, Scram, Get Missile LAR, Select Missile, Get Steering Circle, Sort Group, Lock Radar, Fire-Missile, Wait-for Missile-Clear
• Example rules:
  • If instructed to intercept an enemy, then propose intercept
  • If intercepting an enemy and the enemy is within range and the ROE are met, then propose employ-weapons
  • If employing weapons and a missile has been selected and the enemy is in the steering circle and the LAR has been achieved, then propose launch-missile
  • If launching a missile and it is an IR missile and there is currently no IR lock, then propose lock-IR
• Scale: >250 goals, >600 operators, >8000 rules

Chunking
• Simple architectural learning mechanism
• Automatically builds rules that summarize/cache processing
• Converts deliberate reasoning/planning to reaction
  • Problem search => knowledge search
• Problem solving in subgoals determines what is learned
  • Supports deliberate/reflective learning
  • Leads to many different types of learning strategies
  • If reasoning is inductive, so is learning

Why Beyond Chunking?
• Chunking requires deliberate processing (operators) to
  • record experiences
  • capture statistical regularities
  • learn new concepts (data chunking)
• This processing is done only because we want the learning, not because it helps perform the task
• Learning competes with the task at hand
• Hard to implement, hard to use
• Are there other architectural learning mechanisms?
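The Soar 101/102 slides can be illustrated with a minimal Python sketch, assuming a simplified Eaters-style world: proposal filters out walls, a tie among several candidates is resolved by a one-step look-ahead "subgoal", and the subgoal's result is cached as a "chunk" keyed only on the features the subgoal tested. The point values (normal food 5, bonus food 10), the class `Agent`, and its methods are hypothetical. Real Soar chunking builds production rules from the dependencies of subgoal results rather than filling a dictionary, so this is only an analogy.

```python
from dataclasses import dataclass, field

# Hypothetical point values for Eaters-style cells (assumption, not from the slides).
CELL_VALUE = {"empty": 0, "normal": 5, "bonus": 10}
DIRECTIONS = ("north", "south", "east", "west")


@dataclass
class Agent:
    chunks: dict = field(default_factory=dict)  # tested features -> evaluations
    subgoals: int = 0  # how many times a tie impasse required look-ahead

    def propose(self, neighbors):
        """Propose move <d> for every direction whose cell is not a wall."""
        return [d for d in DIRECTIONS if neighbors.get(d, "wall") != "wall"]

    def evaluate_in_subgoal(self, neighbors, candidates):
        """Tie impasse: evaluate each candidate with a one-step look-ahead."""
        self.subgoals += 1
        return {d: CELL_VALUE[neighbors[d]] for d in candidates}

    def decide(self, neighbors):
        candidates = self.propose(neighbors)
        if len(candidates) <= 1:  # nothing to choose between, so no impasse
            return candidates[0] if candidates else None
        # Only the features the subgoal actually tests become the chunk's conditions.
        tested = tuple(sorted((d, neighbors[d]) for d in candidates))
        evaluations = self.chunks.get(tested)
        if evaluations is None:
            evaluations = self.evaluate_in_subgoal(neighbors, candidates)
            self.chunks[tested] = evaluations  # "chunk": cache the subgoal's result
        # Highest evaluation wins; equal values are broken by DIRECTIONS order
        # (a simplification; Soar picks randomly among indifferent operators).
        return max(candidates, key=lambda d: (evaluations[d], -DIRECTIONS.index(d)))
```

With bonus food to the north and south and normal food to the east, the look-ahead gives North = 10, South = 10, East = 5 and selects North. A second identical decision reuses the cached chunk instead of entering the subgoal, which is the "problem search => knowledge search" conversion in miniature.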
Episodic Learning [Andrew Nuxoll]
• What is it?
  • Not facts or procedures but memories of specific events
  • Recording and recalling of experiences with the world
• Characteristics of episodic memory
  • Autobiographical
  • Not confused with the original experience
  • Runs forward in time
  • Temporally annotated
• Why add it to the Soar architecture?
  • Not appropriate as reflective learning
  • Provides personal history and identity
  • Memories can aid future decision making & learning
  • Can generalize and analyze when time and more knowledge are available

Episodic Learning
• When is a memory recorded?
  • Fixed period of time
  • "Significant" event
  • Significant change in the most highly activated working memory elements
• What are the cues for retrieval?
  • Everything
  • Only input
  • Most "activated" input / everything
  • Domain-specific features
• Is retrieval automatic or deliberate?
• What is retrieved?
  • Changes to input
  • Changes to working memory
  • Changes to activated working memory elements
• How is the memory stored?
  • As a production rule
• What's missing
  • A sense of when the episode occurred
  • The current implementation is not task independent

Episodic Recall Implementation
Input → Propose Operator → Compare Operators → Tie Impasse (East, North, South) → Evaluate-operator (North) → Select Operator → Apply Operator → Output
• If a memory matches, evaluate-operator computes the correct next state
• If no memory matches, it returns a default evaluation (3)
• Diagram values: North = 10; default = 3

Two Approaches
1. On-line
  • Build memories as actions are taken
  • Attempt to recall memories during look-ahead
  • Chunk the use of memories during look-ahead
2. Off-line
  • Randomly explore while memories are recorded
  • Off-line, attempt to recall and learn from the recorded memories
  • Chunk the use of memories during look-ahead

On-line Episodic Learning [plot: Score vs. Decisions; curves: greedy, random, epmem chunk 1 to 5, epmem 1, epmem 2, leonard]

On-line Episodic Learning [plot: Score vs. Actions; curves: greedy, epmem chunked iter 4, epmem chunked iter 5, epmem, random]

Off-line Episodic Learning [plot: Score vs. Decisions; curves: greedy, epmem reflect iter 1 to 5, epmem chunked iter 5, random, leonard]

Reinforcement Learning [Shelley Nason]
• Why add it to Soar?
  • Might capture statistical regularities automatically/architecturally
  • Chunking can do this only via deliberate learning
• Why Soar?
  • Potential to integrate RL with a complex problem solver
  • Quantifiers, hierarchy, …
• How can RL fit into Soar?
  • Learn rules that create numeric probabilistic preferences for operators
  • Used only when symbolic preferences are inconclusive
  • Decision based on all preferences that are recalled for an operator
• Why is this going to be cool?
  • Dynamically compute Q-values based on all rules that match the state
  • Get transfer at different levels of generality

Example Numeric Preferences
• Candidate operators: East, North, South
• Six matching rules give North the numeric preferences 8, 12, 15, 1, 2, and 10
• Combined value: North = 48/6 = 8

Reinforcement Learning
• Diagram: in State A, North = 10; in State B, the proposed operators have East = 6, North = 11, South = 3
• Create a rule that creates a numeric preference for North in State A, using the values in State B and max(proposed operators), according to standard RL
• Conditions of rule?
  • Current: all of the state
  • Future: what was tested to produce the evaluation of State B but existed in State A

Reinforcement Learning Results [plot: Score vs. Actions; curves: Greedy, Learned, Learning 1 to 3, Random]

Architectural Learning
• Automatic & ubiquitous
• Task independent & fixed
• Bounded processing
• Single experience-based
• Examples:
  • Chunking
  • Episodic learning
  • Reinforcement learning
  • Semantic/concept learning?

Deliberate/Reflective Learning
• Deliberately engaged
• "On top" of the architecture
• Uses knowledge to control
• Uses architectural learning
• Can change with learning
• Unbounded processing
• Can generalize across multiple examples through recall
• Examples:
  • Task acquisition
  • Learning by instruction
  • Learning by analogy
  • Recovery from incorrect knowledge
  • …
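The numeric-preference scheme on the Reinforcement Learning slides can be sketched as a small Python model. This is a hypothetical illustration, not Soar's implementation: each "rule" tests a set of state features and supplies a value for one operator; an operator's value is the average over all matching rules (reproducing North = 48/6 = 8); and the learned rule for the operator taken in State A uses all of State A as its conditions (the "current" choice) with a one-step Q-learning update toward reward + gamma · max over State B's proposed operators. The class name, the learning rate `alpha`, the discount `gamma`, and the reward signal are assumptions not given on the slides.

```python
from collections import defaultdict


class NumericPreferenceLearner:
    """Toy model of rules that create numeric preferences for operators.

    Each rule maps a set of tested state features to a value for one operator.
    alpha, gamma and the reward are illustrative assumptions.
    """

    def __init__(self, alpha=0.5, gamma=0.9):
        self.alpha = alpha
        self.gamma = gamma
        # operator -> {frozenset(conditions): numeric preference}
        self.rules = defaultdict(dict)

    def add_rule(self, conditions, operator, value):
        """Create (or overwrite) the rule with exactly these conditions."""
        self.rules[operator][frozenset(conditions)] = value

    def q_value(self, state, operator):
        """Average the preferences of every rule whose conditions all hold in state."""
        values = [v for cond, v in self.rules[operator].items() if cond <= state]
        return sum(values) / len(values) if values else 0.0

    def learn(self, state_a, operator, reward, state_b, proposed_b):
        """Q-learning-style update for `operator` applied in state_a, leading to state_b.

        The new rule's conditions are all of state_a (the slides' "current" choice).
        """
        best_next = max(self.q_value(state_b, op) for op in proposed_b)
        old = self.q_value(state_a, operator)
        new = old + self.alpha * (reward + self.gamma * best_next - old)
        self.add_rule(state_a, operator, new)
        return new
```

Starting from a single rule that gives North = 10 in State A, and rules giving East = 6, North = 11, South = 3 in State B, an update with alpha = 0.5, gamma = 0.9 and zero reward moves North in State A to 9.95. The "future" choice of conditions would replace `state_a` in the `add_rule` call with only those features of State A that were tested to produce State B's evaluation.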