Functional Constraints on Architectural Mechanisms

Christian Lebiere (cl@cmu.edu), Carnegie Mellon University
Bradley Best (bjbest@adcogsys.com), Adaptive Cognitive Systems

Introduction
• Goal: Strong CogSci – a single integrated model of human abilities that is robust, adaptive, and general
• Not just an architecture that supports it (the Newell Test evaluation) but a system that actually does it
• Neither strong AI (the means matter) nor weak CogSci (generality matters)
• Plausible strategies:
  – Build a single model from scratch – the traditional AI strategy
  – Incremental assembly – the strategy of successful CS systems
  – But little or no reuse of models limits the achievable complexity!

Model Fitting Constraint
• Fitting computational models to human data is the "coin of the realm" of cognitive modeling
• Is it a sufficient constraint to achieve convergence toward the goal of model integration and robustness?
• Good news: cognitive architectures are increasingly converging toward a common modular organization
• Bad news: there is still very little model reuse – almost every task results in a new model developed tabula rasa
• Question: have we gotten the tradeoff right between precision (fitting data) and generality (reuse and integration)?

You Can't Play 20 Models...
• 35 years ago Newell raised a similar issue with convergence in experimental psychology
• He diagnosed much of the problem as a lack of emphasis on the control structure used to solve a problem
• He offered three prognoses for "putting it together":
  – Complete processing models (and the production-system suggestion) – check!
  – Analyze a complex task (the chess suggestion) – progress, but...
  – One program for many tasks (integration, e.g., WAIS) – fail?
• What have been the obstacles to putting it together?

Obstacles to Integration
• Models tend to be highly task-specific – they usually cannot be used directly even for closely related tasks
• They tend to capture only the endpoint of the process that runs from initial task discovery to final asymptotic behavior
• The modeler's meta-cognitive knowledge of the task gets hardwired into the model
• Experience with high-level language (HLSR) compilation
• Task-discovery processes, including metacognitive processes, should be part of the model/architecture
• This tackles a broader category of tasks through adaptation

Forcing Functions for Integration
• Model comparison challenges (e.g., the Dynamic Stocks and Flows challenge) that feature:
  – Breadth of applicability (e.g., multiple conditions)
  – Unknown conditions and/or data (tuning vs. testing sets)
  – Integration of multiple functionalities (control, prediction)
• Unpredictable domains, e.g., adversarial behavior:
  – Breadth and variability of behavior
  – A constant push for adaptivity and unpredictability
  – A strong incentive to maximize functionality
• Architectural implications for model integration?
  – Focus on both control and representation structure

A Tour of Four Modules
• All modules have shortcomings in robustness and generality
• The ability to craft models for laboratory tasks does not guarantee plausible behavior in open-ended situations

[Diagram: the ACT-R architecture – Intentional module (goal buffer), Working Memory module (imaginal buffer), Declarative module (retrieval buffer), Vision module (visual buffer), and Motor module (manual buffer), all coordinated by the Procedural module and connected to the Environment]

Module 1: Declarative
• Base-level learning can lead to looping if unchecked
  – The most active chunk is retrieved, and then its activation is boosted...
• Very hard to control when compiling a higher-level model
  – Many logical conditions require repeated retrieval loops
• Old solution: tag the chunk on retrieval (e.g., list learning):

    +retrieval>
       isa item
       index =index
     - retrieved =goal
    ...
    =retrieval>
       retrieved =goal

• New solution: declarative finsts perform the tagging:

    (sgp :declarative-num-finsts 5
         :declarative-finst-span 10)

    +retrieval>
       isa item
       index =index
       :recently-retrieved nil

Base-Level Inhibition (BLI)
• Augment base-level learning with a short-term inhibition term:

    $B_i = \ln\left(\sum_{j=1}^{n} t_j^{-d}\right) - \ln\left(1 + t_s^{-d_s}\right)$

  where the $t_j$ are the times since past retrievals of chunk $i$, $d$ is the long-term decay rate, $t_s$ is the time since the most recent retrieval, and $d_s$ is the short-term inhibition decay rate

[Figure: odds of recall by quintile (Q1-Q5) of lag, Britannica corpus, log-log axes, comparing default base-level learning (BLL) against inhibition variants PL(0.75;10), PL(1.0;10), PL(1.0;5.0), PL(2;1.0;10), and PL(3;1.0;10)]

• Provides inhibition of return, resulting in a soft, adaptive round-robin in free-recall procedures without requiring any additional constraints
• Also applies in other domains: arithmetic, web navigation, physical environments
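To make the inhibition term concrete, here is a minimal numerical sketch in Python of the BLI equation above. The decay rates (d = 0.5, d_s = 1.0) and the retrieval history are illustrative assumptions, not fitted values, and the code stands in for, rather than reproduces, the ACT-R implementation:

    import math

    def base_level(retrieval_times, now, d=0.5):
        """Default base-level learning: B = ln(sum of t_j^-d)."""
        return math.log(sum((now - t) ** -d for t in retrieval_times))

    def base_level_inhibited(retrieval_times, now, d=0.5, ds=1.0):
        """BLI: subtract ln(1 + t_s^-ds), where t_s is the time
        since the most recent retrieval of this chunk."""
        ts = now - max(retrieval_times)
        return base_level(retrieval_times, now, d) - math.log(1.0 + ts ** -ds)

    # A chunk retrieved at t = 1, 2, and 9; observe activation later.
    history = [1.0, 2.0, 9.0]
    for now in (10.0, 15.0, 60.0):
        print(f"t={now:5.1f}  BLL={base_level(history, now):+.3f}  "
              f"BLI={base_level_inhibited(history, now):+.3f}")

At t = 10 the just-retrieved chunk is strongly inhibited; by t = 60 the inhibition term has all but vanished, which is what produces the soft round-robin described above.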
Emergent Robustness

[Figure: frequencies of free recall as a function of item rank, log-log axes, for n = 100, 1,000, and 10,000 retrievals]

• Running the retrieval mechanism unsupervised leads to the gradual emergence of an internal power-law distribution
• That distribution differs both from the pathological behavior of the default base-level learning and from the hard, fixed round-robin of the tag/finst version

Module 2: Procedural
• Production rule sets need careful crafting to cover all cases
  – Degenerate behavior in real environments (getting stuck, looping, etc.)
  – Especially difficult in continuous domains (ad hoc thresholds, etc.)
• Generalizing production applicability
  – Often requires using the declarative module to leverage semantic generalization through the partial matching mechanism
• Unifying symbolic (matching) and subsymbolic (selection) processes is desirable for robustness, adaptivity, and generalization

Production Partial Matching (PPM)
• Same principle as partial matching in declarative memory
  – Unification is good, and logical given the representation (neural models)
• Matching utility:

    $MU_p = U_p + BMP \cdot \sum_{i=1}^{n} Sim(p_i, b_i)$

  where $U_p$ is the production's utility, $BMP$ is a mismatch-penalty scaling constant, and $Sim(p_i, b_i)$ is the similarity between condition value $p_i$ and buffer value $b_i$
• Dynamic generalization: a production condition defines an ideal "prototype" situation, not a range of application conditions
• Adaptivity: generalization expands with success as utility rises, and contracts with failure as the production over-generalizes
• Safe version: an explicit ~ test modifier, similar to -, <, >, etc.
• Learning new productions can collapse across a range of instances and learn differential sensitivity to individual buffer slot values
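The matching-utility rule lends itself to a short sketch. Everything concrete below (the similarity function, the utilities, the BMP value, the one-slot "situations") is an illustrative assumption in the spirit of the Building Sticks models discussed next, not the actual ACT-R parameterization:

    # Toy illustration of production partial matching (PPM):
    # MU_p = U_p + BMP * sum_i Sim(p_i, b_i).

    def sim(p, b):
        """Similarity between a condition value and a buffer value:
        0 for a perfect match, increasingly negative with distance."""
        return -abs(p - b)

    def matching_utility(utility, prototype, buffer_values, bmp=2.0):
        """MU = U + BMP * sum of slot-wise similarities."""
        return utility + bmp * sum(sim(p, b)
                                   for p, b in zip(prototype, buffer_values))

    # Two productions whose conditions define "prototype" situations.
    productions = {
        "over":  {"utility": 3.0, "prototype": (0.8,)},  # stick overshoots target
        "under": {"utility": 3.0, "prototype": (0.2,)},  # stick undershoots target
    }

    situation = (0.55,)  # current buffer content, between the two prototypes
    for name, p in productions.items():
        mu = matching_utility(p["utility"], p["prototype"], situation)
        print(f"{name:5s} MU = {mu:.2f}")

The production whose prototype lies closer to the current situation wins selection; utility learning would then widen or narrow its effective range, which is the dynamic generalization described above.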
Building Sticks Task
• Standard production model (Lovett, 1996):
  – 4 productions: force-over, force-under, decide-over, decide-under
  – Hardwired ranges of application
  – Utility learning
• Instance-based model (Lebiere, 1997):
  – Chunks with under, over, target, and choice slots
  – Partial matching on the closeness of over and under to the target
  – Base-level learning with degree of match
• New partial-matching production model:
  – 2 productions: over (match the over stick against the target) and under (match the under stick against the target)
  – Utility learning mixed with degree of match

Procedural or Instance-Based?
• One of Newell's decried "oppositions" has reappeared in the computational modeling context
• Neuroscience (e.g., fMRI) might provide data to arbitrate between modules, but likely not within a module
• The correct solution is likely a combination, moving from initial declarative retrieval to procedural selection
• We need a smooth transition from the declarative to the procedural mechanism without a modeler-induced discontinuity in the form of arbitrary control structure

Module 3: Working Memory
• Current WM: named, fixed buffers, types, and slots
  – Pros:
    • Precise reference makes complex information processing not only possible but relatively easy
    • Familiar analogy to traditional programming
  – Cons:
    • Substantial modeling effort required – modeling is often time-consuming and error-prone
    • Hard limit on the flexibility of representation – fine in laboratory tasks, more problematic in open-ended, dynamic, unpredictable environments

Representation Implications
• Explicit slot (and also type and buffer) management:
  – Add more slots to represent all the information needed
    • Pro: slots have clear semantics
    • Con: profligate; dilutes spreading activation
  – Reuse slots for different purposes over time
    • Pro: keeps structures relatively compact
    • Con: uncertain semantics (what is in this slot right now?)
  – Use different (goal) types over time
    • Pro: cleaner semantics, hierarchical control
    • Con: increased management of context transfer
  – Add more buffers, or reuse buffers as storage
    • Less common so far, but has the same general drawbacks as slots and types
    • Integration issues (episodic memory)

Working Memory Module
• Replace chunk structures in buffers with sets of values associated with fast-decaying short-term activation
  – Faster decay rate than LTM and no reinforcement
• Generalize pattern matching to an ordered set of values
  – Double match on semantic and positional content
• Assumptions about context permanence:
  – Short-term maintenance with quick decay (sequence learning)
  – Explicit rehearsal is possible, but it impacts both strength and ordering

N-Back Task
• The n-back working memory task: is the current stimulus the same as the one n back?
• The default ACT-R model holds and shifts items in a buffer: perfect recall!
• The working memory model adds each item to WM, where it decays and is partially matched
• Performance decreases with noise and with n, up to a plateau – a good fit to the data
• Default buffer-based production:

    (p back4
       =goal>
          isa nback
          stimulus =stimulus
          match nil
       =imaginal>
          isa four-back
          back1 =back1
          back2 =back2
          back3 =back3
          back4 =back4
    ==>
       !output! (Stimulus =stimulus matching 4-back =back4)
       =goal>
          match =back4)

• Working memory module production:

    (p back4
       =goal>
          isa nback
          stimulus =stimulus
          match nil
       +intentional>
          =back1 =back2 =back3 =back4
    ==>
       !output! (Stimulus =stimulus retrieving 4-back =back4)
       =goal>
          match =back4)
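A toy simulation of this working-memory scheme, assuming an exponential short-term decay and Gaussian activation noise (the slides do not commit to either choice): items enter WM with a time stamp and decay quickly, the n-back item is recovered by a noisy strength comparison, and accuracy therefore falls with n toward a plateau, qualitatively matching the behavior described above.

    import math, random

    random.seed(1)

    DECAY = 0.8   # assumed short-term decay rate (faster than LTM)
    NOISE = 0.15  # assumed activation noise

    def run_nback(n, trials=2000, isi=1.0):
        """Each trial adds the stimulus to WM with a time stamp. To answer
        'n back', pick the stored item whose noisy decayed strength is
        closest to the strength expected at lag n; score it against truth."""
        wm, stream, correct = [], [], 0
        for t in range(trials):
            stim = random.choice("ABCD")
            stream.append(stim)
            wm.append((stim, t * isi))
            wm = wm[-20:]  # prune items whose strength has decayed to near zero
            if t >= n:
                now = t * isi
                # expected strength of an item exactly n positions back
                target = math.exp(-DECAY * n * isi)
                best = min(wm, key=lambda item: abs(
                    math.exp(-DECAY * (now - item[1]))
                    + random.gauss(0, NOISE) - target))
                correct += best[0] == stream[t - n]
        return correct / (trials - n)

    for n in range(1, 6):
        print(f"{n}-back accuracy: {run_nback(n):.2f}")

Because decayed strengths crowd together at longer lags, the noisy comparison confuses neighboring positions more often as n grows, producing the decline and plateau without any hand-coded error mechanism.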
Module 4: Episodic Memory
• Need to integrate information in LTM across modalities
• The main role of episodic memory is to support goal management
• Store snapshots of working memory
  – The concept of a chunk slot is replaced with activation
  – Similar to connectionist temporal-synchrony binding
• Straightforward matching of the WM context to episodic chunks (a toy sketch appears after the Conclusion)
  – A double, symmetrical match on semantic and activation content
• Issues:
  – Creation signal: similar to the current chunk switch in a buffer
  – Reinforcement upon rehearsal?
  – Relation to traditional LTM? Similar to the role of the HC in training the PC?

List Memory
• A pervasive task that requires a multi-level indexing representation
  – "micro-chunks" vs. the traditional representation
• Captures positional confusions and failures
• Is it a strategy choice or an architectural feature?
• How best to provide this function pervasively?

    +retrieval>
       isa item
       parent =group
       position fourth
       :recently-retrieved nil

Related Work
• Instruction following (Anderson and Taatgen)
  – A general model for simple step-following tasks
• The minimal control principle (Taatgen)
  – Limit modeler-imposed control structure
• Threading and multitasking (Salvucci and Taatgen)
  – Combine independent models and reduce interference
• Metacognition (Anderson)
  – Enable the model to discover an original solution to a new problem
• A call for new thinking on "an increasingly watered down set of principles for the representation of knowledge" (Anderson)

Conclusion
• The available data is often not enough to discriminate between competing models of single tasks
  – Newell may have been too optimistic about the ability to uniquely infer the control method given the data and the system
• More data can help, but often leads to more specialized and complex models, and away from integration
• Focus on functionality, especially Newell's 2nd (complex tasks) and 3rd (multiple tasks) criteria, for further discrimination
• Focusing on tasks that require open-ended behavior can enhance the robustness and generality of cognitive architectures without compromising their fidelity
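Finally, as referenced on the Module 4: Episodic Memory slide, a toy sketch of matching the current WM context against stored episodic snapshots. The snapshot representation (maps from WM elements to activations), the symmetric scoring rule, and all element names are illustrative assumptions rather than a specification of the proposed module:

    # Toy sketch of episodic memory as snapshots of working memory.
    # A snapshot maps WM elements to their activation at storage time;
    # retrieval scores each stored episode against the current WM context
    # with a symmetric match over both semantic content (shared elements)
    # and activation levels.

    def match(context, episode):
        """Symmetric match: reward shared elements with similar activations,
        penalize elements present in only one of the two patterns."""
        shared = context.keys() & episode.keys()
        score = sum(1.0 - abs(context[k] - episode[k]) for k in shared)
        unshared = context.keys() ^ episode.keys()
        return score - 0.5 * len(unshared)

    episodes = [
        {"goal-shop": 0.9, "item-milk": 0.7, "loc-store": 0.8},
        {"goal-cook": 0.8, "item-milk": 0.4, "loc-home": 0.9},
        {"goal-shop": 0.5, "item-bread": 0.6, "loc-store": 0.7},
    ]

    context = {"goal-shop": 0.8, "item-milk": 0.6, "loc-store": 0.3}
    best = max(episodes, key=lambda e: match(context, e))
    print("retrieved episode:", best)

Because the score is symmetric in the two patterns, a stored episode neither dominates nor is dominated by the current context, which is one way to realize the "double, symmetrical match" mentioned on the slide.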