Explanation and Simulation in Cognitive Science

• Simulation and computational modeling
• Symbolic models
• Connectionist models
• Comparing symbolism and connectionism
• Hybrid architectures
• Cognitive architectures

Simulation and Computational Modeling
• With detailed and explicit cognitive theories, we can implement the theory as a computational model
• And then execute the model to:
– Simulate the cognitive capacity
– Derive predictions from the theory
• The predictions can then be compared to empirical data

Questions
• What kinds of theories are amenable to simulation?
• What techniques work for simulation?
• Is simulating the mind different from simulating the weather?

The Mind & the Weather
• The mind may just be a complex dynamic system, but it isn’t amenable to generic simulation techniques:
– The relation between theory and implementation is indirect: theories tend to be rather abstract
– The relation between simulation results and empirical data is indirect: simulations tend to be incomplete
• The need to simulate helps make theories more concrete
• But “improvement” of the simulation must be theory-driven, not just an attempt to capture the data

Symbolic Models
• High-level functions (e.g., problem solving, reasoning, language) appear to involve explicit symbol manipulation
• Example: Chess and shopping seem to involve representation of aspects of the world and systematic manipulation of those representations

Central Assumptions
• Mental representations exist
• Representations are structured
• Representations are semantically interpretable

What’s in a representation?
• Representations must consist of symbols
• Symbols must have parts
• Parts must have independent meanings
• Those meanings must contribute to the meanings of the symbols which contain them
– e.g., “34” contains “3” and “4”, parts which have independent meanings
– the meaning of “34” is a function of the meaning of “3” in the tens position and “4” in the units position

In favor of structured mental representations
• Productivity
– It is through structuring that thought is productive (a finite number of elements, an infinite number of possible combinations)
• Systematicity
– If you can think “John loves Mary”, you can think “Mary loves John”
• Compositionality
– The meaning of “John loves Mary” is a function of its parts, and their modes of combination
• Rationality
– If you know “A and B” is true, then you can infer that A is true
(Fodor & Pylyshyn, 1988)

What do you do with them?
• Suppose we accept that there are symbolic representations
• How can they be manipulated? …by a computing machine
• Any such approach has three components:
– A representational system
– A processing strategy
– A set of predefined machine operations

Automata Theory
• Identifies a family of increasingly powerful computing machines:
– Finite state automata
– Push-down automata
– Turing machines

Automata, in brief (Figure 2.2 in Green et al., Chapter 2)
• This FSA takes as input a sequence of on and off messages, and accepts any sequence ending with an “on” (sketched below)
• A PDA adds a stack: an infinite-capacity, limited-access memory, so that what the machine does depends on the input, the current state, plus the memory
• A Turing machine changes this memory to allow any location to be accessed at any time. The state transition function specifies read/write instructions, as well as which state to move to next.
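A minimal sketch of the FSA just described: it reads a sequence of on/off messages and accepts exactly those sequences that end with “on”. The two-state encoding and the state names are illustrative assumptions, not taken from Green et al.’s figure.

```python
def accepts(messages):
    """FSA over 'on'/'off' messages: accept any sequence ending with 'on'."""
    state = "reject"                        # start state: the empty sequence is rejected
    transitions = {
        ("reject", "on"): "accept",
        ("reject", "off"): "reject",
        ("accept", "on"): "accept",
        ("accept", "off"): "reject",
    }
    for msg in messages:
        state = transitions[(state, msg)]   # follow the transition for this input symbol
    return state == "accept"                # accept iff we halt in the accepting state

print(accepts(["off", "on", "off", "on"]))  # True: ends with 'on'
print(accepts(["on", "off"]))               # False: ends with 'off'
```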
Automata, in brief (contd.)
• Any effective procedure can be implemented on an appropriately programmed Turing machine
• And Universal Turing machines can emulate any Turing machine, via a description on the tape of the machine and its inputs
• Hence, philosophical disputes:
– Is the brain Turing powerful?
– Does machine design matter or not?

More practical architectures
• Von Neumann machines:
– Strictly less powerful than Turing machines (finite memory)
– A distinguished area of memory for stored programs
– This makes them conceptually easier to use than TMs
– A special memory location points to the next instruction; on each processing cycle: fetch the instruction, move the pointer to the next instruction, execute the current instruction

Production Systems
• Introduced by Newell & Simon (1972)
• Cyclic processor with two main memory structures:
– Long-term memory with rules (~productions)
– Working memory with a symbolic representation of the current system state
• Example: IF goal(sweeten(X)) AND available(sugar) THEN action(add(sugar, X)) and retract(goal(sweeten(X)))
• Recognize phase (pattern matching)
– Find all rules in LTM that match elements in WM
• Act phase (conflict resolution)
– Choose one matching rule, execute it, update WM and (possibly) perform an action
• Complex sequences of behavior can thus result
• The power of the pattern matcher can be varied, allowing different uses of WM
• The power of conflict resolution will influence behavior given multiple matches
– Most specific?
• This works well for problem solving. Would it work for pole balancing?

Connectionist Models
• The basic assumption:
– There are many processors connected together, operating simultaneously
– Processors: units, nodes, artificial neurons

A connectionist network is…
• A set of nodes, connected in some fashion
• Nodes have varying activation levels
• Nodes interact via the flow of activation along the connections
• Connections are usually directed (one-way flow) and weighted (strength and nature of interaction; positive weight = excitatory; negative = inhibitory)
• A node’s activation is computed from the weighted sum of its inputs

Local vs. Distributed Representation
• Parallel Distributed Processing is a (the?) major branch of connectionism
• In principle, a connectionist node could have an interpretable meaning
– E.g., active when the input is ‘red’, or ‘grandmother’, or whatever
• However, an individual PDP node will not have such an interpretable meaning
– Activation over the whole set of nodes corresponds to ‘red’
– An individual node participates in many such representations

PDP
• PDP systems lack systematicity and compositionality
• Three main types of networks:
– Associative
– Feed-forward
– Recurrent

Associative
• To recognize and reconstruct patterns
– Present an activation pattern to a subset of units
– Let the network ‘settle’ into a stable activation pattern (a reconstruction of a previously learned state)

Feedforward
• Not for reconstruction, but for mapping from one domain to another
– Nodes are organized into layers
– Activation spreads through the layers in sequence
– A given layer can be thought of as an “activation vector”
• Simplest case:
– Input layer (stimulus)
– Output layer (response)
• Two-layer networks are very restricted in power. Intermediate (hidden) layers provide most of the additional computational power needed (see the sketch below).
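A minimal sketch of a single feedforward pass, assuming one hidden layer and a logistic squashing function; the layer sizes and random weights are arbitrary placeholders rather than a trained model.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def feedforward(stimulus, w_hidden, w_output):
    """Propagate an input activation vector through a hidden layer to the output layer.

    Each layer's activation is a squashed weighted sum of the previous layer's activations.
    """
    hidden = sigmoid(w_hidden @ stimulus)   # hidden-layer activation vector
    output = sigmoid(w_output @ hidden)     # output-layer activation vector (the 'response')
    return output

rng = np.random.default_rng(0)
stimulus = np.array([1.0, 0.0, 1.0])        # 3 input units (the stimulus)
w_hidden = rng.normal(size=(4, 3))          # 4 hidden units, arbitrary untrained weights
w_output = rng.normal(size=(2, 4))          # 2 output units
print(feedforward(stimulus, w_hidden, w_output))
```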
Recurrent
• Feedforward nets compute mappings given the current input only. Recurrent networks allow the mapping to take previous input into account.
• Jordan (1986) and Elman (1990) introduced networks with:
– Feedback links from the output or hidden layer to context units, and
– Feedforward links from the context units to the hidden units
• Jordan network: output depends on the current input and the previous output
• Elman network: output depends on the current input and the whole of the previous input history

Key Points about PDP
• It’s not just that a net can recognize a pattern or perform a mapping
• It’s the fact that it can learn to do so, on the basis of limited data
• And the way that networks respond to damage is crucial

Learning
• Present the network with a series of training patterns
– Adjust the weights on connections so that the patterns are encoded in the weights
• Most training algorithms perform small adjustments to the weights per trial, but require many presentations of the training set to reach a reasonable degree of performance
• There are many different learning algorithms

Learning (contd.)
• Associative nets support the Hebbian learning rule:
– Adjust the weight of a connection by an amount proportional to the correlation in activity of the corresponding nodes
– So if both nodes are active, increase the weight; if both are inactive, increase the weight; if they differ, decrease the weight
• Important because this is biologically plausible…and very effective

Learning (contd.)
• Feedforward and recurrent nets often exploit the backpropagation-of-error rule
– The actual output is compared to the expected output
– The difference is computed and propagated back towards the input, layer by layer, driving the weight adjustments
• Note: unlike the Hebbian rule, this is supervised learning

Psychological Relevance
• Given a network of fixed size, if there are too few units to encode the training set, then interference occurs
• This is suboptimal, but better than nothing, since at least approximate answers are provided
• And this is the flip side of generalization, which provides output for unseen input
– E.g., weep → wept; bid → bid

Damage
• Either remove a proportion of the connections
• Or introduce random noise into activation propagation
• The resulting behavior can simulate that of people with various forms of neurological damage
• “Graceful degradation”: impairment, but residual function

Example of Damage
• Hinton & Shallice (1991) and Plaut & Shallice (1993) on deep dyslexia:
– Visual errors (‘cat’ read as ‘cot’)
– Semantic errors (‘cat’ read as ‘dog’)
• Networks constructed for the orthography-to-phonology mapping, and lesioned in various ways, produce behavior similar to that of human subjects

Symbolic Networks
• Though distributed representations have proved very important, some researchers prefer localist approaches
• Semantic networks:
– Frequently used in AI-based approaches, and in cognitive approaches which focus on conceptual knowledge
– One node per concept; typed links between concepts
– Inference: link-following

Production systems with spreading activation
• Anderson’s work (ACT, ACT*, ACT-R)
– Symbolic networks with continuous activation values
– ACT-R never removes working memory elements; activation instead decays over time
– Productions are chosen on the basis of (co-)activation

Interactive Activation Networks
• Essentially, localist connectionist networks
• Featuring self-excitatory and lateral inhibitory links, which ensure that there’s always a winner in a competition (e.g., McClelland & Rumelhart’s model of letter perception; see the sketch below)
• Appropriate combinations of levels, with feedback loops between them, allow modeling of complex data-driven and expectation-driven behavior
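A minimal sketch of that winner-take-all competition, assuming a single pool of localist units with self-excitation and mutual inhibition; the update rule and constants are simplified placeholders, not the equations of McClelland & Rumelhart’s model.

```python
import numpy as np

def compete(activations, self_excite=0.2, inhibit=0.15, steps=50):
    """Iterate self-excitation plus lateral inhibition until one unit dominates.

    Each unit boosts itself and is suppressed by the summed activation of its rivals;
    activations are clipped to the range [0, 1].
    """
    a = np.array(activations, dtype=float)
    for _ in range(steps):
        rivals = a.sum() - a                        # total activation of the competing units
        a = a + self_excite * a - inhibit * rivals
        a = np.clip(a, 0.0, 1.0)
    return a

# Three candidate 'letter' units; the unit with the strongest initial evidence wins.
print(compete([0.50, 0.45, 0.10]))
```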
Comparing Symbolism & Connectionism
• As is so often the case in science, the two approaches were initially presented as exclusive alternatives

Connectionist
• Interference
• Generalization
• Graceful degradation
• Symbolists complain:
– Connectionists don’t capture structured information
– Network computation is opaque
– Networks are “merely” implementation-level

Symbolic
• Productive
• Systematic
• Compositional
• Connectionists complain:
– Symbolists don’t relate their assumed structures to the brain
– They relate them to von Neumann machines

Connectionists can claim:
• Complex rule-oriented behavior *emerges* from the interaction of subsymbolic behavior
• So symbolic models describe, but do not explain

Symbolists can claim:
• Though PDP models can learn implicit rules, the learning mechanisms are usually not neurally plausible after all
• Performance is highly dependent on the exact choice of architecture

Hybrid Architectures
• But really, the truth is that different tasks demand different technologies
• Hybrid approaches explicitly assume:
– Neither the connectionist nor the symbolic approach is flawed
– Their techniques are compatible

Two main hybrid options:
• Physically hybrid models:
– Contain subsystems of both types
– Issues: interfacing, modularity (e.g., use an Interactive Activation Network to integrate results)
• Non-physically hybrid models:
– Subsystems of only one type, but described in two ways
– Issue: levels of description (e.g., connectionist production systems)

Cognitive Architectures
• Most modeling is aimed at specific processes or tasks
• But it has been argued that:
– Most real tasks involve many cognitive processes
– Most cognitive processes are used in many tasks
• Hence, we need unified theories of cognition

Examples
• ACT-R (Anderson)
• Soar (Newell)
• Both are based on production-system technology (see the sketch at the end of this section):
– Task-specific knowledge is coded into the productions
– Single processing mechanism, single learning mechanism
• Like computer architectures, cognitive architectures tend to make some tasks easy, at the price of making others hard
• Unlike computer architectures, cognitive architectures must include learning mechanisms
• But note that the unified approaches sacrifice genuine task-appropriateness and perhaps also biological plausibility

A Cognitive Architecture is:
• A fixed arrangement of particular functional components
• A processing strategy
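Since ACT-R and Soar build on the production-system technology introduced earlier, here is a minimal sketch of a recognize–act cycle using the sweetening rule from above. The fact encoding, the rule format, and the “most specific rule wins” conflict-resolution policy are illustrative simplifications, not the syntax of either architecture.

```python
# Working memory: a set of simple string-encoded facts.
wm = {"goal:sweeten(tea)", "available:sugar"}

# Long-term memory: productions as (conditions, facts-to-add, facts-to-retract).
productions = [
    ({"goal:sweeten(tea)", "available:sugar"},   # IF the goal holds and sugar is available
     {"added:sugar->tea"},                       # THEN record the action...
     {"goal:sweeten(tea)"}),                     # ...and retract the goal
    ({"goal:sweeten(tea)"},                      # less specific fallback rule
     {"goal:find-sweetener"},
     set()),
]

for cycle in range(10):
    # Recognize phase: find all rules whose conditions are all present in WM.
    matching = [p for p in productions if p[0] <= wm]
    if not matching:
        break
    # Act phase (conflict resolution): choose the most specific matching rule.
    conditions, additions, deletions = max(matching, key=lambda p: len(p[0]))
    wm = (wm - deletions) | additions

print(wm)   # e.g., {'available:sugar', 'added:sugar->tea'}
```

Richer systems vary the power of the pattern matcher (e.g., variable binding over goal(sweeten(X))) and of the conflict-resolution strategy, which is where much of the behavioral variety described above comes from.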