Cognitive, Linguistic & Psychological Sciences

Explanation and Simulation in
Cognitive Science
• Simulation and computational modeling
• Symbolic models
• Connectionist models
• Comparing symbolism and connectionism
• Hybrid architectures
• Cognitive architectures
Simulation and Computational
Modeling
• With detailed and explicit cognitive
theories, we can implement the theory as a
computational model
• And then execute the model to:
– Simulate cognitive capacity
– Derive predictions from the theory
• The predictions can then be compared to
empirical data
Questions
• What kinds of theories are amenable to
simulation?
• What techniques work for simulation?
• Is simulating the mind different from
simulating the weather?
The Mind & the Weather
• The mind may just be a complex dynamic system,
but it isn’t amenable to generic simulation
techniques:
– The relation between theory and implementation is
indirect: theories tend to be rather abstract
– The relation between simulation results and empirical
data is indirect: simulations tend to be incomplete
• The need to simulate helps make theories more
concrete
• But “improvement” of the simulation must be
theory-driven, not just an attempt to capture the
data
Symbolic Models
• High-level functions (e.g., problem solving,
reasoning, language) appear to involve
explicit symbol manipulation
• Example: Chess and shopping seem to
involve representation of aspects of the
world and systematic manipulation of those
representations
Central Assumptions
• Mental representations exist
• Representations are structured
• Representations are semantically
interpretable
What’s in a representation?
• Representation must consist of symbols
• Symbols must have parts
• Parts must have independent meanings
• Those meanings must contribute to the meanings
of the symbols which contain them
– e.g., “34” contains “3” and “4”, parts which have
independent meanings
– the meaning of “34” is a function of the meaning of “3”
in the tens position and “4” in the units position
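A minimal sketch of this compositionality in Python (the function names are illustrative, not from the source): the meaning of a numeral string is computed from the independent meanings of its digit symbols and their positions.

    # Each part ("3", "4") has an independent meaning; the meaning of
    # the whole ("34") is a function of those parts and their positions.
    def digit_meaning(symbol):
        return "0123456789".index(symbol)

    def numeral_meaning(symbols):
        value = 0
        for symbol in symbols:
            # the position supplies the mode of combination (tens, units, ...)
            value = value * 10 + digit_meaning(symbol)
        return value

    assert numeral_meaning("34") == 10 * digit_meaning("3") + digit_meaning("4")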
In favor of structured mental
representations
• Productivity
– It is through structuring that thought is productive
(finite number of elements, infinite number of possible
combinations)
• Systematicity
– If you think “John loves Mary”, you can think “Mary
loves John”
• Compositionality
– The meaning of “John loves Mary” is a function of its
parts, and their modes of combination
• Rationality
– If you know that “A and B” is true, then you can infer that A is true
Fodor & Pylyshyn (1988)
What do you do with them?
• Suppose we accept that there are symbolic
representations
• How can they be manipulated?
…by a computing machine
• Any such approach has three components
– A representational system
– A processing strategy
– A set of predefined machine operations
Automata Theory
• Identifies a family of increasingly powerful
computing machines
– Finite state automata
– Push down automata
– Turing machines
Automata, in brief
(Figure 2.2 in Green et al., Chapter 2)
• This FSA takes as input a sequence of on
and off messages, and accepts any sequence
ending with an “on” (see the sketch below)
• A PDA adds a stack: an infinite-capacity,
limited access memory, so that what a
machine does depends on input, current
state, plus the memory
• A Turing machine changes this memory to allow
any location to be accessed at any time. The
state transition function specifies read/write
instructions, as well as which state to move to
next.
• Any effective procedure can be implemented on
an appropriately programmed Turing machine
• And Universal Turing machines can emulate any
Turing machine, via a description on the tape of
the machine and its inputs
• Hence, philosophical disputes:
– Is the brain Turing powerful?
– Does machine design matter or not?
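To make the FSA described above concrete, here is a minimal Python sketch (the encoding of states and messages is an assumption, not from Green et al.): it accepts exactly those on/off sequences that end with “on”.

    # Two states; the machine accepts iff the last message seen was "on".
    def accepts(messages):
        state = "reject"                    # start state: nothing seen yet
        for message in messages:
            state = "accept" if message == "on" else "reject"
        return state == "accept"

    assert accepts(["off", "on"])           # ends with "on": accepted
    assert not accepts(["on", "off"])       # ends with "off": rejected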
More practical architectures
• Von Neumann machines:
– Strictly less powerful than Turing machines (finite
memory)
– Distinguished area of memory for stored programs
– Makes them conceptually easier to use than TMs
– Special memory location points to next-instruction on
each processing cycle: fetch instruction, move pointer
to next instruction, execute current instruction
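A hypothetical sketch of that processing cycle (the instruction set is invented for illustration): a program counter points at the next instruction in a distinguished area of memory.

    # Fetch, advance the pointer, execute: the von Neumann cycle.
    memory = [("set", 0), ("add", 5), ("add", 3), ("halt", None)]
    accumulator, pc = 0, 0
    while True:
        op, arg = memory[pc]    # fetch instruction
        pc += 1                 # move pointer to next instruction
        if op == "set":         # execute current instruction
            accumulator = arg
        elif op == "add":
            accumulator += arg
        elif op == "halt":
            break
    assert accumulator == 8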
Production Systems
• Introduced by Newell & Simon (1972)
• Cyclic processor with two main memory
structures
– Long term memory with rules (~productions)
– Working memory with symbolic representation of
current system state
• Example: IF goal(sweeten(X)) AND
available(sugar) THEN action(add(sugar, X)) AND
retract(goal(sweeten(X)))
• Recognize phase (pattern matching)
– Find all rules in LTM that match elements in WM
• Act phase (conflict resolution)
– Choose one matching rule, execute, update WM and
(possibly) perform action
• Complex sequences of behavior can thus result
• Power of pattern matcher can be varied, allowing
different use of WM
• Power of conflict resolution will influence
behavior given multiple matches
– Most specific?
• This works well for problem-solving. Would it
work for pole-balancing?
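A minimal sketch of the recognize-act cycle in Python, under simple assumptions (the rule format and working-memory encoding are illustrative, not Newell & Simon's notation):

    # Working memory holds simple string facts; long-term memory holds rules.
    wm = {"goal:sweeten(tea)", "available(sugar)"}
    rules = [
        # (name, conditions (all must be in WM), facts to add, facts to retract)
        ("sweeten", {"goal:sweeten(tea)", "available(sugar)"},
         {"added(sugar, tea)"}, {"goal:sweeten(tea)"}),
    ]
    while True:
        # Recognize phase: find all rules whose conditions match WM
        matches = [r for r in rules if r[1] <= wm]
        if not matches:
            break
        # Act phase: conflict resolution (here, simply take the first match)
        name, conditions, add, retract = matches[0]
        wm = (wm | add) - retract
    assert "added(sugar, tea)" in wm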
Connectionist Models
• The basic assumption
– There are many processors connected together,
and operating simultaneously
– Processors: units, nodes, artificial neurons
A connectionist network is…
• A set of nodes, connected in some fashion
• Nodes have varying activation levels
• Nodes interact via the flow of activation along the
connections
• Connections are usually directed (one-way flow),
and weighted (strength and nature of interaction;
positive weight = excitatory; negative =
inhibitory)
• A node’s activation will be computed from the
weighted sum of its inputs
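A one-function sketch of that computation (a common simplification; many models also pass the sum through a squashing function):

    # A node's activation as the weighted sum of its inputs.
    def activation(inputs, weights):
        return sum(i * w for i, w in zip(inputs, weights))

    # Positive weights are excitatory, negative weights inhibitory:
    print(activation([1.0, 1.0], [0.75, -0.25]))  # 0.5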
Local vs. Distributed
Representation
• Parallel Distributed Processing is a (the?) major
branch of connectionism
• In principle, a connectionist node could have an
interpretable meaning
– E.g., active when ‘red’ input, or ‘grandmother’, or
whatever
• However, an individual PDP node will not have
such an interpretable meaning
– Activation over whole set of nodes corresponds to ‘red’
– Individual node participates in many such
representations
PDP
• PDP systems lack systematicity and
compositionality
• Three main types of networks:
– Associative
– Feed-forward
– Recurrent
Associative
• To recognize and reconstruct patterns
– Present activation pattern to subset of units
– Let network ‘settle’ in stable activation pattern
(reconstruction of previously learned state)
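One standard way to make “settling” concrete is a Hopfield-style update rule (chosen here for illustration; the slides do not name a specific network): repeatedly update units from the weighted sum of the others until nothing changes.

    def settle(units, weights):
        # Asynchronous updates; with symmetric weights this converges.
        changed = True
        while changed:
            changed = False
            for i in range(len(units)):
                net = sum(weights[i][j] * units[j]
                          for j in range(len(units)) if j != i)
                new = 1 if net >= 0 else -1
                if new != units[i]:
                    units[i], changed = new, True
        return units

    # Weights encoding the stored pattern [1, 1, -1]; presenting a
    # corrupted pattern lets the network reconstruct the learned one.
    w = [[0, 1, -1], [1, 0, -1], [-1, -1, 0]]
    assert settle([1, -1, -1], w) == [1, 1, -1]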
Feedforward
• Not for reconstruction, but for mapping from one
domain to another
– Nodes are organized into layers
– Activation spreads through layers in sequence
– A given layer can be thought of as an “activation
vector”
• Simplest case:
– Input layer (stimulus)
– Output layer (response)
• Two-layer networks are very restricted in power.
Intermediate (hidden) layers provide most of the
additional computational power needed.
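A sketch of the layer-by-layer flow (the weights and the sigmoid squashing function are illustrative assumptions):

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def layer(inputs, weights):
        # One output activation per row of the weight matrix.
        return [sigmoid(sum(w * a for w, a in zip(row, inputs)))
                for row in weights]

    hidden = layer([0.0, 1.0], [[0.5, -0.4], [0.3, 0.8]])  # input -> hidden
    output = layer(hidden, [[1.2, -0.7]])                  # hidden -> output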
Recurrent
• Feedforward nets compute mappings given current
input only. Recurrent networks allow mapping to
take into account previous input.
• Jordan (1986) and Elman (1990) introduced
networks with:
– Feedback links from output or hidden layers to context
units, and
– Feedforward links from the context units to the hidden
units
• Jordan network output depends on current input
and previous output
• Elman network output depends on current input
and whole of previous input history
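A sketch of the Elman-style arrangement (the weights are illustrative): context units hold a copy of the previous hidden activations, so the current output can depend on the whole input history.

    import math

    def sigmoid(x):
        return 1.0 / (1.0 + math.exp(-x))

    def elman_step(inp, context, w_in, w_ctx):
        # Hidden units see the current input plus the context units,
        # which hold a copy of the previous hidden activations.
        hidden = [sigmoid(wi * inp + wc * c)
                  for wi, wc, c in zip(w_in, w_ctx, context)]
        return hidden, hidden[:]   # the copy becomes the next context

    context = [0.0, 0.0]
    for x in [1.0, 0.0, 1.0]:      # output now depends on the full history
        hidden, context = elman_step(x, context, [0.6, -0.4], [0.5, 0.9])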
Key Points about PDP
• It’s not just that a net can recognize a
pattern or perform a mapping
• It’s the fact that it can learn to do so, on the
basis of limited data
• And the way that networks respond to
damage is crucial
Learning
• Present network with series of training patterns
– Adjust the weights on connections so that the patterns
are encoded in the weights
• Most training algorithms perform small
adjustments to the weights per trial, but require
many presentations of the training set to reach a
reasonable degree of performance
• There are many different learning algorithms
Learning (contd.)
• Associative nets support Hebbian learning rule:
– Adjust weight of connection by amount proportional to
the correlation in activity of corresponding nodes
– So if both active, increase weight; if both inactive,
increase weight; if they differ, decrease weight
• Important because this is biologically
plausible…and very effective
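The rule in code form, with activations coded as +1/-1 (the learning rate is an illustrative assumption):

    # Hebbian update: change the weight in proportion to the correlation
    # of the two units' activities.
    def hebb(weight, pre, post, rate=0.1):
        return weight + rate * pre * post

    assert hebb(0.0, 1, 1) > 0.0     # both active: weight increases
    assert hebb(0.0, -1, -1) > 0.0   # both inactive: weight increases
    assert hebb(0.0, 1, -1) < 0.0    # they differ: weight decreases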
Learning (contd.)
• Feedforward and recurrent nets often
exploit the backpropagation of error rule
– Actual output compared to expected output
– Difference computed and propagated back to
input, layer by layer, requiring weight
adjustments
• Note: unlike Hebb, this is supervised
learning
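A heavily simplified single-layer sketch of this error-driven idea (this is the delta rule; full backpropagation additionally pushes the error back through hidden layers via the chain rule):

    # Compare actual to expected output; nudge weights to shrink the gap.
    def update(weights, inputs, expected, rate=0.1):
        actual = sum(w * i for w, i in zip(weights, inputs))
        error = expected - actual
        return [w + rate * error * i for w, i in zip(weights, inputs)]

    w = [0.0, 0.0]
    for _ in range(50):              # many presentations, small adjustments
        w = update(w, [1.0, 0.5], expected=1.0)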
Psychological Relevance
• Given a network of fixed size, if there are too few
units to encode the training set, then interference
occurs
• This is suboptimal, but is better than nothing,
since at least approximate answers are provided
• And this is the flipside of generalization, which
provides output for unseen input
– E.g., weep → wept; bid → bid
Damage
• Either remove a proportion of connections
• Or introduce random noise into activation
propagation
• And behavior can simulate that of people
with various forms of neurological damage
• “Graceful degradation”: impairment, but
residual function
Example of Damage
• Hinton & Shallice (1991), Plaut & Shallice
(1993) on deep dyslexia:
– Visual error (‘cat’ read as ‘cot’)
– Semantic error (‘cat’ read as ‘dog’)
• Networks constructed for orthography-to-phonology mapping, lesioned in various
ways, producing behavior similar to human
subjects
Symbolic Networks
• Though distributed representations have proved
very important, some researchers prefer localist
approaches
• Semantic networks:
– Frequently used in AI-based approaches, and in
cognitive approaches which focus on conceptual
knowledge
– One node per concept; typed links between concepts
– Inference: link-following
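A minimal sketch of link-following inference (the concepts and link types are invented for illustration):

    # One node per concept; typed links between concepts.
    links = {
        ("canary", "isa"): "bird",
        ("bird", "isa"): "animal",
        ("bird", "can"): "fly",
    }

    def isa_chain(concept):
        # Inference by following "isa" links to superordinate concepts.
        chain = []
        while (concept, "isa") in links:
            concept = links[(concept, "isa")]
            chain.append(concept)
        return chain

    assert isa_chain("canary") == ["bird", "animal"]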
Production systems with
spreading activation
• Anderson’s work (ACT, ACT*, ACT-R)
– Symbolic networks with continuous activation
values
– ACT-R never removes working memory
elements; activation instead decays over time
– Productions chosen on basis of (co-) activation
Interactive Activation Networks
• Essentially, localist connectionist networks
• Featuring self-excitatory and lateral inhibitory
links, which ensure that there’s always a winner in
a competition (e.g., McClelland & Rumelhart’s
model of letter perception)
• Appropriate combinations of levels, with feedback
loops in them, allow modeling of complex data-driven and expectation-driven behavior
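A toy sketch of such a competition (the parameters are illustrative, not McClelland & Rumelhart's): each unit excites itself and inhibits its rivals, so the initially strongest unit ends up the sole winner.

    acts = [0.55, 0.50, 0.45]
    for _ in range(100):
        total = sum(acts)
        # Self-excitation (+0.1 * own activation) plus lateral
        # inhibition (-0.1 * everyone else's), clipped to [0, 1].
        acts = [max(0.0, min(1.0, a + 0.1 * a - 0.1 * (total - a)))
                for a in acts]
    assert acts == [1.0, 0.0, 0.0]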
Comparing Symbolism &
Connectionism
• As is so often the case in science, the two
approaches were initially presented as
exclusive alternatives
Connectionist:
• Interference
• Generalization
• Graceful degradation
• Symbolists complain:
– Connectionists don’t capture structured information
– Network computation is opaque
– Networks are “merely” implementation-level
Symbolic:
• Productive
• Systematic
• Compositional
• Connectionists complain:
– Symbolists don’t relate assumed structures to
brain
– They relate them to von Neumann machines
Connectionists can claim:
• Complex rule-oriented behavior *emerges*
from the interaction of subsymbolic processes
• So symbolic models describe, but do not
explain
Symbolists can claim:
• Though PDP models can learn implicit
rules, the learning mechanisms are usually
not neurally plausible after all
• Performance is highly dependent on exact
choice of architecture
Hybrid Architectures
• In practice, different tasks demand different
technologies
• Hybrid approaches explicitly assume:
– Neither connectionist nor symbolic approach is
flawed
– Their techniques are compatible
Two main hybrid options:
• Physically hybrid models:
– Contain subsystems of both types
– Issues: interfacing, modularity (e.g., use Interactive
Activation Network to integrate results)
• Non-physically hybrid models
– Subsystems of only one type, but described two ways
– Issue: levels of description (e.g., connectionist
production systems)
Cognitive Architectures
• Most modeling is aimed at specific processes or
tasks
• But it has been argued that:
– Most real tasks involve many cognitive processes
– Most cognitive processes are used in many tasks
• Hence, we need unified theories of cognition
Examples
• ACT-R (Anderson)
• Soar (Newell)
• Both based on production system technology
– Task-specific knowledge coded into the
productions
– Single processing mechanism, single learning
mechanism
• Like computer architectures, cognitive
architectures tend to make some tasks easy, at the
price of making others hard
• Unlike computer architectures, cognitive
architectures must include learning mechanisms
• But note that the unified approaches sacrifice
genuine task-appropriateness and perhaps also
biological plausibility
A Cognitive Architecture is:
• A fixed arrangement of particular functional
components
• A processing strategy