CogSci C131/Psych C123
Computational Models of Cognition
Tom Griffiths
Computation
Cognition
Cognitive science
• The study of intelligent systems
• Cognition as information processing
input → [image] → output
Computational modeling
Look for principles that characterize
both computation and cognition
[diagram: two systems, each mapping input → computation → output]
Two goals
• Cognition:
– explain human cognition (and behavior) in
terms of the underlying computation
• Computation:
– gain insight into how to solve some
challenging computational problems
Computational problems
• Easy:
– arithmetic, algebra, chess
• Difficult:
– learning and using language
– sophisticated senses: vision, hearing
– similarity and categorization
– representing the structure of the world
– scientific investigation
human cognition sets the standard
Three approaches
Rules and symbols
Networks, features, and spaces
Probability and statistics
Logic
Aristotle
(384-322 BC)
All As are Bs
All Bs are Cs
All As are Cs
Mechanical reasoning
Ramon Llull
(1232-1315)
The mathematics of reason
Thomas Hobbes
(1588-1679)
Rene Descartes
(1596-1650)
Gottfried Leibniz
(1646-1716)
Modern logic
George Boole
(1816-1854)
Gottlob Frege
(1848-1925)
PQ
P
Q
Computation
Alan Turing
(1912-1954)
Logic
Inference Rules: P → Q
Facts: P, Q
The World
Categorization
Categorization
cat: small, furry, domestic, carnivore
Logic
Inference Rules: P → Q
Facts: P, Q
The World
Early AI systems…
[diagram: Operations applied to a Workspace of Facts and Goals; Actions and Observations connect the system to The World]
Rules and symbols
• Perhaps we can consider thought as a set of rules applied to symbols…
– generating infinite possibilities with finite means
– characterizing cognition as a “formal system”
• This idea was applied to:
– deductive reasoning (logic)
– language (generative grammar)
– problem solving and action (production systems)
Language as a formal system
Noam Chomsky
Language
“a set (finite or infinite) of sentences, each finite in length and constructed out of a finite set of elements”

[diagram: the set L of grammatical sentences within the set of all sequences — “This is a good sentence” → 1 (in L); “Sentence bad this is” → 0 (not in L)]

linguistic analysis aims to separate the grammatical sequences which are sentences of L from the ungrammatical sequences which are not
A context free grammar
S  → NP VP
NP → T N
VP → V NP
T  → the
N  → man, ball, …
V  → hit, took, …

Parse tree for “the man hit the ball”:
[S [NP [T the] [N man]] [VP [V hit] [NP [T the] [N ball]]]]
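As a minimal sketch of how such a grammar can be used generatively (the dictionary encoding and the generate function below are illustrative choices, not from the course materials):

```python
import random

# The grammar from the slide, encoded as a dictionary from nonterminals to
# their possible expansions (each expansion is a sequence of symbols).
grammar = {
    "S":  [["NP", "VP"]],
    "NP": [["T", "N"]],
    "VP": [["V", "NP"]],
    "T":  [["the"]],
    "N":  [["man"], ["ball"]],
    "V":  [["hit"], ["took"]],
}

def generate(symbol="S"):
    """Recursively expand a symbol by picking one of its rules at random."""
    if symbol not in grammar:                    # terminal: return the word itself
        return [symbol]
    expansion = random.choice(grammar[symbol])
    return [word for part in expansion for word in generate(part)]

print(" ".join(generate()))                      # e.g. "the man hit the ball"
```

Sampling repeatedly produces the handful of sentences this toy grammar allows; recursive rules are what would make the set infinite.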
Rules and symbols
• Perhaps we can consider thought as a set of rules applied to symbols…
– generating infinite possibilities with finite means
– characterizing cognition as a “formal system”
• This idea was applied to:
– deductive reasoning (logic)
– language (generative grammar)
– problem solving and action (production systems)
• Big question: what are the rules of cognition?
Computational problems
• Easy:
– arithmetic, algebra, chess
• Difficult:
– learning and using language
– sophisticated senses: vision, hearing
– similarity and categorization
– representing the structure of the world
– scientific investigation
human cognition sets the standard
Inductive problems
• Drawing conclusions that are not fully
justified by the available data
– e.g. detective work
“In solving a problem of this sort, the grand thing is to be able to reason backward. That is a very useful accomplishment, and a very easy one, but people do not practice it much.”
• Much more challenging than deduction!
Challenges for symbolic approaches
• Learning systems of rules and symbols is hard!
– some people who think of human cognition in these
terms end up arguing against learning…
The poverty of the stimulus
• The rules and principles that constitute the
mature system of knowledge of language are
actually very complicated
• There isn’t enough evidence to identify these
principles in the data available to children
Therefore
• Acquisition of these rules and principles must
be a consequence of the genetically determined
structure of the language faculty
The poverty of the stimulus
Learning language requires strong constraints on
the set of possible languages
These constraints are “Universal Grammar”
Challenges for symbolic approaches
• Learning systems of rules and symbols is hard!
– some people who think of human cognition in these
terms end up arguing against learning…
• Many human concepts have fuzzy boundaries
– notions of similarity and typicality are hard to
reconcile with binary rules
• Solving inductive problems requires dealing
with uncertainty and partial knowledge
Three approaches
Rules and symbols
Networks, features, and spaces
Probability and statistics
Similarity
What determines similarity?
Representations
What kind of representations are used
by the human mind?
Representations
How can we capture the meaning of words?
Semantic networks
Semantic spaces
Categorization
Computing with spaces
+1 = cat, -1 = dog
y = g(Wx)
error: E = (y - g(Wx))²

[diagram: a single output unit y = g(Wx) computed from perceptual features x1 and x2]
[plot: cat and dog examples in the (x1, x2) feature space]
Networks, features, and spaces
• Artificial neural networks can represent any
continuous function…
Problems with simple networks
[diagram: a single output unit y connected to inputs x1 and x2]

Some kinds of data are not linearly separable

[plots: AND, OR, and XOR in the (x1, x2) plane — AND and OR can each be separated by a single line; XOR cannot]
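A small sketch of the point (the training rule, number of epochs, and learning rate are illustrative assumptions): a single linear threshold unit can be trained to compute AND and OR, but no setting of its weights classifies all four XOR cases correctly, because no single line separates them.

```python
import numpy as np

# Truth tables over two binary inputs.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
targets = {
    "AND": np.array([0, 0, 0, 1]),
    "OR":  np.array([0, 1, 1, 1]),
    "XOR": np.array([0, 1, 1, 0]),   # not linearly separable
}

def train_threshold_unit(X, y, epochs=100, lr=0.1):
    """Train a single linear threshold unit (with a bias weight)."""
    Xb = np.hstack([X, np.ones((len(X), 1))])        # append a constant bias input
    w = np.zeros(Xb.shape[1])
    for _ in range(epochs):
        pred = (Xb @ w > 0).astype(int)
        w += lr * (y - pred) @ Xb                    # perceptron-style update
    return (Xb @ w > 0).astype(int)

for name, y in targets.items():
    learned = bool((train_threshold_unit(X, y) == y).all())
    print(f"{name}: {'learned' if learned else 'not learned'}")
```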
A solution: multiple layers
[diagram: input layer (x1, x2) → hidden layer (z1, z2) → output layer (y)]
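One way to see why a hidden layer solves the problem is to wire a two-layer network by hand (the weights below are illustrative choices, not from the slides): one hidden unit computes OR of the inputs, the other computes AND, and the output unit fires when OR is true but AND is not — which is exactly XOR.

```python
import numpy as np

def step(a):
    """Threshold activation: 1 if the net input is positive, else 0."""
    return (a > 0).astype(int)

# Hidden layer: z1 = OR(x1, x2), z2 = AND(x1, x2); the last weight is a bias.
W_hidden = np.array([[1.0, 1.0, -0.5],    # x1 + x2 - 0.5 > 0  -> OR
                     [1.0, 1.0, -1.5]])   # x1 + x2 - 1.5 > 0  -> AND
# Output layer: y = 1 when z1 is on and z2 is off, i.e. XOR.
w_output = np.array([1.0, -1.0, -0.5])

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    x = np.array([x1, x2, 1.0])                      # inputs plus bias
    z = step(W_hidden @ x)                           # hidden layer activations
    y = step(w_output @ np.append(z, 1.0))           # output unit
    print(f"x = ({x1}, {x2}) -> y = {int(y)}")
```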
Networks, features, and spaces
• Artificial neural networks can represent any
continuous function…
• Simple algorithms for learning from data
– fuzzy boundaries
– effects of typicality
General-purpose learning mechanisms
[plot: error E as a function of a weight w_ij, marking where ∂E/∂w_ij > 0, where ∂E/∂w_ij < 0, and the minimum where ∂E/∂w_ij = 0]

Gradient descent: Δw_ij = -η ∂E/∂w_ij    (η is the learning rate)
The Delta Rule
E = (y - g(Wx))²    (+1 = cat, -1 = dog)

[diagram: output y = g(Wx) computed from perceptual features x1 and x2]

For any function g with derivative g′:

∂E/∂w_ij = -2 (y - g(Wx)) g′(Wx) x_j

Δw_ij = η (y - g(Wx)) g′(Wx) x_j

where (y - g(Wx)) is the output error and x_j is the influence of input j
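A sketch of the Delta Rule in action (the two-dimensional "perceptual features", g = tanh, the learning rate, and the number of passes are all illustrative assumptions): each weight is nudged by η (y - g(Wx)) g′(Wx) x_j after every training example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up "perceptual features": cats cluster around (1, 2), dogs around (2, 0.5).
cats = rng.normal([1.0, 2.0], 0.3, size=(20, 2))
dogs = rng.normal([2.0, 0.5], 0.3, size=(20, 2))
X = np.vstack([cats, dogs])
y = np.concatenate([np.ones(20), -np.ones(20)])      # +1 = cat, -1 = dog

def g(a):                                            # output nonlinearity
    return np.tanh(a)

def g_prime(a):                                      # its derivative
    return 1.0 - np.tanh(a) ** 2

w = np.zeros(2)                                      # one weight per feature
eta = 0.1                                            # learning rate
for _ in range(100):                                 # passes through the data
    for x_i, y_i in zip(X, y):
        a = w @ x_i
        w += eta * (y_i - g(a)) * g_prime(a) * x_i   # Delta Rule update

accuracy = np.mean(np.sign(g(X @ w)) == y)
print(f"weights: {w}, training accuracy: {accuracy:.2f}")
```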
Networks, features, and spaces
• Artificial neural networks can represent any
continuous function…
• Simple algorithms for learning from data
– fuzzy boundaries
– effects of typicality
• A way to explain how people could learn
things that look like rules and symbols…
Simple recurrent networks
(i1)
x
output x
x
layer
1
2
copy
hidden
layer
input
layer
z1
x1
z2
x2
context units
input
x
(i)
(Elman, 1990)
Hidden unit activations after 6 iterations of 27,500 words
(Elman, 1990)
Networks, features, and spaces
• Artificial neural networks can represent any
continuous function…
• Simple algorithms for learning from data
– fuzzy boundaries
– effects of typicality
• A way to explain how people could learn
things that look like rules and symbols…
• Big question: how much of cognition can
be explained by the input data?
Challenges for neural networks
• Being able to learn anything can make it
harder to learn specific things
– this is the “bias-variance tradeoff”
Bias-variance tradeoff
[series of plots illustrating the bias-variance tradeoff]
What about generalization?
What happened?
• The set of 8th-degree polynomials is flexible enough to fit almost any 10 data points
• Our data are some true function, plus noise
• Fitting the noise gives us the wrong function
• This is called overfitting
– while it has low bias, this class of functions
results in an algorithm that has high variance
(i.e. is strongly affected by the observed data)
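A numerical sketch of what happened (the underlying function sin(2πx), the noise level, and the polynomial degrees are all assumptions for illustration): as the degree grows, the fit to the 10 observed points keeps improving, while the error against the underlying function typically gets worse once the polynomial starts fitting the noise.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ten observations of an assumed "true" function, plus Gaussian noise.
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)

x_test = np.linspace(0, 1, 200)
y_true = np.sin(2 * np.pi * x_test)

for degree in (1, 3, 8):
    coeffs = np.polyfit(x, y, degree)                          # least-squares fit
    train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
    test_error = np.mean((np.polyval(coeffs, x_test) - y_true) ** 2)
    print(f"degree {degree}: train error {train_error:.3f}, "
          f"error vs. true function {test_error:.3f}")
```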
The moral
• General purpose learning mechanisms do
not work well with small amounts of data
(the most flexible algorithm isn’t always the best)
• To make good predictions from small
amounts of data, you need algorithms with
bias that matches the problem being solved
• This suggests a different approach to
studying induction…
– (what people learn as n → 0, rather than as n → ∞)
Challenges for neural networks
• Being able to learn anything can make it
harder to learn specific things
– this is the “bias-variance tradeoff”
• Neural networks allow us to encode
constraints on learning in terms of
neurons, weights, and architecture, but
is this always the right language?
Three approaches
Rules and symbols
Networks, features, and spaces
Probability and statistics
Probability
Gerolamo Cardano
(1501-1576)
Probability
Thomas Bayes
(1701-1763)
Pierre-Simon Laplace
(1749-1827)
Bayes’ rule
How rational agents should update their
beliefs in the light of data
P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)

h: hypothesis, d: data
P(h | d): posterior probability;  P(d | h): likelihood;  P(h): prior probability
The denominator sums over the space of hypotheses H.
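A minimal worked example of the update (both hypotheses and all of the numbers below are made up for illustration):

```python
# Two hypotheses h1, h2 with equal priors; an observation d that is four times
# as probable under h1 as under h2.
priors = {"h1": 0.5, "h2": 0.5}          # P(h), assumed
likelihoods = {"h1": 0.08, "h2": 0.02}   # P(d | h), assumed

# Bayes' rule: P(h | d) = P(d | h) P(h) / sum over h' of P(d | h') P(h')
evidence = sum(likelihoods[h] * priors[h] for h in priors)
posteriors = {h: likelihoods[h] * priors[h] / evidence for h in priors}
print(posteriors)                        # {'h1': 0.8, 'h2': 0.2}
```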
Cognition as statistical inference
• Bayes’ theorem tells us how to combine
prior knowledge with data
– a different language for describing the
constraints on human inductive inference
Prior over functions
[plots: functions sampled from the prior; k = 8, with one hyperparameter fixed at 5 and a second varied over 1, 0.3, 0.1, 0.01]
Maximum a posteriori (MAP) estimation
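As a sketch of the kind of computation MAP estimation with a prior over functions involves (the Gaussian prior on polynomial coefficients, its standard deviation sigma, and the data below are my assumptions, not necessarily what the slides used): with a Gaussian prior and Gaussian noise, the MAP estimate is a ridge-regularized least-squares fit, and making sigma smaller (a stronger prior) shrinks the estimated coefficients toward zero.

```python
import numpy as np

rng = np.random.default_rng(1)

# The same ten noisy observations as in the overfitting example above.
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, size=10)

degree, noise_sd = 8, 0.2
Phi = np.vander(x, degree + 1)                    # polynomial design matrix

for sigma in (5.0, 1.0, 0.1):
    # Gaussian prior N(0, sigma^2) on each coefficient + Gaussian noise
    # => MAP estimate = ridge regression with lambda = (noise_sd / sigma)^2.
    lam = (noise_sd / sigma) ** 2
    w_map = np.linalg.solve(Phi.T @ Phi + lam * np.eye(degree + 1), Phi.T @ y)
    print(f"sigma = {sigma}: ||w_MAP|| = {np.linalg.norm(w_map):.2f}")
```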
Cognition as statistical inference
• Bayes’ theorem tells us how to combine
prior knowledge with data
– a different language for describing the
constraints on human inductive inference
• Probabilistic approaches also help to
describe learning
Probabilistic context free grammars
S  → NP VP     1.0
NP → T N       0.7
NP → N         0.3
VP → V NP      1.0
T  → the       0.8
T  → a         0.2
N  → man       0.5
N  → ball      0.5
V  → hit       0.6
V  → took      0.4

Parse tree for “the man hit the ball”:
[S [NP [T the] [N man]] [VP [V hit] [NP [T the] [N ball]]]]

P(tree) = 1.0 × 0.7 × 1.0 × 0.8 × 0.5 × 0.6 × 0.7 × 0.8 × 0.5 ≈ 0.047
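A small sketch (not from the slides) that just multiplies out the rule probabilities used in this parse, confirming the product above:

```python
from math import prod

# Probability of each rule in the grammar above.
rule_prob = {
    ("S",  ("NP", "VP")): 1.0,
    ("NP", ("T", "N")):   0.7,
    ("NP", ("N",)):       0.3,
    ("VP", ("V", "NP")):  1.0,
    ("T",  ("the",)):     0.8,
    ("T",  ("a",)):       0.2,
    ("N",  ("man",)):     0.5,
    ("N",  ("ball",)):    0.5,
    ("V",  ("hit",)):     0.6,
    ("V",  ("took",)):    0.4,
}

# The rules used in the parse of "the man hit the ball".
tree_rules = [
    ("S",  ("NP", "VP")),
    ("NP", ("T", "N")), ("T", ("the",)), ("N", ("man",)),
    ("VP", ("V", "NP")), ("V", ("hit",)),
    ("NP", ("T", "N")), ("T", ("the",)), ("N", ("ball",)),
]

p_tree = prod(rule_prob[r] for r in tree_rules)
print(p_tree)    # ≈ 0.047
```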
Probability and learnability
• Any probabilistic context free grammar can
be learned from a sample from that grammar
as the sample size becomes infinite
• Priors trade off with the amount of data that
needs to be seen to believe a hypothesis
Cognition as statistical inference
• Bayes’ theorem tells us how to combine
prior knowledge with data
– a language for describing the constraints on
human inductive inference
• Probabilistic approaches also help to
describe learning
• Big question: what do the constraints on
human inductive inference look like?
Challenges for probabilistic approaches
• Computing probabilities is hard… how
could brains possibly do that?
• How well do the “rational” solutions from
probability theory describe how people
think in everyday life?
Three approaches
Rules and symbols
Networks, features, and spaces
Probability and statistics