Is the brain a Bayesian? Nick Chater Cognitive, Perceptual and Brain Sciences

Is the brain a Bayesian?
Nick Chater
Cognitive, Perceptual and Brain Sciences
University College London
Overview
1. Bayesian models of cognition
2. But which Bayesian calculations?
3. The human struggle with probability
4. And decisions
1. Bayesian models of cognition
Bayesian models sweeping cognitive science/AI
• Perception
– Bayesian computer vision
– Perceptual organization
• Language
– Bayesian language processing
– ...and acquisition
• Reasoning
– The structure of inference
– Probabilistic calculations
Sophisticated structured models
(Griffiths et al, 2009)
Powerful asymptotic results: e.g., Overgeneralization Theorem (Chater & Vitányi, 2007)
Suppose learner has probability $\Delta_j$ of erroneously guessing an ungrammatical jth word
$$\sum_{j=1}^{\infty} \Delta_j \le \frac{K(\mu)}{\log_2 e}$$
Grammaticality judgements from positive evidence alone
Models of empirical data: Capturing “logical” reasoning patterns (Oaksford, Chater & Grainger, 2003)
[Figure: four panels (Low P(p), Low P(q); Low P(p), High P(q); High P(p), Low P(q); High P(p), High P(q)), each plotting Proportion Endorsed (%) against Inference (MP, DA, AC, MT) for Data and Model]
2. But which Bayesian calculations?
The most probable image?
Which image is more probable? vs
What is in each image?
And in language...

The girl saw the boy with the telescope

vs

The banana was eaten
But alternative parses immediately
compared...
(The girl) saw (the boy) (with the telescope)
(The girl) saw (the boy with the telescope)
A rapid comparative calculation...
And the brain is good at
extrapolation/interpolation/prediction
So we have strong expectations of...
And we don’t expect anything else...
But not...
What brains do: I
H_i sampled from, or maximizing: Pr(H_i | D)
i.e., in perception or language comprehension, the brain starts from the data
What brains do: II
D_i sampled from, or maximizing: Pr(D_i | H)
i.e., in imagery and language production, the brain starts from the hypotheses and generates data
What brains do: III
D_i sampled from, or maximizing: Pr(D_i | D_1 ... D_{i-1})
i.e., in both perception and production, the brain predicts what is coming next
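One standard way to relate this predictive quantity to the two before it is to average over hypotheses. This is a textbook identity rather than something stated on the slide, and it assumes the data are conditionally independent given H:

```latex
\Pr(D_i \mid D_1, \ldots, D_{i-1}) = \sum_{H} \Pr(D_i \mid H)\, \Pr(H \mid D_1, \ldots, D_{i-1})
```

In sampling terms: draw a hypothesis given the data so far (as in I), then draw the next datum from that hypothesis (as in II).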
What brains don’t
• Compute the probability of the data: Pr(D)
• Evaluate likelihoods: Pr(D|H)
• Access the probabilities themselves; just the samples
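To make "just the samples" concrete, here is a minimal sketch of posterior inference that never evaluates Pr(D|H) or Pr(D). It uses rejection sampling by simulation, which is an illustrative technique chosen here, not a mechanism the talk attributes to the brain. The two coin hypotheses, their equal priors, and the function names are all assumptions made for the example.

```python
import random

def simulate_flips(p_heads, n, rng):
    """Generate n flips as a string of 'H'/'T' from a coin with P(heads) = p_heads."""
    return "".join("H" if rng.random() < p_heads else "T" for _ in range(n))

def posterior_by_simulation(observed, hypotheses, n_draws=20000, seed=0):
    """Approximate Pr(H | D) using samples only.

    hypotheses maps a name to (prior weight, P(heads)); the values are hypothetical.
    No likelihood Pr(D | H) and no marginal Pr(D) is computed: draw H from the
    prior, generate data from H, and keep H when the data match the observation.
    """
    rng = random.Random(seed)
    names = list(hypotheses)
    weights = [hypotheses[name][0] for name in names]
    accepted = {name: 0 for name in names}
    for _ in range(n_draws):
        name = rng.choices(names, weights=weights)[0]
        if simulate_flips(hypotheses[name][1], len(observed), rng) == observed:
            accepted[name] += 1
    total = sum(accepted.values())
    if total == 0:
        raise RuntimeError("no simulated data matched the observation")
    return {name: count / total for name, count in accepted.items()}

# Hypothetical hypothesis space: a fair coin and a two-headed coin, equal priors.
hypotheses = {"fair": (0.5, 0.5), "two_headed": (0.5, 1.0)}
posterior = posterior_by_simulation("HHHHH", hypotheses)
print(posterior)
```

The accepted hypotheses are themselves samples from Pr(H|D), so a system working this way could pick a good hypothesis without ever holding a number for the probability of the data.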
3. The human struggle with probability
People’s probabilistic reasoning appears hopelessly incoherent
i. The conjunction fallacy
ii. The gambler’s fallacy
i. The conjunction fallacy
(Tversky & Kahneman, 1983)
K = Linda is very bright, studied philosophy at LSE, and had strong left-wing sympathies
C = Linda is now a bank cashier
C&F = Linda is now a bank cashier & an active feminist
The paradoxical judgement...

“Pr(C)” < “Pr(C&F)”

But people can’t judge probability of data...

Instead, the language processing system is
checking for a best hypothesis about the
discourse...
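For reference, the judgement is paradoxical because the probability calculus forces the opposite ordering. A short derivation, writing F for "Linda is an active feminist" (the F in “Pr(C&F)”) and conditioning on the background K:

```latex
\Pr(C \,\&\, F \mid K) = \Pr(C \mid K)\,\Pr(F \mid C, K) \le \Pr(C \mid K),
\qquad \text{since } \Pr(F \mid C, K) \le 1 .
```

The inequality holds whatever K says about Linda, so no background story can make the conjunction more probable than the single statement.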
A language processing perspective
“Linda is very bright, studied philosophy at LSE, and had strong left-wing sympathies. Linda is now a bank cashier.”
Reaction: “Gosh, she’s changed! Is this the same Linda?”
“Linda is very bright, studied philosophy at LSE, and had strong left-wing sympathies. Linda is now a bank cashier & an active feminist.”
Reaction: “That’s Linda all right!”
The need to update our model may be driving judgements
Conjunction fallacy, coin version
(Kahneman and Tversky)
How probable is this:
“Looks unlikely...”
Conjunction fallacy, coin version
“Ah – that’s not so unlikely...”
If people are updating choice of H, given D
(cf Griffiths and Tenenbaum)

Then, e.g., consistent with a 2-headed coin
Vs

but this one isn’t
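The contrast between "how probable is this data?" and "how well does some hypothesis explain it?" can be sketched numerically. The example below is illustrative only: the all-heads sequences, the 1% prior on a two-headed coin, and the function name are assumptions, and the calculation is the normative one rather than anything the talk claims people compute.

```python
def posterior_two_headed(sequence, prior_two_headed=0.01):
    """Pr(two-headed coin | sequence), with a fair coin as the only alternative.

    prior_two_headed is a hypothetical prior chosen for illustration.
    """
    like_two_headed = 1.0 if "T" not in sequence else 0.0
    like_fair = 0.5 ** len(sequence)
    joint_two_headed = prior_two_headed * like_two_headed
    joint_fair = (1.0 - prior_two_headed) * like_fair
    return joint_two_headed / (joint_two_headed + joint_fair)

for seq in ["HHHHH", "HHHHHHHHHH", "HHHHHHHHHT"]:
    # Columns: sequence, Pr(sequence | fair coin), Pr(two-headed | sequence)
    print(seq, 0.5 ** len(seq), round(posterior_two_headed(seq), 3))
```

Ten heads in a row is less probable under a fair coin than five, yet it is better explained by a two-headed coin, so a system that reports the fit of its best hypothesis would rate the longer run as "not so unlikely". A single tail rules the two-headed coin out entirely.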
And also, insensitivity to sample size...
Why so insensitive to sample size?
A fair coin?
2/3
4/6
8/12
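The three outcomes above share the same proportion of heads, but they are not equally surprising under a fair coin. A minimal sketch of the normative calculation, using a one-sided binomial tail probability (the choice of statistic and the function name are assumptions made for illustration):

```python
from math import comb

def prob_at_least(k, n, p=0.5):
    """Probability of k or more heads in n flips of a coin with P(heads) = p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

for k, n in [(2, 3), (4, 6), (8, 12)]:
    print(f"{k}/{n}: {prob_at_least(k, n):.3f}")
```

The tail probability falls from 0.500 to 0.344 to 0.194 as the sample grows, so the larger samples are stronger evidence against fairness even though the proportion is unchanged.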
But integration across objects is very limited
Even energy per letter
Even energy per word

Pelli, D. G., Farell, B., & Moore, D. C. (2003). The remarkable inefficiency of word recognition. Nature, 423, 752-756.
ii. The gambler’s fallacy
?
What happens next? What
continuation looks predictable? The
opposite looks random...
Gambler’s fallacy
Suspiciously
regular
Reassuringly
irregular
Gambler’s fallacy
(generalized a bit)
?
Gambler’s fallacy
Suspiciously
regular
Reassuringly
irregular
Subjective randomness as unpredictability
(cf. Martin-Löf randomness; Kolmogorov complexity; Falk & Konold, Psychological Review, 1997)
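Kolmogorov complexity itself is uncomputable, but a general-purpose compressor gives a rough, computable upper-bound proxy. The sketch below uses `zlib` for that purpose; the choice of compressor, the sequence lengths, and the function name are assumptions for illustration, not part of the cited work.

```python
import random
import zlib

def compressed_size(sequence):
    """Bytes zlib needs for the sequence: a crude stand-in for Kolmogorov complexity."""
    return len(zlib.compress(sequence.encode("ascii"), 9))

rng = random.Random(0)
regular = "HT" * 500                                          # strictly alternating
irregular = "".join(rng.choice("HT") for _ in range(1000))    # pseudo-random flips

print(compressed_size(regular), compressed_size(irregular))
```

The alternating sequence compresses to far fewer bytes than the pseudo-random one: it is predictable, hence "suspiciously regular", while the sequence that resists compression is the one that looks random.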
Random continuations are ‘counter-inductive’
4. And decisions
Two starting points
• Starting from normative theory
– Prospect theory, Kahneman and Tversky, 1979
• ...or cognitive machinery?
– Decision by Sampling, Stewart, Chater & Brown, 2006
Prospect theory: Decision theory with distortions
[Figure: Value Function (Value against £) and Weighting Function]
Decision by Sampling
• No underlying scales for
– utility
– probability
– time
→ no stable trade-offs between scales
• analogy with psychophysics (e.g., Laming)
What the cognitive system does have
• Only binary judgments
– “less than”, “equal to”, and “greater than”
• Values are compared with a small sample of “anchors”
– from memory
– from context
• All dimensions (gains, losses, delay, probability, quality, etc) are equal, despite different roles in “rational” model
Only Rank Matters
What is the utility of £300? Here it’s 3rd of 5 items
[Figure: items ordered by amount: Wallet $20; Grocery Bill $80; Money $300; Mortgage $400; Wide Screen TV $700]
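The rank idea can be sketched with nothing but binary comparisons. In the minimal version below, the subjective value of an amount is the proportion of anchors it beats; the function name and the handling of ties (a tie does not count as a win) are assumptions made for illustration.

```python
def relative_rank(target, anchors):
    """Proportion of anchors that the target beats in 'greater than' comparisons."""
    if not anchors:
        raise ValueError("need at least one anchor")
    wins = sum(1 for anchor in anchors if target > anchor)
    return wins / len(anchors)

anchors = [20, 80, 400, 700]  # the four other amounts listed above
print(relative_rank(300, anchors))  # 0.5
```

The 300 beats two of the four anchors, so its value is 0.5. With a different sample of anchors the same amount would receive a different value, which is why no stable underlying utility scale is needed.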
→ Key issue: How do people sample comparison anchors?
From memory
• Assume that samples from memory mirror distribution in the “world” (Anderson)
• Estimate using external “proxies” (e.g., via Google™)
From task context
• e.g., choice can be affected by “irrelevant” options
• Explore experimentally
Diminishing “utility” of money
• The rational choice approach:
• Implies risk aversion:
• £50 for certain preferred to 50% chance of £100?
• DbS considers distribution of amounts of money
– Only rank matters
– So changes in money value will be valued by change in rank position in samples of amounts
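A small sketch of how rank alone can produce apparent risk aversion. The sample of amounts is hypothetical (made up to have many small amounts and few large ones), and weighting the gamble by a plain 0.5 is a simplification, since the account above treats probability as rank-based too.

```python
def relative_rank(amount, sample):
    """Proportion of sampled amounts that are smaller than the given amount."""
    return sum(1 for s in sample if s < amount) / len(sample)

# Hypothetical memory sample of gains: many small amounts, few large ones.
sample = [5, 10, 10, 20, 20, 30, 50, 50, 100, 200, 500, 1000]

sure_thing = relative_rank(50, sample)        # value of £50 for certain
gamble = 0.5 * relative_rank(100, sample)     # 50% chance of £100, else nothing
print(sure_thing, gamble)
```

Because small amounts are common in the sample, going from £50 to £100 gains fewer rank positions than going from nothing to £50, so the certain £50 comes out ahead without any concave utility function being assumed.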
Estimate Distribution of Gains from Credits to Bank Accounts in a UK High Street Bank*
*Bank data analysed by Rich Lewis
Rank vs. Money Gain (complete sample)
Analog of diminishing marginal utility of money in the rational choice theory
Losses loom larger than gains (credits and debits, bank data)
[Figure: Gains; Losses]
More small losses than small gains:
→ £10 is a “bigger” subjective loss than gain
Combining gains and losses creates an analog “value function” as in prospect theory (Kahneman & Tversky, 1979)
[Figure: Value Function; Value (relative rank) against £]
• Negatively accelerating for gains
• Positively accelerating for losses
• Losses worse than gains
• Discontinuity at zero
Items in a standard probability experiment (Gonzalez & Wu, 1999, Expt B)
Probability phrases, Google hits
Summary
1. Bayesian models of cognition
2. But the brain can only carry out a limited set of calculations
3. So people struggle with probability
4. And decisions