Is the brain a Bayesian?
Nick Chater
Cognitive, Perceptual and Brain Sciences
University College London

Overview
1. Bayesian models of cognition
2. But which Bayesian calculations?
3. The human struggle with probability
4. And decisions

1. Bayesian models of cognition

Bayesian models sweeping cognitive science/AI
• Perception: Bayesian computer vision; perceptual organization
• Language: Bayesian language processing ...and acquisition
• Reasoning: the structure of inference; probabilistic calculations

Sophisticated structured models (Griffiths et al, 2009)

Powerful asymptotic results: e.g., Overgeneralization Theorem (Chater & Vitányi, 2007)
Suppose learner has probability \Delta_j of erroneously guessing an ungrammatical jth word:
\sum_{j=1}^{\infty} \langle \Delta_j \rangle \le K(\mu) \log_e 2
Grammaticality judgements from positive evidence alone

Models of empirical data: Capturing “logical” reasoning patterns (Oaksford, Chater & Grainger, 2003)
[Figure: proportion of inferences endorsed (%), data vs. model, for the inferences MP, DA, AC and MT. Four panels: Low P(p), Low P(q); Low P(p), High P(q); High P(p), Low P(q); High P(p), High P(q).]

2. But which Bayesian calculations?

The most probable image?
Which image is more probable? vs What is in each image?

And in language...
The girl saw the boy with the telescope
vs
The banana was eaten

But alternative parses immediately compared...
(The girl) saw (the boy) (with the telescope)
(The girl) saw (the boy with the telescope)
A rapid comparative calculation...

And the brain is good at extrapolation/interpolation/prediction
So we have strong expectations of...
And we don’t expect anything else...
But not...
What brains do: I
H_i sampled from, or maximizing: Pr(H_i|D)
i.e., in perception or language comprehension, the brain starts from the data

What brains do: II
D_i sampled from, or maximizing: Pr(D_i|H)
i.e., in imagery and language production, the brain starts from the hypotheses and generates data

What brains do: III
D_i sampled from, or maximizing: Pr(D_i|D_1...D_{i-1})
i.e., in both perception and production, the brain predicts what is coming next

What brains don’t do
• Compute the probability of the data: Pr(D)
• Evaluate likelihoods: Pr(D|H)
• Access the probabilities themselves; just the samples

3. The human struggle with probability

People’s probabilistic reasoning appears hopelessly incoherent
i. The conjunction fallacy
ii. The gambler’s fallacy

i. The conjunction fallacy (Tversky & Kahneman, 1983)
K = Linda is very bright, studied philosophy at LSE, and had strong left-wing sympathies
C = Linda is now a bank cashier
C&F = Linda is now a bank cashier & an active feminist
The paradoxical judgement... “Pr(C)” < “Pr(C&F)”
But people can’t judge probability of data...
Instead, the language processing system is checking for a best hypothesis about the discourse...

A language processing perspective
“Linda is very bright, studied philosophy at LSE, and had strong left-wing sympathies. Linda is now a bank cashier.”
Gosh, she’s changed! Is this the same Linda?
“Linda is very bright, studied philosophy at LSE, and had strong left-wing sympathies. Linda is now a bank cashier & an active feminist.”
That’s Linda all right!
The need to update our model may be driving judgements

Conjunction fallacy, coin version (Kahneman and Tversky)
How probable is this: “Looks unlikely...”
“Ah, that’s not so unlikely...”
If people are updating choice of H, given D (cf. Griffiths and Tenenbaum)
Then, e.g., consistent with a 2-headed coin
vs
but this one isn’t
And also, insensitivity to sample size...

Why so insensitive to sample size?
A fair coin? 2/3 4/6 8/12
But integration across objects is very
limited
Even energy per letter
Even energy per word
Pelli, D. G., Farell, B., & Moore, D. C. (2003). The remarkable inefficiency of word recognition. Nature, 423, 752-756.

ii. The gambler’s fallacy
?
What happens next?
What continuation looks predictable? The opposite looks random...

Gambler’s fallacy
Suspiciously regular
Reassuringly irregular

Gambler’s fallacy (generalized a bit)
?
Subjective randomness as unpredictability (cf. Martin-Löf randomness; Kolmogorov complexity; Falk & Konold, Psychological Review, 1997)
Random continuations are ‘counter-inductive’

4. And decisions

Two starting points
Starting from normative theory: Prospect theory, Kahneman and Tversky, 1979
...or cognitive machinery? Decision by Sampling, Stewart, Chater & Brown, 2006

Prospect theory: Decision theory with distortions
[Figure: value function (value against £) and weighting function]

Decision by Sampling
• No underlying scales for utility, probability, time
• No stable trade-offs between scales
• Analogy with psychophysics (e.g., Laming)

What the cognitive system does have
• Only binary judgments – “less than”, “equal to”, and “greater than”
• Values are compared with a small sample of “anchors”
  – from memory
  – from context
• All dimensions (gains, losses, delay, probability, quality, etc.) are equal, despite different roles in “rational” model

Only Rank Matters
What is the utility of £300?
Here it’s 3rd of 5 items
[Figure: sampled amounts $80, $300, $20, $400 and $700, with the labels Grocery, Wallet, Bill, Mortgage, Money and Wide Screen TV]

Key issue: How do people sample comparison anchors?
• From memory
  – Assume that samples from memory mirror distribution in the “world” (Anderson)
  – Estimate using external “proxies” (e.g., via Google)
• From task context
  – e.g., choice can be affected by “irrelevant” options
  – Explore experimentally

Diminishing “utility” of money
• The rational choice approach:
• Implies risk aversion:
• £50 for certain preferred to 50% chance of £100?
• DbS considers distribution of amounts of money
  – Only rank matters
  – So changes in money value will be valued by change in rank position in samples of amounts

Estimate Distribution of Gains from Credits to Bank Accounts in a UK High Street Bank*
*Bank data analysed by Rich Lewis

Rank vs. Money Gain (complete sample)
Analogue of diminishing marginal utility of money in rational choice theory

Losses loom larger than gains (credits and debits, bank data)
[Figure: two panels, Gains and Losses]
More small losses than small gains: £10 is a “bigger” subjective loss than gain

Combining gains and losses creates an analogue of the “value function” in prospect theory (Kahneman & Tversky, 1979)
[Figure: value function; value (relative rank) against £]
• Negatively accelerating for gains
• Positively accelerating for losses
• Losses worse than gains
• Discontinuity at zero

Items in a standard probability experiment (Gonzalez & Wu, 1999, Expt B)
Probability phrases, Google hits

Summary
1. Bayesian models of cognition
2. But the brain can only carry out a limited set of calculations
3. So people struggle with probability
4. And decisions
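
The "only rank matters" idea in Decision by Sampling can be sketched in a few lines: an amount is scored by the proportion of sampled anchors it exceeds, using nothing but pairwise "less than" comparisons. This is only an illustration. The function names, the `credits` sample (invented, skewed towards small amounts) and the shortcut of weighting the gamble by its stated 50% are assumptions made here, not the model as specified by Stewart, Chater & Brown (2006).

```python
def rank_in_sample(target, anchors):
    """1-based rank of target among itself plus the anchors.

    Uses only binary 'less than' comparisons, one per anchor.
    """
    return 1 + sum(1 for a in anchors if a < target)


def relative_rank(target, anchors):
    """Proportion of anchors that are smaller than target (0.0 to 1.0)."""
    if not anchors:
        raise ValueError("need at least one anchor to compare against")
    return sum(1 for a in anchors if a < target) / len(anchors)


# The "Only Rank Matters" slide: 300 compared with the other four amounts.
slide_anchors = [80, 20, 400, 700]
print(rank_in_sample(300, slide_anchors))  # 3, i.e. 3rd of 5 items
print(relative_rank(300, slide_anchors))   # 0.5

# Hypothetical sample of credits (invented for illustration):
# many small amounts, few large ones.
credits = [5, 10, 20, 30, 40, 60, 90, 150, 400, 1000]

value_50 = relative_rank(50, credits)    # 0.5
value_100 = relative_rank(100, credits)  # 0.7

# Simplification: take the 50% chance at face value instead of
# ranking the probability against sampled probabilities as well.
gamble = 0.5 * value_100

print(value_50, value_100, gamble)
print("prefers sure 50:", value_50 > gamble)
```

With this skewed sample, doubling the amount from 50 to 100 raises its relative rank by less than half, so the sure £50 beats the 50% chance of £100. A different anchor sample could reverse the preference, which is the point of asking how anchors are sampled.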