Causes and coincidences
Tom Griffiths
Cognitive and Linguistic Sciences, Brown University

“It could be that, collectively, the people in New York caused those lottery numbers to come up 9-1-1… If enough people all are thinking the same thing, at the same time, they can cause events to happen… It's called psychokinesis.”

[Figure: intervals of 75 years and 76 years (Halley, 1752)]

The paradox of coincidences
How can coincidences simultaneously lead us to irrational conclusions and significant discoveries?

Outline
1. A Bayesian approach to causal induction
2. Coincidences
   i. what makes a coincidence?
   ii. rationality and irrationality
   iii. the paradox of coincidences
3. Explaining inductive leaps

Causal induction
• Inferring causal structure from data
• A task we perform every day …
  – does caffeine increase productivity?
• … and throughout science
  – three comets or one?
[Figure: Reverend Thomas Bayes]

Bayes’ theorem
$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$
• p(h | d): posterior probability
• p(d | h): likelihood
• p(h): prior probability
• denominator: sum over space of hypotheses
• h: hypothesis, d: data

Bayesian causal induction
Hypotheses: causal structures

Causal graphical models (Pearl, 2000; Spirtes et al., 1993)
• Variables: X, Y, Z
• Structure: X → Z ← Y
• Conditional probabilities: p(x), p(y), p(z | x, y)
Defines probability distribution over variables (for both observation and intervention)

Bayesian causal induction
Hypotheses: causal structures
Priors: a priori plausibility of structures
Data: observations of variables
Likelihoods: probability distribution over variables

Causal induction from contingencies

                  C present (c+)   C absent (c-)
E present (e+)          a                c
E absent (e-)           b                d

“Does C cause E?” (rate on a scale from 0 to 100)

Buehner & Cheng (1997)
C: chemical, E: gene

                  C present (c+)   C absent (c-)
E present (e+)          6                4
E absent (e-)           2                4

“Does the chemical cause gene expression?” (rate on a scale from 0 to 100)

Buehner & Cheng (1997)
Examined human judgments for all values of P(e+|c+) and P(e+|c-) in increments of 0.25
[Figure: causal rating, People]
How can we explain these judgments?
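The contingency table above can be summarized with a few lines of arithmetic. The sketch below is illustrative only: it computes the two conditional probabilities named on the slide, plus ΔP and causal power using their standard textbook definitions, which the slides name but do not spell out. The function name `contingency_summary` is hypothetical.

```python
from fractions import Fraction

def contingency_summary(a, b, c, d):
    """Summarize a 2x2 contingency table laid out as on the slide:
    a = E present, C present;  c = E present, C absent;
    b = E absent,  C present;  d = E absent,  C absent.
    """
    p_e_given_c = Fraction(a, a + b)        # P(e+ | c+)
    p_e_given_not_c = Fraction(c, c + d)    # P(e+ | c-)
    # Assumed standard definition: Delta-P is the difference in the
    # probability of the effect with and without the cause.
    delta_p = p_e_given_c - p_e_given_not_c
    # Assumed standard definition of generative causal power:
    # Delta-P rescaled by the room left for the cause to act.
    # Undefined when the effect always occurs without the cause.
    if p_e_given_not_c == 1:
        power = None
    else:
        power = delta_p / (1 - p_e_given_not_c)
    return p_e_given_c, p_e_given_not_c, delta_p, power

# Chemical / gene example from the slide: a=6, b=2, c=4, d=4
summary = contingency_summary(6, 2, 4, 4)
print(summary)
```

For the chemical and gene table this gives P(e+|c+) = 3/4 and P(e+|c-) = 1/2. Exact fractions are used so that the values can be compared without rounding.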
Bayesian causal induction
Hypotheses: chance, cause
[Diagram: one causal graph over C, B, and E for each hypothesis]
Priors: p, 1 − p
Data: frequency of cause-effect co-occurrence
Likelihoods: each cause has an independent opportunity to produce the effect

Bayesian causal induction
Hypotheses: chance, cause
$$p(\text{cause} \mid d) = \frac{p(d \mid \text{cause})\, p(\text{cause})}{p(d \mid \text{cause})\, p(\text{cause}) + p(d \mid \text{chance})\, p(\text{chance})}$$

Bayesian causal induction
Hypotheses: chance, cause
$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$
(likelihood ratio: evidence for a causal relationship)

Buehner and Cheng (1997)
[Figure: People compared with model predictions]
• Bayes (r = 0.97)
• ΔP (r = 0.89)
• Power (r = 0.88)

Other predictions
• Causal induction from contingency data
  – sample size effects
  – judgments for incomplete contingency tables
  (Griffiths & Tenenbaum, in press)
• More complex cases
  – detectors (Tenenbaum & Griffiths, 2003)
  – explosions (Griffiths, Baraff, & Tenenbaum, 2004)
  – simple mechanical devices

The stick-ball machine
[Figure: machine with balls A and B]
(Kushnir, Schulz, Gopnik, & Danks, 2003)

What makes a coincidence?
A common definition: Coincidences are unlikely events
“an event which seems so unlikely that it is worth telling a story about”
“we sense that it is too unlikely to have been the result of luck or mere chance”

Coincidences are not just unlikely...
HHHHHHHHHH vs. HHTHTHTTHT

Bayesian causal induction

                     Likelihood ratio (evidence)
                     low          high
Prior odds   high    ?            cause
             low     chance       coincidence

$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$

What makes a coincidence?
A coincidence is an event that provides evidence for causal structure, but not enough evidence to make us believe that structure exists
$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$
• likelihood ratio is high
• prior odds are low
• posterior odds are middling

Bayesian causal induction

Hypotheses:    cause              chance
Priors:        p (small)          1 − p
Likelihoods:   0 < p(E) < 1       p(E) = 0.5
Data:          frequency of effect in presence of cause

[Diagram: one causal graph over C and E for each hypothesis]

Sequence      Label         Prior odds   Likelihood ratio   Posterior odds
HHHHHHHHHH    coincidence   low          high               middling
HHTHTHTTHT    chance        low          low                low

Sequence              Label                    Prior odds   Likelihood ratio   Posterior odds
HHHH                  mere coincidence         low          middling           low
HHHHHHHHHH            suspicious coincidence   low          high               middling
HHHHHHHHHHHHHHHHHH    cause                    low          very high          high

Mere and suspicious coincidences
[Figure: posterior odds p(cause | d) / p(chance | d) against evidence for a causal relation, with regions labeled mere coincidence and suspicious coincidence]
• Transition produced by
  – increase in likelihood ratio (e.g., coinflipping)
  – increase in prior odds (e.g., genetics vs. ESP)

Testing the definition
• Provide participants with data from experiments
• Manipulate:
  – cover story: genetic engineering vs. ESP (prior)
  – data: number of males/heads (likelihood)
  – task: “coincidence or evidence?” vs. “how likely?”
• Predictions:
  – coincidences affected by prior and likelihood
  – relationship between coincidence and posterior

[Figure: proportion “coincidence” and posterior probability as a function of number of heads/males (47, 51, 55, 59, 63, 70, 87, 99); r = -0.98]

Rationality and irrationality

                     Likelihood ratio (evidence)
                     low          high
Prior odds   high    ?            cause
             low     chance       coincidence

$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$

The bombing of London (Gilovich, 1991)
[Figure: People's judgments; conditions: Change in... Number, Ratio, Location, Spread; (uniform)]

Bayesian causal induction
Hypotheses: chance, cause
[Diagram: one causal graph over T and X nodes for each hypothesis]
Priors: p, 1 − p
Data: bomb locations
Likelihoods: uniform (chance); uniform + regularity (cause)

[Figure: People and Bayes; conditions: Change in... Number, Ratio, Location, Spread; (uniform); r = 0.98]

Coincidences in date
May 14, July 8, August 21, December 25
vs.
August 3, August 3, August 3, August 3
[Figure: People]

Bayesian causal induction
Hypotheses: chance, cause
[Diagram: one causal graph over B and P nodes for each hypothesis; regularity labeled “August”]
Priors: p, 1 − p
Data: birthdays of those present
Likelihoods: uniform (chance); uniform + regularity (cause)

[Figure: People and Bayes]

Rationality and irrationality
• People’s sense of the strength of coincidences gives a close match to the likelihood ratio
  – bombing and birthdays
• Suggests that we accept false conclusions when our prior odds are insufficiently low
$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$
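The odds form of Bayes' rule used throughout this section can be sketched for the coin-flipping example. This is a minimal illustration, not the model as fitted in the talk: the chance hypothesis fixes p(E) = 0.5 as on the slides, while the cause hypothesis is assumed here to put a uniform prior on p(E) over (0, 1), and the prior p(cause) = 0.01 is a hypothetical value chosen only to stand in for "small". All function names are illustrative.

```python
from fractions import Fraction
from math import factorial

def likelihood_chance(seq):
    """p(d | chance): every flip has p(E) = 0.5."""
    return Fraction(1, 2) ** len(seq)

def likelihood_cause(seq):
    """p(d | cause), ASSUMING a uniform prior on p(E) over (0, 1).

    Integrating theta^h * (1 - theta)^t over theta gives
    h! * t! / (h + t + 1)!.
    """
    h = seq.count("H")
    t = len(seq) - h
    return Fraction(factorial(h) * factorial(t), factorial(h + t + 1))

def posterior_odds(seq, prior_cause):
    """Return (likelihood ratio, posterior odds) in favor of a cause."""
    likelihood_ratio = likelihood_cause(seq) / likelihood_chance(seq)
    prior_odds = prior_cause / (1 - prior_cause)
    return likelihood_ratio, likelihood_ratio * prior_odds

# Hypothetical small prior on the causal hypothesis.
PRIOR_CAUSE = Fraction(1, 100)

for seq in ("HHHH", "HHHHHHHHHH", "HHTHTHTTHT"):
    lr, odds = posterior_odds(seq, PRIOR_CAUSE)
    print(f"{seq}: likelihood ratio = {float(lr):.3f}, "
          f"posterior odds = {float(odds):.3f}")
```

Under these assumptions, ten heads in a row yields a large likelihood ratio but posterior odds close to 1, while the mixed sequence yields a likelihood ratio below 1. Changing `PRIOR_CAUSE` moves the posterior odds without changing the likelihood ratio, which is the distinction the slides draw between evidence and belief.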
The paradox of coincidences
Prior odds can be low for two reasons

Reason                       Consequence
Incorrect current theory     Significant discovery
Correct current theory       False conclusion

Attending to coincidences makes more sense the less you know

Coincidences
• Provide evidence for causal structure, but not enough to make us believe that structure exists
• Intimately related to causal induction
  – an opportunity to discover a theory is wrong
• Guided by a well calibrated sense of when an event provides evidence of causal structure

Explaining inductive leaps
• How do people
  – infer causal relationships
  – identify the work of chance
  – predict the future
  – assess similarity and make generalizations
  – learn functions, languages, and concepts
  . . . from such limited data?
• What knowledge guides human inferences?

Which sequence seems more random?
HHHHHHHHHH vs. HHTHTHTTHT

Subjective randomness
• Typically evaluated in terms of p(d | chance)
• Assessing randomness is part of causal induction
$$\frac{p(\text{chance} \mid d)}{p(\text{cause} \mid d)} = \frac{p(d \mid \text{chance})}{p(d \mid \text{cause})} \times \frac{p(\text{chance})}{p(\text{cause})}$$
(likelihood ratio: evidence for a random generating process)

Randomness and coincidences
$$\frac{p(\text{cause} \mid d)}{p(\text{chance} \mid d)} = \frac{p(d \mid \text{cause})}{p(d \mid \text{chance})} \times \frac{p(\text{cause})}{p(\text{chance})}$$
(likelihood ratio: strength of coincidence)
$$\frac{p(\text{chance} \mid d)}{p(\text{cause} \mid d)} = \frac{p(d \mid \text{chance})}{p(d \mid \text{cause})} \times \frac{p(\text{chance})}{p(\text{cause})}$$
(likelihood ratio: evidence for a random generating process)

Randomness and coincidences
[Figure: scatter plots of “How random?” against “How big a coincidence?”, both on scales from 0 to 10; panels: Birthdays, Bombing; r = -0.96, r = -0.94]

Pick a random number…
[Figure: People and Bayes, distributions over the digits 0 to 9]

Bayes’ theorem
$$p(h \mid d) = \frac{p(d \mid h)\, p(h)}{\sum_{h' \in H} p(d \mid h')\, p(h')}$$

Bayes’ theorem
inference = f(data, knowledge)

Predicting the future
Human predictions match optimal predictions from empirical prior

Iterated learning (Briscoe, 1998; Kirby, 2001)
data → hypothesis → data → hypothesis
d0 → h1 → d1 → h2, alternating inference p(h | d) and generation p(d | h)
(Griffiths & Kalish, submitted)
[Figure: iterations 1 to 9]

Conclusion
• Many cognitive judgments are the result of challenging problems of induction
• Bayesian statistics provides a formal framework for exploring how people solve these problems
• Makes it possible to ask…
  – how do we make surprising discoveries?
  – how do we learn so much from so little?
  – what knowledge guides our judgments?

Collaborators
• Causal induction
  – Josh Tenenbaum (MIT)
  – Liz Baraff (MIT)
• Iterated learning
  – Mike Kalish (University of Louisiana)

Causes and coincidences
“coincidence” appears in 13/60 cases
p(“cause”) = 0.01
p(“cause” | “coincidence”) = 0.26

A reformulation: unlikely kinds
• Coincidences are events of an unlikely kind
  – e.g. a sequence with that number of heads
• Deals with the obvious problem...
  p(10 heads) < p(5 heads, 5 tails)

Problems with unlikely kinds
• Defining kinds
  August 3, August 3, August 3, August 3
  January 12, March 22, March 22, July 19, October 1, December 8
• Counterexamples
  HHHH > HHTT
  P(4 heads) < P(2 heads, 2 tails)
  HHHH > HHHHTHTTHHHTHTHHTHTTHHH
  P(4 heads) > P(15 heads, 8 tails)
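The inequalities on the "unlikely kinds" slides can be checked directly. The sketch below assumes that a "kind" is the set of all fair-coin sequences with a given number of heads and tails, so its probability is a binomial term. The helper name `p_kind` is hypothetical.

```python
from fractions import Fraction
from math import comb

def p_kind(heads, tails):
    """Probability that a fair coin gives exactly this many heads and
    tails, in any order (ASSUMED reading of an "unlikely kind")."""
    n = heads + tails
    return Fraction(comb(n, heads), 2 ** n)

# The "obvious problem" the reformulation handles.
print(p_kind(10, 0) < p_kind(5, 5))    # p(10 heads) < p(5 heads, 5 tails)

# First counterexample pair: the inequality agrees with intuition.
print(p_kind(4, 0) < p_kind(2, 2))     # P(4 heads) < P(2 heads, 2 tails)

# Second pair: four heads is the MORE probable kind, even though HHHH
# is listed on the slide as ranking above the 23-flip sequence.
print(float(p_kind(4, 0)), float(p_kind(15, 8)))
print(p_kind(4, 0) > p_kind(15, 8))    # P(4 heads) > P(15 heads, 8 tails)
```

If ">" on the slide is read as "is the bigger coincidence", the second pair is where kind probability and intuition come apart: the kind containing HHHH has probability 1/16, slightly higher than that of the longer sequence's kind.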