Causes and coincidences

Tom Griffiths
Cognitive and Linguistic Sciences
Brown University
“It could be that, collectively, the people in New
York caused those lottery numbers to come up 9-1-1… If enough people all are thinking the same
thing, at the same time, they can cause events to
happen… It's called psychokinesis.”
[Figure: Halley's comet records (Halley, 1752); successive returns 75 and 76 years apart]
The paradox of coincidences
How can coincidences simultaneously lead us to
irrational conclusions and significant discoveries?
Outline
1. A Bayesian approach to causal induction
2. Coincidences
i. what makes a coincidence?
ii. rationality and irrationality
iii. the paradox of coincidences
3. Explaining inductive leaps
Causal induction
• Inferring causal structure from data
• A task we perform every day …
– does caffeine increase productivity?
• … and throughout science
– three comets or one?
Reverend Thomas Bayes
Bayes’ theorem

p(h | d) = p(d | h) p(h) / Σ_{h′ ∈ H} p(d | h′) p(h′)

posterior probability ∝ likelihood × prior probability
(the denominator sums over the space of hypotheses H)

h: hypothesis
d: data
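As a minimal numerical sketch of the theorem, consider a toy two-hypothesis space; the hypotheses, priors, and likelihood model below are illustrative, not from the talk:

```python
# Bayes' theorem over a discrete hypothesis space:
# p(h|d) = p(d|h) p(h) / sum over h' of p(d|h') p(h')

def posterior(hypotheses, priors, likelihood, d):
    """Return the posterior p(h|d) for every hypothesis h."""
    joint = {h: likelihood(d, h) * priors[h] for h in hypotheses}
    z = sum(joint.values())                 # normalizing constant
    return {h: joint[h] / z for h in hypotheses}

# Illustrative example: is a coin fair (p = 0.5) or biased (p = 0.9)?
hyps = ["fair", "biased"]
priors = {"fair": 0.99, "biased": 0.01}     # bias is a priori implausible

def likelihood(d, h):
    p = 0.5 if h == "fair" else 0.9
    heads = d.count("H")
    return p ** heads * (1 - p) ** (len(d) - heads)

post = posterior(hyps, priors, likelihood, "HHHHHHHHHH")
```

Ten heads in a row are enough to overturn even a strongly skeptical prior.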
Bayesian causal induction
Hypotheses: causal structures
Priors:
Data:
Likelihoods:
Causal graphical models
(Pearl, 2000; Spirtes et al., 1993)
• Variables: X, Y, Z
• Structure: X → Z ← Y
• Conditional probabilities: p(x), p(y), p(z | x, y)
Defines a probability distribution over the variables
(for both observation, and intervention)
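The three ingredients above can be written out directly: the graph X → Z ← Y plus the conditional probabilities determine a joint distribution. The noisy-OR parameterization and all numbers below are illustrative assumptions:

```python
# A causal graphical model X -> Z <- Y: variables, structure, and
# conditional probabilities together define p(x, y, z).
import itertools

p_x = {0: 0.7, 1: 0.3}          # p(x)
p_y = {0: 0.6, 1: 0.4}          # p(y)

def p_z_given(x, y, z):
    """p(z | x, y): a noisy-OR parameterization (illustrative choice)."""
    p1 = 1 - (1 - 0.8 * x) * (1 - 0.8 * y)   # probability that z = 1
    return p1 if z == 1 else 1 - p1

def joint(x, y, z):
    """p(x, y, z) = p(x) p(y) p(z | x, y)."""
    return p_x[x] * p_y[y] * p_z_given(x, y, z)

# the factorization yields a proper distribution over all eight states
total = sum(joint(x, y, z) for x, y, z in itertools.product([0, 1], repeat=3))
```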
Bayesian causal induction
Hypotheses: causal structures
Priors: a priori plausibility of structures
Data: observations of variables
Likelihoods: probability distribution over variables
Causal induction from contingencies

                  C present (c+)    C absent (c-)
E present (e+)          a                 c
E absent  (e-)          b                 d

“Does C cause E?”
(rate on a scale from 0 to 100)
Buehner & Cheng (1997)
Chemical (C) and gene expression (E):

                  C present (c+)    C absent (c-)
E present (e+)          6                 4
E absent  (e-)          2                 4

“Does the chemical cause gene expression?”
(rate on a scale from 0 to 100)
Buehner & Cheng (1997)
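For a table like this, the two standard rational models, ΔP and causal power (Cheng, 1997), have simple closed forms; a quick sketch using the cell counts above:

```python
# ΔP and causal power for the chemical/gene contingency table:
# a = (e+, c+), b = (e-, c+), c = (e+, c-), d = (e-, c-)
a, b, c, d = 6, 2, 4, 4

p_e_given_c = a / (a + b)        # P(e+ | c+) = 6/8 = 0.75
p_e_given_nc = c / (c + d)       # P(e+ | c-) = 4/8 = 0.50

delta_p = p_e_given_c - p_e_given_nc       # difference in effect rates
power = delta_p / (1 - p_e_given_nc)       # causal power (Cheng, 1997)
```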
[Figure: people’s causal ratings, examined for all values of P(e+|c+) and P(e+|c-) in increments of 0.25]
How can we explain these judgments?
Bayesian causal induction
Hypotheses:
  cause:   C → E ← B
  chance:  B → E (no link from C)
Priors:      p (cause),  1 − p (chance)
Data:        frequency of cause-effect co-occurrence
Likelihoods: each cause has an independent opportunity to produce the effect
Bayesian causal induction
Hypotheses:   cause: C → E ← B      chance: B → E

p(cause | d) = p(d | cause) p(cause) / [ p(d | cause) p(cause) + p(d | chance) p(chance) ]
Bayesian causal induction
Hypotheses:   cause: C → E ← B      chance: B → E

p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]

The likelihood ratio p(d | cause) / p(d | chance) is the evidence for a causal relationship.
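Computing p(d | cause) requires marginalizing over the unknown strengths of the cause and the background. A sketch of that computation with uniform priors over noisy-OR strengths; the grid approximation is my simplification, not the talk's implementation:

```python
# "Causal support": log likelihood ratio between the cause and chance
# graphs, integrating over unknown strengths on a grid (uniform priors).
import math

def log_support(a, b, c, d, steps=100):
    """a,b,c,d are the contingency-table cell counts."""
    grid = [(i + 0.5) / steps for i in range(steps)]   # midpoint rule

    def lik(p_c, p_nc):
        # probability of the table given effect probabilities in the
        # presence (p_c) and absence (p_nc) of the cause
        return (p_c ** a) * ((1 - p_c) ** b) * (p_nc ** c) * ((1 - p_nc) ** d)

    # chance: a single background strength w0 applies to both columns
    m_chance = sum(lik(w0, w0) for w0 in grid) / steps
    # cause: background w0 plus causal strength w1, combined by noisy-OR
    m_cause = sum(
        lik(1 - (1 - w0) * (1 - w1), w0) for w0 in grid for w1 in grid
    ) / steps ** 2
    return math.log(m_cause / m_chance)

support = log_support(6, 2, 4, 4)   # the chemical/gene table above
```

Averaging over parameters builds in an automatic Occam's razor: the two-parameter cause graph only wins when the columns genuinely differ.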
Buehner and Cheng (1997)
[Figure: people’s judgments vs. model predictions]
Bayes (r = 0.97), ΔP (r = 0.89), Power (r = 0.88)
Other predictions
• Causal induction from contingency data
– sample size effects
– judgments for incomplete contingency tables
(Griffiths & Tenenbaum, in press)
• More complex cases
– detectors
(Tenenbaum & Griffiths, 2003)
– explosions (Griffiths, Baraff, & Tenenbaum, 2004)
– simple mechanical devices
The stick-ball machine
[Figure: the machine with its two balls, A and B]
(Kushnir, Schulz, Gopnik, & Danks, 2003)
Outline
1. A Bayesian approach to causal induction
2. Coincidences
i. what makes a coincidence?
ii. rationality and irrationality
iii. the paradox of coincidences
3. Explaining inductive leaps
What makes a coincidence?
A common definition:
Coincidences are unlikely events
“an event which seems so unlikely
that it is worth telling a story about”
“we sense that it is too unlikely to have
been the result of luck or mere chance”
Coincidences are not just unlikely...
HHHHHHHHHH
vs.
HHTHTHTTHT
Bayesian causal induction

                        Likelihood ratio (evidence)
                          low            high
Prior odds     high        ?             cause
               low       chance            ?

p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
Bayesian causal induction

                        Likelihood ratio (evidence)
                          low            high
Prior odds     high        ?             cause
               low       chance       coincidence

p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
What makes a coincidence?
A coincidence is an event that provides evidence
for causal structure, but not enough evidence to
make us believe that structure exists
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
What makes a coincidence?
A coincidence is an event that provides evidence
for causal structure, but not enough evidence to
make us believe that structure exists
likelihood ratio
is high
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
What makes a coincidence?
A coincidence is an event that provides evidence
for causal structure, but not enough evidence to
make us believe that structure exists
posterior odds
are middling
likelihood ratio
is high
prior odds
are low
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
HHHHHHHHHH
HHTHTHTTHT
posterior odds
are middling
likelihood ratio
is high
prior odds
are low
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
Bayesian causal induction
Hypotheses:
  cause:   C → E
  chance:  E (no link from C)
Priors:      p (cause, small),  1 − p (chance)
Data:        frequency of effect in presence of cause
Likelihoods: 0 < p(E) < 1 (cause);  p(E) = 0.5 (chance)
HHHHHHHHHH
coincidence
posterior odds
are middling
likelihood ratio
is high
prior odds
are low
HHTHTHTTHT
chance
posterior odds
are low
likelihood ratio
is low
prior odds
are low
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
HHHH
mere coincidence
posterior odds
are low
prior odds
are low
likelihood ratio
is middling
HHHHHHHHHH
suspicious coincidence
posterior odds
are middling
prior odds
are low
likelihood ratio
is high
HHHHHHHHHHHHHHHHHH
cause
posterior odds
are high
likelihood ratio
is very high
prior odds
are low
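The transition from mere to suspicious coincidence can be traced through the likelihood ratio itself. A sketch in which chance is a fair coin and cause is a coin of unknown bias with a uniform prior (this particular cause model is an illustrative assumption); the Beta integral gives a closed form:

```python
from math import comb, log

def log_likelihood_ratio(seq):
    """log [ p(seq | cause) / p(seq | chance) ] for a flip sequence."""
    n, k = len(seq), seq.count("H")
    p_chance = 0.5 ** n
    # marginal under cause: integral of p^k (1-p)^(n-k) dp over the
    # unknown bias p, which equals 1 / ((n + 1) * C(n, k))
    p_cause = 1.0 / ((n + 1) * comb(n, k))
    return log(p_cause / p_chance)

lr_mere = log_likelihood_ratio("HHHH")                 # modest evidence
lr_suspicious = log_likelihood_ratio("HHHHHHHHHH")     # strong evidence
lr_cause = log_likelihood_ratio("HHHHHHHHHHHHHHHHHH")  # very strong
lr_mixed = log_likelihood_ratio("HHTHTHTTHT")          # negative: none
```

The ratio grows with the length of the streak, carrying the event from mere coincidence through suspicious coincidence toward an accepted cause.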
Mere and suspicious coincidences
[Diagram: as p(cause | d) / p(chance | d), the evidence for a causal relation, grows, a mere coincidence becomes a suspicious one]
• Transition produced by
  – increase in likelihood ratio (e.g., coin flipping)
  – increase in prior odds (e.g., genetics vs. ESP)
Testing the definition
• Provide participants with data from experiments
• Manipulate:
– cover story: genetic engineering vs. ESP
(prior)
– data: number of males/heads
(likelihood)
– task: “coincidence or evidence?” vs. “how likely?”
• Predictions:
– coincidences affected by prior and likelihood
– relationship between coincidence and posterior
[Figure: proportion judged “coincidence” and posterior probability as a function of the number of heads/males (47, 51, 55, 59, 63, 70, 87, 99); the two measures correlate at r = −0.98]
Rationality and irrationality

                        Likelihood ratio (evidence)
                          low            high
Prior odds     high        ?             cause
               low       chance       coincidence

p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
The bombing of London
(Gilovich, 1991)
[Figure: people’s judgments of bomb sites under changes in number, ratio, location, and spread (uniform)]
Bayesian causal induction
Hypotheses:
  cause:   a target T influences the bomb locations X
  chance:  the bomb locations X are independent of any target
Priors:      p (cause),  1 − p (chance)
Data:        bomb locations
Likelihoods: uniform + regularity (cause);  uniform (chance)
[Figure: people vs. Bayes across changes in number, ratio, location, and spread (uniform); r = 0.98]
Coincidences in date
May 14, July 8, August 21, December 25
vs.
August 3, August 3, August 3, August 3
[Figure: people’s judgments of how big a coincidence each set of dates is]
Bayesian causal induction
Hypotheses:
  cause:   a common process links the birthdays B of the people P present
  chance:  each birthday is generated independently
Priors:      p (cause),  1 − p (chance)
Data:        birthdays of those present
Likelihoods: uniform + regularity (e.g., August) for cause;  uniform for chance
[Figure: people’s judgments vs. Bayes]
Rationality and irrationality
• People’s sense of the strength of coincidences
gives a close match to the likelihood ratio
– bombing and birthdays
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
Rationality and irrationality
• People’s sense of the strength of coincidences
gives a close match to the likelihood ratio
– bombing and birthdays
• Suggests that we accept false conclusions
when our prior odds are insufficiently low
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]
Rationality and irrationality

                        Likelihood ratio (evidence)
                          low            high
Prior odds     high        ?             cause
               low       chance       coincidence
The paradox of coincidences
Prior odds can be low for two reasons

Reason                      Consequence
Incorrect current theory    Significant discovery
Correct current theory      False conclusion

Attending to coincidences makes
more sense the less you know
Coincidences
• Provide evidence for causal structure, but not
enough to make us believe that structure exists
• Intimately related to causal induction
– an opportunity to discover a theory is wrong
• Guided by a well calibrated sense of when an
event provides evidence of causal structure
Outline
1. A Bayesian approach to causal induction
2. Coincidences
i. what makes a coincidence?
ii. rationality and irrationality
iii. the paradox of coincidences
3. Explaining inductive leaps
Explaining inductive leaps
• How do people
  – infer causal relationships
  – identify the work of chance
  – predict the future
  – assess similarity and make generalizations
  – learn functions, languages, and concepts
  . . . from such limited data?
• What knowledge guides human inferences?
Which sequence seems more random?
HHHHHHHHHH
vs.
HHTHTHTTHT
Subjective randomness
• Typically evaluated in terms of p(d | chance)
• Assessing randomness is part of causal induction
p(chance | d) / p(cause | d) = [ p(d | chance) p(chance) ] / [ p(d | cause) p(cause) ]

The ratio p(d | chance) / p(d | cause) is the evidence for a random generating process.
Randomness and coincidences

strength of coincidence:
p(cause | d) / p(chance | d) = [ p(d | cause) p(cause) ] / [ p(d | chance) p(chance) ]

evidence for a random generating process:
p(chance | d) / p(cause | d) = [ p(d | chance) p(chance) ] / [ p(d | cause) p(cause) ]
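On this view, randomness judgments are coincidence judgments with the sign flipped. A sketch using the same illustrative coin model (fair coin as chance; unknown bias with a uniform prior as cause):

```python
from math import comb, log

def coincidence_strength(seq):
    """log [ p(seq | cause) / p(seq | chance) ] under the coin model."""
    n, k = len(seq), seq.count("H")
    p_chance = 0.5 ** n
    p_cause = 1.0 / ((n + 1) * comb(n, k))   # Beta-integral marginal
    return log(p_cause / p_chance)

def randomness(seq):
    """log evidence for a random generating process: the mirror image."""
    return -coincidence_strength(seq)
```

HHTHTHTTHT looks random exactly to the degree that it fails to look like a coincidence.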
Randomness and coincidences
[Figure: “how random?” vs. “how big a coincidence?” ratings for birthdays (r = −0.96) and bombing (r = −0.94); randomness falls as coincidence strength rises]
Pick a random number…
[Figure: distribution of responses over the digits 0–9, for people and for the Bayesian model]
Bayes’ theorem

p(h | d) = p(d | h) p(h) / Σ_{h′ ∈ H} p(d | h′) p(h′)

inference = f(data, knowledge)
Predicting the future
Human predictions match optimal predictions from empirical prior
Iterated learning
(Briscoe, 1998; Kirby, 2001)

d0 → h1 → d1 → h2 → …
each learner samples a hypothesis from p(h | d) and generates data from p(d | h)

(Griffiths & Kalish, submitted)
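A sketch of this chain with Bayesian learners who sample hypotheses from their posteriors; on the Griffiths & Kalish analysis the distribution over hypotheses converges to the prior. The hypotheses, probabilities, and seed below are illustrative:

```python
# Iterated learning: hypothesis -> data -> hypothesis -> ...
# Each learner samples h from p(h|d); the next learner sees data
# sampled from p(d|h). The chain's stationary distribution over
# hypotheses is the prior.
import random
random.seed(0)

PRIOR = {"A": 0.8, "B": 0.2}     # p(h)
LIK = {"A": 0.9, "B": 0.3}       # p(d=1 | h)

def sample_data(h):
    return 1 if random.random() < LIK[h] else 0

def sample_hypothesis(d):
    def lik(h):
        return LIK[h] if d == 1 else 1 - LIK[h]
    w = {h: lik(h) * PRIOR[h] for h in PRIOR}     # unnormalized posterior
    r = random.random() * sum(w.values())
    for h, wh in w.items():
        r -= wh
        if r <= 0:
            return h
    return h

# run a long chain and count how often each hypothesis is held
h = "A"
counts = {"A": 0, "B": 0}
for _ in range(20000):
    h = sample_hypothesis(sample_data(h))
    counts[h] += 1
freq_a = counts["A"] / 20000      # should approach the prior, 0.8
```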
[Figure: the state of the chain at iterations 1–9]
Conclusion
• Many cognitive judgments are the result of
challenging problems of induction
• Bayesian statistics provides a formal framework
for exploring how people solve these problems
• Makes it possible to ask…
– how do we make surprising discoveries?
– how do we learn so much from so little?
– what knowledge guides our judgments?
Collaborators
• Causal induction
– Josh Tenenbaum (MIT)
– Liz Baraff (MIT)
• Iterated learning
– Mike Kalish (University of Louisiana)
Causes and coincidences
“coincidence” appears in 13/60 cases
p(“cause”) = 0.01
p(“cause” | “coincidence”) = 0.26
A reformulation: unlikely kinds
• Coincidences are events of an unlikely kind
– e.g. a sequence with that number of heads
• Deals with the obvious problem...
p(10 heads) < p(5 heads, 5 tails)
Problems with unlikely kinds
• Defining kinds
August 3, August 3, August 3, August 3
January 12, March 22, March 22, July 19, October 1, December 8
Problems with unlikely kinds
• Defining kinds
• Counterexamples
HHHH
>
HHTT
P(4 heads) < P(2 heads, 2 tails)
HHHH
>
HHHHTHTTHHHTHTHHTHTTHHH
P(4 heads) > P(15 heads, 8 tails)