Theory-based causal induction
Tom Griffiths
Brown University
Josh Tenenbaum
MIT
Three kinds of causal induction
contingency data

                 C present (c+)   C absent (c-)
E present (e+)         a                c
E absent  (e-)         b                d

“To what extent does C cause E?”
(rate on a scale from 0 to 100)
Three kinds of causal induction
contingency data
physical systems
The stick-ball machine
[Figure: two balls, A and B, mounted on sticks in the machine]
(Kushnir, Schulz, Gopnik, & Danks, 2003)
Three kinds of causal induction
contingency data
physical systems
perceived causality
Michotte (1963)
Three kinds of causal induction
contingency data
physical systems
perceived causality
bottom-up covariation information (contingency data)
top-down mechanism knowledge (physical systems)
object physics module (perceived causality)
Three kinds of causal induction
a continuum from more data / less constrained to less data / more constrained:
contingency data → physical systems → perceived causality
In every case: prior knowledge + statistical inference
Theory-based causal induction
Theory
  ↓ generates
Hypothesis space: candidate causal graphs over X, Y, Z
  ↓ generates
Data:
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
 ...
Bayesian inference runs in the opposite direction, from data back to hypotheses.
An analogy to language
Theory             ~  Grammar
  ↓ generates           ↓ generates
Hypothesis space   ~  Parse trees
(causal graphs over X, Y, Z)
  ↓ generates           ↓ generates
Data               ~  Sentence
(Case × X, Y, Z table)   (“The quick brown fox …”)
Outline
contingency data
physical systems
perceived causality
                 C present (c+)   C absent (c-)
E present (e+)         a                c
E absent  (e-)         b                d

“To what extent does C cause E?”
(rate on a scale from 0 to 100)
Buehner & Cheng (1997)
Chemical and gene example:

                 C present (c+)   C absent (c-)
E present (e+)         6                4
E absent  (e-)         2                4

“To what extent does the chemical cause gene expression?”
(rate on a scale from 0 to 100)
Buehner & Cheng (1997)
• Showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25
[Figure: human ratings across these conditions]
• Curious phenomenon, the “frequency illusion”: why do people’s judgments change when the cause does not change the probability of the effect?
Causal graphical models
• Framework for representing, reasoning, and
learning about causality (also called Bayes nets)
(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)
• Becoming widespread in psychology
(Glymour, 2001; Gopnik et al., 2004; Lagnado & Sloman, 2002;
Tenenbaum & Griffiths, 2001; Steyvers et al., 2003; Waldmann
& Martignon, 1998)
Causal graphical models
• Variables: X, Y, Z
• Structure: a directed graph, e.g. X → Z ← Y
• Conditional probabilities: P(X), P(Y), P(Z|X,Y)
Together these define a probability distribution over the variables
(for both observation and intervention)
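To make this concrete, here is a minimal sketch (numbers and names are mine, not from the talk) of how the structure plus conditional probabilities define a joint distribution by factorization:

```python
# Sketch: the graph X -> Z <- Y plus CPDs P(X), P(Y), P(Z|X,Y)
# defines a joint distribution P(X,Y,Z) = P(X) P(Y) P(Z|X,Y).
import itertools

P_X = {0: 0.7, 1: 0.3}   # illustrative numbers
P_Y = {0: 0.6, 1: 0.4}
P_Z = {(x, y): {1: p, 0: 1 - p}
       for (x, y), p in {(0, 0): 0.1, (0, 1): 0.5,
                         (1, 0): 0.5, (1, 1): 0.9}.items()}

def joint(x, y, z):
    """P(X=x, Y=y, Z=z) by the factorization above."""
    return P_X[x] * P_Y[y] * P_Z[(x, y)][z]

# Sanity check: the joint sums to 1 over all assignments.
assert abs(sum(joint(*v) for v in itertools.product([0, 1], repeat=3)) - 1) < 1e-9
```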
Causal graphical models
• Provide a basic framework for representing causal
systems
• But… where is the prior knowledge?
[Figure: chemicals (Clofibrate, Wyeth 14,643, Gemfibrozil, Phenobarbital, and a novel Chemical X) linked to genes (p450 2B1, Carnitine Palmitoyl Transferase 1); the peroxisome proliferators are marked “+” for their effects]
(Hamadeh et al., 2002, Toxicological Sciences)
Beyond causal graphical models
• Prior knowledge produces expectations about:
– types of entities
– plausible relations
– functional form
• These expectations cannot be captured by causal graphical models alone
A theory consists of three interrelated components: a set of
phenomena that are in its domain, the causal laws and other
explanatory mechanisms in terms of which the phenomena are
accounted for, and the concepts in terms of which the phenomena
and explanatory apparatus are expressed. (Carey, 1985)
Theory-based causal induction
A causal theory is a hypothesis space generator

Components of a theory:        Generate:
• Ontology                   → variables
• Plausible relations        → structure
• Functional form            → conditional probabilities

Hypotheses are evaluated by Bayesian inference:
P(h|data) ∝ P(data|h) P(h)
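A minimal sketch of this evaluation step (function and names are illustrative, not the authors' code): given any hypothesis space generated by a theory, the posterior is just normalized prior × likelihood.

```python
def posterior(hypotheses, data):
    """P(h|data) is proportional to P(data|h) P(h), normalized over the
    hypothesis space. `hypotheses` maps each hypothesis to a pair
    (prior probability, likelihood function over data)."""
    scores = {h: prior * likelihood(data)
              for h, (prior, likelihood) in hypotheses.items()}
    total = sum(scores.values())
    return {h: s / total for h, s in scores.items()}
```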
Theory
• Ontology
– Types: Chemical, Gene, Mouse
– Predicates:
Injected(Chemical,Mouse)
Expressed(Gene,Mouse)
[Graph: cause C and background cause B both pointing to effect E]
E = 1 if effect occurs (mouse expresses gene), else 0
C = 1 if cause occurs (mouse is injected), else 0
Theory
• Plausible relations
– For any Chemical c and Gene g, with prior probability p:
For all Mice m, Injected(c,m) → Expressed(g,m)

Graph 1: C → E ← B    P(Graph 1) = p
Graph 0: B → E        P(Graph 0) = 1 – p

No hypotheses with E → C, B → C, C → B, …
Theory
• Ontology
– Types: Chemical, Gene, Mouse
– Predicates:
Injected(Chemical,Mouse)
Expressed(Gene,Mouse)
• Plausible relations
– For any Chemical c and Gene g, with prior probability p:
For all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations
Functional form
• Structures:
Graph 1: C → E ← B    Graph 0: B → E

• Parameterization (“Generic”):

C  B   Graph 1: P(E=1|C,B)   Graph 0: P(E=1|C,B)
0  0        p00                   p0
1  0        p10                   p0
0  1        p01                   p1
1  1        p11                   p1
Functional form
• Structures:
Graph 1: C → E ← B    Graph 0: B → E
(w0, w1: strength parameters for B, C)

• Parameterization (“Noisy-OR”):

C  B   Graph 1: P(E=1|C,B)   Graph 0: P(E=1|C,B)
0  0        0                     0
1  0        w1                    0
0  1        w0                    w0
1  1        w1 + w0 – w1·w0       w0
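The Graph 1 column follows a single formula, P(E=1|C,B) = 1 – (1–w1)^C (1–w0)^B. A small sketch (the helper name is mine):

```python
def noisy_or(c, b, w1, w0):
    """P(E = 1 | C = c, B = b): each present cause has an independent
    chance of producing the effect, so E fails only if every cause fails."""
    return 1 - (1 - w1) ** c * (1 - w0) ** b

# Reproduces the table: 0, w1, w0, and w1 + w0 - w1*w0.
assert abs(noisy_or(1, 1, 0.8, 0.3) - (0.8 + 0.3 - 0.8 * 0.3)) < 1e-12
```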
Theory
• Ontology
– Types: Chemical, Gene, Mouse
– Predicates:
Injected(Chemical,Mouse)
Expressed(Gene,Mouse)
• Constraints on causal relations
– For any Chemical c and Gene g, with prior probability p:
For all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations
– Causes of Expressed(g,m) are independent probabilistic
mechanisms, with causal strengths wi. An independent
background cause is always present with strength w0.
Evaluating a causal relationship
Graph 1: C → E ← B    P(Graph 1) = p
Graph 0: B → E        P(Graph 0) = 1 – p

P(Graph 1|D) = P(D|Graph 1) P(Graph 1) / Σi P(D|Graph i) P(Graph i)
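A sketch of this computation under the noisy-OR theory, approximating uniform priors over the strengths with a grid (parameter choices are mine; this follows the spirit of the model rather than the authors' exact implementation):

```python
import numpy as np

GRID = np.linspace(0.01, 0.99, 50)

def likelihood_graph1(a, b, c, d):
    """P(D | Graph 1), marginalizing noisy-OR strengths w0, w1 over a grid.
    a, b, c, d are the contingency-table counts (e+c+, e-c+, e+c-, e-c-)."""
    w0, w1 = np.meshgrid(GRID, GRID)
    p_c = 1 - (1 - w0) * (1 - w1)          # P(e+ | c+)
    p_nc = w0                              # P(e+ | c-): background only
    return (p_c**a * (1 - p_c)**b * p_nc**c * (1 - p_nc)**d).mean()

def likelihood_graph0(a, b, c, d):
    """P(D | Graph 0): no C -> E link, so P(e+) = w0 in both conditions."""
    return (GRID**(a + c) * (1 - GRID)**(b + d)).mean()

def p_graph1(a, b, c, d, p=0.5):
    l1, l0 = likelihood_graph1(a, b, c, d), likelihood_graph0(a, b, c, d)
    return p * l1 / (p * l1 + (1 - p) * l0)

# Buehner & Cheng-style example: 6/8 effects with the chemical, 4/8 without.
print(p_graph1(6, 2, 4, 4))
```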
[Figure: human judgments compared with the Bayesian model, ΔP, causal power (Cheng, 1997), and χ²]
Generativity is essential
[Figure: conditions with P(e+|c+) = P(e+|c-) equal to 8/8, 6/8, 4/8, 2/8, 0/8; human ratings (0–100) decline across these conditions even though ΔP = 0]
Bayesian
• Predictions result from a “ceiling effect”
– ceiling effects only matter if you believe a cause
increases the probability of an effect
– follows from use of Noisy-OR (after Cheng, 1997)
Generativity is essential
Noisy-OR: causes increase the probability of their effects
Generic: the probability of the effect differs across conditions
Noisy-AND-NOT: causes decrease the probability of their effects
Generativity is essential
[Figure: human judgments compared with the Noisy-OR, Generic, and Noisy-AND-NOT parameterizations]
Manipulating functional form
Noisy-OR: causes increase the probability of their effects (appropriate for generative causes)
Generic: the probability of the effect differs across conditions (appropriate for assessing differences)
Noisy-AND-NOT: causes decrease the probability of their effects (appropriate for preventive causes)
Manipulating functional form
[Figure: judgments under Generative, Difference, and Preventive instructions, matched by the Noisy-OR, Generic, and Noisy-AND-NOT models respectively]
Causal induction from contingency data
• The simplest case of causal learning: a single
cause-effect relationship and plentiful data
• Nonetheless, exhibits complex effects of prior
knowledge (in the assumed functional form)
• These effects reflect appropriate causal theories
Outline
contingency data
physical systems
perceived causality
The stick-ball machine
[Figure: the stick-ball machine, balls A and B]
(Kushnir, Schulz, Gopnik, & Danks, 2003)
Inferring hidden causal structure
• Can people accurately infer hidden causal
structure from small amounts of data?
• Kushnir et al. (2003): four kinds of structure
separate causes
common cause
A causes B
B causes A
Inferring hidden causal structure: evidence shown to participants
• Common unobserved cause condition: movement patterns shown 4×, 2×, 2×
• Independent unobserved causes condition: movement patterns shown 1×, 2×, 2×, 2×, 2×
• One observed cause condition: movement patterns shown 2×, 4×
(Kushnir, Schulz, Gopnik, & Danks, 2003)
[Figure: inferred probability of each structure (separate causes, common cause, A causes B, B causes A) in the common-unobserved-cause, independent-unobserved-causes, and one-observed-cause conditions]
Theory
• Ontology
– Types: Ball, HiddenCause, Trial
– Predicates: Moves(Ball, Trial), Active(HiddenCause, Trial)
• Plausible relations
– For any Ball a and Ball b (a ≠ b), with prior probability p:
For all Trials t, Moves(a,t) → Moves(b,t)
– For some HiddenCause h and Ball b, with prior probability q:
For all Trials t, Active(h,t) → Moves(b,t)
• Functional form of causal relations
– Causes result in Moves(b,t) with probability w.
Otherwise, Moves(b,t) occurs with probability 0.
– Active(h,t) occurs with probability α.
Hypotheses

Probability of each movement pattern under each structure
(columns: both balls move, A only, B only, neither; α = hidden-cause activation rate):

separate causes:  (αw)²   αw(1–αw)   (1–αw)αw   (1–αw)²
common cause:     αw²     αw(1–w)    α(1–w)w    (1–α) + α(1–w)²
A causes B:       αw²     αw(1–w)    0          (1–α) + α(1–w)
B causes A:       αw²     0          αw(1–w)    (1–α) + α(1–w)
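A sketch of the resulting inference (pattern probabilities as in the table above; w and α fixed to illustrative values, and the interventions the experiments also used are ignored):

```python
import numpy as np

def patterns(structure, w=0.5, alpha=0.5):
    """P(both move, A only, B only, neither) under each candidate structure;
    w = causal strength, alpha = hidden-cause activation probability."""
    aw = alpha * w
    return {
        "separate": [aw**2, aw*(1-aw), (1-aw)*aw, (1-aw)**2],
        "common":   [alpha*w**2, alpha*w*(1-w), alpha*(1-w)*w,
                     (1-alpha) + alpha*(1-w)**2],
        "A->B":     [aw*w, aw*(1-w), 0.0, (1-alpha) + alpha*(1-w)],
        "B->A":     [aw*w, 0.0, aw*(1-w), (1-alpha) + alpha*(1-w)],
    }[structure]

def posterior(counts, **kw):
    """counts = (n_both, n_A_only, n_B_only, n_neither); flat prior."""
    like = {s: np.prod([p**n for p, n in zip(patterns(s, **kw), counts)])
            for s in ["separate", "common", "A->B", "B->A"]}
    z = sum(like.values())
    return {s: l / z for s, l in like.items()}

# 4x both balls move, 2x A alone, 2x B alone: the common cause wins.
print(posterior((4, 2, 2, 0)))
```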
[Figure: posterior probability of each structure in each condition, as computed from the theory, alongside the human data]
Other physical systems

From blicket detectors…
[Figure: blicket detector procedure (backward blocking condition). Both objects activate the detector; then Object A does (or does not) activate the detector by itself (“Oooh, it’s a blicket!”). Children are asked if each object is a blicket, then are asked to make the machine go]
…to lemur colonies
Outline
contingency data
physical systems
perceived causality
Michotte (1963)
Affected by…
– timing of events
– velocity of balls
– proximity
Nitro X
Affected by…
– timing of events
– velocity of balls
– proximity
(joint work with Liz Baraff)
Test trials
• Show explosions involving multiple cans
– allows inferences about causal structure
• For each trial, choose one of:
– chain reaction
– spontaneous explosions
– other
Theory
• Ontology
– Types: Can, HiddenCause
– Predicates:
ExplosionTime(Can), ActivationTime(HiddenCause)
• Constraints on causal relations
– For any Can y and Can x, with prior probability 1:
ExplosionTime(y) → ExplosionTime(x)
– For some HiddenCause c and Can x, with prior probability 1:
ActivationTime(c) → ExplosionTime(x)
• Functional form of causal relations
– Explosion at ActivationTime(c), and after an appropriate delay from ExplosionTime(y), with probability set by w.
Otherwise, explosions occur with probability 0.
– Low probability of hidden causes activating.
Using the theory
• What kind of explosive is this? (spontaneity, volatility, rate)
• What caused what?
• What is the causal structure?
[Movie: simulated explosions of cans of Nitro X]
Testing a prediction of the theory
• Evidence for a hidden cause should increase with
the number of simultaneous explosions
• Four groups of 16 participants saw displays using
m = 2, 3, 4, or 6 cans
• For each trial, choose one of:
– chain reaction
– spontaneous explosions
– other (coded for reference to a hidden cause)
[Figure: probability of identifying a hidden cause vs. number of canisters; χ²(3) = 11.36, p < .01]
Gradual transition from few to most participants identifying a hidden cause
Further predictions
• Explains chain reaction inferences
• Attribution of causality should be sensitive to
interaction between time and distance
• Simultaneous explosions that occur sooner
provide stronger evidence for common cause
Three kinds of causal induction
a continuum from more data / less constrained to less data / more constrained:
contingency data → physical systems → perceived causality
In every case: prior knowledge + statistical inference
Combining knowledge and statistics
• How do people...
– identify causal relationships from small samples?
– learn hidden causal structure with ease?
– reason about complex dynamic causal systems?
• Constraints from knowledge + powerful statistics
• Key ideas:
– prior knowledge expressed in causal theory
– theory generates hypothesis space for inference
Further questions
• Are there unifying principles across theories?
Functional form
• Stick-balls:
– Causes result in Moves(b,t) with probability w.
Otherwise, Moves(b,t) occurs with probability 0.
• Nitro X:
– Explosion at ActivationTime(c), and after appropriate delay
from ExplosionTime(y), with probability set by w.
Otherwise, explosions occur with probability 0.

Unifying principles:
1. Each force acting on a system has an opportunity to change its state
2. Without external influence, a system will not change its state
Further questions
• Are there unifying principles across theories?
• How are theories learned?
Learning causal theories

Theory (Ontology, Plausible relations, Functional form)
  ↓ generates
Hypothesis space: candidate causal graphs over X, Y, Z
  ↓ generates
Data:
Case  X  Y  Z
 1    1  0  1
 2    0  1  1
 3    1  1  1
 4    0  0  0
 ...
Bayesian inference runs from data back up to hypotheses.
Learning causal theories
The same Bayesian inference can run one level higher: with many datasets (many Case × X, Y, Z tables), the data provide evidence about the theory itself, not just about individual causal structures.
Further questions
• Are there unifying principles across theories?
• How are theories learned?
• What is an appropriate prior over theories?
Causal induction with rates
• Different functional form results in models
that apply to different kinds of data
• Rate: number of times effect occurs in time
interval, in presence and absence of cause
Does the electric field cause
the mineral to emit particles?
Theory
• Ontology
– Types: Mineral, Field, Time
– Predicates: Emitted(Mineral,Time), Active(Field,Time)
• Plausible relations
– For any Mineral m and Field f, with prior probability p:
For all Times t, Active(f,t) → Emitted(m,t)
• Functional form of causal relations
– Causes of Emitted(m,t) are independent probabilistic mechanisms, with
causal strengths wi. An independent background cause is always present with
strength w0.
– Implies the number of emissions is a Poisson process, with rate at time t given
by w0 + Active(f,t)·w1.
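A sketch of the resulting inference for rate data, marginalizing w0 and w1 over a uniform grid (settings are illustrative, using scipy's Poisson pmf; not the authors' code):

```python
import numpy as np
from scipy.stats import poisson

RATES = np.linspace(0.1, 20, 100)   # grid over emission rates (illustrative)

def p_field_matters(n_on, n_off, t=1.0):
    """Posterior that the field causes emissions, given particle counts
    n_on / n_off observed in time t with the field on / off."""
    # Graph 0: one Poisson rate w0 in both conditions.
    like0 = (poisson.pmf(n_on, RATES * t) * poisson.pmf(n_off, RATES * t)).mean()
    # Graph 1: rate w0 with the field off, w0 + w1 with it on.
    W0, W1 = np.meshgrid(RATES, RATES)
    like1 = (poisson.pmf(n_on, (W0 + W1) * t) * poisson.pmf(n_off, W0 * t)).mean()
    return like1 / (like0 + like1)   # uniform prior over the two graphs

print(p_field_matters(n_on=12, n_off=4))
```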
Causal induction with rates
[Figure: human judgments for combinations of Rate(e|c+) and Rate(e|c-), compared with ΔR, causal power (N = 150), and the Bayesian model]
Learning causal theories
• T1: bacteria die at random
• T2: bacteria die at random, or in waves
P(wave|T2) > P(wave|T1)
• Having inferred the existence of a new force,
need to find a mechanism...
Lemur colonies
A researcher in Madagascar is studying the effects of environmental resources
on the location of lemur colonies. She has studied twelve different parts of
Madagascar, and is trying to establish which areas show evidence of being
affected by the distribution of resources in order to decide where she should
focus her research.
[Figure: human judgments for changes in the number, ratio, location, and spread of colonies, relative to a uniform distribution]
Theory
• Ontology
– Types: Colony, Resource
– Predicates: Location(Colony), Location(Resource)
• Plausible relations
– For any Colony c and Resource r, with probability p:
Location(r) → Location(c)
• Functional form of causal relations
– Without a hidden cause, Location(c) is uniform
– With a hidden cause r, Location(c) is Gaussian with mean
Location(r) and covariance matrix Σ
– Location(r) is uniform
Is there a resource?

No: colony locations are drawn from a uniform distribution.
Yes: colony locations are drawn from a mixture of a uniform distribution and a Gaussian “regularity”; compute the likelihood by summing over all structures and over all regularities.
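A minimal sketch of this comparison for the yes/no decision, marginalizing the resource location over a grid (σ, region size, and grid are my illustrative choices; the slide's full model also mixes in a uniform component):

```python
import numpy as np
from scipy.stats import multivariate_normal

def p_resource(points, sigma=0.2, prior=0.5, n_grid=25):
    """P(resource | colony locations) for points in the unit square."""
    points = np.asarray(points)
    like_no = 1.0                        # uniform density 1 on the unit square
    xs = np.linspace(0, 1, n_grid)
    like_yes = np.mean([                 # marginalize the resource location
        np.prod(multivariate_normal.pdf(points, mean=[mx, my],
                                        cov=sigma**2 * np.eye(2)))
        for mx in xs for my in xs])
    return prior * like_yes / (prior * like_yes + (1 - prior) * like_no)

# Tightly clustered colonies: strong evidence for a hidden resource.
print(p_resource([[0.50, 0.50], [0.52, 0.48], [0.49, 0.51], [0.51, 0.50]]))
```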
[Figure: human judgments vs. Bayesian model predictions for changes in number, ratio, location, and spread, relative to uniform]
Schulz & Gopnik (in press)

A  B  C | E
1  0  0 | 0
0  1  0 | 0
0  0  1 | 1
1  1  1 | 1

The same contingencies are instantiated in different domains:
• Biology: flowers A, B, C; the effect is sneezing (“Ahchoo!”)
• Psychology: animals A, B, C; the effect is fear (“Eek!”)
Common functional form
• A theory of sneezing
– a flower is a cause with probability ε
– no sneezing without a cause
– causes each produce sneezing with probability ω
• A theory of fear
– an animal is a cause with probability ε
– no fear without a cause
– a cause produces fear with probability ω
Common functional form

A  B  C | E
1  0  0 | 0
0  1  0 | 0
0  0  1 | 1
1  1  1 | 1

Prior probability of each hypothesis about which objects are causes:
no causes: (1–ε)³; one cause: ε(1–ε)²; two causes: ε²(1–ε); all three: ε³

• Children: choose just C, never just A or just B
• Bayes: just C is preferred, never just A or just B
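A sketch of this computation: enumerate hypotheses about which objects are causes, with prior ε per object and a noisy-OR likelihood of strength ω and no background cause (the theory's "no effect without a cause"); the values of ε and ω are illustrative.

```python
from itertools import product

def posterior_over_causes(data, eps=0.3, omega=0.9):
    """Posterior over which of A, B, C are causes; each trial in `data`
    is (A present, B present, C present, effect)."""
    post = {}
    for causes in product([0, 1], repeat=3):
        prior = 1.0
        for is_cause in causes:
            prior *= eps if is_cause else 1 - eps
        like = 1.0
        for *present, e in data:
            n_active = sum(p * c for p, c in zip(present, causes))
            p_e = 1 - (1 - omega) ** n_active    # noisy-OR, no background
            like *= p_e if e else 1 - p_e
        post[causes] = prior * like
    z = sum(post.values())
    return {k: v / z for k, v in post.items()}

data = [(1,0,0,0), (0,1,0,0), (0,0,1,1), (1,1,1,1)]
best = max(posterior_over_causes(data).items(), key=lambda kv: kv[1])
print(best)   # -> ((0, 0, 1), ...): "just C" has the highest posterior
```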
Inter-domain causation
• Physical: noise-making machine
– A & B are magnetic buttons, C is talking
• Psychological: confederate giggling
– A & B are silly faces, C is a switch
• Procedure:
– baseline: which could be causes?
– trials: same contingencies as Experiment 3
– test: which are causes?
(Schulz & Gopnik, in press, Experiment 4)
Inter-domain causation
• A theory with inter-domain causes
– intra-domain entities are causes with probability ε₁
– inter-domain entities are causes with probability ε₀
– no effect occurs without a cause
– causes produce effects with probability ω
• Lower prior probability for inter-domain causes
(i.e. ε₀ much lower than ε₁)
A problem with priors?
• If lack of mechanism results in lower prior
probability, shouldn’t inferences change?
• Intra-domain causes (Experiment 3):
– biological: 78% took C
– psychological: 67% took C
• Inter-domain causes (Experiment 4):
– physical: 75% took C
– psychological: 81% took C
A  B  C | E
1  0  0 | 0
0  1  0 | 0
0  0  1 | 1
1  1  1 | 1

Prior probability of each hypothesis about which objects are causes, with C inter-domain (prior ε₀) and A, B intra-domain (prior ε₁): for example,
no causes: (1–ε₀)(1–ε₁)²; just C: ε₀(1–ε₁)²; A and B only: (1–ε₀)ε₁²; C plus one of A, B: ε₀ε₁(1–ε₁); all three: ε₀ε₁².
[Builds: the data successively eliminate hypotheses inconsistent with the observed contingencies]
A direct test of inter-domain priors
• Ambiguous causes:
– A and C together produce E
– B and C together produce E
– A and B and C together produce E
• For C intra-domain, choose C (Sobel et al., in press)
• For C inter-domain, should choose A and B
The plausibility matrix
The matrix M identifies plausible causal graphs: rows and columns are grounded predicates, and each entry gives the plausibility of a causal relation from one to the other.

Entities: c1, c2, c3, g1, g2, g3
Predicates: Injected, Expressed

Rows and columns: Injected(c1), Injected(c2), Injected(c3), Expressed(g1), Expressed(g2), Expressed(g3). The Injected → Expressed block of M is all ones; all other entries are zero.
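A minimal sketch of building M (entity and predicate names as on the slide; the code layout is mine):

```python
import numpy as np

chems, genes = ["c1", "c2", "c3"], ["g1", "g2", "g3"]
preds = [f"Injected({c})" for c in chems] + [f"Expressed({g})" for g in genes]

# M[i, j] = plausibility of a causal relation from preds[i] to preds[j]:
# only Injected(c) -> Expressed(g) links are plausible.
M = np.zeros((len(preds), len(preds)), dtype=int)
for i in range(len(chems)):
    for j in range(len(genes)):
        M[i, len(chems) + j] = 1

print(M)   # the Injected-by-Expressed block is all ones; the rest is zero
```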
The Chomsky hierarchy
Languages                      Machines
Type 0 (computable)            Turing machine
Type 1 (context sensitive)     Bounded TM
Type 2 (context free)          Push-down automaton
Type 3 (regular)               Finite state automaton

Languages in each class are a strict subset of those in higher classes.
(Chomsky, 1956)
Grammaticality and plausibility
• Grammar: indicates admissibility of (infinitely many) sentences generated from terminals
• Theory: indicates plausibility of (infinitely many) relations generated from grounded predicates