Theory-based Causal Induction - Computational Cognitive Science

Bayesian models as a tool for
revealing inductive biases
Tom Griffiths
University of California, Berkeley
Inductive problems
Learning languages from utterances
blicket toma
dax wug
blicket wug
SXY
X  {blicket,dax}
Y  {toma, wug}
Learning categories from instances of their members
Learning functions from (x,y) pairs
Revealing inductive biases
• Many problems in cognitive science can be
formulated as problems of induction
– learning languages, concepts, and causal relations
• Such problems are not solvable without bias
(e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
• What biases guide human inductive inferences?
How can computational models be used to investigate human inductive biases?
Models and inductive biases
• Transparent
Bayesian models
Reverend Thomas Bayes

Bayes’ theorem

  P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)

where h is a hypothesis and d is data; P(h | d) is the posterior probability, P(d | h) the likelihood, P(h) the prior probability, and the denominator sums over the space of hypotheses H.
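To make the theorem concrete, here is a minimal sketch (not from the slides) of Bayes’ rule over a discrete hypothesis space; the coin-flip hypotheses, prior, and likelihood are invented purely for illustration.

```python
# Minimal sketch: Bayes' rule over a discrete hypothesis space.
# The hypotheses, prior, and likelihood here are illustrative, not from the slides.

def posterior(hypotheses, prior, likelihood, data):
    """Return P(h | d) for each h, given P(h) and P(d | h)."""
    unnormalized = {h: likelihood(data, h) * prior[h] for h in hypotheses}
    z = sum(unnormalized.values())          # sum over the space of hypotheses
    return {h: p / z for h, p in unnormalized.items()}

# Toy example: is a coin fair (p = 0.5) or biased toward heads (p = 0.9)?
hypotheses = ["fair", "biased"]
prior = {"fair": 0.5, "biased": 0.5}

def likelihood(data, h):
    p = 0.5 if h == "fair" else 0.9
    heads = sum(data)
    tails = len(data) - heads
    return (p ** heads) * ((1 - p) ** tails)

print(posterior(hypotheses, prior, likelihood, data=[1, 1, 1, 0, 1]))
```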
Three advantages of Bayesian models
• Transparent identification of inductive biases
through hypothesis space, prior, and likelihood
• Opportunity to explore a range of biases expressed
in terms that are natural to the problem at hand
• Rational statistical inference provides an upper
bound on human inferences from data
Two examples
Causal induction from small samples
(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation
(Sharon Goldwater, Mark Johnson)
Two examples
Causal induction from small samples
(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation
(Sharon Goldwater, Mark Johnson)
Blicket detector
(Dave Sobel, Alison Gopnik, and colleagues)
“See this? It’s a blicket machine. Blickets make it go.”
“Let’s put this one on the machine.”
“Oooh, it’s a blicket!”
[Figure: introduction to the blicket detector, with panels from the procedure used in Sobel et al. (2002), Experiment 2: objects are placed on the detector alone or together (e.g., both objects activate the detector; object A does, or does not, activate the detector by itself). Children are asked if each object is a blicket, then asked to make the machine go.]
“One cause”
(Gopnik, Sobel, Schulz, & Glymour, 2001)
[Figure: procedure used in Sobel et al. (2002), Experiment 2, One-Cause and Backward Blocking Conditions. In the A, B, and AB trials children see whether the object(s) activate the detector; they are then asked whether each object is a blicket and asked to make the machine go.]
– Two objects: A and B
– Trial 1: A B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket
• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)
Hypotheses: causal models
[Figure: the four candidate causal graphs over objects A and B and the detector’s activation E: neither object causes E, only A causes E, only B causes E, or both cause E.]
Each graph defines a probability distribution over the variables
(for both observation and intervention)
(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)
Prior and likelihood: causal theory
• Prior probability an object is a blicket is q
– defines a distribution over causal models
• Detectors have a deterministic “activation law”
– always activate if a blicket is on the detector
– never activate otherwise
(Tenenbaum & Griffiths, 2003; Griffiths, 2005)
Prior and likelihood: causal theory

  Hypothesis          Prior       P(E=1 | A=0,B=0)  P(E=1 | A=1,B=0)  P(E=1 | A=0,B=1)  P(E=1 | A=1,B=1)
  h00: no cause       (1 – q)^2          0                 0                 0                 0
  h01: B → E          (1 – q) q          0                 0                 1                 1
  h10: A → E          q (1 – q)          0                 1                 0                 1
  h11: A → E, B → E   q^2                0                 1                 1                 1

  (In each case P(E=0 | A, B) = 1 – P(E=1 | A, B).)
Modeling “one cause”
Before any evidence is observed, all four hypotheses remain, weighted by the priors and likelihoods in the table above.
Modeling “one cause”
After Trial 1 (A and B on the detector, detector active): this outcome has probability 0 under h00, so only h01, h10, and h11 remain.
Modeling “one cause”
After Trial 2 (B on the detector alone, detector inactive): this outcome has probability 0 under h01 and h11, so only h10 remains.
A is definitely a blicket
B is definitely not a blicket
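The elimination argument above can be written out directly. The sketch below (my code, not the authors’) enumerates the four causal models, applies the prior defined by q and the deterministic activation law, and conditions on the two “one cause” trials; the value of q is an arbitrary illustrative choice.

```python
# Sketch of the "one cause" computation: posterior over the four causal models
# h00, h01, h10, h11 under the deterministic activation law. q is illustrative.

from itertools import product

q = 0.3  # prior probability that any given object is a blicket (illustrative)

# Hypotheses: (A is a blicket, B is a blicket)
hypotheses = list(product([0, 1], repeat=2))
prior = {(a, b): (q if a else 1 - q) * (q if b else 1 - q) for a, b in hypotheses}

def p_effect(a_on, b_on, h):
    """Deterministic activation law: the detector activates iff a blicket is on it."""
    a_blicket, b_blicket = h
    return 1.0 if (a_on and a_blicket) or (b_on and b_blicket) else 0.0

# One-cause data: Trial 1: A and B on detector -> active; Trial 2: B alone -> inactive.
trials = [((1, 1), 1), ((0, 1), 0)]   # ((A on, B on), detector state)

def likelihood(h):
    lik = 1.0
    for (a_on, b_on), e in trials:
        p1 = p_effect(a_on, b_on, h)
        lik *= p1 if e == 1 else 1 - p1
    return lik

unnorm = {h: likelihood(h) * prior[h] for h in hypotheses}
z = sum(unnorm.values())
post = {h: p / z for h, p in unnorm.items()}

p_a = sum(p for (a, b), p in post.items() if a)   # P(A is a blicket | data)
p_b = sum(p for (a, b), p in post.items() if b)   # P(B is a blicket | data)
print(p_a, p_b)
```

With these data all of the posterior mass lands on h10, so P(A is a blicket) = 1 and P(B is a blicket) = 0, matching the “definitely” / “definitely not” conclusion above.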
“One cause”
(Gopnik, Sobel, Schulz, & Glymour, 2001)
– Trial 1: A B on detector – detector active
– Trial 2: B on detector – detector inactive
– 4-year-olds judge whether each object is a blicket
• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)
Building on this analysis
Other physical systems: from stick-ball machines…
(Kushnir, Schulz, Gopnik, & Danks, 2003)
(Griffiths, Baraff, & Tenenbaum, 2004)
…to lemur colonies
(Griffiths & Tenenbaum, 2007)
Two examples
Causal induction from small samples
(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation
(Sharon Goldwater, Mark Johnson)
Bayesian segmentation
• In the domain of segmentation, we have:
– Data: unsegmented corpus (transcriptions).
– Hypotheses: sequences of word tokens.
– Likelihood: P(d | h) = 1 if concatenating the words forms the corpus, = 0 otherwise.
– Prior: P(h) encodes assumptions about the structure of language.
• The optimal solution is therefore the segmentation, among those consistent with the corpus, with the highest prior probability.
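As a toy illustration of this point (not the model described on the following slides), the sketch below enumerates all segmentations of a short unsegmented string and scores each with a hand-built word-probability prior; because the likelihood is 1 for consistent segmentations and 0 otherwise, choosing the highest-scoring candidate is the same as maximizing the posterior. The lexicon, its probabilities, and the orthographic (rather than phonemic) corpus are all invented.

```python
# Toy sketch: with a 0/1 likelihood, the best segmentation is simply the consistent
# segmentation with the highest prior. The hand-built lexicon prior below is an
# illustration, not the Goldwater-Griffiths-Johnson model.

def segmentations(s):
    """Enumerate every way of cutting string s into contiguous words."""
    if not s:
        yield []
        return
    for i in range(1, len(s) + 1):
        for rest in segmentations(s[i:]):
            yield [s[:i]] + rest

# Illustrative prior: known words get their listed probability; an unknown word of
# length m gets probability unknown**m, so long unknown words are heavily penalized.
lexicon = {"you": 0.10, "want": 0.05, "to": 0.15, "see": 0.05}

def prior(words, unknown=0.001):
    p = 1.0
    for w in words:
        p *= lexicon.get(w, unknown ** len(w))
    return p

# Orthographic stand-in for the start of the phonemic utterance "yuwanttusiD6bUk" below.
corpus = "youwanttosee"
best = max(segmentations(corpus), key=prior)
print(best)   # -> ['you', 'want', 'to', 'see']
```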
Brent (1999)
• Describes a Bayesian unigram model for
segmentation.
– Prior favors solutions with fewer words, shorter words.
• Problems with Brent’s system:
– Learning algorithm is approximate (non-optimal).
– Difficult to extend to incorporate bigram info.
A new unigram model (Dirichlet process)
Assume word wi is generated as follows:
1. Is wi a novel lexical item?
   P(yes) = α / (n + α)
   P(no) = n / (n + α)
   (n = number of words generated so far; α = concentration parameter)
Fewer word types = higher probability
A new unigram model (Dirichlet process)
Assume word wi is generated as follows:
2. If novel, generate phonemic form x1…xm:
   P(wi = x1…xm) = ∏_{j=1…m} P(xj)
Shorter words = higher probability
   If not, choose the lexical identity of wi from previously occurring words:
   P(wi = ℓ) = count(ℓ) / n
Power law = higher probability
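The two steps above have a simple “Chinese restaurant” reading: with probability n/(n + α) reuse an existing word (in proportion to its count), otherwise draw a novel word from the phoneme-level base distribution. The sketch below samples from this process; the phoneme inventory, stopping probability, and α are illustrative choices of mine, not the settings used in the reported simulations.

```python
# Sketch of the unigram Dirichlet process generative step, in its Chinese-restaurant-
# process form. Phoneme inventory, base distribution, and alpha are illustrative.

import random
from collections import Counter

ALPHA = 20.0
PHONEMES = list("abdefghiklmnoprstuvwz")    # toy inventory
P_STOP = 0.5                                # geometric word length: shorter = more probable

def base_distribution_sample():
    """Generate a novel word: i.i.d. phonemes with a geometric length distribution."""
    word = random.choice(PHONEMES)
    while random.random() > P_STOP:
        word += random.choice(PHONEMES)
    return word

def generate_word(counts, n):
    """One draw from the DP: reuse an old word with prob n/(n+alpha), else create a novel one."""
    if random.random() < n / (n + ALPHA):
        # choose an existing lexical item with probability count(l)/n (rich get richer)
        words, weights = zip(*counts.items())
        return random.choices(words, weights=weights)[0]
    return base_distribution_sample()

counts, n = Counter(), 0
corpus = []
for _ in range(200):
    w = generate_word(counts, n)
    corpus.append(w)
    counts[w] += 1
    n += 1

print(corpus[:20])
print(counts.most_common(5))   # a few high-frequency word types emerge (power-law-like)
```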
Unigram model: simulations
• Same corpus as Brent (Bernstein-Ratner, 1987):
– 9790 utterances of phonemically transcribed
child-directed speech (19-23 months).
– Average utterance length: 3.4 words.
– Average word length: 2.9 phonemes.
• Example input:
  yuwanttusiD6bUk
  lUkD*z6b7wIThIzh&t
  &nd6dOgi
  yuwanttulUk&tDIs
  ...
Example results
What happened?
• Model assumes (falsely) that words have
the same probability regardless of context.
P(D&t) = .024
P(D&t|WAts) = .46
P(D&t|tu) = .0019
• Positing amalgams allows the model to
capture word-to-word dependencies.
What about other unigram models?
• Brent’s learning algorithm is insufficient to
identify the optimal segmentation.
– Our solution has higher probability under his
model than his own solution does.
– On randomly permuted corpus, our system
achieves 96% accuracy; Brent gets 81%.
• Formal analysis shows undersegmentation is
the optimal solution for any (reasonable)
unigram model.
Bigram model (hierarchical Dirichlet process)
Assume word wi is generated as follows:
1. Is (wi−1, wi) a novel bigram?
   P(yes) = β / (n(wi−1) + β)
   P(no) = n(wi−1) / (n(wi−1) + β)
   (n(wi−1) = number of bigrams generated so far beginning with wi−1; β = concentration parameter)
2. If novel, generate wi using the unigram model (almost).
   If not, choose the lexical identity of wi from the words previously occurring after wi−1:
   P(wi = ℓ | wi−1 = ℓ′) = count(ℓ′, ℓ) / count(ℓ′)
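A sketch of the resulting predictive probability is below: the chance of word wi after wi−1 interpolates between the bigram’s own count and a unigram backoff. The counts, β, and the smoothed stand-in for the unigram model are illustrative; in the full model the backoff is itself the Dirichlet process unigram model (“almost”, as noted above).

```python
# Sketch of the bigram (hierarchical Dirichlet process) predictive probability:
# P(w_i | w_{i-1}) backs off from bigram counts to a unigram model. Counts, beta,
# and the unigram stand-in are illustrative.

from collections import Counter

BETA = 10.0

# Toy counts gathered from a segmented corpus.
bigram_counts = Counter({("the", "doggie"): 4, ("the", "book"): 2, ("see", "the"): 6})
prev_counts = Counter({"the": 6, "see": 6})

def p_unigram(word):
    """Stand-in for the unigram (base) model; the full model uses the DP above."""
    unigrams = Counter({"the": 12, "doggie": 4, "book": 2, "see": 6, "you": 8})
    total = sum(unigrams.values())
    return (unigrams[word] + 1) / (total + len(unigrams))   # smoothed for unseen words

def p_bigram(word, prev):
    """P(w_i = word | w_{i-1} = prev): reuse the bigram in proportion to its count,
    otherwise fall back to the unigram model in proportion to beta."""
    n_prev = prev_counts[prev]
    reuse = bigram_counts[(prev, word)] / n_prev if n_prev else 0.0
    novel = p_unigram(word)
    return (n_prev * reuse + BETA * novel) / (n_prev + BETA)

print(p_bigram("doggie", "the"))   # boosted by the ("the", "doggie") bigram count
print(p_bigram("doggie", "see"))   # mostly falls back to the unigram probability
```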
Example results
Conclusions
• Both adults and children are sensitive to the
nature of mechanisms in using covariation
• Both adults and children can use covariation to
make inferences about the nature of mechanisms
• Bayesian inference provides a formal framework
for understanding how statistics and knowledge
interact in making these inferences
– how theories constrain hypotheses, and are learned
A probabilistic mechanism?
• Children in Gopnik et al. (2001) who said that
B was a blicket had seen evidence that the
detector was probabilistic
– one block activated detector 5/6 times
• Replace the deterministic “activation law”…
– activate with probability 1 – ε if a blicket is on the detector
– never activate otherwise
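A small sketch (mine, not the authors’ code) of how this change matters for the “one cause” trials: with ε = 0 the failure of B alone to activate the detector rules out every hypothesis in which B is a blicket, while with ε > 0 it only lowers their probability. Here ε = 1/6 is chosen to echo the 5/6 activation rate mentioned above, and q is arbitrary.

```python
# Sketch: the probabilistic activation law and the "one cause" inference about B.
# q and epsilon are illustrative values.

from itertools import product

def p_b_is_blicket(q, epsilon):
    trials = [((1, 1), 1), ((0, 1), 0)]          # (A on, B on), detector state
    post = {}
    for a_blicket, b_blicket in product([0, 1], repeat=2):
        prior = (q if a_blicket else 1 - q) * (q if b_blicket else 1 - q)
        lik = 1.0
        for (a_on, b_on), e in trials:
            blicket_present = (a_on and a_blicket) or (b_on and b_blicket)
            p_active = (1 - epsilon) if blicket_present else 0.0
            lik *= p_active if e == 1 else 1 - p_active
        post[(a_blicket, b_blicket)] = prior * lik
    z = sum(post.values())
    return sum(p for (a, b), p in post.items() if b) / z

print(p_b_is_blicket(q=0.3, epsilon=0.0))    # deterministic law: B cannot be a blicket
print(p_b_is_blicket(q=0.3, epsilon=1/6))    # probabilistic law: B might still be a blicket
```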
Deterministic vs. probabilistic
[Figure: model predictions for the one-cause condition under the deterministic and probabilistic activation laws]
Mechanism knowledge affects interpretation of contingency data.
Manipulating mechanisms
[Figure: two-phase procedure, built from panels of the Sobel et al. (2002), Experiment 2 figure]
I. Familiarization phase: establish nature of mechanism (same block)
II. Test phase: one cause
At the end of the test phase, adults judge the probability that each object is a blicket.
Manipulating mechanisms
(n = 12 undergraduates per condition)
[Figure: probability judgments for the one-cause, one-control, and three-control trials under the deterministic and probabilistic mechanisms, comparing Bayesian model predictions ("Bayes") with participants ("People")]
Acquiring mechanism knowledge
[Figure: same two-phase procedure as in "Manipulating mechanisms": I. Familiarization phase, establishing the nature of the mechanism (same block); II. Test phase, one cause]
Results with children
• Tested 24 four-year-olds (mean age 54 months)
• Instead of a rating, children gave a yes or no response
• Significant difference in one cause B responses
– deterministic: 8% say yes
– probabilistic: 79% say yes
• No significant difference in one control trials
– deterministic: 4% say yes
– probabilistic: 21% say yes
(Griffiths & Sobel, submitted)
Comparison to previous results
• Proposed boundaries are more accurate than Brent’s, but fewer proposals are made:

           Boundary Precision   Boundary Recall
  Brent          .80                 .85
  GGJ            .92                 .62

• Result: word tokens are less accurate:

           Token F-score
  Brent          .68
  GGJ            .54

Precision: #correct / #found [= hits / (hits + false alarms)]
Recall: #correct / #true [= hits / (hits + misses)]
F-score: an average of precision and recall.
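For concreteness, here is a small sketch of how these boundary measures can be computed from a proposed and a gold segmentation; the example utterance is invented, and the F-score is computed as the harmonic mean of precision and recall.

```python
# Sketch of the evaluation measures above, computed over word-boundary positions.
# The gold and proposed segmentations are illustrative.

def boundaries(words):
    """Word-internal boundary positions (character offsets) implied by a segmentation."""
    positions, offset = set(), 0
    for w in words[:-1]:
        offset += len(w)
        positions.add(offset)
    return positions

def precision_recall_f(proposed, gold):
    hits = len(proposed & gold)
    precision = hits / len(proposed) if proposed else 1.0   # #correct / #found
    recall = hits / len(gold) if gold else 1.0              # #correct / #true
    f = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f

gold = boundaries(["you", "want", "to", "see", "the", "book"])
proposed = boundaries(["you", "wantto", "see", "thebook"])    # undersegmented output
print(precision_recall_f(proposed, gold))   # high precision, lower recall
```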
Quantitative evaluation
• Compared to the unigram model, more boundaries are proposed, with no loss in accuracy:

                  Boundary Precision   Boundary Recall
  GGJ (unigram)         .92                 .62
  GGJ (bigram)          .92                 .84

• Accuracy is higher than previous models:

                  Token F-score   Type F-score
  Brent (unigram)       .68            .52
  GGJ (bigram)          .77            .63
Two examples
Causal induction from small samples
(Josh Tenenbaum, David Sobel, Alison Gopnik)
Statistical learning and word segmentation
(Sharon Goldwater, Mark Johnson)