Bayesian models as a tool for revealing inductive biases
Tom Griffiths
University of California, Berkeley

Inductive problems
• Learning languages from utterances
  – e.g., "blicket toma", "dax wug", "blicket wug"
  – S → X Y, with X → {blicket, dax} and Y → {toma, wug}
• Learning categories from instances of their members
• Learning functions from (x, y) pairs

Revealing inductive biases
• Many problems in cognitive science can be formulated as problems of induction
  – learning languages, concepts, and causal relations
• Such problems are not solvable without bias (e.g., Goodman, 1955; Kearns & Vazirani, 1994; Vapnik, 1995)
• What biases guide human inductive inferences?
• How can computational models be used to investigate human inductive biases?

Models and inductive biases
• Transparent: Bayesian models
[Portrait: Reverend Thomas Bayes]

Bayes' theorem
    P(h | d) = P(d | h) P(h) / Σ_{h′ ∈ H} P(d | h′) P(h′)
• h: hypothesis; d: data
• P(h | d) is the posterior probability, P(d | h) the likelihood, and P(h) the prior probability; the denominator sums over the space of hypotheses H

Three advantages of Bayesian models
• Transparent identification of inductive biases through the hypothesis space, prior, and likelihood
• Opportunity to explore a range of biases expressed in terms that are natural to the problem at hand
• Rational statistical inference provides an upper bound on human inferences from data

Two examples
1. Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)
2. Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)

Blicket detector (Dave Sobel, Alison Gopnik, and colleagues)
• "See this? It's a blicket machine."
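As an aside, the posterior computation defined by Bayes' theorem above can be sketched in a few lines of code. The coin-flip hypotheses in the example are illustrative, not from the talk.

```python
from math import comb

def posterior(hypotheses, prior, likelihood, data):
    """P(h | d) = P(d | h) P(h) / sum over h' of P(d | h') P(h')."""
    unnorm = {h: likelihood(data, h) * prior[h] for h in hypotheses}
    z = sum(unnorm.values())
    return {h: p / z for h, p in unnorm.items()}

# Illustrative example: is a coin fair (p = 0.5) or biased (p = 0.9),
# after observing 8 heads in 10 flips?
hypotheses = ["fair", "biased"]
prior = {"fair": 0.5, "biased": 0.5}

def likelihood(data, h):
    heads, flips = data
    p = 0.5 if h == "fair" else 0.9
    return comb(flips, heads) * p**heads * (1 - p) ** (flips - heads)

post = posterior(hypotheses, prior, likelihood, (8, 10))
```

The same three ingredients — hypothesis space, prior, likelihood — are what the talk uses to make inductive biases explicit.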
• "Blickets make it go."
• "Let's put this one on the machine. Oooh, it's a blicket!"

Procedure used in Sobel et al. (2002), Experiment 2 (Figure 13): two objects, A and B; 4-year-olds judge whether each object is a blicket, and then are asked to make the machine go.

One-Cause Condition (Gopnik, Sobel, Schulz, & Glymour, 2001)
• Trial 1: A and B on the detector – detector active
• Trial 2: B on the detector – detector inactive

Backward Blocking Condition
• Trial 1: A and B on the detector – detector active
• Trial 2: A on the detector – detector active

Results:
• A: a blicket (100% say yes)
• B: almost certainly not a blicket (16% say yes)

Hypotheses: causal models
• Four causal graphical models over A, B, and the detector's activation E: neither object is a blicket (h00), only B is (h01), only A is (h10), or both are (h11)
• Each model defines a probability distribution over the variables (for both observation and intervention)
(Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)

Prior and likelihood: causal theory
• Prior probability that an object is a blicket is q
  – defines a distribution over causal models:
    P(h00) = (1 − q)²,  P(h01) = (1 − q)q,  P(h10) = q(1 − q),  P(h11) = q²
• Detectors have a deterministic "activation law"
  – always activate if a blicket is on the detector
  – never activate otherwise
(Tenenbaum & Griffiths, 2003; Griffiths, 2005)

Under the activation law, P(E = 1 | A, B) for each hypothesis (P(E = 0 | A, B) is one minus each entry):

  A  B | h00  h01  h10  h11
  0  0 |  0    0    0    0
  1  0 |  0    0    1    1
  0  1 |  0    1    0    1
  1  1 |  0    1    1    1

Modeling "one cause"
• The AB trial (detector activates) rules out h00, leaving h01, h10, and h11
• The B trial (detector does not activate) rules out h01 and h11, leaving only h10
• A is definitely a blicket; B is definitely not a blicket
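This hypothesis elimination is just Bayesian inference over the four causal models. A minimal sketch, assuming the deterministic activation law described above; the trial encodings follow the two conditions, and q = 1/3 is an illustrative prior:

```python
from itertools import product

def blicket_posterior(trials, q=1/3):
    """trials: list of (objects_on_detector, detector_fired) pairs.
    Returns (P(A is a blicket), P(B is a blicket))."""
    post = {}
    for a, b in product([0, 1], repeat=2):        # h_ab: is A / is B a blicket?
        prior = (q if a else 1 - q) * (q if b else 1 - q)
        lik = 1.0
        for objects, fired in trials:
            # Deterministic law: detector fires iff a blicket is on it.
            predicted = int(('A' in objects and a) or ('B' in objects and b))
            lik *= 1.0 if predicted == fired else 0.0
        post[(a, b)] = prior * lik
    z = sum(post.values())
    post = {h: p / z for h, p in post.items()}
    return post[(1, 0)] + post[(1, 1)], post[(0, 1)] + post[(1, 1)]

# One cause: A and B activate the detector together; B alone fails.
# Only h10 survives, so A is certainly a blicket and B certainly not.
p_a, p_b = blicket_posterior([({'A', 'B'}, 1), ({'B'}, 0)])

# Backward blocking: A and B activate together; A alone also activates.
# h10 and h11 both survive, so P(B is a blicket) falls back to the prior q.
p_a2, p_b2 = blicket_posterior([({'A', 'B'}, 1), ({'A'}, 1)])
```

In the backward-blocking case the posterior P(B is a blicket) works out to exactly q, which is why a small prior predicts the low "B is a blicket" judgments.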
Building on this analysis
• Other physical systems: from stick-ball machines… (Kushnir, Schulz, Gopnik, & Danks, 2003; Griffiths, Baraff, & Tenenbaum, 2004) …to lemur colonies (Griffiths & Tenenbaum, 2007)

Two examples
1. Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)
2. Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)

Bayesian segmentation
• In the domain of segmentation, we have:
  – Data: an unsegmented corpus (transcriptions)
  – Hypotheses: sequences of word tokens
• Likelihood: P(d | h) = 1 if concatenating the words in h forms the corpus, 0 otherwise
• Prior: encodes assumptions about the structure of language
• The optimal solution is the consistent segmentation with the highest prior probability

Brent (1999)
• Describes a Bayesian unigram model for segmentation
  – Prior favors solutions with fewer words and shorter words
• Problems with Brent's system:
  – Learning algorithm is approximate (non-optimal)
  – Difficult to extend to incorporate bigram information

A new unigram model (Dirichlet process)
Assume word wi is generated as follows:
1. Is wi a novel lexical item?
     P(yes) = α / (n + α)    P(no) = n / (n + α)
   (fewer word types → higher probability)
2. If novel, generate a phonemic form x1…xm:
     P(wi = x1…xm) = ∏_{j=1…m} P(xj)
   (shorter words → higher probability)
   If not, choose the lexical identity of wi from the previously occurring words:
     P(wi = l) = count(l) / n
   (power-law word frequencies → higher probability)

Unigram model: simulations
• Same corpus as Brent (Bernstein-Ratner, 1987):
  – 9790 utterances of phonemically transcribed child-directed speech (19–23 months)
  – Average utterance length: 3.4 words
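The two-step generative process above can be sketched as a sampler. The concentration parameter, phoneme inventory, and word-length parameter below are illustrative choices, not the values used in the talk:

```python
import random

ALPHA = 1.0                            # concentration parameter (assumed value)
PHONEMES = list("aeioubdgklmnpstw")    # toy phoneme inventory, uniform P(x)
P_STOP = 0.5                           # toy stopping probability per phoneme

def new_word(rng):
    # Step 2a: generate a phonemic form; each extra phoneme multiplies in
    # another factor < 1, so shorter words are more probable.
    w = rng.choice(PHONEMES)
    while rng.random() > P_STOP:
        w += rng.choice(PHONEMES)
    return w

def generate_corpus(n_words, seed=0):
    rng = random.Random(seed)
    words = []
    for n in range(n_words):
        # Step 1: novel lexical item with probability ALPHA / (n + ALPHA).
        if rng.random() < ALPHA / (n + ALPHA):
            words.append(new_word(rng))
        else:
            # Step 2b: reuse a previous token, i.e. P(wi = l) = count(l) / n.
            words.append(rng.choice(words))
    return words

corpus = generate_corpus(1000)
```

Because reuse is proportional to past counts, frequent words get more frequent — the rich-get-richer dynamic that yields power-law word frequencies and far fewer types than tokens.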
  – Average word length: 2.9 phonemes
• Example input:
    yuwanttusiD6bUk
    lUkD*z6b7wIThIzh&t
    &nd6dOgi
    yuwanttulUk&tDIs
    ...

Example results
[figure]

What happened?
• The model assumes (falsely) that words have the same probability regardless of context:
    P(D&t) = .024
    P(D&t | WAts) = .46
    P(D&t | tu) = .0019
• Positing amalgams allows the model to capture word-to-word dependencies.

What about other unigram models?
• Brent's learning algorithm is insufficient to identify the optimal segmentation.
  – Our solution has higher probability under his model than his own solution does.
  – On a randomly permuted corpus, our system achieves 96% accuracy; Brent's gets 81%.
• Formal analysis shows undersegmentation is the optimal solution for any (reasonable) unigram model.

Bigram model (hierarchical Dirichlet process)
Assume word wi is generated as follows:
1. Is (wi−1, wi) a novel bigram?
     P(yes) = α₁ / (n_{wi−1} + α₁)    P(no) = n_{wi−1} / (n_{wi−1} + α₁)
2. If novel, generate wi using the unigram model (almost).
   If not, choose the lexical identity of wi from the words previously occurring after wi−1:
     P(wi = l | wi−1 = l′) = count(l′, l) / count(l′)

Example results
[figure]

Conclusions
• Both adults and children are sensitive to the nature of mechanisms in using covariation
• Both adults and children can use covariation to make inferences about the nature of mechanisms
• Bayesian inference provides a formal framework for understanding how statistics and knowledge interact in making these inferences
  – how theories constrain hypotheses, and how theories are learned

A probabilistic mechanism?
• Children in Gopnik et al. (2001) who said that B was a blicket had seen evidence that the detector was probabilistic
  – one block activated the detector 5/6 times
• Replace the deterministic "activation law"…
  – activate with probability 1 − ε if a blicket is on the detector
  – never activate otherwise

Deterministic vs. probabilistic
[Figure: model predictions for the one-cause condition under the deterministic and probabilistic activation laws]
• Mechanism knowledge affects interpretation of contingency data

Manipulating mechanisms
I. Familiarization phase: establish the nature of the mechanism (the same block is placed on the detector repeatedly)
II. Test phase: one cause (procedure as in Figure 13, Sobel et al., 2002, Experiment 2)
• At the end of the test phase, adults judge the probability that each object is a blicket

Manipulating mechanisms (n = 12 undergraduates per condition)
[Figure: Bayes vs. people, deterministic and probabilistic, one-cause condition]
[Figure: Bayes vs. people, one-control and three-control conditions]

Acquiring mechanism knowledge
• Same familiarization and test procedure, with the familiarization phase providing the evidence about the nature of the mechanism
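The deterministic-versus-probabilistic comparison can be sketched by rescoring the one-cause evidence under a noisy activation law. The noisy-OR form (each blicket independently fails to trigger the detector with probability eps) and the values of q and eps are illustrative assumptions, not the talk's exact parameterization:

```python
from itertools import product

def blicket_posterior(trials, q=1/3, eps=0.0):
    """Posterior over the four causal models under a (possibly noisy)
    activation law; eps = 0 recovers the deterministic law."""
    post = {}
    for a, b in product([0, 1], repeat=2):
        prior = (q if a else 1 - q) * (q if b else 1 - q)
        lik = 1.0
        for objects, fired in trials:
            n_blickets = ('A' in objects) * a + ('B' in objects) * b
            # Noisy-OR (assumed): each blicket is missed with probability eps.
            p_fire = 1 - eps ** n_blickets if n_blickets else 0.0
            lik *= p_fire if fired else 1 - p_fire
        post[(a, b)] = prior * lik
    z = sum(post.values())
    return {h: p / z for h, p in post.items()}

one_cause = [({'A', 'B'}, 1), ({'B'}, 0)]
det = blicket_posterior(one_cause, eps=0.0)
noisy = blicket_posterior(one_cause, eps=1/6)
p_b_det = det[(0, 1)] + det[(1, 1)]        # B's lone failure is decisive
p_b_noisy = noisy[(0, 1)] + noisy[(1, 1)]  # the failure may just be a miss
```

Under the deterministic law B cannot be a blicket after its failed trial; under the noisy law that failure is merely evidence, so P(B is a blicket) stays above zero — the qualitative pattern in the judgments reported below.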
Results with children
• Tested 24 four-year-olds (mean age 54 months)
• Instead of a rating, a yes or no response
• Significant difference in one-cause B responses
  – deterministic: 8% say yes
  – probabilistic: 79% say yes
• No significant difference in one-control trials
  – deterministic: 4% say yes
  – probabilistic: 21% say yes
(Griffiths & Sobel, submitted)

Comparison to previous results
• Proposed boundaries are more accurate than Brent's, but fewer proposals are made:

            Boundary Precision   Boundary Recall
  Brent           .80                 .85
  GGJ             .92                 .62

  – Precision: #correct / #found [= hits / (hits + false alarms)]
  – Recall: #correct / #true [= hits / (hits + misses)]
  – F-score: the harmonic mean of precision and recall
• Result: word tokens are less accurate:

            Token F-score
  Brent         .68
  GGJ           .54

Quantitative evaluation
• Compared to the unigram model, more boundaries are proposed, with no loss in accuracy:

                    Boundary Precision   Boundary Recall
  GGJ (unigram)          .92                 .62
  GGJ (bigram)           .92                 .84

• Accuracy is higher than previous models:

                    Token F-score   Type F-score
  Brent (unigram)       .68             .52
  GGJ (bigram)          .77             .63

Two examples
1. Causal induction from small samples (Josh Tenenbaum, David Sobel, Alison Gopnik)
2. Statistical learning and word segmentation (Sharon Goldwater, Mark Johnson)
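As a closing footnote, the boundary precision, recall, and F-score used in the evaluations above can be computed as follows. The segmentations in the example are toy inputs, not from the corpus:

```python
def boundaries(segmented):
    """Set of word-boundary positions (character offsets) in a segmented utterance."""
    pos, out = 0, set()
    for word in segmented.split()[:-1]:   # no boundary after the final word
        pos += len(word)
        out.add(pos)
    return out

def prf(proposed, gold):
    """Precision, recall, and F-score over proposed boundary positions."""
    hits = len(proposed & gold)
    precision = hits / len(proposed) if proposed else 0.0
    recall = hits / len(gold) if gold else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f

gold = boundaries("you want to see the book")
prop = boundaries("you want to seethe book")   # undersegmented: one missed boundary
p, r, f = prf(prop, gold)
```

The undersegmented proposal makes no false boundaries (precision 1.0) but misses one true boundary (recall 0.8) — the precision-high, recall-low pattern the unigram model shows.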