Theory-based causal induction
Tom Griffiths, Brown University
Josh Tenenbaum, MIT

Three kinds of causal induction
• contingency data
• physical systems
• perceived causality

Contingency data

                   C present (c+)   C absent (c-)
  E present (e+)         a                c
  E absent (e-)          b                d

"To what extent does C cause E?" (rate on a scale from 0 to 100)

Physical systems: the stick-ball machine
(Kushnir, Schulz, Gopnik, & Danks, 2003)

Perceived causality
Michotte (1963)

Three kinds of causal induction
• contingency data: bottom-up covariation information
• physical systems: top-down mechanism knowledge
• perceived causality: an object physics module

Three kinds of causal induction
• contingency data: more data, less constrained
• perceived causality: less data, more constrained
• in every case: prior knowledge + statistical inference

Theory-based causal induction
(Diagram: a theory generates a hypothesis space of causal graphs over X, Y, Z; the hypotheses generate data, a cases-by-variables table of binary values; Bayesian inference runs from the data back up this chain.)

An analogy to language
• Theory ↔ Grammar
• Hypothesis space (causal graphs) ↔ Parse trees
• Data ↔ Sentence ("The quick brown fox …")
In both cases, each level generates the one below it.

Outline
• contingency data
• physical systems
• perceived causality

Contingency data

                   C present (c+)   C absent (c-)
  E present (e+)         a                c
  E absent (e-)          b                d

"To what extent does C cause E?" (rate on a scale from 0 to 100)

Buehner & Cheng (1997)

                            Chemical present (c+)   Chemical absent (c-)
  Gene expressed (e+)                6                        4
  Gene not expressed (e-)            2                        4

"To what extent does the chemical cause gene expression?" (rate on a scale from 0 to 100)

Buehner & Cheng (1997)
• Showed participants all combinations of P(e+|c+) and P(e+|c-) in increments of 0.25
• Curious phenomenon, the "frequency illusion": why do people's judgments change when the cause does not change the probability of the effect?

Causal graphical models
• A framework for representing, reasoning, and learning about causality (also called Bayes nets) (Pearl, 2000; Spirtes, Glymour, & Scheines, 1993)
• Becoming widespread in psychology (Glymour, 2001; Gopnik et al., 2004; Lagnado & Sloman, 2002; Tenenbaum & Griffiths, 2001; Steyvers et al., 2003; Waldmann & Martignon, 1998)

Causal graphical models
• Variables: X, Y, Z
• Structure: e.g. X → Z ← Y
• Conditional probabilities: P(X), P(Y), P(Z|X,Y)
Together these define a probability distribution over the variables (for both observation and intervention).

Causal graphical models
• Provide a basic framework for representing causal systems
• But… where is the prior knowledge?

Chemicals and genes (Hamadeh et al., 2002, Toxicological Sciences)
• Chemicals: Clofibrate, Wyeth 14,643, Gemfibrozil, Phenobarbital
• Genes: p450 2B1, Carnitine Palmitoyl Transferase 1
• Given a novel Chemical X, which genes should it affect? Prior knowledge, such as the fact that clofibrate, Wyeth 14,643, and gemfibrozil are all peroxisome proliferators (marked +), shapes the answer.
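To make the graphical-models framework concrete, here is a minimal Python sketch of a three-variable network X → Z ← Y. It is not from the talk; the probability values are illustrative, and it exists only to show how one structure plus conditional probabilities answers both observational and interventional queries.

```python
import itertools

# A minimal causal graphical model over binary variables with
# structure X -> Z <- Y. All probability values are illustrative.
P_X = {0: 0.7, 1: 0.3}
P_Y = {0: 0.5, 1: 0.5}
P_Z1 = {(0, 0): 0.1, (1, 0): 0.8, (0, 1): 0.6, (1, 1): 0.9}  # P(Z=1 | X, Y)

def joint(x, y, z):
    """P(X=x, Y=y, Z=z), factorized according to the graph."""
    pz1 = P_Z1[(x, y)]
    return P_X[x] * P_Y[y] * (pz1 if z == 1 else 1 - pz1)

def observe():
    """Observational query P(X=1 | Z=1), by conditioning."""
    num = sum(joint(1, y, 1) for y in (0, 1))
    den = sum(joint(x, y, 1)
              for x, y in itertools.product((0, 1), repeat=2))
    return num / den

def intervene():
    """Interventional query P(X=1 | do(Z=1)): the intervention cuts
    Z's incoming edges, so X keeps its prior distribution."""
    return P_X[1]

print(observe())    # 0.51: observing the effect is evidence about X
print(intervene())  # 0.30: setting the effect says nothing about X
```

The asymmetry between the two queries is the signature of a causal model rather than a merely probabilistic one.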
Beyond causal graphical models
• Prior knowledge produces expectations about:
  – types of entities
  – plausible relations
  – functional form
• This cannot be captured by graphical models alone

"A theory consists of three interrelated components: a set of phenomena that are in its domain, the causal laws and other explanatory mechanisms in terms of which the phenomena are accounted for, and the concepts in terms of which the phenomena and explanatory apparatus are expressed." (Carey, 1985)

Theory-based causal induction
A causal theory is a hypothesis space generator.
• Ontology generates the variables
• Plausible relations generate the structure
• Functional form generates the conditional probabilities
Hypotheses are evaluated by Bayesian inference:
  P(h|data) ∝ P(data|h) P(h)

Theory
• Ontology
  – Types: Chemical, Gene, Mouse
  – Predicates: Injected(Chemical, Mouse), Expressed(Gene, Mouse)
  – E = 1 if the effect occurs (the mouse expresses the gene), else 0
  – C = 1 if the cause occurs (the mouse is injected), else 0
  – (The graphs below also include B, an always-present background cause.)

Theory
• Plausible relations
  – For any Chemical c and Gene g, with prior probability p:
    For all Mice m, Injected(c,m) → Expressed(g,m)
  – P(Graph 1: C → E ← B) = p
  – P(Graph 0: B → E) = 1 - p
  – No hypotheses with E → C, B → C, C → B, …

Functional form
• Structures: Graph 1 = C → E ← B; Graph 0 = B → E
• Generic parameterization:

  C  B    Graph 1: P(E=1|C,B)    Graph 0: P(E=1|C,B)
  0  0          p00                     p0
  1  0          p10                     p0
  0  1          p01                     p1
  1  1          p11                     p1

Functional form
• "Noisy-OR" parameterization (w0, w1: strength parameters for B and C):

  C  B    Graph 1: P(E=1|C,B)    Graph 0: P(E=1|C,B)
  0  0          0                       0
  1  0          w1                      0
  0  1          w0                      w0
  1  1          w1 + w0 - w1·w0         w0

Theory
• Ontology
  – Types: Chemical, Gene, Mouse
  – Predicates: Injected(Chemical, Mouse), Expressed(Gene, Mouse)
• Constraints on causal relations
  – For any Chemical c and Gene g, with prior probability p:
    For all Mice m, Injected(c,m) → Expressed(g,m)
• Functional form of causal relations
  – Causes of Expressed(g,m) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.
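As a sketch of how the noisy-OR functional form feeds into the Bayesian comparison of Graph 1 and Graph 0, here is an illustrative Python version. It assumes uniform grid priors over the strength parameters w0 and w1, which is one simple choice rather than necessarily the talk's exact model; the function names are mine.

```python
import numpy as np

def noisy_or(c, b, w1, w0):
    """P(E=1 | C=c, B=b): each present cause has an independent
    chance of bringing about the effect."""
    return 1 - (1 - w1) ** c * (1 - w0) ** b

def marginal_likelihood(counts, graph):
    """P(D | graph), averaging the strengths over a uniform grid.
    counts = (a, b, c, d) from the contingency table above."""
    a, b, c, d = counts
    w = np.linspace(0.01, 0.99, 99)
    w1, w0 = np.meshgrid(w, w)
    # Under Graph 0, C is not connected to E, so C's presence is ignored.
    p_cplus = noisy_or(1 if graph == 1 else 0, 1, w1, w0)  # P(e+ | c+)
    p_cminus = noisy_or(0, 1, w1, w0)                      # P(e+ | c-)
    like = (p_cplus ** a * (1 - p_cplus) ** b *
            p_cminus ** c * (1 - p_cminus) ** d)
    return like.mean()

def posterior_graph1(counts, p=0.5):
    """P(Graph 1 | D), with prior P(Graph 1) = p as in the theory."""
    m1 = marginal_likelihood(counts, 1) * p
    m0 = marginal_likelihood(counts, 0) * (1 - p)
    return m1 / (m1 + m0)

# Buehner & Cheng's chemical/gene stimulus: a=6, b=2, c=4, d=4.
print(posterior_graph1((6, 2, 4, 4)))
```

Integrating over the strengths, rather than fixing them, is what lets sample size matter: the same contrast in conditional probabilities is weak evidence with 8 mice and much stronger evidence with 800.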
Evaluating a causal relationship
• Hypotheses: Graph 1 (C → E ← B), with P(Graph 1) = p; Graph 0 (B → E), with P(Graph 0) = 1 - p
• P(Graph 1|D) = P(D|Graph 1) P(Graph 1) / Σi P(D|Graph i) P(Graph i)

(Plot: human judgments compared with predictions of the Bayesian model, ΔP, causal power (Cheng, 1997), and χ².)

Generativity is essential
(Plot: ratings from 0 to 100 for the conditions where P(e+|c+) = P(e+|c-), i.e. 8/8 vs 8/8, 6/8 vs 6/8, 4/8 vs 4/8, 2/8 vs 2/8, and 0/8 vs 0/8, together with the Bayesian model's predictions.)
• The Bayesian predictions result from a "ceiling effect"
  – ceiling effects only matter if you believe a cause increases the probability of an effect
  – follows from the use of noisy-OR (after Cheng, 1997)

Generativity is essential
• Noisy-OR: causes increase the probability of their effects
• Generic: the probability differs across conditions
• Noisy-AND-NOT: causes decrease the probability of their effects
(Plot: human judgments compared with the noisy-OR, generic, and noisy-AND-NOT models.)

Manipulating functional form
• Noisy-OR: causes increase the probability of their effects; appropriate for generative causes
• Generic: the probability differs across conditions; appropriate for assessing differences
• Noisy-AND-NOT: causes decrease the probability of their effects; appropriate for preventive causes
(Plot: generative, difference, and preventive conditions compared with the noisy-OR, generic, and noisy-AND-NOT models.)

Causal induction from contingency data
• The simplest case of causal learning: a single cause-effect relationship and plentiful data
• Nonetheless, it exhibits complex effects of prior knowledge (in the assumed functional form)
• These effects reflect appropriate causal theories

Outline
• contingency data
• physical systems
• perceived causality

The stick-ball machine
(Kushnir, Schulz, Gopnik, & Danks, 2003)

Inferring hidden causal structure
• Can people accurately infer hidden causal structure from small amounts of data?
• Kushnir et al. (2003) contrasted four kinds of structure: separate causes, common cause, A causes B, B causes A
• Conditions (each display type repeated the stated number of times):
  – Common unobserved cause: 4x, 2x, 2x
  – Independent unobserved causes: 1x, 2x, 2x, 2x, 2x
  – One observed cause: 2x, 4x

(Plots: participants' probability judgments over the four structures (separate causes, common cause, A causes B, B causes A) in the common-unobserved-cause, independent-unobserved-causes, and one-observed-cause conditions.)

Theory
• Ontology
  – Types: Ball, HiddenCause, Trial
  – Predicates: Moves(Ball, Trial), Active(HiddenCause, Trial)
• Plausible relations
  – For any Ball a and Ball b (a ≠ b), with prior probability p:
    For all Trials t, Moves(a,t) → Moves(b,t)
  – For some HiddenCause h and Ball b, with prior probability q:
    For all Trials t, Active(h,t) → Moves(b,t)
• Functional form of causal relations
  – Causes result in Moves(b,t) with probability w. Otherwise, Moves(b,t) occurs with probability 0.
  – Active(h,t) occurs with probability α.
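Here is a toy sketch of how this theory generates and scores a hypothesis space for two balls. It assumes each ball can be driven by its own hidden cause with activation probability α, fixes illustrative values of α and w, and puts a uniform prior over the candidate structures; the talk's exact parameterization may differ.

```python
from itertools import product

ALPHA, W = 0.3, 0.5  # illustrative activation and strength values

def trial_probs(structure):
    """P(a_moves, b_moves) for one trial under a candidate structure."""
    probs = {}
    if structure == "separate":
        pa = pb = ALPHA * W  # each ball driven by its own hidden cause
        for a, b in product((0, 1), repeat=2):
            probs[(a, b)] = (pa if a else 1 - pa) * (pb if b else 1 - pb)
    elif structure == "common":
        # One hidden cause; when active, each ball moves with prob w.
        for a, b in product((0, 1), repeat=2):
            active = ALPHA * (W if a else 1 - W) * (W if b else 1 - W)
            inactive = (1 - ALPHA) if (a, b) == (0, 0) else 0.0
            probs[(a, b)] = active + inactive
    elif structure == "a_causes_b":
        pa = ALPHA * W  # A driven by its hidden cause
        for a, b in product((0, 1), repeat=2):
            pb = W if a else 0.0  # B moves only when A does
            probs[(a, b)] = (pa if a else 1 - pa) * (pb if b else 1 - pb)
    elif structure == "b_causes_a":
        probs = {(b, a): v
                 for (a, b), v in trial_probs("a_causes_b").items()}
    return probs

def posterior(counts):
    """Posterior over structures given counts of each trial outcome."""
    names = ["separate", "common", "a_causes_b", "b_causes_a"]
    likes = []
    for name in names:
        p = trial_probs(name)
        like = 1.0
        for outcome, n in counts.items():
            like *= p[outcome] ** n
        likes.append(like)  # uniform prior over structures
    z = sum(likes)
    return {name: like / z for name, like in zip(names, likes)}

# Common-unobserved-cause condition: both move 4x, A alone 2x, B alone 2x.
print(posterior({(1, 1): 4, (1, 0): 2, (0, 1): 2}))
```

With these counts the common-cause structure dominates, matching the modal judgment in the common-unobserved-cause condition.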
Hypotheses
(Table: each candidate structure assigns a probability to each trial outcome (both balls move, only A moves, only B moves, neither moves), expressed in terms of w and α, e.g. α·w², α·w(1-w), and (1-α) terms.)

(Plots: model posteriors over the four structures for each condition, shown alongside the human judgments above.)

Other physical systems: from blicket detectors…
• In one condition, both objects activate the detector, but Object A does not activate the detector by itself
• In the backward blocking condition, both objects activate the detector, and Object A also activates the detector by itself ("Oooh, it's a blicket!")
• In each condition, children are asked whether each object is a blicket, and are then asked to make the machine go
…to lemur colonies

Outline
• contingency data
• physical systems
• perceived causality

Michotte (1963)
• Perceived causality is affected by:
  – the timing of events
  – the velocity of the balls
  – proximity

Nitro X
(joint work with Liz Baraff)

Test trials
• Show explosions involving multiple cans, which allows inferences about causal structure
• For each trial, participants choose one of:
  – chain reaction
  – spontaneous explosions
  – other

Theory
• Ontology
  – Types: Can, HiddenCause
  – Predicates: ExplosionTime(Can), ActivationTime(HiddenCause)
• Constraints on causal relations
  – For any Can y and Can x, with prior probability 1: ExplosionTime(y) → ExplosionTime(x)
  – For some HiddenCause c and Can x, with prior probability 1: ActivationTime(c) → ExplosionTime(x)
• Functional form of causal relations
  – Explosions occur at ActivationTime(c), and after an appropriate delay from ExplosionTime(y), with probability set by w. Otherwise explosions occur with probability 0.
  – Hidden causes activate with low probability.

Using the theory
• What kind of explosive is this? (spontaneity, volatility, rate)
• What caused what?
• What is the causal structure?

Testing a prediction of the theory
• Evidence for a hidden cause should increase with the number of simultaneous explosions
• Four groups of 16 participants saw displays using m = 2, 3, 4, or 6 cans
• For each trial, participants chose one of: chain reaction, spontaneous explosions, or other (coded for reference to a hidden cause)

(Plot: probability of identifying a hidden cause as a function of the number of canisters; χ²(3) = 11.36, p < .01. There is a gradual transition from few to most participants identifying a hidden cause.)

Further predictions
• The theory explains chain-reaction inferences
• Attribution of causality should be sensitive to the interaction between time and distance
• Simultaneous explosions that occur sooner provide stronger evidence for a common cause
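The simultaneity prediction can be illustrated with a small sketch. Assume time is discretized into instants, with eps the probability that a can spontaneously explodes in a given instant and alpha the (low) probability that the hidden cause activates; all values here are mine, not the experiment's.

```python
# m simultaneous explosions favor a hidden common cause because m
# independent spontaneous explosions must coincide (probability
# eps**m), while one hidden cause explains them all at once.
EPS = 0.05     # P(a can spontaneously explodes in a given instant)
ALPHA = 0.001  # P(the hidden cause activates in a given instant)
W = 0.95       # P(a can explodes given an active cause)

def posterior_hidden_cause(m):
    """P(hidden cause | m cans exploded in the same instant)."""
    p_hidden = ALPHA * W ** m
    p_coincidence = (1 - ALPHA) * EPS ** m
    return p_hidden / (p_hidden + p_coincidence)

for m in (2, 3, 4, 6):
    print(m, round(posterior_hidden_cause(m), 3))
# 2 -> 0.265, 3 -> 0.873, 4 -> 0.992, 6 -> 1.0: the posterior rises
# steeply with m, echoing the shift in participants' reports.
```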
Three kinds of causal induction
• contingency data: more data, less constrained
• perceived causality: less data, more constrained
• in every case: prior knowledge + statistical inference

Combining knowledge and statistics
• How do people...
  – identify causal relationships from small samples?
  – learn hidden causal structure with ease?
  – reason about complex dynamic causal systems?
• Constraints from knowledge + powerful statistics
• Key ideas:
  – prior knowledge is expressed in a causal theory
  – the theory generates a hypothesis space for inference

Further questions
• Are there unifying principles across theories?

Functional form
• Stick-balls: causes result in Moves(b,t) with probability w. Otherwise, Moves(b,t) occurs with probability 0.
• Nitro X: explosions occur at ActivationTime(c), and after an appropriate delay from ExplosionTime(y), with probability set by w. Otherwise explosions occur with probability 0.
Two shared principles:
1. Each force acting on a system has an opportunity to change its state
2. Without external influence, a system will not change its state

Further questions
• Are there unifying principles across theories?
• How are theories learned?

Learning causal theories
(Diagram, elaborated over several slides: a theory (ontology, plausible relations, functional form) generates a hypothesis space of causal graphs over X, Y, Z, which generates data; Bayesian inference runs from the data back to the theory, and can combine evidence from many datasets.)

Further questions
• Are there unifying principles across theories?
• How are theories learned?
• What is an appropriate prior over theories?

Causal induction with rates
• A different functional form yields models that apply to different kinds of data
• Rate: the number of times the effect occurs in a time interval, in the presence and in the absence of the cause
• "Does the electric field cause the mineral to emit particles?"

Theory
• Ontology
  – Types: Mineral, Field, Time
  – Predicates: Emitted(Mineral, Time), Active(Field, Time)
• Plausible relations
  – For any Mineral m and Field f, with prior probability p:
    For all Times t, Active(f,t) → Emitted(m,t)
• Functional form of causal relations
  – Causes of Emitted(m,t) are independent probabilistic mechanisms, with causal strengths wi. An independent background cause is always present with strength w0.
  – This implies the number of emissions is a Poisson process, with rate at time t given by w0 + w1·Active(f,t).

Causal induction with rates
(Plot: human judgments (N = 150) as a function of Rate(e|c+) and Rate(e|c-), compared with ΔR, Power, and the Bayesian model.)

Learning causal theories
• T1: bacteria die at random
• T2: bacteria die at random, or in waves
• P(wave|T2) > P(wave|T1)
• Having inferred the existence of a new force, one then needs to find a mechanism...

Lemur colonies
A researcher in Madagascar is studying the effects of environmental resources on the location of lemur colonies. She has studied twelve different parts of Madagascar, and is trying to establish which areas show evidence of being affected by the distribution of resources, in order to decide where she should focus her research.

(Stimuli vary the colonies' change in number, ratio, location, and spread, relative to uniform; human data shown.)

Theory
• Ontology
  – Types: Colony, Resource
  – Predicates: Location(Colony), Location(Resource)
• Plausible relations
  – For any Colony c and Resource r, with probability p: Location(r) → Location(c)
• Functional form of causal relations
  – Without a hidden cause, Location(c) is uniform
  – With a hidden cause r, Location(c) is Gaussian with mean Location(r) and covariance matrix Σ
  – Location(r) is uniform

Is there a resource?
• No: colony locations are uniform (sum over all structures)
• Yes: colony locations are uniform plus a regularity (sum over all regularities)
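A one-dimensional sketch of this comparison, with illustrative values for the prior and the spread Σ (reduced here to a scalar sigma), and a grid standing in for the uniform prior over resource locations; everything in it is an assumption for illustration.

```python
import numpy as np

SIGMA = 0.05                   # assumed spread of colonies around a resource
GRID = np.linspace(0, 1, 201)  # uniform prior over the resource's location

def gaussian_pdf(x, mu, sigma):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def posterior_resource(locations, prior_p=0.5):
    """P(resource | colony locations) on the interval [0, 1]."""
    locations = np.asarray(locations)
    like_uniform = 1.0  # uniform density is 1 per colony on [0, 1]
    # Marginal likelihood under "resource": average over its location.
    like_gauss = np.mean(
        [np.prod(gaussian_pdf(locations, mu, SIGMA)) for mu in GRID])
    num = prior_p * like_gauss
    return num / (num + (1 - prior_p) * like_uniform)

print(posterior_resource([0.48, 0.50, 0.51, 0.52]))  # clustered: near 1
print(posterior_resource([0.10, 0.35, 0.60, 0.90]))  # spread out: near 0
```

Tight clusters make the Gaussian hypothesis's marginal likelihood beat the uniform one, which is exactly the "uniform vs. uniform + regularity" comparison on the slide.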
(Plots: change in number, ratio, location, and spread, relative to uniform; human data compared with the Bayesian model.)

Schulz & Gopnik (in press)

  A  B  C    E
  1  0  0    0
  0  1  0    0
  0  0  1    1
  1  1  1    1

• Biology version: flowers and sneezing ("Ahchoo!")
• Psychology version: animals and fear ("Eek!")

Common functional form
• A theory of sneezing
  – a flower is a cause with probability π
  – no sneezing without a cause
  – causes each produce sneezing with probability ω
• A theory of fear
  – an animal is a cause with probability π
  – no fear without a cause
  – a cause produces fear with probability ω

Common functional form
• Hypotheses about which of A, B, and C are causes have prior probabilities (1-π)³, π(1-π)², π²(1-π), and π³, according to how many causes they posit (same contingencies as above)
• Children: choose just C, never just A or just B
• Bayes: just C is preferred, never just A or just B

Inter-domain causation
• Physical: a noise-making machine; A & B are magnetic buttons, C is talking
• Psychological: a confederate giggling; A & B are silly faces, C is a switch
• Procedure:
  – baseline: which could be causes?
  – trials: same contingencies as Experiment 3
  – test: which are causes?
(Schulz & Gopnik, in press, Experiment 4)

Inter-domain causation
• A theory with inter-domain causes
  – intra-domain entities are causes with probability π1
  – inter-domain entities are causes with probability π0
  – no effect occurs without a cause
  – causes produce effects with probability ω
• Inter-domain causes have lower prior probability (i.e. π0 is much lower than π1)

A problem with priors?
• If lack of a mechanism results in lower prior probability, shouldn't inferences change?
• Intra-domain causes (Experiment 3): biological, 78% took C; psychological, 67% took C
• Inter-domain causes (Experiment 4): physical, 75% took C; psychological, 81% took C

(Slide builds: the same contingency table, with hypothesis priors written in terms of π0 and π1, e.g. (1-π0)(1-π1)², π0(1-π1)², (1-π0)π1², π0·π1(1-π1), π0·π1².)

A direct test of inter-domain priors
• Ambiguous causes:
  – A and C together produce E
  – B and C together produce E
  – A and B and C together produce E
• For C intra-domain, children choose C (Sobel et al., in press)
• For C inter-domain, they should choose A and B

The plausibility matrix
• Identifies plausible causal graphs
• Rows and columns are grounded predicates: Injected(c1), Injected(c2), Injected(c3), Expressed(g1), Expressed(g2), Expressed(g3)
• Each entry gives the plausibility of a causal relation from the row predicate to the column predicate
• Entities: c1, c2, c3, g1, g2, g3; Predicates: Injected, Expressed
• Here M contains a 1 for every Injected(ci) → Expressed(gj) pair, and 0 elsewhere

The Chomsky hierarchy

  Languages                    Machines
  Type 0 (computable)          Turing machine
  Type 1 (context sensitive)   Bounded TM
  Type 2 (context free)        Push-down automaton
  Type 3 (regular)             Finite state automaton

Languages in each class are a strict subset of those in higher classes (Chomsky, 1956)

Grammaticality and plausibility
• Grammar: indicates the admissibility of (infinitely many) sentences generated from terminals
• Theory: indicates the plausibility of (infinitely many) relations generated from grounded predicates
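To close the loop with the chemicals-and-genes theory, here is a toy construction of the plausibility matrix M; the encoding is mine, but the entries match the slide's 3 × 3 block of ones.

```python
# Toy plausibility matrix: rows and columns are grounded predicates,
# and an entry of 1 marks a causal relation the theory finds plausible.
chemicals = ["c1", "c2", "c3"]
genes = ["g1", "g2", "g3"]
predicates = ([("Injected", c) for c in chemicals] +
              [("Expressed", g) for g in genes])

def plausible(cause, effect):
    """The theory licenses only Injected(c) -> Expressed(g) relations."""
    return int(cause[0] == "Injected" and effect[0] == "Expressed")

M = [[plausible(p, q) for q in predicates] for p in predicates]

for (name, arg), row in zip(predicates, M):
    print(f"{name}({arg}):", row)
# Only the Injected rows contain ones, and only in the Expressed
# columns: a finite snapshot of the infinitely many relations the
# theory can rate, just as a grammar rates infinitely many sentences.
```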