Causal learning and modeling David Danks CMU Philosophy & Psychology

2014 NASSLLI

High-level overview
- Monday:
  - History of causal inference
  - Basic representation of causal structures
- Tuesday:
  - Inference & reasoning using graphical models
  - Interventions in causal structures
- Wednesday:
  - Basic principles of search & causal discovery
- Thursday:
  - Challenges to causal discovery, and responses
  - Both principled and real-world
- Friday: One of two possibilities:
  - Singular / actual causation & counterfactuals (in the causal graphical model framework)
  - Recent advances in causal learning & inference
  - Decided by a vote at end-of-class tomorrow (Tues)

Structure & assumptions
- Mix of lecture & (group) problem-solving, so if you have questions/uncertainty, ask!
  - If you’re confused, then someone else probably is too…
- Assuming basic knowledge of probabilities
- Focus is on conceptual/foundational issues, not the technical details
  - But ask if you want to know more about those details!
A BRIEF HISTORY OF CAUSAL DISCOVERY

“Big Picture” (very roughly)
- Greeks - 1750: Unhelpful platitudes
- 1750 - 1950: Practical successes
- 1950 - present: Computers + Formal models = principled methods

Aristotle (384-322 BC)
- Trying to answer: “Why does X have A?”
- Four types of ‘cause’
  - Formal: Because of its structure
  - Material: Because of its composition
  - Efficient: Because of its development
  - Final: Because of its purpose
- But no systematic theory of inference

Francis Bacon (1561-1626)
- Novum Organum (1620)
- For any phenomenon, construct:
  - The table of presence (tabula praesentiae)
  - The table of absence (tabula absentiae)
  - The table of degrees (tabula graduum)
- The cause of the phenomenon is the set of properties that explains every case on each of the three tables

John Stuart Mill (1806-1873)
- System of Logic (1843)
- Algorithmic form of Bacon’s method (though unattributed)
  - Method of agreement
  - Method of difference
  - Method of concomitant variation

David Hume (1711-1776)
- Causal inference cannot be done using deduction
  - It is always logically possible that future “causes” will not be followed by the effect
  - Actually a general argument about induction
- But we do it by “custom or habit”
  - Had an evolutionary justification, but no framework in which to express it

Responses to Hume’s skepticism
- Hume’s arguments were quite influential in philosophical circles
  - And still matter in present-day philosophy
- But in the sciences, people were starting to find methods that (sometimes) gave answers that at least seemed right…

Regression (Least Squares)
- 18th c. astronomy: find the “best” values for 6 unknowns given 75 observations
- Euler (1748)
  - Failed due to computational intractability
- Legendre (1805)
  - Developed the method of least squares
- Gauss (1795 / 1809)
  - Independent (earlier, unpublished) discovery & justification
- Still the most common causal inference method…

Growth of statistics
- Early theory of statistics emerges from probability theory throughout the 1800s
- [Timeline figure, 1800 to 1900 and beyond:]
  - Laplace (1749-1827)
  - Quetelet (1796-1874)
  - Galton (1822-1911)
  - Pearson (1857-1936)
  - Spearman (1863-1945)
  - Yule (1871-1951)

Ronald A. Fisher (1890-1962)
- Essentially the father of modern statistics, and developed:
  - An array of statistical tests
  - An analysis of various experimental designs
  - The standard statistical and methodological reference texts for a generation of scientists

Sewall Wright (1889-1988)
- Path analysis
  - Graphs encode high-level structure, and then regression can be used to estimate parameters
- By mid-20th c., it had been adopted by a number of economists and sociologists
- But no search procedures were provided
  - Have to know the high-level structure

Causal graphical models
- Developed by statisticians, computer scientists, and philosophers
  - Dawid, Spiegelhalter, Wermuth, Cox, Lauritzen, Pearl, Spirtes, Glymour, Scheines
- Represent both qualitative and quantitative aspects of causation

REPRESENTING CAUSAL STRUCTURES

Qualitative representation
- We want a representation that captures many qualitative features of causality
- Causation occurs among variables ⇒ One node per variable
  - [Figure: four nodes: Food Eaten, Exercise, Weight, Metabolism]
- Asymmetry of causation ⇒ Need an asymmetric connection in the graph
  - [Figure: the same four nodes, connected by directed edges]
- No (immediate) reciprocal causation ⇒ No cycles (without explicit temporal indexing)
  - [Figure: the four nodes at Time t and again at Time t+1]

Directed Acyclic Graphs
- More precise: DAG G = <V, E>
  - V = set of nodes (for variables)
  - E = set of edges (i.e., ordered pairs of nodes)
- Path π = sequence of adjacent edges
  - Directed path = path with all edges in the same direction
- Acyclicity: No directed path from node A to itself
- In general: We use genealogical & topological language to describe graphical relationships

Quantitative representation
- DAGs alone can represent “A causes B”… but not “strength” or “form” of causation
- Need to represent the relationships between the various variables’ states
- Exact quantitative representation will depend on the type of variables being represented

Bayesian networks
- All variables are discrete/categorical
- Represent quantitative causation using a joint probability distribution
  - I.e., a specification of the probability of any combination of variable values, such as:
  - P(E=Hi & FE=Lo & M=Hi & W=Hi) = 0.001; P(E=Hi & FE=Lo & M=Hi & W=Lo) = 0.03; etc.
- Note: Nothing inherently Bayesian about Bayes nets!
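
The DAG definition above (a node set V, a set E of ordered pairs, and acyclicity as "no directed path from a node to itself") can be sketched in a few lines of Python. This is only an illustration: the edge set below is an assumption made for the four-variable running example, and the helper names (`children_map`, `descendants`, `parents`, `is_acyclic`) are hypothetical, not taken from the slides or from any library.

```python
from collections import defaultdict

# V = set of nodes (one per variable).
V = {"Exercise", "Food Eaten", "Metabolism", "Weight"}

# E = set of edges, each an ordered pair (cause, effect).
# ASSUMPTION: this particular edge set is chosen for illustration only.
E = {
    ("Exercise", "Food Eaten"),
    ("Exercise", "Metabolism"),
    ("Food Eaten", "Weight"),
    ("Metabolism", "Weight"),
}


def children_map(edges):
    """Map each node to the set of its children (direct effects)."""
    out = defaultdict(set)
    for cause, effect in edges:
        out[cause].add(effect)
    return out


def descendants(node, edges):
    """All nodes reachable from `node` by a directed path of length >= 1."""
    kids = children_map(edges)
    seen = set()
    stack = list(kids[node])
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(kids[current])
    return seen


def parents(node, edges):
    """Direct causes of `node` (genealogical language for graph relations)."""
    return {cause for cause, effect in edges if effect == node}


def is_acyclic(nodes, edges):
    """Acyclic iff no node has a directed path leading back to itself."""
    return all(n not in descendants(n, edges) for n in nodes)


print(is_acyclic(V, E))                              # the assumed graph
print(is_acyclic(V, E | {("Weight", "Exercise")}))   # adds a reciprocal edge
print(sorted(parents("Weight", E)))
```

Adding the edge Weight → Exercise creates a directed path from Exercise back to itself, which is exactly the kind of (immediate) reciprocal causation that the acyclicity requirement rules out unless the variables are explicitly time-indexed.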
Structural Equation Models (SEMs)
- All variables are continuous/real-valued
- Represent quantitative causation using systems of linear equations
  - For example:
    Exercise = a1FE + a2M + a3W + εE_noise
    FE = b1E + b2M + b3W + εFE_noise
    etc.

Connecting the pieces
- DAG-based graphical model: Qualitative ⇔ Quantitative ???
  - [Figure: a graph alongside the factorization P(X) = P(X1) P(X2 | X1) P(X3 | X1) P(X4 | X1,X2)]
- Causal Markov assumption:
  - Variables are independent of their non-effects conditional on their direct causes
  - Use the qualitative graph to constrain the quantitative relationships
  - Encodes the intuition of “screening off”: Given the values of the direct causes, learning the value of a non-effect doesn’t help me predict the variable
- Markov assumption for Bayes nets ⇒ Markov factorization of P(X1, X2, …):
  - P(X1, X2, …, Xn) = product over i of P(Xi | direct causes of Xi)
  - Example, for the graph over Food Eaten, Exercise, Weight, Metabolism:
    P(E, FE, W, M) = P(E) * P(FE | E) * P(M | E) * P(W | M, FE)
- Markov assumption for SEMs ⇒ Markov factorization of joint probability density:
  - Same form as above, with a conditional density in place of each conditional probability
  - Example, for the same graph:
    E = εE_noise
    FE = a1E + εFE_noise
    M = b1E + εM_noise
    W = c1FE + c2M + εW_noise
- Causal Faithfulness assumption
  - The only independencies are those predicted by the Markov assumption
  - Uses the quantitative relations to constrain the qualitative graph
  - Implication: No exactly counter-balancing causal paths
    - Exercise → Food Eaten → Weight and Exercise → Metabolism → Weight do not exactly offset one another
  - Implication: No perfectly deterministic relationships
    - In particular, no variable is a mathematical function of others

Causal vs. statistical models
- Bayes nets and SEMs are not inherently causal models
  - Markov and Faithfulness assumptions can be expressed purely as graph-quantitative constraints
- Assuming a non-causal version of the assumptions ⇒ purely statistical model
  - I.e., a compact representation of statistical independencies among some set of variables

Causation and intervention
- Causal claims support counterfactuals
  - In particular, those about interventions
  - “If I had flipped the switch, the light would have turned on”
  - “If she hadn’t dropped the plate, then it would not have broken”
  - Etc.
- One of the central causal asymmetries
  - Interventions on a cause lead to changes in the effect
    - Flipping the switch turns off the light
  - In contrast, interventions on an effect do not lead to changes in the cause
    - Breaking the light bulb doesn’t flip the switch
- Some have argued that this is the paradigmatic feature of causation (Woodward, Hausman)

Looking ahead…
- Have: Basic formal representation for causation
- Need:
  - Fundamental causal asymmetry (of intervention)
  - Inference & reasoning methods
  - Search & causal discovery methods
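
As a recap of the Markov factorization example P(E, FE, W, M) = P(E) * P(FE | E) * P(M | E) * P(W | M, FE), here is a minimal Python sketch for binary (Hi/Lo) variables. Every probability value is made up for illustration, and the names (`joint`, `prob`, the table variables) are hypothetical. The sketch builds the joint distribution from the four factors, then checks one independence the Markov assumption predicts: Food Eaten and Metabolism are independent conditional on their common direct cause, Exercise.

```python
from itertools import product

LEVELS = ("Hi", "Lo")

# ASSUMPTION: all numbers below are invented for illustration.
p_E = {"Hi": 0.3, "Lo": 0.7}                      # P(E)
p_FE_given_E = {                                  # P(FE | E), indexed [e][fe]
    "Hi": {"Hi": 0.6, "Lo": 0.4},
    "Lo": {"Hi": 0.2, "Lo": 0.8},
}
p_M_given_E = {                                   # P(M | E), indexed [e][m]
    "Hi": {"Hi": 0.7, "Lo": 0.3},
    "Lo": {"Hi": 0.4, "Lo": 0.6},
}
p_W_given_M_FE = {                                # P(W | M, FE), indexed [(m, fe)][w]
    ("Hi", "Hi"): {"Hi": 0.5, "Lo": 0.5},
    ("Hi", "Lo"): {"Hi": 0.1, "Lo": 0.9},
    ("Lo", "Hi"): {"Hi": 0.9, "Lo": 0.1},
    ("Lo", "Lo"): {"Hi": 0.4, "Lo": 0.6},
}
# With binary variables the factors need 1 + 2 + 2 + 4 = 9 free numbers,
# versus 2**4 - 1 = 15 for an unconstrained joint distribution.


def joint(e, fe, w, m):
    """P(E=e, FE=fe, W=w, M=m) via the Markov factorization."""
    return (p_E[e] * p_FE_given_E[e][fe]
            * p_M_given_E[e][m] * p_W_given_M_FE[(m, fe)][w])


def prob(**fixed):
    """Marginal probability of a partial assignment, e.g. prob(e='Hi', m='Lo')."""
    names = ("e", "fe", "w", "m")
    total = 0.0
    for values in product(LEVELS, repeat=4):
        assignment = dict(zip(names, values))
        if all(assignment[k] == v for k, v in fixed.items()):
            total += joint(**assignment)
    return total


# The factorized joint is a proper distribution.
print(abs(prob() - 1.0) < 1e-12)

# Markov check: P(FE, M | E) == P(FE | E) * P(M | E) for every value combination.
markov_holds = all(
    abs(prob(e=e, fe=fe, m=m) / prob(e=e)
        - (prob(e=e, fe=fe) / prob(e=e)) * (prob(e=e, m=m) / prob(e=e))) < 1e-12
    for e, fe, m in product(LEVELS, repeat=3)
)
print(markov_holds)

# Without conditioning on Exercise, FE and M are dependent in this example.
gap = prob(fe="Hi", m="Hi") - prob(fe="Hi") * prob(m="Hi")
print(abs(gap) > 1e-6)
```

The last check illustrates the "screening off" intuition from the other side: Food Eaten and Metabolism carry information about each other only through Exercise, so the dependence disappears once Exercise is held fixed.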
