REASONING WITH CAUSES AND COUNTERFACTUALS
Judea Pearl, UCLA (www.cs.ucla.edu/~judea/)

OUTLINE
1. Unified conceptualization of counterfactuals, structural equations, and graphs
2. Confounding demystified
3. Direct and indirect effects (mediation)
4. Transportability and external validity mathematized

TRADITIONAL STATISTICAL INFERENCE PARADIGM
Data → joint distribution P → inference → Q(P) (aspects of P).
e.g., infer whether customers who bought product A would also buy product B: Q = P(B | A).

THE STRUCTURAL MODEL PARADIGM
A data-generating model M induces the joint distribution that produces the data; inference targets Q(M) (aspects of M).
M is the invariant strategy (mechanism, recipe, law, protocol) by which Nature assigns values to the variables in the analysis.
• "Think Nature, not experiment!"

FAMILIAR CAUSAL MODEL: ORACLE FOR MANIPULATION
[Figure: a logic-circuit diagram over X, Y, Z, mapping INPUT to OUTPUT.]

WHY PHYSICS DESERVES A NEW ALGEBRA
Scientific equations (e.g., Hooke's law) are non-algebraic.
e.g., length (Y) equals a constant (2) times the weight (X).
Correct notation: Y := 2X, not Y = 2X.
The algebraic equation Y = 2X carries only the solution (X = 1, Y = 2); the assignment Y := 2X also carries process information:
had X been 3, Y would have been 6; if we raise X to 3, Y will be 6. To express this we must "wipe out" X = 1.

STRUCTURAL CAUSAL MODELS
Definition: A structural causal model is a 4-tuple ⟨V, U, F, P(u)⟩, where
• V = {V_1, ..., V_n} are endogenous variables,
• U = {U_1, ..., U_m} are background variables,
• F = {f_1, ..., f_n} are functions determining V: v_i = f_i(v, u), e.g., y = bx + u_Y,
• P(u) is a distribution over U.
P(u) and F induce a distribution P(v) over the observable variables.

CAUSAL MODELS AND COUNTERFACTUALS
Definition: The sentence "Y would be y (in situation u), had X been x," denoted Y_x(u) = y, means: the solution for Y in the mutilated model M_x (the equations for X replaced by X = x), with input U = u, is equal to y.
The Fundamental Equation of Counterfactuals:
  Y_x(u) ≜ Y_{M_x}(u)

READING COUNTERFACTUALS FROM SEM
X = treatment, Z = study time, Y = score:
  x = u_1
  z = βx + u_2
  y = αx + γz + u_3
Data show α = 0.7, β = 0.5, γ = 0.4.
A student named Joe was measured at X = 0.5, Z = 1, Y = 1.5.
Q1: What would Joe's score be had he doubled his study time?

READING COUNTERFACTUALS
Abduction from Joe's measurements gives u_1 = 0.5, u_2 = 0.75, u_3 = 0.75, reproducing the observed X = 0.5, Z = 1.0, Y = 1.5.
Setting Z = 2.0 while keeping u fixed yields Y = 0.7(0.5) + 0.4(2.0) + 0.75 = 1.9.
Answer to Q1: Joe's score would be 1.9; in counterfactual notation, Y_{Z=2}(u) = 1.9.

READING COUNTERFACTUALS (CONT.)
Q2: What would Joe's score be had the treatment been 0 and had he studied at whatever level he would have studied had the treatment been 1?
With u fixed, Z_{X=1}(u) = 0.5(1) + 0.75 = 1.25, and therefore
  Y_{X=0, Z_{X=1}}(u) = 0.7(0) + 0.4(1.25) + 0.75 = 1.25.

POTENTIAL AND OBSERVED OUTCOMES PREDICTED BY A STRUCTURAL MODEL

CAUSAL MODELS AND COUNTERFACTUALS (CONT.)
Joint probabilities of counterfactuals:
  P(Y_x = y, Z_w = z) = Σ_{u: Y_x(u)=y, Z_w(u)=z} P(u)
In particular:
  P(y | do(x)) ≜ P(Y_x = y) = Σ_{u: Y_x(u)=y} P(u)
  P(Y_{x'} = y' | x, y) = Σ_{u: Y_{x'}(u)=y'} P(u | x, y)
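The abduction–action–prediction recipe behind Q1 and Q2 above can be written out in a few lines. The Python sketch below hard-codes the slide's coefficients and Joe's measurements; the function and variable names are illustrative, not part of the original material.

```python
# Minimal sketch of the abduction-action-prediction computation for the linear
# SEM above, using the slide's coefficients and Joe's measurements.
ALPHA, BETA, GAMMA = 0.7, 0.5, 0.4               # X->Y, X->Z, Z->Y coefficients

def solve(u, do=None):
    """Solve the (possibly mutilated) model M_do for background state u = (u1, u2, u3)."""
    do = do or {}
    u1, u2, u3 = u
    x = do.get("X", u1)
    z = do.get("Z", BETA * x + u2)
    y = do.get("Y", ALPHA * x + GAMMA * z + u3)
    return {"X": x, "Z": z, "Y": y}

# Abduction: recover Joe's background variables from his observed X, Z, Y.
x_obs, z_obs, y_obs = 0.5, 1.0, 1.5
u_joe = (x_obs,
         z_obs - BETA * x_obs,                    # u2 = 0.75
         y_obs - ALPHA * x_obs - GAMMA * z_obs)   # u3 = 0.75

# Q1: Joe's score had he doubled his study time -> Y_{Z=2}(u) = 1.9
print(solve(u_joe, do={"Z": 2.0})["Y"])

# Q2: treatment 0, study time at whatever level it would have taken under treatment 1
z_under_x1 = solve(u_joe, do={"X": 1.0})["Z"]     # Z_{X=1}(u) = 1.25
print(solve(u_joe, do={"X": 0.0, "Z": z_under_x1})["Y"])   # -> 1.25
```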
THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS
Define: Express the target quantity Q as a function Q(M) that can be computed from any model M,
  e.g., ATE = E(Y | do(x_1)) − E(Y | do(x_0)),
  e.g., ETT = E(Y_x − Y_{x'} | X = x).
Assume: Formulate causal assumptions A using some formal language.
Identify: Determine if Q is identifiable given A.
Estimate: Estimate Q if it is identifiable; approximate it, if it is not.
Test: Test the testable implications of A (if any).

THE FIVE NECESSARY STEPS OF CAUSAL ANALYSIS (CONT.)
The same five steps apply to other target quantities,
  e.g., DE = E(Y_{x_1, Z_{x_0}} − Y_{x_0}),  IE = E(Y_{x_0, Z_{x_1}} − Y_{x_0}),
  e.g., PC = P(Y_{x_0} = y_0 | X = x_1, Y = y_1).

THE LOGIC OF CAUSAL ANALYSIS
A causal model M_A embodies the causal assumptions A, whose logical implications are A*.
Causal inference turns the queries of interest Q into identified estimands Q(P) and derives T(M_A), the testable implications of A.
Statistical inference applied to the data D produces Q̂, estimates of Q(P), reported as provisional claims Q̂(Q | D, A), together with goodness-of-fit statistics g(T) used for model testing.

IDENTIFICATION IN SCM
Find the effect of X on Y, P(y | do(x)), given the causal assumptions shown in G, where Z_1, ..., Z_k are auxiliary variables.
[Graph G over Z_1, ..., Z_6, X, and Y.]
Can P(y | do(x)) be estimated if only a subset Z of the variables can be measured?

ELIMINATING CONFOUNDING BIAS: THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of variables that d-separates X from Y in G_X̲, the graph obtained by deleting the arrows emanating from X.
[Graphs G and G_X̲ over Z_1, ..., Z_6, X, Y, with the admissible set Z highlighted.]
Moreover,
  P(y | do(x)) = Σ_z P(y | x, z) P(z) = Σ_z P(x, y, z) / P(x | z)
("adjusting" for Z; this is what ignorability licenses).

EFFECT OF WARM-UP ON INJURY (after Shrier & Platt, 2008)
[Graph: Warm-up Exercises (X) and Injury (Y) amid many covariates; annotations "Watch out!", "Front Door", and "No, no!" flag variables that must not be adjusted for.]

PROPENSITY SCORE ESTIMATOR (Rosenbaum & Rubin, 1983)
P(y | do(x)) = ? Can the scalar L replace {Z_1, Z_2, Z_3, Z_4, Z_5}?
  L(z_1, z_2, z_3, z_4, z_5) = P(X = 1 | z_1, z_2, z_3, z_4, z_5)
Theorem:
  Σ_z P(y | z, x) P(z) = Σ_l P(y | L = l, x) P(L = l)
Adjustment for L replaces adjustment for Z.

WHAT PROPENSITY SCORE (PS) PRACTITIONERS NEED TO KNOW
  L(z) = P(X = 1 | Z = z),   Σ_z P(y | z, x) P(z) = Σ_l P(y | l, x) P(l)
1. The asymptotic bias of PS is EQUAL to that of ordinary adjustment (for the same Z).
2. Including an additional covariate in the analysis CAN SPOIL the bias-reduction potential of others.
3. In particular, instrumental variables tend to amplify bias.
4. Choosing a sufficient set for PS requires knowledge of the model.

SURPRISING RESULT: INSTRUMENTAL VARIABLES ARE BIAS-AMPLIFIERS IN LINEAR MODELS (Bhattacharya & Vogt, 2007; Wooldridge, 2009)
[Model: Z → X with coefficient c_3, X → Y with c_0, and an unobserved U with U → X (c_1) and U → Y (c_2).]
"Naive" bias:
  B_0 = E(Y | x)/x − E(Y | do(x))/x = (c_0 + c_1 c_2) − c_0 = c_1 c_2
Adjusted bias (conditioning on the instrument Z):
  B_Z = E(Y | x, z)/x − E(Y | do(x))/x = (c_0 + c_1 c_2 / (1 − c_3²)) − c_0 = c_1 c_2 / (1 − c_3²)

INTUITION
When Z is allowed to vary, it absorbs (or explains) some of the changes in X.
When Z is fixed, the burden falls on U alone and is transmitted to Y, resulting in a higher bias.

WHAT'S BETWEEN AN INSTRUMENT AND A CONFOUNDER?
Should we adjust for Z?
[Model: Z → X (c_3) and Z → Y (c_4), X → Y (c_0), with unobserved U → X (c_1) and U → Y (c_2).]
ANSWER: Yes, if c_4 > c_1 c_2 c_3 / (1 − c_3²); no, otherwise.
CONCLUSION: Adjusting for a parent of Y is safer than adjusting for a parent of X.
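The amplification is easy to reproduce numerically. The Python/NumPy sketch below is only an illustration: the coefficients c0, c1, c2, c3, the noise scales, and the sample size are arbitrary choices, not values taken from the slides.

```python
# Illustrative check of instrument-induced bias amplification in a linear SCM.
import numpy as np

rng = np.random.default_rng(0)
n = 500_000
c0, c1, c2, c3 = 1.0, 0.5, 0.5, 0.9            # X->Y, U->X, U->Y, Z->X

z = rng.normal(size=n)                          # instrument
u = rng.normal(size=n)                          # unobserved confounder
x = c3 * z + c1 * u + 0.1 * rng.normal(size=n)
y = c0 * x + c2 * u + 0.1 * rng.normal(size=n)

def slope_on_x(y, *regressors):
    """OLS coefficient on the first regressor (x), with an intercept."""
    X = np.column_stack([np.ones_like(y), *regressors])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

naive = slope_on_x(y, x)          # regress Y on X only
adjusted = slope_on_x(y, x, z)    # regress Y on X and the instrument Z

print(f"true effect c0            : {c0:.3f}")
print(f"naive estimate (bias)     : {naive:.3f}  ({naive - c0:+.3f})")
print(f"Z-adjusted estimate (bias): {adjusted:.3f}  ({adjusted - c0:+.3f})  <- amplified")
```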
WHICH SET TO ADJUST FOR?
Should we adjust for {T}, {Z}, or {T, Z}?
[Graph: Z → X, T → Y, X → Y, with unobserved confounding between X and Y.]
Answer 1 (from bias-amplification considerations): {T} is better than {T, Z}, which is the same as {Z}.
Answer 2 (from variance considerations): {T} is better than {T, Z}, which is better than {Z}.

CONCLUSIONS
• The prevailing practice of adjusting for all covariates, especially those that are good predictors of X (the "treatment assignment," Rubin, 2009), is totally misguided.
• The "outcome mechanism" is just as important, and much safer, from both the bias and the variance viewpoints.
• As X-rays are to the surgeon, graphs are to causation.

REGRESSION VS. STRUCTURAL EQUATIONS (THE CONFUSION OF THE CENTURY)
Regression (claimless, nonfalsifiable): Y = ax + ε_Y
Structural (empirical, falsifiable): Y = bx + u_Y
Claim (regardless of distributions): E(Y | do(x)) = E(Y | do(x), do(z)) = bx
The mothers of all questions:
Q. When would b equal a?
A. When all back-door paths are blocked (u_Y ⫫ X).
Q. When is b estimable by regression methods?
A. Graphical criteria are available.

TWO PARADIGMS FOR CAUSAL INFERENCE
Observed: P(X, Y, Z, ...). Conclusions needed: P(Y_x = y), P(X_y = x | Z = z), ...
How do we connect the observables X, Y, Z, ... to the counterfactuals Y_x, X_z, Z_y, ...?
Neyman–Rubin (N-R) model: counterfactuals are primitives, treated as new variables in a super-distribution P*(X, Y, ..., Y_x, X_z, ...), in which X, Y, Z, ... constrain Y_x, Z_y, ....
Structural model: counterfactuals are derived quantities; subscripts modify the model and its distribution: P(Y_x = y) ≜ P_{M_x}(Y = y).

"SUPER" DISTRIBUTION IN THE N-R MODEL

  u    X  Y  Z  Y_{x=0}  Y_{x=1}  X_{z=0}  X_{z=1}  X_{y=0}
  u_1  0  0  0     0        1        0        0        0
  u_2  1  1  1     0        1        0        0        1
  u_3  0  0  0     1        0        0        1        1
  u_4  1  0  0     1        0        0        1        0

The super-distribution P*(X, Y, Z, ..., Y_x, Z_y, ..., Y_{xz}, Z_{xy}, ...) supports queries such as P*(Y_x = y | Z, X_z) and independence statements such as Y_x ⫫ X | Z_y.
Consistency must be imposed to rule out inconsistent rows: x = 0 ⟹ Y_{x=0} = Y, i.e., Y = xY_1 + (1 − x)Y_0.

ARE THE TWO PARADIGMS EQUIVALENT?
• Yes (Galles and Pearl, 1998; Halpern, 1998).
• In the N-R paradigm, Y_x is defined by the consistency rule Y = xY_1 + (1 − x)Y_0.
• In SCM, consistency is a theorem.
• Moreover, a theorem in one approach is a theorem in the other.
• Difference: clarity of assumptions and their implications.

AXIOMS OF STRUCTURAL COUNTERFACTUALS
Y_x(u) = y: "Y would be y, had X been x (in state U = u)." (Galles, Pearl, Halpern, 1998)
1. Definiteness: ∃ x ∈ X such that X_y(u) = x.
2. Uniqueness: (X_y(u) = x) & (X_y(u) = x') ⟹ x = x'.
3. Effectiveness: X_{xw}(u) = x.
4. Composition (generalized consistency): X_w(u) = x ⟹ Y_{wx}(u) = Y_w(u).
5. Reversibility: (Y_{xw}(u) = y) & (W_{xy}(u) = w) ⟹ Y_x(u) = y.

FORMULATING ASSUMPTIONS: THREE LANGUAGES
1. English: smoking (X), cancer (Y), tar (Z), genotypes (U).
   [Graph: X → Z → Y, with U affecting both X and Y.]
2. Counterfactuals:
   Z_x(u) = Z_{yx}(u),
   X_y(u) = X_{zy}(u) = X_z(u) = X(u),
   Y_z(u) = Y_{zx}(u),
   Z_x ⫫ {Y_z, X}.
3. Structural: x = f_1(u, ε_1), z = f_2(x, ε_2), y = f_3(z, u, ε_3).

COMPARISON BETWEEN THE N-R AND SCM LANGUAGES
1. Expressing scientific knowledge
2. Recognizing the testable implications of one's assumptions
3. Locating instrumental variables in a system of equations
4. Deciding if two models are equivalent or nested
5. Deciding if two counterfactuals are independent given another
6. Algebraic derivations of identifiable estimands

GRAPHICAL–COUNTERFACTUAL SYMBIOSIS
Every causal graph expresses counterfactual assumptions, e.g., for a graph over X, Y, and Z:
1. A missing arrow into Y (say Z → Y) encodes Y_{x,z}(u) = Y_x(u).
2. A missing bidirected arc between Y and Z encodes Y_x ⫫ Z_y.
These assumptions are consistent and readable from the graph.
• Express assumptions in graphs.
• Derive estimands by graphical or algebraic methods.
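To see how counterfactuals, consistency, and graph-encoded exclusions emerge as theorems of an SCM rather than extra assumptions, here is a toy enumeration in Python. The binary chain X → Z → Y with XOR mechanisms is an assumed example for illustration, not one of the models above.

```python
# Toy illustration: in an SCM, counterfactuals are derived by solving a mutilated
# model, and consistency is a theorem rather than an assumption.
from itertools import product

def solve(u, do=None):
    """Solve the (possibly mutilated) model for background state u = (uX, uZ, uY)."""
    do = do or {}
    ux, uz, uy = u
    x = do.get("X", ux)          # X := uX
    z = do.get("Z", x ^ uz)      # Z := X xor uZ
    y = do.get("Y", z ^ uy)      # Y := Z xor uY
    return {"X": x, "Z": z, "Y": y}

for u in product([0, 1], repeat=3):
    obs = solve(u)
    y0, y1 = solve(u, do={"X": 0})["Y"], solve(u, do={"X": 1})["Y"]

    # Consistency (composition) holds as a theorem: Y = X*Y1 + (1-X)*Y0.
    assert obs["Y"] == (y1 if obs["X"] == 1 else y0)

    # The missing arrow X -> Y encodes an exclusion restriction: Y_{x,z}(u) = Y_z(u).
    for xv in (0, 1):
        for zv in (0, 1):
            assert solve(u, do={"X": xv, "Z": zv})["Y"] == solve(u, do={"Z": zv})["Y"]

print("Consistency and the exclusion restriction hold for all 8 background states.")
```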
EFFECT DECOMPOSITION (DIRECT VS. INDIRECT EFFECTS)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect effects?
4. When can direct and indirect effects be estimated consistently from experimental and nonexperimental data?

WHY DECOMPOSE EFFECTS?
1. To understand how Nature works.
2. To comply with legal requirements.
3. To predict the effects of new types of interventions: signal routing, rather than variable fixing.

LEGAL IMPLICATIONS OF DIRECT EFFECT
Can data prove an employer guilty of hiring discrimination?
[Graph: Gender (X) → Qualifications (Z) → Hiring (Y), with a direct X → Y arrow.]
What is the direct effect of X on Y?
  CDE = E(Y | do(x_1), do(z)) − E(Y | do(x_0), do(z))   (averaged over z)
Adjust for Z? No! No!

FISHER'S GRAVE MISTAKE (after Rubin, 2005)
What is the direct effect of treatment on yield?
[Graph: Soil treatment (X) → Plant density (Z) → Yield (Y), with a direct X → Y arrow and a latent factor affecting both Z and Y.]
Compare treated and untreated lots of the same density (Z = z)? No! No!
The target is E(Y | do(x_1), do(z)) − E(Y | do(x_0), do(z)), not a comparison conditional on Z = z.
Proposed solution (?): "principal strata."

NATURAL INTERPRETATION OF AVERAGE DIRECT EFFECTS
Robins and Greenland (1992) — "pure" direct effect.
Model: z = f(x, u), y = g(x, z, u).
Natural direct effect of X on Y, DE(x_0, x_1; Y): the expected change in Y when we change X from x_0 to x_1 and, for each u, keep Z constant at whatever value it attained before the change:
  DE(x_0, x_1; Y) = E[Y_{x_1, Z_{x_0}} − Y_{x_0}]
In linear models, DE = (x_1 − x_0) times the direct path coefficient.

DEFINITION AND IDENTIFICATION OF NESTED COUNTERFACTUALS
Consider the quantity Q = E_u[Y_{x, Z_{x*}(u)}(u)].
Given M and P(u), Q is well defined: given u, Z_{x*}(u) is the solution for Z in M_{x*}, call it z, and Y_{x, Z_{x*}(u)}(u) is the solution for Y in M_{xz}.
Can Q be estimated from data?
Experimental identification: reduce Q to a nest-free expression.
Nonexperimental identification: reduce Q to a subscript-free expression.

DEFINITION OF INDIRECT EFFECTS
Model: z = f(x, u), y = g(x, z, u). There is no "controlled" indirect effect.
Indirect effect of X on Y, IE(x_0, x_1; Y): the expected change in Y when we keep X constant, say at x_0, and let Z change to whatever value it would have attained had X changed to x_1:
  IE(x_0, x_1; Y) = E[Y_{x_0, Z_{x_1}} − Y_{x_0}]
In linear models, IE = TE − DE.

POLICY IMPLICATIONS OF INDIRECT EFFECTS
What is the indirect effect of X on Y?
The effect of gender on hiring if sex discrimination is eliminated.
[Graph: Gender (X) → Qualification (Z) → Hiring (Y), with the direct X → Y link deactivated ("ignored").]
Deactivating a link is a new type of intervention.

MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in Markovian models (no confounding),
2. and are given by:
  DE = Σ_z [E(Y | do(x_1, z)) − E(Y | do(x_0, z))] P(z | do(x_0))
  IE = Σ_z E(Y | do(x_0, z)) [P(z | do(x_1)) − P(z | do(x_0))]
  TE = DE − IE(rev)
3. These formulas apply to linear and nonlinear models, continuous and discrete variables, regardless of distributional form.
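A plug-in estimator for these formulas takes only a few lines once the model is Markovian, so that E(Y | do(x, z)) = E(Y | x, z) and P(z | do(x)) = P(z | x), as the next slides spell out. In the Python sketch below the simulated data-generating process and its coefficients are arbitrary, chosen only to illustrate the computation.

```python
# Minimal plug-in sketch of the Mediation Formula for discrete data, assuming an
# unconfounded (Markovian) model. The data-generating process is illustrative.
import random

random.seed(0)
data = []
for _ in range(100_000):
    x = int(random.random() < 0.5)
    z = int(random.random() < 0.2 + 0.5 * x)             # mediator responds to x
    y = int(random.random() < 0.1 + 0.3 * x + 0.4 * z)   # outcome responds to x and z
    data.append((x, z, y))

def E_y(x, z):                       # estimate E(Y | X=x, Z=z)
    ys = [yi for (xi, zi, yi) in data if xi == x and zi == z]
    return sum(ys) / len(ys)

def P_z(z, x):                       # estimate P(Z=z | X=x)
    zs = [zi for (xi, zi, _) in data if xi == x]
    return zs.count(z) / len(zs)

def E_y_given_x(x):                  # estimate E(Y | X=x)
    ys = [yi for (xi, _, yi) in data if xi == x]
    return sum(ys) / len(ys)

x0, x1 = 0, 1
DE = sum((E_y(x1, z) - E_y(x0, z)) * P_z(z, x0) for z in (0, 1))
IE = sum(E_y(x0, z) * (P_z(z, x1) - P_z(z, x0)) for z in (0, 1))
TE = E_y_given_x(x1) - E_y_given_x(x0)

print(f"DE = {DE:.3f}   IE = {IE:.3f}   TE = {TE:.3f}   TE - DE = {TE - DE:.3f}")
```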
WHY TE ≠ DE + IE
[Graph: X → Z (m_1), Z → Y (m_2).]
Model: z = m_1 x + ε_1, y = m_2 z + g·xz + ε_2.
In linear systems (g = 0): TE = m_1 m_2, DE + IE = m_1 m_2, so TE = DE + IE.
Linear + interaction: TE = m_1 m_2 + m_1 g, DE + IE = m_1 m_2, so TE − (DE + IE) = m_1 g.

MEDIATION FORMULAS IN UNCONFOUNDED MODELS
[Graph: X → Z → Y with a direct X → Y arrow.]
  DE = Σ_z [E(Y | x_1, z) − E(Y | x_0, z)] P(z | x_0)
  IE = Σ_z E(Y | x_0, z) [P(z | x_1) − P(z | x_0)]
  TE = E(Y | x_1) − E(Y | x_0)
  IE / TE: fraction of responses explained by mediation.
  (TE − DE) / TE: fraction of responses owed to mediation.

WHY TE ≠ DE + IE (CONT.)
In general, TE = DE − IE(rev). In linear systems IE(rev) = −IE, so TE = DE + IE (= m_1 m_2 in the model above).
IE, the effect sustained by mediation alone (disabling the direct path), is NOT equal to TE − DE, the effect prevented by disabling mediation.

MEDIATION FORMULA FOR BINARY VARIABLES
[Graph: X → Z → Y with a direct X → Y arrow.]
Count the samples n_1, ..., n_8 in the eight cells of (X, Z, Y):

  cell  X  Z  Y
  n_1   0  0  0
  n_2   0  0  1
  n_3   0  1  0
  n_4   0  1  1
  n_5   1  0  0
  n_6   1  0  1
  n_7   1  1  0
  n_8   1  1  1

Estimate E(Y | x, z) = g_xz and E(Z | x) = h_x:
  g_00 = n_2 / (n_1 + n_2),  g_01 = n_4 / (n_3 + n_4),  g_10 = n_6 / (n_5 + n_6),  g_11 = n_8 / (n_7 + n_8)
  h_0 = (n_3 + n_4) / (n_1 + n_2 + n_3 + n_4),  h_1 = (n_7 + n_8) / (n_5 + n_6 + n_7 + n_8)
Then
  DE = (g_10 − g_00)(1 − h_0) + (g_11 − g_01) h_0
  IE = (h_1 − h_0)(g_01 − g_00)

RAMIFICATIONS OF THE MEDIATION FORMULA
• DE should be averaged over mediator levels; IE should NOT be averaged over intervention levels.
• TE − DE need not equal IE:
  TE − DE = proportion for whom mediation is necessary;
  IE = proportion for whom mediation is sufficient.
• TE − DE informs interventions on indirect pathways; IE informs interventions on direct pathways.

RECENT PROGRESS
1. A theory of causal transportability: when can causal relations learned from experiments be transferred to a different environment in which no experiment can be conducted?
2. A theory of statistical transportability: when can statistical information learned in one domain be transferred to a different domain in which (a) only a subset of variables can be observed, or (b) only a few samples are available?
3. A theory of selection bias: selection bias occurs when the samples used in learning are chosen differently from those used in testing. When can this bias be eliminated?

TRANSPORTABILITY: WHEN CAN WE EXTRAPOLATE EXPERIMENTAL FINDINGS TO A DIFFERENT ENVIRONMENT?
Experimental study in LA (Z an observation, X an intervention, Y an outcome); measured: P(x, y, z) and P(y | do(x), z).
Observational study in NYC; measured: P*(x, y, z) ≠ P(x, y, z).
Needed: P*(y | do(x)) = ?

TRANSPORT FORMULAS DEPEND ON THE STORY
[Three graphs (a)–(c): X → Y with Z attached at different positions and a selection node S pointing at Z.]
a) Z represents age: P*(y | do(x)) = Σ_z P(y | do(x), z) P*(z)
b) Z represents language skill: P*(y | do(x)) =? P(y | do(x))
c) Z represents a bio-marker: P*(y | do(x)) =? Σ_z P(y | do(x), z) P*(z | x)

TRANSPORTABILITY (Pearl and Bareinboim, 2011)
Definition 1 (Transportability): Given two environments, Π and Π*, characterized by graphs G and G*, a causal relation R is transportable from Π to Π* if
1. R(Π) is estimable from the set I of interventional studies on Π, and
2. R(Π*) is identified from I, P*, G, and G*.

RESULT: ALGORITHM TO DETERMINE IF AN EFFECT IS TRANSPORTABLE
INPUT: a causal graph annotated with S, the factors creating differences between the two populations.
OUTPUT:
1. Transportable or not?
2. Measurements to be taken in the experimental study.
3. Measurements to be taken in the target population.
4. A transport formula, e.g.,
  P*(y | do(x)) = Σ_z P(y | do(x), z) Σ_w P*(z | w) Σ_t P(w | do(x), t) P*(t)

WHICH MODEL LICENSES THE TRANSPORT OF THE CAUSAL EFFECT X → Y?
S marks the external factors creating disparities between the two populations.
[Six candidate graphs, (a)–(f); in four of them the effect is transportable ("Yes") and in two it is not ("No").]
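For the simplest case — story (a) above, where Z (age) accounts for the disparity — the transport formula is just a re-weighting of the z-specific experimental effects by the target population's distribution of Z. The Python sketch below uses made-up numbers for the z-specific effects and the two age distributions; they are not data from the slides.

```python
# Minimal sketch of the simplest transport formula:
#   P*(y | do(x)) = sum_z P(y | do(x), z) P*(z)
z_levels = ["young", "middle", "old"]

# z-specific causal effects P(Y=1 | do(X=1), z) measured in the source (LA) experiment
p_y_do_x_z = {"young": 0.30, "middle": 0.45, "old": 0.60}

p_z_source = {"young": 0.50, "middle": 0.30, "old": 0.20}   # P(z)  in LA
p_z_target = {"young": 0.20, "middle": 0.30, "old": 0.50}   # P*(z) in NYC

effect_source = sum(p_y_do_x_z[z] * p_z_source[z] for z in z_levels)
effect_target = sum(p_y_do_x_z[z] * p_z_target[z] for z in z_levels)  # transported

print(f"P (Y=1 | do(X=1)) in LA  : {effect_source:.3f}")
print(f"P*(Y=1 | do(X=1)) in NYC : {effect_target:.3f}")
```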
STATISTICAL TRANSPORTABILITY
Why should we transport statistical information? Why not re-learn things from scratch?
1. Measurements are costly. Limit measurements to a subset V* of variables, called the "scope."
2. Samples are costly. Share samples from diverse domains to illuminate their common components.

STATISTICAL TRANSPORTABILITY (CONT.)
Definition (Statistical Transportability): A statistical relation R(P) is said to be transportable from Π to Π* over V* if R(P*) is identified from P, P*(V*), and G*, where P*(V*) is the marginal distribution of P* over the subset of variables V*.
[Graph: X → Z → Y with an S node.] R = P*(y | x) is estimable without re-measuring Y:
  R = Σ_z P*(z | x) P(y | z)
[Graph: X → Z → Y with the S node placed differently.] If few samples (N_2) are available from Π* and many samples (N_1) from Π, then estimating R = P*(y | x) by the decomposition
  R = Σ_z P*(y | x, z) P(z | x)
achieves much higher precision.

THE ORIGIN OF SELECTION BIAS
[Model: X → Y (c_0), with exogenous U_X, U_Y and a selection indicator S driven by X and Y.]
Three cases: no selection bias; selection bias activated by a virtual collider; selection bias activated by both a virtual collider and a real collider.

CONTROLLING SELECTION BIAS BY ADJUSTMENT
[Four variants of a model with X → Y, latent variables U_1, U_2, U_Y, and selection nodes S_1, S_2, S_3 attached at different places:]
• Case 1: the bias can be eliminated by randomization or by adjustment.
• Case 2: the bias cannot be eliminated by randomization; it requires adjustment for U_2.
• Case 3: the bias cannot be eliminated by randomization or by adjustment.
• Case 4 (selection activated by a virtual collider on X → Y): the bias cannot be eliminated by adjustment or by randomization.

CONTROLLING BY ADJUSTMENT MAY REQUIRE EXTERNAL INFORMATION
Adjustment for U_2 gives
  P(y | do(x)) = Σ_{u_2} P(y | x, S_2 = 1, u_2) P(u_2)
If all we have is P(u_2 | S_2 = 1), not P(u_2), then only the U_2-specific effect is recoverable.

WHEN WOULD THE ODDS RATIO BE RECOVERABLE?
  OR(X, Y) = [P(y_1 | x_1) / P(y_0 | x_1)] / [P(y_1 | x_0) / P(y_0 | x_0)] = OR(Y, X)
(a) OR is recoverable, despite the virtual collider at Y, whenever (X ⫫ S | Y, Z)_G or (Y ⫫ S | X, Z)_G, giving OR(X, Y | S = 1) = OR(X, Y) (Cornfield 1951; Whittemore 1978; Geng 1992).
(b) The Z-specific OR is recoverable, but is meaningless: OR(X, Y | Z) = OR(X, Y | Z, S = 1).
(c) The C-specific OR is recoverable, and is meaningful: OR(X, Y | C) = OR(X, Y | W, C, S = 1).

GRAPHICAL CONDITION FOR ODDS-RATIO RECOVERABILITY
Theorem (Bareinboim and Pearl, 2011): A necessary condition for the recoverability of OR(Y, X) is that every ancestor A_i of S that is also a descendant of X have a separating set that either
1. d-separates A_i from X given Y, or
2. d-separates A_i from Y given X.
Result: a polynomial algorithm decides recoverability.
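Case (a) above — OR recoverable when (Y ⫫ S | X, Z) or (X ⫫ S | Y, Z) holds — includes the classical case-control setting: if selection depends on the outcome only, the odds ratio survives selection even though the risks themselves do not. The simulation below is an illustration with arbitrary probabilities, not data from the slides.

```python
# Illustration: when selection depends on the outcome only (X _||_ S | Y),
# OR(X,Y | S=1) recovers OR(X,Y) even though P(y | x) itself is distorted.
import random

random.seed(1)
population, selected = [], []
for _ in range(1_000_000):
    x = int(random.random() < 0.5)
    y = int(random.random() < 0.05 + 0.10 * x)        # P(Y=1|X=0)=0.05, P(Y=1|X=1)=0.15
    s = random.random() < (0.9 if y else 0.05)        # cases heavily over-sampled
    population.append((x, y))
    if s:
        selected.append((x, y))

def odds_ratio(pairs):
    n = {(a, b): 0 for a in (0, 1) for b in (0, 1)}
    for x, y in pairs:
        n[(x, y)] += 1
    return n[(1, 1)] * n[(0, 0)] / (n[(1, 0)] * n[(0, 1)])

def risk(pairs, x):
    ys = [y for (xi, y) in pairs if xi == x]
    return sum(ys) / len(ys)

print(f"OR, whole population : {odds_ratio(population):.2f}")
print(f"OR, selected (S=1)   : {odds_ratio(selected):.2f}   <- approximately equal")
print(f"P(Y=1|X=1): population {risk(population, 1):.3f}, selected {risk(selected, 1):.3f}   <- distorted")
```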
EXAMPLES OF ODDS RATIO RECOVERABILITY
[Two graphs over X, Y, W_1, W_2, W_3, W_4, and S.]
(a) Recoverable: every ancestor of S that is a descendant of X admits a separating set (the worked frames exhibit sets such as {W_1, W_2, W_4}, {W_4, W_3, W_1}, {W_1}, {W_4}, and the empty set), and
  OR(Y, X) = OR(Y, X | W_1, W_2, W_4, S = 1).
(b) Non-recoverable: no such separating sets exist.

CONCLUSIONS: I TOLD YOU CAUSALITY IS SIMPLE
• A formal basis for causal and counterfactual inference (complete).
• Unification of the graphical, potential-outcome, and structural-equation approaches.
• Friendly and formal solutions to century-old problems and confusions.
• No other method can do better (theorem).

Thank you for agreeing with everything I said.