REASONING WITH CAUSES
AND COUNTERFACTUALS
Judea Pearl
UCLA
(www.cs.ucla.edu/~judea/)
OUTLINE
1. Unified conceptualization of counterfactuals,
structural equations, and graphs
2. Confounding demystified
3. Direct and indirect effects (Mediation)
4. Transportability and external validity mathematized
TRADITIONAL STATISTICAL
INFERENCE PARADIGM
[Diagram: Data → Joint Distribution P → Q(P) (aspects of P), via inference]
e.g., infer whether customers who bought product A
would also buy product B:
Q = P(B | A)
THE STRUCTURAL MODEL
PARADIGM
[Diagram: a Data Generating Model M produces the Joint Distribution, which produces Data; inference targets Q(M) (aspects of M)]
M – Invariant strategy (mechanism, recipe, law,
protocol) by which Nature assigns values to variables
in the analysis.
• "Think Nature, not experiment!"
FAMILIAR CAUSAL MODEL
ORACLE FOR MANIPULATION
[Diagram: a logic-circuit-style causal model over X, Y, Z, mapping INPUT to OUTPUT]
WHY PHYSICS DESERVES
A NEW ALGEBRA
Scientific equations (e.g., Hooke's law) are non-algebraic,
e.g., length (Y) equals a constant (2) times the weight (X).
Correct notation: Y := 2X (not Y = 2X)
Process information: X = 1; the solution: Y = 2
Had X been 3, Y would be 6.
If we raise X to 3, Y would be 6; we must "wipe out" X = 1.
STRUCTURAL
CAUSAL MODELS
Definition: A structural causal model is a 4-tuple
⟨V, U, F, P(u)⟩, where
• V = {V1,...,Vn} are endogenous variables
• U = {U1,...,Um} are background variables
• F = {f1,...,fn} are functions determining V:
  $v_i = f_i(v, u)$, e.g., $y = \beta x + u_Y$
• P(u) is a distribution over U.
P(u) and F induce a distribution P(v) over the observable variables.
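A minimal sketch of this definition in Python. The two-variable model, Gaussian background distribution, and coefficient 0.7 are illustrative assumptions, not prescribed by the slides; the point is only that P(u) plus the functions F induce the observational distribution P(v).

```python
# A minimal SCM sketch: P(u) plus the functions F induce P(v).
import random

def sample_u():
    # P(u): all randomness lives in the background variables
    return {"u_x": random.gauss(0, 1), "u_y": random.gauss(0, 1)}

def f_x(u):                         # structural equation for X
    return u["u_x"]

def f_y(x, u):                      # y = beta*x + u_Y, with beta = 0.7 assumed
    return 0.7 * x + u["u_y"]

def sample_v():
    u = sample_u()                  # draw a "situation" u ~ P(u)
    x = f_x(u)                      # solve F for the endogenous variables
    return {"x": x, "y": f_y(x, u)}

print(sample_v())                   # one draw from the induced P(v)
```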
CAUSAL MODELS AND COUNTERFACTUALS
Definition:
The sentence: “Y would be y (in situation u), had X been x,”
denoted Yx(u) = y, means:
The solution for Y in a mutilated model Mx (i.e., the model with the equation for X
replaced by X = x), with input U = u, is equal to y.
The Fundamental Equation of Counterfactuals:
$Y_x(u) \triangleq Y_{M_x}(u)$
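A sketch of the same definition in code, assuming the toy linear mechanism above (coefficient and numbers are illustrative): the equation for X is wiped out and replaced by a constant, while Y's mechanism and the background state u stay intact.

```python
# Evaluating the counterfactual Y_x(u) via the mutilated model M_x.
def f_y(x, u_y):
    return 0.7 * x + u_y            # structural equation for Y (assumed)

def Y_x(x, u):
    # In M_x, X is held at x; Y's own mechanism is untouched
    return f_y(x, u["u_y"])

u = {"u_x": 1.0, "u_y": 0.2}        # one situation u
print(Y_x(3, u))                    # "Had X been 3": 0.7*3 + 0.2 = 2.3
```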
READING COUNTERFACTUALS
FROM SEM
2
2
Z
g

1
X

3
Y
X  Treatment
Z  Study Time
Y  Score
  0.5
Z
1
X
g  0.4
3
  0.7
Y
x  1
z  x  2
y  x  gz  3
Data shows:  = 0.7,  = 0.5, g = 0.4
A student named Joe, measured X=0.5, Z=1, Y=1.9
Q1: What would Joe’s score be had he doubled his study time?
9
READING COUNTERFACTUALS
From Joe's data: $\varepsilon_1 = 0.5,\ \varepsilon_2 = 0.75,\ \varepsilon_3 = 0.75$.
[Two panels: the factual model, with Z = 1.0 and Y = 1.5; and the mutilated model with Z set to 2.0, yielding Y = 1.9]
Q1: What would Joe's score be had he doubled his study time?
Answer: Joe's score would be 1.9.
In counterfactual notation:
$Y_{Z=2}(u) = 1.9$
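Q1 worked numerically as a sketch: recover the disturbances from Joe's data (abduction), set Z to 2 in the mutilated model (action), and re-solve for Y (prediction). The coefficients are the slide's own.

```python
alpha, beta, gamma = 0.5, 0.7, 0.4
X, Z, Y = 0.5, 1.0, 1.5                   # Joe's measurements

# Abduction: invert the structural equations for the disturbances
e1 = X                                     # x = e1
e2 = Z - alpha * X                         # z = alpha*x + e2      -> 0.75
e3 = Y - beta * X - gamma * Z              # y = beta*x + gamma*z + e3 -> 0.75

# Action + prediction: double study time (Z := 2) in the mutilated model
Z_new = 2.0
Y_cf = beta * X + gamma * Z_new + e3
print(Y_cf)                                # 1.9
```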
READING COUNTERFACTUALS
Q2: What would Joe's score be, had the treatment been 0 and
had he studied at whatever level he would have studied had
the treatment been 1?
[Two panels: with X = 1, the model gives $Z_{X=1} = \alpha \cdot 1 + \varepsilon_2 = 1.25$; holding Z at 1.25 and setting X = 0 gives $Y = \gamma \cdot 1.25 + \varepsilon_3 = 1.25$]
Answer: $Y_{X=0,\, Z_{X=1}}(u) = 1.25$
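Q2 worked numerically: the nested counterfactual $Y_{X=0, Z_{X=1}}(u)$ is evaluated in two mutilation steps, using the disturbances recovered above.

```python
# Step 1: in M_{X=1}, solve for Z; step 2: in M_{X=0, Z=z}, solve for Y.
alpha, beta, gamma = 0.5, 0.7, 0.4
e2, e3 = 0.75, 0.75            # Joe's disturbances, recovered above

z_x1 = alpha * 1 + e2          # Z_{X=1}(u) = 1.25
y_cf = beta * 0 + gamma * z_x1 + e3
print(y_cf)                    # 1.25
```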
POTENTIAL AND OBSERVED OUTCOMES
PREDICTED BY A STRUCTURAL MODEL
CAUSAL MODELS AND COUNTERFACTUALS
Definition:
The sentence: “Y would be y (in situation u), had X been x,”
denoted Yx(u) = y, means:
The solution for Y in a mutilated model Mx (i.e., the model with the equation for X
replaced by X = x), with input U = u, is equal to y.
• Joint probabilities of counterfactuals:
$P(Y_x = y, Z_w = z) = \sum_{u:\ Y_x(u) = y,\ Z_w(u) = z} P(u)$
In particular:
$P(y \mid do(x)) \triangleq P(Y_x = y) = \sum_{u:\ Y_x(u) = y} P(u)$
$P(Y_{x'} = y' \mid x, y) = \sum_{u:\ Y_{x'}(u) = y'} P(u \mid x, y)$
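A sketch of the first of these formulas on a discrete toy model. The uniform P(u) and the XOR mechanism are invented for illustration; only the recipe — sum P(u) over the u's for which the mutilated model yields the event — comes from the slide.

```python
# Computing P(Y_x = y) by enumerating the background states u.
from itertools import product

P_u = {(ux, uy): 0.25 for ux, uy in product([0, 1], repeat=2)}  # P(u)

def Y_x(x, u):                     # solve Y in M_x for background u
    ux, uy = u
    return x ^ uy                  # assumed mechanism: y = x XOR u_y

# P(Y_{x=1} = 1) = sum of P(u) over {u : Y_{x=1}(u) = 1}
p = sum(pu for u, pu in P_u.items() if Y_x(1, u) == 1)
print(p)                           # 0.5 under the uniform P(u)
```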
THE FIVE NECESSARY STEPS
OF CAUSAL ANALYSIS
Define:
Express the target quantity Q as a function
Q(M) that can be computed from any model M.
e.g., ATE $= E(Y \mid do(x_1)) - E(Y \mid do(x_0))$
e.g., ETT $= E(Y_x - Y_{x'} \mid X = x')$
Assume:
Formulate causal assumptions A using some formal
language.
Identify:
Determine if Q is identifiable given A.
Estimate:
Estimate Q if it is identifiable; approximate it,
if it is not.
Test:
Test the testable implications of A (if any).
THE FIVE NECESSARY STEPS
OF CAUSAL ANALYSIS
Define:
Express the target quantity Q as a function
Q(M) that can be computed from any model M.
e.g., DE $= E(Y_{x_1, Z_{x_0}} - Y_{x_0})$, IE $= E(Y_{x_0, Z_{x_1}} - Y_{x_0})$
e.g., PC $= P(Y_{x_0} = y_0 \mid X = x_1, Y = y_1)$
Assume:
0
Formulate causal assumptions A using some formal
language.
Identify:
Determine if Q is identifiable given A.
Estimate:
Estimate Q if it is identifiable; approximate it,
if it is not.
Test:
Test the testable implications of A (if any).
THE LOGIC OF CAUSAL ANALYSIS
[Flowchart: causal assumptions A, together with queries Q of interest, enter a causal model MA; A* denotes the logical implications of A. Causal inference yields identified estimands Q(P) and testable implications T(MA). Statistical inference combines data D to produce estimates $\hat{Q}$ of Q(P); the resulting claims, $\hat{Q}(Q \mid D, A)$, remain provisional on A. Model testing assesses goodness of fit g(T).]
IDENTIFICATION IN SCM
Find the effect of X on Y, P(y|do(x)), given the
causal assumptions shown in G, where Z1,..., Zk
are auxiliary variables.
[Graph G over X, Y and auxiliary variables Z1,...,Z6]
Can P(y|do(x)) be estimated if only a subset, Z,
can be measured?
ELIMINATING CONFOUNDING BIAS
THE BACK-DOOR CRITERION
P(y | do(x)) is estimable if there is a set Z of
variables such that Z d-separates X from Y in $G_{\underline{X}}$
(the subgraph with arrows emanating from X removed).
[Two panels: graph G and graph $G_{\underline{X}}$, each over Z1,...,Z6, X, Y, with an admissible set Z highlighted]
Moreover,
$P(y \mid do(x)) = \sum_z P(y \mid x, z)\, P(z) = \sum_z \frac{P(x, y, z)}{P(x \mid z)}$
("adjusting" for Z) ≡ Ignorability
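A minimal empirical version of the adjustment formula, assuming Z is an admissible (back-door) set; the record format and toy data are invented for illustration.

```python
# Back-door adjustment: P(y | do(x)) = sum_z P(y | x, z) P(z),
# estimated from a list of observational records.
from collections import Counter

def adjust(records, x, y):
    # records: list of (x, y, z) tuples from the observational study
    n = len(records)
    pz = Counter(r[2] for r in records)              # empirical P(z)
    est = 0.0
    for z, nz in pz.items():
        xz = [r for r in records if r[0] == x and r[2] == z]
        if xz:                                       # empirical P(y | x, z)
            p_y_xz = sum(r[1] == y for r in xz) / len(xz)
            est += p_y_xz * nz / n
    return est

data = [(1, 1, 0), (1, 0, 1), (0, 0, 0), (0, 1, 1), (1, 1, 1)]  # toy data
print(adjust(data, x=1, y=1))
```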
EFFECT OF WARM-UP ON INJURY
(After Shrier & Platt, 2008)
[Figure: a large causal diagram relating Warm-up Exercises (X) to Injury (Y) through many covariates, annotated "Watch out!", "Front Door", and "No, no!" to flag adjustments that would be mistaken]
PROPENSITY SCORE ESTIMATOR
(Rosenbaum & Rubin, 1983)
[Graph: Z1,...,Z6, X, Y as before, with L a summary of {Z1,...,Z5}]
P(y | do(x)) = ?
Can L replace {Z1, Z2, Z3, Z4, Z5}?
$L(z_1, z_2, z_3, z_4, z_5) \triangleq P(X = 1 \mid z_1, z_2, z_3, z_4, z_5)$
Theorem:
$\sum_z P(y \mid z, x)\, P(z) = \sum_l P(y \mid L = l, x)\, P(L = l)$
Adjustment for L replaces adjustment for Z.
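A numerical check of the theorem on a small, fully known joint P(z, x, y) (the joint below is invented): adjusting for the scalar propensity score L(z) reproduces adjustment for z itself.

```python
from collections import defaultdict

P = {(0, 0, 0): .20, (0, 0, 1): .10, (0, 1, 1): .10,
     (1, 0, 1): .15, (1, 1, 0): .15, (1, 1, 1): .30}   # P(z, x, y)

P_z = {z: sum(p for (zz, _, _), p in P.items() if zz == z) for z in (0, 1)}

def P_y1_given_xz(x, z):           # P(y=1 | x, z)
    den = sum(p for (zz, xx, _), p in P.items() if zz == z and xx == x)
    return P.get((z, x, 1), 0) / den

def L(z):                          # propensity score P(X=1 | z)
    return sum(p for (zz, xx, _), p in P.items() if zz == z and xx == 1) / P_z[z]

lhs = sum(P_y1_given_xz(1, z) * P_z[z] for z in (0, 1))   # adjust for z

strata = defaultdict(list)         # group z-values sharing the same L(z)
for z in (0, 1):
    strata[L(z)].append(z)
rhs = sum(                          # adjust for l instead of z
    (sum(P.get((z, 1, 1), 0) for z in zs) /
     sum(p for (zz, xx, _), p in P.items() if zz in zs and xx == 1))
    * sum(P_z[z] for z in zs)
    for zs in strata.values())
print(lhs, rhs)                    # 0.8 0.8
```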
WHAT PROPENSITY SCORE (PS)
PRACTITIONERS NEED TO KNOW
$L(z) = P(X = 1 \mid Z = z)$
$\sum_z P(y \mid z, x)\, P(z) = \sum_l P(y \mid l, x)\, P(l)$
1. The asymptotic bias of PS is EQUAL to that of ordinary
adjustment (for the same Z).
2. Including an additional covariate in the analysis CAN SPOIL
the bias-reduction potential of others.
[Four small graphs over X, Y, Z illustrating covariates whose inclusion helps or spoils the adjustment]
3. In particular, instrumental variables tend to amplify bias.
4. Choosing a sufficient set for PS requires knowledge of the
model.
SURPRISING RESULT:
Instrumental variables are bias-amplifiers in linear
models (Bhattacharya & Vogt, 2007; Wooldridge, 2009)
[Graph: Z → X (c3), U → X (c1), U → Y (c2), X → Y (c0), with U unobserved]
"Naive" bias:
$B_0 = \frac{\partial}{\partial x} E(Y \mid x) - \frac{\partial}{\partial x} E(Y \mid do(x)) = (c_0 + c_1 c_2) - c_0 = c_1 c_2$
Adjusted bias:
$B_z = \frac{\partial}{\partial x} E(Y \mid x, z) - \frac{\partial}{\partial x} E(Y \mid do(x)) = \left(c_0 + \frac{c_1 c_2}{1 - c_3^2}\right) - c_0 = \frac{c_1 c_2}{1 - c_3^2} \ge c_1 c_2$
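A simulation sketch of the amplification effect; the coefficient values and noise scales are chosen arbitrarily for illustration.

```python
# x = c3*z + c1*u + ex,  y = c0*x + c2*u, with u unobserved.
# OLS of y on x alone vs. y on (x, z): conditioning on the instrument z
# moves the x-coefficient *further* from the true c0.
import numpy as np

rng = np.random.default_rng(0)
n, c0, c1, c2, c3 = 100_000, 1.0, 0.8, 0.8, 0.9
z, u = rng.normal(size=n), rng.normal(size=n)
x = c3 * z + c1 * u + 0.1 * rng.normal(size=n)
y = c0 * x + c2 * u

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

b_naive = ols(np.c_[x, np.ones(n)], y)[0]
b_adj   = ols(np.c_[x, z, np.ones(n)], y)[0]
print(b_naive - c0, b_adj - c0)   # the adjusted bias exceeds the naive bias
```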
INTUITION:
When Z is allowed to vary, it absorbs (or explains)
some of the changes in X.
[Graph: Z → X (c3), U → X (c1), U → Y (c2), X → Y (c0)]
When Z is fixed, the burden falls on U alone and is
transmitted to Y (resulting in a higher bias).
[Same graph, with Z held fixed]
WHAT’S BETWEEN AN INSTRUMENT AND
A CONFOUNDER?
Should we adjust for Z?
[Graph: U (unobserved), Z, T1, T2, X, Y with coefficients c0,...,c4]
ANSWER:
Yes, if $\frac{c_4}{c_3} > \frac{c_1 c_2}{1 - c_3^2}$
No, otherwise.
CONCLUSION:
Adjusting for a parent of Y is safer
than adjusting for a parent of X.
WHICH SET TO ADJUST FOR
Should we adjust for {T}, {Z}, or {T, Z}?
[Graph over Z, T, X, Y]
Answer 1 (from bias-amplification considerations):
{T} is better than {T, Z}, which is the same as {Z}.
Answer 2 (from variance considerations):
{T} is better than {T, Z}, which is better than {Z}.
CONCLUSIONS
• The prevailing practice of adjusting for all
covariates, especially those that are good
predictors of X (the "treatment
assignment," Rubin, 2009), is totally
misguided.
• The "outcome mechanism" is just as
important, and much safer, from both the bias
and the variance viewpoints.
• As X-rays are to the surgeon, graphs are
to causation.
REGRESSION VS. STRUCTURAL EQUATIONS
(THE CONFUSION OF THE CENTURY)
Regression (claimless, nonfalsifiable):
$y = ax + \epsilon_Y$
Structural (empirical, falsifiable):
$y = bx + u_Y$
Claim (regardless of distributions):
$E(Y \mid do(x)) = E(Y \mid do(x), do(z)) = bx$
The mothers of all questions:
Q. When would b equal a?
A. When all back-door paths are blocked ($u_Y \perp X$).
Q. When is b estimable by regression methods?
A. Graphical criteria available.
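A simulation sketch of "when would b equal a": with a confounder u the regression slope a departs from the structural b; with the back-door blocked they coincide. All coefficients are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, b = 200_000, 2.0

def slope(x, y):                          # regression coefficient a
    return np.cov(x, y)[0, 1] / np.var(x)

u = rng.normal(size=n)
x_conf = u + rng.normal(size=n)           # back-door open: u -> x, u -> y
y_conf = b * x_conf + u + rng.normal(size=n)
x_clean = rng.normal(size=n)              # back-door blocked: u independent of x
y_clean = b * x_clean + u + rng.normal(size=n)

print(slope(x_conf, y_conf))    # a != b  (about 2.5 here)
print(slope(x_clean, y_clean))  # a  = b  (about 2.0)
```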
TWO PARADIGMS FOR
CAUSAL INFERENCE
Observed: P(X, Y, Z, ...)
Conclusions needed: P(Yx = y), P(Xy = x | Z = z), ...
How do we connect the observables X, Y, Z, ...
to the counterfactuals Yx, Xz, Zy, ... ?

N-R model:
Counterfactuals are primitives, new variables in a
super-distribution $P^*(X, Y, \ldots, Y_x, X_z, \ldots)$,
where X, Y, Z constrain $Y_x, Z_y, \ldots$

Structural model:
Counterfactuals are derived quantities;
subscripts modify the model and distribution:
$P(Y_x = y) \triangleq P_{M_x}(Y = y)$
“SUPER” DISTRIBUTION
IN N-R MODEL
 U    X  Y  Z  Yx=0  Yx=1  Xz=0  Xz=1  Xy=0
 u1   0  0  0   0     1     0     0     0
 u2   1  1  1   0     1     0     0     1
 u3   0  0  0   1     0     0     1     1
 u4   1  0  0   1     0     0     1     0
The super-distribution defines:
$P^*(X, Y, Z, \ldots, Y_x, Z_y, \ldots, Y_{xz}, Z_{xy}, \ldots)$
e.g., $P^*(Y_x = y \mid Z, X_z)$, or independencies such as $Y_x \perp\!\!\!\perp X \mid Z_y$
Consistency requires $x = 0 \Rightarrow Y_{x=0} = Y$, i.e., $Y = x Y_1 + (1 - x) Y_0$;
note the inconsistency in row u3, where X = 0 and Y = 0 but Yx=0 = 1.
ARE THE TWO
PARADIGMS EQUIVALENT?
• Yes (Galles and Pearl, 1998; Halpern 1998)
• In the N-R paradigm, Yx is defined through
consistency:
$Y = x Y_1 + (1 - x) Y_0$
• In SCM, consistency is a theorem.
• Moreover, a theorem in one approach is a
theorem in the other.
• Difference: clarity of assumptions and their
implications.
AXIOMS OF STRUCTURAL
COUNTERFACTUALS
Yx(u)=y: Y would be y, had X been x (in state U = u)
(Galles, Pearl, Halpern, 1998):
1. Definiteness:
$\exists x \in X \ \text{s.t.}\ X_y(u) = x$
2. Uniqueness:
$(X_y(u) = x)\ \&\ (X_y(u) = x') \Rightarrow x = x'$
3. Effectiveness:
$X_{xw}(u) = x$
4. Composition (generalized consistency):
$X_w(u) = x \Rightarrow Y_{wx}(u) = Y_w(u)$
5. Reversibility:
$(Y_{xw}(u) = y)\ \&\ (W_{xy}(u) = w) \Rightarrow Y_x(u) = y$
FORMULATING ASSUMPTIONS
THREE LANGUAGES
1. English: Smoking (X), Cancer (Y), Tar (Z), Genotypes (U)
[Graph: X → Z → Y, with a latent U affecting both X and Y]
2. Counterfactuals:
$Z_x(u) = Z_{yx}(u)$
$X_y(u) = X_{zy}(u) = X_z(u) = X(u)$
$Y_z(u) = Y_{zx}(u)$
$Z_x \perp\!\!\!\perp \{Y_z, X\}$
3. Structural:
$x = f_1(u, \epsilon_1), \quad z = f_2(x, \epsilon_2), \quad y = f_3(z, u, \epsilon_3)$
COMPARISON BETWEEN THE
N-R AND SCM LANGUAGES
1. Expressing scientific knowledge
2. Recognizing the testable implications of one's
assumptions
3. Locating instrumental variables in a system of
equations
4. Deciding if two models are equivalent or nested
5. Deciding if two counterfactuals are independent
given another
6. Algebraic derivations of identifiable estimands
GRAPHICAL – COUNTERFACTUALS
SYMBIOSIS
Every causal graph expresses counterfactual
assumptions, e.g., X → Y ← Z:
1. Missing arrows (e.g., Y → Z):
$Y_{x,z}(u) = Y_x(u)$
2. Missing arcs (between Y and Z):
$Y_x \perp\!\!\!\perp Z_y$
These assumptions are consistent and readable from the graph.
• Express assumptions in graphs
• Derive estimands by graphical or algebraic
methods
EFFECT DECOMPOSITION
(direct vs. indirect effects)
1. Why decompose effects?
2. What is the definition of direct and indirect effects?
3. What are the policy implications of direct and indirect
effects?
4. When can direct and indirect effects be estimated
consistently from experimental and nonexperimental
data?
WHY DECOMPOSE EFFECTS?
1. To understand how Nature works
2. To comply with legal requirements
3. To predict the effects of new types of interventions:
signal routing, rather than variable fixing
LEGAL IMPLICATIONS
OF DIRECT EFFECT
Can data prove an employer guilty of hiring discrimination?
[Graph: X (Gender) → Z (Qualifications) → Y (Hiring), and X → Y]
What is the direct effect of X on Y?
$CDE = E(Y \mid do(x_1), do(z)) - E(Y \mid do(x_0), do(z))$
(averaged over z)
Adjust for Z? No! No!
FISHER’S GRAVE MISTAKE
(after Rubin, 2005)
What is the direct effect of treatment on yield?
[Graph: X (Soil treatment) → Z (Plant density) → Y (Yield), with a latent factor affecting both Z and Y]
Compare treated and untreated lots of the same density, Z = z:
$E(Y \mid do(x_1), do(z)) - E(Y \mid do(x_0), do(z))$
No! No! Proposed solution (?): "Principal strata"
NATURAL INTERPRETATION OF
AVERAGE DIRECT EFFECTS
Robins and Greenland (1992) – "Pure"
[Graph: X → Z → Y and X → Y, with z = f(x, u), y = g(x, z, u)]
Natural Direct Effect of X on Y, $DE(x_0, x_1; Y)$:
The expected change in Y, when we change X from x0 to x1 and,
for each u, we keep Z constant at whatever value it attained
before the change:
$DE(x_0, x_1; Y) = E[Y_{x_1, Z_{x_0}} - Y_{x_0}]$
In linear models, DE $= \beta (x_1 - x_0)$.
DEFINITION AND IDENTIFICATION OF
NESTED COUNTERFACTUALS
Consider the quantity $Q = E_u\big[Y_{x, Z_{x^*}(u)}(u)\big]$
Given M and P(u), Q is well defined:
given u, $Z_{x^*}(u)$ is the solution for Z in $M_{x^*}$, call it z;
then $Y_{x, Z_{x^*}(u)}(u)$ is the solution for Y in $M_{xz}$.
Can Q be estimated from experimental or nonexperimental data?
Experimental: reduce Q to a nest-free expression.
Nonexperimental: reduce Q to a subscript-free expression.
DEFINITION OF
INDIRECT EFFECTS
[Graph: X → Z → Y and X → Y, with z = f(x, u), y = g(x, z, u)]
There is no controlled indirect effect.
Indirect Effect of X on Y, $IE(x_0, x_1; Y)$:
The expected change in Y when we keep X constant, say at x0,
and let Z change to whatever value it would have attained had
X changed to x1:
$IE(x_0, x_1; Y) = E[Y_{x_0, Z_{x_1}} - Y_{x_0}]$
In linear models, IE = TE − DE.
POLICY IMPLICATIONS
OF INDIRECT EFFECTS
What is the indirect effect of X on Y?
The effect of Gender on Hiring if sex discrimination
is eliminated.
[Graph: X (Gender) → Z (Qualification) → Y (Hiring), with the direct X → Y link marked "IGNORE"]
Deactivating a link – a new type of intervention.
MEDIATION FORMULAS
1. The natural direct and indirect effects are identifiable in
Markovian models (no confounding),
2. And are given by:
$DE = \sum_z [E(Y \mid do(x_1, z)) - E(Y \mid do(x_0, z))]\, P(z \mid do(x_0))$
$IE = \sum_z E(Y \mid do(x_0, z))\, [P(z \mid do(x_1)) - P(z \mid do(x_0))]$
$TE = DE - IE(\text{rev})$
3. Applicable to linear and non-linear models, continuous
and discrete variables, regardless of distributional form.
WHY TE ≠ DE + IE
[Graph: X → Z (m1), Z → Y (m2), X → Y (β), with an interaction term g·x·z]
$z = m_1 x + \epsilon_1$
$y = \beta x + m_2 z + g x z + \epsilon_2$
In linear systems:
TE $= \beta + m_1 m_2$, DE $= \beta$, IE $= m_1 m_2$, so TE = DE + IE.
Linear + interaction:
TE $= \beta + m_1 m_2 + m_1 g$, DE $= \beta$, IE $= m_1 m_2$, so TE − DE − IE $= m_1 g \ne 0$.
MEDIATION FORMULAS
IN UNCONFOUNDED MODELS
[Graph: X → Z → Y and X → Y, unconfounded]
$DE = \sum_z [E(Y \mid x_1, z) - E(Y \mid x_0, z)]\, P(z \mid x_0)$
$IE = \sum_z E(Y \mid x_0, z)\, [P(z \mid x_1) - P(z \mid x_0)]$
$TE = E(Y \mid x_1) - E(Y \mid x_0)$
IE = fraction of responses explained by mediation
TE − DE = fraction of responses owed to mediation
WHY TE − DE ≠ IE
[Graph: X → Z (m1), Z → Y (m2), X → Y (β)]
$TE = DE - IE(\text{rev})$
In linear systems: $IE(\text{rev}) = -IE$, so
TE $= \beta + m_1 m_2$, DE $= \beta$, IE $= m_1 m_2 = TE - DE$.
In general, however,
IE = effect sustained by mediation alone
is NOT equal to
TE − DE = effect prevented by disabling mediation.
[Bar diagram contrasting TE − DE (disabling mediation) with IE (disabling the direct path)]
MEDIATION FORMULA
FOR BINARY VARIABLES
[Graph: X → Z → Y and X → Y]

 X  Z  Y   count
 0  0  0   n1
 0  0  1   n2
 0  1  0   n3
 0  1  1   n4
 1  0  0   n5
 1  0  1   n6
 1  1  0   n7
 1  1  1   n8

E(Y | x, z) = g_xz:  g00 = n2/(n1+n2),  g01 = n4/(n3+n4),  g10 = n6/(n5+n6),  g11 = n8/(n7+n8)
E(Z | x) = h_x:  h0 = (n3+n4)/(n1+n2+n3+n4),  h1 = (n7+n8)/(n5+n6+n7+n8)

$DE = (g_{10} - g_{00})(1 - h_0) + (g_{11} - g_{01})\, h_0$
$IE = (h_1 - h_0)(g_{01} - g_{00})$
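The binary Mediation Formula computed directly from the eight cell counts of the table; the counts below are invented toy numbers.

```python
n1, n2, n3, n4, n5, n6, n7, n8 = 40, 10, 20, 30, 20, 20, 10, 50

g00 = n2 / (n1 + n2); g01 = n4 / (n3 + n4)   # E(Y | x, z)
g10 = n6 / (n5 + n6); g11 = n8 / (n7 + n8)
h0 = (n3 + n4) / (n1 + n2 + n3 + n4)         # E(Z | x = 0)
h1 = (n7 + n8) / (n5 + n6 + n7 + n8)         # E(Z | x = 1)

DE = (g10 - g00) * (1 - h0) + (g11 - g01) * h0
IE = (h1 - h0) * (g01 - g00)
TE = g11 * h1 + g10 * (1 - h1) - (g01 * h0 + g00 * (1 - h0))
print(DE, IE, TE)     # note TE != DE + IE in general
```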
RAMIFICATION OF THE
MEDIATION FORMULA
• DE should be averaged over mediator levels;
IE should NOT be averaged over intervention levels.
• TE − DE need not equal IE:
TE − DE = proportion for whom mediation is necessary;
IE = proportion for whom mediation is sufficient.
• TE − DE informs interventions on indirect pathways;
IE informs interventions on direct pathways.
RECENT PROGRESS
1. A Theory of causal transportability
When can causal relations learned from experiments
be transferred to a different environment in which no
experiment can be conducted?
2. A Theory of statistical transportability
When can statistical information learned in one domain
be transferred to a different domain in which
a. only a subset of variables can be observed? Or,
b. only a few samples are available?
3. A Theory of Selection Bias
Selection bias occurs when samples used in learning
are chosen differently than those used in testing.
When can this bias be eliminated?
TRANSPORTABILITY — WHEN CAN WE
EXTRAPOLATE EXPERIMENTAL FINDINGS TO A
DIFFERENT ENVIRONMENT?
[Graphs: Z (Observation) affecting X (Intervention) → Y (Outcome), in the two locales]
Experimental study in LA — measured:
$P(x, y, z)$ and $P(y \mid do(x), z)$
Observational study in NYC — measured:
$P^*(x, y, z) \neq P(x, y, z)$
Needed:
$P^*(y \mid do(x)) = ?$
TRANSPORT FORMULAS DEPEND
ON THE STORY
[Three graphs (a), (b), (c) over X, Y, Z, each with a selection node S marking the source of disparity]
a) Z represents age:
$P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z)\, P^*(z)$
b) Z represents language skill:
$P^*(y \mid do(x)) = P(y \mid do(x))$
c) Z represents a bio-marker:
$P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z)\, P^*(z \mid x)$
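A sketch of formula (a): the z-specific effects measured in the LA experiment are re-weighted by the target's (NYC's) own distribution of Z. The numbers are invented for illustration.

```python
P_y_do_x_z = {0: 0.30, 1: 0.60}     # P(y | do(x), z) from the LA experiment
P_star_z   = {0: 0.20, 1: 0.80}     # P*(z) observed in NYC

p_star_y_do_x = sum(P_y_do_x_z[z] * P_star_z[z] for z in P_star_z)
print(p_star_y_do_x)                # 0.54
```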
TRANSPORTABILITY
(Pearl and Bareinboim, 2011)
Definition 1 (Transportability)
Given two environments, Π and Π*, characterized by
graphs G and G*, a causal relation R is transportable
from Π to Π* if:
1. R(Π) is estimable from the set I of
interventional studies on Π, and
2. R(Π*) is identified from I, P*, G, and G*.
RESULT : ALGORITHM TO DETERMINE
IF AN EFFECT IS TRANSPORTABLE
INPUT: a causal graph annotated with S nodes
(factors creating differences between populations)
OUTPUT:
1. Transportable or not?
2. Measurements to be taken in
the experimental study
3. Measurements to be taken in
the target population
4. A transport formula
[Graph: X, W, Z, Y with latents U, V, T and S nodes]
$P^*(y \mid do(x)) = \sum_z P(y \mid do(x), z) \sum_w P^*(z \mid w) \sum_t P(w \mid do(x), t)\, P^*(t)$
WHICH MODEL LICENSES THE TRANSPORT
OF THE CAUSAL EFFECT XY
S = external factors creating disparities
[Six graphs (a)–(f), each placing the S node differently relative to X, Y, and possibly Z or W; some placements license transport of the causal effect X → Y ("Yes"), others do not ("No")]
STATISTICAL TRANSPORTABILITY
Why should we transport statistical information?
Why not re-learn things from scratch ?
1. Measurements are costly.
Limit measurements to a subset V* of variables,
called the "scope".
2. Samples are costly.
Share samples from diverse domains to illuminate
their common components.
STATISTICAL TRANSPORTABILITY
Definition (Statistical Transportability):
A statistical relation R(P) is said to be transportable from Π to Π*
over V* if R(P*) is identified from P, P*(V*), and G*, where
P*(V*) is the marginal distribution of P* over the subset of
variables V*.

[Graph: X → Z → Y with an S node pointing at Z]
R = P*(y | x) is estimable without re-measuring Y:
$R = \sum_z P^*(z \mid x)\, P(y \mid z)$

[Graph: X → Z → Y with an S node]
If few samples (N2) are available from Π*
and many samples (N1) from Π, then
estimating R = P*(y | x) by the decomposition
$R = \sum_z P^*(y \mid x, z)\, P(z \mid x)$
achieves much higher precision.
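A sketch of that decomposition as an estimator: the z-conditional piece is taken from the scarce target samples, while P(z | x) is borrowed from the abundant source samples. The conditional tables below are invented stand-ins for fitted estimates.

```python
P_star_y_xz = {0: 0.25, 1: 0.70}    # P*(y | x, z), from scarce target data (N2)
P_z_x       = {0: 0.40, 1: 0.60}    # P(z | x), from abundant source data (N1)

R = sum(P_star_y_xz[z] * P_z_x[z] for z in P_z_x)
print(R)                            # estimate of P*(y | x)
```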
THE ORIGIN OF
SELECTION BIAS:
[Three graphs: X → Y (c0), with disturbances UX, UY, and a selection node S receiving arrows β1 from X and β2 from Y]
• No selection bias
• Selection bias activated by a virtual collider
• Selection bias activated by both a virtual collider and a real collider
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Graph: X → Y with latents U1, U2, UY and selection nodes S1, S2, S3]
Can be eliminated by randomization or adjustment.
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Graph: same variables, with selection now dependent on U2]
Cannot be eliminated by randomization; requires
adjustment for U2.
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Graph: same variables, with selection placed so that no admissible adjustment exists]
Cannot be eliminated by randomization or adjustment.
CONTROLLING SELECTION BIAS BY
ADJUSTMENT
[Graph: X → Y (c0), with UX, UY and selection S receiving β1 from X and β2 from Y]
Cannot be eliminated by adjustment or by randomization.
CONTROLLING BY ADJUSTMENT
MAY REQUIRE EXTERNAL
INFORMATION
[Graph: X → Y with U1, U2, UY and selection nodes S1, S2, S3]
Adjustment for U2 gives:
$P(y \mid do(x)) = \sum_{u_2} P(y \mid x, S_2 = 1, u_2)\, P(u_2)$
If all we have is P(u2 | S2 = 1), not P(u2),
then only the U2-specific effect is recoverable.
WHEN WOULD THE ODDS
RATIO BE RECOVERABLE?
$OR(X, Y) \triangleq \frac{P(y_1 \mid x_1)}{P(y_0 \mid x_1)} \Big/ \frac{P(y_1 \mid x_0)}{P(y_0 \mid x_0)} = OR(Y, X)$
[Three graphs: (a) X → Y → S; (b) X → Y with a covariate Z and selection S; (c) X → Y with W, C and selection S]
(a) OR is recoverable, despite the virtual collider at Y,
whenever $(X \perp S \mid Y, Z)_G$ or $(Y \perp S \mid X, Z)_G$, giving:
OR(X, Y | S = 1) = OR(X, Y)
(Cornfield 1951; Whittemore 1978; Geng 1992)
(b) The Z-specific OR is recoverable, but is meaningless:
OR(X, Y | Z) = OR(X, Y | Z, S = 1)
(c) The C-specific OR is recoverable, and is meaningful:
OR(X, Y | C) = OR(X, Y | W, C, S = 1)
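A numerical check of case (a), on an invented joint: when selection depends on Y alone, the y-dependent selection factors cancel in the cross-product, so the odds ratio survives selection.

```python
# OR(X,Y | S=1) = OR(X,Y) when S depends only on Y.
P = {(0, 0): 0.35, (0, 1): 0.15, (1, 0): 0.20, (1, 1): 0.30}  # P(x, y)
P_s1_given_y = {0: 0.9, 1: 0.3}     # selection probability depends on y only

def odds_ratio(p):
    return (p[(1, 1)] * p[(0, 0)]) / (p[(1, 0)] * p[(0, 1)])

# joint among the selected: P(x, y | S=1) is proportional to P(x, y) P(S=1 | y)
sel = {xy: p * P_s1_given_y[xy[1]] for xy, p in P.items()}
print(odds_ratio(P), odds_ratio(sel))   # equal: 3.5 and 3.5
```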
2. GRAPHICAL CONDITION FOR
ODDS-RATIO RECOVERABILITY
Theorem (Bareinboim and Pearl, 2011)
A necessary condition for the recoverability of OR(Y,X) is
that every ancestor Ai of S that is also a descendant of X
have a separating set that either:
1. d-separates Ai from X given Y, or
2. d-separates Ai from Y given X.
Result: Polynomial algorithm to decide recoverability
EXAMPLES OF ODDS
RATIO RECOVERABILITY
[Two graphs over X, Y, W1, W2, W3, W4, and S: in (a) every relevant ancestor of S has a separating set as the theorem requires; in (b) none does]
(a) Recoverable: separators such as {W1, W2, W4} and {W1, W3, W4} satisfy the condition, giving
$OR(Y, X) = OR(Y, X \mid W_1, W_2, W_4, S = 1)$
(b) Non-recoverable: no separating set exists.
CONCLUSIONS
I TOLD YOU CAUSALITY IS SIMPLE
• Formal basis for causal and counterfactual inference
(complete)
• Unification of the graphical, potential-outcome, and
structural-equation approaches
• Friendly and formal solutions to
century-old problems and confusions
• No other method can do better (theorem)
Thank you for agreeing
with everything I said.