THE MATHEMATICS OF CAUSAL MODELING Judea Pearl Department of Computer Science

advertisement
THE MATHEMATICS
OF CAUSAL MODELING
Judea Pearl
Department of Computer Science
UCLA
OUTLINE
• Modeling: Statistical vs. Causal
• Causal Models and Identifiability
• Inference to three types of claims:
1. Effects of potential interventions
2. Claims about attribution (responsibility)
3. Claims about direct and indirect effects
• Robustness of Causal Claims
TRADITIONAL STATISTICAL
INFERENCE PARADIGM
Data
P
Joint
Distribution
Q(P)
(Aspects of P)
Inference
e.g.,
Infer whether customers who bought product A
would also buy product B.
Q = P(B|A)
THE CAUSAL INFERENCE
PARADIGM
Data
M
Data-generating
Model
Q(M)
(Aspects of M)
Inference
Some Q(M) cannot be inferred from P.
e.g.,
Infer whether customers who bought product A
would still buy A if we double the price.
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES
Probability and statistics deal with static relations
Statistics
Data
•
Probability
joint
distribution
inferences
from passive
observations
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES
Probability and statistics deal with static relations
Statistics
Probability
inferences
Data
from passive
observations
Causal analysis deals with changes (dynamics)
i.e. What remains invariant when P changes.
joint
distribution
• P does not tell us how it ought to change
e.g. Curing symptoms vs. curing diseases
e.g. Analogy: mechanical deformation
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES
Probability and statistics deal with static relations
Statistics
Probability
inferences
Data
from passive
observations
Causal analysis deals with changes (dynamics)
1. Effects of
Data
interventions
Causal
2. Causes of
Model
Causal
effects
assumptions
3. Explanations
Experiments
joint
distribution
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES (CONT)
1. Causal and statistical concepts do not mix.
CAUSAL
Spurious correlation
Randomization
Confounding / Effect
Instrument
Holding constant
Explanatory variables
2.
3.
4.
STATISTICAL
Regression
Association / Independence
“Controlling for” / Conditioning
Odd and risk ratios
Collapsibility
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES (CONT)
1. Causal and statistical concepts do not mix.
CAUSAL
Spurious correlation
Randomization
Confounding / Effect
Instrument
Holding constant
Explanatory variables
STATISTICAL
Regression
Association / Independence
“Controlling for” / Conditioning
Odd and risk ratios
Collapsibility
2. No causes in – no causes out (Cartwright, 1989)
statistical assumptions + data
causal conclusions
causal assumptions
}
3. Causal assumptions cannot be expressed in the mathematical
language of standard statistics.
4.
FROM STATISTICAL TO CAUSAL ANALYSIS:
1. THE DIFFERENCES (CONT)
1. Causal and statistical concepts do not mix.
CAUSAL
Spurious correlation
Randomization
Confounding / Effect
Instrument
Holding constant
Explanatory variables
STATISTICAL
Regression
Association / Independence
“Controlling for” / Conditioning
Odd and risk ratios
Collapsibility
2. No causes in – no causes out (Cartwright, 1989)
statistical assumptions + data
causal conclusions
causal assumptions
}
3. Causal assumptions cannot be expressed in the mathematical
language of standard statistics.
4. Non-standard mathematics:
a) Structural equation models (SEM)
b) Counterfactuals (Neyman-Rubin)
c) Causal Diagrams (Wright, 1920)
WHAT'S IN A CAUSAL MODEL?
Oracle that assigns truth value to causal
sentences:
Action sentences:
Counterfactuals:
Explanation:
B if we do A.
B would be different if
A were true.
B occurred because of A.
Optional: with what probability?
FAMILIAR CAUSAL MODEL
ORACLE FOR MANIPILATION
X
Y
Z
INPUT
OUTPUT
CAUSAL MODELS AND
CAUSAL DIAGRAMS
Definition: A causal model is a 3-tuple
M = V,U,F
with a mutilation operator do(x): M Mx where:
(i)
V = {V1…,Vn} endogenous variables,
(ii)
U = {U1,…,Um} background variables
(iii) F = set of n functions, fi : V \ Vi  U  Vi
vi = fi(pai,ui) PAi  V \ Vi Ui  U
•
CAUSAL MODELS AND
CAUSAL DIAGRAMS
Definition: A causal model is a 3-tuple
M = V,U,F
with a mutilation operator do(x): M Mx where:
(i)
V = {V1…,Vn} endogenous variables,
(ii)
U = {U1,…,Um} background variables
(iii) F = set of n functions, fi : V \ Vi  U  Vi
vi = fi(pai,ui) PAi  V \ Vi Ui  U
q  b1 p  d1i  u1
p  b2q  d 2 w  u2
U1
I
W
Q
P
U2
PAQ
CAUSAL MODELS AND
MUTILATION
Definition: A causal model is a 3-tuple
M = V,U,F
with a mutilation operator do(x): M Mx where:
(i)
V = {V1…,Vn} endogenous variables,
(ii)
U = {U1,…,Um} background variables
(iii) F = set of n functions, fi : V \ Vi  U  Vi
vi = fi(pai,ui) PAi  V \ Vi Ui  U
(iv) Mx= U,V,Fx,
X  V, x  X
where Fx = {fi: Vi  X }  {X = x}
(Replace all functions fi corresponding to X with the constant
functions X=x)
•
CAUSAL MODELS AND
MUTILATION
Definition: A causal model is a 3-tuple
M = V,U,F
with a mutilation operator do(x): M Mx where:
(i)
V = {V1…,Vn} endogenous variables,
(ii)
U = {U1,…,Um} background variables
(iii) F = set of n functions, fi : V \ Vi  U  Vi
vi = fi(pai,ui) PAi  V \ Vi Ui  U
(iv)
q  b1 p  d1i  u1
p  b2q  d 2 w  u2
U1
I
W
Q
P
U2
CAUSAL MODELS AND
MUTILATION
Definition: A causal model is a 3-tuple
M = V,U,F
with a mutilation operator do(x): M Mx where:
(i)
V = {V1…,Vn} endogenous variables,
(ii)
U = {U1,…,Um} background variables
(iii) F = set of n functions, fi : V \ Vi  U  Vi
vi = fi(pai,ui) PAi  V \ Vi Ui  U
(iv)
q  b1 p  d1i  u1 U1
p  b2q  d 2 w  u2
p  p0
Mp
I
W
U2
Q
P
P = p0
PROBABILISTIC
CAUSAL MODELS
Definition: A causal model is a 3-tuple
M = V,U,F
with a mutilation operator do(x): M Mx where:
(i)
V = {V1…,Vn} endogenous variables,
(ii)
U = {U1,…,Um} background variables
(iii) F = set of n functions, fi : V \ Vi  U  Vi
vi = fi(pai,ui) PAi  V \ Vi Ui  U
(iv) Mx= U,V,Fx,
X  V, x  X
where Fx = {fi: Vi  X }  {X = x}
(Replace all functions fi corresponding to X with the constant
functions X=x)
Definition (Probabilistic Causal Model):
M, P(u)
P(u) is a probability assignment to the variables in U.
CAUSAL MODELS AND
COUNTERFACTUALS
Definition: Potential Response
The sentence: “Y would be y (in unit u), had X been x,”
denoted Yx(u) = y, is the solution for Y in a mutilated model
Mx, with the equations for X replaced by X = x.
(“unit-based potential outcome”)
•
•
CAUSAL MODELS AND
COUNTERFACTUALS
Definition: Potential Response
The sentence: “Y would be y (in unit u), had X been x,”
denoted Yx(u) = y, is the solution for Y in a mutilated model
Mx, with the equations for X replaced by X = x.
(“unit-based potential outcome”)
Joint probabilities of counterfactuals:
P(Yx  y, Z w  z ) 
•

u:Yx (u )  y,Z w (u )  z
P(u )
CAUSAL MODELS AND
COUNTERFACTUALS
Definition: Potential Response
The sentence: “Y would be y (in unit u), had X been x,”
denoted Yx(u) = y, is the solution for Y in a mutilated model
Mx, with the equations for X replaced by X = x.
(“unit-based potential outcome”)
Joint probabilities of counterfactuals:
P(Yx  y, Z w  z ) 
In particular:

u:Yx (u )  y,Z w (u )  z
P(u )

P( y | do(x ) ) P (Yx  y )   P (u )

u:Yx (u )  y
P(Yx '  y '| x, y ) 

P (u | x, y )
u:Yx ' (u )  y '
3-STEPS TO COMPUTING
COUNTERFACTUALS
S5. If the prisoner is dead, he would still be dead
if A were not to have shot. DDA
Abduction
TRUE
U (Court order)
C (Captain)
A
B (Riflemen)
D (Prisoner)
TRUE
3-STEPS TO COMPUTING
COUNTERFACTUALS
S5. If the prisoner is dead, he would still be dead
if A were not to have shot. DDA
Abduction
TRUE
Action
U
TRUE
C
Prediction
U
TRUE
C
C
FALSE
FALSE
A
B
D
TRUE
U
A
B
D
A
B
D
TRUE
COMPUTING PROBABILITIES
OF COUNTERFACTUALS
P(S5). The prisoner is dead. How likely is it that he would be dead
if A were not to have shot. P(DA|D) = ?
Abduction
P(u)
P(u|D)
Action
U
P(u|D)
C
Prediction
U
P(u|D)
C
C
FALSE
FALSE
A
B
D
TRUE
U
A
B
D
A
B
D
P(DA|D)
CAUSAL INFERENCE
MADE EASY (1985-2000)
1. Inference with Nonparametric Structural Equations
made possible through Graphical Analysis.
2. Mathematical underpinning of counterfactuals
through nonparametric structural equations
3. Graphical-Counterfactuals symbiosis
IDENTIFIABILITY
Definition:
Let Q(M) be any quantity defined on a causal
model M, and let A be a set of assumption.
Q is identifiable relative to A iff
P(M1) = P(M2)  Q(M1) = Q(M2)
for all M1, M2, that satisfy A.
•
•
IDENTIFIABILITY
Definition:
Let Q(M) be any quantity defined on a causal
model M, and let A be a set of assumption.
Q is identifiable relative to A iff
P(M1) = P(M2)  Q(M1) = Q(M2)
for all M1, M2, that satisfy A.
In other words, Q can be determined uniquely
from the probability distribution P(v) of the
endogenous variables, V, and assumptions A.
•
IDENTIFIABILITY
Definition:
Let Q(M) be any quantity defined on a causal
model M, and let A be a set of assumption.
Q is identifiable relative to A iff
P(M1) = P(M2)  Q(M1) = Q(M2)
for all M1, M2, that satisfy A.
In this talk:
A: Assumptions encoded in the diagram
Q1: P(y|do(x)) Causal Effect (= P(Yx=y))
Q2: P(Yx =y | x, y) Probability of necessity
Q3: E(Yx ) Direct Effect
Z x'
THE FUNDAMENTAL THEOREM
OF CAUSAL INFERENCE
Causal Markov Theorem:
Any distribution generated by Markovian structural model M
(recursive, with independent disturbances) can be factorized as
P(v1, v2,..., vn )   P(vi | pai )
i
Where pai are the (values of) the parents of Vi in the causal
diagram associated with M.
•
THE FUNDAMENTAL THEOREM
OF CAUSAL INFERENCE
Causal Markov Theorem:
Any distribution generated by Markovian structural model M
(recursive, with independent disturbances) can be factorized as
P(v1, v2,..., vn )   P(vi | pai )
i
Where pai are the (values of) the parents of Vi in the causal
diagram associated with M.
Corollary: (Truncated factorization, Manipulation Theorem)
The distribution generated by an intervention do(X=x)
(in a Markovian model M) is given by the truncated factorization
P(v1, v2,..., vn | do( x )) 

i|Vi X
P(vi | pai ) |
X x
RAMIFICATIONS OF THE
FUNDAMENTAL THEOREM
Given P(x,y,z), should we ban smoking?
U (unobserved)
U (unobserved)
X
Smoking
Z
Tar in
Lungs
Pre-intervention
•
Y
Cancer
X=x
Smoking
Z
Tar in
Lungs
Post-intervention
Y
Cancer
RAMIFICATIONS OF THE
FUNDAMENTAL THEOREM
Given P(x,y,z), should we ban smoking?
U (unobserved)
U (unobserved)
X
Z
Smoking
Tar in
Lungs
Y
Cancer
X=x
Smoking
Y
Z
Tar in
Lungs
Cancer
Pre-intervention
Post-intervention
P( x, y, z )   P(u )P( x | u )P( z | x )P( y | z, u )
P( y, z | do( x ))   P(u )P( z | x )P( y | z, u )
u
•
u
RAMIFICATIONS OF THE
FUNDAMENTAL THEOREM
Given P(x,y,z), should we ban smoking?
U (unobserved)
U (unobserved)
X
Z
Smoking
Tar in
Lungs
Y
Cancer
X=x
Smoking
Y
Z
Tar in
Lungs
Cancer
Pre-intervention
Post-intervention
P( x, y, z )   P(u )P( x | u )P( z | x )P( y | z, u )
P( y, z | do( x ))   P(u )P( z | x )P( y | z, u )
u
u
To compute P(y,z|do(x)), we must eliminate u. (Graphical problem.)
THE BACK-DOOR CRITERION
Graphical test of identification
P(y | do(x)) is identifiable in G if there is a set Z of
variables such that Z d-separates X from Y in Gx.
G
Z1
Gx
Z1
Z2
Z3
•
Z2
Z3
Z4
X
Z
Z6
Z5
Y
Z4
X
Z6
Z5
Y
THE BACK-DOOR CRITERION
Graphical test of identification
P(y | do(x)) is identifiable in G if there is a set Z of
variables such that Z d-separates X from Y in Gx.
G
Z1
Gx
Z1
Z2
Z3
Z2
Z3
Z4
X
Z
Z6
Z5
Y
Z4
X
Z6
Moreover, P(y | do(x)) =  P(y | x,z) P(z)
z
(“adjusting” for Z)
Z5
Y
RULES OF CAUSAL CALCULUS
Rule 1: Ignoring observations
P(y | do{x}, z, w) = P(y | do{x}, w)
if (Y  Z|X,W )G
Rule 2: Action/observation exchange
X
P(y | do{x}, do{z}, w) = P(y | do{x},z,w)
if (Y  Z|X,W )G
Rule 3: Ignoring actions
XZ
P(y | do{x}, do{z}, w) = P(y | do{x}, w)
if (Y  Z|X,W )G
X Z(W)
DERIVATION IN CAUSAL CALCULUS
Genotype (Unobserved)
Smoking
Tar
Cancer
P (c | do{s}) = t P (c | do{s}, t) P (t | do{s})
Probability Axioms
= t P (c | do{s}, do{t}) P (t | do{s})
Rule 2
= t P (c | do{s}, do{t}) P (t | s)
Rule 2
= t P (c | do{t}) P (t | s)
Rule 3
= st P (c | do{t}, s) P (s | do{t}) P(t |s) Probability Axioms
= st P (c | t, s) P (s | do{t}) P(t |s)
Rule 2
= s t P (c | t, s) P (s) P(t |s)
Rule 3
OUTLINE
• Modeling: Statistical vs. Causal
• Causal models and identifiability
• Inference to three types of claims:
1. Effects of potential interventions,
2. Claims about attribution (responsibility)
3.
DETERMINING THE CAUSES OF EFFECTS
(The Attribution Problem)
•
•
Your Honor! My client (Mr. A) died BECAUSE
he used that drug.
DETERMINING THE CAUSES OF EFFECTS
(The Attribution Problem)
•
•
Your Honor! My client (Mr. A) died BECAUSE
he used that drug.
Court to decide if it is MORE PROBABLE THAN
NOT that A would be alive BUT FOR the drug!
P(? | A is dead, took the drug) > 0.50
THE PROBLEM
Theoretical Problems:
1. What is the meaning of PN(x,y):
“Probability that event y would not have occurred if
it were not for event x, given that x and y did in fact
occur.”
•
THE PROBLEM
Theoretical Problems:
1. What is the meaning of PN(x,y):
“Probability that event y would not have occurred if
it were not for event x, given that x and y did in fact
occur.”
Answer:
PN ( x, y )  P(Yx'  y' | x, y )
P(Yx'  y' , X  x,Y  y )

P( X  x,Y  y )
THE PROBLEM
Theoretical Problems:
1. What is the meaning of PN(x,y):
“Probability that event y would not have occurred if
it were not for event x, given that x and y did in fact
occur.”
2. Under what condition can PN(x,y) be learned from
statistical data, i.e., observational, experimental
and combined.
WHAT IS INFERABLE FROM
EXPERIMENTS?
Simple Experiment:
Q = P(Yx= y | z)
Z nondescendants of X.
Compound Experiment:
Q = P(YX(z) = y | z)
Multi-Stage Experiment:
etc…
CAN FREQUENCY DATA DECIDE
LEGAL RESPONSIBILITY?
Deaths (y)
Survivals (y)
•
•
•
•
Experimental
do(x) do(x)
16
14
984
986
1,000 1,000
Nonexperimental
x
x
2
28
998
972
1,000 1,000
Nonexperimental data: drug usage predicts longer life
Experimental data: drug has negligible effect on survival
Plaintiff: Mr. A is special.
1. He actually died
2. He used the drug by choice
Court to decide (given both data):
Is it more probable than not that A would be alive
but for the drug?
PN 
 P(Yx'  y' | x, y )  0.50
TYPICAL THEOREMS
(Tian and Pearl, 2000)
•
Bounds given combined nonexperimental and
experimental data
0


 1 
 P( y )  P( y ) 
 P( y' ) 
x'
x'
max 

PN

min



P( x,y )


 P( x,y ) 




•
Identifiability under monotonicity (Combined data)
P( y|x )  P( y|x' ) P( y|x' )  P( y x' )
PN 

P( y|x )
P( x,y )
corrected Excess-Risk-Ratio
SOLUTION TO THE ATTRIBUTION
PROBLEM (Cont)
•
•
•
WITH PROBABILITY ONE P(yx | x,y) =1
From population data to individual case
Combined data tell more that each study alone
OUTLINE
• Modeling: Statistical vs. Causal
• Causal models and identifiability
• Inference to three types of claims:
1. Effects of potential interventions,
2. Claims about attribution (responsibility)
3. Claims about direct and indirect effects
•
QUESTIONS ADDRESSED
• What is the semantics of direct and
indirect effects?
• Can we estimate them from data?
Experimental data?
WHY DECOMPOSE
EFFECTS?
1. Direct (or indirect) effect may be more transportable.
2. Indirect effects may be prevented or controlled.

Pill
Pregnancy
+ +
Thrombosis
3. Direct (or indirect) effect may be forbidden
Gender
Qualification
Hiring
TOTAL, DIRECT, AND INDIRECT
EFFECTS HAVE SIMPLE SEMANTICS
IN LINEAR MODELS
b
X
a
Z
c
z = bx + 1
y = ax + cz + 2
Y


TE 
E(Y | do( x ))  a + bc
x


DE 
E(Y | do( x ), do( z ))  a
x
IE 
 TE  DE  bc
Z - independen t
SEMANTICS BECOMES NONTRIVIAL
IN NONLINEAR MODELS
(even when the model is completely specified)
X
Z
z = f (x, 1)
y = g (x, z, 2)
Y


TE 
E(Y | do( x ))
x


DE 
E(Y | do( x ), do( z ))
x
IE 
 ????
Dependent on z?
Void of operational meaning?
THE OPERATIONAL MEANING OF
DIRECT EFFECTS
X
Z
z = f (x, 1)
y = g (x, z, 2)
Y
“Natural” Direct Effect of X on Y:
The expected change in Y per unit change of X, when we
keep Z constant at whatever value it attains before the
change.
E[Yx1Z x  Yx0 ]
0
In linear models, NDE = Controlled Direct Effect
POLICY IMPLICATIONS
(Who cares?)
indirect
What is the direct effect of X on Y?
The effect of Gender on Hiring if sex discrimination
is eliminated.
GENDER X
IGNORE
Z QUALIFICATION
f
Y HIRING
THE OPERATIONAL MEANING OF
INDIRECT EFFECTS
X
Z
z = f (x, 1)
y = g (x, z, 2)
Y
“Natural” Indirect Effect of X on Y:
The expected change in Y when we keep X constant, say
at x0, and let Z change to whatever value it would have
under a unit change in X.
E[Yx0 Z x  Yx0 ]
1
In linear models, NIE = TE - DE
LEGAL DEFINITIONS TAKE THE
NATURAL CONCEPTION
(FORMALIZING DISCRIMINATION)
``The central question in any employment-discrimination
case is whether the employer would have taken the
same action had the employee been of different race
(age, sex, religion, national origin etc.) and everything
else had been the same’’
[Carson versus Bethlehem Steel Corp. (70 FEP Cases 921,
7th Cir. (1996))]
x = male, x = female
y = hire, y = not hire
z = applicant’s qualifications
NO DIRECT EFFECT
Yx'Z  Yx,
x
YxZ
x'
 Yx'
SEMANTICS AND IDENTIFICATION
OF NESTED COUNTERFACTUALS
Consider the quantity
Q
 Eu [YxZ x * (u ) (u )]
Given M, P(u), Q is well defined
Given u, Zx*(u) is the solution for Z in Mx*, call it z
YxZ (u ) (u ) is the solution for Y in Mxz
x*
 experiment al 
Can Q be estimated from 
 data?
nonexperim ental 
GRAPHICAL CONDITION FOR
EXPERIMENTAL IDENTIFICATION
OF AVERAGE NATURAL DIRECT EFFECTS
Theorem: If there exists a set W such that
(Y  Z | W )G XZ and W  ND( X  Z )
NDE ( x, x*;Y )   E (Yxz | w)  E (Yx*z | w)P( Z x*  z | w) P( w)
w, z
Example:
HOW THE PROOF GOES?
Proof:


NDE  x, x*;Y   E Yx , Z x*  E (Yx* )
If W
Yxz  Z x* | W for all z and x
E (Yx, Z x* )    E (Yxz | Z x*  z ,W  w)
wz
P( Z x*  z | W  w) P(W  w)
E (Yx, Z x* )    E (Yxz  Y | W  w)
w z
P( Z x*  z | W  w) P(W  w)
Each factor is identifiable by experimentation.
GRAPHICAL CRITERION FOR
COUNTERFACTUAL INDEPENDENCE
Yxz  Z x* | W for all z and x
U3
U1  U 2
U3
U2
X
Z
Y
U2
Z
X
U1
U1
Y
U3
(Y  Z | W )G XZ
U1  U 2
U2
Z
X
Y
G XZ
U1
GRAPHICAL CONDITION FOR
NONEXPERIMENTAL IDENTIFICATION
OF AVERAGE NATURAL DIRECT EFFECTS
NDE ( x, x*;Y )
  E (Yxz | w)  E (Yx*z | w)P( Z x*  z | w) P( w)
w, z
Identification conditions
1. There exists a W such that (Y 
 Z | W)GXZ
2. There exist additional covariates that render all
counterfactual terms identifiable.
IDENTIFICATION IN
MARKOVIAN MODELS
Corollary 3:
The average natural direct effect in Markovian models is
identifiable from nonexperimental data, and it is given by
NDE ( x, x*;Y )   [ E (Y | x, z )  E (Y | x*, z )]P( Z x*  z )
z
X
Z
Y
NDE ( x, x*;Y )   E (Y | x, z )  E ( y | x*, z )P( z | x*)
z
RELATIONS BETWEEN TOTAL,
DIRECT, AND INDIRECT EFFECTS
Theorem 5: The total, direct and indirect effects obey
The following equality
TE ( x, x*;Y )  NDE ( x, x*;Y )  NIE ( x*, x;Y )
In words, the total effect (on Y) associated with the
transition from x* to x is equal to the difference
between the direct effect associated with this transition
and the indirect effect associated with the reverse
transition, from x to x*.
GENERAL PATH-SPECIFIC
EFFECTS (Def.)
x*
X
W
Z
X
W
Z
z* = Zx* (u)
Y
Y
Form a new model, M g* , specific to active subgraph g
fi* ( pai , u; g )  fi ( pai ( g ), pai*( g ),u )
Definition: g-specific effect
E g ( x, x* ;Y )M  TE( x, x* ;Y )
M g*
Nonidentifiable even in Markovian models
ANSWERS TO QUESTIONS
• Graphical conditions for estimability from
experimental / nonexperimental data.
• Graphical conditions hold in Markovian models
•
ANSWERS TO QUESTIONS
• Graphical conditions for estimability from
experimental / nonexperimental data.
• Graphical conditions hold in Markovian models
• Useful in answering new type of policy questions
involving mechanism blocking instead of variable
fixing.
THE OVERRIDING THEME
1.
2.
Define Q(M) as a counterfactual expression
Determine conditions for the reduction
Q( M )  Pexp ( M ) or Q( M )  P( M )
3.
If reduction is feasible, Q is inferable.
• Demonstrated on three types of queries:
Q1: P(y|do(x)) Causal Effect (= P(Yx=y))
Q2: P(Yx = y | x, y) Probability of necessity
Q3: E(Yx ) Direct Effect
Z x'
OUTLINE
• Modeling: Statistical vs. Causal
• Causal Models and Identifiability
• Inference to three types of claims:
1. Effects of potential interventions
2. Claims about attribution (responsibility)
3. Claims about direct and indirect effects
• Actual Causation and Explanation
• Robustness of Causal Claims
ROBUSTNESS:
MOTIVATION
Genetic Factors (unobserved)
u
x
a
Smoking
In linear systems: y = ax + u
a is identifiable.
a = Ryx
y
Cancer
cov (x,u) = 0
ROBUSTNESS:
MOTIVATION
Genetic Factors (unobserved)
u
x
Smoking
y
Cancer
The claim aRyx is sensitive to the assumption
cov (x,u) = 0.
a is non-identifiable if cov (x,u) ≠ 0.
ROBUSTNESS:
MOTIVATION
Z
Price of
Cigarettes
Genetic Factors (unobserved)
u
b
a
x
Smoking
y
Cancer
Z – Instrumental variable; cov(z,u) = 0
a is identifiable, even if cov (x,u) ≠ 0
R yz
R yz  a b
a

Rxz  b 
Rxz
ROBUSTNESS:
MOTIVATION
Z
Price of
Cigarettes
Genetic Factors (unobserved)
u
b
a
x
Smoking
a 0  R yx
y
Cancer
a1 
R yz
Rxz
Suppose a 0  a1
Claim “a = Ryx” is likely to be true
ROBUSTNESS:
MOTIVATION
Z1
Price of
Cigarettes
Genetic Factors (unobserved)
u
b
a
g
Z2
Peer
Pressure
y
x
Smoking
Cancer
Invoking several instruments
a 0  R yx
a1 
R yz1
Rxz1
a2 
R yz2
Rxz2
If a0 = a1 = a2, claim “a = a0” is more likely correct
ROBUSTNESS:
MOTIVATION
Z1
Price of
Cigarettes
Z2
Peer
Pressure
Genetic Factors (unobserved)
u
b
a
g
x
Smoking
y
Cancer
Z3
Anti-smoking Legislation
Zn
Greater surprise: a1 = a2 = a3….= an = q
Claim a = q is highly likely to be correct
ROBUSTNESS:
MOTIVATION
Given a parameter a in a general graph
a
x
y
Assume we have several independent estimands
of a, and
a1 = a2 = …an
Find the degree to which a is robust to violations
of model assumptions
ROBUSTNESS:
ATTEMPTED FORMULATION
Bad attempt:
if:
f1, f2:
Parameter a is robust (over identifies)
a  f1()
a  f 2 ()
Two distinct functions
if model induces constraint g ()  0, then
a  f ()  t1[ g ()]  f ()  t2[ g ()]  
ti [ g ()] are distinct.
ROBUSTNESS:
MOTIVATION
Genetic Factors (unobserved)
u
x
a
Smoking
y
Cancer
a 0  R yx
Rsx
a1 
Rsy
Is a robust if a0 = a1?
b
s
Symptom
b  Rsy,ab  Rsx 
ROBUSTNESS:
MOTIVATION
Genetic Factors (unobserved)
u
x
Smoking
a
y
Cancer
b
s
Symptom
Symptoms do not act as instruments
a remains non-identifiable if cov (x,u) ≠ 0
Why? Taking a noisy measurement (s) of an
observed variable (y) cannot add new information
ROBUSTNESS:
MOTIVATION
Genetic Factors (unobserved)
Sn
u
S2
a
x
y
Smoking
Cancer
S1
Symptom
Adding many symptoms does not help.
a remains non-identifiable
INDEPENDENT:
BASED ON DISTINCT SETS OF ASSUMPTION
u
u
a
z
x
a
y
x
y
z
Estimand
Assumptioms
Estimand
Assumptioms
a 0  R yx
R yz
a1 
Rxz
cov( x, u )  0
a 0  R yx
Rzx
a1 
Rzy
cov( x, u )  0
others
cov( x, u )  0
 others
RELEVANCE:
FORMULATION
Definition 8
Let A be an assumption embodied in model M,
and p a parameter in M. A is said to be relevant
to p if and only if there exists a set of assumptions
S in M such that S and A sustain the identification
of p but S alone does not sustain such
identification.
Theorem 2
An assumption A is relevant to p if and only if A is a
member of a minimal set of assumptions sufficient
for identifying p.
ROBUSTNESS:
FORMULATION
Definition 5 (Degree of over-identification)
A parameter p (of model M) is identified to
degree k (read: k-identified) if there are k
minimal sets of assumptions each yielding a
distinct estimand of p.
ROBUSTNESS:
FORMULATION
b
c
x
y
Minimal assumption sets for c.
x
c
y
G1
z
x
c
y
z
c
z x
y
G3
G2
Minimal assumption sets for b.
x
b
y
z
z
FROM MINIMAL ASSUMPTION SETS
TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS
Definition
A claim C is identified to degree k in model M (graph
G), if there are k edge supergraphs of G that permit the
identification of C, each yielding a distinct estimand.
e.g., Claim: (Total effect) TE(x,z) = q
x
y
TE(x,z) = Rzx
z x
x
y
y
z
TE(x,z) = Rzx Rzy ·x
z
FROM MINIMAL ASSUMPTION SETS
TO MAXIMAL EDGE SUPERGRAPHS
FROM PARAMETERS TO CLAIMS
Definition
A claim C is identified to degree k in model M (graph
G), if there are k edge supergraphs of G that permit the
identification of C, each yielding a distinct estimand.
e.g., Claim: (Total effect) TE(x,z) = q
x
Nonparametric
y
TE ( x, z )  P( z | x)
z x
x
y
z
z
y
TE ( z , x)   P( y | x) P( z | x' , y ) P( x' )
y
x'
SUMMARY OF
ROBUSTNESS RESULTS
1. Formal definition to ROBUSTNESS of
causal claims:
“A claim is robust when it is insensitive to
violations of some of the model
assumptions relevant to substantiating
that claim.”
2. Graphical criteria and algorithms for
computing the degree of robustness of a
given causal claim.
CONCLUSIONS
Structural-model semantics enriched
with logic + graphs leads to formal
interpretation and practical assessments
of wide variety of causal and counterfactual
relationships.
Download