
Modern Approach to Causal Inference

Brian C. Sauer, PhD MS

SLC VA Career Development Awardee

About Me

• SLC VA Career Development Awardee

• PhD in Pharmacoepidemiology from the College of Pharmacy at the University of Florida

• MS in Biomedical Informatics from the University of Utah

• Assistant Research Professor in the Division of Epidemiology, Department of Internal Medicine

Acknowledgments

• Mentors

– Matthew Samore, MD

– Tom Greene, PhD

– Jonathan Nebeker, MD

• Simulation

– Chen Wang, MS statistics

• Primary References:

– Causal Inference Book: Jamie Robins & Miguel Hernán

• http://www.hsph.harvard.edu/faculty/miguel-hernan/causalinference-book/

– Modern Epidemiology, 3rd Ed., Chapter 12. Rothman, Greenland, Lash

Outline

• Causal Inference and the counterfactual framework

• Exchangeability & conditional exchangeability

• Use of Directed Acyclic Graphs (DAGs) to identify a minimal set of covariates to remove confounding.

Key Learning Points

• Understand the rationale for

– randomized controlled trials.

– covariate selection in observational research.

• Identify the minimal set of covariates needed to produce unbiased effect estimates.

• Develop terminology and language to describe these ideas with precision.

• Become familiar with the notation for causal inference, which is often a barrier to reading this literature.

Counterfactual Framework

• Neyman (1923)

– Effects of point exposures in randomized experiments

• Rubin (1974)

– Effects of point exposures in randomized and observational studies (potential outcomes and the Rubin Causal Framework)

• Robins (1986)

– Effects of time-varying exposures in randomized and observational studies. (counterfactuals)

Counterfactual

Working Example:

• Zeus took the heart pill; 5 days later he died

• Had he not taken the heart pill, he would still have been alive on that 5th day

– that is, all other things being equal

• Did the pill cause Zeus's death?

Counterfactual

Working Example:

• Hera did not take the pill

– 5 days later she was alive

• Had she taken the pill, she would still have been alive 5 days later

• Did the pill cause Hera's survival?

Gettysburg: A Novel of the Civil War

• Newt Gingrich and

William R Forstchen

• Historical Fiction

– Imagines how the war would have ended had there been a Confederate victory at Gettysburg

Notation for Actual Data

• Y = 1 if patient died, 0 otherwise

– $Y_z = 1$, $Y_h = 0$

• A = 1 if patient treated, 0 otherwise

– $A_z = 1$, $A_h = 0$

Pat ID   A   Y
Zeus     1   1
Hera     0   0

Notation for Ideal Data

Outcome under No Treatment

• $Y^{a=0} = 1$ if the subject would have died had he not taken the pill, 0 otherwise

– $Y_z^{a=0} = 0$, $Y_h^{a=0} = 0$

Outcome under Treatment

• $Y^{a=1} = 1$ if the subject would have died had he taken the pill, 0 otherwise

– $Y_z^{a=1} = 1$, $Y_h^{a=1} = 0$

Pat ID   A   Y^{a=0}   Y^{a=1}
Zeus     1   0         1
Hera     0   0         0

Available Research Data Set

ID        A   Y   Y^{a=0}   Y^{a=1}
Zeus      1   1   ?         1
Hera      0   0   0         ?
Apollo    1   0   ?         0
Cyclope   0   0   0         ?

(Individual) Causal Effect

Formal definition of causal effects:

– For Zeus: the pill has a causal effect because

- $Y_z^{a=1} \neq Y_z^{a=0}$ (1 vs. 0)

– For Hera: the pill does not have a causal effect because

- $Y_h^{a=1} = Y_h^{a=0}$ (both 0)

Average Causal Effects

Formal Definition of Average Causal Effects

• In the population, exposure A has a causal effect on the outcome Y if

- $\Pr[Y^{a=1}=1] \neq \Pr[Y^{a=0}=1]$

Causal null hypothesis holds if

- $\Pr[Y^{a=1}=1] = \Pr[Y^{a=0}=1]$

- equivalently, for binary Y, $E[Y^{a=1}] = E[Y^{a=0}]$

Representation of causal null

Causal effects can be measured on many scales

• Risk difference: $\Pr[Y^{a=1}=1] - \Pr[Y^{a=0}=1] = 0$

• Risk ratio: $\Pr[Y^{a=1}=1] \div \Pr[Y^{a=0}=1] = 1$

• Odds ratio, hazard ratio, etc.
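As a minimal sketch of how these scales relate, the function below computes the causal risk difference, risk ratio, and odds ratio from two lists of binary counterfactual outcomes. The function name and the four-person population are hypothetical, and the code assumes both counterfactual risks lie strictly between 0 and 1 so that no division by zero occurs.

```python
def effect_measures(y_a1, y_a0):
    """Causal RD, RR, and OR from binary counterfactual outcomes.

    y_a1[i] is individual i's outcome under treatment (Y^{a=1}),
    y_a0[i] is the same individual's outcome under no treatment (Y^{a=0}).
    Assumes both risks are strictly between 0 and 1.
    """
    risk1 = sum(y_a1) / len(y_a1)  # Pr[Y^{a=1} = 1]
    risk0 = sum(y_a0) / len(y_a0)  # Pr[Y^{a=0} = 1]
    odds1 = risk1 / (1 - risk1)
    odds0 = risk0 / (1 - risk0)
    return {"RD": risk1 - risk0, "RR": risk1 / risk0, "OR": odds1 / odds0}


# Hypothetical population of four in which the causal null holds:
# both counterfactual risks are 0.5.
null_case = effect_measures(y_a1=[1, 0, 1, 0], y_a0=[0, 1, 1, 0])
print(null_case)  # RD is 0 and RR and OR are 1 under the null
```

All three measures sit at their null values together: RD = 0 exactly when RR = 1 and OR = 1.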

Average Causal Effects

No average causal effect:

• Risk difference: $\Pr[Y^{a=1}=1] - \Pr[Y^{a=0}=1] = 0$

• Risk ratio: $\Pr[Y^{a=1}=1] \div \Pr[Y^{a=0}=1] = 1$

• Are there individual Causal Effects?

[Table: counterfactual outcomes $Y^{a=0}$ and $Y^{a=1}$ for 20 individuals: Rheia, Kronos, Demeter, Hades, Hestia, Poseidon, Hera, Zeus, Artemis, Apollo, Leto, Ares, Athena, Hephaestus, Aphrodite, Cyclope, Persephone, Hermes, Hebe, Dionysus. Ten of the 20 die under each treatment level, so $\Pr[Y^{a=1}=1] = \Pr[Y^{a=0}=1] = 0.5$.]
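The question of whether individual effects can exist under a null average effect can be checked with a small sketch. Zeus and Hera take the counterfactual values given on the earlier slides; "Patient C" and "Patient D" are hypothetical additions chosen so that the average effect is null.

```python
# (Y^{a=0}, Y^{a=1}) per individual. Zeus and Hera follow the earlier slides;
# "Patient C" and "Patient D" are hypothetical.
potential = {
    "Zeus": (0, 1),       # harmed by treatment
    "Hera": (0, 0),       # no individual effect
    "Patient C": (1, 0),  # helped by treatment
    "Patient D": (1, 1),  # no individual effect
}

n = len(potential)
risk_a0 = sum(y0 for y0, _ in potential.values()) / n  # Pr[Y^{a=0} = 1]
risk_a1 = sum(y1 for _, y1 in potential.values()) / n  # Pr[Y^{a=1} = 1]

# Individuals whose outcome changes with treatment
affected = [name for name, (y0, y1) in potential.items() if y0 != y1]

print("average causal risk difference:", risk_a1 - risk_a0)
print("individuals with a causal effect:", affected)
```

Here the average causal risk difference is zero even though two of the four individuals have a causal effect, because the harmful and protective individual effects cancel.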

Causal Effects

Definition

• Causal effects are calculated by contrasting counterfactual risks within the population.

• Counterfactual contrasts are by definition causal effects.

Associational Measures

Observed in real world – association ≠ causation

• $\Pr[Y=1 \mid A=1] = \Pr[Y=1 \mid A=0]$: treatment A and outcome Y are independent

• The strength of association can also be quantified: risk difference, risk ratio, OR, HR, etc.

• $\Pr[Y=1 \mid A=1] - \Pr[Y=1 \mid A=0] = 0$

• $\Pr[Y=1 \mid A=1] \div \Pr[Y=1 \mid A=0] = 1$
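A short sketch of the associational counterparts: unlike the counterfactual contrasts, these compare the observed risk in two different groups of people. The function name and the eight records are hypothetical, and the code assumes both groups are non-empty and the untreated risk is non-zero.

```python
def associational_measures(records):
    """Associational RD and RR from observed (A, Y) pairs.

    Compares Pr[Y=1 | A=1] with Pr[Y=1 | A=0]; each risk is computed
    in a different subset of individuals.
    """
    treated = [y for a, y in records if a == 1]
    untreated = [y for a, y in records if a == 0]
    risk_treated = sum(treated) / len(treated)        # Pr[Y=1 | A=1]
    risk_untreated = sum(untreated) / len(untreated)  # Pr[Y=1 | A=0]
    return risk_treated - risk_untreated, risk_treated / risk_untreated


# Hypothetical observed data: 4 treated (3 deaths), 4 untreated (1 death)
observed = [(1, 1), (1, 1), (1, 1), (1, 0), (0, 1), (0, 0), (0, 0), (0, 0)]
rd, rr = associational_measures(observed)
print(rd, rr)  # 0.5 3.0
```

Whether these numbers can be read as causal effects depends on exchangeability, not on the arithmetic.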

Causation vs. Association

The key conceptual difference:

• A causal effect is defined as a comparison of the same subjects under different actions

– Assumes the counterfactual approach

– Everyone simultaneously treated and untreated

– Marginal effects

• Association is defined as a comparison of different subjects under different conditions

– Effects conditional on treatment assignment group


Causation?

Question:

• Under what conditions can associational measures be used to estimate causal effects?

Answer:

• Ideal randomized experiments

Randomized Experiments

• Generate data with missing counterfactual outcomes

• The missing counterfactual outcomes are missing completely at random (MCAR)

• Because of this, causal effects can be consistently estimated with ideal RCTs despite the missing data

Ideal Randomized Experiments

Exchangeability:

• Risk under the potential treatment value a among the treated equals the risk under the potential treatment value a among the untreated

• $\Pr[Y^{a}=1 \mid A=1] = \Pr[Y^{a}=1 \mid A=0]$

• Because these conditional risks are equal in all subsets defined by treatment status, they must equal the marginal risk under treatment value a in the whole population, $\Pr[Y^{a}=1]$
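The step from exchangeability to "association is causation" can be written out explicitly. The sketch below additionally assumes consistency (an individual with $A = a$ has observed outcome $Y = Y^{a}$), which the slides introduce later as an identifiability condition.

```latex
\begin{align*}
\Pr[Y^{a}=1]
  &= \Pr[Y^{a}=1 \mid A=a] && \text{(exchangeability: } Y^{a} \perp\!\!\!\perp A\text{)} \\
  &= \Pr[Y=1 \mid A=a]     && \text{(consistency: } Y = Y^{a} \text{ when } A=a\text{)}
\end{align*}
% Hence the causal and associational risk differences coincide:
\[
\Pr[Y^{a=1}=1] - \Pr[Y^{a=0}=1] \;=\; \Pr[Y=1 \mid A=1] - \Pr[Y=1 \mid A=0].
\]
```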

Population of Interest

Key Issue

• In the presence of exchangeability the counterfactual risk under treatment in the white part of the population would equal the counterfactual risk under treatment in the entire population.

RCT?

• A= heart transplant

• Y= death

• L=prognostic factor

• Counts

– 13 of 20 (65%) gods were treated

– 9 of the 12 (75%) with the prognostic factor (L=1) were treated

– 3 of the 12 (25%) with the prognostic factor were not treated

[Table: outcome Y, treatment A, and prognostic factor L for 20 patients: Rheia, Kronos, Demeter, Hades, Hestia, Poseidon, Hera, Zeus, Artemis, Apollo, Leto, Ares, Athena, Eros, Aphrodite, Cyclope, Persephone, Hermes, Hebe, Dionysus. L = 0 for 8 patients and L = 1 for 12.]

RCT?

• Design 1:

– 13 of 20 treated: Randomly selected 65% for treatment

• Design 2:

– 9 out of 12 in critical condition were treated (75%)

– 4 out of 8 not in critical condition were treated (50%)


Conditionally RCT

• Simply a combination of 2 marginally randomized experiments

• One conducted in the subset of the population with L=0 and the other in the subset with L=1

• Missing values are not MCAR, but they are MAR conditional on the covariate L

• Marginal exchangeability is not achieved

• Randomization generates conditional exchangeability

Analysis Randomized Trials

• Question 1: How do you typically analyze a marginally randomized trial?

• Hint: Dependent and independent variables?

• Answer: Crude (unadjusted) analysis of treatment and outcome.

Analysis Randomized Trials

• Question 2: How do you typically analyze a conditionally randomized trial?

• Answer 2:

– Robins recommends standardization and IPW

– Stratification-type methods are common

– There are conditions where standardization ≠ stratification. Review their text.
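A minimal sketch of standardization and inverse probability weighting (IPW) for a conditionally randomized design with one binary covariate L. The stratum sizes and treatment counts match the slides (4 of 8 treated when L=0, 9 of 12 treated when L=1); the outcome values and all function names are illustrative assumptions, not the slide data.

```python
from collections import defaultdict

# (L, A, Y) records. Treatment counts per stratum follow the slides;
# the outcome values are illustrative assumptions.
data = (
    [(0, 0, 0)] * 3 + [(0, 0, 1)] * 1 +
    [(0, 1, 0)] * 3 + [(0, 1, 1)] * 1 +
    [(1, 0, 0)] * 1 + [(1, 0, 1)] * 2 +
    [(1, 1, 0)] * 3 + [(1, 1, 1)] * 6
)


def crude_risk(data, a):
    """Pr[Y=1 | A=a], ignoring L."""
    ys = [y for _, ai, y in data if ai == a]
    return sum(ys) / len(ys)


def standardized_risk(data, a):
    """sum over l of Pr[Y=1 | A=a, L=l] * Pr[L=l]."""
    by_l = defaultdict(list)
    for l, ai, y in data:
        by_l[l].append((ai, y))
    total = 0.0
    for rows in by_l.values():
        ys = [y for ai, y in rows if ai == a]  # non-empty under positivity
        total += (sum(ys) / len(ys)) * (len(rows) / len(data))
    return total


def ipw_risk(data, a):
    """Weighted risk among those with A=a, using weights 1 / Pr[A=a | L]."""
    n_l = defaultdict(int)   # stratum sizes
    n_al = defaultdict(int)  # number with A=a in each stratum
    for l, ai, _ in data:
        n_l[l] += 1
        if ai == a:
            n_al[l] += 1
    num = den = 0.0
    for l, ai, y in data:
        if ai == a:
            weight = n_l[l] / n_al[l]  # inverse of the estimated Pr[A=a | L=l]
            num += weight * y
            den += weight
    return num / den


crude_rr = crude_risk(data, 1) / crude_risk(data, 0)
std_rr = standardized_risk(data, 1) / standardized_risk(data, 0)
ipw_rr = ipw_risk(data, 1) / ipw_risk(data, 0)
print(f"crude RR: {crude_rr:.2f}")
print(f"standardized RR: {std_rr:.2f}")
print(f"IPW RR: {ipw_rr:.2f}")
```

With these illustrative outcomes the crude risk ratio is about 1.26, while standardization and IPW both give 1.00. The two adjusted estimates agree here because both use nonparametric estimates within levels of a single binary L.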

Summary

• Randomization produces

– Marginal exchangeability

– Conditional exchangeability

• Exchangeability

– Allows us to use associational measures to estimate causal effects

Observational Studies

• Investigator has no control over treatment assignment (e.g., cannot randomize)

• Cannot achieve exchangeability by design

• To estimate a causal contrast we must obtain valid observable substitute quantities for the desired counterfactual quantities

• If we don't have good substitutes, then we have a confounded relationship, i.e., the associational RR ≠ the causal RR

Observational Studies

Conceptual justification

• Conceptualize observational studies as though they are conditionally randomized experiments.

• We assume that some components of the observational study happen by chance.

Identifiability Conditions

• Consistency: treatment levels are not assigned by the researcher, but correspond to well-defined interventions

• Positivity: all conditional probabilities of treatment are greater than zero

• Conditional exchangeability: the conditional probability of receiving each treatment level is not chosen by the investigator, but depends only on the measured covariates and can be calculated from the data
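In the counterfactual notation used earlier, the three conditions are commonly written as follows. This is a sketch for binary treatment A and measured covariates L, with the symbol $\perp\!\!\!\perp$ denoting statistical independence.

```latex
\begin{align*}
\text{Consistency:} \quad
  & Y = Y^{a} \ \text{ for every individual with } A = a \\
\text{Positivity:} \quad
  & \Pr[A = a \mid L = l] > 0 \ \text{ for all } a \text{ and all } l \text{ with } \Pr[L = l] > 0 \\
\text{Conditional exchangeability:} \quad
  & Y^{a} \perp\!\!\!\perp A \mid L \ \text{ for all } a
\end{align*}
```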

Observational Studies?

Causal Inference

• Exchangeability and conditional exchangeability cannot be reached by design.

• Question 1: How do we address conditional exchangeability in observational studies?

• Question 2: How should we pick covariates for our observational studies?

Big Picture

• Covariates should be selected to produce conditional exchangeability

• Confounding must be removed to produce conditional exchangeability

– A variable whose adjustment removes confounding is a confounder

• Adjusting for certain types of covariates can block paths, open paths, or do nothing

• We want to adjust for variables that block all backdoor paths between the treatment and outcome, i.e., remove confounding.

Theory of Causal DAGs

• Mathematically formalized by

– Pearl (1988, 1995, 2000)

– Spirtes, Glymour, and Scheines (1993, 2000)

Directed Acyclic Graphs

• Are abstract mathematical objects.

• Encode an investigator's a priori assumptions about the causal relations among the exposure, outcomes and covariates.

• They represent:

– joint probability distributions

– causal structures.

Value of DAGs

• Support communication among researchers and clinicians

• Make explicit our beliefs and background knowledge about causal structures

• Allow us to determine what needs to be measured to remove confounding

• Help us determine how bias can be induced

• Help us choose appropriate statistical methods

Directed Acyclic Graphs (DAGs)

• Directed edges (arrows) link nodes (variables)

• Variables joined by an arrow are said to be adjacent, or neighbors

• Acyclic because there are no arrows from descendants (effects) to ancestors (causes)

• Descendants of variable X are variables affected either directly or indirectly by X

• Ancestors of X are all the variables that affect X directly or indirectly

• Paths between two variables can be directed or undirected

d-separation Criteria

• Rules linking absence of open paths to statistical independencies

• Describe expected data distributions if the causal structure represented by the graph is correct

• Unconditional d-separation

– Path is open or unblocked if no collider on path

– Collider blocks a path

• d-Connected

– Open path between two variables

Graphical Conditioning

• Conditioning (adjustment) on a collider F on a path, or on any descendant of F, opens the path at F

– U1 and U2 are marginally independent, but conditionally associated (conditioning on F)

• Conditioning on a non-collider C closes the path and removes C as a source of association between A and Y

– A and Y are marginally associated, but conditionally independent (conditioning on C)
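The collider rule can be illustrated with a small simulation. In this sketch U1 and U2 are independent coin flips and F occurs whenever either is present; the 50% probabilities, the deterministic rule for F, and the helper name `risk_u2` are all assumptions made for illustration.

```python
import random

rng = random.Random(0)  # fixed seed so the run is reproducible

# Simulate U1 -> F <- U2 with U1 and U2 independent.
rows = []
for _ in range(20000):
    u1 = int(rng.random() < 0.5)
    u2 = int(rng.random() < 0.5)
    f = int(u1 == 1 or u2 == 1)  # F is a common effect (collider)
    rows.append((u1, u2, f))


def risk_u2(rows, u1_value, f_value=None):
    """Pr[U2=1 | U1=u1_value], optionally also conditioning on F=f_value."""
    selected = [u2 for (u1, u2, f) in rows
                if u1 == u1_value and (f_value is None or f == f_value)]
    return sum(selected) / len(selected)


# Marginal association between U1 and U2 (expected to be near zero)
marginal_diff = risk_u2(rows, 1) - risk_u2(rows, 0)

# Association within the stratum F=1 (conditioning on the collider)
conditional_diff = risk_u2(rows, 1, f_value=1) - risk_u2(rows, 0, f_value=1)

print(f"marginal difference: {marginal_diff:.3f}")
print(f"difference given F=1: {conditional_diff:.3f}")
```

Within F=1, anyone without U1 must have U2, so the two variables become strongly (negatively) associated even though they are independent in the full population.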

Graphical vs. Statistical Criteria for Identifying Confounders

• Statistical: a confounder must

– Be associated with the exposure under study in the source population

– Be a risk factor for the outcome, though it need not actually cause the outcome

– Not be affected by the exposure or the outcome

• Graphical: a confounder must

– Be a common cause

– Have an unblocked back-door path

Unified Theory of Bias

• Bias can be reduced to or explained by 3 structures

– Reverse causation: in case-control studies the outcome precedes exposure measurement, or the outcome can affect the exposure; also measurement error or information bias.

– Common cause: confounding, confounding by indication

– Conditioning on common effects: collider, selection bias, time varying confounding

Covariate Selection

• Adequate Background Knowledge

– Confounder identification must be grounded on an understanding of the causal structure linking the variables being studied (treatment and disease)

– Build a directed acyclic graph (DAG) to check whether the necessary criteria for confounding exist.

– Condition on the minimal set of variables necessary to remove confounding

• Inadequate Background Knowledge

– Remove known instrumental variables, colliders, and intermediates (variables measured after treatment)

– Use automated selection procedures such as the high-dimensional propensity score (HDPS)

Confounding and Bias

• Underadjustment occurs when

– An open backdoor path was not closed

• Overadjustment can occur from adjusting for

– Instrumental variables

– Intermediate variables

– Colliders

– Variables caused by outcome

• Discussion of variable types

Confounder

• Common Cause, i.e., confounder

• Confounder L distorts the effect of treatment A on disease Y

• Always adjust for confounders, unless the data set is small and the confounder has a strong association with treatment and a weak association with the outcome

• Goal is to produce conditional exchangeability

Confounder Example

• A = treatment

a=1 statin alone

a=0 niacin alone

• L = Baseline Cholesterol

l=1: LDL ≥ 160 mg/dL

l=0: LDL < 160 mg/dL

• Y = Myocardial infarction

– Y=1: Yes

– Y=0: No

Intermediate Variable

• Adjusting for intermediate variable I in a fixed covariate model will remove the effect of treatment A on disease/outcome Y

• In a fixed covariate model we do not want to include variables influenced by A or Y

• Time-varying treatment model does include time-varying confounding that is also an intermediate variable

Intermediate Example

• A = treatment

a=1 statin alone

a=0 niacin alone

• I = Post-treatment Cholesterol

i=1: LDL ≥ 160 mg/dL

i=0: LDL < 160 mg/dL

• Y = Myocardial infarction

– Y=1: Yes

– Y=0: No
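A simulation sketch of this example shows why adjusting for an intermediate hides the treatment effect. It assumes treatment affects myocardial infarction only through post-treatment LDL (A → I → Y, with no direct A → Y arrow); every probability below is a made-up illustration value, not an estimate for statins or niacin.

```python
import random

rng = random.Random(1)  # fixed seed so the run is reproducible
N = 100_000

# Made-up illustration values.
P_HIGH_LDL = {1: 0.2, 0: 0.6}  # Pr[I=1 | A=a]: treatment a=1 lowers LDL more
P_MI = {1: 0.3, 0: 0.1}        # Pr[Y=1 | I=i]: Y depends on I only

rows = []
for _ in range(N):
    a = int(rng.random() < 0.5)            # randomized treatment
    i = int(rng.random() < P_HIGH_LDL[a])  # intermediate depends on A
    y = int(rng.random() < P_MI[i])        # outcome depends on I, not on A directly
    rows.append((a, i, y))


def risk(rows, a, i=None):
    """Pr[Y=1 | A=a], or Pr[Y=1 | A=a, I=i] when i is given."""
    ys = [y for (ai, ii, y) in rows if ai == a and (i is None or ii == i)]
    return sum(ys) / len(ys)


# Total effect of A on Y (treatment is randomized, so the crude contrast is causal)
crude_rd = risk(rows, 1) - risk(rows, 0)

# Contrast within levels of the intermediate I
stratum_rd = {i: risk(rows, 1, i) - risk(rows, 0, i) for i in (0, 1)}

print(f"crude RD: {crude_rd:.3f}")
for i, rd in stratum_rd.items():
    print(f"RD within I={i}: {rd:.3f}")
```

Under these assumptions the crude risk difference is clearly negative (treatment a=1 is protective), while the risk difference within each level of I is close to zero: stratifying on the intermediate removes the very effect being studied.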

Collider

• Adjusting for a collider can produce bias

• Conditioning on the common effect F without adjustment for U1 or U2 will induce an association between U1 and U2, which will confound the association between A and Y

Collider

• A = antidepressant use

• Y = lung cancer

• U1 = depression

• U2 = smoking status

• F= cardiovascular disease

Variables associated with treatment or disease only

• Inclusion of variables associated with treatment only (A) can cause bias and imprecision

• Variables associated with disease but not treatment (risk factors) can be included in models. They are expected to decrease variance of treatment effect without increasing bias

• Including variables associated with disease reduces the chance of missing important confounders

Reality is Complicated

Shrier I, Platt RW. Reducing bias through directed acyclic graphs. BMC Medical Research Methodology. 2008;8:70.

Determining Minimal Set of Variables

• Produce a DAG and get clinical experts to agree on underlying causal network

• Block (condition on) variables that lie on open backdoor paths

– Open backdoor paths are the source of confounding

• Pearl's 6-step approach for determining the minimal set of variables (illustrated by Shrier & Platt. Reducing bias through directed acyclic graphs. BMC Medical Research Methodology 2008;8:70)
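For a small DAG, the search for a smallest sufficient adjustment set can be sketched as a brute-force check of the backdoor criterion. This is not the six-step procedure itself. The graph below is an assumption: it combines the collider example (U1 → A, U1 → F, U2 → F, U2 → Y) with a hypothetical common cause L of A and Y, and all function names are illustrative.

```python
from itertools import combinations

# Assumed DAG as (parent, child) pairs; L is a hypothetical confounder.
EDGES = {("U1", "A"), ("U1", "F"), ("U2", "F"), ("U2", "Y"),
         ("L", "A"), ("L", "Y"), ("A", "Y")}
NODES = {node for edge in EDGES for node in edge}


def descendants(node):
    """All variables affected directly or indirectly by `node`."""
    found, stack = set(), [node]
    while stack:
        current = stack.pop()
        for parent, child in EDGES:
            if parent == current and child not in found:
                found.add(child)
                stack.append(child)
    return found


def paths(start, end, visited=()):
    """All simple paths from start to end, ignoring arrow direction."""
    visited = visited + (start,)
    if start == end:
        yield visited
        return
    for u, v in EDGES:
        if start in (u, v):
            nxt = v if u == start else u
            if nxt not in visited:
                yield from paths(nxt, end, visited)


def blocked(path, z):
    """True if conditioning on the set z blocks the path."""
    for prev, mid, nxt in zip(path, path[1:], path[2:]):
        is_collider = (prev, mid) in EDGES and (nxt, mid) in EDGES
        if is_collider:
            # A collider blocks unless it, or one of its descendants, is in z.
            if mid not in z and not (descendants(mid) & z):
                return True
        elif mid in z:
            # A non-collider blocks when it is conditioned on.
            return True
    return False


def satisfies_backdoor(z, a="A", y="Y"):
    """Backdoor criterion: z has no descendant of a and blocks every
    path between a and y that starts with an arrow into a."""
    if z & descendants(a):
        return False
    backdoor_paths = [p for p in paths(a, y) if (p[1], a) in EDGES]
    return all(blocked(p, z) for p in backdoor_paths)


def smallest_adjustment_sets(a="A", y="Y"):
    """All sufficient adjustment sets of the smallest possible size."""
    candidates = sorted(NODES - {a, y})
    for size in range(len(candidates) + 1):
        hits = [set(c) for c in combinations(candidates, size)
                if satisfies_backdoor(set(c), a, y)]
        if hits:
            return hits
    return []


print(smallest_adjustment_sets())       # [{'L'}]
print(satisfies_backdoor({"L", "F"}))   # False: conditioning on collider F opens a path
print(satisfies_backdoor({"L", "F", "U1"}))  # True: U1 re-blocks the opened path
```

Enumerating every path and subset is only practical for small graphs; it is meant to make the rules concrete rather than to replace dedicated DAG software.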

Limitations of DAG approach

• Subject matter knowledge is often not good enough to draw DAG that can be used to determine the minimal set of covariates needed to produce conditional exchangeability

• In large database studies with many providers it is difficult to know all the factors that influence treatment decisions.

Insufficient Background Knowledge

• Recommendation :

– Propensity score (PS) approach :

• Remove colliders and instruments (variables associated with treatment but not disease)

• In a large PS study we should include as many of the remaining variables as possible.

• Focus should be on variables that are a priori thought to be strongly causally related to outcomes (risk factors, confounders)

– Outcome models approach:

• Use a change in estimate approach to select variables

– Since evidence on the best variable selection approaches is limited, researchers should explore the sensitivity of their results to different variable selection strategies, as well as to the removal and inclusion of variables that could be instrumental variables or colliders.

Analysis of Observational Data Based on Counterfactuals

• Fixed treatments

– Propensity Score

– Instrumental Variables

– IPW

• Time-varying treatments (sequential randomization)

– IPW

– G-estimation

– Doubly robust

Simulation

• We have developed simulations to understand and teach these concepts.

• Poster at CDA conference.

• If interested, please contact me: Brian.sauer@va.gov; brian.sauer@utah.edu
