RECONSIDERING DEFINITIONS OF DIRECT AND INDIRECT EFFECTS IN MEDIATION ANALYSIS WITH A SOLUTION FOR A CONTINUOUS FACTOR
Ilya Novikov, Michal Benderly, Laurence Freedman
Gertner Institute for Epidemiology and Health Policy Research, Tel Hashomer, Israel
IBS-EMR 2013

Introduction
• Mediation analysis is a part of causal inference in statistics
• During the last two decades the area was substantially developed by Pearl, Robins, Greenland and others
• We present an implementation of a modified version of Pearl’s approach for different situations
• The modification was proposed recently by several authors, but the implementation for a continuous factor is apparently new

Mediation
Situation
- There are at least two factors (X, Z) affecting the outcome (Y)
- There may be confounders (C) associated with X, Z, and Y
Causal Model
Factor X does not depend on Z or Y.
Mediator Z depends on X but not on Y.
Outcome Y depends on X and Z.
HOWEVER F(X|Z,Y,C) ≠ F(X|C), F(X,Z|Y,C) ≠ F(X,Z|C)
[Diagram: causal graph with nodes X, Z, Y and confounders C]

Effects – concept.
An effect is defined as the difference in the expectations of the outcome in two situations.
Total effect of X on Y = effect on Y of changes in X
Direct effect of X on Y = effect on Y of changes in X while Z remains unchanged
Indirect effect of X on Y = effect on Y of changes in Z that were induced by changes in X, while X remains unchanged
[Diagram: causal graph with nodes X, Z, Y]

Counterfactual approach. Binary factor
DATA: there are two data sets, with X=0 and X=1 (red): Y(X=0, Z(0)) and Y(X=1, Z(1))
In order to estimate the direct and indirect effects we need two unobservable data sets (gray): Y(X=0, Z(1)) and Y(X=1, Z(0))

Effects – elaboration of the definitions.
What is the meaning of UNCHANGED? In comparison with what situation?
Placing Z on the causal path from X to Y provides the answer.
Direct effect: Z remains as it was at the initial values of X
Indirect effect: X remains at its new values
[Diagram: the four potential outcomes, with the path Y(X=0, Z(0)) → Direct → Y(X=1, Z(0)) → Indirect → Y(X=1, Z(1))]

Estimation of the effects. Linear model.
• For a linear model there is an exact solution.
Let x, y, z be continuous, with all assumptions of linear regression fulfilled.
y = b0 + b1*x + b2*z + e1
z = a0 + a1*x + e2
e1, e2: independent random errors, uncorrelated with x
Then
Total effect = E(Y|X+1) - E(Y|X) = b1 + b2*a1 (product formula)
Direct effect = E(Y|X+1,Z(X)) - E(Y|X,Z(X)) = b1
Indirect effect = E(Y|X+1,Z(X+1)) - E(Y|X+1,Z(X)) = b2*a1

Problems with non-linear regression
• In a non-linear situation (for example, for a binary outcome Y) the product formula is not applicable.
• Various attempts to estimate the indirect effect using coefficients of non-linear regression have not been commonly accepted
• The source of the problem is that the effect cannot be expressed using regression coefficients alone; it also needs the distribution of the covariates
• The solution was found using a counterfactual approach

Mediation formula
• Total effect = E(Y|X=1,Z(1)) - E(Y|X=0,Z(0))
• Direct effect = E(Y|X=1,Z(0)) - E(Y|X=0,Z(0))
• Indirect effect (Pearl) = E(Y|X=0,Z(1)) - E(Y|X=0,Z(0))
• Indirect effect (Modified) = E(Y|X=1,Z(1)) - E(Y|X=1,Z(0))
In general the total effect is not equal to the sum of the direct effect and Pearl’s indirect effect.
However, the total effect is always equal to the sum of the direct effect and the modified indirect effect.

Estimation of the effects. Binary factor.
Since Y(X=1,Z(0)) is not observed, we estimate it using a multiple imputation technique for missing values
• Z(0) for X=1 is estimated using the regression of Z on C fitted when X=0
• Y(X=1,Z(0)) is estimated using the regression of Y on C, Z fitted when X=1
[Diagram: Y(X=0, Z(0)) → Direct → Y(X=1, Z(0)) → Indirect → Y(X=1, Z(1))]

Continuous factor. Definitions
• Data: For continuous X we have only one data set
• Requirement: in the linear model the definition should lead to the exact solution
• Definition (reduction to the binary situation): For each object j, an “intervention” replaces x(j) by x’(j)=x(j)+1
[Diagram: Y(X, Z(X)) → Direct → Y(X’, Z(X)) → Indirect → Y(X’, Z(X’))]

Continuous factor. Estimation
• For each object “j” the potential outcome y’(j)=y(x’,z(x)) is estimated and imputed using the appropriate model y(x,z)
• For each object “j” the mediator z’(j)=z(x’) is estimated and imputed using the appropriate model z(x)
• For each object “j” the potential outcome y’’(j)=y(x’,z(x’)) is estimated and imputed using the appropriate model y(x,z)
[Diagram: Y(X, Z(X)) → Direct → Y(X’, Z(X)) → Indirect → Y(X’, Z(X’))]

Realization. General logic
• We wrote a SAS macro that implements the general logic described here and covers all 8 combinations of binary or continuous X, Y, Z
• For binary X it assumes two data sets, with X=0 and X=1
• For continuous X it uses one data set
• For imputation of a continuous Z the macro uses linear regression of Z on X and covariates C; for a binary Z it uses logistic regression on X and C
• For imputation of a continuous Y the macro uses linear regression of Y on X, Z and covariates C; for a binary Y it uses logistic regression on X, Z and C

MACRO CALL (example)
%mediation8(data=all, ep=new_chf, mediator=totMIn, predictor=CRPq, covbin=, covcont=age, RTF=y);

Thanks for your attention.
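The three estimation steps for a continuous factor (impute y’(j), then z’(j), then y’’(j)) can be sketched in a few lines. This is an illustration only, not the SAS macro: it is written in Python with NumPy, the data are simulated with hypothetical coefficients, there are no covariates C, and each imputation is a single deterministic model prediction rather than multiple imputation with random draws. The helper names (fit_ols, predict_z, predict_y) are invented for the sketch. In this linear setting the averaged differences should reproduce the fitted b1 and b2*a1, which is the requirement stated on the Definitions slide.

```python
import numpy as np

def fit_ols(design, target):
    """Least-squares coefficients for target ~ design (intercept column included)."""
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef

# Simulated linear data; all coefficient values are hypothetical.
rng = np.random.default_rng(0)
n = 5000
a0, a1 = 0.5, 0.8            # z = a0 + a1*x + e2
b0, b1, b2 = 1.0, 0.3, 0.6   # y = b0 + b1*x + b2*z + e1
x = rng.normal(size=n)
z = a0 + a1 * x + rng.normal(size=n)
y = b0 + b1 * x + b2 * z + rng.normal(size=n)

# Fit the two working models: z(x) and y(x, z).
ones = np.ones(n)
a_hat = fit_ols(np.column_stack([ones, x]), z)
b_hat = fit_ols(np.column_stack([ones, x, z]), y)

def predict_z(x_new):
    """Predicted mediator from the fitted model z(x)."""
    return a_hat[0] + a_hat[1] * x_new

def predict_y(x_new, z_new):
    """Predicted outcome from the fitted model y(x, z)."""
    return b_hat[0] + b_hat[1] * x_new + b_hat[2] * z_new

# "Intervention": x'(j) = x(j) + 1 for every object j.
x_new = x + 1.0
y_direct = predict_y(x_new, z)      # y'(j)  = y(x', z(x)), mediator kept at observed z
z_new = predict_z(x_new)            # z'(j)  = z(x')
y_total = predict_y(x_new, z_new)   # y''(j) = y(x', z(x'))

# Effects as differences in mean outcomes between situations.
direct = np.mean(y_direct - y)
indirect = np.mean(y_total - y_direct)
total = np.mean(y_total - y)

print("direct  :", direct, " fitted b1    :", b_hat[1])
print("indirect:", indirect, " fitted b2*a1 :", b_hat[2] * a_hat[1])
print("total   :", total)
```

Because both working models include an intercept, their residuals average to zero, so using the observed y and z as the baseline does not bias the averaged differences. For a binary Y or Z the predictions would come from a logistic model instead, and the equality with the product formula would no longer be expected.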