Comparison of Single Sample and Resampling Tests for Mediation

Special Topic: Logistic Regression
for Binary Outcomes
The dependent variable is often binary, such as whether a
person littered or not, used a condom or not, is dead or alive,
is diseased or not, had intercourse or not, or is divorced or not.
In this case, logistic or probit regression is the method of
choice because the assumptions of ordinary least squares
regression are violated when the outcome is binary.
Estimates of the mediated effect from logistic and probit
regression can be distorted when conventional procedures are used.
Here we examine binary or continuous X, continuous M, and
binary Y.
MacKinnon et al., under review in Clinical Trials, and
MacKinnon et al., under review in Psychological Methods.
Logistic Regression Model for
Equations 1 and 2
Standard logistic regression model, where Y depends
on X, β1 is the intercept and τ codes the relation
between X and Y.
logit Pr{Y=1|X} = β1 + τX (1)
Standard logistic regression model, where Y depends
on X and M, β2 is the intercept, τ′ codes the
relation between X and Y adjusted for M and β
codes the relation between M and Y, adjusted for
X.
logit Pr{Y=1|X,M} = β2 + τ′X + β M (2)
Logistic Regression Model for
latent variable Y*
Y* = β1 + τX + ε1 (1)
Y* = β2 + τ′X + β M + ε2 (2)
The unobserved latent variable Y* is linearly related
to X in Equation 1 and to both X and M in Equation 2.
ε1 and ε2 represent residual variability and have a
standard logistic distribution. The dichotomous Y is derived from
Y* through the relation Y = 1 if and only if Y* > 0.
The same model applies for the probit with the
errors having a standard normal distribution rather
than a standard logistic distribution.
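The latent variable formulation can be sketched in a few lines of code. This is only an illustration under stated assumptions: the function name `simulate_binary_outcome` and its parameters are hypothetical, and the coefficient values are arbitrary. It draws the error from a standard logistic distribution (or a standard normal for probit) and sets Y = 1 when Y* > 0.

```python
import math
import random


def simulate_binary_outcome(x_values, intercept, slope, link="logistic", seed=0):
    """Draw Y = 1 if Y* > 0, where Y* = intercept + slope * X + error."""
    rng = random.Random(seed)
    outcomes = []
    for x in x_values:
        if link == "logistic":
            # Inverse-CDF draw from the standard logistic distribution.
            u = rng.random()
            while u == 0.0:  # avoid log(0)
                u = rng.random()
            error = math.log(u / (1.0 - u))
        elif link == "probit":
            # Standard normal error gives the probit model.
            error = rng.gauss(0.0, 1.0)
        else:
            raise ValueError("link must be 'logistic' or 'probit'")
        y_star = intercept + slope * x + error
        outcomes.append(1 if y_star > 0 else 0)
    return outcomes


# With intercept 0 and X = 0, Y* is just the error, so about half the Y are 1.
x = [0.0] * 20000
y = simulate_binary_outcome(x, intercept=0.0, slope=1.0)
share_of_ones = sum(y) / len(y)
```

Only the dichotomous Y is observed in practice; Y* and its error are never seen, which is why the scale of Y* has to be fixed by convention.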
Equation 3
M = β3 + αX + ε3 (3)
M is a continuous variable, so ordinary least squares
regression is used to estimate this model, where β3 is the
intercept, α represents the relation between X and
M, and ε3 is residual variability.
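As a minimal sketch of Equation 3, the least squares estimates of β3 and α can be computed directly from sums of squares and cross-products. The function name `ols_simple` and the small data set are made up for illustration.

```python
def ols_simple(x, m):
    """Return (intercept, slope) for the least squares fit M = b3 + a * X."""
    n = len(x)
    mean_x = sum(x) / n
    mean_m = sum(m) / n
    # Sum of squares of X and sum of cross-products of X and M.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxm = sum((xi - mean_x) * (mi - mean_m) for xi, mi in zip(x, m))
    slope = sxm / sxx
    intercept = mean_m - slope * mean_x
    return intercept, slope


# Hypothetical data: binary X (control = 0, treatment = 1), continuous M.
x = [0, 0, 0, 1, 1, 1]
m = [1.0, 2.0, 3.0, 2.5, 3.5, 4.5]
beta3_hat, alpha_hat = ols_simple(x, m)
```

With a binary X, the estimate of α is the difference between the two group means of M (here 3.5 - 2.0 = 1.5) and the intercept is the mean of M in the X = 0 group.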
Logistic Regression Model for
latent variable Y*
τ - τ′: Difference in coefficients. The coefficients are from
separate logistic regression equations.
αβ: Product of coefficients. The β coefficient is from a
logistic regression model and α is from an ordinary least
squares regression model.
As will be shown, the difference in coefficients method can
give distorted values for the mediated effect because of
differences in the scale of separate logistic regression
models. For both Equations 1 and 2, the residual variance is
fixed at π²/3 for logistic regression and at 1 for probit regression.
What is in the next plot?
• Expected logistic regression coefficients based on
Haggstrom (1983) are used to compute τ - τ′ and
αβ.
• All possible combinations of α, β and τ′ values for small
(2% variance explained), medium (13%), large (26%),
and very large (40%) effects (4 X 4 X 4 = 64).
• The Y-axis is the expected value for τ - τ′ and
αβ.
• The X-axis is the true value of the β coefficient in the
continuous variable mediation model. It is indicated by
βC.
Plot of true values of αβ and τ - τ′ as
a function of true mediated effect
and true value of βC.
[Figure: y-axis Mediated Effect, 0 to 1; x-axis βC, 0 to 1; curves for αβ and τ - τ′.]
Plot of true proportion mediated
as a function of true value of βC.
[Figure: y-axis Proportion Mediated, 0 to 1; x-axis βC, 0 to 1; curves for αβ/τ, αβ/(αβ+τ′), and 1-(τ′/τ).]
αβ and τ - τ′ are not equal in
Logistic and Probit Regression
• The two estimators, αβ and τ - τ′, are not identical
in logistic or probit regression because, unlike
ordinary least squares regression where the
residual variance varies across equations, in
logistic regression the residual variance is fixed to
equal π²/3 (MacKinnon & Dwyer, 1993). So the
logistic regression coefficients are a function of
the relations among variables and the fixed value
of the residual variance.
• There are solutions
Solutions to mediation estimation
in Logistic and Probit Regression
• Standardize the values of the coefficients.
• One standardization method computes the variance of Y in
both equations and uses that to standardize values
(MacKinnon & Dwyer, 1993; Winship & Mare, 1983).
• Another standardization method standardizes coefficients in
Equation 2 to be in the same metric as Equation 1. To the
best of our knowledge, this is a new method that is described
below.
• Use a computer program such as Mplus that appropriately
handles categorical variables in covariance structure models.
I believe that this approach is similar to the first approach to
standardization, i.e., the scale of the latent Y* is the same for
all equations in a model.
Standardizing across logistic
regression equations
• Standardize the values of the coefficients in Equations 1 and 2 (see
MacKinnon & Dwyer, 1993 and Winship & Mare, 1983).
• For Equation 1, $s^2_{Y^*} = \tau^2 s^2_X + \pi^2/3$; divide the τ coefficient and its standard error by
$s_{Y^*}$ from this equation.
• For Equation 2, $s^2_{Y^*} = \tau'^2 s^2_X + \beta^2 s^2_M + 2\tau'\beta s_{XM} + \pi^2/3$; divide the τ′ and β
coefficients and their standard errors by $s_{Y^*}$ from this equation.
• Here $s^2_X$ is the variance of the X variable, $s^2_M$ is the variance of the M
variable, and $s_{XM}$ is the covariance of the X and M variables.
• The α parameter does not require rescaling if M is continuous. Note
that if probit regression is used, the last term of the equations for $s^2_{Y^*}$
should be 1 rather than π²/3.
Standardizing Equation 2 to the
metric of Equation 1
• The coefficients from Equation 2 are
divided by the following quantity:
$$\hat{\sigma}^2_{Y^*} = \frac{\pi^2}{3} + \hat{\beta}^2 \hat{\sigma}^2_{33 \cdot X}$$
• where $\hat{\sigma}^2_{33 \cdot X}$ is the residual variance in the regression
model for M predicted by X, i.e. Equation 3. The first
term is replaced with 1 for probit regression.
Plot of true values of αβ and τ - τ′ as
a function of βC, after standardization.
[Figure: y-axis Mediated Effect, 0 to 1; x-axis βC, 0 to 1; curves for αβ and τ - τ′.]
Plot of true values of proportion mediated
as a function of βC, after standardization.
[Figure: y-axis Proportion Mediated, 0 to 1; x-axis βC, 0 to 1; curves for αβ/τ, αβ/(αβ+τ′), and 1-(τ′/τ).]
Simulation Design
• All possible combinations of α, β and τ′ effect sizes for
small (2% variance explained), medium (13%), large
(26%), and very large (40%) effects.
• 6 sample sizes, N = 50, 100, 200, 500, 1000, 5000.
• 1000 replications of each of the 4 X 4 X 4 X 6 = 384
conditions, for 384,000 data sets.
• Probit and logistic regression on the same data.
• Standardized and unstandardized coefficients.
• Data were generated using a standard normal deviate for
the error term in Equation 2, which is the probit model.
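The size of the design can be checked by enumerating the conditions. This is only a bookkeeping sketch: effect sizes are kept as labels rather than coefficient values, and the variable names are hypothetical.

```python
from itertools import product

# Labels for 2%, 13%, 26%, and 40% variance explained.
EFFECT_SIZES = ["small", "medium", "large", "very large"]
SAMPLE_SIZES = [50, 100, 200, 500, 1000, 5000]
REPLICATIONS = 1000

# One condition per combination of alpha, beta, tau' effect size and N.
conditions = [
    {"alpha": a, "beta": b, "tau_prime": t, "n": n}
    for a, b, t, n in product(EFFECT_SIZES, EFFECT_SIZES,
                              EFFECT_SIZES, SAMPLE_SIZES)
]
total_data_sets = len(conditions) * REPLICATIONS
print(len(conditions), total_data_sets)  # 384 384000
```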
Simulation Outcomes
• Estimates of αβ and τ - τ′ before and after standardization
for both probit and logistic regression.
• Estimates of proportion mediated αβ/(αβ+τ′), 1-(τ′/τ), and
αβ/τ before and after standardization for both probit and
logistic regression.
• Measures of mean and average relative bias.
• Tables and plots.
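The three proportion mediated measures and a relative bias measure can be written out as below. This is a sketch with hypothetical function names. Relative bias is taken to be (estimate - true value) / true value, which is the usual definition and is assumed here rather than stated on the slide.

```python
def proportion_mediated(alpha, beta, tau, tau_prime):
    """Return the three proportion mediated measures as a dict."""
    ab = alpha * beta
    return {
        "ab/(ab+tau')": ab / (ab + tau_prime),
        "1-(tau'/tau)": 1.0 - tau_prime / tau,
        "ab/tau": ab / tau,
    }


def relative_bias(estimate, true_value):
    """Bias of an estimate expressed as a fraction of the true value."""
    return (estimate - true_value) / true_value


# When tau = tau' + alpha*beta (as in the OLS case), all three measures agree.
measures = proportion_mediated(alpha=0.5, beta=0.4, tau=0.5, tau_prime=0.3)
bias = relative_bias(estimate=0.11, true_value=0.10)
```

When τ and τ′ come from logistic or probit equations on different scales, τ no longer equals τ′ + αβ and the three measures diverge.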
The estimated mediated effect, τ - τ′, as a function of βC for α=.14.
[Figure: y-axis Mediated Effect, -0.5 to 0.2; x-axis βC, 0 to 1.2; one curve each for τ′=.14, τ′=.39, τ′=.59, and τ′=1.]
Power: Logistic Regression
Small Effect Size
                      Sample Size
Method                50     100    200    500    1000
Delta Method          .001   .005   .044   .367   .842
Joint Significance    .012   .042   .142   .571   .905
Asymmetric            .012   .041   .143   .599   .921

*from MacKinnon, Yoon, & Lockwood (2003, SPR).
Summary and Future Directions
1. Unlike the linear OLS model case, the difference in coefficients and product of
coefficients estimators of the mediated effect are not equal. The difference in
coefficients estimator is distorted, as shown with expected values and in the
simulation study. The same problem occurs for the proportion mediated
measures.
2. Standardization of coefficients across equations solves the problem and removes
distortion. Two approaches to standardization were mentioned, but only the results for
rescaling coefficients in Equation 2 to be in the same metric as those in Equation
1 were described. The other standardization method works in a similar manner.
3. The simplest approach is the product of coefficients estimator of the mediated
effect, which does not require standardization. Researchers who prefer the logic
of the difference in coefficients method should standardize coefficients prior to
computing the mediated effect.
4. The standardization approaches should apply to other examples of the
generalized linear model, such as the Poisson and survival analysis models.
Surrogate Endpoint Research I
• The length of time for a disease to occur and low
incidence of the disease require very large sample
sizes and long duration studies.
• An alternative is to find an outcome that can serve as
a surrogate for the ultimate outcome. Here the
mediator is called a surrogate or intermediate
outcome.
• Surrogate endpoints are more frequent or more
proximate to the prevention strategy.
Examples of Surrogate endpoints
• Precancerous cells for colon cancer
• Cholesterol level for coronary heart disease
• Bone density for osteoporosis
• Lymphocyte levels for HIV/AIDS
• Partial loss of vision for blindness
• Tumor size for breast cancer
Surrogate Endpoint Research II
• “Above all else, we believe that the issue of when
and how to use surrogate endpoints is probably the
pre-eminent contemporary problem in clinical
trials methodology, so it merits much extensive
scrutiny” (Begg & Leung, 2000, p. 27).
• A surrogate endpoint is a “response variable for
which a test of the null hypothesis of no
relationship to the treatment groups under
comparison is also a valid test of the corresponding
null hypothesis based on the true endpoint”
(Prentice, 1989, p. 432)
Micromediational chain
• It is often not possible to study all steps in a mediation
chain. In a prevention program, for example, it may not be
possible to study each of six constructs in a theoretical chain:
exposure to a component, comprehension, retention of the
component’s message, short-term attitude change, long-term
attitude change, and long-term refusal to use drugs.
• Cook and Campbell (1979) make a distinction between
molar mediation where some steps are studied and
micromediation where each link is measured. Kenny et al.
(1998) make the distinction between proximal and distal
mediators.
• Any mediation model is part of a longer mediation chain.
The researcher decides what part of the micromediational
chain to examine. Similar decisions must be made about
outcomes.
Possible Surrogate endpoints in
Prevention
• Aggression at age 12 for incarceration at 24.
• Early onset gateway drug use for adult
addiction and driving under the influence.
• Harming animals as a child for later assault.
• Social withdrawal at age 8 for adult
depression.
• School dropout for adult unemployment.
Prevention Mediators versus
Surrogate Endpoints
• Many similarities between surrogates and
mediators in prevention science, but…
• Theoretical causal connection between surrogate
and outcome is often clearer than in prevention.
• In prevention, relation between mediator
(surrogate) and outcome is weaker than in most
areas of surrogate endpoint research.
• Surrogates are more likely to completely mediate
effects of X on the outcome.