This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this
material constitutes acceptance of that license and the conditions of use of materials on this site.
Copyright 2007, The Johns Hopkins University and Qian-Li Xue. All rights reserved. Use of these materials
permitted only in accordance with license rights granted. Materials provided “AS IS”; no representations or
warranties provided. User assumes all responsibility for use, and all liability related thereto, and must independently
review all materials for accuracy and efficacy. May contain materials owned by others. User is responsible for
obtaining permissions for use from third parties as needed.
Concluding Topics
Statistics for Psychosocial Research II:
Structural Models
Qian-Li Xue
Outline
• Brief discussion of standardized coefficients
• Causal inference
• Design, power, sample size
Standardized Coefficients
• Make “relative” direct influences clearer
• Standardized coefficient for the path X → Y with coefficient β:
  β* = β·(σx/σy)
• The standardized effect is the mean change, in standard deviation units of Y, for a one standard deviation change in X.
• If X increases by one standard deviation, then we expect that Y will increase by β* standard deviations
• Standardized coefficients can be easier for making inferences from SEMs
Quick Little Derivation
Model: y = β0 + β1·x1 + β2·x2 + e

Standardize each variable by dividing by its standard deviation:
x1* = x1/σx1, so that σx1·x1* = x1
x2* = x2/σx2, so that σx2·x2* = x2
y* = y/σy, so that σy·y* = y

Substitute the standardized variables into the model:
σy·y* = β0 + β1·σx1·x1* + β2·σx2·x2* + e

y* = β0/σy + β1·(σx1/σy)·x1* + β2·(σx2/σy)·x2* + e/σy

So each standardized coefficient is βj* = βj·(σxj/σy).
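The algebra above can be checked numerically. Below is a small Python sketch (not part of the lecture; all variable names and parameter values are illustrative): it fits ordinary least squares on simulated data, rescales each slope by σx/σy, and confirms the result matches the slopes from a regression on z-scored variables.

```python
# Sketch: verify beta* = beta * sd(x)/sd(y) against a z-scored regression.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
x1 = rng.normal(0, 2.0, n)            # sd(x1) = 2
x2 = rng.normal(0, 0.5, n)            # sd(x2) = 0.5
y = 1.0 + 0.8 * x1 + 3.0 * x2 + rng.normal(0, 1.0, n)

def ols(X, y):
    """Least-squares coefficients, intercept column prepended."""
    X = np.column_stack([np.ones(len(y)), *X])
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Unstandardized fit, then standardize the slopes by hand
b0, b1, b2 = ols([x1, x2], y)
b1_star = b1 * x1.std() / y.std()
b2_star = b2 * x2.std() / y.std()

# Fit on z-scored variables: the slopes should match b1_star, b2_star
z = lambda v: (v - v.mean()) / v.std()
_, s1, s2 = ols([z(x1), z(x2)], z(y))

print(np.allclose([b1_star, b2_star], [s1, s2]))
```

The match is exact (up to floating-point error) because standardizing is just a linear rescaling of the same data.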
Standardized Coefficients
• Mplus provides different types of “standardized” estimates
  • StdYX: β* = β·(σx/σy)
  • StdY: β* = β/σy
    – more appropriate for a binary x
    – interpreted as the change in standard deviation units of y when x changes from zero to one
  • Std: uses the variances of the continuous latent variables for standardization
  • StdYX and Std are the same for parameter estimates involving only latent variables
Mplus User’s Guide
Standardized Coefficients
• Caveats:
  • Beware of comparing standardized coefficients for the same variable across groups!
    – Standardized effects might differ simply because the groups have different standard deviations
    – Look at unstandardized coefficients when comparing across groups
Latent Variables in SEM
• Much like path analysis with observed variables
• Some additional considerations
Constraining Latent Variables
• Recall: 2 options for scaling a latent variable in a CFA model
  • set the variance of the (exogenous) LV equal to 1 (my preference)
  • set the path coefficient to one of its indicators equal to 1; this scales the LV in the same units as that indicator
• More complicated models:
  • endogenous LV:
    – set the variance of its error term to 1, or
    – set the path coefficient to one of its indicators to 1
• MUST do one of these or the model is not identifiable!
Causal Inference
Many views on this topic!
• Correlation ≠ causation
• But, coupled with other information, correlation can imply causation
• Statistics helps a lot with causal inference
• Statistical models used to draw causal inferences are distinctly different from those used for showing associational differences
‘Potential’ Cause
• Holland (1986): each ‘unit’ of observation must be able to be ‘exposed’ to the cause
• For causal inference, the cause must be subject to “human manipulation.”
• Does
  • smoking cause lung cancer?
  • sleet or snow cause traffic accidents?
  • a change in interest rates cause the stock market to fluctuate?
  • gender or race cause discrimination?
“Potentially Exposable”
• Every ‘unit’ should be able to be exposed to the cause.
• Good example: randomized clinical trial
• Need to be able to postulate what WOULD have happened to a patient’s outcome had the cause been “the reverse.”
  • Let Yti be the outcome for patient i if the patient is in the treatment group
  • Let Yci be the outcome for patient i if the patient is in the control group
• We are interested in the causal effect: Yti − Yci
Fundamental Problem of Causal Inference
“It is impossible to observe the values of Yti and Yci on the same patient. Therefore it is impossible to observe the causal effect of treatment on patient i.”
• Important point: ‘observe’ is the key word.
• Possible exceptions: cross-over designs in some settings.
• However, we CAN estimate ‘average’ causal effects over a population of patients
• Topic of the new course offered in 3rd and 4th term
Causality
• Strong assumption of causality in SE models
• Bollen’s three components for ‘practically’ defining a causal relationship:
  • Association
  • Direction of influence
  • Isolation
Association
• Easier to establish
• The causal variable should have a strong association with the outcome
• Problems:
  • incorrect standard errors or test statistics (e.g. correlated errors, poor measures)
  • multicollinearity
• Replication/repetition is important (and also helps establish isolation)
Multicollinearity Example
η1: morale; η2: sense of belonging

[path diagram: η1 → x1, x2, x3, x4; η2 → x4, x5, x6, x7 (x4 loads on both factors)]

x1 = γ11·η1 + ζ1
  ⋮
x4 = γ14·η1 + γ24·η2 + ζ4
  ⋮
x7 = γ27·η2 + ζ7

• In truth, x4 is a measure of morale, but we allow it to be related to sense of belonging
• Results? Both γ14 and γ24 are insignificant
• Why? Because morale and sense of belonging are highly associated
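The same phenomenon appears in a plain regression. In this hypothetical Python sketch (the two factors are simulated as observed scores rather than latent variables, and all numbers are illustrative), “morale” and “belonging” correlate at about 0.98, and the standard error of the morale coefficient inflates several-fold once the collinear variable enters the model:

```python
# Sketch of multicollinearity: x4 truly depends only on morale, but
# adding the highly correlated "belonging" variable inflates the SEs.
import numpy as np

rng = np.random.default_rng(1)
n = 200
morale = rng.normal(0, 1, n)
belonging = 0.98 * morale + np.sqrt(1 - 0.98**2) * rng.normal(0, 1, n)
x4 = 0.5 * morale + rng.normal(0, 1, n)   # belonging plays no real role

def fit_with_se(X, y):
    """OLS coefficients and their standard errors."""
    X = np.column_stack([np.ones(len(y)), *X])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se

b_alone, se_alone = fit_with_se([morale], x4)
b_both, se_both = fit_with_se([morale, belonging], x4)
print(se_alone[1], se_both[1])   # SE of the morale slope inflates several-fold
```

The inflation factor is roughly sqrt(VIF) = sqrt(1/(1 − r²)), about 5 at r = 0.98, which is why a real effect can turn “insignificant.”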
Direction of Causation
• Plausibility of the association being causal rests on having the causal direction correct
• Temporal ordering?
  • x should come before y in time
  • problematic: this rules out simultaneous reciprocal causation (feedback)
  • need to consider the window between cause and response
• We often have cross-sectional data.
  • Can a future event predict a past or present event?
Recap: Total, Direct, and Indirect Effects
• x1 is marital status, y1 is income, y2 is depression
• Direct effect: measured by a single arrow between two variables
• Indirect effects: measured by all possible “paths” or “connections” between two variables EXCEPT for the direct path. Multiply the coefficients along a path to get each indirect effect.
• Total effect: the sum of the direct and indirect effects between two variables

[path diagram: x1 → y1 (γ11), y1 → y2 (β21), x1 → y2 (γ21); ζ1 and ζ2 are the disturbances of y1 and y2]

Direct effect of x1 on y2: γ21
Indirect effect(s) of x1 on y2: γ11·β21
Total effect of x1 on y2: γ21 + γ11·β21
Direct effect of y1 on y2: β21
Indirect effect(s) of y1 on y2: none
Total effect of y1 on y2: β21
Isolation
• Isolation: hold everything constant except the cause and the outcome
• Impossible to establish unless x and y occur in a “vacuum”
• Especially difficult in observational studies!
• Without true isolation we can never be 100% certain about cause
• Isolation tends to be the weakest link in determining causality!
“Pseudo-isolation”
y1 = γ11·x1 + ζ1
• ζ1 is the unobserved error/disturbance
• ζ1 represents ALL other causes/correlates of y1
• Standard assumption for pseudo-isolation: Cov(x1, ζ1) = 0
• That is, x1 is independent of all other causes/correlates of y1
• If the assumption holds, then we can assess the causal association of x1 and y1 “isolated” from all other causes (ζ1).
Examples of Violations of Isolation
(1) Intervening Variables
[path diagram: x1 → y1 (γ11), y1 → y2 (β21), x1 → y2 (γ21); disturbances ζ1 and ζ2]

True Model:
y1 = γ11·x1 + ζ1
y2 = β21·y1 + γ21·x1 + ζ2
Cov(ζ1, ζ2) = 0
Cov(x1, ζ1) = 0
Cov(x1, ζ2) = 0

(e.g. x1 is marital status, y1 is household income, y2 is depression)
(Bollen, p.46)
What if we omit y1 (income)?
• Assumed model: y2 = γ21*·x1 + ζ2*
  [diagram: x1 → y2 with coefficient γ21* and disturbance ζ2*]
• This implies: ζ2* = β21·y1 + ζ2
• And our pseudo-isolation assumption fails:

Cov(x1, ζ2*) = Cov(x1, β21·y1 + ζ2)
             = Cov(x1, β21·(γ11·x1 + ζ1) + ζ2)
             = β21·γ11·Var(x1) + β21·Cov(x1, ζ1) + Cov(x1, ζ2)
             = β21·γ11·Var(x1)
             ≠ 0, unless β21 = 0 or γ11 = 0
Effect on Inference?
• γ21* converges to the total effect, β21·γ11 + γ21, instead of the direct effect, γ21
• This yields an over- or under-estimate of the effect of x1 on y2.
• Can be a really big problem if the direct and indirect effects cancel each other out:
  • if γ21 = 1, β21 = 0.5, γ11 = −2
  • then γ21* = 0.5·(−2) + 1 = 0
  • we might conclude that there is NO association!
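The cancellation above is easy to reproduce. This Python sketch (illustrative simulation, not lecture data) generates the true model with the slide’s values (γ11 = −2, β21 = 0.5, γ21 = 1) and then regresses y2 on x1 alone; the estimated coefficient converges to the total effect, which is zero here:

```python
# Sketch: omitting the intervening variable y1 recovers the TOTAL effect
# beta21*gamma11 + gamma21 = 0.5*(-2) + 1 = 0, not the direct effect 1.
import numpy as np

rng = np.random.default_rng(7)
n = 200_000
x1 = rng.normal(0, 1, n)
y1 = -2.0 * x1 + rng.normal(0, 1, n)             # gamma11 = -2
y2 = 0.5 * y1 + 1.0 * x1 + rng.normal(0, 1, n)   # beta21 = 0.5, gamma21 = 1

# Simple regression of y2 on x1 alone (intervening y1 omitted)
gamma21_star = np.cov(x1, y2)[0, 1] / np.var(x1, ddof=1)
print(round(gamma21_star, 3))   # near 0: direct and indirect effects cancel
```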
Examples of Violations of Isolation
(2) Left Out Common Cause
[same path diagram: x1 → y1 (γ11), y1 → y2 (β21), x1 → y2 (γ21); disturbances ζ1 and ζ2]

Recall the true model:
y1 = γ11·x1 + ζ1
y2 = β21·y1 + γ21·x1 + ζ2
Cov(ζ1, ζ2) = 0
Cov(x1, ζ1) = 0
Cov(x1, ζ2) = 0

What if we omit x1 from the model?
Then y2 = β21*·y1 + ζ2*, where ζ2* = γ21·x1 + ζ2
Examples of Violations of Isolation
• Is the pseudo-isolation assumption violated?

Cov(y1, ζ2*) = Cov(γ11·x1 + ζ1, γ21·x1 + ζ2)
             = γ11·γ21·Var(x1)
             ≠ 0

• What happens to our estimate of β21?
β21* = β21 + γ21·b11,
where b11 is the regression coefficient from an “auxiliary” regression of x1 on y1
Effects on Inference
• Worst-case scenario: y1 and y2 have little or no association, but both are highly associated with x1.
• Example:
  x1 = age
  y1 = proportion of gray hairs
  y2 = quality of vision
• “Spurious association”
Examples of Violations of Isolation
(3) Omitted Variable Has an Unspecified Relation to the Other Variables

[path diagram: x1 → y1 (γ11), x2 → y1 (γ12), x1 and x2 correlated (ρ12); disturbance ζ1]

True Model: y1 = γ11·x1 + γ12·x2 + ζ1
What if we omit x2?
y1 = γ11*·x1 + ζ1*
γ11* = γ11 + ρ12·γ12

(Bollen, p.54)
This is an even bigger problem…
• Note that the association between x1 and x2 is unspecified: it could be true that
  • x1 causes x2 and y1 (intervening variable)
  • x2 causes x1 and y1 (common cause)
  • something else
• We can’t infer the exact consequences of omitting x2 because we don’t know its relation to the other variables.
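Whatever the causal direction, the bias formula γ11* = γ11 + ρ12·γ12 from the previous slide can be checked by simulation. The parameter values below are illustrative; with unit-variance predictors, ρ12 equals Cov(x1, x2), so the formula applies directly:

```python
# Sketch: omitted-variable bias gamma11* = gamma11 + rho12*gamma12
# for unit-variance correlated predictors (illustrative values).
import numpy as np

rng = np.random.default_rng(11)
n = 200_000
rho12, g11, g12 = 0.6, 0.4, 0.7
x1 = rng.normal(0, 1, n)
x2 = rho12 * x1 + np.sqrt(1 - rho12**2) * rng.normal(0, 1, n)
y1 = g11 * x1 + g12 * x2 + rng.normal(0, 1, n)

# Regress y1 on x1 alone (x2 omitted)
slope = np.cov(x1, y1)[0, 1] / np.var(x1, ddof=1)
print(round(slope, 2), round(g11 + rho12 * g12, 2))   # both near 0.82
```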
Other Violations
• “Feedback” or “reciprocal causation”
• Wrong functional form between 2 variables
• Correlated errors
• Bottom line:
  • SEM CANNOT be used to PROVE causation!
  • Instead, SEM can be used to “provide estimates of the strength of all the hypothesized relationships between variables in a theoretical model”
    – assess consistency between data and model
    – “distinguish between or among alternative perspectives”
Sample Size and Power
• Factor analysis
  • General rule: need a larger sample size (~300-500) for lower communalities, more factors, and fewer indicators per factor (MacCallum, 1999)
• SEM
  • N > 50 + 8 × number of variables in the model
  • N = 10 to 20 × number of variables (Mitchell, 1993)
  • N > 15 cases per measured variable or indicator (Stevens, 1996)
  • N > 5 cases per parameter estimate (including error terms as well as path coefficients) if model assumptions are met (Bentler and Chou, 1987)
  • Larger N when data are non-normal or incomplete
• Monte Carlo simulation studies
Monte Carlo Simulation Study for a CFA

TITLE:  this is an example of a Monte Carlo
        simulation study for a CFA with covariates
        (MIMIC) with continuous factor indicators
        and patterns of missing data
MONTECARLO:
        NAMES ARE y1-y4 x1 x2;
        NOBSERVATIONS = 500;        ! size of each simulated sample
        NREPS = 500;                ! number of samples drawn
        SEED = 4533;
        !GENERATE =                 ! generate different types of dependent
                                    ! variables (default: continuous)
        CUTPOINTS = x2(1);          ! generate binary covariate x2 from a
                                    ! Normal(0,1) using threshold = 1
        PATMISS = y1(.1) y2(.2) y3(.3) y4(1) |
                  y1(1) y2(.1) y3(.2) y4(.3);
        PATPROBS = .4 | .6;         ! two patterns of missing data
MODEL POPULATION:                   ! true population parameter values
        [x1-x2@0];                  ! used for data simulation
        x1-x2@1;
        f BY y1@1 y2-y4*1;
        f*.5;
        y1-y4*.5;
        f ON x1*1 x2*.3;
MODEL:                              ! analysis model fit to each sample
        f BY y1@1 y2-y4*1;
        f*.5;
        y1-y4*.5;
        f ON x1*1 x2*.3;
OUTPUT:
        TECH9;

Mplus User’s Guide (Ex. 11.1)
Monte Carlo Study Output

MODEL RESULTS

                          ESTIMATES
           Population  Average  Std. Dev.  S.E. Avg  M.S.E.  95% Cover  % Sig Coeff
F BY
  Y1         1.000     1.0000    0.0000    0.0000   0.0000    1.000       0.000
  Y2         1.000     1.0028    0.0663    0.0622   0.0044    0.940       1.000
  Y3         1.000     1.0005    0.0674    0.0633   0.0045    0.930       1.000
  Y4         1.000     1.0020    0.0806    0.0743   0.0065    0.932       1.000
F ON
  X1         1.000     1.0023    0.0671    0.0620   0.0045    0.938       1.000
  X2         0.300     0.2991    0.1065    0.1052   0.0113    0.958       0.810
Intercepts
  Y1         0.000    -0.0018    0.0718    0.0688   0.0052    0.950       0.050
  Y2         0.000     0.0022    0.0515    0.0500   0.0027    0.946       0.054
  Y3         0.000     0.0015    0.0535    0.0519   0.0029    0.944       0.056
  Y4         0.000     0.0005    0.0603    0.0644   0.0036    0.962       0.038
Residual Variances
  Y1         0.500     0.4887    0.0776    0.0779   0.0061    0.944       1.000
  Y2         0.500     0.4982    0.0502    0.0513   0.0025    0.950       1.000
  Y3         0.500     0.4966    0.0558    0.0532   0.0031    0.926       1.000
  Y4         0.500     0.4905    0.0755    0.0697   0.0058    0.920       1.000
  F          0.500     0.5019    0.0715    0.0679   0.0051    0.930       1.000
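The coverage and “% Sig Coeff” columns can be mimicked outside Mplus. This Python sketch is a deliberately simplified analogue (one regression slope instead of the full MIMIC model; all values illustrative): it runs 500 replications of n = 500 and tracks how often the 95% confidence interval covers the true value and how often the estimate is significant.

```python
# Miniature Monte Carlo: 95% CI coverage and rejection rate ("power")
# for a single regression slope, in the spirit of the Mplus table.
import numpy as np

rng = np.random.default_rng(2024)
true_b, n, reps = 0.3, 500, 500
covered = signif = 0
for _ in range(reps):
    x = rng.normal(0, 1, n)
    y = true_b * x + rng.normal(0, 1, n)
    b = np.cov(x, y)[0, 1] / np.var(x, ddof=1)          # OLS slope
    resid = y - b * (x - x.mean()) - y.mean()
    sxx = (x - x.mean()) @ (x - x.mean())
    se = np.sqrt((resid @ resid / (n - 2)) / sxx)       # slope SE
    covered += abs(b - true_b) < 1.96 * se              # CI covers truth?
    signif += abs(b) > 1.96 * se                        # rejects b = 0?
print(covered / reps, signif / reps)   # coverage near 0.95, power near 1
```

Coverage far from 0.95, or low power, would signal (as in the Mplus study) that the planned sample size or model is inadequate.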