Structural Equation Modeling
Theory:
1. Confirmatory
2. Theory-driven
3. Complex multiple regression
4. Not causal; the model is based on correlations
5. Data = Model + Error
Definitions:
1. Measurement and Structural Models
a. A measurement model consists of a latent variable and the observed variables that
measure it.
b. A structural model includes the relationships among latent variables.
2. Latent and Manifest Variables
a. Latent variables are the constructs being measured. Examples of latent variables
include IQ, depression, political orientation, short-term memory, and
consumerism. There must be at least two manifest (or observed) variables to
measure each latent variable, and more are better. Latent variables are
represented by ovals in the model.
Kayla N. Jordan – RStats Workshop Spring 2014
b. Manifest or observed variables are the actual measurements on a construct. These
could include scores on an IQ test, clinical ratings of depression symptoms,
support of political candidates, and violent behaviors. Observed variables are
represented by squares in the model.
c. SEM works best if observed variables are measured on a continuous scale.
d. Measures should be reliable and valid. If the variables are not measured well,
then the model will not work.
3. Endogenous and Exogenous Variables
a. Exogenous variables are latent variables and are the independent variables in the
model. These variables do not have any arrows going into them.
b. Endogenous variables are latent variables and are the dependent variables in the
model. These are any variables which have arrows going into them.
4. Knowns and Unknowns
a. The knowns in the model are all the variances and covariances derived from the
data. The number of knowns can be calculated with the formula n(n+1)/2, where n
is the number of observed variables in the model.
b. The unknowns (or parameters) in the model are the quantities being estimated: the
arrows in the model, which represent the pathways, plus the error variances and
residuals.
c. Degrees of freedom = knowns - unknowns
5. Underidentified and Overidentified Models
a. Underidentified models are ones in which there are more unknowns than knowns.
In order to fix these, certain parameters should be constrained, or forced to equal a
certain value. Typically, this is accomplished by setting one of the factor loadings
for each latent equal to one; this also assigns a scale for the latent variable.
b. Overidentified models are ones in which there are more knowns than unknowns.
This is good.
6. Example of Calculating Degrees of Freedom:
a. Knowns: n(n+1)/2 = 8(9)/2 = 36
b. Unknowns: 5 factor loadings, 2 path coefficients, 8 error variances, 2 residuals =
17 total unknowns
c. Degrees of Freedom: Knowns – Unknowns = 19
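The degrees-of-freedom arithmetic above can be sketched in Python. This is a minimal illustration and not part of the original workshop materials. The function names are hypothetical, and the label "just-identified" for a model with exactly zero degrees of freedom is standard SEM terminology that is assumed here rather than taken from this handout.

```python
def count_knowns(n_observed):
    """Number of knowns: n(n+1)/2 for n observed variables."""
    return n_observed * (n_observed + 1) // 2


def degrees_of_freedom(n_observed, n_unknowns):
    """Degrees of freedom = knowns - unknowns."""
    return count_knowns(n_observed) - n_unknowns


def identification_status(df):
    """Classify a model by the sign of its degrees of freedom."""
    if df < 0:
        return "underidentified"   # more unknowns than knowns
    if df == 0:
        return "just-identified"   # assumed label; not used in the handout
    return "overidentified"        # more knowns than unknowns


# Worked example from the text: 8 observed variables and
# 5 factor loadings + 2 path coefficients + 8 error variances + 2 residuals.
unknowns = 5 + 2 + 8 + 2
df = degrees_of_freedom(8, unknowns)
print(count_knowns(8), unknowns, df, identification_status(df))
# prints: 36 17 19 overidentified
```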
7. Arrows
a. A straight arrow represents a path in the model.
b. A curved arrow represents a correlation between two variables.
8. Estimates
a. Factor loadings are used to determine if an observed variable is measuring the
latent variable. If factor loadings are not significant, then the observed variable is
not measuring the latent variable well.
b. Path coefficients or beta weights are used to determine how well exogenous
(predictor) variables are predicting endogenous (outcome) variables. If path
coefficients are not significant, then the exogenous variable is not predicting the
endogenous variable.
c. Error variances are associated with each observed variable and are a measure of
how much error there is in measuring each observed variable. If high, then there
are problems with measuring the variable.
d. Residuals are associated with each endogenous variable and represent the
difference between the hypothesized model and the data. These can also be used
to calculate an effect size for the relationship between two variables in terms of
the variance accounted for by squaring the residual and subtracting that value
from one.
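The effect-size calculation described for residuals can be sketched as follows. This assumes the residual is reported in standardized form (a value between 0 and 1). The function name and the example value of 0.80 are hypothetical and not taken from any output in this handout.

```python
def variance_accounted_for(residual):
    """Effect size as described above: one minus the squared residual."""
    return 1.0 - residual ** 2


# Hypothetical standardized residual of 0.80 for an endogenous variable.
r_squared = variance_accounted_for(0.80)
print(round(r_squared, 2))  # prints 0.36
```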
Types of Models
1. Confirmatory Factor Analysis (CFA)
a. Tests the validity of a factor structure
b. Should be based on theory or exploratory factor analysis (EFA)
c. Often used to determine if items on a questionnaire group together
d. A group of items which are similar (based on how well they are correlated) is
called a factor.
2. Path Analysis
a. Tests relationships between observed variables
3. Full Structural Models
a. Tests the fit of both measurement and structural models
b. Mediation and moderation
4. Multi-trait, multi-method (MTMM)
a. Tests whether or not multiple measures measure the same set of traits
5. Multi-group CFA
a. Tests whether a factor structure holds across two or more groups
b. For example, whether the same factors hold for both men and women
6. Latent Growth Curves
a. Tests how a measure changes over time
Fit Indices
1. Chi-square
a. Test of the goodness-of-fit of the model
b. Smaller chi-square is better
c. Non-significant = good
d. Highly influenced by sample size
2. Factor Loadings/Path Coefficients
a. These should be significant
b. If non-significant, that observed variable or path is not contributing to the model.
3. Comparative Fit Index (CFI)
a. Test of goodness-of-fit
b. Greater than .90 is good; greater than .95 is better
4. Normed Fit Index (NFI)
a. Same as CFI
5. Tucker-Lewis Index (TLI)
a. Same as CFI
6. Root Mean Square Error of Approximation (RMSEA)
a. Measure of difference between observed correlations and model correlations
b. Less than .10 is good; less than .06 is better
c. Influenced by small df and sample size
7. Standardized Root Mean Square Residual (SRMR)
a. Same as RMSEA
8. Example Output:
a. Default Model is the hypothesized model and produces the fit indices for the
model that was created.
b. Saturated Model is a model with zero degrees of freedom, and Independence
Model is the model with the greatest possible number of degrees of freedom. These
can be ignored.
c. CMIN is the chi-square value and DF is the associated degrees of freedom. The P
value should be greater than .05 in a good fitting model. This output indicates a
poor fitting model.
d. CMIN/DF is the chi-square value divided by its degrees of freedom, which adjusts
for model complexity. This should be less than 3 in a good fitting model.
e. NFI, TLI, and CFI indicate a poor fitting model as they are less than .90.
f. RMSEA indicates an adequate model as it is below .10.
Model Fit Summary

CMIN
Model                 NPAR    CMIN        DF     P       CMIN/DF
Default model         70      1021.692    395    .000    2.587
Saturated model       465     .000        0
Independence model    30      2715.382    435    .000    6.242

Baseline Comparisons
Model                 NFI Delta1    RFI rho1    IFI Delta2    TLI rho2    CFI
Default model         .624          .586        .730          .697        .725
Saturated model       1.000                     1.000                     1.000
Independence model    .000          .000        .000          .000        .000

RMSEA
Model                 RMSEA    LO 90    HI 90    PCLOSE
Default model         .089     .082     .096     .000
Independence model    .162     .156     .168     .000
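As a rough check on the Model Fit Summary output above, the ratio and baseline-comparison indices can be recomputed from the chi-square values and degrees of freedom of the default and independence models. The formulas in this sketch are the standard textbook definitions, which the handout itself does not state, and the function name is hypothetical. RMSEA is left out because it also depends on the sample size, which the output does not show.

```python
def fit_indices(chi2_model, df_model, chi2_indep, df_indep):
    """Recompute common fit indices from model and independence chi-squares."""
    ratio_model = chi2_model / df_model
    ratio_indep = chi2_indep / df_indep
    # Chi-square minus df, floored at zero (used by CFI)
    d_model = max(chi2_model - df_model, 0.0)
    d_indep = max(chi2_indep - df_indep, 0.0)
    denom = max(d_indep, d_model)
    return {
        "CMIN/DF": ratio_model,
        "NFI": (chi2_indep - chi2_model) / chi2_indep,
        "RFI": 1.0 - ratio_model / ratio_indep,
        "IFI": (chi2_indep - chi2_model) / (chi2_indep - df_model),
        "TLI": (ratio_indep - ratio_model) / (ratio_indep - 1.0),
        "CFI": 1.0 if denom == 0 else 1.0 - d_model / denom,
    }


# Values from the CMIN table: default model and independence model.
indices = fit_indices(1021.692, 395, 2715.382, 435)
for name, value in indices.items():
    print(f"{name}: {value:.3f}")
```

Rounded to three decimals, the results agree with the CMIN/DF, NFI, RFI, IFI, TLI, and CFI values reported for the default model.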
Model Comparisons
1. The initial, hypothesized model often does not work out or there could be competing
models; therefore, it is important to be able to compare models.
2. Modification Indices
a. These are suggested additions to the model to improve the fit.
3. Chi-square difference test
a. Subtract the chi-square values of the two models.
b. Subtract the degrees of freedom of the two models.
c. Determine the critical chi-square value for the difference in degrees of freedom.
d. If the difference in the chi-square values is greater than the critical chi-square
value, then the models are significantly different.
e. Smaller chi-square values are better, so whichever model has the smaller
chi-square value is the better model.
4. CFI difference
a. If the difference in the CFI values of two models is greater than .01, then the
models are meaningfully different.
b. The model with the greater CFI is the better model.
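The two comparison procedures above can be sketched in Python using only the standard library. Instead of looking up a critical value, this sketch computes the p-value of the chi-square difference and compares it with alpha = .05, which leads to the same decision. It assumes the two models are nested. The function names are hypothetical, and the second model's chi-square, degrees of freedom, and CFI are invented purely for illustration.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite sum of half-integer gamma terms
    total = math.erfc(math.sqrt(half))
    for i in range(1, (df - 1) // 2 + 1):
        total += math.exp(-half + (i - 0.5) * math.log(half) - math.lgamma(i + 0.5))
    return total


def chi2_difference_test(chi2_a, df_a, chi2_b, df_b, alpha=0.05):
    """Return (chi-square difference, df difference, p-value, significant?)."""
    diff_chi2 = abs(chi2_a - chi2_b)
    diff_df = abs(df_a - df_b)
    if diff_df < 1:
        raise ValueError("the two models must differ in degrees of freedom")
    p_value = chi2_sf(diff_chi2, diff_df)
    return diff_chi2, diff_df, p_value, p_value < alpha


def cfi_difference_exceeds(cfi_a, cfi_b, cutoff=0.01):
    """True if two CFI values differ by more than the cutoff (.01 above)."""
    return abs(cfi_a - cfi_b) > cutoff


# Model A: the default model from the example output.
# Model B: HYPOTHETICAL values, invented only to show the calculation.
diff, ddf, p, significant = chi2_difference_test(1021.692, 395, 985.500, 392)
print(round(diff, 3), ddf, significant)
print(cfi_difference_exceeds(0.725, 0.740))
```

With these illustrative numbers, the chi-square difference is about 36.19 on 3 degrees of freedom, well above the .05 critical value of 7.815, so the models differ and the one with the smaller chi-square would be preferred.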
Assumptions
1. Sample Size
a. Good sample size estimate is 100-200
b. Larger sample may be needed if there are a large number of variables
c. Some SEM programs do not allow missing data
2. Normality
a. Data should be normally distributed
b. Bootstrapping can help with non-normal data
3. Outliers
a. Outliers can affect the fit of the model
b. Can be checked using Mahalanobis distance
4. Linearity
5. Multicollinearity
a. It is important to test for multicollinearity.
b. If two variables are highly correlated (r > .9), it is basically like putting the same
variable in the model twice.
c. If this is a problem, one of the variables should be deleted.
6. Homoscedasticity
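The multicollinearity check described above (flagging pairs of variables with r > .9) can be sketched with the standard library alone. The function names and the three short score lists are hypothetical. As an assumption beyond the text, the sketch compares the absolute value of r with the threshold so that strong negative correlations are flagged as well.

```python
import math
from itertools import combinations


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def collinear_pairs(variables, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(variables.items(), 2):
        r = pearson_r(a, b)
        if abs(r) > threshold:  # assumption: negative correlations count too
            flagged.append((name_a, name_b, r))
    return flagged


# Hypothetical scores on three observed variables.
scores = {
    "item1": [1, 2, 3, 4, 5],
    "item2": [2.1, 3.9, 6.2, 8.0, 9.9],
    "item3": [5, 3, 4, 1, 2],
}
for name_a, name_b, r in collinear_pairs(scores):
    print(f"{name_a} and {name_b} are highly correlated (r = {r:.3f})")
```

Here only item1 and item2 are flagged, which would suggest dropping one of them from the model.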
Programs
1. AMOS
a. Good for beginners
b. Visual representation of models
2. EQS
3. LISREL
4. MPlus
Recommended Texts
Byrne, B. (2006). Structural Equation Modeling with EQS. 2nd Edition. (New York:
Routledge).
Byrne, B. (2009). Structural Equation Modeling with AMOS. (New York: Routledge).
Brown, T. A. (2006). Confirmatory Factor Analysis for Applied Research. (New York:
Guilford Press).
Kline, R. B. (2010). Principles and Practice of Structural Equation Modeling. (New
York: Guilford Press).