SEM Model Fit:
Introduction
David A. Kenny
January 25, 2014
Definition
Fit refers to the ability of a model to reproduce
the data (usually the variance-covariance
matrix).
A good-fitting model is one that is reasonably
consistent with the data and so does not
necessarily require any major respecification.
A good-fitting measurement model is required
before interpreting the causal paths of the
structural model.
However…
• A good-fitting model does not necessarily mean a
good model.
• For instance, a model all of whose parameters are
zero is likely a "good-fitting" model.
• Additionally, models with nonsensical results (e.g.,
paths that are clearly the wrong sign) and models
with poor discriminant validity or Heywood cases
can be “good-fitting” models. Parameter estimates,
as well as the fit statistics, must be carefully examined
to determine whether one has a reasonable model.
Additionally
• A good-fitting model can still be
improved.
• You can make a good-fitting model a
better-fitting model.
• You can also make the model simpler
without appreciably changing the fit.
• A good-fitting model is not necessarily
the best model.
Model Fit versus
Comparison
• Two very different questions
–Model Fit: Is a given model a
good-fitting model?
–Model Comparison: Which of
two models is better fitting?
How Big a Sample Size?
• Rules of Thumb
–Ratio of N to the Number of
Free Parameters
–Absolute N
• Power Analysis
Ratio of N to Number of
Free Parameters
• Tanaka (1987): 20 to 1, but that
is unrealistically high.
• Bentler & Chou (1987): 5 to 1
• Many published studies fail to
meet this goal!
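The ratio rules above are simple enough to check mechanically. The following is a minimal sketch only: the function names and the example model (200 cases, 25 free parameters) are hypothetical and do not come from the presentation.

```python
def n_to_parameter_ratio(n_cases: int, n_free_parameters: int) -> float:
    """Return the ratio of sample size to the number of free parameters."""
    if n_free_parameters <= 0:
        raise ValueError("n_free_parameters must be positive")
    return n_cases / n_free_parameters


def meets_ratio_rule(n_cases: int, n_free_parameters: int,
                     minimum_ratio: float) -> bool:
    """Check a rule of thumb such as 5 to 1 or 20 to 1."""
    return n_to_parameter_ratio(n_cases, n_free_parameters) >= minimum_ratio


# Hypothetical model: 200 cases and 25 free parameters.
print(n_to_parameter_ratio(200, 25))   # 8.0
print(meets_ratio_rule(200, 25, 5))    # True  (Bentler & Chou's 5 to 1)
print(meets_ratio_rule(200, 25, 20))   # False (Tanaka's 20 to 1)
```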
Absolute N
• 200 is seen as a goal for SEM research
• Lower sample sizes can be used for
– Models with no latent variables
– Models where all loadings are fixed
(usually to one)
– Models with strong correlations
– Simpler models
– Models for which there is a practical upper
limit on N (e.g., countries or years as the
unit)
Power Analysis
• The best way to determine if you have
a large enough sample is to conduct a
power analysis.
• Either use the Satorra and Saris (1985)
method or conduct a simulation.
• To determine the power to detect a
poor-fitting model, you can use
Preacher and Coffman’s web-based
calculator.
χ² Test
• For models with about 75 to 200 cases, the chi
square test is a reasonable measure of fit. But for
models with more cases (e.g., 400 or more), the χ²
test is likely to be statistically significant.
• This is why fit indices were invented. They provide
a way to claim that one has a good model, despite
the fact that the χ² test is statistically significant.
• Sometimes χ² is more interpretable if it is
transformed into a Z value. The following
approximation can be used:
Z = √(2χ²) − √(2df − 1)
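The Z approximation above can be sketched in a few lines of Python. The function name and the example values (χ² = 40 with 24 df) are hypothetical, chosen only for illustration.

```python
import math


def chi_square_to_z(chi_square: float, df: int) -> float:
    """Approximate Z for a chi square: sqrt(2 * chi2) - sqrt(2 * df - 1)."""
    if chi_square < 0 or df < 1:
        raise ValueError("chi_square must be >= 0 and df must be >= 1")
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * df - 1.0)


# Hypothetical result: chi square of 40 on 24 degrees of freedom.
z = chi_square_to_z(40.0, 24)
print(round(z, 2))  # 2.09
```

A Z near zero means that the χ² is close to what would be expected for its degrees of freedom, and a large positive Z means the χ² is larger than expected.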
More on χ²
An old measure of fit is the chi square to df ratio, or
χ²/df. A problem with this fit index is that there is no
universally agreed-upon standard as to what is a good- and a
bad-fitting model. Note, however, that two very popular fit
indices, TLI and RMSEA, are largely based on this old-fashioned ratio.
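To illustrate how RMSEA builds on the ratio, here is a minimal sketch. The RMSEA formula used, √(max(χ²/df − 1, 0) / (N − 1)), is the commonly cited one and is an assumption here, since the slides do not state it; some programs divide by N rather than N − 1. The function names and example values are hypothetical.

```python
import math


def chi_square_df_ratio(chi_square: float, df: int) -> float:
    """Return the chi square to df ratio."""
    if df <= 0:
        raise ValueError("df must be positive")
    return chi_square / df


def rmsea_from_ratio(ratio: float, n_cases: int) -> float:
    """RMSEA written as a function of the ratio (assumed N - 1 version)."""
    if n_cases <= 1:
        raise ValueError("n_cases must be greater than 1")
    # A ratio at or below 1 (chi square no larger than df) yields zero.
    return math.sqrt(max(ratio - 1.0, 0.0) / (n_cases - 1))


# Hypothetical model: chi square of 40 on 24 df with 201 cases.
ratio = chi_square_df_ratio(40.0, 24)
print(round(ratio, 2))                         # 1.67
print(round(rmsea_from_ratio(ratio, 201), 3))  # 0.058
```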
The chi square test is too liberal (i.e., too many Type I
errors) when variables have non-normal distributions,
especially distributions with kurtosis. Moreover, with small
sample sizes, there are too many Type I errors. Note that the χ²
test is an asymptotic test, and so it works best with large sample
sizes.
Typology of Fit Indices
• This typology is very different from
the mainstream definitions. Please
note differences.
• Typology
–Incremental
–Absolute
–Comparative
Incremental Fit Indices
An incremental (sometimes called
relative) fit index is analogous to R², and so a
value of zero indicates having the worst
possible model and a value of one indicates
having the best possible.
So my model is placed on a continuum.
In terms of a formula, it is
(Worst Possible Model – My Model) /
(Worst Possible Model – Best Possible Model)
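The continuum idea can be sketched generically. The function below takes any "badness" measure evaluated for the worst model, my model, and the best model; the function name and the numbers are hypothetical and only show how the index runs from zero to one.

```python
def incremental_fit(worst: float, mine: float, best: float) -> float:
    """(worst - mine) / (worst - best) for a badness-of-fit measure."""
    if worst == best:
        raise ValueError("worst and best must differ")
    return (worst - mine) / (worst - best)


# Hypothetical badness values: worst = 10, best = 1.
print(incremental_fit(10.0, 10.0, 1.0))           # 0.0 (as bad as the worst model)
print(incremental_fit(10.0, 1.0, 1.0))            # 1.0 (as good as the best model)
print(round(incremental_fit(10.0, 2.0, 1.0), 3))  # 0.889
```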
The Best Model
• The standard definition of best possible
model is one in which χ² equals its degrees
of freedom (the expected value of χ² given
the null hypothesis of perfect fit).
• An older definition (Bentler-Bonett) is to
assume the best model has a χ² of zero, but
that definition ignores sampling error.
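The two definitions of the best model lead to different indices. The sketch below contrasts them under stated assumptions: the "best χ² is zero" version is written as the Bentler-Bonett normed fit index (NFI), and the "χ² equals df" version is written as the TLI, which applies the idea to the χ²/df ratio so that the best value is one. These formulas are the commonly cited ones and are not spelled out in the slides. All numbers are hypothetical.

```python
def nfi(chi2_null: float, chi2_model: float) -> float:
    """Incremental index whose best model has a chi square of zero."""
    return (chi2_null - chi2_model) / (chi2_null - 0.0)


def tli(chi2_null: float, df_null: int,
        chi2_model: float, df_model: int) -> float:
    """Incremental index on the chi2/df ratio; best ratio is 1 (chi2 = df)."""
    ratio_null = chi2_null / df_null
    ratio_model = chi2_model / df_model
    return (ratio_null - ratio_model) / (ratio_null - 1.0)


# Hypothetical values: null model chi2 = 900 on 36 df,
# tested model chi2 = 40 on 24 df.
print(round(nfi(900.0, 40.0), 3))          # 0.956
print(round(tli(900.0, 36, 40.0, 24), 3))  # 0.972
```

With these numbers the NFI is lower because it treats any nonzero χ² as misfit, even the amount expected from sampling error alone.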
The Worst Model
• The worst possible model is called the null or
independence model and the usual
convention is to allow all the variables in the
model to have variation but no correlation.
• The usual null model is to allow the means
to equal their actual value. However, for
growth curve models, the null model should
set the means as all equal to each other, i.e.,
no growth.
df of the Null Model
• The degrees of freedom of the null
model are k(k – 1)/2 where k is the
number of variables in the model.
• If the null model sets the means
equal, as in a growth-curve model,
its df are (k + 2)(k – 1)/2.
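Both df formulas can be checked with a short sketch. The function name and the `equal_means` flag are hypothetical names introduced for illustration.

```python
def null_model_df(k: int, equal_means: bool = False) -> int:
    """Degrees of freedom of the null model for k variables."""
    if k < 2:
        raise ValueError("k must be at least 2")
    if equal_means:
        # Null model that also sets all means equal (e.g., growth curves).
        return (k + 2) * (k - 1) // 2
    # Usual null model: variances free, all correlations zero.
    return k * (k - 1) // 2


print(null_model_df(9))                    # 36
print(null_model_df(9, equal_means=True))  # 44
```

The equal-means version has k − 1 more degrees of freedom than the usual null model, because k means are replaced by a single common mean.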
Alternative Null Models
• Alternative null models might be considered (but
are almost never employed).
– One alternative null model is that all latent variable
correlations are zero
– Another is that all exogenous variables are
correlated but the endogenous variables are
uncorrelated with each other and the exogenous
variables. In fact, this is the null model in Mplus
when the exogenous variables are measured.
• O’Boyle and Williams (2011) suggest two different null
models for the measurement and structural models.
Absolute Fit Indices
• An absolute measure of fit presumes that the
best fitting model has a fit of zero. The
measure of fit determines how far the model
is from perfect fit.
• These measures of fit are typically “badness”
measures of fit in that a larger number implies
worse fit.
• Common absolute fit indices are SRMR and
RMSEA.
Comparative Fit Indices
• A comparative measure of fit is only interpretable
when comparing two different models and cannot
be used to determine whether a given model is
good-fitting. This term is unique to this
presentation in that these measures are more
commonly called absolute fit indices. However, it
is helpful to distinguish absolute indices that do
not require a comparison between two models.
• One advantage of comparative fit indices is that
often they can be computed for models that are
just-identified.
• Examples are AIC, BIC, and SABIC.
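As a rough sketch of how such indices are used to compare two models, the code below uses one common chi-square-based form, AIC = χ² + 2q and BIC = χ² + q ln(N), where q is the number of free parameters. These formulas are an assumption here (the slides do not give them), and software often reports log-likelihood-based versions that differ by a constant across models. SABIC is not shown. The two models and their values are hypothetical.

```python
import math


def aic(chi_square: float, n_free_parameters: int) -> float:
    """Chi-square-based AIC (assumed form): chi2 + 2q."""
    return chi_square + 2.0 * n_free_parameters


def bic(chi_square: float, n_free_parameters: int, n_cases: int) -> float:
    """Chi-square-based BIC (assumed form): chi2 + q * ln(N)."""
    return chi_square + math.log(n_cases) * n_free_parameters


# Hypothetical models fitted to the same data with N = 200.
# Model A: chi2 = 40 with 21 free parameters.
# Model B: chi2 = 35 with 24 free parameters.
print(aic(40.0, 21), aic(35.0, 24))             # 82.0 83.0
print(bic(40.0, 21, 200) < bic(35.0, 24, 200))  # True
```

The values mean nothing on their own; only the comparison matters. Here the simpler Model A has the lower (better) value on both indices, even though Model B has the smaller χ².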
Controversy about Fit
Indices
• There is considerable controversy about fit
indices.
• Some researchers do not believe that fit indices
add anything to the analysis (e.g., Barrett, 2007)
and only the chi square should be interpreted. The
worry is that fit indices allow researchers to claim
that a misspecified model is not a bad model.
• Others (e.g., Hayduk, Cummings, Boadu,
Pazderka-Robinson, & Boulianne, 2007) argue that
cutoffs for a fit index can be misleading and subject
to misuse.
Consensus View
• Most analysts believe in the value of fit
indices, but caution against strict reliance on
cutoffs.
• Particularly problematic is “cherry
picking” a fit index. That is, you compute
many fit indices and you pick the one index
that allows you to make the point that you
want to make. If you decide not to report a
popular index (e.g., the TLI or the RMSEA),
you need to give a good reason why you are
not.
Compute a Fit Index?
• Kenny, Kaniskan, and McCoach (2014) have
argued that fit indices should not even be
computed for small degrees of freedom
models.
• Rather, for these models, the researcher
should locate the source of specification
error by determining what parameter could
be added to the model and then test that
parameter.
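One way to test an added parameter is a one-degree-of-freedom chi square difference test between the models without and with that parameter. This is an assumption about how the test might be carried out, not a procedure stated in the slides. The sketch uses the fact that, for 1 df, the upper-tail chi square probability equals erfc(√(x/2)). The function name and the χ² values are hypothetical.

```python
import math


def one_df_chi_square_p(chi_square_change: float) -> float:
    """Upper-tail p value for a chi square difference with 1 df."""
    if chi_square_change < 0:
        raise ValueError("chi_square_change must be non-negative")
    return math.erfc(math.sqrt(chi_square_change / 2.0))


# Hypothetical models: chi2 = 9.1 without the parameter,
# chi2 = 2.3 after adding it (one fewer degree of freedom).
change = 9.1 - 2.3
p_value = one_df_chi_square_p(change)
print(p_value < 0.05)  # True: the added parameter improves fit
```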
Additional Presentations
• Measures of fit
• Factors affecting measures of fit
• References (pdf)