QUESTIONS of the MOMENT...
"How does one remedy a 'not Positive Definite' message?"
(The APA citation for this paper is Ping, R.A. (2012). "How does one remedy a 'not Positive Definite' message?" [on-line paper]. http://www.wright.edu/~robert.ping/NotPD1.doc .)
(An earlier version of this paper, Ping, R.A. (2009). "How does one remedy a 'not Positive Definite' message?" [on-line paper]. http://www.wright.edu/~robert.ping/NotPD.doc , is available here.)
Few things are as frustrating, after gathering and entering survey data for a new model,
and creating and debugging the estimation software program for the model, as the first
bug-free software run producing a "not Positive Definite" message ("Ill Conditioned" in
exploratory factor analysis). The definition of this problem provides little help, and there
is little guidance for remedying matters, besides "check the data and the data correlation
matrix," "delete items," or "use Ridge Option estimates" (the last of which produces
biased parameter estimates, standard errors, and fit indices).
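For readers who want to see the condition itself, here is a minimal Python sketch (the data file name and item columns are hypothetical): a correlation or covariance matrix is Positive Definite only if all of its eigenvalues are strictly positive.

    import numpy as np
    import pandas as pd

    data = pd.read_csv("survey.csv")        # hypothetical item-level survey data
    R = data.corr().to_numpy()              # item correlation matrix

    eigenvalues = np.linalg.eigvalsh(R)     # eigvalsh is for symmetric matrices
    print("smallest eigenvalue:", eigenvalues.min())
    if eigenvalues.min() <= 0:
        print("the correlation matrix is Not Positive Definite (NPD)")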
Besides data entry errors,1,2 experience suggests that in real-world survey data, the causes of a Not Positive Definite (NPD) message usually are 1) collinearity among the items, 2) inconsistency in one or more measures, 3) inadequate starting values, and 4) model misspecification.
(1) Collinearity among the items is the easiest cause to investigate. Specifically, item correlations (from SAS, SPSS, etc.) of 0.9 or above should be investigated by removing the item(s) involved to see if that remedies NPD. If it does, dropping or combining the highly correlated items may remove the NPD message.3
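As an illustration of this screen, the following sketch lists item pairs correlating at 0.9 or above (the data file and column names are hypothetical):

    import itertools
    import pandas as pd

    data = pd.read_csv("survey.csv")        # hypothetical item-level survey data
    R = data.corr()

    for item_a, item_b in itertools.combinations(R.columns, 2):
        r = R.loc[item_a, item_b]
        if abs(r) >= 0.9:
            print(f"{item_a} and {item_b} correlate at {r:.2f} -- "
                  "candidates for dropping or combining")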
Occasionally in real world data, NPD can be the result of two or more parallel measures
(measures of the same construct). With NPD and parallel measures, the measure(s) with
the lesser psychometrics (reliability/validity) probably should be abandoned, and NPD
should be reassessed.
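If reliability is the deciding psychometric, Cronbach's alpha for each parallel measure can be computed directly; a sketch (the data file and item column names are hypothetical):

    import pandas as pd

    def cronbach_alpha(items: pd.DataFrame) -> float:
        # alpha = (k/(k-1)) * (1 - sum of item variances / variance of summed scale)
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    data = pd.read_csv("survey.csv")                  # hypothetical
    measure_a = ["a1", "a2", "a3", "a4"]              # hypothetical parallel measures
    measure_b = ["b1", "b2", "b3", "b4"]
    print("alpha, measure A:", cronbach_alpha(data[measure_a]))
    print("alpha, measure B:", cronbach_alpha(data[measure_b]))

The measure with the lower alpha (and weaker validity evidence) would be the candidate for abandonment.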
(2) Measure Inconsistency: In real world data NPD can accompany lack of consistency in
the Anderson and Gerbing (1988) sense (lack of model-to-data fit). Unfortunately, the
procedure for investigating this possibility is tedious.
1. Experience suggests that random data entry errors seldom produce a "not Positive Definite" message. However, because they may create other problems later, it is always prudent to examine the data for data entry errors.
2. There are other plausible conditions--see, for example, http://www2.gsu.edu/~mkteer/npdmatri.html.
3. Deleting an item should be done with concern for the content or face validity of the resulting measure. Combining items may be less desirable than removing them because it could be argued that the resulting combined "item" is no longer an observed variable.
The process begins with maximum likelihood exploratory (common) factor analysis (FA) of each measure. Specifically, each measure should be FA'd individually. Then, pairs of measures should be FA'd together, then triplets, etc. (Note that one or more measures may be multidimensional--experience suggests that multidimensionality alone usually will not produce NPD.)
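The search over single measures, pairs, triplets, etc. can be automated. The sketch below uses positive definiteness of the pooled items' correlation matrix as a stand-in for the maximum likelihood FA run described above (the measure names and item columns are hypothetical):

    import itertools
    import numpy as np
    import pandas as pd

    data = pd.read_csv("survey.csv")                  # hypothetical
    measures = {"X": ["x1", "x2", "x3"],              # hypothetical measures
                "Y": ["y1", "y2", "y3"],
                "Z": ["z1", "z2", "z3"]}

    def is_pd(frame: pd.DataFrame) -> bool:
        return np.linalg.eigvalsh(frame.corr().to_numpy()).min() > 0

    for size in range(1, len(measures) + 1):
        for subset in itertools.combinations(measures, size):
            items = [col for m in subset for col in measures[m]]
            if not is_pd(data[items]):
                print("NPD appears with measures:", subset)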
Typically, NPD occurs when adding a measure to a group of m measures (m < n, where n is the total number of measures) that was Positive Definite (PD). When a measure is found to create NPD, its items should be "weeded" using reliability maximization.4 In particular, after an item is dropped, NPD should be evaluated in an FA with all n of the measures. After that, deleted items should be added back to the measures, beginning with the item contributing most to the content validity of the most "important" measure, then proceeding to the next most "important" item, etc. With each added item, NPD should be checked using an FA with all n measures.
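A sketch of this weeding loop, using alpha-if-item-deleted as the reliability-maximization rule (see footnote 4) and rechecking PD with all n measures after each deletion; the data file, column names, and the two-item floor are hypothetical:

    import numpy as np
    import pandas as pd

    def cronbach_alpha(items):
        k = items.shape[1]
        return (k / (k - 1)) * (1 - items.var(ddof=1).sum()
                                / items.sum(axis=1).var(ddof=1))

    def is_pd(frame):
        return np.linalg.eigvalsh(frame.corr().to_numpy()).min() > 0

    data = pd.read_csv("survey.csv")                  # hypothetical
    remaining = ["x1", "x2", "x3", "x4", "x5"]        # the measure being weeded
    other_items = ["y1", "y2", "y3", "z1", "z2"]      # items of the other measures

    # Drop, one at a time, the item whose removal most increases alpha
    # (mirroring the SAS/SPSS alpha-if-item-deleted output), rechecking
    # PD with all n measures after each deletion.
    while len(remaining) > 2 and not is_pd(data[remaining + other_items]):
        alphas = {item: cronbach_alpha(data[remaining].drop(columns=item))
                  for item in remaining}
        worst = max(alphas, key=alphas.get)   # its deletion maximizes alpha
        remaining.remove(worst)
        print("dropped", worst, "-- alpha without it:", round(alphas[worst], 3))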
Occasionally, the above approach does not remedy NPD (it does not produce a set of n measures that is PD in an FA with all n measures) without excessive item weeding (too many items are weeded out), or without weeding item(s) judged essential for face or content validity. In this case, experience suggests that weeding using Modification Indices (MIs) instead of reliability might remedy NPD. (See the EXCEL template "For 'weeding' a multi-item measure so it 'fits the data'..." on the preceding web page.)
Specifically, the measure to be weeded should be estimated in a single LV measurement
model (MM) and:
a) It should be estimated using covariances and Maximum Likelihood estimation.
b) The single LV MM should have one (unstandardized) loading constrained to equal 1, and all the other (unstandardized) loadings should be free (unconstrained), with estimates between 0 and 1 (see footnote 5).
c) In the single LV MM, the unstandardized LV variance should be free, and the measurement model LV variance estimate should be positive and larger than its error-attenuated (i.e., SAS, SPSS, etc.) estimate.
d) All measurement error variances should be free and uncorrelated,6 and their estimates
each should be zero or positive.
e) The measurement model should fit the data very well using a sensitive fit index such as RMSEA (i.e., RMSEA should be 0.08 or less--see Browne and Cudeck 1993, Jöreskog 1993).

4. SAS, SPSS, etc. have procedures that assess the reliability of the remaining items if an item is dropped from the group.
5. If one or more loadings in a measure is greater than one, the largest loading should be fixed at 1 and the other loadings in the measure should be freed, including the loading previously fixed at 1.
6. Uncorrelated measurement errors is a classical factor analysis assumption. If this assumption is violated (e.g., to obtain model-to-data fit with an item that must not be deleted), experience suggests that the above procedure may still work. If it does not, the interested reader is encouraged to e-mail me for suggestions for their specific situation.
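As one concrete, hedged illustration of steps a) through e), the sketch below uses the Python semopy package; the Model/fit/calc_stats interface, ML-on-covariances as the default estimator, and the convention of fixing the first loading to 1 are assumptions about that package (verify against its documentation), and the column names are hypothetical:

    import pandas as pd
    import semopy

    data = pd.read_csv("survey.csv")                  # hypothetical

    # Single LV measurement model; the first indicator's loading is
    # (by assumption) fixed to 1 and the remaining loadings are free.
    model = semopy.Model("F =~ x1 + x2 + x3 + x4")
    model.fit(data)                                   # ML estimation (assumed default)

    print(model.inspect())   # check: free loadings between 0 and 1, LV variance
                             # positive, measurement error variances >= 0
    stats = semopy.calc_stats(model)
    print("RMSEA:", stats["RMSEA"].iloc[0])           # should be 0.08 or less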
In the unusual case that MI weeding does not remedy NPD, experience suggests that
during MI weeding two items in a measure may have had practically the same MI. In this
case, the removal of either item will improve model-to-data fit, and, because the deletion
of the first item did not remove NPD, the second item should be deleted instead (rarely,
both items should be deleted). Again, deleted items should be added back selectively to
their respective measures until NPD reoccurs.
(The case where NPD remains unremedied is discussed below.)
For emphasis, the objective should be to find the item(s) responsible for NPD. Stated
differently, each measure should retain as many items as possible.
(3) Inadequate Starting Values: It is possible, in real-world survey data that is actually Positive Definite (PD), to obtain a fitted or reproduced covariance matrix that is NPD.
Experience suggests this usually is the result of structural model misspecification, which
is discussed below, but it also can be the result of inadequate (parameter) starting values.
Inadequate starting values usually occur in the structural model, and while they can be
produced by the software (LISREL, EQS, AMOS, etc.), more often they are user
supplied. Fortunately, adequate starting values for LV variances and covariances can be obtained from SAS, SPSS, etc. using averaged items. Starting estimates for loadings can be obtained from maximum likelihood exploratory (common) factor analysis,7 and regression estimates (with averaged indicators for each measure) can be used for adequate structural coefficient starting values. For emphasis, the parameters with these starting values should always be unfixed (i.e., free), except for each measure's item with its loading fixed at 1.
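A sketch of assembling these user-supplied starting values; the factor_analyzer package, its FactorAnalyzer interface, and all file and column names are assumptions here:

    import numpy as np
    import pandas as pd
    from factor_analyzer import FactorAnalyzer

    data = pd.read_csv("survey.csv")                  # hypothetical
    x_items, y_items = ["x1", "x2", "x3"], ["y1", "y2", "y3"]

    # Averaged items -> starting values for LV variances and covariances
    averaged = pd.DataFrame({"X": data[x_items].mean(axis=1),
                             "Y": data[y_items].mean(axis=1)})
    print(averaged.cov())

    # ML exploratory factor analysis of one measure -> starting loadings,
    # rescaled by the largest loading (see footnote 7)
    fa = FactorAnalyzer(n_factors=1, method="ml", rotation=None)
    fa.fit(data[x_items])
    loadings = fa.loadings_.ravel()
    print(loadings / np.abs(loadings).max())

    # Regression with averaged indicators -> starting structural coefficient
    slope = np.polyfit(averaged["X"], averaged["Y"], 1)[0]
    print("starting value for the X -> Y path:", slope)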
(4) Model Misspecification: It is also possible with PD input to obtain a fitted, or
reproduced, structural model covariance matrix8 that is NPD because of structural model
misspecification. Remedying this is very tedious.
The procedure uses three steps beginning with verifying that the structural model paths
reflect the hypotheses exactly. If they do, check that the full measurement model (FMM)
is PD. Next, verify that the structural model is specified exactly as the FMM, except that
the hypothesized structural paths have replaced several MM correlations. Then, remove any structural model misspecification.

7. Each measure should be factored individually, and in each individual factor analysis the resulting standardized loadings should be divided by the largest loading in that factor analysis to obtain SEM starting values.
8. NPD parameter matrices (e.g., PSI or THETA-EPS) also can occur, usually with under-identified LV's having fixed parameters or correlated measurement errors. The interested reader is encouraged to e-mail me for suggestions for their specific situation.
Specifically, estimate a full measurement model with all the measures present using steps
a-e above with “single,” “each,” etc. replaced with “full.”
If the FMM is NPD and remedies 1-3 above have been tried, an approach that uses Anderson and Gerbing's (1988) suggestions for establishing internal and external consistency should be used. It begins with estimating each LV in its own measurement
model (with no other measures present), then estimating pairs of LV's in two-measure
measurement models (with only the two measures present). Next, triplets of LV's are
estimated in three-measure measurement models, then quadruplets of LV's and so on
until the full measurement model is estimated.
Specifically, each of the single LV measurement models should be estimated as described
in steps a) through e) above to establish "baseline" parameter estimates for each measure
for later use. (If these parameter estimates are already available, this step can be skipped.)
Then, the LV's should be estimated in pairs--for example, with 4 LV's, 6 measurement models (each containing two LV's) are estimated. These measurement models also should be estimated using steps a) through e) above.
In addition,
f) Each LV should be specified exactly as it was in its own single LV measurement
model, no indicator of any LV should load onto a different LV, and no measurement
error variance of one LV should be allowed to correlate with any measurement error
variance of any other LV.
g) The covariances between the LV's should be free, and in real world data the resulting
estimated variances and covariances of each LV, their loadings and measurement error
variances should be nearly identical to those from the LV's own single LV measurement
model.
h) Each of these measurement models should fit the data very well using a sensitive fit
index such as RMSEA.
Next, the LV's should be estimated in triplets--for 4 LV's this will produce 4 measurement models with 3 LV's in each. Each of these measurement models should be estimated using steps a) through h) above.
Then, this process of estimating larger and larger combinations of LV's using steps a) through h) should be repeated until the full measurement model, with all the LV's present, is estimated (a sketch of this loop follows below).
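A sketch of this expanding search, again assuming the semopy interface used earlier; the measure names and items are hypothetical, and the number of k-LV models is C(n, k) (e.g., 6 pairs and 4 triplets for 4 LV's):

    import itertools
    import math
    import pandas as pd
    import semopy

    data = pd.read_csv("survey.csv")                  # hypothetical
    measures = {"X": ["x1", "x2", "x3"],              # hypothetical measures
                "Y": ["y1", "y2", "y3"],
                "Z": ["z1", "z2", "z3"],
                "W": ["w1", "w2", "w3"]}

    for k in range(2, len(measures) + 1):
        print(f"{k}-LV measurement models to fit: {math.comb(len(measures), k)}")
        for subset in itertools.combinations(measures, k):
            desc = "\n".join(f"{m} =~ " + " + ".join(measures[m]) for m in subset)
            model = semopy.Model(desc)    # LV covariances free by default (assumed)
            cols = [c for m in subset for c in measures[m]]
            try:
                model.fit(data[cols])     # then apply checks a) through h)
            except Exception as err:      # estimation failures often signal NPD
                print("problem fitting", subset, "--", err)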
At this point, at least one measure should have been found to be problematic. Then, please contact me for the next steps.
If the FMM is PD and possibilities 1-3 above have been checked, reverify that the
structural model reflects the hypotheses exactly, and that the structural model is specified
exactly as the FMM, except that the hypothesized structural paths have replaced several
MM correlations. Then, try dropping an LV from the structural model. If the structural
model is still NPD, try dropping a different LV. If repeating this process remedies NPD,
please email me for the next steps. If repeating this process (dropping each measure one at a time) does not remedy NPD, please email me for different next steps.
REFERENCES
Anderson, James C. and David W. Gerbing (1988), "Structural Equation Modeling in
Practice: A Review and Recommended Two-Step Approach," Psychological Bulletin,
103 (May), 411-23.
Browne, Michael W. and Robert Cudeck (1993), "Alternative Ways of Assessing Model Fit," in Testing Structural Equation Models, Kenneth A. Bollen and J. Scott Long, eds., Newbury Park, CA: SAGE Publications.
Jöreskog, Karl G. (1993), "Testing Structural Equation Models," in Testing Structural Equation Models, Kenneth A. Bollen and J. Scott Long, eds., Newbury Park, CA: SAGE Publications.