QUESTIONS of the MOMENT... "How does one remedy a 'not Positive Definite' message?"

(The APA citation for this paper is Ping, R.A. (2012). "How does one remedy a 'not Positive Definite' message?" [on-line paper]. http://www.wright.edu/~robert.ping/NotPD1.doc. An earlier version of this paper, Ping, R.A. (2009). "How does one remedy a 'not Positive Definite' message?" [on-line paper]. http://www.wright.edu/~robert.ping/NotPD.doc, is also available.)

Few things are as frustrating, after gathering and entering survey data for a new model, and creating and debugging the estimation software program for the model, as a first bug-free software run that produces a "not Positive Definite" message ("Ill Conditioned" in exploratory factor analysis). The definition of this problem provides little help, and there is little guidance for remedying matters besides "check the data and the data correlation matrix," "delete items," or "use Ridge Option estimates" (the last of which produces biased parameter estimates, standard errors, and fit indices).

Besides data entry errors,[1, 2] experience suggests that in real-world survey data the causes of a Not Positive Definite (NPD) message usually are 1) collinearity among the items, 2) measure(s) inconsistency, 3) inadequate starting values, and 4) model misspecification.

(1) Collinearity among the items is the easiest cause to investigate. Specifically, SAS, SPSS, etc. item correlations of 0.9 or above should be investigated by removing the item(s) involved to see if that remedies NPD. In this case, dropping or combining the highly correlated items may remove the NPD message.[3] Occasionally in real-world data, NPD can be the result of two or more parallel measures (measures of the same construct). With NPD and parallel measures, the measure(s) with the lesser psychometrics (reliability/validity) probably should be abandoned, and NPD should be reassessed.

(2) Measure Inconsistency: In real-world data, NPD can accompany a lack of consistency in the Anderson and Gerbing (1988) sense (lack of model-to-data fit). Unfortunately, the procedure for investigating this possibility is tedious. The process begins with maximum likelihood exploratory (common) factor analysis (FA) of each measure. Specifically, each measure should be FA'd. Then pairs of measures should be FA'd, then triplets, etc. (Note that one or more measures may be multidimensional; experience suggests that usually will not produce NPD.) Typically, NPD occurs when adding a measure to a group of m measures (m < n, where n is the number of measures) that was Positive Definite (PD). When a measure is found to create NPD, its items should be "weeded" using reliability maximization.[4] In particular, after an item is dropped, NPD should be evaluated in a FA with all n of the measures. (A sketch that screens for collinear items, checks positive definiteness, and computes reliability with each item deleted appears below.)

[1] Experience suggests that random data entry errors seldom produce a "not Positive Definite" message. However, because they may create other problems later, it is always prudent to examine the data for data entry errors.
[2] There are other plausible conditions--see for example http://www2.gsu.edu/~mkteer/npdmatri.html.
[3] Deleting an item should be done with concern for the content or face validity of the resulting measure. Combining items may be less desirable than removing them because it could be argued that the resulting combined "item" is no longer an observed variable.
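To illustrate the above checks, the following minimal sketch (in Python, assuming the numpy and pandas packages; the file name survey.csv and item names such as x1 are hypothetical) flags item correlations of 0.9 or above, tests whether the item correlation matrix is positive definite by examining its eigenvalues, and reports coefficient alpha with each item deleted to guide reliability-maximization weeding. It approximates, but is not a substitute for, the SAS, SPSS, etc. procedures mentioned above.

    import numpy as np
    import pandas as pd

    def is_positive_definite(corr, tol=1e-8):
        # A symmetric matrix is positive definite when all of its
        # eigenvalues are positive; near-zero or negative eigenvalues
        # produce the NPD ("Ill Conditioned") condition.
        eigenvalues = np.linalg.eigvalsh(np.asarray(corr, dtype=float))
        return eigenvalues.min() > tol

    def high_correlations(items, cutoff=0.9):
        # List the item pairs whose correlation is at or above the cutoff.
        corr = items.corr()
        cols = list(corr.columns)
        return [(a, b, corr.loc[a, b])
                for i, a in enumerate(cols) for b in cols[i + 1:]
                if abs(corr.loc[a, b]) >= cutoff]

    def alpha_if_item_deleted(items):
        # Cronbach's alpha for each leave-one-item-out set, to show
        # which deletion most improves (or least harms) reliability.
        def alpha(df):
            k = df.shape[1]
            return (k / (k - 1)) * (1 - df.var(ddof=1).sum()
                                    / df.sum(axis=1).var(ddof=1))
        return {col: alpha(items.drop(columns=col)) for col in items.columns}

    # items = pd.read_csv("survey.csv")   # hypothetical item-level data
    # print(is_positive_definite(items.corr()))
    # print(high_correlations(items))
    # print(alpha_if_item_deleted(items[["x1", "x2", "x3", "x4"]]))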
After that, deleted items should be added back to the measures, beginning with the item contributing most to the content validity of the most "important" measure, then proceeding to the next most "important" item, etc. With each added item, NPD should be checked using FA and all n measures.

Occasionally, the above approach does not remedy NPD (it does not produce a set of n measures that is PD in a FA with all n measures) without excessive item weeding (too many items are weeded out), or without weeding an item(s) judged to be essential for face or content validity. In this case, experience suggests that weeding using Modification Indices (MIs) instead of reliability might remedy NPD. (See the EXCEL template "For 'weeding' a multi-item measure so it 'fits the data'..." on the preceding web page.) Specifically, the measure to be weeded should be estimated in a single LV measurement model (MM), and:

a) It should be estimated using covariances and Maximum Likelihood estimation.
b) The single LV MM should have one (unstandardized) loading constrained to equal 1, and all the other (unstandardized) loadings should be free (unconstrained), with estimates between 0 and 1.[5]
c) In the single LV MM, the unstandardized LV variance should be free, and the measurement model LV variance estimate should be positive and larger than its error-attenuated (i.e., SAS, SPSS, etc.) estimate.
d) All measurement error variances should be free and uncorrelated,[6] and their estimates each should be zero or positive.
e) The measurement model should fit the data very well using a sensitive fit index such as RMSEA (i.e., RMSEA should be 0.08 or less--see Browne and Cudeck 1993, Jöreskog 1993). (A sketch for computing RMSEA from a reported chi-square follows the footnotes below.)

In the unusual case that MI weeding does not remedy NPD, experience suggests that during MI weeding two items in a measure may have had practically the same MI. In this case, the removal of either item will improve model-to-data fit, and, because the deletion of the first item did not remove NPD, the second item should be deleted instead (rarely, both items should be deleted). Again, deleted items should be added back selectively to their respective measures until NPD reoccurs. (The case where NPD remains unremedied is discussed below.) For emphasis, the objective should be to find the item(s) responsible for NPD. Stated differently, each measure should retain as many items as possible.

[4] SAS, SPSS, etc. have procedures that assess the reliability of the remaining items if an item is dropped from the group.
[5] If one or more loadings in a measure is greater than one, the largest loading should be fixed at 1 and the other measure loadings should be freed, including the loading previously fixed at 1.
[6] Uncorrelated measurement errors is a classical factor analysis assumption. If this assumption is violated (e.g., to obtain model-to-data fit with an item that must not be deleted), experience suggests that the above procedure may still work. If it does not, the interested reader is encouraged to e-mail me for suggestions for their specific situation.
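For step e), if the estimation software reports only a chi-square statistic, the RMSEA point estimate can be computed from the chi-square, its degrees of freedom, and the number of cases. The following minimal Python sketch uses one common form of the formula, RMSEA = sqrt(max(chi-square - df, 0) / (df * (N - 1))); the chi-square, degrees of freedom, and sample size shown are hypothetical, and some software divides by N rather than N - 1.

    import math

    def rmsea(chi_square, df, n_cases):
        # Point estimate of the Root Mean Square Error of Approximation;
        # values of 0.08 or less suggest acceptable model-to-data fit
        # (Browne and Cudeck 1993).
        return math.sqrt(max(chi_square - df, 0.0) / (df * (n_cases - 1)))

    # Hypothetical single LV measurement model: chi-square = 53.2
    # with 27 degrees of freedom, estimated from 200 cases.
    print(round(rmsea(53.2, 27, 200), 3))   # 0.07 -> acceptable fit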
(3) Inadequate Starting Values: It is possible in real-world survey data that is actually Positive Definite (PD) to obtain a fitted or reproduced covariance matrix that is NPD. Experience suggests this usually is the result of structural model misspecification, which is discussed below, but it also can be the result of inadequate (parameter) starting values.

Inadequate starting values usually occur in the structural model, and while they can be produced by the software (LISREL, EQS, AMOS, etc.), more often they are user supplied. Fortunately, adequate starting values for LV variances and covariances can be obtained from SAS, SPSS, etc. using averaged items. Starting estimates for loadings can be obtained from maximum likelihood exploratory (common) factor analysis,[7] and regression estimates (with averaged indicators for each measure) can be used for adequate structural coefficient starting values. For emphasis, the parameters with these starting values should always be unfixed (i.e., free) (except for each measure's item with a loading of 1). (A sketch of these computations follows the footnote below.)

[7] Each measure should be factored individually, and in each individual factor analysis the resulting standardized loadings should be divided by the largest loading in that factor analysis for SEM starting values.
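As an illustration, the following minimal Python sketch (assuming numpy and pandas; the measure names, item names, file name, and EFA loadings shown are all hypothetical) computes averaged-item composites for LV variance and covariance starting values, rescales each measure's ML EFA standardized loadings by that measure's largest loading per footnote 7, and runs an OLS regression of the composites for structural coefficient starting values.

    import numpy as np
    import pandas as pd

    # Hypothetical measures: each latent variable's items, by column name.
    measures = {"X": ["x1", "x2", "x3"], "Y": ["y1", "y2", "y3"]}

    def composite_starts(data, measures):
        # Averaged-item composites; their covariance matrix supplies
        # starting values for the LV variances and covariances.
        comps = pd.DataFrame({lv: data[items].mean(axis=1)
                              for lv, items in measures.items()})
        return comps.cov(), comps

    def loading_starts(efa_loadings):
        # Divide a measure's ML EFA standardized loadings by the largest
        # loading in that measure (the item whose loading is fixed at 1).
        efa_loadings = np.asarray(efa_loadings, dtype=float)
        return efa_loadings / efa_loadings.max()

    def structural_starts(comps, y, xs):
        # OLS regression of the averaged-item composites; the slopes
        # serve as structural coefficient starting values.
        X = np.column_stack([np.ones(len(comps))] + [comps[x] for x in xs])
        coefs, *_ = np.linalg.lstsq(X, comps[y], rcond=None)
        return dict(zip(xs, coefs[1:]))   # drop the intercept

    # data = pd.read_csv("survey.csv")                # hypothetical file
    # cov_starts, comps = composite_starts(data, measures)
    # print(loading_starts([0.81, 0.74, 0.68]))       # hypothetical loadings
    # print(structural_starts(comps, "Y", ["X"]))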
(4) Model Misspecification: It is also possible with PD input to obtain a fitted, or reproduced, structural model covariance matrix[8] that is NPD because of structural model misspecification. Remedying this is very tedious. The procedure uses three steps, beginning with verifying that the structural model paths reflect the hypotheses exactly. If they do, check that the full measurement model (FMM) is PD. Next, verify that the structural model is specified exactly as the FMM, except that the hypothesized structural paths have replaced several MM correlations. Then, remove any structural model misspecification.

Specifically, estimate a full measurement model with all the measures present using steps a) through e) above with "single," "each," etc. replaced with "full." If the FMM is NPD and remedies 1-3 above have been tried, an approach that uses Anderson and Gerbing's (1988) suggestions for establishing internal and external consistency should be used. It begins with estimating each LV in its own measurement model (with no other measures present), then estimating pairs of LV's in two-measure measurement models (with only the two measures present). Next, triplets of LV's are estimated in three-measure measurement models, then quadruplets of LV's, and so on until the full measurement model is estimated.

Specifically, each of the single LV measurement models should be estimated as described in steps a) through e) above to establish "baseline" parameter estimates for each measure for later use. (If these parameter estimates are already available, this step can be skipped.) Then, the LV's should be estimated in pairs--for example, with 4 LV's, 6 measurement models, each containing two LV's, are estimated. These measurement models also should be estimated using steps a) through e) above. In addition:

f) Each LV should be specified exactly as it was in its own single LV measurement model, no indicator of any LV should load onto a different LV, and no measurement error variance of one LV should be allowed to correlate with any measurement error variance of any other LV.
g) The covariances between the LV's should be free, and in real-world data the resulting estimated variances and covariances of each LV, their loadings, and their measurement error variances should be nearly identical to those from the LV's own single LV measurement model.
h) Each of these measurement models should fit the data very well using a sensitive fit index such as RMSEA.

Next, the LV's should be estimated in triplets--for 4 LV's this will produce 4 measurement models with 3 LV's in each. Each of these measurement models should be estimated using steps a) through h) above. Then, this process of estimating larger and larger combinations of LV's using steps a) through h) should be repeated until the full measurement model with all the LV's present is estimated using steps a) through h). At this point at least one measure should have been found to be problematic. Then, please contact me for the next steps.

If the FMM is PD and possibilities 1-3 above have been checked, reverify that the structural model reflects the hypotheses exactly, and that the structural model is specified exactly as the FMM, except that the hypothesized structural paths have replaced several MM correlations. Then, try dropping an LV from the structural model. If the structural model is still NPD, try dropping a different LV. If repeating this process remedies NPD, please e-mail me for the next steps. If repeating this process, dropping each measure one-at-a-time, does not remedy NPD, please e-mail me for different next steps.

[8] NPD parameter matrices (e.g., PSI or THETA-EPS) also can occur, usually with underidentified LV's having fixed parameters or correlated measurement errors. The interested reader is encouraged to e-mail me for suggestions for their specific situation.

REFERENCES

Anderson, James C. and David W. Gerbing (1988), "Structural Equation Modeling in Practice: A Review and Recommended Two-Step Approach," Psychological Bulletin, 103 (May), 411-23.

Browne, Michael W. and Robert Cudeck (1993), "Alternative Ways of Assessing Model Fit," in Testing Structural Equation Models, Kenneth A. Bollen and J. Scott Long, eds., Newbury Park, CA: SAGE Publications.

Jöreskog, Karl G. (1993), "Testing Structural Equation Models," in Testing Structural Equation Models, Kenneth A. Bollen and J. Scott Long, eds., Newbury Park, CA: SAGE Publications.