SEM Model Fit: Introduction
David A. Kenny
January 25, 2014

Definition
Fit refers to the ability of a model to reproduce the data (usually the variance-covariance matrix). A good-fitting model is one that is reasonably consistent with the data and so does not necessarily require any major respecification. A good-fitting measurement model is required before interpreting the causal paths of the structural model.

However…
• A good-fitting model is not necessarily a good model.
• For instance, a model all of whose parameters are zero is likely to be a “good-fitting” model.
• Additionally, models with nonsensical results (e.g., paths that are clearly the wrong sign) and models with poor discriminant validity or Heywood cases can be “good-fitting” models.
Parameter estimates, as well as the fit statistics, must be carefully examined to determine whether one has a reasonable model.

Additionally
• A good-fitting model can still be improved.
• You can make a good-fitting model a better-fitting model.
• You can also make the model simpler without appreciably changing the fit.
• A good-fitting model is not necessarily the best model.

Model Fit versus Comparison
• Two very different questions
  – Model Fit: Is a given model a good-fitting model?
  – Model Comparison: Which of two models is better fitting?

How Big a Sample Size?
• Rules of Thumb
  – Ratio of N to the Number of Free Parameters
  – Absolute N
• Power Analysis

Ratio of N to Number of Free Parameters
• Tanaka (1987): 20 to 1, but that is unrealistically high.
• Bentler & Chou (1987): 5 to 1
• Many published studies fail to meet this goal!
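
The ratio rule of thumb can be checked with a few lines of arithmetic. The sketch below is only an illustration: the function names and the example study (N = 200, 25 free parameters) are hypothetical and do not come from the presentation. Only the 20-to-1 and 5-to-1 ratios do.

```python
def n_to_parameter_ratio(n: int, free_parameters: int) -> float:
    """Return the ratio of sample size N to the number of free parameters."""
    if free_parameters <= 0:
        raise ValueError("free_parameters must be positive")
    return n / free_parameters


def meets_rule_of_thumb(n: int, free_parameters: int, minimum_ratio: float) -> bool:
    """True when N per free parameter reaches the chosen minimum ratio."""
    return n_to_parameter_ratio(n, free_parameters) >= minimum_ratio


# Hypothetical study (assumed numbers): 200 cases, 25 free parameters.
ratio = n_to_parameter_ratio(200, 25)
print(ratio)                                # 8.0
print(meets_rule_of_thumb(200, 25, 5.0))    # 5 to 1 (Bentler & Chou): True
print(meets_rule_of_thumb(200, 25, 20.0))   # 20 to 1 (Tanaka): False
```

Such a check is only a rough screen; a power analysis remains the better guide to whether a sample is large enough.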

Absolute N
• 200 is seen as a goal for SEM research.
• Lower sample sizes can be used for
  – Models with no latent variables
  – Models where all loadings are fixed (usually to one)
  – Models with strong correlations
  – Simpler models
  – Models for which there is a practical upper limit on N (e.g., countries or years as the unit)

Power Analysis
• The best way to determine if you have a large enough sample is to conduct a power analysis.
• Either use the Satorra and Saris (1985) method or conduct a simulation.
• To determine the power to detect a poor-fitting model, you can use Preacher and Coffman’s web-based calculator.

χ² Test
• For models with about 75 to 200 cases, the chi square test is a reasonable measure of fit. But for models with more cases (e.g., 400 or more), the χ² test is likely to be statistically significant.
• This is why fit indices were invented. They provide a way to claim that one has a good model, despite the fact that the χ² test is statistically significant.
• Sometimes χ² is more interpretable if it is transformed into a Z value. The following approximation can be used:
  Z = √(2χ²) - √(2df - 1)

More on χ²
An old measure of fit is the chi square to df ratio, or χ²/df. A problem with this fit index is that there is no universally agreed-upon standard as to what is a good- and a bad-fitting model. Note, however, that two very popular fit indices, TLI and RMSEA, are largely based on this old-fashioned ratio.
The chi square test is too liberal (i.e., too many Type I errors) when variables have non-normal distributions, especially distributions with kurtosis. Moreover, with small sample sizes, there are too many Type I errors. Note that the χ² test is an asymptotic test, and so it works best with large sample sizes.

Typology of Fit Indices
• This typology is very different from the mainstream definitions. Please note the differences.
• Typology
  – Incremental
  – Absolute
  – Comparative

Incremental Fit Indices
An incremental (sometimes called relative) fit index is analogous to R², and so a value of zero indicates having the worst possible model and a value of one indicates having the best possible model. So my model is placed on a continuum. In terms of a formula, it is
  (Worst Possible Model – My Model) / (Worst Possible Model – Best Possible Model)

The Best Model
• The standard definition of the best possible model is one in which χ² equals its degrees of freedom (the expected value of χ² given the null hypothesis of perfect fit).
• An older definition (Bentler-Bonett) is to assume the best model has a χ² of zero, but that definition ignores sampling error.

The Worst Model
• The worst possible model is called the null or independence model, and the usual convention is to allow all the variables in the model to have variation but no correlation.
• The usual null model allows the means to equal their actual values. However, for growth curve models, the null model should set the means as all equal to each other, i.e., no growth.

df of the Null Model
• The degrees of freedom of the null model are k(k – 1)/2, where k is the number of variables in the model.
• If the null model sets the means equal, as in a growth-curve model, its df are (k + 2)(k – 1)/2.

Alternative Null Models
• Alternative null models might be considered (but are almost never employed).
  – One alternative null model is that all latent variable correlations are zero.
  – Another is that all exogenous variables are correlated but the endogenous variables are uncorrelated with each other and the exogenous variables. In fact, this is the null model in Mplus when the exogenous variables are measured.
• O’Boyle and Williams (2011) suggest two different null models for the measurement and structural models.

Absolute Fit Indices
• An absolute measure of fit presumes that the best-fitting model has a fit of zero.
The measure of fit determines how far the model is from perfect fit.
• These measures of fit are typically “badness” measures of fit in that a larger number implies worse fit.
• Common absolute fit indices are SRMR and RMSEA.

Comparative Fit Indices
• A comparative measure of fit is only interpretable when comparing two different models and cannot be used to determine whether a given model is good-fitting. This term is unique to this presentation in that these measures are more commonly called absolute fit indices. However, it is helpful to distinguish them from absolute indices, which do not require a comparison between two models.
• One advantage of comparative fit indices is that often they can be computed for models that are just-identified.
• Examples are AIC, BIC, and SABIC.

Controversy about Fit Indices
• There is considerable controversy about fit indices.
• Some researchers do not believe that fit indices add anything to the analysis (e.g., Barrett, 2007) and hold that only the chi square should be interpreted. The worry is that fit indices allow researchers to claim that a misspecified model is not a bad model.
• Others (e.g., Hayduk, Cummings, Boadu, Pazderka-Robinson, & Boulianne, 2007) argue that cutoffs for a fit index can be misleading and subject to misuse.

Consensus View
• Most analysts believe in the value of fit indices, but caution against strict reliance on cutoffs.
• Particularly problematic is “cherry picking” a fit index. That is, you compute many fit indices and you pick the one index that allows you to make the point that you want to make. If you decide not to report a popular index (e.g., the TLI or the RMSEA), you need to give a good reason why you are not.

Compute a Fit Index?
• Kenny, Kaniskan, and McCoach (2014) have argued that fit indices should not even be computed for models with small degrees of freedom.
• Rather, for these models, the researcher should locate the source of specification error by determining what parameter could be added to the model and then testing that parameter.

Additional Presentations
• Measures of fit
• Factors affecting measures of fit
• References (pdf)
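
Three formulas stated earlier in this presentation (the Z approximation for χ², the df of the null model, and the incremental fit continuum) can be sketched in a few lines. This is a minimal illustration, not a replacement for SEM software output. The function names and all numeric inputs are hypothetical. The sketch also assumes that the three inputs to the incremental index are values of the same "badness" measure, which the slides leave open; the example uses the χ²/df ratio, whose best possible value is 1 under the definition that χ² equals its df.

```python
import math


def chi_square_to_z(chi_square: float, df: int) -> float:
    """Approximate Z for a chi square: Z = sqrt(2 * chi2) - sqrt(2 * df - 1)."""
    if chi_square < 0 or df < 1:
        raise ValueError("chi_square must be >= 0 and df must be >= 1")
    return math.sqrt(2.0 * chi_square) - math.sqrt(2.0 * df - 1.0)


def null_model_df(k: int, equal_means: bool = False) -> int:
    """df of the null model for k variables.

    Usual null model: k(k - 1)/2.
    Null model with all means set equal (growth curve): (k + 2)(k - 1)/2.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if equal_means:
        return (k + 2) * (k - 1) // 2
    return k * (k - 1) // 2


def incremental_fit(worst: float, mine: float, best: float) -> float:
    """(Worst - My Model) / (Worst - Best), all on one badness measure (assumption)."""
    if worst == best:
        raise ValueError("worst and best must differ")
    return (worst - mine) / (worst - best)


# Hypothetical values, chosen only for illustration.
print(round(chi_square_to_z(50.0, 30), 3))       # Z for chi2 = 50 with 30 df
print(null_model_df(6))                          # 6 variables: 15
print(null_model_df(6, equal_means=True))        # 6 variables, equal means: 20
# chi2/df ratios: null model 500/10 = 50, my model 12/6 = 2, best model = 1
print(round(incremental_fit(50.0, 2.0, 1.0), 3))
```

Keeping the incremental index generic in its three inputs mirrors the continuum description in the slides; a specific index is obtained by choosing the badness measure and the definition of the best model.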