v.2 N.K.Bowen, 2014

TESTING FOR DIFFERENCES IN MEASUREMENT (CFA) MODELS USING MPLUS'S INVARIANCE SHORTCUT CODE (WLSMV)

Measurement invariance occurs when the parameters of a measurement model are statistically equivalent across two or more groups. The parameters of interest in measurement invariance testing are the factor loadings and indicator intercepts (when ML estimation is used) or indicator thresholds (when WLSMV is used with categorical variables). Invariance in residual variances or scale factors can also be tested, but there is consensus that demonstrating invariance across groups on these parameters is not necessary.

Researchers typically hope to demonstrate that their measures are invariant across groups. When loadings and intercepts or thresholds are invariant across groups, scores on latent variables can be validly compared across the groups, and the latent variables can be used in structural models hypothesizing relationships among latent variables.

A recent update to Mplus (version 7.11) provides a convenient shortcut for conducting a sequence of increasingly restrictive invariance tests. This handout gives some non-Mplus-specific information on invariance testing, then information on Mplus invariance testing without the shortcut, and finally invariance testing using the Mplus shortcut. We focus on tests of invariance with measurement models based on ordinal variables, the most common type of scale variables in social science research. When scales are based on ordinal measures, it is appropriate to use WLSMV estimation and a polychoric correlation matrix for the analysis. Specifying that observed indicator variables are ordinal automatically leads to the use of WLSMV and the polychoric correlation matrix.

Note that Mplus allows two different approaches to the specification of multiple group CFAs: delta and theta parameterization. Delta is the default, but we recommend using theta.
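As a point of reference, a minimal single-group input file for this kind of analysis might look like the sketch below. The data file name is hypothetical, and the variable and factor names anticipate the example used later in this handout:

```
TITLE:    CFA with ordinal indicators (sketch)
DATA:     FILE IS "C:\MyMplus\mydata.dat";   ! hypothetical file name
VARIABLE: NAMES ARE raceth c24 c26 c27 c28;
          USEVARIABLES ARE c24 c26 c27 c28;
          CATEGORICAL ARE c24 c26 c27 c28;   ! ordinal indicators: WLSMV and polychorics follow
ANALYSIS: PARAMETERIZATION = THETA;          ! theta rather than the delta default
MODEL:    SSFR BY c24 c26 c27 c28;
```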
Theta parameterization lets you obtain information on residual variances (unexplained variance in the observed indicators of factors). Social work researchers are typically interested in error variances, not scale factors. Add the line "PARAMETERIZATION IS THETA;" to the ANALYSIS section of your input file to request theta parameterization.

Step 1: Test your hypothesized model separately with each group.

Some invariance scholars recommend obtaining the best model for each group separately before beginning invariance testing. To run separate analyses based on groups represented by values on a variable in your data set, use the following syntax in the VARIABLE section of the code, substituting your variable's name and value as appropriate. For example, the code below specifies that the analysis should use only cases that have a value of 2 on the race/ethnicity variable in the dataset. After running the model with this group, the value would be changed to 1 or some other value to run the model with another group.

USEOBSERVATIONS ARE (raceth EQ 2);

(Note: USEOBSERVATIONS is one word; there is no space after USE.)

If data for your groups are in separate data files, specify the data file for one group at a time in the usual way:

DATA: FILE IS "C:\MyMplus\datafilewhite.dat";

Seek a good-fitting model for each group, although it may be acceptable to have marginal fit at this stage (Raykov, Marcoulides, & Li, 2012). The model for one group may include correlated errors that do not appear in other groups, but the pattern of factors and loadings should be the same across groups in order to proceed to multiple group modeling. (See Byrne, Shavelson, & Muthén, 1989, however, for a more relaxed approach.)

Step 2: Specify a multiple group model.

The model may include slight variations for each group if they were identified in the individual group tests. Multiple group models can be run from multiple datasets or based on values for a variable in the dataset.
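Combined, the Step 1 pieces might look like the following sketch (variable names hypothetical); after the run, change the 2 to another value to analyze the next group:

```
VARIABLE: NAMES ARE raceth c24 c26 c27 c28;
          USEVARIABLES ARE c24 c26 c27 c28;
          CATEGORICAL ARE c24 c26 c27 c28;
          USEOBSERVATIONS ARE (raceth EQ 2);   ! keep only cases with raceth = 2
ANALYSIS: PARAMETERIZATION = THETA;
MODEL:    SSFR BY c24 c26 c27 c28;
```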
To specify a multiple group model based on values on a variable in the dataset, use code like the following. The numbers in parentheses must match values on the grouping variable (here, raceth) in the dataset. This code goes in the VARIABLE section, because the variable "raceth" here has a special function.

GROUPING IS raceth (1 = WH 2 = AA);

According to the online Mplus user's guide, the way to specify that data are in two files is:

FILE (white) IS white.dat;
FILE (AA) IS AA.dat;

When data are all in one file, the "GROUPING IS" code is all it takes to let Mplus know you want to do a multiple group analysis. If the models for your groups are the same, simply enter your model code as if you were doing a single group model. For example, the code below specifies that a factor called SSFR (social support from friends) is measured by 4 observed indicators in both groups:

MODEL: SSFR BY c24 c26 c27 c28;

If the factor model is the same for all groups, no further specification is required. However, if, for example, a correlated error was found between c24 and c25 when the African American group was tested individually, that difference from the overall model could be specified in the multiple group model with code referring to just the African American group, as follows:

MODEL: SSFR BY c24 c26 c27 c28;
MODEL AA: c24 WITH c25;

(Note that the MODEL AA: line does not turn blue in the Mplus editor because it is a subcommand under MODEL:.)

By Mplus default (without the shortcut), the code above will test a highly constrained model: one in which factor loadings and thresholds are constrained to be equal across groups, residual variances (theta parameterization) or scale factors (delta parameterization) are fixed at one in the first group and free in the second, and latent factor means are fixed at 0 in the first group and free in the second group. The AA group model would also have its freely estimated correlated error.
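Assembled, a Step 2 multiple group input file based on a grouping variable might look like this sketch (the data file name is hypothetical):

```
DATA:     FILE IS "C:\MyMplus\mydata.dat";    ! hypothetical file name
VARIABLE: NAMES ARE raceth c24 c26 c27 c28;
          USEVARIABLES ARE c24 c26 c27 c28;
          CATEGORICAL ARE c24 c26 c27 c28;
          GROUPING IS raceth (1 = WH 2 = AA);  ! values must match the data
ANALYSIS: PARAMETERIZATION = THETA;
MODEL:    SSFR BY c24 c26 c27 c28;
```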
To compare this constrained model to a less constrained model, we would first have to specify, in many lines of syntax, a model in which the default constraints on thresholds and/or loadings were all freed (except for those that need to be fixed or constrained for identification purposes). We would need to run this less restrictive model first in order to use the WLSMV DIFFTEST mechanism. The following code would be included in the syntax for the less restrictive model. It requests that a small file be saved with information on the current model, to be used later when running the more restrictive model:

SAVEDATA: DIFFTEST = freemodel.dat;

Then we would run the more constrained model, this time including "DIFFTEST = freemodel.dat;" under the ANALYSIS command. These steps give us the chi-square difference test results we need to determine whether model fit is significantly worse with the equality constraints.

BUT, there is a new and easier way to test measurement invariance! Version 7.11 has a shortcut that allows us to simultaneously run and compare chi-squares for a configural model, a metric model, and a scalar model, all in one analysis.
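The two-run DIFFTEST sequence described above can be sketched as follows; the model statements are abbreviated, and freemodel.dat is the file name used in this handout:

```
! Run 1: the LESS restrictive model
ANALYSIS: PARAMETERIZATION = THETA;
MODEL:    . . . .                        ! loadings/thresholds freed, except those needed for identification
SAVEDATA: DIFFTEST = freemodel.dat;      ! save information for the difference test

! Run 2: the MORE restrictive (nested) model
ANALYSIS: PARAMETERIZATION = THETA;
          DIFFTEST = freemodel.dat;      ! compare this model's fit with Run 1
MODEL:    . . . .                        ! loadings and thresholds constrained across groups
```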
You specify the grouping variable or two data files as before, AND, in the ANALYSIS section of the code, type:

MODEL IS CONFIGURAL METRIC SCALAR;

From Dimitrov (2010) we have the following definitions of these levels of invariance:

Configural invariance: the same pattern of factors and loadings across groups, established before conducting measurement invariance tests.

Measurement invariance:
• weak or metric invariance: invariant factor loadings
• strong or scalar invariance: invariant factor loadings and intercepts (ML) or thresholds (WLSMV)
• strict invariance, or invariance of uniquenesses: invariant residual variances (not necessary for demonstrating measurement invariance)

With the shortcut, Mplus specifies the configural model as having loadings and thresholds free in both groups, except the loading for the referent indicator, which is fixed at 1.0 in both groups. The means of factors in both groups are fixed at 0, while their variances are free to vary. Residual variances (with theta parameterization) or scale factors (with delta parameterization) are fixed at 1.0 in all groups.

With the shortcut, Mplus specifies the metric model as having loadings constrained to be equal across groups, except the loading for the first indicator of a factor, which is fixed at 1.0 in both groups. Thresholds are generally allowed to vary across groups, but certain thresholds have to be constrained in order for the model to be identified. Therefore, the first two thresholds of the referent indicator are constrained to be equal across groups, and the first threshold of each other indicator on a factor is constrained to be equal. Note that you cannot run a metric model with binary indicator variables because it cannot be identified: there is only one threshold for a binary variable. With the shortcut metric model, the mean of the first factor is fixed at 0, other factor means are free to vary, and factor variances are free to vary.
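In context, the ANALYSIS section for the shortcut might read:

```
ANALYSIS: PARAMETERIZATION = THETA;
          MODEL IS CONFIGURAL METRIC SCALAR;   ! run and compare all three models in one analysis
```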
Some researchers, and the online Mplus User's Guide, suggest evaluating only the configural/scalar comparison. They contend that lambdas and the thresholds of their indicators should only be freed or constrained in tandem. In this case, the code "MODEL IS CONFIGURAL SCALAR;" can be used, and searches for sources of non-invariance will involve freeing lambdas and their thresholds at the same time. Note, too, that any one of the three invariance models can be run by itself with the "MODEL IS" line under ANALYSIS.

Step 3: Run the Model

With the example from above (MODEL: and MODEL AA:), we will get output comparing the fit of the three models, which will guide us to our conclusions or next steps. Specifically, before you see detailed fit information in the output for each model, you will see:

MODEL FIT INFORMATION

Invariance Testing

Model        Number of Parameters   Chi-square   Degrees of Freedom   P-value
Configural   32                     4.919        4                    0.2957
Metric       29                     7.763        7                    0.3540
Scalar       22                     26.491       14                   0.0224

Models Compared             Chi-square   Degrees of Freedom   P-value
Metric against Configural   3.571        3                    0.3117
Scalar against Configural   21.589       10                   0.0173
Scalar against Metric       20.653       7                    0.0043

Step 4: Interpreting the Model Comparison Output

A limited number of scenarios are possible in the Invariance Testing output.

Scenario 1: All models have good fit, and invariance is supported at each step.

Scenario 2: One of the models does not have good fit. We need to demonstrate good fit of the model at each step, an issue separate from the invariance tests. Not only would we examine the chi-square statistic for the models above, but we would also evaluate any other fit indices we have pre-specified. Fit indices pertaining to the multiple group configural model, for example, can be found in the output section called MODEL FIT INFORMATION FOR THE CONFIGURAL MODEL.
If the fit statistics are not adequate for any CFA model in our sequence of increasingly constrained model tests, it makes no sense to proceed to the next model: first, because any less constrained model should have better fit than the next, more constrained model; and second, because if our measurement model at any step is unacceptable, it makes no sense to examine further whether parameters differ across groups.

Scenario 3: Fit worsens from the configural to the metric level. If there is non-invariance at the metric level compared to the configural level (meaning constraining lambdas leads to worse fit), the researcher can choose to search for the source(s) of worse fit by testing the loadings of one factor at a time as a set and/or testing one loading at a time. If only a relatively small number of loadings are non-invariant, it is possible to proceed to test the effects of constraining thresholds in this new model. This sequence would be similar to (but simpler than) the search for non-invariant thresholds described in Step 5 below.

Scenario 4: Fit does not significantly worsen when lambdas are constrained, but it does get worse when thresholds are constrained. In this situation, we can claim metric, or weak, invariance of our measure and stop testing, but weak invariance is not desirable. Alternatively, we can do additional tests to identify the source(s) of non-invariance. If only a small number of thresholds are non-invariant, we could claim that our measure is "partially invariant" (Byrne, Shavelson, & Muthén, 1989; Dimitrov, 2010, who says non-invariance in up to 20% of parameters is acceptable).

Scenario 4 is what we have in our example. The "Metric against Configural" line above indicates that constraining factor loadings did not significantly worsen fit (p of the chi-square change > .05). However, the "Scalar against Metric" line indicates that constraining thresholds across groups worsened fit (p of the chi-square change < .05).
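If a search at the metric level is needed (Scenario 3), the same two-run DIFFTEST logic applies to loadings. Below is a sketch of the less restrictive run, freeing one loading in the second group; the choice of c27 and the difftest file name are hypothetical:

```
ANALYSIS: PARAMETERIZATION = THETA;
          MODEL IS METRIC;          ! loadings constrained across groups
MODEL:    SSFR BY c24 c26 c27 c28;
MODEL AA: SSFR BY c27*;             ! free the c27 loading in the AA group
SAVEDATA: DIFFTEST = L27free.dat;   ! for the follow-up run with the loading re-constrained
```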
We cannot conclude that our measure has scalar, or strong, invariance. We can, however, determine the extent of the non-invariance in the thresholds.

Step 5: Searching for Sources of Non-Invariance, Approach 1

If we pre-specified that we would be testing thresholds and lambdas separately (as suggested by Millsap & Yun-Tein, 2004), we could take advantage of the shortcut code in the following way, even though we are going beyond the shortcut to look for individual non-invariant parameters.

5a. Mplus's WLSMV DIFFTEST requires that the less restrictive model be run first; the "nested" model is run second. Therefore, we would start with the less constrained model: the SCALAR model plus thresholds freed for one indicator.

ANALYSIS: . . . .
MODEL IS SCALAR;   (lambdas and thresholds constrained)

MODEL: SSFR BY c24 c26 c27 c28;
MODEL AA: [C28$2* C28$3*];   (threshold 1 for c28 remains constrained for identification; the other 2 are freed)

SAVEDATA: DIFFTEST = Ts28free.DAT;   (save a difftest file to use in the next analysis)

There is no need to even look at the output; we just wanted to get that difftest file saved.

5b. Then run the more constrained SCALAR model and use DIFFTEST to compare its fit to the prior model.

ANALYSIS: . . . .
MODEL IS SCALAR;   (the model with lambdas and thresholds constrained)
DIFFTEST = Ts28free.DAT;   (compares the fit of this model with the prior model's fit)

MODEL: SSFR BY c24 c26 c27 c28;   (no thresholds freed)

SAVEDATA: DIFFTEST = SCALAR.DAT;   (save fit information for the next comparison)

The results:

Chi-Square Test for Difference Testing
Value   Degrees of Freedom   P-Value
4.545   3                    0.2083

Constraining the thresholds of C28 did not worsen fit, so we'll move on to another indicator. We know the source of bad fit is out there somewhere! Repeat Steps 5a and 5b until the problem indicator is identified. The tests for invariance in the thresholds of C27 and C26 had the same result as the test with C28.
Each time, we compared the more restrictive SCALAR model (with all lambda and threshold constraints) to a less restrictive model, one with the thresholds freed for one indicator. Each time the chi-square difference test was non-significant, we re-constrained the thresholds before continuing our search.

5c. Look for non-invariance of thresholds within one indicator.

We now know that the problem lies somewhere among the thresholds for C24. Instead of testing all of them together (we already know the difftest would be significant), let's test them one at a time.

ANALYSIS: . . . .
MODEL IS SCALAR;

MODEL: SSFR BY c24 c26 c27 c28;   (lambdas and thresholds constrained)
MODEL AA: [C24$1];   (except: one threshold for C24 is freed)

SAVEDATA: DIFFTEST = TAU124free.DAT;   (save a difftest file to use in the next analysis)

According to the difftest results below, threshold 1 for C24 passes the test! We can re-constrain it and continue our search.

Chi-Square Test for Difference Testing
Value   Degrees of Freedom   P-Value
2.994   1                    0.0836

The comparison of the SCALAR model with a model with threshold 2 for indicator C24 freed yields the following difference test information:

Chi-Square Test for Difference Testing
Value    Degrees of Freedom   P-Value
13.197   1                    0.0003

Threshold 2 is non-invariant across groups. Fit gets statistically significantly worse when this threshold is constrained to be equal across groups. We need to allow this parameter to remain free in the second group in our final model.

We have one more threshold to test. To make sure that the order of our testing does not affect our conclusions, we will test threshold 3 the same way we have tested the others: against the fully constrained SCALAR model (i.e., with threshold 2 constrained again). We get the following results:

Chi-Square Test for Difference Testing
Value   Degrees of Freedom   P-Value
0.159   1                    0.6904

The model with threshold 3 constrained does not have significantly worse fit than the model with it free.
It looks as though threshold 2 for item C24 is the sole culprit! One non-invariant parameter out of 3 lambdas and 12 thresholds is not bad, so when we move on to latent variable modeling with the "social support from friends" variable, we will model the non-invariant parameter but treat the latent variable as invariant across the groups we tested.

Step 5: Searching for Sources of Non-Invariance, Approach 2

If we had pre-specified that we believed it was best to test lambdas and thresholds together as a set instead of separately (Muthén & Muthén, 1998-2012), we would have conducted the tests above with code releasing one indicator's lambda and thresholds at the same time. Because c24 is the referent indicator for SSFR, we would have to make another indicator the referent when testing the invariance of c24's parameters. Also, we cannot free all of an indicator's thresholds at once; we would have an identification problem. So, in our example, we free any two of our three thresholds at a time to test the effects of freeing the lambda and thresholds. If, with any combination of thresholds free, we find a significant deterioration of fit, we model the lambda and those thresholds as non-invariant.

MODEL: SSFR BY c24* c26@1 c27 c28;   (free the default of c24@1 and fix the loading for c26 at 1.0 instead)
MODEL AA: SSFR BY c24* c26@1;
[C24$1* C24$2*];   (threshold 3 for c24 remains constrained for identification; the other 2 are freed)

With this code, the first loading will be different in the two groups; the second loading equals 1.0 in both groups; the third and fourth loadings are constrained to be equal across groups. The first two thresholds of c24 vary across groups; the third is constrained across groups.

If we accept the rule that loadings and thresholds should be freed or constrained together, non-invariance found at this step would suggest we continue to model c24 this way, without searching for non-invariance of individual parameters within the set.
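Whichever approach is used, the identified non-invariant parameters are carried into the final model. Under Approach 1, for example, a sketch of the final partial-invariance model would free only threshold 2 of c24 in the AA group:

```
ANALYSIS: PARAMETERIZATION = THETA;
          MODEL IS SCALAR;     ! lambdas and thresholds constrained across groups
MODEL:    SSFR BY c24 c26 c27 c28;
MODEL AA: [C24$2*];            ! the one non-invariant threshold left free in the AA group
```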
Cheung and Rensvold (1999) discuss the importance of doing the tests with different referent indicators. Sass (2011) is a good source on issues involved in invariance testing with ordinal data. Among other things, he explains how to interpret non-invariance of parameters.