Mplus for Windows: An Introduction and Overview Alan C. Acock Department of HDFS Oregon State University 7/2009 Intro to Mplus—Alan C. Acock 1 Mplus for Windows: An Introduction and Overview Contents Section 1: Using Mplus 1.1 Launching Mplus 1.2 Input and Output Windows 1.3 Mplus Command Structure 1.4 Selected Defaults 1.5 Commands Section 2: Exploratory Factor Analysis 2.1 EFA with Continuous Variables 2.2 Comparing two Solutions 2.3 EFA with Categorical Outcomes 2.4 Selected Results 2.5 Comparing Two Solutions 2.6 Comparison: Categorical & Continuous Section 3: Confirmatory Factor Analysis 3.1 CFA with Continuous Variables 3.2 Output and Interpretation 3.2.1 Missing value summary 3.2.2 Covariances and correlations 3.2.3 Model Fit 3.2.4 Model result 3.2.5 Residuals 3.2.6 Modification indices Section 4: EFA as an Alternative to CFA Section 5: Equality Constaints—Longitudinal CFA 5.1 Programs for testing equality constraints 5.2 Selected output Section 6: Path Analysis 6.1 Model and programs 6.2 Indirect effects Section 7: Putting it Together: CFA & SEM 7.1 Program 7.2 Output 7.3 Interpretation and modification indices Section 8: Putting it Together: EFA & SEM 8.1 Program & model 8.2 Output Section 9: Summary & Resource Intro to Mplus—Alan C. Acock ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... ....................................... 3 3 3 5 7 7 9 9 11 12 13 14 14 16 18 21 21 22 23 23 26 27 29 33 36 39 42 42 43 43 44 44 48 49 49 50 54 2 Section 1: Using Mplus 1.1 Launching Mplus 1.2. Input and Output Windows The window shown above is the input window. You write Mplus programs in this window to read the data to be analyzed and to specify your model of interest. You then save your Mplus program and select Run Mplus from the Mplus menu to submit your program to the Mplus engine for processing. ► File ► Open Open ex1.inp. This is located at c:\Mplus Examples\ex1.inp. Intro to Mplus—Alan C. Acock 3 We will utilize files from the Mplus manual for many of our examples. These typically involve simulated data. Sometimes we well assign hypothetical variable names to make these somewhat realistic. The Manual itself does not provide a substantial about of explanation of the examples and the specific output so we hope the discussion of them here will be useful at a later time when you are trying to read the manuals. Here is one screen of the data in ex1.dat (The ANALYSIS: command here is not needed in the current version of Mplus) We have labeled missing values with a -9. Easiest to pick one value that will work for all variables—can be any number or a dot. Notice we have one observation, case 13, that has a missing value on all variables. The data happens to be in a fixed format. Could be comma delimited, cvs file from Excel. Intro to Mplus—Alan C. Acock 4 Could be free format, other formats possible, but more complicated Here is the confirmatory factor analysis model we are estimating Intro to Mplus—Alan C. Acock 5 We will explain the program in a moment, but for now we will just run it to see how the interface works. ► Mplus ► Run Mplus Or, you can click the Run icon. Once Mplus has finished processing your command program, it opens an output window. The output window first displays your Mplus program. Below the Mplus program are the Mplus model results. If there is an error in your Mplus program or you want to modify your Mplus program in any way (e.g., to fit a different model to the data), you must return to the input window and you can then modify the previous commands, save the modified command file, and run Mplus once again to obtain new output. 1.3 Mplus Command Structure After you have launched Mplus, you may build a command file. There are nine sets of Mplus commands (ususally only a few of these are used, but some have numerous subcommands) : 1. 2. 3. 4. 5. 6. 7. 8. 9. TITLE: (optional unless you want to know what the file is intended to do) DATA: (required), VARIABLE: (required), DEFINE: (some data transformations are available) SAVEDATA: (used for specialized applications) ANALYSIS: (for special analyses such as EFA MODEL: (a series of equations) OUTPUT: (many options are available) MONTECARLO: (used for simulations, power analysis) Rules: 1. All commands (Title, Data, etc.) must begin on a new line. 2. All command names must be followed by a colon. 3. For e.g., Title: Once you enter the colon, the key word becomes blue. Intro to Mplus—Alan C. Acock 6 4. 5. 6. 7. Semicolons separate command options—similar to SAS. The records in the input setup must be no longer than 80 columns. They can contain upper and/or lower case letters and tabs. Only variable names are case sensitive. (Y1 and y1 are different variables) 1.4 Selected Defaults The current version of Mplus assumes that you either have no missing values or are using full information maximum likelihood estimation and assuming missing values are missing at random (MAR) Parameters such as loadings can be fixed o Many loadings are fixed at 0.0 in the CFA models because the item should not load on the factor. o There is no path from F1 to y4 in our figure. Fixed parameters can be “freed,” meaning you will estimate them. o We could add a path from to Y4 or o Let E1 be correlated with E4 Fixed parameters are required to stay at a specified value, such as 0.0 or 1.0. All free parameters are put into a vector and iterations change values of these free parameters, until the model’s fit is optimal. Unless we tell it otherwise, Mplus will fix the first indicator’s loading at 1.0 as the reference indicator (except for EFA). o For example, F1y1 and F2y4 have fixed loadings of 1.0 by default. o One way to change reference indicator is to reorder variables, e.g. o F1 by y2 y1 y3 makes y2 reference indicator o Good to pick a strong indicator as the reference indicator—don’t get a significance test for reference indicator 1.5 Commands The TITLE command allows you to specify a title that Mplus will print on each page of the output file. This can go on and on for many lines and usually should. Everything is a Title until a command name appears at the start of a new line. I like to put the file name as the first line of a title. The DATA command specifies where Mplus will locate the data, the format of the data, and the names of variables. At present, Mplus will read the following file formats: tab-delimited text, Intro to Mplus—Alan C. Acock 7 space-delimited text, and comma-delimited text. The input data file may contain records in free field format or fixed format. If you are using data stored in another form (e.g., Stata, SAS, SPSS, or Excel), you will need to convert it to one of the formats with which Mplus can work before you read it into Mplus. SAS and SPSS require you to write a file out an ASCII (plain text) file. If you have the data in Stata you can use stata2mplus to set things up for you. You can obtain it using findit stata2mplus and install the program. Here is the Stata session: stata2mplus using "I:\flash\HDFS630\mplus\classnsfh", replace This creates two files: classnsfh.inp that will run a basic analysis in Mplus and classnsfh.dat, a comma delimit ASCII file that Mplus can read with all missing values coded/recoded as -9999. 1,1,1,1,2,1,1,1,1,1,1,1,2,1,1 3,2,2,3,2,3,2,2,2,2,2,2,2,2,2 1,1,1,-9999,1,2,2,1,2,2,2,2,2,1,1 3,3,3,3,3,3,3,3,3,3,3,3,3,3,3 2,2,1,2,1,2,2,1,1,2,1,-9999,1,1,1 The DATA command tells Mplus where the data is stored. If you store the program and the data in the same folder, you don’t need to include the path. Recommended to make a separte folder for each project such. A long file reference can exceed the character limits per line in Mplus. Mplus uses all available data by default. If you want to use listwise deletion you must specify this under the Data command. Listwise = on; The VARIABLE command names variables. These must be in the identical order to the way Stata/SAS/SPSS wrote the data file. (common mistake) Intro to Mplus—Alan C. Acock 8 Mplus variable names may not have more than 8 characters. Change variable names to be 8 characters or less or you will get error messages. Variable names are case sensitive. Must be consistent (common mistake) The ANALYSIS command tells Mplus what type of analysis to perform. Many analysis options are available. Some of these such as Type = EFA make additional commands unnecessary. SECTION 2: Exploratory Factor Analysis 2.1 EFA with Continuous Variables TITLE: DATA: VARIABLE: ANALYSIS: ! ! OUTPUT: efa1.inp This is an example of an exploratory factor analysis with continuous factor indicators FILE IS "c:\Mplus Examples\efa1.dat"; NAMES ARE y1-y12; TYPE = EFA 1 4; ESTIMATOR = ml; ROTATION = Geomin; sampstat; The Type = EFA 1 4 tells Mplus to perform an exploratory factor analysis. The 1 and 4 following the EFA specification tells Mplus to generate all possible factor solutions between and including 1 and 4. The ESTIMATOR = ml option has Mplus use the maximum likelihood estimator to perform the factor analysis o This provides a chi-square goodness of fit test that the number of hypothesized factors is sufficient to account for the correlations among the six variables in the analysis o This has an exclamation mark in front of it which makes it green. Anything green is a comment and is ignored by the program. This subcommand is not necessary because maximum likelihood estimation is the default. Mplus uses the geomin rotation which is oblique as its default. More traditional rotations such as varimax are available. See help for a listing of options. We do not need a MODEL: command because the EFA 1 4 takes care of this. One useful feature of Mplus is its ability to handle non-normal input data. Intro to Mplus—Alan C. Acock 9 Recall that the default ml estimator assumes that the input data are distributed joint multivariate normal. If you have reason to believe that this assumption has not been met and your sample is reasonably large (e.g., n ≥ 200), you may substitute mlm or mlmv in place of ml on the ESTIMATOR = line. o The mlm option provides a mean-adjusted chi-square model test statistic whereas the o mlmv option produces a mean and variance adjusted chi-square test of model fit. o SEM users who are familiar with Bentler's EQS software program should also note that the mlm chi-square test and standard errors are equivalent to those produced by EQS in its ML;ROBUST method. You may also add the OUTPUT command following the ANALYSIS and MODEL commands. The OUTPUT command is used to specify optional output. For this example the keyword sampstat tells Mplus to include sample statistics as part of its printed output. OUTPUT: sampstat ; You can use Mplus’ Help menu to get a listing of all the options available for each command. You might try this to see what OUTPUT options are available. Mplus produces the Sample correlations, Root Mean Square Error of Approximation (RMSEA), and the Chi-square test of the one, two, three, and four factor models. Standard errors and z-tests for loadings and correlations of factors. As you can see from the results, shown below, the chi-square test for a one factor solution is statistically significant, so the null hypothesis that a single factor fits the data is rejected; more factors are required to obtain a non-significant chi-square. Since the Chi-square test is: Sensitive to sample size (such that large samples often return statistically significant chi-square values) and Non-normality in the input variables. Intro to Mplus—Alan C. Acock 10 Mplus also provides the Root Mean Square Error of Approximation (RMSEA) statistic. The RMSEA is not as sensitive to large sample sizes. According to Hu and Bentler (1999), RMSEA values below .06 indicate satisfactory model fit. Kline indicates a .08 is acceptable. Run the program and interpret the results. 2.2 Comparing two Solutions You can test whether the adding additional factors significantly improves the fit to the data. Model Model Model Model 1 2 3 4 chi-square chi-square chi-square chi-square (54 (43 (33 (24 degrees degrees degrees degrees of of of of freedom) = 1052.089; p < .001 freedom) = 723.022; p < .001 freedom) = 341.268; p < .001 freedom) 25.799; p not sign. Is model 4 better than model 3. Model 3 chi-square (33 degrees of freedom) = 341.268 Model 4 chi-square (24 degrees of freedom) = 25.799 Difference chi-square (9 degrees of freedom 315.469; p < .001 This is significant at the .05 level With Stata, you can get the probability when this is not in a table . display 1-chi2(df,chi-square) . display 1-chi2(9,315.469) 0 This is obviously less than .05. Often you can’t use tables for chi-square because you have lots of degrees of freedom and tables only show significance levels for relatively few degrees of freedom. Estimate the model and interpret the results. 2.3. EFA with Categorical Outcomes For the purposes of illustration, suppose that you recode each variable into a replacement variable where all six variables' values at the median or below are Intro to Mplus—Alan C. Acock 11 assigned a categorical value of 1.00 and all values above the median assigned a value of 2.00. For categorical variables, Mplus automatically recodes the lowest value to zero with subsequent values increasing in units of 1.00. While the four underlying latent factors remain continuous, the six categorical observed variables' response values are now ordered dichotomous categories. You may use the program that appeared in the initial exploratory factor analysis example, with the following modifications, and the new data file that contains the categorical variables ex4.2.dat, as shown below. There are two estimators. WLSM (Weighted Least Squares) is very fast and reasonably good. o You should use this for initial runs. o Running this on a server used by many students, it ran in 1 sectond. o This is the default MLR (Robust Maximum Liklihood). This is painfully slow, even for a simple and well behaved example like the one we will estimate. o Save this till you are almost done o Use this when you need to test for the number of factors o This took 18 minutes to run. o Under the Analysis section you need to specify this estimator as shown below. TITLE: ex4.2.inp This is an example of an exploratory factor analysis with categorical factor indicators It uses weighted least squares estimation It computes tetrachoric correlations and does the Factor analysis on them. The RMSEA and chi-square Values are reported. DATA: FILE IS ex4.2.dat; VARIABLE: NAMES ARE u1-u12; CATEGORICAL ARE u1-u12; ANALYSIS: TYPE = EFA 1 4; Intro to Mplus—Alan C. Acock 12 ESTIMATOR = MLR; PROCESSORS = 4 ; You tell Mplus which variables are categorical with the CATEGORICAL subcommand of the VARIABLE command, like this: CATEGORICAL ARE u1 – u2 ; You should also change the ESTIMATOR option for the ANALYSIS command. The default estimator for categorical variables is weighted least squares o With wls it took 2 seconds o Could use this for preliminary analysis I have used MLR, Maximum Likelihood Robust. o This uses a default 7 integration points and is extremely slow to converge. o This program took almost 20 minutes for a fairly simple model o It makes it possible to compare models using a likelihood ratio test. 2.4 Selected results Mplus begins with a summary of the distribution of the categorical indicators: Next we get fit statistics for the 1 factor solution Intro to Mplus—Alan C. Acock 13 2.5 Comparing Two Solutions If you use Weighted Least Squares (WLSM) with categorical data you get a RMESA to help compare the models and can do a chi-square test as described at http://www.statmodel.com/chidiff.shtml 2.6 Comparison of Categorical and Continuous Solutions One way of evaluating the efficacy of a categorical factor analysis is its ability to reproduce the factors obtained when the data is continuous. The program ex4.1.inp estimates the same factors for the 12 items when they are continuous and we can compare the results. The low loadings on each factor are all low whether we have the continuous variables or have dichotomized the variables. The high loads are all fairly close matches. First, here is the result when the variables are continuous: Intro to Mplus—Alan C. Acock 14 GEOMIN ROTATED LOADINGS 1 2 3 ________ ________ ________ 0.637 0.008 0.074 0.808 0.022 -0.005 0.631 -0.042 -0.058 0.027 0.646 -0.002 -0.029 0.760 -0.023 0.010 0.674 0.030 -0.006 0.003 0.734 -0.040 0.002 0.727 0.049 -0.007 0.707 -0.037 0.006 -0.010 0.004 0.013 0.001 0.035 -0.036 0.008 Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 1 2 3 4 GEOMIN FACTOR CORRELATIONS 1 2 ________ ________ 1.000 -0.039 1.000 0.007 0.029 -0.002 -0.121 3 ________ 1.000 -0.028 4 ________ -0.021 0.041 -0.028 -0.018 0.017 -0.012 0.018 -0.016 -0.001 0.692 0.791 0.658 4 ________ 1.000 And here are the results for the categorical solution: GEOMIN ROTATED LOADINGS 1 ________ U1 0.628 U2 0.938 U3 0.690 U4 0.072 U5 -0.130 U6 0.015 U7 0.026 U8 -0.036 U9 0.017 U10 -0.066 U11 -0.005 U12 0.110 1 2 2 ________ -0.004 0.042 -0.073 0.705 0.805 0.602 -0.014 -0.003 0.034 -0.017 0.040 -0.069 GEOMIN FACTOR CORRELATIONS 1 2 ________ ________ 1.000 -0.026 1.000 Intro to Mplus—Alan C. Acock 3 ________ 0.098 -0.012 -0.019 -0.120 0.018 0.089 0.805 0.720 0.669 0.057 -0.056 0.022 4 ________ -0.061 0.070 -0.036 -0.005 0.016 -0.045 0.009 -0.029 0.025 0.654 0.872 0.624 3 ________ 4 ________ 15 3 4 Item 1 2 3 4 5 6 7 8 9 10 11 12 0.025 -0.029 F1 (con) .637 .808 .631 F1 (cat) .628 .938 .690 0.032 -0.150 1.000 -0.059 F2 (con) F2 (cat) .646 .760 .674 .705 .805 .602 1.000 F3 (con) F3 (cat) .734 .727 .707 .805 .720 .669 F4 (con) F4 (cat) .692 .791 .658 .654 .872 .624 There are several notes worth keeping in mind when you perform exploratory factor analysis with categorical outcome variables. Although one or more of the observed variables may be categorical, any latent variables in the model are assumed to be continuous The analysis specification and interpretation of the output, e.g., loadings & factor correlations, is the same whether one, a subset, or all observed variables are categorical. Categorical observed variables may be dichotomous or ordered categorical outcomes of more than two levels), but nominal level observed variables with more than two categories may not be used in the analysis as outcome variables using this strategy. Sample size requirements are somewhat more stringent than for continuous variables; typically you want a minimum of 200 cases (preferably more) to perform any analysis with categorical outcome variables. Mplus provides standard errors and z-tests for all loadings and correlations. SECTION 3: Confirmatory Factor Analysis What if you had an a priori hypothesis that the visual perception (Y1), cubes (Y2), and lozenges (Y3) variables belonged to a single factor whereas the paragraph (Y4), sentence (Y5), and word meaning (Y6) variables belonged to a second factor? The diagram shown below illustrates the model visually. Intro to Mplus—Alan C. Acock 16 You can test this hypothesized factor structure using confirmatory factor analysis, as shown in the next section. The first thing you want to do is look at the correlation matrix: Y1 Y2 Y3 Y4 Y5 Y6 Y1 1.000 0.524 0.475 -0.004 -0.029 0.023 Y2 Y3 Y4 Y5 Y6 1.000 0.533 -0.032 -0.040 -0.012 1.000 -0.007 -0.048 0.037 1.000 0.431 0.369 1.000 0.419 1.000 Y1-Y3 are highly correlated with each other so they might form a factor Y4-Y5 are highly correlated with each other so they might form a factor Y1-Y3 are weakly correlated with each other so there is factor separation Y2 is slighly more negative correlated with Y4 than is Y1. Y2 is slightly more negatively correlated with Y5 than is Y1. Y2 is slightly negatively correlated with Y6 and Y1 is slightly positive correlated. Intro to Mplus—Alan C. Acock 17 Compare this to the following. Y2 and Y1 have a different pattern with Y4-Y6. The single correlation between F1 and F2 could not handle this. The fit will not be very good. Y1 Y2 Y3 Y4 Y5 Y6 Y1 1.000 0.524 0.475 0.200 -0.100 0.300 Y2 Y3 Y4 Y5 Y6 1.000 0.533 -0.100 0.200 0.100 1.000 -0.007 -0.048 0.037 1.000 0.431 0.369 1.000 0.419 1.000 Consider the following correlation matrix: Y1 Y2 Y3 Y4 Y5 Y6 Y1 1.000 0.524 0.475 -0.004 -0.029 0.023 Y2 Y3 Y4 Y5 Y6 1.000 0.533 -0.032 -0.040 -0.012 1.000 0.400 -0.048 0.037 1.000 0.431 0.369 1.000 0.419 1.000 Y3 and Y4 are too correlated to be on separate factors. Factorial confounding will mean that Y1-Y4 load on F1 and Y3-Y4 load on F2. Therefore, Y3 & Y 4 are factorially confounded. 3.1 CFA with Continuous Variables TITLE: ex1.inp CFA with continuous factor indicators There are Missing values DATA: FILE IS "ex1.dat" ; VARIABLE: NAMES ARE y1-y6; MISSING ARE all (-9) ; MODEL: f1 BY y1-y3; Intro to Mplus—Alan C. Acock 18 ! OUTPUT: f2 BY y4-y6; f1 WITH f2; sampstat standardized residual patterns mod(3.84); When Mplus sees EFA it sets up the relationship in a certain way, but in a CFA, Mplus needs you to provide a MODEL: to tell it how to set up the relationships that you wish to confirm). The model is general in the sense that o You must define what parameters are estimated; o All other parameters are assumed to be fixed. o Fixed parameters are either zero or some value you set. Under VARIABLE we have defined what code is used to represent missing values. You do not need an ANALYSIS section, since we use the MODEL section to specify the model and are going with the default analysis. o This assumes full information maximum likelihood. o To do listwise deletion we would specify this in the DATA command Open Help Under data you see that you would enter Listwise = on—make sure you put it under DATA The MODEL command allows you to specify the parameters of your model. o The first line of the MODEL command shown above defines a latent factor for the first factor. o The BY keyword (an abbreviation for "measured by") is used to define the latent variables; o The latent variable name appears on the left-hand side of the BY keyword whereas the measured variables appear on the right-hand side of the BY keyword. o Mplus will fix the loading for the first indicator at 1.0 unless you tell it otherwise. Put the “best” indicator first. Similarly, in the second line of the MODEL: command a latent factor called verbal has three indicators: Y1, Y2, and Y3. The third line of MODEL: command uses the WITH keyword to correlate the F1 latent factor with the F2 latent factor. By Intro to Mplus—Alan C. Acock Measured by 19 With Correlated with We do not need F1 with F2 because that is the default. If we wanted to see how the model did with these fixed we would add the line F1 with F2@0 ; Finally, the OUTPUT command contains an added keyword, standardize. This option instructs Mplus to output standardized parameter estimate values in addition to the default unstandardized values. Selected output from the analysis appears below. Why is one loading fixed at 1.0? The default fixes the unstandardized loading of the first item after BY at 1.0 This has to do with model identification. In exploratory factor analysis the variance of the factor (latent variable) is fixed at 1.0 by the program. Given this, the program estimates the loadings. With CFA, you need to set a variance for the latent variable because the size of the loadings are scaled from the size of the variance. Setting the variance of the latent variable (factor) at 1.0 solves this problem with EFA and is an option with CFA and you get standardized loadings. But, Mplus suggests a more general approach in which you fix one of the loadings of each latent variable (factor) at 1.0. Why is this more general? One group might be more variable than another. We might find that girls not only have higher verbal skills than boys, but that they are either more homogeneous or more heterogeneous in these skills. An intervention that not only improves the mean outcome, but does so in a way that makes the distribution more homogeneous is preferred. In some cases we are interested in the variances of the latent variables as an important topic and we could not study that if we fixed the variance at 1.0. Regardless of which item you pick to fix the loading at 1, the standardized solution will always be the same because that solution rescales the variance of the latent variable to be 1 and the fully standardized solution also rescales the variance of each indicator to be 1. Intro to Mplus—Alan C. Acock 20 We should pick the strongest indicator at 1.0. This makes the results less confusing to readers because all of the loadings will be less than 1.0. If you fixed a weak indicator at 1.0, an indicator that was twice as strong would have a loading of 2.0 and that would be confusing to readers. You do not need to fix the loadings at 1, any number will identify the model equally well. 3.2 Output and Interpret 3.2.1 Missing value summary SUMMARY OF DATA Number of patterns 8 SUMMARY OF MISSING DATA PATTERNS Y1 Y2 Y3 Y4 Y5 Y6 MISSING DATA PATTERNS 1 2 3 4 5 6 x x x x x x x x x x x x x x x x x x x x x x x x x x x x 7 x x x x x 8 x x x x x MISSING DATA PATTERN FREQUENCIES Pattern Frequency Pattern Frequency 1 473 4 1 2 15 5 1 3 3 6 1 COVARIANCE COVERAGE OF DATA Minimum covariance coverage value Y1 Y2 Y3 Y4 Frequency 2 3 0.100 PROPORTION OF DATA PRESENT Covariance Coverage Y1 Y2 ________ ________ 0.994 0.990 0.996 0.992 0.994 0.984 0.986 Intro to Mplus—Alan C. Acock Pattern 7 8 Y3 ________ 0.998 0.988 Y4 ________ Y5 ________ 0.990 21 Y5 Y6 Y6 0.992 0.960 0.994 0.962 0.996 0.964 0.990 0.960 0.998 0.966 Y2 ________ Y3 ________ Y4 ________ Y5 ________ 1.968 1.043 -0.064 -0.091 -0.035 1.968 -0.037 -0.133 0.053 Y2 ________ Y3 ________ 1.000 0.530 -0.032 -0.051 -0.019 1.000 -0.019 -0.073 0.029 Covariance Coverage Y6 ________ 0.966 3.2.2 Covariances and correlations Y1 Y2 Y3 Y4 Y5 Y6 Covariances Y1 ________ 1.948 1.020 0.930 -0.020 -0.076 0.021 Y6 Covariances Y6 ________ 1.684 Y1 Y2 Y3 Y4 Y5 Y6 Correlations Y1 ________ 1.000 0.521 0.475 -0.010 -0.042 0.012 Y6 Correlations Y6 ________ 1.000 Intro to Mplus—Alan C. Acock 2.070 0.810 0.710 Y4 ________ 1.000 0.437 0.380 1.664 0.716 Y5 ________ 1.000 0.428 22 3.2.3 Model Fit THE MODEL ESTIMATION TERMINATED NORMALLY TESTS OF MODEL FIT Chi-Square Test of Model Fit Value Degrees of Freedom P-Value 3.895 8 0.8665 Chi-Square Test of Model Fit for the Baseline Model Value 589.067 Degrees of Freedom 15 P-Value 0.0000 CFI/TLI CFI TLI Loglikelihood H0 Value H1 Value Information Criteria Number of Free Parameters Akaike (AIC) Bayesian (BIC) Sample-Size Adjusted BIC (n* = (n + 2) / 24) 1.000 1.013 -4850.279 -4848.331 19 9738.558 9818.597 9758.290 RMSEA (Root Mean Square Error Of Approximation) Estimate 0.000 90 Percent C.I. 0.000 Probability RMSEA <= .05 0.995 0.027 SRMR (Standardized Root Mean Square Residual) Value 0.015 3.2.4 Model result We are usually interested in the fully standardized results but the unstandardized results appear first. MODEL RESULTS--Unstandardized Intro to Mplus—Alan C. Acock 23 Two-Tailed P-Value Estimate S.E. Est./S.E. 1.000 1.123 1.019 0.000 0.098 0.088 999.000 11.430 11.532 999.000 0.000 0.000 1.000 1.032 0.869 0.000 0.129 0.105 999.000 7.972 8.316 999.000 0.000 0.000 -0.033 0.053 -0.621 0.534 Intercepts Y1 Y2 Y3 Y4 Y5 Y6 -0.017 0.030 0.037 -0.022 -0.012 0.066 0.063 0.063 0.063 0.065 0.058 0.059 -0.267 0.478 0.590 -0.336 -0.209 1.120 0.790 0.633 0.555 0.737 0.835 0.263 Variances F1 F2 0.912 0.786 0.125 0.138 7.308 5.677 0.000 0.000 Residual Variances Y1 Y2 Y3 Y4 Y5 Y6 1.041 0.803 1.012 1.287 0.861 1.077 0.095 0.100 0.095 0.123 0.112 0.098 10.977 8.044 10.612 10.449 7.664 10.992 0.000 0.000 0.000 0.000 0.000 0.000 F1 BY Y1 Y2 Y3 F2 BY Y4 Y5 Y6 F2 WITH F1 STANDARDIZED MODEL RESULTS Intro to Mplus—Alan C. Acock 24 STDYX Standardization Two-Tailed P-Value Estimate S.E. Est./S.E. 0.683 0.767 0.695 0.035 0.034 0.035 19.573 22.537 20.011 0.000 0.000 0.000 0.616 0.702 0.596 0.046 0.047 0.045 13.498 14.916 13.112 0.000 0.000 0.000 -0.039 0.062 -0.622 0.534 Intercepts Y1 Y2 Y3 Y4 Y5 Y6 -0.012 0.021 0.026 -0.015 -0.009 0.051 0.045 0.045 0.045 0.045 0.045 0.045 -0.267 0.478 0.590 -0.336 -0.209 1.120 0.790 0.633 0.555 0.737 0.835 0.263 Variances F1 F2 1.000 1.000 0.000 0.000 999.000 999.000 999.000 999.000 Residual Variances Y1 Y2 Y3 Y4 Y5 Y6 0.533 0.411 0.517 0.621 0.507 0.645 0.048 0.052 0.048 0.056 0.066 0.054 11.174 7.868 10.699 11.057 7.677 11.887 0.000 0.000 0.000 0.000 0.000 0.000 F1 BY Y1 Y2 Y3 F2 BY Y4 Y5 Y6 F2 WITH F1 STDY Standardization –ommitted— Intro to Mplus—Alan C. Acock 25 R-SQUARE Observed Variable Y1 Y2 Y3 Y4 Y5 Y6 Estimate S.E. Est./S.E. 0.466 0.582 0.483 0.387 0.496 0.369 0.049 0.054 0.050 0.056 0.065 0.055 9.483 10.849 9.700 6.883 7.605 6.747 Two-Tailed P-Value 0.000 0.000 0.000 0.000 0.000 0.000 Each unstandardized estimate represents the amount of change in the outcome variable as a function of a single unit change in the variable causing it. Different measures often have different scales, so you will often find it useful to examine the standardized coefficients when you want to compare the relative strength of associations across observed variables that are measured on different scales. Mplus provides two standardized coefficients. The first, labeled StdYX, standardizes based on latent and observed variables' variances. This standardized coefficient represents the amount of standardized change in an outcome variable per standard deviation unit of a predictor variable. Finally, the r-square output illustrates the amount of variance accounted for in the indicators. 3.2.5 Residuals Mplus output reports a residual for each variance and covariance. To simplify interpretation, it also reports a z-test (normalized residual for each variance and covarinace. Normalized Residuals for Covariances/Correlations/Residual Correlations Y1 Y2 Y3 Y4 Y5 ________ ________ ________ ________ ________ Y1 -0.001 Y2 0.000 0.000 Y3 0.007 -0.005 0.000 Y4 0.333 -0.071 0.159 0.000 Y5 -0.290 -0.400 -0.955 -0.027 -0.001 Y6 0.790 0.182 1.180 0.046 -0.006 Intro to Mplus—Alan C. Acock 26 Normalized Residuals for Covariances/Correlations/Residual Correlations Y6 ________ Y6 0.000 Because there are many tests, it would not make sense to use the 1.96 value as a significant failure. Still, we should look for a large z-score as an indicator that our model does not explain some relationship. 3.2.6 Modification indices Finally, Mplus reports modification indices because we specified mod(3.84). The 3.84 corresponds to the .05 level Use this with caution, especially on a large sample These are perameters we fixed that could improve the fit if they were free. We have no path from F1 to y6, for example The M.I. is an estimate of how much chi-square for the model would be reduced if a single parameter is made free—one at a time. Nothing would improve the fit of our model. MODEL MODIFICATION INDICES Minimum M.I. value for printing the modification index M.I. E.P.C. Std E.P.C. 3.840 StdYX E.P.C. No modification indices above the minimum value. As is the case with exploratory factor analysis of continuous outcome variables, you may want to use the mlm or mlmv estimators in lieu of the default ml estimator if your input data are not distributed joint multivariate normal by using the ESTIMATOR = option on the ANALYSIS command. The mlm option provides a mean-adjusted chisquare model test statistic whereas the mlmv option produces a mean and variance adjusted chi-square test of model fit; both options also induce Mplus to produce robust standard errors displayed in the model results table that are used to compute z tests of significance for individual parameter estimates. Intro to Mplus—Alan C. Acock 27 SECTION 4: Exploratory Factor Analysis as an Alternative to CFA Most often, when doing a CFA, a researcher uses modification indexes to modify the matrix by allowing some fixed parameters to be free. We may allow an item load on more than one factor or We may allow two items to have correlated errors. When we change a model this way it is no longer confirmatory, but exploratory. We are combining the modification indexes with our own judgement to change the model. In Mplus 5.1 and EFA alternative was introduced that can challenge CFA. The rotation will find the optimal solution and this will be a better fit than we can do by looking at a few indexes and using our own judgment. Intro to Mplus—Alan C. Acock 28 However, if the optimal solution makes no sense, then we have a different problem. With CFA we fix several paths at a value of 0.0. This results in very clean factors who have a clear meaning. However, the best guess my be a loading that is not exactly zero. Suppose you had two latent variables measured for both the husband and wife. Alternatively, you might think of these as two latent variables measured for the same person, but at two times, say one year apart. You believe that the first three measures are indicators of the first factor and the second three are indicators of the second factor. However, it is usually unreasonable to assume that all the cross loadings are 0.000. You expect them to be small, but there is no necessity to say they must be exactly zero. Intro to Mplus—Alan C. Acock 29 Consider andolescents who have three beliefs about the certainty that they will be caught and three beliefs about the severity of punishment if they are caught. You measure them at two time points, at age 15 and again at age 17. Discuss how this is different from a CFA model What results would support your thinking? 1. The three beliefs about the certainty of being caught (Y1 – Y3) would load strongly on factors 1 (measured the first year) and the same 3 measured a year later (Y7 – Y9), but have weak loadings on factors 2 (measured the first year) and 4 (measured a year later). 2. Conversely, the beliefs about the severity of punishment (Y4 – Y6; Y10 – Y12) should load strongly on factors 2 and 4 but should have relatively weak loadings on factors 1 and 3. 3. The loadings of Y1 – Y6 should be identical to the corresponding loadings of Y7 – Y12 4. The errorst E1 – E6 should be correlated with the corresponding errors in E7 – E12. Mplus calls this exploratory factor analysis because we are not fixing values at particular values, but clearly we are putting enormous constraints on the model. Mplus VERSION 5.1 MUTHEN & MUTHEN 06/30/2008 6:14 PM INPUT INSTRUCTIONS TITLE: DATA: VARIABLE: MODEL: OUTPUT: example3.inp this is an example of an EFA at two timepoints with factor loading invariance and correlated residuals across time FILE IS example3.dat; NAMES ARE y1-y12; f1-f2 BY y1-y6 (*t1 1); f3-f4 BY y7-y12 (*t2 1); f3-f4 WITH f1-f2; y1-y6 PWITH y7-y12; TECH1 STANDARDIZED; Estimator Rotation Row standardization Type of rotation THE MODEL ESTIMATION TERMINATED NORMALLY ML GEOMIN COVARIANCE OBLIQUE TESTS OF MODEL FIT Chi-Square Test of Model Fit Intro to Mplus—Alan C. Acock 30 Value Degrees of Freedom P-Value 43.990 42 0.3873 Chi-Square Test of Model Fit for the Baseline Model Value Degrees of Freedom P-Value 1265.442 66 0.0000 CFI/TLI CFI TLI 0.998 0.997 Loglikelihood H0 Value H1 Value -9396.403 -9374.409 Information Criteria Number of Free Parameters Akaike (AIC) Bayesian (BIC) Sample-Size Adjusted BIC (n* = (n + 2) / 24) 48 18888.807 19091.108 18938.753 RMSEA (Root Mean Square Error Of Approximation) Estimate 90 Percent C.I. Probability RMSEA <= .05 0.010 0.000 1.000 0.032 SRMR (Standardized Root Mean Square Residual) Value 0.027 MODEL RESULTS F1 Two-Tailed P-Value Estimate S.E. Est./S.E. 0.744 0.896 0.726 0.014 -0.098 0.013 0.062 0.072 0.055 0.039 0.060 0.034 11.959 12.523 13.103 0.370 -1.619 0.398 0.000 0.000 0.000 0.712 0.106 0.690 0.052 -0.016 0.005 0.058 0.053 0.018 0.898 -0.304 0.256 0.369 0.761 0.798 BY Y1 Y2 Y3 Y4 Y5 Y6 F2 BY Y1 Y2 Y3 Intro to Mplus—Alan C. Acock 31 Y4 Y5 Y6 F3 0.734 0.908 0.749 0.061 0.072 0.064 12.082 12.626 11.760 0.000 0.000 0.000 0.744 0.896 0.726 0.014 -0.098 0.013 0.062 0.072 0.055 0.039 0.060 0.034 11.959 12.523 13.103 0.370 -1.619 0.398 0.000 0.000 0.000 0.712 0.106 0.690 0.052 -0.016 0.005 0.734 0.908 0.749 0.058 0.053 0.018 0.061 0.072 0.064 0.898 -0.304 0.256 12.082 12.626 11.760 0.369 0.761 0.798 0.000 0.000 0.000 0.414 0.310 0.066 0.067 6.241 4.592 0.000 0.000 0.299 0.289 0.494 0.069 0.070 0.085 4.364 4.116 5.823 0.000 0.000 0.000 0.451 0.069 6.571 0.000 0.397 0.061 6.463 0.000 0.128 0.058 2.220 0.026 Estimate S.E. Est./S.E. 0.584 0.707 0.569 0.011 -0.077 0.010 0.043 0.049 0.038 0.030 0.048 0.026 13.475 14.449 15.086 0.370 -1.614 0.398 BY Y7 Y8 Y9 Y10 Y11 Y12 F4 BY Y7 Y8 Y9 Y10 Y11 Y12 F3 WITH F1 F2 F4 WITH F1 F2 F3 F2 WITH F1 Y1 WITH Y7 Y2 WITH Y8 STANDARDIZED MODEL RESULTS STDYX Standardization F1 BY Y1 Y2 Y3 Y4 Y5 Y6 F2 Two-Tailed P-Value 0.000 0.000 0.000 0.712 0.107 0.690 BY Intro to Mplus—Alan C. Acock 32 Y1 Y2 Y3 Y4 Y5 Y6 F3 0.046 0.042 0.014 0.042 0.051 0.043 0.898 -0.304 0.256 13.617 14.120 13.177 0.369 0.761 0.798 0.000 0.000 0.000 0.586 0.728 0.605 0.011 -0.079 0.011 0.046 0.045 0.040 0.030 0.049 0.027 12.690 16.053 15.166 0.370 -1.621 0.398 0.000 0.000 0.000 0.711 0.105 0.691 0.041 -0.013 0.004 0.565 0.730 0.586 0.046 0.043 0.015 0.041 0.054 0.040 0.897 -0.304 0.256 13.651 13.443 14.556 0.369 0.762 0.798 0.000 0.000 0.000 0.403 0.301 0.059 0.063 6.820 4.758 0.000 0.000 BY Y7 Y8 Y9 Y10 Y11 Y12 F4 BY Y7 Y8 Y9 Y10 Y11 Y12 F3 WITH F1 F2 0.041 -0.013 0.004 0.568 0.720 0.570 Beginning Time: Ending Time: Elapsed Time: 18:14:23 18:14:24 00:00:01 MUTHEN & MUTHEN 3463 Stoner Ave. Los Angeles, CA 90066 Tel: (310) 391-9971 Fax: (310) 391-8971 Web: www.StatModel.com Support: Support@StatModel.com Copyright (c) 1998-2008 Muthen & Muthen Section 5: Equality Constraints—Longitudinal CFA We often need to test for equality constraints: 1. Are items truly interchangeable. Alpha assumes that all items are equally salient to the concept being measured. That is you weight each item equally with a 1.0 weight. CFA can extend this and test it: tau equivalence—All loadings are constrained to be equal. Intro to Mplus—Alan C. Acock 33 o Compare fit of this model to a model in which they are unconstrained Parallel equavalence. Tau equivalence plus all error terms are equal o Very hard to achieve and often we can proceed without this condition 2. Compare marital satisfaction of women and men Tau equivalence o Women may weigh emotional support more than men o Men may weight sexual satisfaction more than women If tau equivalence holds the latent variable has the same meaning in both groups. o Without this equivalence we are compareing apples and oranges. Why compare means if the concent has a different meaning for each group? o Men may be more satisfied than women 3. We will focus on longitudinal CFA equivalence as this is most salient to growth models. This section summaries the example in Brown’s book on CFA A little algebra. In regresson we wrote: Y a bX e Intro to Mplus—Alan C. Acock 34 We solved for the intercept, a, using a M Y bM X rearranging this we can say M Y a bM X If we examine the figure we see that each observed variable, we will call it X, has a similar set of equations where the latent variable is the predictor. For each X X X e Where tau is the intercept, lambda the matrix of loadings, ksi is the score on the latent variable and theta is the error, and the mean of each X will be M X X X where kappa is the mean of the latent variable. This adds two sets of parameters that are not shown in the figure. Each indicator has an intercept, tau, and each latent variable has a mean, kappa. This adds 10 parameters we need to estimate, 8 intercepts and 2 latent variable means. To estimate these 10 new parameters we need to Include the means along with the covariance matrix Make some additional restrictions because we just added 8 known means, but need to estimate 10 new parameters. There are two ways of identifying these parameters. 1. We could fix the latent variable means at one time at zero and estimate the latent mean at the second, third, etc. time 2. We could fix one intercept at each wave at zero. We will use the second approach and fix the intercept of the first indicator (A1, A5) at zero. This scales satisfaction at time 1 to the mean of the first indicator at time 1, A1 Intro to Mplus—Alan C. Acock 35 This scales satisfaction at time 2 to the mean of the first indicator at time 2, B1 Instead of entering raw data (that would be fine), we will enter a row of means, a row of standard deviations, and a correlation matrix. 1.500 1.940 1.000 0.736 0.731 0.771 0.685 0.481 0.485 0.508 1.320 2.030 1.450 2.050 1.410 1.990 6.600 2.610 6.420 2.660 6.560 2.590 6.310 2.550 1.000 0.648 0.694 0.512 0.638 0.442 0.469 1.000 0.700 0.496 0.431 0.635 0.453 1.000 0.508 0.449 0.456 0.627 1.000 0.726 0.743 0.759 1.000 0.672 0.689 1.000 0.695 1.000 We need the means because we are estimating means of latent variables We need the standard deviations so Mplus can convert the correlation matrix to a covariance matrix. We estimate four models, each of which includes estimating means. The first model estimates the means imposing the same form for the model at both waves. We are not restricting loadings to be equal, intercepts to be equal, or errors to be equal. This model doesn’t make a lot of sense because if at least the loadings aren’t equal, then we are back to comparing apples to oranges. With unequal loadings, the very meaning of satisfaction changes over time with some indicators becoming more salient and others less salient. This can be interesting as, for example, sexual satisfaction may become less central and emotional supportmay become more satisfying in more mature marriages. (a number of the comments apply to latter models) 5.1 Programs for testing equality constraints Model 1 TITLE: ! ! ! DATA: VARIABLE: ANALYSIS: ! MPLUS PROGRAM FOR TIME1-TIME2 MSMT MODEL OF JOB SATISFACTION This has equality constraints on everything. This is from Brown's CFA book Equal form Equal factor loadings Equal indicator intercepts Equal indicator errors FILE IS FIG7.2.DAT; TYPE IS MEANS STD CORR; ! INDICATOR MEANS ALSO INPUTTED ! Raw data would work equally well NOBS ARE 250; NAMES ARE A1 B1 C1 D1 A2 B2 C2 D2; ESTIMATOR=ML; TYPE=MEANSTRUCTURE; ! ANALYSIS OF MEAN STRUCTURE this is the default Intro to Mplus—Alan C. Acock 36 MODEL: SATIS1 BY A1 B1 C1 D1; SATIS2 BY A2 B2 C2 D2; A1 WITH A2; B1 WITH B2; C1 WITH C2; D1 WITH D2; ! Correlated errors [A1@0]; [A2@0]; ! FIXES THE A INDICATOR INTERCEPTS TO ZERO [SATIS1*]; [SATIS2*]; ! FREELY ESTIMATES FACTOR MEANS ! [B1 B2] (4); [C1 C2] (5); [D1 D2] (6); ! Equal intercepts ! A1 A2 (7); B1 B2 (8); C1 C2 (9); D1 D2 (10); ! Equal errors OUTPUT: SAMPSTAT MODINDICES(4.00) STAND RESIDUAL; ! Notes: The Model command uses numbers to create equality constraints. ! parameters followed by (1) are equal; (2) are equal, etc. Thus, ! C! equals C2 because both share a (2) and D1 = D2 because both have (3) ! The first line is confusing. The first variable is fixed at 1.0 by ! Default. Hence A1 = A2 = 1, but the (1) does not apply to either A1 or A2 ! Things in [] are either means or intercepts depending on context ! A1 with A2 means correlate the errors ! A1 A2 (7) means equal error terms for A1 and A2 Next, we estimate the model imposing equal factor loadings. I consider this the minimum equality constraint to meaningfull comparison of means. Others would disagree with many want more constraints and with Muthén okay with what he calls “partial” invariance. Model 2 TITLE: MPLUS PROGRAM FOR TIME1-TIME2 MSMT MODEL OF JOB SATISFACTION This has equality constraints on everything. This is from Brown's CFA book Equal form Equal factor loadings ! Equal indicator intercepts ! Equal indicator errors DATA: FILE IS FIG7.2.DAT; TYPE IS MEANS STD CORR; ! INDICATOR MEANS ALSO INPUTTED ! Raw data would work equally well NOBS ARE 250; VARIABLE: NAMES ARE A1 B1 C1 D1 A2 B2 C2 D2; ANALYSIS: ESTIMATOR=ML; ! TYPE=MEANSTRUCTURE; ! ANALYSIS OF MEAN STRUCTURE this is the default MODEL: SATIS1 BY A1 B1 (1); C1 (2); D1 (3); SATIS2 BY A2 B2 (1) C2 (2); D2 (3); A1 WITH A2; B1 WITH B2; C1 WITH C2; D1 WITH D2; ! Correlated errors [A1@0]; [A2@0]; ! FIXES THE A INDICATOR INTERCEPTS TO ZERO [SATIS1*]; [SATIS2*]; ! FREELY ESTIMATES FACTOR MEANS ! [B1 B2] (4); [C1 C2] (5); [D1 D2] (6); ! Equal intercepts ! A1 A2 (7); B1 B2 (8); C1 C2 (9); D1 D2 (10); ! Equal errors OUTPUT: SAMPSTAT MODINDICES(4.00) STAND RESIDUAL; ! Notes: The Model command uses numbers to create equality constraints. ! parameters followed by (1) are equal; (2) are equal, etc. Thus, ! C! equals C2 because both share a (2) and D1 = D2 because both have (3) ! The first line is confusing. The first variable is fixed at 1.0 by Intro to Mplus—Alan C. Acock 37 ! ! ! ! Default. Hence A1 = A2 = 1, but the (1) does not apply to either A1 or A2 Things in [] are either means or intercepts depending on context A1 with A2 means correlate the errors A1 A2 (7) means equal error terms for A1 and A2 The third model imposes equal intercepts: TITLE: MPLUS PROGRAM FOR TIME1-TIME2 MSMT MODEL OF JOB SATISFACTION This has equality constraints on everything. This is from Brown's CFA book Equal form Equal factor loadings Equal indicator intercepts ! Equal indicator errors DATA: FILE IS FIG7.2.DAT; TYPE IS MEANS STD CORR; ! INDICATOR MEANS ALSO INPUTTED ! Raw data would work equally well NOBS ARE 250; VARIABLE: NAMES ARE A1 B1 C1 D1 A2 B2 C2 D2; ANALYSIS: ESTIMATOR=ML; ! TYPE=MEANSTRUCTURE; ! ANALYSIS OF MEAN STRUCTURE MODEL: SATIS1 BY A1 B1 (1) C1 (2) D1 (3); SATIS2 BY A2 B2 (1) C2 (2) D2 (3); A1 WITH A2; B1 WITH B2; C1 WITH C2; D1 WITH D2; ! Correlated errors [A1@0]; [A2@0]; ! FIXES THE A INDICATOR INTERCEPTS TO ZERO [SATIS1*]; [SATIS2*]; ! FREELY ESTIMATES FACTOR MEANS [B1 B2] (4); [C1 C2] (5); [D1 D2] (6); ! Equal intercepts ! A1 A2 (7); B1 B2 (8); C1 C2 (9); D1 D2 (10); ! Equal errors OUTPUT: SAMPSTAT MODINDICES(4.00) STAND RESIDUAL; ! Notes: The Model command uses numbers to create equality constraints. ! parameters followed by (1) are equal; (2) are equal, etc. Thus, ! C! equals C2 because both share a (2) and D1 = D2 because both have (3) ! The first line is confusing. The first variable is fixed at 1.0 by ! Default. Hence A1 = A2 = 1, but the (1) does not apply to either A1 or A2 ! Things in [] are either means or intercepts depending on context ! A1 with A2 means correlate the errors ! A1 A2 (7) means equal error terms for A1 and A2 The fourth model adds the final constraint of equal errors on the indicator variables. This is an extreme level of invariance. TITLE: DATA: MPLUS PROGRAM FOR TIME1-TIME2 MSMT MODEL OF JOB SATISFACTION This has equality constraints on everything. This is from Brown's CFA book Equal form Equal factor loadings Equal indicator intercepts Equal indicator errors FILE IS FIG7.2.DAT; Intro to Mplus—Alan C. Acock 38 TYPE IS MEANS STD CORR; ! INDICATOR MEANS ALSO INPUTTED ! Raw data would work equally well NOBS ARE 250; VARIABLE: NAMES ARE A1 B1 C1 D1 A2 B2 C2 D2; ANALYSIS: ESTIMATOR=ML; ! TYPE=MEANSTRUCTURE; ! ANALYSIS OF MEAN STRUCTURE MODEL: SATIS1 BY A1 B1 (1) C1 (2) D1 (3); SATIS2 BY A2 B2 (1) C2 (2) D2 (3); A1 WITH A2; B1 WITH B2; C1 WITH C2; D1 WITH D2; ! Correlated errors [A1@0]; [A2@0]; ! FIXES THE A INDICATOR INTERCEPTS TO ZERO [SATIS1*]; [SATIS2*]; ! FREELY ESTIMATES FACTOR MEANS [B1 B2] (4); [C1 C2] (5); [D1 D2] (6); ! Equal intercepts A1 A2 (7); B1 B2 (8); C1 C2 (9); D1 D2 (10); ! Equal errors OUTPUT: SAMPSTAT MODINDICES(4.00) STAND RESIDUAL; ! Notes: The Model command uses numbers to create equality constraints. ! parameters followed by (1) are equal; (2) are equal, etc. Thus, ! C! equals C2 because both share a (2) and D1 = D2 because both have (3) ! The first line is confusing. The first variable is fixed at 1.0 by ! Default. Hence A1 = A2 = 1, but the (1) does not apply to either A1 or A2 ! Things in [] are either means or intercepts depending on context ! A1 with A2 means correlate the errors ! A1 A2 (7) means equal error terms for A1 and A2 We can summaries these as follows: 2 Model Equal form Equal factor loadings Equal intercepts Equal error variances 2.09 3.88 7.25 90.73*** 2 diff Df 15 18 21 25 df 1.79 3 3.37 3 83.48*** 4 RMSEA .000 .000 .000 .103 CFI TLI SRMR 1.00 1.00 1.00 .96 1.01 1.01 1.01 .96 .010 .014 .026 .037 We cannot go all the way to equal indicator error variances, but we can go all the way to equal indicator intercepts before chi-square increases significantly. Here are selected results for the equal indicator intercepts model: MODEL RESULTS SATIS1 A1 B1 C1 D1 BY SATIS2 A2 BY Two-Tailed P-Value Estimate S.E. Est./S.E. 1.000 0.989 0.993 0.962 0.000 0.017 0.017 0.016 999.000 56.699 60.078 60.427 999.000 0.000 0.000 0.000 1.000 0.000 999.000 999.000 Intro to Mplus—Alan C. Acock 39 B2 C2 D2 0.989 0.993 0.962 0.017 0.017 0.016 56.699 60.078 60.427 0.000 0.000 0.000 2.547 0.321 7.923 0.000 0.723 0.117 6.187 0.000 1.023 0.162 6.298 0.000 1.031 0.158 6.515 0.000 D2 0.785 0.133 5.918 0.000 Means SATIS1 SATIS2 1.500 6.617 0.121 0.160 12.421 41.422 0.000 -0.156 -0.032 -0.039 0.000 -0.156 -0.032 -0.039 0.000 0.098 0.099 0.089 0.000 0.098 0.099 0.089 999.000 -1.583 -0.327 -0.436 999.000 -1.583 -0.327 -0.436 999.000 0.113 0.743 0.663 999.000 0.113 0.743 0.663 2.936 5.013 0.291 0.499 10.099 10.050 0.000 0.000 SATIS2 WITH SATIS1 A1 WITH A2 B1 WITH B2 C1 WITH C2 D1 WITH Intercepts A1 B1 C1 D1 A2 B2 C2 D2 Variances SATIS1 SATIS2 0.000 Note, these = M of A1 & A2 0.000 Residual Variances A1 0.711 0.101 7.060 0.000 B1 1.391 0.152 9.149 0.000 C1 1.427 0.155 9.185 0.000 D1 1.070 0.124 8.655 0.000 A2 1.428 0.188 7.609 0.000 B2 2.363 0.260 9.082 0.000 C2 2.066 0.236 8.753 0.000 D2 1.839 0.214 8.610 0.000 Why would wave 2 have bigger error variances? This should be explored STANDARDIZED MODEL RESULTS STDYX Standardization Estimate SATIS1 S.E. Est./S.E. Two-Tailed P-Value BY Intro to Mplus—Alan C. Acock 40 A1 B1 C1 D1 0.897 0.821 0.818 0.847 0.016 0.020 0.020 0.019 57.727 40.566 40.304 45.498 0.000 0.000 0.000 0.000 0.882 0.822 0.840 0.846 0.016 0.020 0.019 0.019 53.469 40.444 43.777 45.040 0.000 that all the item 0.000 variances are equal. 0.000 0.000 0.664 0.039 17.084 0.000 0.717 0.050 14.314 0.000 0.564 0.052 10.827 0.000 0.601 0.050 12.023 0.000 D2 0.560 0.055 10.149 0.000 Means SATIS1 SATIS2 0.876 2.955 0.083 0.161 10.584 18.373 0.000 0.000 0.000 -0.075 -0.016 -0.020 0.000 -0.058 -0.012 -0.015 0.000 0.048 0.048 0.046 0.000 0.036 0.037 0.035 999.000 -1.588 -0.328 -0.436 999.000 -1.587 -0.328 -0.436 999.000 0.112 0.743 0.663 999.000 0.112 0.743 0.663 Variances SATIS1 SATIS2 1.000 1.000 0.000 0.000 999.000 999.000 999.000 999.000 Residual Variances A1 B1 C1 D1 A2 B2 C2 D2 0.195 0.326 0.330 0.282 0.222 0.325 0.295 0.284 0.028 0.033 0.033 0.032 0.029 0.033 0.032 0.032 6.995 9.822 9.929 8.955 7.615 9.741 9.145 8.918 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 SATIS2 A2 B2 C2 D2 Equality only meaningful for the unstandardized solution. These will also be equal only in the case BY SATIS2 WITH SATIS1 A1 WITH A2 B1 WITH B2 C1 WITH C2 D1 WITH Intercepts A1 B1 C1 D1 A2 B2 C2 D2 Intro to Mplus—Alan C. Acock 41 SECTION 6: Path Analysis 6.1 Model and Program TITLE: DATA: VARIABLE: MODEL: ex3.11 This is an example of a path analysis with continuous dependent variables FILE IS ex3.11.dat; NAMES ARE y1-y3 x1-x3; y1 y2 ON x1 x2 x3; y3 ON y1 y2 x2; MODEL y2 y2 y2 y3 y3 Intro to Mplus—Alan C. Acock indirect: ind x1; ind x2; ind x3; ind x1; ind x2; 42 OUTPUT: y3 ind x3; standardized mod(3.84); 6.2. Indirect Effects The MODEL INDIRECT: subcommand estimates indirect effects for you You get the Total indirect effect that combines as many specific indirect effects as there are in the model Specific indirect effects of x1 go y3 include o x1 y1 y3 o x1 y2 y3 Tests of significant for both specific and total indirect effects Estimate and interpret the output: SECTION 7: Putting it Together—CFA & SEM Interpret the figure. Notice indirect effects. Intro to Mplus—Alan C. Acock 43 7.1 Program TITLE: DATA: VARIABLE: MODEL: OUTPUT: example2cfa This is an example of a SEM with CFA factors with continuous factor indicators And Indirect Effects FILE IS example2.dat; NAMES ARE y1-y12; f1 BY y1-y3; f2 by y4-y6; f3 by y7-y9; f4 BY y10-y12; f3 ON f1-f2; f4 ON f3; MODEL INDIRECT: f4 ind f1; f4 ind f2; standardized mod(3.84) 7.2 Output Mplus VERSION 5.1 MUTHEN & MUTHEN 06/30/2008 8:12 PM SUMMARY OF ANALYSIS Number Number Number Number Number of of of of of groups observations dependent variables independent variables continuous latent variables 1 500 12 0 4 Observed dependent variables Continuous Y1 Y2 Y3 Y7 Y8 Y9 Y4 Y10 Continuous latent variables F1 F2 F3 F4 Estimator Information matrix Maximum number of iterations Intro to Mplus—Alan C. Acock Y5 Y11 Y6 Y12 ML OBSERVED 1000 44 Convergence criterion Maximum number of steepest descent iterations 0.500D-04 20 TESTS OF MODEL FIT Chi-Square Test of Model Fit Value 53.492 Degrees of Freedom 50 P-Value 0.3417 Chi-Square Test of Model Fit for the Baseline Model Value 4600.240 Degrees of Freedom 66 P-Value 0.0000 CFI/TLI CFI 0.999 TLI 0.999 Loglikelihood H0 Value -6483.831 H1 Value -6457.085 Information Criteria Number of Free Parameters 40 Akaike (AIC) 13047.662 Bayesian (BIC) 13216.247 Sample-Size Adjusted BIC 13089.284 (n* = (n + 2) / 24) RMSEA (Root Mean Square Error Of Approximation) Estimate 0.012 90 Percent C.I. 0.000 0.032 Probability RMSEA <= .05 1.000 SRMR (Standardized Root Mean Square Residual) Value 0.019 MODEL RESULTS F1 Two-Tailed P-Value Estimate S.E. Est./S.E. 1.000 1.103 0.942 0.000 0.062 0.058 999.000 17.881 16.346 999.000 0.000 0.000 1.000 1.006 1.023 0.000 0.057 0.060 999.000 17.691 17.064 999.000 0.000 0.000 1.000 0.894 0.902 0.000 0.021 0.021 999.000 41.937 42.479 999.000 0.000 0.000 BY Y1 Y2 Y3 F2 BY Y4 Y5 Y6 F3 BY Y7 Y8 Y9 Intro to Mplus—Alan C. Acock 45 F4 BY Y10 Y11 Y12 F3 1.000 0.734 0.684 0.000 0.028 0.028 999.000 26.424 24.405 999.000 0.000 0.000 0.640 0.912 0.069 0.074 9.271 12.399 0.000 0.000 0.546 0.032 16.975 0.000 0.297 0.038 7.767 0.000 0.599 0.618 0.061 0.064 9.766 9.717 0.000 0.000 0.367 0.296 0.412 0.400 0.340 0.392 0.183 0.191 0.181 0.240 0.183 0.213 0.525 0.565 0.033 0.033 0.033 0.034 0.031 0.034 0.019 0.017 0.017 0.027 0.017 0.018 0.049 0.049 11.044 8.946 12.309 11.640 10.888 11.370 9.799 11.268 10.812 8.746 10.791 11.998 10.636 11.488 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Estimate S.E. Est./S.E. Two-Tailed P-Value 0.787 0.843 0.751 0.023 0.020 0.025 34.084 41.362 30.614 0.000 0.000 0.000 0.779 0.805 0.789 0.023 0.021 0.022 34.055 37.480 35.305 0.000 0.000 0.000 0.948 0.006 153.038 0.000 ON F1 F2 F4 ON F3 F2 WITH F1 Variances F1 F2 Residual Variances Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 F3 F4 STANDARDIZED MODEL RESULTS STDYX Standardization F1 BY Y1 Y2 Y3 F2 BY Y4 Y5 Y6 F3 BY Y7 Intro to Mplus—Alan C. Acock 46 Y8 Y9 F4 0.007 0.007 131.246 136.242 0.000 0.000 0.902 0.869 0.835 0.013 0.014 0.017 70.200 59.982 50.003 0.000 0.000 0.000 0.388 0.561 0.039 0.036 10.057 15.610 0.000 0.000 0.680 0.027 24.795 0.000 0.488 0.043 11.343 0.000 1.000 1.000 0.000 0.000 999.000 999.000 999.000 999.000 0.380 0.289 0.437 0.393 0.352 0.378 0.101 0.128 0.120 0.186 0.244 0.302 0.322 0.538 0.036 0.034 0.037 0.036 0.035 0.035 0.012 0.013 0.013 0.023 0.025 0.028 0.031 0.037 10.450 8.397 11.862 11.018 10.197 10.713 8.572 9.598 9.287 8.010 9.698 10.828 10.421 14.433 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 BY Y10 Y11 Y12 F3 ON F1 F2 F4 ON F3 F2 0.934 0.938 WITH F1 Variances F1 F2 Residual Variances Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 F3 F4 TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS STANDARDIZED TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS STDYX Standardization Two-Tailed Estimate S.E. Est./S.E. P-Value Effects from F1 to F4 Total Total indirect Specific indirect F4 F3 F1 Effects from F2 to F4 Total Intro to Mplus—Alan C. Acock 0.263 0.263 0.028 0.028 9.269 9.269 0.000 0.000 0.263 0.028 9.269 0.000 0.382 0.029 12.999 0.000 47 Total indirect Specific indirect F4 F3 F2 0.382 0.029 12.999 0.000 0.382 0.029 12.999 0.000 MODEL MODIFICATION INDICES Minimum M.I. value for printing the modification index M.I. E.P.C. Std E.P.C. BY Statements F3 BY Y1 5.980 0.103 0.131 WITH Statements Y3 WITH Y2 6.126 0.091 0.091 Y4 WITH Y3 5.405 0.053 0.053 Y5 WITH Y3 5.265 -0.049 -0.049 Y8 WITH Y2 5.695 -0.037 -0.037 Y9 WITH Y6 4.801 0.035 0.035 Beginning Time: Ending Time: Elapsed Time: 3.840 StdYX E.P.C. 0.134 0.260 0.130 -0.132 -0.155 0.133 20:12:48 20:12:49 00:00:01 7.3 Interpretation of modification indices We could reduce Chi-square, which now is Chi-square(50) = 53.492, by about 5.265 if we allowed the error term for Y5 to be correlated with the error term for Y3. The correlation of the two errors would be about -.132—does this make sense? We would do these one at a time We would only do it if it made sense. Say Y5 and Y3 are pen and pencil tests and all the others are face to face interviews. There might be a method effect that we could incorporate as an error term We might not have much to gain even if there is a big modification index if the fit is already good. New Chi-square would be approximately Chi-square(49) = 53.492 – 5.265. A reduction in Chi-square of 5.265 with one degree of freedom would be highly significant. Not much need to improve on a CFI = .997; RMSEA = .012; Intro to Mplus—Alan C. Acock 48 SECTION 8: Putting it Together—EFA & SEM We may have a situation where we are sufficiently confident to have F3 and F4 represented by a CFA model, but not that confident about F1 and F2 for which we want to do an EFA. 8.1 Program & model Here are the program and results: The (*1) in the Model line for f1-f2 by y1-y6 (*1); is included so Mplus knows this is an EFA set. We expect y1-y3 to have strong loadings on f1 and weak loadings on f2. We expect y4-y6 to have weak loadings on f1 and strong loadings on f2. Still, we are not sufficiently confident of this to impose the restriction that these loadings are exactly 0.000. Intro to Mplus—Alan C. Acock 49 8.2 Output TITLE: DATA: VARIABLE: MODEL: OUTPUT: example2.inp This is an example of a SEM with EFA and CFA factors with continuous factor indicators FILE IS example2.dat; NAMES ARE y1-y12; f1-f2 BY y1-y6 (*1); f3 BY y7-y9; f4 BY y10-y12; f3 ON f1-f2; f4 ON f3; MODEL INDIRECT: f4 ind f1; f4 ind f2; Standardized mod(3.84) Mplus VERSION 5.1 MUTHEN & MUTHEN 06/30/2008 8:32 PM TESTS OF MODEL FIT Chi-Square Test of Model Fit Value 51.353 Degrees of Freedom 46 P-Value 0.2720 Chi-Square Test of Model Fit for the Baseline Model Value 4600.240 Degrees of Freedom 66 P-Value 0.0000 CFI/TLI CFI 0.999 TLI 0.998 Loglikelihood H0 Value -6482.762 H1 Value -6457.085 Information Criteria Number of Free Parameters 44 Akaike (AIC) 13053.524 Bayesian (BIC) 13238.966 Sample-Size Adjusted BIC 13099.308 (n* = (n + 2) / 24) RMSEA (Root Mean Square Error Of Approximation) Estimate 0.015 90 Percent C.I. 0.000 0.034 Probability RMSEA <= .05 1.000 Intro to Mplus—Alan C. Acock 50 SRMR (Standardized Root Mean Square Residual) Value 0.018 MODEL RESULTS F1 S.E. Est./S.E. 0.751 0.858 0.736 0.036 -0.028 0.002 0.048 0.042 0.045 0.051 0.049 0.004 15.608 20.467 16.353 0.711 -0.568 0.627 0.000 0.000 0.000 0.477 0.570 0.530 0.034 -0.002 -0.008 0.763 0.810 0.802 0.045 0.016 0.035 0.050 0.048 0.041 0.755 -0.150 -0.220 15.367 16.837 19.461 0.450 0.881 0.826 0.000 0.000 0.000 1.000 0.894 0.902 0.000 0.021 0.021 999.000 41.937 42.479 999.000 0.000 0.000 1.000 0.734 0.684 0.000 0.028 0.028 999.000 26.424 24.405 999.000 0.000 0.000 0.493 0.721 0.058 0.057 8.461 12.752 0.000 0.000 0.546 0.032 16.975 0.000 0.479 0.053 9.094 0.000 1.000 1.000 0.000 0.000 999.000 999.000 999.000 999.000 0.376 0.290 0.406 0.408 0.329 0.393 0.183 0.191 0.034 0.035 0.034 0.035 0.033 0.035 0.019 0.017 11.064 8.239 11.817 11.742 10.046 11.073 9.796 11.269 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 BY Y1 Y2 Y3 Y4 Y5 Y6 F2 BY Y1 Y2 Y3 Y4 Y5 Y6 F3 BY Y7 Y8 Y9 F4 BY Y10 Y11 Y12 F3 ON F1 F2 F4 ON F3 F2 Two-Tailed P-Value Estimate WITH F1 Variances F1 F2 Residual Variances Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Intro to Mplus—Alan C. Acock 51 Y9 Y10 Y11 Y12 F3 F4 0.181 0.240 0.183 0.213 0.527 0.565 0.017 0.027 0.017 0.018 0.049 0.049 10.812 8.746 10.791 11.998 10.644 11.488 0.000 0.000 0.000 0.000 0.000 0.000 Estimate S.E. Est./S.E. 0.764 0.848 0.758 0.036 -0.028 0.002 0.037 0.024 0.033 0.051 0.050 0.003 20.741 34.915 23.068 0.711 -0.568 0.627 0.000 0.000 0.000 0.477 0.570 0.530 0.034 -0.002 -0.008 0.756 0.825 0.787 0.046 0.015 0.036 0.037 0.035 0.023 0.755 -0.150 -0.220 20.282 23.257 33.668 0.450 0.881 0.826 0.000 0.000 0.000 0.948 0.934 0.938 0.006 0.007 0.007 153.043 131.230 136.226 0.000 0.000 0.000 0.902 0.869 0.835 0.013 0.014 0.017 70.200 59.982 50.002 0.000 0.000 0.000 0.386 0.565 0.043 0.038 8.914 14.919 0.000 0.000 0.680 0.027 24.796 0.000 0.479 0.053 9.094 0.000 0.008 0.031 0.007 0.074 0.071 0.045 0.045 0.045 0.045 0.045 0.183 0.688 0.146 1.657 1.590 0.855 0.491 0.884 0.098 0.112 STANDARDIZED MODEL RESULTS STDYX Standardization F1 Two-Tailed P-Value BY Y1 Y2 Y3 Y4 Y5 Y6 F2 BY Y1 Y2 Y3 Y4 Y5 Y6 F3 BY Y7 Y8 Y9 F4 BY Y10 Y11 Y12 F3 ON F1 F2 F4 ON F3 F2 WITH F1 Intercepts Y1 Y2 Y3 Y4 Y5 Intro to Mplus—Alan C. Acock 52 Y6 Y7 Y8 Y9 Y10 Y11 Y12 Variances F1 F2 Residual Variances Y1 Y2 Y3 Y4 Y5 Y6 Y7 Y8 Y9 Y10 Y11 Y12 F3 F4 R-SQUARE Latent Variable F3 F4 0.068 0.044 0.050 0.056 0.008 0.028 0.025 0.045 0.045 0.045 0.045 0.045 0.045 0.045 1.528 0.983 1.115 1.252 0.170 0.616 0.554 0.126 0.326 0.265 0.211 0.865 0.538 0.580 1.000 1.000 0.000 0.000 999.000 999.000 999.000 999.000 0.390 0.283 0.431 0.401 0.341 0.378 0.101 0.128 0.120 0.186 0.244 0.302 0.323 0.538 0.037 0.036 0.038 0.036 0.036 0.036 0.012 0.013 0.013 0.023 0.025 0.028 0.031 0.037 10.510 7.793 11.385 11.149 9.459 10.467 8.570 9.599 9.287 8.009 9.698 10.828 10.432 14.433 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 Estimate 0.677 0.462 S.E. 0.031 0.037 Est./S.E. 21.887 12.398 Two-Tailed P-Value 0.000 0.000 STANDARDIZED TOTAL, TOTAL INDIRECT, SPECIFIC INDIRECT, AND DIRECT EFFECTS STDYX Standardization Two-Tailed Estimate S.E. Est./S.E. P-Value Effects from F1 to F4 Total 0.263 0.031 8.353 0.000 Total indirect 0.263 0.031 8.353 0.000 Specific indirect F4 F3 F1 0.263 0.031 8.353 0.000 Effects from F2 to F4 Total Intro to Mplus—Alan C. Acock 0.384 0.030 12.590 0.000 53 Total indirect Specific indirect F4 F3 F2 0.384 0.030 12.590 0.000 0.384 0.030 12.590 0.000 MODEL MODIFICATION INDICES Minimum M.I. value for printing the modification index M.I. E.P.C. Std E.P.C. BY Statements F3 BY Y1 6.537 0.144 0.184 WITH Statements Y3 WITH Y2 5.262 0.115 0.115 Y4 WITH Y1 4.954 -0.052 -0.052 Y4 WITH Y3 5.288 0.055 0.055 Y5 WITH Y3 4.367 -0.049 -0.049 Y8 WITH Y2 5.716 -0.037 -0.037 Y9 WITH Y6 4.853 0.036 0.036 Beginning Time: Ending Time: Elapsed Time: 3.840 StdYX E.P.C. 0.187 0.336 -0.133 0.135 -0.133 -0.157 0.133 20:32:33 20:32:33 00:00:00 Section 9: Summary & Resources This provides a brief introduction to Mplus. We have not covered any of the statistical theory underlying Mplus, but this should be enough for you to read the Manual and follow more complex explications of Mplus and SEM. Key things to remember: 1. BY means measured by and is the path (loading) between latent variables and their indicators. 2. ON is the structural path between variables. In last example, F4 depends ON F3, F3 depends ON both F1 and F2. 3. WITH means correlated with. Two uses include: a. For exogenous variables WITH means the exogenous variables are correlated. In last example, F1 is correlated WITH F2. b. For indicators WITH means the errors/residuals are correlated. In last examples, the modification indices suggest we might correlate the error for Y3 WITH the error for Y5. Intro to Mplus—Alan C. Acock 54 Additional introductory content is available at: http://www.ats.ucla.edu/stat/mplus/ The Mplus webpage has a wealth of support that includes articles applying Mplus that serve as models and extensive on line videos http://www.statmodel.com A very thorough, but gentle discussion of CFA (including sample progrms) is Brown, Timothy A. (2006). Confirmatory Factor Analysis for Applied Research. N.Y.: Guilford A gentle introduction to SEM is Kline, Rex B. (2005). Principles and practice of structural equation modeling (2nd ed.). N.Y.: Guildford Press. Intro to Mplus—Alan C. Acock 55