GROWTH CURVES AND EXTENSIONS USING MPLUS

Alan C. Acock
alan.acock@oregonstate.edu
Department of HDFS
322 Milam Hall
Oregon State University
Corvallis, OR 97331

This document and selected references, data, and programs can be downloaded from http://oregonstate.edu/~acock/growth

Growth Curve and Related Models, Alan C. Acock

Outline

1 Brief summary of topics
2 A growth curve
3 Quadratic terms in growth curves
4 An alternative—developmental time
5 Working with missing values
6 Multiple cohort growth model with missing waves
7 Multiple group models with growth curves
8 Alternative to multiple group analysis
9 Growth curves with time invariant covariates
10 Mediational models with time invariant covariates
11 Time varying covariates
12 References

Goal of the Workshop

The goal of this workshop is to explore a variety of applications of latent growth curve models using the Mplus program. Because we will cover a wide variety of applications and extensions of growth curve modeling, we will not cover each of them in great detail. At the end of this workshop, it is hoped that participants will be able to run Mplus programs to execute a variety of growth curve modeling applications and to interpret the results correctly.

Assumed Background

Participants should be familiar with the content in Introduction to Mplus, which is located at www.oregonstate.edu/~acock/growth . It will be assumed that participants in the workshop have some background in structural equation modeling. Background in multilevel analysis will also be useful, but is not assumed. It is possible to learn how to estimate the specific models we will cover without a comprehensive knowledge of Mplus, but some background using an SEM program is useful.
1 Brief Summary of Topics

Introduction to Growth Curve Modeling

Growth curves are a new way of thinking that is ideal for longitudinal studies. Instead of predicting a person's score on a variable (e.g., comparing means at different time points or relationships among variables at different time points), we predict their growth trajectory—what is their level on the variable AND how is this changing. We will present a conceptual model, show how to apply the Mplus program, and interpret the results. Once we can estimate growth trajectories, we can turn to the more interesting issue of explaining individual differences in trajectories (why some people go up, go down, or stay the same).

More advanced topics we will introduce include:

1. Growth Curves with Limited Outcome Variables

Sometimes a researcher is interested in growth on a binary variable (e.g., whether an adolescent has ever drunk alcohol). Sometimes a researcher is interested in a count variable that involves a relatively rare event (the number of days an adolescent had 5+ drinks of alcohol in the last 30 days). Sometimes we are interested in both types of variables, and different variables may predict the binary variable than predict the count variable. We will show how to do this using Mplus and interpret the results.

2. Growth Mixture Models

It is possible to use Mplus to do an exploratory growth curve analysis where our focus is on the person and not the variable. We can locate clusters of people who share similar growth trajectories. This is exploratory research, and the standards for it are still evolving. An example would be a study of alcohol consumption from age 15 to 30. It is possible to empirically identify different clusters of people. One cluster may never drink, or never drink very much. A second cluster may have increasing alcohol consumption up to about age 22 or 23 and then a gradual decline. A third cluster may be very similar to the second cluster but not decline after 23.
After deriving these clusters of people who share growth trajectories, it is possible to compare them to find what differentiates membership in the different clusters. We will show how to do these analyses using Mplus and interpret the results.

2 A Growth Curve

Estimating a basic growth curve using Mplus is quite easy. When developing a complex model it is best to start easy and gradually build complexity. Starting easy should include data screening to evaluate the distributions of the variables, patterns of missing values, and possible outliers. Even if you have a theoretically specified model that is complex, always start with the simplest model and gradually add the complexity. Here we will show how structural equation modeling conceptualizes a latent growth curve, show the Mplus program, explain the new program features, and interpret the output.

Before showing a figure to represent a growth curve, we examine a small sample of our observations. A BMI value of 25 is considered overweight and a BMI of 30 is considered obese (I'm aware of problems with the BMI as a measure of obesity and with its use for adolescents). With just 10 observations it is hard to see much of a trend, but it looks like people are getting a bigger BMI score as they get older. The X-axis value of 0 is when the adolescent was 12 years old, the 1 is when the adolescent was 13 years old, etc. We are using seven waves of data (labeled 0 to 6) from the panel study.

A growth curve requires us to have a model, and we should draw this before writing the Mplus program. Figure 1 shows a model for our simple growth curve:

[Figure 1. Linear growth curve model: latent Intercept (loadings to BMI97-BMI03 all fixed at 1) and Slope (loadings fixed at 0, 1, 2, 3, 4, 5, 6), with residual variances RI and RS, and error terms e97-e03 for the observed BMI variables.]

This figure is much simpler than it first appears. The key variables are the two latent variables labeled the Intercept and the Slope.

The Intercept
a. The intercept represents the estimated initial level and is sometimes called the initial level for this reason. Its value may differ from the actual mean for BMI97 because in this case we have a linear growth model.
b. It may differ from the mean of BMI97 when covariates are added, especially when a zero value on covariates is rare.
c. Unless the covariates are centered, it usually makes sense to just call it an intercept rather than the initial level.
d. The intercept is identified by the constant loadings of 1.0 going to each BMI score. Some programs call the intercept the constant, representing the constant effect.

The Slope
a. Is identified by fixing the values of the paths to each BMI variable. In a publication you normally would not show the path to BMI97, since this is fixed at 0.0.
b. We fix the other paths at 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0. Where did we get these values? The first year is the base year, or year zero. The BMI was measured each subsequent year, so these are scored 1.0 through 6.0.
c. Other values are possible. Suppose the survey was not done in 2000 or 2001, so that we had 5 time points rather than 7. We would use paths of 0.0, 1.0, 2.0, 5.0, and 6.0 for years 1997, 1998, 1999, 2002, and 2003, respectively.
d. It is also possible to fix the first couple of years and then allow the subsequent waves to be free.
- This might make sense for a developmental process where the yearly intervals may not reflect the developmental rate. Developmental time may be quite different than chronological time.
- This has the effect of "stretching" or "shrinking" time to the pattern of the data (Curran & Hussong, 2003).
- An advantage of this approach is that it uses fewer degrees of freedom than adding a quadratic slope.

Residual Variance and Random Effects
a. The individual variation around the Intercept and Slope is represented in Figure 1 by RI and RS. These are the variances of the intercept and slope around their respective means.
b. We expect substantial variance in both of these, as some individuals have a higher or lower starting BMI and some individuals will increase (or decrease) their BMI at a different rate than the average growth rate.
c. In addition to the mean intercept and slope, each individual will have their own intercept and slope. We say the intercept and the slope are random effects since they may vary across individuals.
d. They are random in the sense that each individual may have a steeper or flatter slope than the mean slope, and
e. Each individual may have a higher or lower initial level than the mean intercept.
f. In our sample of 10 individuals shown above, notice one adolescent starts with a BMI around 12 and three adolescents start with a BMI around 30. Some children have a BMI that increases and others do not.
g. The variances RI and RS are critical if we are going to explore more complex models with covariates (e.g., gender, psychological problems, race) that might explain why some individuals have a steeper or less steep growth rate than the average.

The ei terms represent individual error terms for each year. Some years may move above or below the growth trajectory described by our Intercept and Slope. Sometimes it might be important to allow error terms to be correlated, especially subsequent pairs such as e97-e98, e98-e99, etc.

Here is the Mplus program:

Title: bmi_growth.inp
  Basic growth curve
Data:
  File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
  Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
    white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables is limited to bmi variables
  Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
  Sampstat Mod(3.84);
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

What is new compared to an SEM program?

Usevariables is a subcommand to include only the bmi variables, since we are doing a growth curve for these variables.

We drop the Analysis: section because we are doing a basic growth curve and can use the default options.

We have a Model: section because we need to describe the model. Mplus was designed after growth curves were well understood, so there is a single line to describe our model:

i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;

a. In this line the "i" and "s" stand for the intercept and slope. We could have called these anything, such as intercept and slope or initial and trend. The vertical line, | , tells Mplus that it is about to define an intercept and slope.
b. Defaults
- The intercept is defined by a constant loading of 1.0 for each bmi variable; each Intercept-to-bmi path is 1.0.
- The slope is defined by fixing the path from the slope to bmi97 at 0, the path to bmi98 at 1, etc. The @ sign is used for "at." Don't forget the semicolon to end the command.
- Mplus assumes that there is a residual variance for both the intercept and slope (RI and RS) and that these covary. Therefore, we do not need to mention this.
- Mplus assumes there is uncorrelated random error, ei, for each observed variable.
c. To allow e97 and e98 to be correlated we would need to add a line saying bmi97 with bmi98; .
- This may seem strange because we are not really correlating bmi97 with bmi98, but e97 with e98. Mplus knows this, and we do not need to generate a separate set of names for the error terms.

The last additional section in our Mplus program is for selecting what output we want Mplus to provide. There are many optional outputs of the program and we will only illustrate a few of these.
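The fixed loadings defined by the i s | statement already determine how the latent means translate into model-implied means for the seven waves. The following Python sketch (illustrative only; the latent means used here are hypothetical placeholders, not estimates from these data) shows that arithmetic:

```python
import numpy as np

# Loading matrix implied by "i s | bmi97@0 ... bmi03@6":
# first column = intercept loadings (all 1.0),
# second column = slope loadings (the fixed time scores 0-6).
time_scores = np.arange(7.0)
Lambda = np.column_stack([np.ones(7), time_scores])

# Hypothetical latent means (mu_I, mu_S) -- placeholders for illustration.
mu = np.array([20.0, 0.7])

# Model-implied mean of each BMI wave is mu_I + mu_S * time score.
implied_means = Lambda @ mu
print(implied_means)  # 20.0, 20.7, 21.4, 22.1, 22.8, 23.5, 24.2
```

Substituting the intercept and slope means that Mplus actually estimates reproduces the estimated mean curve that Plot3 graphs against the sample means.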
The Output: section has the following lines:

Output: Sampstat Mod(3.84);
Plot: Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

The first line, Sampstat Mod(3.84), asks for sample statistics and modification indices for parameters we might free, as long as doing so would reduce chi-square by at least 3.84 (corresponding to the .05 level). We do not bother with parameter estimates that would have less effect than this.

Next comes the Plot: subcommand, and we say that we want Type is Plot3; for our output. This gives us the descriptive statistics and graphs for the growth curve. The last line of the program specifies the series to plot. By entering the variables with an (*) at the end we are setting a path of 0.0 for bmi97, 1.0 for bmi98, etc.

Annotated Selected Growth Curve Output

The following is selected output with comments:

Number of observations                    1102  ! listwise; an alternative is FIML estimation
Number of dependent variables                7  ! these are the bmi scores
Number of independent variables              0
Number of continuous latent variables        2  ! these are the intercept and slope
Continuous latent variables    I  S             ! these are the only latent variables
Estimator                                   ML

TESTS OF MODEL FIT

! These have the standard interpretations. It is okay if the fit is not perfect here, because when we add the covariates we may get a better fit. The chi-square is significant, as it usually is for a large sample, because any model is not likely to be a perfect fit for data. However, the CFI = .977 and TLI = .979 are both in the very good range (i.e., over .96 is very good). The RMSEA is .088 and this is not very good. Ideally, this should be below .06, and a value that is not below .08 is considered problematic. The SRMR = .048 is acceptable (less than .05).

Chi-Square Test of Model Fit
  Value                     220.570
  Degrees of Freedom             23
  P-Value                    0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                    8568.499
  Degrees of Freedom             21
  P-Value                    0.0000

CFI/TLI
  CFI                         0.977
  TLI                         0.979

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                    0.088
  90 Percent C.I.             0.078   0.099
  Probability RMSEA <= .05    0.000

SRMR (Standardized Root Mean Square Residual)
  Value                       0.048

MODEL RESULTS
              Estimates    S.E.   Est./S.E.
! The loadings for I and S are all fixed, so there are no tests for them.
 I        |
    BMI97     1.000      0.000      0.000
    BMI98     1.000      0.000      0.000
    BMI99     1.000      0.000      0.000
    BMI00     1.000      0.000      0.000
    BMI01     1.000      0.000      0.000
    BMI02     1.000      0.000      0.000
    BMI03     1.000      0.000      0.000
 S        |
    BMI97     0.000      0.000      0.000
    BMI98     1.000      0.000      0.000
    BMI99     2.000      0.000      0.000
    BMI00     3.000      0.000      0.000
    BMI01     4.000      0.000      0.000
    BMI02     5.000      0.000      0.000
    BMI03     6.000      0.000      0.000

! The slope and intercept are correlated; the covariance is
! .416, z = 5.551, p < .001 (WITH means covariance in Mplus)
 S WITH
    I         0.416      0.075      5.551

! Initial level, intercept = 20.798 (BMI starts at 20.798), z = 178.026, p < .001
! Slope = .668 (BMI goes up .668 each year), z = 35.183, p < .001
 Means
    I        20.798      0.117    178.026
    S         0.668      0.019     35.183

 Intercepts
    BMI97     0.000      0.000      0.000
    BMI98     0.000      0.000      0.000
    BMI99     0.000      0.000      0.000
    BMI00     0.000      0.000      0.000
    BMI01     0.000      0.000      0.000
    BMI02     0.000      0.000      0.000
    BMI03     0.000      0.000      0.000

! The variances, RI and RS in the figure, are both significant. This is what covariates will try to explain—why do some youth start higher/lower and have a different trend, i.e., slope, for the BMI?
 Variances
    I        13.184      0.643     20.504
    S         0.213      0.018     12.147

! Following are the residual variances for the observed variables; hence they are the errors, the ei's in our figure.
 Residual Variances
    BMI97     5.391      0.290     18.583
    BMI98     2.729      0.159     17.124
    BMI99     2.697      0.144     18.752
    BMI00     3.529      0.178     19.860
    BMI01     2.334      0.144     16.187
    BMI02     9.533      0.457     20.837
    BMI03     7.134      0.397     17.956

MODEL MODIFICATION INDICES

Minimum M.I. value for printing the modification index: 3.840

                         M.I.    E.P.C.  Std E.P.C.  StdYX E.P.C.
! Many of these changes make no sense. We could let the path of the slope to BMI03 be free and chi-square would drop by about 45 points.
BY Statements
I  BY BMI97            87.808    -0.038    -0.139    -0.032
I  BY BMI99            25.404     0.013     0.049     0.011
I  BY BMI00            21.840     0.014     0.050     0.011
I  BY BMI03            29.103    -0.026    -0.093    -0.016
S  BY BMI97            55.850    -0.870    -0.402    -0.093
S  BY BMI99            17.773     0.315     0.145     0.034
S  BY BMI00            18.572     0.352     0.162     0.035
S  BY BMI03            44.611    -0.915    -0.423    -0.074

! When Mplus has a value it can't compute it prints 999.000. Normally ignore these.
ON/BY Statements
S  ON I / I  BY S     999.000     0.000     0.000     0.000

! These "with" statements are for correlated errors. Some make sense, some don't.
WITH Statements
BMI99 WITH BMI97        4.993    -0.349    -0.349    -0.019
BMI99 WITH BMI98        8.669     0.362     0.362     0.020
BMI00 WITH BMI97        3.912    -0.322    -0.322    -0.016
BMI00 WITH BMI99       17.357     0.503     0.503     0.026
BMI01 WITH BMI97        8.255    -0.421    -0.421    -0.021
BMI01 WITH BMI98        7.032    -0.300    -0.300    -0.015
BMI01 WITH BMI00       12.398     0.447     0.447     0.021
BMI02 WITH BMI97        4.707     0.560     0.560     0.023
BMI02 WITH BMI99        5.455    -0.431    -0.431    -0.018
BMI02 WITH BMI00        9.829    -0.649    -0.649    -0.025
BMI02 WITH BMI01        4.305     0.413     0.413     0.015
BMI03 WITH BMI97       36.224     1.488     1.488     0.060
BMI03 WITH BMI99        9.296    -0.525    -0.525    -0.021
BMI03 WITH BMI00        8.824    -0.583    -0.583    -0.022
BMI03 WITH BMI02        8.242     0.931     0.931     0.029

! We do not pay much attention to these intercepts because Mplus automatically fixes them at zero. Before freeing these, it would make more sense to free some of the coefficients for the slope, e.g., 0, 1, *, *, *, *, or to try a quadratic slope as discussed in a later section.
Means/Intercepts/Thresholds
[ BMI97 ]              79.520    -0.770    -0.770    -0.179
[ BMI99 ]              19.737     0.250     0.250     0.058
[ BMI00 ]              17.444     0.257     0.257     0.056
[ BMI03 ]              23.066    -0.483    -0.483    -0.084

PLOT INFORMATION
The following plots are available:
  Histograms (sample values, estimated factor scores, estimated values)
  Scatterplots (sample values, estimated factor scores, estimated values)
  Sample means
  Estimated means
  Sample and estimated means
  Observed individual values
  Estimated individual values

Here are Some of the Available Plots

It is often useful to show the actual means for a small random sample of participants. These are Sample Means. Click on Graphs, then Observed Individual Values. This gives you a menu where you can make some selections. I used the clock to seed a random generation of observations. Here I selected Random Order and 20 cases. This results in the following graph:

[Graph of observed individual values for 20 randomly selected cases.]

This shows one person who started at an obese BMI = 30 and then dropped down. However, most people increased gradually.

Next, let's look at a plot of the actual means and the estimated means using our linear growth model. Click on Graphs and then select Sample and estimated means.

[Graph of sample and estimated means across the seven waves.]

You can improve this graph. You might click on the legend and move it so it is not over the trend lines. You can right-click inside the graph and add labels for the X axis and Y axis. You can change the labels, and you can adjust the range for each axis.

Notice that there is a clear growth trend in BMI. A BMI of 15-20 is considered healthy and a BMI of 25 is considered overweight. Notice what happens to American youth between the ages of 12 and 18.

3 A Growth Curve with a Quadratic Term

This graph is useful for seeing whether there is a nonlinear trend. It is simple to add a quadratic term if the curve departs from linearity.
Looking at the graph, it may seem that the linear trend works very well, but our RMSEA was a bit big and the estimated initial BMI is higher than the observed mean. A quadratic term might pick this up by having a curve that drops slightly to pick up the BMI97 mean. Estimation requires at least 4 waves of data, but more waves are highly desirable for a good test of the quadratic term.

The conceptual model in Figure 1 is unchanged except that a third latent variable is added. We will have the Intercept, the Slope (now called the Linear trend), and the new latent variable called the Quadratic trend. Like the first two, the Quadratic trend will have a residual variance (RQ) that is freely correlated with RI and RL. The paths from the Quadratic trend to the individual BMI variables will be the square of the paths from the Linear trend to the BMI variables. Hence:
a. The values for the linear trend will remain 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0.
b. For the quadratic trend these values will be 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, and 36.0.

[Figure: Quadratic growth curve model with latent Intercept (loadings 1), Linear trend (loadings 0-6), and Quadratic trend (loadings 0, 1, 4, 9, 16, 25, 36), residual variances RI, RL, and RQ, and error terms e97-e03.]

You really appreciate the defaults in Mplus when you see what we need to change in the Mplus program when we add a quadratic slope. Here is the only change we need to make:

Model:
  i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;

Mplus will know that the quadratic, q (we could use any name), will have values that are the square of the values for the slope, s.

Title: bmi_quadratic.inp
  Quadratic growth curve
Data:
  File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
  Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
    white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables is limited to bmi variables
  Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Model:
  i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
  Sampstat Mod(3.84);
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

Here is selected output:

TESTS OF MODEL FIT

! We have lost 4 degrees of freedom: the mean of the quadratic slope, the variance of the quadratic slope (Rq), the covariance of Rq with Ri, and the covariance of Rq with Rs.
! The fit is excellent.

Chi-Square Test of Model Fit
  Value                      61.791  ! was 220.570
  Degrees of Freedom             19  ! was 23
  P-Value                    0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                    8568.499
  Degrees of Freedom             21
  P-Value                    0.0000

CFI/TLI
  CFI                         0.995  ! was .977
  TLI                         0.994  ! was .979

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                    0.045  ! was .088
  90 Percent C.I.             0.033   0.058
  Probability RMSEA <= .05    0.715

SRMR (Standardized Root Mean Square Residual)
  Value                       0.022

MODEL RESULTS

! Results for I and S are the same as before. The paths for Q are simply the squared values.
 Q        |
    BMI97     0.000      0.000      0.000
    BMI98     1.000      0.000      0.000
    BMI99     4.000      0.000      0.000
    BMI00     9.000      0.000      0.000
    BMI01    16.000      0.000      0.000
    BMI02    25.000      0.000      0.000
    BMI03    36.000      0.000      0.000

 S WITH
    I         0.575      0.220      2.616
 Q WITH
    I        -0.038      0.034     -1.116
    S        -0.130      0.021     -6.324

! The negative mean, -.064, for the quadratic suggests a leveling off of the growth curve.
 Means
    I        20.439      0.118    173.266
    S         1.045      0.049     21.108
    Q        -0.064      0.008     -8.183

 Variances
    I        12.381      0.671     18.462
    S         0.984      0.134      7.357
    Q         0.023      0.004      6.412

 Residual Variances
    BMI97     4.318      0.316     13.660
    BMI98     2.789      0.158     17.613
    BMI99     2.442      0.141     17.357
    BMI00     3.187      0.173     18.418
    BMI01     2.354      0.147     16.022
    BMI02     9.521      0.454     20.948
    BMI03     4.989      0.491     10.157

The fit is so good because the estimated means and the observed means are so close. However, there is still significant variance among individual adolescents that needs to be explained. Here are 20 estimated individual growth curves:

[Graph of 20 estimated individual growth curves from the quadratic model.]
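The estimated means reported above (I = 20.439, S = 1.045, Q = -.064) fully determine the implied mean trajectory. A short Python sketch (my illustration of the arithmetic, not Mplus output) shows the leveling off produced by the negative quadratic mean:

```python
import numpy as np

# Estimated latent means from the quadratic model output above.
mu_i, mu_s, mu_q = 20.439, 1.045, -0.064

# Implied mean BMI at each wave: intercept + linear*t + quadratic*t^2,
# with fixed time scores t = 0..6 (the quadratic loadings are t squared).
t = np.arange(7.0)
bmi_hat = mu_i + mu_s * t + mu_q * t**2

print(np.round(bmi_hat, 2))
# Year-over-year gains shrink as the quadratic term accumulates.
print(np.round(np.diff(bmi_hat), 2))
```

The implied curve rises from about 20.4 to about 24.4, with each yearly gain smaller than the last, which matches the "leveling off" interpretation in the output annotation.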
Notice that each of these is a curve, but they start at different initial levels and have different trajectories. Next, we want to use covariates to explain these differences in the initial levels and growth trajectories.

4 An Alternative to Use of a Quadratic Slope

An alternative to adding a quadratic slope is to allow some of the time loadings to be free. We have used loadings of 0, 1, 2, 3, 4, 5, and 6 for the linear slope and 0, 1, 4, 9, 16, 25, and 36 for the quadratic slope. Alternatively, we could allow all but two of the loadings to be free. We might use loadings of 0, 1, *, *, *, *, *. It is necessary to have the 0 and 1 fixed, but the 1 does not have to be second; we could use 0, *, *, 1.

You may ask how you could justify allowing some of the time loadings to be free if there was a one-month or one-year difference between waves of data. The answer is that developmental time may be different than chronological time. Allowing these loadings to be free has an advantage over the quadratic in that it uses fewer degrees of freedom but still allows for growth spurts.

This model is not nested under the quadratic model, but you could think of a linear growth model with fixed values for each year (0, 1, 2, 3, 4, 5, 6) as being nested within the free model that uses 0, 1, *, *, *, *, *. If the free model fits much better than the fixed linear model, you might use it instead of the quadratic model.

[Figure: Growth curve model with free time loadings: Intercept loadings fixed at 1; Slope loadings fixed at 0 and 1 for the first two waves and free (*) for the remaining waves.]

5 Working with Missing Values

Mplus has two ways of working with missing values. The simplest is to use full information maximum likelihood estimation with missing values (FIML). This uses all available data. For example, some adolescents were interviewed in all seven years, but others may have skipped one, two, or even more years. We use all available information with this approach.
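The idea behind FIML can be sketched in a few lines: each case contributes a likelihood term computed from only the variables it was observed on, using the matching sub-vector of the means and sub-matrix of the covariances. The Python below is a toy illustration of that bookkeeping, not Mplus code; the data, mean vector, and covariance matrix are made up:

```python
import numpy as np

def casewise_loglik(data, mu, sigma):
    """Sum of multivariate-normal log-densities where each case uses only
    its observed entries (NaN = missing). This is the casewise bookkeeping
    behind FIML; mu and sigma would come from the model being estimated."""
    total = 0.0
    for row in data:
        obs = ~np.isnan(row)                 # which variables this case has
        r = row[obs] - mu[obs]               # residual on observed entries
        s = sigma[np.ix_(obs, obs)]          # matching covariance block
        k = obs.sum()
        total += -0.5 * (k * np.log(2 * np.pi)
                         + np.log(np.linalg.det(s))
                         + r @ np.linalg.solve(s, r))
    return total

# Made-up example: two variables; the second case is missing variable 2,
# so it contributes a univariate rather than a bivariate density.
data = np.array([[1.0, 2.0],
                 [0.5, np.nan]])
print(casewise_loglik(data, mu=np.zeros(2), sigma=np.eye(2)))
```

No case is dropped and nothing is imputed; a case with missing waves simply contributes less information, which is why FIML retains so many more observations than listwise deletion.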
The second approach is to utilize multiple imputation. Multiple imputation should not be confused with the single imputation available from SPSS (if a person purchases their missing values module), which gives incorrect standard errors. Multiple imputation involves
a. Imputing multiple datasets (usually 5-10) using appropriate procedures,
b. Estimating the model for each of these datasets, and
c. Then pooling the estimates and standard errors.

When the standard errors are pooled this way, they incorporate the variability across the 5-10 solutions and thereby produce unbiased estimates of the standard errors.

Multiple imputation can be done with Norm, a freeware program that works for normally distributed, continuous variables and is often used even on dichotomized variables. A Stata user has written a program called ICE, an implementation of the S-Plus program MICE, that has advantages over Norm: it does the imputation using different estimation models for outcome variables that are continuous, counts, or categorical. See Royston (2005). Mplus can read these multiple datasets, estimate the model for each dataset, and pool the estimates and their standard errors.

We will not illustrate the multiple imputation approach because that involves working with other programs to impute the datasets. However, the Mplus User's Guide discusses how you specify the datasets in the Data: section. We will illustrate the FIML approach because it is widely used and easily implemented—and doesn't require explaining another software package. The FIML approach does not work when you can't justify a maximum likelihood estimator.

Here is the program:

Title: bmi_missing.inp
  Basic growth curve with missing values
Data:
  File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
  Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
    white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables is limited to bmi variables
  Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Analysis:
  Type = General Missing H1 ;
  Estimator = MLR ;
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
  Sampstat Mod(3.84) patterns;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

The conceptual model does not change with missing values, and the programming for implementing the FIML solution changes very little. You will recall that we did not need an Analysis: section in our program for doing a growth curve. However, we do need one when we are doing a growth curve with missing values and using FIML estimation. Directly above the Model: command we insert

Analysis:
  Type = General Missing H1 ;
  Estimator = MLR ;

The line Type = General Missing H1; is the key change. The Missing keyword tells Mplus to do the full information maximum likelihood estimation. The H1 is necessary to get sample statistics in our output. We could do this with maximum likelihood estimation, but will use a robust maximum likelihood estimator, Estimator = MLR, instead. This is optional, but generally conservative when you have substantial missing values.

In the Output: section, we also add a single word, patterns. This will give us a lot of information about patterns of missing values. We will see just what patterns there are, the frequency of occurrence of each pattern, and the percentage of data present for each covariance estimate.

Output: Sampstat Mod(3.84) patterns ;
Plot: Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

Also, to simplify our presentation, we will take out the quadratic term (the fit is better with the quadratic term, but it takes more space to present and interpret the results). Here are selected, annotated results:

*** WARNING
Data set contains cases with missing on all variables. These cases were not included in the analysis. Number of cases with missing on all variables: 3
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

SUMMARY OF ANALYSIS
Number of groups                             1
Number of observations                    1768
! We had 1102 observations using listwise deletion.

SUMMARY OF MISSING DATA PATTERNS

! An 'x' means the data are present. Pattern 1 has no missing values; pattern 2 is missing only BMI03.
[Grid of the 81 missing data patterns across BMI97-BMI03 omitted.]

MISSING DATA PATTERN FREQUENCIES

Pattern  Frequency    Pattern  Frequency    Pattern  Frequency
   1        1102        28          2         55         26
   2          97        29         10         56         53
   3          73        30         51         57          9
   4          38        31          4         58          9
   5          21        32          3         59          2
   6          11        33          1         60          4
   7           5        34          1         61          1
   8          20        35          1         62          4
   9          23        36          3         63          1
  10           4        37          6         64          3
  11           8        38          1         65          5
  12           3        39          1         66          1
  13           8        40          1         67          1
  14           3        41          3         68          1
  15          11        42          6         69          1
  16          25        43          3         70          2
  17           6        44          1         71          1
  18           3        45          1         72         14
  19           2        46          2         73          1
  20           3        47          1         74          1
  21           1        48          6         75          2
  22           1        49          3         76          1
  23           2        50          2         77          1
  24           7        51          3         78          7
  25           1        52          3         79          1
  26           1        53          3         80          2
  27           6        54          3         81          4

! We might want to set some minimum standard and drop observations that do not meet it. For example, we might drop people who are missing their BMI for more than 3 waves.

COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value 0.100

PROPORTION OF DATA PRESENT

Covariance Coverage
          BMI97   BMI98   BMI99   BMI00   BMI01
BMI97     0.925
BMI98     0.847   0.902
BMI99     0.850   0.856   0.910
BMI00     0.842   0.846   0.864   0.906
BMI01     0.839   0.837   0.854   0.859   0.904
BMI02     0.796   0.794   0.805   0.811   0.817
BMI03     0.777   0.775   0.788   0.788   0.801

Covariance Coverage
          BMI02   BMI03
BMI02     0.861
BMI03     0.774   0.840

! We have 77.4% of the 1768 observations answering both BMI02 and BMI03.

SAMPLE STATISTICS

! Notice that the means are not dramatically different from the results of the "basic" analysis that had the 1102 observations using listwise deletion.
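The covariance coverage block is just the proportion of cases jointly observed for each pair of variables. A small Python sketch (my illustration; the toy data are made up, not from NLSY97) reproduces the computation:

```python
import numpy as np

def covariance_coverage(data):
    """Proportion of cases jointly present (non-NaN) for every pair of
    variables -- the quantity Mplus labels 'Covariance Coverage'."""
    present = (~np.isnan(data)).astype(float)   # cases x variables indicator
    n = data.shape[0]
    return (present.T @ present) / n

# Made-up data: 4 cases, 2 variables, one missing value in each variable.
data = np.array([[21.0, 22.1],
                 [19.5, np.nan],
                 [np.nan, 23.0],
                 [20.2, 21.8]])
coverage = covariance_coverage(data)
print(coverage)  # diagonal: share observed per variable; off-diagonal: both observed
```

Here each variable is observed for 3 of 4 cases (diagonal = .75), but only 2 cases have both (off-diagonal = .50), which parallels reading the .774 entry for BMI02 with BMI03 above.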
This is reassuring that our missing values are not creating a systematic bias.

Means
BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
20.572  21.839  22.651  23.305  23.846  24.390  24.935

TESTS OF MODEL FIT
! If you compare nested models with MLR estimation you need to use the scaling correction factor as discussed on the Mplus web page. We are not doing that here, so this is okay.

Chi-Square Test of Model Fit
Value                                   116.426*
Degrees of Freedom                            23
P-Value                                   0.0000
Scaling Correction Factor for MLR          2.302

* The chi-square value for MLM, MLMV, MLR, ULS, WLSM and WLSMV cannot be used for chi-square difference tests. MLM, MLR and WLSM chi-square difference testing is described in the Mplus Technical Appendices at www.statmodel.com. See chi-square difference testing in the index of the Mplus User's Guide.

! The chi-square is much bigger when we use FIML estimation with missing values, in part because the sample is so much larger. Still, there are some fit problems without the quadratic term. Both the CFI and TLI are a bit low to be ideal (under .96). However, the RMSEA is good, and it is the most widely used measure of fit.

Chi-Square Test of Model Fit for the Baseline Model
Value                                   1279.431
Degrees of Freedom                            21
P-Value                                   0.0000

CFI/TLI
CFI                                        0.926
TLI                                        0.932

RMSEA (Root Mean Square Error Of Approximation)
Estimate                                   0.048

SRMR (Standardized Root Mean Square Residual)
Value                                      0.051

! The results are similar to the linear model solution with listwise deletion, but our z-scores are bigger due to having more observations.
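The covariance coverage figures reported above are simply the proportion of cases observed on both variables of each pair. A minimal sketch in Python (toy rows, not the actual NLSY97 file):

```python
# Covariance coverage: the proportion of cases observed on BOTH
# variables of a pair. The rows below are toy data (None = missing),
# not the actual NLSY97 file.
def coverage(rows, var_a, var_b):
    both = sum(1 for r in rows if r[var_a] is not None and r[var_b] is not None)
    return both / len(rows)

rows = [
    {"bmi02": 24.1, "bmi03": 24.9},
    {"bmi02": 23.0, "bmi03": None},
    {"bmi02": None, "bmi03": 25.2},
    {"bmi02": 22.5, "bmi03": 23.8},
]

print(coverage(rows, "bmi02", "bmi03"))  # 2 of 4 rows have both -> 0.5
```

A diagonal entry (a variable paired with itself) gives the simple proportion present for that variable, matching the diagonal of the Mplus coverage matrix.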
MODEL RESULTS
                    Estimates     S.E.  Est./S.E.
 S WITH I               0.408    0.112      3.658
 Means
    I                  21.035    0.105    200.935
    S                   0.701    0.022     32.311
 Variances
    I                  15.051    0.958     15.714
    S                   0.255    0.031      8.340
 Residual Variances
    BMI97               5.730    0.638      8.981
    BMI98               3.276    0.414      7.907
    BMI99               3.223    0.351      9.175
    BMI00               4.361    0.973      4.483
    BMI01               2.845    0.355      8.005
    BMI02               9.380    3.384      2.772
    BMI03               8.589    2.736      3.139

PLOT INFORMATION
The following plots are available:
- Histograms (sample values, estimated factor scores, estimated values)
- Scatterplots (sample values, estimated factor scores, estimated values)
- Sample means
- Estimated means
- Sample and estimated means
- Observed individual values
- Estimated individual values

6 Multiple Cohort Growth Model with Missing Waves

Major datasets often have multiple cohorts. NLSY97 has youth who were 12-18 in 1997. Seven years later, they are 19-25. It is quite likely that many growth processes that run from age 12 to age 19 differ from those that run from 19 to 25. For example, involvement in minor crimes (petty theft, etc.) may increase from 12 to 19, but then decrease from there to 25. Here is what we might have for our NLSY97 data:

Individual  Cohort  1997  1998  1999  2000  2001  2002  2003
1           1985       3     4     5     6     7     7     8
2           1985       2     4     3     5     6     7     7
3           1984       4     5     6     7     6     6     5
4           1982       6     7     5     4     3     2     2
5           1982       5     5     6     4     2     2     1

We can rearrange these data:

Case  Cohort  HD12  HD13  HD14  HD15  HD16  HD17  HD18  HD19  HD20  HD21
1     1985       3     4     5     6     7     7     8     *     *     *
2     1985       2     4     3     5     6     7     7     *     *     *
3     1984       *     4     5     6     7     6     6     5     *     *
4     1982       *     *     *     6     7     5     4     3     2     2
5     1982       *     *     *     5     5     6     4     2     2     1

In this table HD gives the age at which the data were collected. To capture everybody we would need to extend the table to HD25, because the youth who were 18 in 1997 are 25 seven years later. This table would have massive amounts of missing data, but the missingness would not be related to other variables; it would be missing completely at random (MCAR). We could develop a growth curve that covered the full range from age 12 to age 25.
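The rearrangement from calendar-year columns to age columns is purely mechanical; a sketch in Python using the five cases from the table above:

```python
# Re-index multiple-cohort data from calendar year to age at measurement.
# The five cases are those shown in the table; None plays the role of "*".
years = list(range(1997, 2004))  # 1997..2003

cases = [
    {"cohort": 1985, "values": [3, 4, 5, 6, 7, 7, 8]},
    {"cohort": 1985, "values": [2, 4, 3, 5, 6, 7, 7]},
    {"cohort": 1984, "values": [4, 5, 6, 7, 6, 6, 5]},
    {"cohort": 1982, "values": [6, 7, 5, 4, 3, 2, 2]},
    {"cohort": 1982, "values": [5, 5, 6, 4, 3, 2, 2][:7]},
]
cases[4]["values"] = [5, 5, 6, 4, 2, 2, 1]  # case 5 from the table

def by_age(case, min_age=12, max_age=25):
    """Return {age: value}, with None wherever the cohort was not measured."""
    row = {age: None for age in range(min_age, max_age + 1)}
    for year, value in zip(years, case["values"]):
        row[year - case["cohort"]] = value
    return row

print(by_age(cases[0])[12])  # case 1 was 12 in 1997 -> 3
print(by_age(cases[3])[15])  # case 4 was 15 in 1997 -> 6
```

Each case fills exactly 7 of the 14 age slots, which is why the age-indexed dataset is half missing by design.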
We would have 14 waves of data even though each participant was measured only 7 times. Each participant would have data for 7 of the ages and missing values for the other 7. We would want to estimate a growth model with a quadratic term and expect the linear slope to be positive (growth from 12-18) and the quadratic term to be negative (decline from 18-25).

Mplus has a special Analysis: type called MCOHORT. There is an example on the Mplus web page, and we will not cover it here. This is an extraordinary way to deal with missing values. Muthén presents an example with real data [figure not reproduced].

7 Multiple Group Models with Growth Curves

Multiple group analysis using SEM is extremely flexible—some would say too flexible, because there are so many possibilities. We use gender as our grouping variable because we are interested in the trend in BMI for girls compared to boys. We think adolescent girls are more concerned about their weight and are therefore likely to have a lower BMI than boys and a flatter trajectory.

There are several ways of comparing a model across multiple groups. One approach is to see whether the same model fits each group while allowing all of the estimated parameters to differ. Here we are saying that a linear growth model fits the data for both boys and girls, but we are not constraining girls and boys to have the same values on any of the parameters:
- intercept mean
- slope mean
- intercept variance
- slope variance
- covariance of intercept and slope
- residual errors

We can then put increasingly strict invariance constraints on the model.
a. At a minimum, we want to test whether the two groups have a different intercept (level) and slope.
b. If this constraint is acceptable we can add additional constraints on the variances, covariances, and error terms.
First, we will estimate the model simultaneously for girls and boys with no constraints on the parameters. Here is the program with the new commands highlighted:

Title: bmi_growth_gender.inp
Data: File is "C:\mplus examples\bmi_stata.dat" ;
Variable: Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
    white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables keeps bmi variables and gender
  Usevariables are male bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
  Grouping is male (0=female 1=male);
Model: i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output: Sampstat Mod(3.84) ;
Plot: Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

The only changes we need to make are the highlighted lines. We have a binary variable, male, coded 0 for females and 1 for males. We add male to the list of variables we are using, and we add a subcommand to the Variable: section that says we have a grouping variable, names it, and defines what the values are so the output will be labeled nicely. The command Grouping is male (0=female 1=male); gives us a separate set of parameter estimates for girls (labeled female) and boys (labeled male).

Here is selected, annotated output:

SUMMARY OF ANALYSIS
Number of groups                               2
Number of observations
   Group FEMALE                              528
   Group MALE                                574
Number of dependent variables                  7
Number of independent variables                0
Number of continuous latent variables          2
Variables with special functions
   Grouping variable                        MALE

SAMPLE STATISTICS FOR FEMALE
Means
BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
19.904  21.198  21.752  22.349  22.805  23.606  23.961

SAMPLE STATISTICS FOR MALE
Means
BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
20.652  21.835  22.858  23.638  24.063  24.370  24.994
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
Value                                    320.535
Degrees of Freedom                            46   ! Notice we have twice the degrees of freedom
P-Value                                   0.0000

Chi-Square Test of Model Fit for the Baseline Model
Value                                   8906.678
Degrees of Freedom                            42
P-Value                                   0.0000

CFI/TLI
CFI                                        0.969
TLI                                        0.972

RMSEA (Root Mean Square Error Of Approximation)
Estimate                                   0.104
90 Percent C.I.                    0.093   0.115

SRMR (Standardized Root Mean Square Residual)
Value                                      0.063

MODEL RESULTS
                    Estimates     S.E.  Est./S.E.
Group FEMALE
 S WITH I               0.465    0.090      5.187
 Means
    I                  20.421    0.157    130.261
    S                   0.610    0.024     24.975
 Variances
    I                  11.579    0.801     14.457
    S                   0.183    0.020      8.920
 Residual Variances
    BMI97               4.632    0.351     13.183
    BMI98               2.033    0.177     11.463
    BMI99               1.896    0.153     12.367
    BMI00               4.567    0.312     14.644
    BMI01               2.298    0.192     11.984
    BMI02              15.204    0.991     15.342
    BMI03               3.400    0.349      9.730

Group MALE
 S WITH I               0.337    0.114      2.956
 Means
    I                  21.215    0.171    124.278
    S                   0.697    0.027     25.551
 Variances
    I                  14.528    0.991     14.660
    S                   0.232    0.026      8.918
 Residual Variances
    BMI97               6.306    0.471     13.391
    BMI98               3.445    0.269     12.800
    BMI99               3.405    0.241     14.108
    BMI00               2.651    0.195     13.612
    BMI01               2.132    0.183     11.671
    BMI02               4.304    0.332     12.960
    BMI03              10.570    0.730     14.484

Here is the graph of the two growth curves [not reproduced]. It appears that the girls have a lower initial level and a flatter rate of growth of BMI.

We can re-estimate the model with the intercept and slope invariant. To do this we make the following modifications to the model:

Model: i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
  [i] (1);
  [s] (2);
Model male:
  [i] (1);
  [s] (2);
Output: Sampstat Mod(3.84) ;
Plot: Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

Notice that we added two lines to the Model: section, [i] (1); and [s] (2);.
Then we added a subsection called Model male:, where males are the second group (females are coded 0 and males 1 on male), and put in the same two lines. The first Model: command is understood to apply to the group coded 0 on the male variable. These changes force the intercept to be equal in both groups, because both intercepts are assigned parameter label (1), and force the slopes to be equal, because both are assigned label (2). Any parameters with a (1) after them are constrained to be equal across groups, as are any parameters with a (2) after them. Notice that we have square brackets [ ] around the names of the intercept and slope.

When we run the revised program we obtain a chi-square that has two extra degrees of freedom because of the two constraints.

TESTS OF MODEL FIT
Chi-Square Test of Model Fit
Value                                    338.157   ! Was 320.535
Degrees of Freedom                            48   ! Was 46
P-Value                                   0.0000

Chi-Square Test of Model Fit for the Baseline Model
Value                                   8906.678
Degrees of Freedom                            42
P-Value                                   0.0000

CFI/TLI
CFI                                        0.967   ! Was .969
TLI                                        0.971   ! Was .972

RMSEA (Root Mean Square Error Of Approximation)
Estimate                                   0.105   ! Was .104
90 Percent C.I.                    0.094   0.115

SRMR (Standardized Root Mean Square Residual)
Value                                      0.081

We can test the difference between the chi-square(48) = 338.157 and the chi-square(46) = 320.535. This difference, 17.622, has 48 - 46 = 2 degrees of freedom and is significant at the p < .001 level. Although we can say there is a highly significant difference between the level and trend for girls and boys, we need to be cautious, because this difference of chi-squares has the same problem with large sample sizes that the original chi-squares have. In fact, the measures of fit hardly change whether or not we constrain the intercept and slope to be equal. Moreover, the visual difference in the graph is not dramatic.
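For plain ML estimates, the arithmetic of this difference test is easy to verify (with MLR, as noted earlier, the scaling correction factor would be needed instead). A sketch in Python:

```python
import math

# Chi-square difference test for the two intercept/slope equality
# constraints. Unscaled arithmetic only; with MLR the scaled
# difference described in the Mplus documentation is required.
chi2_constrained, df_constrained = 338.157, 48
chi2_free, df_free = 320.535, 46

diff = chi2_constrained - chi2_free
df_diff = df_constrained - df_free

# With 2 degrees of freedom the chi-square upper-tail p-value has a
# closed form: p = exp(-diff / 2).
p = math.exp(-diff / 2)

print(round(diff, 3), df_diff, round(p, 5))  # 17.622 2 0.00015
```

The p-value is well under .001, which is the conclusion stated in the text; for other degrees of freedom a chi-square distribution function (e.g., from scipy) would be needed instead of the 2-df closed form.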
We could also put other constraints on the two solutions, such as equal variances and covariances, and even equal residual error variances, but we will not do so here.

8 Alternative to Multiple Group Analysis

An alternative way of doing this when there are two groups is to enter the grouping variable as a predictor. This requires re-conceptualizing our model. We can think of the indicator variable Male as having a direct path to both the intercept and the slope. Because the indicator variable is coded 1 for male and 0 for female, a positive path from Male to the Intercept means that boys have a higher initial level of BMI. Similarly, a positive path from Male to the Slope indicates that boys have a steeper slope than girls on BMI. Such results would be consistent with our expectation that boys both start higher and gain more than girls during adolescence. This approach does not let us test for other types of invariance, such as the variances, covariances, and error terms; we are forcing these to be the same for females and males, and this may be unreasonable.

The following figure shows these two paths. We are explaining why some people have a higher or lower initial level, and why some have a steeper or flatter slope, by whether they are a girl or a boy. We are predicting that boys have a higher initial level and a steeper slope. Here is the figure:

[Path diagram: Male has a positive path to both the Intercept and the Slope, each of which has a residual. The Intercept has loadings of 1 on BMI97-BMI03; the Slope has loadings of 0, 1, 2, 3, 4, 5, 6; each BMI measure has its own error term (e97-e03).]

Here is part of the program:

Variable: Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
    white black hispanic asian other;
  Missing are all (-9999) ;
  !
usevariables is limited to bmi variables and male
  Usevariables are male bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
Model: i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
  i on male ;
  s on male ;
Output: Sampstat Mod(3.84) ;
Plot: Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

Here is selected, annotated output:

TESTS OF MODEL FIT
Chi-Square Test of Model Fit
Value                                    237.517   ! We cannot compare this to the chi-square for the two-group design because this model is not nested in that one.
Degrees of Freedom                            28
P-Value                                   0.0000

Chi-Square Test of Model Fit for the Baseline Model
Value                                   8602.391
Degrees of Freedom                            28
P-Value                                   0.0000

CFI/TLI
CFI                                        0.976
TLI                                        0.976

Loglikelihood
H0 Value                              -19515.302
H1 Value                              -19396.543

Information Criteria
Number of Free Parameters                     14
Akaike (AIC)                           39058.603
Bayesian (BIC)                         39128.672
Sample-Size Adjusted BIC               39084.204
  (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
Estimate                                   0.082
90 Percent C.I.                    0.073   0.092
Probability RMSEA <= .05                   0.000

SRMR (Standardized Root Mean Square Residual)
Value                                      0.044

MODEL RESULTS
                    Estimates     S.E.  Est./S.E.
 I ON
    MALE                0.793    0.233      3.409   ! Males higher
 S ON
    MALE                0.084    0.038      2.203   ! Males steeper
 S WITH I               0.400    0.075      5.371
 Intercepts
    BMI97               0.000    0.000      0.000
    BMI98               0.000    0.000      0.000
    BMI99               0.000    0.000      0.000
    BMI00               0.000    0.000      0.000
    BMI01               0.000    0.000      0.000
    BMI02               0.000    0.000      0.000
    BMI03               0.000    0.000      0.000
    I                  20.385    0.168    121.416
    S                   0.625    0.027     22.816

! When we add one or more predictors of the intercept and slope, the intercept and slope means are no longer reported under a section called "Means" but appear under "Intercepts."
 Residual Variances
    BMI97               5.391    0.290     18.583
    BMI98               2.731    0.159     17.129
    BMI99               2.696    0.144     18.752
    BMI00               3.524    0.177     19.858
    BMI01               2.327    0.144     16.175
    BMI02               9.552    0.458     20.846
    BMI03               7.148    0.398     17.974
    I                  13.027    0.636     20.471
    S                   0.212    0.017     12.095
! Both the intercept and slope still have variance to explain.

We see that the intercept is 20.385 and the slope is .625. How is gender related to this? The full equation is:

Est. BMI = 20.385 + .625(Time) + .793(Male) + .084(Male)(Time)

For girls (Male = 0):

Est. BMI = 20.385 + .625(Time) + .793(0) + .084(0)(Time) = 20.385 + .625(Time)

For boys (Male = 1):

Est. BMI = 20.385 + .625(Time) + .793(1) + .084(1)(Time)
         = (20.385 + .793) + (.625 + .084)(Time)
         = 21.178 + .709(Time)

where Time is coded 0, 1, 2, 3, 4, 5, 6.

Using these results, we estimate the BMI for girls is initially 20.385. By the seventh year, when she is 18 (Time = 6), a girl's estimated BMI will be 20.385 + .625(6) = 24.135. For boys, the estimated BMI is initially 21.178; by the seventh year it will be 21.178 + .709(6) = 25.432. Since a BMI of 25 is considered overweight, by the age of 18 we estimate the average boy will be classified as overweight.

We could use the plots provided by Mplus, but if we wanted a nicer looking plot we could use another program. I used Stata to get this graph. The Stata command is (this is driven by a drop-down menu):

twoway (connected Girls Age, lcolor(black) lpattern(dash) ///
    lwidth(medthick)) (connected Boys Age, lcolor(black) ///
    lpattern(solid) lwidth(medthick)), ///
    ytitle(Body Mass Index) xtitle(Age of Adolescent) ///
    caption(NLSY97 Data)

and the data is

     +-----------------------+
     | Age    Girls    Boys  |
     |-----------------------|
  1. |  12   20.385  21.178  |
  2. |  18   24.135  25.432  |
     +-----------------------+

[Figure: Body Mass Index by Age of Adolescent, comparing girls with boys from age 12 to 18.]

Limitations of this approach

When we treat a categorical variable as a grouping variable and do a multiple group comparison, we can test the equality of all the parameters. When we treat it as a predictor, as in this example, we only test whether the intercept and slope differ between the two groups. We do not allow the other parameters to differ for boys and girls, and this might be a problem in some applications.

9 Growth Curves with Time Invariant Covariates

An extension of having a categorical predictor is having a series of covariates that explain variance in the intercept and slope. In this example we use what are known as time invariant covariates. These are covariates that either remain constant (gender) or are measured only at the start of the study. It is possible to add time varying covariates as well. This has been called conditional latent trajectory modeling (Curran & Hussong, 2003) because your initial level and trajectory (slope) are conditional on other variables. It is equivalent to the multilevel approach that calls the intercept and slope random effects. With programs such as HLM we use what they call a two-level approach. Here are the parallels, using a slide adapted from Muthén [not reproduced]. In SEM we represent this as follows.

Level 1 is defined as the measurement model with an intercept (level) and slope (trend/trajectory):

    y_it = η0i + η1i·x_t + ε_it                    (1)

Level 2, represented by equations 2a and 2b, treats the intercept and slope as random variables that are explained by a vector of covariates, w:

    η0i = α0 + γ0·w_i + ζ0i                        (2a)
    η1i = α1 + γ1·w_i + ζ1i                        (2b)

The y_it is the outcome; in our example it is the score on BMI for individual i at time t. The x_t is the time score.
In our example of BMI we use 0, 1, 2, 3, 4, 5, 6.

The η0i is the intercept for individual i.
a. The graph just below equation 1 shows three individuals who each have a different intercept.
b. Individual 1 has a higher starting value than individuals 2 or 3.
c. In the figure we show η0 because this represents the mean of the η0i.
d. The path from η0 to each y_t is fixed at 1 because it is a constant effect.

The η1i·x_t term is the slope for individual i times his or her score on time.
a. With our BMI example, we score time as 0, 1, 2, 3, 4, 5, 6.
b. In the figure we use η1 because this represents the mean of the η1i.

If we had a quadratic, we would add an η2i·x_t² term. For BMI the x_t² values would be 0, 1, 4, 9, 16, 25, 36.

The ε_it is the residual error on y for individual i at time t.
a. With BMI you can imagine many factors that could have a temporary influence on a person's BMI score on the day it was measured.
b. The figure shows e_t (t = 1, 2, etc.); the i is implicit.

An important distinction that some make between HLM and SEM programs is that SEM programs cannot let the timing of measurement vary between individuals. If the youth are measured each year, it is important that all of them are measured at the same time so they are all one year apart. This limitation only applied to early versions of Mplus. Mplus now allows each individual to have a different time between measurements. For example, Li might be measured at 12-month intervals, while Jones might be measured at intervals of 11 months, then 13 months, then 9 months, etc. We are not discussing these extensions at this point (see TSCORE in the User's Guide).

[Path diagram: a latent Emotional Problems factor, with Youth and Parent reports as indicators (errors e1 and e2), and White* both predict the Intercept, Linear, and Quadratic growth factors. The Intercept loads 1 on each of BMI97-BMI03, the Linear slope loads 0-6, and the Quadratic loads 0, 1, 4, 9, 16, 25, 36; each BMI measure has its own error term (e97-e03).]
*The variable White (whites = 1; nonwhites = 0) compares Whites to the combination of African Americans and Hispanics. Asian & Pacific Islander and Other youth have been deleted from this analysis because of small sample sizes.

In this figure we have two covariates. One is whether the adolescent is White versus African American or Hispanic, and the other is a latent variable reflecting the level of emotional problems a youth has. There are two indicators of emotional problems: a parent report, boyprb_p, and a youth report, boyprb_y.

A researcher may predict that Whites have a lower initial BMI (intercept) that persists during adolescence, but that the White advantage does not increase (same slope as nonwhites). Alternatively, a researcher may predict that being White predicts both a lower initial BMI (intercept) and less increase in BMI (smaller slope) during adolescence; this would suggest that minorities start with a disadvantage (higher BMI) and that this disadvantage gets even greater across adolescence. A researcher may also argue that emotional problems are associated with both a higher initial BMI (intercept) and a more rapid increase in BMI over time (slope).

By including a covariate that is itself a latent variable, emotional problems, we will show how these are handled by Mplus. We estimated this model for boys only; girls were excluded. The following is our Mplus program:

Title: bmi_timea.inp
  bmi growth curve using race/ethnicity and emotional problems as a
  second covariate. There are two indicators of emotional problems.
Data: File is "c:\Mplus examples\bmi_stata.dat" ;
Variable: Names are id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
    white black hispanic asian other;
  Missing are all (-9999) ;
  !
usevariables keeps the emotional problem indicators, white, and the bmi variables
  Usevariables are boyprb_y boyprb_p white
    bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 ;
  Useobservations = male eq 1 and asian ne 1 and other ne 1;
Model: i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
  emot_prb by boyprb_p boyprb_y ;
  i on white emot_prb;
  s on white emot_prb;
  q on white emot_prb;
Output: Sampstat Mod(3.84) standardized;
Plot: Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);

I have highlighted the new lines in the Mplus program. The format of the Useobservations subcommand is similar to the select or if statements used in other programs. The line Useobservations = male eq 1 and asian ne 1 and other ne 1; restricts our sample to males (male eq 1). This is very handy when using the same dataset for a variety of models where you want some models to include only selected participants. We have dropped Asians and members of the "other" category: there are relatively few of them in this sample dataset, they may have very different BMI trajectories, and the meaning of the category "other" is ambiguous.

I added a quadratic term in the Model: command. I first estimated this model using just a linear slope and the fit was not very good; adding the quadratic improved the fit.

This example has a measurement model for a latent covariate, emot_prb. In other programs this can involve complicated programming. Here it is done with the single line

emot_prb by boyprb_p boyprb_y ;

The by is a key word in Mplus for creating the latent variables used in confirmatory factor analysis and SEM. To the right of the by are two observed variables: boyprb_p is the parents' report of the adolescent's emotional problems, and boyprb_y is the youth's own report. It is desirable to have three or more indicators of a latent variable, but we only have two here, so that will have to do. To the left of the by is the name we give to the latent variable, emot_prb.
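What the by statement sets up can be mimicked with simulated data: two observed indicators that correlate only because they share a single latent factor. This Python sketch uses hypothetical loadings (1.0 for the reference indicator, 0.7 for the second), not the estimates from this model:

```python
import random

# Hypothetical two-indicator measurement model: each observed score is
# loading * latent + unique error. Loadings 1.0 and 0.7 are made up to
# mirror the reference-indicator convention described in the text.
random.seed(1)

n = 5000
latent = [random.gauss(0, 1) for _ in range(n)]            # emot_prb
boyprb_p = [1.0 * f + random.gauss(0, 1) for f in latent]  # reference indicator
boyprb_y = [0.7 * f + random.gauss(0, 1) for f in latent]

def corr(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Population correlation = (1.0 * 0.7) / (sqrt(2) * sqrt(1.49)) ≈ 0.41;
# the sample estimate should be close.
print(round(corr(boyprb_p, boyprb_y), 2))
```

With only two indicators the loadings are identified only because one is fixed at 1.0, which is exactly what Mplus does by default for the first variable after the by.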
This new latent variable does not appear in the list of variables we are using; it is defined here. The by statement:
- fixes the loading of the first variable to its right, boyprb_p, at 1.0, making it the reference indicator;
- lets the loading of the second variable, boyprb_y, be estimated freely;
- creates the error/residual variances that are labeled e1 and e2 in the figure. The default is that these errors are uncorrelated.
It is good practice to have the strongest indicator be the reference indicator with its loading fixed at 1.0. You can run the model, and if this does not happen, re-run it after reversing the order of the items to the right of the by.

The next three new lines,

i on white emot_prb;
s on white emot_prb;
q on white emot_prb;

define the relationships between the covariates and the growth factors. These are the γ·w_i terms in the equations presented earlier. Mplus uses the on command to signify that a variable depends on another variable in the structural part of the model. The by command is the key to understanding how Mplus sets up the measurement model, and the on command is the key to how Mplus sets up the structural model.

There are many defaults. Mplus assumes there are residual variances and covariances for the intercept and slopes, fixes the observed variables' intercepts at zero, and assumes the intercept and slope residuals are correlated.

Here are selected results:

Estimator                                     ML

TESTS OF MODEL FIT
Chi-Square Test of Model Fit
Value                                     64.201
Degrees of Freedom                            34
P-Value                                   0.0013

CFI/TLI
CFI                                        0.993
TLI                                        0.990

Information Criteria
Number of Free Parameters                     29
Akaike (AIC)                           20924.710
Bayesian (BIC)                         21046.407
Sample-Size Adjusted BIC               20954.362
  (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
Estimate                                   0.043
90 Percent C.I.
                                   0.026   0.058
Probability RMSEA <= .05                   0.767

SRMR (Standardized Root Mean Square Residual)
Value                                      0.026

MODEL RESULTS
                 Estimates     S.E.  Est./S.E.     Std   StdYX
 EMOT_PRB BY
    BOYPRB_P         1.000    0.000      0.000   1.057   0.663
    BOYPRB_Y         0.709    0.284      2.492   0.749   0.527
 I ON
    EMOT_PRB         0.245    0.249      0.984   0.071   0.071
    WHITE           -1.050    0.380     -2.767  -0.288  -0.142
 S ON
    EMOT_PRB         0.257    0.130      1.988   0.230   0.230
    WHITE           -0.023    0.172     -0.136  -0.020  -0.010
 Q ON
    EMOT_PRB        -0.045    0.021     -2.118  -0.277  -0.277
    WHITE           -0.003    0.028     -0.107  -0.017  -0.008
 S WITH I            0.717    0.384      1.869   0.166   0.166
 Q WITH I           -0.099    0.060     -1.654  -0.157  -0.157
 Q WITH S           -0.174    0.038     -4.592  -0.848  -0.848
 WHITE WITH
    EMOT_PRB        -0.065    0.033     -1.975  -0.061  -0.125
 Intercepts
    BOYPRB_Y         1.986    0.064     31.010   1.986   1.396
    BOYPRB_P         1.676    0.072     23.382   1.676   1.052
    BMI97-BMI03      0.000    0.000      0.000   0.000   0.000
    I               21.350    0.291     73.368   5.858   5.858
    S                1.272    0.132      9.651   1.073   1.073
    Q               -0.097    0.021     -4.584  -0.560  -0.560
 Variances
    EMOT_PRB         1.117    0.467      2.395   1.000   1.000
 Residual Variances
    BOYPRB_Y         1.461    0.243      6.013   1.461   0.723
    BOYPRB_P         1.424    0.456      3.122   1.424   0.560
    BMI97            5.238    0.578      9.060   5.238   0.283
    BMI98            3.446    0.287     12.017   3.446   0.180
    BMI99            3.269    0.259     12.637   3.269   0.149
    BMI00            2.119    0.196     10.805   2.119   0.091
    BMI01            1.998    0.193     10.365   1.998   0.082
    BMI02            4.356    0.366     11.908   4.356   0.160
    BMI03            9.906    0.914     10.833   9.906   0.297
    I               12.916    1.091     11.834   0.972   0.972
    S                1.330    0.246      5.417   0.947   0.947
    Q                0.028    0.006      4.406   0.924   0.924

R-SQUARE
Observed Variable   R-Square
 BOYPRB_Y              0.277
 BOYPRB_P              0.440
 BMI97                 0.717
 BMI98                 0.820
 BMI99                 0.851
 BMI00                 0.909
 BMI01                 0.918
 BMI02                 0.840
 BMI03                 0.703

Latent Variable     R-Square
 I                     0.028
 S                     0.053
 Q                     0.076
!
We are not explaining much variance in any of these.

MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index: 3.840

                          M.I.   E.P.C.  Std E.P.C.  StdYX E.P.C.
BY Statements
 I  BY BMI02             4.422   -0.012      -0.043        -0.008
 I  BY BMI03            10.800    0.034       0.122         0.021
 S  BY BMI02             7.048   -0.205      -0.243        -0.047
 S  BY BMI03             7.693    0.393       0.466         0.081
 Q  BY BMI02             7.599    2.240       0.388         0.075
 Q  BY BMI03             9.622   -4.758      -0.825        -0.143

WITH Statements          ! Might consider correlating adjacent errors.
 BMI98 WITH BMI97        4.091   -1.119      -1.119        -0.059
 BMI99 WITH BMI98        6.766    0.552       0.552         0.027
 BMI01 WITH BOYPRB_P     4.544    0.252       0.252         0.032
 BMI01 WITH BMI99        8.391   -0.506      -0.506        -0.022
 BMI01 WITH BMI00        5.132    0.435       0.435         0.018
 BMI02 WITH BOYPRB_P     4.868   -0.370      -0.370        -0.045
 BMI02 WITH BMI00       10.058   -0.648      -0.648        -0.026
 BMI02 WITH BMI01       12.449    0.803       0.803         0.031
 BMI03 WITH BMI02        4.559   -1.356      -1.356        -0.045

Means/Intercepts/Thresholds
 [ BMI03 ]              10.211    0.685       0.685         0.119

Unfortunately, we cannot get graphs when we have covariates. You could create them yourself by substituting fixed values for race and emotional problems.

10 Mediational Models with Time Invariant Covariates

Sometimes all of the covariates are time invariant, or at least measured just at the start of the study. Curran and Hussong (2003) discuss a study of a latent growth curve on drinking problems with a covariate of parental drinking. Parental drinking influences both the initial level and the rate of growth of problem drinking behavior among adolescents. The question is whether other variables, such as parental monitoring and peer influence, might mediate this relationship.

[Path diagram: Parent Drinking influences the Intercept and Slope of Problem Drinking both directly and indirectly through two mediators, Parental Monitoring and Peer Influence.]

Mplus allows us to estimate the direct and indirect effects of Parent Drinking on the Intercept and Slope. It also provides a test of significance for these effects.
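An indirect effect like these is the product of the two constituent paths, and its significance test uses a delta-method (Sobel) standard error. A sketch with hypothetical coefficients (not estimates from any model in this handout):

```python
import math

# Indirect effect of Parent Drinking on the Slope through a mediator
# such as Parental Monitoring: a = path to the mediator, b = path from
# the mediator to the slope. All values below are hypothetical.
a, se_a = 0.40, 0.10
b, se_b = 0.25, 0.08

indirect = a * b
# Delta-method (Sobel) standard error of the product a*b:
se_ab = math.sqrt(b**2 * se_a**2 + a**2 * se_b**2)
z = indirect / se_ab

print(round(indirect, 3), round(z, 2))  # 0.1 2.46
```

In practice you would let Mplus report these (and bootstrap confidence intervals, which behave better than the Sobel test for products of coefficients), but the arithmetic above is what the point estimate and z-test amount to.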
11 Time Varying Covariates

We have illustrated time invariant covariates that are measured at time 1. It is possible to extend this to include time varying covariates. Time varying covariates either are measured after the process has started or have values that change (e.g., hours of nutrition education). Although we will not show our output, we will illustrate the use of time varying covariates in a figure [not reproduced]. In this figure the time varying covariates, a21 to a24, might be:
- Hours of nutrition education completed between waves. Independent of the overall growth trajectory, η1, students who have several hours of nutrition education programming may have a decrease in their BMI.
- A physical education curriculum. A physical activity program might lead to reduced BMI: students who spend more time in the program might have a lower BMI independent of the overall growth trend, η1.
This would be a good way to incorporate fidelity into a program evaluation.

The figure is borrowed from Muthén, where he examines growth in math performance over 4 years. The w vector contains x covariates that directly influence the intercept, η0, or slope, η1. The a_ij are the number of math courses taken each year.

y_it  = repeated measures on the outcome (math achievement)
a1_it = time score (0, 1, 2, 3), as discussed previously
a2_it = time varying covariate (number of math courses taken that year)
w     = vector of time invariant x covariates measured at or before the first y_it

In this example we might instead think of the y_i variables as measures of conflict behavior, where y_1 is at age 17 and y_4 is at age 25. We know there is a general decline in conflict behavior during this time interval, so the slope η1 is expected to be negative. Now suppose we also have a measure of alcohol abuse for each of the 4 waves (the a_ij).
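The way such a wave-specific covariate enters the predicted score can be sketched with a small computation: the trajectory part declines steadily, while the covariate adds a wave-specific bump. All numbers here are hypothetical, for illustration only:

```python
# Hypothetical growth curve with a time varying covariate:
# y_t = intercept + slope * t + kappa * a_t, where a_t is the
# wave-specific alcohol abuse score. All values are made up.
intercept, slope, kappa = 8.0, -0.75, 0.5   # general decline plus TVC effect
alcohol = [0, 4, 0, 1]                      # a_t for waves t = 0..3

predicted = [intercept + slope * t + kappa * a for t, a in enumerate(alcohol)]
print(predicted)  # [8.0, 9.25, 6.5, 6.25] -- wave 1 bumps above the declining trend
```

The overall trend is still downward; the covariate explains why particular waves sit above or below it.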
We might hypothesize that during a year in which an adolescent has a high score on alcohol abuse (say, the number of days the person drank 5 or more drinks in the last 30 days) there will be an elevated level of conflict behavior that cannot be explained by the general decline (negative slope). The negative slope reflects the general decline in conflict behavior as young adults move from age 17 to age 25. The effect of the a_ij on y_i provides the additional explanation that in years when there is a lot of drinking, there will be an elevated level of conflict that does not fit the general decline.

If you want more, here are a few references

2. Basic growth curve modeling
a. Bollen, K. A., & Curran, P. J. (2006). Latent curve models: A structural equation perspective. Hoboken, NJ: Wiley.
b. Curran, P. J., & Hussong, A. M. (2003). The use of latent trajectory models in psychopathology research. Journal of Abnormal Psychology, 112, 526-544. This is a general introduction to growth curves that is accessible.
c. Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An introduction to latent variable growth curve modeling: Concepts, issues, and applications (2nd ed.). Mahwah, NJ: Lawrence Erlbaum. The second edition of a classic text on growth curve modeling.
d. Kaplan, D. (2000). Chapter 8: Latent growth curve modeling. In D. Kaplan, Structural equation modeling: Foundations and extensions (pp. 149-170). Thousand Oaks, CA: Sage. This is a short overview.

3. Limited outcome variables: Binary and count variables
a. Muthén, B. (1996). Growth modeling with binary responses. In A. V. Eye & C. Clogg (Eds.), Categorical variables in developmental research: Methods of analysis (pp. 37-54). San Diego, CA: Academic Press.
b. Long, J. S., & Freese, J. (2006). Regression models for categorical dependent variables using Stata (2nd ed.). Stata Press (www.stata-press.com).
This provides the most accessible and still rigorous treatment of how to use and interpret limited dependent variables.
c. Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and longitudinal modeling using Stata. Stata Press (www.stata-press.com). This discusses a free set of commands that can be added to Stata that will do most of what Mplus can do and some things Mplus cannot do. It is hard to use and very slow.

4. Growth mixture modeling
a. Muthén, B., & Muthén, L. K. (2000). Integrating person-centered and variable-centered analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical and Experimental Research, 24, 882-891. This is an excellent and accessible conceptual introduction.
b. Muthén, B. (2001). Latent variable mixture modeling. In G. Marcoulides & R. Schumacker (Eds.), New developments and techniques in structural equation modeling (pp. 1-34). Mahwah, NJ: Lawrence Erlbaum.
c. Muthén, B., Brown, C. H., Jo, B., Khoo, S., Yang, C., Wang, C., Kellam, S., Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive interventions. Biostatistics, 3, 459-475.
d. Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of quantitative methodology for the social sciences (pp. 345-368). Newbury Park, CA: Sage.

5. The web page for Mplus maintains a current set of references, many as PDF files. These are organized by topic, and some include data and the Mplus program.