GROWTH CURVES AND EXTENSIONS USING MPLUS
Alan C. Acock
alan.acock@oregonstate.edu
Department of HDFS
322 Milam Hall
Oregon State University
Corvallis, OR 97331
This document and selected references, data, and programs can be downloaded from
http://oregonstate.edu/~acock/growth
GROWTH CURVES AND EXTENSIONS USING MPLUS
Outline

1  Brief summary of topics
2  A growth curve
3  Quadratic terms in growth curves
4  An alternative—developmental time
5  Working with missing values
6  Multiple cohort growth model with missing waves
7  Multiple group models with growth curves
8  Alternative to multiple group analysis
9  Growth curves with time invariant covariates
10 Mediational models with time invariant covariates
11 Time varying covariates
12 References
Goal of the Workshop
The goal of this workshop is to explore a variety of applications of latent growth curve models
using the Mplus program. Because we will cover a wide variety of applications and extensions of
growth curve modeling, we will not cover each of them in great detail. At the end of this
workshop it is hoped that participants will be able to run Mplus programs to execute a variety of
growth curve modeling applications and to interpret the results correctly.
Assumed Background
Participants should be familiar with the content in Introduction to Mplus that is located at
www.oregonstate.edu/~acock/growth . It will be assumed that participants in the workshop have
some background in Structural Equation Modeling. Background in multilevel analysis will also be
useful, but is not assumed. It is possible to learn how to estimate the specific models we will cover
without a comprehensive knowledge of Mplus, but some background using an SEM program is
useful.
1 Brief Summary of Topics
Introduction to Growth Curve Modeling
Growth Curves are a new way of thinking that is ideal for longitudinal studies. Instead of
predicting a person’s score on a variable (e.g., mean comparison among scores at different time
points or relationships among variables at different time points), we predict their growth
trajectory—what is their level on the variable AND how is this changing. We will present a
conceptual model, show how to apply the Mplus program, and interpret the results. Once we can
estimate growth trajectories, we can turn to the more interesting issue of explaining individual
differences in trajectories (why some people go up, down, or stay the same). More advanced topics
we will introduce include:
1. Growth Curves with Limited Outcome Variables
Sometimes a researcher is interested in growth on a binary variable (Ever drinking alcohol
for adolescents). Sometimes a researcher is interested in a count variable that involves a
relatively rare event (Number of days an adolescent has 5+ drinks of alcohol in the last 30
days). Sometimes we are interested in both types of variables. Different variables may
predict the binary variable than predict the count variable. We will show how to do this
using Mplus and interpret the results.
2. Growth Mixture Models
It is possible to use Mplus to do an exploratory growth curve analysis where our focus is on
the person and not the variable. We can locate clusters of people who share similar growth
trajectories. This is exploratory research and the standards for it are still evolving. An
example would be a study of alcohol consumption from age 15 to 30. It is possible to
empirically identify different clusters of people. One cluster may never drink or never drink
very much. A second cluster may have increasing alcohol consumption up to about 22 or 23
and then a gradual decline. A third cluster may be very similar to the second cluster but not
decline after 23. After deriving these clusters of people who share growth trajectories, it is
possible to compare them to find what differentiates membership in the different clusters.
We will show how to do these analyses using Mplus and interpret the results.
2 A Growth Curve
Estimating a basic growth curve using Mplus is quite easy. When developing a complex model it
is best to start easy and gradually build complexity.
• Starting easy should include data screening to evaluate the distributions of the variables,
patterns of missing values, and possible outliers.
• Even if you have a theoretically specified model that is complex, always start with the
simplest model and gradually add the complexity.
• Here we will show how structural equation modeling conceptualizes a latent growth curve,
show the Mplus program, explain the new program features, and interpret the output.
Before showing a figure to represent a growth curve, we examine a small sample of our
observations:
• A BMI value of 25 is considered overweight and a BMI of 30 is considered obese (I’m
aware of problems with the BMI as a measure of obesity and with its use for adolescents).
• With just 10 observations it is hard to see much of a trend, but it looks like people are
getting a bigger BMI score as they get older.
• The X-axis value of 0 is when the adolescent was 12 years old; the 1 is when the adolescent
was 13 years old, etc. We are using seven waves of data (labeled 0 to 6) from the panel
study.
A growth curve requires us to have a model and we should draw this before writing the Mplus
program. Figure 1 shows a model for our simple growth curve:
[Figure 1. Linear growth curve model for BMI: the latent Intercept (residual variance RI) has
loadings fixed at 1 on BMI97-BMI03; the latent Slope (residual variance RS) has loadings fixed at
0, 1, 2, 3, 4, 5, and 6 on BMI97-BMI03; each observed BMI variable has its own error term,
e97-e03.]
This figure is much simpler than it first appears.
• The key variables are the two latent variables labeled the Intercept and the Slope.
• The Intercept
a. The intercept represents the initial level and is sometimes called the initial level for this
reason. It is the estimated initial level and its value may differ from the actual mean for
BMI97 because in this case we have a linear growth model.
b. It may differ from the mean of BMI97 when covariates are added, especially when a zero
value on covariates is rare.
c. Unless the covariates are centered, it usually makes sense to just call it an intercept rather
than the initial level.
d. The intercept is identified by the constant loadings of 1.0 going to each BMI score. Some
programs call the intercept the constant, representing the constant effect.
• The slope
a. Is identified by fixing the values of the paths to each BMI variable. In a publication you
normally would not show the path to BMI97, since this is fixed at 0.0.
b. We fix the other paths at 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0. Where did we get these values? The
first year is the base year or year zero. The BMI was measured each subsequent year, so
these are scored 1.0 through 6.0.
c. Other values are possible. Suppose the survey was not done in 2000 or 2001 so that we had
5 time points rather than 7. We would use paths of 0.0, 1.0, 2.0, 5.0, and 6.0 for years 1997,
1998, 1999, 2002, and 2003, respectively (see the sketch following this list).
d. It is also possible to fix the first couple years and then allow the subsequent waves to be
free.
- This might make sense for a developmental process where the yearly intervals may not
reflect the developmental rate. Developmental time may be quite different than
chronological time.
- This has the effect of “stretching” or “shrinking” time to the pattern of the data (Curran
& Hussong, 2003).
- An advantage of this approach is that it uses fewer degrees of freedom than adding a
quadratic slope.
• Residual Variance and Random Effects
a. The individual variation around the Intercept and Slope is represented in Figure 1 by RI
and RS. These are the variances of the intercept and slope around their respective means.
b. We expect substantial variance in both of these, as some individuals have a higher or lower
starting BMI and some individuals will increase (or decrease) their BMI at a different rate
than the average growth rate.
c. In addition to the mean intercept and slope, each individual will have their own intercept
and slope. We say the intercept and the slope are random effects since they may vary
across individuals.
d. They are random in the sense that each individual may have a steeper or flatter slope than
the mean slope and
e. Each individual may have a higher or lower initial level than the mean intercept.
f. In our sample of 10 individuals shown above, notice one adolescent starts with a BMI
around 12 and three adolescents start with a BMI around 30. Some children have a BMI
that increases and others do not.
g. The variances, RI and RS, are critical if we are going to explore more complex models with
covariates (e.g., gender, psychological problems, race) that might explain why some
individuals have a steeper or less steep growth rate than the average.
• The ei terms represent individual error terms for each year. Some years may move above or
below the growth trajectory described by our Intercept and Slope.
• Sometimes it might be important to allow error terms to be correlated, especially subsequent
pairs such as e97-e98, e98-e99, etc.
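The following is a minimal sketch of the alternative time coding mentioned in item c above. It
assumes the survey had only these five waves; the variable names match the program shown next,
and this line would replace the model statement in that program:

Model:
! five waves, with 2000 and 2001 not surveyed: time scores 0, 1, 2, 5, 6
i s | bmi97@0 bmi98@1 bmi99@2 bmi02@5 bmi03@6;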
Here is the Mplus program:
Title:
bmi_growth.inp
Basic growth curve
Data:
File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p
male race_eth bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03 white black hispanic
asian other;
Missing are all (-9999) ;
! usevariables is limited to bmi variables
Usevariables are bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03 ;
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3
bmi01@4 bmi02@5 bmi03@6;
Output:
Sampstat Mod(3.84);
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01
bmi02 bmi03(*);
What is new compared to an SEM program?
• Usevariables subcommand: we include only the bmi variables since we are doing a
growth curve for these variables.
• We drop the Analysis: section because we are doing a basic growth curve and can use the
default options.
• We have a Model: section because we need to describe the model. Mplus was designed after
growth curves were well understood. There is a single line to describe our model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
a. In this line the “i” and “s” stand for the intercept and slope. We could have called these
anything, such as intercept and slope or initial and trend. The vertical line, | ,
tells Mplus that it is about to define an intercept and slope.
b. Defaults
- The intercept is defined by a constant loading of 1.0 for each bmi variable; each
Intercept-to-bmi path is fixed at 1.0.
- The slope is defined by fixing the path from the slope to bmi97 at 0, the path to bmi98
at 1, etc. The @ sign is used for “at.” Don’t forget the semi-colon to end the command.
- Mplus assumes that there is a residual variance for both the intercept and slope (RI and
RS) and that these covary. Therefore, we do not need to mention this.
- Mplus assumes there is an uncorrelated random error, ei, for each observed variable.
c. To allow e97 and e98 to be correlated we would need to add a line saying bmi97 with
bmi98; .
- This may seem strange because we are not really correlating bmi97 with bmi98, but
e97 with e98. Mplus knows this, and we do not need to generate a separate set of names for
the error terms.
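As a minimal sketch (not part of the program below), correlated errors for adjacent waves could be
requested by adding WITH statements to the Model: section; whether to do so should be guided by
theory and by the modification indices discussed later:

Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3
bmi01@4 bmi02@5 bmi03@6;
! correlate the errors of adjacent waves
bmi97 with bmi98;
bmi98 with bmi99;
bmi99 with bmi00;
bmi00 with bmi01;
bmi01 with bmi02;
bmi02 with bmi03;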
The last additional section in our Mplus program is for selecting what output we want Mplus to
provide. There are many optional outputs of the program and we will only illustrate a few of these.
The Output: section has the following lines
Output:
Sampstat Mod(3.84);
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03(*);
• The first line, Sampstat Mod(3.84), asks for sample statistics and modification indices for
parameters we might free, as long as doing so would reduce chi-square by 3.84 (corresponding
to the .05 level). We do not bother with parameter estimates that would have less effect than
this.
• Next comes the Plot: subcommand and we say that we want Type is Plot3; for our
output. This gives us the descriptive statistics and graphs for the growth curve.
• The last line of the program specifies the series to plot. By entering the variables with an (*)
at the end we are setting a path at 0.0 for bmi97, 1.0 for bmi98, etc.
Annotated Selected Growth Curve Output
The following is selected output with comments:
Number of observations                       1102  ! listwise; an alternative is FIML estimation
Number of dependent variables                   7  ! these are the bmi scores
Number of independent variables                 0
Number of continuous latent variables           2  ! these are the intercept and slope

Continuous latent variables
   I        S                                      ! these are the only latent variables

Estimator                                      ML
TESTS OF MODEL FIT
!These have the standard interpretations.
• It is okay if the fit is not perfect here because when we add the covariates we
may get a better fit. The chi-square is significant, as it usually is for a large
sample, because any model is not likely to be a perfect fit for the data.
• However, the CFI = .977 and TLI = .979 are both in the very good range (i.e., over
.96 is very good).
• The RMSEA is .088 and this is not very good. Ideally, this should be below .06,
and a value that is not below .08 is considered problematic.
• The SRMR = .048 is acceptable (less than .05).

Chi-Square Test of Model Fit
  Value                             220.570
  Degrees of Freedom                     23
  P-Value                            0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                            8568.499
  Degrees of Freedom                     21
  P-Value                            0.0000

CFI/TLI
  CFI                                 0.977
  TLI                                 0.979

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                            0.088
  90 Percent C.I.                     0.078  0.099
  Probability RMSEA <= .05            0.000

SRMR (Standardized Root Mean Square Residual)
  Value                               0.048
MODEL RESULTS
                  Estimates      S.E.  Est./S.E.
! the loadings for I and S are all fixed, so there are no tests for them.
 I        |
    BMI97             1.000     0.000      0.000
    BMI98             1.000     0.000      0.000
    BMI99             1.000     0.000      0.000
    BMI00             1.000     0.000      0.000
    BMI01             1.000     0.000      0.000
    BMI02             1.000     0.000      0.000
    BMI03             1.000     0.000      0.000
 S        |
    BMI97             0.000     0.000      0.000
    BMI98             1.000     0.000      0.000
    BMI99             2.000     0.000      0.000
    BMI00             3.000     0.000      0.000
    BMI01             4.000     0.000      0.000
    BMI02             5.000     0.000      0.000
    BMI03             6.000     0.000      0.000
! The slope and intercept are correlated; the covariance is
! .416, z = 5.551, p < .001 (WITH means covariance in Mplus)
 S        WITH
    I                 0.416     0.075      5.551
 Means
    I                20.798     0.117    178.026
    S                 0.668     0.019     35.183
!Initial level, intercept = 20.798 (BMI starts at 20.798), z = 178.026; p < .001
!Slope = .668 (BMI goes up .668 each year), z = 35.183; p < .001

 Intercepts
    BMI97             0.000     0.000      0.000
    BMI98             0.000     0.000      0.000
    BMI99             0.000     0.000      0.000
    BMI00             0.000     0.000      0.000
    BMI01             0.000     0.000      0.000
    BMI02             0.000     0.000      0.000
    BMI03             0.000     0.000      0.000

! The variances, Ri and Rs in the figure, are both significant. This is what covariates
! will try to explain—why do some youth start higher/lower and have a different
! trend, i.e., slope, for the BMI?
 Variances
    I                13.184     0.643     20.504
    S                 0.213     0.018     12.147

! Following are the residual variances for the observed variables; hence they are
! the errors, ei's, in our figure.
 Residual Variances
    BMI97             5.391     0.290     18.583
    BMI98             2.729     0.159     17.124
    BMI99             2.697     0.144     18.752
    BMI00             3.529     0.178     19.860
    BMI01             2.334     0.144     16.187
    BMI02             9.533     0.457     20.837
    BMI03             7.134     0.397     17.956
MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index     3.840

                       M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.
! Many of these changes make no sense. We could let the path of the slope to
! BMI03 be free and chi-square would drop by about 45 points.
BY Statements
  I  BY BMI97        87.808     -0.038      -0.139        -0.032
  I  BY BMI99        25.404      0.013       0.049         0.011
  I  BY BMI00        21.840      0.014       0.050         0.011
  I  BY BMI03        29.103     -0.026      -0.093        -0.016
  S  BY BMI97        55.850     -0.870      -0.402        -0.093
  S  BY BMI99        17.773      0.315       0.145         0.034
  S  BY BMI00        18.572      0.352       0.162         0.035
  S  BY BMI03        44.611     -0.915      -0.423        -0.074
! When Mplus has a value it can't compute it prints 999.000. Normally ignore these.
ON/BY Statements
  S  ON I  /
  I  BY S           999.000      0.000       0.000         0.000
! These “with” statements are for correlated errors. Some make sense, some don't.
WITH Statements
  BMI99 WITH BMI97    4.993     -0.349      -0.349        -0.019
  BMI99 WITH BMI98    8.669      0.362       0.362         0.020
  BMI00 WITH BMI97    3.912     -0.322      -0.322        -0.016
  BMI00 WITH BMI99   17.357      0.503       0.503         0.026
  BMI01 WITH BMI97    8.255     -0.421      -0.421        -0.021
  BMI01 WITH BMI98    7.032     -0.300      -0.300        -0.015
  BMI01 WITH BMI00   12.398      0.447       0.447         0.021
  BMI02 WITH BMI97    4.707      0.560       0.560         0.023
  BMI02 WITH BMI99    5.455     -0.431      -0.431        -0.018
  BMI02 WITH BMI00    9.829     -0.649      -0.649        -0.025
  BMI02 WITH BMI01    4.305      0.413       0.413         0.015
  BMI03 WITH BMI97   36.224      1.488       1.488         0.060
  BMI03 WITH BMI99    9.296     -0.525      -0.525        -0.021
  BMI03 WITH BMI00    8.824     -0.583      -0.583        -0.022
  BMI03 WITH BMI02    8.242      0.931       0.931         0.029
! We do not pay much attention to these intercepts because Mplus automatically
! fixes them at zero. Before freeing these, it would make more sense to free some of
! the coefficients for the slope, e.g., 0, 1, *, *, *, *, *, or to try a quadratic slope as
! discussed in a later section.
Means/Intercepts/Thresholds
  [ BMI97 ]          79.520     -0.770      -0.770        -0.179
  [ BMI99 ]          19.737      0.250       0.250         0.058
  [ BMI00 ]          17.444      0.257       0.257         0.056
  [ BMI03 ]          23.066     -0.483      -0.483        -0.084
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated
values)
Scatterplots (sample values, estimated factor scores, estimated
values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
Here are Some of the Available Plots
It is often useful to show the actual means for a small random sample of participants. These are
Sample Means.
• Click on Graphs
• Observed Individual Values
This gives you a menu where you can make some selections. I used the clock to seed a random
generation of observations.
Here I selected Random Order and for 20 cases. This results in the following graph:
This shows one person who started at an obese BMI = 30 and then dropped down. However, most
people increased gradually.
Next, let’s look at a plot of the actual means and the estimated means using our linear growth
model. Click on
• Graphs and then select
• Sample and estimated means.
You can improve this graph. You might click on the legend and move it so it is not over the trend
lines. You can right click inside the graph and add labels for the X axis and Y axis. You can
change the labels, and you can adjust the range for each axis.
Notice that there is a clear growth trend in BMI. A BMI of 15-20 is considered healthy and a BMI
of 25 is considered overweight. Notice what happens to American youth between the age of 12
and the age of 18.
3 A Growth Curve with a Quadratic Term
This graph is useful for seeing if there is a nonlinear trend. It is simple to add a quadratic term if
the curve is departing from linearity. Looking at the graph it may seem that the linear trend works
very well, but our RMSEA was a bit big and the estimated initial BMI is higher than the observed
mean. A quadratic might pick this up by having a curve that drops slightly to pick up the BMI97
mean. Estimation requires at least 4 waves of data, but more waves are highly desirable for a good
test of the quadratic term.
The conceptual model in Figure 1 will be unchanged except that a third latent variable is added.
• We will have the Intercept, the Slope (now called the Linear trend), and the new latent
variable called the Quadratic trend.
• Like the first two, the Quadratic trend will have a residual variance (RQ) that is free to
correlate with RI and RL.
• The paths from the Quadratic trend to the individual BMI variables will be the square of
the paths from the Linear trend to the BMI variables. Hence
a. The values for the linear trend will remain 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0.
b. For the quadratic these values will be 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, and 36.0.
[Figure. Quadratic growth curve model for BMI: the Intercept (residual variance RI) has loadings
of 1 on BMI97-BMI03, the Linear trend (RL) has loadings 0, 1, 2, 3, 4, 5, 6, and the Quadratic
trend (RQ) has loadings 0, 1, 4, 9, 16, 25, 36; each observed BMI variable has its own error term,
e97-e03.]
You really appreciate the defaults in Mplus when you see what we need to change in the Mplus
program when we add a quadratic slope. Here is the only change we need to make:
Model:
i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Mplus will know that the quadratic, q (we could use any name) will have values that are the
square of the values for the slope, s.
Title:
bmi_quadratic.inp
Quadratic growth curve
Data:
File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p
male race_eth bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03 white black hispanic
asian other;
Missing are all (-9999) ;
! usevariables is limited to bmi variables
Usevariables are bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03 ;
Model:
i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3
bmi01@4 bmi02@5 bmi03@6;
Output:
Sampstat Mod(3.84);
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01
bmi02 bmi03(*);
Here is selected output:
TESTS OF MODEL FIT
! We have lost 4 degrees of freedom because we now estimate the:
•  mean of the quadratic slope,
•  variance of the quadratic slope, Rq,
•  covariance of Rq with Ri, and
•  covariance of Rq with Rs.
! The fit is excellent.
Chi-Square Test of Model Fit
  Value                              61.791   !Was 220.570
  Degrees of Freedom                     19   !Was 23
  P-Value                            0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                            8568.499
  Degrees of Freedom                     21
  P-Value                            0.0000

CFI/TLI
  CFI                                 0.995   !Was .977
  TLI                                 0.994   !Was .979

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                            0.045   !Was .088
  90 Percent C.I.                     0.033  0.058
  Probability RMSEA <= .05            0.715

SRMR (Standardized Root Mean Square Residual)
  Value                               0.022
MODEL RESULTS
! The loadings for I and S are the same as before. The paths for Q are simply the squared
! values.
 Q        |
    BMI97             0.000     0.000      0.000
    BMI98             1.000     0.000      0.000
    BMI99             4.000     0.000      0.000
    BMI00             9.000     0.000      0.000
    BMI01            16.000     0.000      0.000
    BMI02            25.000     0.000      0.000
    BMI03            36.000     0.000      0.000
 S        WITH
    I                 0.575     0.220      2.616
 Q        WITH
    I                -0.038     0.034     -1.116
    S                -0.130     0.021     -6.324
! The negative slope, -.064, for the quadratic suggests a leveling off of the growth
! curve.
 Means
    I                20.439     0.118    173.266
    S                 1.045     0.049     21.108
    Q                -0.064     0.008     -8.183
 Variances
    I                12.381     0.671     18.462
    S                 0.984     0.134      7.357
    Q                 0.023     0.004      6.412
 Residual Variances
    BMI97             4.318     0.316     13.660
    BMI98             2.789     0.158     17.613
    BMI99             2.442     0.141     17.357
    BMI00             3.187     0.173     18.418
    BMI01             2.354     0.147     16.022
    BMI02             9.521     0.454     20.948
    BMI03             4.989     0.491     10.157
• The fit is so good because the estimated means and observed means are so close.
• However, there is still significant variance among individual adolescents that needs to be
explained.
• Here are 20 estimated individual growth curves.
a. Notice that each of these is a curve, but they start at different initial levels and have
different trajectories.
b. Next, we want to use covariates to explain these differences in the initial levels and growth
trajectories.
4 An Alternative to Use of a Quadratic Slope
An alternative to adding a quadratic slope is to allow some of the time loadings to be
free.
• We have used loadings of 0, 1, 2, 3, 4, 5, and 6 for the linear slope and 0, 1, 4,
9, 16, 25, and 36 for the quadratic slope. Alternatively,
• We could allow all but two of the loadings to be free. We might use loadings of
0, 1, *, *, *, *, *.
• It is necessary to have the 0 and 1 fixed, but the 1 does not have to be second;
we could use 0, *, *, 1.
You may ask how you could justify allowing some of the time loadings to be free if
there was a one month or one year difference between waves of data. The answer is
that developmental time may be different than chronological time.
Allowing these loadings to be free has an advantage over the quadratic in that it uses
fewer degrees of freedom but still allows for growth spurts.
This model is not nested under a quadratic, but you could think of a linear growth
model with fixed values for each year (0, 1, 2, 3, 4, 5, 6) being nested within the free
model that uses 0, 1, *, *, *, *. If the free model fits much better than the fixed linear
model, you might use this instead of the quadratic model.
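A minimal sketch of how this free-loading model could be specified, assuming the same bmi
variables as before (only the 0 and 1 are fixed; the values after the asterisks are just starting
values for the freely estimated time scores):

Model:
i s | bmi97@0 bmi98@1 bmi99*2 bmi00*3
bmi01*4 bmi02*5 bmi03*6;

The estimated time scores then appear in the S | section of the output in place of the fixed values
2 through 6 and can be read as stretching or shrinking the yearly intervals.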
[Figure. Growth model with free time scores: the Intercept (residual variance RI) has loadings of 1
on BMI97-BMI03; the Slope (residual variance RS) has loadings fixed at 0 and 1 for the first two
waves and free (*) for the remaining waves; each observed BMI variable has its own error term,
e97-e03.]
5 Working with Missing Values
Mplus has two ways of working with missing values. The simplest is to use full information
maximum likelihood estimation with missing values (FIML). This uses all available data. For
example, some adolescents were interviewed in all seven years but others may have skipped one, two,
or even more years. We use all available information with this approach. The second approach is
to utilize multiple imputation.
• Multiple imputation should not be confused with the single imputation available from SPSS's
missing values module, which gives incorrect standard errors.
• Multiple imputation involves
a. Imputing multiple datasets (usually 5-10) using appropriate procedures,
b. Estimating the model for each of these datasets, and
c. Then pooling the estimates and standard errors.
When the standard errors are pooled this way, they incorporate the variability across the 5-10
solutions and thereby produce unbiased estimates of the standard errors. Multiple imputation
can be done with:
• Norm, a freeware program that works for normally distributed, continuous variables and is
often used even on dichotomized variables.
• ICE, a user-written Stata program that is an implementation of the S-Plus program called
MICE and that has advantages over Norm. It does the imputation by using different
estimation models for outcome variables that are continuous, counts, or categorical. See
Royston (2005).
• Mplus, which can read these multiple datasets, estimate the model for each dataset, and pool
the estimates and their standard errors.
We will not illustrate the multiple imputation approach because that involves working with other
programs to impute the datasets. However, the Mplus User’s Guide, discusses how you specify the
datasets in the Data: section. We will illustrate the FIML approach because it is widely used
and easily implemented—and doesn’t require explaining another software package.
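For reference, reading a set of already-imputed datasets into Mplus only requires a change to the
Data: section. This is a minimal sketch in which implist.dat is a hypothetical text file that simply
lists the names of the imputed data files, one per line:

Data:
File is "C:\Mplus examples\implist.dat" ;
Type = Imputation ;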
The FIML approach does not work when you can’t justify a maximum likelihood estimator.
Here is the program
Title:
bmi_missing.inp
Basic growth curve with missing values
Data:
File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p
male race_eth bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03 white black hispanic
asian other;
Missing are all (-9999) ;
! usevariables is limited to bmi variables
Usevariables are bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03 ;
Analysis:
Type = General Missing H1 ;
Estimator = MLR ;
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3
bmi01@4 bmi02@5 bmi03@6;
Output:
Sampstat Mod(3.84) patterns;
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01
bmi02 bmi03(*);
The conceptual model does not change with missing values. The programming for implementing
the FIML solution changes very little. You will recall that we did not need an Analysis:
section in our program for doing a growth curve. However, we do need one when we are doing a
growth curve with missing values and using FIML estimation. Directly above the Model
command we insert
Analysis:
Type = General Missing H1 ;
Estimator = MLR ;
• Type = General Missing H1; this line is the key change.
• The Missing keyword tells Mplus to do the full information maximum likelihood estimation.
• The H1 is necessary to get sample statistics in our output.
• We could do this with maximum likelihood estimation, but will use a robust maximum
likelihood estimator, Estimator = MLR, instead. This is optional, but generally
conservative when you have substantial missing values.
In the Output: section, we also add a single word, patterns. This will give us a lot of
information about patterns of missing values. We will see just what patterns there are, the
frequency of occurrence of each pattern, and the percentage of data present for each covariance
estimate.
Output:
Sampstat Mod(3.84) patterns ;
Plot:
Type is
Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Also, to simplify our presentation we will take out the quadratic term (the fit is better with the
quadratic term, but it takes more space to present and interpret the results).
Here are selected, annotated results:
*** WARNING
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 3
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
SUMMARY OF ANALYSIS
Number of groups
1
Number of observations
1768 ! We had 1102 observations using listwise deletion.
! An 'x' means the data are present. Pattern 1 -- no missing values
! Pattern 2 -- missing BMI03
SUMMARY OF MISSING DATA PATTERNS
MISSING DATA PATTERNS
  (Mplus lists 81 distinct patterns of present (x) and missing data across BMI97-BMI03;
   the full display is too wide to reproduce here.)
MISSING DATA PATTERN FREQUENCIES
  Pattern  Frequency    Pattern  Frequency    Pattern  Frequency
     1       1102          28         2          55        26
     2         97          29        10          56        53
     3         73          30        51          57         9
     4         38          31         4          58         9
     5         21          32         3          59         2
     6         11          33         1          60         4
     7          5          34         1          61         1
     8         20          35         1          62         4
     9         23          36         3          63         1
    10          4          37         6          64         3
    11          8          38         1          65         5
    12          3          39         1          66         1
    13          8          40         1          67         1
    14          3          41         3          68         1
    15         11          42         6          69         1
    16         25          43         3          70         2
    17          6          44         1          71         1
    18          3          45         1          72        14
    19          2          46         2          73         1
    20          3          47         1          74         1
    21          1          48         6          75         2
    22          1          49         3          76         1
    23          2          50         2          77         1
    24          7          51         3          78         7
    25          1          52         3          79         1
    26          1          53         3          80         2
    27          6          54         3          81         4
! We might want to set some minimum standard and drop observations that do not
meet that. For example, we might drop people who are missing their BMI for more
than 3 waves.
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value    0.100

PROPORTION OF DATA PRESENT
  (The 7 x 7 covariance coverage matrix for BMI97-BMI03 is not reproduced here;
   coverage for the individual variables and variable pairs ranges from about .77 to .93.)
! We have 77.4% of the 1768 observations answering both BMI02 and BMI03
SAMPLE STATISTICS
! Notice that the means are not dramatically different from the results of the
! "basic" analysis that had the 1102 observations using listwise deletion. This is
! reassuring that our missing values are not creating a systematic bias.
  Means
     BMI97     BMI98     BMI99     BMI00     BMI01     BMI02     BMI03
    20.572    21.839    22.651    23.305    23.846    24.390    24.935
TESTS OF MODEL FIT
! If you compare nested models with MLR estimation you need to use the scaling
correction factor as discussed on their web page. We are not doing that here, so
this is okay.
Chi-Square Test of Model Fit
  Value                             116.426*
  Degrees of Freedom                     23
  P-Value                            0.0000
  Scaling Correction Factor           2.302
    for MLR

* The chi-square value for MLM, MLMV, MLR, ULS, WLSM and WLSMV cannot be used
for chi-square difference tests. MLM, MLR and WLSM chi-square difference
testing is described in the Mplus Technical Appendices at www.statmodel.com.
See chi-square difference testing in the index of the Mplus User's Guide.

! The chi-square is much bigger when we use FIML estimation with missing values,
! in part because the sample is so much bigger. Still there are some fit problems
! without the quadratic term. Both the CFI and TLI are a bit low to be ideal (under
! .96). However the RMSEA is good and that is the most widely used measure of fit.

Chi-Square Test of Model Fit for the Baseline Model
  Value                            1279.431
  Degrees of Freedom                     21
  P-Value                            0.0000

CFI/TLI
  CFI                                 0.926
  TLI                                 0.932

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                            0.048

SRMR (Standardized Root Mean Square Residual)
  Value                               0.051
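As an aside on the note above about nested-model comparisons with MLR: the scaled difference test
described on the Mplus website works roughly as follows. This is only a sketch, where T is the
printed chi-square value, d the degrees of freedom, and c the printed scaling correction factor for
the nested model (0) and the comparison model (1):

  cd  = (d0*c0 - d1*c1) / (d0 - d1)
  TRd = (T0*c0 - T1*c1) / cd

TRd is then referred to a chi-square distribution with d0 - d1 degrees of freedom.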
! The results are similar to the linear model solution with listwise deletion, but our
! z-scores are bigger due to having more observations.
 S        WITH
    I                 0.408     0.112      3.658
 Means
    I                21.035     0.105    200.935
    S                 0.701     0.022     32.311
 Variances
    I                15.051     0.958     15.714
    S                 0.255     0.031      8.340
 Residual Variances
    BMI97             5.730     0.638      8.981
    BMI98             3.276     0.414      7.907
    BMI99             3.223     0.351      9.175
    BMI00             4.361     0.973      4.483
    BMI01             2.845     0.355      8.005
    BMI02             9.380     3.384      2.772
    BMI03             8.589     3.139      2.736
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated values)
Scatterplots (sample values, estimated factor scores, estimated values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
6 Multiple Cohort Growth Model with Missing Waves
Major datasets often have multiple cohorts. NLSY97 has youth who were 12-18 in 1997. Seven
years later, they are 19-25. It is quite likely that many growth processes that involve going from
the age of 12 to the age of 19 are different than going from 19-25. For example, involvement in
minor crimes (petty theft, etc.) may increase from 12 to 19, but then decrease from there to 25.
Here is what we might have for our NLSY97 data:

  Individual  Cohort  1997  1998  1999  2000  2001  2002  2003
      1        1985     3     4     5     6     7     7     8
      2        1985     2     4     3     5     6     7     7
      3        1984     4     5     6     7     6     6     5
      4        1982     6     7     5     4     3     2     2
      5        1982     5     5     6     4     2     2     1
We can rearrange these data by the age at which each measurement was taken:

  Case  Cohort  HD12  HD13  HD14  HD15  HD16  HD17  HD18  HD19  HD20  HD21
   1     1985     3     4     5     6     7     7     8     *     *     *
   2     1985     2     4     3     5     6     7     7     *     *     *
   3     1984     *     4     5     6     7     6     6     5     *     *
   4     1982     *     *     *     6     7     5     4     3     2     2
   5     1982     *     *     *     5     5     6     4     2     2     1

• In this table HD is the age at which the data were collected. To capture everybody we would
need to extend the table to HD25 because the youth who were 18 in 1997 are 25 seven years
later.
• This table would have massive amounts of missing data, but the missingness would not be
related to other variables. It would be missing completely at random (MCAR).
• We could develop a growth curve that covered the full range from age 12 to age 25. We would
have 14 waves of data even though each participant was only measured 7 times. Each
participant would have data for 7 of the years and have missing values for the other 7 years.
• We would want to estimate a growth model with a quadratic term and expect the linear slope to
be positive (growth from 12-18) and the quadratic term to be negative (decline from 18-25).
• Mplus has a special Analysis: type called MCOHORT. There is an example on the Mplus
web page and we will not cover it here. This is an extraordinary way to deal with missing
values.
Here is an example from data Muthén analyzed:
7 Multiple Group Models with Growth Curves
Multiple group analysis using SEM is extremely flexible—some would say it is too flexible
because there are so many possibilities. We use gender for our grouping variable because we are
interested in the trend in BMI for girls compared to boys. We think adolescent girls are more
concerned about their weight and are therefore more likely to have a lower BMI than boys and to
have a flatter trajectory.
There are several ways of comparing a model across multiple groups.
One approach is to see if the same model fits each group, allowing all of the estimated parameters
to be different.
• Here we are saying that a linear growth model fits the data for both boys and girls, but
• We are not constraining girls and boys to have the same values on any of the parameters:
- intercept mean
- slope mean
- intercept variance
- slope variance
- covariance of intercept and slope
- residual errors
• We can then put increasing invariance constraints on the model.
a. At a minimum, we want to test whether the two groups have a different intercept (level) and
slope.
b. If this constraint is acceptable we can add additional constraints on the variances,
covariances, and error terms.
First, we will estimate the model simultaneously for girls and boys with no constraints on the
parameters. Here is the program with new commands highlighted:
Title:
bmi_growth_gender.inp
Data:
File is "C:\mplus examples\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male
race_eth bmi97 bmi98 bmi99 bmi00 bmi01 bmi02
bmi03 white black hispanic asian other;
Missing are all (-9999) ;
! usevariables keeps bmi variables and gender
Usevariables are male bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 ;
Grouping is male (0=female 1=male);
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4
bmi02@5 bmi03@6;
Output:
Sampstat Mod(3.84) ;
Plot:
Type is
Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
I’ve put the only changes we need to make in bold, underline.
• We have a binary variable, male, that is coded 0 for females and 1 for males.
• We add male to the list of variables we are using.
• We add a subcommand to the Variable: section that says we have a grouping variable,
names it, and defines what the values are so the output will be labeled nicely.
• The command Grouping is male (0=female 1=male); is going to give us a
separate set of estimates of the parameters for girls (labeled female) and boys (labeled
male).
Here is selected, annotated output:
SUMMARY OF ANALYSIS
Number of groups                                 2
Number of observations
   Group FEMALE                                528
   Group MALE                                  574
Number of dependent variables                    7
Number of independent variables                  0
Number of continuous latent variables            2

Variables with special functions
  Grouping variable      MALE

SAMPLE STATISTICS FOR FEMALE
  Means
     BMI97     BMI98     BMI99     BMI00     BMI01     BMI02     BMI03
    19.904    21.198    21.752    22.349    22.805    23.606    23.961

SAMPLE STATISTICS FOR MALE
  Means
     BMI97     BMI98     BMI99     BMI00     BMI01     BMI02     BMI03
    20.652    21.835    22.858    23.638    24.063    24.370    24.994
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
  Value                             320.535
  Degrees of Freedom                     46   ! Notice we have twice the degrees of freedom
  P-Value                            0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                            8906.678
  Degrees of Freedom                     42
  P-Value                            0.0000

CFI/TLI
  CFI                                 0.969
  TLI                                 0.972

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                            0.104
  90 Percent C.I.                     0.093  0.115

SRMR (Standardized Root Mean Square Residual)
  Value                               0.063
MODEL RESULTS
                  Estimates      S.E.  Est./S.E.
Group FEMALE
 S        WITH
    I                 0.465     0.090      5.187
 Means
    I                20.421     0.157    130.261
    S                 0.610     0.024     24.975
 Variances
    I                11.579     0.801     14.457
    S                 0.183     0.020      8.920
 Residual Variances
    BMI97             4.632     0.351     13.183
    BMI98             2.033     0.177     11.463
    BMI99             1.896     0.153     12.367
    BMI00             4.567     0.312     14.644
    BMI01             2.298     0.192     11.984
    BMI02            15.204     0.991     15.342
    BMI03             3.400     0.349      9.730

Group MALE
 S        WITH
    I                 0.337     0.114      2.956
 Means
    I                21.215     0.171    124.278
    S                 0.697     0.027     25.551
 Variances
    I                14.528     0.991     14.660
    S                 0.232     0.026      8.918
 Residual Variances
    BMI97             6.306     0.471     13.391
    BMI98             3.445     0.269     12.800
    BMI99             3.405     0.241     14.108
    BMI00             2.651     0.195     13.612
    BMI01             2.132     0.183     11.671
    BMI02             4.304     0.332     12.960
    BMI03            10.570     0.730     14.484
Here is the graph of the two growth curves. It appears that the girls have a lower initial level and a
flatter rate of growth of BMI.
We can re-estimate the model with the intercept and slope invariant. To do this we make the
following modifications to the model:
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
[i] (1);
[s] (2);
Model male:
[i] (1);
[s] (2);
Output:
Sampstat Mod(3.84) ;
Plot:
Type is
Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Notice that we added two lines to the Model: section,
• [i] (1); and
• [s] (2);.
Then we added a subsection called Model male: (males are the second group because females
were coded 0 and males were coded 1 on male) and put the same two lines in it.
• The first Model: command is understood to apply to the group coded as zero on the male
variable.
• These changes force the intercept to be equal in both groups because both intercepts are
assigned parameter label (1), and the slopes to be equal because both are assigned parameter
label (2).
• Any parameters with a (1) after them are constrained to be equal across groups, as are any
parameters with a (2) after them.
• Notice that we have square brackets [ ] around the names of the intercept and slope
because the constraints apply to their means.
When we run the revised program we obtain a chi-square that has two extra degrees of freedom
because of the two constraints.
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
  Value                             338.157   ! Was 320.535
  Degrees of Freedom                     48   ! Was 46
  P-Value                            0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                            8906.678
  Degrees of Freedom                     42
  P-Value                            0.0000

CFI/TLI
  CFI                                 0.967   ! Was .969
  TLI                                 0.971   ! Was .972

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                            0.105   ! Was .104
  90 Percent C.I.                     0.094  0.115

SRMR (Standardized Root Mean Square Residual)
  Value                               0.081
We can test the difference between
• the chi-square(48) = 338.157 and
• the chi-square(46) = 320.535.
• This difference, 17.622, has 48 - 46 = 2 degrees of freedom and is significant at the p < .001
level (the critical value for chi-square with 2 degrees of freedom at the .001 level is 13.816).
• Although we can say there is a highly significant difference between the level and trend for
girls and boys, we need to be cautious because this difference of chi-squares has the same
problem with a large sample size that the original chi-squares have.
• In fact, the measures of fit are hardly changed whether we constrain the intercept and slope to
be equal or not. Moreover, the visual difference in the graph is not dramatic.
We could also put other constraints on the two solutions such as equal variances and covariances,
and even equal residual error variances, but we will not.
8 Alternative to Multiple Group Analysis
An alternative way of doing this, where there are two groups, is to enter the grouping variable as a
predictor. This requires re-conceptualizing our model. We can think of the indicator variable
Male having a direct path to both the intercept and the slope. Because the indicator variable is
coded as 1 for male and 0 for female,
• If the path from Male to the Intercept is positive, this means that boys have a higher initial
level on BMI.
• Similarly, if there is a positive path from Male to the Slope, this indicates that boys have a
steeper slope than girls on BMI.
• Such results would be consistent with our expectation that boys both start higher and gain more
fat than girls during adolescence.
• This approach does not let us test for other types of invariance such as the variances,
covariances, and error terms. We are forcing these to be the same for both females and males;
this may be unreasonable.
The following figure shows these two paths. We are explaining why some people have a higher or
lower initial level and why some have a steeper or flatter slope by whether they are a girl or a boy.
We are predicting that boys have a higher initial level and a steeper slope.
Here is the figure:
[Figure. Growth model with Male as a predictor: Male has a (+) path to both the Intercept
(residual variance RI) and the Slope (residual variance RS); the Intercept has loadings of 1 and the
Slope has loadings of 0-6 on BMI97-BMI03, each with its own error term e97-e03.]
Here is part of the program:
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 white black hispanic
asian other;
Missing are all (-9999) ;
! usevariables is limited to bmi variables and male
Usevariables are male bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 ;
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
i on male ;
s on male ;
Output:
Sampstat Mod(3.84) ;
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Here is selected, annotated output:
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
  Value                             237.517   ! We cannot compare this to the chi-square for the
                                              ! two group design because this is not nested in
                                              ! that model.
  Degrees of Freedom                     28
  P-Value                            0.0000

Chi-Square Test of Model Fit for the Baseline Model
  Value                            8602.391
  Degrees of Freedom                     28
  P-Value                            0.0000

CFI/TLI
  CFI                                 0.976
  TLI                                 0.976

Loglikelihood
  H0 Value                       -19515.302
  H1 Value                       -19396.543

Information Criteria
  Number of Free Parameters              14
  Akaike (AIC)                    39058.603
  Bayesian (BIC)                  39128.672
  Sample-Size Adjusted BIC        39084.204
    (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
  Estimate                            0.082
  90 Percent C.I.                     0.073  0.092
  Probability RMSEA <= .05            0.000

SRMR (Standardized Root Mean Square Residual)
  Value                               0.044
MODEL RESULTS
                  Estimates      S.E.  Est./S.E.
 I        ON
    MALE              0.793     0.233      3.409   ! Males higher
 S        ON
    MALE              0.084     0.038      2.203   ! Males steeper
 S        WITH
    I                 0.400     0.075      5.371
 Intercepts
    BMI97             0.000     0.000      0.000
    BMI98             0.000     0.000      0.000
    BMI99             0.000     0.000      0.000
    BMI00             0.000     0.000      0.000
    BMI01             0.000     0.000      0.000
    BMI02             0.000     0.000      0.000
    BMI03             0.000     0.000      0.000
    I                20.385     0.168    121.416
    S                 0.625     0.027     22.816
! When we add one or more predictors of the intercept and slope, the intercept and
slope means are not reported under a section called “means,” but are now under
“intercepts”
 Residual Variances
    BMI97             5.391     0.290     18.583
    BMI98             2.731     0.159     17.129
    BMI99             2.696     0.144     18.752
    BMI00             3.524     0.177     19.858
    BMI01             2.327     0.144     16.175
    BMI02             9.552     0.458     20.846
    BMI03             7.148     0.398     17.974
    I                13.027     0.636     20.471
    S                 0.212     0.017     12.095
!Both the intercept and slope still have variance to explain
We see that the intercept is 20.385 and the slope is .625. How is gender related to this?
For girls the equation is:
Est. BMI = 20.385 + .625(Time) + .793(Male) + .084(Male)(Time)
         = 20.385 + .625(Time) + .793(0) + .084(0)(Time)
         = 20.385 + .625(Time)
For boys the equation is:
Est. BMI = 20.385 + .625(Time) + .793(1) + .084(1)(Time)
         = (20.385 + .793) + (.625 + .084)(Time)
         = 21.178 + .709(Time)
where Time is coded as 0, 1, 2, 3, 4, 5, 6.
Using these results, we estimate the BMI for girls is initially 20.385. By the seventh year, when she
is 18 (Time = 6), her estimated BMI will be 20.385 + .625(6) or 24.135.
Using these results, we estimate the BMI for boys is initially 21.178. By the seventh year it will be
21.178 + .709(6) or 25.432. Since a BMI of 25 is considered overweight, by the age of 18 we
estimate the average boy will be classified as overweight.
We could use the plots provided by Mplus, but if we wanted a nicer looking plot we could use
another program. I used Stata to get the following graph.
The Stata command is (this can also be driven by a drop-down menu):
twoway (connected Girls Age, lcolor(black) lpattern(dash) ///
lwidth(medthick)) (connected Boys Age, lcolor(black) ///
lpattern(solid) lwidth(medthick)), ///
ytitle(Body Mass Index) xtitle(Age of Adolescent) ///
caption(NLSY97 Data)
and the data are:
     +------------------------+
     | Age    Girls     Boys  |
     |------------------------|
  1. |  12   20.385   21.178  |
  2. |  18   24.135   25.432  |
     +------------------------+
[Stata graph: "Body Mass Index by Age of Adolescent: Comparison of Girls with Boys," BMI
(about 20 to 26) plotted against age 12 to 18 with separate lines for Girls and Boys; NLSY97 data.]
Limitations of this approach
• When we treat a categorical variable as a grouping variable and do a multiple group analysis,
we can test the equality of all the parameters.
• When we treat it as a predictor, as in this example, we only test whether the intercept and
slope are different for the two groups. In this example we do not allow the other parameters
to be different for boys and girls, and this might be a problem in some applications.
9 Growth Curves with Time Invariant Covariates
An extension of having a categorical predictor includes having a series of covariates that explain
variance in the intercept and slope. In this example we use what are known as time invariant
covariates. These are covariates that either remain constant (gender) or for which you have a
measure only at the start of the study. It is possible to add time varying covariates as well.
This has been called Conditional Latent Trajectory Modeling (Curran & Hussong, 2003)
because your initial level and trajectory (slope) are conditional on other variables.
This is equivalent to the multilevel approach that calls the intercept and slope random effects.
With programs such as HLM we use what they call a two level approach. Here are the parallels
using a slide adapted from Muthén.
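The slide itself is not reproduced here. In Muthén's standard notation, with w denoting a time
invariant covariate, the two-level equations that the following bullets refer to are approximately
(the exact symbols on the original slide may differ):

  Level 1:   y_it = η0i + η1i*x_t + ε_it                 (1)
  Level 2:   η0i  = α0 + γ0*w_i + ζ0i                    (2a)
             η1i  = α1 + γ1*w_i + ζ1i                    (2b)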
In SEM we represent this as follows:
Level 1 is defined as the measurement model with an intercept (level) and slope (trend/trajectory).
Level 2, represented by equations 2a and 2b, treats the intercept and slope as random variables that
are explained by a vector of covariates, w.
• The y_it is the outcome. In our example it is the score on BMI for individual “i” at time “t”.
• The x_t is the time score. In our example of BMI we use 0, 1, 2, 3, 4, 5, 6.
• The η0i is the intercept for individual “i”.
a. The graph just below equation 1 shows three individuals who each have a different
intercept.
b. Individual “1” has a higher starting value than individuals 2 or 3.
c. In the figure we show α0 because this represents the mean of η0i.
d. The paths from η0 to each y_t are fixed at 1 because it is a constant effect.
• The η1i*x_t term is the slope for individual “i” times his or her score on time.
a. With our BMI example, we score time as 0, 1, 2, 3, 4, 5, 6.
b. In the figure we use α1 because this represents the mean of η1i.
• If we had a quadratic, we would add an η2i*x_t² term. For BMI the x_t² would be 0, 1, 4, 9, 16,
25, 36.
• The ε_it is the residual error on y for individual “i” at time “t”.
a. With BMI you can imagine many factors that could have a temporary influence on a
person's BMI score on the day it was measured.
b. The figure shows e_t (t = 1, 2, etc.) and the “i” is implicit.
• An important distinction that some make between HLM and SEM programs is that SEM
programs cannot let the timing of measurements vary between individuals. If the youth are
measured each year, it is important that all of them are measured at the same time so they
are all one year apart.
• This limitation only applied to early versions of Mplus.
• Mplus has a way of eliminating this limitation of SEM by allowing each individual to
have a different time between measurements. For example, Li might be measured at
12-month intervals, while Jones might be measured at intervals of 11 months, then 13 months,
then 9 months, etc.
• We are not discussing these extensions at this point (see TSCORE in the User's Manual).
[Figure. Conditional growth model: a latent Emotional Problems factor (indicators: Youth report
with error e1 and Parent report with error e2) and the observed variable White* predict the
Intercept, Linear, and Quadratic growth factors; the Intercept has loadings of 1, the Linear trend
loadings of 0-6, and the Quadratic trend loadings of 0, 1, 4, 9, 16, 25, 36 on BMI97-BMI03, each
with its own error term e97-e03.]
*The variable White (whites = 1; nonwhites = 0) compares Whites to the combination of African
American and Hispanic. Asian & Pacific Islander, and Other have been deleted from this analysis
because of small sample size.
In this figure we have two covariates. One is whether the adolescent is white versus African
American or Hispanic and the other is a latent variable reflecting the level of emotional problems a
youth has. There are two indicators of emotional problems, one from a parent report, boyprb_p,
and the other from a youth report, boyprb_y.
• A researcher may predict that Whites have a lower initial BMI (intercept) which persists during
adolescence, but the White advantage does not increase (same slope as nonwhites).
• Alternatively, a researcher may predict that being White predicts a lower initial BMI (intercept)
and less increase in BMI (smaller slope) during adolescence. This suggests that minorities
start with a disadvantage (higher BMI) and this disadvantage gets even greater across
adolescence.
• A researcher may argue that emotional problems are associated with both a higher initial BMI
(intercept) and a more rapid increase in BMI over time (slope).
By including a covariate that is a latent variable itself, emotional problems, we will show how
these are handled by Mplus.
We estimated this model for boys only; girls were excluded.
The following is our Mplus program:
Title: bmi_timea.inp
bmi growth curve using race/ethnicity and emotional problems
as a second covariate. There are two indicators of emotional
problems.
Data:
File is "c:\Mplus examples\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 white black hispanic
asian other;
Missing are all (-9999) ;
! usevariables keeps the emotional problem indicators, white, and the bmi variables
Usevariables are boyprb_y boyprb_p white bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 ;
Useobservations = male eq 1 and asian ne 1 and other ne 1;
Model:
i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
emot_prb by boyprb_p boyprb_y ;
i on white emot_prb;
s on white emot_prb;
q on white emot_prb;
Output:
Sampstat Mod(3.84) standardized;
Plot:
Type is
Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
I have highlighted the new lines in the Mplus program.
• The format of the Useobservations subcommand is similar to select or if used
with other programs.
• The Useobservations = male eq 1 and asian ne 1 and other ne 1;
restricts our sample to males (male eq 1). This is very handy when using the same
dataset for a variety of models where you want some models to only include selected
participants.
• We have dropped Asians and members of the “other” category. There are relatively few of
them in this sample dataset and they may have very different BMI trajectories. Also, the
meaning of the category “other” is ambiguous.
• I added a quadratic term in the Model: command. I first estimated this model using just a
linear slope and the fit was not very good. Adding the quadratic improved the fit.
• This example has a measurement model for a latent covariate, emot_prb. In other
programs this can involve complicated programming. Here it is done with the single line
emot_prb by boyprb_p boyprb_y ;
• The by is a key word in Mplus for creating latent variables used in Confirmatory Factor
Analysis and SEM.
• On the right of the by are two observed variables. The boyprb_p is the report of parents
about the adolescent's emotional problems. The boyprb_y is the youth's own report.
• It is desirable to have three or more indicators of a latent variable, but we only have two
here, so that will have to do.
• To the left of the by is the name we give to the latent variable, emot_prb. This new latent
variable did not appear in the list of variables we are using, but it is defined here.
• The “by” term
o fixes the first variable to the right as a reference indicator, boyprb_p, and assigns a
loading of 1 to it.
o It lets the loading of the second variable, boyprb_y, be estimated. It also creates
error/residual variances that are labeled e1 and e2 in the figure.
o The default is that these errors are uncorrelated.
o It is good practice to make the strongest indicator the reference indicator with its loading
fixed at 1.0, which you do by listing it first after the "by." If you run the model and find
that the first-listed indicator is not the strongest one, re-run the model with the order of
the items after the "by" reversed (see the one-line sketch following this list).
 The next three new lines,
o i on white emot_prb;
o s on white emot_prb; and
o q on white emot_prb;
o These define the relationships between the covariates and the intercept, slope, and
quadratic growth factors.
o They are the coefficients that multiply the wi covariates in the equation presented
earlier.
o Mplus uses the on command to signify that a variable depends on another variable in
the structural part of the model. The by command is the key to understanding how
Mplus sets up the measurement model, and the on command is the key to how Mplus
sets up the structural model.
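If boyprb_y turned out to be the stronger indicator, the reordering mentioned above is a one-line
change; nothing else in the program would need to change. A sketch of the reversed statement:
emot_prb by boyprb_y boyprb_p ;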
There are many defaults. Mplus estimates residual variances for the intercept, slope, and quadratic
growth factors, allows those factor residuals to covary, and fixes the intercepts of the repeated
BMI measures at zero.
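To make these defaults concrete, here is a sketch of lines that could be added to the Model:
command above. They only restate what Mplus already assumes, so including them should leave the
estimates unchanged; the bracket statements are the Mplus syntax for means/intercepts.
! These lines merely make the defaults explicit
i s q ;                 ! residual variances of the growth factors
i with s ;              ! the growth factor residuals are allowed to covary
i with q ;
s with q ;
[bmi97-bmi03@0] ;       ! intercepts of the repeated BMI measures fixed at zero
[i s q] ;               ! intercepts of the growth factors estimated freely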
Here are selected results:
Estimator                                            ML

TESTS OF MODEL FIT

Chi-Square Test of Model Fit
          Value                              64.201
          Degrees of Freedom                     34
          P-Value                            0.0013

CFI/TLI
          CFI                                 0.993
          TLI                                 0.990

Information Criteria
          Number of Free Parameters               29
          Akaike (AIC)                      20924.710
          Bayesian (BIC)                    21046.407
          Sample-Size Adjusted BIC          20954.362
            (n* = (n + 2) / 24)

RMSEA (Root Mean Square Error Of Approximation)
          Estimate                            0.043
          90 Percent C.I.                     0.026   0.058
          Probability RMSEA <= .05            0.767

SRMR (Standardized Root Mean Square Residual)
          Value                               0.026
MODEL RESULTS

                     Estimates     S.E.   Est./S.E.      Std     StdYX

 EMOT_PRB BY
    BOYPRB_P             1.000     0.000      0.000     1.057     0.663
    BOYPRB_Y             0.709     0.284      2.492     0.749     0.527

 I        ON
    WHITE               -1.050     0.380     -2.767    -0.288    -0.142
    EMOT_PRB             0.245     0.249      0.984     0.071     0.071

 S        ON
    WHITE               -0.023     0.172     -0.136    -0.020    -0.010
    EMOT_PRB             0.257     0.130      1.988     0.230     0.230

 Q        ON
    WHITE               -0.003     0.028     -0.107    -0.017    -0.008
    EMOT_PRB            -0.045     0.021     -2.118    -0.277    -0.277

 S        WITH
    I                    0.717     0.384      1.869     0.166     0.166

 Q        WITH
    I                   -0.099     0.060     -1.654    -0.157    -0.157
    S                   -0.174     0.038     -4.592    -0.848    -0.848

 WHITE    WITH
    EMOT_PRB            -0.065     0.033     -1.975    -0.061    -0.125

 Intercepts
    BOYPRB_Y             1.986     0.064     31.010     1.986     1.396
    BOYPRB_P             1.676     0.072     23.382     1.676     1.052
    BMI97                0.000     0.000      0.000     0.000     0.000
    BMI98                0.000     0.000      0.000     0.000     0.000
    BMI99                0.000     0.000      0.000     0.000     0.000
    BMI00                0.000     0.000      0.000     0.000     0.000
    BMI01                0.000     0.000      0.000     0.000     0.000
    BMI02                0.000     0.000      0.000     0.000     0.000
    BMI03                0.000     0.000      0.000     0.000     0.000
    I (mean)            21.350     0.291     73.368     5.858     5.858
    S (mean)             1.272     0.132      9.651     1.073     1.073
    Q (mean)            -0.097     0.021     -4.584    -0.560    -0.560

 Variances
    EMOT_PRB             1.117     0.467      2.395     1.000     1.000

 Residual Variances
    BOYPRB_Y             1.461     0.243      6.013     1.461     0.723
    BOYPRB_P             1.424     0.456      3.122     1.424     0.560
    BMI97                5.238     0.578      9.060     5.238     0.283
    BMI98                3.446     0.287     12.017     3.446     0.180
    BMI99                3.269     0.259     12.637     3.269     0.149
    BMI00                2.119     0.196     10.805     2.119     0.091
    BMI01                1.998     0.193     10.365     1.998     0.082
    BMI02                4.356     0.366     11.908     4.356     0.160
    BMI03                9.906     0.914     10.833     9.906     0.297
    I                   12.916     1.091     11.834     0.972     0.972
    S                    1.330     0.246      5.417     0.947     0.947
    Q                    0.028     0.006      4.406     0.924     0.924

R-SQUARE

    Observed
    Variable     R-Square

    BOYPRB_Y        0.277
    BOYPRB_P        0.440
    BMI97           0.717
    BMI98           0.820
    BMI99           0.851
    BMI00           0.909
    BMI01           0.918
    BMI02           0.840
    BMI03           0.703

    Latent
    Variable     R-Square

    I               0.028
    S               0.053
    Q               0.076
! We are not explaining much variance in any of these.
MODEL MODIFICATION INDICES

Minimum M.I. value for printing the modification index:  3.840

                          M.I.     E.P.C.   Std E.P.C.  StdYX E.P.C.

BY Statements
 I   BY BMI02            4.422     -0.012      -0.043       -0.008
 I   BY BMI03           10.800      0.034       0.122        0.021
 S   BY BMI02            7.048     -0.205      -0.243       -0.047
 S   BY BMI03            7.693      0.393       0.466        0.081
 Q   BY BMI02            7.599      2.240       0.388        0.075
 Q   BY BMI03            9.622     -4.758      -0.825       -0.143

WITH Statements
! Might consider correlating adjacent errors.
 BMI98  WITH BMI97       4.091     -1.119      -1.119       -0.059
 BMI99  WITH BMI98       6.766      0.552       0.552        0.027
 BMI01  WITH BOYPRB_P    4.544      0.252       0.252        0.032
 BMI01  WITH BMI99       8.391     -0.506      -0.506       -0.022
 BMI01  WITH BMI00       5.132      0.435       0.435        0.018
 BMI02  WITH BOYPRB_P    4.868     -0.370      -0.370       -0.045
 BMI02  WITH BMI00      10.058     -0.648      -0.648       -0.026
 BMI02  WITH BMI01      12.449      0.803       0.803        0.031
 BMI03  WITH BMI02       4.559     -1.356      -1.356       -0.045

Means/Intercepts/Thresholds
 [ BMI03 ]              10.211      0.685       0.685        0.119
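If one decided that the largest adjacent-error modification indices were substantively defensible,
the Model: command could be extended with lines such as the following. This is only a sketch; each
freed residual covariance should be justified on substantive grounds, not added mechanically.
bmi02 with bmi01 ;      ! largest adjacent-wave modification index above
bmi99 with bmi98 ;      ! next largest adjacent-wave modification index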
Unfortunately, we cannot get the built-in growth plots when the model includes covariates. You
could create these plots yourself by substituting fixed values for race/ethnicity and emotional
problems into the estimated growth equations.
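As a sketch of that substitution, take the unstandardized estimates above, hold emotional problems
at zero (the latent covariate's mean is fixed at zero by default), and assume white is coded
1 = White and 0 = nonwhite. The model-implied trajectories for t = 0, 1, ..., 6 (the waves 1997 to
2003) are then approximately
Nonwhite boys:  BMI(t) = 21.350 + 1.272 t - 0.097 t^2
White boys:     BMI(t) = (21.350 - 1.050) + (1.272 - 0.023) t + (-0.097 - 0.003) t^2
                       = 20.300 + 1.249 t - 0.100 t^2
Plotting these two curves (and, if you wish, curves with emotional problems one standard deviation
above and below zero) reproduces the adjusted growth curves that Mplus will not draw for us.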
10 Mediational Models with Time Invariant Covariates
Sometimes all of the covariates are time invariant or at least measured at just the start of the study.
Curran and Hussong (2003) discuss a study of a latent growth curve on drinking problems with a
covariate of parental drinking. Parental drinking influences both the initial level and the rate of
growth of drinking problem behavior among adolescents. The question is whether some other
variables might mediate this relationship:
 Parental monitoring
 Peer influence
(Figure: path diagram with Parent Drinking, the mediators Parental Monitoring and Peer Influence,
and the Intercept and Slope of Problem Drinking.)
Mplus allows us to estimate the direct and indirect effect of Parent Drinking on the Intercept and
Slope. It also provides a test of significance for these effects.
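A sketch of how such a model might be set up in Mplus follows. The variable names here (pdrink for
parental drinking, monitor and peer for the mediators, and drink1-drink4 for the repeated measures
of adolescent problem drinking) are hypothetical placeholders rather than the variables from the
Curran and Hussong study, and the time scores are illustrative.
Model:
i s | drink1@0 drink2@1 drink3@2 drink4@3 ;
monitor on pdrink ;              ! mediator 1 regressed on parent drinking
peer on pdrink ;                 ! mediator 2 regressed on parent drinking
i s on pdrink monitor peer ;     ! direct paths plus paths from the mediators
Model indirect:
i ind pdrink ;                   ! indirect effects of parent drinking on the intercept
s ind pdrink ;                   ! indirect effects of parent drinking on the slope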
11 Time Varying Covariates
We have illustrated time invariant covariates that are measured at time 1. It is possible to extend
this to include time varying covariates. Time varying covariates either are measured after the
process has started or have a value that changes (hours of nutrition education). Although we will
not show our output, we will illustrate the use of time varying covariates in a figure. In this figure
the time varying covariates, a21 to a24, might be:
 Hours of nutrition education completed between waves. Independent of the overall growth
trajectory, η1, students who have several hours of nutrition education programming may have a
decrease in their BMI
 Physical education curriculum. A physical activity program might lead to reduced BMI.
Students who spend more time in this physical activity program might have a lower BMI
independent of the overall growth trend, η1
 This would be a good way to incorporate fidelity into a program evaluation
This figure is borrowed from Muthén, where he examines growth in math performance over 4
years. The w vector contains x covariates that directly influence the intercept, η0, or the slope,
η1. The a2it values are the number of math courses taken each year.
yit  = repeated measures on the outcome (math achievement)
a1it = time score (0, 1, 2, 3) as discussed previously
a2it = time varying covariates (# of math courses taken that year)
w    = vector of x covariates that are time invariant and measured at or before the first yit
In this example we might think of the yi variables as measures of conflict behavior where y1 is
at age 17 and y4 is at age 25. We know there is a general decline in conflict behavior during this
time interval. Therefore, the slope η1 is expected to be negative.
Now suppose we also have a measure of alcohol abuse for each of the 4 waves (aij). We might
hypothesize that during a year in which an adolescent has a high score on alcohol abuse (say, the
number of days the person drinks 5 or more drinks in the last 30 days), there will be an
elevated level of conflict behavior that cannot be explained by the general decline (negative
slope).
The negative slope reflects the general decline in conflict behavior as young adults move from
age 17 to age 25. The effect of aij on yi provides the additional explanation that in years when
there is a lot of drinking, there will be an elevated level of conflict that does not fit the
general decline.
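As a sketch, the conflict example could be set up in Mplus along the following lines. The names
conf1-conf4 (conflict behavior at the four waves), alc1-alc4 (that wave's alcohol abuse score),
and w1 (a time invariant covariate) are hypothetical placeholders.
Model:
i s | conf1@0 conf2@1 conf3@2 conf4@3 ;   ! overall declining trajectory
conf1 on alc1 ;                           ! wave-specific effect of that year's drinking
conf2 on alc2 ;
conf3 on alc3 ;
conf4 on alc4 ;
i s on w1 ;                               ! time invariant covariate(s), the w vector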
If you want more, here are a few references:
2. Basic growth curve modeling
a. Bollen, K. A., & Curran, P. J. (2006). Latent Curve Models: A Structural Equation
Perspective. Hoboken, NJ: Wiley.
b. Curran, P. J., & Hussong, A. M. (2003). The use of latent trajectory models in
psychopathology research. Journal of Abnormal Psychology, 112, 526-544. This is a
general introduction to growth curves that is accessible.
c. Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An Introduction to Latent
Variable Growth Curve Modeling: Concepts, Issues, and Applications (2nd ed.).
Mahwah NJ: Lawrence Erlbaum. The second edition of a classic text on growth curve
modeling.
d. Kaplan, D. (2000). Chapter 8: Latent Growth Curve Modeling. In D. Kaplan,
Structural Equation Modeling: Foundations and Extensions (pp 149-170). Thousand
Oaks, CA: Sage. This is a short overview.
3. Limited Outcome Variables: Binary and count variables
a. Muthén, B. (1996). Growth modeling with binary responses. In A. von Eye & C.
Clogg (Eds.), Categorical Variables in Developmental Research: Methods of Analysis
(pp. 37-54). San Diego, CA: Academic Press.
b. Long, J. S., & Freese, J. (2006). Regression Models for Categorical Dependent
Variables Using Stata (2nd ed.). Stata Press (www.stata-press.com). This provides the
most accessible and still rigorous treatment of how to use and interpret limited
dependent variables.
c. Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling
Using Stata. Stata Press (www.stata-press.com). This discusses a free set of
commands that can be added to Stata that will do most of what Mplus can do and
some things Mplus cannot do. It is hard to use and very slow.
4. Growth mixture modeling
a. Muthén, B., & Muthén, L. K. (2000). Integrating person-centered and variable-centered
analysis: Growth mixture modeling with latent trajectory classes. Alcoholism: Clinical
and Experimental Research, 24, 882-891. This is an excellent and accessible conceptual
introduction.
b. Muthén, B. (2001). Latent variable mixture modeling. In G. Marcoulides, & R.
Schumacker (Eds.) New Developments and Techniques in Structural Equation
Modeling (pp. 1-34). Mahwah, NJ: Lawrence Erlbaum.
c. Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S. T., Yang, C. C., Wang, C. P.,
Kellam, S. G., Carlin, J. B., & Liao, J. (2002). General growth mixture modeling for
randomized preventive interventions. Biostatistics, 3, 459-475.
d. Muthén, B. (2004). Latent variable analysis: Growth mixture modeling and related
techniques for longitudinal data. In D. Kaplan (Ed.), Handbook of Quantitative
Methodology for the Social Sciences (pp. 345-368). Newbury Park, CA: Sage
Publications.
5. The web page for Mplus maintains a current set of references, many as PDF files. These are
organized by topic and some include data and the Mplus program.