Growth Curve Presentation

GROWTH CURVES AND EXTENSIONS USING MPLUS
Alan C. Acock
alan.acock@oregonstate.edu
Department of HDFS
322 Milam Hall
Oregon State University
Corvallis, OR 97331
7/2009
This document and selected references, data, and programs can be downloaded from
http://oregonstate.edu/~acock/growth
Growth Curve and Related Models, Alan C. Acock
Section 1: Brief Summary of Topics
Section 2: A Simple Growth Curve
2.1 Precautionary guidelines
2.2 Graphic representation of a growth curve
2.3 Mplus program for simple growth curve
2.4 Annotated selected growth curve output
2.5 Here are some available plots
Section 3: Quadratic growth curve
3.1 Graphic representation of quadratic growth curve
3.2 Mplus program & output quadratic growth curve
3.3 Plots for quadratic growth curve
Section 4: How Many Waves Should We Have?
4.1 Linear Curve—3 minimum, 4 much better
4.2 For a quadratic—4 minimum, 5 much better
Section 5: Alternative to Use of a Quadratic Slope
Section 6: Working with Missing Values
6.1 Two approaches used by Mplus
6.2 Multiple cohort extension
Section 7: Multiple group growth curves
7.1 Program and output without constraints
7.2 Comparing intercept and slope
Section 8: An Alternative to Multiple Group Analysis
8.1 Model and figure
8.2 Mplus program and output
8.3 Graphic representation
Section 9: Growth Curves with Time Invariant Covariates
9.1 A conditional latent trajectory model
9.2 Mplus program and output
Section 10: Mediation & Moderation
Section 11: Time Varying Covariates
Section 12: Extensions and Suggested Reading
Goal of the Workshop
The goal of this workshop is to explore a variety of applications of latent growth curve models using the Mplus
program. Because we will cover a wide variety of applications and extensions of growth curve modeling, we will
not cover each of them in great detail. At the end of this workshop it is hoped that participants will be able to run
Mplus programs to execute a variety of growth curve modeling applications and to interpret the results correctly.
Assumed Background
Participants should be familiar with the content in Introduction to Mplus that is located at
www.oregonstate.edu/~acock/growth . It will be assumed that participants in the workshop have some background
in Structural Equation Modeling. Background in multilevel analysis will also be useful, but is not assumed. It is
possible to learn how to estimate the specific models we will cover without a comprehensive knowledge of Mplus,
but some background using an SEM program is useful.
Section 1: Brief Summary of Topics
Introduction to Growth Curve Modeling
Growth Curves are ideal for longitudinal studies.
 Instead of predicting a person’s score on a variable (e.g., mean comparison among scores at
different time points or relationships among variables at different time points), we predict
their growth trajectory—what is their level on the variable AND how is this changing.
 We will present a conceptual model, show how to apply the Mplus program, and interpret
the results.
 Once we can estimate growth trajectories, the more interesting issue becomes explaining
individual differences in trajectories (why some people go up, down, or stay the same).
 We will introduce growth curves for multiple groups such as comparing women and men
 Time invariant and time variant covariates will be introduced
 Mediation will be introduced
Section 2: A Simple Growth Curve
2.1 Precautionary guidelines
Estimating a basic growth curve using Mplus is quite easy, but when developing a complex
model it is best to start easy and gradually build complexity.
 Starting easy should include data screening to evaluate the distributions of the variables,
patterns of missing values, and possible outliers.
 Even if you have a theoretically specified model that is complex, always start with the
simplest model and gradually add the complexity.
 Here we will show how structural equation modeling conceptualizes a latent growth curve.
Before showing a figure to represent a growth curve, we examine a small sample of our
observations:
 Data is from the National Longitudinal Survey of Youth that started in 1997.
 We use the cohort that was 12 years old in 1997 and examine their trajectory for the BMI.
 Some may not like using the BMI on this age group, but this is only to illustrate an
application of growth curve modeling.
 The following graph of 10 randomly selected kids was produced by Mplus
 A BMI value of 25 is considered overweight and a BMI of 30 is considered obese (I’m
aware of problems with the BMI as a measure of obesity and with its limitations when used
for adolescents)
 With just 10 observations it is hard to see much of a trend, but it looks like adolescents are
getting a higher BMI score as they get older.
 The X-axis value of 0 is when the adolescent was 12 years old; the 1 is when the adolescent
was 13 years old, etc. We are using seven waves of data (labeled 0 to 6) from the panel
study.
2.2 Graphic representation of a growth curve
A growth curve requires us to have a model and we should draw this before writing the Mplus
program. Figure 1 shows a model for our simple growth curve:
This figure is much simpler than it first appears. The key variables are the two latent variables
labeled the Intercept growth factor and the Slope growth factor.
 The Intercept growth factor
a. The intercept represents the initial level and is sometimes called the initial level for this
reason. It is the estimated initial level and its value may differ from the actual mean for
BMI97 because in this case we are imposing a linear growth model.
b. It may differ from the mean of BMI97
- When covariates are added, especially when a zero value on covariates is rare and
covariates are not centered (household income)
- A straight line may over or underestimate any one mean including the initial mean
c. Unless the covariates are centered, it usually makes sense to just call it an intercept rather
than the initial level.
d. The intercept is identified by the constant loadings of 1.0 going to each BMI score. Some
programs call the intercept the constant, representing the constant effect to which other
effects are added.
e. It is possible to shift the intercept by how the waves are coded, e.g., we might make it the
last year or the middle year.
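Item e can be illustrated with a hedged sketch of the Model: line, assuming the same seven BMI variables used throughout this handout. Subtracting 6 from each time score moves the zero point to 2003, so the intercept growth factor becomes the estimated level at the last wave. The mean slope is unchanged, but the intercept mean, the intercept variance, and the intercept-slope covariance now refer to 2003.

```mplus
Model:
    ! Time scores shifted so that 0 falls on the last wave (2003).
    ! The intercept factor i is then the estimated BMI in 2003.
    i s | bmi97@-6 bmi98@-5 bmi99@-4 bmi00@-3
          bmi01@-2 bmi02@-1 bmi03@0;
```

Using -3 through 3 instead would place the intercept at the middle year (2000).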
 The Slope growth factor
a. Is identified by fixing the values of the paths to each BMI variable. In a publication you
normally would not show the path to BMI97, since this is fixed at 0.0.
b. We fix the other paths at 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0. Where did we get these values? The
first year is the base year or year zero. The BMI was measured each subsequent year so
these are scored 1.0 through 6.0.
c. Other values are possible. Suppose the survey was not done in 2000 or 2001 so that we had
5 time points rather than 7. We would use paths of 0.0, 1.0, 2.0, 5.0, and 6.0 for years 1997,
1998, 1999, 2002, and 2003, respectively.
d. It is also possible to fix the first couple years and then allow the subsequent waves to be
free.
- This might make sense for a developmental process where the yearly intervals may not
reflect the developmental rate. Developmental time may be quite different than
chronological time.
- This has the effect of “stretching” or “shrinking” time to the pattern of the data (Curran
& Hussong, 2003).
- An advantage of this approach is that it uses fewer degrees of freedom than adding a
quadratic slope and can fit better.
- Compared to a quadratic for a curve, this approach doesn’t require a monotonic function.
e. Mplus has a feature that allows each participant to have a different interval which is
important when the time between waves varies.
- Annual surveys—One person has a 12-month difference, one a 10-month difference, and
one a 14-month difference.
- Mplus handles this with the TSCORES option.
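As a hedged sketch of item e, individually varying times of observation can be supplied through the TSCORES option. The variables t97 through t03 are hypothetical (they are not in bmi_stata.dat) and would hold each adolescent's elapsed time in years at each interview, for example 0, 0.83, 2.17, and so on.

```mplus
Variable:
    ! t97-t03 are hypothetical variables holding each person's elapsed
    ! time (in years) at each interview; they would also need to be
    ! listed in the Names are statement.
    Usevariables are bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
        t97 t98 t99 t00 t01 t02 t03;
    Tscores = t97 t98 t99 t00 t01 t02 t03;
Analysis:
    Type = Random;
Model:
    ! AT replaces the fixed @ time scores with each person's own times.
    i s | bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 AT
          t97 t98 t99 t00 t01 t02 t03;
```

Because each person has their own time scores, this specification is estimated as a random-effects model rather than with one fixed set of loadings.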
 Residual Variance and Random Effects
a. The individual variation around the Intercept and Slope is represented in Figure 1 by RI
and RS. These are the variances of the intercept and slope around their respective means.
b. We expect substantial variance in both of these as some individuals have a higher or lower
starting BMI and some individuals will increase (or decrease) their BMI at a different rate
than the average growth rate.
c. In addition to the mean intercept and slope, each individual will have their own intercept
and slope. We say the intercept and the slope are random effects since they may vary
across individuals.
- They are random in the sense that each individual may have a steeper or flatter slope
than the mean slope and
- Each individual may have a higher or lower initial level than the mean intercept.
d. In our sample of 10 individuals shown above, notice one adolescent starts with a BMI
around 12 and three adolescents start with a BMI around 30. Some children have a BMI that
increases and others do not.
e. The variances, RI and RS, are critical if we are going to explore more complex models with
covariates (e.g., gender, psychological problems, race, household income, physical activity)
that might explain why some individuals have a steeper or less steep growth rate than the
average.
f. A random intercept model would have a free RI and an RS fixed at zero.
 The ei terms represent individual error terms for each year. Some years may move above or
below the growth trajectory described by our Intercept and Slope.
 Sometimes it might be important to allow error terms to be correlated, especially subsequent
pairs such as e97-e98, e98-e99, etc.
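The random intercept model in item f can be sketched as follows. This is an illustration under the handout's variable names, not part of the original program: the slope keeps its mean, but its variance and its covariance with the intercept are fixed at zero.

```mplus
Model:
    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3
          bmi01@4 bmi02@5 bmi03@6;
    ! Random intercept only: every adolescent shares the same growth
    ! rate, so the slope variance is fixed at zero.
    s@0;
    ! With no slope variance there is nothing for the intercept to
    ! covary with, so this covariance is fixed at zero as well.
    i with s@0;
```

Comparing the chi-square of this model with the model that frees both variances gives a test of whether the growth rates really differ across individuals.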
2.3 Mplus program for simple growth curve
Here is the Mplus program for a simple growth model:
Title:
    bmi_growth.inp
    Basic growth curve
Data:
    File is "C:\Mplus examples\bmi_stata.dat" ;
Analysis:
    Processors = 2;
Variable:
    Names are
        id grlprb_y boyprb_y grlprb_p boyprb_p
        male race_eth bmi97 bmi98 bmi99 bmi00
        bmi01 bmi02 bmi03 white black hispanic
        asian other;
    Missing are all (-9999) ;
    ! Notice usevariables is limited to bmi variables
    Usevariables are bmi97 bmi98 bmi99 bmi00
        bmi01 bmi02 bmi03 ;
Model:
    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3
        bmi01@4 bmi02@5 bmi03@6;
Output:
    Sampstat Mod(3.84);
Plot:
    Type is Plot3;
    Series = bmi97 bmi98 bmi99 bmi00 bmi01
        bmi02 bmi03(*);
What is new compared to an SEM program?
 The Usevariables are: subcommand includes only the BMI variables, since we are doing a
growth curve for these variables.
 We drop the Analysis: section if we have a single processor because we are doing basic
growth curve and can use the default options. With multiple processors, this is included to tell
Mplus how many processors to utilize.
 We have a Model: section because we need to describe the model. Mplus was designed after
growth curves were well understood. There is a single line to describe our model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
a. In this line the “i” and “s” stand for the intercept and slope growth factors, respectively.
We could have called these anything such as intercept and slope or initial and
trend. The vertical line, | (sometimes called the “or bar”), tells Mplus that it is about to define
an intercept and slope.
b. Defaults
- The intercept is defined by a constant of 1.0 for each bmi variable: every Intercept → bmi
path is 1.0. Therefore, we do not need to mention this.
- The slope is defined by fixing the path from the slope to bmi97 at 0, the path to bmi98
at 1, etc. The @ sign is used for “at.” Don’t forget the semi-colon to end the command.
- Mplus assumes that there is a residual variance for both the intercept and slope (RI and
RS) and that these covary. Therefore, we do not need to mention this.
- Mplus assumes there is uncorrelated random error, ei, for each observed variable.
- The intercepts for the Y variables (bmi97-bmi03) are fixed at zero by default. We could
specify this default by adding a line [bmi97-bmi03@0];. The square brackets refer to
intercepts or means, and @0 fixes them at zero.
- Means of the intercept and slope are free. We could specify this by adding a line [i s];
where the square brackets again refer to means. (Naming the factors without brackets, i s;
refers to their variances, which are also free by default.)
c. To allow e97 and e98 to be correlated, we would need to add a line saying bmi97 with
bmi98; .
- This may seem strange because we are not really correlating bmi97 with bmi98, but
e97 with e98. Mplus knows this and we do not need to generate a separate set of names for
the error terms.
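The defaults listed in item b and the correlated errors in item c can be written out explicitly. The following is a sketch under the handout's variable names; the first four statements after the growth line only restate what Mplus already assumes, while the WITH statements are an optional extension that frees the covariances between adjacent error terms.

```mplus
Model:
    i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3
          bmi01@4 bmi02@5 bmi03@6;
    ! The next four lines restate defaults and could be omitted.
    [bmi97-bmi03@0];    ! intercepts of the observed variables fixed at 0
    [i s];              ! means of the growth factors are free
    i s;                ! variances of the growth factors are free
    i with s;           ! the growth factors covary
    ! Not a default: covariances between adjacent error terms.
    bmi97 with bmi98;
    bmi98 with bmi99;
    bmi99 with bmi00;
    bmi00 with bmi01;
    bmi01 with bmi02;
    bmi02 with bmi03;
```

Each WITH statement uses one degree of freedom, so it is usually better to free only the pairs that theory or the modification indices point to.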
The last additional section in our Mplus program is for selecting what output we want Mplus to
provide. There are many optional outputs of the program and we will only illustrate a few of these.
The Output: section has the following lines
Output:
Sampstat Mod(3.84);
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00
bmi01 bmi02 bmi03(*);
 The first line, Sampstat Mod(3.84) asks for sample statistics and modification indices for
parameters we might free, as long as doing so would reduce chi-square by 3.84 (corresponding
to the .05 level). We do not bother with parameter estimates that would have less effect than
this. The default value is 10.0.
 Next comes the Plot: subcommand and we say that we want Type is Plot3; for our
output. This gives us the descriptive statistics and graphs for the growth curve.
 The last line of the program specifies the series to plot. By entering the variables with an (*)
at the end we are setting a path at 0.0 for bmi97, 1.0 for bmi98, etc.
2.4 Annotated Selected Growth Curve Output
The following is selected output with comments:
Mplus VERSION 5.1
MUTHEN & MUTHEN
07/01/2008
2:40 PM
*** WARNING
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 3
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
Mplus uses all available data assuming MAR. There were three cases that were
dropped because they had no BMI report for any wave.
bmi_growth.inp
Basic growth curve
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                          1768
With listwise deletion we would have an N = 1102
Number of dependent variables                      7
Number of independent variables                    0
Number of continuous latent variables              2
Observed dependent variables
  Continuous
   BMI97   BMI98   BMI99   BMI00   BMI01   BMI02
   BMI03
Continuous latent variables
   I   S
Estimator                                         ML
Information matrix                          OBSERVED
The following is a very nice analysis of patterns of missing values.
SUMMARY OF DATA
Number of missing data patterns                   81
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value              0.100
PROPORTION OF DATA PRESENT
Covariance Coverage
           BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
BMI97      0.925
BMI98      0.847   0.902
BMI99      0.850   0.856   0.910
BMI00      0.842   0.846   0.864   0.906
BMI01      0.839   0.837   0.854   0.859   0.904
BMI02      0.796   0.794   0.805   0.811   0.817   0.861
BMI03      0.777   0.775   0.788   0.788   0.801   0.774   0.840
Check means to see if there is a clear overall trajectory
SAMPLE STATISTICS
ESTIMATED SAMPLE STATISTICS
Means
           BMI97    BMI98    BMI99    BMI00    BMI01    BMI02    BMI03
          20.572   21.839   22.651   23.305   23.846   24.390   24.935
Correlations
           BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
BMI97      1.000
BMI98      0.764   1.000
BMI99      0.765   0.850   1.000
BMI00      0.721   0.812   0.853   1.000
BMI01      0.709   0.799   0.853   0.856   1.000
BMI02      0.652   0.720   0.745   0.752   0.813   1.000
BMI03      0.651   0.707   0.737   0.751   0.815   0.766   1.000
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
    Value                              268.041
    Degrees of Freedom                      23
    P-Value                             0.0000
Chi-Square Test of Model Fit for the Baseline Model
    Value                            11502.912
    Degrees of Freedom                      21
    P-Value                             0.0000
CFI/TLI
    CFI                                  0.979
    TLI                                  0.981
Loglikelihood
    H0 Value                        -27739.720
    H1 Value                        -27605.699
Information Criteria
    Number of Free Parameters               12
    Akaike (AIC)                     55503.439
    Bayesian (BIC)                   55569.171
    Sample-Size Adjusted BIC         55531.048
      (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
    Estimate                             0.078
    90 Percent C.I.                      0.069  0.086
    Probability RMSEA <= .05             0.000
SRMR (Standardized Root Mean Square Residual)
    Value                                0.051
MODEL RESULTS
                                                Two-Tailed
               Estimate     S.E.   Est./S.E.      P-Value
The intercept and slope are fixed so there is no test for them.
I        |
  BMI97           1.000    0.000     999.000      999.000
  BMI98           1.000    0.000     999.000      999.000
  BMI99           1.000    0.000     999.000      999.000
  BMI00           1.000    0.000     999.000      999.000
  BMI01           1.000    0.000     999.000      999.000
  BMI02           1.000    0.000     999.000      999.000
  BMI03           1.000    0.000     999.000      999.000
S        |
  BMI97           0.000    0.000     999.000      999.000
  BMI98           1.000    0.000     999.000      999.000
  BMI99           2.000    0.000     999.000      999.000
  BMI00           3.000    0.000     999.000      999.000
  BMI01           4.000    0.000     999.000      999.000
  BMI02           5.000    0.000     999.000      999.000
  BMI03           6.000    0.000     999.000      999.000
Intercept and slope have significant covariance
S        WITH
  I               0.408    0.073       5.559        0.000
Means
  I              21.035    0.100     210.352        0.000
  S               0.701    0.017      40.663        0.000
Growth curve is BMI’ = 21.035 + .701×Year
Intercepts
  BMI97           0.000    0.000     999.000      999.000
  BMI98           0.000    0.000     999.000      999.000
  BMI99           0.000    0.000     999.000      999.000
  BMI00           0.000    0.000     999.000      999.000
  BMI01           0.000    0.000     999.000      999.000
  BMI02           0.000    0.000     999.000      999.000
  BMI03           0.000    0.000     999.000      999.000
Variances
  I              15.051    0.597      25.209        0.000
  S               0.255    0.018      14.228        0.000
There is a big random intercept and random slope effect. The standard
deviation is sqrt(.255) = .50. Putting plus or minus two standard deviations
around the slope of .70 shows how big the variance is. The standard
deviation for the intercept is sqrt(15.051) = 3.880. BMI is probably skewed
positively.
Residual Variances
  BMI97           5.730    0.268      21.413        0.000
  BMI98           3.276    0.164      19.942        0.000
  BMI99           3.223    0.146      22.009        0.000
  BMI00           4.361    0.185      23.538        0.000
  BMI01           2.845    0.150      19.005        0.000
  BMI02           9.380    0.397      23.622        0.000
  BMI03           8.589    0.422      20.345        0.000
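The annotations in the model results can be turned into worked numbers. Using only the estimates reported above, the implied mean BMI at each end of the study and a rough plus-or-minus two standard deviation band for individual slopes are shown below. The band treats the individual slopes as approximately normal, which is an assumption made here for illustration.

```latex
\begin{align*}
\widehat{\mathrm{BMI}}_t &= 21.035 + 0.701\,t, \qquad t = 0, \dots, 6 \\
\widehat{\mathrm{BMI}}_0 &= 21.035 \quad (\text{sample mean } 20.572) \\
\widehat{\mathrm{BMI}}_6 &= 21.035 + 0.701 \times 6 = 25.241 \quad (\text{sample mean } 24.935) \\
\sqrt{0.255} &\approx 0.505, \qquad 0.701 \pm 2(0.505) \approx (-0.31,\ 1.71)
\end{align*}
```

Both ends of the fitted line sit above the sample means, and the band for the slope runs from a slight decline to a gain of about 1.7 BMI points per year.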
QUALITY OF NUMERICAL RESULTS
Condition Number for the Information Matrix
(ratio of smallest to largest eigenvalue)
0.656E-02
MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index     3.840
We don’t want to change the intercept loadings of 1.0. We might think about
a nonlinear growth. We might think about correlating adjacent error terms.
The suggested correlation between E1 and E7 indicates a straight line is
missing both ends, hence a curve of some kind? Muthen suggests to try not to
mess with the intercepts.
                          M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.
BY Statements
I      BY   BMI97      112.472     -0.038      -0.147        -0.032
I      BY   BMI98        6.440      0.007       0.027         0.006
I      BY   BMI99       33.234      0.014       0.054         0.012
I      BY   BMI00       13.026      0.010       0.037         0.008
I      BY   BMI02        4.015     -0.008      -0.032        -0.005
I      BY   BMI03       28.212     -0.023      -0.091        -0.015
S      BY   BMI97       70.828     -0.825      -0.417        -0.091
S      BY   BMI99       18.208      0.276       0.139         0.030
S      BY   BMI00        8.062      0.204       0.103         0.021
S      BY   BMI03       38.314     -0.755      -0.382        -0.062
WITH Statements
BMI99  WITH BMI98       12.747      0.449       0.449         0.138
BMI00  WITH BMI97        9.699     -0.511      -0.511        -0.102
BMI00  WITH BMI99       26.084      0.641       0.641         0.171
BMI01  WITH BMI97        6.914     -0.388      -0.388        -0.096
BMI01  WITH BMI98       11.566     -0.403      -0.403        -0.132
BMI01  WITH BMI00        5.456      0.310       0.310         0.088
BMI02  WITH BMI97        8.645      0.715       0.715         0.098
BMI02  WITH BMI99        9.066     -0.544      -0.544        -0.099
BMI02  WITH BMI00        9.560     -0.633      -0.633        -0.099
BMI03  WITH BMI97       37.342      1.564       1.564         0.223
BMI03  WITH BMI99       22.526     -0.874      -0.874        -0.166
BMI03  WITH BMI00       11.717     -0.724      -0.724        -0.118
BMI03  WITH BMI02       11.053      1.083       1.083         0.121
Means/Intercepts/Thresholds
[ BMI97 ]               97.476     -0.754      -0.754        -0.165
[ BMI98 ]                7.230      0.155       0.155         0.035
[ BMI99 ]               25.098      0.257       0.257         0.056
[ BMI00 ]               10.542      0.185       0.185         0.038
[ BMI02 ]                4.646     -0.189      -0.189        -0.032
[ BMI03 ]               22.536     -0.448      -0.448        -0.073
There are a number of plots available. These are not bad, but Stata or some
other package, even Excel, could do nicer graphs.
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated values)
Scatterplots (sample values, estimated factor scores, estimated values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
2.5 Here are Some Available Plots
It is often useful to show the actual means for a small random sample of participants. These are
Sample Means.
 Click on Graphs
 Observed Individual Values
This gives you a menu where you can make some selections. I used the clock to seed a random
generation of observations.
Here I selected Random Order and 20 cases. This results in the following graph:
This shows one person who started at an obese BMI = 30 and then dropped down. However, most
people increased gradually.
Next, let’s look at a plot of the actual means and the estimated means using our linear growth
model. Click on
 Graphs and then select View graphs
 Sample and estimated means.
 Demonstrate how to edit the graph.
Notice that there is a clear growth trend in BMI. A BMI of 15-20 is considered healthy and a BMI
of 25 is considered overweight. Notice what happens to American youth between the age of 12
and the age of 18.
Section 3: Quadratic Growth Curve
This graph is useful for seeing if there is a nonlinear trend. Changing the scale of the Y-axis can
clarify this. It is simple to add a quadratic term, if the curve is departing from linearity.
 Looking at the graph it may seem that the linear trend works very well, but our RMSEA
was a bit big.
 The estimated initial BMI is higher than the observed mean.
 The estimated BMI at 2003 is also higher than the observed mean
 A quadratic might pick this up by having a curve that drops slightly to pick up the BMI97
mean and the BMI2003 mean.
 Estimation of three terms (Intercept, Linear trend, Quadratic trend) requires at least four
waves of data, but more waves are highly desirable for a good test of the quadratic term.
3.1 Graphic representation of quadratic growth curve
The conceptual model in Figure 1 will be unchanged except a third latent variable is added.
 We will have the Intercept, Slope, now called linear trend, and the new latent variable
called the Quadratic trend.
 Like the first two, the Quadratic trend will have a residual variance (R3) that will freely
correlate with R1 and R2.
 The paths from the quadratic trend to the individual BMI variables will be the square of
the path from the Linear trend to the BMI variables. Hence
a. The values for the linear trend will remain 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0.
b. For the quadratic these values will be 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, and 36.0.
You really appreciate the defaults in Mplus when you see what we need to change in the Mplus
program when we add a quadratic slope. Here is the only change we need to make:
3.2 Mplus program & output for quadratic growth curve
Model:
i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Mplus will know that the quadratic, q (we could use any name) will have values that are the
square of the values for the slope, s.
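To see what the single growth line stands for, here is a hedged long-hand sketch that should be equivalent under the Mplus defaults for BY statements. It is shown only to make the squared time scores visible and is not part of the original program.

```mplus
Model:
    ! Long-hand version of: i s q | bmi97@0 bmi98@1 ... bmi03@6;
    i by bmi97-bmi03@1;
    s by bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
    ! Quadratic loadings are the squares of the linear time scores.
    q by bmi97@0 bmi98@1 bmi99@4 bmi00@9 bmi01@16 bmi02@25 bmi03@36;
    ! Fix the observed intercepts at 0 and free the growth factor means,
    ! which the | syntax does automatically.
    [bmi97-bmi03@0];
    [i s q];
```

The variances of i, s, and q and their covariances are free by default in both versions.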
Title:
    bmi_guadratic.inp
    Quadratic growth curve
Data:
    File is "C:\Mplus examples\bmi_stata.dat" ;
Variable:
    Names are
        id grlprb_y boyprb_y grlprb_p boyprb_p
        male race_eth bmi97 bmi98 bmi99 bmi00
        bmi01 bmi02 bmi03 white black hispanic
        asian other;
    Missing are all (-9999) ;
    ! usevariables is limited to bmi variables
    Usevariables are bmi97 bmi98 bmi99 bmi00
        bmi01 bmi02 bmi03 ;
Model:
    i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3
        bmi01@4 bmi02@5 bmi03@6;
Output:
    Sampstat Mod(3.84);
Plot:
    Type is Plot3;
    Series = bmi97 bmi98 bmi99 bmi00 bmi01
        bmi02 bmi03(*);
Here is selected output:
TESTS OF MODEL FIT
We had 23 degrees of freedom with the linear growth curve and a chi-square
of 268.041. Now we have 19 degrees of freedom and a chi-square of 73.121.
Where did we lose four degrees of freedom?
 Mean for the quadratic
 Variance of the quadratic
 Covariance of quadratic residual with intercept residual
 Covariance of quadratic residual with slope residual
Did we improve our fit?
 268.041-73.121 = 194.92 with 4 degrees of freedom, p <
.001
Does our model fit?
 Chi-square (19) = 73.121, p < .001, but
 CFI = .995
 RMSEA = .040
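The comparison of the linear and quadratic models above is a chi-square difference test for nested models. Written out with the values from the two runs, and using the standard .001 critical value for 4 degrees of freedom:

```latex
\begin{align*}
\Delta\chi^2 &= 268.041 - 73.121 = 194.920 \\
\Delta df &= 23 - 19 = 4 \\
\chi^2_{.001}(4) &= 18.47 < 194.920 \quad\Rightarrow\quad p < .001
\end{align*}
```

The four degrees of freedom correspond to the increase from 12 to 16 free parameters reported in the two outputs.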
Chi-Square Test of Model Fit
    Value                               73.121
    Degrees of Freedom                      19
    P-Value                             0.0000
Chi-Square Test of Model Fit for the Baseline Model
    Value                            11502.912
    Degrees of Freedom                      21
    P-Value                             0.0000
CFI/TLI
    CFI                                  0.995
    TLI                                  0.995
Loglikelihood
    H0 Value                        -27642.260
    H1 Value                        -27605.699
Information Criteria
    Number of Free Parameters               16
    Akaike (AIC)                     55316.520
    Bayesian (BIC)                   55404.161
    Sample-Size Adjusted BIC         55353.330
      (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
    Estimate                             0.040
    90 Percent C.I.                      0.031  0.050
    Probability RMSEA <= .05             0.949
SRMR (Standardized Root Mean Square Residual)
    Value                                0.026
MODEL RESULTS
                                                Two-Tailed
               Estimate     S.E.   Est./S.E.      P-Value
S        WITH
  I               0.550    0.226       2.441        0.015
Q        WITH
  I              -0.030    0.036      -0.854        0.393
  S              -0.159    0.022      -7.236        0.000
Means
  I              20.713    0.101     204.728        0.000
  S               1.060    0.044      23.834        0.000
  Q              -0.063    0.007      -8.585        0.000
Variances
  I              14.273    0.638      22.382        0.000
  S               1.141    0.139       8.184        0.000
  Q               0.029    0.004       7.730        0.000
Residual Variances
  BMI97           4.635    0.306      15.132        0.000
  BMI98           3.340    0.162      20.643        0.000
  BMI99           2.852    0.143      19.954        0.000
  BMI00           3.994    0.182      21.926        0.000
  BMI01           2.880    0.154      18.762        0.000
  BMI02           9.343    0.394      23.690        0.000
  BMI03           5.677    0.507      11.192        0.000
MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index     3.840
                          M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.
BY Statements
I      BY   BMI97       24.292     -0.024      -0.090        -0.021
I      BY   BMI98        9.860      0.008       0.032         0.007
I      BY   BMI99        5.419      0.006       0.022         0.005
I      BY   BMI01       12.777     -0.009      -0.035        -0.007
I      BY   BMI03       14.857      0.024       0.092         0.016
S      BY   BMI97       18.253     -0.363      -0.388        -0.089
S      BY   BMI98        7.381      0.126       0.135         0.031
S      BY   BMI01        9.168     -0.137      -0.147        -0.029
S      BY   BMI03       10.308      0.349       0.373         0.063
Q      BY   BMI97       12.444      3.442       0.589         0.136
Q      BY   BMI99        4.868     -1.114      -0.191        -0.041
Q      BY   BMI01       11.767      1.725       0.295         0.058
Q      BY   BMI03       13.934     -5.610      -0.961        -0.163
ON/BY Statements
Q      ON   I  /
I      BY   Q          999.000      0.000       0.000         0.000
WITH Statements
BMI98  WITH BMI97       11.493     -1.044      -1.044        -0.265
BMI99  WITH BMI98        8.019      0.361       0.361         0.117
BMI01  WITH BMI98        8.978     -0.354      -0.354        -0.114
BMI02  WITH BMI01       12.482      0.694       0.694         0.134
BMI03  WITH BMI02        5.261     -1.040      -1.040        -0.143
Means/Intercepts/Thresholds
[ BMI97 ]               23.635     -0.492      -0.492        -0.113
[ BMI98 ]               11.403      0.191       0.191         0.043
[ BMI01 ]                9.093     -0.166      -0.166        -0.032
[ BMI03 ]               13.777      0.495       0.495         0.084
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated values)
Scatterplots (sample values, estimated factor scores, estimated values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
3.3 Plots for quadratic growth curve
 The fit is so good because the estimated means and observed means are so close.
 However, there is still significant variance (random effects for both the intercept and the
slope) among individual adolescents that needs to be explained.
 Here are 20 estimated individual growth curves.
a. Notice that each of these is a curve, but they start at different initial levels and have
different trajectories.
b. Next, we want to use covariates to explain these differences in the initial levels and growth
trajectories.
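The closeness of the estimated and observed means noted in the first point can be checked by hand from the quadratic output. The following worked values use only the estimated means of I, S, and Q and the sample means reported earlier:

```latex
\begin{align*}
\widehat{\mathrm{BMI}}_t &= 20.713 + 1.060\,t - 0.063\,t^2 \\
t = 0 &: \quad 20.713 \quad (\text{sample mean } 20.572) \\
t = 3 &: \quad 20.713 + 3.180 - 0.567 = 23.326 \quad (\text{sample mean } 23.305) \\
t = 6 &: \quad 20.713 + 6.360 - 2.268 = 24.805 \quad (\text{sample mean } 24.935)
\end{align*}
```

The negative quadratic mean bends the curve downward, so the yearly gain shrinks from roughly 1.0 BMI point at age 12 to about 0.4 by age 18.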
Section 4: How Many Waves Should We Have?
4.1 Linear model--3 minimum, 4 much better
In this example we have 7 waves of data and this will give us lots of degrees of freedom. What is
the minimum?
Consider degrees of freedom for 3 waves of data.
Degrees of freedom are the difference between the number of parameters in the H1: model, which
is unrestricted and simply reproduces the sample means, variances, and covariances, and the
number of parameters estimated in your H0: model.
We are estimating a number of parameters—How many?
H1: model (unrestricted) for 3 waves has
    3 means:        My1, My2, My3
    3 variances:    Var(Y1), Var(Y2), Var(Y3)
    3 covariances:  Cov(Y1,Y2), Cov(Y1,Y3), Cov(Y2,Y3)
    Total: 9 known statistics
H0: model (simple growth curve) has
The figure shows the following parameters:
    a.        1 Variance for Intercept
    b.        1 Variance for Slope
    c.        1 Covariance of Slope and Intercept
    d.        1 Mean of Intercept
    e.        1 Mean of Slope
    f, g, h.  3 Error Variances
We need to estimate 8 parameters.
Therefore, we have 9 - 8 = 1 degree of freedom.
 We could not fit a quadratic. It would require four more parameters: its mean, its variance,
and its covariances with the intercept and the slope.
 We can only free one parameter such as a covariance of the error terms or a loading for a wave
on the slope.
What about 4 waves?
H1: model (unrestricted) for 4 waves has
    4 means:        My1, My2, My3, My4
    4 variances:    Var(Y1), Var(Y2), Var(Y3), Var(Y4)
    6 covariances:  Cov(Y1,Y2), Cov(Y1,Y3), Cov(Y1,Y4), Cov(Y2,Y3), Cov(Y2,Y4), Cov(Y3,Y4)
    Total: 14 known statistics
We now estimate 9 parameters (the same 5 growth factor parameters plus 4 error variances
instead of 3), so we have 14 - 9 = 5 degrees of freedom. Adding a 4th wave provides a much
better test of a linear model.
Rule—Publish it with three waves, but always try to get 4 or more waves of data.
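The counting above generalizes to any number of waves T. As a worked sketch (the symbol T is introduced here for illustration), the linear model has 2 means, 2 variances, and 1 covariance for the growth factors plus T error variances:

```latex
\begin{align*}
\text{known statistics} &= T + \frac{T(T+1)}{2} = \frac{T(T+3)}{2} \\
\text{free parameters (linear)} &= 5 + T \\
df &= \frac{T(T+3)}{2} - (T + 5) \\
T = 3 &: \; 9 - 8 = 1 \qquad T = 4: \; 14 - 9 = 5 \qquad T = 7: \; 35 - 12 = 23
\end{align*}
```

The T = 7 case reproduces the 12 free parameters and 23 degrees of freedom reported in the output for the linear BMI model.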
4.2 For quadratic—4 minimum, 5 much better
 We can follow the same procedure and see that we need to have 4 waves for a quadratic
 We have a much better test (degrees of freedom) if we have 5 waves for a quadratic
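The same counting can be applied to the quadratic model. With T waves (T is notation introduced here), the quadratic model has 3 means, 3 variances, and 3 covariances for the growth factors plus T error variances:

```latex
\begin{align*}
\text{free parameters (quadratic)} &= 9 + T \\
df &= \frac{T(T+3)}{2} - (T + 9) \\
T = 3 &: \; 9 - 12 = -3 \quad (\text{not identified}) \\
T = 4 &: \; 14 - 13 = 1 \qquad T = 5: \; 20 - 14 = 6 \qquad T = 7: \; 35 - 16 = 19
\end{align*}
```

The T = 7 case matches the 16 free parameters and 19 degrees of freedom in the quadratic BMI output.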
Section 5: Alternative to Use of a Quadratic Slope
An alternative to adding a quadratic slope is to allow some of the time loadings for
the slope to be free.
 We have used loadings of 0, 1, 2, 3, 4, 5, and 6 for the linear slope and 0, 1, 4,
9, 16, 25, and 36 for the quadratic slope. Alternatively
 We could allow all but two of the loadings to be free. We might use loadings of
0, 1, *, *, *, * .
 It is necessary to have the 0 and 1 fixed but the 1 does not have to be second;
we could use 0, *, *, *, *,1.
You may ask how you could justify allowing some of the time loadings to be free if
there was a one month or one year difference between waves of data. The answer is
that developmental time may be different than chronological time.
Allowing these loadings to be free has an advantage over the quadratic in that it uses
fewer degrees of freedom but still allows for growth spurts.
This model is not nested under a quadratic, but you could think of a linear growth
model with fixed values for each year (0, 1, 2, 3, 4, 5, 6) being nested within the free
model that uses 0, 1, *, *, *, *. If the free model fits much better than the fixed linear
model, you might use this instead of the quadratic model.
This approach does not impose a specific form on the relationship. It is a free form
that can connect the means, however complex their pattern.
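To see why the free form can connect the means, consider only the mean structure: the model-implied mean at a wave is the intercept mean plus that wave's loading times the slope mean. With the first two loadings fixed at 0 and 1, the remaining loadings can be chosen to reproduce any pattern of means. The sketch below uses the sample means for girls reported in Section 7.1. It is an illustration of the mean structure only; the function name is ours, and the loadings Mplus actually estimates also depend on the variances and covariances, so they will differ somewhat.

```python
def implied_loadings(means):
    """Loadings that reproduce the wave means when the first two are fixed at 0 and 1."""
    intercept_mean = means[0]            # loading 0: the first wave defines the intercept
    slope_mean = means[1] - means[0]     # loading 1: one unit of "time" is the first change
    return [(m - intercept_mean) / slope_mean for m in means]


# Sample means for girls, bmi97 through bmi03 (Section 7.1 output).
female_means = [20.432, 21.840, 22.375, 22.916, 23.443, 24.295, 24.727]

loadings = implied_loadings(female_means)
for year, loading in zip(range(1997, 2004), loadings):
    print(year, round(loading, 3))
```

If the resulting loadings are far from 0, 1, 2, 3, 4, 5, 6, developmental time is not running evenly with chronological time, which is what the free-loading model is designed to capture.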
Section 6: Working with Missing Values
6.1 Two approaches used by Mplus
Mplus has two ways of working with missing values.
• The simplest is to use maximum likelihood estimation with missing values (ML).
o This uses all available data and has been the default since version 5.0.
o For example, some adolescents were interviewed all six years but others may have
skipped one, two, or even more years.
o We use all available information with this approach.
• The second approach is to use multiple imputation.
o Multiple imputation should not be confused with the single imputation available in
earlier versions of SPSS, which gives incorrect standard errors.
o Multiple imputation involves imputing multiple datasets (usually 5-20),
o estimating the model for each of these datasets, and
o then pooling the estimates and standard errors.
When the standard errors are pooled this way, they incorporate the variability across the 5-20
solutions and thereby produce unbiased estimates of the standard errors. Multiple imputation
can be done with:
• Norm, a freeware program that works for normally distributed, continuous variables and is
often used even on dichotomized variables.
• ice, a user-written Stata program that implements the S-Plus/R
program MICE and has advantages over Norm. It does the imputation by using different
estimation models for outcome variables that are continuous, counts, or categorical. See
Royston (2005).
• SAS, which has similar capabilities.
Mplus can read these multiple datasets, estimate the model for each dataset, and pool the
estimates and their standard errors.
We will not illustrate the multiple imputation approach because that involves working with other
programs to impute the datasets. However, the Mplus User’s Guide discusses how you specify the
datasets in the Data: section.
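The pooling step can be sketched with the standard combining rules for multiple imputation (often called Rubin's rules): average the estimates, then add the average squared standard error to an inflated between-imputation variance. The slope estimates and standard errors below are hypothetical numbers invented for the illustration, and the function name is ours; we are not showing Mplus's internal code.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Pool one parameter across m imputed datasets using Rubin's combining rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors.
    within = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance: how much the estimates move across datasets.
    between = sum((e - pooled) ** 2 for e in estimates) / (m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)


# Hypothetical slope estimates and standard errors from 5 imputed datasets.
slopes = [0.66, 0.70, 0.68, 0.71, 0.65]
slope_ses = [0.025, 0.026, 0.024, 0.025, 0.025]

pooled_slope, pooled_se = pool_rubin(slopes, slope_ses)
print(round(pooled_slope, 3), round(pooled_se, 3))
```

The pooled standard error is never smaller than the typical single-dataset standard error, which is why single imputation understates uncertainty.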
6.2 Multiple cohort extension
Major datasets often have multiple cohorts. NLSY97 has youth who were 12-18 in 1997.
• Seven years later, they are 19-25.
• It is quite likely that many growth processes that involve going from the age of 12 to the age
of 19 are different from those that involve going from 19 to 25.
• For example, involvement in minor crimes (petty theft, etc.) may increase from 12 to 19, but
then decrease from there to 25.
• Here is what we might have for our NLSY97 data. The entries in the table are scores. For example,
person 1, born in 1985, had a score of 3 on the outcome variable in 1997 at the age of 12.
Score by survey year for a single case from each cohort
                                        Survey Year
Individual  Birth Cohort  1997  1998  1999  2000  2001  2002  2003
    1           1985        3     4     5     6     7     7     8
    2           1985        2     4     3     5     6     7     7
    3           1984        4     5     6     7     6     6     5
    4           1982        6     7     5     4     3     2     2
    5           1982        5     5     6     4     2     2     1
We can rearrange this data by age (* = not observed at that age):
Data for first 5 cases
                              Age
Case  Cohort   12   13   14   15   16   17   18   19   20   21
  1    1985     3    4    5    6    7    7    8    *    *    *
  2    1985     2    4    3    5    6    7    7    *    *    *
  3    1984     *    4    5    6    7    6    6    5    *    *
  4    1982     *    *    *    6    7    5    4    3    2    2
  5    1982     *    *    *    5    5    6    4    2    2    1
• In this table the top row is the age at which the data were collected. To capture everybody we
would need to extend the table to age 25 because the youth who were 18 in 1997 are 25 seven
years later.
• This table would have massive amounts of missing data, but the missingness would be by design and
not related to other variables. It would be missing completely at random (MCAR).
• We could develop a growth curve that covered the full range from age 12 to age 25. We would
have 14 waves of data even though each participant was measured only 7 times. Each
participant would have data for a maximum of 7 of the years and have missing values for a
minimum of 7 years.
• We would want to estimate a growth model with a quadratic term and expect the linear slope to
be positive (growth from 12-18) and the quadratic term to be negative (decline from 18-25).
• Mplus has a special Analysis: type called MCOHORT. There is an example on the Mplus
web page and we will not cover it here. This is an extraordinary way to deal with missing
values.
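The rearrangement from survey year to age is simple bookkeeping: age at each wave is the survey year minus the birth year. Here is a minimal sketch using the five cases from the tables above. The function name and the "*" placeholder are ours; in a real Mplus data file the unobserved ages would need a numeric missing-value code such as -9999.

```python
FIRST_SURVEY_YEAR = 1997


def to_age_layout(birth_year, scores, ages, missing="*"):
    """Re-index one person's yearly scores by age; unobserved ages get the missing flag."""
    by_age = {
        FIRST_SURVEY_YEAR + wave - birth_year: score
        for wave, score in enumerate(scores)
    }
    return [by_age.get(age, missing) for age in ages]


# (birth cohort, scores for 1997 through 2003) for the five cases in the table.
cases = [
    (1985, [3, 4, 5, 6, 7, 7, 8]),
    (1985, [2, 4, 3, 5, 6, 7, 7]),
    (1984, [4, 5, 6, 7, 6, 6, 5]),
    (1982, [6, 7, 5, 4, 3, 2, 2]),
    (1982, [5, 5, 6, 4, 2, 2, 1]),
]
ages = list(range(12, 22))

print("Case Cohort", *ages)
for case_id, (cohort, scores) in enumerate(cases, start=1):
    print(case_id, cohort, *to_age_layout(cohort, scores, ages))
```

Each person still contributes only 7 scores, but together the cohorts cover ages 12 through 21 in this small example.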
Here is an example from data Muthén analyzed.
• He had 7 waves of data on people who were 18-24 at the first wave.
• No data were collected in 6 of the years.
• He has a growth curve from 18 to 37.
When I copied this image, a couple of waves that were grayed out do not show very well here.
Section 7: Multiple group growth curves
Multiple group analysis using SEM is extremely flexible. Some would say it is too flexible
because there are so many possibilities.
• We use gender for our grouping variable because we are interested in the trend in BMI for
girls compared to boys.
• We think adolescent girls are more concerned about their weight and are therefore more
likely to have a lower BMI than boys and to have a flatter trajectory.
There are several ways of comparing a model across multiple groups.
One approach is to see if the same model fits each group, allowing all of the estimated parameters
to be different.
Here we are saying that a linear growth model fits the data for both boys and girls, but we are not
constraining girls and boys to have the same values on any of the parameters. They may differ on
the
• intercept mean
• slope mean
• intercept variance
• slope variance
• covariance of the intercept and slope
• residual error variances
• any covariances of the residual errors that may be specified.
We can then put increasing invariance constraints on the model.
a. At a minimum, we want to test whether the two groups have a different intercept (level) and
slope.
b. If this constraint is acceptable, we can add additional constraints on the variances,
covariances, and error terms.
7.1 Program and output without constraints
First, we will estimate the model simultaneously for girls and boys with no constraints on the
parameters. Here is the program with new commands highlighted:
Title:
  bmi_growth_gender.inp
Data:
  File is bmi_stata.dat ;
Variable:
  Names are
    id grlprb_y boyprb_y grlprb_p boyprb_p male
    race_eth bmi97 bmi98 bmi99 bmi00 bmi01 bmi02
    bmi03 white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables keeps bmi variables and gender
  Usevariables are male bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 ;
  Grouping is male (0=female 1=male);
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4
    bmi02@5 bmi03@6;
Output:
  Sampstat Mod(3.84) ;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
The only changes we need to make are in the Usevariables list and the new Grouping subcommand
(these were in bold and underlined in the original handout).
• We have a binary variable, male, that is coded 0 for females and 1 for males.
• We add male to the list of variables we are using.
• We add a subcommand to the Variable: section that says we have a grouping variable,
names it, and defines what the values are so the output will be labeled nicely.
• The command Grouping is male (0=female 1=male); is going to give us a
separate set of estimates for the parameters for girls (labeled female) and boys (labeled
male).
• The estimation does both groups simultaneously.
Here is selected, annotated output:
SUMMARY OF ANALYSIS
Number of groups                                        2
Number of observations
   Group FEMALE                                       859
   Group MALE                                         909
The following shows that we have the same variables in the model
Number of dependent variables                           7
Number of independent variables                         0
Number of continuous latent variables                   2
Observed dependent variables
  Continuous
   BMI97    BMI98    BMI99    BMI00    BMI01    BMI02    BMI03
Continuous latent variables
   I        S
Variables with special functions
  Grouping variable     MALE
SAMPLE STATISTICS
ESTIMATED SAMPLE STATISTICS FOR FEMALE
   Means
      BMI97    BMI98    BMI99    BMI00    BMI01    BMI02    BMI03
     20.432   21.840   22.375   22.916   23.443   24.295   24.727
ESTIMATED SAMPLE STATISTICS FOR MALE
   Means
      BMI97    BMI98    BMI99    BMI00    BMI01    BMI02    BMI03
     20.698   21.848   22.896   23.665   24.220   24.467   25.111
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
   Value                            411.966
   Degrees of Freedom                    46   (was 23 with one group)
   P-Value                           0.0000
Chi-Square Contributions From Each Group
   FEMALE                           150.775
   MALE                             261.191
Chi-Square Test of Model Fit for the Baseline Model
   Value                          11735.530
   Degrees of Freedom                    42
   P-Value                           0.0000
CFI/TLI
   CFI                                0.969
   TLI                                0.971
Loglikelihood
   H0 Value                      -27639.607
   H1 Value                      -27433.624
Information Criteria
   Number of Free Parameters             24
   Akaike (AIC)                   55327.213
   Bayesian (BIC)                 55458.676
   Sample-Size Adjusted BIC       55382.430
     (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
   Estimate                           0.095
   90 Percent C.I.                    0.087  0.103
SRMR (Standardized Root Mean Square Residual)
   Value                              0.072
MODEL RESULTS
                                                    Two-Tailed
                     Estimate       S.E.  Est./S.E.    P-Value
Group FEMALE
 S        WITH
    I                   0.522      0.103      5.050      0.000
 Means
    I                  20.881      0.143    145.640      0.000
    S                   0.663      0.025     27.015      0.000
 Variances
    I                  15.141      0.859     17.626      0.000
    S                   0.264      0.026     10.221      0.000
 Residual Variances
    BMI97               4.662      0.334     13.980      0.000
    BMI98               3.368      0.242     13.940      0.000
    BMI99               2.753      0.190     14.503      0.000
    BMI00               5.154      0.308     16.718      0.000
    BMI01               3.084      0.226     13.649      0.000
    BMI02              13.344      0.769     17.360      0.000
    BMI03               6.105      0.517     11.812      0.000
Group MALE
 S        WITH
    I                   0.278      0.102      2.719      0.007
 Means
    I                  21.180      0.139    152.166      0.000
    S                   0.732      0.024     30.661      0.000
 Variances
    I                  14.911      0.824     18.094      0.000
    S                   0.254      0.025     10.292      0.000
 Residual Variances
    BMI97               6.693      0.417     16.066      0.000
    BMI98               3.237      0.227     14.279      0.000
    BMI99               3.671      0.223     16.487      0.000
    BMI00               3.730      0.224     16.656      0.000
    BMI01               2.489      0.185     13.434      0.000
    BMI02               5.416      0.357     15.190      0.000
    BMI03              10.857      0.676     16.063      0.000
Here is the graph of the two growth curves. It appears that the girls have a lower initial level and a
flatter rate of growth of BMI.
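Several of the fit statistics in the output above can be reproduced by hand from the two loglikelihoods, the number of free parameters, and the sample size. The sketch below uses the usual definitions (chi-square as twice the loglikelihood difference, AIC and BIC as penalized -2 loglikelihood) together with the n* = (n + 2) / 24 adjustment printed in the output. The function name is ours. Small differences in the third decimal are rounding, since the loglikelihoods are printed to only three places.

```python
import math


def fit_from_loglik(ll_h0, ll_h1, n_params, n_obs):
    """Chi-square, AIC, BIC and sample-size adjusted BIC from the model loglikelihoods."""
    chi_square = 2 * (ll_h1 - ll_h0)           # H1 is the unrestricted model
    deviance = -2 * ll_h0
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n_obs)
    adj_bic = deviance + n_params * math.log((n_obs + 2) / 24)
    return chi_square, aic, bic, adj_bic


# Values from the two-group output: 859 girls + 909 boys, 24 free parameters.
chi_square, aic, bic, adj_bic = fit_from_loglik(
    ll_h0=-27639.607, ll_h1=-27433.624, n_params=24, n_obs=859 + 909
)
print(round(chi_square, 3), round(aic, 3), round(bic, 3), round(adj_bic, 3))
```

The degrees of freedom follow the same counting logic as in Section 4: each group has 35 known statistics, so 2 x 35 - 24 = 46.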
7.2 Comparing intercept and slope
We should not rely on our visual inspection, but should explicitly test whether the girls and boys
have a significant difference in their intercept and their slope. We can re-estimate the model with
the intercept and slope invariant (or do it twice so we could have separate tests.) To do this we
make the following modifications to the model:
We add two Model: subcommands, one for each group, after the growth model statement:
Title:
  bmi_growth_gender_equal.inp
Data:
  File is bmi_stata.dat ;
Variable:
  Names are
    id grlprb_y boyprb_y grlprb_p boyprb_p male
    race_eth bmi97 bmi98 bmi99 bmi00 bmi01 bmi02
    bmi03 white black hispanic asian other;
  Missing are all (-9999) ;
  Usevariables are male bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 ;
  Grouping is male (0=female 1=male);
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4
    bmi02@5 bmi03@6;
Model:        ! models group with lowest score, 0, female
  [i] (1);
  [s] (2);
Model male:
  [i] (1);    ! this makes intercept be the same
  [s] (2);
Output:
  Sampstat Mod(3.84) ;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02
    bmi03(*);
We kept the Grouping subcommand for the grouping variable, male. We added two Model:
subcommands under the Model: command. The first one refers to the first group.
• Since girls were coded 0 and boys were coded 1, the first group is girls. We put the names of the
parameters in square brackets, [i] and [s].
• If we had called these initial and trend, we would have typed [initial] and [trend].
• We put an arbitrary number in parentheses after the parameter name. Thus, we put (1)
after [i].
In the second subcommand, Model male:, we put the names of the parameters followed by the same
numbers as they had in the first group. Thus, the intercept gets the number 1 for girls and also gets
the number 1 for boys. This tells Mplus these must be held equal. They are still optimized, but
with the constraint that they are equal.
• If we had typed [i] (2) under Model male:, what would happen? We would have
constrained the boys' intercept to be equal to the girls' slope, which is not something we would want
to do.
• If we had omitted the [s] (2) under Model male:, what would happen? We would have
constrained both solutions to have the same intercept, [i] (1), but not constrained them to
have the same slope.
• The first Model: subcommand is understood to refer to the group coded as zero on the male variable.
• These changes force the intercepts to be equal in both groups because they are both assigned
parameter (1) and the slopes to be equal because they are both assigned parameter (2).
• Any parameters with a (1) after them are equal in both groups, as are any parameters with a
(2) after them.
• Notice that we have square brackets [ ] around the names of the intercept and slope. In Mplus,
square brackets refer to means (or intercepts) rather than variances.
When we run the revised program we obtain a chi-square that has two extra degrees of freedom
because of the two constraints.
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
   Value                            418.884
   Degrees of Freedom                    48
   P-Value                           0.0000
We had a chi-square(46) = 411.966 without these constraints. The
difference has a chi-square of 6.918 with 2 degrees of freedom.
Using Stata the significance is chi-square(2) = 6.918, p < .05
. display 1-chi2(2,6.918)
.03146121
Chi-Square Contributions From Each Group
The model fits females much better than it fits males:
   FEMALE                           154.530
   MALE                             264.353
Chi-Square Test of Model Fit for the Baseline Model
   Value                          11735.530
   Degrees of Freedom                    42
   P-Value                           0.0000
CFI/TLI
   CFI                                0.968
   TLI                                0.972
Loglikelihood
   H0 Value                      -27643.065
   H1 Value                      -27433.624
Information Criteria
   Number of Free Parameters             22
   Akaike (AIC)                   55330.131
   Bayesian (BIC)                 55450.638
   Sample-Size Adjusted BIC       55380.746
     (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
   Estimate                           0.093
   90 Percent C.I.                    0.085  0.102
SRMR (Standardized Root Mean Square Residual)
   Value                              0.079
MODEL RESULTS
                                                    Two-Tailed
                     Estimate       S.E.  Est./S.E.    P-Value
Group FEMALE
 S        WITH
    I                   0.528      0.104      5.097      0.000
 Means
    I                  21.046      0.100    210.324      0.000
    S                   0.700      0.017     40.814      0.000
Group MALE
 S        WITH
    I                   0.281      0.102      2.742      0.006
 Means
    I                  21.046      0.100    210.324      0.000
    S                   0.700      0.017     40.814      0.000
We should also look at the variances and covariances.
• Although we can say there is a statistically significant difference (p < .05) between the level and
trend for girls and boys, we need to be cautious because this difference of chi-squares has the same
problem with a large sample size that the original chi-squares have.
• In fact, the measures of fit are hardly changed whether we constrain the intercept and slope to
be equal or not. Moreover, the visual difference in the graph is not dramatic.
We could also put other constraints on the two solutions, such as equal variances and covariances,
and even equal residual error variances, but we will not.
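The chi-square difference test used in this section can be reproduced without Stata. The sketch below uses only the Python standard library, so it relies on the closed-form upper tail of the chi-square distribution that holds for even degrees of freedom (our test has 2). The function name is ours; for odd degrees of freedom you would need a statistics package instead.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(chi-square > x) using the closed form that is valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2
    term = 1.0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k!
        total += term
    return math.exp(-half) * total


# Constrained model minus unconstrained model, from the two outputs.
chi_diff = 418.884 - 411.966
df_diff = 48 - 46
p_value = chi2_upper_tail_even_df(chi_diff, df_diff)
print(round(chi_diff, 3), df_diff, round(p_value, 5))
```

This matches the value Stata displays for 1-chi2(2,6.918). The test is valid here because the constrained model is nested within the unconstrained one.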
Section 8: An Alternative to Multiple Group Analysis
8.1 Model and figure
An alternative way of doing this, when there are two groups, is to enter the grouping variable as a
predictor. This requires re-conceptualizing our model. We can think of the indicator variable
Male as having a direct path to both the intercept and the slope. Because the indicator variable is
coded as 1 for male and 0 for female,
• If the path from Male to the Intercept is positive, this means that boys have a higher initial
level on BMI.
• Similarly, if there is a positive path from Male to the Slope, this indicates that boys have a
steeper slope than girls on BMI. This direct effect actually represents an interaction between
the trajectory and gender.
• Such results would be consistent with our expectation that boys both start higher and gain more
fat than girls during adolescence.
• This approach does not let us test for other types of invariance, such as invariance of the
residual variances, covariances, and error terms.
a. We are forcing these to be the same for both females and males; this may be unreasonable.
b. The random effect for the slope for boys, Rs, may be greater or less than it is for girls. We
will not be able to evaluate this possibility with this approach.
The following figure shows these two paths. We are explaining why some people have a higher or
lower initial level and why some have a steeper or flatter slope by whether they are a girl or a boy.
We are predicting that boys have a higher initial level and a steeper slope.
Here is the figure:
8.2 Mplus program and output
Here is the program:
Title:
bmi_gender_alternatives.inp
bmi growth curve using gender as a single covariate.
This is an alternative to using gender as two groups.
Data:
File is "c:\Mplus examples\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male
race_eth bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03
white black Hispanic asian other;
Missing are all (-9999) ;
Usevariables are male bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 ;
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
i on male;
s on male;
Output:
Sampstat Mod(3.84) standardized;
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02
bmi03(*);
Here is selected, annotated output:
SUMMARY OF ANALYSIS
Number of groups                                        1
Number of observations                               1771
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
We cannot compare this chi-square to the two group chi-square
because it is not a nested model.
   Value                            301.244
   Degrees of Freedom                    28
   P-Value                           0.0000
Chi-Square Test of Model Fit for the Baseline Model
   Value                          11544.530
   Degrees of Freedom                    28
   P-Value                           0.0000
CFI/TLI
   CFI                                0.976
   TLI                                0.976
Loglikelihood
   H0 Value                      -29020.154
   H1 Value                      -28869.532
Information Criteria
   Number of Free Parameters             14
   Akaike (AIC)                   58068.308
   Bayesian (BIC)                 58145.018
   Sample-Size Adjusted BIC       58100.541
     (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
   Estimate                           0.074
   90 Percent C.I.                    0.067  0.082
   Probability RMSEA <= .05           0.000
SRMR (Standardized Root Mean Square Residual)
   Value                              0.046
MODEL RESULTS
                                                    Two-Tailed
                     Estimate       S.E.  Est./S.E.    P-Value
 I        ON
    MALE                0.242      0.199      1.216      0.224
 S        ON
    MALE                0.086      0.034      2.524      0.012
Boys and girls do not differ significantly at age 12 (intercept),
although boys are .242 higher on BMI than girls in this linear
model. However, gender and trajectory do interact with boys rate
of growth being .086 higher than it is for girls.
 S        WITH
    I                   0.403      0.073      5.500      0.000
When there is a covariate, the mean intercept and slope appear
under the intercepts heading.
 Intercepts
    BMI97               0.000      0.000    999.000    999.000
    BMI98               0.000      0.000    999.000    999.000
    BMI99               0.000      0.000    999.000    999.000
    BMI00               0.000      0.000    999.000    999.000
    BMI01               0.000      0.000    999.000    999.000
    BMI02               0.000      0.000    999.000    999.000
    BMI03               0.000      0.000    999.000    999.000
    I                  20.911      0.144    145.653      0.000
    S                   0.656      0.025     26.553      0.000
We cannot estimate different random effects for boys and girls on
the intercept or slope using this approach.
 Residual Variances
    BMI97               5.731      0.268     21.423      0.000
    BMI98               3.266      0.164     19.905      0.000
    BMI99               3.223      0.146     22.018      0.000
    BMI00               4.354      0.185     23.534      0.000
    BMI01               2.834      0.149     18.972      0.000
    BMI02               9.409      0.398     23.624      0.000
    BMI03               8.626      0.424     20.360      0.000
    I                  15.045      0.597     25.198      0.000
    S                   0.253      0.018     14.175      0.000
R-SQUARE
    Observed                                        Two-Tailed
    Variable         Estimate       S.E.  Est./S.E.    P-Value
    BMI97               0.724      0.012     59.339      0.000
    BMI98               0.832      0.009     93.431      0.000
    BMI99               0.846      0.008    110.551      0.000
    BMI00               0.820      0.008     99.975      0.000
    BMI01               0.888      0.006    138.194      0.000
    BMI02               0.731      0.011     65.511      0.000
    BMI03               0.772      0.011     70.133      0.000
You can see why we rarely report the R-square for the intercept
and slope.
    Latent                                          Two-Tailed
    Variable         Estimate       S.E.  Est./S.E.    P-Value
    I                   0.001      0.002      0.608      0.543
    S                   0.007      0.006      1.266      0.206
8.3 Graphic representation
We see that for girls (Male = 0) the intercept is 20.911 and the slope is .656. How is gender related to this?
For girls the equation is:
Est. BMI = 20.911 + .656(Time) + .242(Male) + .086(Male)(Time)
         = 20.911 + .656(Time) + .242(0) + .086(0)(Time)
         = 20.911 + .656(Time)
For boys the equation is:
Est. BMI = 20.911 + .656(Time) + .242(1) + .086(1)(Time)
         = (20.911 + .242) + (.656 + .086)(Time)
         = 21.153 + .742(Time)
where Time is coded as 0, 1, 2, 3, 4, 5, 6.
Using these results, we estimate the BMI for girls is initially 20.911. By the seventh year, when she is
18 (Time = 6), her estimated BMI will be 20.911 + .656(6) or 24.847.
Using these results, we estimate the BMI for boys is initially 21.153. By the seventh year it will be
21.153 + .742(6) or 25.605. Since a BMI of 25 is considered overweight, by the age of 18 we
estimate the average boy will be classified as overweight and the average girl is not far behind!
We could use the plots provided by Mplus, but if we wanted a nicer looking plot we could use
another program. I used Stata to get this graph.
The Stata command is (this can be generated from a drop-down menu):
twoway (connected Girls Age, lcolor(black) lpattern(dash) ///
lwidth(medthick)) (connected Boys Age, lcolor(black) ///
lpattern(solid) lwidth(medthick)), ///
ytitle(Body Mass Index) xtitle(Age of Adolescent) ///
caption(NLSY97 Data)
and the data are
     +-----------------------+
     | Age    Girls     Boys |
     |-----------------------|
  1. |  12   20.911   21.153 |
  2. |  18   24.847   25.605 |
     +-----------------------+
Body Mass Index for Adolescents
Comparison of Girls and Boys
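The two plotted lines can be generated directly from the estimates in the output: the intercepts for I and S (20.911 and .656) and the effects of Male on I and S (.242 and .086). The sketch below is only a convenience for building the plotting data; the function and constant names are ours. Note that the slope for boys is .656 + .086 = .742.

```python
INTERCEPT = 20.911    # mean of I when male = 0
SLOPE = 0.656         # mean of S when male = 0
MALE_ON_I = 0.242     # I ON MALE
MALE_ON_S = 0.086     # S ON MALE


def predicted_bmi(time, male):
    """Model-implied BMI at time 0..6 (age 12..18) for girls (male=0) or boys (male=1)."""
    level = INTERCEPT + MALE_ON_I * male
    trend = SLOPE + MALE_ON_S * male
    return level + trend * time


print("Age  Girls   Boys")
for time in range(7):
    girls = predicted_bmi(time, male=0)
    boys = predicted_bmi(time, male=1)
    print(12 + time, round(girls, 3), round(boys, 3))
```

Because the model is linear, only the first and last rows are needed to draw each line, but printing every age makes it easy to see when each line crosses a BMI of 25.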
Limitations of this approach
• When we treat a categorical variable as a grouping variable and do multiple group comparisons, we
can test the equality of all the parameters.
• When we treat it as a predictor, as in this example, we only test whether the intercept and
slope are different for the two groups (interaction). In this example we do not allow the
other parameters to be different for boys and girls, and this might be a problem in some
applications.
Section 9: Growth Curves with Time Invariant Covariates
An extension of having a single categorical predictor is having a series of covariates that
explain variance in the intercept and slope. In this example we use what are known as time
invariant covariates. These are covariates that either remain constant (gender) or for which you
have a measure only at the start of the study. These are sometimes considered fixed effects since
their value cannot change from one wave to another. It is possible to add time varying covariates
as well.
9.1 A conditional latent trajectory model
This has been called conditional latent trajectory modeling (Curran & Hussong, 2003)
because the initial level and trajectory (slope) are conditional on other variables.
The covariates act as moderators of the initial level and trajectory.
In this figure we have two covariates.
• One is whether the adolescent is white (coded 1) versus African American or Hispanic
(coded 0).
• The other is a latent variable reflecting the level of emotional problems a youth has. There
are two indicators of emotional problems, one from a parent report, boyprb_p, and the
other from a youth report, boyprb_y.
• The emotional problems are problems as reported at age 12.
• A researcher may predict that Whites have a lower initial BMI (intercept) which persists during
adolescence, but that the White advantage does not increase (same slope as nonwhites).
• Alternatively, a researcher may predict that being White predicts a lower initial BMI (intercept)
and less increase of the BMI (smaller slope) during adolescence.
a. This suggests that minorities start with a disadvantage (high BMI) and
b. This disadvantage gets even greater across adolescence.
• A researcher may argue that emotional problems are associated with both a higher initial BMI
(intercept) and a more rapid increase in BMI over time (slope).
• By including a covariate that is itself a latent variable, emotional problems, we will show how
these are handled by Mplus.
We estimated this model for boys only; girls were excluded.
9.2 Mplus program and output
The following is our Mplus program:
Title:
  bmi_timea.inp
  bmi growth curve using race/ethnicity and emotional
  problems as a second covariate. There are two indicators
  of emotional problems.
Data:
  File is "c:\Mplus examples\bmi_stata.dat" ;
Variable:
  Names are
    id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
    bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 white black hispanic
    asian other;
  Missing are all (-9999) ;
  Usevariables are boyprb_y boyprb_p white bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 ;
  Useobservations = male eq 1 and asian ne 1 and other ne 1;
Model:
  i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5
    bmi03@6;
  emot_prb by boyprb_p boyprb_y ;
  i on white emot_prb;
  s on white emot_prb;
  q on white emot_prb;
Output:
  Sampstat Mod(3.84) standardized;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
The new lines in this program are the Useobservations subcommand, the q growth factor, the by
statement, and the three on statements (these were highlighted in the original handout).
• The format of the Useobservations subcommand is similar to if or select used
with other programs.
• The Useobservations = male eq 1 and asian ne 1 and other ne 1;
restricts our sample to males (male eq 1). This is very handy when using the same
dataset for a variety of models where you want some models to include only selected
participants.
• We have dropped Asians and members of the “other” category. There are relatively few of
them in this sample dataset and they may have very different BMI trajectories. Also, the
meaning of the category “other” is ambiguous.
• I added a quadratic term in the Model: command. I first estimated this model using just a
linear slope and the fit was not very good (results not shown here). Adding the quadratic
improved the fit.
• This example has a measurement model for a latent covariate, emot_prb. In other
programs this can involve complicated programming. Here it is done with a single line:
emot_prb by boyprb_p boyprb_y ;
• The by is a key word in Mplus for creating latent variables used in Confirmatory Factor
Analysis and SEM.
• On the right of the by are two observed variables. The boyprb_p is the report of parents
about the adolescent’s emotional problems. The boyprb_y is the youth’s own report.
• It is desirable to have three and preferably four indicators of a latent variable, but we only have
two here so that will have to do.
• To the left of the by is the name we give to the latent variable, emot_prb. This new latent
variable did not appear in the list of variables we are using, but it is defined here.
• The “by” term
o fixes the first variable to the right, boyprb_p, as a reference indicator and assigns a
loading of 1 to it.
o lets the loading of the second variable, boyprb_y, be estimated. It also creates
error/residual variances that are labeled e1 and e2 in the figure.
o The default is that these errors are uncorrelated.
o It is good practice to have the strongest indicator on the right of the “by” be the
reference indicator with a loading fixed at 1.0. You can run the model and, if this does
not happen, re-run it, reversing the order of the items on the right of the
“by.”
• The next three new lines,
i on white emot_prb;
s on white emot_prb; and
q on white emot_prb;
o define the relationship between the covariates and the intercept, slope, and quadratic term. These
represent interactions of each covariate with the growth trajectory.
o These are the 1wi in the equation for HLM users.
o Mplus uses the on command to signify that a variable depends on another variable in
the structural part of the model. The by command is the key to understanding how
Mplus sets up the measurement model and the on is the key to how Mplus sets up the
structural model.
There are many defaults. Mplus assumes there are residual variances and covariances for the
intercept and slopes. It fixes the intercepts of the observed BMI variables at zero. It allows the
residuals of the intercept and slopes to be correlated.
Here are selected results:
Mplus VERSION 5.1
MUTHEN & MUTHEN
07/01/2008   8:01 PM
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
   Value                             88.824
   Degrees of Freedom                    34
   P-Value                           0.0000
Chi-Square Test of Model Fit for the Baseline Model
   Value                           5975.020
   Degrees of Freedom                    45
   P-Value                           0.0000
CFI/TLI
   CFI                                0.991
   TLI                                0.988
Loglikelihood
   H0 Value                      -17221.918
   H1 Value                      -17177.507
Information Criteria
   Number of Free Parameters             29
   Akaike (AIC)                   34501.837
   Bayesian (BIC)                 34640.190
   Sample-Size Adjusted BIC       34548.093
     (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
   Estimate                           0.043
   90 Percent C.I.                    0.032  0.054
   Probability RMSEA <= .05           0.845
SRMR (Standardized Root Mean Square Residual)
   Value                              0.031
MODEL RESULTS
                                                    Two-Tailed
                     Estimate       S.E.  Est./S.E.    P-Value
Measurement of latent variable.
 EMOT_PRB BY
    BOYPRB_P            1.000      0.000    999.000    999.000
    BOYPRB_Y            0.575      0.171      3.374      0.001
Emotional problems does not have a significant effect on the initial level
at age 12, but significantly increases the slope. Significant negative
effect on quadratic is a bit confusing.
 I        ON
    EMOT_PRB            0.300      0.168      1.793      0.073
 S        ON
    EMOT_PRB            0.212      0.089      2.370      0.018
 Q        ON
    EMOT_PRB           -0.037      0.015     -2.462      0.014
Whites have a significant advantage initially (intercept), but there is not
a significant compounding of this over time since White does not
significantly influence the slope or quadratic.
 I        ON
    WHITE              -1.030      0.292     -3.529      0.000
 S        ON
    WHITE               0.130      0.138      0.941      0.346
 Q        ON
    WHITE              -0.030      0.023     -1.293      0.196
 S        WITH
    I                   0.701      0.329      2.128      0.033
 Q        WITH
    I                  -0.101      0.053     -1.922      0.055
    S                  -0.174      0.034     -5.081      0.000
 WHITE    WITH
    EMOT_PRB           -0.111      0.028     -4.013      0.000
 Intercepts
    BOYPRB_Y            2.108      0.052     40.712      0.000
    BOYPRB_P            1.893      0.058     32.668      0.000
    BMI97               0.000      0.000    999.000    999.000
    BMI98               0.000      0.000    999.000    999.000
    BMI99               0.000      0.000    999.000    999.000
    BMI00               0.000      0.000    999.000    999.000
    BMI01               0.000      0.000    999.000    999.000
    BMI02               0.000      0.000    999.000    999.000
    BMI03               0.000      0.000    999.000    999.000
    I                  21.279      0.210    101.495      0.000
    S                   1.171      0.100     11.719      0.000
    Q                  -0.077      0.016     -4.649      0.000
The linear slope of 1.171 is huge when you project this over the six years.
The quadratic slope being negative indicates that there is some leveling off
in the increase in BMI.
 Variances
    EMOT_PRB            1.485      0.455      3.262      0.001
 Residual Variances
    BOYPRB_Y            1.850      0.170     10.875      0.000
    BOYPRB_P            1.203      0.442      2.722      0.006
    BMI97               5.535      0.479     11.564      0.000
    BMI98               3.276      0.227     14.402      0.000
    BMI99               3.333      0.221     15.067      0.000
    BMI00               3.182      0.218     14.586      0.000
    BMI01               2.361      0.186     12.670      0.000
    BMI02               5.225      0.357     14.654      0.000
    BMI03               8.961      0.743     12.057      0.000
    I                  13.094      0.869     15.070      0.000
    S                   1.291      0.215      6.007      0.000
    Q                   0.030      0.006      5.092      0.000
STANDARDIZED MODEL RESULTS
STDYX Standardization
                                                    Two-Tailed
                     Estimate       S.E.  Est./S.E.    P-Value
 EMOT_PRB BY
    BOYPRB_P            0.743      0.111      6.708      0.000
    BOYPRB_Y            0.458      0.073      6.308      0.000
Notice the z-tests are slightly different. Most standard packages assume the
unstandardized test is the same.
 I        ON
    EMOT_PRB            0.099      0.052      1.909      0.056
 S        ON
    EMOT_PRB            0.222      0.080      2.775      0.006
 Q        ON
    EMOT_PRB           -0.253      0.087     -2.918      0.004
 I        ON
    WHITE              -0.140      0.039     -3.558      0.000
 S        ON
    WHITE               0.056      0.059      0.942      0.346
 Q        ON
    WHITE              -0.083      0.064     -1.293      0.196
 S        WITH
    I                   0.170      0.090      1.891      0.059
 Q        WITH
    I                  -0.163      0.094     -1.731      0.083
    S                  -0.891      0.021    -42.268      0.000
 WHITE    WITH
    EMOT_PRB           -0.183      0.049     -3.748      0.000
 Residual Variances
    BOYPRB_Y            0.790      0.067     11.864      0.000
    BOYPRB_P            0.448      0.165      2.717      0.007
    BMI97               0.290      0.025     11.460      0.000
    BMI98               0.171      0.012     13.889      0.000
    BMI99               0.151      0.011     13.794      0.000
    BMI00               0.132      0.010     13.364      0.000
    BMI01               0.096      0.008     11.571      0.000
    BMI02               0.185      0.013     14.176      0.000
    BMI03               0.269      0.022     12.260      0.000
    I                   0.966      0.016     62.173      0.000
    S                   0.952      0.034     27.829      0.000
    Q                   0.937      0.043     22.016      0.000
Unfortunately, we cannot get graphs when we have covariates. You could create these yourself
by substituting fixed values for race and emotional problems into the estimated equations.
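One way to do that substitution by hand is sketched below, using the unstandardized intercepts and WHITE effects from the output above. It assumes WHITE is coded 1 for White and 0 otherwise and that the time scores are 0 through 6. The effects of EMOT_PRB are left as a placeholder argument, so the default is only meaningful with emotional problems held at 0; the function and argument names are hypothetical.

```python
# Intercepts and WHITE effects copied from the unstandardized output above.
INTERCEPTS = {"i": 21.279, "s": 1.171, "q": -0.077}
WHITE_EFFECT = {"i": -1.030, "s": 0.130, "q": -0.030}


def trajectory(white, emot_prb=0.0, emot_effect=None, waves=7):
    """Model-implied BMI at time scores 0..waves-1 for fixed covariate values.

    emot_effect holds the unstandardized effects of EMOT_PRB on i, s, and q.
    It defaults to zeros (a placeholder), which is only appropriate when
    emot_prb is held at 0.
    """
    if emot_effect is None:
        emot_effect = {"i": 0.0, "s": 0.0, "q": 0.0}
    g = {
        k: INTERCEPTS[k] + WHITE_EFFECT[k] * white + emot_effect[k] * emot_prb
        for k in INTERCEPTS
    }
    return [g["i"] + g["s"] * t + g["q"] * t ** 2 for t in range(waves)]


non_white = trajectory(white=0)
white = trajectory(white=1)
for t, (a, b) in enumerate(zip(non_white, white)):
    print(1997 + t, round(a, 2), round(b, 2))
```

The two lists can be handed to any plotting tool to draw one curve per group. To show different levels of emotional problems, pass the estimated EMOT_PRB effects and a chosen value of emot_prb.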
Section 10: Mediation & Moderation
Sometimes all of the covariates are time invariant or at least measured at just the start of the study.
Curran and Hussong (2003) discuss a study of a latent growth curve on drinking problems with a
covariate of parental drinking. Parental drinking influences both the initial level and the rate of
growth of drinking problem behavior among adolescents. The question is whether some other
variables might mediate this relationship:
• Parental monitoring
• Peer influence
Mplus allows us to estimate the direct and indirect effect of Parent Drinking on the Intercept and
Slope. It also provides a test of significance for these effects.
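The logic of an indirect effect can be sketched as a product of two paths: the effect of Parent Drinking on a mediator (a) times the effect of the mediator on the Slope (b). The sketch below uses a first-order delta-method (Sobel) standard error. It illustrates the idea only and is not Mplus's implementation; every number in it is hypothetical.

```python
import math


def indirect_effect(a, se_a, b, se_b):
    """Product-of-coefficients indirect effect with a Sobel standard error.

    a: effect of the predictor on the mediator (with standard error se_a)
    b: effect of the mediator on the outcome (with standard error se_b)
    Returns (estimate, standard error, z, two-tailed p).
    """
    est = a * b
    se = math.sqrt(a ** 2 * se_b ** 2 + b ** 2 * se_a ** 2)
    z = est / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-tailed p from the standard normal
    return est, se, z, p


# Hypothetical paths: Parent Drinking -> Parental Monitoring -> Slope.
est, se, z, p = indirect_effect(a=-0.40, se_a=0.10, b=-0.30, se_b=0.08)
print(round(est, 3), round(se, 3), round(z, 2), round(p, 4))
```

The direct effect is the remaining path from Parent Drinking to the Slope with the mediator in the model; the total effect is the direct effect plus the indirect effect.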
Section 11: Time Varying Covariates
We have illustrated time invariant covariates that are measured at time 1. It is possible to extend
this to include time varying covariates. Time varying covariates either are measured after the
process has started or have a value that changes (hours of nutrition education, level of program
fidelity). Although we will not show our output, we will illustrate the use of time varying
covariates in a figure. In this figure the time varying covariates, a21 to a24, might be
• Hours of nutrition education completed between waves. Independent of the overall growth
trajectory, η1, students who have several hours of nutrition education programming may have a
decrease in their BMI.
• Physical education curriculum. A physical activity program might lead to reduced BMI.
Students who spend more time in this physical activity program might have a lower BMI
independent of the overall growth trend. Hours in physical education courses will vary from
year to year.
• This would be a good way to incorporate fidelity into a program evaluation.
This figure is borrowed from Muthén, who is examining growth in math performance over 4
years. The w vector contains x variables, which are covariates that directly influence the intercept, η0, or
slope, η1. The aij are the number of math courses taken each year.
yit = repeated measures on the outcome (math achievement)
a1it = Time score (0, 1, 2, 3) as discussed previously
a2it = Time varying covariates (# of math courses taken that year)
w = Vector of x covariates that are time invariant and measured at or before the first yit
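Written as equations, the figure corresponds to something like the following. This is a sketch consistent with the definitions above; the coefficient symbols κ, α, and γ and the residual terms ε and ζ are notation assumed here, not taken from the figure.

```latex
\begin{align}
y_{it}    &= \eta_{0i} + \eta_{1i}\,a_{1it} + \kappa_t\,a_{2it} + \varepsilon_{it} \\
\eta_{0i} &= \alpha_0 + \gamma_0^{\top} w_i + \zeta_{0i} \\
\eta_{1i} &= \alpha_1 + \gamma_1^{\top} w_i + \zeta_{1i}
\end{align}
```

The first line says that each observed score is the person's own trajectory plus a wave-specific shift due to the time varying covariate. The second and third lines let the time invariant covariates in w move the intercept and slope.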
In this example we might think of
the yi variables being measures of
conflict behavior where y1 is at age
17 and y4 is at age 25. We know
there is a general decline in
conflict behavior during this time
interval. Therefore, the slope η1 is
expected to be negative.
Now suppose we also have a measure of alcohol abuse for each of the 4 waves (aij). We might
hypothesize that during a year in which an adolescent has a high score on alcohol abuse (say,
the number of days in the last 30 days on which the person has 5 or more drinks) there will be an
elevated level of conflict behavior that cannot be explained by the general decline (negative
slope).
The negative slope reflects the general decline in conflict behavior by young adults as they move
from age 17 to age 25. The effect of aij on yi provides the additional explanation that in those years
when there is a lot of drinking, there will be an elevated level of conflict that does not fit the
general decline.
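A small numeric sketch may make this concrete. It computes the expected conflict score at each wave as the general trend plus the effect of alcohol abuse in that wave. All values are hypothetical, and using a single effect (kappa) that is equal across waves is a simplifying assumption.

```python
def expected_conflict(eta0, eta1, kappa, abuse, time_scores=(0, 1, 2, 3)):
    """Expected y at each wave: eta0 + eta1 * t + kappa * a_t.

    eta0, eta1: intercept and slope of the general trend
    kappa: effect of the time varying covariate (assumed equal across waves)
    abuse: value of the time varying covariate at each wave
    """
    return [eta0 + eta1 * t + kappa * a for t, a in zip(time_scores, abuse)]


# Hypothetical person: declining trend, heavy drinking only in the third wave.
trend_only = expected_conflict(10.0, -1.5, 0.2, abuse=[0, 0, 0, 0])
heavy_wave3 = expected_conflict(10.0, -1.5, 0.2, abuse=[0, 0, 8, 0])
for t, (a, b) in enumerate(zip(trend_only, heavy_wave3), start=1):
    print("wave", t, round(a, 2), round(b, 2))
```

In the second series the third wave sits above the declining line even though the intercept and slope are unchanged, which is the kind of departure the time varying covariate is meant to explain.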
Section 12: Extensions and Suggested Readings
If you want more, here are a few references
b. Basic growth curve modeling
a. Bollen, K. A., & Curran, P. J. (2006). Latent Curve Models: A Structural Equation
Perspective. Hoboken, NJ: Wiley.
b. Curran, P. J., & Hussong, A. M. (2003). The Use of Latent Trajectory Models in
Psychopathology Research. Journal of Abnormal Psychology, 112:526-544. This is a
general introduction to growth curves that is accessible.
c. Duncan, T. E., Duncan, S. C., & Strycker, L. A. (2006). An Introduction to Latent
Variable Growth Curve Modeling: Concepts, Issues, and Applications (2nd ed.).
Mahwah NJ: Lawrence Erlbaum. The second edition of a classic text on growth curve
modeling.
d. Kaplan, D. (2000). Chapter 8: Latent Growth Curve Modeling. In D. Kaplan,
Structural Equation Modeling: Foundations and Extensions (pp 149-170). Thousand
Oaks, CA: Sage. This is a short overview.
e. Wang, M. (2007). Profiling retirees in the retirement transition and adjustment
process: Examining the longitudinal change patterns of retirees' psychological
well-being. Journal of Applied Psychology, 92(2), 455-474. This is a nice example of
presenting results showing some graphs and tables.
c. Limited Outcome Variables: Binary and count variables
a. Muthén, B. (1996). Growth modeling with binary responses. In A. von Eye & C.
Clogg (Eds.) Categorical Variables in Developmental Research: Methods of Analysis
(pp. 37-54). San Diego, CA: Academic Press.
b. Long, J. S., & Freese, J. (2006). Regression Models for Categorical Dependent
Variables Using Stata, 2nd ed. Stata Press (www.stata-press.com). This provides the
most accessible and still rigorous treatment of how to use and interpret limited
dependent variables.
c. Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling
Using Stata. Stata Press (www.stata-press.com). This discusses a free set of
commands that can be added to Stata that will do most of what Mplus can do and
some things Mplus cannot do. It is hard to use and very slow.
d. Growth mixture modeling
a. Muthén, B., & Muthén, L. K. (2000). Integrating person-centered and
variable-centered analysis: Growth mixture modeling with latent trajectory classes.
Alcoholism: Clinical and Experimental Research, 24:882-891.
This is an excellent and accessible conceptual introduction.
b. Muthén, B. (2001). Latent variable mixture modeling. In G. Marcoulides, & R.
Schumacker (Eds.) New Developments and Techniques in Structural Equation
Modeling (pp. 1-34). Mahwah, NJ: Lawrence Erlbaum.
c. Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S., Yang, C., Wang, C., Kellam, S.,
Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive
interventions. Biostatistics, 3:459-475.
d. Muthén, B. (2004). Latent Variable Analysis: Growth Mixture Modeling and Related
Techniques for Longitudinal Data. In D. Kaplan (Ed.), Handbook of
Quantitative Methodology for the Social Sciences (pp. 345-368). Newbury Park, CA:
Sage Publications.
e. The web page for Mplus, www.statmodel.com , maintains a current set of references, many
as PDF files. These are organized by topic and some include data and the Mplus program.