Growth Curve Presentation

GROWTH CURVES AND EXTENSIONS USING MPLUS
Alan C. Acock
alan.acock@oregonstate.edu
Department of HDFS
322 Milam Hall
Oregon State University
Corvallis, OR 97331
This document and selected references, data, and programs can be downloaded from
http://oregonstate.edu/~acock/growth-curves/
A Note to Readers
These are lecture notes for a presentation at Academia Sinica in December of 2005. This is not a
self-contained, systematic treatment of the topic. It is not intended for publication and has not
been carefully edited for publication purposes. Instead, these notes are intended to complement a
two-day workshop presentation. The workshop will expand and clarify many of the points presented in
this document. The intention of this document is to help workshop participants follow the
presentation. The notes are much more detailed than a usual set of PowerPoint slides, but much less
detailed than a self-contained treatment of the topics. Others may find these notes useful, but they
are not intended to be complete, nor as a substitute for participation in the workshop.
Growth Curve and Related Models, Alan C. Acock, Presented at Academia Sinica, December, 2005
GROWTH CURVES AND EXTENSIONS USING MPLUS
Outline
1. Preparing data for Mplus
2. Basic analysis using Mplus
3. A basic growth curve
a. Conceptual Model of a Growth Curve
b. The Mplus program
c. Interpreting Output
d. Interpreting Graphic Output
4. Quadratic terms in growth curves
a. Conceptual model of a growth curve
b. The Mplus program
c. Interpreting output
d. Interpreting graphic output
5. Working with missing values in growth models
a. Introduction
b. The Mplus Program
c. Interpreting output
6. Multiple group models with growth curves
a. Simultaneous estimation in multiple groups
b. Including categorical predictors to show group differences
i. Conceptual model of a growth curve
ii. The Mplus program
iii. Interpreting output
iv. Interpreting graphic output
7. Inclusion of covariates to explain variation in level and trend
a. Conceptual model of a growth curve
b. The Mplus program
c. Interpreting output
d. Interpreting graphic output
8. Growth curves with binary variables
a. Conceptual model of a growth curve
b. The Mplus program
c. Interpreting output
d. Interpreting graphic output
9. Growth curves with counts and zero inflated counts
a. Conceptual model of a growth curve
b. The Mplus program
c. Interpreting output
d. Interpreting graphic output
10. Growth mixture models
a. Variable centered vs. person centered research
b. Conceptual model of a growth curve
c. The Mplus program
d. Interpreting output
e. Interpreting graphic output
Goal of the Workshop
The goal of this workshop is to explore a variety of applications of latent growth curve models
using the Mplus program. Because we will cover a wide variety of applications and extensions of
growth curve modeling, we will not cover each of them in great detail. A reading list is provided
for those who want more extensive treatments of the topics we cover. At the end of this workshop
it is hoped that participants will be able to run Mplus programs to execute a variety of growth
curve modeling applications and to correctly interpret the results.
Assumed Background
It will be assumed that participants in the workshop have some background in Structural Equation
Modeling. Background in multilevel analysis will also be useful. It is possible to learn how to
estimate the specific models we will cover without a comprehensive knowledge of Mplus, but
some background using an SEM program is useful.
Recommended Readings (selected readings can be downloaded from
http://oregonstate.edu/~acock/growth-curves/)
1. Preparatory readings
a. Kline, R. B. (2005). Principles and Practice of Structural Equation Modeling, 2nd ed.
New York: Guilford Press.
This is a general introduction to structural equation modeling that is more accessible
than others. It does not cover growth curve modeling but does provide a solid
background for what will be covered in the workshop.
b. Muthén, L., & Muthén, B. (2004). Mplus Statistical Analysis with Latent Variables:
User’s Guide. Los Angeles, CA: Statmodel.
Participants who plan to use Mplus need a copy of the manual. The tentative target
date for release of a new version of Mplus is the end of this year.
c. Acock, A. C. (2006). A Gentle Introduction to Stata. Stata Press (www.stata-press.com).
For those not already familiar with Stata, this is a basic introduction.
2. Basic growth curve modeling
a. Curran, P. J., & Hussong, A. M. (2003). The Use of Latent Trajectory Models in
Psychopathology Research. Journal of Abnormal Psychology 112:526-544. This is a
general introduction to growth curves that is accessible.
b. Duncan, T. E., Duncan, S. C., Strycker, L. A., Li, F., & Alpert, A. (1999). An
Introduction to Latent Variable Growth Curve Modeling: Concepts, Issues, and
Applications. Mahwah, NJ: Lawrence Erlbaum Associates.
Classic text on growth curve modeling.
c. Kaplan, D. (2000). Chapter 8: Latent Growth Curve Modeling. In D. Kaplan,
Structural Equation Modeling: Foundations and Extensions (pp 149-170). Thousand
Oaks, CA: Sage. This is a short overview.
3. Working with missing values
a. Acock, A. (2005). Working with missing values. Journal of Marriage and Family
67:1012-1028.
b. Davey, A., Savla, J., & Luo, Z. (2005). Issues in Evaluating Model Fit with Missing
Data. Structural Equation Modeling 12:578-597.
c. Royston, P. (2005). Multiple Imputation of Missing Values: Update. The Stata
Journal 2:1-14.
4. Limited Outcome Variables: Binary and count variables
a. Muthén, B. (1996). Growth modeling with binary responses. In A. V. Eye & C.
Clogg (Eds.) Categorical Variables in Developmental Research: Methods of analysis
(pp 37-54). San Diego, CA: Academic Press.
b. Long, J. S., & Freese, J. (2006). Regression Models for Categorical Dependent
Variables Using Stata, 2nd ed. Stata Press (www.stata-press.com). This provides the
most accessible and still rigorous treatment of how to use and interpret limited
dependent variables.
c. Rabe-Hesketh, S., & Skrondal, A. (2005). Multilevel and Longitudinal Modeling
Using Stata. Stata Press (www.stata-press.com). This discusses a free set of
commands that can be added to Stata that will do most of what Mplus can do.
5. Growth mixture modeling
a. Muthén, B., & Muthén, L. K. (2000). Integrating person-centered and variable-centered
analysis: Growth mixture modeling with latent trajectory classes.
Alcoholism: Clinical and Experimental Research. vol 24:882-891.
This is an excellent and accessible conceptual introduction.
b. Muthén, B. (2001). Latent variable mixture modeling. In G. Marcoulides, & R.
Schumacker (Eds.) New Developments and Techniques in Structural Equation
Modeling (pp. 1-34). Mahwah, NJ: Lawrence Erlbaum.
c. Muthén, B., Brown, C. H., Masyn, K., Jo, B., Khoo, S., Yang, C., Wang, C., Kellam, S.,
Carlin, J., & Liao, J. (2002). General growth mixture modeling for randomized preventive
interventions. Biostatistics 3:459-475.
d. Muthén, B. (2004). Latent Variable Analysis: Growth Mixture Modeling and Related
Techniques for Longitudinal Data. In D. Kaplan (Ed.), Handbook of
Quantitative Methodology for the Social Sciences (pp. 345-368). Newbury Park, CA:
Sage Publications.
Brief Summary of Topics Covered in the Two Day Workshop
Introduction to Growth Curve Modeling
Growth Curves are a new way of thinking that is ideal for longitudinal studies. Instead of
predicting a person’s score on a variable (e.g., mean comparison among scores at
different time points or relationships among variables at different time points), we
predict their growth trajectory—what is their level on the variable AND how is this
changing. We will present a conceptual model, show how to apply the Mplus program,
and interpret the results.
1. Working with Missing Values
Missing values are a problem in most social science research, but they are a special issue in
longitudinal studies. In a 5-wave study a participant may have no data for some waves and
may have incomplete data for the waves in which they were interviewed. We will discuss
strategies for working with missing values (FIML and multiple imputation), show how to
apply Mplus to FIML (Mplus can also analyze multiple data files from MI), and interpret
the results.
2. Multiple Groups with Growth Curves
We compare known groups (men vs. women, married vs. single parent) to assess how their
growth trajectories differ. We will show how to do this using the Mplus program and how to
interpret the results.
3. Predicting Patterns of Growth
When we have established a growth trajectory, this raises the question of how to explain it.
Why do some individuals increase or decrease on a characteristic while other individuals
show little change? What predicts the level (initial level or starting level) and trajectory?
We will show how to do this using Mplus and interpret the results.
4. Growth Curves with Limited Outcome Variables
Sometimes a researcher is interested in growth on a binary variable (Ever drinking alcohol
for adolescents). Sometimes a researcher is interested in a count variable that involves a
relatively rare event (Number of days an adolescent has 5+ drinks of alcohol in the last 30
days). Sometimes we are interested in both types of variables. Different variables may
predict the binary variable than predict the count variable. We will show how to do this
using Mplus and interpret the results.
5. Growth Mixture Models
It is possible to use Mplus to do an exploratory growth curve analysis where our focus is on
the person and not the variable. We can locate clusters of people who share similar growth
trajectories. This is exploratory research and the standards for it are still evolving. An
example would be a study of alcohol consumption from age 15 to 30. It is possible to
empirically identify different clusters of people. One cluster may never drink or never drink
very much. A second cluster may have increasing alcohol consumption up to about 22 or 23
and then a gradual decline. A third cluster may be very similar to the second cluster but not
decline after 23. After deriving these clusters of people who share growth trajectories, it is
possible to compare them to find what differentiates membership in the different clusters.
We will show how to do these analyses using Mplus and interpret the results.
Creation of Dataset and Screening Program
Our initial example will look at the BMI (Body Mass Index) of adolescents as the BMI changes
between the age of 12 and the age of 18. This data is from NLSY97 (National Longitudinal
Survey of Youth, 1997), using the first 7 years of data.
Before we can do anything, we need to get data into a format that Mplus can read. At this time,
Mplus cannot read datasets in proprietary formats designed for other packages (Stata, SAS, SPSS).
It needs an ASCII data file in which the values are separated (delimited) by a space, a comma, or
in a fixed format. There are many ways to do this and whatever program you use for your standard
data management/analysis can write a file in one of the formats. Some people put the file in Excel
and then save it as a comma delimited file (.csv). The file extension you should use for your data
file that Mplus will read is .dat.
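For readers who do not use Stata, the export step can be sketched in a few lines of Python. This is only an illustration of the file layout described above (comma delimited, no header row, a numeric code for missing values). The function name to_mplus_dat, the two made-up records, and the choice of -9999 as the missing code are my assumptions for the example, not part of Mplus or Stata.

```python
import csv
import io

MISSING_CODE = -9999  # assumed missing-value flag; must never be a legitimate value


def to_mplus_dat(rows, handle, missing=MISSING_CODE):
    """Write rows as comma-delimited text with no header line.

    None marks a missing value and is replaced by the numeric code.
    """
    writer = csv.writer(handle, lineterminator="\n")
    for row in rows:
        writer.writerow([missing if value is None else value for value in row])


# Two made-up records (hypothetical columns: id, male, bmi97, bmi98)
records = [
    (1, 1, 21.5, 22.0),
    (2, 0, None, 19.8),
]

buffer = io.StringIO()  # stands in for open("bmi.dat", "w", newline="")
to_mplus_dat(records, buffer)
print(buffer.getvalue())
```

The variable names are not written to the file; they are supplied later in the Mplus program itself.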
I use Stata and there is a close and developing relationship between Stata and Mplus. Michael
Mitchell at UCLA wrote a Stata command that not only creates a dataset for Mplus, but even
writes the initial program Mplus uses for basic analysis. If you have access to Stata, I recommend
this command. It is called stata2mplus. If you have Stata, the command, findit
stata2mplus, will locate this command and show you how to install it. An advantage of
using the stata2mplus command is that it also creates a basic Mplus program that includes
variable names and value labels as part of the title.
First, I open the Stata dataset within Stata and take the following steps:
• Keep only those items that I think might be useful for doing the growth curves. Within Mplus
you have the option to select variables for each analysis, so it makes sense to keep all the
variables you think you might use.
• If you have variables with long names, rename them so each variable is limited to 8 characters.
• Once I dropped the irrelevant variables using Stata, I saved the file to my flash drive. I gave it
the name bmi_stata.dta (.dta is the file type Stata uses for a Stata dataset).
• Finally, I entered the following Stata command:
stata2mplus using "F:\flash\academica\bmi_stata"
which produced the following output:
Looks like this was a success.
To convert the file to mplus, start mplus and run
the file F:\flash\academica\bmi_stata.inp
What this program does is create two files:
• bmi_stata.dat, which is a data file Mplus can read, and
• bmi_stata.inp, which is a program file Mplus can run to do basic analysis. The .inp is the
file type Mplus uses for its own programs.
Here are the first five cases in my dataset, bmi_stata.dat. The assumed file extension is
*.dat. Mplus can read a file in this format. The stata2mplus command recoded all missing
values into a -9999. You can override this if you want a -9999 to be a real value.
7935,-9999,4,-9999,2,1,2,28.67739,28.33963,26.62286,27.24928,23.62529,25.84016,26.57845,0,1,0,0,0
5526,3,-9999,2,-9999,0,1,39.29696,-9999,44.28067,39.85261,44.28067,46.0519,44.80619,1,0,0,0,0
5369,1,-9999,1,-9999,0,1,18.28824,19.7513,20.5957,19.30637,19.30637,21.03148,20.36911,1,0,0,0,0
919,0,-9999,2,-9999,0,4,17.93367,18.17581,19.18558,19.52778,19.52778,20.50417,20.30889,0,0,0,1,0
7429,4,-9999,4,-9999,0,3,17.56995,21.25472,21.1316,21.28223,21.96355,22.12994,21.96355,0,0,1,0,0
When I open the Mplus Editor I can then open the file bmi_stata.inp. I’ve made some
minor changes in the file by deleting some lines and adding one subcommand I will explain in a
minute.
Title:
bmi_stata.inp
Stata2Mplus convertsion for F:\flash\academica\bmi_stata.dta
id : PUBID - YTH ID CODE 1997
grlprb_y : GIRLS BEHAVE/EMOT SCALE, YTH RPT 1997
boyprb_y : BOYS BEHAVE/EMOT SCALE, YTH RPT 1997
grlprb_p : GIRLS BEHAVE/EMOT SCALE, PAR RPT 1997
boyprb_p : BOYS BEHAVE/EMOT SCALE, PAR RPT 1997
male :
race_eth :
1: white
2: black
3: hispanic
4: asian
5: other
black :
hispanic :
asian :
other :
Data:
File is F:\flash\academica\bmi_stata.dat ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 white black hispanic asian other;
Missing are all (-9999) ;
! usevariables excludes grlprb and boyprb variables
! because these are sex specific.
Usevariables are race_eth bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 white black hispanic asian other;
Analysis:
Type = basic ;
This basic program, bmi_stata.inp, produces the
• Means,
• Variances, and
• Covariances of the variables
It is useful to run this basic analysis, regardless of how you get the data into Mplus, and to compare
the results with the corresponding values from your standard statistics package to make sure the transfer
was successful.
First, we will go over the command structure of this basic Mplus program. It is surprising how few
commands we need to add as we move on to more complex analysis.
• LISREL users will find the command structure of Mplus remarkably simple.
• AMOS users should appreciate how much more efficient this code is than drawing a complex
model on a computer screen.
Mplus programs are divided into a series of sections. Each major section begins with a
keyword at the start of a line. The major keywords in this example are
• Title:
• Data:
• Variable:
• Analysis:
The colon is part of the keyword name. These will be highlighted in blue (automatically) in the
actual program. Mplus uses a “;” to mark the end of a command or subcommand (similar to
SAS).
The Title: section
Everything after Title: is part of the title until a line beginning with Data: appears. It is helpful
to include a description of the purpose of the program as well as a name for the program as part of
the title. In addition to what the stata2mplus program generates, I’ve added a line with the
name of the file, bmi_stata.inp, so I can link a printed copy to the actual file at a later date.
The stata2mplus command we ran in Stata puts a lot in the title including the value labels where
they are available. You might edit these out of the file to make the file shorter.
The Data: section
This section tells Mplus where to find the file containing the data. The full path is provided and I
think it is a good idea to have no spaces in the path. If you do have spaces, put quotation marks
around the path as in
File is "F:\my flash\academica sinica\bmi_stata.dat" ;
Notice the semicolon at the end of the statement. Statements can continue for several lines, but must end
with a semicolon. This is the way SAS does it, for those familiar with SAS.
The Variable: section
This section consists of a series of subcommands that tell Mplus the names of the variables, what
values are missing, and a subset of variables to be included in the current program. Mplus is not
case sensitive, so the names “hispanic” and “Hispanic” refer to the same variable; the output prints variable names in upper case.
The subcommand, Names are, is followed by a list of variable names with the order matching
the order of the data file and this can go on for several lines, ending with a semi-colon. Putting the
subsection keywords, Names are on a separate line is unnecessary but helps readers of a
program. Limiting names to 8 characters with no spaces simplifies things. The next subcommand,
Missing are all (-9999); tells Mplus that all variables have a missing value of -9999.
You can use any value here, and different variables can have different missing-value codes. I recommend that you replace
all missing values in your dataset with some value, such as -9999, that is never a legitimate value.
Mplus can incorporate missing values in the analysis using a FIML approach or multiple
imputation, which we will discuss later, so if there are observations that definitely should be
excluded from analysis, drop those cases before transferring the data to an Mplus data file.
I’ve inserted a comment by putting an exclamation mark, “!”, at the start of a line. Then I’ve
inserted the Usevariables are subcommand to select a subset of variables. This is a useful
subcommand if you have a larger file that will be used for a variety of separate Mplus analyses.
Notice that I’ve dropped the items about problems for girls and boys. Each of these items is
missing for one sex and listwise deletion is the default, so without this deletion the
program would have no observations with complete data.
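The effect of listwise deletion can be checked by hand on the five cases listed earlier from bmi_stata.dat. The following Python sketch is only an illustration, not something Mplus requires; the helper name complete_cases is my own. It counts the cases with complete data, first with every variable in the file and then with the variables named on Usevariables are.

```python
import csv
import io

NAMES = ("id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97 bmi98 "
         "bmi99 bmi00 bmi01 bmi02 bmi03 white black hispanic asian other").split()
USE = ("race_eth bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 "
       "white black hispanic asian other").split()
MISSING = -9999.0

# The five cases shown earlier in these notes
DATA = """\
7935,-9999,4,-9999,2,1,2,28.67739,28.33963,26.62286,27.24928,23.62529,25.84016,26.57845,0,1,0,0,0
5526,3,-9999,2,-9999,0,1,39.29696,-9999,44.28067,39.85261,44.28067,46.0519,44.80619,1,0,0,0,0
5369,1,-9999,1,-9999,0,1,18.28824,19.7513,20.5957,19.30637,19.30637,21.03148,20.36911,1,0,0,0,0
919,0,-9999,2,-9999,0,4,17.93367,18.17581,19.18558,19.52778,19.52778,20.50417,20.30889,0,0,0,1,0
7429,4,-9999,4,-9999,0,3,17.56995,21.25472,21.1316,21.28223,21.96355,22.12994,21.96355,0,0,1,0,0
"""


def complete_cases(rows, use):
    """Keep only rows with no missing code on any variable named in `use`."""
    columns = [NAMES.index(name) for name in use]
    return [row for row in rows if all(row[c] != MISSING for c in columns)]


rows = [[float(value) for value in record] for record in csv.reader(io.StringIO(DATA))]

print(len(complete_cases(rows, NAMES)))  # 0: every case is missing the other sex's items
print(len(complete_cases(rows, USE)))    # 4: only the case missing bmi98 is dropped
```

With all variables included, no case survives; once the sex-specific items are excluded, only the one case with a missing BMI value is lost.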
The Analysis: section
The last section of the program is Analysis:, and it has a single subcommand, Type =
basic ;. This section is often omitted because the type of analysis is often a default for a
particular model.
There are two major sections that are not in this program because they are not applicable here.
• Model: that includes the model we are estimating and
• Output: that lists the specific statistical and graphic output we want.
The following is selected output from the basic analysis. I’ve put key values in bold and preceded
comments I inserted with an “!”.
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                          1098   ! listwise deletion is the default
Number of dependent variables                     13
Number of independent variables                    0
Number of continuous latent variables              0
Observed dependent variables
  Continuous                                           ! default treats variables as continuous.
   RACE_ETH    BMI97       BMI98       BMI99       BMI00       BMI01
   BMI02       BMI03       WHITE       BLACK       HISPANIC    ASIAN
   OTHER
Estimator                                         ML
Information matrix                          EXPECTED
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20
! SAMPLE STATISTICS should be compared to original data

     Means
           RACE_ETH     BMI97     BMI98     BMI99     BMI00
           ________  ________  ________  ________  ________
 1            1.762    20.279    21.513    22.315    22.997

     Means
              BMI01     BMI02     BMI03     WHITE     BLACK
           ________  ________  ________  ________  ________
 1           23.445    23.991    24.486     0.542     0.231

     Means
           HISPANIC     ASIAN     OTHER
           ________  ________  ________
 1            0.179     0.017     0.030

. . .

     Correlations
           RACE_ETH     BMI97     BMI98     BMI99     BMI00
           ________  ________  ________  ________  ________
 RACE_ETH     1.000
 BMI97        0.128     1.000
 BMI98        0.105     0.762     1.000
 BMI99        0.091     0.761     0.852     1.000
 BMI00        0.103     0.731     0.816     0.862     1.000
 BMI01        0.123     0.714     0.809     0.861     0.874
 BMI02        0.116     0.638     0.705     0.739     0.744
 BMI03        0.107     0.664     0.715     0.759     0.775
 WHITE       -0.827    -0.175    -0.168    -0.151    -0.152
 BLACK        0.130     0.117     0.143     0.136     0.128
 HISPANIC     0.577     0.094     0.069     0.051     0.052
 ASIAN        0.296    -0.006    -0.034    -0.016    -0.017
 OTHER        0.569     0.012     0.009     0.001     0.023

     Correlations
              BMI01     BMI02     BMI03     WHITE     BLACK
           ________  ________  ________  ________  ________
 BMI01        1.000
 BMI02        0.802     1.000
 BMI03        0.820     0.753     1.000
 WHITE       -0.167    -0.162    -0.149     1.000
 BLACK        0.126     0.108     0.118    -0.597     1.000
 HISPANIC     0.066     0.090     0.053    -0.509    -0.257
 ASIAN        0.003    -0.003     0.003    -0.144    -0.073
 OTHER        0.027     0.004     0.024    -0.191    -0.097

     Correlations
           HISPANIC     ASIAN     OTHER
           ________  ________  ________
 HISPANIC     1.000
 ASIAN       -0.062     1.000
 OTHER       -0.082    -0.023     1.000
It is always important to compare these values to those you had using your standard statistical
package.
A Growth Curve
Estimating a basic growth curve using Mplus is quite easy. When developing a complex model it
is best to start simply and gradually build complexity. Starting simply should include data screening to
evaluate the distributions of the variables, patterns of missing values, and possible outliers. We
will start by fitting a basic growth curve. Even if you have a theoretically specified model that is
complex, always start with the simplest model and gradually add complexity. Here we will
show how structural equation modeling conceptualizes a latent growth curve, show the Mplus
program, explain the new program features, and interpret the output.
Before showing a figure to represent a growth curve, we will examine a small sample of our
observations:
[Figure: observed BMI trajectories for a sample of 10 adolescents; X-axis: wave (0 to 6), Y-axis: BMI]
A BMI value of 25 is considered overweight and a BMI of 30 is considered obese. With just 10
observations it is hard to see much of a trend, but it looks like people have a higher BMI
score as they get older. The X-axis value of 0 is when the adolescent was 12 years old, the 1 is
when the adolescent was 13 years old, etc. We are using seven waves of data (labeled 0 to 6) from
the panel study. We will see how to create these graphs shortly.
A growth curve requires us to have a model and we should draw this before writing the Mplus
program. Figure 1 shows a model for our simple growth curve:
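[Figure 1. Linear latent growth curve model. Two latent variables, Intercept and Slope, have paths
to BMI97 through BMI03. The Intercept paths are all fixed at 1; the Slope paths are fixed at 0, 1, 2,
3, 4, 5, and 6. RI and RS are the variances of the Intercept and Slope, and e97 through e03 are the
error terms of the BMI variables.]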
This figure is much simpler than it first appears.
• The key variables are the two latent variables labeled the Intercept and the Slope.
• The intercept represents the initial level and is sometimes called the initial level for this reason.
It is the estimated initial level, and its value may differ from the actual mean for BMI97 because
in this case we have a linear growth model. It may differ from the mean of BMI97 by a lot
when covariates are added because of the adjustments for the covariates.
• Unless the covariates are centered, it usually makes sense to just call it an intercept rather than
the initial level. The intercept is identified by the constant loadings of 1.0 going to each BMI
score. Some programs call the intercept the constant, representing the constant effect.
• The slope is identified by fixing the values of the paths to each BMI variable. In a publication
you normally would not show the path to BMI97, since this is fixed at 0.0.
• We fix the other paths at 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0. Where did we get these values? The
first year is the base year or year zero. The BMI was measured each subsequent year so these
are scored 1.0 through 6.0. Other values are possible. Suppose the survey was not done in 2000
or 2001 so that we had 5 time points rather than 7. We would use paths of 0.0, 1.0, 2.0, 5.0, and
6.0 for years 1997, 1998, 1999, 2002, and 2003.
• It is also possible to fix the first couple of years and then allow the subsequent waves to be free.
This might make sense for a developmental process where the yearly intervals may not reflect
the developmental rate. Developmental time may be quite different from chronological time.
This has the effect of “stretching” or “shrinking” time to the pattern of the data (Curran &
Hussong, 2003). An advantage of this approach is that it can use fewer degrees of freedom than
adding a quadratic slope.
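The rule for choosing the fixed slope paths (years elapsed since the first wave) is simple enough to write down as a small Python helper. This is just a sketch of the arithmetic; the function name time_scores is my own and is not an Mplus feature.

```python
def time_scores(years):
    """Return the fixed slope paths: years elapsed since the first wave."""
    base = years[0]
    return [float(year - base) for year in years]


# All seven waves, measured every year
print(time_scores([1997, 1998, 1999, 2000, 2001, 2002, 2003]))
# [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

# Five waves, with no survey in 2000 or 2001
print(time_scores([1997, 1998, 1999, 2002, 2003]))
# [0.0, 1.0, 2.0, 5.0, 6.0]
```

The gap between 2.0 and 5.0 in the second case is what keeps the slope in units of one year even though two waves are missing.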
The individuals in our sample will each have their own
• BMI score for each year,
• Intercept, and
• Slope.
The mean Intercept and mean Slope represent the overall trend.
Features to notice in the figure:
• The individual variation around the Intercept and Slope is represented in Figure 1 by RI
and RS. These are the variances of the intercept and slope around their respective means.
• We expect there would be substantial variance in both of these as some individuals have a
higher or lower starting BMI and some individuals will increase (or decrease) their BMI at a
different rate than the average growth rate.
• In addition to the mean intercept and slope, each individual will have their own intercept and
slope. We say the intercept and the slope are random effects.
a. They are random in the sense that each individual may have a steeper or flatter slope than
the mean slope and
b. Each individual may have a higher or lower initial level than the mean intercept.
c. In our sample of 10 individuals shown above, notice one adolescent starts with a BMI
around 12 and three adolescents start with a BMI around 30. Some have a BMI that
increases and others do not.
• The variances, RI and RS, are critical if we are going to explore more complex models with
covariates (e.g., gender, psychological problems, race) that might explain why some
individuals have a steeper or less steep growth rate than the average.
• The ei terms represent individual error terms for each year. In some years a score may fall above or
below the growth trend described by our Intercept and Slope. Sometimes it might be important
to allow error terms to be correlated, especially adjacent pairs such as e97-e98, e98-e99, etc.
This is all there is to conceptualizing a growth model within an SEM framework. This is an
equivalent conceptualization to studying growth curves using a multilevel approach.
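To make the conceptual model concrete, it can be written as BMI(t) = Intercept + path(t) x Slope + e(t) for each individual. Under that equation, with uncorrelated errors, the model implies a mean and a covariance for every pair of waves. The Python sketch below computes these implied moments. All parameter values in it are made up for illustration (they are not estimates from the NLSY97 data), and the function names are my own.

```python
def implied_means(paths, mean_i, mean_s):
    """Model-implied mean at each wave: mean intercept + path * mean slope."""
    return [mean_i + lam * mean_s for lam in paths]


def implied_cov(paths, var_i, var_s, cov_is, error_vars):
    """Model-implied covariance matrix of the repeated measures.

    cov(y_t, y_u) = var_i + (l_t + l_u) * cov_is + l_t * l_u * var_s,
    plus the error variance on the diagonal (errors assumed uncorrelated).
    """
    n = len(paths)
    cov = [[0.0] * n for _ in range(n)]
    for t in range(n):
        for u in range(n):
            cov[t][u] = (var_i + (paths[t] + paths[u]) * cov_is
                         + paths[t] * paths[u] * var_s)
            if t == u:
                cov[t][u] += error_vars[t]
    return cov


paths = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0]  # fixed slope paths for seven waves

# Hypothetical parameter values, chosen only to keep the arithmetic easy
means = implied_means(paths, mean_i=20.0, mean_s=0.5)
cov = implied_cov(paths, var_i=9.0, var_s=0.25, cov_is=0.5, error_vars=[2.0] * 7)

print(means)      # [20.0, 20.5, 21.0, 21.5, 22.0, 22.5, 23.0]
print(cov[0][0])  # 11.0 = intercept variance + error variance at the first wave
print(cov[6][6])  # 26.0, variance grows with time when var_s and cov_is are positive
```

Fitting the model amounts to choosing the means, variances, covariance, and error variances so that these implied moments come as close as possible to the sample means and covariances.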
Here is the Mplus program:
Title:
bmi_growth.inp
Stata2Mplus convertsion for F:\flash\academica\bmi_stata.dta
Data:
File is "F:\flash\academica\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 white black hispanic asian other;
Missing are all (-9999) ;
! usevariables is limited to bmi variables
Usevariables are bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 ;
Model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
Sampstat Mod(3.84);
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
What is new in this program?
• The first change is that we modify the Usevariables are subcommand to only include
the bmi variables since we are doing a growth curve for these variables.
• We drop the Analysis: section because we are doing a basic growth curve and can use the
default options.
• We have added a Model: section because we need to describe the model. Because Mplus was
a late arrival to SEM software, it was designed after growth curves were well understood.
• Instead of requiring us to trick the program into doing a growth curve, Mplus has a simple built-in way of doing
this that matches the assumptions that fit our model. There is a single line to describe our
model:
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
a. In this line the “i” and “s” stand for intercept and slope. We could have called these
anything such as intercept and slope or initial and trend. The vertical line, | ,
tells Mplus that it is about to define an intercept and slope.
b. There are defaults that we do not need to note. For example, the intercept is defined by a
constant of 1.0 for each bmi variable. This is normally the case, so it is a default.
c. The slope is defined by fixing the path from the slope to bmi97 at 0, the path to bmi98 at
1, etc. The @ sign is used for “at.” Don’t forget the semicolon to end the command.
• Mplus assumes there is random error, ei, for each variable and that these are uncorrelated.
• If we wanted to allow e97 and e98 to be correlated we would need to add a line saying bmi97
with bmi98; . This may seem strange because we are not really correlating bmi97 with
bmi98, but e97 with e98. Mplus knows this and we do not need to generate a separate set of
names for the error terms.
• Mplus also assumes that there is a residual variance for both the intercept and slope (RI and RS)
and that these covary. Therefore, we do not need to mention this.
The last additional sections in our Mplus program select what output we want Mplus to
provide. There are many optional outputs of the program and we will only illustrate a few of these.
The Output: and Plot: sections have the following lines
Output:
Sampstat Mod(3.84);
Plot:
Type is Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
• The first line, Sampstat Mod(3.84), asks for sample statistics and modification indices for
parameters we might free, as long as doing so would reduce chi-square by at least 3.84 (the
critical value at the .05 level for 1 degree of freedom). We do not bother with parameters that
would have less effect than this.
• Next comes the Plot: section, and we say that we want Type is Plot3; for our
output. This gives us the descriptive statistics and graphs for the growth curve.
• The last line of the program specifies the series to plot. By entering the variables with an (*) at
the end we are setting a path at 0.0 for bmi97, 1.0 for bmi98, etc.
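Where does the 3.84 come from? Freeing one parameter costs one degree of freedom, and a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so the .05 critical value is 1.96 squared. The short Python check below uses only the standard library; the function name chi2_critical_1df is my own.

```python
from statistics import NormalDist


def chi2_critical_1df(alpha=0.05):
    """Critical value of chi-square with 1 df: the squared two-tailed z value."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * z


print(round(chi2_critical_1df(0.05), 2))  # 3.84
print(round(chi2_critical_1df(0.01), 2))  # 6.63
```

A stricter threshold such as 6.63 (the .01 level) could be used in place of 3.84 if the list of modification indices is too long to be useful.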
Annotated Selected Growth Curve Output
The following is selected output with comments:
Number of observations                      1102   ! listwise, an alternative is FIML estimation
Number of dependent variables                  7   ! these are the bmi scores
Number of independent variables                0
Number of continuous latent variables          2   ! these are the intercept and slope
Continuous latent variables
   I           S                                   ! These are the only latent variables
Estimator                                     ML
Estimator
ML
TESTS OF MODEL FIT
!These have the standard interpretations. It is okay if the fit is not perfect here
because when we add the covariates we may get a better fit. The chi-square is
significant, as it usually is for a large sample, because no model is likely to be
a perfect fit for the data. However, the CFI = .977 and TLI = .979 are both in the very
good range (i.e., over .96 is very good). The RMSEA is .088 and this is not very
good. Ideally, this should be below .06, and a value that is not below .08 is
considered problematic. The SRMR = .048 is acceptable (less than .05).
Chi-Square Test of Model Fit
  Value                            220.570
  Degrees of Freedom                    23
  P-Value                           0.0000
Chi-Square Test of Model Fit for the Baseline Model
  Value                           8568.499
  Degrees of Freedom                    21
  P-Value                           0.0000
CFI/TLI
  CFI                                0.977
  TLI                                0.979
RMSEA (Root Mean Square Error Of Approximation)
  Estimate                           0.088
  90 Percent C.I.                    0.078  0.099
  Probability RMSEA <= .05           0.000
SRMR (Standardized Root Mean Square Residual)
  Value                              0.048
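The CFI, TLI, and RMSEA above can be reproduced from the two chi-square tests and the sample size. The following is a minimal sketch that uses the usual textbook formulas; it is an assumption that these match what Mplus computes internally, and the function name fit_indices is made up for this illustration.

```python
import math

def fit_indices(chi2, df, chi2_base, df_base, n):
    """Return (CFI, TLI, RMSEA) from the model and baseline chi-squares."""
    d_model = max(chi2 - df, 0.0)            # misfit of the fitted model
    d_base = max(chi2_base - df_base, 0.0)   # misfit of the baseline model
    cfi = 1.0 - d_model / max(d_base, d_model, 1e-12)
    tli = (chi2_base / df_base - chi2 / df) / (chi2_base / df_base - 1.0)
    rmsea = math.sqrt(d_model / (df * n))
    return cfi, tli, rmsea

# Values taken from the linear growth model output (listwise N = 1102)
cfi, tli, rmsea = fit_indices(220.570, 23, 8568.499, 21, n=1102)
print(f"CFI = {cfi:.3f}, TLI = {tli:.3f}, RMSEA = {rmsea:.3f}")
```

The CFI and TLI compare the model with the baseline model, whereas the RMSEA depends only on the model chi-square, its degrees of freedom, and N. That is why the RMSEA can look poor while the CFI and TLI look good.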
MODEL RESULTS
                   Estimates      S.E.  Est./S.E.
! the I and S are all fixed so no tests for them.
 I        |
    BMI97              1.000     0.000      0.000
    BMI98              1.000     0.000      0.000
    BMI99              1.000     0.000      0.000
    BMI00              1.000     0.000      0.000
    BMI01              1.000     0.000      0.000
    BMI02              1.000     0.000      0.000
    BMI03              1.000     0.000      0.000
 S        |
    BMI97              0.000     0.000      0.000
    BMI98              1.000     0.000      0.000
    BMI99              2.000     0.000      0.000
    BMI00              3.000     0.000      0.000
    BMI01              4.000     0.000      0.000
    BMI02              5.000     0.000      0.000
    BMI03              6.000     0.000      0.000
! The slope and intercept are correlated, the covariance is
! .416, z = 5.551, p < .001 (WITH means covariance in Mplus)
 S        WITH
    I                  0.416     0.075      5.551
!Initial level, intercept = 20.798, (BMI starts at 20.798) z = 178.026; p < .001
!Slope = .668 (BMI goes up .668 each year), z = 35.183; p < .001
 Means
    I                 20.798     0.117    178.026
    S                  0.668     0.019     35.183
 Intercepts
    BMI97              0.000     0.000      0.000
    BMI98              0.000     0.000      0.000
    BMI99              0.000     0.000      0.000
    BMI00              0.000     0.000      0.000
    BMI01              0.000     0.000      0.000
    BMI02              0.000     0.000      0.000
    BMI03              0.000     0.000      0.000
! Variances, Ri and Rs in the figure, are both significant. This is what covariates
will try to explain: why do some youth start higher/lower and have a different
trend, i.e., slope, for the BMI?
 Variances
    I                 13.184     0.643     20.504
    S                  0.213     0.018     12.147
! Following are the residual variances for the observed variables, hence they are
the errors, ei’s in our figure.
 Residual Variances
    BMI97              5.391     0.290     18.583
    BMI98              2.729     0.159     17.124
    BMI99              2.697     0.144     18.752
    BMI00              3.529     0.178     19.860
    BMI01              2.334     0.144     16.187
    BMI02              9.533     0.457     20.837
    BMI03              7.134     0.397     17.956
MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index     3.840
                          M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.
! Many of these changes make no sense. We could let the path of the slope to
BMI03 be free and chi-square would drop by about 45 points.
BY Statements
I        BY BMI97       87.808     -0.038      -0.139        -0.032
I        BY BMI99       25.404      0.013       0.049         0.011
I        BY BMI00       21.840      0.014       0.050         0.011
I        BY BMI03       29.103     -0.026      -0.093        -0.016
S        BY BMI97       55.850     -0.870      -0.402        -0.093
S        BY BMI99       17.773      0.315       0.145         0.034
S        BY BMI00       18.572      0.352       0.162         0.035
S        BY BMI03       44.611     -0.915      -0.423        -0.074
! When Mplus has a value it can’t compute it prints 999.000. Normally ignore these
ON/BY Statements
S        ON I    /
I        BY S          999.000      0.000       0.000         0.000
! These “with” statements are for correlated errors. Some make sense, some don’t.
WITH Statements
                          M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.
BMI99    WITH BMI97      4.993     -0.349      -0.349        -0.019
BMI99    WITH BMI98      8.669      0.362       0.362         0.020
BMI00    WITH BMI97      3.912     -0.322      -0.322        -0.016
BMI00    WITH BMI99     17.357      0.503       0.503         0.026
BMI01    WITH BMI97      8.255     -0.421      -0.421        -0.021
BMI01    WITH BMI98      7.032     -0.300      -0.300        -0.015
BMI01    WITH BMI00     12.398      0.447       0.447         0.021
BMI02    WITH BMI97      4.707      0.560       0.560         0.023
BMI02    WITH BMI99      5.455     -0.431      -0.431        -0.018
BMI02    WITH BMI00      9.829     -0.649      -0.649        -0.025
BMI02    WITH BMI01      4.305      0.413       0.413         0.015
BMI03    WITH BMI97     36.224      1.488       1.488         0.060
BMI03    WITH BMI99      9.296     -0.525      -0.525        -0.021
BMI03    WITH BMI00      8.824     -0.583      -0.583        -0.022
BMI03    WITH BMI02      8.242      0.931       0.931         0.029
! We do not pay much attention to these intercepts because Mplus automatically fixes
them at zero. Before freeing these, it would make more sense to free some of the
coefficients for slopes, e.g., 0, 1, *, *, *, *, * or to try a quadratic slope as discussed in a
later section.
Means/Intercepts/Thresholds
                          M.I.     E.P.C.  Std E.P.C.  StdYX E.P.C.
[ BMI97    ]            79.520     -0.770      -0.770        -0.179
[ BMI99    ]            19.737      0.250       0.250         0.058
[ BMI00    ]            17.444      0.257       0.257         0.056
[ BMI03    ]            23.066     -0.483      -0.483        -0.084
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated values)
Scatterplots (sample values, estimated factor scores, estimated values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
Here are Some of the Available Plots
It is often useful to show the actual values for a small random sample of participants. These are
the Observed Individual Values.
• Click on Graphs
• Observed Individual Values
This gives you a menu where you can make some selections. I used the clock to seed the random
selection of observations.
Here I selected Random Order and for 20 cases. This results in the following graph:
This shows one person who started at an obese BMI = 30 and then dropped down. However, most
people increased gradually.
Next, let's look at a plot of the actual means and the estimated means using our linear growth
model. Click on
• Graphs and then select
• Sample and estimated means.
You can improve this graph. You might click on the legend and move it so it is not over the trend
lines. You can right click inside the graph and add labels for the X axis and Y axis. You can
change the labels, and you can adjust the range for each axis.
Notice that there is a clear growth trend in BMI. A BMI of 15-20 is considered healthy and a BMI
of 25 is considered overweight. Notice what happens to American youth between the age of 12
and the age of 18.
A Growth Curve with a Quadratic Term
This graph is useful for seeing whether there is a nonlinear trend. It is simple to add a quadratic
term if the curve is departing from linearity. Looking at the graph, it may seem that the linear trend
works very well, but our RMSEA was a bit big and the estimated initial BMI is higher than the
observed mean. A quadratic might pick this up by having a curve that drops slightly to pick up the
BMI97 mean.
The conceptual model in Figure 1 will be unchanged except that a third latent variable is added. We
will have the Intercept, the Slope (now called the Linear trend), and the new latent variable called
the Quadratic trend. Like the first two, the Quadratic trend will have a residual variance (RQ) that
will covary freely with RI and RL. The paths from the Quadratic trend to the individual BMI
variables will be the square of the paths from the Linear trend to the BMI variables. Hence the
values for the linear trend will remain 0.0, 1.0, 2.0, 3.0, 4.0, 5.0, and 6.0. For the quadratic these
values will be 0.0, 1.0, 4.0, 9.0, 16.0, 25.0, and 36.0.
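As a small illustration of the rule just described, the fixed paths for the three latent variables can be generated directly from the wave order. This is only a sketch; the variable names are chosen for this example and are not part of any Mplus program.

```python
# Wave names as used in the Mplus program; one wave per year, 1997-2003
waves = ["bmi97", "bmi98", "bmi99", "bmi00", "bmi01", "bmi02", "bmi03"]

intercept = [1.0] * len(waves)                   # every wave has a path of 1 from the Intercept
linear = [float(t) for t in range(len(waves))]   # 0, 1, ..., 6
quadratic = [t ** 2 for t in linear]             # squares of the linear paths

print(f"{'wave':<7}{'I':>5}{'L':>5}{'Q':>6}")
for name, i, l, q in zip(waves, intercept, linear, quadratic):
    print(f"{name:<7}{i:>5.1f}{l:>5.1f}{q:>6.1f}")
```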
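[Figure: conceptual model with a quadratic term. The Intercept (residual variance RI, all paths fixed
at 1), Linear (residual variance RL, paths 0, 1, 2, 3, 4, 5, 6), and Quadratic (residual variance RQ,
paths 0, 1, 4, 9, 16, 25, 36) latent variables each point to BMI97-BMI03, which have error terms
e97-e03.]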
You really appreciate the defaults in Mplus when you see what we need to change in the Mplus
program when we add a quadratic slope. Here is the only change we need to make:
Model:
  i s q | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Mplus will know that the quadratic, q (we could use any name) will have values that are the square
of the values for the slope, s.
Here is selected output:
TESTS OF MODEL FIT
! We have lost 4 degrees of freedom because we estimate four more parameters:
!   • mean for the quadratic slope,
!   • variance for the quadratic slope,
!   • covariance of Rq with Ri,
!   • covariance of Rq with Rs.
! The fit is excellent.
Chi-Square Test of Model Fit
  Value                             61.791  ! Was 220.570
  Degrees of Freedom                    19  ! Was 23
  P-Value                           0.0000
Chi-Square Test of Model Fit for the Baseline Model
  Value                           8568.499
  Degrees of Freedom                    21
  P-Value                           0.0000
CFI/TLI
  CFI                                0.995  ! Was .977
  TLI                                0.994  ! Was .979
RMSEA (Root Mean Square Error Of Approximation)
  Estimate                           0.045  ! Was .088
  90 Percent C.I.                    0.033  0.058
  Probability RMSEA <= .05           0.715
SRMR (Standardized Root Mean Square Residual)
  Value                              0.022
MODEL RESULTS
                   Estimates      S.E.  Est./S.E.
! Results for I and S are same as above. The paths for Q are simply the squared
values
 Q        |
    BMI97              0.000     0.000      0.000
    BMI98              1.000     0.000      0.000
    BMI99              4.000     0.000      0.000
    BMI00              9.000     0.000      0.000
    BMI01             16.000     0.000      0.000
    BMI02             25.000     0.000      0.000
    BMI03             36.000     0.000      0.000
 S        WITH
    I                  0.575     0.220      2.616
 Q        WITH
    I                 -0.038     0.034     -1.116
    S                 -0.130     0.021     -6.324
! The Negative slope, -.064, for quadratic suggests a leveling off of the growth
curve.
 Means
    I                 20.439     0.118    173.266
    S                  1.045     0.049     21.108
    Q                 -0.064     0.008     -8.183
 Variances
    I                 12.381     0.671     18.462
    S                  0.984     0.134      7.357
    Q                  0.023     0.004      6.412
 Residual Variances
    BMI97              4.318     0.316     13.660
    BMI98              2.789     0.158     17.613
    BMI99              2.442     0.141     17.357
    BMI00              3.187     0.173     18.418
    BMI01              2.354     0.147     16.022
    BMI02              9.521     0.454     20.948
    BMI03              4.989     0.491     10.157
The fit is so good because the estimated means and observed means are so close. However, there
is still significant variance among individual adolescents that needs to be explained. Here are 20
estimated individual growth curves. Notice that each of these is a curve, but they start at different
initial levels and have different trajectories. Next, we want to use covariates to explain these
differences in the initial levels and growth trajectories.
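To see what the estimated means imply, the average trajectory can be computed as I + S*t + Q*t^2 for t = 0 to 6. The sketch below uses the means reported in the linear and quadratic outputs; the helper implied_mean is a name made up for this illustration.

```python
# Estimated means from the linear and the quadratic model output
linear_est = {"i": 20.798, "s": 0.668}
quad_est = {"i": 20.439, "s": 1.045, "q": -0.064}

def implied_mean(t, i, s, q=0.0):
    """Model-implied mean BMI at time score t (0 = 1997)."""
    return i + s * t + q * t ** 2

rows = []
for t, year in enumerate(range(1997, 2004)):
    rows.append((year, implied_mean(t, **linear_est), implied_mean(t, **quad_est)))

print("year  linear  quadratic")
for year, lin, quad in rows:
    print(f"{year}  {lin:6.2f}  {quad:9.2f}")

# Time score at which the quadratic curve would stop rising
turning_point = -quad_est["s"] / (2 * quad_est["q"])
print(f"turning point at t = {turning_point:.2f}")
```

The turning point falls at about t = 8, which is beyond the last wave (t = 6). Within the observed years the negative quadratic mean therefore describes growth that slows down, not a decline.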
An Alternative to Use of a Quadratic Slope
An alternative to adding a quadratic slope is to allow some of the time loadings to be
free. We have used loadings of 0, 1, 2, 3, 4 for the linear slope and 0, 1, 4, 9, 16 for
the quadratic slope. Alternatively, we could allow all but two of the loadings to be
free. We might use loadings of 0, 1, *, *. It is necessary to have the 0 and 1 fixed, but
the 1 does not have to be second; we could use 0, *, *, 1.
You may ask how you could justify allowing some of the time loadings to be free if
there was a one month or one year difference between waves of data. The answer is
that developmental time may be different than chronological time. Allowing these
loadings to be free has an advantage over the quadratic in that it uses fewer degrees of
freedom but still allows for growth spurts. This model is not nested under a quadratic,
but you could think of a linear growth model with fixed values for each year (0, 1, 2,
3, 4) being nested within the free model that uses 0, 1, *, *. If the free model fits much
better than the fixed linear model, you might use this instead of the quadratic model.
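The degrees of freedom for these competing models can be counted by hand: the data supply one mean per wave plus the variances and covariances among the waves, and each model uses up growth factor means, growth factor variances and covariances, residual variances, and any free loadings. The sketch below does this counting; the function growth_df is a hypothetical helper, and it assumes uncorrelated errors and no covariates.

```python
def growth_df(waves, n_factors, free_loadings=0):
    """Degrees of freedom of a growth model with uncorrelated errors."""
    moments = waves * (waves + 1) // 2 + waves       # variances/covariances + means
    params = (n_factors                              # growth factor means
              + n_factors * (n_factors + 1) // 2     # factor variances and covariances
              + waves                                # residual variances
              + free_loadings)                       # time loadings marked with *
    return moments - params

# Seven waves, as in the BMI example
print("linear, 7 waves:   ", growth_df(7, 2))        # matches the 23 reported above
print("quadratic, 7 waves:", growth_df(7, 3))        # matches the 19 reported above

# Four waves with loadings 0, 1, *, *
print("linear, 4 waves:   ", growth_df(4, 2))
print("free, 4 waves:     ", growth_df(4, 2, free_loadings=2))
print("quadratic, 4 waves:", growth_df(4, 3))
```

With four waves the free-loading model costs two degrees of freedom and the quadratic costs four. The comparison depends on the number of waves: with seven waves and five free loadings, the same counting gives growth_df(7, 2, free_loadings=5) = 18, one fewer than the quadratic model.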
[Figure: conceptual model with free time loadings. The Intercept (residual variance RI, all paths
fixed at 1) and Slope (residual variance RS, paths 0, 1, *, *, *, *, *) latent variables each point
to BMI97-BMI03, which have error terms e97-e03.]
Working with Missing Values
Mplus has two ways of working with missing values. The simplest is to use full information
maximum likelihood estimation with missing values (FIML). This uses all available data. For
example, some adolescents were interviewed all seven years but others may have skipped one, two,
or even more years. We use all available information with this approach. The second approach is
to utilize multiple imputation.
• Multiple imputation should not be confused with the single imputation available from SPSS (if a
person purchases its missing values module), which gives incorrect standard errors.
• Multiple imputation involves
a. Imputing multiple datasets (usually 5-10) using appropriate procedures,
b. Estimating the model for each of these datasets, and
c. Then pooling the estimates and standard errors.
When the standard errors are pooled this way, they incorporate the variability across the 5-10
solutions and thereby produce unbiased estimates of standard errors. Multiple imputation
can be done with:
• Norm, a freeware program that works for normally distributed, continuous variables and is
often used even on dichotomized variables.
• ICE, a user-written Stata program that implements the S-Plus program MICE and has
advantages over Norm. It does the imputation by using different estimation models for outcome
variables that are continuous, counts, or categorical. See Royston (2005).
• Mplus can read these multiple datasets, estimate the model for each dataset, and pool the
estimates and their standard errors.
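The pooling in step c is usually done with Rubin's rules: the pooled estimate is the average across datasets, and the pooled variance adds the average squared standard error to the between-dataset variance of the estimates. The sketch below illustrates this; the five estimates and standard errors are made-up numbers, not results from the BMI data, and pool_rubin is a hypothetical helper name.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, std_errors):
    """Pool one parameter across m imputed datasets using Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)
    within = mean(se ** 2 for se in std_errors)   # average sampling variance
    between = variance(estimates)                 # variance across imputations (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)

# Hypothetical slope estimates from five imputed datasets
estimates = [0.66, 0.70, 0.68, 0.71, 0.65]
std_errors = [0.020, 0.020, 0.020, 0.020, 0.020]

pooled, pooled_se = pool_rubin(estimates, std_errors)
print(f"pooled estimate = {pooled:.3f}, pooled S.E. = {pooled_se:.3f}")
```

The pooled standard error is larger than any single-dataset standard error because it also reflects how much the estimates move from one imputed dataset to the next. This is the variability that single imputation ignores.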
We will not illustrate the multiple imputation approach because that involves working with other
programs to impute the datasets. However, the Mplus User’s Guide discusses how you specify the
datasets in the Data: section. We will illustrate the FIML approach because it is widely used
and easily implemented, and it doesn’t require explaining another software package.
The conceptual model does not change with missing values. The programming for implementing
the FIML solution changes very little. You will recall that we did not need an Analysis:
section in our program for doing a growth curve. However, we do need one when we are doing a
growth curve with missing values and using FIML estimation. Directly above the Model
command we insert
Analysis:
  Type = General Missing H1 ;
  Estimator = MLR ;
• Type = General Missing H1; this line is the key change.
• The missing tells Mplus to do the full information maximum likelihood estimation.
• The H1 is necessary to get sample statistics in our output.
• We could do this with maximum likelihood estimation, but will use a robust maximum
likelihood estimator, Estimator = MLR, instead. This is optional, but generally
conservative when you have substantial missing values.
In the Output: section, we also add a single word, patterns. This will give us a lot of
information about patterns of missing values. We will see just what patterns there are, the
frequency of occurrence of each pattern, and the percentage of data present for each covariance
estimate.
Output:
  Sampstat Mod(3.84) patterns ;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Also, to simplify our presentation we will take out the quadratic term (the fit is better with the
quadratic term, but it takes more space to present and interpret the results).
Here are selected, annotated results:
*** WARNING
Data set contains cases with missing on all variables.
These cases were not included in the analysis.
Number of cases with missing on all variables: 3
1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                          1768  ! We had 1102 observations using listwise deletion.
Number of dependent variables                      7
Number of independent variables                    0
Number of continuous latent variables              2
Observed dependent variables
  Continuous
   BMI97    BMI98    BMI99    BMI00    BMI01
   BMI02    BMI03
Continuous latent variables
   I        S
Estimator                                        MLR  ! Robust ML estimator
Information matrix                          OBSERVED
Maximum number of iterations                    1000
Convergence criterion                      0.500D-04
Maximum number of steepest descent iterations     20
Maximum number of iterations for H1             2000
Convergence criterion for H1               0.100D-03
! An ‘x’ means the data are present. Pattern 1 -- no missing values
! Pattern 2 -- missing BMI03
SUMMARY OF MISSING DATA PATTERNS
MISSING DATA PATTERNS
[Table: the 81 missing data patterns (columns 1-81) by the variables BMI97-BMI03 (rows); an x marks
a variable that is present in that pattern.]
MISSING DATA PATTERN FREQUENCIES
 Pattern  Frequency    Pattern  Frequency    Pattern  Frequency
     1       1102         28          2         55         26
     2         97         29         10         56         53
     3         73         30         51         57          9
     4         38         31          4         58          9
     5         21         32          3         59          2
     6         11         33          1         60          4
     7          5         34          1         61          1
     8         20         35          1         62          4
     9         23         36          3         63          1
    10          4         37          6         64          3
    11          8         38          1         65          5
    12          3         39          1         66          1
    13          8         40          1         67          1
    14          3         41          3         68          1
    15         11         42          6         69          1
    16         25         43          3         70          2
    17          6         44          1         71          1
    18          3         45          1         72         14
    19          2         46          2         73          1
    20          3         47          1         74          1
    21          1         48          6         75          2
    22          1         49          3         76          1
    23          2         50          2         77          1
    24          7         51          3         78          7
    25          1         52          3         79          1
    26          1         53          3         80          2
    27          6         54          3         81          4
! We might want to set some minimum standard and drop observations that do not
meet that. For example, we might drop people who are missing their BMI for more
than 3 waves.
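A quick tally of the pattern frequencies is a useful check before setting any such standard. The sketch below simply adds up the 81 frequencies reported above and computes the share of cases in Pattern 1 (no missing values); the variable names are chosen for this illustration.

```python
# Frequencies for patterns 1-81, copied from the Mplus output above
frequencies = [
    1102, 97, 73, 38, 21, 11, 5, 20, 23, 4, 8, 3, 8, 3, 11, 25, 6, 3, 2, 3,
    1, 1, 2, 7, 1, 1, 6,                                                      # patterns 1-27
    2, 10, 51, 4, 3, 1, 1, 1, 3, 6, 1, 1, 1, 3, 6, 3, 1, 1, 2, 1, 6, 3, 2,
    3, 3, 3, 3,                                                               # patterns 28-54
    26, 53, 9, 9, 2, 4, 1, 4, 1, 3, 5, 1, 1, 1, 1, 2, 1, 14, 1, 1, 2, 1, 1,
    7, 1, 2, 4,                                                               # patterns 55-81
]

total_cases = sum(frequencies)
complete_cases = frequencies[0]          # Pattern 1: data present at every wave
share_complete = complete_cases / total_cases

print(f"{len(frequencies)} patterns, {total_cases} cases in total")
print(f"complete cases: {complete_cases} ({share_complete:.1%})")
```

The total agrees with the 1768 observations used in the FIML analysis, and Pattern 1 accounts for the 1102 cases that listwise deletion would have kept.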
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value   0.100
PROPORTION OF DATA PRESENT
  Covariance Coverage
            BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
  BMI97     0.925
  BMI98     0.847   0.902
  BMI99     0.850   0.856   0.910
  BMI00     0.842   0.846   0.864   0.906
  BMI01     0.839   0.837   0.854   0.859   0.904
  BMI02     0.796   0.794   0.805   0.811   0.817   0.861
  BMI03     0.777   0.775   0.788   0.788   0.801   0.774   0.840
! We have 77.4% of the 1768 observations answering both BMI02 and BMI03
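Covariance coverage is simply the proportion of cases that have a value on both variables of a pair; the diagonal is the proportion with a value on that one variable. The sketch below shows the calculation on five hypothetical cases (None marks a missing value); neither the data nor the function name come from the BMI analysis.

```python
# Hypothetical cases; None marks a missing BMI value
cases = [
    {"bmi02": 24.1, "bmi03": 24.9},
    {"bmi02": 22.0, "bmi03": None},
    {"bmi02": None, "bmi03": 26.3},
    {"bmi02": 27.5, "bmi03": 28.0},
    {"bmi02": 23.3, "bmi03": None},
]

def coverage(rows, a, b):
    """Proportion of rows with non-missing values on both a and b."""
    both = sum(1 for r in rows if r[a] is not None and r[b] is not None)
    return both / len(rows)

cov_02_03 = coverage(cases, "bmi02", "bmi03")   # off-diagonal entry
cov_02_02 = coverage(cases, "bmi02", "bmi02")   # diagonal entry
print(f"bmi02 with bmi03: {cov_02_03:.3f}")
print(f"bmi02 alone:      {cov_02_02:.3f}")
```

An off-diagonal entry can never exceed either of its diagonal entries, because a case must be present on each variable to be present on both.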
SAMPLE STATISTICS
! Notice that the means are not dramatically different from the results of the
“basic” analysis that had the 1102 observations using listwise deletion. This is
reassuring that our missing values are not creating a systematic bias.
     Means
           BMI97     BMI98     BMI99     BMI00     BMI01     BMI02     BMI03
  1       20.572    21.839    22.651    23.305    23.846    24.390    24.935
TESTS OF MODEL FIT
! If you compare nested models with MLR estimation you need to use the scaling
correction factor as discussed on their web page. We are not doing that here, so
this is okay.
Chi-Square Test of Model Fit
  Value                            116.426*
  Degrees of Freedom                    23
  P-Value                           0.0000
  Scaling Correction Factor          2.302
    for MLR
* The chi-square value for MLM, MLMV, MLR, ULS, WLSM and WLSMV cannot be used
  for chi-square difference tests. MLM, MLR and WLSM chi-square difference
  testing is described in the Mplus Technical Appendices at www.statmodel.com.
  See chi-square difference testing in the index of the Mplus User's Guide.
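! The chi-square reported here is a scaled (MLR) value, so it cannot be compared directly
with the ML chi-square of 220.570 from listwise deletion. Multiplying it by the scaling
correction factor (116.426 x 2.302 = about 268) gives an unscaled value that is bigger, in part
because the sample is so much bigger. Still there are some fit problems
without the quadratic term. Both the CFI and TLI are a bit low to be ideal (under
.96). However the RMSEA is good and that is the most widely used measure of fit.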
Chi-Square Test of Model Fit for the Baseline Model
  Value                           1279.431
  Degrees of Freedom                    21
  P-Value                           0.0000
CFI/TLI
  CFI                                0.926
  TLI                                0.932
RMSEA (Root Mean Square Error Of Approximation)
  Estimate                           0.048
SRMR (Standardized Root Mean Square Residual)
  Value                              0.051
! The results are similar to the linear model solution with listwise deletion, but our
z-scores are bigger due to having more observations.
                   Estimates      S.E.  Est./S.E.
 S        WITH
    I                  0.408     0.112      3.658
 Means
    I                 21.035     0.105    200.935
    S                  0.701     0.022     32.311
 Variances
    I                 15.051     0.958     15.714
    S                  0.255     0.031      8.340
 Residual Variances
    BMI97              5.730     0.638      8.981
    BMI98              3.276     0.414      7.907
    BMI99              3.223     0.351      9.175
    BMI00              4.361     0.973      4.483
    BMI01              2.845     0.355      8.005
    BMI02              9.380     3.384      2.772
    BMI03              8.589     2.736      3.139
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated values)
Scatterplots (sample values, estimated factor scores, estimated values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
Multiple Cohort Growth Model with Missing Waves
Major datasets often have multiple cohorts. NLSY97 has youth who were 12-18 in 1997. Seven
years later, they are 19-25. It is quite likely that many growth processes that involve going from
the age of 12 to the age of 19 are different than going from 19-25. For example, involvement in
minor crimes (petty theft, etc.) may increase from 12 to 19, but then decrease from there to 25.
Here is what we might have for our NLSY97 data:
Individual  Cohort  1997  1998  1999  2000  2001  2002  2003
    1        1985     3     4     5     6     7     7     8
    2        1985     2     4     3     5     6     7     7
    3        1984     4     5     6     7     6     6     5
    4        1982     6     7     5     4     3     2     2
    5        1982     5     5     6     4     2     2     1
We can rearrange this data:
Case  Cohort  HD12  HD13  HD14  HD15  HD16  HD17  HD18  HD19  HD20  HD21
  1    1985     3     4     5     6     7     7     8     *     *     *
  2    1985     2     4     3     5     6     7     7     *     *     *
  3    1984     *     4     5     6     7     6     6     5     *     *
  4    1982     *     *     *     6     7     5     4     3     2     2
  5    1982     *     *     *     5     5     6     4     2     2     1
In this table the number after HD is the age at which the data were collected. To capture everybody
we would need to extend the table to HD25 because the youth who were 18 in 1997 are 25 seven
years later.
This table would have massive amounts of missing data, but the missingness would not be related
to other variables. It would be missing at random.
We could develop a growth curve that covered the full range from age 12 to age 25. We would
have 14 waves of data even though each participant was only measured 7 times. Each participant
would have data for 7 of the years and have missing values for the other 7 years.
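The rearrangement from calendar years to ages can be sketched in a few lines. The code below rebuilds the age-based table from the five example cases shown above, taking age as the survey year minus the birth cohort; the function and variable names are chosen for this illustration only.

```python
WAVE_YEARS = list(range(1997, 2004))   # 1997, ..., 2003
AGES = range(12, 22)                   # HD12, ..., HD21 as in the example table

# case: (birth cohort, scores for 1997-2003), copied from the first table
records = {
    1: (1985, [3, 4, 5, 6, 7, 7, 8]),
    2: (1985, [2, 4, 3, 5, 6, 7, 7]),
    3: (1984, [4, 5, 6, 7, 6, 6, 5]),
    4: (1982, [6, 7, 5, 4, 3, 2, 2]),
    5: (1982, [5, 5, 6, 4, 2, 2, 1]),
}

def by_age(cohort, scores):
    """Re-index one case's scores by age; '*' marks ages with no interview."""
    row = {f"HD{age}": "*" for age in AGES}
    for year, score in zip(WAVE_YEARS, scores):
        row[f"HD{year - cohort}"] = score
    return row

table = {case: by_age(cohort, scores) for case, (cohort, scores) in records.items()}

print("Case " + " ".join(f"HD{age}" for age in AGES))
for case, row in table.items():
    print(f"{case:>4} " + " ".join(f"{str(row[f'HD{age}']):>4}" for age in AGES))
```

Each case still has exactly seven observed scores; only their column positions change, which is why the missing values in the age-based table are missing by design.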
We would want to estimate a growth model with a quadratic term and expect the linear slope to be
positive (growth from 12-18) and the quadratic term to be negative (decline from 18-25).
Mplus has a special Analysis: type called MCOHORT. There is an example on the Mplus
WebPage and we will not cover it here. This is an extraordinary way to deal with missing values.
Here is an example from data Muthén analyzed:
Multiple group growth curves
Multiple group analysis using SEM is extremely flexible—some would say it is too flexible
because there are so many possibilities. We use gender for our grouping variable because we are
interested in the trend in BMI for girls compared to boys. We think adolescent girls are more
concerned about their weight and are therefore more likely to have a lower BMI than boys and to
have a flatter trajectory.
There are several ways of comparing a model across multiple groups.
One approach is to see if the same model fits each group, allowing all of the estimated parameters
to be different.
• Here we are saying that a linear growth model fits the data for both boys and girls, but
• We are not constraining girls and boys to have the same values on any of the parameters:
- intercept mean
- slope mean
- intercept variance
- slope variance
- covariance of intercept and slope
- residual errors
We can then put increasing invariance constraints on the model. At a minimum, we want to test
whether the two groups have a different intercept (level) and slope. If this constraint is acceptable
we can add additional constraints on the variances, covariances, and error terms.
First, we will estimate the model simultaneously for girls and boys with no constraints on the
parameters. Here is the program with new commands highlighted:
Title:
  bmi_growth_gender.inp
Data:
  File is "F:\flash\academica\bmi_stata.dat" ;
Variable:
  Names are
    id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
    bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables keeps bmi variables and gender
  Usevariables are male bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 ;
  Grouping is male (0=female 1=male);
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
  Sampstat Mod(3.84) ;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
The only changes we need to make are in the Variable: section. We have a binary variable, male,
that is coded 0 for females and 1 for males. We need to add this to the list of variables we are
using. Then, we need to add a subcommand to the Variable: section that says we have a
grouping variable, names it, and defines what the values are so the output will be labeled nicely.
The command Grouping is male (0=female 1=male); is going to give us a
separate set of estimates for the parameters for girls (labeled female) and boys (labeled male).
Here is selected, annotated output:
SUMMARY OF ANALYSIS
Number of groups                                   2
Number of observations
   Group FEMALE                                  528
   Group MALE                                    574
Number of dependent variables                      7
Number of independent variables                    0
Number of continuous latent variables              2
Variables with special functions
  Grouping variable                             MALE
SAMPLE STATISTICS FOR FEMALE
     Means
           BMI97     BMI98     BMI99     BMI00     BMI01     BMI02     BMI03
  1       19.904    21.198    21.752    22.349    22.805    23.606    23.961
SAMPLE STATISTICS FOR MALE
     Means
           BMI97     BMI98     BMI99     BMI00     BMI01     BMI02     BMI03
  1       20.652    21.835    22.858    23.638    24.063    24.370    24.994
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
  Value                            320.535
  Degrees of Freedom                    46  ! Notice we have twice the degrees of freedom
  P-Value                           0.0000
Chi-Square Test of Model Fit for the Baseline Model
  Value                           8906.678
  Degrees of Freedom                    42
  P-Value                           0.0000
CFI/TLI
  CFI                                0.969
  TLI                                0.972
RMSEA (Root Mean Square Error Of Approximation)
  Estimate                           0.104
  90 Percent C.I.                    0.093  0.115
SRMR (Standardized Root Mean Square Residual)
  Value                              0.063
MODEL RESULTS
                   Estimates      S.E.  Est./S.E.
Group FEMALE
 S        WITH
    I                  0.465     0.090      5.187
 Means
    I                 20.421     0.157    130.261
    S                  0.610     0.024     24.975
 Variances
    I                 11.579     0.801     14.457
    S                  0.183     0.020      8.920
 Residual Variances
    BMI97              4.632     0.351     13.183
    BMI98              2.033     0.177     11.463
    BMI99              1.896     0.153     12.367
    BMI00              4.567     0.312     14.644
    BMI01              2.298     0.192     11.984
    BMI02             15.204     0.991     15.342
    BMI03              3.400     0.349      9.730
Group MALE
 S        WITH
    I                  0.337     0.114      2.956
 Means
    I                 21.215     0.171    124.278
    S                  0.697     0.027     25.551
 Variances
    I                 14.528     0.991     14.660
    S                  0.232     0.026      8.918
 Residual Variances
    BMI97              6.306     0.471     13.391
    BMI98              3.445     0.269     12.800
    BMI99              3.405     0.241     14.108
    BMI00              2.651     0.195     13.612
    BMI01              2.132     0.183     11.671
    BMI02              4.304     0.332     12.960
    BMI03             10.570     0.730     14.484
Here is the graph of the two growth curves. It appears that the girls have a lower initial level and a
flatter rate of growth of BMI.
We can re-estimate the model with the intercept and slope invariant. To do this we make the
following modifications to the model:
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
  [i] (1);
  [s] (2);
Model male:
  [i] (1);
  [s] (2);
Output:
  Sampstat Mod(3.84) ;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Notice that we added two lines to the Model: section,
• [i] (1); and
• [s] (2);.
Then we added a subsection called Model male:, where males are the second group, and put the
same two lines there. The first Model: command is understood to be the group coded as zero on
the male variable. These changes force the intercept means to be equal in both groups because they
are both assigned parameter (1), and the slope means to be equal because they are both assigned
parameter (2). Any parameters with a (1) after them are equal in both groups, as are any
parameters with a (2) after them.
When we run the revised program we obtain a chi-square that has two extra degrees of freedom
because of the two constraints.
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
  Value                            338.157  ! Was 320.535
  Degrees of Freedom                    48  ! Was 46
  P-Value                           0.0000
Chi-Square Test of Model Fit for the Baseline Model
  Value                           8906.678
  Degrees of Freedom                    42
  P-Value                           0.0000
CFI/TLI
  CFI                                0.967  ! Was .969
  TLI                                0.971  ! Was .972
RMSEA (Root Mean Square Error Of Approximation)
  Estimate                           0.105  ! Was .104
  90 Percent C.I.                    0.094  0.115
SRMR (Standardized Root Mean Square Residual)
  Value                              0.081
We can test the difference between
• the chi-square(48) = 338.157 and
• the chi-square(46) = 320.535.
• This difference, 17.622, has 48-46 = 2 degrees of freedom and is significant at the p < .001
level.
• Although we can say there is a highly significant difference between the level and trend for
girls and boys, we need to be cautious because this difference of chi-square has the same
problem with a large sample size that the original chi-squares have.
• In fact, the measures of fit are hardly changed whether we constrain the intercept and slope to
be equal or not. Moreover, the visual difference in the graph is not dramatic.
We could also put other constraints on the two solutions such as equal variances and covariances,
and even equal residual error variances, but we will not.
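The chi-square difference test for the two constraints can be checked by hand. The sketch below uses the two chi-square values from the output and a closed-form upper-tail probability that holds only when the degrees of freedom are even (as they are here, with 2); the function name chi2_sf_even_df is made up for this illustration.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square value, valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form requires a positive, even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # (x/2)^k / k!
        total += term
    return math.exp(-half) * total

# Constrained model (equal intercept and slope means) versus unconstrained model
chi2_constrained, df_constrained = 338.157, 48
chi2_free, df_free = 320.535, 46

diff = chi2_constrained - chi2_free
ddf = df_constrained - df_free
p_value = chi2_sf_even_df(diff, ddf)
print(f"chi-square difference = {diff:.3f}, df = {ddf}, p = {p_value:.5f}")
```

This simple subtraction is appropriate here because both models were estimated with ML. With MLR the difference would first have to be adjusted with the scaling correction factors.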
Alternative to Multiple Group Analysis
An alternative way of doing this, where there are two groups, is to enter the grouping variable as a
predictor. This requires re-conceptualizing our model. We can think of the indicator variable
Male as having a direct path to both the intercept and the slope. Because the indicator variable is
coded as 1 for male and 0 for female,
• If the path from Male to the Intercept is positive, this means that boys have a higher initial
level on BMI.
• Similarly, if there is a positive path from Male to the Slope, this indicates that boys have a
steeper slope than girls on BMI.
• Such results would be consistent with our expectation that boys both start higher and gain more
fat than girls during adolescence.
• This approach does not let us test for other types of invariances such as the variances,
covariances, and error terms.
The following figure shows these two paths. We have omitted the residual variances, RI and RS,
and their covariance to simplify the figure. However, it is important to remember that it is theses
two variances we are explaining. We are explaining why some people have a higher or lower
initial level and why some have a steeper or flatter slope by whether they are a girl or a boy. Here
is the figure:
[Figure: Path diagram. Male has a path, expected to be positive (+), to the Intercept and to the Slope. The Intercept has loadings fixed at 1 on BMI97 through BMI03; the Slope has loadings fixed at 0, 1, 2, 3, 4, 5, 6. Each BMI measure has a residual, e97 through e03.]
Here is part of the program:
Variable:
  Names are
    id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97
    bmi98 bmi99 bmi00 bmi01 bmi02 bmi03 white black hispanic
    asian other;
  Missing are all (-9999) ;
  ! usevariables is limited to bmi variables and male
  Usevariables are male bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 ;
Model:
  i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
  i on male ;
  s on male ;
Output:
  Sampstat Mod(3.84) ;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Here is selected, annotated output:
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
          Value                            237.517
          Degrees of Freedom                    28
          P-Value                           0.0000
! We cannot compare this to the chi-square for the two group design because this is not nested in that model.
Chi-Square Test of Model Fit for the Baseline Model
          Value                           8602.391
          Degrees of Freedom                    28
          P-Value                           0.0000
CFI/TLI
          CFI                                0.976
          TLI                                0.976
Loglikelihood
          H0 Value                      -19515.302
          H1 Value                      -19396.543
Information Criteria
          Number of Free Parameters             14
          Akaike (AIC)                   39058.603
          Bayesian (BIC)                 39128.672
          Sample-Size Adjusted BIC       39084.204
            (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.082
          90 Percent C.I.                    0.073  0.092
          Probability RMSEA <= .05           0.000
SRMR (Standardized Root Mean Square Residual)
          Value                              0.044
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.
 I        ON
    MALE               0.793    0.233      3.409   ! Males higher
 S        ON
    MALE               0.084    0.038      2.203   ! Males steeper
 S        WITH
    I                  0.400    0.075      5.371
 Intercepts
    BMI97              0.000    0.000      0.000
    BMI98              0.000    0.000      0.000
    BMI99              0.000    0.000      0.000
    BMI00              0.000    0.000      0.000
    BMI01              0.000    0.000      0.000
    BMI02              0.000    0.000      0.000
    BMI03              0.000    0.000      0.000
    I                 20.385    0.168    121.416
    S                  0.625    0.027     22.816
! When we add one or more predictors of the intercept and slope, the intercept and
! slope means are not reported under a section called “means” but are now under
! “intercepts”
 Residual Variances
    BMI97              5.391    0.290     18.583
    BMI98              2.731    0.159     17.129
    BMI99              2.696    0.144     18.752
    BMI00              3.524    0.177     19.858
    BMI01              2.327    0.144     16.175
    BMI02              9.552    0.458     20.846
    BMI03              7.148    0.398     17.974
    I                 13.027    0.636     20.471
    S                  0.212    0.017     12.095
! Both the intercept and slope still have variance to explain
We see that the intercept is 20.385 and the slope is .625. How is gender related to this? For girls the equation is:
Est. BMI = 20.385 + .625(Time) + .793(Male) + .084(Male)(Time)
         = 20.385 + .625(Time) + .793(0) + .084(0)(Time)
         = 20.385 + .625(Time)
For boys the equation is:
Est. BMI = 20.385 + .625(Time) + .793(1) + .084(1)(Time)
         = (20.385 + .793) + (.625 + .084)(Time)
         = 21.178 + .709(Time)
where Time is coded as 0, 1, 2, 3, 4, 5, 6.
Using these results, we estimate that the BMI for girls is initially 20.385. By the seventh year (Time = 6) it will be 20.385 + .625(6) or 24.135.
Using these results, we estimate that the BMI for boys is initially 21.178. By the seventh year it will be 21.178 + .709(6) or 25.432. Since a BMI of 25 is considered overweight, by the age of 18 we estimate the average boy will be classified as overweight.
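The arithmetic for the two equations can be reproduced with a few lines of code. This is only a sketch using the four estimates from the output (the intercepts of I and S and the two MALE coefficients); the function and constant names are hypothetical.

```python
# Estimates taken from the Mplus output above.
INTERCEPT_MEAN = 20.385      # intercept of I: expected initial BMI when male = 0
SLOPE_MEAN = 0.625           # intercept of S: expected yearly change when male = 0
MALE_ON_INTERCEPT = 0.793    # I ON MALE
MALE_ON_SLOPE = 0.084        # S ON MALE

def estimated_bmi(time, male):
    """Model-implied BMI at a time score (0-6) for girls (male=0) or boys (male=1)."""
    intercept = INTERCEPT_MEAN + MALE_ON_INTERCEPT * male
    slope = SLOPE_MEAN + MALE_ON_SLOPE * male
    return intercept + slope * time

for label, male in (("Girls", 0), ("Boys", 1)):
    start = estimated_bmi(0, male)
    end = estimated_bmi(6, male)
    print(f"{label}: Time 0 = {start:.3f}, Time 6 = {end:.3f}")
```

The two endpoints per group (Time 0 and Time 6) are all that is needed to draw the straight lines in a plot of the estimated trajectories.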
We could use the plots provided by Mplus, but if we wanted a nicer looking plot we could use another program. I used Stata to get this graph.
The Stata command is
twoway (connected Girls Age, lcolor(black) lpattern(dash) lwidth(medthick)) (connected Boys Age, lcolor(black) lpattern(solid) lwidth(medthick)), ytitle(Body Mass Index) xtitle(Age of Adolescent) caption(NLSY97 Data)
and the data are
     +------------------------+
     | Age    Girls     Boys  |
     |------------------------|
  1. |  12   20.385   21.178  |
  2. |  18   24.135   25.432  |
     +------------------------+
[Figure: Body Mass Index by Age of Adolescent, Comparison of Girls with Boys. The x-axis is Age 12 to 18, the y-axis is Body Mass Index (20 to 26), with one line for Girls and one for Boys.]
When we treat a categorical variable as a grouping variable and do a multiple group analysis, we can test the equality of all the parameters. When we treat it as a predictor, as in this example, we only test whether the intercept and slope are different for the two groups. In this example we do not allow the other parameters to be different for boys and girls, and this might be a problem in some applications.
Growth Curves with Time Invariant Covariates
An extension of having a categorical predictor includes having a series of covariates that explain
variance in the intercept and slope. In this example we use what are known as time invariant
covariates. These are covariates that either remain constant (gender) or for which you have a
measure only at the start of the study. It is possible to add time varying covariates as well.
This has been called Conditional Latent Trajectory Modeling (Curran & Hussong, 2003) because
your initial level and trajectory (slope) are conditional on other variables.
This is equivalent to the multilevel approach that calls the intercept and slope random effects.
With programs such as HLM we use what they call a two level approach. Here are the parallels
using a slide adapted from Muthén. Muthén has said this is the most critical thing to understand
for these procedures.
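The equations on the slide are not reproduced in these notes. As a sketch of how equations (1), (2a), and (2b) are conventionally written in this framework, and assuming the usual symbols (η for the growth factors, α for their intercepts, γ for the covariate effects, ε and ζ for the residuals; γ and ζ are assumed here from common usage rather than taken from the slide):

```latex
\begin{align}
y_{it}    &= \eta_{0i} + \eta_{1i}\,x_t + \varepsilon_{it} \tag{1}\\
\eta_{0i} &= \alpha_0 + \gamma_0\,w_i + \zeta_{0i}         \tag{2a}\\
\eta_{1i} &= \alpha_1 + \gamma_1\,w_i + \zeta_{1i}         \tag{2b}
\end{align}
```

Substituting (2a) and (2b) into (1) gives the single-equation form used in the gender example, with $w_i$ = Male: $y_{it} = \alpha_0 + \gamma_0 w_i + (\alpha_1 + \gamma_1 w_i)\,x_t$ plus the residual terms.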
Level 1 is defined as the measurement model with an intercept (level) and slope (trend/trajectory). Level 2, represented by equations 2a and 2b, treats the intercept and slope as random variables that are explained by a vector of covariates.
• The yit is the outcome. In our example it is the score on BMI for individual “i” at time “t”.
  a. In our figures we show them as yt and the “i” is implicit.
  b. That is, each individual can have a different y value at each time.
• The xt is the time score. In our example of BMI we use 0, 1, 2, 3, 4, 5, 6.
• The η0i is the intercept for individual “i”.
  a. The graph just below equation 1 shows three individuals who each have a different intercept.
  b. Individual “1” has a higher starting value than individuals 2 or 3.
  c. In the figure we show η0 because this represents the mean of η0i.
  d. The path from η0 to each yt is fixed at 1 because it is a constant effect.
• The η1ixt is the slope for individual “i” times his or her score on time.
  a. With our BMI example, we score time as 0, 1, 2, 3, 4, 5, 6.
  b. In the figure we use η1 because this represents the mean of η1i.
  c. The paths from η1 to each yt for BMI are 0, 1, 2, 3, 4, 5, 6. Other values are possible.
• If we had a quadratic, we would add an η2ixt². For BMI the xt² would be 0, 1, 4, 9, 16, 25, 36.
• The εit is the residual error on y for individual “i” at time “t”.
  a. With BMI you can imagine many factors that could have a temporary influence on a person’s BMI score on the day it was measured.
  b. The figure shows et (t = 1, 2, etc.) and the “i” is implicit.
An important distinction that some make between HLM and SEM programs is that SEM programs
cannot have the time vary between individuals. If the youth are measured each year, it is important
that all of them are measured at the same time so they are all one year apart. Mplus has a way of
eliminating this limitation of SEM by allowing each individual to have a different time between
measurements. For example, Li might be measured at 12 month intervals, Jones might be
measured at intervals of 11 months, then 13 months, then 9 months, etc. We are not discussing
these extensions at this point (see TSCORE in the User’s Manual).
Equations (2a) and (2b) are the level two equations. Here we are explaining the individual variance in the intercept and the slope.
• η0i is the random intercept that varies from one individual to another.
• η1i is the random slope that varies from one individual to another.
• The wi is a vector of covariates. This can be generalized to include any number of categorical or continuous factors that predict the random intercept and random slope from equation (1). In the last section we had gender as the only w predictor.
• The α0 is the fixed intercept, or value of η0i, for a person who has a value of zero on all the covariates.
• The γ0 is the fixed slope (notice that there is no “i” subscript, so the same slope applies for all individuals) for the effect of the covariate when predicting the value of η0i. In the previous section this was the slope for gender. Because it was positive, we said males had a higher intercept than females.
• The α1 is the fixed intercept, or value of η1i, for those who have a value of zero on all the wi variables.
• The γ1wi is the effect of the wi variable on the slope, η1i. In our last example, because the coefficient for males, γ1, was positive, we can say that males gain weight (BMI) more quickly than females between the ages of 12 and 18.
• The ζ0i and ζ1i are the residuals. These are very important. A significant residual variance means that our vector of wi does not completely explain the intercept or slope.
[Figure: Path diagram. The latent variable Emotional Problems is measured by the Youth and Parent reports, with residuals e1 and e2. Emotional Problems and White* are covariates of the Intercept and Slope. The Intercept has loadings fixed at 1 on BMI97 through BMI03, the Slope has loadings fixed at 0, 1, 2, 3, 4, 5, 6, and each BMI measure has a residual, e97 through e03.]
*The variable White (whites = 1; nonwhites = 0) compares Whites to the combination of African
American and Hispanic. Asian & Pacific Islander, and Other have been deleted from this analysis
because of small sample size.
In this figure we have two covariates. One is whether the adolescent is white versus African American or Hispanic, and the other is a latent variable reflecting the level of emotional problems a youth has.
• A researcher may predict that Whites have a lower initial BMI (intercept) which persists during adolescence, but the White advantage does not increase (same slope as nonwhites).
• Alternatively, a researcher may predict that being White predicts a lower initial BMI (intercept) and less increase of the BMI (smaller slope) during adolescence. This suggests that minorities start with a disadvantage (high BMI) and this disadvantage becomes even greater across adolescence.
• A researcher may argue that emotional problems are associated with both higher initial BMI (intercept) and a more rapid increase in BMI over time (slope).
By including a covariate that is a latent variable itself, emotional problems, we will show how
these are handled by Mplus.
We estimated this model for boys only; girls were excluded.
The following is our Mplus program:
Title:
  bmi_growth_covariatesb.inp
  Stata2Mplus convertsion for F:\flash\academica\bmi_stata.dta
Data:
  File is "F:\flash\academica\bmi_stata.dat" ;
Variable:
  Names are
    id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 white black hispanic asian other;
  Missing are all (-9999) ;
  ! usevariables is limited to bmi variables and male
  Usevariables are boyprb_y boyprb_p white bmi97 bmi98 bmi99
    bmi00 bmi01 bmi02 bmi03 ;
  Useobservations = male eq 1 and asian ne 1 and other ne 1;
Model:
  i s q| bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
  emot_prb by boyprb_p boyprb_y ;
  i on white emot_prb;
  s on white emot_prb;
  q on white emot_prb;
Output:
  Sampstat Mod(3.84) standardized;
Plot:
  Type is Plot3;
  Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
I have highlighted the new lines in the Mplus program. The Useobservations = male eq
1 and asian ne 1 and other ne 1; restricts our sample to males (male eq 1).
This is very handy when using the same dataset for a variety of models where you want some
models to only include selected participants. We have dropped Asians and members of the “other”
category. There are relatively few of them in this sample dataset and they may have very different
BMI trajectories. Also, the meaning of the category “other” is ambiguous. The format of the
Useobservations subcommand is similar to select or if used with other programs.
You may notice that I added a quadratic term in the Model: command. I estimated the model
using just a linear slope and the fit was not very good. Adding the quadratic improved the fit.
This example has a measurement model for a latent covariate, emot_prb. In other programs this
can involve complicated programming. Here it is done with the single line
emot_prb by boyprb_p boyprb_y ;
The by is a key word in Mplus for creating latent variables used in Confirmatory Factor Analysis
and SEM. On the right of the by are two observed variables. The boyprb_p is the report of
parents about the adolescent’s emotional problems. The boyprb_y is the youth’s own report. It
is usually desirable to have three or more indicators of a latent variable, but we only have two here
so that will have to do. To the left of the by is the name we give to the latent variable,
emot_prb. This new latent variable did not appear in the list of variables we are using, but it is
defined here. The “by” term linking the latent variable, emot_prb, to its two indicators,
boyprb_p and boyprb_y is a remarkably powerful command in Mplus. This fixes the first
variable to the right as a reference indicator, boyprb_p, and assigns a loading of 1 to it. It lets
the loading of the second variable, boyprb_y, be estimated. It also creates error/residual
variances that are labeled e1 and e2 in the figure. The default is that these errors are uncorrelated. It
is good practice to have the strongest indicator on the right of the “by” be the reference indicator
with a loading fixed at 1.0. You can run the model and if this does not happen, you can re-run it,
reversing the order of the items on the right of the “by.” The “by” means the latent variable on
the left is measured “by” the observed variables on the right.
The next three new lines,
i on white emot_prb;
s on white emot_prb; and
q on white emot_prb;
define the relationship between the covariates and the intercept, slope, and quadratic term. These are the γwi terms in the
level 2 equations presented earlier. Mplus uses the on command to signify that a variable depends on
another variable in the structural part of the model. The by command is the key to understanding
how Mplus sets up the measurement model and the on is the key to how Mplus sets up the
structural model.
Starting with Mplus 3 there are many defaults. Mplus assumes there are residual variances and
covariances for the intercept and slopes. It fixes the intercepts at zero. It assumes the intercept and
slope variances are correlated.
The final new line is in the Plot: subsection: Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);. In a growth model the graphics module needs to know the name of the outcome variable for each wave. The “(*)” at the end of this line tells Mplus to start with bmi97, which has a path of 0 from the slope, and to increment this by 1.0 for each subsequent wave.
Here is the output:
Mplus VERSION 3.12
MUTHEN & MUTHEN
11/25/2005
4:37 PM
Number of groups                                   1
Number of observations                           491
Number of dependent variables                      9
Number of independent variables                    1
Number of continuous latent variables              4
Observed dependent variables
  Continuous
   BOYPRB_Y    BOYPRB_P    BMI97       BMI98       BMI99       BMI00
   BMI01       BMI02       BMI03
Observed independent variables
   WHITE
Continuous latent variables
   EMOT_PRB    I           S           Q
Estimator                                         ML
TESTS OF MODEL FIT
Chi-Square Test of Model Fit
          Value                             64.201
          Degrees of Freedom                    34
          P-Value                           0.0013
Chi-Square Test of Model Fit for the Baseline Model
          Value                           4075.891
          Degrees of Freedom                    45
          P-Value                           0.0000
CFI/TLI
          CFI                                0.993
          TLI                                0.990
Loglikelihood
          H0 Value                      -10433.355
          H1 Value                      -10401.255
Information Criteria
          Number of Free Parameters             29
          Akaike (AIC)                   20924.710
          Bayesian (BIC)                 21046.407
          Sample-Size Adjusted BIC       20954.362
            (n* = (n + 2) / 24)
RMSEA (Root Mean Square Error Of Approximation)
          Estimate                           0.043
          90 Percent C.I.                    0.026  0.058
          Probability RMSEA <= .05           0.767
SRMR (Standardized Root Mean Square Residual)
          Value                              0.026
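The fit statistics in this listing are tied together by simple formulas, which makes them easy to cross-check. The sketch below uses textbook definitions of the chi-square, CFI, TLI, and RMSEA. It is not Mplus code, the function name is hypothetical, and the use of n rather than n - 1 in the RMSEA denominator is an assumption (the two agree to three decimals here).

```python
import math

def fit_indices(chi2, df, chi2_base, df_base, n):
    """Textbook CFI, TLI, and RMSEA from model and baseline chi-squares."""
    # Noncentrality estimates, floored at zero.
    d_model = max(chi2 - df, 0.0)
    d_base = max(chi2_base - df_base, 0.0)
    cfi = 1.0 - d_model / max(d_base, d_model, 1e-12)
    ratio_base = chi2_base / df_base
    tli = (ratio_base - chi2 / df) / (ratio_base - 1.0)
    rmsea = math.sqrt(d_model / (df * n))
    return cfi, tli, rmsea

# Loglikelihoods, degrees of freedom, and sample size from the output above.
loglik_h0, loglik_h1 = -10433.355, -10401.255
chi2 = 2.0 * (loglik_h1 - loglik_h0)   # likelihood ratio chi-square
cfi, tli, rmsea = fit_indices(chi2, 34, 4075.891, 45, 491)
print(f"chi-square = {chi2:.3f}, CFI = {cfi:.3f}, TLI = {tli:.3f}, RMSEA = {rmsea:.3f}")
```

Small discrepancies in the last digit are expected because the printed loglikelihoods are rounded.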
MODEL RESULTS
                   Estimates     S.E.  Est./S.E.      Std    StdYX
 I        |
    BMI97              1.000    0.000      0.000    3.644    0.847
    BMI98              1.000    0.000      0.000    3.644    0.832
    BMI99              1.000    0.000      0.000    3.644    0.777
    BMI00              1.000    0.000      0.000    3.644    0.757
    BMI01              1.000    0.000      0.000    3.644    0.739
    BMI02              1.000    0.000      0.000    3.644    0.699
    BMI03              1.000    0.000      0.000    3.644    0.631
 S        |
    BMI97              0.000    0.000      0.000    0.000    0.000
    BMI98              1.000    0.000      0.000    1.185    0.271
    BMI99              2.000    0.000      0.000    2.371    0.506
    BMI00              3.000    0.000      0.000    3.556    0.738
    BMI01              4.000    0.000      0.000    4.742    0.961
    BMI02              5.000    0.000      0.000    5.927    1.137
    BMI03              6.000    0.000      0.000    7.112    1.231
 Q        |
    BMI97              0.000    0.000      0.000    0.000    0.000
    BMI98              1.000    0.000      0.000    0.173    0.040
    BMI99              4.000    0.000      0.000    0.694    0.148
    BMI00              9.000    0.000      0.000    1.561    0.324
    BMI01             16.000    0.000      0.000    2.775    0.563
    BMI02             25.000    0.000      0.000    4.336    0.832
    BMI03             36.000    0.000      0.000    6.243    1.081
 EMOT_PRB BY
    BOYPRB_P           1.000    0.000      0.000    1.057    0.663
    BOYPRB_Y           0.709    0.284      2.492    0.749    0.527
 I        ON
    EMOT_PRB           0.245    0.249      0.984    0.071    0.071
 S        ON
    EMOT_PRB           0.257    0.130      1.988    0.230    0.230
 Q        ON
    EMOT_PRB          -0.045    0.021     -2.118   -0.277   -0.277
 I        ON
    WHITE             -1.050    0.380     -2.767   -0.288   -0.142
 S        ON
    WHITE             -0.023    0.172     -0.136   -0.020   -0.010
 Q        ON
    WHITE             -0.003    0.028     -0.107   -0.017   -0.008
 S        WITH
    I                  0.717    0.384      1.869    0.166    0.166
 Q        WITH
    I                 -0.099    0.060     -1.654   -0.157   -0.157
    S                 -0.174    0.038     -4.592   -0.848   -0.848
 WHITE    WITH
    EMOT_PRB          -0.065    0.033     -1.975   -0.061   -0.125
 Intercepts
    BOYPRB_Y           1.986    0.064     31.010    1.986    1.396
    BOYPRB_P           1.676    0.072     23.382    1.676    1.052
    BMI97              0.000    0.000      0.000    0.000    0.000
    BMI98              0.000    0.000      0.000    0.000    0.000
    BMI99              0.000    0.000      0.000    0.000    0.000
    BMI00              0.000    0.000      0.000    0.000    0.000
    BMI01              0.000    0.000      0.000    0.000    0.000
    BMI02              0.000    0.000      0.000    0.000    0.000
    BMI03              0.000    0.000      0.000    0.000    0.000
    I                 21.350    0.291     73.368    5.858    5.858
    S                  1.272    0.132      9.651    1.073    1.073
    Q                 -0.097    0.021     -4.584   -0.560   -0.560
 Variances
    EMOT_PRB           1.117    0.467      2.395    1.000    1.000
 Residual Variances
    BOYPRB_Y           1.461    0.243      6.013    1.461    0.723
    BOYPRB_P           1.424    0.456      3.122    1.424    0.560
    BMI97              5.238    0.578      9.060    5.238    0.283
    BMI98              3.446    0.287     12.017    3.446    0.180
    BMI99              3.269    0.259     12.637    3.269    0.149
    BMI00              2.119    0.196     10.805    2.119    0.091
    BMI01              1.998    0.193     10.365    1.998    0.082
    BMI02              4.356    0.366     11.908    4.356    0.160
    BMI03              9.906    0.914     10.833    9.906    0.297
    I                 12.916    1.091     11.834    0.972    0.972
    S                  1.330    0.246      5.417    0.947    0.947
    Q                  0.028    0.006      4.406    0.924    0.924
R-SQUARE
    Observed
    Variable  R-Square
    BOYPRB_Y     0.277
    BOYPRB_P     0.440
    BMI97        0.717
    BMI98        0.820
    BMI99        0.851
    BMI00        0.909
    BMI01        0.918
    BMI02        0.840
    BMI03        0.703
    Latent
    Variable  R-Square
    I            0.028
    S            0.053
    Q            0.076
! We are not explaining much variance in any of these.
MODEL MODIFICATION INDICES
Minimum M.I. value for printing the modification index     3.840
                            M.I.   E.P.C.  Std E.P.C.  StdYX E.P.C.
BY Statements
I     BY   BMI02           4.422   -0.012      -0.043        -0.008
I     BY   BMI03          10.800    0.034       0.122         0.021
S     BY   BMI02           7.048   -0.205      -0.243        -0.047
S     BY   BMI03           7.693    0.393       0.466         0.081
Q     BY   BMI02           7.599    2.240       0.388         0.075
Q     BY   BMI03           9.622   -4.758      -0.825        -0.143
WITH Statements
! Might consider correlating adjacent errors.
BMI98 WITH BMI97           4.091   -1.119      -1.119        -0.059
BMI99 WITH BMI98           6.766    0.552       0.552         0.027
BMI01 WITH BOYPRB_P        4.544    0.252       0.252         0.032
BMI01 WITH BMI99           8.391   -0.506      -0.506        -0.022
BMI01 WITH BMI00           5.132    0.435       0.435         0.018
BMI02 WITH BOYPRB_P        4.868   -0.370      -0.370        -0.045
BMI02 WITH BMI00          10.058   -0.648      -0.648        -0.026
BMI02 WITH BMI01          12.449    0.803       0.803         0.031
BMI03 WITH BMI02           4.559   -1.356      -1.356        -0.045
Means/Intercepts/Thresholds
[ BMI03 ]                 10.211    0.685       0.685         0.119
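To see what the quadratic model implies, the growth-factor intercepts and the WHITE coefficients from the MODEL RESULTS can be combined into an expected trajectory. This is a sketch only: the function name is hypothetical, and it evaluates the curve for a boy at the mean of the latent emot_prb variable, which is assumed here to be zero (the usual default for a latent covariate in a single-group model). Note that the WHITE effects on S and Q were not significant, so the difference in curvature between the groups should not be over-interpreted.

```python
def expected_bmi(time, white):
    """Model-implied BMI at time score 0-6 for a boy with emot_prb = 0 (assumed mean).

    white = 1 for White, 0 for African American or Hispanic.
    """
    intercept = 21.350 + (-1.050) * white   # I intercept + I ON WHITE
    slope = 1.272 + (-0.023) * white        # S intercept + S ON WHITE
    quad = -0.097 + (-0.003) * white        # Q intercept + Q ON WHITE
    return intercept + slope * time + quad * time ** 2

for t in range(7):
    print(f"Time {t}: nonwhite = {expected_bmi(t, 0):.3f}, white = {expected_bmi(t, 1):.3f}")
```

The negative quadratic term means the yearly gain in BMI shrinks over time rather than staying constant as in the linear model.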
Mediational Models with Time Invariant Covariates
Sometimes all of the covariates are time invariant or at least measured at just the start of the study. Curran and Hussong (2003) discuss a study of a latent growth curve on drinking problems with a covariate of parental drinking. Parental drinking influences both the initial level and the rate of growth of drinking problem behavior among adolescents. The question is whether some other variables might mediate this relationship:
• Parental monitoring
• Peer influence
[Figure: Path diagram with Parent Drinking, Parental Monitoring, Peer Influence, the Intercept of Problem Drinking, and the Slope of Problem Drinking.]
Mplus allows us to estimate the direct and indirect effects of Parent Drinking on the Intercept and Slope. It also provides a test of significance for these effects.
Time Varying Covariates
We have illustrated time invariant covariates that are measured at time 1. It is possible to extend this to include time varying covariates. Time varying covariates either are measured after the process has started or have a value that changes (hours of nutrition education). Although we will not show a program or output, we will illustrate the use of time varying covariates in a figure. In this figure the time varying covariates, a21 to a24, might be
• Hours of nutrition education completed between waves. Independent of the overall growth trajectory, η1, students who have several hours of nutrition education programming may have a decrease in their BMI.
• Physical education curriculum. A physical activity program might lead to reduced BMI. Students who spend more time in this physical activity program might have a lower BMI independent of the overall growth trend, η1.
This figure is borrowed from Muthén, where he is examining growth in math performance over 4 years. The w vector contains the x variables, which are covariates that directly influence the intercept, η0, or slope, η1. The aij are the number of math courses taken each year.
yit  = Repeated measures on the outcome (math achievement)
a1it = Time score (0, 1, 2, 3) as discussed previously
a2it = Time varying covariates (# of math courses taken that year)
w    = Vector of x covariates that are time invariant and measured at or before the first yit
In this example we might think of the yi variables as being measures of conflict behavior, where y1 is at age 17 and y4 is at age 25. We know there is a general decline in conflict behavior during this time interval. Therefore the slope η1 is expected to be negative.
Now suppose we also have a measure of alcohol abuse for each of the 4 waves (aij). We might hypothesize that during a year in which an adolescent has a high score on alcohol abuse (say, the number of days the person drinks 5 or more drinks in the last 30 days) there will be an elevated level of conflict behavior that cannot be explained by the general decline (negative slope).
The negative slope reflects the general decline in conflict behavior by young adults as they move from age 17 to age 25. The effect of aij on yi provides the additional explanation that in those years when there is a lot of drinking, there will be an elevated level of conflict that does not fit the general decline.
Growth Curve with Binary Outcome
This section will examine the prevalence of drinking behavior (have you drunk alcohol in the last 30 days?). This is a binary variable and so we need to utilize special procedures and be careful in how we interpret the results.
• Even though the observed variable is binary, the latent intercept and slope are continuous.
• The mean of the intercept is fixed at zero.
• The mean of the slope is free.
• The variance and covariance of the intercept and slope are also free.
Although the mean of the intercept is fixed at zero, because the variance is free it is possible for a covariate to influence the intercept for each individual. An adolescent whose mother monitors his or her behavior very closely may have a lower intercept than an adolescent whose mother does not monitor closely.
Here is the Mplus program with annotations:
Title:
  drinkbin_g.inp
Data:
  File is F:\flash\academica\drinkbin.dat ;
Variable:
  Names are
    drk97 drk98 drk99 drk00 drk01 drk02 drk03 pdrk pcol pcut male famrtn97
    mommon97 famrtn98 mommon98 famrtn99 mommon99 famrtn00 mommon00 drk97b
    drk98b drk99b drk00b drk01b drk02b drk03b;
  Usevariables are
    drk97b drk98b drk99b drk00b drk01b drk02b drk03b;
  Missing are all (-9999) ;
  Categorical are
    drk97b drk98b drk99b drk00b drk01b drk02b drk03b;
  ! We identify which variables are categorical
Analysis:
  Type = General Missing H1 ;
  Estimator = MLR ;
  ! We do FIML est. with missing values and use a robust maximum
  ! likelihood estimator
Model:
  i s | drk97b@0 drk98b@1 drk99b@2 drk00b@3 drk01b@4 drk02b@5 drk03b@6;
Output:
  Sampstat standardized patterns ;
Plot:
  Type is Plot3;
  Series = drk97b drk98b drk99b drk00b drk01b drk02b drk03b(*);
Here is selected output.
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                          1771   ! Used all information
Number of dependent variables                      7
Number of independent variables                    0
Number of continuous latent variables              2
Observed dependent variables
  Binary and ordered categorical (ordinal)
   DRK97B      DRK98B      DRK99B      DRK00B      DRK01B      DRK02B
   DRK03B
Continuous latent variables
   I           S
Estimator                                        MLR
Information matrix                          OBSERVED
Optimization Specifications for the Quasi-Newton Algorithm for
Continuous Outcomes
Maximum number of iterations
1000
Convergence criterion
0.100D-05
Optimization Specifications for the EM Algorithm
Maximum number of iterations
500
Convergence criteria
Loglikelihood change
0.100D-02
Relative loglikelihood change
0.100D-05
Derivative
0.100D-02
Optimization Specifications for the M step of the EM Algorithm for
Categorical Latent variables
Number of M step iterations
1
M step convergence criterion
0.100D-02
Basis for M step termination
ITERATION
Optimization Specifications for the M step of the EM Algorithm for
Censored, Binary or Ordered Categorical (Ordinal), Unordered
Categorical (Nominal) and Count Outcomes
Number of M step iterations
1
M step convergence criterion
0.100D-02
Basis for M step termination
ITERATION
Maximum value for logit thresholds
15
Minimum value for logit thresholds
-15
Minimum expected cell size for chi-square
0.100D-01
Maximum number of iterations for H1
2000
Convergence criterion for H1
0.100D-03
Optimization algorithm
EMA
Integration Specifications
Type
STANDARD
Number of integration points
15
Dimensions of numerical integration                2
Adaptive quadrature                               ON
Progressive quadrature stages                      1
Cholesky                                          ON
Input data file(s)
  F:\flash\academica\drinkbin.dat
Input data format  FREE
SUMMARY OF DATA
Number of patterns                                49
SUMMARY OF MISSING DATA PATTERNS
MISSING DATA PATTERNS FOR U
[Pattern table: 49 patterns (columns 1 to 49) by DRK97B through DRK03B (rows), with an x marking each wave that is observed in that pattern. Pattern 1 is complete data, with all seven waves observed.]
MISSING DATA PATTERN FREQUENCIES FOR U
    Pattern  Frequency    Pattern  Frequency    Pattern  Frequency
          1       1453         18          2         35          1
          2         12         19          2         36          1
          3         21         20          1         37          1
          4         18         21          1         38          3
          5          6         22          1         39          2
          6          2         23          1         40         11
          7         20         24          2         41          1
          8          4         25         60         42          1
          9          7         26         12         43          2
         10          4         27         17         44          2
         11         14         28          2         45          2
         12          4         29         19         46          1
         13          1         30         25         47          2
         14          1         31          3         48          1
         15          4         32         11         49          1
         16          2         33          2
         17          1         34          4
! Patterns with little missing data are more numerous. For example,
! Pattern 25, with all but the last year observed, has 60 observations.
COVARIANCE COVERAGE OF DATA
Minimum covariance coverage value   0.100
PROPORTION OF DATA PRESENT FOR U
Covariance Coverage
            DRK97B   DRK98B   DRK99B   DRK00B   DRK01B   DRK02B   DRK03B
 DRK97B      1.000
 DRK98B      0.956    0.956
 DRK99B      0.949    0.931    0.949
 DRK00B      0.938    0.919    0.920    0.938
 DRK01B      0.924    0.902    0.903    0.903    0.924
 DRK02B      0.925    0.901    0.902    0.900    0.894    0.925
 DRK03B      0.894    0.872    0.873    0.868    0.867    0.875    0.894
SUMMARY OF CATEGORICAL DATA PROPORTIONS
    DRK97B
      Category 1    0.943
      Category 2    0.057
    DRK98B
      Category 1    0.803
      Category 2    0.197
    DRK99B
      Category 1    0.714
      Category 2    0.286
    DRK00B
      Category 1    0.670
      Category 2    0.330
    DRK01B
      Category 1    0.599
      Category 2    0.401
    DRK02B
      Category 1    0.532
      Category 2    0.468
    DRK03B
      Category 1    0.491
      Category 2    0.509
! The proportion drinking (category 2) increases
! each year in a fairly linear growth pattern. In the
! data, category 1, not drink in last 30 days,
! was a zero and category 2, drink in last 30 days,
! was a one.
THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Loglikelihood
          H0 Value                       -6371.973
Information Criteria
          Number of Free Parameters              5
          Akaike (AIC)                   12753.945
          Bayesian (BIC)                 12781.342
          Sample-Size Adjusted BIC       12765.457
            (n* = (n + 2) / 24)
! Muthén likes the Sample-Size Adjusted BIC for comparing models.
Chi-Square Test of Model Fit for the Binary and Ordered Categorical
(Ordinal) Outcomes
          Pearson Chi-Square
          Value                           1293.477
          Degrees of Freedom                   122
          P-Value                           0.0000
          Likelihood Ratio Chi-Square
          Value                           1121.943
          Degrees of Freedom                   122
          P-Value                           0.0000
MODEL RESULTS
                Estimates     S.E.  Est./S.E.      Std    StdYX
 S        WITH
    I               0.245    0.018     13.976    0.999    0.999
 Means
    I               0.000    0.000      0.000    0.000    0.000   ! Fixed
    S               0.275    0.016     16.957    0.839    0.839
 Variances
    I               0.561    0.077      7.314    1.000    1.000   ! Sign. variance
    S               0.107    0.018      6.085    1.000    1.000   ! to explain
R-SQUARE
    Observed
    Variable     R-Square
    DRK97B          0.146
    DRK98B          0.260
    DRK99B          0.375
    DRK00B          0.477
    DRK01B          0.563
    DRK02B          0.634
    DRK03B          0.260
QUALITY OF NUMERICAL RESULTS
Condition Number for the Information Matrix
(ratio of smallest to largest eigenvalue)
0.263E-04
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores)
Scatterplots (sample values, estimated factor scores)
Sample proportions
Estimated probabilities
!examine the sample proportions and the estimated probabilities plots.
!Could try a quadratic slope.
!Notice this is the first program that takes measurable time (29 seconds)
!These can be much, much longer for complex models
Beginning Time:  11:04:12
Ending Time:     11:04:41
Elapsed Time:    00:00:29
We will show two graphs.
• The first is the actual proportion of adolescents who have had a drink in the last 30 days. This
increases monotonically.
• The second graph is the estimated proportion of adolescents who have had a drink in the last 30
days. Although the growth curve for the underlying continuous variable is linear, the estimated
probability follows a slight S-curve because Mplus is using a logistic function.
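To see why a linear trend on the logit scale turns into a curved trend in probabilities, here is a minimal Python sketch. The threshold and slope values are assumptions chosen only for illustration; they are not the Mplus estimates. The sketch also ignores the variance of the intercept and slope, which affects the population-average probabilities in the actual model.

```python
import math

def logistic(x):
    """Inverse logit: maps a value on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-x))

# Hypothetical values for illustration only (not taken from the output)
threshold = 2.8   # assumed threshold on the logit scale
slope = 0.5       # assumed linear change per year on the logit scale

# Linear growth on the logit scale for the seven waves, 1997 to 2003
probs = [logistic(slope * t - threshold) for t in range(7)]

# Year-to-year change in probability; unequal steps mean the curve bends
steps = [b - a for a, b in zip(probs, probs[1:])]

for year, p in zip(range(1997, 2004), probs):
    print(year, round(p, 3))
```

The logit increases by the same amount every year, but the yearly change in probability is small when the probability is near zero and largest when it is near .5.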
The Observed Sample Proportion of Adolescents Age 12-18 who had a drink in the last 30 days
Adding covariates
Because our model has significant residual variance for both the intercept and the slope, it makes
sense to add covariates. Because the intercept and slope are latent continuous variables (even
though the observed variables are all binary), we can add covariates much like we did with other
models.
Title:
drinkbin_g_cov.inp
Data:
File is F:\flash\academica\drinkbin.dat ;
Variable:
Names are
drk97 drk98 drk99 drk00 drk01 drk02 drk03 pdrk pcol pcut male famrtn97
mommon97 famrtn98 mommon98 famrtn99 mommon99 famrtn00 mommon00 drk97b
drk98b drk99b drk00b drk01b drk02b drk03b;
Usevariables are
drk97b drk98b drk99b drk00b drk01b drk02b drk03b male pdrk pcol pcut
famrtn97 mommon97;
Missing are all (-9999) ;
Categorical are
drk97b drk98b drk99b drk00b drk01b drk02b drk03b;
Analysis:
Type = General Missing H1 ;
Estimator = MLR ;
MODEL:
i s | drk97b@0 drk98b@1 drk99b@2 drk00b@3 drk01b@4 drk02b@5 drk03b@6;
i s on male pdrk pcol pcut famrtn97 mommon97 ;
Output:
Sampstat standardized patterns ;
Plot:
Type is
Plot3;
Series = drk97b drk98b drk99b drk00b drk01b drk02b drk03b(*);
Here are selected results:
TESTS OF MODEL FIT
Loglikelihood
  H0 Value                         -5448.452
Information Criteria
  Number of Free Parameters               17
  Akaike (AIC)                     10930.903
  Bayesian (BIC)                   11022.703
  Sample-Size Adjusted BIC         10968.697
    (n* = (n + 2) / 24)
MODEL RESULTS
                Estimates     S.E.  Est./S.E.      Std    StdYX
 I        ON
    MALE           -0.105    0.140     -0.750   -0.065   -0.033
    PDRK            0.269    0.089      3.032    0.167    0.119
    PCOL            0.021    0.064      0.332    0.013    0.015
    PCUT            0.174    0.067      2.607    0.108    0.114
    FAMRTN97       -0.017    0.013     -1.293   -0.011   -0.060
    MOMMON97       -0.089    0.023     -3.884   -0.055   -0.176
! It is interesting that the percentage of peers who drink influences the
! starting point (I), but not the rate of growth (S).
! Mother monitoring closely reduces the initial likelihood of drinking (I).
! Surprisingly, mother monitoring has the opposite effect on the rate of growth.
 S        ON
    MALE            0.034    0.037      0.926    0.083    0.042
    PDRK           -0.028    0.027     -1.042   -0.068   -0.049
    PCOL            0.020    0.017      1.191    0.050    0.056
    PCUT           -0.067    0.018     -3.710   -0.165   -0.174
    FAMRTN97       -0.005    0.003     -1.412   -0.012   -0.066
    MOMMON97        0.032    0.006      5.228    0.077    0.247
 S        WITH
    I              -0.106    0.063     -1.682   -0.161   -0.161
!These are the means
 Intercepts
    I               0.000    0.000      0.000    0.000    0.000
    S               0.316    0.119      2.659    0.774    0.774
 Residual Variances
    I               2.347    0.324      7.256    0.906    0.906
    S               0.145    0.021      6.773    0.873    0.873
R-SQUARE
    Latent
    Variable     R-Square
    I               0.094
    S               0.127
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores)
Scatterplots (sample values, estimated factor scores)
Sample proportions
Here is a scatterplot showing that the more a mother monitors a child initially (in 1997, when the
child is 12), the lower the initial likelihood of drinking.
However, the more the mother monitors the child, the steeper the growth in the likelihood of
drinking.
Growth Curve using a Poisson Outcome
Many outcome variables are counts of events, and some counts are censored or have an inflated
number of zeros. Mplus can estimate these models once we declare which variables are counts.
Mplus can also adjust for an excess of zeros. In our example a substantial number of adolescents
report zero days of drinking, more than you would expect from a Poisson model. The adjustment
uses a zero-inflated Poisson (ZIP) model. Zero-inflated Poisson models are quite computationally
intensive. I did this for our example (results not shown here) and the program took 1.5 hours to
run. For more complex models, the run time can be prohibitive.
The idea of a zero-inflated model is that some people are always zero. For example, a person who
never drinks will always answer zero for the number of days she or he drank in the last 30 days.
Other people vary in any given month from 0 to 30 on the number of days they drink. The covariates
that predict whether a person is always zero may differ from the covariates that predict how often
the remaining people do something. An application might be child abuse. The predictors of whether
a person abuses their child at all may differ from the predictors of how often a child abuser is
abusive.
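The two-part logic of the zero-inflated Poisson model can be written as a short Python sketch. The parameter values below are assumptions for illustration, not estimates from the drinking data: `pi` is the assumed share of people who are always zero and `lam` is the assumed Poisson mean for everyone else.

```python
import math

def zip_pmf(k, lam, pi):
    """P(Y = k) under a zero-inflated Poisson model.

    pi  : probability of being an 'always zero' case (assumed value)
    lam : Poisson mean for the remaining cases (assumed value)
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from two sources: the always-zero group
        # and ordinary Poisson zeros in the other group.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

# Hypothetical example: 40% never drink, the others average 3 days a month
p_zero_zip = zip_pmf(0, lam=3.0, pi=0.4)
p_zero_poisson = math.exp(-3.0)
print(round(p_zero_zip, 3), round(p_zero_poisson, 3))
```

With these assumed values the zero-inflated model expects far more zeros than a plain Poisson model with the same mean for drinkers, which is the pattern described above.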
Parallel Growth curves
I have not included a full treatment of parallel growth curves. These involve two growth curves
that may be interdependent. With our examples we might be interested in growth in conflict
between husbands and wives as one growth curve and growth in social problems of their
adolescent children as the other. We might even add a distal outcome variable that is predicted by
the growth curves. Here is a figure representing such a model. The two latent variables ICon and
SCon are the intercept and slope for marital conflict. IPrb and SPrb are the intercept and slope for
the child's social problems. The Rel. Prb latent variable is the relationship problems the child has in
intimate relationships. Gender is an exogenous variable because we want to control for possible
gender differences. To simplify the diagram I did not include the residual paths for the intercepts
and slopes. The curved, double-headed arrows between the latent variables represent the
covariances between these residuals. That is, variation in the intercept for conflict is correlated
with variation in the intercept for social problems, and growth in parental conflict is correlated
with growth in social problems.
This figure says that parents who have high initial conflict, ICon, will have children whose social
problems grow, SPrb. I will continue to interpret the paths at the presentation.
[Figure: path diagram of the parallel growth curve model. Observed indicators y1 to y10; latent
variables ICon, SCon, IPrb, SPrb, and Rel. Prb; exogenous variable Gender.]
Mixture Models
Mixture models are the most complex topic we will cover, and they are somewhat controversial.
The idea is to do an exploratory analysis to locate clusters of observations that share a trajectory
within a cluster but differ across clusters. This relaxes the assumption of a single population with
common population parameters. Mixture models allow for parameter differences across
unobserved subpopulations. They use latent trajectory classes, that is, classes that are unobserved.
• Growth models examine individual variation around a single mean growth curve.
• Growth mixture models allow different classes of individuals to vary around different mean
growth curves.
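The idea of a population that is really two populations mixed together can be sketched in Python as a weighted sum of two normal densities. All weights, means, and standard deviations below are assumptions for illustration only.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def mixture_pdf(x, weights, means, sds):
    """Density of a finite mixture: class shares times class densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in zip(weights, means, sds))

# Hypothetical two-class population: 70% centered at 5, 30% centered at 12
weights, means, sds = [0.7, 0.3], [5.0, 12.0], [1.5, 2.0]

# A crude numerical check that the mixture is still a proper density
grid = [i * 0.01 for i in range(-1000, 3001)]
area = sum(mixture_pdf(x, weights, means, sds) for x in grid) * 0.01
print(round(area, 3))
```

A single normal curve fitted to such data would miss the dip between the two class means; the mixture keeps both peaks.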
Where are mixture models appropriate?
• Populations contain individuals with normative growth trajectories as well as individuals with
non-normative growth.
• Consider alcohol consumption from 15 to 30 in the U.S. The normative growth is an increasing
usage up to age 22 and then a gradual decline to age 30. Non-normative growth includes
alcoholics who follow a similar pattern to age 22 but then do not show a decline in usage
between 22 and 30.
• Different factors may predict individual variation within the groups as well as distal outcomes
of the growth processes.
• We may want different interventions for individuals in different subgroups on growth
trajectories. We could focus interventions on individuals in non-normative growth trajectories
that have undesirable consequences.
• Growth trajectories for prostate-specific antigen (PSA) among older men include a subgroup
that has an exponential growth rate. These men are likely to develop prostate cancer. Identifying
which individuals will fall in this exponential growth trajectory will allow optimal medical
intervention.
In the case of BMI we may locate a cluster of adolescents that steadily gains weight and becomes
obese, a cluster that steadily loses weight and becomes underweight, and a cluster that has a steady
weight. For the drinking behavior we may find a cluster that has a constant low level of drinking, a
cluster that has an increasing rate of drinking, and a cluster that has a steady but high level of
drinking. We do not know what the results are before we estimate the mixture model. This is
complicated by the fact that there are no uniform standards on how many clusters to extract.
However, once we locate different clusters, we may find that different variables predict the cluster
to which a person belongs.
[Figure: "Mixed Models--Is it a Single Population Or Two or More Populations Mixed Together."
Two density plots of Outcome_Variable: one panel titled "Assume a Single Population" and one
titled "Assume Two Distinct Populations" (graphs by Grouping_Variable).]
Here is a general Growth Mixture Model
The latent intercept, slope, and yi are the standard growth model, with RVI and RVS the residual
variances in the intercept and slope, which are correlated. The X1 is a single covariate, but there
could be any number of covariates. The Ci is the latent class variable (i = 1 to k) representing
different subpopulations that have different growth curves. The latent class a person is in, Ci, and
the covariate, X1, both influence some distal outcome that in this case is categorical, U1. For
example, if we are modeling alcohol consumption we might have a three class solution. C1 might
be a normative class that has moderate growth from 12 to 18 (moderate intercept and moderate
positive slope), C2 might have a consistently low level of alcohol consumption (small intercept and
small slope), and C3 might have a dramatic growth in alcohol consumption (moderate intercept,
steep growth factor). In this case C1 and C2 might lead to a low probability of being classified as
having a serious drinking problem, that is, P(U1 = 1 | C = 1 or C = 2) is low, but being in C3 could
be associated with a high probability of being classified as having a serious drinking problem, that
is, P(U1 = 1 | C = 3) is high.
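[Figure: general growth mixture model. Latent class Ci and covariate X1 predict the Intercept,
the Slope, and the distal outcome U1; RVI and RVS are the residual variances of the Intercept and
Slope; the four repeated measures have loadings of 1 on the Intercept and 0, 1, 2, 3 on the Slope,
each with its own error term.]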
Example with BMI
We will show one example of a fairly simple growth mixture model using BMI. After going over
this in some detail, we will briefly discuss possible extensions to this simple model using an
illustration from the User's Manual for Mplus. We are interested in identifying distinct
subpopulations of adolescents who have different growth trajectories. Before doing this it is
important to hypothesize what these classes would be. For example, we might hypothesize three
classes:
• A normative class that has a gradual increase in BMI from a
healthy level at age 12 to an almost overweight level at age 19.
• An obese class that by age 12 already has a serious weight problem
that becomes much worse during adolescence.
• A lean class that has a low initial BMI and little increase
in BMI over time.
Mplus VERSION 3.13
MUTHEN & MUTHEN
11/30/2005 10:53 AM
INPUT INSTRUCTIONS
Title:
bmi_GMM.inp
Data:
File is "h:\flash\academica\bmi_stata.dat" ;
Variable:
Names are
id grlprb_y boyprb_y grlprb_p boyprb_p male race_eth bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 white black hispanic asian other;
Missing are all (-9999) ;
Usevariables are bmi97 bmi98 bmi99
bmi00 bmi01 bmi02 bmi03 ;
Classes = c(2) ;
Analysis:
Type = Mixture ;
Starts = 50 5 ;
Model:
%Overall%
i s | bmi97@0 bmi98@1 bmi99@2 bmi00@3 bmi01@4 bmi02@5 bmi03@6;
Output:
tech1 tech11 ;
Plot:
Type is
Plot3;
Series = bmi97 bmi98 bmi99 bmi00 bmi01 bmi02 bmi03(*);
Savedata:
SAVE=CPROB;
FILE IS "h:\flash\academica\bmi.txt" ;
! Saves the class probability estimates (posterior probabilities)
! along with the original data (variables listed under Usevariables)
! in the file called bmi.txt
INPUT READING TERMINATED NORMALLY
bmi_GMM.inp
SUMMARY OF ANALYSIS
Number of groups                                   1
Number of observations                          1102
Number of dependent variables                      7
Number of independent variables                    0
Number of continuous latent variables              2
Number of categorical latent variables             1
Observed dependent variables
  Continuous
   BMI97   BMI98   BMI99   BMI00   BMI01   BMI02   BMI03
Continuous latent variables
   I   S
Categorical latent variables
   C
Estimator                                        MLR
Information matrix                          OBSERVED
Optimization Specifications for the Quasi-Newton Algorithm for
Continuous Outcomes
  Maximum number of iterations                  1000
  Convergence criterion                    0.100D-05
Optimization Specifications for the EM Algorithm
  Maximum number of iterations                   500
  Convergence criteria
    Loglikelihood change                   0.100D-06
    Relative loglikelihood change          0.100D-06
    Derivative                             0.100D-05
Optimization Specifications for the M step of the EM Algorithm for
Categorical Latent variables
  Number of M step iterations                      1
  M step convergence criterion             0.100D-05
  Basis for M step termination             ITERATION
Optimization Specifications for the M step of the EM Algorithm for
Censored, Binary or Ordered Categorical (Ordinal), Unordered
Categorical (Nominal) and Count Outcomes
  Number of M step iterations                      1
  M step convergence criterion             0.100D-05
  Basis for M step termination             ITERATION
  Maximum value for logit thresholds              15
  Minimum value for logit thresholds             -15
  Minimum expected cell size for chi-square 0.100D-01
Optimization algorithm                           EMA
Random Starts Specifications
  Number of initial stage starts                  50
  Number of final stage starts                     5
  Number of initial stage iterations              10
  Initial stage convergence criterion      0.100D+01
  Random starts scale                      0.500D+01
  Random seed for generating random starts         0
Input data file(s)
  h:\flash\academica\bmi_stata.dat
Input data format  FREE
RANDOM STARTS RESULTS RANKED FROM THE BEST TO THE WORST LOGLIKELIHOOD VALUES
Initial stage loglikelihood values, seeds, and initial stage start numbers:
     -18553.603   851945        18
     -18553.746   127215         9
     -18553.793   650371        14
     -18553.795   939021         8
     -18553.826   246261        38
     -18553.838   626891        32
     -18554.012   903420         5
     -18554.163   153942        31
     -18554.203   415931        10
     -18554.900   120506        45
     -18555.624   399671        13
     -18556.544   462953         7
     -18557.068   352277        42
     -18558.409   915642        40
     -18558.457   963053        43
     -18558.596   568859        49
     -18558.734   887676        22
     -18559.057   645664        39
     -18559.280   569131        26
     -18559.391   93468          3
     -18559.566   392418        28
     -18560.577   253358         2
     -18561.299   341041        34
     -18562.265   260601        36
     -18562.616   432148        30
     -18563.451   364676        27
     -18565.256   27071         15
     -18565.860   370466        41
     -18566.235   195873         6
     -18570.220   347515        24
     -18576.074   533739        11
     -18582.169   749453        33
     -18585.197   608496         4
     -18654.086   107446        12
     -18678.820   366706        29
     -18724.361   318230        46
     -18724.888   848163        47
     -18724.913   unperturbed    0
     -18724.913   207896        25
     -18724.914   902278        21
     -18724.918   637345        19
     -18724.930   830392        35
     -18725.914   407168        44
     -18726.150   967237        48
     -18726.347   761633        50
     -18726.360   372176        23
     -18726.375   285380         1
     -18726.489   573096        20
     -18726.511   966014        37
     -18726.713   76974         16
     -18726.796   68985         17
Loglikelihood values at local maxima, seeds, and initial stage start numbers:
     -18553.315   127215         9
     -18553.315   851945        18
     -18553.315   650371        14
     -18553.315   246261        38
     -18553.315   939021         8
THE MODEL ESTIMATION TERMINATED NORMALLY
TESTS OF MODEL FIT
Loglikelihood
  H0 Value                        -18553.315
Information Criteria
! Information used to decide on the number of clusters
  Number of Free Parameters               15
  Akaike (AIC)                     37136.630
  Bayesian (BIC)                   37211.703
  Sample-Size Adjusted BIC         37164.059
    (n* = (n + 2) / 24)
  Entropy                              0.957
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES
BASED ON THE ESTIMATED MODEL
  Latent
  Classes
     1         68.64094    0.06229
     2       1033.35906    0.93771
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASS PATTERNS
BASED ON ESTIMATED POSTERIOR PROBABILITIES
  Latent
  Classes
     1         68.64091    0.06229
     2       1033.35909    0.93771
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
Class Counts and Proportions
  Latent
  Classes
     1               64    0.05808
     2             1038    0.94192
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row)
by Latent Class (Column)
              1        2
     1    0.930    0.070
     2    0.009    0.991
MODEL RESULTS
                Estimates     S.E.  Est./S.E.
Latent Class 1
 I        |
    BMI97           1.000    0.000      0.000
    BMI98           1.000    0.000      0.000
    BMI99           1.000    0.000      0.000
    BMI00           1.000    0.000      0.000
    BMI01           1.000    0.000      0.000
    BMI02           1.000    0.000      0.000
    BMI03           1.000    0.000      0.000
 S        |
    BMI97           0.000    0.000      0.000
    BMI98           1.000    0.000      0.000
    BMI99           2.000    0.000      0.000
    BMI00           3.000    0.000      0.000
    BMI01           4.000    0.000      0.000
    BMI02           5.000    0.000      0.000
    BMI03           6.000    0.000      0.000
 S        WITH
    I              -0.096    0.092     -1.041
 Means
    I              30.119    0.787     38.290
    S               1.498    0.358      4.186
 Intercepts
    BMI97           0.000    0.000      0.000
    BMI98           0.000    0.000      0.000
    BMI99           0.000    0.000      0.000
    BMI00           0.000    0.000      0.000
    BMI01           0.000    0.000      0.000
    BMI02           0.000    0.000      0.000
    BMI03           0.000    0.000      0.000
 Variances
    I               7.408    1.209      6.127
    S               0.168    0.026      6.393
 Residual Variances
    BMI97           5.385    0.763      7.060
    BMI98           2.736    0.322      8.489
    BMI99           2.691    0.400      6.725
    BMI00           3.570    1.022      3.494
    BMI01           2.290    0.358      6.389
    BMI02           9.468    4.437      2.134
    BMI03           7.218    3.108      2.322
Latent Class 2
 I        |
    BMI97           1.000    0.000      0.000
    BMI98           1.000    0.000      0.000
    BMI99           1.000    0.000      0.000
    BMI00           1.000    0.000      0.000
    BMI01           1.000    0.000      0.000
    BMI02           1.000    0.000      0.000
    BMI03           1.000    0.000      0.000
 S        |
    BMI97           0.000    0.000      0.000
    BMI98           1.000    0.000      0.000
    BMI99           2.000    0.000      0.000
    BMI00           3.000    0.000      0.000
    BMI01           4.000    0.000      0.000
    BMI02           5.000    0.000      0.000
    BMI03           6.000    0.000      0.000
 S        WITH
    I              -0.096    0.092     -1.041
 Means
    I              20.178    0.157    128.688
    S               0.614    0.021     28.794
 Intercepts
    BMI97           0.000    0.000      0.000
    BMI98           0.000    0.000      0.000
    BMI99           0.000    0.000      0.000
    BMI00           0.000    0.000      0.000
    BMI01           0.000    0.000      0.000
    BMI02           0.000    0.000      0.000
    BMI03           0.000    0.000      0.000
 Variances
    I               7.408    1.209      6.127
    S               0.168    0.026      6.393
 Residual Variances
    BMI97           5.385    0.763      7.060
    BMI98           2.736    0.322      8.489
    BMI99           2.691    0.400      6.725
    BMI00           3.570    1.022      3.494
    BMI01           2.290    0.358      6.389
    BMI02           9.468    4.437      2.134
    BMI03           7.218    3.108      2.322
Categorical Latent Variables
 Means
    C#1            -2.712    0.241    -11.265
QUALITY OF NUMERICAL RESULTS
Condition Number for the Information Matrix
(ratio of smallest to largest eigenvalue)
0.237E-02
TECHNICAL 11 OUTPUT
VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2 CLASSES
  H0 Loglikelihood Value                     -18724.913
  2 Times the Loglikelihood Difference          343.196
  Difference in the Number of Parameters              3
  Mean                                         -604.083
  Standard Deviation                            905.921
  P-Value                                        0.0645
LO-MENDELL-RUBIN ADJUSTED LRT TEST
  Value                                         327.607
  P-Value                                        0.0676
PLOT INFORMATION
The following plots are available:
Histograms (sample values, estimated factor scores, estimated values)
Scatterplots (sample values, estimated factor scores, estimated values)
Sample means
Estimated means
Sample and estimated means
Observed individual values
Estimated individual values
Estimated means and observed individual values
Estimated means and estimated individual values
Mixture distributions
SAVEDATA INFORMATION
Order and format of variables
  BMI97     F10.3
  BMI98     F10.3
  BMI99     F10.3
  BMI00     F10.3
  BMI01     F10.3
  BMI02     F10.3
  BMI03     F10.3
  CPROB1    F10.3
  CPROB2    F10.3
  C         F10.3
Save file
  h:\flash\academica\bmi.txt
Save file format
  10F10.3
Save file record length    1000
Beginning Time:  10:53:18
Ending Time:     10:53:27
Elapsed Time:    00:00:09
MUTHEN & MUTHEN
3463 Stoner Ave.
Los Angeles, CA 90066
Tel: (310) 391-9971
Fax: (310) 391-8971
Web: www.StatModel.com
Support: Support@StatModel.com
Copyright (c) 1998-2005 Muthen & Muthen
Deciding on the Number of Classes
Growth mixture models share some limitations with other exploratory strategies such as
exploratory factor analysis. The problem is deciding how many classes should be distinguished.
There is no compelling statistical answer to this question, and the user needs to combine theory, the
goals of the study, and the statistical procedures we will discuss.
Mplus provides five different criteria for helping us select the number of classes to keep.
Akaike Information Criterion
• AIC = -2logL + 2p
• where p is the number of free parameters (15)
• -2(-18553.315) + 2(15) = 37136.630
• smaller is better
Bayesian Information Criterion
• BIC = -2logL + p ln(n)
• where p is the number of free parameters (15)
• n is the sample size (1102)
• -2(-18553.315) + 15 ln(1102) = 37211.703
• smaller is better; pick the solution that minimizes BIC
Sample Size Adjusted BIC
• Adj_BIC = -2logL + p ln((n + 2)/24)
• -2(-18553.315) + 15 ln(1104/24) = 37164.06
• Simulations have shown this to be more useful than AIC or BIC, and Muthén recommends it.
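The three information criteria above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name is mine, and the inputs are the loglikelihood, number of free parameters, and sample size reported in the two class output.

```python
import math

def information_criteria(loglik, p, n):
    """Return (AIC, BIC, sample-size adjusted BIC).

    loglik : model loglikelihood (H0 value)
    p      : number of free parameters
    n      : sample size
    """
    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * math.log(n)
    adj_bic = -2 * loglik + p * math.log((n + 2) / 24)
    return aic, bic, adj_bic

# Values from the two class BMI model
aic, bic, adj_bic = information_criteria(loglik=-18553.315, p=15, n=1102)
print(round(aic, 3), round(bic, 3), round(adj_bic, 3))
```

The results agree with the Mplus output to within rounding. Note that `math.log` is the natural logarithm, which is what the formulas require.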
Entropy
• This is a measure of how clearly distinguishable the classes are, based on how distinct each
individual's estimated class probabilities are.
• If each individual has a high probability of being in just one class, this will be high.
• It ranges from zero to one, with values close to one indicating clear classification. Muthén does
not seem to emphasize this measure.
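A minimal Python sketch of an entropy measure of this kind follows. It uses the commonly cited relative entropy formula, 1 minus the summed individual entropies divided by n ln(K); I am assuming this matches what Mplus reports. The posterior probabilities are made up for illustration.

```python
import math

def relative_entropy(post):
    """Relative entropy of a classification.

    post : list of rows, one per individual, each row holding that
           individual's posterior probability for every class.
    Returns a value between 0 (no separation) and 1 (perfect separation).
    """
    n = len(post)
    k = len(post[0])
    total = 0.0
    for row in post:
        for p in row:
            if p > 0:              # treat 0 * ln(0) as 0
                total += -p * math.log(p)
    return 1.0 - total / (n * math.log(k))

# Hypothetical posterior probabilities for four individuals, two classes
clear = [[0.99, 0.01], [0.02, 0.98], [0.97, 0.03], [0.01, 0.99]]
fuzzy = [[0.60, 0.40], [0.45, 0.55], [0.50, 0.50], [0.40, 0.60]]
print(round(relative_entropy(clear), 3), round(relative_entropy(fuzzy), 3))
```

When nearly everyone has a probability close to one for a single class, the measure is close to one; when the probabilities are close to even, it is close to zero.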
Lo, Mendell, and Rubin likelihood ratio test
• This test uses a special distribution (not chi-square) for estimating the probability.
• This test is somewhat controversial because it can show a significant need for at least two
classes when skewed data were generated from a single population.
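The test statistic itself is simple to reproduce. The Python sketch below computes 2 times the loglikelihood difference from the one class (H0) and two class loglikelihoods reported in the output. It does not attempt the p-value, which requires the special reference distribution mentioned above.

```python
# Loglikelihoods taken from the Mplus output
ll_one_class = -18724.913   # H0: one class
ll_two_class = -18553.315   # two classes

# 2 times the loglikelihood difference, as reported under TECHNICAL 11
lr_statistic = 2 * (ll_two_class - ll_one_class)
print(round(lr_statistic, 3))
```

This reproduces the value of 343.196 shown in the TECHNICAL 11 output.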
Here are the results for our analysis:
                                Sample              Lo,
                                Adjusted            Mendell,    N for each
              AIC      BIC      BIC      Entropy    Rubin       class
1 Class     37473.8  37533.9  37495.8    na                     C=1102
2 Classes   37136.6  37211.7  37164.1    .957       2 v 1       C1=69, C2=1033
                                                    p = .07
3 Classes   37025.0  37115.1  37057.9    .918       3 v 2       C1=908, C2=42, C3=108
                                                    p = .49
4 Classes   36941.6  37046.7  36980.0    .906       4 v 3       C1=926, C2=37, C3=94, C4=45
                                                    p = .60
5 Classes   36857.1  36977.2  36901.0    .890       5 v 4       C1=6, C2=90, C3=863, C4=100,
                                                    p = .06     C5=43
6 Classes   36858.3  36993.5  36907.7    .891       6 v 5       C1=64, C2=46, C3=873, C4=100,
                                                    p = .13     C5=1, C6=18
The Lo, Mendell, Rubin test finds that 2 classes do marginally better than a single class, but also
that 5 classes do marginally better than 4 classes. The lack of significance may be because all
solutions have a dominant normative class.
The following graph shows what happens to the sample size adjusted BIC as the number of classes
increases and also what happens to the entropy measure. These are similar curves. The problem is
that we want a solution that both minimizes the adjusted BIC and still gives a large entropy value.
The entropy measure is very high for two classes and drops sharply as we add classes, leveling off
between five and six classes. Using only the adjusted BIC we might want five classes, although in
the five class solution there is one class with just 6 members, and this would not make much
sense.
[Figure: "Selecting the Number of Classes." Sample Adjusted BIC (axis from 36800 to 37600) and
Entropy (axis from .88 to .96) plotted against Number of Classes (1 to 6).]
The number of cases in each class needs to be considered. For many applications we would want a
normative class with a clear majority of the observations in it. However, we do not want to select
solutions that have just a few observations in a class unless there is some compelling theoretical
reason for this. The six class solution can be ruled out because one class has a single observation.
Although the 5 class solution might be justified using the Lo, Mendell, Rubin test and the adjusted
BIC criterion, I'm bothered by it having a class with just 6 people in it.
Studying your different solutions is probably the most important, although somewhat subjective,
way of deciding on the number of classes. It is useful to use Mplus' graphic representations to see
if the different growth curves make sense.
Here is the two class solution graph for estimated and sample means. The normative group, shown
in green, has a much lower intercept and somewhat less growth. Although this group still
approaches a mean BMI in the overweight range, this is the normative group for U.S. adolescents
and characterizes the preponderance of them. On the other hand, the non-normative group, shown
in red, starts at the obese level and continues to increase its BMI. Although this group includes just
6.2% of the sample of adolescents, this group is clearly at risk. Identifying what covariates are
associated with being in this group would be an important contribution.
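The estimated mean curves in the two class graph follow directly from the class-specific intercept and slope means in the output. Here is a minimal Python sketch; the helper name is mine, and the time scores 0 to 6 are the ones fixed in the Model command.

```python
# Estimated (intercept mean, slope mean) for each latent class, from the output
class_means = {
    1: (30.119, 1.498),   # non-normative (obese) class
    2: (20.178, 0.614),   # normative class
}

def mean_trajectory(intercept, slope, times=range(7)):
    """Model-implied mean BMI at each time score (0 = 1997, ..., 6 = 2003)."""
    return [intercept + slope * t for t in times]

trajectories = {c: mean_trajectory(i, s) for c, (i, s) in class_means.items()}
for c, values in trajectories.items():
    print("Class", c, [round(v, 1) for v in values])
```

Class 2 rises from about 20 to about 24 over the seven waves, while class 1 starts near 30 and ends near 39.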
One Class Solution
The one class solution shows that adolescents have a BMI of around 20 at age 12 and that this
approaches 25 (overweight) by age 18. The sample means and estimated means show a striking
linear trend. If we use this solution, we would look for covariates that would explain individual
variation in the intercept and slope.
Two Class Solution
The two class model shows two of the groups we initially hypothesized. The normative group,
which is over 93% of the adolescents, is in green and the non-normative group we called obese
is in red. This makes a lot of sense and we might want to analyze this solution. We might find
that covariates have different effects on these two groups. A standard school activity program
might be very helpful for the normative group but not for the obese group because these youth
would not be actively involved in the program.
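The class shares quoted here can be recovered from the mean of the categorical latent variable, C#1 = -2.712, which is on the logit scale with the last class as the reference. A minimal Python sketch of the conversion for the two class case:

```python
import math

c1_logit = -2.712   # mean of C#1 from the Mplus output (class 2 is the reference)

# With two classes, the share of class 1 is the inverse logit of C#1
share_class1 = math.exp(c1_logit) / (1.0 + math.exp(c1_logit))
share_class2 = 1.0 - share_class1
print(round(share_class1, 3), round(share_class2, 3))
```

This gives roughly .062 and .938, matching the estimated class proportions in the output to within the rounding of the reported logit.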
Three Class Solution
The three class solution does not fit our initial hypothesis very well.
Four Class Solution
As we get more groups the interpretation becomes more complex. Also notice that some of the
groups have hardly any members.
Five Class Solution
Sometimes a growth mixture model that has no compelling number of classes may become
compelling if a covariate is added or if known groups are analyzed separately. For example, if we
limited our sample to girls we might be more likely to find a third group that remains thin.
The key point to understand is that growth mixture models will rarely give you a solution that is
clearly, compellingly correct. The Lo, Mendell, Rubin test, adjusted BIC, and entropy measures
are useful, but only serve as guides. Consistency between a good solution using these statistical
criteria and your hypotheses is probably the best basis for choosing.
Can a Wrong Number of Classes Still Be Useful?
Imagine you had the three groups we hypothesized (obese, normative, thin) and this was the true
situation. That is, there really were three distinct groups of adolescents. Now, suppose because of
sample size or data limitations, you could only identify two classes, the normative and the obese.
Could your work still be valuable? I think the answer is yes. You may miss out on finding
covariates that influence the growth trajectory for thin people, and this could be an important
omission. Some thin people maintain a low but healthy BMI and some may become anorexic.
Knowing covariates that explain this variation in the thin group would be very useful.
However, even if you have identified just two classes, you may find important results. It is
important to know how covariates influence the adolescents in the normative class. Since the
normative class develops a weight problem, knowing what covariates might lower the growth rate
(slope) would be very useful. Similarly, knowing what covariates will influence the growth rate in
the obese group might have life-saving implications.
Extensions of Growth Mixture Models
The following figure from the Mplus manual illustrates important extensions to what we have
covered. This model examines a growth mixture model of math achievement between grades 7
and 10. It has all the features we have discussed and illustrates the richness of the analysis that is
possible using this approach. The program and data are available from www.statmodel.com . We
will only comment on what it is showing.
Math achievement is observed in each of four years and this information is used to establish a
growth curve. The program will help you decide how many classes there are. At the very least we
might predict that there are three classes of adolescents:
• There are adolescents who do well from the start and do increasingly well over time, thereby
pulling themselves apart from their peers.
• We might hypothesize that there is a larger group who perform at an average level and make
"normal" progress.
• We might also want to see if we can identify a third group that starts with poor math ability and
makes little progress. We might call these underachievers.
Focusing on the path from C to dropping out of high school, a binary variable, we might expect
that membership in the third class greatly increases the likelihood of dropping out of school.
Focusing on the paths from the covariates, we know there is strong evidence that some of these
characteristics are associated with math achievement, rate of growth in math achievement,
dropping out of school, and probably the class in which an adolescent fits.
What is most exciting about this figure is the broken paths from C to the other paths in the figure.
This is Muthén's way of saying there is an interaction effect. Being in a class of math
underachievers might reduce the effect of the covariates on other paths. Those in the normative
class might have very strong effects of covariates on the intercept, slope, and school dropout.
Figure 19.7 is from Muthén (2004). It models growth trajectories in math achievement
between grades 7 and 10.
• The intercept (i) and slope (s) are now familiar.
• The w vector of covariates includes a series of background factors (gender, race/ethnicity,
mother’s education, etc.).
  • These covariates directly influence the initial level (i) and the growth trend (s).
  • They are also related to the latent class the student is in (c).
  • Finally, the covariates influence the likelihood of being a school dropout. This is a distal
  outcome.
• The latent class variable (c)
  • Directly influences the initial level (i) and the growth trend (s).
  • Directly influences the distal outcome variable, high school dropout.
• The dotted lines mean that classes can vary in how much the covariates in vector w
influence the intercept, the slope, and the distal outcome variable.
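The pieces of the diagram map onto Mplus statements roughly as follows. This is only a sketch under stated assumptions, not the program distributed with the Mplus manual: the file name and the variable names (math7-math10, female, mothed, dropout) are hypothetical, and three classes are assumed for illustration. Recent versions of Mplus accept "c ON"; older versions may require naming the classes explicitly (c#1 c#2 ON ...).

```
TITLE:    Sketch of a growth mixture model with covariates and
          a binary distal outcome (all names are hypothetical);
DATA:     FILE IS math.dat;            ! hypothetical data file
VARIABLE: NAMES ARE math7 math8 math9 math10
                    female mothed dropout;
          CATEGORICAL = dropout;       ! binary distal outcome
          CLASSES = c(3);              ! assumed number of classes
ANALYSIS: TYPE = MIXTURE;
MODEL:
  %OVERALL%
  i s | math7@0 math8@1 math9@2 math10@3;  ! intercept (i) and slope (s)
  i s ON female mothed;        ! w influences i and s
  c ON female mothed;          ! w influences class membership
  dropout ON female mothed;    ! w influences the distal outcome
  ! Mentioning the regressions again inside a class frees them
  ! for that class (the dotted lines in the figure).
  %c#1%
  i s ON female mothed;
  dropout ON female mothed;
  %c#2%
  i s ON female mothed;
  dropout ON female mothed;
```

By default the means of i and s and the threshold of dropout differ across classes, which is how the direct paths from c to i, s, and the distal outcome enter the model without being written out.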
Step 1. Estimate a conventional one-class growth curve.
Step 2. Estimate a two-class growth mixture model.
One class has a low initial value and a low rate of growth. None of the covariates have a
significant effect.
One class has a high initial value and a strong rate of growth. Several covariates are
significant predictors.
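The two steps differ by only a few lines of Mplus input. The following is a minimal sketch, assuming hypothetical file and variable names and illustrative values for the random starts; it is not the program used to produce the results described above.

```
TITLE:    Step 1, conventional one-class growth curve (sketch);
DATA:     FILE IS math.dat;            ! hypothetical data file
VARIABLE: NAMES ARE math7 math8 math9 math10 female mothed;
MODEL:    i s | math7@0 math8@1 math9@2 math10@3;
          i s ON female mothed;
```

```
TITLE:    Step 2, two-class growth mixture model (sketch);
DATA:     FILE IS math.dat;            ! hypothetical data file
VARIABLE: NAMES ARE math7 math8 math9 math10 female mothed;
          CLASSES = c(2);
ANALYSIS: TYPE = MIXTURE;
          STARTS = 100 10;             ! illustrative random starts
MODEL:
  %OVERALL%
  i s | math7@0 math8@1 math9@2 math10@3;
  i s ON female mothed;
  c ON female mothed;
  %c#1%
  i s ON female mothed;                ! covariate effects free in class 1
OUTPUT:   TECH11;                      ! Lo-Mendell-Rubin test, 2 vs. 1 class
```

Comparing the BIC of the two runs, together with the TECH11 test where the installed version provides it, is one way to judge whether the second class is needed.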
Predicting Class Membership and Predicting Distal Outcomes
The degree to which the latent classes are useful can be assessed by estimating the conditional
(posterior) probability that each individual belongs to each class. This can tell us the most
likely class for each individual.
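In Mplus these probabilities can be written to a file by adding a SAVEDATA command to the mixture program. The fragment below is a sketch to be appended to an existing TYPE = MIXTURE input; the output file name is hypothetical.

```
SAVEDATA: FILE IS gmm_cprob.dat;     ! hypothetical output file name
          SAVE = CPROBABILITIES;     ! posterior class probabilities
                                     ! and most likely class
```

The saved file contains the analysis variables followed by one probability per class and the most likely class for each individual. Probabilities close to 0 or 1 for most individuals suggest that the classes are well separated.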