Latent Class Analysis Presentation

Latent Class Analysis (LCA) and Latent Profile Analysis (LPA)
Latent Class Analysis (LCA) and Latent Profile Analysis (LPA) can extend growth modeling
when these are done as mixture models. We will review both LCA for binary indicators and
LPA for continuous indicators. In addition to clarifying the vocabulary for subsequent sections,
one outcome of this section will be a time varying covariate which will be a class variable.
Conceptual Example: Developing a Typology of Parenting Style.
There are many indicators we could use for parenting style. A few of them are:
• How closely the father and mother supervise or monitor the adolescent
• How supportive the father and mother are
• How closely the father and mother control the adolescent’s activities
• How often the father and mother praise the child.
We can have a large number of families for which we have scores on each of these dimensions
that are organized by the couple as the unit of analysis. The first five couples might look like
the following, if scores were quantitative:

Couple   His          Her          His      Her      His      Her      His     Her
         Supervision  Supervision  Support  Support  Control  Control  Praise  Praise
A        9            9            9        9        9        9        8       9
B        3            4            9        9        4        4        9       9
C        2            3            1        3        8        8        2       3
D        1            9            1        8        1        9        1       9
E        6            2            8        1        6        1        9       1
Our data might look like this if scores were binary. Here we have assigned a 1 for any score
above the threshold value of 4.

Couple   His          Her          His      Her      His      Her      His     Her
         Supervision  Supervision  Support  Support  Control  Control  Praise  Praise
A        1            1            1        1        1        1        1       1
B        0            0            1        1        0        0        1       1
C        0            0            0        0        1        1        0       0
D        0            1            0        1        0        1        0       1
E        1            0            1        0        1        0        1       0
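The dichotomization described above can be sketched in a few lines of Python. This is only an illustration: the dictionary layout and the function name dichotomize are assumptions made for the example, while the scores and the threshold of 4 come from the tables above.

```python
# Quantitative scores for the five illustrative couples.
# Column order: his/her supervision, his/her support,
# his/her control, his/her praise.
scores = {
    "A": [9, 9, 9, 9, 9, 9, 8, 9],
    "B": [3, 4, 9, 9, 4, 4, 9, 9],
    "C": [2, 3, 1, 3, 8, 8, 2, 3],
    "D": [1, 9, 1, 8, 1, 9, 1, 9],
    "E": [6, 2, 8, 1, 6, 1, 9, 1],
}

THRESHOLD = 4  # scores strictly above this value are coded 1


def dichotomize(row, threshold=THRESHOLD):
    """Code each score as 1 if it exceeds the threshold, else 0."""
    return [1 if value > threshold else 0 for value in row]


binary = {couple: dichotomize(row) for couple, row in scores.items()}

for couple, row in binary.items():
    print(couple, *row)
```

Each printed row should reproduce the corresponding row of the binary data matrix.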
LPA and LCA attempt to find subsets of families who share similar patterns of responses.
Latent Class & Latent Profile Analysis
• Couple A is high on all dimensions and this is true for both the husband and the wife.
• Couple B is supportive and praises their adolescent, but does little monitoring: a jointly
permissive style. Families conforming to this pattern might have very different adolescent
outcomes than families in the first type.
• The third couple is high only on control. Couples that fit this pattern might be labeled
authoritarian.
• The fourth couple has a mother who scores high on all aspects and a father who is
disengaged.
• The last couple is just the opposite of the fourth couple.
If our measures were binary outcomes and we had eight variables as shown in the second data
matrix, we would have 2^8 = 256 possible combinations of zeros and ones.
• Each of these would be an empirically possible class. With 20 binary indicators we could
have 2^20 = 1,048,576 unique combinations.
• LCA seeks subgroups of observations, i.e., classes that have similar patterns. LCA may be used to
confirm or disconfirm a theory, or it can be used in an exploratory way.
• When the indicators are binary this is called LCA. When the indicators are quantitative this
is usually called LPA.
• This distinction is unimportant when using the Mplus program.
• These are person-centered methods, in contrast to factor analysis, which is variable-centered
(Muthén and Muthén, 2002).
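The count of possible response patterns can be checked directly. The following Python sketch enumerates every pattern for eight binary indicators and computes the count for twenty; the helper name n_patterns is an assumption made for this illustration.

```python
from itertools import product


def n_patterns(n_indicators, n_categories=2):
    """Number of distinct response patterns for categorical indicators."""
    return n_categories ** n_indicators


# Enumerate every pattern of zeros and ones for eight binary indicators.
patterns = list(product([0, 1], repeat=8))

print(len(patterns))    # 256
print(n_patterns(20))   # 1048576
```

The rapid growth in the number of patterns is why LCA looks for a small number of classes rather than treating each observed pattern as its own type.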
Figure 1
A Model for Latent Class Analysis and Latent Profile Analysis
We can distinguish four types of variables.
• Indicators. Yi and Ui are the observed indicators. In this example the Yi indicators are the
father’s support, mother’s support, father’s control, mother’s control, etc.
o Although not shown in this figure, each of the quantitative indicators may contain
measurement errors and some of these may be correlated.
o This is an option. This option may be invoked when the classification fails to
explain their relationship fully. For example, the error in father’s control and
mother’s control might be related.
o Continuous indicators are represented by the letter Yi. For categorical indicators, Ui
is used.
• Latent Classes. We have a single latent class variable in this figure, but we may have several
classes. An important question is how many classes we include.
• Covariates. Xi is a vector of covariates that predict class membership. In this example we
might have X1 be mother’s education and X2 be father’s education.
o The Xi variables may be categorical or continuous.
o These are distinguished from indicators by the direction of the relationship
(arrows). The classification explains the scores on the indicator variables, Yi and
Ui. The Xi covariates explain the classification.
• Distal Outcomes. There may be one or more outcome variables that depend on the latent
class.
o Each latent class may have a different likelihood of having an adolescent who
leaves home early, goes to college, and so on. These variables may be categorical
or continuous.
o Note that the figure doesn’t differentiate between the indicators and the distal
outcomes. In the estimation the distal outcome is really treated as an indicator. The
distinction between indicators and distal outcomes is conceptual and based on time
ordering.
How does Mplus decide on the loadings of the Yi?
• It begins with a random split and then tries different combinations until the likelihood is
maximized.
• An important and somewhat controversial assumption of LCA is that all of the variance is
explained by class membership. That is, each class will differ in their responses, but the
members in each class will be homogeneous.
• All of the variance in the responses is explained by class membership and the residual
variances for each latent class are fixed at zero.
• There is also an assumption that the empirically identified classes actually exist and are
meaningful (see, for example, Raudenbush, 2005). These assumptions make the selection of
the number of classes very subjective; it requires theoretical or practical justification
beyond the statistical criteria discussed below.
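The assumption that class membership explains all of the association among the indicators means that, within a class, the probability of a response pattern is simply the product of the item probabilities. The Python sketch below shows this calculation and the resulting probability of class membership by Bayes' rule. All numbers are hypothetical values chosen for illustration, and the function names are assumptions, not part of Mplus.

```python
def joint_probabilities(pattern, class_weights, item_probs):
    """P(pattern and class k) for each class, assuming the items are
    independent within a class."""
    joint = []
    for weight, probs in zip(class_weights, item_probs):
        p = weight
        for response, prob_one in zip(pattern, probs):
            p *= prob_one if response == 1 else 1.0 - prob_one
        joint.append(p)
    return joint


def posterior(pattern, class_weights, item_probs):
    """Probability of each class given the observed pattern (Bayes' rule)."""
    joint = joint_probabilities(pattern, class_weights, item_probs)
    total = sum(joint)  # overall probability of the pattern
    return [p / total for p in joint]


# Hypothetical two-class model with three binary indicators.
class_weights = [0.5, 0.5]
item_probs = [
    [0.9, 0.8, 0.9],  # class 1: P(response = 1) for each item
    [0.3, 0.2, 0.1],  # class 2
]

print(posterior((1, 1, 1), class_weights, item_probs))
print(posterior((0, 0, 0), class_weights, item_probs))
```

A child who endorses all three items is assigned to the first class with a probability above .99 under these made-up values, which is the kind of clean separation the classification tables later in this section are checking for.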
Empirical Example—Level of Implementation
In many real world evaluations of program effectiveness the level of implementation of the
program is difficult to control. When this is the case, the level of implementation may be an
important factor in program effectiveness. Can we identify clusters of youth who report
experiencing different levels of implementation of a program? We often compare the
effectiveness of being in a treatment group compared to being in a control group. In a large
study with multiple sites and personnel, some participants in a treatment program may
experience a very low level of program implementation. Some may have fairly high
implementation except for some aspects of the program. In a study by Flay, et al. (2006) of a
Positive Action Program in 20 Hawaii schools, they measured implementation by 10 items
answered by the participating elementary school students. These items assessed the youth’s own
rating of the level of implementation he or she experienced. The items are:

Variable   Label
s3ptp1     You receive stickers from your teacher for doing po...
s3ptp2     You receive a word of the week card from your teacher?
s3ptp3     You put notes in an icu box?
s3ptp4     Your teacher read notes about you from the icu box?
s3ptp5     Your teacher read your notes from the icu box?
s3ptp6     Your class receive a token for meeting your classro...
s3ptp7     You participate in a positive action assembly?
s3ptp8     Your class receive a balloon in an assembly for ach...
s3ptp9     Your class participate in whole school positive act...
s3ptp12    On how many days most weeks were you taught a posit...
For purposes of illustrating LCA, I dichotomized the responses so that a 0 = not implemented
and a score of 1 = implemented at some level. The Mplus program for this is straightforward
(getting the data into a format that Mplus can read is discussed briefly in the Appendix followed
by all key Mplus programs used in this paper):
Title:
workshop lca2.inp
Latent Class Analysis of implementation for year 2 classes
Data:
File is lcalpa34.dat ;
Variable:
Names are
idnum s3ptp1 s3ptp2 s3ptp3 s3ptp4 s3ptp5 s3ptp6 s3ptp7
s3ptp8 s3ptp9 s3ptp12 s4ptp1 s4ptp2 s4ptp3 s4ptp4 s4ptp5
s4ptp6 s4ptp13 s4ptp7 s4ptp14 s4ptp8 s4ptp9 s4ptp10
s4ptp11 s4ptp12 s3ptp1b s3ptp2b s3ptp3b s3ptp4b
s3ptp5b s3ptp6b s3ptp7b s3ptp8b s3ptp9b s3ptp12b
s4ptp1b s4ptp2b s4ptp3b s4ptp4b s4ptp5b s4ptp6b
s4ptp7b s4ptp8b s4ptp9b s4ptp10b s4ptp11b s4ptp12b
s4ptp13b s4ptp14b s3techer room ;
Missing are all (-9999) ;
Usevariables are
s3ptp1b s3ptp2b s3ptp3b s3ptp4b s3ptp5b s3ptp6b s3ptp7b
s3ptp8b s3ptp9b s3ptp12b ;
Categorical are
s3ptp1b s3ptp2b s3ptp3b s3ptp4b s3ptp5b s3ptp6b s3ptp7b
s3ptp8b s3ptp9b s3ptp12b ;
Classes = c(2) ;
Analysis:
Type = Mixture ;
Starts = 20 2;
processors = 2 ;
Output:
Stand Tech11 ;
Plot:
Type = Plot3 ;
series = s3ptp1b(1) s3ptp2b(2) s3ptp3b(3) s3ptp4b(4)
s3ptp5b(5) s3ptp6b(6) s3ptp7b(7) s3ptp8b(8) s3ptp9b(9)
s3ptp12b(10) ;
Mplus input is organized as a series of sections. The Title: section holds whatever descriptive text you want, and this is simply
printed out. The Title: section ends when the program sees the keyword for the second section,
Data:. The Data: section tells Mplus where the data set is stored. If you store it in the same
folder as the program, you can just say File is xxx, where xxx is the name of the file. This
section and subsequent sections and subsections must end with a semi-colon. The next section
is Variable: and this describes all the variables. It has subsections. Names are lists the
variables in the order they are in the data file. Missing are all (-9999) ; tells Mplus
how missing values are coded. The Usevariables are subsection lists variables you will use
in the current analysis.
Mplus was written to make LCA simple to implement. All we need to do is to enter one
subsection under the Variable: section. We have entered Classes = c(2) ;.
This says there are two classes in this solution. If we wanted three classes we would say Classes
= c(3) ;. The Analysis: section tells Mplus how to estimate the model. We do this by saying
Type = Mixture and Starts = 20 2 ;.
• The term mixture refers to a model that seeks to find subsets of observations that have
different distributions than the combined dataset. The default estimation method for mixture
models is maximum likelihood with robust standard errors (MLR in Mplus).
• Mplus generates starting values for each parameter. By default Mplus will generate 10
different sets of starting values for the parameters. Our Starts = 20 2 ; means that we
are having Mplus generate 20 different sets of starting values. The program begins estimating the
model from each of these and then carries out full estimation for the two best.
• With several classes identified it is often necessary to set this to 50 or more starts. Always
try a large value for a final solution to make sure your solution yields the same results.
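To see why several sets of starting values matter, the following Python sketch fits a two-class model to simulated binary data from ten random starts and keeps the solution with the highest log likelihood. It is a deliberately simplified EM routine written for illustration, not the algorithm Mplus actually uses; the simulated item probabilities, the seed, and all function names are assumptions.

```python
import math
import random


def e_step(data, weights, probs):
    """Return posterior class probabilities and the log likelihood."""
    posteriors, loglik = [], 0.0
    for row in data:
        joint = []
        for weight, item_p in zip(weights, probs):
            p = weight
            for response, p_one in zip(row, item_p):
                p *= p_one if response == 1 else 1.0 - p_one
            joint.append(p)
        total = sum(joint)
        loglik += math.log(total)
        posteriors.append([p / total for p in joint])
    return posteriors, loglik


def fit_one_start(data, n_classes, rng, n_iter=50):
    """Run EM from one random set of starting values."""
    n_items = len(data[0])
    weights = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.2, 0.8) for _ in range(n_items)]
             for _ in range(n_classes)]
    for _ in range(n_iter):
        posteriors, _ = e_step(data, weights, probs)
        for k in range(n_classes):
            # Expected class size; the floor guards against division by zero.
            n_k = max(sum(post[k] for post in posteriors), 1e-12)
            weights[k] = n_k / len(data)
            for j in range(n_items):
                endorsed = sum(post[k] * row[j]
                               for post, row in zip(posteriors, data))
                # Keep probabilities strictly inside (0, 1).
                probs[k][j] = min(max(endorsed / n_k, 1e-6), 1.0 - 1e-6)
    _, loglik = e_step(data, weights, probs)
    return loglik, weights, probs


# Simulate 200 children from two hypothetical classes on four binary items.
rng = random.Random(2024)
true_probs = [[0.9, 0.8, 0.9, 0.85], [0.2, 0.3, 0.1, 0.15]]
data = []
for _ in range(200):
    k = 0 if rng.random() < 0.5 else 1
    data.append([1 if rng.random() < p else 0 for p in true_probs[k]])

# Ten random starts; keep the solution with the highest log likelihood.
results = [fit_one_start(data, 2, rng) for _ in range(10)]
best = max(results, key=lambda result: result[0])

print(sorted(round(result[0], 3) for result in results))
print("best log likelihood:", round(best[0], 3))
```

If the best log likelihood is reached from several different starts, that is reassuring in the same way as a replicated best value in the Mplus output; if it is reached only once, more starts are warranted.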
Usually there is a Model: section. This is not needed here because as soon as we said Classes
= c(2) ;, Mplus knew to use all the variables as indicators for identifying two classes. But if we
had covariates, Xi, that predict the class membership, e.g., age and gender, we would need a
Model: section. There we could specify these additional relationships and could even have
different covariates influence the likelihood of each of the classes. The Output: section has
Tech11 that gives us technical output including the Lo-Mendell-Rubin adjusted likelihood ratio
test of the hypothesis that a simpler model does as well. This is a somewhat controversial test
(Jefferies, 2003). The Plot: section creates a graph that is helpful for comparing classes to see
where they differ.
Annotated Selected Output. Mplus documentation provides limited examples of the
interpretation of the rich output it provides. The output labeling is sometimes less than ideal.
Therefore, we will provide a relatively detailed annotation of selected output from the LCA.
These descriptions are useful when we get to growth modeling.
We have two classes but they are all in one group. We could for instance do this simultaneously
for boys and girls and treat them as two groups, each with two possible classes. Notice all of our
indicators are dependent variables. This is consistent with our figures. There is a listing of
variables that are binary. If we had a trichotomy, Mplus would pick that up from the data and
make appropriate adjustments. When we said classes = c(2) ; the name of the categorical
latent variable became C. We could have used another name.
Following are the sample proportions for each item. We coded an answer of not implemented 0
and implemented 1. Mplus calls these categories 1 and 2, respectively. For the item s3ptp1b
we have 21.4% in category 1, i.e., coded 0 on implementation, and 78.6% in category 2, i.e.,
coded 1 on implementation.

SUMMARY OF CATEGORICAL DATA PROPORTIONS

    S3PTP1B
      Category 1    0.214
      Category 2    0.786
    S3PTP2B
      Category 1    0.432
      Category 2    0.568
    S3PTP3B
      Category 1    0.459
      Category 2    0.541
The following line reporting on how the program terminated is good news, but we need to make
sure the minimum for the likelihood function was replicated and check for warnings—THE
MODEL ESTIMATION TERMINATED NORMALLY
The log likelihood value is hard to interpret by itself, but can be used to compare models. A
value closer to zero indicates better fit, so we want to minimize -2 times the log likelihood. I rely on the Sample-Size Adjusted BIC and the Entropy measure for
comparing models. Ideally the same number of classes will minimize the AIC and BIC
measures and have a strong entropy measure. Mplus reports both the Pearson Chi-Square and
the Likelihood Ratio Chi-Square. These are asymptotically equivalent. They lead to the same
decision in this case.
TESTS OF MODEL FIT

Loglikelihood
    H0 Value                            -8315.092
    H0 Scaling Correction Factor            1.042
      for MLR

Information Criteria
    Number of Free Parameters                  21
    Akaike (AIC)                        16672.184
    Bayesian (BIC)                      16784.666
    Sample-Size Adjusted BIC            16717.954
      (n* = (n + 2) / 24)
    Entropy                                 0.827

Chi-Square Test of Model Fit for the Binary and Ordered Categorical
(Ordinal) Outcomes**

    Pearson Chi-Square
      Value                              2617.543
      Degrees of Freedom                      977
      P-Value                              0.0000

    Likelihood Ratio Chi-Square
      Value                              1253.581
      Degrees of Freedom                      977
      P-Value                              0.0000

** Of the 13864 cells in the latent class indicator table, 25
   were deleted in the calculation of chi-square due to extreme
   values.

Chi-Square Test for MCAR under the Unrestricted Latent Class Indicator
Model

    Pearson Chi-Square
      Value                              2777.125
      Degrees of Freedom                    12772
      P-Value                              1.0000

    Likelihood Ratio Chi-Square
      Value                              1039.830
      Degrees of Freedom                    12772
      P-Value                              1.0000
Three tables are presented to show the number of observations in each class for our two class
solution. The first two utilize posterior probabilities. In the last table (shown here), 769
children are assigned to class 1 and 797 children are assigned to class 2 when each child is
assigned to the class that has the highest probability for that child.

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS
MEMBERSHIP

Class Counts and Proportions

    Latent
    Classes

       1        769     0.49106
       2        797     0.50894
Each person has a probability of being in each class, but the big probabilities should be on the
diagonal. Those in class one have an average probability of being in class one of .955. The
mean estimated probability of these children being in class two is .045.
Average Latent Class Probabilities for Most Likely Latent Class
Membership (Row) by Latent Class (Column)

             1        2

    1    0.955    0.045
    2    0.053    0.947
The $ sign at the end of the variable names that follow indicates that the variables are binary,
hence there is one threshold value separating the 0’s from the 1’s on that variable. The
thresholds are on the logit scale: within a class, the probability of a response in Category 2 is
1/(1 + exp(threshold)), so a large negative threshold means that most children in that class
report the item as implemented. The estimated thresholds are difficult to interpret directly, but the
Est./S.E. is interpreted as a z-score and can be used to test the significance of each indicator.
You will need to look up the probability for each z-score. Statistics packages make this simple,
e.g., in Stata you would enter display 2*(1-normal(abs(z))), where z is the Est./S.E.
Why Mplus output does not do this for us is unknown.
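For readers without Stata, the same two-sided probability can be obtained in Python with the standard library alone. This is a minimal sketch; the function name two_sided_p is an assumption, and the z values are Est./S.E. entries from the threshold output for this model.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic.

    Equivalent to 2 * (1 - Phi(|z|)), written with the complementary
    error function so that very large |z| does not round to zero early.
    """
    return math.erfc(abs(z) / math.sqrt(2.0))


for z in (-16.602, 5.191, 1.96):
    print(z, two_sided_p(z))
```

Taking the absolute value matters: most of the z-scores in this output are negative, and without it the formula would return a value greater than 1.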
                   Estimates     S.E.   Est./S.E.

Latent Class 1

 Thresholds
    S3PTP1B$1        -2.223     0.134    -16.602
    S3PTP2B$1        -1.074     0.096    -11.196
    S3PTP3B$1        -2.312     0.166    -13.941
    S3PTP4B$1        -1.906     0.169    -11.244

Latent Class 2

 Thresholds
    S3PTP1B$1        -0.714     0.084     -8.531
    S3PTP2B$1         0.425     0.082      5.191
    S3PTP3B$1         1.535     0.140     10.927
    S3PTP4B$1         2.125     0.163     13.013
We can use the probability of being in each category for individual indicators to assign a
meaningful label to each class. This is similar to using loadings to identify the latent factors.
Previously, we saw that 78.6% of the overall sample was in Category 2 for s3ptp1b (i.e.
implemented). For Latent Class 1, 90.2% of individuals picked Category 2 compared
to 67.1% for children in Latent Class 2. Thus, Latent Class 2 has a lower level of
implementation on this aspect.
RESULTS IN PROBABILITY SCALE

                   Estimates     S.E.   Est./S.E.

Latent Class 1

 S3PTP1B
    Category 1        0.098     0.012      8.276
    Category 2        0.902     0.012     76.451
 S3PTP2B
    Category 1        0.255     0.018     13.977
    Category 2        0.745     0.018     40.933
 S3PTP3B
    Category 1        0.090     0.014      6.627
    Category 2        0.910     0.014     66.898

Latent Class 2

 S3PTP1B
    Category 1        0.329     0.018     17.809
    Category 2        0.671     0.018     36.357
 S3PTP2B
    Category 1        0.605     0.020     30.916
    Category 2        0.395     0.020     20.219
 S3PTP3B
    Category 1        0.823     0.020     40.158
    Category 2        0.177     0.020      8.654
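The probabilities in this part of the output are a transformation of the thresholds reported earlier. As a check, the Python sketch below converts each threshold to the probability of Category 2 using the logistic function; the dictionary layout and function name are assumptions made for the example, while the threshold values are taken from the output above.

```python
import math


def prob_category2(threshold):
    """Probability of the higher category of a binary item, given its
    logit-scale threshold within a class."""
    return 1.0 / (1.0 + math.exp(threshold))


thresholds = {
    "Latent Class 1": {"S3PTP1B": -2.223, "S3PTP2B": -1.074, "S3PTP3B": -2.312},
    "Latent Class 2": {"S3PTP1B": -0.714, "S3PTP2B": 0.425, "S3PTP3B": 1.535},
}

probabilities = {
    latent_class: {item: prob_category2(tau) for item, tau in items.items()}
    for latent_class, items in thresholds.items()
}

for latent_class, items in probabilities.items():
    for item, p in items.items():
        print(latent_class, item, round(p, 3))
```

Up to rounding, the printed values agree with the Category 2 estimates in the probability scale results, for example .902 and .671 for s3ptp1b.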
We can also interpret the odds ratios. The odds that a child in Latent Class 1 says
the program aspect in s3ptp3b was implemented (Category > 1) are 46.84 times the odds for a child in Latent Class 2.
These odds ratios highlight where differences are most pronounced. It is informative that those
in Latent Class 1 have greater odds of endorsing every single indicator of implementation
and sometimes far greater odds. This indicates that Class 1 could be described as high
implementation and Class 2 could be described as low implementation.
LATENT CLASS ODDS RATIO RESULTS

Latent Class 1 Compared to Latent Class 2

                   Estimates     S.E.   Est./S.E.

 S3PTP1B
    Category > 1      4.525     0.734      6.165
 S3PTP2B
    Category > 1      4.478     0.570      7.852
 S3PTP3B
    Category > 1     46.844     8.909      5.258
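These odds ratios can be reproduced from the thresholds, because the odds of Category 2 within a class equal exp(-threshold). The following Python sketch is an illustration only; the function name is an assumption, and small differences from the Mplus values reflect rounding of the printed thresholds.

```python
import math


def odds_ratio(threshold_class1, threshold_class2):
    """Odds of the higher category in class 1 relative to class 2.

    The odds within a class are exp(-threshold), so the ratio is
    exp(threshold_class2 - threshold_class1).
    """
    return math.exp(threshold_class2 - threshold_class1)


# (class 1 threshold, class 2 threshold) from the threshold output.
items = {
    "S3PTP1B": (-2.223, -0.714),
    "S3PTP2B": (-1.074, 0.425),
    "S3PTP3B": (-2.312, 1.535),
}

odds_ratios = {item: odds_ratio(t1, t2) for item, (t1, t2) in items.items()}

for item, value in odds_ratios.items():
    print(item, round(value, 2))
```

The same quantity can be computed from the probabilities, e.g. (.910/.090) / (.177/.823) for s3ptp3b, which makes clear that the 46.84 describes odds rather than a ratio of probabilities.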
The Results in Probability Scale are extremely useful for understanding how the
classes differ on these indicators.
• The following table is not in the output but was generated from the output. The Overall
Proportion is from the initial result before the classes were created.
• The proportions for each class are from the results in probability scale on the output. The
differences indicate that if we stick with two classes, we clearly have a low and a high
implementation group.
• The second class, with 797 children, has a lower proportion checking each item as being
implemented at all. Some of the differences are substantial and some are enormous. For
example, just 11% of the children in the low implementation class report that their teacher
read notes about them from the ICU box, compared to 87% of those children classified as high
implementation. Indeed, for 4 of the 10 items (the three ICU box items and the assembly
balloon) the proportion in the high implementation group is at least double that in the low
implementation group. Here is the proportion agreeing that each item was implemented:
                                                Two Class Solution
Indicator                          Overall      First Class:          Second Class:
                                   Proportion   High Implementation   Low Implementation
Stickers for PA                    .79          .90                   .67
Word of the week                   .57          .74                   .40
You put notes in the ICU box       .54          .91                   .18
Teacher read ICU notes about you   .48          .87                   .11
Teacher read your ICU notes        .48          .88                   .08
Tokens for meeting goals           .68          .84                   .52
PA Assembly activities             .71          .80                   .62
Assembly Balloon for PA            .33          .45                   .22
Whole school PA                    .62          .73                   .51
Days/wk taught PA                  .89          .93                   .86
N                                  1566         769                   797
We asked for a plot of these probabilities. We can get this from Mplus by clicking Graph (on
the top bar). We say we want the Probability of one category and pick Category 2
(remember, Category 1 was disagree and Category 2 was agree that the aspect of the Positive
Action Program was implemented).
Figure 3
Latent Class Solution with Two Classes
The final part of the output provides a test of whether a two class solution as shown does significantly
better than a one class solution. The Lo-Mendell-Rubin adjusted likelihood ratio test has a
computed value of 1998.54. This is statistically significant at the p < .001 level. Thus two
classes make a significant improvement in fit over a single class.
TECHNICAL 11 OUTPUT

VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2
CLASSES

    H0 Loglikelihood Value                      -9326.708
    2 Times the Loglikelihood Difference         2023.233
    Difference in the Number of Parameters             11
    Mean                                           10.857
    Standard Deviation                             14.085
    P-Value                                        0.0000

LO-MENDELL-RUBIN ADJUSTED LRT TEST

    Value                                        1998.535
    P-Value                                        0.0000
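Two of the entries in this output are simple to verify by hand. The Python sketch below recomputes twice the log likelihood difference from the two H0 values reported in the output and counts the free parameters for a model with 10 binary indicators. The parameter-count function is an assumption about this unrestricted model (one threshold per item per class, plus the class proportions), and the sketch does not attempt the Lo-Mendell-Rubin adjustment itself.

```python
def n_free_parameters(n_items, n_classes):
    """Free parameters in an unrestricted LCA with binary items:
    one threshold per item per class, plus (classes - 1) class proportions."""
    return n_classes * n_items + (n_classes - 1)


loglik_one_class = -9326.708   # H0 log likelihood from the Tech11 output
loglik_two_class = -8315.092   # log likelihood of the two class model

lr_statistic = 2.0 * (loglik_two_class - loglik_one_class)
parameter_difference = n_free_parameters(10, 2) - n_free_parameters(10, 1)

print(round(lr_statistic, 3))
print(parameter_difference)
```

The results agree with the 2023.233 and 11 shown in the output, up to rounding of the printed log likelihoods.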
Deciding on the Number of Classes. There is no compelling statistical answer to this question
and the user needs to combine theory, the goals of the study, and the statistical criteria. A
potentially serious problem with Latent Class Analysis and its extensions to growth mixture
models discussed later in the paper is the danger of over-extraction, where multiple classes are
identified even though they could result from a chance process (Bauer and Curran, 2002;
Muthén, 2002; Nylund, Asparouhov, & Muthén, 2006). This is a special problem when
variables are not normally distributed. Mplus provides several criteria:
• Akaike: AIC = -2*logLikelihood + 2p, where p is the number of free parameters (21 in our
two class model). Smaller is better.
• Bayesian Information Criterion: BIC = -2*logLikelihood + p*ln(n), where p is the number
of free parameters (21) and n is the sample size (1566). Smaller is better.
• Sample Size adjusted: Adj BIC = -2*logLikelihood + p*ln((n+2)/24). Smaller is better.
Muthén reports that simulation studies indicate this is superior to BIC.
• Entropy: this is a measure of how clearly distinguishable the classes are, based on how
distinct each individual’s estimated class probabilities are.
• Lo, Mendell, and Rubin likelihood ratio test: this test uses a special distribution (not
chi-square) for estimating the probability. This test is somewhat controversial because it can
show a significant need for at least two classes when random data are generated from a single,
skewed population.
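The information criteria listed above can be reproduced from the log likelihood, the number of free parameters, and the sample size of the two class model. The Python sketch below does this and also includes a relative entropy function. The entropy formula shown is the commonly cited one for mixture models and is assumed, not confirmed here, to match what Mplus reports; all function names are illustrative.

```python
import math


def aic(loglik, p):
    return -2.0 * loglik + 2.0 * p


def bic(loglik, p, n):
    return -2.0 * loglik + p * math.log(n)


def adjusted_bic(loglik, p, n):
    return -2.0 * loglik + p * math.log((n + 2.0) / 24.0)


def relative_entropy(posteriors):
    """1 minus the average classification uncertainty, scaled to 0..1.

    posteriors holds one list of class probabilities per individual.
    """
    n = len(posteriors)
    k = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(k))


# Values reported for the two class LCA.
loglik, p, n = -8315.092, 21, 1566

print(round(aic(loglik, p), 3))
print(round(bic(loglik, p, n), 3))
print(round(adjusted_bic(loglik, p, n), 3))

# Entropy is 1 when everyone is classified with certainty and 0 when
# every individual is equally likely to be in each class.
print(relative_entropy([[1.0, 0.0], [0.0, 1.0]]))
print(relative_entropy([[0.5, 0.5], [0.5, 0.5]]))
```

The three criteria agree with the values in the TESTS OF MODEL FIT output (16672.184, 16784.666, and 16717.954).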
Here are the results for our analysis when we compare 1 to 5 class solutions.
                      1 Class    2 Classes    3 Classes    4 Classes    5 Classes
AIC                   18673      16672        16440        16296        16258
BIC                   18727      16785        16611        16526        16548
Sample Adjusted BIC   17685      16718        16510        16389        16376
Entropy               na         .827         .748         .684         .683
Lo, Mendell, Rubin    na         2v1          3v2          4v3          5v4
                                 Value = 1998 Value = 251  Value = 164  Value = 37
                                 p = .0000    p = .0000    p = .0000    p = .1414
N for each class      C=1566     C1=769       C1=749       C1=384       C1=432
                                 C2=797       C2=447       C2=395       C2=107
                                              C3=370       C3=446       C3=243
                                                           C4=341       C4=448
                                                                        C5=336
The Lo, Mendell, Rubin test finds that 2 classes do better than a single class, but also that 3 or 4
classes do even better. The Sample Adjusted BIC shows improvement for each additional class,
although there are big drops between 1 and 2 classes and between 2 and 3 classes. The 2 class
solution shows an even split whereas the 3 class solution has a normative response and then two
special classes. The Entropy is good for 2 classes but drops noticeably for 3 or more classes.
We saw that the figure for the 2 class solution made a lot of sense with one class clearly high
implementation and the other low implementation. The figure for the 3 class solution shows that
class 2 is similar to class 1 on 5 indicators (1, 7, 8, 9, and 10), similar to class 3 on 3 indicators
(3, 4, and 5), and half way between on the remaining two indicators (2 and 6). Choosing
between a two class solution and a three class solution would depend largely on your research
goals.
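One way to see how the criteria disagree is to tabulate them and ask which number of classes each one prefers. The Python sketch below uses the values from the comparison table as reported; the dictionary layout and function names are assumptions made for the example.

```python
# Fit statistics for the 1 to 5 class LCA solutions, as reported in the table.
fit = {
    1: {"AIC": 18673, "BIC": 18727, "SABIC": 17685, "Entropy": None},
    2: {"AIC": 16672, "BIC": 16785, "SABIC": 16718, "Entropy": 0.827},
    3: {"AIC": 16440, "BIC": 16611, "SABIC": 16510, "Entropy": 0.748},
    4: {"AIC": 16296, "BIC": 16526, "SABIC": 16389, "Entropy": 0.684},
    5: {"AIC": 16258, "BIC": 16548, "SABIC": 16376, "Entropy": 0.683},
}


def best_by_minimum(criterion):
    """Number of classes with the smallest value of an information criterion."""
    return min(fit, key=lambda k: fit[k][criterion])


def best_by_entropy():
    """Number of classes with the highest entropy (undefined for one class)."""
    candidates = {k: v["Entropy"] for k, v in fit.items()
                  if v["Entropy"] is not None}
    return max(candidates, key=candidates.get)


for criterion in ("AIC", "BIC", "SABIC"):
    print(criterion, "prefers", best_by_minimum(criterion), "classes")
print("Entropy prefers", best_by_entropy(), "classes")
```

No single criterion settles the question here, which is why the choice between two and three classes is left to the research goals.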
Figure 4
Latent Class Analysis Solution with Three Classes
Latent Profile Analysis. In the LCA we collapsed the response options for each item into
endorsed versus not endorsed. Actually each item was on a four point scale from 0 to 3, where
0 indicated that the aspect of the program was not implemented and 1, 2, and 3 represented the
degree to which it was implemented. Mplus makes it possible to perform the equivalent of LCA
on continuous variables and it can also be done when there is a mixture of indicators that are
continuous and categorical. We will use the continuous measure to do what is most often called
LPA.
Title:
workshop LPA.inp
latent profile analysis of implementation for year 3
Data:
File is lcalpa34.dat ;
Variable:
Names are
idnum s3ptp1 s3ptp2 s3ptp3 s3ptp4 s3ptp5 s3ptp6 s3ptp7
s3ptp8 s3ptp9 s3ptp12 s4ptp1 s4ptp2 s4ptp3 s4ptp4 s4ptp5
s4ptp6 s4ptp13 s4ptp7 s4ptp14 s4ptp8 s4ptp9 s4ptp10
s4ptp11 s4ptp12 s3ptp1b s3ptp2b s3ptp3b s3ptp4b
s3ptp5b s3ptp6b s3ptp7b s3ptp8b s3ptp9b s3ptp12b
s4ptp1b s4ptp2b s4ptp3b s4ptp4b s4ptp5b s4ptp6b
s4ptp7b s4ptp8b s4ptp9b s4ptp10b s4ptp11b s4ptp12b
s4ptp13b s4ptp14b s3techer room ;
Missing are all (-9999) ;
Usevariables are
s3ptp1 s3ptp2 s3ptp3 s3ptp4 s3ptp5 s3ptp6 s3ptp7
s3ptp8 s3ptp9 s3ptp12 ;
Classes = c(2) ;
Cluster = s3techer ;
Idvariable = idnum ;
Analysis:
Type = Mixture Complex ;
Starts = 40 2;
Output:
samp Stand Tech11 ;
Plot:
Type = Plot3 ;
series = s3ptp1(1) s3ptp2(2) s3ptp3(3) s3ptp4(4)
s3ptp5(5) s3ptp6(6) s3ptp7(7) s3ptp8(8) s3ptp9(9)
s3ptp12(10) ;
Savedata:
File is wave3.dat ;
Save = Cprobabilities ;
Format is F6.0 ;
The Mplus program for LPA has no variables that we label as categorical. We also ask for sample
statistics that will be provided for the continuous variables (means, variances, covariances, and
correlations). We did this under the Output: section with the keyword Sampstat, which appears
in the program in abbreviated form as samp.
We’ve added two new features that we could have used with the binary sample. First, we should
recognize that there are over 100 teachers for our students and students within each of these
classrooms will be more homogeneous in their ratings of implementation than they would be if
they were sampled independently. We can adjust for the intraclass correlation by two additions
to our program. Under the Variable: section we add the subcommand cluster =
s3techer ; where s3techer is the variable identifying the teacher. Then, because this
acknowledges that our sample is complex rather than a simple random sample, under the
Analysis: section we add the keyword Complex. This is all we have to do to get
standard errors for our parameter estimates that are adjusted for the clustering.
We are going to use the results of this LPA in our presentation of growth curves. The
implementation class for each child at wave 3 and again at wave 4 will be used as a time
varying covariate in our growth curve. Therefore, we need to save a file containing the
identification variable, idnum, and the classification. We do this by making two changes in the
program.
• First, under the Variable: section we add the subcommand idvariable = idnum ;
where idnum is the name of the identification variable in this dataset and in the dataset
we will merge with this created file.
• Second, we add a new section at the end of the program:
Savedata:
File is wave3.dat ;
Save = Cprobabilities ;
Format is F6.0 ;
The Savedata: section needs the name of the file we will save. This new file will go to
the folder in which we have this particular program file. We need to tell Mplus what to save in
this file; it saves the classification when we use the keyword cprobabilities ;. Here it
will save the variable we identified as the idvariable and the class to which each child is
assigned, coded 1 or 2 because we have two classes. It also saves several other variables that
are listed at the end of the output.
The resulting file has variables in the order listed at the end of the output file. One problem is
that a missing value appears as an asterisk, *, without a space preceding it:
    3.    3.    2.    2.    3.    2.    1.    0.    0.    2.  2702    0.    1.    2. 80
    3.    3.    2.    0.    2.    0.    3.    0.    3.    4.  4976    0.    1.    2. 45
    2.    1.    0.    2.    0.    3.    3.    0.    3.    3.  2488    1.    0.    1. 10
    2.    2.    3.    3.    2.    2.    1.    0.    1.    3.  2763    0.    1.    2. 54
    0.    0.    2.    0.    0.    0.    0.    0.    0.    3.  2368    1.    0.    1. 63
    2.    0.    0.    0.    0.    3.    3.    3.    3.    1.  1195    1.    0.    1. 41
    3.    3.    3.    3.    3.    3.    3.    0.    1.    3.  2747    0.    1.    2. 73
    2.    3.    2.    3.    2.    2.    3.    3.    2.    2.  2777    0.    1.    2.  1
    1.    0.    0.    0.    0.    0.    0.    0.    0.    2.  5295    1.    0.    1. 65
    1.    2.    2.    0.    0.    0.    0.    0.    1.*  1668    1.    0.    1. 53
    3.    2.    1.    1.    0.    0.    3.*    0.*   639    1.    0.    1. 55
The last case has scores of 3 2 1 1 0 0 3 . 0 . 639 1 0 1 55. You can see that some editing
will be needed to bring this into your standard statistics package such as Stata or SAS.
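The editing can also be scripted. The following Python sketch splits one record of the saved file into values, treating an asterisk as missing even when it is attached to the preceding number. It is an illustration under the assumption that records look like the listing above; the function name and the sample record (the last case, written with its asterisks attached) are for the example only.

```python
import re

# A token is either a missing-value asterisk or a number such as "3." or "639".
TOKEN = re.compile(r"\*|-?\d+(?:\.\d*)?")


def parse_record(line):
    """Split one saved record into floats, with None for each asterisk."""
    values = []
    for token in TOKEN.findall(line):
        values.append(None if token == "*" else float(token))
    return values


last_case = "    3.    2.    1.    1.    0.    0.    3.*    0.*   639    1.    0.    1. 55"
record = parse_record(last_case)

print(record)
print(len(record))
```

Because each asterisk is returned as its own missing value, every record keeps 15 fields, so the identification number and class variable stay in the same positions for all cases.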
Let’s examine selected output focusing only on differences between LCA and LPA. We get
three warnings. The first tells us all the variables are uncorrelated with all other variables within each class. This
is intended because we are forcing the latent classes to explain the correlations. If our model
does not fit because some items are more or less correlated than can be explained by the
classification system, we could add explicit correlations of error terms. The second warning
tells us that we have 16 children for whom we have missing values on their cluster. Mplus must
drop these observations. The third warning tells us there are 407 people who have a missing
value on all of our indicators. These are children for whom we have data in at least one wave,
but no data for wave 3. Since there are no data on these 407 children for wave 3, these
observations are deleted.
*** WARNING in Model command
  All variables are uncorrelated with all other variables within class.
  Check that this is what is intended.
*** WARNING
  Data set contains unknown or missing values for GROUPING,
  PATTERN, COHORT and/or CLUSTER variables.
  Number of cases with unknown or missing values: 16
*** WARNING
  Data set contains cases with missing on all variables.
  These cases were not included in the analysis.
  Number of cases with missing on all variables: 407
Where we had the proportion endorsing each item when our variables were binary, with
continuous variables we get the means. We compare these overall means to the corresponding means
for each class.

ESTIMATED SAMPLE STATISTICS

    Means
       S3PTP1     S3PTP2     S3PTP3     S3PTP4     S3PTP5
       ________   ________   ________   ________   ________
       1.745      1.143      1.203      1.000      0.996

    Means
       S3PTP6     S3PTP7     S3PTP8     S3PTP9     S3PTP12
       ________   ________   ________   ________   ________
       1.552      1.526      0.612      1.207      2.415
We get a different number of observations in each class using the continuous information than
we did when we used binary information. This may reflect the greater sensitivity of the 4-point
scale compared to the dichotomy.
CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS
MEMBERSHIP
Class Counts and Proportions

    Latent
    Classes

       1       1021     0.65871
       2        529     0.34129
Instead of getting the probability of endorsing each item for each class, now we get estimates
that are actually the means. For example, children classified as in Latent Class 1 have a
mean of 2.24 on the s3ptp4 item and children classified as in Latent Class 2 have a mean
of .36 on this item. Although the LPA has a different number of children in each class than we
had with the LCA, it is interesting that the LCA probabilities and the LPA means for the two
class solution are so consistent.
MODEL RESULTS

                 Estimates     S.E.   Est./S.E.      Std    StdYX

Latent Class 1

 Means
    S3PTP1          2.179     0.047     46.182     2.179    1.916
    S3PTP2          1.797     0.055     32.768     1.797    1.651
    S3PTP3          2.387     0.046     51.463     2.387    2.542
    S3PTP4          2.243     0.053     42.551     2.243    2.851

Latent Class 2

 Means
    S3PTP1          1.517     0.039     38.889     1.517    1.334
    S3PTP2          0.806     0.035     23.132     0.806    0.740
    S3PTP3          0.587     0.041     14.373     0.587    0.625
    S3PTP4          0.357     0.037      9.634     0.357    0.454
We can compare different classes by putting these results in tables we create from the output.
Here are the results for two class and three class solutions followed by a summary table of
different measures of fit.
Table of Means: Two Class Solution

                                   Overall      Two Class Solution
Variable                           Item Means   First Class   Second Class
Stickers for PA                    1.74         2.18          1.52
Word of the week                   1.14         1.80          .81
You put notes in icu box           1.20         2.39          .59
Teacher read ICU notes about you   1.00         2.24          .36
Teacher read your ICU notes        .99          2.46          .24
Tokens for meeting goals           1.55         2.16          1.24
PA Assembly activities             1.52         1.96          1.30
Assembly Balloon for PA            .62          .93           .45
Whole school PA                    1.21         1.55          1.03
Days/wk taught PA                  2.41         2.78          2.24
N                                  1,550        1,021         529
Table of Means: Three Class Solution

                                   Overall      Three Class Solution
Variable                           Item Means   First Class   Second Class   Third Class
Stickers for PA                    1.74         1.82          1.48           2.22
Word of the week                   1.14         1.06          .80            1.90
You put notes in icu box           1.20         1.42          .46            2.52
Teacher read ICU notes about you   1.00         1.12          .27            2.38
Teacher read your ICU notes        .99          1.28          .01            2.76
Tokens for meeting goals           1.55         1.60          1.20           2.22
PA Assembly activities             1.52         1.38          1.33           2.03
Assembly Balloon for PA            .62          .67           .41            .97
Whole school PA                    1.21         1.16          1.02           1.63
Days/wk taught PA                  2.41         2.40          2.46           2.80
N                                  1550         332           808            410
Criteria for Assessing Fit for Different Number of Classes

                           1 Class     2 Classes    3 Classes    4 Classes
AIC                        48125       44239        43574        42783
BIC                        48232       44405        43798        43067
Sample Size Adjusted BIC   48169       44306        43665        42898
Entropy                    na          .930         .929         .955
Lo, Mendell, Rubin Test    na          2v1          3v2          4v3
                                       Value 3361   Value 679    Value 803
                                       p = .001     p = .075     p = .008
N for each class           C1 = 1566   C1 = 1021    C1 = 332     C1 = 890
                                       C2 = 529     C2 = 808     C2 = 146
                                                    C3 = 410     C3 = 137
                                                                 C4 = 377
These results are somewhat inconsistent. We selected the two class solution for our time
varying covariate. It has a good Entropy value, .930, and does significantly better than a single
class, whereas the three class solution does not do significantly better than the two class solution.
Importantly, the two class solution has a normative group of 1,021 children who have higher
implementation of every aspect of the Positive Action program than the 529 children in the low
implementation class.
The Savedata: command yields the following results:
SAVEDATA INFORMATION

  Order and format of variables

    S3PTP1      F6.0
    S3PTP2      F6.0
    S3PTP3      F6.0
    S3PTP4      F6.0
    S3PTP5      F6.0
    S3PTP6      F6.0
    S3PTP7      F6.0
    S3PTP8      F6.0
    S3PTP9      F6.0
    S3PTP12     F6.0
    IDNUM       F6.0
    CPROB1      F6.0
    CPROB2      F6.0
    C           F6.0
    S3TECHER    I3

  Save file
    wave3.dat

  Save file format
    14F6.0 I3

  Save file record length    1000
This file contains the score of each observation on each indicator, the identification number, the
posterior probability of membership in class 1 and in class 2 (CPROB1 and CPROB2), a variable
labeled C that is 1 if the child is in Latent Class 1 and 2 if the child is in Latent Class 2, and
the cluster variable value. I opened this file in a text editor and dropped all the variables except
the identification number and the class, C. I repeated this analysis for wave 4 and did the same.
I then merged these two datasets with the data for the growth curve analysis.
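The merge step can be sketched in Python as a join on the identification number. This is an illustration only: the wave 3 entries use three identification numbers and C values from the listing shown earlier, while the wave 4 entries and the function name are hypothetical, since the wave 4 file is not shown.

```python
def merge_classes(wave3, wave4):
    """Combine two {idnum: class} mappings, keeping every child.

    None marks a wave in which the child has no classification.
    """
    merged = {}
    for idnum in sorted(set(wave3) | set(wave4)):
        merged[idnum] = (wave3.get(idnum), wave4.get(idnum))
    return merged


# idnum -> class C at wave 3 (three cases from the saved-file listing).
wave3_class = {2702: 2, 4976: 2, 639: 1}

# idnum -> class C at wave 4 (hypothetical values for illustration).
wave4_class = {2702: 1, 639: 1, 5295: 2}

merged = merge_classes(wave3_class, wave4_class)
for idnum, (class_w3, class_w4) in merged.items():
    print(idnum, class_w3, class_w4)
```

Keeping children who appear in only one wave, rather than dropping them, preserves the cases for which the growth model can still use partial information.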