APPLIED LATENT CLASS ANALYSIS: A WORKSHOP
Katherine Masyn, Ph.D.
Harvard University
katherine_masyn@gse.harvard.edu
December 5, 2013
Texas Tech University
Lubbock, TX
OVERVIEW
Statistical Modeling in the Mplus Framework ... 3
The Finite Mixture Model Family ... 11
Latent Class Analysis (LCA) ... 16
LCA Example: LSAY ... 27
LCA Model Building ... 36
Direct and Indirect Applications ... 38
Model Estimation ... 42
Class Enumeration ... 55
Fit Indices ... 60
Classification Quality ... 70
Summing It Up ... 79
Latent Class Regression (LCR) ... 89
"1-Step" Approach for Latent Class Predictors ... 102
"Old" 3-Step Approach for Latent Class Predictors ... 105
New 3-Step Approach for Latent Class Predictors ... 107
Distal Outcomes ... 114
Modeling Extensions ... 123
Longitudinal Mixture Models ... 132
Parting Words ... 143
Questions? ... 151
Select References & Resources ... 153
© Masyn (2013)
LCA Workshop
STATISTICAL MODELING IN THE MPLUS FRAMEWORK

MODEL DIAGRAMS
Boxes for observed measures
Circles for latent variables
Arrow for "causal"/directional relationship
Arrow for "noncausal" relationship
Arrow, not originating from box or circle, for residual or "unique" variance
MPLUS MODELING FRAMEWORK
K = continuous latent variable; c = categorical latent variable
y = continuous observed variable; u = discrete observed variable
T = continuous event time; x = observed continuous/categorical covariate
[Framework path diagram relating y, K, x, T, c, and u at the within and between levels]
© Muthén & Muthén (2013)
STATISTICAL CONCEPTS CAPTURED BY LATENT VARIABLES
From: Muthén & Muthén, 1998-2013

Continuous LVs:
• Measurement errors
• Factors
• Random effects
• Frailties, liabilities
• Variance components
• Missing data

Categorical LVs:
• Latent classes
• Clusters
• Finite mixtures
• Missing data

STATISTICAL MODELS USING LATENT VARIABLES

Continuous LVs:
• Factor analysis; IRT
• Structural equation models
• Growth models
• Multilevel models
• Missing data models

Categorical LVs:
• Latent class analysis
• Finite mixture models
• Discrete-time survival analysis
• Missing data models

Mplus integrates the statistical concepts captured by latent variables into a general modeling framework that includes not only all of the models listed above but also combinations and extensions of these models.
MPLUS V7.1* (WWW.STATMODEL.COM)
*Released in May 2013

Several programs in one, fully integrated in the general latent variable framework:
– Exploratory factor analysis
– Structural equation modeling
– Item response theory analysis
– Latent class analysis
– Latent transition analysis
– Mediation analysis
– Survival analysis
– Growth modeling
– Multilevel analysis
– Complex survey data analysis
– Monte Carlo simulation
– Bayesian analysis
– Multiple imputation

MPLUS BACKGROUND
• Inefficient dissemination of statistical methods:
– Many good methods contributions from biostatistics, psychometrics, etc. are underutilized in practice
• Fragmented presentation of methods:
– Technical descriptions in many different journals
– Many different pieces of limited software
• Mplus: Integration of methods in one framework
– Easy to use: Simple, non-technical language, graphics
– Powerful: General modeling capabilities
• Mplus versions: V1: November 1998; V2: February 2001; V3: March 2004; V4: February 2006; V5: November 2007; V5.2: November 2008; V6: April 2010; V6.12: November 2011; V7: September 2012; V7.1: May 2013
• Mplus team: Linda & Bengt Muthén, Thuy Nguyen, Tihomir Asparouhov, Michelle Conn, Jean Maninger
© Muthén & Muthén (2013)
THE FINITE MIXTURE MODEL FAMILY

FAMILY MEMBERS
The finite mixture model family includes:
• Cross-sectional:
– Latent class analysis (LCA)
– Latent profile analysis (LPA)
– Latent class cluster analysis (LCCA)
– Regression mixture models
– Factor mixture models (FMM)
– Etc.
• Longitudinal:
– Growth mixture models (GMM)
– Latent transition models (LTA)
– Survival mixture analysis (SMA)
– Etc.
LATENT CLASS ANALYSIS – CATEGORICAL LV AND CATEGORICAL MVS
LATENT PROFILE ANALYSIS/LATENT CLASS CLUSTER ANALYSIS – CATEGORICAL LV AND CONTINUOUS MVS
[Path diagrams: a categorical latent variable c measured by categorical indicators u (LCA) or by continuous indicators y (LPA/LCCA)]
FINITE MIXTURE MODEL LIKELIHOOD
• The basic finite mixture model has the following likelihood function:

  f(y_i) = Σ_{k=1}^{K} π_k · f_k(y_i | θ_k)

• K is the number of latent classes.
• π_k is the proportion of the total population belonging to Class k.
• f_k is the class-specific density function for the latent class indicator (manifest) variables, with class-specific parameters θ_k.
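The mixture density above can be sketched numerically for binary indicators under the usual LCA local-independence assumption. This is an illustrative helper, not code from the workshop; the function name and argument layout are assumptions:

```python
import numpy as np

def lca_likelihood(y, pi, theta):
    """Mixture density f(y_i) = sum_k pi_k * f_k(y_i | theta_k) for binary
    indicators, where f_k is a product of independent Bernoulli terms.

    y:     (n, J) array of 0/1 item responses
    pi:    (K,) class proportions, summing to 1
    theta: (K, J) class-specific item endorsement probabilities
    """
    # (n, K): class-conditional probability of each observed response pattern
    f_k = np.prod(theta[None] ** y[:, None] * (1 - theta[None]) ** (1 - y[:, None]),
                  axis=2)
    return f_k @ pi  # (n,) mixture densities
```

For a respondent endorsing both of two items, with π = (.5, .5) and item probabilities (.9, .9) vs. (.1, .1), the mixture density is .5(.81) + .5(.01) = .41.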
LATENT CLASS ANALYSIS (LCA)
TRADITIONAL LCA
[Path diagram: categorical latent variable c with categorical indicators u1–u4]
• Categorical indicators
• Categorical latent variable
• Cross-sectional data
• Some consider LCA the categorical analogue to factor analysis.
• Sometimes referred to as person-centered analysis to stand in contrast to variable-centered analysis such as CFA.
• Different from IRT, which models categorical variables as indicators of an underlying continuous trait (ability).
FOR EXAMPLE
• Binary test items as multiple indicators for an underlying 2-level categorical latent variable representing profiles of Mastery and Non-mastery.
• DSM-IV symptom checklist (diagnostic criteria) for depression.

EXAMPLE DATA
Student   Item 1   Item 2   Item 3   Item 4
1         1        1        1        1
2         0        0        0        0
3         1        0        1        0
4         1        0        0        0
5         0        0        1        0
6         1        1        1        0
7         1        1        1        0
NAÏVE APPROACH
• Create a cut-point based on the sum score, e.g., clinical depression if satisfying 5 or more of the 9 symptoms; mastery defined as 80% of items correctly answered.
• Problems
– Treats all items the same, e.g., doesn't take into account that some items may be more "difficult" than others
– Doesn't take into account measurement error, e.g., some with Mastery status may still make a careless error.
LCA APPROACH
• Characterizes groups of individuals based on response patterns for multiple indicators.
• Class membership "explains" observed covariation between indicators.
• Allows for measurement error in that class-specific item probabilities may be between zero and one.
• Allows comparisons of indicator sensitivity and specificity to identify items that best differentiate the classes.
• Estimates the prevalence of each class in the population.
• Enables stochastic classification of individuals into classes.
MEASUREMENT CHARACTERISTICS
• Class homogeneity – Individuals within a given class are similar to each other with respect to item responses, e.g., for binary items, class-specific response probabilities above .70 or below .30 indicate high homogeneity.
• Class separation – Individuals across two classes are dissimilar with respect to item responses, e.g., for binary items, odds ratios (ORs) of item endorsements between two classes >5 or <.2 indicate high separation.
ITEM PROBABILITY PLOTS

Item   Class 1 (70%)   Class 2 (20%)   Class 3 (10%)   OR 1 vs. 2   OR 1 vs. 3   OR 2 vs. 3
u1     .90*            .10             .90             81.00**      1.00         0.01
u2     .80             .20             .90             16.00        0.44         0.03
u3     .90             .40             .50             13.50        9.00         0.67
u4     .80             .10             .20             36.00        16.00        0.44
u5     .60             .50             .40             1.50         2.25         1.50

* Item probabilities >.7 or <.3 are bolded to indicate a high degree of class homogeneity.
** Odds ratios >5 or <.2 are bolded to indicate a high degree of class separation.
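The odds ratios in the table follow directly from the class-specific item probabilities. A small sketch (the `endorsement_or` helper is illustrative, not workshop code) reproduces the Class 1 vs. 2 column:

```python
import numpy as np

# Class-specific item endorsement probabilities from the table above:
# rows = items u1..u5, columns = Class 1, Class 2, Class 3.
probs = np.array([
    [.90, .10, .90],
    [.80, .20, .90],
    [.90, .40, .50],
    [.80, .10, .20],
    [.60, .50, .40],
])

def endorsement_or(p_a, p_b):
    """Odds ratio of item endorsement, Class a vs. Class b."""
    return (p_a / (1 - p_a)) / (p_b / (1 - p_b))

or_1v2 = endorsement_or(probs[:, 0], probs[:, 1])
# e.g., u1: (.9/.1) / (.1/.9) = 81, flagged as high separation (>5)
```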
LCA EXAMPLE: LSAY

EXAMPLE: LONGITUDINAL STUDY OF AMERICAN YOUTH (LSAY)
• A national longitudinal study funded by the National Science Foundation (NSF).
• Designed to investigate the development of students' learning and achievement, particularly related to math, science, and technology, and to examine the relationship of those student outcomes across middle and high school to post-secondary education and early career choices.
• More information can be found at http://lsay.org/index.html
LCA EXAMPLE: LSAY
• Research Aim:
– Characterize population heterogeneity in math attitudes (manifest in 9 survey items) using latent classes of math dispositions.
• Why not state research questions like:
– Are there different profiles of math dispositions based on the math attitude items?
– How many profiles are there?
– What are the profiles?

Survey Prompt:
"Now we would like you to tell us how you feel about math and science. Please indicate how you feel about each of the following statements."

Item                                                                   f      rf
1) I enjoy math.                                                       1784   .67
2) I am good at math.                                                  1850   .69
3) I usually understand what we are doing in math.                     2020   .76
4) Doing math often makes me nervous or upset.                         1546   .59
5) I often get scared when I open my math book and see a page of
   problems.                                                           1821   .69
6) Math is useful in everyday problems.                                1835   .70
7) Math helps a person think logically.                                1686   .64
8) It is important to know math to get a good job.                     1947   .74
9) I will use math in many ways as an adult.                           1858   .70

Usevariables = ca28ar ca28br ca28cr ca28er ca28gr
    ca28hr ca28ir ca28kr ca28lr;
CATEGORICAL = ca28ar ca28br ca28cr ca28er ca28gr
    ca28hr ca28ir ca28kr ca28lr;
missing = all(9999);
classes = c(5);

Analysis:
type = mixture;
starts = 500 100;
processors = 4;

Model:
%overall%
[ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
  ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];
%c#1%
[ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
  ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];
%c#2%
[ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
  ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];
.
.
.
%c#5%
[ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
  ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];

Total sample (nT = 2675)

Note: With categorical indicators, the following model statement would produce the same result!
Model:
LCA EXAMPLE: LSAY

Latent Class 1
Thresholds    Estimate   S.E.    Est./S.E.   Two-Tailed P-Value
CA28AR$1      -2.122     0.185   -11.442     0.000
CA28BR$1      -2.539     0.242   -10.514     0.000
CA28CR$1      -3.081     0.291   -10.577     0.000
CA28ER$1      -1.791     0.371   -4.825      0.000
CA28GR$1      -15.000    0.000   999.000     999.000
CA28HR$1      -2.498     0.262   -9.533      0.000
CA28IR$1      -1.839     0.188   -9.781      0.000
CA28KR$1      -2.876     0.324   -8.866      0.000
CA28LR$1      -2.723     0.310   -8.775      0.000

RESULTS IN PROBABILITY SCALE
Latent Class 1
CA28AR        Estimate   S.E.    Est./S.E.   Two-Tailed P-Value
Category 1    0.107      0.018   6.039       0.000
Category 2    0.893      0.018   50.392      0.000

e^2.122 / (1 + e^2.122) = 0.893

Class labels: 1 - Pro-math without anxiety; 2 - Pro-math with anxiety; 3 - Math Lover; 4 - I don't like math but I know it's good for me; 5 - Anti-Math with anxiety
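The conversion from an Mplus logit threshold to the probability-scale output can be sketched as follows (illustrative helper, not workshop code):

```python
import math

def mplus_threshold_to_probs(tau):
    """For a binary item, convert an Mplus logit threshold tau into
    (P(category 1), P(category 2)), where P(category 1) = 1/(1 + exp(-tau))."""
    p1 = 1.0 / (1.0 + math.exp(-tau))
    return p1, 1.0 - p1

p_cat1, p_cat2 = mplus_threshold_to_probs(-2.122)
# p_cat1 ≈ 0.107 and p_cat2 ≈ 0.893, matching the
# RESULTS IN PROBABILITY SCALE output for CA28AR in Class 1
```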
LCA EXAMPLE: LSAY
FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL

Latent Class   Count       Proportion
1              525.13598   0.39248
2              173.96909   0.13002
3              244.13155   0.18246
4              254.57820   0.19027
5              140.18517   0.10477

LCA MODEL BUILDING
MIXTURE MODEL BUILDING STEPS
1. Data screening and descriptives.
2. Class enumeration process.
3. Select final unconditional model (this is
your measurement model).
4. Add potential predictors (and check for
measurement invariance).
5. Add potential distal outcomes.
DIRECT AND INDIRECT APPLICATIONS

DIRECT VS. INDIRECT APPLICATION
Is the "Truth" a heterogeneous population composed of a mixture of two normally-distributed homogeneous subpopulations?
Is the "Truth" a single, non-normally-distributed homogeneous population?
[Diagram: a continuous outcome y modeled with and without a latent class variable c]
DIRECT APPLICATIONS OF MIXTURE
MODELING
• Mixture models are used with the a priori
assumption that the overall population is
heterogeneous, and made up of a finite
number of (latent and substantively
meaningful) homogeneous groups or
subpopulations, usually specified to have
tractable distributions of indicators within
groups, such as a multivariate normal
distribution.
INDIRECT APPLICATIONS OF MIXTURE MODELING
• It is assumed that the overall population is homogeneous, and finite mixtures are simply used as a more tractable, semi-parametric technique for modeling a population of outcomes for which it may not be possible (practically- or analytically-speaking) to specify a parametric model.
• The focus for indirect applications is then not on the resultant mixture components nor their interpretation, but rather on the overall population distribution approximated by the mixing.
LCA Workshop
MODEL ESTIMATION
ML ESTIMATION FOR LCA
• c is treated as missing data under MAR.
• MAR assumes that the probabilities of
values being missing are independent of
the missing values conditional on those
values that are observed (both u and x).
(Little and Rubin, 2002)
• Basic principle of ML: Choose estimates
of the model parameters whose values, if
true, would maximize the probability of
observing what had, in fact, been
observed.
• This requires an expression that describes
the distribution of the data as a function of
the unknown parameters, i.e., the
likelihood function.
• Under MAR, the ML estimates for the
complete data may be obtained by
maximizing the likelihood function
summed over all possible values of the
missing data, i.e., integrate out the
missingness.
• Often, this integrated likelihood cannot
be maximized analytically and requires
an iterative estimation procedure, e.g.,
EM.
THE EM ALGORITHM
• How does it work?
– Start with a random split of people into classes.
– Reclassify based on an improvement criterion.
– Reclassify until the "best" classification of people is found.
• The EM algorithm is a missing data technique. In this application, the latent class variable is the missing data, and it happens to be missing for the entire data set.

ML ESTIMATION VIA EM ALGORITHM
E(xpectation) step: c is treated as missing data. Missing values ci are replaced by the conditional means of ci given the yi's. These means are the posterior probabilities for each class.
M(aximization) step: New estimates of the parameters are obtained from the maximization based on the estimated complete data. Pr(yj|c=k) and Pr(c=k) parameters are estimated by regression and summation over the posterior probabilities.
• Missing data is allowed on the y's as well, assuming MAR.
• Standard errors are obtained using some approximation to the Fisher information matrix. (In Mplus, "ML" is the default for no missing data on the y's; "MLR" for missing data on indicators.)
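The E- and M-steps above can be sketched for an unconditional LCA with binary items. This is an illustrative toy implementation (the function name, random-start scheme, and fixed iteration count are assumptions), not the estimator Mplus actually uses:

```python
import numpy as np

def lca_em(y, K, n_iter=200, seed=0):
    """Toy EM for an unconditional LCA with binary items.

    y: (n, J) 0/1 array; K: number of classes.
    Returns (pi, theta, post): class proportions, class-specific item
    probabilities, and posterior class probabilities.
    """
    rng = np.random.default_rng(seed)
    n, J = y.shape
    pi = np.full(K, 1.0 / K)
    theta = rng.uniform(0.25, 0.75, size=(K, J))  # one random start
    for _ in range(n_iter):
        # E-step: posterior probability of each class given the responses
        like = np.prod(theta[None] ** y[:, None] * (1 - theta[None]) ** (1 - y[:, None]),
                       axis=2)
        num = like * pi
        post = num / num.sum(axis=1, keepdims=True)
        # M-step: update class proportions and item probabilities
        pi = post.mean(axis=0)
        theta = (post.T @ y) / post.sum(axis=0)[:, None]
    return pi, theta, post
```

With two clearly separated response patterns (all-ones vs. all-zeros), the sketch recovers two equal-sized classes with item probabilities near 1 and near 0.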
THE CHALLENGES OF ML VIA EM
• MLE for mixture models can present statistical and numeric challenges that must be addressed during the application of mixture modeling:
– The estimation may fail to converge even if the model is theoretically identified.
– If the estimation algorithm does converge, since the log likelihood surface for mixtures is often multimodal, there is no way to prove the solution is a global rather than local maximum.

[Figure: two example log likelihood surfaces] How would you distinguish between these two cases?
MOST IMPORTANTLY:
• Use multiple random sets of starting values with the estimation algorithm. It is recommended that a minimum of 50 to 100 sets of extensively, randomly varied starting values are used (Hipp & Bauer, 2006), but more may be necessary to observe satisfactory replication of the best maximum log likelihood value.
• Recommendations for a more thorough investigation of multiple solutions when there are more than two classes:

ANALYSIS: STARTS = 50 5;

or, with many classes,

ANALYSIS: STARTS = 500 10;

Note: LL replication is neither necessary nor sufficient for a given solution to be the global maximum.
And keep track of the following information:
• The number and proportion of sets of random starting values that converge to a proper solution (as failure to consistently converge can indicate weak identification);
• The number and proportion of replicated maximum likelihood values for each local and the apparent global solution (as a high frequency of replication of the apparent global solution across the sets of random starting values increases confidence that the "best" solution found is the true maximum likelihood solution);
• The condition number, computed as the ratio of the smallest to largest eigenvalue of the information matrix estimate based on the maximum likelihood solution. A low condition number, less than 10^-6, may indicate singularity (or near singularity) of the information matrix and, hence, model non-identification (or empirical underidentification);
• The smallest estimated class proportion and estimated class size among all the latent classes estimated in the model (as a class proportion near zero can be a sign of class collapsing and class over-extraction).
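The condition-number check can be sketched as follows (illustrative helper; assumes a symmetric information-matrix estimate is available):

```python
import numpy as np

def condition_number(info):
    """Ratio of the smallest to largest eigenvalue of a symmetric
    information matrix estimate, as described in the list above."""
    eig = np.linalg.eigvalsh(info)
    return eig.min() / eig.max()

# A nearly singular information matrix yields a condition number near zero,
# flagging possible (empirical) non-identification:
near_singular = np.array([[1.0, 0.999999],
                          [0.999999, 1.0]])
```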
• This information, when examined collectively, will assist in tagging models that are non-identified or not well-identified and whose maximum likelihood solutions, if obtained, are not likely to be stable or trustworthy. These not-well-identified models should be discarded from further consideration or mindfully modified in such a way that the empirical issues surrounding the estimation for that particular model are resolved without compromising the theoretical integrity and substantive foundations of the analytic model.
CLASS ENUMERATION

NOW THE HARD PART
• In the majority of applications of mixture modeling, the number of classes is not known.
• Even in direct applications, when one assumes a priori that the population is heterogeneous, you rarely have specific hypotheses regarding the exact number or nature of the subpopulations.
• Thus, in either case (direct or indirect), you must begin the model building with an exploratory class enumeration step.
• Deciding on the number of classes is often the most arduous phase of the mixture modeling process.
• It is labor intensive because it requires consideration (and, therefore, estimation) of a set of models with varying numbers of classes.
• It is complicated in that the selection of a "final" model from the set of models under consideration requires the examination of a host of fit indices along with substantive scrutiny and practical reflection, as there is no single method for comparing models with differing numbers of latent classes that is widely accepted as best.
EVALUATING THE MODEL
The statistical tools are divided into three categories:
1. evaluations of absolute fit;
2. evaluations of relative fit;
3. evaluations of classification.

Model Usefulness
• Substantively meaningful and substantively distinct classes (face + content validity)
• Cross-validation in a second sample (or split sample)
• Parsimony principle
• Criterion-related validity
CLASS ENUMERATION PROCESS FOR LCA
• Fit models for K = 1, 2, 3, ..., increasing K until the models become not well-identified.
• Collect fit information on each model using a combination of statistical tools.
• Decide on 1-2 "plausible" models.
• Apply a broader set of statistical tools to the set of candidate models and evaluate model usefulness.

FIT INDICES
ABSOLUTE FIT
• There is an overall likelihood ratio model chi-square goodness-of-fit test for a mixture measurement model with only categorical indicators (using a similar formula to the goodness-of-fit chi-square for contingency table analyses and log linear models).
• "Inspection" = Look at standardized residuals evaluating the difference between the observed response pattern frequencies and the model-estimated frequencies.
RELATIVE FIT
1. Inferential: The most common ML-based inferential comparison is the likelihood ratio test (LRT) for nested models (e.g., K=3 vs. K=4 class model).

Hypothesis testing using the likelihood ratio:
H0: k classes
H1: k+1 classes
LRTS = -2 [ log L(H0) - log L(H1) ]

When testing a k-class mixture model versus a (k+g)-class model, the LRTS does not have an asymptotic chi-squared distribution. Why? Regularity conditions are not met: a mixing proportion of zero is on the boundary of the parameter space, and the parameters under the null model are not identifiable.

SOLUTIONS?
• Analytically-derived distribution of the LRTS -> adjusted VLMR-LRT (Tech11 in Mplus)
– Vuong (1989) derived an LRT for model selection based on the Kullback & Leibler (1951) information criterion. Lo, Mendell, and Rubin (2001) extended Vuong's theorem to cover the LRT for a k-class normal mixture versus a (k+g)-class normal mixture.
• Empirically-derived distribution of the LRTS -> (parametric) Bootstrap LRT (Tech14 in Mplus)

NOTE: For both Tech11 and Tech14, Mplus computes the LRT for your K-class model compared to a model with one less class (i.e., the K-1 class model as the Null). Make sure the H0 loglikelihood value given in Tech11/Tech14 matches the best LL solution you obtained in your own K-1 class run.
2. Information-heuristic criteria: These indices weigh the fit of the model (as captured by the maximum log likelihood value) in consideration of the model complexity (recognizing that although one can always improve the fit of a model by adding parameters, there is a cost of that improvement in fit to model parsimony).
• These information criteria can be expressed in the following form:

  IC = -2 log L + penalty(d, n)

• The traditional penalty is a function of n and d, where n = sample size and d = number of parameters.
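The commonly used ICs (BIC, CAIC, AWE) can be computed from the maximum log likelihood, the parameter count d, and the sample size n. The penalty forms below follow Masyn (2013); treat this as one common parameterization rather than the only one in the literature:

```python
import math

def information_criteria(logL, d, n):
    """BIC, CAIC, and AWE for mixture class enumeration.
    logL: maximum log likelihood; d: free parameters; n: sample size.
    Penalty forms as in Masyn (2013)."""
    bic = -2 * logL + d * math.log(n)
    caic = -2 * logL + d * (math.log(n) + 1)
    awe = -2 * logL + 2 * d * (math.log(n) + 1.5)
    return bic, caic, awe
```

Because the AWE penalty is the largest, AWE tends to favor fewer classes than BIC for the same set of candidate models.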
How much lower does an IC value have to be to mean the model is really better?
• Bayes Factor: Which model, A or B, is more likely to be the true model if one of the two is the true model?
INFORMATION CRITERIA
• Bayesian Information Criterion (BIC)
• Consistent Akaike's Information Criterion (CAIC)
• Approximate Weight of Evidence Criterion (AWE)
• For these ICs, lower values indicate a better model, relatively speaking. Sometimes a minimum value is not reached and scree/"elbow" plots are utilized.
• The approximate correct model probability (cmP) for a Model A is an approximation of the actual probability of Model A being the correct model, relative to a set of J models under consideration.
CLASSIFICATION QUALITY
CLASSIFICATION QUALITY/CLASS SEPARATION
• A good mixture model in a direct application* should yield empirically highly-differentiated, well-separated latent classes whose members have a high degree of homogeneity in their responses on the class indicators.

*A well-fitting mixture model can have very poor class separation -> classification quality is not a measure of model fit!

• Most of the classification diagnostics are based on estimated posterior class probabilities.
• Posterior class probabilities are the model-estimated values for each individual's probabilities of being in each of the latent classes, based on the maximum likelihood parameter estimates and the individual's observed responses on the indicator variables (similar to estimated factor scores).
RELATIVE ENTROPY
• An index that summarizes the overall precision of classification for the whole sample, across all the latent classes.
• When posterior classification is no better than random guessing, E = 0; when there is perfect posterior classification for all individuals in the sample, E = 1.
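Relative entropy can be computed directly from the matrix of posterior class probabilities. A sketch, using the standard formula E = 1 - Σᵢ Σₖ (-pᵢₖ ln pᵢₖ) / (n ln K) (the helper itself is illustrative, not workshop code):

```python
import numpy as np

def relative_entropy(post):
    """Relative entropy E for an (n, K) matrix of posterior class probabilities:
    E = 1 - sum_i sum_k (-p_ik * ln p_ik) / (n * ln K)."""
    n, K = post.shape
    p = np.clip(post, 1e-12, 1.0)  # guard against log(0)
    return 1 - (-(p * np.log(p)).sum()) / (n * np.log(K))
```

Perfect 0/1 posteriors give E near 1; uniform posteriors (random guessing) give E of 0.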
• Since even when E is close to 1.00 there can be a high degree of latent class assignment error for particular individuals, and since posterior classification uncertainty may increase simply by chance for models with more latent classes, E was never intended for, nor should it be used for, model selection during the class enumeration process. (REMEMBER: A mixture model with low entropy could still fit the data well.)
• However, values near zero may indicate that the latent classes are not sufficiently well-separated for the classes that have been estimated. Thus, E may be used to identify problematic over-extraction of latent classes and may also be used to judge the utility of the latent class analysis, directly applied to a particular set of indicators, to produce empirically highly-differentiated groups in the sample.
AVEPP
• Average posterior class probability (AvePP) enables evaluation of the classification uncertainty for each of the latent classes separately.
• The average posterior class probability for each class, k, is taken among all individuals whose maximum posterior class probability is for Class k (i.e., individuals modally assigned to Class k).
• Nagin suggests AvePP values >.7 indicate adequate separation and classification precision.
OCC
• The denominator of the odds of correct classification (OCC) is the odds of correct classification based on random assignment using the model-estimated marginal class proportions.
• The numerator is the odds of correct classification based on the maximum posterior class probability assignment rule (i.e., modal class assignment).
• When the modal class assignment for Class k is no better than chance, then OCC(k) = 1.0.
• As AvePP(k) gets close to one, OCC(k) gets large.
• Nagin suggests OCC(k) > 5 indicates adequate separation and classification precision.
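AvePP and OCC can both be computed from the same posterior probability matrix. An illustrative sketch (using posterior column means as the model-estimated class proportions is an assumption of this toy example):

```python
import numpy as np

def avepp_occ(post):
    """AvePP(k) and OCC(k) from an (n, K) posterior probability matrix.

    AvePP(k): mean posterior probability for Class k among individuals
    modally assigned to Class k.
    OCC(k): odds of correct classification under modal assignment relative
    to the odds under random assignment with class proportions pi_k.
    """
    n, K = post.shape
    modal = post.argmax(axis=1)
    pi = post.mean(axis=0)  # model-estimated marginal class proportions
    avepp = np.array([post[modal == k, k].mean() for k in range(K)])
    occ = (avepp / (1 - avepp)) / (pi / (1 - pi))
    return avepp, occ
```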
MCAP
• Modal class assignment proportion (mcaP) is the proportion of individuals in the sample modally assigned to Class k.
• If individuals were assigned to Class k with perfect certainty, then mcaP(k) would be equal to the model-estimated Pr(c=k). Larger discrepancies are indicative of larger latent class assignment errors.
• To gauge the discrepancy, each mcaP can be compared to the 95% confidence interval for the corresponding model-estimated Pr(c=k).
SUMMING IT UP
[Steps 1) through 6) of the class enumeration summary appeared graphically on the original slides.]
AND, FINALLY
• 7) On the basis of all the comparisons made in Steps 5 and 6, select the final model in the class enumeration process.
– Note: You may end up carrying forward two candidate models into the conditional modeling stage.
• If you had a large enough sample to do a split-half cross-validation, now is when you would look at the validation sample.
LATENT CLASS REGRESSION (LCR)

LATENT CLASS VALIDATION
• Link the conceptual/theoretical aspects of the latent class variable with observable variables.
• "[To] make clear what something is" means to set forth the laws in which it occurs.
• Cronbach & Meehl (1955) termed this process the nomological (or lawful) network.
LINKAGES: CRITERION-RELATED VALIDITY
• In criterion-related validity (concurrent and predictive), we check the performance of our latent classes against some criterion based on our theory of the construct represented by the latent class variable.
– Concurrent: Latent class membership predicted by or covarying with past or concurrent events (latent class regression).
– Predictive: Latent class membership predicting future concrete events (latent class with distal outcomes).
[Path diagram: RiskFactor -> c (indirect effect) and RiskFactor -> u (direct effect), with c measured by u1–u5]
LATENT CLASS REGRESSION: INCLUDING COVARIATES INTO LCA
• Like a MIMIC model in regular CFA/SEM
• Categorical latent variable
• Continuous or categorical covariates with direct effects on y's or indirect effects on y's through c.
– Indirect effects can also be thought of as predictors of class membership.
– Direct effects can also be thought of as differential item functioning.
"C ON X" = MULTINOMIAL REGRESSION
• Multinomial logistic regression is essentially simultaneous pairs of logistic regressions of the odds of each outcome category versus a reference/baseline category.
• Mplus uses the last category/class as the baseline.
• So for K classes, we have K-1 logit equations.
• The inclusion of covariates into mixture
models
– Allow us to explore relationships of mixture
classes and auxiliary information.
– Understand how different classes relate to risk
and protective factors
– Explore differences in demographics across
the classes
© Masyn (2013)
94
LCA Workshop
• We model the following: Given membership in either Class k or Class K, what is the log odds that class membership is k (instead of K), given x? That is,

  log [ Pr(c = k | x) / Pr(c = K | x) ] = β0k + β1k·x,  for k = 1, ..., K-1.
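The K-1 logit equations map back to class probabilities through the usual multinomial softmax, with the reference class's logit fixed at zero. A sketch (the helper is illustrative, not workshop code):

```python
import math

def class_probs(logits_vs_ref):
    """Class probabilities from K-1 multinomial logits, with the last
    class as the reference (as in Mplus): logit_K = 0, and
    Pr(c = k | x) = exp(logit_k) / sum_j exp(logit_j)."""
    expd = [math.exp(l) for l in logits_vs_ref] + [1.0]  # reference class
    total = sum(expd)
    return [e / total for e in expd]
```

For example, a single logit of 0 yields two equal class probabilities of .5.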
LSAY EXAMPLE

MODEL:
%Overall%
c on male;

[Path diagram: Male -> c, with c measured by the math attitude items ("I enjoy math," "I am good at math," ..., "I will use math later")]
EXAMPLE: LSAY WITH COVARIATE

Categorical Latent Variables* (*Class 5 is the reference group)

                Estimate   S.E.    Est./S.E.   P-Value
C#1 ON FEMALE   0.320      0.217   1.476       0.140
C#2 ON FEMALE   -0.343     0.269   -1.274      0.203
C#3 ON FEMALE   0.485      0.266   1.823       0.068
C#4 ON FEMALE   0.865      0.258   3.356       0.001

Class labels: 1 - Pro-math without anxiety; 2 - Pro-math with anxiety; 3 - Math Lover; 4 - I don't like math but I know it's good for me; 5 - Anti-Math with anxiety

There is a statistically significant overall association between gender and math disposition:
- Null Model (no effect of female) vs. Alt. Model (c on female): LRT with df = 4, p < .001
- Interpretation of coefficients:
  - Given membership in either Class 1 or 5, girls are about as likely to be in Class 1 as boys (p = .14).
  - Given membership in either Class 2 or 5, girls are about as likely to be in Class 2 as boys (p = .20).
  - Etc.

ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION
Parameterization using Reference Class 1
Switching the reference group to Class 1:

                Estimate   S.E.    Est./S.E.   P-Value
C#2 ON FEMALE   -0.662     0.205   -3.223      0.001
C#3 ON FEMALE   0.165      0.207   0.798       0.425
C#4 ON FEMALE   0.545      0.187   2.916       0.004
C#5 ON FEMALE   -0.320     0.217   -1.476      0.140

Class labels: 1 - Pro-math without anxiety; 2 - Pro-math with anxiety; 3 - Math Lover; 4 - I don't like math but I know it's good for me; 5 - Anti-Math with anxiety
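The two parameterizations are deterministically related: each Class k vs. Class 1 logit is the Class k vs. Class 5 logit minus the Class 1 vs. Class 5 logit. A quick check against the tables (the tiny difference for C#2, -0.663 here vs. -0.662 in the output, reflects rounding in the re-estimated run):

```python
# Logits vs. reference Class 5, from the first table; the reference
# class itself has logit 0 by definition.
beta_vs_5 = {1: 0.320, 2: -0.343, 3: 0.485, 4: 0.865, 5: 0.0}

# Switching the reference to Class 1 subtracts the Class 1 vs. 5 logit:
beta_vs_1 = {k: round(b - beta_vs_5[1], 3)
             for k, b in beta_vs_5.items() if k != 1}
# → {2: -0.663, 3: 0.165, 4: 0.545, 5: -0.32}
```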
"1-STEP" APPROACH FOR LATENT CLASS PREDICTORS
LCR MODELING PROCESS
1. Fit models without covariates first.
2. Decide on the number of classes.
3. Integrate covariate (indirect) effects in a systematic way. (You can preview a covariate, x, using the auxiliary = x (r) or (r3step) option in the Variable command.) Include indirect effects (class predictors) first, with direct effects fixed @0, and then explore the evidence for direct effects using modindices.
4. Add direct effects as suggested by modindices, but do not vary them across classes.
5. Trim until only significant direct effects remain.
NOTE: This is just like MIMIC modeling in SEM.
Also NOTE: There are other approaches currently in development for detection of direct effects and DIF more generally.

WHY NOT ADD CLASS-VARYING DIRECT EFFECTS?
[Path diagram: Covariate -> c (indirect effect) and Covariate -> u4 (direct effect), with c measured by u1–u5]

Mplus:
Indirect effect:
%overall%
C on X;

Direct effect:
%overall%
U4 on X;

Class-varying direct effect:
%c#1%
U4 on X;
%c#2%
U4 on X;
“OLD” 3-STEP APPROACH FOR
LATENT CLASS PREDICTORS
• Estimate the LCA model
• Determine each subject’s most likely class membership (“hard” classify people using modal class assignment)
• Save the class assignment and use in separate
analysis as observed multinomial outcome to
relate predictors to class membership.
• Problematic: Unless the classification is very good
(high entropy), this gives biased estimates and
biased standard errors for the relationships of
class membership with other variables.
NEW 3-STEP APPROACH FOR
LATENT CLASS PREDICTORS

BASIC IDEA
• The real problem with classify-analyze (the “old” 3-step approach) is that it ignores the uncertainty/imprecision in classification.
• Based on the results of the unconditional
LCA, we can compile information about
classification quality that we can then use in a
subsequent model (akin to using a previously
estimated scale reliability to specify the
measurement error variance in an SEM
model).
– The information is summarized in: Logits for the
Classification Probabilities for Most Likely Class
Membership (Row) by Latent Class (Column)
• Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column) estimates Pr(C = j | CMOD = k) for j = 1, …, K; k = 1, …, K
• Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column) estimates Pr(CMOD = k | C = j) for j = 1, …, K; k = 1, …, K
1. Estimate the LCA model
2. Create a nominal most likely class variable,
CMOD
3. Use a mixture model for CMOD, C, and X, where CMOD is the nominal indicator of C with measurement error rates fixed at the misclassification rates from the step 1 LCA.
The information is summarized in: Logits of Average
Latent Class Probabilities for Most Likely Class
Membership (Row) by Latent Class (Column)
• How do you get from one quantity to the other? Bayes’ Theorem:
  Pr(CMOD = k | C = j) = Pr(C = j | CMOD = k) · Pr(CMOD = k) / Pr(C = j)
To do this in Mplus for X, use auxiliary = X
(r3step) option in Variable command.
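The conversion between the two tables can be sketched numerically; all numbers in this 2-class illustration are invented, not from the workshop output:

```python
# Hypothetical 2-class illustration (all numbers invented):
p_cmod = [0.6, 0.4]        # Pr(CMOD = k): modal-class proportions
avg = [[0.9, 0.1],         # avg[k][j] = Pr(C = j | CMOD = k), i.e. the
       [0.2, 0.8]]         # "Average Latent Class Probabilities" table

K = 2
# marginal latent class proportions Pr(C = j), by total probability
p_c = [sum(avg[k][j] * p_cmod[k] for k in range(K)) for j in range(K)]

# Bayes' Theorem gives the "Classification Probabilities" table:
# cls[k][j] = Pr(CMOD = k | C = j)
cls = [[avg[k][j] * p_cmod[k] / p_c[j] for j in range(K)] for k in range(K)]
```

Each column of the resulting table sums to 1, as it must for probabilities of CMOD given a fixed latent class.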
MANUAL R3STEP

[Path diagram: X → C → CMOD, with the C → CMOD measurement fixed according to the Step 1 misclassification rates and the X → C path estimated.]

STEP 1:
• Run the model with covariate(s) as auxiliary variables. Include:
SAVEDATA:
File is step1save.dat;
SAVE = CPROB;

STEP 2:
• Create a new input file using:
DATA:
File is step1save.dat;
VARIABLE:
UseVar = cmod x;
Nominal = cmod;
• Use the values from the rows of the Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column) table in the Step 1 output to fix the class-specific multinomial intercepts for cmod.

STEP 3:
• Specify the LCR of “c on x” and run.
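Computing those fixed logits from a row of the classification probabilities table can be sketched as follows (the row values are invented for illustration):

```python
import math

# Hypothetical row of the Classification Probabilities table for one latent
# class j (numbers invented): Pr(CMOD = k | C = j) for k = 1, 2, 3
row = [0.80, 0.15, 0.05]

# Mplus-style multinomial logits use the last class as the reference:
logits = [math.log(p / row[-1]) for p in row]
# the last logit is log(1) = 0 by construction
print([round(v, 3) for v in logits])
```

In the Step 3 input, values like these would be fixed with “@”, along the lines of [cmod#1@2.773]; within the appropriate class-specific section.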
DISTAL OUTCOMES

DISTAL OUTCOMES AND MIXTURE MODELS

[Path diagram: latent class C measured by indicators u1-u5, with a directional arrow from C to the distal outcome.]

AN EVER-GROWING # OF APPROACHES

• 1-step
• “Old” 3-step (classify-analyze)
• Modified 1-step
• Pseudo-class draws
  – Auxiliary = z (E);
• New 3-step
  – Auxiliary = z (DU3step) or (DE3step)
  – Manual 3-step
• New Bayes’ Theorem approach by Lanza et al. (2013)
  – Auxiliary = z (DCON) or (DCAT)
1-STEP

• Also referred to as the “distal-as-indicator” approach.
• The distal is treated as an additional latent class indicator if it is included as an endogenous variable.
  – This means your latent class variable is now specified as measured by all the items and the distals.
  – This may be what you intend but, if so, the distals should be included as indicators from the get-go.

NOT GOOD OR BAD, JUST MAYBE NOT WHAT YOU WANT

• What if you don’t want your distal outcomes to characterize/measure the latent class variable?
• All the other existing approaches are attempts to keep the distal outcome from influencing the class formation.

ALTERNATIVES TO DISTAL-AS-INDICATOR

• The old 3-step has the same problems as it does for latent class regression.
• The modified 1-step fixes all measurement parameters (e.g., item thresholds) at their estimated values from the unconditional model.
• New 3-step
– Done the same as for the LCR. Mplus will test for
differences in means assuming equal variances
(DE3step) or allowing unequal variances
(DU3step).
– Mplus implementation is limited but you can
always do a manual 3-step in order to analyze
multiple distal outcomes at the same time while
including covariates, potential moderators, etc.
– WARNING: The 3-step approach does not
guarantee that your distal will not influence the
latent class formation. Mplus checks for this
now—you have to check yourself if using manual
3-step.
AUXILIARY = Z (DCON/DCAT)
• Based on clever application of Bayes’ Theorem by
Lanza et al. (2013)
• Basic idea: Regress C on Z to obtain Pr(C|Z) and
Pr(C), estimate the density function of Z for Pr(Z)
and then apply Bayes’ Theorem to get Pr(Z|C).
• This technique does better w.r.t. not allowing Z to influence class formation, but is very limited w.r.t. the structural models that can be specified (e.g., one distal at a time, must assume the distal is independent of covariates, etc.)
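That Bayes'-Theorem step can be sketched numerically for a binary distal (all numbers below are invented for illustration, not from Lanza et al.):

```python
# Hypothetical illustration of the Lanza et al. (2013) logic with a binary
# distal Z and two latent classes (all numbers invented):
p_z = [0.7, 0.3]                    # Pr(Z = z), estimable from the data
p_c_given_z = [[0.6, 0.4],          # p_c_given_z[z][j] = Pr(C = j | Z = z),
               [0.3, 0.7]]          # from the regression of C on Z

# Pr(C = j) by total probability
p_c = [sum(p_c_given_z[z][j] * p_z[z] for z in range(2)) for j in range(2)]

# Bayes' Theorem: Pr(Z = z | C = j), the class-specific distal distribution
p_z_given_c = [[p_c_given_z[z][j] * p_z[z] / p_c[j] for z in range(2)]
               for j in range(2)]
```

For a continuous distal, the same logic applies but Pr(Z) must be replaced by an estimated density, which is where the DCON implementation comes in.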
MIXTURE MODEL BUILDING STEPS

1. Data screening (and unconditional, saturated non-mixture model, if applicable)
2. Class enumeration process (without covariates)
   a) Enumeration (within each Σk structure, if applicable)
   b) Comparison of the most plausible models from (a).
   NOTE: You may end up going through this step multiple times, as you may realize you need to modify or reconsider your set of class indicators.
3. Select the final unconditional model.
4. Add potential predictors; consider both prediction of class membership and also possibly measurement non-invariance/DIF.
5. Conditional mixture model with distal outcomes: add potential distal outcomes of class membership.
PREDICTORS AND DISTALS = LC MEDIATION!

MODELING EXTENSIONS
REGRESSION MIXTURE MODELS

HIGHER-ORDER LATENT CLASS

[Diagram: first-order latent class variables C1, C2, C3 measured by a higher-order latent class variable C.]

MULTIPLE GROUP LCA (USES KNOWNCLASS OPTION)

MULTILEVEL LCA

[Diagram: within-level latent class variable C1 and between-level (cluster) latent class variable CG.]

GENERAL FACTOR MIXTURE MODEL

SPECIFIC FACTOR MIXTURE MODEL

[Diagram: factors f1, f2, f3 combined with a latent class variable C.]
MANY OTHER EXTENSIONS

• Latent class causal models
  – Complier average causal effects
  – Latent class causal mediation models
  – Causal effects of latent class membership
• Mixture IRT
• Pattern mixture models for missing data
• Etc., etc., etc.
LONGITUDINAL MIXTURE MODELS
LONGITUDINAL LCA (LLCA) / RMLCA

• Use the latent class variable to characterize longitudinal response patterns.
• The EXACT same modeling process as for LCA/LPA!
• The EXACT same syntax in Mplus.
  – The only difference is that in your data, u1-uM or y1-yM are a single variable measured at multiple time points rather than multiple measures at a single time point.
GROWTH MIXTURE MODELS

GENERAL GROWTH MIXTURE MODEL (GGMM)

[Path diagram: repeated measures Y1-Y4 load on growth factors K0 and K1; the latent class variable c influences the growth factors, with covariate x and distal outcomes u and z.]
AGGRESSION DEVELOPMENT:
CONTROL AND INTERVENTION GROUPS
LATENT TRANSITION ANALYSIS (LTA)

• Begin with LCA/LPA models for each time point separately. Use the same exact modeling process as for a single cross-sectional LCA/LPA.
• Bring the latent class variables together in a single model. Watch for label switching and actual changes in measurement model parameters at each wave with all time points in the same model.
Transition probability matrix (Time 1 rows, Time 2 columns):

           C2=1       C2=2       C2=3
C1=1    Pr(1→1)    Pr(1→2)    Pr(1→3)
C1=2    Pr(2→1)    Pr(2→2)    Pr(2→3)
C1=3    Pr(3→1)    Pr(3→2)    Pr(3→3)
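The role of the transition matrix can be illustrated numerically; a sketch with invented proportions (not workshop results):

```python
# Hypothetical 3-class LTA transition matrix (all numbers invented):
p1 = [0.5, 0.3, 0.2]          # Pr(C1 = j), time-1 class proportions
tau = [[0.7, 0.2, 0.1],       # tau[j][k] = Pr(C2 = k | C1 = j),
       [0.1, 0.8, 0.1],       # i.e., the Pr(j -> k) entries above
       [0.2, 0.3, 0.5]]

# implied time-2 marginal class proportions Pr(C2 = k)
p2 = [sum(p1[j] * tau[j][k] for j in range(3)) for k in range(3)]
```

Each row of tau sums to 1 (everyone in a time-1 class transitions somewhere), so the implied time-2 proportions also sum to 1.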
LTA

• Bring in covariates and distal outcomes using the same approaches as for LCA/LPA.
  – There is an LTA 3-step. See NEW Webnote 15 for more information.
LTA with predictors that influence not only class membership at each time point but the transitions as well. Here’s how you have to specify that in Mplus: [syntax figure not recoverable]. You can rearrange results to address questions posed by the model above.
MANY OTHER LONGITUDINAL MIXTURE MODELS

• Survival mixture models
• Latent change score mixture models
• Onset-to-growth mixture models
• Associative LTA
• Latent transition growth mixture models
• Etc., etc., etc.
PARTING WORDS

MIXTURE MODELS: LAUDED BY SOME

• Theoretical models that conceptualize individual differences at the latent level as differences in kind, that consider typologies or taxonomies, map directly onto analytic latent class models.
• Mixture models give us a great deal of flexibility in terms of how we characterize population heterogeneity and individual differences with respect to a latent phenomenon.
• They can help avoid serious distortions that can result from ignoring population heterogeneity if it is, indeed, present.
MIXTURE MODELS: IMPUGNED BY OTHERS

• Latent classes or mixtures may not reflect the Truth.
• Nominalistic fallacy: naming the latent classes does not necessarily make them what we call them or ensure that we understand them.
• Reification: just because the model yields latent classes doesn’t mean the latent classes are real or that we’ve done anything to prove their existence.
• The empirically extracted latent classes depend upon the within- and between-class model specification and the joint distribution of the indicators. Thus, the resultant classes may diverge markedly from the underlying “True” latent structure in the population.
• Do these criticisms sound familiar? They are nearly identical to the critiques of path analysis and SEM in the second half of the 20th century, because some of the same bad modeling practices have reappeared:
  – “Nobody pays much attention to the assumptions, and the technology tends to overwhelm common sense.” (Freedman, 1987)
DON’T CUT OFF YOUR LATENT CLASSES
TO SPITE YOUR MODEL
• Any model is, at best, an approximation to reality.
• “All models are wrong, but some are useful”. (George
Box)
• We can evaluate model-theory consistency.
• We can evaluate model-data consistency.
• There are many alternative ways of thinking about relationships in a variable system, and if mixture modeling can be useful in empirically distinguishing between or among alternative perspectives, then it provides important information.
• Understanding individual differences is
paramount in social and developmental
research.
• The flexibility we gain in the
parameterization of individual differences
using mixtures extends to flexibility in
prediction of those differences and
prediction from those differences.
MIXTURE MODEL CARE AND FEEDING

• Be sure to very carefully document your model building and selection for yourself and reviewers. Be prepared to defend your modeling choices in the event you get a review that is more skeptical than most about the methodology.
• Resist the temptation to take your discrete representation of population heterogeneity and claim, interpret, and discuss the resultant classes as if you had established their existence (e.g., if you fit a three-class model and you get a three-class solution, you haven’t proved the existence of three classes generally nor those three classes specifically).
• In designing studies in which you plan to do LCA/LPA, don’t formulate hypotheses such as “There will be four classes of engagement,” because the exploratory class enumeration process doesn’t actually test K = 4 versus K ≠ 4. This also makes it impossible to compute power.
• Don’t be afraid to do some sensitivity analyses to
understand the hierarchy of influence in your variable
system and the vulnerability of your latent class formations
to small shifts in that system.
• Don’t check your common sense and broader modeling
skills at the door when embarking on LCA/LPA. There are
some modeling best-practices that translate extremely well
to the LCA setting.
• Don’t get so overwhelmed with all the fit indices, etc. that
you forget to fully evaluate the substantive utility and
meaning in the resultant classes.
• Don’t be so dazzled by your own results that you aren’t able to effectively and critically evaluate them with respect to validity criteria.
• Don’t fall so deeply in love with mixture modeling that it
becomes your default analytic approach with any
multivariate data.
QUESTIONS?
THANK YOU!
SELECT REFERENCES & RESOURCES

• Mplus website: www.statmodel.com
• Latent GOLD website: http://statisticalinnovations.com/products/latentgold.html
• Penn State Methodology Center: http://methodology.psu.edu/
• UCLA Institute for Digital Research & Educ.: https://idre.ucla.edu/stats

For more, see the text and references of: Masyn, K. (2013). Latent class analysis and finite mixture modeling. In T. D. Little (Ed.), The Oxford handbook of quantitative methods in psychology (Vol. 2, pp. 551-611). New York, NY: Oxford University Press.