APPLIED LATENT CLASS ANALYSIS: A WORKSHOP
Katherine Masyn, Ph.D.
Harvard University
katherine_masyn@gse.harvard.edu
December 5, 2013
Texas Tech University, Lubbock, TX

OVERVIEW
• Statistical Modeling in the Mplus Framework
• The Finite Mixture Model Family
• Latent Class Analysis (LCA)
• LCA Example: LSAY
• LCA Model Building
• Direct and Indirect Applications
• Model Estimation
• Class Enumeration
• Fit Indices
• Classification Quality
• Summing It Up
• Latent Class Regression (LCR)
• "1-Step" Approach for Latent Class Predictors
• "Old" 3-Step Approach for Latent Class Predictors
• New 3-Step Approach for Latent Class Predictors
• Distal Outcomes
• Modeling Extensions
• Longitudinal Mixture Models
• Parting Words
• Questions?
• Select References & Resources

STATISTICAL MODELING IN THE MPLUS FRAMEWORK

MODEL DIAGRAMS
• Boxes for observed measures
• Circles for latent variables
• Single-headed arrows for "causal"/directional relationships
• Double-headed arrows for "noncausal" relationships
• An arrow not originating from a box or circle for a residual or "unique" variance

MPLUS MODELING FRAMEWORK (Muthén & Muthén, 2013)
• η = continuous latent variable; c = categorical latent variable
• y = continuous observed variable; u = discrete observed variable
• T = continuous event time; x = observed continuous/categorical covariate
[Framework diagram: relationships among x, y, u, T, η, and c at the within and between levels.]

STATISTICAL CONCEPTS CAPTURED BY LATENT VARIABLES (from Muthén & Muthén, 1998-2013)
• Continuous latent variables: measurement errors, factors, random effects, frailties/liabilities, variance components, missing data
• Categorical latent variables: latent classes, clusters, finite mixtures, missing data

STATISTICAL MODELS USING LATENT VARIABLES (Muthén & Muthén, 2013)
• Continuous LVs: factor analysis and IRT, structural equation models, growth models, multilevel models, missing data models
• Categorical LVs: latent class analysis, finite mixture models, discrete-time survival analysis, missing data models
• Mplus integrates the statistical concepts captured by latent variables into a general modeling framework that includes not only all of the models listed above but also combinations and extensions of these models.

MPLUS BACKGROUND (Muthén & Muthén, 2013)
• Inefficient dissemination of statistical methods:
  – Many good methods contributions from biostatistics, psychometrics, etc.,
    are underutilized in practice
• Fragmented presentation of methods:
  – Technical descriptions in many different journals
  – Many different pieces of limited software
• Mplus: integration of methods in one framework
  – Easy to use: simple, non-technical language, graphics
  – Powerful: general modeling capabilities
• Mplus versions: V1, November 1998; V2, February 2001; V3, March 2004; V4, February 2006; V5, November 2007; V5.2, November 2008; V6, April 2010; V6.12, November 2011; V7, September 2012; V7.1, May 2013
• Mplus team: Linda & Bengt Muthén, Thuy Nguyen, Tihomir Asparouhov, Michelle Conn, Jean Maninger

MPLUS V7.1 (WWW.STATMODEL.COM; released May 2013)
Several programs in one, fully integrated in the general latent variable framework:
  – Exploratory factor analysis
  – Structural equation modeling
  – Item response theory analysis
  – Latent class analysis
  – Latent transition analysis
  – Mediation analysis
  – Survival analysis
  – Growth modeling
  – Multilevel analysis
  – Complex survey data analysis
  – Monte Carlo simulation
  – Bayesian analysis
  – Multiple imputation

THE FINITE MIXTURE MODEL FAMILY

FAMILY MEMBERS
The finite mixture model family includes:
• Cross-sectional:
  – Latent class analysis (LCA)
  – Latent profile analysis (LPA)
  – Latent class cluster analysis (LCCA)
  – Regression mixture models
  – Factor mixture models (FMM)
  – Etc.
• Longitudinal:
  – Growth mixture models (GMM)
  – Latent transition models (LTA)
  – Survival mixture analysis (SMA)
  – Etc.

LATENT CLASS ANALYSIS: CATEGORICAL LV AND CATEGORICAL MVS
LATENT PROFILE ANALYSIS/LATENT CLASS CLUSTER ANALYSIS: CATEGORICAL LV AND CONTINUOUS MVS
[Diagrams: categorical latent variable c with discrete indicators u (LCA) and with continuous indicators y (LPA/LCCA).]

FINITE MIXTURE MODEL LIKELIHOOD
• The basic finite mixture model has the following likelihood function:
    f(y_i) = Σ_{k=1}^{K} π_k f_k(y_i | θ_k)
• K is the number of latent classes.
• π_k is the proportion of the total population belonging to Class k.
• f_k is the class-specific density function for the latent class indicator (manifest) variables, with class-specific parameters θ_k.

LATENT CLASS ANALYSIS (LCA)

TRADITIONAL LCA
[Diagram: latent class variable c measured by indicators u1, u2, u3, u4.]
• Categorical indicators
• Categorical latent variable
• Cross-sectional data
• Some consider LCA the categorical analogue to factor analysis.
• Sometimes referred to as person-centered analysis, to stand in contrast to variable-centered analysis such as CFA.
• Different from IRT, which models categorical variables as indicators of an underlying continuous trait (ability).

FOR EXAMPLE
• Binary test items as multiple indicators of an underlying 2-level categorical latent variable representing profiles of Mastery and Non-mastery.
• DSM-IV symptom checklist (diagnostic criteria) for depression.

EXAMPLE DATA

Student | Item 1 | Item 2 | Item 3 | Item 4
   1    |   1    |   1    |   1    |   1
   2    |   0    |   0    |   0    |   0
   3    |   1    |   0    |   1    |   0
   4    |   1    |   0    |   0    |   0
   5    |   0    |   0    |   1    |   0
   6    |   1    |   1    |   1    |   0
   7    |   1    |   1    |   1    |   0

NAÏVE APPROACH
• Create a cut-point based on the sum score, e.g., clinical depression if satisfying 5 or more of the 9 symptoms; mastery defined as 80% of items correctly answered.
• Problems:
  – Treats all items the same, e.g., doesn't take into account that some items may be more "difficult" than others.
  – Doesn't take into account measurement error, e.g., someone with Mastery status may still make a careless error.
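To see exactly what the model-based alternative buys, it helps to write the model down. The following is the standard unconditional LCA likelihood for M binary items under local independence, a special case of the finite mixture likelihood above (a sketch; the symbol ω_jk is introduced here for exposition and is not taken from the slides):

\[
\Pr(\mathbf{u}_i) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{M} \omega_{jk}^{\,u_{ij}}\,(1-\omega_{jk})^{\,1-u_{ij}},
\qquad \omega_{jk} \;=\; \Pr(u_{ij}=1 \mid c_i = k).
\]

Because each item j gets its own class-specific response probability ω_jk, differences in item "difficulty" and the possibility of careless errors are carried in the parameters rather than ignored, which addresses both problems with the sum-score cut-point and motivates the LCA approach described next.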
LCA APPROACH
• Characterizes groups of individuals based on response patterns for multiple indicators.
• Class membership "explains" the observed covariation between indicators.
• Allows for measurement error, in that class-specific item probabilities may be between zero and one.
• Allows comparisons of indicator sensitivity and specificity, to identify the items that best differentiate the classes.
• Estimates the prevalence of each class in the population.
• Enables stochastic classification of individuals into classes.

MEASUREMENT CHARACTERISTICS
• Class homogeneity
  – Individuals within a given class are similar to each other with respect to item responses; e.g., for binary items, class-specific response probabilities above .70 or below .30 indicate high homogeneity.
• Class separation
  – Individuals across two classes are dissimilar with respect to item responses; e.g., for binary items, odds ratios (ORs) of item endorsement between two classes >5 or <.2 indicate high separation.

ITEM PROBABILITY PLOTS

Item | Class 1 (70%) | Class 2 (20%) | Class 3 (10%) | OR 1 vs. 2 | OR 1 vs. 3 | OR 2 vs. 3
 u1  |     .90*      |      .10      |      .90      |  81.00**   |    1.00    |    0.01
 u2  |     .80       |      .20      |      .90      |  16.00     |    0.44    |    0.03
 u3  |     .90       |      .40      |      .50      |  13.50     |    9.00    |    0.67
 u4  |     .80       |      .10      |      .20      |  36.00     |   16.00    |    0.44
 u5  |     .60       |      .50      |      .40      |   1.50     |    2.25    |    1.50

*Item probabilities >.7 or <.3 are flagged to indicate a high degree of class homogeneity.
**Odds ratios >5 or <.2 are flagged to indicate a high degree of class separation.

LCA EXAMPLE: LSAY

EXAMPLE: LONGITUDINAL STUDY OF AMERICAN YOUTH (LSAY)
• A national longitudinal study funded by the National Science Foundation (NSF).
• Designed to investigate the development of students' learning and achievement, particularly related to math, science, and technology, and to examine the relationship of those student outcomes across middle and high school to post-secondary education and early career choices.
• More information can be found at http://lsay.org/index.html

LCA EXAMPLE: LSAY
• Research Aim: Characterize population heterogeneity in math attitudes (manifest in 9 survey items) using latent classes of math dispositions.
• Why not state research questions like:
  – Are there different profiles of math dispositions based on the math attitude items?
  – How many profiles are there?
  – What are the profiles?

Survey Prompt: "Now we would like you to tell us how you feel about math and science. Please indicate how you feel about each of the following statements." (Total sample: nT = 2675; f = frequency endorsing, rf = relative frequency)

Item                                                                     |   f  |  rf
1) I enjoy math.                                                         | 1784 | .67
2) I am good at math.                                                    | 1850 | .69
3) I usually understand what we are doing in math.                       | 2020 | .76
4) Doing math often makes me nervous or upset.                           | 1546 | .59
5) I often get scared when I open my math book and see a page of problems.| 1821 | .69
6) Math is useful in everyday problems.                                  | 1835 | .70
7) Math helps a person think logically.                                  | 1686 | .64
8) It is important to know math to get a good job.                       | 1947 | .74
9) I will use math in many ways as an adult.                             | 1858 | .70

LCA EXAMPLE: LSAY — MPLUS SYNTAX

Variable:
    Usevariables = ca28ar ca28br ca28cr ca28er ca28gr
                   ca28hr ca28ir ca28kr ca28lr;
    Categorical = ca28ar ca28br ca28cr ca28er ca28gr
                  ca28hr ca28ir ca28kr ca28lr;
    Missing = all(9999);
    Classes = c(5);

Analysis:
    type = mixture;
    starts = 500 100;
    processors = 4;

Model: (next slide)
Model:
    %overall%
    [ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
      ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];
    %c#1%
    [ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
      ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];
    %c#2%
    [ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
      ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];
    . . .
    %c#5%
    [ ca28ar$1 ca28br$1 ca28cr$1 ca28er$1 ca28gr$1
      ca28hr$1 ca28ir$1 ca28kr$1 ca28lr$1 ];

Note: With categorical indicators, item thresholds are class-specific by default under type = mixture, so a Model command containing only the %overall% threshold statement would produce the same result.

LCA EXAMPLE: LSAY (SELECTED OUTPUT)

Model results, Latent Class 1, thresholds (two-tailed):

Threshold | Estimate |  S.E. | Est./S.E. | P-Value
CA28AR$1  |  -2.122  | 0.185 |  -11.442  |  0.000
CA28BR$1  |  -2.539  | 0.242 |  -10.514  |  0.000
CA28CR$1  |  -3.081  | 0.291 |  -10.577  |  0.000
CA28ER$1  |  -1.791  | 0.371 |   -4.825  |  0.000
CA28GR$1  | -15.000  | 0.000 |  999.000  | 999.000
CA28HR$1  |  -2.498  | 0.262 |   -9.533  |  0.000
CA28IR$1  |  -1.839  | 0.188 |   -9.781  |  0.000
CA28KR$1  |  -2.876  | 0.324 |   -8.866  |  0.000
CA28LR$1  |  -2.723  | 0.310 |   -8.775  |  0.000

RESULTS IN PROBABILITY SCALE
Latent Class 1, CA28AR: Category 1 = 0.107 (S.E. = 0.018, Est./S.E. = 6.039, p = 0.000); Category 2 = 0.893 (S.E. = 0.018, Est./S.E. = 50.392, p = 0.000).
Converting the threshold to a probability: Pr(CA28AR = 2 | Class 1) = e^{2.122} / (1 + e^{2.122}) = 0.893.

Class labels: 1 = Pro-math without anxiety; 2 = Pro-math with anxiety; 3 = Math lover; 4 = "I don't like math but I know it's good for me"; 5 = Anti-math with anxiety.

FINAL CLASS COUNTS AND PROPORTIONS FOR THE LATENT CLASSES BASED ON THE ESTIMATED MODEL

Latent Class |   Count   | Proportion
     1       | 525.13598 |  0.39248
     2       | 173.96909 |  0.13002
     3       | 244.13155 |  0.18246
     4       | 254.57820 |  0.19027
     5       | 140.18517 |  0.10477

LCA MODEL BUILDING

MIXTURE MODEL BUILDING STEPS
1. Data screening and descriptives.
2. Class enumeration process.
3. Select the final unconditional model (this is your measurement model).
4. Add potential predictors (and check for measurement invariance).
5. Add potential distal outcomes.

DIRECT AND INDIRECT APPLICATIONS

DIRECT VS. INDIRECT APPLICATION
• Is the "Truth" a heterogeneous population composed of a mixture of two normally distributed homogeneous subpopulations? Or is the "Truth" a single, non-normally distributed homogeneous population?
[Figure: the same observed distribution of y rendered as a two-class mixture (y regressed on c) versus a single non-normal distribution.]

DIRECT APPLICATIONS OF MIXTURE MODELING
• Mixture models are used with the a priori assumption that the overall population is heterogeneous and made up of a finite number of (latent and substantively meaningful) homogeneous groups or subpopulations, usually specified to have tractable within-group distributions of the indicators, such as a multivariate normal distribution.

INDIRECT APPLICATIONS OF MIXTURE MODELING
• It is assumed that the overall population is homogeneous, and finite mixtures are simply used as a more tractable, semi-parametric technique for modeling a population of outcomes for which it may not be possible (practically or analytically speaking) to specify a parametric model.
• The focus for indirect applications is then not on the resultant mixture components or their interpretation, but rather on the overall population distribution approximated by the mixing.

MODEL ESTIMATION

ML ESTIMATION FOR LCA
• c is treated as missing data under MAR.
• MAR assumes that the probabilities of values being missing are independent of the missing values, conditional on those values that are observed (both u and x)
    (Little & Rubin, 2002).
• Basic principle of ML: choose estimates of the model parameters whose values, if true, would maximize the probability of observing what had, in fact, been observed.
• This requires an expression that describes the distribution of the data as a function of the unknown parameters, i.e., the likelihood function.
• Under MAR, the ML estimates for the complete data may be obtained by maximizing the likelihood function summed over all possible values of the missing data, i.e., by integrating out the missingness.
• Often this integrated likelihood cannot be maximized analytically and requires an iterative estimation procedure, e.g., EM.

THE EM ALGORITHM
• How does it work?
  – Start with a random split of people into classes.
  – Reclassify based on an improvement criterion.
  – Reclassify until the "best" classification of people is found.
• The EM algorithm is a missing data technique. In this application, the latent class variable is the missing data, and it happens to be missing for the entire data set.

ML ESTIMATION VIA EM ALGORITHM
• E(xpectation) step: c is treated as missing data. Missing values c_i are replaced by the conditional means of the c_i given the y_i's. These means are the posterior probabilities for each class.
• M(aximization) step: New estimates of the parameters are obtained from the maximization based on the estimated complete data. The Pr(y_j | c = k) and Pr(c = k) parameters are estimated by regression and summation over the posterior probabilities.
• Missing data are allowed on the y's as well, assuming MAR.
• Standard errors are obtained using some approximation to the Fisher information matrix. (In Mplus, "ML" is the default with no missing data on the y's; "MLR" with missing data on the indicators.)

THE CHALLENGES OF ML VIA EM
• MLE for mixture models can present statistical and numeric challenges that must be addressed during the application of mixture modeling:
  – The estimation may fail to converge even if the model is theoretically identified.
  – If the estimation algorithm does converge, then, since the log likelihood surface for mixtures is often multimodal, there is no way to prove that the solution is a global rather than a local maximum.
How would you distinguish between these two cases? [Figures illustrating the two cases.]

MOST IMPORTANTLY:
• Use multiple random sets of starting values with the estimation algorithm. It is recommended that a minimum of 50 to 100 sets of extensively, randomly varied starting values be used (Hipp & Bauer, 2006), but more may be necessary to observe satisfactory replication of the best maximum log likelihood value.
• Recommendations for a more thorough investigation of multiple solutions when there are more than two classes:
    ANALYSIS: STARTS = 50 5;
  or, with many classes,
    ANALYSIS: STARTS = 500 10;
• Note: LL replication is neither necessary nor sufficient for a given solution to be the global maximum.
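For concreteness, the two EM steps for a K-class mixture can be written compactly (standard results, stated here in the notation of the mixture likelihood given earlier, not copied from the slides):

\begin{align*}
\text{E-step:}\quad & \hat{p}_{ik} \;=\; \Pr(c_i = k \mid \mathbf{y}_i)
  \;=\; \frac{\hat{\pi}_k\, f_k(\mathbf{y}_i \mid \hat{\theta}_k)}
             {\sum_{j=1}^{K} \hat{\pi}_j\, f_j(\mathbf{y}_i \mid \hat{\theta}_j)} \\[4pt]
\text{M-step:}\quad & \hat{\pi}_k \;=\; \frac{1}{n}\sum_{i=1}^{n} \hat{p}_{ik},
  \qquad \hat{\theta}_k \;=\; \arg\max_{\theta_k} \sum_{i=1}^{n} \hat{p}_{ik}\,\log f_k(\mathbf{y}_i \mid \theta_k)
\end{align*}

The algorithm alternates these two steps until the log likelihood stops improving. Each iteration is guaranteed not to decrease the likelihood, but the climb is local, which is exactly why the multiple-random-starts strategy above is needed.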
And keep track of the following information:
• The number and proportion of sets of random starting values that converge to a proper solution (failure to converge consistently can indicate weak identification).
• The number and proportion of replicated maximum likelihood values for each local solution and for the apparent global solution (a high frequency of replication of the apparent global solution across the sets of random starting values increases confidence that the "best" solution found is the true maximum likelihood solution).
• The condition number, computed as the ratio of the smallest to the largest eigenvalue of the information matrix estimate based on the maximum likelihood solution. A low condition number, less than 10^-6, may indicate singularity (or near-singularity) of the information matrix and, hence, model non-identification (or empirical underidentification).
• The smallest estimated class proportion and estimated class size among all the latent classes estimated in the model (a class proportion near zero can be a sign of class collapsing and class over-extraction).

• This information, when examined collectively, will assist in tagging models that are non-identified or not well identified and whose maximum likelihood solutions, if obtained, are not likely to be stable or trustworthy. These not-well-identified models should be discarded from further consideration or mindfully modified in such a way that the empirical issues surrounding the estimation of that particular model are resolved without compromising the theoretical integrity and substantive foundations of the analytic model.

CLASS ENUMERATION

NOW THE HARD PART
• In the majority of applications of mixture modeling, the number of classes is not known.
• Even in direct applications, when one assumes a priori that the population is heterogeneous, you rarely have specific hypotheses regarding the exact number or nature of the subpopulations.
• Thus, in either case (direct or indirect), you must begin the model building with an exploratory class enumeration step.

• Deciding on the number of classes is often the most arduous phase of the mixture modeling process.
• It is labor intensive because it requires consideration (and, therefore, estimation) of a set of models with varying numbers of classes.
• It is complicated in that the selection of a "final" model from the set of models under consideration requires the examination of a host of fit indices, along with substantive scrutiny and practical reflection, as there is no single method for comparing models with differing numbers of latent classes that is widely accepted as best.

EVALUATING THE MODEL
The statistical tools are divided into three categories:
1. evaluations of absolute fit;
2. evaluations of relative fit;
3. evaluations of classification.

Model usefulness:
• Substantively meaningful and substantively distinct classes (face + content validity)
• Cross-validation in a second sample (or split sample)
• Parsimony principle
• Criterion-related validity
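All three kinds of statistical evaluation come out of the same estimation run. A minimal sketch of the Analysis and Output commands used for each model in the enumeration series described next (illustrative only; tech11 and tech14 request the LRT-based comparisons covered under relative fit below):

Analysis:
    type = mixture;
    starts = 500 100;    ! 500 random start sets; best 100 carried to final stage
    processors = 4;

Output:
    tech11 tech14;       ! adjusted VLMR-LRT and parametric bootstrap LRT,
                         ! each comparing K classes to K-1 classes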
CLASS ENUMERATION PROCESS FOR LCA
• Fit models for K = 1, 2, 3, ..., increasing K until the models become not well identified.
• Collect fit information on each model using a combination of statistical tools.
• Decide on 1-2 "plausible" models.
• Apply a broader set of statistical tools to the set of candidate models and evaluate model usefulness.

FIT INDICES

ABSOLUTE FIT
• There is an overall likelihood ratio model chi-square goodness-of-fit test for a mixture measurement model with only categorical indicators (using a formula similar to the goodness-of-fit chi-square for contingency table analyses and log-linear models).
• "Inspection" = look at standardized residuals evaluating the differences between the observed response pattern frequencies and the model-estimated frequencies.

RELATIVE FIT
1. Inferential: The most common ML-based inferential comparison is the likelihood ratio test (LRT) for nested models (e.g., a K = 3 vs. a K = 4 class model).
• Hypothesis testing using the likelihood ratio: H0: k classes; H1: k + 1 classes.
    LRTS = -2 [ log L(H0) - log L(H1) ]
• When testing a k-class mixture model versus a (k+g)-class model, the LRTS does not have an asymptotic chi-squared distribution. Why? The regularity conditions are not met: a mixing proportion of zero is on the boundary of the parameter space, and the parameters under the null model are not identifiable.

SOLUTIONS?
• Analytically derived distribution of the LRTS → adjusted VLMR-LRT (Tech11 in Mplus)
  – Vuong (1989) derived an LRT for model selection based on the Kullback & Leibler (1951) information criterion. Lo, Mendell, and Rubin (2001) extended Vuong's theorem to cover the LRT for a k-class normal mixture versus a (k+g)-class normal mixture.
• Empirically derived distribution of the LRTS → (parametric) bootstrap LRT (Tech14 in Mplus)

NOTE: For both Tech11 and Tech14, Mplus computes the LRT for your K-class model compared to a model with one less class (i.e., the (K-1)-class model as the null). Make sure the H0 log likelihood value given in the Tech11/Tech14 output matches the best LL solution you obtained in your own (K-1)-class run.

2. Information-heuristic criteria: These indices weigh the fit of the model (as captured by the maximum log likelihood value) against the model complexity (recognizing that although one can always improve the fit of a model by adding parameters, that improvement in fit comes at a cost to model parsimony).
• These information criteria can be expressed in the following form:
    IC = -2 log L + penalty
• The traditional penalty is a function of n and d, where n = sample size and d = number of parameters.

INFORMATION CRITERIA
• Bayesian Information Criterion (BIC)
• Consistent Akaike's Information Criterion (CAIC)
• Approximate Weight of Evidence Criterion (AWE)
• For these ICs, lower values indicate a better model, relatively speaking. Sometimes a minimum value is not reached, and scree/"elbow" plots are utilized.

How much lower does an IC value have to be to mean the model is really better?
• Bayes Factor: Which model, A or B, is more likely to be the true model, if one of the two is the true model?
• The approximate correct model probability (cmP) for a Model A is an approximation of the actual probability of Model A being the correct model, relative to a set of J models under consideration.
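Written out, the criteria and comparison quantities just named take the following forms (stated here for reference in forms consistent with Masyn, 2013, not copied from the slides; d = number of free parameters, n = sample size):

\begin{align*}
\mathrm{BIC} &= -2\log L + d\,\ln(n) \\
\mathrm{CAIC} &= -2\log L + d\,[\ln(n) + 1] \\
\mathrm{AWE} &= -2\log L + 2d\,[\ln(n) + 1.5] \\
\widehat{\mathrm{BF}}_{A,B} &\approx \exp\!\big(\mathrm{SIC}_A - \mathrm{SIC}_B\big),
  \qquad \mathrm{SIC} = -\tfrac{1}{2}\,\mathrm{BIC} \\
\mathrm{cmP}_A &= \frac{\exp\!\big(\mathrm{SIC}_A - \mathrm{SIC}_{\max}\big)}
                      {\sum_{j=1}^{J} \exp\!\big(\mathrm{SIC}_j - \mathrm{SIC}_{\max}\big)}
\end{align*}

Commonly cited guidelines treat 1 < BF(A,B) < 3 as weak, 3 to 10 as moderate, and > 10 as strong evidence for Model A over Model B.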
CLASSIFICATION QUALITY

CLASSIFICATION QUALITY/CLASS SEPARATION
• A good mixture model in a direct application* should yield empirically highly differentiated, well-separated latent classes whose members have a high degree of homogeneity in their responses on the class indicators.
*A well-fitting mixture model can still have very poor class separation → classification quality is not a measure of model fit!

• Almost all of the classification diagnostics are based on the estimated posterior class probabilities.
• Posterior class probabilities are the model-estimated values of each individual's probabilities of being in each of the latent classes, based on the maximum likelihood parameter estimates and the individual's observed responses on the indicator variables (similar to estimated factor scores).

RELATIVE ENTROPY
• An index that summarizes the overall precision of classification for the whole sample, across all the latent classes.
• When posterior classification is no better than random guessing, E = 0; when there is perfect posterior classification for all individuals in the sample, E = 1.
• Since even when E is close to 1.00 there can be a high degree of latent class assignment error for particular individuals, and since posterior classification uncertainty may increase simply by chance for models with more latent classes, E was never intended for, nor should it be used for, model selection during the class enumeration process. (REMEMBER: A mixture model with low entropy could still fit the data well.)
• However, values near zero may indicate that the latent classes are not sufficiently well separated. Thus, E may be used to identify problematic over-extraction of latent classes, and may also be used to judge the utility of a latent class analysis directly applied to a particular set of indicators for producing empirically highly differentiated groups in the sample.

AVEPP
• The average posterior class probability (AvePP) enables evaluation of the classification uncertainty for each of the latent classes separately.
• The average posterior class probability for each class, k, is taken among all individuals whose maximum posterior class probability is for Class k (i.e., individuals modally assigned to Class k).
• Nagin suggests that AvePP values > .7 indicate adequate separation and classification precision.

OCC
• The denominator of the odds of correct classification (OCC) ratio is the odds of correct classification based on random assignment using the model-estimated marginal class proportions.
• The numerator is the odds of correct classification based on the maximum posterior class probability assignment rule (i.e., modal class assignment).
• When the modal class assignment for Class k is no better than chance, then OCC(k) = 1.00.
• As AvePP(k) gets close to one, OCC(k) gets large.
• Nagin suggests that OCC(k) > 5 indicates adequate separation and classification precision.
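In symbols (following the formulations in Masyn, 2013, with p-hat_ik denoting individual i's posterior probability for Class k and pi-hat_k the model-estimated proportion for Class k; n_k is the number of individuals modally assigned to Class k):

\begin{align*}
E &= 1 \;-\; \frac{\sum_{i=1}^{n}\sum_{k=1}^{K} \left(-\,\hat{p}_{ik}\,\ln \hat{p}_{ik}\right)}{n \ln K} \\[4pt]
\mathrm{AvePP}_k &= \frac{1}{n_k} \sum_{i:\; \text{modal class of } i \,=\, k} \hat{p}_{ik} \\[4pt]
\mathrm{OCC}_k &= \frac{\mathrm{AvePP}_k \,/\, (1 - \mathrm{AvePP}_k)}{\hat{\pi}_k \,/\, (1 - \hat{\pi}_k)}
\end{align*}

Note that when modal assignment for Class k is purely chance-level, AvePP_k equals pi-hat_k and the OCC ratio reduces to 1.00, matching the interpretation above.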
MCAP
• The modal class assignment proportion (mcaP) is the proportion of individuals in the sample modally assigned to Class k.
• If individuals were assigned to Class k with perfect certainty, then mcaP(k) would be equal to the model-estimated Pr(c = k). Larger discrepancies are indicative of larger latent class assignment errors.
• To gauge the discrepancy, each mcaP can be compared to the 95% confidence interval for the corresponding model-estimated Pr(c = k).

SUMMING IT UP
[Worked summary slides: the class enumeration procedure, Steps 1-6 with substeps 5a-5e.]

AND, FINALLY
7) On the basis of all the comparisons made in Steps 5 and 6, select the final model in the class enumeration process.
  – Note: You may end up carrying forward two candidate models into the conditional modeling stage.
• If you had a large enough sample to do a split-half cross-validation, now is when you would look at the validation sample.

LATENT CLASS REGRESSION (LCR)

LATENT CLASS VALIDATION
• Link the conceptual/theoretical aspects of the latent class variable with observable variables.
• "[To] make clear what something is" means to set forth the laws in which it occurs.
• Cronbach & Meehl (1955) termed this process the nomological (or lawful) network.

LINKAGES: CRITERION-RELATED VALIDITY
• In criterion-related validity (concurrent and predictive), we check the performance of our latent classes against some criterion, based on our theory of the construct represented by the latent class variable.
  – Concurrent: latent class membership predicted by, or covarying with, past or concurrent events (latent class regression).
  – Predictive: latent class membership predicting future concrete events (latent class with distal outcomes).

COVARIATES AND MIXTURE MODELS
[Diagram: risk factor with an indirect effect on the indicators u1-u5 through C, and a direct effect on an individual indicator.]

INCLUDING COVARIATES INTO LCA
• Like a MIMIC model in regular CFA/SEM.
• Categorical latent variable.
• Continuous or categorical covariates, with direct effects on the indicators or indirect effects on the indicators through c.
  – Indirect effects can also be thought of as predictors of class membership.
  – Direct effects can also be thought of as differential item functioning.
• The inclusion of covariates into mixture models:
  – Allows us to explore relationships between the mixture classes and auxiliary information.
  – Helps us understand how the different classes relate to risk and protective factors.
  – Lets us explore differences in demographics across the classes.

"C ON X" = MULTINOMIAL REGRESSION
• Multinomial logistic regression is essentially simultaneous pairs of logistic regressions of the odds of each outcome category versus a reference/baseline category.
• Mplus uses the last category/class as the baseline.
• So for K classes, we have K - 1 logit equations.
• We model the following: Given membership in either Class k or Class K, what is the log odds that class membership is k (instead of K), given x? That is,
    log[ Pr(c_i = k | x_i) / Pr(c_i = K | x_i) ] = β0k + β1k·x_i,  for k = 1, ..., K - 1.

LCA EXAMPLE: LSAY
[Path diagram: Male predicting C, with C measured by the nine math attitude items ("I enjoy math," "I am good at math," ..., "I will use math later").]
Model:
    %overall%
    c on male;

EXAMPLE: LSAY WITH COVARIATE

Categorical latent variables (*Class 5 is the reference group):

              | Estimate |  S.E. | Est./S.E. | P-Value
C#1 ON FEMALE |   0.320  | 0.217 |   1.476   |  0.140
C#2 ON FEMALE |  -0.343  | 0.269 |  -1.274   |  0.203
C#3 ON FEMALE |   0.485  | 0.266 |   1.823   |  0.068
C#4 ON FEMALE |   0.865  | 0.258 |   3.356   |  0.001

(1 = Pro-math without anxiety; 2 = Pro-math with anxiety; 3 = Math lover; 4 = "I don't like math but I know it's good for me"; 5 = Anti-math with anxiety)

There is a statistically significant overall association between gender and math disposition:
• Null model (no effect of female) vs. alternative model (c on female): df = 4, p < .001.
• Interpretation of coefficients:
  – Given membership in either Class 1 or Class 5, girls are as likely to be in Class 1 as boys (p = .14).
  – Given membership in either Class 2 or Class 5, girls are as likely to be in Class 2 as boys (p = .20).
  – Etc.

ALTERNATIVE PARAMETERIZATIONS FOR THE CATEGORICAL LATENT VARIABLE REGRESSION
Parameterization switching the reference group to Class 1:

              | Estimate |  S.E. | Est./S.E. | P-Value
C#2 ON FEMALE |  -0.662  | 0.205 |  -3.223   |  0.001
C#3 ON FEMALE |   0.165  | 0.207 |   0.798   |  0.425
C#4 ON FEMALE |   0.545  | 0.187 |   2.916   |  0.004
C#5 ON FEMALE |  -0.320  | 0.217 |  -1.476   |  0.140

(1 = Pro-math without anxiety; 2 = Pro-math with anxiety; 3 = Math lover; 4 = "I don't like math but I know it's good for me"; 5 = Anti-math with anxiety)

"1-STEP" APPROACH FOR LATENT CLASS PREDICTORS

LCR MODELING PROCESS
1. Fit models without covariates first.
2. Decide on the number of classes.
3. Integrate covariate (indirect) effects in a systematic way. (You can preview a covariate, x, using the auxiliary = x (r) or (r3step) option in the Variable command.) Include indirect effects (class predictors) first, with direct effects fixed at zero, and then explore the evidence for direct effects using modindices.
4. Add direct effects as suggested by modindices, but do not let them vary across classes.
5. Trim until only significant direct effects remain.
NOTE: This is just like MIMIC modeling in SEM.
Also NOTE: There are other approaches currently in development for the detection of direct effects, and of DIF more generally.

WHY NOT ADD CLASS-VARYING DIRECT EFFECTS?
[Diagram: covariate X with an indirect effect on C and a direct effect on u4; C measured by u1-u5.]
• Indirect effect in Mplus:
    %overall%
    c on x;
• Direct effect:
    %overall%
    u4 on x;
• Class-varying direct effect:
    %c#1%
    u4 on x;
    %c#2%
    u4 on x;

"OLD" 3-STEP APPROACH FOR LATENT CLASS PREDICTORS
• Estimate the LCA model.
• Determine each subject's most likely class membership ("hard"-classify people using modal class assignment).
• Save the class assignment and use it in a separate analysis as an observed multinomial outcome, relating predictors to class membership.
• Problematic: Unless the classification is very good (high entropy), this gives biased estimates and biased standard errors for the relationships of class membership with other variables.

NEW 3-STEP APPROACH FOR LATENT CLASS PREDICTORS

BASIC IDEA
• The real problem with classify-analyze (the "old" 3-step approach) is that it ignores the uncertainty/imprecision in the classification.
• Based on the results of the unconditional LCA, we can compile information about classification quality that we can then use in a subsequent model (akin to using a previously estimated scale reliability to specify the measurement error variance in an SEM model).
  – The information is summarized in the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)" table.

• "Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)" estimates Pr(C = j | CMOD = k), for j = 1, ..., K and k = 1, ..., K.
• "Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)" estimates Pr(CMOD = k | C = j), for j = 1, ..., K and k = 1, ..., K.
• How do you get from one quantity to the other? Bayes' theorem:
    Pr(CMOD = k | C = j) = Pr(C = j | CMOD = k) Pr(CMOD = k) / Pr(C = j)

The three steps:
1. Estimate the LCA model.
2. Create a nominal most-likely-class variable, CMOD.
3. Use a mixture model for CMOD, C, and X, where CMOD is the nominal indicator of C, with measurement error rates prefixed at the misclassification rates of the model estimated in the Step-1 LCA.
To do this automatically in Mplus for a covariate X, use the auxiliary = X (r3step) option in the Variable command.

MANUAL R3STEP
[Diagram: X predicting C, with CMOD as the nominal indicator of C, its measurement parameters fixed according to the Step-1 misclassification rates.]
Step 1:
• Run the model with the covariate(s) included as auxiliary variables, and save the data with the modal class assignment:
    Savedata: File is step1save.dat; Save = cprob;
Step 2:
• Create a new input file using the saved data:
    Data: File is step1save.dat;
    Variable: UseVar = cmod x; Nominal = cmod;
• Use the values from the rows of the "Logits for the Classification Probabilities for the Most Likely Latent Class Membership (Row) by Latent Class (Column)" table in the Step-1 output to fix the class-specific multinomial intercepts for cmod.
Step 3:
• Specify the LCR of "c on x" and run.

DISTAL OUTCOMES

DISTAL OUTCOMES AND MIXTURE MODELS
[Diagram: C, measured by u1-u5, predicting a distal outcome.]

AN EVER-GROWING # OF APPROACHES
• 1-step
• "Old" 3-step (classify-analyze)
• Modified 1-step
• Pseudo-class draws
  – Auxiliary = z (E);
• New 3-step
  – Auxiliary = z (DU3step) or (DE3step)
  – Manual 3-step
• New Bayes' theorem approach by Lanza et al. (2013)
  – Auxiliary = z (DCON) or (DCAT)

1-STEP: NOT GOOD OR BAD, JUST MAYBE NOT WHAT YOU WANT
• Also referred to as the "distal-as-indicator" approach.
• The distal is treated as an additional latent class indicator if it is included as an endogenous variable.
  – This means your latent class variable is now specified as measured by all the items and the distals.
  – This may be what you intend, but if so, the distals should be included as indicators from the get-go.
• What if you don't want your distal outcomes to characterize/measure the latent class variable?
• All the other existing approaches are attempts to keep the distal outcome from influencing the class formation.
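Gathering the automatic auxiliary shortcuts mentioned so far into one place, a minimal Variable command sketch (illustrative only, with hypothetical variable names: u1-u9 are class indicators, x a covariate, z a distal outcome):

Variable:
    Usevariables = u1-u9;
    Categorical = u1-u9;
    Classes = c(5);
    Auxiliary = x (r3step)     ! covariate via the automatic new 3-step
                z (du3step);   ! distal means compared across classes,
                               ! allowing unequal variances

Swapping (du3step) for (de3step), (dcon), or (dcat) invokes the equal-variance 3-step test or the Lanza et al. (2013) approach discussed next.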
ALTERNATIVES TO DISTAL-AS-INDICATOR
• The old 3-step has the same problems as it does for latent class regression.
• The modified 1-step fixes all measurement parameters (e.g., item thresholds) at their estimated values from the unconditional model.
• New 3-step:
  – Done the same way as for the LCR. Mplus will test for differences in distal means assuming equal variances (DE3step) or allowing unequal variances (DU3step).
  – The Mplus implementation is limited, but you can always do a manual 3-step in order to analyze multiple distal outcomes at the same time while including covariates, potential moderators, etc.
  – WARNING: The 3-step approach does not guarantee that your distal will not influence the latent class formation. Mplus checks for this now; you have to check for it yourself if using the manual 3-step.

AUXILIARY = Z (DCON/DCAT)
• Based on a clever application of Bayes' theorem by Lanza et al. (2013).
• Basic idea: Regress C on Z to obtain Pr(C | Z) and Pr(C), estimate the density function of Z for Pr(Z), and then apply Bayes' theorem to get Pr(Z | C).
• This technique does better with respect to not allowing Z to influence class formation, but it is very limited with respect to the structural models that can be specified (e.g., one distal at a time; must assume the distal is independent of covariates; etc.).

MIXTURE MODEL BUILDING STEPS
1. Data screening (and an unconditional, saturated non-mixture model, if applicable).
2. Class enumeration process (without covariates):
   a) Enumeration (within each Σk structure, if applicable).
   b) Comparisons of the most plausible models from (a).
   NOTE: You may end up going through this step multiple times, as you may realize you need to modify or reconsider your set of class indicators.
3. Select the final unconditional model.
4. Add potential predictors; consider both prediction of class membership and possibly measurement non-invariance/DIF.
5. Conditional mixture model with distal outcomes: add potential distal outcomes of class membership.

PREDICTORS AND DISTALS = LC MEDIATION!

MODELING EXTENSIONS
• Regression mixture models
• Higher-order latent class models [Diagram: first-order latent class variables C1, C2, C3 indicated by a higher-order C.]
• Multiple-group LCA (uses the KNOWNCLASS option) [Diagram: known-class variable CG with latent class variable C1.]
• Multilevel LCA
• General and specific factor mixture models [Diagram: factors f1, f2, f3 together with latent class variable C.]

MANY OTHER EXTENSIONS
• Latent class causal models
  – Complier average causal effects
  – Latent class causal mediation models
  – Causal effects of latent class membership
• Mixture IRT
• Pattern mixture models for missing data
• Etc., etc., etc.

LONGITUDINAL MIXTURE MODELS

LONGITUDINAL LCA (LLCA) / RMLCA
• Use a latent class variable to characterize longitudinal response patterns.
• The EXACT same modeling process as for LCA/LPA!
• The EXACT same syntax in Mplus.
  – The only difference is that, in your data, u1-uM or y1-yM are single variables measured at multiple time points rather than multiple measures at a single time point.

GROWTH MIXTURE MODELS

GENERAL GROWTH MIXTURE MODEL (GGMM)
[Diagram: repeated measures Y1-Y4 loading on growth factors η0 and η1, with latent class variable c and covariate x predicting the growth factors, and distal outcomes u and z.]

AGGRESSION DEVELOPMENT: CONTROL AND INTERVENTION GROUPS
[Figure: class-specific aggression trajectories for the control and intervention groups.]
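A minimal input sketch along the lines of the GGMM diagram above (illustrative only, with hypothetical variable names y1-y4 for the repeated measures and x for a covariate; two classes):

Variable:
    Names = y1-y4 x;
    Usevariables = y1-y4 x;
    Classes = c(2);

Analysis:
    type = mixture;
    starts = 500 100;

Model:
    %overall%
    i s | y1@0 y2@1 y3@2 y4@3;   ! linear growth: intercept and slope factors
    i s on x;                    ! covariate effects on the growth factors
    c on x;                      ! covariate effect on class membership

Under type = mixture, the growth factor means are class-varying by default, which is what produces the class-specific average trajectories.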
LATENT TRANSITION ANALYSIS (LTA)
• Begin with LCA/LPA models for each time point separately. Use the same exact modeling process as for a single cross-sectional LCA/LPA.
• Bring the latent class variables together in a single model.
  – Watch for label switching, and for actual changes in the measurement model parameters at each wave, once all the time points are in the same model.
• Transition probability matrix:

Time 1 \ Time 2 |  C2 = 1  |  C2 = 2  |  C2 = 3
     C1 = 1     | Pr(1→1)  | Pr(1→2)  | Pr(1→3)
     C1 = 2     | Pr(2→1)  | Pr(2→2)  | Pr(2→3)
     C1 = 3     | Pr(3→1)  | Pr(3→2)  | Pr(3→3)

• There is an LTA 3-step. See the new Mplus Web Note 15 for more information.
• Bring in covariates and distal outcomes using the same approaches as for LCA/LPA.
• LTA with predictors that influence not only class membership at each time point but the transitions as well: [diagram and Mplus specification shown on slides]. You can rearrange the results to address the questions posed by the model above.

MANY OTHER LONGITUDINAL MIXTURE MODELS
• Survival mixture models
• Latent change score mixture models
• Onset-to-growth mixture models
• Associative LTA
• Latent transition growth mixture models
• Etc., etc., etc.

PARTING WORDS

MIXTURE MODELS: LAUDED BY SOME
• Theoretical models that conceptualize individual differences at the latent level as differences in kind, that consider typologies or taxonomies, map directly onto analytic latent class models.
• Mixture models give us a great deal of flexibility in how we characterize population heterogeneity and individual differences with respect to a latent phenomenon.
• They can help avoid the serious distortions that can result from ignoring population heterogeneity if it is, indeed, present.

MIXTURE MODELS: IMPUGNED BY OTHERS
• Latent classes or mixtures may not reflect the Truth.
• Nominalistic fallacy: naming the latent classes does not necessarily make them what we call them or ensure that we understand them.
• Reification: just because the model yields latent classes doesn't mean the latent classes are real or that we've done anything to prove their existence.
• The empirically extracted latent classes depend upon the within- and between-class model specification and the joint distribution of the indicators. Thus, the resultant classes may diverge markedly from the underlying "True" latent structure in the population.
• Do these criticisms sound familiar? They are nearly identical to the critiques of path analysis and SEM in the second half of the 20th century, because some of the same bad modeling practices have reappeared:
  – "Nobody pays much attention to the assumptions, and the technology tends to overwhelm common sense." (Freedman, 1987)

DON'T CUT OFF YOUR LATENT CLASSES TO SPITE YOUR MODEL
• Any model is, at best, an approximation to reality.
• "All models are wrong, but some are useful." (George Box)
• We can evaluate model-theory consistency.
• We can evaluate model-data consistency.
• There are many alternative ways of thinking about relationships in a variable system, and if mixture modeling can be useful in empirically distinguishing between or among alternative perspectives, then it provides important information.
• Understanding individual differences is paramount in social and developmental research.
• The flexibility we gain in the parameterization of individual differences using mixtures extends to flexibility in the prediction of those differences and prediction from those differences.
MIXTURE MODEL CARE AND FEEDING
• Be sure to document your model building and selection very carefully, for yourself and for reviewers. Be prepared to defend your modeling choices in the event you get a review that is more skeptical than most about the methodology.
• Resist the temptation to take your discrete representation of population heterogeneity and claim, interpret, and discuss the resultant classes as if you had established their existence (e.g., if you fit a three-class model and you get a three-class solution, you haven't proved the existence of three classes generally, nor of those three classes specifically).
• In designing studies in which you plan to do LCA/LPA, don't formulate hypotheses such as "There will be four classes of engagement," because the exploratory class enumeration process doesn't actually test K = 4 versus K ≠ 4. This also makes it impossible to compute power.
• Don't be afraid to do some sensitivity analyses to understand the hierarchy of influence in your variable system and the vulnerability of your latent class formations to small shifts in that system.
• Don't check your common sense and broader modeling skills at the door when embarking on LCA/LPA. There are some modeling best practices that translate extremely well to the LCA setting.
• Don't get so overwhelmed by all the fit indices, etc., that you forget to fully evaluate the substantive utility and meaning of the resultant classes.
• Don't be so dazzled by your own results that you aren't able to effectively and critically evaluate them with respect to validity criteria.
• Don't fall so deeply in love with mixture modeling that it becomes your default analytic approach for any multivariate data.

QUESTIONS? THANK YOU!

SELECT REFERENCES & RESOURCES
• Mplus website: www.statmodel.com
• Latent GOLD website: http://statisticalinnovations.com/products/latentgold.html
• Penn State Methodology Center: http://methodology.psu.edu/
• UCLA Institute for Digital Research & Education: https://idre.ucla.edu/stats

For more, see the text and references of:
Masyn, K. (2013). Latent class analysis and finite mixture modeling. In T. D. Little (Ed.), The Oxford handbook of quantitative methods in psychology (Vol. 2, pp. 551-611). New York, NY: Oxford University Press.