Latent Class Analysis (LCA) and Latent Profile Analysis (LPA)

Latent Class Analysis (LCA) and Latent Profile Analysis (LPA) can extend growth modeling when they are estimated as mixture models. We will review both LCA for binary indicators and LPA for continuous indicators. In addition to clarifying the vocabulary for subsequent sections, this section will produce a time varying covariate that is a class variable.

Conceptual Example: Developing a Typology of Parenting Style.

There are many indicators we could use for parenting style. A few of them are:
o How closely the father and mother supervise or monitor the adolescent
o How supportive the father and mother are
o How closely the father and mother control the adolescent's activities
o How often the father and mother praise the child

We could have scores on each of these dimensions for a large number of families, organized with the couple as the unit of analysis. If the scores were quantitative, the first five couples might look like the following:

        His          Her          His      Her      His      Her      His     Her
Couple  Supervision  Supervision  Support  Support  Control  Control  Praise  Praise
A       9            9            9        9        9        9        8       9
B       3            4            9        9        4        4        9       9
C       2            3            1        3        8        8        2       3
D       1            9            1        8        1        9        1       9
E       6            2            8        1        6        1        9       1

If the scores were binary, our data might look like the following. Here we have assigned a 1 to any score above the threshold value of 4.

        His          Her          His      Her      His      Her      His     Her
Couple  Supervision  Supervision  Support  Support  Control  Control  Praise  Praise
A       1            1            1        1        1        1        1       1
B       0            0            1        1        0        0        1       1
C       0            0            0        0        1        1        0       0
D       0            1            0        1        0        1        0       1
E       1            0            1        0        1        0        1       0

LPA and LCA attempt to find subsets of families who share similar patterns of responses.

Latent Class & Latent Profile Analysis 1

Couple A is high on all dimensions, and this is true for both the husband and the wife. Couple B is supportive and praises their adolescent but does little monitoring, a jointly permissive style. Families conforming to this pattern might have very different adolescent outcomes than families of the first type.
The third couple is high only on control. Couples that fit this pattern might be labeled authoritarian. The fourth couple has a mother who scores high on all aspects and a father who is disengaged. The last couple is just the opposite of the fourth couple.

Suppose our measures were binary and we had eight variables, as in the second data matrix. Then there would be 2^8 = 256 possible combinations of zeros and ones, and each of these is an empirically possible response pattern. With 20 binary indicators there would be 2^20 = 1,048,576 unique combinations. LCA seeks subgroups of observations, i.e., classes, that have similar patterns. LCA may be used to confirm or disconfirm a theory, or it can be used in an exploratory way. When the indicators are binary, the method is called LCA. When the indicators are quantitative, it is usually called LPA. This distinction is unimportant when using the Mplus program. These are person centered methods, in contrast to factor analysis, which is variable centered (Muthén and Muthén, 2002).

Figure 1 A Model for Latent Class Analysis and Latent Profile Analysis

We can distinguish four types of variables.

Indicators. Yi and Ui are the observed indicators. In this example the Yi indicators are the father's support, mother's support, father's control, mother's control, etc.
o Although not shown in this figure, each of the quantitative indicators may contain measurement error, and some of these errors may be correlated.
o Allowing correlated errors is optional. It may be needed when the classification fails to fully explain the relationship between two indicators. For example, the errors in father's control and mother's control might be related.
o Continuous indicators are represented by the letter Yi. For categorical indicators, Ui is used.

Latent Classes. We have a single latent class variable in this figure, but it may have several classes. An important question is how many classes to include.

Covariates. Xi is a vector of covariates that predict class membership.
In this example we might have X1 be mother's education and X2 be father's education.
o The Xi variables may be categorical or continuous.
o Covariates are distinguished from indicators by the direction of the relationship (arrows). The classification explains the scores on the indicator variables, Yi and Ui, whereas the Xi covariates explain the classification.

Distal Outcomes. There may be one or more outcome variables that depend on the latent class.
o Each latent class may have a different likelihood of having an adolescent who leaves home early, goes to college, and so on. These variables may be categorical or continuous.
o Note that the figure does not differentiate between the indicators and the distal outcomes. In the estimation, a distal outcome is really treated as an indicator. The distinction between indicators and distal outcomes is conceptual and is based on time ordering.

How does Mplus decide on the class-specific parameters (thresholds or means) for the Yi? It begins with random starting values and then iterates, using an expectation-maximization (EM) type algorithm, until the likelihood function is maximized. Because different starting values can lead to different solutions, this is repeated from many random starts.

An important and somewhat controversial assumption of LCA is local independence: class membership explains all of the association among the indicators. The classes differ in their responses, but within a class the indicators are uncorrelated. As a result, the members of each class are treated as homogeneous apart from random error, and the within-class residual covariances are fixed at zero. There is also an assumption that the empirically identified classes actually exist and are meaningful (see, for example, Raudenbush, 2005). These assumptions make the selection of the number of classes quite subjective. That choice requires theoretical or practical justification beyond the statistical criteria discussed below.

Empirical Example: Level of Implementation

In many real world evaluations of program effectiveness, the level of implementation of the program is difficult to control.
When this is the case, the level of implementation may be an important factor in program effectiveness. Can we identify clusters of youth who report experiencing different levels of implementation of a program? We often compare the effectiveness of being in a treatment group with being in a control group. In a large study with multiple sites and personnel, some participants in a treatment program may experience a very low level of program implementation. Others may have fairly high implementation except for some aspects of the program. Flay, et al. (2006) studied a Positive Action Program in 20 Hawaii schools and measured implementation with 10 items answered by the participating elementary school students. These items assessed the youth's own rating of the level of implementation he or she experienced. The items are:

Variable   Label
s3ptp1     You receive stickers from your teacher for doing po...
s3ptp2     You receive a word of the week card from your teacher?
s3ptp3     You put notes in an icu box?
s3ptp4     Your teacher read notes about you from the icu box?
s3ptp5     Your teacher read your notes from the icu box?
s3ptp6     Your class receive a token for meeting your classro...
s3ptp7     You participate in a positive action assembly?
s3ptp8     Your class receive a balloon in an assembly for ach...
s3ptp9     Your class participate in whole school positive act...
s3ptp12    On how many days most weeks were you taught a posit...

For purposes of illustrating LCA, I dichotomized the responses so that 0 = not implemented and 1 = implemented at some level.
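A minimal Python sketch of this recoding follows. It assumes each child's raw 0 to 3 ratings are available in a list and that -9999 marks a missing response, which is the code the paper's Mplus programs declare. The helper names and the output line layout are hypothetical. The sketch also counts the possible 0/1 response patterns for ten dichotomized items, the same calculation as the 2^8 = 256 patterns in the parenting example.

```python
from itertools import product

MISSING = -9999  # missing-value code used in the paper's data file


def recode_implemented(response):
    """Collapse a 0-3 rating to 0 (not implemented) or 1 (implemented
    at some level); the missing code passes through unchanged."""
    if response == MISSING:
        return MISSING
    if response not in (0, 1, 2, 3):
        raise ValueError(f"unexpected response: {response}")
    return 0 if response == 0 else 1


def to_record(idnum, ratings):
    """Hypothetical free-format output line: idnum, the raw ratings,
    then their dichotomized versions, separated by single spaces."""
    values = [idnum] + list(ratings) + [recode_implemented(r) for r in ratings]
    return " ".join(str(v) for v in values)


# Illustrative child with ten ratings, two of them missing.
ratings = [3, 2, 1, 1, 0, 0, 3, MISSING, 0, MISSING]
print(to_record(639, ratings))

# With 10 binary items the number of possible 0/1 response patterns is 2**10.
n_patterns = sum(1 for _ in product((0, 1), repeat=10))
print(n_patterns)  # 1024
```

Keeping the raw and dichotomized versions side by side in one file mirrors the Names list of the paper's data file, which carries both s3ptp1 and s3ptp1b.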
The Mplus program for this is straightforward. (Getting the data into a format that Mplus can read is discussed briefly in the Appendix, which is followed by all key Mplus programs used in this paper.)

Title: workshop lca2.inp
  Latent Class Analysis of implementation for year 2 classes
Data: File is lcalpa34.dat ;
Variable: Names are idnum s3ptp1 s3ptp2 s3ptp3 s3ptp4 s3ptp5 s3ptp6 s3ptp7
    s3ptp8 s3ptp9 s3ptp12 s4ptp1 s4ptp2 s4ptp3 s4ptp4 s4ptp5 s4ptp6 s4ptp13
    s4ptp7 s4ptp14 s4ptp8 s4ptp9 s4ptp10 s4ptp11 s4ptp12 s3ptp1b s3ptp2b
    s3ptp3b s3ptp4b s3ptp5b s3ptp6b s3ptp7b s3ptp8b s3ptp9b s3ptp12b s4ptp1b
    s4ptp2b s4ptp3b s4ptp4b s4ptp5b s4ptp6b s4ptp7b s4ptp8b s4ptp9b s4ptp10b
    s4ptp11b s4ptp12b s4ptp13b s4ptp14b s3techer room ;
  Missing are all (-9999) ;
  Usevariables are s3ptp1b s3ptp2b s3ptp3b s3ptp4b s3ptp5b s3ptp6b s3ptp7b
    s3ptp8b s3ptp9b s3ptp12b ;
  Categorical are s3ptp1b s3ptp2b s3ptp3b s3ptp4b s3ptp5b s3ptp6b s3ptp7b
    s3ptp8b s3ptp9b s3ptp12b ;
  Classes = c(2) ;
Analysis: Type = Mixture ;
  Starts = 20 2 ;
  processors = 2 ;
Output: Stand Tech11 ;
Plot: Type = Plot3 ;
  series = s3ptp1b(1) s3ptp2b(2) s3ptp3b(3) s3ptp4b(4) s3ptp5b(5)
    s3ptp6b(6) s3ptp7b(7) s3ptp8b(8) s3ptp9b(9) s3ptp12b(10) ;

An Mplus program is organized into sections. The Title: section contains whatever you want to write, and it is simply printed in the output. The Title: section ends when the program sees the keyword for the second section, Data:. The Data: section tells Mplus where the data set is stored. If you store the data in the same folder as the program, you can just say File is xxx ; where xxx is the name of the file. Each statement in this and the subsequent sections must end with a semicolon. The next section, Variable:, describes all the variables and has several subsections. Names are lists the variables in the order in which they appear in the data file. Missing are all (-9999) ; tells Mplus how missing values are coded. The Usevariables are subsection lists the variables you will use in the current analysis, and Categorical are identifies which of them are categorical (here, binary).
Mplus was written to make LCA simple to implement. All we need to do is enter one subsection under the Variable: section. We have entered Classes = c(2) ;, which says there are two classes in this solution. If we wanted three classes, we would say Classes = c(3) ;.

The Analysis: section tells Mplus how to estimate the model. We do this by saying Type = Mixture ; and Starts = 20 2 ;. The term mixture refers to a model that seeks subsets of observations whose distributions differ from that of the combined dataset. The default estimation method for mixture models is maximum likelihood with robust standard errors (MLR in Mplus). Mplus generates random starting values for the parameters, and by default it generates 10 different sets. Our Starts = 20 2 ; tells Mplus to generate 20 different sets of starting values and run a limited number of iterations for each. It then carries the 2 best sets to full convergence. With several classes it is often necessary to use 50 or more starts. Always try a large number of starts for a final solution, to make sure the best log-likelihood value is replicated.

Usually there is a Model: section. It is not needed here because, as soon as we said Classes = c(2) ;, Mplus knew to use all the variables as indicators for identifying two classes. But if we had covariates, Xi, that predict class membership, e.g., age and gender, we would need a Model: section. There we could specify these additional relationships, and we could even have different covariates influence the likelihood of membership in each of the classes.

The Output: section requests Stand (standardized estimates) and Tech11. Tech11 gives technical output that includes the Lo-Mendell-Rubin adjusted likelihood ratio test of whether a model with one fewer class fits as well. This is a somewhat controversial test (Jefferies, 2003). The Plot: section creates a graph that is helpful for comparing classes to see where they differ.
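Choosing the number of classes means fitting a series of models, so it can help to generate the input files rather than edit Classes = c(2) ; by hand. The following Python sketch is hypothetical. The function names, the titles, and the Starts = 100 10 setting for larger models are our own illustrative choices. It rebuilds the Variable: section of the LCA program above for 1 to 5 classes and wraps long statements at 78 characters to stay within Mplus's limit on input line length. Tech11 is requested only when there are at least two classes, because it compares a model with the model that has one fewer class.

```python
import textwrap

ITEMS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 12)
S4_ORDER = (1, 2, 3, 4, 5, 6, 13, 7, 14, 8, 9, 10, 11, 12)
# Variable order copied from the Names are statement of the LCA program.
NAMES = (["idnum"] + [f"s3ptp{i}" for i in ITEMS]
         + [f"s4ptp{i}" for i in S4_ORDER]
         + [f"s3ptp{i}b" for i in ITEMS]
         + [f"s4ptp{i}b" for i in range(1, 15)]
         + ["s3techer", "room"])
BINARY = [f"s3ptp{i}b" for i in ITEMS]


def statement(keyword, words):
    """Wrap one Mplus statement so that no line exceeds 78 characters."""
    text = f"{keyword} {' '.join(words)} ;"
    return "\n".join(textwrap.wrap(text, width=78, initial_indent="  ",
                                   subsequent_indent="    "))


def lca_input(n_classes, starts=(20, 2)):
    """Build a hypothetical LCA input file for a given number of classes."""
    # Tech11 compares k with k - 1 classes, so skip it for one class.
    output = "Output: Stand Tech11 ;" if n_classes > 1 else "Output: Stand ;"
    lines = [
        f"Title: LCA of implementation, {n_classes} classes",
        "Data: File is lcalpa34.dat ;",
        "Variable:",
        statement("Names are", NAMES),
        "  Missing are all (-9999) ;",
        statement("Usevariables are", BINARY),
        statement("Categorical are", BINARY),
        f"  Classes = c({n_classes}) ;",
        "Analysis: Type = Mixture ;",
        f"  Starts = {starts[0]} {starts[1]} ;",
        output,
    ]
    return "\n".join(lines) + "\n"


# More random starts for larger models (an illustrative choice).
programs = {k: lca_input(k, starts=(20, 2) if k <= 2 else (100, 10))
            for k in range(1, 6)}
print(programs[3])
```

Each string in programs can then be written to its own .inp file, for example with open(f"lca_c{k}.inp", "w"), and run in Mplus.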
Annotated Selected Output.

Mplus documentation provides limited examples of how to interpret the rich output it produces, and the output labeling is sometimes less than ideal. Therefore, we provide a relatively detailed annotation of selected output from the LCA. These descriptions will be useful when we get to growth modeling.

The summary of the analysis reports two classes within a single group. We could, for instance, run this analysis simultaneously for boys and girls and treat them as two groups, each with two possible classes. Notice that all of our indicators are dependent variables, which is consistent with our figures. There is a listing of the variables that are binary. If we had a trichotomy, Mplus would pick that up from the data and make appropriate adjustments. When we said Classes = c(2) ; the name of the categorical latent variable became C. We could have used another name.

Following are the sample proportions for each item. We coded an answer of not implemented as 0 and implemented as 1, and Mplus calls these categories 1 and 2, respectively. For the item s3ptp1b, 21.4% are in Category 1 (coded 0 on implementation) and 78.6% are in Category 2 (coded 1 on implementation).

SUMMARY OF CATEGORICAL DATA PROPORTIONS
  S3PTP1B
    Category 1    0.214
    Category 2    0.786
  S3PTP2B
    Category 1    0.432
    Category 2    0.568
  S3PTP3B
    Category 1    0.459
    Category 2    0.541

The following line reporting how the program terminated is good news. We also need to make sure that the best log-likelihood value was replicated across the random starts, and we need to check for warnings:

THE MODEL ESTIMATION TERMINATED NORMALLY

The log-likelihood value is hard to interpret by itself, but it can be used to compare models. Larger (less negative) values indicate better fit. I rely on the Sample-Size Adjusted BIC and the Entropy measure for comparing models. Ideally, the same number of classes will minimize the AIC and BIC measures and have a strong entropy measure. Mplus reports both the Pearson Chi-Square and the Likelihood Ratio Chi-Square.
These are asymptotically equivalent, and they lead to the same decision in this case.

TESTS OF MODEL FIT
Loglikelihood
  H0 Value                                     -8315.092
  H0 Scaling Correction Factor for MLR             1.042
Information Criteria
  Number of Free Parameters                           21
  Akaike (AIC)                                 16672.184
  Bayesian (BIC)                               16784.666
  Sample-Size Adjusted BIC                     16717.954
    (n* = (n + 2) / 24)
  Entropy                                          0.827
Chi-Square Test of Model Fit for the Binary and Ordered Categorical (Ordinal) Outcomes**
  Pearson Chi-Square
    Value                                       2617.543
    Degrees of Freedom                               977
    P-Value                                       0.0000
  Likelihood Ratio Chi-Square
    Value                                       1253.581
    Degrees of Freedom                               977
    P-Value                                       0.0000
** Of the 13864 cells in the latent class indicator table, 25 were deleted in the calculation of chi-square due to extreme values.

Chi-Square Test for MCAR under the Unrestricted Latent Class Indicator Model
  Pearson Chi-Square
    Value                                       2777.125
    Degrees of Freedom                             12772
    P-Value                                       1.0000
  Likelihood Ratio Chi-Square
    Value                                       1039.830
    Degrees of Freedom                             12772
    P-Value                                       1.0000

Three tables show the number of observations in each class for our two class solution. The first two are based on posterior probabilities. The last table is shown here. It assigns each child to the class with the highest probability for him or her, which places 769 children in class 1 and 797 children in class 2.

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
Class Counts and Proportions
  Latent Classes
    1          769    0.49106
    2          797    0.50894

Each person has a probability of being in each class. In the table of average probabilities, the large values should be on the diagonal. Those assigned to class one have an average probability of being in class one of .955. The mean estimated probability of these children being in class two is .045.
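The average latent class probabilities described here are easy to compute from individual posterior probabilities, which Mplus can save with Save = Cprobabilities (used later for the LPA). The following is a minimal sketch using made-up posterior probabilities for six children. The function name is ours, and children are assigned to their most likely class exactly as in the classification table.

```python
def average_class_probabilities(posteriors):
    """Average posterior probabilities by most likely class.

    posteriors: one row per person, one probability per class.
    Returns (counts, table), where table[i][j] is the mean probability of
    class j among the people whose most likely class is i.
    """
    k = len(posteriors[0])
    counts = [0] * k
    sums = [[0.0] * k for _ in range(k)]
    for row in posteriors:
        assigned = max(range(k), key=row.__getitem__)
        counts[assigned] += 1
        for j, p in enumerate(row):
            sums[assigned][j] += p
    table = [[s / c for s in row] if c else [float("nan")] * k
             for row, c in zip(sums, counts)]
    return counts, table


# Hypothetical posterior probabilities for six children and two classes.
posteriors = [[0.98, 0.02], [0.90, 0.10], [0.97, 0.03],
              [0.05, 0.95], [0.10, 0.90], [0.04, 0.96]]
counts, table = average_class_probabilities(posteriors)
print(counts)
for i, row in enumerate(table, start=1):
    print(i, [round(p, 3) for p in row])
```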
Average Latent Class Probabilities for Most Likely Latent Class Membership (Row) by Latent Class (Column)
             1        2
    1      0.955    0.045
    2      0.053    0.947

The $1 at the end of each variable name that follows marks a threshold. Because the variables are binary, there is one threshold per variable, separating the 0's from the 1's. Each class has its own threshold for each item, and that threshold determines the probability of each response in that class. Under the logit parameterization, the probability of Category 1 (a 0) is 1/(1 + exp(-threshold)). A more negative threshold therefore means a higher probability of reporting that the aspect was implemented. The estimated thresholds are difficult to interpret directly. The Est./S.E. is interpreted as a z-score that tests whether the threshold differs from zero, that is, whether the probability of endorsing the item in that class differs from .50. You will need to look up the probability for each z-score. Statistics packages make this simple. For example, in Stata you would enter display 2*(1-normal(abs(z))), where z is the Est./S.E. Why the Mplus output does not do this for us is unknown.

                 Estimates    S.E.   Est./S.E.
Latent Class 1
 Thresholds
  S3PTP1B$1        -2.223    0.134    -16.602
  S3PTP2B$1        -1.074    0.096    -11.196
  S3PTP3B$1        -2.312    0.166    -13.941
  S3PTP4B$1        -1.906    0.169    -11.244
Latent Class 2
 Thresholds
  S3PTP1B$1        -0.714    0.084     -8.531
  S3PTP2B$1         0.425    0.082      5.191
  S3PTP3B$1         1.535    0.140     10.927
  S3PTP4B$1         2.125    0.163     13.013

We can use the probability of being in each category of the individual indicators to assign a meaningful label to each class. This is similar to using loadings to identify latent factors. Previously, we saw that 78.6% of the overall sample was in Category 2 for s3ptp1b (i.e., implemented). In Latent Class 1, 90.2% of individuals were in Category 2, compared with 67.1% of children in Latent Class 2. Thus, Latent Class 2 has a lower level of implementation on this aspect.
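The class-specific probabilities quoted here follow directly from the thresholds. The following is a minimal Python sketch with our own helper names. It assumes the logit link that Mplus uses for binary indicators in mixture models, and it converts the thresholds printed above into category probabilities and into the odds ratios reported next. It also computes a two-sided p-value for an Est./S.E. value, as the Stata command does.

```python
import math


def two_sided_p(z):
    """Two-sided p-value for a z statistic; equivalent to Stata's
    display 2*(1-normal(abs(z)))."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def prob_category1(threshold):
    """Probability of Category 1 (coded 0) for a binary item under the
    logit parameterization: 1 / (1 + exp(-threshold))."""
    return 1.0 / (1.0 + math.exp(-threshold))


def odds_ratio_category2(tau_a, tau_b):
    """Odds of Category 2 in class a relative to class b.
    Under the logit link the odds of Category 2 equal exp(-threshold)."""
    return math.exp(tau_b - tau_a)


# Thresholds copied from the output: (Latent Class 1, Latent Class 2).
THRESHOLDS = {"S3PTP1B": (-2.223, -0.714),
              "S3PTP2B": (-1.074, 0.425),
              "S3PTP3B": (-2.312, 1.535)}

for item, (tau1, tau2) in THRESHOLDS.items():
    p1, p2 = 1 - prob_category1(tau1), 1 - prob_category1(tau2)
    print(f"{item}: P(Category 2) class 1 = {p1:.3f}, class 2 = {p2:.3f}, "
          f"odds ratio = {odds_ratio_category2(tau1, tau2):.2f}")

print(two_sided_p(-8.531) < 0.001)  # True
```

The printed probabilities agree with the Results in Probability Scale (.902 and .671 for s3ptp1b, and so on) to rounding. The last odds ratio comes out near 46.85, close to the 46.844 Mplus reports from unrounded thresholds.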
RESULTS IN PROBABILITY SCALE
                       Estimate    S.E.   Est./S.E.
Latent Class 1
 S3PTP1B
   Category 1            0.098    0.012      8.276
   Category 2            0.902    0.012     76.451
 S3PTP2B
   Category 1            0.255    0.018     13.977
   Category 2            0.745    0.018     40.933
 S3PTP3B
   Category 1            0.090    0.014      6.627
   Category 2            0.910    0.014     66.898
Latent Class 2
 S3PTP1B
   Category 1            0.329    0.018     17.809
   Category 2            0.671    0.018     36.357
 S3PTP2B
   Category 1            0.605    0.020     30.916
   Category 2            0.395    0.020     20.219
 S3PTP3B
   Category 1            0.823    0.020     40.158
   Category 2            0.177    0.020      8.654

We can also interpret the odds ratios. A child in Latent Class 1 has 46.84 times the odds of a child in Latent Class 2 of reporting that the program aspect in s3ptp3b was implemented (Category > 1). These odds ratios highlight where differences are most pronounced. It is informative that those in Latent Class 1 have greater odds of endorsing every single indicator of implementation, sometimes far greater odds. Class 1 could therefore be described as high implementation and Class 2 as low implementation.

LATENT CLASS ODDS RATIO RESULTS
Latent Class 1 Compared to Latent Class 2
                             Estimate    S.E.   Est./S.E.
 S3PTP1B   Category > 1        4.525    0.734      6.165
 S3PTP2B   Category > 1        4.478    0.570      7.852
 S3PTP3B   Category > 1       46.844    8.909      5.258

The Results in Probability Scale are extremely useful for understanding how the classes differ on these indicators. The following table is not in the output but was generated from it. The Overall Proportion comes from the initial results, before the classes were created. The proportion for each class comes from the Results in Probability Scale. The differences indicate that, if we stick with two classes, we clearly have a low and a high implementation group. The second class, with 797 children, has a lower proportion reporting each item as implemented at all. Some of the differences are substantial and some are enormous.
For example, just 11% of the children in the low implementation class report that their teacher read notes about them from the ICU box, compared with 87% of the children classified as high implementation. For 4 of the 10 items, the proportion in the high implementation group is at least double that in the low implementation group. These are the three ICU box items and the assembly balloon. Here is the proportion reporting each item as implemented:

                                   Overall      Two Class Solution
Indicator                          Proportion   First Class (High)   Second Class (Low)
Stickers for PA                    .79          .90                  .67
Word of the week                   .57          .74                  .40
You put notes in the ICU box       .54          .91                  .18
Teacher read ICU notes about you   .48          .87                  .11
Teacher read your ICU notes        .48          .88                  .08
Tokens for meeting goals           .68          .84                  .52
PA Assembly activities             .71          .80                  .62
Assembly Balloon for PA            .33          .45                  .22
Whole school PA                    .62          .73                  .51
Days/wk taught PA                  .89          .93                  .86
N                                  1566         769                  797

We asked for a plot of these probabilities. We can get it from Mplus by clicking Graph on the top bar. We say we want the probability of one category and pick Category 2. Recall that Category 1 means the aspect of the Positive Action Program was not implemented and Category 2 means it was implemented.

Figure 3 Latent Class Solution with Two Classes

The final output provides a test of whether a two class solution like this one does significantly better than a one class solution. The Lo-Mendell-Rubin adjusted likelihood ratio test has a computed value of 1998.54, which is statistically significant at the p < .001 level. Thus two classes significantly improve fit over a single class.

TECHNICAL 11 OUTPUT
 VUONG-LO-MENDELL-RUBIN LIKELIHOOD RATIO TEST FOR 1 (H0) VERSUS 2 CLASSES
   H0 Loglikelihood Value                    -9326.708
   2 Times the Loglikelihood Difference       2023.233
   Difference in the Number of Parameters           11
   Mean                                         10.857
   Standard Deviation                           14.085
   P-Value                                      0.0000
 LO-MENDELL-RUBIN ADJUSTED LRT TEST
   Value                                      1998.535
   P-Value                                      0.0000

Deciding on the Number of Classes.
There is no compelling statistical answer to this question, and the user needs to combine theory, the goals of the study, and the statistical criteria. Latent Class Analysis and its extension to the growth mixture models discussed later in the paper share a potentially serious problem: the danger of over-extraction. Multiple classes may be identified even though they could result from a chance process (Bauer and Curran, 2002; Muthén, 2002; Nylund, Asparouhov, & Muthén, 2006). This is a special problem when variables are not normally distributed. Mplus provides several criteria:

Akaike (AIC). AIC = -2*LogLikelihood + 2p, where p is the number of free parameters (21 in our two class model). Smaller is better.

Bayesian Information Criterion (BIC). BIC = -2*LogLikelihood + p*ln(n), where p is the number of free parameters and n is the sample size (1,566 here). Smaller is better.

Sample Size Adjusted BIC. Adj BIC = -2*LogLikelihood + p*ln((n + 2)/24). Smaller is better. Muthén reports that simulation studies indicate this is superior to BIC.

Entropy. This is a measure of how clearly distinguishable the classes are, based on how close each individual's estimated class probabilities are to 0 or 1. It ranges from 0 to 1, and values closer to 1 indicate clearer classification.

Lo, Mendell, and Rubin likelihood ratio test. This test compares a model with k classes to a model with k - 1 classes and uses a special distribution (not chi-square) to estimate the probability. It is somewhat controversial because it can show a significant need for at least two classes when random data are generated from a single, skewed population.

Here are the results for our analysis when we compare 1 to 5 class solutions.
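Before turning to those results, the criteria defined above can be checked by hand. The following is a minimal Python sketch with our own helper names. It uses the formulas as written above. For entropy it uses the standard relative-entropy formula 1 - Σ(-p ln p)/(n ln K), which we understand to be the version Mplus reports. For the two class LCA it should reproduce the AIC of 16672.184, BIC of 16784.666, and adjusted BIC of 16717.954 shown earlier.

```python
import math


def information_criteria(loglik, n_params, n):
    """AIC, BIC and sample-size adjusted BIC from a log-likelihood."""
    deviance = -2.0 * loglik
    aic = deviance + 2 * n_params
    bic = deviance + n_params * math.log(n)
    adj_bic = deviance + n_params * math.log((n + 2) / 24)
    return aic, bic, adj_bic


def relative_entropy(posteriors):
    """1 - sum(-p * ln p) / (n * ln K): near 1 when each person's posterior
    probabilities are close to 0 or 1, and 0 when they are all uniform."""
    n, k = len(posteriors), len(posteriors[0])
    uncertainty = sum(-p * math.log(p)
                      for row in posteriors for p in row if p > 0)
    return 1.0 - uncertainty / (n * math.log(k))


# Two class LCA: log-likelihood -8315.092, 21 free parameters, n = 1566.
aic, bic, adj_bic = information_criteria(-8315.092, 21, 1566)
print(f"AIC {aic:.3f}  BIC {bic:.3f}  adjusted BIC {adj_bic:.3f}")

print(relative_entropy([[0.99, 0.01], [0.02, 0.98], [0.60, 0.40]]))
```

For the one class model (log-likelihood -9326.708 in the Tech11 output, 10 free parameters), information_criteria(-9326.708, 10, 1566) gives approximately 18673, 18727, and 18695.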
                      1 Class     2 Classes   3 Classes   4 Classes   5 Classes
AIC                   18673       16672       16440       16296       16258
BIC                   18727       16785       16611       16526       16548
Sample Adjusted BIC   18695       16718       16510       16389       16376
Entropy               na          .827        .748        .684        .683
Lo, Mendell, Rubin    na          2v1 = 1998  3v2 = 251   4v3 = 164   5v4 = 37
                                  p = .0000   p = .0000   p = .0000   p = .1414
N for each class      C=1566      C1=769      C1=749      C1=384      C1=432
                                  C2=797      C2=447      C2=395      C2=107
                                              C3=370      C3=446      C3=243
                                                          C4=341      C4=448
                                                                      C5=336

The Lo, Mendell, Rubin test finds that 2 classes do better than a single class and that 3 or 4 classes do even better. However, 5 classes do not improve significantly on 4 (p = .14). The Sample Adjusted BIC improves with each additional class, although the big drops are between 1 and 2 classes and between 2 and 3 classes. The 2 class solution shows an even split, whereas the 3 class solution has a normative response and then two special classes. The Entropy is good for 2 classes but drops noticeably for 3 or more classes. We saw that the figure for the 2 class solution made a lot of sense, with one class clearly high implementation and the other low implementation. The figure for the 3 class solution shows how class 2 relates to the other two classes. It is similar to class 1 on 5 indicators (1, 7, 8, 9, and 10) and similar to class 3 on 3 indicators (3, 4, and 5). On the remaining two indicators (2 and 6) it falls halfway between. Choosing between a two class solution and a three class solution would depend largely on your research goals.

Figure 4 Latent Class Analysis Solution with Three Classes

Latent Profile Analysis.

In the LCA we collapsed the response options for each item into endorsed versus not endorsed. Actually, each item was on a four point scale from 0 to 3. A 0 represented that the aspect of the program was not implemented, and 1, 2, and 3 represented the degree to which it was implemented. Mplus makes it possible to perform the equivalent of LCA on continuous variables. It can also be done when the indicators are a mixture of continuous and categorical variables.
We will use the continuous measures to do what is most often called LPA.

Title: workshop LPA.inp
  latent profile analysis of implementation for year 3
Data: File is lcalpa34.dat ;
Variable: Names are idnum s3ptp1 s3ptp2 s3ptp3 s3ptp4 s3ptp5 s3ptp6 s3ptp7
    s3ptp8 s3ptp9 s3ptp12 s4ptp1 s4ptp2 s4ptp3 s4ptp4 s4ptp5 s4ptp6 s4ptp13
    s4ptp7 s4ptp14 s4ptp8 s4ptp9 s4ptp10 s4ptp11 s4ptp12 s3ptp1b s3ptp2b
    s3ptp3b s3ptp4b s3ptp5b s3ptp6b s3ptp7b s3ptp8b s3ptp9b s3ptp12b s4ptp1b
    s4ptp2b s4ptp3b s4ptp4b s4ptp5b s4ptp6b s4ptp7b s4ptp8b s4ptp9b s4ptp10b
    s4ptp11b s4ptp12b s4ptp13b s4ptp14b s3techer room ;
  Missing are all (-9999) ;
  Usevariables are s3ptp1 s3ptp2 s3ptp3 s3ptp4 s3ptp5 s3ptp6 s3ptp7 s3ptp8
    s3ptp9 s3ptp12 ;
  Classes = c(2) ;
  Cluster = s3techer ;
  Idvariable = idnum ;
Analysis: Type = Mixture Complex ;
  Starts = 40 2 ;
Output: samp Stand Tech11 ;
Plot: Type = Plot3 ;
  series = s3ptp1(1) s3ptp2(2) s3ptp3(3) s3ptp4(4) s3ptp5(5) s3ptp6(6)
    s3ptp7(7) s3ptp8(8) s3ptp9(9) s3ptp12(10) ;
Savedata: File is wave3.dat ;
  Save = Cprobabilities ;
  Format is F6.0 ;

The LPA program has no Categorical are subsection, so all the indicators are treated as continuous. We also ask for sample statistics for the continuous variables (means, variances, covariances, and correlations). We did this in the Output: section with the keyword Sampstat, abbreviated as samp in the program.

We've added two new features that we could also have used with the binary indicators. The first deals with clustering. There are over 100 teachers for our students, and the students within each classroom will be more homogeneous in their ratings of implementation than they would be if they had been sampled independently. We can adjust for this intraclass correlation with two additions to our program. First, under the Variable: section we add the subcommand Cluster = s3techer ; where s3techer identifies each child's teacher.
Secondly, because this acknowledges that our sample is complex rather than a simple random sample, we add the keyword Complex under the Analysis: section. This is all we need to do to obtain standard errors that account for the clustering of students within classrooms.

We are going to use the results of this LPA in our presentation of growth curves. The implementation class for each child at wave 3, and again at wave 4, will be used as a time varying covariate in our growth curve. Therefore, we need to save a file containing the identification variable, idnum, and the classification. We do this by making two changes in the program. First, under the Variable: section we add the subcommand Idvariable = idnum ; where idnum is the name of the identification variable. The same name is used in this dataset and in the dataset we will merge with the created file. Second, we add a new section at the end of the program:

Savedata: File is wave3.dat ;
  Save = Cprobabilities ;
  Format is F6.0 ;

The Savedata: section needs the name of the file we will save. This new file will go to the folder that contains this program file. We also need to tell Mplus what to save. The keyword Cprobabilities saves each child's class probabilities and most likely class. Along with the analysis variables, the file contains the variable we identified as the Idvariable and the class of each child, coded 1 or 2 when we have two classes. It also saves several other variables. The resulting file has its variables in the order listed at the end of the output. One problem is that a missing value appears as an asterisk, *, without a space preceding it:

3. 3. 2. 2. 3. 2. 1. 0. 0. 2. 2702 0. 1. 2. 80
3. 3. 2. 0. 2. 0. 3. 0. 3. 4. 4976 0. 1. 2. 45
2. 1. 0. 2. 0. 3. 3. 0. 3. 3. 2488 1. 0. 1. 10
2. 2. 3. 3. 2. 2. 1. 0. 1. 3. 2763 0. 1. 2. 54
0. 0. 2. 0. 0. 0. 0. 0. 0. 3. 2368 1. 0. 1. 63
2. 0. 0. 0. 0. 3. 3. 3. 3. 1. 1195 1. 0. 1. 41
3. 3. 3. 3. 3. 3. 3. 0. 1. 3. 2747 0. 1. 2. 73
2. 3. 2. 3. 2. 2. 3. 3. 2. 2. 2777 0. 1. 2. 1
1. 0. 0. 0. 0. 0. 0. 0. 0. 2. 5295 1. 0. 1. 65
1. 2. 2. 0. 0. 0. 0. 0. 1.* 1668 1. 0. 1. 53
3. 2. 1. 1. 0. 0. 3.* 0.* 639 1. 0. 1. 55

The last case has scores of 3 2 1 1 0 0 3 . 0 . 639 1 0 1 55. You can see that some editing will be needed to bring this into a standard statistics package such as Stata or SAS.

Let's examine selected output, focusing only on differences between LCA and LPA. We get three warnings. The first tells us that, within each class, all the variables are uncorrelated with all other variables. This is intended, because we are forcing the latent classes to explain the correlations. Our model might not fit because some items are more or less correlated than the classification can explain. In that case we could add explicit correlations between error terms. The second warning tells us that 16 children have missing values on the cluster variable, and Mplus must drop these observations. The third warning tells us that 407 children have missing values on all of our indicators. We have data for these children in at least one wave but no data for wave 3, so their observations are deleted.

*** WARNING in Model command
  All variables are uncorrelated with all other variables within class.
  Check that this is what is intended.
*** WARNING
  Data set contains unknown or missing values for GROUPING,
  PATTERN, COHORT and/or CLUSTER variables.
  Number of cases with unknown or missing values:  16
*** WARNING
  Data set contains cases with missing on all variables.
  These cases were not included in the analysis.
  Number of cases with missing on all variables:  407

With binary variables we compared the proportion endorsing each item in each class. With continuous variables we compare means. Here are the overall sample means, which we can compare with the corresponding means for each class.
ESTIMATED SAMPLE STATISTICS
 Means
     S3PTP1     S3PTP2     S3PTP3     S3PTP4     S3PTP5
      1.745      1.143      1.203      1.000      0.996
     S3PTP6     S3PTP7     S3PTP8     S3PTP9     S3PTP12
      1.552      1.526      0.612      1.207      2.415

We get a different number of observations in each class using the continuous information than we did using the binary information. This may reflect the greater sensitivity of the 4-point scale compared with the dichotomy.

CLASSIFICATION OF INDIVIDUALS BASED ON THEIR MOST LIKELY LATENT CLASS MEMBERSHIP
Class Counts and Proportions
  Latent Classes
    1         1021    0.65871
    2          529    0.34129

Instead of the probability of endorsing each item in each class, we now get estimated class means. For example, the estimated mean on the s3ptp4 item is 2.24 for Latent Class 1 and .36 for Latent Class 2. The LPA has a different number of children in each class than the LCA. Even so, it is interesting that the LCA probabilities and the LPA means for the two class solution are so consistent.

MODEL RESULTS
                 Estimates    S.E.   Est./S.E.      Std     StdYX
Latent Class 1
 Means
  S3PTP1             2.179    0.047     46.182    2.179     1.916
  S3PTP2             1.797    0.055     32.768    1.797     1.651
  S3PTP3             2.387    0.046     51.463    2.387     2.542
  S3PTP4             2.243    0.053     42.551    2.243     2.851
Latent Class 2
 Means
  S3PTP1             1.517    0.039     38.889    1.517     1.334
  S3PTP2             0.806    0.035     23.132    0.806     0.740
  S3PTP3             0.587    0.041     14.373    0.587     0.625
  S3PTP4             0.357    0.037      9.634    0.357     0.454

We can compare the classes by putting these results into tables we create from the output. Here are the results for the two class and three class solutions, followed by a summary table of different measures of fit.
Table of Means: Two Class Solution
                                   Overall      Two Class Solution
Variable                           Item Means   First Class   Second Class
Stickers for PA                    1.74         2.18          1.52
Word of the week                   1.14         1.80           .81
You put notes in icu box           1.20         2.39           .59
Teacher read ICU notes about you   1.00         2.24           .36
Teacher read your ICU notes         .99         2.46           .24
Tokens for meeting goals           1.55         2.16          1.24
PA Assembly activities             1.52         1.96          1.30
Assembly Balloon for PA             .62          .93           .45
Whole school PA                    1.21         1.55          1.03
Days/wk taught PA                  2.41         2.78          2.24
N                                  1,550        1,021         529

Table of Means: Three Class Solution
                                   Overall      Three Class Solution
Variable                           Item Means   First Class   Second Class   Third Class
Stickers for PA                    1.74         1.82          1.48           2.22
Word of the week                   1.14         1.06           .80           1.90
You put notes in icu box           1.20         1.42           .46           2.52
Teacher read ICU notes about you   1.00         1.12           .27           2.38
Teacher read your ICU notes         .99         1.28           .01           2.76
Tokens for meeting goals           1.55         1.60          1.20           2.22
PA Assembly activities             1.52         1.38          1.33           2.03
Assembly Balloon for PA             .62          .67           .41            .97
Whole school PA                    1.21         1.16          1.02           1.63
Days/wk taught PA                  2.41         2.40          2.46           2.80
N                                  1550         332           808            410

Criteria for Assessing Fit for Different Numbers of Classes
                           1 Class     2 Classes   3 Classes   4 Classes
AIC                        48125       44239       43574       42783
BIC                        48232       44405       43798       43067
Sample Size Adjusted BIC   48169       44306       43665       42898
Entropy                    na          .930        .929        .955
Lo, Mendell, Rubin Test    na          2v1 = 3361  3v2 = 679   4v3 = 803
                                       p = .001    p = .075    p = .008
N for each class           C=1550      C1=1021     C1=332      C1=890
                                       C2=529      C2=808      C2=146
                                                   C3=410      C3=137
                                                               C4=377

These results are somewhat inconsistent. The information criteria keep improving through four classes, and the Lo, Mendell, Rubin test favors four classes over three but not three over two. We selected the two class solution for our time varying covariate. It has a good Entropy value, .930, and it does significantly better than a single class. In addition, a three class solution does not do significantly better than a two class solution.
Importantly, the two class solution separates children who report higher implementation of every aspect of the Positive Action program from children in a low implementation class. One caution about the class sizes: weighting the two class means by 1,021 and 529 does not reproduce the overall item means. For Teacher read your ICU notes, it gives about 1.70 rather than .99. Weighting the First Class means by 529 and the Second Class means by 1,021 does reproduce them. The class labels therefore appear to have been switched between the run that produced the means and the run that produced the class counts. If so, the high implementation class is the smaller group of 529 children.

The Savedata: command yields the following results:

SAVEDATA INFORMATION
  Order and format of variables
    S3PTP1         F6.0
    S3PTP2         F6.0
    S3PTP3         F6.0
    S3PTP4         F6.0
    S3PTP5         F6.0
    S3PTP6         F6.0
    S3PTP7         F6.0
    S3PTP8         F6.0
    S3PTP9         F6.0
    S3PTP12        F6.0
    IDNUM          F6.0
    CPROB1         F6.0
    CPROB2         F6.0
    C              F6.0
    S3TECHER       I3
  Save file
    wave3.dat
  Save file format
    14F6.0 I3
  Save file record length    1000

This file contains several variables:
o the score of each observation on each indicator
o the identification number
o the probabilities of membership in class 1 and class 2 (CPROB1 and CPROB2, which the F6.0 format rounds to 0 or 1)
o a variable labeled C that is 1 if the child is in Latent Class 1 and 2 if the child is in Latent Class 2
o the cluster variable value

I opened this file in a text editor and dropped all the variables except the identification number and the class, C. I repeated this analysis for wave 4 and did the same. I then merged these two datasets with the data for the growth curve analysis.
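The editing described above can also be scripted. The following is a minimal Python sketch with our own function names. It assumes the variable order reported under SAVEDATA INFORMATION, and it assumes a missing value is written as an asterisk that may be glued to the preceding value. The two sample records follow the layout Mplus wrote; the second is the last case quoted earlier. The wave 4 classes are invented placeholders, because the wave 4 file would have its own variable list. The sketch keeps IDNUM and C from each record and merges the waves into one row per child.

```python
import re

# Variable order reported under SAVEDATA INFORMATION for wave 3.
WAVE3_COLUMNS = ["S3PTP1", "S3PTP2", "S3PTP3", "S3PTP4", "S3PTP5",
                 "S3PTP6", "S3PTP7", "S3PTP8", "S3PTP9", "S3PTP12",
                 "IDNUM", "CPROB1", "CPROB2", "C", "S3TECHER"]


def parse_record(line, columns):
    """Split one saved record into numbers, mapping each '*' to None.

    A '*' can be glued to the preceding value (e.g. '3.*'), so a token is
    either a lone '*' or a run of characters that are neither space nor '*'.
    """
    tokens = re.findall(r"\*|[^\s*]+", line)
    if len(tokens) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(tokens)}")
    return {name: (None if tok == "*" else float(tok))
            for name, tok in zip(columns, tokens)}


def class_by_id(lines, columns):
    """Keep only IDNUM and the most likely class C for one wave."""
    result = {}
    for line in lines:
        if line.strip():
            rec = parse_record(line, columns)
            result[int(rec["IDNUM"])] = int(rec["C"])
    return result


def merge_waves(wave3, wave4):
    """One row per child: (idnum, class at wave 3, class at wave 4);
    None marks a wave in which the child was not classified."""
    ids = sorted(set(wave3) | set(wave4))
    return [(i, wave3.get(i), wave4.get(i)) for i in ids]


wave3_lines = ["1. 2. 2. 0. 0. 0. 0. 0. 1.* 1668 1. 0. 1. 53",
               "3. 2. 1. 1. 0. 0. 3.* 0.* 639 1. 0. 1. 55"]
wave3 = class_by_id(wave3_lines, WAVE3_COLUMNS)
wave4 = {639: 2, 2702: 1}  # hypothetical wave 4 classes, for illustration only
for row in merge_waves(wave3, wave4):
    print(row)
```

Children who were not classified at a wave, like the 407 children without wave 3 data, simply receive None for that wave. That value can then be recoded to whatever missing-value convention the growth model software expects.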