Dimension Reduction and Latent Class Analysis: A Simple Method for Interpretation of Latent Class Analysis Parameters, with Possible Implications for Factor Analysis of Dichotomous and Ordered-Category Measures

John S. Uebersax
Department of Public Health Sciences
The Bowman Gray School of Medicine of Wake Forest University
Winston-Salem, North Carolina 27157-1063

March 1992 (rev. January 1999: Table 3 and Table 4 added)

Abstract

This note describes a dimension-reduction method for interpreting latent class analysis results. The method also provides a new way to factor analyze dichotomous and ordered-category variables. The procedure is remarkably simple--it requires little more than probit transformation of latent class parameter estimates and standard eigenanalysis. Implications and possible extensions of the method are discussed.

Paper presented at the Annual Meeting of the Classification Society of North America, Pittsburgh, June 1993.

1. Introduction

This paper describes a procedure to help interpret the results of latent class analysis (LCA). The method also appears to provide a simple and general method for factor analysis of dichotomous or ordered-category items. The procedure is computationally simple and can be accomplished with routinely available software. Unlike other approaches for factor analysis of ordered-category variables, the method does not assume a latent multivariate normal distribution.

For readers unfamiliar with LCA, the essential concepts are given below; other readers may wish to proceed directly to Section 2. Goodman (1974), Lazarsfeld and Henry (1968), and McCutcheon (1987) provide good introductions to LCA.

LCA is analogous to cluster analysis for categorical data (indeed, recent work, for example, Uebersax 1992, suggests the relationship is even stronger than has been commonly supposed). The goal of LCA is to identify groups (latent classes) of related cases based on observed patterns of traits (manifest variables). Central to LCA is the concept of conditional independence. This stipulates that, within each latent class, the manifest variables are statistically independent. With, for example, three manifest variables, the model is expressed as

    πijk = Σc=1,C πc πijk|c,                                (1)

and

    πijk|c = πi|c πj|c πk|c,                                (2)

where πijk is the joint probability of observing a case with levels i, j, and k on manifest variables 1, 2, and 3; πc is the prevalence of latent class c (c = 1, ..., C); πijk|c is the joint probability of observing (i, j, k) for a case that belongs to latent class c; and, for example, πi|c is the conditional probability (response probability) of observing level i on manifest variable 1 given membership in latent class c.

The basic parameters--latent class prevalences and response probabilities--are estimated from the observed frequencies of cases cross-classified on the manifest variables. A simple EM algorithm (Dempster, Laird & Rubin, 1977) described by Goodman (1974) provides ML parameter estimates. LCA software, such as MLLSA (Clogg, 1977) and PANMARK (van de Pol, Langeheine & de Jong, 1989), is readily available.
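To make the conditional independence model in (1) and (2) concrete, the following minimal Python sketch computes joint response-pattern probabilities from class prevalences and response probabilities. The prevalence and response-probability values are hypothetical, chosen only for illustration.

# Minimal sketch of the conditional independence model in (1) and (2),
# for three dichotomous manifest variables and two hypothetical latent classes.
# All numbers below are illustrative, not estimates from any data set.
import numpy as np

prevalence = np.array([0.6, 0.4])            # pi_c, c = 1, ..., C

# response_prob[c, r, i] = P(level i on manifest variable r | latent class c)
response_prob = np.array([
    [[0.90, 0.10], [0.80, 0.20], [0.85, 0.15]],   # class 1
    [[0.30, 0.70], [0.20, 0.80], [0.25, 0.75]],   # class 2
])

def joint_prob(i, j, k):
    """pi_ijk = sum over c of pi_c * pi_i|c * pi_j|c * pi_k|c (equations 1 and 2)."""
    return sum(prevalence[c]
               * response_prob[c, 0, i]
               * response_prob[c, 1, j]
               * response_prob[c, 2, k]
               for c in range(len(prevalence)))

# The joint probabilities over all 2 x 2 x 2 response patterns sum to 1.
total = sum(joint_prob(i, j, k) for i in range(2) for j in range(2) for k in range(2))
print(round(total, 6))   # 1.0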
2. Motivation

With more than two or three manifest variables, the results of LCA can be hard to interpret. This is especially true when the variables have multiple response levels and there are many latent classes. This situation is illustrated in Table 1.

----------------------
Insert Table 1 here
----------------------

Table 1 shows the response probability estimates for a latent class model for eight items of a student survey. The items concern various problem behaviors. The items and their analysis are discussed elsewhere (Uebersax & Hansen, 1993); here we consider the results only as a typical example. The model summarized has five latent classes. The table shows the conditional probability of each response level (1 = never; 2 = once; 3 = twice or more) for each problem behavior in each latent class. The main thing to appreciate is the large number of parameters to be interpreted. A more condensed way to represent the table's structure would clearly be useful.

3. A Data Reduction Method

We now consider a very simple method to reduce such data. The first step probit-transforms the response probabilities. Averages of the probit values are then used to estimate the locations of the latent classes on latent continuous variables. The latent class locations are then analyzed by principal components, and the results plotted.

Model and notation

We assume C latent classes and R manifest ordered-category variables with, for simplicity, I levels each. Each manifest variable r is assumed to be a discretized version of a latent continuous variable xr. Let βcr denote the location of latent class c on xr; βcr corresponds to the mean value for members of latent class c on xr (see Figure 1).

----------------------
Insert Figure 1 here
----------------------

Due to random variation (measurement error), we assume members of latent class c are normally distributed around βcr; measurement error variance is assumed constant across latent classes, and we denote it by σer. Let {τir}, with i = 2, ..., I, denote a series of ordered thresholds on xr; if the value of a case on xr exceeds τir, the manifest rating will be level i or above. Without loss of generality, we assume σer = 1 for all r. We also "center" the thresholds by assuming Σi=2,I τir = 0 for all r. The assumptions above are plausible and similar to those of common measurement models.

Step 1. Probit transformation of response probabilities

The first step entails probit transformation of the response probabilities. We define Pirc as the probability of observing response level i or above on variable r for a member of latent class c. That is, Pirc is the sum of response probabilities:

    Pirc = Σj=i,I πj|cr.                                    (3)

The probability that a member of latent class c receives rating level i or above on manifest variable r is equal to the probability that the case exceeds threshold i on latent variable xr. Therefore

    Pirc = 1 - Φ(τir - βcr) = Φ(βcr - τir),                 (4)

where Φ(z) is the cdf of the standard normal curve. It follows that

    probit(Pirc) = βcr - τir                                (5)

and, rearranging,

    βcr = τir + probit(Pirc),                               (6)

where we define probit(p) as the z-score on the standard normal curve to the left of which proportion p of the area falls, easily calculated or obtained from a table.

A practical question arises of how to treat probit values when Pirc is equal or nearly equal to 1 or 0, since probit(1) = +inf and probit(0) = -inf. A simple expedient is to place upper and lower limits on probit(Pirc), for example, +/-2.75 or +/-3.00; this is equivalent to regarding, for example, values of Pirc greater than .999 (or less than .001) as negligibly different from 1 (or 0).
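As a small illustration of Step 1, the Python sketch below (using SciPy's normal quantile function as the probit) forms the cumulative probabilities of equation (3) and probit-transforms them, limiting extreme values to +/-2.75. The response probabilities in the array are made up for the example and are not taken from Table 1.

# Minimal sketch of Step 1: cumulative response probabilities (3) and their
# probit transform, with extreme values limited to +/-2.75 as suggested above.
# The small response-probability array is illustrative, not from Table 1.
import numpy as np
from scipy.stats import norm

# resp[c, r, i]: probability of response level i (0-based) on variable r, class c
resp = np.array([
    [[0.84, 0.09, 0.07], [1.00, 0.00, 0.00]],   # class 1, variables 1-2
    [[0.21, 0.00, 0.79], [0.09, 0.00, 0.91]],   # class 2, variables 1-2
])
LIMIT = 2.75

# P[c, r, i] = probability of level i or above, for i = 2, ..., I (equation 3)
P = resp[..., ::-1].cumsum(axis=-1)[..., ::-1][..., 1:]

# probit(Pirc); clipping handles probit(0) = -inf and probit(1) = +inf
probits = np.clip(norm.ppf(P), -LIMIT, LIMIT)
print(probits.round(2))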
Step 2. Estimation of latent class locations

We now use these results to estimate the latent class location parameters {βcr}. For a given latent class c and variable r, there are I - 1 equations of form (5), one for each threshold τir. For example, with I = 4 response levels, we have

    βcr - τ2r = probit(P2rc)
    βcr - τ3r = probit(P3rc)
    βcr - τ4r = probit(P4rc).

Averaging both sides of the equations above gives the result βcr = 1/3 Σi=2,I probit(Pirc); the τir terms cancel because of the constraint Σi τir = 0. In general, then, the βcr parameters can be estimated as

    βcr' = 1/(I - 1) Σi=2,I probit(Pirc)                    (7)

for each combination of c and r. There are potentially better ways to estimate the βcr parameters from the probit values; however, the simplicity of the present approach is a considerable advantage.

Step 3. Dimensional analysis

Application of Steps 1 and 2 above to the data in Table 1 produces the results in Table 2. These values represent the locations of the latent classes on eight latent continuous variables. A natural question to ask is: what is the dimensionality of these variables? To answer this, one can approach these data just as one would any set of scores on continuous measures and apply standard factor analysis techniques. Thus, one can calculate the correlations between latent variables as the simple row correlations of Table 2 (an alternative is to weight row elements by the latent class prevalences; see Discussion). The eigenstructure of the resulting correlation matrix can be determined with the usual methods.

When unweighted row correlations are calculated for Table 2 and analyzed by principal components analysis (PCA), the results indicate a two-dimensional solution; the first component accounts for 83.8% of the variance, and the first two components together account for 97.1%. The strong first component makes sense and reflects a general association among all of the problem behaviors. But the emergence of the second factor shows that the data are also more complex than a simple one-dimensional model would imply.

Figure 2 illustrates the two-component, varimax-rotated solution. Variables and latent classes are plotted on the same axes. Variables are plotted according to their factor loadings, and latent classes by their factor scores (the values are supplied in Table 3 and Table 4, respectively). The main point to appreciate is that the configuration portrays differences among the latent classes by the differences in their locations relative to two underlying latent factors. For example, latent class 5 is high on both factors, whereas latent class 2 is relatively high on Factor 1 but low on Factor 2.

-------------------------------------------------
Insert Table 3, Table 4, and Figure 2 here
-------------------------------------------------

Factors can be interpreted by considering the loadings of the different variables on the factors, just as with ordinary factor analysis. Figure 2 has lines added that correspond to the variables classroom and parents--these are the variables that correspond most closely to Factors 1 and 2, respectively.
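The following Python sketch strings Steps 2 and 3 together: location estimates are obtained by averaging the probit values as in (7), and the correlations among the resulting latent variables are then eigenanalyzed. The probits argument is assumed to be arranged as (classes, variables, thresholds), as in the Step 1 sketch above; the function names are illustrative.

# Minimal sketch of Steps 2 and 3: location estimates via equation (7), then
# eigenanalysis of the row correlations of the location matrix (Table 2 layout).
import numpy as np

def class_locations(probits):
    """Equation (7): beta'_cr = 1/(I-1) * sum over i of probit(Pirc).
    probits has shape (classes, variables, I-1); result is (classes, variables)."""
    return probits.mean(axis=-1)

def dimensional_analysis(locations):
    """Unweighted row correlations of the variables-by-classes matrix
    (as in Table 2) and their eigenstructure."""
    X = locations.T                         # variables x classes
    R = np.corrcoef(X)                      # simple row correlations
    # (The prevalence-weighted alternative mentioned in the Discussion would
    # instead weight the class columns by the latent class prevalences.)
    eigvals = np.linalg.eigvalsh(R)[::-1]   # eigenvalues, largest first
    explained = eigvals / eigvals.sum()     # proportion of variance per component
    return R, eigvals, explained

# Usage, continuing the Step 1 sketch:
#   R, eigvals, explained = dimensional_analysis(class_locations(probits))
# 'explained' plays the role of the percentages of variance quoted above.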
4. Discussion

To conclude, we consider implications, limitations, extensions, and questions concerning the approach. These are enumerated and briefly discussed as follows.

Practical implications

This method can clearly help interpret LCA results. More questionable is whether it has advantages over existing methods for binary and ordered-category factor analysis--these include factor analysis of tetrachoric/polychoric correlations, multidimensional item response models (Bock & Aitkin, 1981; Bock, Gibbons & Muraki, 1988), latent variable models (Bartholomew, 1987; Christoffersson, 1975; Muthen, 1978), and related work by McDonald (1985). Bartholomew (1989) and Mislevy (1986) review this literature.

Existing methods mostly assume an underlying multivariate normal trait distribution. Bock and Aitkin (1981) suggest the possible use of their technique with a discrete latent distribution, similar to the located latent classes here. If this does work in practice, it would still be much more computation-intensive than the approach proposed here. Muthen (1989) describes a useful extension of his approach to allow skewed latent distributions, but one may also wish to consider more complex distributions.

A fundamental question is: how much difference does it make if the multivariate normality assumption is violated? Simulation studies could address this question (perhaps some already have). If distributional assumptions prove important enough for practical concern, then the method here--only slightly harder than factoring tetrachoric correlations, and perhaps easier in some cases, since it is guaranteed to produce a positive semi-definite correlation matrix--may be useful.

Comparing the method here to existing factor analysis methods, however, may miss the main point. Perhaps its real value is that it allows for the "clustering" and "factoring" of cases within a single analysis, providing more opportunities for data interpretation than either approach alone.

Comparing solutions

The present method permits easy comparison of different latent class models of the same data. Two or more solutions can be plotted in a common space and compared visually. A possible application would be comparison of different local-maximum solutions, or of local- and global-maximum solutions, to assess the importance of their differences.

Plotting locations of individual cases

The procedure suggests a way to estimate the locations of individual cases in the factor space. Standard LCA methods provide recruitment probabilities, the posterior probabilities of membership in each latent class for each case. A case's location on a factor can be estimated by summing, across classes, the product of the case's probability of membership in a class and the class's score on the factor. This would accomplish something similar to grade-of-membership (GOM) analysis (Woodbury & Manton, 1982). Estimation would be simpler, however, since case locations would be estimated a posteriori rather than, as in GOM analysis, simultaneously with the other parameters. Once estimated, case locations could be shown on the same axes as the latent class locations and variable loadings.
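A minimal Python sketch of this posterior-weighted scoring follows. The latent class factor scores are taken from Table 4; the recruitment (posterior membership) probabilities for the three cases are hypothetical values used only to show the calculation.

# Minimal sketch of case-location estimation: posterior membership probabilities
# times class factor scores, summed across classes.
import numpy as np

# Factor scores of the five latent classes on the two components (Table 4).
class_scores = np.array([
    [-1.451,  0.059],
    [ 0.700, -1.612],
    [-0.391, -0.027],
    [ 0.023,  0.515],
    [ 1.119,  1.065],
])

# Hypothetical recruitment (posterior membership) probabilities for three cases.
posterior = np.array([
    [0.90, 0.05, 0.05, 0.00, 0.00],
    [0.10, 0.10, 0.60, 0.15, 0.05],
    [0.00, 0.05, 0.05, 0.30, 0.60],
])

case_locations = posterior @ class_scores   # cases x factors
print(case_locations.round(3))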
Unweighted or weighted correlations

The question of whether to weight by latent class prevalence when calculating the latent variable correlations needs further consideration. Both alternatives are plausible. Factor analysis of unweighted correlations shows the dimensions that differentiate the classes--analogous to discriminant analysis. Weighted correlations, however, may be better for understanding the general dimensionality of the population--that is, possibly a better approximation to the results that would be obtained if the latent variables were measured directly as continuous variables.

Measurement error

Simple PCA of the latent class locations, as in the example, does not consider the effect of measurement error on factor structure. This is plausible if the main goal is to understand or represent the structure of the latent classes. However, if the goal is to approximate a factor analysis of the latent continuous variables as measured continuously, one may wish to account for measurement error. Possibly this can be done in a simple algebraic way with the σer parameters and the PCA results. This is a subject for possible further work.

Number of classes and dimensionality

The number of latent classes clearly affects the number of resulting dimensions. The number of dimensions must be less than or equal to C - 1--this is analogous to discriminant analysis and is not a problem per se. However, if the main concern is to understand the dimensionality of the latent variables, rather than latent class interpretation, it may help to consider a relatively large number of latent classes--say, 7 to 10 for a two- or three-factor model--to approximate the latent trait distribution well.

Extreme values

The reader may be concerned about the arbitrary selection of upper and lower limits in the probit transformation step. Study so far suggests that the choice of limits is not very important. With the data in Table 2, for example, varying the limits in the range +/-2 to +/-3.5 had little noticeable effect on the two-dimensional configuration.

Direct parameter estimation

A multidimensional located latent class model, including latent class location, variable factor loading, rating threshold, and measurement error parameters, could, in theory, be estimated directly from the cross-classified data. Several authors have described ML estimation for such models with one latent factor, using both logistic (Clogg, 1988; Dayton & Macready, 1988; Formann, 1992; Lindsay, Clogg & Grego, 1991; Rost, 1988) and normal ogive (Uebersax, 1993) response functions. It is not yet clear how easily these approaches extend to the multidimensional case. It appears feasible to estimate parameters for a multidimensional model with full-information maximum likelihood, since that is what is done with the Bock & Aitkin (1981) multidimensional IRT model, which, as already noted, is very similar to a located latent class model.

An advantage of ML estimation would be statistical testing of model fit. The approach here does not include a test of fit, except insofar as the LCA model itself can be tested in the usual way. Perhaps some descriptive approach to model fit can be developed--this would be in keeping with the spirit of the present approach as mainly a data-reduction and exploratory technique.

References

Bartholomew, D. J. Latent variable models and factor analysis. New York: Oxford University Press, 1987.

Bock, R. D., and Aitkin, M. Marginal maximum likelihood estimation of item parameters: Application of an EM algorithm. Psychometrika, 1981, 46, 443-459.

Bock, R. D., Gibbons, R., and Muraki, E. Full-information item factor analysis. Applied Psychological Measurement, 1988, 12, 261-280.

Christoffersson, A. Factor analysis of dichotomized variables. Psychometrika, 1975, 40, 5-32.
Clogg, C. C. Unrestricted and restricted maximum likelihood latent structure analysis: A manual for users. Working Paper 1977-09, Population Issues Research Center, Pennsylvania State University, 1977.

Clogg, C. C. Latent class models for measuring. In R. Langeheine and J. Rost (Eds.), Latent trait and latent class models. New York: Plenum, 1988, pp. 173-205.

Dayton, C. M., and Macready, G. B. Concomitant-variable latent class models. Journal of the American Statistical Association, 1988, 83, 173-178.

Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B, 1977, 39, 1-38.

Formann, A. K. Linear logistic latent class analysis for polytomous data. Journal of the American Statistical Association, 1992, 87, 476-486.

Goodman, L. A. Exploratory latent structure analysis using both identifiable and unidentifiable models. Biometrika, 1974, 61, 215-231.

Lazarsfeld, P. F., and Henry, N. W. Latent structure analysis. Boston: Houghton Mifflin, 1968.

Lindsay, B., Clogg, C. C., and Grego, J. Semiparametric estimation in the Rasch model and related exponential response models, including a simple latent class model for item analysis. Journal of the American Statistical Association, 1991, 86, 96-107.

McCutcheon, A. C. Latent class analysis. Beverly Hills: Sage Publications, 1987.

McDonald, R. P. Linear versus non-linear models in item response theory. Applied Psychological Measurement, 1982, 6, 379-396.

McDonald, R. P. Unidimensional and multidimensional models for item response theory. In D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized Adaptive Testing Conference. Minneapolis: University of Minnesota, 1985.

Mislevy, R. J. Recent developments in the factor analysis of categorical variables. Journal of Educational Statistics, 1986, 11, 3-31.

Muthen, B. Contributions to factor analysis of dichotomized variables. Psychometrika, 1978, 43, 551-560.

Muthen, B. Dichotomous factor analysis of symptom data. Sociological Methods and Research, 1989, 18, 19-65.

Pol, F. van de, Langeheine, R., and de Jong, W. PANMARK user manual. Voorburg, The Netherlands: Netherlands Central Bureau of Statistics, 1989.

Rost, J. Rating scale analysis with latent class models. Psychometrika, 1988, 53, 327-348.

Uebersax, J. S. A framework to unify latent class analysis and multivariate mixture estimation. Unpublished manuscript, 1992.

Uebersax, J. S. Statistical modeling of expert ratings on medical treatment appropriateness. Journal of the American Statistical Association, 1993, 88 (June).

Uebersax, J. S., and Hansen, W. G. Latent class analysis of substance use-related problem behaviors in male high school students. In preparation.

Woodbury, M. A., and Manton, K. G. A new procedure for analysis of medical classification. Methods of Information in Medicine, 1982, 21, 210-220.
Table 1

Parameter Estimates for a Five-Latent-Class Model of Eight Problem Behaviors

--------------------------------------------------------------------------------
                    Conditional probability of         Conditional probability of
Latent                   response level                      response level
class(a)  Item(b)      1      2      3       Item(b)       1      2      3
--------------------------------------------------------------------------------
   1      driving    0.838  0.090  0.072     fight       0.889  0.054  0.057
   2                 0.692  0.307  0.001                 0.594  0.293  0.113
   3                 0.676  0.119  0.205                 0.515  0.402  0.083
   4                 0.564  0.195  0.240                 0.343  0.123  0.534
   5                 0.209  0.000  0.791                 0.099  0.061  0.840

   1      classroom  0.842  0.146  0.011     parents     0.859  0.097  0.044
   2                 0.266  0.261  0.472                 1.000  0.000  0.000
   3                 0.486  0.267  0.247                 0.430  0.538  0.032
   4                 0.355  0.106  0.538                 0.279  0.170  0.551
   5                 0.058  0.051  0.891                 0.086  0.000  0.914

   1      suspend    0.982  0.017  0.001     police      0.943  0.047  0.010
   2                 0.635  0.300  0.065                 1.000  0.000  0.000
   3                 0.674  0.287  0.039                 0.570  0.430  0.000
   4                 0.590  0.206  0.204                 0.374  0.274  0.352
   5                 0.144  0.109  0.747                 0.070  0.051  0.879

   1      property   0.915  0.051  0.034     accident    0.914  0.077  0.009
   2                 0.661  0.093  0.246                 0.801  0.166  0.033
   3                 0.475  0.390  0.135                 0.830  0.170  0.000
   4                 0.311  0.203  0.486                 0.788  0.173  0.038
   5                 0.047  0.099  0.854                 0.180  0.000  0.820
--------------------------------------------------------------------------------
NOTE: For 814 male 11th and 12th grade students.
(a) Estimated prevalences of classes 1-5 as follows: .458, .116, .206, .199, .021.
(b) Items as follows: driving, driving under the influence of alcohol; classroom,
sent from classroom for misbehavior; suspend, suspension from school; property,
intentionally damaging property; fight, fighting; parents, trouble with parents;
police, trouble with police; accident, automobile accident.

Table 2

Estimated Latent Class Locations Obtained from the Data in Table 1

-------------------------------------------------
                       Latent class
            -------------------------------------
Variable       1      2      3      4      5
-------------------------------------------------
driving     -1.22  -1.80  -0.64  -0.43   0.81
classroom   -1.63   0.28  -0.32   0.24   1.40
suspend     -2.59  -0.93  -1.11  -0.53   0.86
property    -1.60  -0.55  -0.52   0.23   1.36
fight       -1.40  -0.72  -0.71   0.24   1.14
parents     -1.39  -2.75  -0.84   0.36   1.37
police      -1.95  -2.75  -1.46  -0.03   1.32
accident    -1.87  -1.34  -1.85  -1.28   0.92
-------------------------------------------------
NOTE: Maximum/minimum probit values set at +/-2.75.

Table 3

Loadings of Eight Problem Behavior Items on First Two Principal Components

--------------------------------
           Principal Component
           -------------------
Item           1        2
--------------------------------
driving      0.426    0.890
classroom    0.966    0.244
suspend      0.907    0.408
property     0.836    0.542
fight        0.778    0.616
parents      0.308    0.947
police       0.457    0.886
accident     0.757    0.525
--------------------------------
NOTE: Based on principal components analysis of the data in Table 2;
varimax-rotated solution.

Table 4

Factor Scores for Latent Classes on First Two Principal Components

------------------------------
           Principal Component
Latent     -------------------
class          1        2
------------------------------
   1        -1.451    0.059
   2         0.700   -1.612
   3        -0.391   -0.027
   4         0.023    0.515
   5         1.119    1.065
------------------------------
NOTE: Based on principal components analysis of the data in Table 2;
varimax-rotated solution.
Figure 2. Item loadings (diamonds) and latent class factor scores (triangles), jointly plotted relative to the first two principal components; varimax-rotated solution. (Horizontal axis: Principal Component 1; vertical axis: Principal Component 2.)