John Uebersax

Dimension Reduction and Latent Class Analysis: A Simple Method for
Interpretation of Latent Class Analysis Parameters, with Possible Implications for
Factor Analysis of Dichotomous and Ordered-Category Measures
John S. Uebersax
Department of Public Health Sciences
The Bowman Gray School of Medicine of Wake Forest University
Winston-Salem, North Carolina 27157-1063
March 1992
(rev. January 1999: Table 3 and Table 4 added)
Abstract
This note describes a dimension-reduction method for interpreting latent class analysis
results. The method also provides a new way to factor analyze dichotomous and
ordered-category variables. The procedure is remarkably simple--it requires little more
than probit transformation of latent class parameter estimates and standard eigenanalysis.
Implications and possible extensions of the method are discussed.
Paper presented at the Annual meeting of the Classification Society of North America,
Pittsburgh, June 1993.
distribution 1
copy
1. Introduction
This paper describes a procedure to help interpret the results of latent class analysis. The
method also appears to provide a simple and general method for factor analysis of
dichotomous or ordered-category items. The procedure is computationally simple, and
can be accomplished with routinely available software. Unlike other approaches for
factor analysis of ordered-category variables, the method does not assume a latent
multivariate normal distribution.
For readers unfamiliar with LCA, the essential concepts are given below; other readers
may wish to proceed directly to Section 2. Goodman (1974), Lazarsfeld & Henry (1968)
and McCutcheon (1987) provide good introductions to LCA.
LCA is analogous to cluster analysis for categorical data (indeed, recent work, for
example, Uebersax 1992, suggests the relationship is even stronger than has been
commonly supposed). The goal of LCA is to identify groups (latent classes) of related
cases based on observed patterns of traits (manifest variables).
Central to LCA is the concept of conditional independence. This stipulates that, for each
latent class, manifest variables are statistically independent. With, for example, three
manifest variables, the model is expressed as
πijk = Σc=1,...,C πc πijk|c,    (1)

and

πijk|c = πi|c πj|c πk|c,    (2)
where
πijk is the joint probability of observing a case with levels i, j, and k on manifest
variables 1, 2, and 3,
πc is the prevalence of latent class c (c = 1, ..., C),
πijk|c is the joint probability of observing (i, j, k) for a case that belongs to latent
class c, and
for example, πi|c is the conditional probability (response probability) of observing
level i on manifest variable 1 given a member of latent class c.
The basic parameters--latent class prevalences and response probabilities--are estimated
from observed frequencies of cases cross-classified on the manifest variables. A simple
EM algorithm (Dempster, Laird & Rubin, 1977) described by Goodman (1974) provides
ML parameter estimates. LCA software, such as MLLSA (Clogg, 1977) and PANMARK
(van de Pol, Langeheine & de Jong, 1989) is readily available.
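The conditional-independence structure of equations (1) and (2) is easy to verify numerically. The sketch below uses made-up prevalences and response probabilities (not taken from the paper) for a two-class model with three dichotomous manifest variables, and computes a cell probability by mixing over the latent classes:

```python
# Toy two-class model for three dichotomous manifest variables; all numbers
# are illustrative, not taken from the paper.
prevalence = [0.6, 0.4]                    # pi_c for classes c = 1, 2
# response_prob[c][r] = P(level 2 on manifest variable r | class c)
response_prob = [[0.2, 0.3, 0.1],          # class 1
                 [0.8, 0.7, 0.9]]          # class 2

def cell_prob(i, j, k):
    """Joint probability of response pattern (i, j, k), levels coded 1 or 2:
    mix over classes (eq. 1), multiplying within-class probabilities (eq. 2)."""
    total = 0.0
    for c in range(len(prevalence)):
        p = prevalence[c]
        for r, level in enumerate((i, j, k)):
            p_hi = response_prob[c][r]
            p *= p_hi if level == 2 else 1.0 - p_hi
        total += p
    return total

# The 2**3 = 8 cell probabilities form a proper distribution:
patterns = [(i, j, k) for i in (1, 2) for j in (1, 2) for k in (1, 2)]
print(sum(cell_prob(*p) for p in patterns))   # sums to 1
```

Fitting reverses this computation: the EM algorithm recovers the prevalences and response probabilities from observed cell frequencies.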
2. Motivation
With more than two or three manifest variables, the results of LCA can be hard to
interpret. This is especially true when the variables have multiple response levels and there
are many latent classes. This situation is illustrated in Table 1.
-----------------------
Insert Table 1 here
-----------------------
Table 1 shows the response probability estimates for a latent class model for eight items
of a student survey. The items concern various problem behaviors. The items and their
analysis are discussed elsewhere (Uebersax & Hansen, 1993). Here we consider the
results only as a typical example.
The model summarized has five latent classes. The table shows the conditional
probability of each response level (1 = never; 2 = once; 3 = twice or more) for each
problem behavior in each latent class. The main thing to appreciate is the large number
of parameters to be interpreted. A more condensed way to represent the table's structure
would clearly be useful.
3. A Data Reduction Method
We now consider a very simple method to reduce such data. The first step probit-transforms the response probabilities. Averages of the probit values will then be used to
estimate locations of latent classes on latent continuous variables. The latent class
locations will then be analyzed by principal components, and the results plotted.
Model and notation
We assume C latent classes, and R manifest ordered-category variables with, for
simplicity, I levels each. Each manifest variable r is assumed to be a discretized version of
a latent continuous variable xr. Let βcr denote the location of latent class c on xr; βcr
corresponds to the mean value for members of latent class c on xr (see Figure 1). Due to
random variation (measurement error),

-----------------------
Insert Figure 1 here
-----------------------

we assume members of latent class c are normally distributed around βcr; measurement
error variance is assumed constant across latent classes, and we denote it by σer.
Let {τir}, with i = 2, ..., I, denote a series of ordered thresholds on xr; if the value of a
case on xr exceeds τir, the manifest rating will be level i or above.
Without loss of generality, we assume σer = 1 for all r. We also "center" the
thresholds by assuming Σi=2,...,I τir = 0 for all r. The assumptions above are plausible and
similar to those of common measurement models.
Step 1. Probit transformation of response probabilities
The first step entails probit transformation of response probabilities. We define Pirc as the
probability of observing response level i or above on variable r, for a member of latent
class c. That is, Pirc is the sum of response probabilities:

Pirc = Σj=i,...,I πj|cr.    (3)
The probability that a member of latent class c receives rating level i or above on manifest
variable r is equal to the probability that the case exceeds threshold i on latent variable xr.
Therefore

Pirc = 1 - Φ(τir - βcr) = Φ(βcr - τir),    (4)

where Φ(z) is the cdf of the standard normal distribution. It follows that

probit(Pirc) = βcr - τir    (5)

and, rearranging,

βcr = probit(Pirc) + τir,    (6)
where we define probit(p) as the z-score on the standard normal curve to the left of which
proportion p of the area falls, easily calculated or obtained from a table.
A practical question arises of how to treat probit values when Pirc is equal or nearly equal
to 1 or 0, since probit(1) = +inf and probit(0) = -inf. A simple expedient is to place upper
and lower limits on probit(Pirc), for example, +/-2.75 or +/-3.00; this is equivalent to
regarding values of Pirc greater than .999 (or less than .001) as negligibly different
from 1 (or 0).
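This clipping rule can be sketched in a few lines; the limit of +/-2.75 matches the value used for Table 2, and the inverse normal cdf comes from Python's standard library:

```python
from statistics import NormalDist

_norm = NormalDist()

def probit(p, limit=2.75):
    """Probit of p -- the z-score below which proportion p of the standard
    normal falls -- clipped to +/-limit so that p = 0 or 1 stays finite."""
    p = min(max(p, _norm.cdf(-limit)), _norm.cdf(limit))
    return _norm.inv_cdf(p)

print(probit(0.5))    # 0.0
print(probit(1.0))    # clipped near +2.75
print(probit(0.0))    # clipped near -2.75
```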
Step 2. Estimation of latent class locations
6
Uebersax -- Latent Class Analysis Dimensionality
We now use these results to estimate the latent class location parameters {βcr}. For a latent
class c and variable r, there are I - 1 equations of form (5), one for each threshold τir. For
example, with I = 4 response levels, we have

βcr - τ2r = probit(P2rc)
βcr - τ3r = probit(P3rc)
βcr - τ4r = probit(P4rc).

Averaging both sides of these equations gives βcr = (1/3) Σi=2,...,I probit(Pirc); the τir
terms cancel because of the constraint Σi τir = 0. In general, then, the βcr parameters
can be estimated as

βcr' = 1/(I - 1) Σi=2,...,I probit(Pirc)    (7)

for each combination of c and r. There are potentially better ways to estimate the βcr
parameters from the probit values. However, the simplicity of the present approach is a
considerable advantage.
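Steps 1 and 2 can be sketched together. The fragment below applies this averaging to the response probabilities of the driving item from Table 1 and reproduces the corresponding entries of Table 2 (the helper names are illustrative, not from the paper):

```python
from statistics import NormalDist

_norm = NormalDist()

def probit(p, limit=2.75):
    """Clipped probit, as in Step 1."""
    p = min(max(p, _norm.cdf(-limit)), _norm.cdf(limit))
    return _norm.inv_cdf(p)

def location(resp_probs, limit=2.75):
    """Estimate one class's location on one latent variable: average the
    probits of the cumulative probabilities P_irc (level i or above) over
    the I - 1 thresholds i = 2, ..., I."""
    I = len(resp_probs)
    probits = [probit(sum(resp_probs[i:]), limit) for i in range(1, I)]
    return sum(probits) / (I - 1)

# Response probabilities for the 'driving' item, from Table 1:
print(round(location([0.838, 0.090, 0.072]), 2))   # class 1: -1.22 (Table 2)
print(round(location([0.209, 0.000, 0.791]), 2))   # class 5:  0.81 (Table 2)
```

Note that a class responding "never" with probability 1.000 on an item (e.g., parents in class 2) yields cumulative probabilities of 0, so its location is simply the clipping limit, -2.75, as in Table 2.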
Step 3. Dimensional analysis
Application of Steps 1 and 2 above to the data in Table 1 produces the results in Table 2.
These values represent the locations of the latent classes on eight latent continuous
variables. A natural question to ask is: What is the dimensionality of these variables? To
answer this, one can approach these data just as one would any set of scores on continuous
measures, and apply standard factor analysis techniques. Thus, one can calculate the
correlations between latent variables as the simple row correlations of Table 2 (an
alternative is to weight row elements by the latent class prevalences; see Discussion). The
eigenstructure of the resulting correlation matrix can then be determined with the usual
methods.
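Concretely, this step amounts to a few lines of linear algebra. A sketch with numpy, taking the Table 2 locations as input (row correlations, then eigenvalues of the correlation matrix):

```python
import numpy as np

# Latent class locations from Table 2: rows are variables, columns classes 1-5.
locations = np.array([
    [-1.22, -1.80, -0.64, -0.43, 0.81],   # driving
    [-1.63,  0.28, -0.32,  0.24, 1.40],   # classroom
    [-2.59, -0.93, -1.11, -0.53, 0.86],   # suspend
    [-1.60, -0.55, -0.52,  0.23, 1.36],   # property
    [-1.40, -0.72, -0.71,  0.24, 1.14],   # fight
    [-1.39, -2.75, -0.84,  0.36, 1.37],   # parents
    [-1.95, -2.75, -1.46, -0.03, 1.32],   # police
    [-1.87, -1.34, -1.85, -1.28, 0.92],   # accident
])

# Unweighted row correlations, then eigenanalysis of the correlation matrix.
R = np.corrcoef(locations)                 # 8 x 8 correlation matrix
eigvals = np.linalg.eigvalsh(R)[::-1]      # eigenvalues, descending order
explained = eigvals / eigvals.sum()
print(np.round(explained[:2], 3))          # first two components dominate
```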
When unweighted row correlations are calculated for Table 2, and these are analyzed by
principal components analysis (PCA), the results indicate a two-dimensional solution; the
first component accounts for 83.8% of the variance and the first two components together
account for 97.1%. The strong first component makes sense, and reflects a general
association between all of the problem behaviors. But the emergence of the second factor
shows that the data are also more complex than a simple 1-dimensional model would
imply.
Figure 2 illustrates the two-component, varimax-rotated solution. Variables and latent
classes are plotted on the same axes. Variables are plotted according to their factor
loadings, and latent classes by their factor scores (the values are supplied in Table 3 and
Table 4, respectively). The main point to appreciate is that the configuration portrays
differences in latent classes by the differences in their locations relative to two underlying
latent factors. For example, latent class 5 is high on both factors, whereas latent class 2 is
relatively high on Factor 1, but low on Factor 2.
-------------------------------------------------
Insert Table 3, Table 4, and Figure 2 here
-------------------------------------------------
Factors can be interpreted by considering the loadings of the different variables on the
factors, just as with ordinary factor analysis. Figure 2 has lines added that correspond to
the variables classroom and parents--these are the variables that correspond most closely
to Factors 1 and 2, respectively.
4. Discussion
To conclude we consider implications, limitations, extensions, and questions concerning
the approach. These are enumerated and briefly discussed as follows:
Practical implications
This method can clearly help interpret LCA results. More questionable is whether it has
advantages over existing binary and ordered-category factor analysis methods--these
include factor analysis of tetra-/polychoric correlations, multidimensional item-response
models (Bock & Aitkin, 1981; Bock, Gibbons & Muraki, 1988), latent variable models
(Bartholomew, 1987; Christoffersson, 1975; Muthen, 1978), and related work by
McDonald (1985). Bartholomew (1989) and Mislevy (1986) review this literature.
Existing methods mostly assume an underlying multivariate normal trait distribution.
Bock and Aitkin (1981) suggest the possible use of their technique with a discrete latent
distribution, similar to located latent classes here. If this does work in practice, it would
still be much more computation-intensive than the approach proposed here. Muthen
(1989) describes a useful extension of his approach to allow skewed latent distributions,
but one may also wish to consider more complex distributions.
A fundamental question is: How much difference does it make if the multivariate
normality assumption is violated? Simulation studies could address this question
(perhaps some already have). If distributional assumptions prove important enough for
practical concern, then the method here, only slightly harder than factoring tetrachoric
correlations (perhaps easier in some cases, since it is guaranteed to produce a positive
semi-definite correlation matrix), may be useful.
Comparing the method here to existing factor analysis methods, however, may miss the
main point. Perhaps its real value is that it allows for the "clustering" and "factoring" of
cases with a single analysis, providing more opportunities for data interpretation than
either approach alone.
Comparing solutions
The present method permits easy comparison of different latent class models of the same
data. Two or more solutions can be plotted in a common space and compared visually. A
possible application would be comparison of different local maximum solutions, or local
and global maximum solutions, to assess the importance of their differences.
Plotting locations of individual cases
The procedure suggests a way to estimate the locations of individual cases in the factor
space. Standard LCA methods provide recruitment probabilities, the posterior
probabilities of membership in each latent class for each case. A case's location on a
factor can be estimated by summing across classes the product of the case's probability of
membership in a class times the class's score on the factor. This would accomplish
something similar to grade-of-membership (GOM) analysis (Woodbury & Manton, 1982).
Estimation would be simpler, however, since case locations would be estimated a
posteriori rather than, as in GOM analysis, simultaneously with other parameters. Once
estimated, case locations could be shown on the same axes as the latent class locations
and variable loadings.
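A minimal sketch of this posterior-weighted scoring, using the class factor scores of Table 4 and hypothetical recruitment probabilities for a single case:

```python
# Factor scores of the five latent classes on the two rotated components
# (Table 4); rows are classes 1-5.
class_scores = [
    (-1.451,  0.059),
    ( 0.700, -1.612),
    (-0.391, -0.027),
    ( 0.023,  0.515),
    ( 1.119,  1.065),
]

def case_location(posteriors):
    """Locate a case in the factor space as the posterior-probability-weighted
    average of the class factor scores (posteriors = recruitment probabilities)."""
    f1 = sum(p * s[0] for p, s in zip(posteriors, class_scores))
    f2 = sum(p * s[1] for p, s in zip(posteriors, class_scores))
    return f1, f2

# A hypothetical case split mainly between classes 4 and 5:
f1, f2 = case_location([0.00, 0.00, 0.10, 0.60, 0.30])
print(round(f1, 3), round(f2, 3))
```

Such a case plots between classes 4 and 5 in Figure 2, pulled toward whichever class it most probably belongs to.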
Unweighted or weighted correlations
The question of whether to weight by latent class prevalence when calculating latent
variable correlations needs further consideration. Both alternatives are plausible. Factor
analysis of unweighted correlations shows the dimensions that differentiate classes--
analogous to discriminant analysis. Weighted correlations, however, may be better for
understanding the general dimensionality of the population--that is, possibly a better
approximation to the results if the latent variables were measured directly as continuous
variables.
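The two alternatives differ only in the weights applied to the latent classes. A sketch of a prevalence-weighted correlation for two rows of Table 2 (prevalences from the note to Table 1), alongside the unweighted version:

```python
import numpy as np

# Latent class locations for two variables (rows of Table 2) and the
# latent class prevalences reported in the note to Table 1.
driving = np.array([-1.22, -1.80, -0.64, -0.43, 0.81])
parents = np.array([-1.39, -2.75, -0.84,  0.36, 1.37])
w = np.array([0.458, 0.116, 0.206, 0.199, 0.021])   # class prevalences

def weighted_corr(x, y, w):
    """Correlation of x and y with observations weighted by w."""
    w = w / w.sum()
    mx, my = np.sum(w * x), np.sum(w * y)
    cov = np.sum(w * (x - mx) * (y - my))
    sx = np.sqrt(np.sum(w * (x - mx) ** 2))
    sy = np.sqrt(np.sum(w * (y - my) ** 2))
    return cov / (sx * sy)

print(round(float(weighted_corr(driving, parents, w)), 3))    # weighted
print(round(float(np.corrcoef(driving, parents)[0, 1]), 3))   # unweighted
```

With uniform weights the two versions coincide; the weighted version downweights rare classes such as class 5 (prevalence .021).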
Measurement error
Simple PCA of latent class locations, as in the example, does not consider the effect of
measurement error on factor structure. This is plausible if the main goal is to understand
or represent the structure of the latent classes. However, if the goal is to approximate a
factor analysis of the latent continuous variables as measured continuously, one may wish
to account for measurement error. Possibly this can be done in a simple algebraic way
with the σer parameters and the PCA results. This is a subject for possible further
work.
Number of classes and dimensionality
The number of latent classes clearly affects the number of resulting dimensions. The
number of dimensions must be less than or equal to C-1--this is analogous to discriminant
analysis and is not a problem per se. However, if the main concern is to understand the
dimensionality of the latent variables, rather than latent class interpretation, it may help to
consider a relatively large number of latent classes--say, 7 to 10 for a two- or three-factor
model--so as to approximate the latent trait distribution well.
Extreme values
The reader may be concerned about the arbitrary selection of an upper and lower limit in
the probit transformation step. Study so far suggests that choice of the upper/lower limit
is not very important. With the data in Table 2, for example, varying the limits in the
range +/-2 to +/-3.5 had little noticeable effect on two-dimensional configurations.
Direct parameter estimation
A multidimensional located latent class model, including latent class location, variable
factor loading, rating threshold, and measurement error parameters, could, in theory, be
estimated directly from cross-classified data. Several authors have described ML
estimation for such models with one latent factor, using both logistic (Clogg, 1988;
Dayton & Macready, 1988; Formann, 1992; Lindsay, Clogg & Grego, 1991; Rost, 1988)
and normal ogive (Uebersax, 1993) response functions.
It is not yet clear how easily these approaches extend to the multidimensional case. It
appears feasible to estimate parameters for a multidimensional model with full-information
maximum likelihood, since that is what is done with the Bock & Aitkin
(1981) multidimensional IRT model, which, as already noted, is very similar to a located
latent class model.
An advantage of ML estimation would be statistical model fit testing. The approach here
does not include a test of fit, except insofar as the LCA model itself can be tested in the
usual way. Perhaps some descriptive approach to model fit can be developed--this would
be in keeping with the spirit of the present approach as mainly a data-reduction and
exploratory technique.
References
Bartholomew, D. J. Latent variable models and factor analysis. New York: Oxford
University Press, 1987.
Bock, R. D., and Aitkin, M. Marginal maximum likelihood estimation of item
parameters: Application of an EM algorithm. Psychometrika, 1981, 46, 443-459.
Bock, R. D., Gibbons, R., and Muraki, E. Full-information item factor analysis. Applied
Psychological Measurement, 1988, 12, 261-280.
Christoffersson, A. Factor analysis of dichotomized variables. Psychometrika, 1975, 40,
5-32.
Clogg, C. C. Unrestricted and restricted maximum likelihood latent structure analysis: A
manual for users. Working paper 1977-09, Pennsylvania State University, Population
Issues Research Center.
Clogg, C. C. Latent class models for measuring. In Latent trait and latent class models,
eds. R. Langeheine and J. Rost, New York: Plenum, 1988, pp. 173-205.
Dayton, C. M. and Macready, G. B. Concomitant-variable latent class models. Journal
of the American Statistical Association, 1988, 83, 173-178.
Dempster, A. P., Laird, N. M., and Rubin, D. B. Maximum likelihood from incomplete
data via the EM algorithm. J. Roy. Statist. Soc., B., 1977, 39, 1-38.
Formann, A. K. Linear logistic latent class analysis for polytomous data. Journal of the
American Statistical Association, 1992, 87, 476-486.
Goodman, L. A. Exploratory latent structure analysis using both identifiable and
unidentifiable models. Biometrika, 1974, 61, 215-231.
Lazarsfeld, P. F. and Henry, N. W. Latent structure analysis. Boston: Houghton Mifflin,
1968.
Lindsay, B., Clogg, C. C., and Grego, J. Semiparametric estimation in the Rasch model
and related exponential response models, including a simple latent class model for item
analysis. Journal of the American Statistical Association, 1991, 86, 96-107.
McCutcheon, A. C. Latent class analysis. Beverly Hills: Sage Publications, 1987.
McDonald, R. P. Linear versus non-linear models in item response theory. Applied
Psychological Measurement, 1982, 6, 379-396.
McDonald, R. P. Unidimensional and multidimensional models for item response theory.
In D. J. Weiss (Ed.), Proceedings of the 1982 Item Response Theory and Computerized
Adaptive Testing Conference. Minneapolis: University of Minnesota, 1985.
Mislevy, R. J. Recent developments in the factor analysis of categorical variables.
Journal of Educational Statistics, 1986, 11, 3-31.
Muthen, B. Contributions to factor analysis of dichotomized variables. Psychometrika,
1978, 43, 551-560.
Muthen, B. Dichotomous factor analysis of symptom data. Sociological Methods and
Research, 1989, 18, 19-65.
van de Pol, F., Langeheine, R., and de Jong, W. PANMARK user manual. Voorburg, The
Netherlands: Netherlands Central Bureau of Statistics, 1989.
Rost, J. Rating scale analysis with latent class models. Psychometrika, 1988, 53,
327-348.
Uebersax, J. S. A framework to unify latent class analysis and multivariate mixture
estimation. Unpublished manuscript, 1992.
Uebersax, J. S. Statistical modeling of expert ratings on medical treatment
appropriateness. Journal of the American Statistical Association, 1993, 88 (June).
Uebersax, J. S., and Hansen, W. G. Latent class analysis of substance use-related problem
behaviors in male high school students. In preparation.
Woodbury, M. A., and Manton, K. G. A new procedure for analysis of medical
classification. Methods of Information in Medicine, 1982, 21, 210-220.
15
16
Uebersax -- Latent Class Analysis Dimensionality
Table 1
Parameter Estimates for a Five-Latent-Class Model of Eight Problem Behaviors
--------------------------------------------------------------------------------
                    Conditional probability             Conditional probability
Latent                of response level                    of response level
class(a)  Item(b)     1      2      3       Item(b)       1      2      3
--------------------------------------------------------------------------------
1         driving   0.838  0.090  0.072     fight       0.889  0.054  0.057
2                   0.692  0.307  0.001                 0.594  0.293  0.113
3                   0.676  0.119  0.205                 0.515  0.402  0.083
4                   0.564  0.195  0.240                 0.343  0.123  0.534
5                   0.209  0.000  0.791                 0.099  0.061  0.840

1         classroom 0.842  0.146  0.011     parents     0.859  0.097  0.044
2                   0.266  0.261  0.472                 1.000  0.000  0.000
3                   0.486  0.267  0.247                 0.430  0.538  0.032
4                   0.355  0.106  0.538                 0.279  0.170  0.551
5                   0.058  0.051  0.891                 0.086  0.000  0.914

1         suspend   0.982  0.017  0.001     police      0.943  0.047  0.010
2                   0.635  0.300  0.065                 1.000  0.000  0.000
3                   0.674  0.287  0.039                 0.570  0.430  0.000
4                   0.590  0.206  0.204                 0.374  0.274  0.352
5                   0.144  0.109  0.747                 0.070  0.051  0.879

1         property  0.915  0.051  0.034     accident    0.914  0.077  0.009
2                   0.661  0.093  0.246                 0.801  0.166  0.033
3                   0.475  0.390  0.135                 0.830  0.170  0.000
4                   0.311  0.203  0.486                 0.788  0.173  0.038
5                   0.047  0.099  0.854                 0.180  0.000  0.820
--------------------------------------------------------------------------------
NOTE: For 814 male 11th and 12th grade students. Response levels: 1 = never;
2 = once; 3 = twice or more.
(a) Estimated prevalences of classes 1-5 are as follows: .458, .116, .206,
.199, .021.
(b) Items as follows: driving, driving under the influence of alcohol;
classroom, sent from classroom for misbehavior; suspend, suspension from
school; property, intentionally damaging property; fight, fighting; parents,
trouble with parents; police, trouble with police; accident, automobile
accident.
18
Uebersax -- Latent Class Analysis Dimensionality
Table 2
Estimated Latent Class Locations Obtained
from Data in Table 1
------------------------------------------------Latent class
Variable
1
2
3
4
5
---------
------------------------------------
driving
-1.22
-1.80
-0.64
-0.43
0.81
classroom
-1.63
0.28
-0.32
0.24
1.40
suspend
-2.59
-0.93
-1.11
-0.53
0.86
property
-1.60
-0.55
-0.52
0.23
1.36
fight
-1.40
-0.72
-0.71
0.24
1.14
parents
-1.39
-2.75
-0.84
0.36
1.37
police
-1.95
-2.75
-1.46
-0.03
1.32
accident
-1.87
-1.34
-1.85
-1.28
0.92
------------------------------------------------NOTE:
Maximum/minimum probit values set as
+/-2.75.
19
Uebersax -- Latent Class Analysis Dimensionality
Table 3
Loadings of Eight Problem
Behavior Items on First Two
Principal Components
-------------------------------Principal Component
------------------Item
1
2
-------------------------------driving
0.426
0.890
classroom
0.966
0.244
suspend
0.907
0.408
property
0.836
0.542
fight
0.778
0.616
parents
0.308
0.947
police
0.457
0.886
accident
0.757
0.525
-------------------------------NOTE:
Based on principal
components analysis of data in
Table 2; varimax-rotated
solution.
20
Uebersax -- Latent Class Analysis Dimensionality
Table 4
Factor Scores for Latent
Classes on First Two
Principal Components
-----------------------------Principal Component
Latent
-------------------
Class
1
2
-----------------------------1
-1.451
0.059
2
0.700
-1.612
3
-0.391
-0.027
4
0.023
0.515
5
1.119
1.065
-----------------------------NOTE:
Based on principal
components analysis of data in
Table 2; varimax-rotated
solution.
21
Uebersax -- Latent Class Analysis Dimensionality
2
5
Principal Component 2
1
4
0
1
3
-1
2
-2
-2
-1
0
1
Principal Component 1
Figure 2
Item loadings (diamonds) and latent class factor scores
(triangles), jointly plotted relative to first two principal
components, rotated solution.
2