Draft- For Comment Only
Hierarchical Linear Models: Strengths and Weaknesses
By Duncan Chaplin
December 9th, 2003
The Urban Institute
2100 M St. NW
Washington, D.C.
This paper is a revised version of a paper prepared for the meetings of the Association for Public Policy Analysis and Management in
November of 2003 and is based on a presentation given to the UI Modeling Group at the Urban Institute in March of 2003. Many
thanks go to Chris Bollinger, Mary Noonan, and all the members of the UI Modeling Group for helpful discussions of the issues
presented here. All omissions, mistakes (glaring and otherwise) and tangential asides are those of the author and should not be
attributed to the Urban Institute or any of the people kind enough to share their thoughts with me on this topic. Questions and queries
should be addressed to the author at DChaplin@ui.urban.org.
Abstract
Regression models are often run using data with observations that are highly correlated within
subgroups. Ignoring these correlations can yield biased standard errors. Consequently, a number of statistical
methods have been developed to help adjust estimated standard errors appropriately. Hierarchical Linear
Models (HLM) is one such method, particularly common in education research, but growing in popularity
elsewhere. In this paper I compare HLM to a variety of alternative estimation methods more commonly used by
economists that also deal with clustering. In particular, I compare HLM to random and fixed effects (as used in
the Econometrics literature), random coefficients, Generalized Least Squares, Huber-White corrections, and
simulation methods (Jackknife/Bootstrap). I also discuss general strengths and weaknesses of HLM.
Introduction
HLM is an estimation method[1] developed by education researchers (Bryk and Raudenbush, 1992) that is now growing in popularity in a number of other research areas, including health and psychology. It is used primarily to estimate cross-sectional linear models and has received relatively little attention from economists.[2]
In this paper I describe HLM and how it relates to models more commonly used by economists, and discuss what HLM does and does not do. The standard way of presenting HLM is to start by discussing the concept of conducting regressions at two levels. For example, when analyzing factors that affect student test scores using data on students from a number of schools (with many students in each school), one could run one regression for each school and then a second set of regressions using the coefficient estimates from the school-by-school regressions as outcomes. Such a model can be estimated, and Hanushek (1974) describes a method of doing so which is similar, in some ways, to the Fixed Effects Model common in econometrics. This is powerful because the Fixed Effects Model controls for all unobserved factors at the school level.
This is not, however, what standard HLM does. Thus, while the standard HLM does correct
standard errors and produce more efficient estimates than standard Ordinary Least Squares
methods, it does not correct the estimated impacts of student-level variables for any bias caused
by unobserved school-level variables. Since HLM is usually presented as if it were estimated in
two stages, this clarification seems likely to be an important one for many researchers who are
learning to use HLM for the first time and for economists who may not be familiar with the
HLM method.
[1] HLM models can be estimated using a variety of software packages (Singer, 1998). These include Proc Mixed in SAS and the software package HLM, developed by Bryk and Raudenbush (1992), who coined the term HLM.
[2] Interesting work has been done using HLM to estimate "growth models" with panel data (Bryk and Raudenbush, 1992), and some researchers have used HLM-type models for discrete outcomes (Swanson et al., 2002).
HLM has a number of benefits. First, it can help to control for clustering of observations and heteroskedasticity. Second, it can improve the efficiency of estimated impacts, provided that the HLM assumptions are correct. Third, even if the assumptions are violated, HLM will still produce a best "HLM" fit, similar to the Best Linear Unbiased Estimate property of an OLS model (Goldberger, 1991).[3] Fourth, a variation of the HLM model, with group-mean centering, does produce unbiased slope estimates under the same conditions that are normally used to justify a Fixed Effects Model in economics.
There are alternative methods of controlling for clustering and heteroskedasticity, for instance the simulation methods (jackknife and bootstrap) and the Huber-White corrections.[4] These methods have a number of advantages over HLM. First, HLM constrains the variance of the error to be a function of the same factors that affect the mean value of the outcome, while the other methods allow the variance to depend on additional factors. Second, the estimated standard errors from the alternative methods are robust to more forms of heteroskedasticity than are allowed by HLM.[5] Nonetheless, HLM will produce valid standard error estimates under a wide variety of conditions, and more efficient coefficient estimates given the HLM assumptions.
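The Huber-White (cluster-robust) correction keeps the OLS slope estimates and changes only the standard errors. The following is a minimal sketch, assuming Python with NumPy and simulated data; the 50 schools, the school-level dummy W, and all parameter values are hypothetical. It computes conventional OLS standard errors alongside the basic CR0 cluster-robust sandwich estimator. Packages such as Stata multiply this basic form by a small-sample scaling factor.

```python
import numpy as np

def ols_with_ses(y, X, groups):
    """OLS coefficients with conventional and CR0 cluster-robust standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    # Conventional OLS standard errors: independent, homoskedastic errors assumed.
    sigma2 = resid @ resid / (n - k)
    se_naive = np.sqrt(np.diag(sigma2 * XtX_inv))
    # Sandwich "meat": sum over clusters of the outer product of each cluster's
    # score vector, which leaves the within-cluster error correlation unrestricted.
    meat = np.zeros((k, k))
    for g in np.unique(groups):
        in_g = groups == g
        score = X[in_g].T @ resid[in_g]
        meat += np.outer(score, score)
    se_cluster = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return beta, se_naive, se_cluster

# Hypothetical data: 50 schools x 20 students, a school-level dummy W, and a random
# school intercept u0 that makes students in the same school correlated.
rng = np.random.default_rng(0)
n_schools, n_students = 50, 20
school = np.repeat(np.arange(n_schools), n_students)
W = (school < n_schools // 2).astype(float)
u0 = rng.normal(0.0, 1.0, n_schools)[school]
y = 1.0 + 0.5 * W + u0 + rng.normal(0.0, 1.0, school.size)
X = np.column_stack([np.ones_like(W), W])

beta, se_naive, se_cluster = ols_with_ses(y, X, school)
print(beta, se_naive, se_cluster)
```

With a school-level regressor and a sizable within-school correlation, the cluster-robust standard error for W is typically much larger than the conventional one. If every observation is given its own cluster, the same code reduces to White's (1980) heteroskedasticity-consistent (HC0) standard errors.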
To help put HLM in context, I begin with a section describing the motivation behind HLM. I then show how HLM is related to random coefficients, random effects, and fixed effects, as used in the econometrics literature. This is followed by a discussion of what HLM does and does not do and some comments on what econometricians might view as an odd feature of HLM.

[3] Relative to OLS, the HLM estimates give more weight to observations for which the estimates suggest the data are more precise. For example, if the HLM model suggests that the data on women are more precise than the data on men, then the resulting HLM slope estimates will give more weight to the observations for the women.
[4] The simulation methods can be easily implemented in packages like WesVar, Shazam, and SAS, while the Huber-White corrections can be done in Stata.
[5] This is because the White correction allows for any form of heteroskedasticity, while HLM only allows for heteroskedasticity that is captured by including random coefficients in the model.
HLM Motivation
One research question that HLM can address well is whether Catholic schools help to reduce inequality in outcomes compared to public schools. To answer this question, researchers often look at how the estimated effect of student socioeconomic status (SES) on student test scores varies by school type (Catholic vs. public). A smaller SES slope for Catholic schools is taken as evidence that Catholic schools help to reduce inequality. Figure 1 illustrates this point.
One can estimate interaction terms between school type and SES in ordinary least squares (OLS) regression models. However, this would not account for clustering of observations within schools. For this reason, many (if not most) econometricians would use a Random Effects Model in this situation. While this is an improvement over OLS, it only allows the intercept to vary randomly across schools, and not the SES slope. Figure 2 presents the data used to generate Figure 1, by school. As Figure 2 shows, even though on average the Catholic school SES slope is smaller than that of public schools, there is a great deal of variation in the SES slopes across schools, so much so that one might wonder whether the slope differences by school type are truly statistically significant. More importantly, the issue is not just that the intercepts vary randomly across schools; there is also a great deal of variation in the slope estimates. HLM allows for random variation in both the intercepts and slopes.
As noted earlier, HLM is often described as if it uses two sets of regression models, one at the student level and a second at the school level, as shown below.

Level 1 (student level):
$$Y_{ij} = \beta_{0j} + \beta_{1j} X_{ij} + e_{ij}$$

Level 2 (school level):
$$\beta_{0j} = \gamma_{00} + \gamma_{01} W_j + u_{0j}$$
$$\beta_{1j} = \gamma_{10} + \gamma_{11} W_j + u_{1j}$$

where, for our example problem,
$Y_{ij}$ = test score of student i in school j,
$W_j$ = 1 if school j is Catholic (0 if public),
$X_{ij}$ = student SES,
and the error terms $e_{ij}$, $u_{0j}$, and $u_{1j}$ are assumed to be uncorrelated with $X_{ij}$ and $W_j$.

By substituting the random coefficients in the Level 1 equation with their components shown in the Level 2 equations, one can write a combined model:
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + \varepsilon_{ij}$$
where $\varepsilon_{ij} = u_{1j} X_{ij} + u_{0j} + e_{ij}$.
This is the Random Coefficients Model, which received some attention from economists in the 1970s and 1980s. It is also a subset of the Generalized Least Squares (GLS) models.[6] In addition, if $u_{1j}$ (the school-level component of the error term that is multiplied by $X_{ij}$) is set to 0, then the model reduces to the Random Effects Model more commonly used in the econometrics literature today. Indeed, a large number of papers that use the HLM method find no evidence that $V(u_{1j}) > 0$ and consequently end up estimating Random Effects Models. Thus, while HLM allows for estimation of a much broader set of models than a Random Effects Model, it appears that in many practical situations analysts end up estimating a Random Effects Model when they use the HLM method.
[6] GLS is more flexible in that it allows the variance of Y to be affected by factors that may not affect the mean value of Y. In practice, however, GLS models may be harder to estimate due to a lack of available software.
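The composite error $\varepsilon_{ij} = u_{1j} X_{ij} + u_{0j} + e_{ij}$ is what lets HLM handle both clustering and heteroskedasticity. Writing $\tau_{00} = V(u_{0j})$, $\tau_{11} = V(u_{1j})$, $\tau_{01} = \mathrm{cov}(u_{0j}, u_{1j})$, and $\sigma^2 = V(e_{ij})$ (notation introduced here for illustration), and assuming $e_{ij}$ is independent across students and of the school-level errors, one gets $V(\varepsilon_{ij}) = \tau_{00} + 2\tau_{01} X_{ij} + \tau_{11} X_{ij}^2 + \sigma^2$ and, for two students in the same school, $\mathrm{cov}(\varepsilon_{ij}, \varepsilon_{i'j}) = \tau_{00} + \tau_{01}(X_{ij} + X_{i'j}) + \tau_{11} X_{ij} X_{i'j}$. A minimal sketch in Python with NumPy, using hypothetical SES values and variance components, builds this matrix for one school and shows that setting $\tau_{11} = 0$ collapses it to the Random Effects structure.

```python
import numpy as np

def composite_error_cov(x, tau00, tau11, tau01, sigma2):
    """Covariance matrix of eps_ij = u0j + u1j * x_ij + e_ij within one school.

    Assumes e_ij is independent across students and of (u0j, u1j); x is treated as fixed.
    """
    Z = np.column_stack([np.ones_like(x), x])        # loadings on (u0j, u1j)
    T = np.array([[tau00, tau01], [tau01, tau11]])   # covariance matrix of (u0j, u1j)
    return Z @ T @ Z.T + sigma2 * np.eye(len(x))

x = np.array([1.0, 5.0, 9.0])                         # hypothetical SES values in one school
V_rc = composite_error_cov(x, tau00=1.0, tau11=0.1, tau01=0.0, sigma2=1.0)  # random slope
V_re = composite_error_cov(x, tau00=1.0, tau11=0.0, tau01=0.0, sigma2=1.0)  # random effects
print(np.diag(V_rc))   # variances grow with SES when V(u1j) > 0
print(np.diag(V_re))   # constant variance once V(u1j) = 0
```

In the Random Effects case every pair of students in a school shares the same covariance $\tau_{00}$; with a random slope, both the variances and the covariances depend on SES.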
What HLM Does Not Do
HLM has a number of important strengths and weaknesses. I start by discussing one of the major problems with HLM, which is caused by how the model is presented rather than by any inherent flaw in the estimation method. In order to better describe this weakness, however, it will be useful to clarify some of the terms used in the HLM literature, as they overlap in rather unfortunate ways with terms commonly used by economists. In particular, the terms random and fixed effects mean somewhat different things in HLM than they do in economics. In HLM, random effects refer to the error terms in the Level 2 equations, i.e., the error terms for the coefficients. Fixed effects refer to the non-random parts of the coefficients. In contrast, in economics the term random effects is generally used to refer only to the random component of the intercept ($u_{0j}$). In economics fixed effects also refer to $u_{0j}$, but only in the context of a very different model: one in which $u_{1j} = 0$ and $\mathrm{cov}(u_{0j}, X_{ij})$ is not constrained to be 0. In Fixed Effects Models in economics, the fixed effects refer to values of $u_{0j}$ that are treated as fixed rather than as random. Estimation is generally accomplished using a dummy variable for each school. In the rest of this paper I will be using the terms random and fixed effects as they are used by economists rather than in the way they are used in the HLM literature.
Fixed Effects Models are considered a very powerful tool by economists because they can be used to control for bias caused by a large set of unobserved variables. For example, if one were interested in estimating the impact of student SES on student achievement, controlling for all school-level variables (observed and unobserved), one could use a Fixed Effects Model with a dummy variable for each school. Using such a model, one could correctly claim that the estimated impact of SES was not biased by any school-level variables, including those not observed. Interestingly, the same result would hold if one were to estimate multilevel models in
two stages as HLM is presented. In the example given above, this would mean first estimating the impact of student SES for each school and then estimating second-stage equations to determine how school-level factors affect the intercepts and slopes from the school-by-school regressions. Were one to estimate such a model, one could legitimately claim that the SES slope estimates would not be biased by the omission of any school-level factors. This is not, however, how HLM is estimated. Instead, HLM is estimated in one stage, and the standard HLM model (without group-mean centering) uses both the between- and within-school variation to estimate the SES slope. The result is that omitted school-level variables can bias the SES slope estimates. Statistically speaking, Fixed Effects Models allow $\mathrm{cov}(u_{0j}, X_{ij})$ to be non-zero, while the standard HLM model assumes that this covariance is 0.
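The bias can be illustrated with a small simulation. The following is a sketch, assuming Python with NumPy and hypothetical parameter values: an unobserved school quality term raises test scores and is correlated with student SES, so $\mathrm{cov}(u_{0j}, X_{ij}) \neq 0$. It compares pooled OLS, a random-intercept GLS estimator computed by quasi-demeaning, and the within (fixed effects) estimator. For brevity the GLS step uses only a random intercept and treats the variance components as known, whereas HLM software would estimate them (for example by maximum likelihood).

```python
import numpy as np

rng = np.random.default_rng(1)
J, n = 400, 5                                  # hypothetical: 400 schools, 5 students each
school = np.repeat(np.arange(J), n)
q = rng.normal(0.0, 1.0, J)[school]            # unobserved school quality (the omitted u0j)
x = 0.8 * q + rng.normal(0.0, 1.0, J * n)      # student SES, correlated with school quality
y = 0.5 * x + q + rng.normal(0.0, 1.0, J * n)  # true SES slope is 0.5

def group_mean(v):
    """Replace each value with the mean of its school."""
    return (np.bincount(school, weights=v) / np.bincount(school))[school]

def slope(yv, xv, const):
    """OLS coefficient on xv in a regression of yv on the columns [const, xv]."""
    Z = np.column_stack([const, xv])
    return np.linalg.lstsq(Z, yv, rcond=None)[0][1]

ones = np.ones_like(x)

# Pooled OLS: between- and within-school variation weighted equally.
b_pooled = slope(y, x, ones)

# Random-intercept GLS by quasi-demeaning, with variance components treated as known.
tau00, sigma2 = 1.0, 1.0
theta = 1.0 - np.sqrt(sigma2 / (sigma2 + n * tau00))
b_re = slope(y - theta * group_mean(y), x - theta * group_mean(x), (1.0 - theta) * ones)

# Fixed effects (within) estimator: demean by school, discarding between-school variation.
b_fe = slope(y - group_mean(y), x - group_mean(x), ones)

print(round(b_pooled, 2), round(b_re, 2), round(b_fe, 2))
```

Only the within estimator should land near the true slope of 0.5. The random-intercept estimate falls between the other two: GLS down-weights the between-school information relative to pooled OLS, but does not discard it, so it inherits part of the omitted-variable bias.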
Interestingly, economists sometimes estimate models in the way the HLM presentation implies (i.e., in two stages), although these models have received relatively little use (Hanushek, 1974; Chaplin, 1993). Such models have the advantage of allowing one both to deal with the fact that slope estimates may vary across schools and to control for all unobserved factors at the school level. Their major weakness, however, is that these models can only be estimated if there are sufficient data at each level, for example enough students within schools to estimate a separate regression for each school. In contrast, HLM (and Random Effects Models) can be estimated even if there are only two observations per school for at least some schools, because they use both the within- and between-school variation to estimate the coefficients. This last point illustrates another way in which the presentation of HLM has misled researchers: some have argued that HLM models should only be estimated using schools that have a sufficient number of observations per school.
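The two-stage approach can be sketched as follows, assuming Python with NumPy and simulated data (200 schools with 30 students each; all parameter values are hypothetical). Stage 1 fits a separate regression in every school; stage 2 regresses the estimated SES slopes on the Catholic indicator. The second stage here is plain unweighted OLS, not the more efficient second-stage estimators that Hanushek (1974) discusses. Because each school gets its own intercept in stage 1, the unobserved school quality that is correlated with SES does not bias the slopes.

```python
import numpy as np

rng = np.random.default_rng(2)
J, n = 200, 30                                   # hypothetical: 200 schools, 30 students each
W = (np.arange(J) < J // 2).astype(float)        # 1 = Catholic, 0 = public
q = rng.normal(0.0, 1.0, J)                      # unobserved school quality
b0 = 1.0 + q                                     # school intercepts depend on quality
b1 = 0.5 - 0.2 * W + rng.normal(0.0, 0.1, J)     # school SES slopes: gamma10 = 0.5, gamma11 = -0.2

# Stage 1: a separate OLS regression of test scores on SES within each school.
slopes = np.empty(J)
for j in range(J):
    x = 0.8 * q[j] + rng.normal(0.0, 1.0, n)     # SES is correlated with school quality
    y = b0[j] + b1[j] * x + rng.normal(0.0, 1.0, n)
    slopes[j] = np.polyfit(x, y, 1)[0]           # polyfit returns [slope, intercept]

# Stage 2: regress the estimated school slopes on school type (unweighted OLS).
Z = np.column_stack([np.ones(J), W])
gamma10_hat, gamma11_hat = np.linalg.lstsq(Z, slopes, rcond=None)[0]
print(round(gamma10_hat, 2), round(gamma11_hat, 2))
```

The stage 1 loop also makes the data requirement concrete: every school needs enough students to fit its own regression.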
Of course, the way HLM is presented would be only a theoretical weakness if all researchers understood the model well. However, there is substantial evidence that many prominent researchers are not fully aware of this problem. In particular, many appear to believe that HLM models are estimated in two stages (Yasumoto et al., 2001; Nye et al., 2002; Alexander et al., 2001; Wenglinsky, 1998; Brewer and Goldhaber, 2000), and many believe that HLM will drop cases, presumably those with few observations per group (Gamoran et al., 1997; Alexander et al., 2001; Brewer and Goldhaber, 2000). At least one set of researchers writes as if the HLM model controls for unobserved group-level variables (Gamoran et al., 1997).[7] All of these points would hold for a model estimated in two stages, but do not hold for the standard HLM model.
There is a variant of the HLM model that can be used to control for the same types of
biases that a Fixed Effects Model deals with. This is not a standard HLM feature, but does
receive prominent attention in the major book introducing the HLM method (Bryk and
Raudenbush, 1992). The idea is that one can estimate a useful set of models by subtracting
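group means from the X’s. For example, as noted above, combining the Level 1 and Level 2 equations of HLM gives the equation:
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + \varepsilon_{ij}$$
After within-group centering this becomes:
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} (X_{ij} - \bar{X}_{.j}) + \gamma_{11} W_j (X_{ij} - \bar{X}_{.j}) + \varepsilon_{ij}$$
where $\bar{X}_{.j}$ is the mean of $X_{ij}$ for group j.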
[7] Relevant quotes from these papers are provided in an appendix.

Bryk and Raudenbush (1992) are careful to explain that group centering changes the underlying model and note that in many cases it may not be clear which model would be preferred. Many economists might recall that one method of estimating a Fixed Effects Model is to subtract group means from all variables in the model (Goldberger, 1991). Group-mean centering in HLM comes close to this, except that the group means of the outcome (Y) are not subtracted. Nevertheless, the result is unbiased slope estimates for the within-group variables, even in the presence of unobserved group-level variables that are correlated with the within-group variables. This is exactly the property that is generally highlighted as a strength of the Fixed Effects Model in economics.[8]
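The equivalence can be checked numerically. The following is a minimal sketch, assuming Python with NumPy and simulated data in which unobserved school quality is correlated with SES (all values hypothetical). It computes the SES slope four ways: a dummy-variable Fixed Effects Model, OLS on group-mean-centered SES, Mundlak's approach of adding the school mean of SES as a control, and random-intercept GLS on centered SES (variance components treated as known rather than estimated). All four coincide because centered SES is orthogonal to anything that is constant within a school.

```python
import numpy as np

rng = np.random.default_rng(3)
J, n = 50, 8                                     # hypothetical: 50 schools, 8 students each
school = np.repeat(np.arange(J), n)
q = rng.normal(0.0, 1.0, J)[school]              # unobserved school quality
x = 0.8 * q + rng.normal(0.0, 1.0, J * n)        # SES correlated with school quality
y = 0.5 * x + q + rng.normal(0.0, 1.0, J * n)

def group_mean(v):
    """Replace each value with the mean of its school."""
    return (np.bincount(school, weights=v) / np.bincount(school))[school]

def ols(yv, *cols):
    """OLS coefficients of yv on the given columns (1-D or 2-D)."""
    return np.linalg.lstsq(np.column_stack(cols), yv, rcond=None)[0]

ones = np.ones_like(x)
xbar = group_mean(x)
D = (school[:, None] == np.arange(J)).astype(float)   # one dummy per school

b_dummies = ols(y, x, D)[0]                      # Fixed Effects Model with school dummies
b_centered = ols(y, ones, x - xbar)[1]           # OLS on group-mean-centered SES
b_mundlak = ols(y, ones, x, xbar)[1]             # Mundlak: add the school mean of SES

# Random-intercept GLS on centered SES; centered SES is unchanged by quasi-demeaning
# because its school means are zero.
tau00, sigma2 = 1.0, 1.0
theta = 1.0 - np.sqrt(sigma2 / (sigma2 + n * tau00))
b_re_centered = ols(y - theta * group_mean(y), (1.0 - theta) * ones, x - xbar)[1]

print(b_dummies, b_centered, b_mundlak, b_re_centered)
```

The agreement is an algebraic identity, not a feature of this particular random draw, so it holds for any seed.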
HLM also shares two weaknesses with Random Effects Models.[9] First, it does not allow for negative within-group correlations in the error terms. This could be important for outcomes that are socially determined, if people look to their peer group when judging their own level of success or achievement. For example, one might expect to see negative associations between the error terms of self-efficacy ratings of different teachers within the same school if these teachers generally judge their own performance by making comparisons within rather than across schools. In HLM the random component of the intercept causes a positive correlation between observations within the same school. The more general GLS Models allow that same correlation to be negative rather than positive.
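The restriction can be made concrete. With only a random intercept, the correlation between two teachers in the same school is $\tau_{00}/(\tau_{00} + \sigma^2)$, where $\tau_{00}$ is the variance of the random intercept and $\sigma^2$ the variance of the individual-level error (notation introduced here for illustration); this cannot be negative. A more general GLS model can instead use an exchangeable (compound-symmetry) correlation matrix with a common correlation $\rho$, which is a valid, positive definite structure for groups of size $n$ whenever $-1/(n-1) < \rho < 1$. The sketch below, in Python with NumPy and a hypothetical group of five teachers, checks this by computing eigenvalues.

```python
import numpy as np

def random_intercept_icc(tau00, sigma2):
    """Within-school correlation implied by a random intercept: tau00 / (tau00 + sigma2)."""
    return tau00 / (tau00 + sigma2)

def exchangeable_corr(n, rho):
    """n x n correlation matrix with 1 on the diagonal and rho everywhere else."""
    return (1.0 - rho) * np.eye(n) + rho * np.ones((n, n))

def is_valid_corr(n, rho):
    """A correlation matrix is valid (positive definite) if all its eigenvalues are positive."""
    return bool(np.linalg.eigvalsh(exchangeable_corr(n, rho)).min() > 0)

n = 5                                            # hypothetical: five teachers per school
print([random_intercept_icc(t, 1.0) for t in (0.0, 0.5, 2.0)])   # never below 0
print([is_valid_corr(n, rho) for rho in (0.3, -0.2, -0.3)])      # bound is -1/(n-1) = -0.25
```

The eigenvalues of the exchangeable matrix are $1 - \rho$ (repeated $n - 1$ times) and $1 + (n-1)\rho$, which is where the lower bound on $\rho$ comes from.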
[8] Mundlak (1978) notes that this property of Fixed Effects Models can also be achieved by including all of the group means of the individual variables (i.e., the $X_{ij}$s) as controls in a standard OLS model.

The second weakness that HLM shares with Random Effects Models is that it may produce biased estimates for models that need weights. This issue is complicated by the fact that many economists would argue that weights are not needed in multivariate regressions. Rather, they argue, one should be able to fully model behavior using appropriate controls and interactions. If weights change regression results, they would argue that this implies that the model is misspecified, i.e., that important variables were omitted. An alternative view is that all
regression models should be viewed as parsimonious descriptions of relationships that are almost
surely far more complicated than any one regression model could capture. Regression results
can provide a useful summary of existing relationships but should not be viewed as providing
evidence against the importance of omitted variables or interactions. Under this view, weights
may be viewed as helping to make sure that the summary is relevant to the population being
studied. If one takes the latter stance, then there is an implicit admission that there could be important omitted interactions and that, rather than trying to estimate all of these interactions, one will simply provide coefficient estimates that are as representative as possible. The problem with HLM (and random effects) under this set of assumptions is that it reweights the data based on the variance/covariance structure of the error terms in a way that effectively offsets the impacts of the weights themselves (Selden, 1994). Thus, a belief that the weights are important would seem to be incompatible with the use of HLM (or Random Effects Models).
HLM also has a feature, not shared with Random Coefficients Models, that is odd if not necessarily incorrect. The standard method of starting an HLM analysis involves a test for between-group variance done without controls. This is used to justify the inclusion of group-level variables. While this test is likely to produce correct results in general, it is possible for the test to suggest no between-group variance even when the group-level variables are powerful predictors of the outcomes. This can happen for two reasons. First, the control variables at the group (i.e., school) level can offset each other. Second, the within-group control variables (i.e., student SES) may be offsetting the group-level variables. For both of these reasons a more
appropriate test for the inclusion of the group-level variables would be a joint test of their statistical significance. To illustrate this point, consider the standard combined model discussed earlier:
$$Y_{ij} = \gamma_{00} + \gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij} + \varepsilon_{ij}$$
Now, it is possible to have $\gamma_{01} > 0$, $\gamma_{10} > 0$, and $\gamma_{11} > 0$ but also to have $V(\gamma_{01} W_j + \gamma_{10} X_{ij} + \gamma_{11} W_j X_{ij})$ approximately equal to 0 if, for example, $\mathrm{cov}(W_j, X_{ij})$ is negative and sufficiently large in magnitude. For a real-world example, suppose that a certain school district (or state) put sufficient resources into schools serving primarily lower-SES students to effectively offset the test score gap by parent SES. Were they able to accomplish this goal, we would observe relatively small differences in student outcomes across schools, even if both parental SES and spending per student had large and important impacts on student performance. If we were to rely on the HLM test of between-school differences, we might draw the incorrect conclusion that school spending did not matter, because we would never estimate the full model, having found no evidence of between-school differences in the model without controls.

[9] Another weakness that has been suggested is that HLM is not compatible with Instrumental Variables estimation (Brewer and Goldhaber, 2000; Mason, 1995). However, Spencer and Fielding (2000) show that HLM can be used to estimate IV models.

What HLM Does Do
While HLM may be misinterpreted by some researchers and has some odd features, it does have a number of valuable characteristics that make it worth considering in many circumstances. First, as noted above, it deals with a fairly large set of possible violations of the standard OLS model assumptions about the distributions of the error terms, in ways that are somewhat more flexible than the standard Random Effects Models used in econometrics. Second, like Random Effects Models, it produces more efficient estimates than one would obtain using OLS or any other method that relies on the OLS slope estimates but produces correct standard errors (i.e., the Huber-White and simulation methods of correcting standard errors).[10]
Third, even if some of the HLM assumptions are violated, HLM can be used to produce a "best" fit based on the HLM weights, much as OLS is often viewed as producing Best Linear Unbiased Estimates (Goldberger, 1991). Fourth, when group-mean centering is used, HLM can reproduce the unbiased slope estimates for within-group variables that Fixed Effects Models provide.
[10] This statement assumes that there are random intercepts and/or slopes in the model. In the absence of such variation, OLS could be more efficient, as it estimates fewer parameters.
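The school-spending example from the discussion of the between-group variance test can be simulated. The following is a sketch, assuming Python with NumPy and entirely hypothetical values: spending is set to exactly offset the average SES of each school, SES and spending each raise test scores one-for-one, and students are otherwise independent. The null-model (no controls) between-school variance is computed here with a simple one-way ANOVA estimator rather than HLM's likelihood-based estimator; it comes out near zero, yet a regression with both controls recovers the two large effects.

```python
import numpy as np

rng = np.random.default_rng(4)
J, n = 100, 25                                   # hypothetical: 100 schools, 25 students each
school = np.repeat(np.arange(J), n)
school_ses = rng.normal(0.0, 1.0, J)             # average parental SES of each school
spending = -school_ses                           # resources exactly offset school SES
ses = school_ses[school] + rng.normal(0.0, 1.0, J * n)
y = 1.0 * ses + 1.0 * spending[school] + rng.normal(0.0, 1.0, J * n)

# Null model (no controls): one-way ANOVA estimate of the between-school variance.
counts = np.bincount(school)
school_means = np.bincount(school, weights=y) / counts
within_var = np.sum((y - school_means[school]) ** 2) / (y.size - J)
tau00_null = np.var(school_means, ddof=1) - within_var / n

# Full model with both controls: both coefficients are large.
X = np.column_stack([np.ones_like(ses), ses, spending[school]])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(tau00_null, 3), coef.round(2))
```

A joint test of the SES and spending coefficients would clearly reject, even though the test without controls finds essentially no between-school variance.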
Conclusion
The growing use of HLM suggests that researchers are becoming increasingly aware of and willing to
deal with the important issue of clustering of data within groups. This implies that the conclusions reached in
these studies can be taken more seriously than those of many studies in the past that ignored clustering, as the
estimated standard errors are less likely to be biased. At the same time, however, the introduction of HLM may
have some costs. In particular, it appears that the standard method of presenting HLM (at two levels) may
mislead some researchers into believing that they have achieved a very different goal: that of controlling for all
unobserved group-level factors. While this ideal can be obtained using other types of models (in particular
Fixed Effects Models common in econometrics), it is not achieved using the standard HLM methods. Clarifying
this important limitation of HLM should help to ensure that it is used more correctly in the future and enable
researchers to make better choices when deciding which method is most appropriate for their research
questions.
References
Alexander, Karl L., Doris R. Entwisle, and Linda S. Olson (2001) “Schools, Achievement, and Inequality: A
Seasonal Perspective,” Educational Evaluation and Policy Analysis, 23(2):171-191.
Brewer, Dominic J. and Dan D. Goldhaber (2000) “Improving Longitudinal Data on Student Achievement:
Some Lessons from Recent Research Using NELS:88,” in Analytic Issues in the Assessment of Student
Achievement, edited by David Grissmer and J. Michael Ross, U.S. Department of Education, National Center
for Education Statistics, NCES 2000-050.
Bryk, Anthony S. and Stephen W. Raudenbush (1992) Hierarchical Linear Models: Applications and Data Analysis
Methods, Sage Publications, Newbury Park, CA.
Chaplin, Duncan (1993) Employment Bust or Education Boom? Black Teenage Males: 1960-1988.
Dissertation, University of Wisconsin at Madison.
Gamoran, Adam, Andrew C. Porter, Jon Smithson, and Paula A. White (1997) “Upgrading High School
Mathematics Instruction: Improving Learning Opportunities for Low-Achieving, Low-Income Youth,”
Educational Evaluation and Policy Analysis, 19(4):325-338.
Goldberger, Arthur S. (1991) A Course in Econometrics, Harvard University Press, Cambridge, MA.
HLM (2000) “HLM Concepts and Background,” http://www.ssicentral.com/hlm/concept.htm. Downloaded October 29th, 2003.
Hanushek, Eric A. (1974) “Efficient Estimators for Regressing Regression Coefficients,” The American
Statistician, 28(2), May.
Mason, W.M. (1995) “Hierarchical Linear Models: Problems and Prospects,” Journal of Educational and
Behavioral Statistics, 20(2):221-227.
Mundlak, Y. (1978) “On the Pooling of Time Series and Cross Section Data,” Econometrica, 46:69-85.
Nye, Barbara, Larry V. Hedges, and Spyros Konstantopoulos (2002) “Do Low-Achieving Students Benefit
More from Small Classes? Evidence from the Tennessee Class Size Experiment,” Educational
Evaluation and Policy Analysis, 24(3):201-217.
Selden, Thomas M. (1994) “Weighted generalized least squares estimation for complex survey data,” Economics
Letters, 46:1-6.
Seltzer, Michael, John Novak, Kilchan Choi, and Nelson Lim (2002) “Sensitivity Analysis for Hierarchical
Models Employing t Level-1 Assumptions,” Journal of Educational and Behavioral Statistics,
27(2):181-222.
Singer, Judith (1998) "Using SAS PROC MIXED to Fit Multilevel Models, Hierarchical Models, and
Individual Growth Models," Journal of Educational and Behavioral Statistics.
Spencer, Neil H. and Anthony Fielding (2000) “An Instrumental Variable Consistent Estimation Procedure to
Overcome the Problem of Endogenous Variables in Multilevel Models,” Multilevel Modelling Newsletter
12(1):4-7.
Swanson, David B., Brian E. Clauser, Susan M. Case, Ronald J. Nungester, and Carol Feathermean (2002)
“Analysis of Differential Item Functioning (DIF) Using Hierarchical Logistic Regression Models,”
Journal of Educational and Behavioral Statistics, 27(1):53-75.
White, Halbert (1980) “A Heteroskedasticity-Consistent Covariance Matrix Estimator and a Direct Test for
Heteroskedasticity,” Econometrica, 48(4):817-838.
Wenglinsky, Harold (1998) “Finance Equalization and Within-School Equity: The Relationship between
Education Spending and the Social Distribution of Achievement,” Educational Evaluation and Policy
Analysis, 20(4):269-283.
Yasumoto, Uekawa, and Bidwell (2001) “The Collegial Focus and High School Students’ Achievement,”
Sociology of Education, July, 74(3):181-209.
Figure 1
Test Scores by SES and School Type: Sector Averages
[Figure: test scores (vertical axis, about 3 to 7.5) plotted against SES (horizontal axis, 1 to 10) for Public and Catholic schools.]
Figure 2
Test Scores by SES and School Type: Individual Schools
[Figure: test scores (vertical axis, about 3 to 8) plotted against SES (horizontal axis, 1 to 10) for Catholic schools 1 to 3 and Public schools 1 to 3.]
Appendix
Quotes Suggesting that HLM is Misunderstood
“As in the Level 2 formulation, at Level 3 average 10th-grade achievement and growth rate are estimated
separately for each department.” (Yasumoto et al., 2001)
“…Such models permit the analysis and pooling of school-specific regressions…” and “…in each of the
school-specific regression coefficients…” (Nye et al., 2002).
“This procedure estimates separate error variances for each level, ensuring that parameters at the class
level are not distorted because of similarities among students within classes…” and “At least three cases are
needed to estimate the growth curve, but the estimation procedure accommodates cases for which data are
available at two of the three time points.” (Gamoran et al., 1997.)
“…HLM is used to estimate within-person achievement growth models…Person-specific growth
parameters are estimated at the within-person, or Level 1, stage….” and “HLM screens out many cases because
of strategic gaps in the testing record…” Alexander et al. (2001).
“…Separate equations are estimated for the effect of student-level variables on students and of school-level variables on the average of student-level variables.” Wenglinsky (1998).
“The basic approach of HLM is to first estimate a within-group model and use the estimated slope
coefficients as the dependent variable in a second across-group stage.” And “…HLM utilizes only a sub-sample
of all the potential students in the sample…” (Brewer and Goldhaber, 2000).